WhisperCore Documentation

Getting Started

WhisperCore is an on-device transcription and diarization tool. It runs entirely on your hardware using Whisper AI models via the Faster-Whisper engine, so no internet connection is required for processing.

This guide will walk you through installing, activating, and using WhisperCore effectively.

Installation

System Requirements

OS: Windows 10/11 (64-bit)
RAM: 8 GB minimum (16 GB recommended for large models)
GPU: NVIDIA GPU with CUDA support (optional, but recommended)
Disk: ~2 GB for the application + models

Steps

Request access updates from the Pre-Beta Access page.
Run the .exe installer and follow the prompts.
Launch WhisperCore from the Start Menu or Desktop shortcut.

Activation

WhisperCore uses a Magic Link system, so there are no passwords to remember.

Open WhisperCore and click "Sign In".
Enter the email tied to your access approval.
Check your email for a verification code.
Enter the code in the app. Done!

License and seat limits depend on the specific access grant and deployment mode.

GPU Setup (CUDA)

For the fastest transcription, we recommend using an NVIDIA GPU. WhisperCore automatically detects your GPU if CUDA is installed.

Setup

Install CUDA Toolkit 11.8+.
Ensure your NVIDIA drivers are up to date.
Restart WhisperCore. It will automatically switch to GPU mode.

You can verify GPU detection in Settings → Device inside the app.
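
If the app does not report a GPU, it can help to confirm from outside WhisperCore that the NVIDIA driver sees one. The following is a minimal sketch, assuming the nvidia-smi utility that ships with the NVIDIA driver is on your PATH. The helper name list_nvidia_gpus is illustrative and is not part of WhisperCore.

```python
import shutil
import subprocess


def list_nvidia_gpus():
    """Return one 'name, total memory' string per GPU, or [] if none are visible."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return []  # driver tools not installed or not on PATH
    try:
        result = subprocess.run(
            [exe, "--query-gpu=name,memory.total", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            timeout=10,
            check=True,
        )
    except (subprocess.SubprocessError, OSError):
        return []  # nvidia-smi exists but could not query a device
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    gpus = list_nvidia_gpus()
    if gpus:
        for gpu in gpus:
            print("Found GPU:", gpu)
    else:
        print("No NVIDIA GPU visible to the driver.")
```

An empty result usually points to a driver problem rather than a WhisperCore problem, so updating the NVIDIA driver is a reasonable first step.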

Usage

Basic Transcription

Click "Add File" or drag & drop your audio file.
Select a model (tiny, base, small, medium, large-v3, or large-v3-turbo; see the Model Selection Guide).
Click "Transcribe".
View the results in the app and export as needed.

Speaker Diarization

Enable "Speaker Detection" in the transcription settings to automatically label different speakers in the output. Works best with clear audio and distinct voices.

Model Selection Guide

tiny            → Fastest, lowest accuracy       (~1 GB VRAM)
base            → Good balance                   (~1 GB VRAM)
small           → Better accuracy                (~2 GB VRAM)
medium          → High accuracy                  (~5 GB VRAM)
large-v3        → Best accuracy                  (~10 GB VRAM)
large-v3-turbo  → Near-best accuracy, 3× faster  (~6 GB VRAM)  ★ Recommended
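
The guide above can be read as a simple lookup: pick the most accurate model whose approximate VRAM need fits your GPU. The sketch below encodes that reading. The VRAM figures are copied from the guide; the accuracy ordering and the function name pick_model are assumptions made for illustration, not WhisperCore behavior.

```python
# Approximate VRAM needs in GB, copied from the Model Selection Guide.
MODEL_VRAM_GB = {
    "tiny": 1,
    "base": 1,
    "small": 2,
    "medium": 5,
    "large-v3": 10,
    "large-v3-turbo": 6,
}

# Most accurate first. This ordering is an assumption inferred from the
# guide's descriptions ("best", "near-best", "high", ...).
PREFERENCE = ["large-v3", "large-v3-turbo", "medium", "small", "base", "tiny"]


def pick_model(free_vram_gb):
    """Return the most accurate model that fits in free_vram_gb, or None."""
    for name in PREFERENCE:
        if MODEL_VRAM_GB[name] <= free_vram_gb:
            return name
    return None  # nothing fits; consider running on CPU instead


if __name__ == "__main__":
    for vram in (2, 8, 12):
        print(f"{vram} GB free -> {pick_model(vram)}")
```

If speed matters more than the last bit of accuracy, moving large-v3-turbo to the front of the preference list matches the guide's recommendation.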

Supported Formats

Input (Audio / Video)

MP3 WAV M4A FLAC OGG WMA AAC WEBM MP4 MKV
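
When preparing a folder of recordings, it can be useful to filter out files WhisperCore will not accept before dragging them in. A minimal sketch, assuming that support is determined by file extension alone; the extension list is taken from the line above, and the helper names are illustrative.

```python
from pathlib import Path

# Input extensions listed in this documentation.
SUPPORTED_INPUT = {
    ".mp3", ".wav", ".m4a", ".flac", ".ogg",
    ".wma", ".aac", ".webm", ".mp4", ".mkv",
}


def is_supported(path):
    """True if the file extension (case-insensitive) is in the supported list."""
    return Path(path).suffix.lower() in SUPPORTED_INPUT


def split_supported(paths):
    """Split paths into (supported, skipped), preserving input order."""
    supported, skipped = [], []
    for p in paths:
        (supported if is_supported(p) else skipped).append(p)
    return supported, skipped


if __name__ == "__main__":
    ok, skipped = split_supported(["standup.MP3", "notes.docx", "demo.mkv"])
    print("Ready to transcribe:", ok)
    print("Skipped:", skipped)
```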

Output (Export)

TXT: Plain text transcript

SRT: Subtitles with timestamps

VTT: Web subtitles format

JSON: Structured data with segments and speaker labels
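
To illustrate how the subtitle formats relate to the segment data, the sketch below turns a list of segments into SRT text. The segment fields (start and end in seconds, plus text) are an assumed shape, since the exact layout of WhisperCore's JSON export is not documented here. The timestamp rules themselves are standard: SRT uses a comma before the milliseconds, while VTT uses a period.

```python
def format_timestamp(seconds, decimal_sep=","):
    """Format seconds as HH:MM:SS,mmm (SRT) or HH:MM:SS.mmm (VTT)."""
    ms_total = int(round(seconds * 1000))
    hours, rem = divmod(ms_total, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{decimal_sep}{ms:03d}"


def segments_to_srt(segments):
    """Build SRT text from dicts with 'start', 'end', and 'text' keys (assumed shape)."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)  # blank line between cues


if __name__ == "__main__":
    demo = [
        {"start": 0.0, "end": 2.5, "text": " Welcome, everyone. "},
        {"start": 2.5, "end": 5.0, "text": "Let's begin."},
    ]
    print(segments_to_srt(demo))
```

Passing decimal_sep="." to format_timestamp gives the timestamp style used by VTT files.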

FAQ

Is WhisperCore free?

Pricing has not been announced yet. WhisperCore is still in pre-beta.

Does it need internet?

Only for initial activation (Magic Link). After that, all processing is 100% offline.

How accurate is the transcription?

It depends on the model you choose. The large models are the most accurate and can approach human-level accuracy on clear audio, though results vary by language. Audio quality also plays a big role.

Can I use it for meetings?

Yes! Record your meeting audio, then drop it into WhisperCore. With speaker diarization enabled, each participant will be labeled automatically.
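
For meeting notes, a diarized JSON export can be condensed into speaker turns by merging consecutive segments from the same speaker. This is a sketch under assumptions: the JSON layout (a top-level "segments" list whose items carry "speaker" and "text") and the SPEAKER_00 style labels are hypothetical, so adjust the keys to match your actual export.

```python
import json


def speaker_turns(export_json):
    """Merge consecutive same-speaker segments into (speaker, text) turns."""
    data = json.loads(export_json)
    turns = []
    for seg in data["segments"]:
        speaker = seg.get("speaker", "UNKNOWN")  # fallback if a label is missing
        text = seg["text"].strip()
        if turns and turns[-1][0] == speaker:
            # Same speaker keeps talking: extend the current turn.
            turns[-1] = (speaker, turns[-1][1] + " " + text)
        else:
            turns.append((speaker, text))
    return turns


# Hypothetical export used only to demonstrate the function.
SAMPLE = json.dumps({
    "segments": [
        {"start": 0.0, "end": 1.8, "speaker": "SPEAKER_00", "text": "Let's get started."},
        {"start": 1.8, "end": 3.0, "speaker": "SPEAKER_00", "text": "First item."},
        {"start": 3.2, "end": 4.1, "speaker": "SPEAKER_01", "text": "Sounds good."},
    ]
})

if __name__ == "__main__":
    for speaker, text in speaker_turns(SAMPLE):
        print(f"{speaker}: {text}")
```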