Getting Started
WhisperCore is an on-device transcription and diarization tool. It runs entirely on your hardware using Whisper AI models via the Faster-Whisper engine — no internet connection required for processing.
This guide will walk you through installing, activating, and using WhisperCore effectively.
Installation
System Requirements
- OS: Windows 10/11 (64-bit)
- RAM: 8 GB minimum (16 GB recommended for large models)
- GPU: NVIDIA GPU with CUDA support (optional, but recommended)
- Disk: ~2 GB for the application + models
Steps
- Request access updates from the Pre-Beta Access page.
- Run the .exe installer and follow the prompts.
- Launch WhisperCore from the Start Menu or Desktop shortcut.
Activation
WhisperCore uses a Magic Link system — no passwords to remember.
- Open WhisperCore and click "Sign In".
- Enter the email tied to your access approval.
- Check your email for a verification code.
- Enter the code in the app. Done!
License and seat limits depend on the specific access grant and deployment mode.
GPU Setup (CUDA)
For the fastest transcription, we recommend using an NVIDIA GPU. WhisperCore automatically detects your GPU if CUDA is installed.
Setup
- Install CUDA Toolkit 11.8+.
- Ensure your NVIDIA drivers are up to date.
- Restart WhisperCore — it will automatically switch to GPU mode.
You can verify GPU detection in Settings → Device inside the app.
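If the app does not report a GPU, it can help to confirm that the NVIDIA driver itself sees one. The following is a minimal sketch, independent of WhisperCore, that asks the `nvidia-smi` utility (installed with NVIDIA drivers) for GPU names. The helper name `nvidia_gpu_names` is hypothetical. It checks only the driver, not the CUDA Toolkit version and not WhisperCore's own detection logic.

```python
import shutil
import subprocess
from typing import List


def nvidia_gpu_names(timeout_s: float = 5.0) -> List[str]:
    """Return the GPU names reported by nvidia-smi, or [] if none are found."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return []  # nvidia-smi is not on PATH: driver missing or not installed
    try:
        result = subprocess.run(
            [exe, "--query-gpu=name", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            timeout=timeout_s,
            check=True,
        )
    except (subprocess.SubprocessError, OSError):
        return []  # the tool exists but failed, e.g. a driver mismatch
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


names = nvidia_gpu_names()
print("NVIDIA GPU(s):", ", ".join(names) if names else "none detected")
```

An empty result suggests revisiting the driver installation before troubleshooting inside the app.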
Usage
Basic Transcription
- Click "Add File" or drag & drop your audio file.
- Select a model (tiny, base, small, medium, or one of the large variants; see the Model Selection Guide).
- Click "Transcribe".
- View the results in the app and export as needed.
Speaker Diarization
Enable "Speaker Detection" in the transcription settings to automatically label different speakers in the output. Diarization works best with clear audio and distinct voices.
Model Selection Guide
tiny → Fastest, lowest accuracy (~1 GB VRAM)
base → Good balance (~1 GB VRAM)
small → Better accuracy (~2 GB VRAM)
medium → High accuracy (~5 GB VRAM)
large-v3 → Best accuracy (~10 GB VRAM)
large-v3-turbo → Near-best accuracy, 3× faster (~6 GB VRAM) ★ Recommended
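The guide above can be read as a simple lookup: pick the most accurate model whose approximate VRAM need fits the memory you have free. The sketch below encodes the table's figures. The function name `pick_model` and the accuracy-first ordering are illustrative assumptions, not part of WhisperCore.

```python
from typing import Optional

# Approximate VRAM needs in GB, copied from the guide and ordered from
# highest to lowest accuracy as the guide ranks them.
MODEL_VRAM_GB = [
    ("large-v3", 10),
    ("large-v3-turbo", 6),
    ("medium", 5),
    ("small", 2),
    ("base", 1),
    ("tiny", 1),
]


def pick_model(free_vram_gb: float) -> Optional[str]:
    """Return the most accurate model that fits, or None if nothing fits."""
    for name, needed_gb in MODEL_VRAM_GB:
        if free_vram_gb >= needed_gb:
            return name
    return None


print(pick_model(8))  # an 8 GB card fits large-v3-turbo but not large-v3
```

This ordering favors accuracy. If speed matters more, large-v3-turbo remains the guide's recommended choice even on cards that could hold large-v3.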
Supported Formats
Input (Audio / Video)
MP3 WAV M4A FLAC OGG WMA
AAC WEBM MP4 MKV
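When preparing a folder of recordings, it can be convenient to filter out files the app will not accept before dragging them in. This is a small sketch that checks file extensions against the list above. The helper name `is_supported_input` is hypothetical, and an extension check does not guarantee that the file's contents are actually decodable.

```python
from pathlib import Path

# Input extensions taken from the supported formats list above.
SUPPORTED_INPUT_EXTENSIONS = {
    ".mp3", ".wav", ".m4a", ".flac", ".ogg", ".wma",
    ".aac", ".webm", ".mp4", ".mkv",
}


def is_supported_input(path: str) -> bool:
    """Return True if the file's extension is on the supported list."""
    return Path(path).suffix.lower() in SUPPORTED_INPUT_EXTENSIONS


for name in ["meeting.MP3", "interview.flac", "notes.docx"]:
    print(name, is_supported_input(name))
```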
Output (Export)
TXT — Plain text transcript
SRT — Subtitles with timestamps
VTT — Web subtitles format
JSON — Structured data with segments and speaker labels
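To illustrate how the structured and subtitle formats relate, the sketch below turns a list of segments into SRT text. The segment field names (`start`, `end`, `text`, `speaker`) are assumptions made for illustration; the actual WhisperCore JSON schema may differ, so check an exported file before relying on them.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Build SRT text from segments; field names are assumed, not official."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        text = seg["text"].strip()
        speaker = seg.get("speaker")
        if speaker:
            text = f"{speaker}: {text}"  # prefix the speaker label if present
        start = srt_timestamp(seg["start"])
        end = srt_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{text}\n")
    return "\n".join(blocks)  # a blank line separates consecutive cues


example = [
    {"start": 0.0, "end": 2.5, "text": " Hello everyone. ", "speaker": "SPEAKER_00"},
    {"start": 2.5, "end": 4.0, "text": "Hi.", "speaker": "SPEAKER_01"},
]
print(segments_to_srt(example))
```

VTT follows the same cue structure but starts with a `WEBVTT` header line and uses a period instead of a comma before the milliseconds.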
FAQ
Is WhisperCore free?
Pricing has not been announced yet; WhisperCore is still in pre-beta hardening.
Does it need internet?
Only for initial activation (Magic Link). After that, all processing is 100% offline.
How accurate is the transcription?
It depends on the model you choose. The large model achieves near-human accuracy in most languages. Audio quality also plays a big role.
Can I use it for meetings?
Yes! Record your meeting audio, then drop it into WhisperCore. With speaker diarization enabled, each participant will be labeled automatically.