whisper-windows-mcp
A Windows-native MCP (Model Context Protocol) server that lets Claude Desktop transcribe audio and video files locally using whisper.cpp — no internet connection required, no data sent to the cloud.
Why does this exist? The popular `whisper-mcp` package was built for macOS and assumes a Unix environment. It does not work on Windows. This package was written specifically for Windows users who want the same local transcription functionality in Claude Desktop.
What you can do with it
Once installed, you can say things like this directly in Claude Desktop:
- "Transcribe C:\Users\Me\Downloads\meeting.mp3"
- "Transcribe C:\Users\Me\Videos\interview.mp4" — video files work directly, no conversion needed
- "Transcribe this recording and summarise the key points"
- "Transcribe with timestamps so I can find specific moments"
- "Generate subtitles for C:\Users\Me\Videos\lecture.mp4"
- "Transcribe all files in C:\Users\Me\Videos\clips\ one at a time"
Everything runs on your own machine. No audio ever leaves your computer.
Requirements
| Requirement | Purpose | Download |
|---|---|---|
| Node.js 18+ | Runs the MCP server | nodejs.org |
| whisper.cpp | The transcription engine | Latest release |
| A Whisper model file | The AI model | See Step 2 below |
| FFmpeg | Video file support | ffmpeg.org |
FFmpeg is optional if you only use MP3/WAV files, but required for MP4, MKV, AVI, MOV and other video formats.
Step 1 — Install whisper.cpp
- Go to the whisper.cpp latest release
- Download the file named `whisper-bin-x64.zip` (look for `win` and `x64` in the filename)
- Extract the ZIP and move the contents to `C:\whisper\Release\` — create this folder if it doesn't exist
✅ You should now have `C:\whisper\Release\whisper-cli.exe`
Why this path? You can install whisper.cpp anywhere, but `C:\whisper\Release\` matches the default config below and means less to edit later.
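Before moving on, you can sanity-check the binary from a Command Prompt. The `-h`/`--help` flag prints the usage text in recent whisper.cpp builds (older releases named the executable `main.exe` instead of `whisper-cli.exe`):

```shell
REM Quick sanity check -- should print the whisper.cpp usage text,
REM not a "not recognized" or missing-DLL error
C:\whisper\Release\whisper-cli.exe --help
```

If Windows complains about a missing DLL, make sure you extracted all files from the ZIP into the same folder, not just the `.exe`.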
Step 2 — Download a Whisper model
Click a link below to download directly:
| Model | Download | Size | Speed | Best for |
|---|---|---|---|---|
| tiny.en | Download | 75 MB | Very fast | Quick tests |
| base.en | Download | 142 MB | Fast | Recommended starting point |
| small.en | Download | 466 MB | Moderate | Better accuracy |
| medium.en | Download | 1.5 GB | Slow | High accuracy |
| large-v3 | Download | 2.9 GB | Very slow | Maximum accuracy |
Save the downloaded .bin file to C:\whisper\models\ — create this folder if it doesn't exist.
✅ You should now have something like `C:\whisper\models\ggml-base.en.bin`
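With both the binary and a model in place, you can optionally run a transcription straight from the command line before touching Claude at all. This is a sketch using whisper.cpp's standard `-m` (model) and `-f` (file) flags; the sample WAV path is a placeholder — substitute any short recording you have on hand:

```shell
REM Transcribe a local WAV file directly with whisper.cpp
REM -m = path to the model, -f = path to the input audio
C:\whisper\Release\whisper-cli.exe -m C:\whisper\models\ggml-base.en.bin -f C:\Users\Me\sample.wav
```

If this prints a transcript to the console, the whisper.cpp side of the setup is done.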
Step 3 — Install Node.js
If you don't already have Node.js:
- Go to nodejs.org and download the Windows Installer (.msi) — choose the LTS version
- Run the installer and accept all defaults
✅ Verify: open Command Prompt and run node --version — you should see something like v20.x.x
Step 4 — Install FFmpeg (recommended)
FFmpeg is required for video files (MP4, MKV, AVI, MOV, etc.).
- Go to ffmpeg.org/download.html and download a Windows build
- Extract and move the `bin` folder contents (or the whole folder) somewhere permanent, e.g. `C:\ffmpeg\bin\`
- Add `C:\ffmpeg\bin` to your system PATH:
  - Press Win + S → search Environment Variables → open it
  - Under User Variables, select Path → Edit → New
  - Add `C:\ffmpeg\bin` → click OK on all dialogs
✅ Verify: open a new Command Prompt and run ffmpeg -version
Step 5 — Configure Claude Desktop
- Open Claude Desktop → Settings → Developer → Edit Config
- Add the following (or merge the `mcpServers` block if you have other servers):
```json
{
  "mcpServers": {
    "whisper": {
      "command": "npx",
      "args": ["-y", "whisper-windows-mcp"],
      "env": {
        "WHISPER_CLI_PATH": "C:\\whisper\\Release\\whisper-cli.exe",
        "WHISPER_MODEL": "C:\\whisper\\models\\ggml-base.en.bin"
      }
    }
  }
}
```
⚠️ Path format: In the JSON config, all backslashes must be doubled (`\\`). This is a JSON requirement. When typing paths into Claude in chat, use normal single backslashes.
- If you downloaded a different model, update `ggml-base.en.bin` to match your filename
- Save the file, fully quit Claude Desktop, and reopen it
- Go to Settings → Developer — you should see whisper with a green running badge
Step 6 — Test it
In Claude Desktop, type:
"Can you check your whisper config?"
Claude will verify that everything is found correctly. Then try:
"Please transcribe C:\Users\YourName\Downloads\recording.mp3"
Available tools
| Tool | What it does |
|---|---|
| `transcribe_audio` | Transcribe a single file — audio or video |
| `transcribe_batch` | Transcribe all files in a folder, one at a time with preview |
| `generate_subtitles` | Generate an `.srt` subtitle file next to the source |
| `check_config` | Verify all paths and FFmpeg are working |
Output formats
| Format | What you get | Ask Claude... |
|---|---|---|
| `text` (default) | Plain transcript, no timestamps | "Transcribe this file" |
| `timestamps` | Transcript with `[00:00:00 --> 00:00:05]` time codes | "Transcribe with timestamps" |
| `json` | Structured data | "Transcribe as JSON" |
| `srt` | Subtitle file saved next to source | "Generate subtitles for..." |
Supported file formats
Audio (native): MP3, WAV
Audio (via FFmpeg): M4A, FLAC, OGG
Video (via FFmpeg): MP4, MKV, AVI, MOV, WebM, FLV, WMV, M4V
H.264 and H.265/HEVC video both work. FFmpeg must be installed and in your PATH for any video or non-MP3 audio format.
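Under the hood, video support works by extracting the audio track before transcription. Conceptually the FFmpeg step looks like the command below (whisper.cpp expects 16 kHz mono 16-bit WAV input; the exact command the server runs internally may differ):

```shell
REM Extract a whisper.cpp-compatible audio track from a video file
REM -vn drops the video stream; -ar 16000 -ac 1 gives 16 kHz mono;
REM -c:a pcm_s16le writes 16-bit WAV
ffmpeg -i C:\Users\Me\Videos\interview.mp4 -vn -ar 16000 -ac 1 -c:a pcm_s16le C:\Users\Me\Videos\interview.wav
```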
Batch transcription
To transcribe multiple files in a folder interactively:
"Transcribe all files in C:\Users\Me\Videos\clips"
Claude will list all detected files (with checkmarks on already-completed ones), then process them one at a time, showing you a preview of each transcript before moving to the next.
For large unattended overnight batches, use whisper-cli directly from the command line — see TROUBLESHOOTING.md for the syntax. This is more reliable than running through Claude for very large jobs.
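As a sketch of what such an unattended run can look like, here is a Command Prompt loop using whisper.cpp's `-otxt` flag, which writes a `.txt` transcript next to each input file. Adjust the paths and the file pattern to your setup:

```shell
REM Transcribe every MP3 in a folder, writing a .txt transcript alongside each
REM Note: inside a .bat script, double the percent signs (%%f)
for %f in ("C:\Users\Me\Videos\clips\*.mp3") do C:\whisper\Release\whisper-cli.exe -m C:\whisper\models\ggml-base.en.bin -f "%f" -otxt
```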
Transcription speed
Whisper runs on CPU by default. Rough estimates for a 1-hour recording:
| Model | Approximate time (CPU) |
|---|---|
| tiny.en | 5–10 minutes |
| base.en | 10–20 minutes |
| small.en | 20–35 minutes |
| medium.en | 35–60 minutes |
| large-v3 | 60–120 minutes |
You can increase thread count by adding "WHISPER_THREADS": "12" to the env block in your config (replace 12 with however many threads you want — up to your CPU's logical core count).
GPU acceleration for AMD (ROCm) and NVIDIA (CUDA) on Windows is planned for a future update.
Converting video to audio (optional)
whisper-windows-mcp handles video files automatically via FFmpeg, so manual conversion is no longer required. However, if you want a smaller audio-only file for any reason, VLC makes it easy:
- Open VLC → Media → Convert / Save
- Click Add, select your video, then Convert / Save
- Under Profile, choose Audio - MP3
- Set a destination filename and click Start
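If you prefer the command line, FFmpeg (installed in Step 4) can do the same conversion in one line. `libmp3lame` with `-q:a 4` is a common size/quality trade-off; the paths here are placeholders:

```shell
REM Convert a video to a standalone MP3 using FFmpeg
REM -vn drops the video stream; -q:a 4 sets VBR quality (~165 kbps)
ffmpeg -i C:\Users\Me\Videos\lecture.mp4 -vn -c:a libmp3lame -q:a 4 C:\Users\Me\Videos\lecture.mp3
```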
Full config example
```json
{
  "preferences": {
    "coworkWebSearchEnabled": true
  },
  "mcpServers": {
    "whisper": {
      "command": "npx",
      "args": ["-y", "whisper-windows-mcp"],
      "env": {
        "WHISPER_CLI_PATH": "C:\\whisper\\Release\\whisper-cli.exe",
        "WHISPER_MODEL": "C:\\whisper\\models\\ggml-base.en.bin",
        "WHISPER_THREADS": "8",
        "FFMPEG_PATH": "ffmpeg"
      }
    }
  }
}
```
Config file location: `C:\Users\YourUsername\AppData\Roaming\Claude\claude_desktop_config.json`
The `AppData` folder is hidden by default. To show it: File Explorer → View → Show → Hidden items
Tested on
- Windows 10 Pro (10.0.19045)
- Windows 11 — untested, feedback welcome via Issues
Troubleshooting
See TROUBLESHOOTING.md for detailed solutions including how to run large overnight batch jobs from the command line.
Quick checklist:
- [ ] Config paths use double backslashes (`C:\\whisper\\...`)
- [ ] `whisper-cli.exe` exists at the path specified
- [ ] The model `.bin` file exists at the path specified
- [ ] Claude Desktop was fully restarted after editing the config
- [ ] Whisper shows running in Settings → Developer
- [ ] FFmpeg is in PATH if using video files
Roadmap
- [ ] AMD GPU acceleration (ROCm)
- [ ] NVIDIA GPU acceleration (CUDA)
- [ ] Speaker diarization (automatic A/B speaker identification)
- [ ] Translation to English from other languages
- [ ] Unattended background batch processing
Support this project
If this tool saved you time and you'd like to support continued development:
- ⭐ Star this repo — it helps others find it
- 💬 Open an issue if you find a bug or have a feature request
- 💖 Sponsor — GitHub Sponsors | Ko-fi | Patreon
License
MIT — free to use, modify, and distribute.
Contributing
Pull requests welcome. GPU acceleration for AMD or NVIDIA especially appreciated. Windows 11 feedback welcome via Issues.