MCP Whisper Transcription
An MCP server for audio/video transcription using MLX-optimized Whisper models, offering fast performance on Apple Silicon with support for multiple output formats and batch processing.
MCP Whisper Transcription Server
An MCP (Model Context Protocol) server for audio/video transcription using MLX-optimized Whisper models. Optimized for Apple Silicon devices with ultra-fast performance.
Features
- MLX-Optimized: Leverages Apple Silicon for blazing-fast transcription (up to 10x realtime)
- Multiple Formats: Supports txt, md, srt, and json output formats
- Video Support: Automatically extracts audio from video files (MP4, MOV, AVI, MKV)
- Batch Processing: Process multiple files in parallel with configurable workers
- MCP Integration: Full MCP protocol support with tools and resources
- Performance Tracking: Built-in performance monitoring and reporting
- Flexible Models: Choose from 6 different Whisper models (tiny to large-v3-turbo)
- Error Handling: Robust error handling and validation
- Concurrent Processing: Thread-safe concurrent transcription support
- Voice Activity Detection: Optional VAD to remove silence and speed up processing
- Hallucination Prevention: Advanced filtering to remove common transcription artifacts
Performance
- Speed: Up to 10x realtime transcription on Apple Silicon
- Memory: Optimized memory usage (< 500MB for most files)
- Concurrent: Handle multiple transcriptions simultaneously
- Scalable: Batch process hundreds of files efficiently
Quick Start
Prerequisites
- Apple Silicon Mac (M1, M2, M3, or later)
- Python 3.10+
- FFmpeg (for video support)
Installation
1. Install FFmpeg (if not already installed):

       brew install ffmpeg

2. Clone the repository:

       git clone https://github.com/galacoder/mcp-whisper-transcription.git
       cd mcp-whisper-transcription

3. Install Poetry (if not already installed):

       curl -sSL https://install.python-poetry.org | python3 -

4. Install dependencies:

       poetry install

5. Test the installation:

       poetry run python src/whisper_mcp_server.py --help
Configuration
Environment Variables
Create a .env file to customize settings:
# Model Configuration
DEFAULT_MODEL=mlx-community/whisper-large-v3-turbo
OUTPUT_FORMATS=txt,md,srt,json
# Performance Settings
MAX_WORKERS=4
TEMP_DIR=./temp
# Optional: API Keys for future cloud features
# OPENAI_API_KEY=your_key_here
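As a rough illustration of how these variables might be consumed, the sketch below reads them from the process environment and falls back to the example values shown above. The `load_settings` helper is hypothetical and not taken from the server's source; it also assumes the `.env` file has already been loaded into the environment by whatever mechanism the server uses.

```python
import os


def load_settings(env=None):
    """Hypothetical helper: read the documented variables with the README's example defaults."""
    env = os.environ if env is None else env
    formats = env.get("OUTPUT_FORMATS", "txt,md,srt,json")
    return {
        "default_model": env.get("DEFAULT_MODEL", "mlx-community/whisper-large-v3-turbo"),
        # Split the comma-separated list and drop stray whitespace or empty entries
        "output_formats": [f.strip() for f in formats.split(",") if f.strip()],
        "max_workers": int(env.get("MAX_WORKERS", "4")),
        "temp_dir": env.get("TEMP_DIR", "./temp"),
    }


if __name__ == "__main__":
    print(load_settings())
```

Passing a plain dictionary instead of `os.environ` makes it easy to check a configuration before starting the server.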
Available Models
| Model | Size (parameters) | Speed | Memory | Best For |
|---|---|---|---|---|
| whisper-tiny-mlx | 39M | ~10x | ~150MB | Quick drafts |
| whisper-base-mlx | 74M | ~7x | ~250MB | Balanced performance |
| whisper-small-mlx | 244M | ~5x | ~600MB | High quality |
| whisper-medium-mlx | 769M | ~3x | ~1.5GB | Professional use |
| whisper-large-v3-mlx | 1550M | ~2x | ~3GB | Maximum accuracy |
| whisper-large-v3-turbo | 809M | ~4x | ~1.6GB | Recommended |
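One way to use this table is to pick a model from a memory budget. The sketch below hard-codes the approximate memory figures from the table (in MB) and returns the model with the largest footprint that still fits. The `pick_model` helper and the "largest footprint that fits" rule are illustrative assumptions, not part of the server.

```python
# Approximate memory figures copied from the table above, in MB
MODELS = [
    ("whisper-tiny-mlx", 150),
    ("whisper-base-mlx", 250),
    ("whisper-small-mlx", 600),
    ("whisper-medium-mlx", 1500),
    ("whisper-large-v3-turbo", 1600),
    ("whisper-large-v3-mlx", 3000),
]


def pick_model(budget_mb):
    """Hypothetical helper: return the largest model whose footprint fits the budget, or None."""
    fitting = [(mem, name) for name, mem in MODELS if mem <= budget_mb]
    if not fitting:
        return None
    return max(fitting)[1]


if __name__ == "__main__":
    print(pick_model(2000))  # whisper-large-v3-turbo
```

Memory footprint is only a proxy for quality here; the "Best For" column is the better guide when two models have similar footprints.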
Usage
Claude Desktop Integration
Add to your Claude Desktop configuration file:
{
  "mcpServers": {
    "whisper-transcription": {
      "command": "poetry",
      "args": ["run", "python", "src/whisper_mcp_server.py"],
      "cwd": "/absolute/path/to/mcp-whisper-transcription"
    }
  }
}
Configuration File Locations:
- macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
- Windows: %APPDATA%\Claude\claude_desktop_config.json
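If the configuration file already lists other servers, the entry has to be merged in rather than pasted over the file. A minimal sketch of that merge follows, assuming the JSON layout shown above. The `add_server` helper is hypothetical, and the paths in the usage lines are placeholders to replace with your own.

```python
import json
from pathlib import Path


def add_server(config_path, repo_dir):
    """Hypothetical helper: add the whisper-transcription entry, keeping existing servers."""
    path = Path(config_path).expanduser()
    text = path.read_text().strip() if path.exists() else ""
    config = json.loads(text) if text else {}
    servers = config.setdefault("mcpServers", {})
    servers["whisper-transcription"] = {
        "command": "poetry",
        "args": ["run", "python", "src/whisper_mcp_server.py"],
        "cwd": str(repo_dir),
    }
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2) + "\n")
    return config


if __name__ == "__main__":
    import tempfile

    # Demo against a scratch file; point this at the real config path once you are satisfied
    demo = Path(tempfile.mkdtemp()) / "claude_desktop_config.json"
    print(add_server(demo, "/absolute/path/to/mcp-whisper-transcription"))
```

Back up the existing file before running anything like this against the real configuration.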
Standalone Usage
# Run the MCP server directly
poetry run python src/whisper_mcp_server.py
# Or use the development server
poetry run python -m src.whisper_mcp_server
Available Tools & Resources
MCP Tools
| Tool | Description | Key Parameters |
|---|---|---|
| transcribe_file | Transcribe a single audio/video file | file_path, model, output_formats |
| batch_transcribe | Process multiple files in a directory | directory, pattern, max_workers |
| list_models | Show available Whisper models | None |
| get_model_info | Get details about a specific model | model_id |
| clear_cache | Clear model cache | model_id (optional) |
| estimate_processing_time | Estimate transcription time | file_path, model |
| validate_media_file | Check file compatibility | file_path |
| get_supported_formats | List supported input/output formats | None |
MCP Resources
| Resource | Description | Data Provided |
|---|---|---|
| transcription://history | Recent transcriptions | List of all transcriptions |
| transcription://history/{id} | Specific transcription details | Full transcription metadata |
| transcription://models | Available models | Model specifications and status |
| transcription://config | Current configuration | Server settings and environment |
| transcription://formats | Supported formats | Input/output format details |
| transcription://performance | Performance statistics | Speed, memory, and uptime metrics |
Quick Examples
# These snippets assume `client` is an already-connected MCP client session
# and that they run inside an async function.

# Single file transcription
result = await client.call_tool("transcribe_file", {
    "file_path": "interview.mp4",
    "output_formats": "txt,srt",
    "model": "mlx-community/whisper-large-v3-turbo"
})

# Transcription with Voice Activity Detection
result = await client.call_tool("transcribe_file", {
    "file_path": "long_interview.mp4",
    "output_formats": "txt,srt",
    "use_vad": True  # Remove silence for faster processing
})

# Batch processing
result = await client.call_tool("batch_transcribe", {
    "directory": "./podcasts",
    "pattern": "*.mp3",
    "max_workers": 4
})

# Check supported formats
formats = await client.call_tool("get_supported_formats", {})
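To make the `directory`, `pattern`, and `max_workers` parameters of `batch_transcribe` more concrete, here is a local sketch of the general idea: collect the files matching a glob pattern and hand them to a bounded thread pool. This is not the server's implementation. The `batch_apply` helper and the stand-in worker are assumptions for illustration, and the server's actual matching rules may differ.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def batch_apply(directory, pattern, worker, max_workers=4):
    """Hypothetical helper: run `worker` on every file in `directory` matching `pattern`."""
    files = sorted(Path(directory).glob(pattern))
    # At most `max_workers` files are processed at the same time
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(worker, files))
    return {f.name: r for f, r in zip(files, results)}


if __name__ == "__main__":
    # Stand-in worker: report file sizes instead of transcribing
    print(batch_apply(".", "*.mp3", lambda p: p.stat().st_size))
```

Lowering `max_workers` trades throughput for lower peak memory, which matters with the larger models.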
Development
Running Tests
# Run all tests
poetry run pytest
# Run with coverage
poetry run pytest --cov=src --cov-report=html
# Run specific test file
poetry run pytest tests/test_mcp_tools.py -v
Code Quality
# Format code
poetry run black .
poetry run isort .
# Type checking (optional)
poetry run mypy src/
# Lint code
poetry run flake8 src/
Project Structure
mcp-whisper-transcription/
├── src/
│   └── whisper_mcp_server.py    # Main MCP server
├── tests/                       # Comprehensive test suite
├── examples/                    # Usage examples and test files
├── transcribe_mlx.py            # MLX Whisper integration
├── whisper_utils.py             # Utility functions
└── pyproject.toml               # Project configuration
Performance Benchmarks
Test Results (Apple M3 Max)
| Model | Audio Duration | Processing Time | Speed | Memory |
|---|---|---|---|---|
| tiny | 10 minutes | 1.2 minutes | 8.3x | 150MB |
| base | 10 minutes | 1.8 minutes | 5.6x | 250MB |
| small | 10 minutes | 2.5 minutes | 4.0x | 600MB |
| medium | 10 minutes | 4.2 minutes | 2.4x | 1.5GB |
| large-v3 | 10 minutes | 5.8 minutes | 1.7x | 3GB |
| large-v3-turbo | 10 minutes | 3.1 minutes | 3.2x | 1.6GB |
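The Speed column is the audio duration divided by the processing time. The sketch below reproduces that arithmetic and uses it for a rough estimate on a longer file. It assumes processing time scales linearly with audio length, which the benchmarks above do not confirm, and both function names are illustrative.

```python
def realtime_factor(audio_minutes, processing_minutes):
    """Speed as a multiple of realtime: minutes of audio handled per minute of compute."""
    return audio_minutes / processing_minutes


def estimate_minutes(audio_minutes, factor):
    """Rough processing-time estimate, assuming linear scaling with audio length."""
    return audio_minutes / factor


if __name__ == "__main__":
    print(round(realtime_factor(10, 1.2), 1))   # 8.3, matching the tiny row
    print(round(estimate_minutes(45, 3.2), 1))  # 14.1 minutes for 45 minutes of audio at 3.2x
```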
Troubleshooting
Common Issues
1. FFmpeg not found

       brew install ffmpeg

2. Model download slow
   - Models are cached in ~/.cache/huggingface/
   - First download can be slow but subsequent runs are fast

3. Memory issues
   - Use smaller models (tiny/base) for large files
   - Reduce MAX_WORKERS for concurrent processing

4. Permission errors
   - Ensure proper file permissions
   - Check output directory write access
See TROUBLESHOOTING.md for detailed solutions.
Requirements
- Python 3.10+
- Apple Silicon Mac (M1, M2, M3, or later)
- FFmpeg (for video file support)
- 4GB+ RAM (8GB+ recommended for large models)
- 2GB+ free disk space (for model cache)
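Most of these requirements can be checked from Python before installing anything. The sketch below uses only the standard library. The `preflight` helper is hypothetical, it does not check RAM (the standard library has no portable call for that), and `platform.machine()` reports `x86_64` rather than `arm64` when Python runs under Rosetta.

```python
import platform
import shutil
import sys


def preflight(cache_dir="."):
    """Hypothetical helper: check the listed requirements that the stdlib can see."""
    free_gb = shutil.disk_usage(cache_dir).free / 1e9
    return {
        "python_3_10_plus": sys.version_info >= (3, 10),
        "apple_silicon": platform.system() == "Darwin" and platform.machine() == "arm64",
        "ffmpeg_on_path": shutil.which("ffmpeg") is not None,
        "disk_2gb_free": free_gb >= 2,
    }


if __name__ == "__main__":
    for name, ok in preflight().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```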
License
MIT License - see LICENSE file for details.
Contributing
Contributions are welcome! Please see CONTRIBUTING.md for guidelines.
Acknowledgments
- Built with FastMCP - Modern MCP server framework
- Powered by MLX Whisper - Apple Silicon optimization
- Original Whisper by OpenAI - Revolutionary speech recognition
- Thanks to the MLX team at Apple for the incredible performance optimizations