YouTube Transcriber MCP
A Model Context Protocol (MCP) server that enables intelligent transcription of YouTube videos with automatic optimization for any video length. This tool integrates with desktop applications to provide high-quality, local transcription capabilities using OpenAI Whisper with smart processing strategies.
Features
- Automatic Strategy Selection: Intelligently chooses optimal processing method based on video duration
- Long Video Support: Efficiently handles videos from minutes to hours with smart sampling
- Local Processing: All transcription happens on your machine; only the video download needs network access, and no external transcription APIs are used
- Speaker Identification: Automatically detects and labels different speakers in videos using local diarization
- High Accuracy: Leverages OpenAI Whisper for state-of-the-art transcription quality
- MCP Integration: Seamlessly works with MCP-compatible applications
- Automatic Cleanup: Downloaded files are automatically removed after processing
- Multiple Model Sizes: Choose from tiny to large models based on your accuracy/speed needs
Installation
Prerequisites
- Python 3.8 or higher
- FFmpeg installed on your system
- MCP-compatible application (e.g., Claude Desktop)
Install FFmpeg
macOS:
```bash
brew install ffmpeg
```
Ubuntu/Debian:
```bash
sudo apt update
sudo apt install ffmpeg
```
Windows: Download a build from the FFmpeg website (ffmpeg.org) and add its bin folder to your PATH.
Setup
- Clone the repository:
```bash
git clone https://github.com/StevenGeller/youtube-transcriber-mcp.git
cd youtube-transcriber-mcp
```
- Create a virtual environment:
```bash
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```
- Install dependencies:
```bash
pip install -r requirements.txt
```
Configuration
For Claude Desktop
- Open Claude Desktop settings
- Navigate to the "Developer" section
- Click "Edit Config" and add the YouTube transcriber to the mcpServers section of the config file (claude_desktop_config.json):
```json
{
  "mcpServers": {
    "youtube-transcriber": {
      "command": "/path/to/youtube-transcriber-mcp/venv/bin/python",
      "args": ["/path/to/youtube-transcriber-mcp/youtube_mcp_server.py"],
      "env": {
        "PYTHONUNBUFFERED": "1"
      }
    }
  }
}
```
Important: Replace /path/to/youtube-transcriber-mcp with the actual path where you cloned the repository.
Example for macOS:
```json
{
  "mcpServers": {
    "youtube-transcriber": {
      "command": "/Users/yourusername/youtube-transcriber-mcp/venv/bin/python",
      "args": ["/Users/yourusername/youtube-transcriber-mcp/youtube_mcp_server.py"],
      "env": {
        "PYTHONUNBUFFERED": "1"
      }
    }
  }
}
```
- Save the configuration
- Restart Claude Desktop
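Rather than editing the paths by hand, you could generate the entry from the location of your clone. This is a minimal sketch, not part of the repository: build_claude_config is a hypothetical helper that assumes the clone and virtual-environment layout from the Setup steps (venv/bin/python on macOS/Linux, venv\Scripts\python.exe on Windows).

```python
import json
import os
from pathlib import Path


def build_claude_config(repo_dir: str, windows: bool = (os.name == "nt")) -> dict:
    """Build the mcpServers entry for this repository (hypothetical helper)."""
    repo = Path(repo_dir).expanduser().resolve()
    # The venv layout differs between POSIX (bin/python) and Windows (Scripts\python.exe)
    if windows:
        python = repo / "venv" / "Scripts" / "python.exe"
    else:
        python = repo / "venv" / "bin" / "python"
    return {
        "mcpServers": {
            "youtube-transcriber": {
                "command": str(python),
                "args": [str(repo / "youtube_mcp_server.py")],
                "env": {"PYTHONUNBUFFERED": "1"},
            }
        }
    }


if __name__ == "__main__":
    # Print JSON that can be merged into the Claude Desktop config file
    print(json.dumps(build_claude_config("~/youtube-transcriber-mcp"), indent=2))
```

Merge the printed entry into the existing mcpServers object instead of overwriting the file, so any other servers you have configured are kept.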
For Other MCP Clients
The server follows the MCP standard and can be used with any MCP-compatible client. The key configuration elements are:
- Command: Path to the Python interpreter in your virtual environment
- Arguments: Path to youtube_mcp_server.py
- Environment: Set PYTHONUNBUFFERED=1 so Python writes the server's output immediately instead of buffering it
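For a client that launches servers itself, these three elements map directly onto a child process. A minimal sketch, assuming the server speaks MCP over stdin/stdout (the stdio transport implied by a command-plus-arguments configuration); build_launch_spec and spawn_server are hypothetical names, and the MCP handshake itself is not shown.

```python
import os
import subprocess
from typing import Dict, List, Tuple


def build_launch_spec(venv_python: str, server_script: str) -> Tuple[List[str], Dict[str, str]]:
    """Return the argv and environment a client would use to start the server."""
    argv = [venv_python, server_script]
    # Inherit the parent environment and disable Python's output buffering
    env = {**os.environ, "PYTHONUNBUFFERED": "1"}
    return argv, env


def spawn_server(venv_python: str, server_script: str) -> subprocess.Popen:
    """Start the server with pipes on stdin/stdout; stderr is inherited so logs stay visible."""
    argv, env = build_launch_spec(venv_python, server_script)
    return subprocess.Popen(argv, env=env, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
```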
Usage
Once configured, you can transcribe YouTube videos by asking:
- "Transcribe this YouTube video: [URL]"
- "Get the transcript from: [URL]"
- "Transcribe [URL] without timestamps"
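The README does not document the exact transcript format. To illustrate what the "without timestamps" request toggles, here is a sketch that assumes transcription yields (start, end, text) segments measured in seconds, similar to the segments Whisper produces. The HH:MM:SS prefix and the function names are assumptions for illustration.

```python
from typing import List, Tuple

Segment = Tuple[float, float, str]  # (start_seconds, end_seconds, text)


def format_timestamp(seconds: float) -> str:
    """Render a time offset as HH:MM:SS (fractions of a second are dropped)."""
    hours, rem = divmod(int(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def render_transcript(segments: List[Segment], timestamps: bool = True) -> str:
    """Join segments into one line each, optionally prefixed with their start time."""
    lines = []
    for start, _end, text in segments:
        text = text.strip()
        lines.append(f"[{format_timestamp(start)}] {text}" if timestamps else text)
    return "\n".join(lines)
```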
The server automatically optimizes processing based on video length:
Automatic Strategy Selection
| Video Duration | Strategy | Description |
|---|---|---|
| ≤ 10 minutes | Full Transcription | Complete word-for-word transcription with the base model |
| 10-60 minutes | Chunked Processing | Parallel processing of 5-minute segments for faster results |
| > 60 minutes | Smart Sampling | Transcribes key sections (intro, conclusion, quarter points) for quick overview |
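The thresholds in the table reduce to a simple branch on duration. A minimal sketch, assuming the duration is already known in seconds (yt-dlp's video metadata includes a duration field); Strategy and choose_strategy are illustrative names, not the project's code. A video of exactly 60 minutes falls into the chunked tier, following the table's 10-60 range.

```python
from enum import Enum


class Strategy(Enum):
    FULL = "full transcription"
    CHUNKED = "chunked processing"
    SAMPLED = "smart sampling"


def choose_strategy(duration_seconds: float) -> Strategy:
    """Pick a processing strategy using the duration thresholds from the table."""
    minutes = duration_seconds / 60
    if minutes <= 10:
        return Strategy.FULL
    if minutes <= 60:
        return Strategy.CHUNKED
    return Strategy.SAMPLED
```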
Model Sizes
- tiny: Fastest, least accurate (~39M parameters)
- base: Good balance (default for short videos, ~74M parameters)
- small: Better accuracy (~244M parameters)
- medium: High accuracy (~769M parameters)
- large: Best accuracy (~1550M parameters)
Note: The server automatically selects appropriate model sizes based on video duration to optimize performance.
Advanced Features
Long Video Optimization
The transcriber automatically handles long videos efficiently:
- Automatic Detection: Analyzes video duration and selects optimal strategy
- Chunked Processing: For medium videos (10-60 min), splits into chunks for parallel processing
- Smart Sampling: For long videos (>60 min), intelligently samples key sections:
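  - Introduction (first 2 minutes)
  - Key points at the 25%, 50%, and 75% marks
  - Conclusion (last 2 minutes)
- Performance: Roughly 90% less processing time than a full transcription on long videos while still capturing the essential content; speech outside the sampled sections is not transcribed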
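Both the chunk boundaries and the sampled windows can be computed up front from the duration alone. This sketch uses 5-minute chunks (from the strategy table) and 2-minute windows for the introduction and conclusion (from the list above). The length of the windows at the 25%, 50%, and 75% marks is not stated, so probe=120 is an assumption, as are the function names. Overlapping windows are merged so no audio is transcribed twice.

```python
from typing import List, Tuple

Window = Tuple[float, float]  # (start_seconds, end_seconds)


def chunk_bounds(duration: float, chunk: float = 300.0) -> List[Window]:
    """Split [0, duration) into consecutive chunks, 5 minutes each by default."""
    bounds = []
    start = 0.0
    while start < duration:
        bounds.append((start, min(start + chunk, duration)))
        start += chunk
    return bounds


def sample_windows(duration: float, edge: float = 120.0, probe: float = 120.0) -> List[Window]:
    """Intro, windows centred on the 25/50/75% marks, and conclusion."""
    raw = [(0.0, min(edge, duration))]
    for frac in (0.25, 0.50, 0.75):
        center = duration * frac
        raw.append((max(0.0, center - probe / 2), min(duration, center + probe / 2)))
    raw.append((max(0.0, duration - edge), duration))
    # Merge overlapping windows so no audio is transcribed twice
    merged: List[Window] = []
    for start, end in sorted(raw):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

With these assumed window lengths, a 2-hour video reduces to five 2-minute windows, or 10 minutes of audio to transcribe.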
Speaker Diarization
The transcriber includes built-in local speaker diarization that works completely offline:
- Detects the number of speakers in the video
- Segments the audio by speaker
- Labels each transcript segment with the appropriate speaker
- Uses MFCC features and clustering for voice identification
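The last step, attaching speaker labels to transcript segments, can be sketched independently of the MFCC extraction and clustering, which need audio libraries and are omitted here. The sketch assumes diarization produces (start, end, speaker) turns and gives each segment the speaker with the greatest total overlap; that matching rule and the function names are assumptions, not taken from local_diarization.py. With no turns at all, every segment falls back to SPEAKER_00, mirroring the behavior described under Troubleshooting.

```python
from typing import Dict, List, Tuple

Turn = Tuple[float, float, str]     # (start, end, speaker_label) from diarization
Segment = Tuple[float, float, str]  # (start, end, text) from transcription


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length of the intersection of two time intervals (0 if they are disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def label_segments(segments: List[Segment], turns: List[Turn],
                   default: str = "SPEAKER_00") -> List[Tuple[str, str]]:
    """Attach to each segment the speaker whose turns overlap it the most."""
    labeled = []
    for seg_start, seg_end, text in segments:
        totals: Dict[str, float] = {}
        for turn_start, turn_end, speaker in turns:
            ov = overlap(seg_start, seg_end, turn_start, turn_end)
            if ov > 0:
                totals[speaker] = totals.get(speaker, 0.0) + ov
        # Fall back to the default label when no turn overlaps the segment
        speaker = max(totals, key=totals.get) if totals else default
        labeled.append((speaker, text))
    return labeled
```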
Project Structure
```
youtube-transcriber-mcp/
├── youtube_mcp_server.py   # Main MCP server
├── transcriber.py          # WhisperX transcription engine
├── local_diarization.py    # Local speaker diarization
├── quiet_transcriber.py    # Fallback transcriber
├── requirements.txt        # Python dependencies
└── README.md               # This file
```
Troubleshooting
"Server disconnected" error
- Ensure FFmpeg is installed and in your PATH
- Check that all Python dependencies are installed
- Verify the file paths in your MCP configuration
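To check the first item from Python, ideally with the same interpreter your MCP configuration points at, a small sketch like this can help. ffmpeg_version is a hypothetical helper built on shutil.which and the standard ffmpeg -version flag. A desktop app may launch the server with a different PATH than your terminal, so a check that passes in a shell is not conclusive.

```python
import shutil
import subprocess
from typing import Optional


def ffmpeg_version() -> Optional[str]:
    """Return the first line of `ffmpeg -version`, or None if FFmpeg is not on PATH."""
    path = shutil.which("ffmpeg")
    if path is None:
        return None
    result = subprocess.run([path, "-version"], capture_output=True, text=True, check=False)
    lines = result.stdout.splitlines()
    return lines[0] if lines else None


if __name__ == "__main__":
    print(ffmpeg_version() or "FFmpeg not found on PATH")
```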
Memory issues
- Try using a smaller model size
- Ensure you have sufficient RAM (4GB+ recommended)
Speaker identification issues
- The local diarization should work automatically
- If speaker detection fails, all speech will be labeled as SPEAKER_00
- Check the logs for any error messages
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
License
This project is released into the public domain under The Unlicense - see the LICENSE file for details.
Acknowledgments
- Built with WhisperX for enhanced transcription
- Uses yt-dlp for reliable YouTube downloads
- Implements the Model Context Protocol specification