youtube-transcriber-mcp

Enables intelligent transcription of YouTube videos with automatic optimization for any video length, using local OpenAI Whisper processing and speaker diarization.

YouTube Transcriber MCP

A Model Context Protocol (MCP) server that enables intelligent transcription of YouTube videos with automatic optimization for any video length. This tool integrates with desktop applications to provide high-quality, local transcription capabilities using OpenAI Whisper with smart processing strategies.

Features

Automatic Strategy Selection: Intelligently chooses optimal processing method based on video duration
Long Video Support: Efficiently handles videos from minutes to hours with smart sampling
Local Processing: All transcription happens on your machine - no external APIs required
Speaker Identification: Automatically detects and labels different speakers in videos using local diarization
High Accuracy: Leverages OpenAI Whisper for state-of-the-art transcription quality
MCP Integration: Seamlessly works with MCP-compatible applications
Automatic Cleanup: Downloaded files are automatically removed after processing
Multiple Model Sizes: Choose from tiny to large models based on your accuracy/speed needs

Installation

Prerequisites

Python 3.8 or higher
FFmpeg installed on your system
MCP-compatible application (e.g., Claude Desktop)

Install FFmpeg

macOS:

brew install ffmpeg

Ubuntu/Debian:

sudo apt update
sudo apt install ffmpeg

Windows: Download a build from the FFmpeg website and add its bin directory to your PATH.

Setup

Clone the repository:

git clone https://github.com/StevenGeller/youtube-transcriber-mcp.git
cd youtube-transcriber-mcp

Create a virtual environment:

python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Install dependencies:

pip install -r requirements.txt

Configuration

For Claude Desktop

Open Claude Desktop settings
Navigate to the "Developer" section
Under "Edit Config", add the YouTube transcriber to your MCP servers:

{
  "mcpServers": {
    "youtube-transcriber": {
      "command": "/path/to/youtube-transcriber-mcp/venv/bin/python",
      "args": ["/path/to/youtube-transcriber-mcp/youtube_mcp_server.py"],
      "env": {
        "PYTHONUNBUFFERED": "1"
      }
    }
  }
}

Important: Replace /path/to/youtube-transcriber-mcp with the actual path where you cloned the repository.

Example for macOS:

{
  "mcpServers": {
    "youtube-transcriber": {
      "command": "/Users/yourusername/youtube-transcriber-mcp/venv/bin/python",
      "args": ["/Users/yourusername/youtube-transcriber-mcp/youtube_mcp_server.py"],
      "env": {
        "PYTHONUNBUFFERED": "1"
      }
    }
  }
}

Save the configuration
Restart Claude Desktop
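To avoid typos when filling in the paths by hand, the entry can be generated from the clone directory. The following is a minimal sketch, not part of the project: the helper name `build_server_entry` is hypothetical, and it assumes the POSIX virtual environment layout (`venv/bin/python`) used in the examples above. On Windows the interpreter path would differ.

```python
import json
from pathlib import PurePosixPath


def build_server_entry(repo_dir: str) -> dict:
    """Build the "mcpServers" entry for a given clone directory.

    Hypothetical helper for illustration; assumes a POSIX venv layout.
    """
    repo = PurePosixPath(repo_dir)
    return {
        "mcpServers": {
            "youtube-transcriber": {
                # Interpreter inside the virtual environment created during setup.
                "command": str(repo / "venv" / "bin" / "python"),
                "args": [str(repo / "youtube_mcp_server.py")],
                "env": {"PYTHONUNBUFFERED": "1"},
            }
        }
    }


if __name__ == "__main__":
    # Placeholder path: replace with the directory where you cloned the repository.
    entry = build_server_entry("/path/to/youtube-transcriber-mcp")
    print(json.dumps(entry, indent=2))
```

The printed JSON can be pasted into the Claude Desktop config, merging it with any servers already listed under "mcpServers".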

For Other MCP Clients

The server follows the MCP standard and can be used with any MCP-compatible client. The key configuration elements are:

Command: Path to the Python interpreter in your virtual environment
Arguments: Path to youtube_mcp_server.py
Environment: Set PYTHONUNBUFFERED=1 so that Python writes stdout and stderr without buffering and the client receives server output immediately

Usage

Once configured, you can transcribe YouTube videos by asking:

"Transcribe this YouTube video: [URL]"
"Get the transcript from: [URL]"
"Transcribe [URL] without timestamps"

The server automatically optimizes processing based on video length:

Automatic Strategy Selection

Video Duration	Strategy	Description
≤ 10 minutes	Full Transcription	Complete word-for-word transcription with base model
10-60 minutes	Chunked Processing	Parallel processing of 5-minute segments for faster results
> 60 minutes	Smart Sampling	Transcribes key sections (intro, conclusion, quarter points) for quick overview
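
The thresholds in the table can be expressed as a small function. This is a sketch of the selection rule as documented here, not the server's actual code: the function name `select_strategy` and the returned strategy labels are assumptions.

```python
def select_strategy(duration_seconds: float) -> str:
    """Map a video duration to a strategy, following the table above.

    Hypothetical helper; the labels "full", "chunked" and "sampling"
    are illustrative names, not identifiers from the project.
    """
    minutes = duration_seconds / 60
    if minutes <= 10:
        return "full"      # complete word-for-word transcription
    if minutes <= 60:
        return "chunked"   # 5-minute segments processed in parallel
    return "sampling"      # key sections only, for a quick overview


if __name__ == "__main__":
    for seconds in (300, 1800, 7200):
        print(seconds, select_strategy(seconds))
```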

Model Sizes

tiny: Fastest, least accurate (~39M parameters)
base: Good balance (default for short videos, ~74M parameters)
small: Better accuracy (~244M parameters)
medium: High accuracy (~769M parameters)
large: Best accuracy (~1550M parameters)

Note: The server automatically selects appropriate model sizes based on video duration to optimize performance.
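
As a rough guide when choosing a size manually, the parameter counts above translate into a lower bound on memory for the model weights. The sketch below is illustrative arithmetic only: the helper names are hypothetical, it assumes 4 bytes per parameter (32-bit floats), and it ignores activations, audio buffers, and everything else the process needs, so real usage will be higher.

```python
from typing import Optional

# Approximate parameter counts in millions, as listed above.
MODEL_PARAMS_M = {"tiny": 39, "base": 74, "small": 244, "medium": 769, "large": 1550}


def weight_memory_gb(model: str, bytes_per_param: int = 4) -> float:
    """Estimate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return MODEL_PARAMS_M[model] * 1e6 * bytes_per_param / 1e9


def largest_model_within(budget_gb: float, bytes_per_param: int = 4) -> Optional[str]:
    """Return the largest model whose weights fit the budget, or None."""
    best = None
    # The dict is ordered from smallest to largest model.
    for name in MODEL_PARAMS_M:
        if weight_memory_gb(name, bytes_per_param) <= budget_gb:
            best = name
    return best


if __name__ == "__main__":
    for name in MODEL_PARAMS_M:
        print(f"{name}: about {weight_memory_gb(name):.2f} GB of weights")
```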

Advanced Features

Long Video Optimization

The transcriber automatically handles long videos efficiently:

Automatic Detection: Analyzes video duration and selects optimal strategy
Chunked Processing: For medium videos (10-60 min), splits into chunks for parallel processing
Smart Sampling: For long videos (>60 min), intelligently samples key sections:
- Introduction (first 2 minutes)
- Key points at 25%, 50%, 75% marks
- Conclusion (last 2 minutes)
Performance: roughly 90% less processing time on long videos, because only the sampled sections are transcribed; the result is an overview, not a complete transcript
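
The chunk boundaries and sampling windows described above can be computed as follows. This is a sketch under stated assumptions, not the project's implementation: the function names are hypothetical, and the length of the windows at the 25%, 50% and 75% marks is not given in this README, so a 2-minute window centered on each mark is assumed.

```python
from typing import List, Tuple

Window = Tuple[float, float]  # (start, end) in seconds


def chunk_ranges(duration: float, chunk_len: float = 300.0) -> List[Window]:
    """Split a video into consecutive 5-minute segments (last one may be shorter)."""
    duration = float(duration)
    ranges = []
    start = 0.0
    while start < duration:
        ranges.append((start, min(start + chunk_len, duration)))
        start += chunk_len
    return ranges


def sample_windows(duration: float, edge_len: float = 120.0,
                   mid_len: float = 120.0) -> List[Window]:
    """Return intro, quarter-point and conclusion windows.

    mid_len is an assumption; the README does not state it.
    """
    duration = float(duration)
    windows = [(0.0, min(edge_len, duration))]  # introduction
    for fraction in (0.25, 0.5, 0.75):
        center = duration * fraction
        windows.append((max(0.0, center - mid_len / 2),
                        min(duration, center + mid_len / 2)))
    windows.append((max(0.0, duration - edge_len), duration))  # conclusion
    return windows


if __name__ == "__main__":
    print(chunk_ranges(1500))
    print(sample_windows(7200))
```

With these assumed window lengths, a 2-hour video yields five 2-minute windows, or 600 of 7200 seconds (about 8% of the audio), which is in line with the time savings quoted above.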

Speaker Diarization

The transcriber includes built-in local speaker diarization that works completely offline:

Detects the number of speakers in the video
Segments the audio by speaker
Labels each transcript segment with the appropriate speaker
Uses MFCC features and clustering for voice identification
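
The labeling step can be illustrated in isolation. The sketch below assumes the MFCC extraction and clustering have already produced speaker turns, and assigns each transcript segment the speaker whose turn overlaps it most. The function names and tuple layouts are hypothetical and are not taken from local_diarization.py; the SPEAKER_00 fallback mirrors the behavior described under Troubleshooting.

```python
from typing import List, Tuple

Segment = Tuple[float, float, str]  # (start, end, text)
Turn = Tuple[float, float, str]     # (start, end, speaker label)


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def label_segments(segments: List[Segment], turns: List[Turn]) -> List[Tuple[str, str]]:
    """Attach to each segment the speaker with the largest time overlap.

    Falls back to SPEAKER_00 when no turn overlaps the segment.
    """
    labeled = []
    for start, end, text in segments:
        best_label, best_overlap = "SPEAKER_00", 0.0
        for t_start, t_end, speaker in turns:
            shared = overlap(start, end, t_start, t_end)
            if shared > best_overlap:
                best_label, best_overlap = speaker, shared
        labeled.append((best_label, text))
    return labeled


if __name__ == "__main__":
    segments = [(0.0, 4.0, "Hello"), (4.0, 9.0, "Hi there")]
    turns = [(0.0, 4.5, "SPEAKER_00"), (4.5, 10.0, "SPEAKER_01")]
    for speaker, text in label_segments(segments, turns):
        print(f"{speaker}: {text}")
```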

Project Structure

youtube-transcriber-mcp/
├── youtube_mcp_server.py     # Main MCP server
├── transcriber.py            # WhisperX transcription engine
├── local_diarization.py      # Local speaker diarization
├── quiet_transcriber.py      # Fallback transcriber
├── requirements.txt          # Python dependencies
└── README.md                 # This file

Troubleshooting

"Server disconnected" error

Ensure FFmpeg is installed and in your PATH
Check that all Python dependencies are installed
Verify the file paths in your MCP configuration
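
Two of these checks can be scripted. The following is a minimal sketch with a hypothetical helper name; it only verifies that ffmpeg is discoverable on PATH and that the two paths from the MCP configuration exist, and it does not check the Python dependencies.

```python
import shutil
from pathlib import Path
from typing import List


def find_setup_problems(command: str, script: str) -> List[str]:
    """Return human-readable problems found in the local setup.

    command: interpreter path from the "command" field of the MCP config.
    script:  server script path from the "args" field.
    """
    problems = []
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    if not Path(command).is_file():
        problems.append(f"interpreter not found: {command}")
    if not Path(script).is_file():
        problems.append(f"server script not found: {script}")
    return problems


if __name__ == "__main__":
    # Placeholder paths: replace with the values from your MCP configuration.
    issues = find_setup_problems(
        "/path/to/youtube-transcriber-mcp/venv/bin/python",
        "/path/to/youtube-transcriber-mcp/youtube_mcp_server.py",
    )
    print("\n".join(issues) if issues else "No problems found")
```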

Memory issues

Try using a smaller model size
Ensure you have sufficient RAM (4GB+ recommended)

Speaker identification issues

The local diarization should work automatically
If speaker detection fails, all speech will be labeled as SPEAKER_00
Check the logs for any error messages

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is released into the public domain under The Unlicense - see the LICENSE file for details.

Acknowledgments

Built with WhisperX for enhanced transcription
Uses yt-dlp for reliable YouTube downloads
Implements the Model Context Protocol specification
