youtube-transcriber-mcp

Enables intelligent transcription of YouTube videos with automatic optimization for any video length, using local OpenAI Whisper processing and speaker diarization.

YouTube Transcriber MCP

A Model Context Protocol (MCP) server that enables intelligent transcription of YouTube videos with automatic optimization for any video length. This tool integrates with desktop applications to provide high-quality, local transcription capabilities using OpenAI Whisper with smart processing strategies.

Features

  • Automatic Strategy Selection: Intelligently chooses optimal processing method based on video duration
  • Long Video Support: Efficiently handles videos from minutes to hours with smart sampling
  • Local Processing: All transcription happens on your machine - no external APIs required
  • Speaker Identification: Automatically detects and labels different speakers in videos using local diarization
  • High Accuracy: Leverages OpenAI Whisper for state-of-the-art transcription quality
  • MCP Integration: Seamlessly works with MCP-compatible applications
  • Automatic Cleanup: Downloaded files are automatically removed after processing
  • Multiple Model Sizes: Choose from tiny to large models based on your accuracy/speed needs

Installation

Prerequisites

  • Python 3.8 or higher
  • FFmpeg installed on your system
  • MCP-compatible application (e.g., Claude Desktop)

Install FFmpeg

macOS:

brew install ffmpeg

Ubuntu/Debian:

sudo apt update
sudo apt install ffmpeg

Windows: Download a build from the FFmpeg website and make sure the ffmpeg executable is in your PATH.

Setup

  1. Clone the repository:
git clone https://github.com/StevenGeller/youtube-transcriber-mcp.git
cd youtube-transcriber-mcp
  2. Create a virtual environment:
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install dependencies:
pip install -r requirements.txt

Configuration

For Claude Desktop

  1. Open Claude Desktop settings
  2. Navigate to the "Developer" section
  3. Under "Edit Config", add the YouTube transcriber to your MCP servers:
{
  "mcpServers": {
    "youtube-transcriber": {
      "command": "/path/to/youtube-transcriber-mcp/venv/bin/python",
      "args": ["/path/to/youtube-transcriber-mcp/youtube_mcp_server.py"],
      "env": {
        "PYTHONUNBUFFERED": "1"
      }
    }
  }
}

Important: Replace /path/to/youtube-transcriber-mcp with the actual path where you cloned the repository.

Example for macOS:

{
  "mcpServers": {
    "youtube-transcriber": {
      "command": "/Users/yourusername/youtube-transcriber-mcp/venv/bin/python",
      "args": ["/Users/yourusername/youtube-transcriber-mcp/youtube_mcp_server.py"],
      "env": {
        "PYTHONUNBUFFERED": "1"
      }
    }
  }
}
  4. Save the configuration
  5. Restart Claude Desktop

For Other MCP Clients

The server follows the MCP standard and can be used with any MCP-compatible client. The key configuration elements are:

  • Command: Path to the Python interpreter in your virtual environment
  • Arguments: Path to youtube_mcp_server.py
  • Environment: Set PYTHONUNBUFFERED=1 for proper output handling
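For clients that read a JSON configuration, the three elements above can be assembled programmatically. The following is a minimal sketch, not part of the project: the helper name `build_server_entry` is hypothetical, and it assumes a POSIX virtual environment layout (`venv/bin/python`); on Windows the interpreter would sit under `venv/Scripts` instead.

```python
import json
from pathlib import PurePosixPath


def build_server_entry(repo_dir: str) -> dict:
    """Build the "youtube-transcriber" entry for an MCP client config.

    Hypothetical helper: assumes the virtual environment lives in
    <repo_dir>/venv and uses the POSIX layout (venv/bin/python).
    """
    repo = PurePosixPath(repo_dir)
    return {
        # Command: the Python interpreter inside the virtual environment
        "command": str(repo / "venv" / "bin" / "python"),
        # Arguments: the server script from the cloned repository
        "args": [str(repo / "youtube_mcp_server.py")],
        # Environment: unbuffered output, as recommended above
        "env": {"PYTHONUNBUFFERED": "1"},
    }


if __name__ == "__main__":
    config = {
        "mcpServers": {
            "youtube-transcriber": build_server_entry("/path/to/youtube-transcriber-mcp")
        }
    }
    print(json.dumps(config, indent=2))
```

Running the script prints a configuration with the same shape as the Claude Desktop example; replace the placeholder path with the location of your clone.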

Usage

Once configured, you can transcribe YouTube videos by asking:

  • "Transcribe this YouTube video: [URL]"
  • "Get the transcript from: [URL]"
  • "Transcribe [URL] without timestamps"

The server automatically optimizes processing based on video length:

Automatic Strategy Selection

Video Duration | Strategy           | Description
≤ 10 minutes   | Full Transcription | Complete word-for-word transcription with base model
10-60 minutes  | Chunked Processing | Parallel processing of 5-minute segments for faster results
> 60 minutes   | Smart Sampling     | Transcribes key sections (intro, conclusion, quarter points) for quick overview
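
The duration thresholds above amount to a simple lookup. The sketch below illustrates that mapping only; the function name `select_strategy` and the returned labels are hypothetical and may not match the server's actual code.

```python
def select_strategy(duration_seconds: float) -> str:
    """Map a video duration to one of the three strategies listed above.

    Hypothetical illustration: the thresholds come from the documented
    strategy list, the names and return values are assumptions.
    """
    minutes = duration_seconds / 60
    if minutes <= 10:
        return "full_transcription"   # complete transcription
    if minutes <= 60:
        return "chunked_processing"   # 5-minute segments
    return "smart_sampling"           # key sections only


if __name__ == "__main__":
    for seconds in (300, 1800, 7200):
        print(seconds, select_strategy(seconds))
```

A video of exactly 10 minutes falls under full transcription here, and one of exactly 60 minutes under chunked processing, which is one reasonable reading of the documented ranges.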

Model Sizes

  • tiny: Fastest, least accurate (~39M parameters)
  • base: Good balance (default for short videos, ~74M parameters)
  • small: Better accuracy (~244M parameters)
  • medium: High accuracy (~769M parameters)
  • large: Best accuracy (~1550M parameters)

Note: The server automatically selects appropriate model sizes based on video duration to optimize performance.

Advanced Features

Long Video Optimization

The transcriber automatically handles long videos efficiently:

  • Automatic Detection: Analyzes video duration and selects optimal strategy
  • Chunked Processing: For medium videos (10-60 min), splits into chunks for parallel processing
  • Smart Sampling: For long videos (>60 min), intelligently samples key sections:
    • Introduction (first 2 minutes)
    • Key points at 25%, 50%, 75% marks
    • Conclusion (last 2 minutes)
  • Performance: ~90% time savings on long videos, since only the sampled sections are transcribed; the result is an overview of the essential content rather than a full transcript
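
To make the two strategies concrete, the sketch below computes the time windows each one would cover. It is an illustration under stated assumptions, not the project's implementation: the function names are hypothetical, and the length of the windows at the 25%, 50%, and 75% marks (`mid_seconds`) is not given in this README, so 2 minutes is assumed. Only the window boundaries are computed; the parallel transcription itself is not shown.

```python
from typing import List, Tuple

Window = Tuple[float, float]  # (start_seconds, end_seconds)


def chunk_windows(duration: float, chunk_seconds: float = 300.0) -> List[Window]:
    """Split [0, duration] into consecutive chunks of at most chunk_seconds."""
    windows = []
    start = 0.0
    while start < duration:
        windows.append((start, min(start + chunk_seconds, duration)))
        start += chunk_seconds
    return windows


def sample_windows(
    duration: float,
    edge_seconds: float = 120.0,  # intro and conclusion length (from the list above)
    mid_seconds: float = 120.0,   # ASSUMPTION: window length at each quarter mark
) -> List[Window]:
    """Return intro, windows centred on 25/50/75 %, and conclusion."""
    windows = [(0.0, min(edge_seconds, duration))]
    for fraction in (0.25, 0.50, 0.75):
        centre = duration * fraction
        windows.append(
            (max(0.0, centre - mid_seconds / 2), min(duration, centre + mid_seconds / 2))
        )
    windows.append((max(0.0, duration - edge_seconds), duration))
    return windows


if __name__ == "__main__":
    print(chunk_windows(700))
    two_hours = 7200.0
    sampled = sum(end - start for start, end in sample_windows(two_hours))
    print(f"sampled {sampled:.0f} s of {two_hours:.0f} s")
```

Under these assumptions a 2-hour video yields 600 seconds of sampled audio, roughly 8% of the total, which is in line with the time savings quoted above.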

Speaker Diarization

The transcriber includes built-in local speaker diarization that works completely offline:

  • Detects the number of speakers in the video
  • Segments the audio by speaker
  • Labels each transcript segment with the appropriate speaker
  • Uses MFCC features and clustering for voice identification
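
The labeling step can be pictured as matching each transcript segment to the speaker turn it overlaps most. The sketch below covers only that step and assumes the speaker turns have already been produced by the diarization stage (the MFCC extraction and clustering are not shown). The function name `label_segments` and the segment dictionary keys are hypothetical; the `SPEAKER_00` default mirrors the fallback label this README mentions under Troubleshooting.

```python
from typing import Dict, List, Tuple

Turn = Tuple[float, float, str]  # (start_seconds, end_seconds, speaker_label)


def label_segments(
    segments: List[Dict], turns: List[Turn], default: str = "SPEAKER_00"
) -> List[Dict]:
    """Attach to each segment the speaker whose turns overlap it the most.

    Hypothetical helper: segments are dicts with "start", "end", and "text".
    Segments with no overlapping turn receive the default label.
    """
    labeled = []
    for seg in segments:
        overlap_by_speaker: Dict[str, float] = {}
        for start, end, speaker in turns:
            overlap = min(seg["end"], end) - max(seg["start"], start)
            if overlap > 0:
                overlap_by_speaker[speaker] = overlap_by_speaker.get(speaker, 0.0) + overlap
        if overlap_by_speaker:
            speaker = max(overlap_by_speaker, key=overlap_by_speaker.get)
        else:
            speaker = default
        labeled.append({**seg, "speaker": speaker})
    return labeled


if __name__ == "__main__":
    segments = [
        {"start": 0.0, "end": 4.0, "text": "Welcome to the show."},
        {"start": 4.0, "end": 9.0, "text": "Thanks for having me."},
    ]
    turns = [(0.0, 4.5, "SPEAKER_00"), (4.5, 9.0, "SPEAKER_01")]
    for seg in label_segments(segments, turns):
        print(f'[{seg["speaker"]}] {seg["text"]}')
```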

Project Structure

youtube-transcriber-mcp/
├── youtube_mcp_server.py     # Main MCP server
├── transcriber.py            # WhisperX transcription engine
├── local_diarization.py      # Local speaker diarization
├── quiet_transcriber.py      # Fallback transcriber
├── requirements.txt          # Python dependencies
└── README.md                 # This file

Troubleshooting

"Server disconnected" error

  • Ensure FFmpeg is installed and in your PATH
  • Check that all Python dependencies are installed
  • Verify the file paths in your MCP configuration
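
Two of these checks, FFmpeg on the PATH and valid configuration paths, can be scripted. The following is a minimal sketch with a hypothetical helper name (`find_setup_problems`); it uses only the standard library and does not verify the Python dependencies.

```python
import shutil
from pathlib import Path
from typing import List


def find_setup_problems(command: str, args: List[str]) -> List[str]:
    """Return human-readable problems for the checks listed above.

    Hypothetical helper: pass the "command" and "args" values from your
    MCP configuration.
    """
    problems = []
    # FFmpeg must be discoverable on the PATH
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    # The configured interpreter must exist
    if not Path(command).is_file():
        problems.append(f"interpreter not found: {command}")
    # Each configured script path must exist
    for arg in args:
        if not Path(arg).is_file():
            problems.append(f"script not found: {arg}")
    return problems


if __name__ == "__main__":
    issues = find_setup_problems(
        "/path/to/youtube-transcriber-mcp/venv/bin/python",
        ["/path/to/youtube-transcriber-mcp/youtube_mcp_server.py"],
    )
    print("\n".join(issues) if issues else "No problems found")
```

An empty result rules out only these two causes; if the error persists, check that the dependencies from requirements.txt are installed in the same virtual environment.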

Memory issues

  • Try using a smaller model size
  • Ensure you have sufficient RAM (4 GB or more recommended)

Speaker identification issues

  • The local diarization should work automatically
  • If speaker detection fails, all speech will be labeled as SPEAKER_00
  • Check the logs for any error messages

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is released into the public domain under The Unlicense - see the LICENSE file for details.

Acknowledgments
