video-toolkit-mcp

A Model Context Protocol (MCP) server that provides comprehensive video tools: transcript retrieval, video downloading, and automatic subtitle generation using AI speech-to-text. Works with YouTube, Bilibili, Vimeo, and any platform supported by yt-dlp.

transcript-mcp

A Model Context Protocol (MCP) server that provides comprehensive video tools: transcript retrieval, video downloading, automatic subtitle generation, and direct audio transcription. Works with YouTube, Bilibili, Vimeo, and any platform supported by yt-dlp.

Features

  • Multi-Platform Support: Works with YouTube, Bilibili, Vimeo, and any platform supported by yt-dlp
  • Video Transcripts: Extract existing transcripts/captions from videos
  • Video Downloads: Download videos to local storage in various formats and qualities
  • Auto Subtitle Generation: Generate subtitles using OpenAI Whisper API or local Whisper
  • Client Audio Transcription: Transcribe caller-supplied audio via an allowlisted audio_url fetch, small audio_base64 payloads, or chunked uploads, with optional async jobs, server-side Opus compression, and structured JSON results
  • Multiple URL Formats: Support for various URL formats from different platforms
  • Timestamp Support: Include or exclude timestamps in transcript output
  • Language Selection: Request transcripts or generate subtitles in specific languages

Tools

| Tool | Description |
| --- | --- |
| get-transcript | Retrieve existing transcripts from video platforms |
| list-transcript-languages | List available transcript languages for a video |
| download-video | Download videos to local storage |
| list-downloads | List downloaded video files |
| generate-subtitles | Generate subtitles using AI speech-to-text |
| transcribe-audio | Transcribe client-provided audio (URL / base64 / path / resource URI) |
| transcribe_upload_start | Start chunked upload for large audio payloads |
| transcribe_upload_append | Append one base64 chunk to an upload session |
| transcribe_upload_finalize | Finish upload and run transcription |
| transcribe_get_job | Poll async transcription jobs |
| transcribe_cancel_job | Cancel an async transcription job |

Prerequisites

  • Node.js >= 16.0.0
  • yt-dlp - Required for transcript fetching and video downloads
  • ffmpeg - Required for subtitle generation, audio normalization, Opus compression, and silence-aware splitting (install a build with libopus)

Installing Dependencies

yt-dlp (required):

# Using Homebrew (macOS)
brew install yt-dlp

# Using pip
pip install yt-dlp

ffmpeg (required for subtitle generation):

# Using Homebrew (macOS)
brew install ffmpeg

# Using apt (Ubuntu/Debian)
sudo apt install ffmpeg

Local Whisper (optional, for local subtitle generation):

pip install openai-whisper

Installation

From Source

git clone <repository-url>
cd transcript-mcp
npm install
npm run build

Global Installation (after publishing)

npm install -g transcript-mcp

Configuration

For Claude Desktop / Cursor

Add the MCP server to your configuration file:

Claude Desktop (~/Library/Application Support/Claude/claude_desktop_config.json on macOS):

{
  "mcpServers": {
    "transcript-mcp": {
      "command": "node",
      "args": ["/path/to/transcript-mcp/dist/index.js"],
      "env": {
        "TRANSCRIPT_MCP_STORAGE_DIR": "/path/to/downloads",
        "OPENAI_API_KEY": "your-openai-api-key"
      }
    }
  }
}

Cursor (~/.cursor/mcp.json):

{
  "mcpServers": {
    "transcript-mcp": {
      "command": "node",
      "args": ["/path/to/transcript-mcp/dist/index.js"],
      "env": {
        "TRANSCRIPT_MCP_STORAGE_DIR": "/path/to/downloads",
        "OPENAI_API_KEY": "your-openai-api-key"
      }
    }
  }
}

Environment Variables

| Variable | Description | Default |
| --- | --- | --- |
| TRANSCRIPT_MCP_STORAGE_DIR | Default directory for downloaded videos | ~/.transcript-mcp/downloads |
| OPENAI_API_KEY | OpenAI API key for Whisper-based subtitle generation | None |
| TRANSCRIPT_MCP_WHISPER_ENGINE | Preferred whisper engine: openai, local, or auto | auto |
| VIDEO_TOOLKIT_STORAGE_DIR | Legacy alias for TRANSCRIPT_MCP_STORAGE_DIR | |
| VIDEO_TOOLKIT_WHISPER_ENGINE | Legacy alias for TRANSCRIPT_MCP_WHISPER_ENGINE | |
| WHISPER_BINARY_PATH | Path to local whisper binary | whisper |
| WHISPER_MODEL_PATH | Path to whisper model (for local whisper) | Auto-download |
| YT_DLP_PATH | Path to yt-dlp binary | yt-dlp |
| FFMPEG_PATH | Path to ffmpeg binary | ffmpeg |
| FFPROBE_PATH | Path to ffprobe binary | Derived from FFMPEG_PATH |
| TRANSCRIPT_MCP_URL_ALLOWLIST | Comma-separated host patterns allowed for audio_url (e.g. *.amazonaws.com,localhost). Empty disables all audio_url fetches | empty |
| DEBUG | Enable debug logging | 0 |
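
To make the TRANSCRIPT_MCP_URL_ALLOWLIST semantics concrete, here is a minimal sketch of how comma-separated host patterns could be matched against an audio_url. The helper names parseAllowlist and isHostAllowed are hypothetical, and the wildcard rule (a leading "*." matches subdomains only) is an assumption; the server's actual matching logic may differ.

```typescript
// Hypothetical helpers: illustrate one plausible reading of the allowlist,
// not the server's actual implementation.
function parseAllowlist(raw: string | undefined): string[] {
  return (raw ?? "")
    .split(",")
    .map((pattern) => pattern.trim().toLowerCase())
    .filter((pattern) => pattern.length > 0);
}

function isHostAllowed(audioUrl: string, patterns: string[]): boolean {
  // An empty allowlist disables every audio_url fetch.
  if (patterns.length === 0) return false;

  let host: string;
  try {
    host = new URL(audioUrl).hostname.toLowerCase();
  } catch {
    return false; // not a parseable URL
  }

  return patterns.some((pattern) => {
    if (pattern.startsWith("*.")) {
      // Assumption: "*.example.com" matches subdomains, not the bare domain.
      return host.endsWith(pattern.slice(1));
    }
    return host === pattern;
  });
}

const allowlist = parseAllowlist("*.amazonaws.com, localhost");
console.log(isHostAllowed("https://bucket.s3.amazonaws.com/a.mp3", allowlist));
console.log(isHostAllowed("https://example.com/a.mp3", allowlist));
```

Matching on the parsed hostname rather than on the raw URL string avoids accepting URLs that merely contain an allowed host somewhere in their path or query.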

Usage

1. get-transcript

Retrieve existing transcripts from video platforms.

Parameters:

  • url (required): Video URL
  • lang (optional): Language code (e.g., 'en', 'es', 'zh')
  • include_timestamps (optional): Include timestamps (default: true)

Example:

Get the transcript from https://www.youtube.com/watch?v=VIDEO_ID
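
An MCP client turns a prompt like the one above into a tools/call request. The sketch below builds that JSON-RPC payload using the parameter names listed for get-transcript; the buildToolCall helper is illustrative and not part of this project.

```typescript
// Shape of an MCP "tools/call" request (JSON-RPC 2.0).
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper that wraps a tool name and its arguments.
function buildToolCall(
  id: number,
  name: string,
  args: Record<string, unknown>
): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

const request = buildToolCall(1, "get-transcript", {
  url: "https://www.youtube.com/watch?v=VIDEO_ID",
  lang: "en",
  include_timestamps: false,
});

console.log(JSON.stringify(request, null, 2));
```

The same helper works for the other tools; only the name and the arguments object change.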

2. list-transcript-languages

List available transcript languages for a video.

Parameters:

  • url (required): Video URL

Example:

What transcript languages are available for https://www.youtube.com/watch?v=VIDEO_ID?

3. download-video

Download a video to local storage.

Parameters:

  • url (required): Video URL to download
  • output_dir (optional): Custom output directory
  • filename (optional): Custom filename
  • format (optional): Video format - mp4, webm, mkv (default: mp4)
  • quality (optional): Quality - best, 1080p, 720p, 480p, 360p, audio (default: best)

Example:

Download this video: https://www.youtube.com/watch?v=VIDEO_ID
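
As an illustration of how the quality and format options might translate into a yt-dlp invocation, the sketch below maps each quality value to a yt-dlp format selector. This mapping is an assumption, not taken from the project's source; the function names formatSelector and ytDlpArgs are hypothetical. The -f and --merge-output-format flags are standard yt-dlp options.

```typescript
type Quality = "best" | "1080p" | "720p" | "480p" | "360p" | "audio";
type Container = "mp4" | "webm" | "mkv";

// Assumed mapping from the tool's quality option to a yt-dlp format selector.
function formatSelector(quality: Quality): string {
  if (quality === "audio") return "bestaudio/best";
  if (quality === "best") return "bestvideo+bestaudio/best";
  const height = parseInt(quality, 10); // "720p" -> 720
  return `bestvideo[height<=${height}]+bestaudio/best[height<=${height}]`;
}

// Builds the argument list that would be passed to the yt-dlp binary.
function ytDlpArgs(url: string, quality: Quality, format: Container): string[] {
  const args = ["-f", formatSelector(quality)];
  // The container choice only matters when video and audio streams are merged.
  if (quality !== "audio") args.push("--merge-output-format", format);
  args.push(url);
  return args;
}

console.log(ytDlpArgs("https://www.youtube.com/watch?v=VIDEO_ID", "720p", "mp4").join(" "));
```

Passing the arguments as an array (for example to child_process.spawn) rather than as one shell string avoids quoting problems with the brackets in the selector.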

4. list-downloads

List all downloaded video files.

Parameters:

  • directory (optional): Directory to list (default: storage directory)

Example:

List my downloaded videos

5. generate-subtitles

Generate subtitles for a local video file using AI speech-to-text.

Parameters:

  • video_path (required): Absolute path to the video file
  • engine (optional): openai or local (default: auto-detect)
  • language (optional): Language code for transcription
  • output_format (optional): srt or vtt (default: srt)

Example:

Generate subtitles for /path/to/video.mp4
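
The two output formats differ mainly in their header and timestamp separator: SRT numbers each cue and writes HH:MM:SS,mmm, while VTT starts with a WEBVTT line and writes HH:MM:SS.mmm. The sketch below shows that difference; the Segment shape and function names are hypothetical and do not reflect the project's internal types.

```typescript
// Hypothetical segment shape: times are in seconds.
interface Segment {
  start: number;
  end: number;
  text: string;
}

// Formats seconds as HH:MM:SS<separator>mmm.
function timestamp(seconds: number, separator: "," | "."): string {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const s = Math.floor(totalMs / 1000) % 60;
  const m = Math.floor(totalMs / 60000) % 60;
  const h = Math.floor(totalMs / 3600000);
  const pad = (n: number, width: number) => String(n).padStart(width, "0");
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)}${separator}${pad(ms, 3)}`;
}

// SRT: numbered cues, comma before the milliseconds.
function toSrt(segments: Segment[]): string {
  return segments
    .map(
      (seg, i) =>
        `${i + 1}\n${timestamp(seg.start, ",")} --> ${timestamp(seg.end, ",")}\n${seg.text}\n`
    )
    .join("\n");
}

// VTT: "WEBVTT" header, dot before the milliseconds, cue numbers optional.
function toVtt(segments: Segment[]): string {
  const cues = segments.map(
    (seg) => `${timestamp(seg.start, ".")} --> ${timestamp(seg.end, ".")}\n${seg.text}\n`
  );
  return `WEBVTT\n\n${cues.join("\n")}`;
}

const demo: Segment[] = [{ start: 0, end: 1.25, text: "Hello" }];
console.log(toSrt(demo));
console.log(toVtt(demo));
```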

6. transcribe-audio

Transcribes client-provided audio with Whisper. Prefer audio_url: the server fetches the bytes itself, provided the host is listed in TRANSCRIPT_MCP_URL_ALLOWLIST. Use audio_base64 only for small clips (about 60KB of raw audio per call); send larger payloads through chunked upload or a URL. audio_path and file:// URIs work only when the MCP host shares a filesystem with the caller, which is often not the case in sandboxed clients.

By default the server re-encodes audio to Opus (16 kHz, mono, 16 kbps) before passing it to Whisper. Set skip_compression: true if the file is already optimized.

For audio longer than 5 minutes, or when async: true is set, the tool returns { job_id, status: "processing" } instead of a transcript; poll transcribe_get_job for the result.

Parameters (one required input):

  • audio_url, audio_path, audio_base64, or audio_resource_uri (file:// / data:...;base64,...)
  • filename (optional): Hint when magic-byte detection is inconclusive
  • skip_compression (optional): Skip Opus recompression (default: false)
  • engine (optional): openai, local, or auto (default: auto)
  • language (optional): Language hint for transcription
  • include_timestamps (optional): When as_text is true, include [MM:SS] lines (default: true)
  • as_text (optional): If true, return plain transcript text; if false, return structured JSON (default: false)
  • async (optional): Force async job (default: false)
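
The parameter list above can be written as a TypeScript type with a small check for the audio input. This is a sketch only: it assumes "one required input" means exactly one of the four audio fields, and the pickInput helper is hypothetical.

```typescript
// Argument shape assembled from the parameter list; not the project's own type.
interface TranscribeAudioArgs {
  audio_url?: string;
  audio_path?: string;
  audio_base64?: string;
  audio_resource_uri?: string;
  filename?: string;
  skip_compression?: boolean; // default: false
  engine?: "openai" | "local" | "auto"; // default: auto
  language?: string;
  include_timestamps?: boolean; // default: true
  as_text?: boolean; // default: false
  async?: boolean; // default: false
}

const INPUT_KEYS = ["audio_url", "audio_path", "audio_base64", "audio_resource_uri"] as const;
type InputKey = (typeof INPUT_KEYS)[number];

// Assumption: exactly one audio input must be supplied per call.
function pickInput(args: TranscribeAudioArgs): InputKey {
  const present = INPUT_KEYS.filter(
    (key) => typeof args[key] === "string" && args[key] !== ""
  );
  if (present.length !== 1) {
    throw new Error(`Expected exactly one audio input, got ${present.length}`);
  }
  return present[0];
}

console.log(pickInput({ audio_url: "https://bucket.s3.amazonaws.com/a.mp3", as_text: true }));
```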

Examples:

Transcribe this presigned URL (after allowlisting the host): audio_url=...
Transcribe this audio file on the MCP host: /path/to/interview.m4a

7. transcribe_upload_* (chunked upload)

For large files, call transcribe_upload_start, split the raw bytes into base64-encoded chunks of at most max_chunk_bytes (about 60KB, as reported by transcribe_upload_start), call transcribe_upload_append once per chunk index, then call transcribe_upload_finalize to run the transcription. Abandoned uploads are garbage-collected after about an hour.
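
The chunked flow can be sketched as follows. Only the tool names and max_chunk_bytes come from this README; the callTool function, the upload_id, index, and data field names, and the reading of max_chunk_bytes as a limit on raw (pre-base64) bytes are all assumptions made for illustration.

```typescript
import { Buffer } from "node:buffer";

// Hypothetical client function that invokes an MCP tool and returns its result.
type CallTool = (
  name: string,
  args: Record<string, unknown>
) => Promise<Record<string, unknown>>;

// Splits raw bytes into base64 strings, each encoding at most maxChunkBytes raw bytes.
function toBase64Chunks(data: Uint8Array, maxChunkBytes: number): string[] {
  if (!Number.isInteger(maxChunkBytes) || maxChunkBytes <= 0) {
    throw new RangeError("maxChunkBytes must be a positive integer");
  }
  const chunks: string[] = [];
  for (let offset = 0; offset < data.length; offset += maxChunkBytes) {
    const slice = data.subarray(offset, offset + maxChunkBytes);
    chunks.push(Buffer.from(slice).toString("base64"));
  }
  return chunks;
}

async function uploadAndTranscribe(
  callTool: CallTool,
  data: Uint8Array
): Promise<Record<string, unknown>> {
  const session = await callTool("transcribe_upload_start", {});
  const uploadId = session["upload_id"]; // assumed field name
  const maxChunkBytes = Number(session["max_chunk_bytes"]);

  const chunks = toBase64Chunks(data, maxChunkBytes);
  for (let index = 0; index < chunks.length; index++) {
    // Argument names below are assumptions; check the tool schema before use.
    await callTool("transcribe_upload_append", {
      upload_id: uploadId,
      index,
      data: chunks[index],
    });
  }
  return callTool("transcribe_upload_finalize", { upload_id: uploadId });
}
```

Splitting before encoding keeps every chunk independently decodable, so the server can append the decoded bytes in index order.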

8. transcribe_get_job / transcribe_cancel_job

Poll or cancel async jobs created by transcribe-audio (long audio or async: true).
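
A polling loop for async jobs might look like the sketch below. The "processing" status and the job_id field come from the transcribe-audio description; the callTool function, the job_id argument name, and the idea that any other status is terminal are assumptions.

```typescript
interface JobResult {
  status: string;
  [key: string]: unknown;
}

// Hypothetical client function that invokes an MCP tool and returns its result.
type CallTool = (name: string, args: Record<string, unknown>) => Promise<JobResult>;

async function waitForJob(
  callTool: CallTool,
  jobId: string,
  intervalMs = 2000,
  maxAttempts = 150
): Promise<JobResult> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const job = await callTool("transcribe_get_job", { job_id: jobId });
    // Assumption: anything other than "processing" is a terminal state.
    if (job.status !== "processing") return job;
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
  // Give up: cancel the job so it does not keep running on the server.
  await callTool("transcribe_cancel_job", { job_id: jobId });
  throw new Error(`Job ${jobId} did not finish after ${maxAttempts} polls`);
}
```

Bounding the number of attempts and cancelling on timeout keeps a stuck job from being polled indefinitely.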

Subtitle Generation Engines

OpenAI Whisper API

  • Pros: High accuracy, no local setup needed, supports 50+ languages
  • Cons: Requires API key, costs per audio minute
  • Setup: Set OPENAI_API_KEY environment variable

Local Whisper

  • Pros: Free, runs locally, no API limits
  • Cons: Requires setup, uses local CPU/GPU
  • Setup: pip install openai-whisper

The tool auto-detects which engine to use:

  1. If OPENAI_API_KEY is set, uses OpenAI Whisper
  2. If local whisper is installed, uses local whisper
  3. Returns an error if neither is available

For transcribe-audio, auto uses OpenAI first and falls back to local whisper when local whisper is available.
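
The auto-detection order described above can be expressed as a small function. This is a sketch of the documented order, not the project's code; selectEngine is a hypothetical name, and raising an error when an explicitly requested engine is unavailable is an assumption.

```typescript
type Engine = "openai" | "local";
type EnginePreference = Engine | "auto";

function selectEngine(
  preference: EnginePreference,
  hasApiKey: boolean,
  hasLocalWhisper: boolean
): Engine {
  const available: Record<Engine, boolean> = {
    openai: hasApiKey,
    local: hasLocalWhisper,
  };

  if (preference !== "auto") {
    // Assumption: an explicit choice is not silently replaced by the other engine.
    if (!available[preference]) {
      throw new Error(`Requested engine "${preference}" is not available`);
    }
    return preference;
  }

  // Documented order for auto: OpenAI first, then local whisper, else an error.
  if (hasApiKey) return "openai";
  if (hasLocalWhisper) return "local";
  throw new Error("No Whisper engine available");
}

console.log(selectEngine("auto", false, true));
```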

Example Workflows

Download and Generate Subtitles

1. Download this video: https://www.youtube.com/watch?v=VIDEO_ID
2. Generate subtitles for the downloaded file

Summarize a Video

Get the transcript from https://www.youtube.com/watch?v=VIDEO_ID and summarize the key points

Create Captions for Videos Without Subtitles

1. Download the video: https://vimeo.com/123456789
2. Generate English subtitles for it

Supported Platforms

Any platform supported by yt-dlp, including:

  • YouTube
  • Bilibili
  • Vimeo
  • Twitter/X
  • TikTok
  • Twitch
  • And many more...

Full list: https://github.com/yt-dlp/yt-dlp/blob/master/supportedsites.md

Project Structure

transcript-mcp/
├── src/
│   ├── index.ts              # Main MCP server entry point
│   ├── transcript-fetcher.ts # Transcript fetching using yt-dlp
│   ├── video-downloader.ts   # Video download functionality
│   ├── subtitle-generator.ts # AI-powered subtitle generation
│   ├── config.ts             # Configuration management
│   ├── url-detector.ts       # Platform detection from URLs
│   ├── parser.ts             # Transcript parsing (SRT, VTT, JSON)
│   └── errors.ts             # Custom error classes
├── test/
│   └── transcript.test.ts    # Unit tests
├── dist/                     # Compiled JavaScript (after build)
└── package.json

Development

# Build
npm run build

# Test
npm test

# Development mode
npm run dev

Troubleshooting

"yt-dlp is not installed"

brew install yt-dlp
# or
pip install yt-dlp

"ffmpeg is not installed"

brew install ffmpeg

"ffprobe is not installed"

brew install ffmpeg

"No Whisper engine available"

Either:

  • Set OPENAI_API_KEY environment variable, or
  • Install local whisper: pip install openai-whisper

Download issues

  • Check if the video is publicly accessible
  • Some platforms may have rate limits
  • Private/restricted videos cannot be downloaded

Subtitle generation is slow

  • The OpenAI Whisper API is typically faster than local whisper
  • Local whisper performance depends on your hardware
  • Consider using a smaller model for local whisper

License

MIT
