Video Toolkit MCP Server
A Model Context Protocol (MCP) server that provides comprehensive video tools: transcript retrieval, video downloading, and automatic subtitle generation using AI speech-to-text. Works with YouTube, Bilibili, Vimeo, and any platform supported by yt-dlp.
Features
- Multi-Platform Support: Works with YouTube, Bilibili, Vimeo, and any platform supported by yt-dlp
- Video Transcripts: Extract existing transcripts/captions from videos
- Video Downloads: Download videos to local storage in various formats and qualities
- Auto Subtitle Generation: Generate subtitles using OpenAI Whisper API or local Whisper
- Multiple URL Formats: Support for various URL formats from different platforms
- Timestamp Support: Include or exclude timestamps in transcript output
- Language Selection: Request transcripts or generate subtitles in specific languages
Tools
| Tool | Description |
|---|---|
| `get-transcript` | Retrieve existing transcripts from video platforms |
| `list-transcript-languages` | List available transcript languages for a video |
| `download-video` | Download videos to local storage |
| `list-downloads` | List downloaded video files |
| `generate-subtitles` | Generate subtitles using AI speech-to-text |
Prerequisites
- Node.js >= 16.0.0
- yt-dlp - Required for transcript fetching and video downloads
- ffmpeg - Required for subtitle generation (audio extraction)
Installing Dependencies
yt-dlp (required):
# Using Homebrew (macOS)
brew install yt-dlp
# Using pip
pip install yt-dlp
ffmpeg (required for subtitle generation):
# Using Homebrew (macOS)
brew install ffmpeg
# Using apt (Ubuntu/Debian)
sudo apt install ffmpeg
Local Whisper (optional, for local subtitle generation):
pip install openai-whisper
Installation
From Source
git clone <repository-url>
cd video-toolkit-mcp
npm install
npm run build
Global Installation (after publishing)
npm install -g video-toolkit-mcp
Configuration
For Claude Desktop / Cursor
Add the MCP server to your configuration file:
Claude Desktop (~/Library/Application Support/Claude/claude_desktop_config.json on macOS):
{
  "mcpServers": {
    "video-toolkit-mcp": {
      "command": "node",
      "args": ["/path/to/video-toolkit-mcp/dist/index.js"],
      "env": {
        "VIDEO_TOOLKIT_STORAGE_DIR": "/path/to/downloads",
        "OPENAI_API_KEY": "your-openai-api-key"
      }
    }
  }
}
Cursor (~/.cursor/mcp.json):
{
  "mcpServers": {
    "video-toolkit-mcp": {
      "command": "node",
      "args": ["/path/to/video-toolkit-mcp/dist/index.js"],
      "env": {
        "VIDEO_TOOLKIT_STORAGE_DIR": "/path/to/downloads",
        "OPENAI_API_KEY": "your-openai-api-key"
      }
    }
  }
}
Environment Variables
| Variable | Description | Default |
|---|---|---|
| `VIDEO_TOOLKIT_STORAGE_DIR` | Default directory for downloaded videos | `~/.video-toolkit/downloads` |
| `OPENAI_API_KEY` | OpenAI API key for Whisper-based subtitle generation | None |
| `VIDEO_TOOLKIT_WHISPER_ENGINE` | Preferred whisper engine: `openai`, `local`, or `auto` | `auto` |
| `WHISPER_BINARY_PATH` | Path to local whisper binary | `whisper` |
| `WHISPER_MODEL_PATH` | Path to whisper model (for local whisper) | Auto-download |
| `YT_DLP_PATH` | Path to yt-dlp binary | `yt-dlp` |
| `FFMPEG_PATH` | Path to ffmpeg binary | `ffmpeg` |
| `DEBUG` | Enable debug logging | `0` |
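The defaults in the table above could be resolved roughly as follows. This is a minimal sketch, not the server's actual `config.ts`: the `ToolkitConfig` shape, the `loadConfig` and `parseEngine` names, and the rule for interpreting `DEBUG` are all assumptions made for illustration. The environment and home directory are passed in as arguments (in Node.js they would typically come from `process.env` and `os.homedir()`).

```typescript
// Hypothetical config shape; field names are illustrative only.
type WhisperEngine = "openai" | "local" | "auto";

interface ToolkitConfig {
  storageDir: string;
  openaiApiKey: string | undefined;
  whisperEngine: WhisperEngine;
  whisperBinaryPath: string;
  whisperModelPath: string | undefined; // undefined: let whisper auto-download
  ytDlpPath: string;
  ffmpegPath: string;
  debug: boolean;
}

type Env = { [key: string]: string | undefined };

// Unknown or missing values fall back to "auto", the documented default.
function parseEngine(value: string | undefined): WhisperEngine {
  if (value === "openai" || value === "local") {
    return value;
  }
  return "auto";
}

// Applies the documented defaults to whatever variables are set.
function loadConfig(env: Env, homeDir: string): ToolkitConfig {
  return {
    storageDir: env.VIDEO_TOOLKIT_STORAGE_DIR || homeDir + "/.video-toolkit/downloads",
    openaiApiKey: env.OPENAI_API_KEY || undefined,
    whisperEngine: parseEngine(env.VIDEO_TOOLKIT_WHISPER_ENGINE),
    whisperBinaryPath: env.WHISPER_BINARY_PATH || "whisper",
    whisperModelPath: env.WHISPER_MODEL_PATH || undefined,
    ytDlpPath: env.YT_DLP_PATH || "yt-dlp",
    ffmpegPath: env.FFMPEG_PATH || "ffmpeg",
    // Assumption: anything other than unset, "" or "0" enables debug logging.
    debug: env.DEBUG !== undefined && env.DEBUG !== "" && env.DEBUG !== "0",
  };
}
```

Passing the environment in explicitly keeps the function pure, which makes the defaults easy to unit test.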
Usage
1. get-transcript
Retrieve existing transcripts from video platforms.
Parameters:
- `url` (required): Video URL
- `lang` (optional): Language code (e.g., 'en', 'es', 'zh')
- `include_timestamps` (optional): Include timestamps (default: true)
Example:
Get the transcript from https://www.youtube.com/watch?v=VIDEO_ID
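To illustrate what `include_timestamps` changes, here is a rough sketch that turns a small WebVTT caption file into transcript text with or without timestamps. The `Cue` type and the `parseVtt` and `formatTranscript` functions are hypothetical, the parser handles only a simplified subset of WebVTT, and the server's real output format may differ.

```typescript
interface Cue {
  start: string; // HH:MM:SS, milliseconds dropped
  text: string;
}

// Parses a minimal subset of WebVTT: a timing line followed by text lines.
// Header lines, cue identifiers and cue settings are ignored.
function parseVtt(vtt: string): Cue[] {
  const cues: Cue[] = [];
  const timing = /^(\d{2}:)?(\d{2}:\d{2})\.\d{3}\s+-->\s+/;
  const lines = vtt.split(/\r?\n/);
  let current: Cue | null = null;
  for (let i = 0; i < lines.length; i++) {
    const line = lines[i].trim();
    const match = timing.exec(line);
    if (match) {
      // The hours field is optional in WebVTT; normalize to HH:MM:SS.
      current = { start: (match[1] || "00:") + match[2], text: "" };
      cues.push(current);
    } else if (line === "") {
      current = null; // a blank line ends the cue
    } else if (current) {
      current.text = current.text ? current.text + " " + line : line;
    }
  }
  return cues;
}

// Renders one line per cue, optionally prefixed with its start time.
function formatTranscript(cues: Cue[], includeTimestamps: boolean): string {
  return cues
    .map((cue) => (includeTimestamps ? "[" + cue.start + "] " + cue.text : cue.text))
    .join("\n");
}
```

With `includeTimestamps` set to false the output is plain text, which tends to suit summarization prompts better.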
2. list-transcript-languages
List available transcript languages for a video.
Parameters:
- `url` (required): Video URL
Example:
What transcript languages are available for https://www.youtube.com/watch?v=VIDEO_ID?
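Since transcript fetching relies on yt-dlp, listing and fetching captions could map onto yt-dlp's subtitle options along these lines. The flags themselves (`--list-subs`, `--skip-download`, `--write-subs`, `--write-auto-subs`, `--sub-langs`, `--sub-format`, `-o`) are real yt-dlp options, but whether the server uses this exact combination is an assumption, and the function names are hypothetical.

```typescript
// Arguments to list the subtitle languages available for a video.
function listSubsArgs(url: string): string[] {
  return ["--list-subs", "--skip-download", url];
}

// Arguments to fetch captions (manual or auto-generated) without the video.
// `outputTemplate` is a yt-dlp output template such as "/tmp/subs/%(id)s".
function fetchSubsArgs(url: string, lang: string, outputTemplate: string): string[] {
  return [
    "--skip-download",
    "--write-subs",
    "--write-auto-subs",
    "--sub-langs", lang,
    "--sub-format", "vtt",
    "-o", outputTemplate,
    url,
  ];
}
```

The resulting arrays can be passed to a process spawner together with the configured yt-dlp binary path; building them as arrays rather than a shell string avoids quoting problems with URLs.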
3. download-video
Download a video to local storage.
Parameters:
- `url` (required): Video URL to download
- `output_dir` (optional): Custom output directory
- `filename` (optional): Custom filename
- `format` (optional): Video format - `mp4`, `webm`, `mkv` (default: mp4)
- `quality` (optional): Quality - `best`, `1080p`, `720p`, `480p`, `360p`, `audio` (default: best)
Example:
Download this video: https://www.youtube.com/watch?v=VIDEO_ID
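One plausible way to translate the `quality` and `format` parameters into a yt-dlp invocation is sketched below. The format selector syntax, `-f`, `-o`, and `--merge-output-format` are real yt-dlp features, but the specific mapping is an assumption and may not match what the server does; `formatSelector` and `downloadArgs` are hypothetical names.

```typescript
type Quality = "best" | "1080p" | "720p" | "480p" | "360p" | "audio";
type Container = "mp4" | "webm" | "mkv";

// Maps a quality label to a yt-dlp format selector string.
function formatSelector(quality: Quality): string {
  if (quality === "best") return "bestvideo+bestaudio/best";
  if (quality === "audio") return "bestaudio/best";
  const height = parseInt(quality, 10); // "720p" -> 720
  return "bestvideo[height<=" + height + "]+bestaudio/best[height<=" + height + "]";
}

// Builds the yt-dlp argument list for a download.
function downloadArgs(
  url: string,
  outputDir: string,
  quality: Quality = "best",
  container: Container = "mp4",
  filename?: string
): string[] {
  // %(title)s and %(ext)s are yt-dlp output template fields.
  const template = outputDir + "/" + (filename || "%(title)s") + ".%(ext)s";
  const args = ["-f", formatSelector(quality), "-o", template];
  if (quality !== "audio") {
    // Only relevant when separate video and audio streams are merged.
    args.push("--merge-output-format", container);
  }
  args.push(url);
  return args;
}
```

The `height<=` filter caps the resolution rather than demanding an exact match, so a 720p request still succeeds on a video that only offers 480p.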
4. list-downloads
List all downloaded video files.
Parameters:
- `directory` (optional): Directory to list (default: storage directory)
Example:
List my downloaded videos
5. generate-subtitles
Generate subtitles for a local video file using AI speech-to-text.
Parameters:
- `video_path` (required): Absolute path to the video file
- `engine` (optional): `openai` or `local` (default: auto-detect)
- `language` (optional): Language code for transcription
- `output_format` (optional): `srt` or `vtt` (default: srt)
Example:
Generate subtitles for /path/to/video.mp4
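The two values of `output_format` differ mainly in their timestamp separator, cue numbering, and file header. The sketch below renders timed segments in either format; the `Segment` shape (start and end in seconds) and the function names are assumptions for illustration, not the server's actual types.

```typescript
interface Segment {
  start: number; // seconds
  end: number;   // seconds
  text: string;
}

// Left-pads a non-negative integer with zeros.
function pad(n: number, width: number): string {
  let s = String(n);
  while (s.length < width) s = "0" + s;
  return s;
}

// SRT uses a comma before the milliseconds, WebVTT uses a period.
function formatTime(seconds: number, fmt: "srt" | "vtt"): string {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  const sep = fmt === "srt" ? "," : ".";
  return pad(h, 2) + ":" + pad(m, 2) + ":" + pad(s, 2) + sep + pad(ms, 3);
}

// SRT cues are numbered from 1; WebVTT files start with a "WEBVTT" header.
function renderSubtitles(segments: Segment[], fmt: "srt" | "vtt"): string {
  const blocks = segments.map((seg, i) => {
    const timing = formatTime(seg.start, fmt) + " --> " + formatTime(seg.end, fmt);
    return fmt === "srt" ? (i + 1) + "\n" + timing + "\n" + seg.text : timing + "\n" + seg.text;
  });
  const body = blocks.join("\n\n") + "\n";
  return fmt === "vtt" ? "WEBVTT\n\n" + body : body;
}
```

SRT is the more widely supported format in desktop players, while WebVTT is the format expected by the HTML `<track>` element.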
Subtitle Generation Engines
OpenAI Whisper API
- Pros: High accuracy, no local setup needed, supports 50+ languages
- Cons: Requires API key, costs per audio minute
- Setup: Set the `OPENAI_API_KEY` environment variable
Local Whisper
- Pros: Free, runs locally, no API limits
- Cons: Requires setup, uses local CPU/GPU
- Setup: `pip install openai-whisper`
The tool auto-detects which engine to use:
- If `OPENAI_API_KEY` is set, uses OpenAI Whisper
- If local whisper is installed, uses local whisper
- Returns an error if neither is available
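The auto-detection order above can be expressed as a small pure function. This is a sketch under stated assumptions: `selectEngine` is a hypothetical name, and the behaviour when an engine is requested explicitly but unavailable (throwing rather than falling back) is a guess, not documented behaviour.

```typescript
type EnginePreference = "openai" | "local" | "auto";
type Engine = "openai" | "local";

// Picks an engine from the preference and what is available on the machine.
function selectEngine(
  preference: EnginePreference,
  hasApiKey: boolean,
  hasLocalWhisper: boolean
): Engine {
  if (preference === "openai") {
    // Assumption: an explicit choice is never silently replaced.
    if (!hasApiKey) throw new Error("OPENAI_API_KEY is not set");
    return "openai";
  }
  if (preference === "local") {
    if (!hasLocalWhisper) throw new Error("Local whisper is not installed");
    return "local";
  }
  // "auto": the API key takes priority, then the local install.
  if (hasApiKey) return "openai";
  if (hasLocalWhisper) return "local";
  throw new Error("No Whisper engine available");
}
```

For example, `selectEngine("auto", false, true)` returns `"local"`, and the same call with both flags false throws the error described under Troubleshooting.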
Example Workflows
Download and Generate Subtitles
1. Download this video: https://www.youtube.com/watch?v=VIDEO_ID
2. Generate subtitles for the downloaded file
Summarize a Video
Get the transcript from https://www.youtube.com/watch?v=VIDEO_ID and summarize the key points
Create Captions for Videos Without Subtitles
1. Download the video: https://vimeo.com/123456789
2. Generate English subtitles for it
Supported Platforms
Any platform supported by yt-dlp, including:
- YouTube
- Bilibili
- Vimeo
- Twitter/X
- TikTok
- Twitch
- And many more...
Full list: https://github.com/yt-dlp/yt-dlp/blob/master/supportedsites.md
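Platform detection from a URL can be as simple as matching the hostname against known domains. The sketch below is illustrative only: the domain table is a small hand-picked sample, the names are hypothetical, and the project's own `url-detector.ts` may work differently. Unrecognized hosts are reported as `"other"` so they can still be handed to yt-dlp.

```typescript
type Platform = "youtube" | "bilibili" | "vimeo" | "twitter" | "tiktok" | "twitch" | "other";

// Illustrative sample of domains, not an exhaustive list.
const HOST_SUFFIXES: Array<[string, Platform]> = [
  ["youtube.com", "youtube"],
  ["youtu.be", "youtube"],
  ["bilibili.com", "bilibili"],
  ["b23.tv", "bilibili"],
  ["vimeo.com", "vimeo"],
  ["twitter.com", "twitter"],
  ["x.com", "twitter"],
  ["tiktok.com", "tiktok"],
  ["twitch.tv", "twitch"],
];

// Extracts the lowercase hostname from an http(s) URL, or null if not a URL.
// Simplified: does not handle credentials embedded in the URL.
function hostOf(url: string): string | null {
  const m = /^https?:\/\/([^\/?#:]+)/i.exec(url.trim());
  return m ? m[1].toLowerCase() : null;
}

// Matches the host exactly or as a subdomain (e.g. "www.youtube.com").
function detectPlatform(url: string): Platform | null {
  const host = hostOf(url);
  if (host === null) return null;
  for (let i = 0; i < HOST_SUFFIXES.length; i++) {
    const suffix = HOST_SUFFIXES[i][0];
    if (host === suffix || host.slice(-(suffix.length + 1)) === "." + suffix) {
      return HOST_SUFFIXES[i][1];
    }
  }
  return "other";
}
```

Requiring a leading dot for subdomain matches prevents a host such as `netflix.com` from being mistaken for `x.com`.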
Project Structure
video-toolkit-mcp/
├── src/
│ ├── index.ts # Main MCP server entry point
│ ├── transcript-fetcher.ts # Transcript fetching using yt-dlp
│ ├── video-downloader.ts # Video download functionality
│ ├── subtitle-generator.ts # AI-powered subtitle generation
│ ├── config.ts # Configuration management
│ ├── url-detector.ts # Platform detection from URLs
│ ├── parser.ts # Transcript parsing (SRT, VTT, JSON)
│ └── errors.ts # Custom error classes
├── test/
│ └── transcript.test.ts # Unit tests
├── dist/ # Compiled JavaScript (after build)
└── package.json
Development
# Build
npm run build
# Test
npm test
# Development mode
npm run dev
Troubleshooting
"yt-dlp is not installed"
brew install yt-dlp
# or
pip install yt-dlp
"ffmpeg is not installed"
brew install ffmpeg
"No Whisper engine available"
Either:
- Set the `OPENAI_API_KEY` environment variable, or
- Install local whisper: `pip install openai-whisper`
Download issues
- Check if the video is publicly accessible
- Some platforms may have rate limits
- Private/restricted videos cannot be downloaded
Subtitle generation is slow
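- The OpenAI Whisper API is usually faster than local whisper, especially on machines without a GPU
- Local whisper performance depends on your hardware
- Consider using a smaller model for local whisper (for example, `tiny` or `base` instead of `large`), at some cost in accuracy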
License
MIT
Acknowledgments
- yt-dlp for video platform support
- OpenAI Whisper for speech-to-text
- Model Context Protocol for the MCP framework