youtube-api-mcp
MCP server providing tools to fetch YouTube video transcripts with metadata, supporting direct YouTube transcripts and audio transcription via multiple backends (whisper, AssemblyAI, OpenAI, Gemini).
YouTube Transcript API Service
API service to fetch YouTube video transcripts with metadata and local file caching.
Features
- Fetch YouTube video transcripts with metadata
- transcript_from_audio generation using yt-dlp + ffmpeg with selectable backend: faster-whisper, assembly, openai, or gemini
- Background transcription jobs with progress polling
- Timestamped development logs with visible active log level and transcription backend at startup
- Local file caching with unlimited retention by default
- Always returns first available transcript (native/original language)
- Docker support
- MCP (Model Context Protocol) server integration
- Interactive Swagger documentation
- Rate limiting
- Optional API key authentication
- Basic metadata includes: title, author, duration, views, publish date, thumbnail, description
- Full metadata endpoint with all yt-dlp fields (50+ fields)
Quick Start
Local Development
# Install dependencies
sudo apt install python3-venv ffmpeg
python3 -m venv venv
source venv/bin/activate
pip3 install -r requirements.txt
# Copy environment variables and configure
cp .env.example .env
# Edit .env if needed
# Run server in dev mode (with hot-reload, no __pycache__)
./run-api-dev.sh
The development startup script:
- loads .env from the project root if present
- shows the active log level
- shows the active transcription backend
- enables timestamped Uvicorn logs through app/uvicorn_log_config.json
Or manually:
PYTHONDONTWRITEBYTECODE=1 uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload --log-config app/uvicorn_log_config.json
Docker
Docker uses a glibc-based Python image for compatibility with faster-whisper and ctranslate2, which are not reliably installable on python:3.14-alpine.
# Copy environment variables
cp .env.example .env
# Edit .env if needed
# Build and run with Docker Compose
sudo docker-compose up --build
# Or build and run manually
sudo docker build -t youtube-transcript-api .
sudo docker run -p 8000:8000 -v $(pwd)/cache:/app/cache -v $(pwd)/data:/app/data youtube-transcript-api
API Endpoints
1. Health Check
GET /api/v1/health
Returns application health and runtime information.
Response:
{
"status": "healthy",
"version": "1.1.0",
"uptime_seconds": 12.34,
"transcription_backend": "faster-whisper",
"transcript_from_audio_enabled": true,
"cache_path": "/app/cache",
"cache_accessible": true,
"whisper_model_loaded": false
}
2. Get Transcript with Basic Metadata
GET /api/v1/youtube/transcript/{video_id}?use_cache=true&force_refresh=false&language=en
When _APP_TRANSCRIPT_FROM_AUDIO=true, this endpoint can automatically queue transcript_from_audio processing if direct YouTube transcript fetch fails for reasons such as disabled transcripts or YouTube-side access issues. Video unavailable remains a hard error.
Parameters:
- video_id (path): YouTube video ID (11 chars) or full URL
  - Examples: mQ-y2ZOTpr4 or https://www.youtube.com/watch?v=mQ-y2ZOTpr4
- use_cache (query): Enable or disable cache lookup for direct YouTube transcripts
- force_refresh (query): Skip direct transcript cache lookup and overwrite the direct cache section
- language (query): Preferred transcript language code (optional)
Returns:
- metadata - Basic video metadata
- transcript_youtube - Direct transcript fetched from YouTube
- transcript_audio - Transcript generated from the audio track if available
- source_preference - Response source ordering metadata
Response:
{
"video_id": "mQ-y2ZOTpr4",
"metadata": {
"title": "Video Title",
"author": "Channel Name",
"duration": 218,
"publish_date": "20251203",
"view_count": 9084,
"thumbnail": "https://i.ytimg.com/vi/...",
"description": "Full description..."
},
"transcript_youtube": {
"transcript": "Full transcript text here...",
"language": "en",
"source": "youtube",
"cache_used": false,
"cached_at": null
},
"transcript_audio": null,
"source_preference": ["youtube", "audio"]
}
Fallback response when direct transcript fetching queues audio transcription:
{
"video_id": "3LbZP0sYmPw",
"status": "queued",
"message": "Direct YouTube transcript fetch failed: Transcripts are disabled for video 3LbZP0sYmPw. Transcript_from_audio status for video_id 3LbZP0sYmPw is 'queued'. Background transcription message: Transcript queued for background processing by video_id using backend 'assembly'. The transcript should be available in a few minutes. Check status by the same video_id.",
"progress_percent": 0,
"transcript_from_audio_reason": "Transcripts are disabled for video 3LbZP0sYmPw",
"result": null
}
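The two response shapes above can be told apart on the client side. The following is a minimal sketch, not part of the project: the helper names `transcript_url` and `pick_transcript` and the `BASE_URL` value are assumptions made for illustration, and only the fields shown in the example responses are relied on.

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://localhost:8000"  # assumption: local server from the Quick Start


def transcript_url(video_id, language=None, use_cache=True, force_refresh=False):
    """Build the URL for GET /api/v1/youtube/transcript/{video_id}."""
    params = {
        "use_cache": str(use_cache).lower(),
        "force_refresh": str(force_refresh).lower(),
    }
    if language:
        params["language"] = language
    path = f"/api/v1/youtube/transcript/{quote(video_id, safe='')}"
    return f"{BASE_URL}{path}?{urlencode(params)}"


def pick_transcript(payload):
    """Return (kind, text) for a decoded response body.

    kind is "youtube" or "audio" when a transcript is present, the job
    status (for example "queued") for a fallback response, or "missing".
    """
    # Fallback responses carry a job status instead of transcript sections.
    if "transcript_youtube" not in payload and "transcript_audio" not in payload:
        return (payload.get("status", "unknown"), None)
    # Honour the ordering the server reports in source_preference.
    for source in payload.get("source_preference", ["youtube", "audio"]):
        section = payload.get(f"transcript_{source}")
        if section and section.get("transcript"):
            return (source, section["transcript"])
    return ("missing", None)
```

A caller would fetch `transcript_url(...)` with any HTTP client, decode the JSON body, and pass it to `pick_transcript`; a `"queued"` result means the transcript has to be polled for by the same video_id.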
3. Get Transcript with Full Metadata
GET /api/v1/youtube/transcript/raw/{video_id}?use_cache=true&force_refresh=false&language=en
Returns complete yt-dlp metadata together with separated transcript payloads.
Response:
{
"video_id": "mQ-y2ZOTpr4",
"metadata": {
"title": "Video Title",
"channel_id": "UC123..."
},
"transcript_youtube": {
"transcript": "Full transcript text here...",
"language": "en",
"source": "youtube",
"cache_used": true,
"cached_at": "2026-03-13T12:34:56"
},
"transcript_audio": null,
"source_preference": ["youtube", "audio"]
}
4. Queue Audio Transcription
POST /api/v1/youtube/audio-transcript/{video_id}
Queues background transcription for the provided video_id, downloads the audio track, normalizes it with ffmpeg, transcribes it with the configured backend, and stores the result under transcript_from_audio in cache.
Supported backends:
- faster-whisper
- assembly
- openai
- gemini
Response:
{
"status": "queued",
"video_id": "mQ-y2ZOTpr4",
"message": "Transcript queued for background processing by video_id using backend 'assembly'",
"progress_percent": 0,
"result": null
}
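As a rough illustration of calling this endpoint from Python, the sketch below uses only the standard library. The function names and the `base_url` default are assumptions, and API key handling is left out because the header name is not stated here.

```python
import json
from urllib.request import Request, urlopen


def queue_request(video_id, base_url="http://localhost:8000"):
    """Build the POST request that queues audio transcription."""
    url = f"{base_url}/api/v1/youtube/audio-transcript/{video_id}"
    return Request(url, method="POST")


def queue_audio_transcript(video_id, base_url="http://localhost:8000"):
    """Send the request and return the decoded JSON body."""
    with urlopen(queue_request(video_id, base_url)) as resp:
        return json.load(resp)
```

Calling `queue_audio_transcript("mQ-y2ZOTpr4")` against a running server should return a body shaped like the response above.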
5. Check Background Transcription Status
GET /api/v1/youtube/audio-transcript/{video_id}
Returns the current background transcription state with step-level progress.
Possible states:
- queued
- downloading_audio
- extracting_audio
- loading_model
- uploading_audio
- awaiting_provider
- transcribing
- completed
- failed
Response:
{
"video_id": "mQ-y2ZOTpr4",
"status": "transcribing",
"current_step": "transcribing",
"message": "Transcribing audio with backend 'assembly' from data/work/mQ-y2ZOTpr4/audio.wav",
"progress_percent": 70,
"created_at": "2026-03-12T05:40:00",
"updated_at": "2026-03-12T05:41:10",
"error": null,
"result": null
}
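Because the job moves through several intermediate states, a client usually polls until it sees `completed` or `failed`. The loop below is a sketch under assumptions: the function names, the polling interval, and the timeout are illustrative choices, not values defined by the service.

```python
import json
import time
from urllib.request import urlopen

# The two states from the list above that end a job.
TERMINAL_STATES = {"completed", "failed"}


def fetch_status(video_id, base_url="http://localhost:8000"):
    """GET the status document for one video_id (no API key handling)."""
    url = f"{base_url}/api/v1/youtube/audio-transcript/{video_id}"
    with urlopen(url) as resp:
        return json.load(resp)


def wait_for_transcript(video_id, fetch=fetch_status, interval=5.0,
                        timeout=1800.0, sleep=time.sleep):
    """Poll until the job reaches a terminal state or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch(video_id)
        if job["status"] in TERMINAL_STATES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"job for {video_id} still in state {job['status']!r}"
            )
        sleep(interval)
```

The `fetch` and `sleep` parameters exist so the loop can be exercised without a running server; in normal use the defaults are enough.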
6. Cache Status
GET /api/v1/cache
Response:
{
"status": "healthy",
"cache_size": 12,
"cache_path": "./cache",
"cache_size_bytes": 482102,
"cache_size_mb": 0.46,
"max_cache_size_mb": 0
}
7. List Cache Entries
GET /api/v1/cache/entries
Requires API key authentication when enabled.
Response:
{
"status": "healthy",
"entries": [
{
"video_id": "mQ-y2ZOTpr4",
"file_name": "mQ-y2ZOTpr4.json",
"size_bytes": 20480,
"updated_at": "2026-03-13T20:00:00"
}
],
"cache_size": 1,
"cache_size_bytes": 20480,
"cache_size_mb": 0.02,
"max_cache_size_mb": 0
}
8. Clear All Cache Entries
DELETE /api/v1/cache
Requires API key authentication when enabled.
9. Clear Cache Entry By Video ID
DELETE /api/v1/cache/{video_id}
Requires API key authentication when enabled.
10. Root Endpoint
GET /
Returns API information and available endpoints.
Behavior
Language Handling
The API returns the best available transcript based on YouTube availability and your optional preferred language.
- Prefers manual transcripts over auto-generated ones
- Optional language query parameter can be used as a preferred transcript language hint
- Response separates direct and audio transcript payloads instead of concatenating them
- Cache is stored as video_id.json with separate sections for direct and audio transcripts
Cache Logic
- The API first checks cache by video_id.
- If not found, it fetches a transcript from YouTube.
- Direct transcripts are stored under direct_from_youtube.
- Audio transcription results are stored under transcript_from_audio.
- Cached transcript files live under cache/{video_id}.json.
- By default, _APP_MAX_CACHE_SIZE_MB=0 keeps cache size unlimited.
- By default, _APP_CACHE_TTL_DAYS=0 disables cache expiration.
- Automatic eviction only happens if _APP_MAX_CACHE_SIZE_MB is set to a value greater than 0.
- Automatic expiration only happens if _APP_CACHE_TTL_DAYS is set to a value greater than 0.
If _APP_TRANSCRIPT_FROM_AUDIO=true and direct transcript fetching fails for an eligible reason, the standard transcript endpoint and MCP get_youtube_transcript tool automatically queue or reuse background audio transcription for the same video_id.
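The lookup rules in this section can be sketched as a small read helper. This is an illustration only, not the project's cache service: the function name is hypothetical, and using the file modification time to decide expiry is an assumption (the real service may track age differently).

```python
import json
import time
from pathlib import Path


def read_cache_section(cache_dir, video_id, section, ttl_days=0, now=None):
    """Return one section of cache/{video_id}.json, or None on a miss."""
    path = Path(cache_dir) / f"{video_id}.json"
    if not path.is_file():
        return None
    # ttl_days == 0 mirrors the documented default: entries never expire.
    if ttl_days > 0:
        current = now if now is not None else time.time()
        # Assumption: entry age is taken from the file's modification time.
        if current - path.stat().st_mtime > ttl_days * 86400:
            return None
    entry = json.loads(path.read_text(encoding="utf-8"))
    # section is "direct_from_youtube" or "transcript_from_audio".
    return entry.get(section)
```

Reading the two sections separately matches the layout described above, where a direct transcript and an audio transcript can coexist in the same file.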
Transcript From Audio Logic
- The request endpoint normalizes the input to video_id.
- It checks cache/<video_id>.json for transcript_from_audio.
- If not cached, it creates or reuses a file-backed background status entry in data/jobs/.
- The worker uses data/work/<video_id>/ as temporary workspace.
- The worker downloads audio with yt-dlp.
- ffmpeg converts audio to mono 16k WAV.
- The configured backend generates the final transcript.
- The result is stored in cache and exposed through HTTP and MCP polling.
- Temporary work files are removed after processing when _APP_JOB_CLEANUP_TEMP_FILES=true.
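The normalization, download, and conversion steps above could look roughly like this. It is a sketch under assumptions: the function names are hypothetical, the accepted URL forms (watch and youtu.be links) are a guess at what "full URL" covers, and the exact yt-dlp and ffmpeg options the worker passes are not documented here. The flags used are standard ones: `-f bestaudio` and `-o` for yt-dlp, `-ac 1` (mono) and `-ar 16000` (16 kHz) for ffmpeg.

```python
import re
from pathlib import Path
from urllib.parse import parse_qs, urlparse

VIDEO_ID_RE = re.compile(r"[A-Za-z0-9_-]{11}")


def normalize_video_id(value):
    """Accept an 11-character ID or a watch/short URL and return the ID."""
    if VIDEO_ID_RE.fullmatch(value):
        return value
    parsed = urlparse(value)
    if parsed.hostname == "youtu.be":
        candidate = parsed.path.lstrip("/")
    else:
        candidate = parse_qs(parsed.query).get("v", [""])[0]
    if not VIDEO_ID_RE.fullmatch(candidate):
        raise ValueError(f"cannot extract a video ID from {value!r}")
    return candidate


def download_command(video_id, work_root="data/work"):
    """Build the yt-dlp argument list that saves the audio track."""
    # %(ext)s is a yt-dlp output-template field for the file extension.
    template = Path(work_root) / video_id / "source_audio.%(ext)s"
    return ["yt-dlp", "-f", "bestaudio", "-o", str(template),
            f"https://www.youtube.com/watch?v={video_id}"]


def convert_command(source_path, wav_path):
    """Build the ffmpeg argument list for a mono 16 kHz WAV."""
    return ["ffmpeg", "-y", "-i", str(source_path),
            "-ac", "1", "-ar", "16000", str(wav_path)]
```

Each list can be passed to `subprocess.run(cmd, check=True)`; building argument lists instead of shell strings avoids quoting problems with unusual paths.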
Documentation
Interactive API documentation is available at:
MCP Server
MCP (Model Context Protocol) server is integrated with FastAPI and supports StreamableHttpTransport.
Set _APP_MCP_HIDE_CLEAR_CACHE=true to hide the clear_cache tool from the MCP tools list.
- MCP Endpoint: http://localhost:8000/api/v1/mcp
- Transport: streamable_http
- Tools:
  - get_youtube_transcript
  - request_youtube_audio_transcript
  - get_youtube_audio_transcript
  - clear_cache
MCP Tool Example
import asyncio

from mcp import ClientSession
from mcp.client.streamable_http import streamablehttp_client


async def main() -> None:
    # Open a Streamable HTTP connection to the MCP endpoint.
    async with streamablehttp_client("http://localhost:8000/api/v1/mcp") as (read, write, _):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.call_tool(
                "get_youtube_transcript",
                arguments={"video_id": "9Wg6tiaar9M"},
            )
            print(result)


asyncio.run(main())
MCP Config for IDEs
{
"mcpServers": {
"youtube-transcript": {
"url": "http://localhost:8000/api/v1/mcp",
"transport": "streamable_http"
}
}
}
Configuration
All environment variables use the _APP_ prefix. Copy .env.example to .env and adjust values for your environment.
Key groups:
- API paths and CORS
- Cache, jobs, and work directories
- Transcript and audio fallback behavior
- Provider/backend configuration
- Optional API key authentication
- Logging and port binding
_APP_API_KEY is used only for external provider authentication. _APP_X_API_KEY independently enables API access control for incoming HTTP and MCP requests. Leaving _APP_X_API_KEY empty keeps incoming API and MCP authentication disabled.
Backend Examples
Faster Whisper
_APP_TRANSCRIPTION_BACKEND=faster-whisper
_APP_WHISPER_MODEL=large-v3
_APP_WHISPER_DEVICE=cpu
_APP_WHISPER_COMPUTE_TYPE=int8
AssemblyAI
_APP_TRANSCRIPTION_BACKEND=assembly
_APP_API_KEY=your-assembly-api-key
_APP_BASE_URL=https://api.assemblyai.com
_APP_MODEL=universal-3-pro,universal-2
_APP_LANGUAGE_DETECTION=true
OpenAI
_APP_TRANSCRIPTION_BACKEND=openai
_APP_API_KEY=your-openai-api-key
_APP_BASE_URL=https://api.openai.com/v1
_APP_MODEL=gpt-4o-mini-transcribe
Gemini
_APP_TRANSCRIPTION_BACKEND=gemini
_APP_API_KEY=your-gemini-api-key
_APP_BASE_URL=https://generativelanguage.googleapis.com
_APP_MODEL=gemini-2.5-flash
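A quick sanity check of these settings before starting the server might look like the sketch below. The function name is hypothetical, and the rule that every hosted backend needs `_APP_API_KEY` is an assumption drawn from the examples above rather than documented validation behavior.

```python
SUPPORTED_BACKENDS = ("faster-whisper", "assembly", "openai", "gemini")
# Assumption: the hosted providers need a key, the local backend does not.
REMOTE_BACKENDS = {"assembly", "openai", "gemini"}


def check_backend_env(env):
    """Return a list of problems found in a mapping of _APP_* variables."""
    problems = []
    backend = env.get("_APP_TRANSCRIPTION_BACKEND", "")
    if backend not in SUPPORTED_BACKENDS:
        problems.append(f"unsupported backend: {backend!r}")
    elif backend in REMOTE_BACKENDS and not env.get("_APP_API_KEY"):
        problems.append(f"backend {backend!r} needs _APP_API_KEY")
    return problems
```

Passing `os.environ` (after loading `.env`) returns an empty list when the backend selection looks consistent.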
Architecture
app/
├── main.py
├── config.py
├── models.py
├── middleware/
│   ├── auth.py
│   └── process_time.py
├── routers/
│   ├── transcript.py
│   └── transcript_from_audio.py
├── services/
│   ├── background_transcription_service.py
│   ├── cache_service.py
│   ├── job_service.py
│   ├── service_container.py
│   ├── transcription_backend_service.py
│   ├── transcript_from_audio_cache_service.py
│   └── youtube_service.py
├── utils/
│   └── transcript_utils.py
└── mcp/
    └── server.py
Development
Project Status
Current version: 1.1.0
Implemented features:
- Cache service with atomic writes, optional TTL, and optional max-size eviction
- Direct transcript and audio transcript fallback flows
- Cache retention defaults to unlimited size and no expiration
- Background job status files stored under data/jobs
- Background worker temporary files stored under data/work
- Shared service container across REST and MCP
- REST cache management endpoints
- Optional CORS middleware
- Optional API key authentication
- Docker and Compose support
- Swagger and ReDoc documentation
Cache Structure
Runtime data is split between persistent cache entries and background-processing state.
cache/
└── {video_id}.json
data/
├── jobs/
│   └── {video_id}.json
└── work/
    └── {video_id}/
        ├── source_audio.*
        └── audio.wav
cache/{video_id}.json contains:
- video_id
- direct_from_youtube
- transcript_from_audio
data/jobs/{video_id}.json contains the current background job state, including:
- status
- current_step
- progress_percent
- message
- error
- result
data/work/{video_id}/ contains temporary downloaded and normalized audio files while a background job is running.
License
MIT