Video Caption MCP Server

An MCP server that automatically transcribes video content and burns stylized captions directly into the video file. It leverages the Groq Whisper API for fast transcription and supports multiple visual styles tailored for social media and professional content.

🎬 Video Caption MCP Server

An MCP (Model Context Protocol) server that automatically transcribes videos and burns stylized captions into them. Designed to work with Poke by Interaction Co.

Upload a video → AI transcribes it → Stylized captions are burned in → Download the result.

How It Works

You → Poke: "Caption this video: https://example.com/video.mp4"
     ↓
Poke calls your MCP server's `caption_video` tool
     ↓
1. Downloads the video
2. FFmpeg extracts audio (16kHz mono WAV)
3. Groq Whisper API transcribes with timestamps (FREE!)
4. Generates SRT subtitle file
5. FFmpeg burns styled captions into the video
     ↓
Returns: download link + full transcript
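
Step 4 of the flow above (turning timestamped transcript segments into an SRT file) can be sketched as follows. This is a minimal illustration, not the server's actual code: the segment shape (dicts with `start` and `end` in seconds plus `text`) and the helper names `format_timestamp` and `segments_to_srt` are assumptions.

```python
def format_timestamp(seconds: float) -> str:
    """Convert a time in seconds to the SRT timestamp format HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: list[dict]) -> str:
    """Render segments as SRT cues: index, time range, text, blank line."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    # Joining with a newline leaves one blank line between cues.
    return "\n".join(blocks)
```

The resulting string can be written to a `.srt` file and handed to FFmpeg for the burn-in step.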

Caption Styles

Style	Look	Best For
`tiktok` (default)	Poppins bold white on dark box	TikTok, Reels, Shorts
`modern`	Poppins white with outline	General purpose
`classic`	Yellow text, bottom	Movies, TV style
`minimal`	Small Poppins, bottom-left	Clean, professional
`bold`	Impact font, heavy shadow	Maximum readability
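
One plausible way to implement presets like these is to map each style name to an ASS style override and pass it to FFmpeg's `subtitles` filter through its `force_style` option. The sketch below is illustrative only: the concrete values in `STYLE_PRESETS` are guesses based on the "Look" column, not the server's real presets, and `build_subtitles_filter` is a hypothetical helper. It also assumes the SRT path contains no characters that need escaping in an FFmpeg filter string.

```python
# Hypothetical presets: ASS style fields guessed from the table above.
STYLE_PRESETS = {
    # White bold text on a semi-transparent dark box (BorderStyle=3 draws a box).
    "tiktok": "FontName=Poppins,Bold=1,PrimaryColour=&H00FFFFFF,BorderStyle=3,OutlineColour=&H80000000",
    # White text with a black outline.
    "modern": "FontName=Poppins,PrimaryColour=&H00FFFFFF,OutlineColour=&H00000000,Outline=2",
    # Yellow text, bottom center (ASS colours are &HAABBGGRR).
    "classic": "PrimaryColour=&H0000FFFF,Alignment=2",
    # Bottom-left placement.
    "minimal": "FontName=Poppins,Alignment=1",
    # Impact with a heavy shadow.
    "bold": "FontName=Impact,Bold=1,Shadow=3",
}


def build_subtitles_filter(srt_path: str, style: str = "tiktok", font_size: int = 24) -> str:
    """Return the value for FFmpeg's -vf option that burns in the given SRT file."""
    if style not in STYLE_PRESETS:
        raise ValueError(f"unknown style: {style!r}")
    force_style = f"{STYLE_PRESETS[style]},FontSize={font_size}"
    # The single quotes protect the commas from FFmpeg's filtergraph parser.
    return f"subtitles={srt_path}:force_style='{force_style}'"
```

The returned string would be used as `ffmpeg -i input.mp4 -vf "<filter>" output.mp4`, passed as a single argument.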

Setup (15 minutes)

1. Get a Free Groq API Key

Go to console.groq.com
Sign up (no credit card needed)
Go to API Keys → Create new key
Copy the key – you'll need it in step 3

Groq's free tier includes Whisper transcription at no cost with generous rate limits.

2. Deploy to Render

Option A: One-Click Deploy

Click the button above
Connect your GitHub account
It will create a new repo from this template and deploy it

Option B: Manual Deploy

Fork this repo to your GitHub
Go to render.com → New → Web Service
Connect your forked repo
Render auto-detects the Dockerfile
Click "Create Web Service"

3. Set Environment Variables

In your Render dashboard → Environment:

Variable	Value	Required
`GROQ_API_KEY`	Your Groq API key	✅ Yes
`BASE_URL`	`https://your-app-name.onrender.com`	✅ Yes
`WHISPER_MODEL`	`whisper-large-v3-turbo`	No (default)
`MAX_VIDEO_DURATION_SEC`	`600`	No (default: 10 min)
`PORT`	`8000`	No (default)

⚠️ Important: Set BASE_URL to your actual Render URL so download links work!
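
For reference, a server could read these variables roughly as shown below. This is a sketch under stated assumptions, not the actual `server.py`: the `Settings` class and `load_settings` function are hypothetical names, and only the variable names and defaults come from the table above.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    groq_api_key: str
    base_url: str
    whisper_model: str = "whisper-large-v3-turbo"
    max_video_duration_sec: int = 600
    port: int = 8000


def load_settings(env=None) -> Settings:
    """Build Settings from a mapping of environment variables (os.environ by default)."""
    env = os.environ if env is None else env
    missing = [name for name in ("GROQ_API_KEY", "BASE_URL") if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing required environment variables: {', '.join(missing)}")
    return Settings(
        groq_api_key=env["GROQ_API_KEY"],
        # Strip a trailing slash so download links are not built with "//".
        base_url=env["BASE_URL"].rstrip("/"),
        whisper_model=env.get("WHISPER_MODEL", "whisper-large-v3-turbo"),
        max_video_duration_sec=int(env.get("MAX_VIDEO_DURATION_SEC", "600")),
        port=int(env.get("PORT", "8000")),
    )
```

Failing fast on a missing `GROQ_API_KEY` or `BASE_URL` surfaces a misconfigured deployment at startup rather than on the first caption request.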

4. Connect to Poke

Go to poke.com/settings/connections/integrations/new
Enter a name: Video Captioner
Enter the MCP Server URL: https://your-app-name.onrender.com/mcp
Click Create Integration

5. Test It!

Message Poke:

"Use the Video Captioner integration's caption_video tool to caption this video: https://example.com/my-video.mp4"

Or more naturally:

"Can you add captions to this video? https://example.com/my-video.mp4 Use the bold style."

MCP Tools

`caption_video`

Transcribes and burns captions into a video.

Parameters:

video_url (required): Direct URL to a video file
language (optional): ISO-639-1 code, default "en"
style (optional): tiktok (default), modern, classic, minimal, or bold
font_size (optional): 12-72, default 24
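
To make the parameters concrete, here is a sketch of the JSON-RPC `tools/call` message an MCP client would send for this tool. The helper `build_caption_request` and its client-side validation are assumptions added for illustration; the server may validate differently. A real client also performs the MCP initialization handshake before calling tools, which is omitted here.

```python
import json

ALLOWED_STYLES = {"tiktok", "modern", "classic", "minimal", "bold"}


def build_caption_request(
    video_url: str,
    language: str = "en",
    style: str = "tiktok",
    font_size: int = 24,
    request_id: int = 1,
) -> dict:
    """Validate the arguments and wrap them in an MCP tools/call request."""
    if not video_url.startswith(("http://", "https://")):
        raise ValueError("video_url must be a direct http(s) URL")
    if style not in ALLOWED_STYLES:
        raise ValueError(f"style must be one of {sorted(ALLOWED_STYLES)}")
    if not 12 <= font_size <= 72:
        raise ValueError("font_size must be between 12 and 72")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "caption_video",
            "arguments": {
                "video_url": video_url,
                "language": language,
                "style": style,
                "font_size": font_size,
            },
        },
    }


if __name__ == "__main__":
    request = build_caption_request("https://example.com/my-video.mp4", style="bold")
    print(json.dumps(request, indent=2))
```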

`list_caption_styles`

Returns all available caption style presets with descriptions.

Local Development

# Clone the repo
git clone https://github.com/YOUR_USERNAME/video-caption-mcp.git
cd video-caption-mcp

# Create virtual environment
python -m venv .venv
source .venv/bin/activate  # or .venv\Scripts\activate on Windows

# Install dependencies
pip install -r requirements.txt

# Make sure FFmpeg is installed
ffmpeg -version  # should work

# Set environment variables
export GROQ_API_KEY="your-key-here"
export BASE_URL="http://localhost:8000"

# Run the server
python src/server.py

Test with the MCP Inspector:

npx @modelcontextprotocol/inspector
# Connect to http://localhost:8000/mcp using "Streamable HTTP" transport

Architecture

video-caption-mcp/
├── src/
│   └── server.py          # FastMCP server + file serving
├── Dockerfile             # Python 3.13 + FFmpeg
├── render.yaml            # Render deployment config
├── requirements.txt       # Python dependencies
└── README.md

The server exposes:

POST /mcp – MCP protocol endpoint (for Poke)
GET /files/{job_id}/{filename} – Serves captioned video downloads
GET /health – Health check
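
The download link returned by `caption_video` presumably combines `BASE_URL` with the `/files/{job_id}/{filename}` route, which is why `BASE_URL` must match the public address. A minimal sketch, assuming a hypothetical helper `build_download_url`:

```python
from urllib.parse import quote


def build_download_url(base_url: str, job_id: str, filename: str) -> str:
    """Join BASE_URL with the file-serving route, percent-encoding each path segment."""
    return f"{base_url.rstrip('/')}/files/{quote(job_id, safe='')}/{quote(filename, safe='')}"
```

For example, `build_download_url("https://your-app-name.onrender.com", "abc123", "captioned.mp4")` yields `https://your-app-name.onrender.com/files/abc123/captioned.mp4`. If `BASE_URL` were left at `http://localhost:8000` on Render, the link would point at the caller's own machine.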

Limitations

Audio size: Groq free tier accepts audio up to 25 MB (roughly 10-15 min of extracted 16kHz mono audio)
Duration: Default max 10 minutes (configurable via MAX_VIDEO_DURATION_SEC)
Render free tier: May spin down after inactivity; first request after sleep takes ~30s
File cleanup: Output files are auto-deleted after 1 hour
Direct URLs only: The video URL must be a direct download link (not YouTube, etc.)
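
The "10-15 min" figure for the 25 MB limit can be checked with a little arithmetic. The sketch below assumes the extracted WAV is uncompressed 16-bit PCM (the bit depth is not stated above) and ignores the small WAV header.

```python
SAMPLE_RATE_HZ = 16_000   # from the extraction step: 16kHz
CHANNELS = 1              # mono
BYTES_PER_SAMPLE = 2      # assumption: 16-bit PCM
UPLOAD_LIMIT_BYTES = 25 * 1024 * 1024


def max_wav_minutes(limit_bytes: int = UPLOAD_LIMIT_BYTES) -> float:
    """Minutes of audio that fit within limit_bytes at the settings above."""
    bytes_per_second = SAMPLE_RATE_HZ * CHANNELS * BYTES_PER_SAMPLE  # 32,000 B/s
    return limit_bytes / bytes_per_second / 60


if __name__ == "__main__":
    print(f"{max_wav_minutes():.1f} minutes")
```

Under these assumptions the limit works out to a little under 14 minutes, which is consistent with the default 10-minute cap on video duration.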

Tips

For YouTube/social media videos, use a service to get a direct download link first
The tiktok style (default) is aimed at vertical/short-form video; modern is a good general-purpose choice
Set the language parameter for non-English videos to improve transcription accuracy
Groq's Whisper is extremely fast – transcription usually takes just seconds

License

MIT
