🎬 Veo 3.1 MCP Server
Token-Efficient AI Video Generation with Google's Veo 3.1
🎯 What is This?
An MCP server for Google's Veo 3.1, the state-of-the-art AI video generation model. Generate stunning videos from text prompts, guide the look with reference images, or interpolate between first and last frames.
Key Features
- ✅ Text-to-Video - Generate videos from descriptions
- ✅ Reference Images - Up to 3 images for style guidance
- ✅ Frame Interpolation - First + last frame → coherent video
- ✅ Video Extension - Extend Veo-generated videos
- ✅ Batch Generation - Generate multiple videos with concurrency control
- ✅ Cost Estimation - Know costs before generating
- ✅ Token-Efficient - Auto-upload refs to Files API (97% token savings!)
🚀 Quick Start
1. Installation
cd veo-mcp
npm install
npm run build
2. Get API Key
- Go to Google AI Studio
- Create API key
- Enable Veo 3.1 in your project (billing required)
3. Configure
cp environment.template .env
# Edit .env and add your key
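After copying, the only value the rest of this README relies on is the API key, so .env should contain something like this (variable name taken from the Cursor config below):
```
GEMINI_API_KEY=your_api_key_here
```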
4. Add to Cursor
Add to ~/.cursor/mcp.json:
{
  "mcpServers": {
    "veo": {
      "command": "node",
      "args": ["C:\\Users\\woute\\Githubs\\MCP\\veo-mcp\\dist\\index.js"],
      "env": {
        "GEMINI_API_KEY": "your_api_key_here"
      }
    }
  }
}
Restart Cursor. Done! ✅
🛠️ Tools
1. start_video_generation - Generate Video
Basic text-to-video:
{
"prompt": "A serene Zen garden at sunrise, cherry blossoms falling, cinematic"
}
With reference images (token-efficient!):
{
"prompt": "A futuristic cityscape at night, neon lights",
"referenceImages": [{
"source": "url",
"url": "https://example.com/style.jpg"
}],
"durationSeconds": 8,
"resolution": "1080p"
}
First/last frame interpolation:
{
"prompt": "Smooth transition between these scenes",
"firstFrame": {
"source": "file_path",
"filePath": "C:\\first.jpg"
},
"lastFrame": {
"source": "file_path",
"filePath": "C:\\last.jpg"
}
}
Parameters:
- model - veo-3.1-generate-001 (quality) or veo-3.1-fast-generate-001 (speed)
- durationSeconds - 4, 6, or 8
- aspectRatio - 16:9 or 9:16
- resolution - 720p or 1080p
- generateAudio - include synchronized audio (2x cost)
- seed - for reproducibility
- sampleCount - generate 1-4 videos
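The call returns immediately with an operation name to pass to get_video_job; the example below shows only that field, since no other response fields are documented here:
```json
{
  "operationName": "operations/xyz"
}
```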
2. get_video_job - Check Status
{
"operationName": "operations/xyz"
}
Returns status and video URLs when complete.
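A completed job follows the shape used in the Async Operation Flow section further down; anything beyond done, videos, and videoUri is illustrative:
```json
{
  "done": true,
  "videos": [{ "videoUri": "https://..." }]
}
```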
3. upload_image - Pre-Upload References
{
"source": "file_path",
"filePath": "C:\\style-ref.jpg"
}
Returns fileUri valid for 48 hours. Reuse across multiple generations!
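The response carries the fileUri you reference later (the placeholder value matches the one used in Best Practices below; other fields are not documented here):
```json
{
  "fileUri": "files/xyz123"
}
```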
4. extend_video - Extend Videos
{
"videoFileUri": "files/abc123",
"additionalSeconds": 7,
"prompt": "Continue with the character walking into the sunset"
}
5. start_batch_video_generation - Batch Generate
{
"jobs": [
{"key": "scene1", "request": {"prompt": "..."}},
{"key": "scene2", "request": {"prompt": "..."}}
],
"concurrency": 3
}
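Under the hood, batch generation is just the single-video call pushed through a small worker pool. A minimal sketch of that concurrency control, where startVideoGeneration is a hypothetical wrapper around the start_video_generation tool:
```typescript
// Minimal worker-pool sketch; startVideoGeneration is a hypothetical wrapper
// around the single start_video_generation call and returns an operationName.
type BatchJob = { key: string; request: { prompt: string } };

async function runBatch(
  jobs: BatchJob[],
  concurrency: number,
  startVideoGeneration: (req: BatchJob["request"]) => Promise<string>,
): Promise<Record<string, string>> {
  const results: Record<string, string> = {};
  let next = 0;

  // Each worker claims the next unprocessed job until the queue is drained.
  async function worker(): Promise<void> {
    while (next < jobs.length) {
      const job = jobs[next++];
      results[job.key] = await startVideoGeneration(job.request);
    }
  }

  await Promise.all(Array.from({ length: concurrency }, () => worker()));
  return results;
}
```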
6. estimate_veo_cost - Cost Estimation
{
"model": "veo-3.1-fast-generate-001",
"durationSeconds": 8,
"sampleCount": 1,
"generateAudio": false
}
Returns estimated cost in USD.
💰 Pricing
| Model | Video Only | Video + Audio |
|---|---|---|
| veo-3.1-generate-001 (quality) | $0.20/sec | $0.40/sec |
| veo-3.1-fast-generate-001 (speed) | $0.10/sec | $0.15/sec |
Example Costs:
- 8s video (fast, no audio): $0.80
- 8s video (quality, with audio): $3.20
- 4s video (fast, no audio): $0.40
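The arithmetic behind these numbers (and behind estimate_veo_cost) is simply per-second rate × duration × sample count. A sketch using the rates from the table above:
```typescript
// Per-second prices from the pricing table above.
const RATES_PER_SECOND = {
  "veo-3.1-generate-001": { videoOnly: 0.2, withAudio: 0.4 },
  "veo-3.1-fast-generate-001": { videoOnly: 0.1, withAudio: 0.15 },
} as const;

function estimateCostUsd(
  model: keyof typeof RATES_PER_SECOND,
  durationSeconds: number,
  sampleCount = 1,
  generateAudio = false,
): number {
  const rate = RATES_PER_SECOND[model];
  return (generateAudio ? rate.withAudio : rate.videoOnly) * durationSeconds * sampleCount;
}

// 8s fast video, no audio: 0.10 * 8 * 1 = $0.80
console.log(estimateCostUsd("veo-3.1-fast-generate-001", 8));
```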
📊 Limits & Constraints
| Parameter | Limit |
|---|---|
| Duration | 4, 6, or 8 seconds |
| Reference images | 0-3 images |
| Sample count | 1-4 videos |
| Resolutions | 720p, 1080p |
| Aspect ratios | 16:9, 9:16 |
| Rate limit | ~50 requests/min |
💡 Usage Examples
Simple Text-to-Video
Generate an 8-second video of a peaceful forest scene with morning mist
With Style Reference
Create a video of a tech startup office, using this image for style: C:\ref.jpg
Frame Interpolation
Generate a smooth transition between first.jpg and last.jpg, 8 seconds, cinematic camera movement
Batch Generation
Generate 5 video variations of a product showcase from different angles
🔍 How Token Efficiency Works
❌ Naive Approach (Base64)
{
"referenceImages": [{
"base64": "iVBORw0KGgo..." // 500KB → ~50,000 tokens!
}]
}
Cost: Massive token usage per call
✅ Token-Efficient (This MCP)
{
"referenceImages": [{
"source": "url",
"url": "https://example.com/ref.jpg" // ~20 tokens
}]
}
What Happens:
- Server downloads image (no tokens)
- Computes SHA-256 hash
- Checks cache (48h validity)
- Uploads to Files API if needed (~1s)
- Uses the short files/abc123 URI (~5 tokens)
Savings: 97%+ fewer tokens! 🎉
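In code, the flow is roughly hash → cache lookup → upload. A minimal sketch, where uploadToFilesApi stands in for the real Files API call (not shown here):
```typescript
import { createHash } from "node:crypto";

const CACHE_TTL_MS = 48 * 60 * 60 * 1000; // uploaded URIs are reused for 48 hours
const cache = new Map<string, { fileUri: string; uploadedAt: number }>();

// uploadToFilesApi is a hypothetical wrapper around the actual Files API upload.
async function resolveReferenceImage(
  imageBytes: Buffer,
  uploadToFilesApi: (bytes: Buffer) => Promise<string>,
): Promise<string> {
  // 1. Hash the bytes so identical references always hit the same cache entry.
  const sha256 = createHash("sha256").update(imageBytes).digest("hex");

  // 2. Reuse a previously uploaded URI if it is still inside the 48h window.
  const cached = cache.get(sha256);
  if (cached && Date.now() - cached.uploadedAt < CACHE_TTL_MS) {
    return cached.fileUri; // e.g. "files/abc123", only a handful of tokens
  }

  // 3. Otherwise upload once and remember the short URI for next time.
  const fileUri = await uploadToFilesApi(imageBytes);
  cache.set(sha256, { fileUri, uploadedAt: Date.now() });
  return fileUri;
}
```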
⏱️ Generation Times
| Configuration | Typical Time |
|---|---|
| 4s, 720p, no audio | 30-60 sec |
| 8s, 1080p, no audio | 60-120 sec |
| 8s, 1080p, with audio | 90-150 sec |
| With references | +10-30 sec |
| Frame interpolation | +20-40 sec |
Note: Times vary based on prompt complexity and server load.
🎨 Best Practices
1. Start Small, Scale Up
Step 1: Generate 1 video at 720p
Step 2: If good, regenerate at 1080p
Step 3: Use batch for variations
2. Use Fast Model for Testing
{
"model": "veo-3.1-fast-generate-001", // Testing
"resolution": "720p"
}
Switch to quality model for final:
{
"model": "veo-3.1-generate-001", // Final
"resolution": "1080p"
}
3. Pre-Upload Frequently Used References
// Step 1: Upload once
upload_image {"source": "file_path", "filePath": "brand-style.jpg"}
// Returns: files/xyz123
// Step 2: Reuse many times
{
"referenceImages": [{"source": "file_uri", "fileUri": "files/xyz123"}]
}
4. Leverage Batch for Variations
{
"jobs": [
{"key": "v1", "request": {"prompt": "Scene 1...", "seed": 1}},
{"key": "v2", "request": {"prompt": "Scene 1...", "seed": 2}},
{"key": "v3", "request": {"prompt": "Scene 1...", "seed": 3}}
]
}
5. Monitor Costs
Always estimate before large batches:
estimate_veo_cost {
"model": "veo-3.1-fast-generate-001",
"durationSeconds": 8,
"sampleCount": 10
}
// Returns: $8.00 estimate
🎬 Async Operation Flow
Veo uses async long-running operations:
1. start_video_generation
↓ Returns operationName immediately
2. get_video_job (poll every 10-30s)
↓ Returns {done: false, status: "RUNNING"}
3. get_video_job (after 60-120s)
↓ Returns {done: true, videos: [{videoUri: "..."}]}
4. Download video from videoUri
Tip: Don't poll too frequently; keep intervals at 10 seconds or more.
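A polling loop that follows this flow and respects the interval from the tip above, with getVideoJob as a hypothetical wrapper around the get_video_job tool:
```typescript
// getVideoJob is a hypothetical wrapper around the get_video_job tool;
// the done/videos shape mirrors the flow above.
type JobResult = { done: boolean; videos?: { videoUri: string }[] };

async function waitForVideo(
  operationName: string,
  getVideoJob: (op: string) => Promise<JobResult>,
  pollIntervalMs = 15_000, // stays above the 10-second minimum suggested above
): Promise<string[]> {
  for (;;) {
    const job = await getVideoJob(operationName);
    if (job.done) {
      return (job.videos ?? []).map((v) => v.videoUri);
    }
    await new Promise((resolve) => setTimeout(resolve, pollIntervalMs));
  }
}
```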
🆘 Troubleshooting
"API not enabled" (403)
- Go to Google Cloud Console
- Enable "Generative Language API"
- Enable billing
- Wait 5-10 minutes for propagation
"Rate limit exceeded"
- Veo allows ~50 requests/min
- Use the batch tool with concurrency: 3
- Add delays between requests
"Invalid aspect ratio with references"
- 9:16 may not work with reference images
- Use 16:9 for reference mode
- Check Veo 3.1 docs for updates
"Video extension failed"
- Only Veo-generated videos can be extended
- Cannot extend arbitrary MP4s
- Input must be from previous Veo job
Long generation times
- 1080p takes longer than 720p
- Audio generation adds time
- Reference images add processing
- Frame interpolation is slowest
📚 Resources
🎯 Status: Production Ready ✅
- ✅ All 6 tools implemented
- ✅ Token-efficient file handling
- ✅ Async operation support
- ✅ Batch generation with concurrency control
- ✅ Cost estimation
- ✅ Comprehensive validation
- ✅ Error handling
- ✅ Full documentation
Ready to generate amazing videos! 🚀
Built with 🎬 for AI video generation