gemini-video-mcp-server

Enables AI coding assistants like Claude Code and Cursor to analyze and understand video content using Google's Gemini AI, supporting long videos, segment analysis, and Q&A.

Gemini Video Understanding MCP Server

License: MIT | Python 3.10+ | MCP | Gemini

Give your AI coding assistant the power to understand videos!

An MCP (Model Context Protocol) server that enables Claude Code, Cursor, and other AI coding assistants to analyze and understand video content using Google's Gemini AI. Process security footage, lecture recordings, tutorials, and more - directly from your terminal.

Why This Exists

AI coding assistants like Claude Code and Cursor are incredibly powerful, but they can't natively understand video content. This MCP server bridges that gap by:

  • Using Gemini 3 Flash, with a 1M-token context window (up to 6 hours of video) and roughly 3x faster processing
  • Providing a standardized MCP interface that works with any MCP-compatible client
  • Offering smart time estimation so you know what you're getting into before processing
  • Supporting segment analysis for efficient processing of long videos

Features

| Feature | Description |
| --- | --- |
| Long Video Support | Analyze videos up to 6 hours using Gemini 3 Flash's 1M token context |
| Smart Estimation | Get accurate time/cost estimates before processing |
| Segment Analysis | Analyze specific time ranges for faster results |
| Multiple Modes | Summary, detailed analysis, transcript, or timeline |
| Q&A Capability | Ask specific questions about video content |
| User Prompts | Confirms before processing long videos |

Use Cases

Security & Surveillance

"Analyze this security footage and tell me if anyone approaches the car between 2am and 4am"
"What time does the person in the dark hoodie appear in this footage?"
"Summarize all activity in this 6-hour security recording"

Education & Learning

"Transcribe this 2-hour lecture on machine learning"
"Create a timeline of topics covered in this computer science class"
"What does the professor say about recursion? Include timestamps"

Code Tutorials & Demos

"What VS Code extensions does the instructor install in this tutorial?"
"At what timestamp does the presenter start explaining the API integration?"
"Summarize the debugging techniques shown in this video"

Meeting Recordings

"What action items were discussed in this meeting?"
"Summarize the key decisions made in this product review"
"Who presented the sales figures and what were the highlights?"

Content Analysis

"What products are shown in this unboxing video?"
"Describe the UI/UX changes demonstrated in this app walkthrough"
"What error messages appear in this bug report screen recording?"

Installation

Prerequisites

  1. Python 3.10+
  2. Google AI Studio API Key (free): https://aistudio.google.com/apikey
  3. ffmpeg (optional, for segment analysis):
    # macOS
    brew install ffmpeg
    
    # Ubuntu/Debian
    sudo apt install ffmpeg
    
    # Windows
    choco install ffmpeg
    

Quick Install

# Clone the repository
git clone https://github.com/yourusername/gemini-video-mcp-server.git
cd gemini-video-mcp-server

# Create virtual environment and install
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
pip install -e .

Using uv (recommended)

git clone https://github.com/yourusername/gemini-video-mcp-server.git
cd gemini-video-mcp-server
uv venv && source .venv/bin/activate
uv pip install -e .

Configuration

For Claude Code

Add to your Claude Code MCP settings (for example a project-level .mcp.json file, or register the server with the claude mcp add command):

{
  "mcpServers": {
    "gemini-video": {
      "command": "python",
      "args": ["/absolute/path/to/gemini-video-mcp-server/server.py"],
      "env": {
        "GEMINI_API_KEY": "your-api-key-here"
      }
    }
  }
}

For Cursor

Add to your Cursor MCP configuration (Settings > MCP Servers):

{
  "mcpServers": {
    "gemini-video": {
      "command": "python",
      "args": ["/absolute/path/to/gemini-video-mcp-server/server.py"],
      "env": {
        "GEMINI_API_KEY": "your-api-key-here"
      }
    }
  }
}

For Claude Desktop

Add to ~/Library/Application Support/Claude/claude_desktop_config.json (macOS) or %APPDATA%\Claude\claude_desktop_config.json (Windows):

{
  "mcpServers": {
    "gemini-video": {
      "command": "python",
      "args": ["/absolute/path/to/gemini-video-mcp-server/server.py"],
      "env": {
        "GEMINI_API_KEY": "your-api-key-here"
      }
    }
  }
}

Using with uv (all platforms)

{
  "mcpServers": {
    "gemini-video": {
      "command": "uv",
      "args": ["run", "--directory", "/absolute/path/to/gemini-video-mcp-server", "python", "server.py"],
      "env": {
        "GEMINI_API_KEY": "your-api-key-here"
      }
    }
  }
}
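
The configuration blocks above differ only in the command and its arguments. As a minimal sketch, the helper below generates the same structure with the repository path resolved to an absolute one. The function name `build_mcp_config` is hypothetical and not part of this repository; only the server name, command, arguments, and `GEMINI_API_KEY` variable come from the examples above.

```python
import json
from pathlib import Path


def build_mcp_config(repo_dir: str, api_key: str, use_uv: bool = False) -> dict:
    """Return an mcpServers config dict shaped like the JSON examples above.

    Hypothetical helper for illustration; not shipped with the server.
    """
    # Resolve to an absolute path, since the examples require one.
    repo = Path(repo_dir).expanduser().resolve()
    if use_uv:
        command = "uv"
        args = ["run", "--directory", str(repo), "python", "server.py"]
    else:
        command = "python"
        args = [str(repo / "server.py")]
    return {
        "mcpServers": {
            "gemini-video": {
                "command": command,
                "args": args,
                "env": {"GEMINI_API_KEY": api_key},
            }
        }
    }


if __name__ == "__main__":
    # Print a block you can paste into your client's MCP configuration.
    print(json.dumps(build_mcp_config(".", "your-api-key-here"), indent=2))
```

Run it from inside the cloned repository so that "." resolves to the right directory, then replace the placeholder key.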

Available Tools

estimate_video_analysis

Always call this first! Get time and resource estimates before processing.

Parameters:
- video_path (required): Path to the video file
- known_duration_seconds (optional): Video duration if known

Returns: File size, duration, upload time, processing time, token estimate, recommendations

analyze_video

Full video analysis with multiple modes.

Parameters:
- video_path (required): Path to the video file
- mode: "summary" | "detailed" | "transcript" | "timeline" (default: "summary")
- custom_prompt (optional): Custom analysis prompt
- confirm_long_video: Set True for videos over 30 minutes

Modes:
- summary: Quick 2-3 paragraph overview
- detailed: Comprehensive scene-by-scene analysis
- transcript: Extract and transcribe speech
- timeline: Timestamped list of events
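
Your MCP client normally builds the tool call for you, but it can help to see what is sent. The sketch below assembles a `tools/call` request in the JSON-RPC shape that MCP uses, with the parameters listed above. The helper name `build_analyze_video_call` is hypothetical, and leaving out optional arguments that are not set is an assumption about what the server accepts.

```python
import json

# Modes documented above for analyze_video.
ALLOWED_MODES = {"summary", "detailed", "transcript", "timeline"}


def build_analyze_video_call(video_path, mode="summary", custom_prompt=None,
                             confirm_long_video=False, request_id=1):
    """Build an MCP tools/call request for analyze_video (illustrative only)."""
    if mode not in ALLOWED_MODES:
        raise ValueError(f"mode must be one of {sorted(ALLOWED_MODES)}")
    arguments = {"video_path": video_path, "mode": mode}
    # Assumption: optional arguments are simply omitted when unused.
    if custom_prompt is not None:
        arguments["custom_prompt"] = custom_prompt
    if confirm_long_video:
        arguments["confirm_long_video"] = True
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "analyze_video", "arguments": arguments},
    }


if __name__ == "__main__":
    request = build_analyze_video_call("/path/to/lecture.mp4", mode="timeline",
                                       confirm_long_video=True)
    print(json.dumps(request, indent=2))
```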

analyze_video_segment

Analyze a specific time range (requires ffmpeg).

Parameters:
- video_path (required): Path to the video file
- start_time (required): Start time ("HH:MM:SS", "MM:SS", or seconds)
- end_time (required): End time (same formats)
- prompt (optional): What to analyze
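
To illustrate the accepted time formats, here is a minimal sketch that converts each of them to seconds and then builds an ffmpeg command for the requested range. Both function names are hypothetical, and the ffmpeg invocation (`-ss`, `-t`, `-c copy`) is one plausible way to cut a segment, not necessarily what server.py does.

```python
def parse_time(value) -> float:
    """Convert "HH:MM:SS", "MM:SS", or plain seconds into seconds."""
    if isinstance(value, (int, float)):
        parts = [value]
    else:
        parts = str(value).strip().split(":")
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"unrecognized time format: {value!r}")
    seconds = 0.0
    for part in parts:
        number = float(part)  # raises ValueError on empty or non-numeric parts
        if number < 0:
            raise ValueError(f"time components must be non-negative: {value!r}")
        seconds = seconds * 60 + number
    return seconds


def build_ffmpeg_segment_command(video_path, start_time, end_time, output_path):
    """Return an ffmpeg argument list that copies [start_time, end_time)."""
    start = parse_time(start_time)
    end = parse_time(end_time)
    if end <= start:
        raise ValueError("end_time must be later than start_time")
    return [
        "ffmpeg",
        "-ss", str(start),        # seek to the segment start
        "-i", video_path,
        "-t", str(end - start),   # segment length in seconds
        "-c", "copy",             # stream copy, no re-encoding
        output_path,
    ]


if __name__ == "__main__":
    print(parse_time("1:30:00"))  # 5400.0
    print(build_ffmpeg_segment_command("in.mp4", "1:30:00", "2:00:00", "out.mp4"))
```

Stream copy is fast but cuts on keyframes, so the extracted segment may start slightly before the requested time.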

ask_video_question

Ask specific questions about video content.

Parameters:
- video_path (required): Path to the video file
- question (required): Your question
- provide_timestamps: Include timestamps in answer (default: true)

list_supported_formats

List all supported video formats and limits.

Example Workflow

You: Analyze this security camera footage: /path/to/footage.mp4

Claude: Let me first estimate the analysis time...

[Calls estimate_video_analysis]

This video is 2 hours and 15 minutes long. Full analysis will take approximately 25 minutes.

Would you like to:
1. Analyze the entire video
2. Analyze specific time ranges (faster)
3. Get a quick summary only

You: Just analyze from 1:30:00 to 2:00:00

Claude: [Calls analyze_video_segment with start_time="1:30:00", end_time="2:00:00"]

Here's what I found in that 30-minute segment...

Processing Time Estimates

| Video Length | Upload Time | Processing | Total |
| --- | --- | --- | --- |
| 5 minutes | ~30s | ~1 min | ~1.5 min |
| 30 minutes | ~2 min | ~3 min | ~5 min |
| 1 hour | ~5 min | ~6 min | ~11 min |
| 3 hours | ~15 min | ~18 min | ~33 min |
| 6 hours | ~30 min | ~36 min | ~66 min |

Actual times vary based on file size and network speed.
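
The table above works out to roughly one twelfth of the video length for upload and one tenth for processing. The sketch below applies those two ratios; they are read off the table, not taken from the server's estimator, which by the note above also depends on file size and network speed. The function name `estimate_minutes` is hypothetical.

```python
# Ratios inferred from the table above (assumption, not the server's formula).
UPLOAD_DIVISOR = 12      # upload time is about 1/12 of the video length
PROCESSING_DIVISOR = 10  # processing time is about 1/10 of the video length


def estimate_minutes(duration_minutes: float) -> dict:
    """Rough upload, processing, and total time in minutes for a video."""
    if duration_minutes <= 0:
        raise ValueError("duration_minutes must be positive")
    upload = duration_minutes / UPLOAD_DIVISOR
    processing = duration_minutes / PROCESSING_DIVISOR
    return {
        "upload_min": round(upload, 1),
        "processing_min": round(processing, 1),
        "total_min": round(upload + processing, 1),
    }


if __name__ == "__main__":
    for minutes in (60, 180, 360):
        print(minutes, estimate_minutes(minutes))
```

This reproduces the rows from 1 hour upward and is only a rough fit for shorter videos, where the table rounds up.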

Supported Formats

  • Video: MP4 (recommended), MPEG, MOV, AVI, FLV, WebM, WMV, 3GP, MPG
  • Max Duration: 6 hours (Gemini 3 Flash)
  • Max File Size: 2GB per file

Troubleshooting

"GEMINI_API_KEY environment variable is not set"

Ensure the API key is in your MCP server configuration's env block.

"ffmpeg not found"

Install ffmpeg for segment analysis. Full video analysis works without it.
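
Both of the messages above can be checked before you start your MCP client. This is a minimal sketch using only the standard library; the function name `preflight` is hypothetical and the server may perform its own checks differently.

```python
import os
import shutil


def preflight(environ=None) -> dict:
    """Report whether the API key is set and ffmpeg is on the PATH."""
    env = os.environ if environ is None else environ
    return {
        # The server reads GEMINI_API_KEY from its environment.
        "api_key_set": bool(env.get("GEMINI_API_KEY", "").strip()),
        # Only segment analysis needs ffmpeg; full analysis works without it.
        "ffmpeg_available": shutil.which("ffmpeg") is not None,
    }


if __name__ == "__main__":
    for name, ok in preflight().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```

Note that a key exported in your shell is not necessarily visible to the server: the client launches it with the env block from the MCP configuration.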

"Video processing failed"

  • Check the video file isn't corrupted
  • Ensure format is supported
  • Try a smaller segment first

Slow processing

  • Use estimate_video_analysis to set expectations
  • Use analyze_video_segment for specific sections
  • Check your internet upload speed

Development

# Install dev dependencies
pip install -e ".[dev]"

# Run tests
GEMINI_API_KEY=your-key python test_server.py

# Run the server directly
GEMINI_API_KEY=your-key python server.py

How It Works

┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│  Claude Code    │     │   MCP Server    │     │   Gemini API    │
│  Cursor/etc     │────▶│  (this repo)    │────▶│  (Google)       │
└─────────────────┘     └─────────────────┘     └─────────────────┘
        │                       │                       │
        │   "Analyze video"     │   Upload & Process    │
        │──────────────────────▶│──────────────────────▶│
        │                       │                       │
        │   Text description    │   Video understanding │
        │◀──────────────────────│◀──────────────────────│

  1. Your AI assistant receives a request about a video
  2. It calls this MCP server with the video path
  3. The server uploads the video to Gemini API
  4. Gemini processes the video (1 frame/second, ~66 tokens/frame)
  5. The analysis is returned as text your assistant can understand
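
The sampling figures in step 4 give a quick way to estimate how many tokens the frames of a video consume. The sketch below uses only those two numbers; it does not count any audio or prompt tokens, and the function name `estimate_video_tokens` is hypothetical.

```python
FRAMES_PER_SECOND = 1   # sampling rate stated in step 4
TOKENS_PER_FRAME = 66   # approximate tokens per sampled frame, from step 4


def estimate_video_tokens(duration_seconds: float) -> int:
    """Approximate token count for the sampled frames of a video."""
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    return round(duration_seconds * FRAMES_PER_SECOND * TOKENS_PER_FRAME)


if __name__ == "__main__":
    print(estimate_video_tokens(3600))  # one hour of video: 237600
```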

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

MIT License - see LICENSE file for details.

Acknowledgments


Made with love to give AI coding assistants superpowers
