Gemini Transcription MCP
An MCP server for audio-to-text transcription using Google's Gemini API via OpenRouter, offering multiple tools for raw, cleaned, or formatted transcripts with support for local and remote deployment.
An MCP server for audio-to-text transcription using Google's Gemini multimodal API.
Quick Start
Claude Code (Recommended)
claude mcp add gemini-transcription -s user \
-e OPENROUTER_API_KEY=your-key \
-- npx -y gemini-transcription-mcp
Claude Desktop
Add to your claude_desktop_config.json:
{
  "mcpServers": {
    "gemini-transcription": {
      "command": "npx",
      "args": ["-y", "gemini-transcription-mcp"],
      "env": {
        "OPENROUTER_API_KEY": "your-key"
      }
    }
  }
}
MetaMCP
Add via the MetaMCP UI or import JSON:
{
  "mcpServers": {
    "gemini-transcription": {
      "command": "npx",
      "args": ["-y", "gemini-transcription-mcp"],
      "env": {
        "OPENROUTER_API_KEY": "your-key"
      },
      "description": "Audio transcription using Gemini models via OpenRouter"
    }
  }
}
Or fill in the Add Server form manually:
| Field | Value |
|---|---|
| Command | npx |
| Arguments | -y gemini-transcription-mcp |
| Environment Variables | OPENROUTER_API_KEY=your-key |
Remote Deployment (HTTP Transport)
For deployments that require HTTP transport:
# Using Docker (recommended for remote)
docker run -d \
-p 3000:3000 \
-e OPENROUTER_API_KEY=your-key \
ghcr.io/danielrosehill/gemini-transcription-mcp
# Or run directly with HTTP transport
OPENROUTER_API_KEY=your-key npx gemini-transcription-mcp --http 3000
The server exposes:
- http://host:3000/mcp: MCP endpoint (streamable HTTP)
- http://host:3000/health: Health check
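For remote deployments it can help to probe the health endpoint before pointing a client at /mcp. The TypeScript sketch below (the server itself is a Node.js package) assumes that any 2xx response from /health means the server is up; the response body is not documented here, so it is not parsed. The helper names mcpUrls and isHealthy are hypothetical.

```typescript
// Hypothetical helpers for probing a remote gemini-transcription-mcp deployment.
// Assumption: any 2xx response from /health means the server is up; the body is not inspected.

interface McpUrls {
  mcp: string;
  health: string;
}

// Build the two documented endpoint URLs from a base address such as "http://host:3000".
function mcpUrls(base: string): McpUrls {
  const root = new URL(base);
  return {
    mcp: new URL("/mcp", root).toString(),
    health: new URL("/health", root).toString(),
  };
}

// Probe the health endpoint with a timeout, using the global fetch available in Node.js 18+.
async function isHealthy(base: string, timeoutMs = 5000): Promise<boolean> {
  try {
    const res = await fetch(mcpUrls(base).health, {
      signal: AbortSignal.timeout(timeoutMs),
    });
    return res.ok;
  } catch {
    // Network errors and timeouts are both treated as "not healthy".
    return false;
  }
}
```

Usage: isHealthy("http://localhost:3000").then((ok) => console.log(ok ? "up" : "down")).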
Tools
| Tool | Description |
|---|---|
| transcribe_audio | Lightly edited transcript (removes filler words, applies corrections) |
| transcribe_audio_raw | Verbatim transcript with no cleanup |
| transcribe_audio_vad | VAD preprocessing to strip silence before transcription |
| transcribe_audio_format | Transcribe and format as a document type (email, to-do list, etc.) |
| transcribe_audio_large | Compresses oversized files to Opus before transcribing |
| transcribe_audio_custom | Full control with your own prompt |
| transcribe_audio_devspec | Format as a development specification for AI coding agents |
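Each tool is invoked with a standard MCP tools/call request. An MCP client library (such as the official TypeScript SDK) normally builds these messages and performs the initialize handshake for you; the sketch below only illustrates the request shape and the headers a Streamable HTTP client sends. The helper names are hypothetical, and tool-specific parameters beyond the audio input (for example the document type for transcribe_audio_format or the prompt for transcribe_audio_custom) are not named in this README, so they are left to the caller.

```typescript
// Tool names as listed in the table above.
const TOOLS = [
  "transcribe_audio",
  "transcribe_audio_raw",
  "transcribe_audio_vad",
  "transcribe_audio_format",
  "transcribe_audio_large",
  "transcribe_audio_custom",
  "transcribe_audio_devspec",
] as const;

type ToolName = (typeof TOOLS)[number];

interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: ToolName; arguments: Record<string, unknown> };
}

// Build a JSON-RPC 2.0 "tools/call" request as defined by the MCP specification.
function buildToolCall(
  id: number,
  name: string,
  args: Record<string, unknown>,
): ToolCallRequest {
  if (!(TOOLS as readonly string[]).includes(name)) {
    throw new Error(`Unknown tool: ${name}`);
  }
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: name as ToolName, arguments: args },
  };
}

// Headers a Streamable HTTP client sends when POSTing to /mcp.
// The session ID, if any, is returned by the server in response to "initialize".
function mcpHeaders(sessionId?: string): Record<string, string> {
  const headers: Record<string, string> = {
    "Content-Type": "application/json",
    Accept: "application/json, text/event-stream",
  };
  if (sessionId) headers["Mcp-Session-Id"] = sessionId;
  return headers;
}
```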
Input Methods
All tools accept audio via:
- file_content: Base64-encoded audio
- file_url: HTTP(S) URL to fetch
- ssh_host + ssh_path: Pull via SCP (local deployment only)
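The three input methods map to tool arguments roughly as follows. This is a hypothetical sketch: the field names come from the list above, but whether a tool accepts other fields alongside them, or how the server reacts when several are supplied, is not documented here, so each helper sets exactly one input method.

```typescript
// Hypothetical helpers that build the audio-input part of a tool's arguments.
// Field names (file_content, file_url, ssh_host, ssh_path) come from the README.

type AudioInput =
  | { file_content: string }
  | { file_url: string }
  | { ssh_host: string; ssh_path: string };

// Base64-encode raw audio bytes (e.g. from fs.readFileSync) for the file_content field.
function fromBytes(bytes: Uint8Array): AudioInput {
  return { file_content: Buffer.from(bytes).toString("base64") };
}

// Accept only http: or https: URLs for the file_url field.
function fromUrl(url: string): AudioInput {
  const { protocol } = new URL(url);
  if (protocol !== "http:" && protocol !== "https:") {
    throw new Error(`file_url must be HTTP(S), got ${protocol}`);
  }
  return { file_url: url };
}

// SSH retrieval is documented as local-only, so refuse it for HTTP deployments.
function fromSsh(host: string, path: string, transport: "stdio" | "http"): AudioInput {
  if (transport !== "stdio") {
    throw new Error("ssh_host/ssh_path is only available in local (stdio) deployments");
  }
  return { ssh_host: host, ssh_path: path };
}
```

To send a local recording, pass the file's bytes, e.g. fromBytes(readFileSync("memo.mp3")) with readFileSync imported from node:fs.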
Supported Formats
- Native: MP3, WAV, OGG, FLAC, AAC, AIFF
- Auto-converted: Opus, M4A, WebM, WMA, and others (converted to OGG/Opus)
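The split between native and auto-converted formats can be expressed as a small classifier. It mirrors the documented behavior and is not taken from the server's source; treating .aif as AIFF, and treating every extension outside the native list as auto-converted, are assumptions (the README's "and others" leaves the converted list open-ended).

```typescript
// Formats listed as natively supported in the README.
// Assumption: ".aif" is accepted as an alternative extension for AIFF.
const NATIVE = new Set(["mp3", "wav", "ogg", "flac", "aac", "aiff", "aif"]);

type Handling = "native" | "convert-to-ogg-opus";

// Classify a file by its extension (case-insensitive). Anything not on the native
// list is assumed to be converted to OGG/Opus before transcription.
function handlingFor(filename: string): Handling {
  const dot = filename.lastIndexOf(".");
  const ext = dot >= 0 ? filename.slice(dot + 1).toLowerCase() : "";
  return NATIVE.has(ext) ? "native" : "convert-to-ogg-opus";
}
```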
Note: When converting audio manually, prefer MP3 over WAV. MP3 compresses well and is broadly compatible, whereas WAV is typically uncompressed and produces much larger files without improving the transcript.
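For manual conversion, ffmpeg (already a requirement of this server) can produce an MP3 with the libmp3lame encoder. The sketch below builds and runs that command from Node.js; -q:a 2 selects a high-quality variable bitrate, and downmixing to mono with -ac 1 is an optional assumption aimed at speech recordings, not a recommendation from this README. mp3Args and convertToMp3 are hypothetical names.

```typescript
import { spawnSync } from "node:child_process";

// Build ffmpeg arguments for converting a recording to MP3 before upload.
// -q:a 2 is libmp3lame's high-quality VBR setting; -ac 1 (mono) is optional.
function mp3Args(input: string, output: string, mono = true): string[] {
  const args = ["-i", input, "-codec:a", "libmp3lame", "-q:a", "2"];
  if (mono) args.push("-ac", "1");
  args.push(output);
  return args;
}

// Run ffmpeg (must be on PATH) and report whether it exited with status 0.
function convertToMp3(input: string, output: string): boolean {
  const result = spawnSync("ffmpeg", mp3Args(input, output), { stdio: "inherit" });
  return result.status === 0;
}
```

The equivalent shell command is ffmpeg -i memo.wav -codec:a libmp3lame -q:a 2 -ac 1 memo.mp3.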
Configuration
| Environment Variable | Description |
|---|---|
| OPENROUTER_API_KEY | Required. Your OpenRouter API key |
| OPENROUTER_MODEL | Optional. Model to use (default: Gemini Flash Lite) |
| TRANSCRIPT_OUTPUT_DIR | Optional. Auto-save location (default: ./transcripts). Set to an empty string to disable auto-saving. |
| MCP_TRANSPORT | Optional. Set to http for HTTP transport mode |
| MCP_PORT | Optional. Port for HTTP mode (default: 3000) |
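The table above can be read as a small configuration loader. This is a hypothetical sketch, not the server's code: it assumes any MCP_TRANSPORT value other than http means stdio, and it leaves the model undefined when OPENROUTER_MODEL is unset because the exact model identifier behind "Gemini Flash Lite" is not given here.

```typescript
// Documented configuration, as read from environment variables.
interface Config {
  apiKey: string;
  model?: string; // undefined: let the server use its default (Gemini Flash Lite)
  outputDir: string | null; // null: transcript auto-save disabled
  transport: "stdio" | "http";
  port: number;
}

// Hypothetical loader mirroring the Configuration table.
function loadConfig(env: Record<string, string | undefined>): Config {
  const apiKey = env.OPENROUTER_API_KEY;
  if (!apiKey) throw new Error("OPENROUTER_API_KEY is required");

  // Unset uses the default directory; an explicitly empty value disables auto-save.
  const dir = env.TRANSCRIPT_OUTPUT_DIR;
  const outputDir = dir === undefined ? "./transcripts" : dir === "" ? null : dir;

  // Default port is 3000; reject anything that is not a valid TCP port.
  const port = Number(env.MCP_PORT ?? "3000");
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid MCP_PORT: ${env.MCP_PORT}`);
  }

  return {
    apiKey,
    model: env.OPENROUTER_MODEL || undefined,
    outputDir,
    transport: env.MCP_TRANSPORT === "http" ? "http" : "stdio",
    port,
  };
}
```

Calling loadConfig(process.env) at startup makes a missing API key fail fast instead of surfacing on the first transcription request.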
Deployment Options
Local (Claude Code, Claude Desktop)
Uses stdio transport. All features are available, including SSH file retrieval.
# Via npx (recommended)
npx gemini-transcription-mcp
# Or install globally
npm install -g gemini-transcription-mcp
gemini-transcription-mcp
Remote/Docker (MetaMCP, Aggregators)
Uses HTTP transport. Requires a container or server with ffmpeg installed.
Docker Compose:
# docker-compose.yml
services:
  gemini-transcription:
    image: ghcr.io/danielrosehill/gemini-transcription-mcp
    ports:
      - "3000:3000"
    environment:
      - OPENROUTER_API_KEY=${OPENROUTER_API_KEY}
# Create .env file with your API key
echo "OPENROUTER_API_KEY=your-key" > .env
# Start the service
docker compose up -d
Feature Availability by Deployment Type
| Feature | Local (stdio) | Remote (HTTP) |
|---|---|---|
| Base64 audio input | Yes | Yes |
| URL audio input | Yes | Yes |
| SSH file retrieval | Yes | No* |
| Transcript auto-save | Yes | Container volume |
| VAD preprocessing | Yes | Yes |
| Format conversion | Yes | Yes |
* SSH retrieval needs access to the local machine's SSH keys and network, which a remote HTTP deployment does not have.
Requirements
- Node.js 18+
- ffmpeg (for format conversion and VAD preprocessing)
- OpenRouter API key
When using Docker, ffmpeg is included in the image.
Building from Source
git clone https://github.com/danielrosehill/Gemini-Transcription-MCP.git
cd Gemini-Transcription-MCP
npm install
npm run build
# Run locally
OPENROUTER_API_KEY=your-key npm start
# Run with HTTP transport
OPENROUTER_API_KEY=your-key MCP_TRANSPORT=http npm start
# Build Docker image
docker build -t gemini-transcription-mcp .
License
MIT