Gemini Transcription MCP

An MCP server for audio-to-text transcription using Google's Gemini API via OpenRouter, offering multiple tools for raw, cleaned, or formatted transcripts with support for local and remote deployment.

An MCP server for audio-to-text transcription using Google's Gemini multimodal models, accessed through the OpenRouter API.

Quick Start

Claude Code (Recommended)

claude mcp add gemini-transcription -s user \
  -e OPENROUTER_API_KEY=your-key \
  -- npx -y gemini-transcription-mcp

Claude Desktop

Add to your claude_desktop_config.json:

{
  "mcpServers": {
    "gemini-transcription": {
      "command": "npx",
      "args": ["-y", "gemini-transcription-mcp"],
      "env": {
        "OPENROUTER_API_KEY": "your-key"
      }
    }
  }
}

MetaMCP

Add via the MetaMCP UI or import JSON:

{
  "mcpServers": {
    "gemini-transcription": {
      "command": "npx",
      "args": ["-y", "gemini-transcription-mcp"],
      "env": {
        "OPENROUTER_API_KEY": "your-key"
      },
      "description": "Audio transcription using Gemini models via OpenRouter"
    }
  }
}

Or fill in the Add Server form manually:

  • Command: npx
  • Arguments: -y gemini-transcription-mcp
  • Environment Variables: OPENROUTER_API_KEY=your-key

Remote Deployment (HTTP Transport)

For deployments that require HTTP transport:

# Using Docker (recommended for remote)
docker run -d \
  -p 3000:3000 \
  -e OPENROUTER_API_KEY=your-key \
  ghcr.io/danielrosehill/gemini-transcription-mcp

# Or run directly with HTTP transport
OPENROUTER_API_KEY=your-key npx gemini-transcription-mcp --http 3000

The server exposes:

  • http://host:3000/mcp - MCP endpoint (streamable HTTP)
  • http://host:3000/health - Health check
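For clients that speak the streamable HTTP transport directly, the exchange is JSON-RPC 2.0 sent by POST to /mcp. The Python sketch below only builds request bodies and parses replies; it is not taken from this project. The protocol version string, the client identity, and all helper names are assumptions, and a real client would normally use an MCP SDK, which also tracks the Mcp-Session-Id header the server may return after initialize.

```python
import itertools
import json

# Assumption: the MCP protocol revision to request; an SDK may negotiate another.
PROTOCOL_VERSION = "2025-03-26"

# Streamable HTTP clients POST JSON and accept either JSON or an SSE stream back.
HEADERS = {
    "Content-Type": "application/json",
    "Accept": "application/json, text/event-stream",
}

_next_id = itertools.count(1)


def rpc_request(method, params=None):
    """Build a JSON-RPC 2.0 request with a fresh numeric id."""
    message = {"jsonrpc": "2.0", "id": next(_next_id), "method": method}
    if params is not None:
        message["params"] = params
    return message


def initialize_request(name="example-client", version="0.1.0"):
    # Hypothetical client identity; any name/version pair works.
    return rpc_request("initialize", {
        "protocolVersion": PROTOCOL_VERSION,
        "capabilities": {},
        "clientInfo": {"name": name, "version": version},
    })


def initialized_notification():
    # Notifications carry no id and expect no response.
    return {"jsonrpc": "2.0", "method": "notifications/initialized"}


def tool_call_request(tool, arguments):
    return rpc_request("tools/call", {"name": tool, "arguments": arguments})


def parse_response(content_type, body):
    """Return the JSON-RPC messages in a JSON or text/event-stream body."""
    if content_type.startswith("text/event-stream"):
        # Simplification: one complete JSON message per "data:" line.
        return [json.loads(line[len("data:"):].strip())
                for line in body.splitlines() if line.startswith("data:")]
    return [json.loads(body)]
```

A session would POST initialize_request() with HEADERS, echo any Mcp-Session-Id response header on later requests, send initialized_notification(), and then POST a request such as tool_call_request("transcribe_audio_raw", {"file_url": "https://example.com/memo.mp3"}).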

Tools

  • transcribe_audio: Lightly edited transcript (removes filler words, applies corrections)
  • transcribe_audio_raw: Verbatim transcript with no cleanup
  • transcribe_audio_vad: Runs voice activity detection (VAD) to strip silence, then transcribes
  • transcribe_audio_format: Transcribes and formats the result as a document type (email, to-do list, etc.)
  • transcribe_audio_large: Compresses oversized files to Opus before transcribing
  • transcribe_audio_custom: Transcribes with your own prompt, for full control
  • transcribe_audio_devspec: Formats the transcript as a development specification for AI coding agents
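The README does not say which voice activity detector transcribe_audio_vad uses. As a rough illustration of the idea only, an energy-threshold VAD splits audio into short frames and drops frames whose loudness falls below a threshold. The function names, the 160-sample frame (10 ms at an assumed 16 kHz sample rate), and the threshold value are illustrative assumptions; production detectors such as WebRTC VAD or Silero VAD are considerably more robust.

```python
import math


def frame_rms(frame):
    """Root-mean-square energy of one frame of float samples in [-1.0, 1.0]."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def strip_silence(samples, frame_len=160, threshold=0.02):
    """Energy-threshold VAD sketch: keep only frames louder than threshold.

    frame_len=160 is 10 ms at an assumed 16 kHz sample rate.
    """
    kept = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        if frame_rms(frame) > threshold:
            kept.extend(frame)
    return kept
```

With silence = [0.0] * 320 and speech = [0.5, -0.5] * 80, strip_silence(silence + speech + silence) returns just the speech samples. Real detectors also pad each speech segment so word onsets are not clipped.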

Input Methods

All tools accept audio via:

  • file_content: Base64-encoded audio
  • file_url: HTTP(S) URL to fetch
  • ssh_host + ssh_path: Pull via SCP (local deployment only)
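A client has to choose one of these input methods when calling a tool. The Python sketch below builds the arguments object for such a call. The argument names file_content, file_url, ssh_host, and ssh_path come from the list above; the helper name, its file_path parameter, and the one-method-per-call and HTTP(S)-only checks are assumptions made for illustration. It also refuses SSH input for HTTP deployments, since SSH retrieval is documented as local-only.

```python
import base64
from pathlib import Path


def build_arguments(*, file_path=None, file_url=None, ssh_host=None,
                    ssh_path=None, transport="stdio"):
    """Build tool arguments using exactly one of the documented input methods."""
    use_ssh = ssh_host is not None or ssh_path is not None
    if sum([file_path is not None, file_url is not None, use_ssh]) != 1:
        raise ValueError("provide exactly one of file_path, file_url, or ssh_host+ssh_path")
    if file_path is not None:
        # Read a local file and send it inline as Base64 text.
        data = Path(file_path).read_bytes()
        return {"file_content": base64.b64encode(data).decode("ascii")}
    if file_url is not None:
        if not file_url.startswith(("http://", "https://")):
            raise ValueError("file_url must be an HTTP(S) URL")
        return {"file_url": file_url}
    if ssh_host is None or ssh_path is None:
        raise ValueError("ssh_host and ssh_path must be given together")
    if transport != "stdio":
        raise ValueError("SSH retrieval is only available with local (stdio) deployment")
    return {"ssh_host": ssh_host, "ssh_path": ssh_path}
```

For example, build_arguments(file_url="https://example.com/memo.mp3") yields {"file_url": "https://example.com/memo.mp3"}, ready to pass as the arguments of any tool above.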

Supported Formats

  • Native: MP3, WAV, OGG, FLAC, AAC, AIFF
  • Auto-converted: Opus, M4A, WebM, WMA, and others (converted to OGG/Opus)

Note: When converting audio manually, prefer MP3 over WAV. MP3 compresses well and is broadly compatible, while uncompressed WAV files are much larger for the same recording.
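The format rules above can be checked before uploading. The Python sketch below flags files that, going by the list above, the server would auto-convert, and builds an ffmpeg command for converting to MP3 by hand. The helper names and the VBR quality level (-q:a 2) are illustrative choices, not project defaults; -codec:a libmp3lame and -q:a are standard ffmpeg options for the LAME MP3 encoder. The extension check is a heuristic and does not inspect file contents.

```python
from pathlib import Path

# Natively supported formats per the README (.aif is the short AIFF extension).
NATIVE_EXTENSIONS = {".mp3", ".wav", ".ogg", ".flac", ".aac", ".aiff", ".aif"}


def needs_conversion(path):
    """True if the server would auto-convert this file (judged by extension only)."""
    return Path(path).suffix.lower() not in NATIVE_EXTENSIONS


def mp3_command(src, dst=None, quality=2):
    """ffmpeg argv converting src to variable-bitrate MP3 with the LAME encoder."""
    if dst is None:
        dst = str(Path(src).with_suffix(".mp3"))
    return ["ffmpeg", "-i", str(src),
            "-codec:a", "libmp3lame", "-q:a", str(quality), dst]
```

To run the conversion, pass the list to subprocess.run(mp3_command("memo.m4a"), check=True). Building an argument list rather than a shell string avoids quoting problems with spaces in file names.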

Configuration

  • OPENROUTER_API_KEY: Required. Your OpenRouter API key.
  • OPENROUTER_MODEL: Optional. Model to use (default: Gemini Flash Lite).
  • TRANSCRIPT_OUTPUT_DIR: Optional. Auto-save location (default: ./transcripts). Set to an empty string to disable auto-save.
  • MCP_TRANSPORT: Optional. Set to http for HTTP transport mode.
  • MCP_PORT: Optional. Port for HTTP mode (default: 3000).
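The settings above can be summarized as a small loader. This Python sketch mirrors the documented defaults and is not the server's own code (which ships as an npm package); the Config class and load_config name are hypothetical, and model=None stands for "use the server's default model".

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class Config:
    api_key: str
    model: Optional[str]       # None: fall back to the server's default model
    output_dir: Optional[str]  # None: transcript auto-save is disabled
    transport: str             # "stdio" or "http"
    port: int


def load_config(env: Mapping[str, str]) -> Config:
    api_key = env.get("OPENROUTER_API_KEY")
    if not api_key:
        raise RuntimeError("OPENROUTER_API_KEY is required")
    output_dir = env.get("TRANSCRIPT_OUTPUT_DIR", "./transcripts")
    return Config(
        api_key=api_key,
        model=env.get("OPENROUTER_MODEL"),
        output_dir=output_dir or None,  # empty string disables auto-save
        transport="http" if env.get("MCP_TRANSPORT") == "http" else "stdio",
        port=int(env.get("MCP_PORT", "3000")),
    )
```

Calling load_config(os.environ) applies the same rules to the real environment; passing a plain dict makes the defaults easy to check in isolation.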

Deployment Options

Local (Claude Code, Claude Desktop)

Uses stdio transport. All features are available, including SSH file retrieval.

# Via npx (recommended)
npx gemini-transcription-mcp

# Or install globally
npm install -g gemini-transcription-mcp
gemini-transcription-mcp

Remote/Docker (MetaMCP, Aggregators)

Uses HTTP transport. Requires a container or server with ffmpeg installed.

Docker Compose:

# docker-compose.yml
services:
  gemini-transcription:
    image: ghcr.io/danielrosehill/gemini-transcription-mcp
    ports:
      - "3000:3000"
    environment:
      - OPENROUTER_API_KEY=${OPENROUTER_API_KEY}

# Create a .env file with your API key
echo "OPENROUTER_API_KEY=your-key" > .env

# Start the service
docker compose up -d
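After docker compose up -d, the container may take a few seconds to start. A minimal Python sketch for waiting on the /health endpoint follows. It assumes the endpoint answers HTTP 200 once the server is ready, which the README does not state explicitly; the function names are hypothetical, and the fetch parameter exists only so the polling logic can be exercised without a running container.

```python
import time
import urllib.request


def _http_status(url):
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.status


def wait_for_health(url="http://localhost:3000/health", timeout=30.0,
                    interval=1.0, fetch=_http_status):
    """Poll url until it returns HTTP 200; return False once timeout seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            if fetch(url) == 200:
                return True
        except OSError:
            # Connection refused, timeouts, and HTTP error statuses (HTTPError
            # is an OSError subclass) all mean "not ready yet".
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A deployment script might call wait_for_health() and abort with an error if it returns False, before registering the endpoint with an aggregator such as MetaMCP.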

Feature Availability by Deployment Type

Feature                 Local (stdio)   Remote (HTTP)
Base64 audio input      Yes             Yes
URL audio input         Yes             Yes
SSH file retrieval      Yes             No*
Transcript auto-save    Yes             Container volume
VAD preprocessing       Yes             Yes
Format conversion       Yes             Yes

* SSH retrieval requires local access to your SSH keys and network.

Requirements

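Running the server via npx or npm requires Node.js. The server also needs ffmpeg installed on the host; when using Docker, ffmpeg is included in the image.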

Building from Source

git clone https://github.com/danielrosehill/Gemini-Transcription-MCP.git
cd Gemini-Transcription-MCP
npm install
npm run build

# Run locally
OPENROUTER_API_KEY=your-key npm start

# Run with HTTP transport
OPENROUTER_API_KEY=your-key MCP_TRANSPORT=http npm start

# Build Docker image
docker build -t gemini-transcription-mcp .

License

MIT
