whisper-transcribe-mcp

whisper-transcribe-mcp

MCP server for audio transcription using local faster-whisper or OpenAI Whisper API, enabling multilingual transcription with optional GPT post-processing.

Category
Visit Server

README

whisper-transcribe-mcp

PyPI version License: MIT Python 3.10+

MCP server for audio transcription using faster-whisper (local, free, offline) or OpenAI Whisper API (cloud, requires API key). Works with Claude Desktop and Claude Code on macOS, Windows, and Linux.


Prerequisites

macOS

Option A — uv (recommended):

brew install uv
# or
curl -LsSf https://astral.sh/uv/install.sh | sh

Option B — Python: Python 3.10+ is included in macOS 12.3+. You can also install it with brew install python.


Windows

Option A — uv (recommended):

winget install astral-sh.uv

Or download the installer from astral.sh/uv.

Option B — Python: Download Python 3.10+ from python.org. During installation, check "Add Python to PATH".

No need to install ffmpeg or any compiler — everything is bundled in the package.


Linux

Option A — uv (recommended):

curl -LsSf https://astral.sh/uv/install.sh | sh

Option B — Python:

# Debian/Ubuntu
sudo apt install python3.12 python3.12-venv

# Fedora
sudo dnf install python3.12

# Arch
sudo pacman -S python

No additional system dependencies required.


Installation

Option A — uvx (recommended, no permanent install)

uvx automatically downloads and installs the package in an isolated environment. Only requires uv to be installed.

# Local backend:
uvx "whisper-transcribe-mcp[local]"

# OpenAI backend:
uvx "whisper-transcribe-mcp[openai]"

# Both backends:
uvx "whisper-transcribe-mcp[all]"

Option B — pip

# Local backend:
pip install "whisper-transcribe-mcp[local]"

# OpenAI backend:
pip install "whisper-transcribe-mcp[openai]"

# Both backends:
pip install "whisper-transcribe-mcp[all]"

Use Cases

Case 1 — Local backend only (free, works offline)

Uses faster-whisper to transcribe locally. The model is downloaded from HuggingFace on first use (~74MB for base) and cached.

Install:

pip install "whisper-transcribe-mcp[local]"

Environment variables:

WHISPER_MODEL=base   # or tiny, small, medium, large-v3

Case 2 — OpenAI backend only (best accuracy, requires API key)

Uses OpenAI's whisper-1 model. Requires an API key and internet connection. No local model downloads.

Install:

pip install "whisper-transcribe-mcp[openai]"

Environment variables:

OPENAI_API_KEY=sk-...

Case 3 — Both backends (OpenAI if key present, local as fallback)

If OPENAI_API_KEY is set, OpenAI is used automatically. Otherwise falls back to local faster-whisper.

Install:

pip install "whisper-transcribe-mcp[all]"

Configuration

Claude Desktop

Config file location by operating system:

OS Path
macOS ~/Library/Application Support/Claude/claude_desktop_config.json
Windows %APPDATA%\Claude\claude_desktop_config.json
Linux ~/.config/Claude/claude_desktop_config.json

Add the entry inside "mcpServers":

Windows note: Claude Desktop runs in a restricted environment and may not have uvx in its PATH, and it may use a Python version (e.g. 3.14) for which ctranslate2 (a dependency of faster-whisper) does not yet have prebuilt wheels. Two fixes are required:

  1. Use the full path to uvx.exe instead of just uvx. Run where.exe uvx in PowerShell to find it (usually C:\Users\<YourUser>\.local\bin\uvx.exe).
  2. Force Python 3.12 via the --python 3.12 flag so that a compatible wheel is used.

Case 1 — Local:

macOS / Linux:

{
  "mcpServers": {
    "whisper-transcribe": {
      "command": "uvx",
      "args": ["whisper-transcribe-mcp[local]"],
      "env": {
        "WHISPER_MODEL": "base"
      }
    }
  }
}

Windows:

{
  "mcpServers": {
    "whisper-transcribe": {
      "command": "C:\\Users\\<YourUser>\\.local\\bin\\uvx.exe",
      "args": ["--python", "3.12", "whisper-transcribe-mcp[local]"],
      "env": {
        "WHISPER_MODEL": "base"
      }
    }
  }
}

Case 2 — OpenAI:

macOS / Linux:

{
  "mcpServers": {
    "whisper-transcribe": {
      "command": "uvx",
      "args": ["whisper-transcribe-mcp[openai]"],
      "env": {
        "OPENAI_API_KEY": "sk-..."
      }
    }
  }
}

Windows:

{
  "mcpServers": {
    "whisper-transcribe": {
      "command": "C:\\Users\\<YourUser>\\.local\\bin\\uvx.exe",
      "args": ["--python", "3.12", "whisper-transcribe-mcp[openai]"],
      "env": {
        "OPENAI_API_KEY": "sk-..."
      }
    }
  }
}

Case 3 — Both (OpenAI takes priority if key is set):

macOS / Linux:

{
  "mcpServers": {
    "whisper-transcribe": {
      "command": "uvx",
      "args": ["whisper-transcribe-mcp[all]"],
      "env": {
        "OPENAI_API_KEY": "sk-...",
        "WHISPER_MODEL": "base"
      }
    }
  }
}

Windows:

{
  "mcpServers": {
    "whisper-transcribe": {
      "command": "C:\\Users\\<YourUser>\\.local\\bin\\uvx.exe",
      "args": ["--python", "3.12", "whisper-transcribe-mcp[all]"],
      "env": {
        "OPENAI_API_KEY": "sk-...",
        "WHISPER_MODEL": "base"
      }
    }
  }
}

Restart Claude Desktop after editing the file.


Claude Code

Works the same on macOS, Windows, and Linux. Requires uv installed.

Claude Code config file location:

OS Global Per project
macOS / Linux ~/.claude.json .claude/settings.json (project root)
Windows C:\Users\<user>\.claude.json .claude\settings.json (project root)

The easiest way to add the server is via the Claude Code CLI, which updates the config file automatically:

# Case 1 — Local:
claude mcp add whisper-transcribe uvx -- "whisper-transcribe-mcp[local]"

# Case 2 — OpenAI:
claude mcp add whisper-transcribe uvx --env OPENAI_API_KEY=sk-... -- "whisper-transcribe-mcp[openai]"

# Case 3 — Both (OpenAI with local fallback):
claude mcp add whisper-transcribe uvx --env OPENAI_API_KEY=sk-... --env WHISPER_MODEL=base -- "whisper-transcribe-mcp[all]"

Windows + [all]: Add --python 3.12 before the package name to avoid ctranslate2 wheel issues. Edit ~/.claude.json directly and use "args": ["--python", "3.12", "whisper-transcribe-mcp[all]"].

To add it globally (available in all projects), use --scope user:

claude mcp add --scope user whisper-transcribe uvx -- "whisper-transcribe-mcp[local]"

Or edit ~/.claude.json directly and add inside "mcpServers":

{
  "mcpServers": {
    "whisper-transcribe": {
      "command": "uvx",
      "args": ["whisper-transcribe-mcp[local]"],
      "env": {
        "WHISPER_MODEL": "base"
      }
    }
  }
}

Environment Variables

Variable Default Description
WHISPER_MODEL base Local model size: tiny, base, small, medium, large-v3
OPENAI_API_KEY If set, activates the OpenAI backend instead of local

Backend selection and fallback ([all] only)

When installed with [all], the backend is chosen at startup:

  • OPENAI_API_KEY set → OpenAI is used. If the API call fails at runtime (network error, invalid key, quota exceeded), the server automatically falls back to local faster-whisper and includes a "fallback_reason" field in the response.
  • OPENAI_API_KEY not set → local faster-whisper is used directly, no fallback attempted.

Available Tools

transcribe_file

Transcribes an audio file by path (mp3, wav, m4a, ogg, flac, webm, etc.).

Parameters:

  • file_path (required): Absolute path to the audio file
  • language (optional): Language code (es, en, fr, etc.). Auto-detected if not provided.
  • model_size (optional): Local model size. Ignored with the OpenAI backend.
  • post_process (optional, default false): If true, passes the transcription through GPT-4.1 to fix spelling, grammar, and punctuation. Requires the openai package ([openai] or [all]).
  • post_process_prompt (optional): Custom system prompt for GPT post-processing. Use it to provide domain-specific context, proper nouns, or product names that Whisper may have misspelled. Falls back to a generic correction prompt if not provided.

Response (without post-processing):

{
  "text": "Full transcription...",
  "language": "en",
  "language_probability": 0.99,
  "segments": [
    { "start": 0.0, "end": 4.2, "text": "First segment..." }
  ],
  "backend": "local",
  "model": "base"
}

Response (with post_process: true):

{
  "text": "Corrected transcription...",
  "raw_text": "Original transcription from Whisper...",
  "post_process_model": "gpt-4.1",
  "language": "en",
  "language_probability": 0.99,
  "segments": [...],
  "backend": "local",
  "model": "base"
}

If post-processing fails, text retains the original transcription and a post_process_error field is added.


transcribe_base64

Transcribes audio provided as a base64-encoded string. Useful for programmatic integrations.

Parameters:

  • audio_base64 (required): Base64-encoded audio data
  • extension (optional, default mp3): File extension (mp3, wav, ogg, etc.)
  • language (optional): Language code
  • model_size (optional): Local model size
  • post_process (optional, default false): Same as in transcribe_file.
  • post_process_prompt (optional): Same as in transcribe_file.

list_models

Shows the active backend configuration, available local models, and the GPT model used for post-processing.


Local Model Sizes

Model Size Relative Speed Notes
tiny 39 MB ~32x Fastest, least accurate
base 74 MB ~16x Good balance (default)
small 244 MB ~6x Better accuracy
medium 769 MB ~2x High accuracy
large-v3 1.5 GB ~1x Best accuracy, slowest

Models are downloaded automatically from HuggingFace on first use and cached locally.


Troubleshooting

MCP not loading in Claude Desktop on Windows

Symptom: The server fails to start with a dependency resolution error like:

ctranslate2>=4.6.1 has no wheels with a matching platform tag (e.g., `win32`)
hint: You require CPython 3.14 (`cp314`), but we only found wheels for `ctranslate2` with: `cp39`, `cp310`, `cp311`, `cp312`, `cp313`

Cause: Two issues combined:

  1. Claude Desktop does not include the user's local bin in its PATH, so uvx must be referenced by full path.
  2. Claude Desktop's uvx may pick a Python version (e.g. 3.14) for which ctranslate2 — a native dependency of faster-whisper — does not yet have prebuilt wheels for Windows.

Fix: Use the full path to uvx.exe and force Python 3.12 explicitly:

"whisper-transcribe": {
  "command": "C:\\Users\\<YourUser>\\.local\\bin\\uvx.exe",
  "args": ["--python", "3.12", "whisper-transcribe-mcp[local]"],
  "env": { "WHISPER_MODEL": "base" }
}

To find your exact uvx.exe path, run in PowerShell:

where.exe uvx

Transcribing audio files in Claude Desktop

Symptom: Claude Desktop fails to transcribe an uploaded audio file. It may attempt to read the file as base64 and pass it to transcribe_base64, which then fails or hangs for files larger than ~50 KB.

Cause: Claude Desktop runs in a sandboxed Linux container. When you upload a file using the attachment button, it is stored at a path like /mnt/user-data/uploads/audio.mp3 — inside the container. The MCP server runs on your Windows machine and has no access to that container path. Claude's fallback of base64-encoding the file and passing it to transcribe_base64 fails in practice because even a small audio file produces hundreds of kilobytes of base64 text, which overflows the context window before the tool call can be made.

Fix: Do not use the attachment button to upload audio files. Instead, place the file anywhere on your Windows filesystem and reference its path directly in the message:

"Transcribe the file at C:\Users\YourUser\Downloads\audio.mp3"

The MCP server will read the file directly from Windows and send it to the transcription backend. This works for files of any size within the Whisper API limit (25 MB).


License

MIT — see LICENSE

Recommended Servers

playwright-mcp

playwright-mcp

A Model Context Protocol server that enables LLMs to interact with web pages through structured accessibility snapshots without requiring vision models or screenshots.

Official
Featured
TypeScript
Magic Component Platform (MCP)

Magic Component Platform (MCP)

An AI-powered tool that generates modern UI components from natural language descriptions, integrating with popular IDEs to streamline UI development workflow.

Official
Featured
Local
TypeScript
Audiense Insights MCP Server

Audiense Insights MCP Server

Enables interaction with Audiense Insights accounts via the Model Context Protocol, facilitating the extraction and analysis of marketing insights and audience data including demographics, behavior, and influencer engagement.

Official
Featured
Local
TypeScript
VeyraX MCP

VeyraX MCP

Single MCP tool to connect all your favorite tools: Gmail, Calendar and 40 more.

Official
Featured
Local
graphlit-mcp-server

graphlit-mcp-server

The Model Context Protocol (MCP) Server enables integration between MCP clients and the Graphlit service. Ingest anything from Slack to Gmail to podcast feeds, in addition to web crawling, into a Graphlit project - and then retrieve relevant contents from the MCP client.

Official
Featured
TypeScript
Kagi MCP Server

Kagi MCP Server

An MCP server that integrates Kagi search capabilities with Claude AI, enabling Claude to perform real-time web searches when answering questions that require up-to-date information.

Official
Featured
Python
E2B

E2B

Using MCP to run code via e2b.

Official
Featured
Neon Database

Neon Database

MCP server for interacting with Neon Management API and databases

Official
Featured
Qdrant Server

Qdrant Server

This repository is an example of how to create a MCP server for Qdrant, a vector search engine.

Official
Featured
Exa Search

Exa Search

A Model Context Protocol (MCP) server lets AI assistants like Claude use the Exa AI Search API for web searches. This setup allows AI models to get real-time web information in a safe and controlled way.

Official
Featured