vibevoice-asr

Local speech-to-text transcription using Microsoft's VibeVoice-ASR model with speaker diarization, enabling audio transcription directly in AI tools like Claude Code, Cursor, and OpenCode.

VibeVoice-ASR Server

Local speech-to-text using Microsoft's VibeVoice-ASR model. Run it as an OpenAI-compatible API server or as an MCP server that plugs directly into Claude Code, OpenCode, Cursor, and other AI tools.

Automatic speaker diarization
Timestamps on every segment
Output as plain text, JSON, SRT, or VTT
Runs on CUDA, Apple Silicon (MPS), or CPU
Model downloads automatically on first run

Requirements

Python 3.10+
FFmpeg (used by the model's audio processor)

Install FFmpeg:

# macOS
brew install ffmpeg

# Ubuntu / Debian
sudo apt-get install ffmpeg

# Windows (with Chocolatey)
choco install ffmpeg

Quick Start

# Clone the repo
git clone https://github.com/tjameswilliams/vibevoice-server.git
cd vibevoice-server

# Create a virtual environment (recommended)
python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate

# Install
pip install -e .

# For NVIDIA GPU acceleration (optional)
pip install -e ".[cuda]"

On the first run of either the API server or the MCP server, the model (~3 GB) is downloaded from Hugging Face and cached locally.

Option 1: OpenAI-Compatible API Server

Start the server:

vibevoice-server

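The server listens on port 8000 by default, so it is reachable at http://localhost:8000. Because the default bind address is 0.0.0.0, it also accepts connections from other machines on the network; pass --host 127.0.0.1 to keep it local. It exposes the same endpoint shape as the OpenAI Audio API, so client libraries and tools that speak that protocol should work without changes.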

CLI Options

vibevoice-server [OPTIONS]

  --host        Bind address (default: 0.0.0.0)
  --port        Bind port (default: 8000)
  --device      Device: auto, cuda, mps, cpu (default: auto)
  --dtype       Data type: auto, bfloat16, float32 (default: auto)
  --log-level   Log level: debug, info, warning, error (default: info)

Transcribe Audio

curl http://localhost:8000/v1/audio/transcriptions \
  -F file=@meeting.wav \
  -F response_format=verbose_json

Parameters:

Parameter	Type	Default	Description
`file`	file	required	Audio file (wav, mp3, flac, m4a, ogg, etc.)
`model`	string	`vibevoice-asr`	Model identifier (accepted but ignored)
`response_format`	string	`json`	`text`, `json`, `verbose_json`, `srt`, `vtt`
`prompt`	string		Optional context to guide transcription
`language`	string		Language code (used in verbose_json output)
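
For scripts that should avoid extra dependencies, the curl request above can be approximated with Python's standard library. This is a minimal sketch, not part of the project: the helper names `encode_multipart` and `transcription_request` are hypothetical, and sending the file part as `application/octet-stream` is an assumption about what the server accepts.

```python
import os
import urllib.request
import uuid


def encode_multipart(fields: dict[str, str], filename: str, audio: bytes) -> tuple[bytes, str]:
    """Encode text fields plus one `file` part as multipart/form-data.

    Returns the request body and the matching Content-Type header value.
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f'{value}\r\n'.encode()
        )
    # The form field must be called "file", matching the parameter table.
    parts.append(
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        'Content-Type: application/octet-stream\r\n\r\n'.encode()
        + audio
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcription_request(url: str, path: str, response_format: str = "json") -> urllib.request.Request:
    """Build (but do not send) a POST request for the transcription endpoint."""
    with open(path, "rb") as f:
        audio = f.read()
    body, content_type = encode_multipart(
        {"response_format": response_format}, os.path.basename(path), audio
    )
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
```

With the server running, `urllib.request.urlopen(transcription_request("http://localhost:8000/v1/audio/transcriptions", "meeting.wav", "verbose_json"))` would send the same request as the curl example.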

Response Formats

json (default):

{"text": "Hello, welcome to the meeting."}

verbose_json — includes timestamps, speaker IDs, and segments:

{
  "task": "transcribe",
  "language": "en",
  "duration": 12.5,
  "text": "Hello, welcome to the meeting.",
  "segments": [
    {"id": 0, "start": 0.0, "end": 3.2, "text": "Hello, welcome to the meeting.", "speaker": 0}
  ]
}

srt and vtt — subtitle formats with speaker labels, ready to use with video players.

text — plain transcript string, no JSON wrapper.
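
Because verbose_json carries a speaker ID on every segment, a client can rebuild speaker turns on its own side. The sketch below merges consecutive segments from the same speaker. It relies only on the `segments`, `speaker`, `start`, `end`, and `text` fields shown in the sample response; the function name `speaker_turns` is hypothetical.

```python
def speaker_turns(verbose: dict) -> list[tuple[int, float, float, str]]:
    """Merge consecutive same-speaker segments into (speaker, start, end, text) turns."""
    turns: list[tuple[int, float, float, str]] = []
    for seg in verbose["segments"]:
        if turns and turns[-1][0] == seg["speaker"]:
            # Same speaker as the previous turn: extend its end time and text.
            speaker, start, _, text = turns[-1]
            turns[-1] = (speaker, start, seg["end"], text + " " + seg["text"])
        else:
            turns.append((seg["speaker"], seg["start"], seg["end"], seg["text"]))
    return turns


def format_turns(turns: list[tuple[int, float, float, str]]) -> str:
    """Render turns as one 'Speaker N [start-end]: text' line each."""
    return "\n".join(
        f"Speaker {speaker} [{start:.1f}-{end:.1f}]: {text}"
        for speaker, start, end, text in turns
    )
```

Passing the parsed verbose_json response to `speaker_turns` and the result to `format_turns` gives a compact, speaker-labelled transcript.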

Other Endpoints

# List models
curl http://localhost:8000/v1/models

# Health check
curl http://localhost:8000/health
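
A script that launches the server and then sends requests may want to wait until the health endpoint answers. The following is a rough sketch using only the standard library. The names `http_ok` and `wait_until_healthy` are hypothetical, and it assumes only that `/health` returns HTTP 200 once the HTTP server is up; whether that also means the model has finished loading is not stated here.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_ok(url: str) -> bool:
    """Return True if a GET to `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False


def wait_until_healthy(
    base_url: str,
    timeout: float = 120.0,
    interval: float = 2.0,
    probe: Callable[[str], bool] = http_ok,
) -> bool:
    """Poll `<base_url>/health` until it succeeds or `timeout` seconds pass."""
    url = base_url.rstrip("/") + "/health"
    deadline = time.monotonic() + timeout
    while True:
        if probe(url):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

The `probe` parameter exists so the polling loop can be exercised without a running server; normal use is simply `wait_until_healthy("http://localhost:8000")`.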

Using with OpenAI Client Libraries

Point any OpenAI SDK at your local server:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

with open("recording.wav", "rb") as f:
    transcript = client.audio.transcriptions.create(
        model="vibevoice-asr",
        file=f,
        response_format="verbose_json",
    )

print(transcript.text)

Docker

# Build
docker build -t vibevoice-server .

# Run (CPU)
docker run -p 8000:8000 -v vibevoice-cache:/models vibevoice-server

# Run (NVIDIA GPU)
docker run --gpus all -p 8000:8000 -v vibevoice-cache:/models vibevoice-server

Option 2: MCP Server

The MCP (Model Context Protocol) server lets AI tools call transcription directly — no HTTP server needed. The model runs in the same process as the MCP server.

MCP Tools

Tool	Description
`transcribe_audio`	Transcribe an audio file. Pass an absolute file path and get back the transcript.
`load_vibevoice_model`	Pre-load the model into memory (~60-90s). Optional — the model loads automatically on first transcription.
`get_vibevoice_status`	Check whether the model is loaded, and which device/dtype it's using.

transcribe_audio parameters:

Parameter	Type	Default	Description
`file_path`	string	required	Absolute path to the audio file
`response_format`	string	`text`	`text`, `json`, `verbose_json`, `srt`, `vtt`
`prompt`	string		Optional context to guide transcription
`language`	string		Language code (for verbose_json output)
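
Since `file_path` must be absolute, a caller that starts from a relative or `~`-prefixed path needs to resolve it first. The sketch below builds an argument dictionary matching the table above; the helper name `build_transcribe_args` is hypothetical and the validation is a client-side convenience, not a description of what the server checks.

```python
from pathlib import Path
from typing import Optional

# Formats listed in the transcribe_audio parameter table.
RESPONSE_FORMATS = {"text", "json", "verbose_json", "srt", "vtt"}


def build_transcribe_args(
    audio_path: str,
    response_format: str = "text",
    prompt: Optional[str] = None,
    language: Optional[str] = None,
) -> dict:
    """Build the arguments for transcribe_audio, resolving the path to absolute."""
    if response_format not in RESPONSE_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    args = {
        "file_path": str(Path(audio_path).expanduser().resolve()),
        "response_format": response_format,
    }
    # Optional parameters are left out entirely when not provided.
    if prompt is not None:
        args["prompt"] = prompt
    if language is not None:
        args["language"] = language
    return args
```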

Claude Code

Add to your project's .mcp.json (or ~/.claude/mcp.json for global access):

{
  "mcpServers": {
    "vibevoice-asr": {
      "command": "vibevoice-mcp",
      "args": []
    }
  }
}

With device override:

{
  "mcpServers": {
    "vibevoice-asr": {
      "command": "vibevoice-mcp",
      "args": ["--device", "mps"]
    }
  }
}

Restart Claude Code after adding the config. The three tools (transcribe_audio, load_vibevoice_model, get_vibevoice_status) will appear automatically.
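
If the config file already lists other servers, the entry has to be merged in rather than pasted over them. One way to do that is sketched below. The function name `add_mcp_server` is hypothetical; it assumes the file uses the `mcpServers` layout shown above and contains valid JSON.

```python
import json
from pathlib import Path
from typing import Sequence


def add_mcp_server(config_path: str, name: str, command: str, args: Sequence[str] = ()) -> dict:
    """Add or replace one entry under "mcpServers", keeping all other entries."""
    path = Path(config_path)
    # Start from the existing config if there is one, else from an empty object.
    config = json.loads(path.read_text()) if path.exists() else {}
    config.setdefault("mcpServers", {})[name] = {
        "command": command,
        "args": list(args),
    }
    path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

For example, `add_mcp_server(".mcp.json", "vibevoice-asr", "vibevoice-mcp", ["--device", "mps"])` writes the device-override variant shown above.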

Cursor

Add to .cursor/mcp.json in your project root:

{
  "mcpServers": {
    "vibevoice-asr": {
      "command": "vibevoice-mcp",
      "args": []
    }
  }
}

OpenCode

Add to your OpenCode MCP configuration (opencode.json or via settings):

{
  "mcpServers": {
    "vibevoice-asr": {
      "command": "vibevoice-mcp",
      "args": []
    }
  }
}

Any MCP-Compatible Tool

The server uses stdio transport — the standard for local MCP servers. Any tool that supports MCP can run it with:

Command: vibevoice-mcp
Args: [] (optional: ["--device", "mps"] or ["--device", "cuda"])
Transport: stdio

The MCP server reads JSON-RPC from stdin and writes responses to stdout. All logs go to stderr.
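
To make the stdin/stdout exchange concrete, here is a sketch of how a client might frame a tool call. In MCP, tools are invoked with the JSON-RPC method `tools/call`, and the stdio transport sends one JSON message per line. The helper name `tools_call` is hypothetical, and a real client must complete the MCP `initialize` handshake before sending this message, which the sketch leaves out.

```python
import json
from itertools import count

# JSON-RPC request IDs only need to be unique per connection.
_request_ids = count(1)


def tools_call(name: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as one newline-terminated JSON line."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    # json.dumps escapes newlines inside strings, so the message stays on one line.
    return json.dumps(message) + "\n"


line = tools_call(
    "transcribe_audio",
    {"file_path": "/absolute/path/to/meeting.wav", "response_format": "srt"},
)
```

A client would write `line` to the stdin of the `vibevoice-mcp` process and read the matching response, identified by the same `id`, from its stdout.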

MCP CLI Options

vibevoice-mcp [OPTIONS]

  --device      Device: auto, cuda, mps, cpu (default: auto)
  --dtype       Data type: auto, bfloat16, float32 (default: auto)
  --log-level   Log level (default: warning)

Configuration

All settings can be controlled via environment variables (prefixed with VIBEVOICE_), CLI flags, or a .env file. See .env.example for the full list.

Variable	Default	Description
`VIBEVOICE_DEVICE`	`auto`	`auto`, `cuda`, `mps`, `cpu`
`VIBEVOICE_DTYPE`	`auto`	`auto`, `bfloat16`, `float32`
`VIBEVOICE_CACHE_DIR`	(HuggingFace default)	Where to store downloaded model weights
`VIBEVOICE_MODEL_ID`	`microsoft/VibeVoice-ASR-HF`	HuggingFace model ID
`VIBEVOICE_HOST`	`0.0.0.0`	API server bind address
`VIBEVOICE_PORT`	`8000`	API server bind port
`VIBEVOICE_LOG_LEVEL`	`info`	Logging level (the MCP server's `--log-level` flag defaults to `warning`)
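
As an illustration of how the variables and defaults in this table fit together, the sketch below reads them from an environment mapping. It is not the project's actual settings code: the `Settings` class and `settings_from_env` function are hypothetical, and it does not model CLI flags or .env loading.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Defaults copied from the configuration table."""
    device: str = "auto"
    dtype: str = "auto"
    cache_dir: Optional[str] = None  # None means "use the Hugging Face default"
    model_id: str = "microsoft/VibeVoice-ASR-HF"
    host: str = "0.0.0.0"
    port: int = 8000
    log_level: str = "info"


def settings_from_env(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from VIBEVOICE_* variables, falling back to the defaults."""
    defaults = Settings()
    return Settings(
        device=env.get("VIBEVOICE_DEVICE", defaults.device),
        dtype=env.get("VIBEVOICE_DTYPE", defaults.dtype),
        cache_dir=env.get("VIBEVOICE_CACHE_DIR"),
        model_id=env.get("VIBEVOICE_MODEL_ID", defaults.model_id),
        host=env.get("VIBEVOICE_HOST", defaults.host),
        port=int(env.get("VIBEVOICE_PORT", defaults.port)),
        log_level=env.get("VIBEVOICE_LOG_LEVEL", defaults.log_level),
    )
```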

With `auto`, the device is chosen in order of preference: CUDA if available, then MPS, then CPU.
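
The preference order can be written as a small function. This is a sketch of the selection logic only, with the hypothetical name `pick_device`; the availability flags are passed in so the function does not depend on PyTorch being installed.

```python
def pick_device(cuda_available: bool, mps_available: bool, requested: str = "auto") -> str:
    """Return the device to use, preferring CUDA, then MPS, then CPU.

    An explicit `requested` value other than "auto" is returned unchanged.
    """
    if requested != "auto":
        return requested
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"
```

With PyTorch installed, the two flags would typically come from `torch.cuda.is_available()` and `torch.backends.mps.is_available()`.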

Hardware Notes

Platform	Device	Dtype	Notes
NVIDIA GPU	`cuda`	`bfloat16`	Fastest. Flash Attention 2 enabled automatically. Install with `.[cuda]`.
Apple Silicon	`mps`	`float32`	Works well on M1/M2/M3/M4.
CPU	`cpu`	`float32`	Slower but works everywhere.

The model is ~3 GB. First load takes 60-90 seconds (downloading + loading weights). Subsequent starts are faster when cached.

License

MIT
