mcp-hands-free

Universal hands-free voice input for MCP-compatible AI assistants using Whisper speech-to-text.

MCP Hands-Free - Voice Input MCP Server

CI · PyPI version · Python 3.11+ · License: MIT · Code style: black

"Why type when you can talk?" - Every developer eating pizza while coding

Demo: Voice input in action - speak to your AI assistant hands-free

Why This Exists

Ever tried to:

  • šŸ• Ask your AI assistant a question while eating lunch? (greasy keyboards are NOT fun)
  • šŸƒ Debug code while on the treadmill? (typing + running = broken ankles)
  • 🧘 Practice your "hands-free workday" because your wrists hurt? (RSI is real, folks)
  • šŸ›‹ļø Casually chat with your AI from across the room? (peak laziness achieved)
  • šŸ‘¶ Code with a baby in your arms? (multitasking level: parent - yes, this repo author does this, no shame)

This MCP server lets you talk to any MCP-compatible AI instead of typing. No more keyboard gymnastics. Just speak your mind, and your AI assistant listens.

Perfect for when your hands are busy, tired, dirty, or just... somewhere else.


Universal hands-free voice input for MCP-compatible AI assistants (Claude Code CLI, Gemini, Qwen, etc.) using Whisper speech-to-text.

Screenshots

Browser Interface

Browser Voice Input Interface: Clean, minimal browser interface for voice recording

Terminal Integration

CLI Integration: Seamless integration with your MCP-compatible AI agent

Architecture

MCP Client (Claude, Gemini, Qwen, etc.)
  ↓ calls get_voice_input() MCP tool
MCP Server (stdio)
  ↓ HTTP POST /api/request-voice
FastAPI Server (coordination)
  ↓ stores request_id
Browser Interface
  ↓ polls /api/pending-requests
  ↓ claims the request via POST /api/claim-request/{request_id}
  ↓ auto-starts recording
  ↓ user speaks
  ↓ POST audio to /api/submit-voice/{request_id}
FastAPI Server
  ↓ transcribes with Whisper via Wyoming protocol
MCP Server
  ↓ polls /api/result/{request_id}
  ↓ returns transcript
MCP Client
  └─ receives transcript as user input

Architecture Diagram: Visual representation of the data flow
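On the MCP-server side, this flow reduces to two steps: create a request, then poll until it completes or times out. The Python below is a minimal sketch of that loop, not the project's actual client.py. The transport callables (post_json, get_json) and the poll_interval default are hypothetical. They are injected so the logic can be exercised without a running server.

```python
import time
from typing import Callable, Optional

# Hypothetical transport callables: (path, json_body) -> json_reply and path -> json_reply.
PostJson = Callable[[str, dict], dict]
GetJson = Callable[[str], dict]


def wait_for_transcript(
    post_json: PostJson,
    get_json: GetJson,
    language: str = "fr",
    timeout: float = 60.0,
    poll_interval: float = 0.5,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Create a voice request, then poll its result until completed or timed out.

    Returns the transcript, or None if the deadline passes first.
    Raises RuntimeError if the server reports an error.
    """
    created = post_json("/api/request-voice", {"language": language})
    request_id = created["request_id"]
    deadline = clock() + timeout
    while clock() < deadline:
        result = get_json(f"/api/result/{request_id}")
        if result.get("error"):
            raise RuntimeError(result["error"])
        if result.get("status") == "completed":
            return result.get("transcript")
        sleep(poll_interval)  # still pending or recording; wait and retry
    return None
```

In a real client, post_json and get_json would wrap an HTTP library pointed at VOICE_SERVER_URL.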

Features

  • Hands-Free Input - Speak your requests instead of typing
  • Multi-Language Support - French, English, Spanish, German, Italian
  • Browser-Based Recording - No client software installation needed
  • Whisper STT - High-quality speech recognition via Wyoming protocol
  • Universal MCP Integration - Works with any MCP-compatible AI client
  • Auto-Recording - Browser automatically starts recording when your AI requests voice input

Prerequisites

  • FastAPI Server - Coordination server with Whisper integration
  • Whisper Service - Wyoming-compatible Whisper STT service (port 10300)
  • Browser - Any modern browser with microphone access
  • MCP Client - Any MCP-compatible AI agent CLI (Claude Code, Gemini, Qwen, etc.)

Installation

1. Install MCP Server

Add to your .mcp.json:

{
  "mcpServers": {
    "voice-input": {
      "type": "stdio",
      "command": "uvx",
      "args": [
        "--from",
        "/path/to/mcp-hands-free/mcp-server",
        "claude-voice-mcp"
      ],
      "env": {
        "VOICE_SERVER_URL": "https://your-server:8766"
      }
    }
  }
}
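A small check like the one below can catch typos before you restart your AI agent. It is a hypothetical helper, not part of this project. Its rules simply mirror the example entry above: a stdio server with a command and an https:// VOICE_SERVER_URL.

```python
from typing import List


def check_voice_entry(config: dict, name: str = "voice-input") -> List[str]:
    """Return the problems found in one mcpServers entry; an empty list means OK."""
    entry = config.get("mcpServers", {}).get(name)
    if entry is None:
        return [f"no '{name}' entry under mcpServers"]
    problems: List[str] = []
    if entry.get("type") != "stdio":
        problems.append("type should be 'stdio'")
    if not entry.get("command"):
        problems.append("command is missing")
    # The example entry passes the coordination server's address via env.
    url = entry.get("env", {}).get("VOICE_SERVER_URL", "")
    if not url.startswith("https://"):
        problems.append("VOICE_SERVER_URL should be an https:// URL")
    return problems
```

For example, check_voice_entry(json.load(open(".mcp.json"))) returns an empty list when the entry matches the shape above.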

2. Start FastAPI Server

cd /path/to/mcp-hands-free

# Install Python dependencies
pip3 install -r requirements.txt

# Start the server (with SSL for browser microphone access)
python3 server.py

Server runs on port 8766 (HTTPS).

3. Start Whisper Service

# Using Wyoming-compatible Whisper service
docker run -d \
  -p 10300:10300 \
  rhasspy/wyoming-faster-whisper \
  --model base \
  --language fr

4. Open Browser Interface

Navigate to:

https://your-server:8766/static/voice-input.html

Accept the self-signed certificate warning, then grant the page microphone permission.

Usage

Basic Voice Input

In your AI agent CLI:

You: "Get my next request via voice"

Your AI calls the get_voice_input() tool, the browser automatically starts recording, you speak, and the transcript is returned to the AI.

With Language Parameter

You: "Get my next request via voice in English"

Example Workflow

You: Get my next request via voice
[Browser automatically starts recording]
You: [speaking] "List my vault secrets"
AI: Voice input received: "List my vault secrets"
[AI then processes your request, using other MCP tools if needed]

MCP Tool API

get_voice_input

Request voice input from the user.

Parameters:

  • language (optional): Language code (fr, en, es, de, it) - default: "fr"
  • timeout (optional): Maximum seconds to wait - default: 60

Returns:

  • Success: Voice input received: "transcript text"
  • Timeout: Voice input timed out. User did not provide input within the timeout period.
  • Error: Error getting voice input: error message

Example:

# French (default)
get_voice_input()

# English
get_voice_input(language="en")

# With custom timeout
get_voice_input(timeout=30)
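The three reply shapes listed under Returns map directly onto the /api/result payload. Below is a minimal sketch of that mapping, not the code in mcp-server, which may differ. It assumes that None signals a timeout and that a non-null error field signals a failure.

```python
from typing import Optional

TIMEOUT_MESSAGE = (
    "Voice input timed out. User did not provide input within the timeout period."
)


def format_tool_result(result: Optional[dict]) -> str:
    """Turn a /api/result payload (or None on timeout) into the tool's reply text."""
    if result is None:
        return TIMEOUT_MESSAGE
    if result.get("error"):
        return f"Error getting voice input: {result['error']}"
    transcript = result.get("transcript", "")
    return f'Voice input received: "{transcript}"'
```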

API Endpoints

POST /api/request-voice

Create a new voice input request (called by MCP server).

Request:

{"language": "fr"}

Response:

{"request_id": "abc123", "status": "pending"}

GET /api/pending-requests

Get list of pending voice requests (polled by browser).

Response:

{
  "requests": [
    {"id": "abc123", "language": "fr"}
  ]
}

POST /api/claim-request/{request_id}

Claim a pending request to prevent duplicate processing.

Response:

{"status": "recording"}

POST /api/submit-voice/{request_id}

Submit recorded audio for transcription.

Request:

  • Multipart form with audio file (WAV format, 16kHz, mono)

Response:

{
  "transcript": "user's spoken text",
  "status": "completed"
}
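Before posting audio, you can confirm that it really is 16 kHz mono WAV. The sketch below uses only Python's standard wave module, and the helper names are hypothetical. The 16-bit sample width in make_silence_wav is an assumption, because the endpoint description does not specify one.

```python
import io
import wave


def is_expected_wav(data: bytes) -> bool:
    """True if data is a WAV file recorded at 16 kHz with a single channel."""
    try:
        with wave.open(io.BytesIO(data), "rb") as wav:
            return wav.getframerate() == 16000 and wav.getnchannels() == 1
    except (wave.Error, EOFError):  # not a RIFF/WAVE file, or truncated
        return False


def make_silence_wav(seconds: float = 1.0, rate: int = 16000) -> bytes:
    """Build a silent 16-bit mono WAV in memory, e.g. for testing the endpoint."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes = 16-bit samples
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    return buf.getvalue()
```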

GET /api/result/{request_id}

Get transcription result (polled by MCP server).

Response:

{
  "status": "completed",
  "transcript": "user's spoken text",
  "error": null
}
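The claim step keeps two open browser tabs from recording the same request. For that to work, the "is it still pending?" check and the switch to "recording" must happen atomically. Below is a minimal thread-safe in-memory store that mirrors the endpoints above. It is an illustrative sketch, not the project's server.py. The 8-character hex request IDs and the complete() method name are assumptions.

```python
import threading
import uuid
from typing import Dict, List, Optional


class VoiceRequestStore:
    """In-memory request store; every method holds one lock for its whole update."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._requests: Dict[str, dict] = {}

    def create(self, language: str = "fr") -> str:  # POST /api/request-voice
        request_id = uuid.uuid4().hex[:8]
        with self._lock:
            self._requests[request_id] = {
                "language": language, "status": "pending",
                "transcript": None, "error": None,
            }
        return request_id

    def pending(self) -> List[dict]:  # GET /api/pending-requests
        with self._lock:
            return [{"id": rid, "language": r["language"]}
                    for rid, r in self._requests.items() if r["status"] == "pending"]

    def claim(self, request_id: str) -> bool:  # POST /api/claim-request/{request_id}
        # Check-and-set under a single lock, so only the first claimer wins.
        with self._lock:
            req = self._requests.get(request_id)
            if req is None or req["status"] != "pending":
                return False
            req["status"] = "recording"
            return True

    def complete(self, request_id: str, transcript: str) -> None:  # after Whisper returns
        with self._lock:
            self._requests[request_id].update(status="completed", transcript=transcript)

    def result(self, request_id: str) -> Optional[dict]:  # GET /api/result/{request_id}
        with self._lock:
            req = self._requests.get(request_id)
            if req is None:
                return None
            return {"status": req["status"], "transcript": req["transcript"],
                    "error": req["error"]}
```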

Configuration

Server Configuration

Edit server.py:

WHISPER_HOST = "localhost"
WHISPER_PORT = 10300
PORT = 8766

Whisper Model

Change the Whisper model (the --model flag of the Whisper container) to trade speed for accuracy:

# Faster, less accurate
--model tiny

# Balanced (default)
--model base

# Slower, more accurate
--model medium

SSL Certificates

Generate a self-signed certificate (cert.pem) and an unencrypted private key (key.pem):

openssl req -x509 -newkey rsa:4096 \
  -keyout key.pem -out cert.pem \
  -days 365 -nodes \
  -subj "/CN=localhost"
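The command uses -nodes, so key.pem is unencrypted and the pair can be checked without a password. This sketch uses Python's ssl module to confirm that the two files load together before you start the server. The function name is hypothetical. If server.py runs under uvicorn (an assumption, since its code is not shown here), these are the files you would pass as ssl_certfile and ssl_keyfile.

```python
import ssl


def cert_pair_loads(certfile: str = "cert.pem", keyfile: str = "key.pem") -> bool:
    """True if the certificate and private key load together as a TLS server identity."""
    context = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    try:
        context.load_cert_chain(certfile=certfile, keyfile=keyfile)
    except OSError:  # missing/unreadable file, or cert/key mismatch (ssl.SSLError)
        return False
    return True
```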

Troubleshooting

Browser Can't Access Microphone

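Check HTTPS: Browsers only allow microphone access from a secure context (HTTPS, or http://localhost), so accessing the page from another machine requires HTTPS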

# Verify server is running with SSL
curl -k https://localhost:8766/health

Check Permissions: Grant microphone access in browser settings

MCP Server Not Loaded

Restart your AI agent CLI:

# Exit and restart your AI command

Check the path in .mcp.json:

# Verify path to mcp-server directory is correct
ls /path/to/mcp-hands-free/mcp-server/pyproject.toml

Whisper Service Not Responding

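Check Whisper is listening:

Wyoming is a plain TCP protocol (newline-delimited JSON events), not HTTP, so curl will not get a meaningful reply from port 10300. Check that the port accepts connections instead:

nc -zv localhost 10300

Check Wyoming protocol:

Send the service a Wyoming describe event. A healthy service answers with an info event that describes its speech-to-text model.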
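Wyoming exchanges newline-delimited JSON events over plain TCP. A protocol-level check therefore sends a describe event and reads the reply, which should be an info event. Below is a minimal standard-library sketch. It assumes the event framing used by the rhasspy wyoming library: a JSON header line, optionally followed by data_length bytes of extra JSON and payload_length bytes of binary payload. The function names are hypothetical.

```python
import json
import socket
from typing import BinaryIO


def read_event(stream: BinaryIO) -> dict:
    """Read one Wyoming event: a JSON header line, then optional data and payload bytes."""
    line = stream.readline()
    if not line:
        raise ConnectionError("connection closed before an event arrived")
    header = json.loads(line)
    data = dict(header.get("data") or {})  # older framing puts data inline
    data_length = header.get("data_length") or 0
    if data_length:
        data.update(json.loads(stream.read(data_length)))
    payload_length = header.get("payload_length") or 0
    if payload_length:
        stream.read(payload_length)  # binary payload (e.g. audio) is not needed here
    return {"type": header["type"], "data": data}


def describe(host: str = "localhost", port: int = 10300, timeout: float = 5.0) -> dict:
    """Send a describe event and return the server's reply (expected type: info)."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(json.dumps({"type": "describe"}).encode() + b"\n")
        with sock.makefile("rb") as stream:
            return read_event(stream)
```

For example, print(describe()["type"]) should print info when the service is healthy.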

Voice Input Times Out

Check the browser is open: Ensure voice-input.html is loaded in a tab

Check polling: Open the browser console and verify there are no errors

Check network: Ensure the browser can reach the FastAPI server

Files

mcp-hands-free/
├── mcp-server/                    # MCP server package
│   ├── pyproject.toml
│   └── src/claude_voice_mcp/
│       ├── __init__.py            # Entry point
│       ├── server.py              # Tool definition
│       └── client.py              # HTTP client
├── server.py                      # FastAPI coordination server
├── static/
│   └── voice-input.html           # Browser recording interface
├── requirements.txt               # Python dependencies
├── .gitignore
└── README.md

Security Notes

  • Self-Signed Certificates - Browsers will show a warning; accept it to continue
  • No Authentication - Anyone who can reach the server can use its API; restrict access with a firewall or VPN
  • In-Memory Storage - Voice requests are kept only in memory and are lost when the server restarts
  • Auto-Cleanup - Audio files are deleted after transcription

Advanced Usage

Multiple Language Support

Switch languages dynamically:

# Ask for French input
get_voice_input(language="fr")

# Ask for English input
get_voice_input(language="en")

Custom Timeout

Adjust timeout for longer voice inputs:

# Wait up to 2 minutes
get_voice_input(timeout=120)

Integration with Other MCP Tools

Combine voice input with other MCP servers:

You: Get my next request via voice
[speaks] "What's in my vault?"
AI: [uses voice input tool, then vault MCP tool to query]

Resources

  • Model Context Protocol: https://github.com/anthropics/mcp
  • Whisper: https://github.com/openai/whisper
  • Wyoming Protocol: https://github.com/rhasspy/wyoming
  • Claude Code CLI: https://claude.com/claude-code (tested MCP client)

Compatibility

This MCP server follows the standard Model Context Protocol specification and should work with any MCP-compatible client:

  • ✅ Claude Code CLI - Fully tested and working
  • 🔜 Other MCP Clients - Not yet tested, but expected to work out of the box (Gemini, Qwen, custom implementations)

If you test this with other MCP clients, please open an issue to share your experience!

License

MIT License - Free for personal and commercial use.
