mcp-hands-free
Universal hands-free voice input for MCP-compatible AI assistants using Whisper speech-to-text.
MCP Hands-Free - Voice Input MCP Server
"Why type when you can talk?" - Every developer eating pizza while coding
Voice input in action - speak to your AI assistant hands-free
Why This Exists
Ever tried to:
- Ask your AI assistant a question while eating lunch? (greasy keyboards are NOT fun)
- Debug code while on the treadmill? (typing + running = broken ankles)
- Practice your "hands-free workday" because your wrists hurt? (RSI is real, folks)
- Casually chat with your AI from across the room? (peak laziness achieved)
- Code with a baby in your arms? (multitasking level: parent - yes, this repo author does this, no shame)
This MCP server lets you talk to any MCP-compatible AI instead of typing. No more keyboard gymnastics. Just speak your mind, and your AI assistant listens.
Perfect for when your hands are busy, tired, dirty, or just... somewhere else.
Universal hands-free voice input for MCP-compatible AI assistants (Claude Code CLI, Gemini, Qwen, etc.) using Whisper speech-to-text.
Screenshots
Browser Interface
Clean, minimal browser interface for voice recording
Terminal Integration
Seamless integration with your MCP-compatible AI agent
Architecture
MCP Client (Claude, Gemini, Qwen, etc.)
  ↓ calls get_voice_input() MCP tool
MCP Server (stdio)
  ↓ HTTP POST /api/request-voice
FastAPI Server (coordination)
  ↓ stores request_id
Browser Interface
  ↓ polls /api/pending-requests
  ↓ auto-starts recording
  ↓ user speaks
  ↓ POST audio to /api/submit-voice/{request_id}
FastAPI Server
  ↓ transcribes with Whisper via Wyoming protocol
MCP Server
  ↓ polls /api/result/{request_id}
  ↓ returns transcript
MCP Client
  ↓ receives transcript as user input
Visual representation of the data flow
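The flow above boils down to a per-request state machine held by the coordination server: a request starts as pending, becomes recording once a browser claims it, and ends as completed. The sketch below models that lifecycle in memory with the standard library only. It is an illustration under assumptions, not the code in server.py: the class and method names are hypothetical (named after the endpoints), the "error" status is assumed, and Whisper is replaced by a caller-supplied transcribe function.

```python
import uuid
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class VoiceRequest:
    id: str
    language: str
    status: str = "pending"  # pending -> recording -> completed ("error" is assumed)
    transcript: Optional[str] = None
    error: Optional[str] = None


class VoiceRequestStore:
    """Hypothetical in-memory model of the coordination server's request table."""

    def __init__(self) -> None:
        self._requests: Dict[str, VoiceRequest] = {}

    def request_voice(self, language: str = "fr") -> dict:
        # POST /api/request-voice, called by the MCP server
        req = VoiceRequest(id=uuid.uuid4().hex[:8], language=language)
        self._requests[req.id] = req
        return {"request_id": req.id, "status": req.status}

    def pending_requests(self) -> dict:
        # GET /api/pending-requests, polled by the browser
        return {"requests": [{"id": r.id, "language": r.language}
                             for r in self._requests.values() if r.status == "pending"]}

    def claim_request(self, request_id: str) -> dict:
        # POST /api/claim-request/{request_id}: only a pending request can be
        # claimed, so two open browser tabs cannot record the same request
        req = self._requests[request_id]
        if req.status != "pending":
            raise ValueError(f"request {request_id} is already {req.status}")
        req.status = "recording"
        return {"status": req.status}

    def submit_voice(self, request_id: str, audio: bytes,
                     transcribe: Callable[[bytes, str], str]) -> dict:
        # POST /api/submit-voice/{request_id}; transcribe() stands in for Whisper
        req = self._requests[request_id]
        try:
            req.transcript = transcribe(audio, req.language)
            req.status = "completed"
        except Exception as exc:  # keep the failure so the MCP server can report it
            req.status, req.error = "error", str(exc)
        return {"transcript": req.transcript, "status": req.status}

    def result(self, request_id: str) -> dict:
        # GET /api/result/{request_id}, polled by the MCP server
        req = self._requests[request_id]
        return {"status": req.status, "transcript": req.transcript, "error": req.error}


if __name__ == "__main__":
    store = VoiceRequestStore()
    rid = store.request_voice("en")["request_id"]   # MCP server asks for input
    print(store.pending_requests())                 # browser sees the request
    store.claim_request(rid)                        # browser starts recording
    store.submit_voice(rid, b"<wav bytes>", lambda audio, lang: "List my vault secrets")
    print(store.result(rid))  # {'status': 'completed', 'transcript': 'List my vault secrets', 'error': None}
```

The claim step is what makes the browser side safe to open in more than one tab: the second claim fails instead of starting a duplicate recording.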
Features
- Hands-Free Input - Speak your requests instead of typing
- Multi-Language Support - French, English, Spanish, German, Italian
- Browser-Based Recording - No client software installation needed
- Whisper STT - High-quality speech recognition via Wyoming protocol
- Universal MCP Integration - Works with any MCP-compatible AI client
- Auto-Recording - Browser automatically starts recording when your AI requests voice input
Prerequisites
- FastAPI Server - Coordination server with Whisper integration
- Whisper Service - Wyoming-compatible Whisper STT service (port 10300)
- Browser - Any modern browser with microphone access
- MCP Client - Any MCP-compatible AI agent CLI (Claude Code, Gemini, Qwen, etc.)
Installation
1. Install MCP Server
Add to your .mcp.json:
{
"mcpServers": {
"voice-input": {
"type": "stdio",
"command": "uvx",
"args": [
"--from",
"/path/to/mcp-hands-free/mcp-server",
"claude-voice-mcp"
],
"env": {
"VOICE_SERVER_URL": "https://your-server:8766"
}
}
}
}
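If you would rather not edit JSON by hand, the same entry can be merged into an existing .mcp.json with a few lines of Python. A sketch: add_voice_server is a hypothetical helper, the directory and URL you pass are the same placeholders as in the snippet above, and any other servers already in the file are kept.

```python
import json
from pathlib import Path


def add_voice_server(config_path: Path, package_dir: str, server_url: str) -> dict:
    """Merge the voice-input entry into an .mcp.json file, keeping other servers."""
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    config.setdefault("mcpServers", {})["voice-input"] = {
        "type": "stdio",
        "command": "uvx",
        "args": ["--from", package_dir, "claude-voice-mcp"],
        "env": {"VOICE_SERVER_URL": server_url},
    }
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config


# Example (placeholders, adjust to your setup):
# add_voice_server(Path(".mcp.json"), "/path/to/mcp-hands-free/mcp-server",
#                  "https://your-server:8766")
```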
2. Start FastAPI Server
cd /path/to/mcp-hands-free
# Install Python dependencies
pip3 install -r requirements.txt
# Start the server (with SSL for browser microphone access)
python3 server.py
Server runs on port 8766 (HTTPS).
3. Start Whisper Service
# Using Wyoming-compatible Whisper service
docker run -d \
-p 10300:10300 \
rhasspy/wyoming-faster-whisper \
--model base \
--language fr
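The README does not show how server.py talks to this container, but a Wyoming client transcribes audio by sending a sequence of events over plain TCP: transcribe, audio-start, one or more audio-chunk events carrying raw PCM samples, then audio-stop, after which the service replies with a transcript event whose data holds the recognized text. Each event is one JSON header line, optionally followed by binary payload bytes. The sketch below only builds those bytes (no networking), assuming 16 kHz, 16-bit mono PCM with the WAV header already stripped; the function names are hypothetical, and the wyoming Python package offers ready-made event classes if you prefer not to frame events yourself.

```python
import json
from typing import Iterator, Optional

RATE, WIDTH, CHANNELS = 16000, 2, 1  # assumed: 16 kHz, 16-bit (2 bytes), mono


def encode_event(event_type: str, data: Optional[dict] = None,
                 payload: bytes = b"") -> bytes:
    """Frame one Wyoming event: a JSON header line, then any binary payload."""
    header: dict = {"type": event_type}
    if data:
        header["data"] = data
    if payload:
        header["payload_length"] = len(payload)
    return json.dumps(header).encode("utf-8") + b"\n" + payload


def transcription_events(pcm: bytes, language: str = "fr",
                         chunk_bytes: int = 4096) -> Iterator[bytes]:
    """Yield the framed events a client sends to request one transcript."""
    fmt = {"rate": RATE, "width": WIDTH, "channels": CHANNELS}
    yield encode_event("transcribe", {"language": language})
    yield encode_event("audio-start", fmt)
    for start in range(0, len(pcm), chunk_bytes):
        yield encode_event("audio-chunk", fmt, pcm[start:start + chunk_bytes])
    yield encode_event("audio-stop")
```

Writing these bytes to a socket connected to port 10300 and then reading events until one has type transcript completes the exchange.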
4. Open Browser Interface
Navigate to:
https://your-server:8766/static/voice-input.html
Accept the browser's warning about the self-signed SSL certificate, then grant microphone permission.
Usage
Basic Voice Input
In your AI agent CLI:
You: "Get my next request via voice"
Your AI calls the get_voice_input() tool, the browser automatically starts recording, you speak, and the transcript is returned to the AI.
With Language Parameter
You: "Get my next request via voice in English"
Example Workflow
You: Get my next request via voice
[Browser automatically starts recording]
You: [speaking] "List my vault secrets"
AI: Voice input received: "List my vault secrets"
[AI then processes your request, using other MCP tools if needed]
MCP Tool API
get_voice_input
Request voice input from the user.
Parameters:
- language (optional): Language code (fr, en, es, de, it) - default: "fr"
- timeout (optional): Maximum seconds to wait - default: 60
Returns:
- Success: Voice input received: "transcript text"
- Timeout: Voice input timed out. User did not provide input within the timeout period.
- Error: Error getting voice input: error message
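These three outcomes correspond to what the MCP server observes while it polls GET /api/result/{request_id}. A minimal sketch of that loop, assuming fetch_result is any callable returning the result JSON as a dict (for example an HTTP GET against the FastAPI server). The function name, the one-second poll interval, and the injectable sleep and clock parameters are assumptions for illustration, not the actual client.py.

```python
import time
from typing import Callable


def wait_for_transcript(fetch_result: Callable[[], dict],
                        timeout: float = 60,
                        poll_interval: float = 1.0,
                        sleep: Callable[[float], None] = time.sleep,
                        clock: Callable[[], float] = time.monotonic) -> str:
    """Poll a result source until it completes, reports an error, or times out."""
    deadline = clock() + timeout
    try:
        while clock() < deadline:
            result = fetch_result()  # e.g. JSON from GET /api/result/{request_id}
            if result.get("status") == "completed":
                return f'Voice input received: "{result["transcript"]}"'
            if result.get("error"):
                return f"Error getting voice input: {result['error']}"
            sleep(poll_interval)  # still pending or recording
    except Exception as exc:  # network failure, malformed JSON, ...
        return f"Error getting voice input: {exc}"
    return ("Voice input timed out. User did not provide input "
            "within the timeout period.")
```

Injecting sleep and clock keeps the loop testable without real waiting; timeout plays the same role as the tool's timeout parameter.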
Example:
# French (default)
get_voice_input()
# English
get_voice_input(language="en")
# With custom timeout
get_voice_input(timeout=30)
API Endpoints
POST /api/request-voice
Create a new voice input request (called by MCP server).
Request:
{"language": "fr"}
Response:
{"request_id": "abc123", "status": "pending"}
GET /api/pending-requests
Get list of pending voice requests (polled by browser).
Response:
{
"requests": [
{"id": "abc123", "language": "fr"}
]
}
POST /api/claim-request/{request_id}
Claim a pending request to prevent duplicate processing.
Response:
{"status": "recording"}
POST /api/submit-voice/{request_id}
Submit recorded audio for transcription.
Request:
- Multipart form with audio file (WAV format, 16kHz, mono)
Response:
{
"transcript": "user's spoken text",
"status": "completed"
}
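To exercise this endpoint without a browser, audio in the expected format (WAV, 16 kHz, mono) can be produced with Python's standard wave module. The sketch below generates a short sine tone; the 16-bit sample width and the helper name make_test_wav are assumptions, since the README only specifies the sample rate and channel count.

```python
import io
import math
import struct
import wave


def make_test_wav(seconds: float = 1.0, rate: int = 16000, freq: float = 440.0) -> bytes:
    """Return a 16-bit mono WAV file (as bytes) containing a sine tone."""
    n_frames = int(seconds * rate)
    samples = (int(0.3 * 32767 * math.sin(2 * math.pi * freq * i / rate))
               for i in range(n_frames))
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)    # mono
        wav.setsampwidth(2)    # 16-bit samples
        wav.setframerate(rate)
        wav.writeframes(b"".join(struct.pack("<h", s) for s in samples))  # little-endian
    return buf.getvalue()
```

A pure tone will not yield a meaningful transcript, so use a real speech recording for an end-to-end test. The multipart field name that server.py expects is not documented here; check it before posting the bytes.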
GET /api/result/{request_id}
Get transcription result (polled by MCP server).
Response:
{
"status": "completed",
"transcript": "user's spoken text",
"error": null
}
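On the MCP side, the two calls made against this API (create a request, then read its result) need nothing beyond the standard library. A sketch, assuming JSON bodies exactly as shown above; the function names are hypothetical rather than the ones in client.py, and verify_tls=False exists only because the server uses a self-signed certificate.

```python
import json
import ssl
import urllib.request
from typing import Optional


def _call(url: str, body: Optional[dict] = None, verify_tls: bool = True) -> dict:
    """POST body as JSON (GET when body is None) and return the decoded JSON reply."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    req = urllib.request.Request(
        url, data=data, method="GET" if body is None else "POST",
        headers={"Content-Type": "application/json"})
    context = None
    if not verify_tls:
        # Accept the self-signed certificate: no hostname or chain verification
        context = ssl.create_default_context()
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(req, timeout=10, context=context) as resp:
        return json.loads(resp.read().decode("utf-8"))


def request_voice(base_url: str, language: str = "fr", verify_tls: bool = True) -> str:
    """POST /api/request-voice and return the new request_id."""
    reply = _call(f"{base_url}/api/request-voice", {"language": language}, verify_tls)
    return reply["request_id"]


def get_result(base_url: str, request_id: str, verify_tls: bool = True) -> dict:
    """GET /api/result/{request_id}."""
    return _call(f"{base_url}/api/result/{request_id}", verify_tls=verify_tls)
```

Disabling verification is only reasonable on a trusted network, which matches the firewall or VPN advice under Security Notes.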
Configuration
Server Configuration
Edit server.py:
WHISPER_HOST = "localhost"
WHISPER_PORT = 10300
PORT = 8766
Whisper Model
Change the Whisper model to trade speed against accuracy:
# Faster, less accurate
--model tiny
# Balanced (default)
--model base
# Slower, more accurate
--model medium
SSL Certificates
Generate self-signed certificates:
openssl req -x509 -newkey rsa:4096 \
-keyout key.pem -out cert.pem \
-days 365 -nodes \
-subj "/CN=localhost"
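Before starting server.py, Python's ssl module can confirm that the two files load as a TLS server identity: load_cert_chain raises an error if either file is missing or unreadable, or if the key does not match the certificate. A sketch with a hypothetical helper name, assuming the file names produced by the command above.

```python
import ssl
from typing import Optional


def check_cert_pair(certfile: str = "cert.pem", keyfile: str = "key.pem") -> Optional[str]:
    """Return None if the pair loads as a server identity, else the error text."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    try:
        ctx.load_cert_chain(certfile=certfile, keyfile=keyfile)
    except OSError as exc:  # covers FileNotFoundError and ssl.SSLError (a subclass)
        return f"{type(exc).__name__}: {exc}"
    return None
```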
Troubleshooting
Browser Can't Access Microphone
Check HTTPS: Browsers only expose the microphone to secure contexts, which means HTTPS (pages served from localhost are the exception)
# Verify server is running with SSL
curl -k https://localhost:8766/health
Check Permissions: Grant microphone access in browser settings
MCP Server Not Loaded
Restart your AI agent CLI:
# Exit and restart your AI command
Check .mcp.json path:
# Verify path to mcp-server directory is correct
ls /path/to/mcp-hands-free/mcp-server/pyproject.toml
Whisper Service Not Responding
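Check Whisper is running: Wyoming is a plain TCP protocol, not HTTP, so curl is not a reliable test. Check that the container is up and the port accepts connections instead:
docker ps
nc -z localhost 10300 && echo "Whisper port is open"
Check Wyoming protocol: a working service answers a {"type": "describe"} event, sent as a single JSON line over TCP, with an "info" event listing its models. curl cannot send this event, so use a Wyoming client or a small socket script.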
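Wyoming is a line-oriented TCP protocol rather than HTTP, so querying the service means writing a JSON event line to a raw socket. The sketch below sends a describe event and returns the data of the info event that comes back, assuming the header framing from the Wyoming protocol README (data either inline or announced by data_length, binary payload announced by payload_length). The function name is hypothetical; in recent Wyoming versions the returned dict lists speech-to-text programs under an "asr" key.

```python
import json
import socket


def describe_wyoming(host: str = "localhost", port: int = 10300,
                     timeout: float = 5.0) -> dict:
    """Send a Wyoming 'describe' event and return the data of the 'info' reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b'{"type": "describe"}\n')
        with sock.makefile("rb") as stream:
            while True:
                line = stream.readline()
                if not line:
                    raise ConnectionError("connection closed before an info event")
                header = json.loads(line)
                data = header.get("data") or {}
                if header.get("data_length"):  # data sent as separate bytes
                    data.update(json.loads(stream.read(header["data_length"])))
                if header.get("payload_length"):  # skip any binary payload
                    stream.read(header["payload_length"])
                if header["type"] == "info":
                    return data
```

A ConnectionRefusedError here means nothing is listening on the port; a timeout suggests something is listening but does not speak Wyoming.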
Voice Input Times Out
Check browser is open: Ensure voice-input.html is loaded
Check polling: Open browser console, verify no errors
Check network: Ensure browser can reach FastAPI server
Files
mcp-hands-free/
├── mcp-server/                # MCP server package
│   ├── pyproject.toml
│   └── src/claude_voice_mcp/
│       ├── __init__.py        # Entry point
│       ├── server.py          # Tool definition
│       └── client.py          # HTTP client
├── server.py                  # FastAPI coordination server
├── static/
│   └── voice-input.html       # Browser recording interface
├── requirements.txt           # Python dependencies
├── .gitignore
└── README.md
Security Notes
- Self-Signed Certificates - Browsers will warn, click "Accept"
- No Authentication - Use firewall or VPN to restrict access
- In-Memory Storage - Voice requests stored temporarily in memory
- Auto-Cleanup - Audio files deleted after transcription
Advanced Usage
Multiple Language Support
Switch languages dynamically:
# Ask for French input
get_voice_input(language="fr")
# Ask for English input
get_voice_input(language="en")
Custom Timeout
Adjust timeout for longer voice inputs:
# Wait up to 2 minutes
get_voice_input(timeout=120)
Integration with Other MCP Tools
Combine voice input with other MCP servers:
You: Get my next request via voice
[speaks] "What's in my vault?"
AI: [uses voice input tool, then vault MCP tool to query]
Resources
- Model Context Protocol: https://github.com/anthropics/mcp
- Whisper: https://github.com/openai/whisper
- Wyoming Protocol: https://github.com/rhasspy/wyoming
- Claude Code CLI: https://claude.com/claude-code (tested MCP client)
Compatibility
This MCP server follows the standard Model Context Protocol specification and should work with any MCP-compatible client:
- Claude Code CLI - Fully tested and working
- Other MCP Clients - Should work out of the box (Gemini, Qwen, custom implementations)
If you test this with other MCP clients, please open an issue to share your experience!
License
MIT License - Free for personal and commercial use.