Voice Recorder MCP Server

Enables recording audio from a microphone and transcribing it using OpenAI's Whisper model. Works as both a standalone MCP server and a Goose AI agent extension.

DefiBax

An MCP server for recording audio and transcribing it using OpenAI's Whisper model. Designed to work as a Goose custom extension or standalone MCP server.

Features

  • Record audio from the default microphone
  • Transcribe recordings using Whisper
  • Integrates with Goose AI agent as a custom extension
  • Includes prompts for common recording scenarios

Installation

# Install from source
git clone https://github.com/DefiBax/voice-recorder-mcp.git
cd voice-recorder-mcp
pip install -e .

Usage

As a Standalone MCP Server

# Run with default settings (base.en model)
voice-recorder-mcp

# Use a specific Whisper model
voice-recorder-mcp --model medium.en

# Adjust sample rate
voice-recorder-mcp --sample-rate 44100

Testing with MCP Inspector

The MCP Inspector provides an interactive interface to test your server:

# Install the MCP Inspector
npm install -g @modelcontextprotocol/inspector

# Run your server with the inspector
npx @modelcontextprotocol/inspector voice-recorder-mcp

With Goose AI Agent

  1. Open Goose and go to Settings > Extensions > Add > Command Line Extension

  2. Set the name to voice-recorder

  3. In the Command field, enter the full path to the voice-recorder-mcp executable:

    /full/path/to/voice-recorder-mcp
    

    Or for a specific model:

    /full/path/to/voice-recorder-mcp --model medium.en
    

    To find the path, run:

    which voice-recorder-mcp
    
  4. No environment variables are needed for basic functionality

  5. Start a conversation with Goose and introduce the recorder with: "I want you to take action from transcriptions returned by voice-recorder. For example, if I dictate a calculation like 1+1, please return the result."
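The path lookup in step 3 can also be done from Python. This is a minimal sketch, assuming the package is installed and on the current PATH; the helper name build_goose_command is hypothetical and not part of the project.

```python
import shutil


def build_goose_command(executable="voice-recorder-mcp", model=None, which=shutil.which):
    """Return the command line to paste into Goose's Command field.

    `which` is injectable so the lookup can be exercised without the
    package installed; by default it searches PATH like the `which` command.
    """
    path = which(executable)
    if path is None:
        raise FileNotFoundError(f"{executable} was not found on PATH")
    command = path
    if model is not None:
        # --model is the flag shown in the standalone usage examples.
        command += f" --model {model}"
    return command


if __name__ == "__main__":
    try:
        print(build_goose_command(model="medium.en"))
    except FileNotFoundError as exc:
        print(exc)
```

If the lookup fails, the package is probably installed in a virtual environment that is not active in the current shell.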

Available Tools

  • start_recording: Start recording audio from the default microphone
  • stop_and_transcribe: Stop recording and transcribe the audio to text
  • record_and_transcribe: Record audio for a specified duration and transcribe it
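
MCP clients invoke tools such as these with JSON-RPC 2.0 requests whose method is tools/call. The sketch below builds such request bodies using only the standard library. The argument name duration for record_and_transcribe is an assumption, since the parameter names are not documented here; the schema reported by the server's tools/list response is the authority.

```python
import json
from itertools import count

# Each JSON-RPC request needs a unique id within the session.
_request_ids = count(1)


def tool_call(name, arguments=None):
    """Build a JSON-RPC 2.0 request asking an MCP server to run a tool."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments or {}},
    }


# Manual flow: start, speak, then stop and get the transcript.
start = tool_call("start_recording")
stop = tool_call("stop_and_transcribe")

# Fixed-length flow; "duration" is a hypothetical argument name.
timed = tool_call("record_and_transcribe", {"duration": 10})

for message in (start, stop, timed):
    print(json.dumps(message))
```

In practice a client library or the MCP Inspector sends these messages for you; the sketch only shows their shape.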

Whisper Models

This extension supports various Whisper model sizes:

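| Model     | Speed   | Accuracy | Memory Usage | Use Case                      |
|-----------|---------|----------|--------------|-------------------------------|
| tiny.en   | Fastest | Lowest   | Minimal      | Testing, quick transcriptions |
| base.en   | Fast    | Good     | Low          | Everyday use (default)        |
| small.en  | Medium  | Better   | Moderate     | Good balance                  |
| medium.en | Slow    | High     | High         | Important recordings          |
| large     | Slowest | Highest  | Very High    | Critical transcriptions       |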

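The .en suffix marks English-only models, which tend to be more accurate on English audio than the multilingual model of the same size, most noticeably at the tiny and base sizes. The large model is multilingual only, so it has no .en variant.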
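
The naming rule can be captured in a small helper. This is an illustrative sketch only: whisper_model_name is a hypothetical function, not part of this project, and it relies on the fact that Whisper publishes .en variants for every size except large.

```python
SIZES = ("tiny", "base", "small", "medium", "large")


def whisper_model_name(size, english_only=True):
    """Map a size to the model name to pass to --model or WHISPER_MODEL."""
    if size not in SIZES:
        raise ValueError(f"unknown size {size!r}; expected one of {SIZES}")
    # "large" is published as a multilingual model only.
    if english_only and size != "large":
        return f"{size}.en"
    return size


if __name__ == "__main__":
    for size in SIZES:
        print(size, "->", whisper_model_name(size))
```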

Requirements

  • Python 3.12+
  • An audio input device (microphone)

Configuration

You can configure the server using environment variables:

# Set Whisper model
export WHISPER_MODEL=small.en

# Set audio sample rate
export SAMPLE_RATE=44100

# Set maximum recording duration (seconds)
export MAX_DURATION=120

# Then run the server
voice-recorder-mcp
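
A server process might read these variables roughly as follows. This is a sketch under stated assumptions, not the project's actual code: RecorderConfig and load_config are hypothetical names, the model and sample-rate defaults are the ones this README mentions (base.en and 16000), and the MAX_DURATION fallback of 60 is a placeholder because its real default is not documented here.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class RecorderConfig:
    whisper_model: str
    sample_rate: int   # Hz
    max_duration: int  # seconds


def load_config(env=None):
    """Read the three documented variables, falling back to defaults."""
    env = os.environ if env is None else env
    return RecorderConfig(
        whisper_model=env.get("WHISPER_MODEL", "base.en"),
        sample_rate=int(env.get("SAMPLE_RATE", "16000")),
        # Placeholder default; the real value is an assumption.
        max_duration=int(env.get("MAX_DURATION", "60")),
    )


if __name__ == "__main__":
    print(load_config())
```

Passing a plain dictionary instead of os.environ makes it easy to check a configuration before exporting it in the shell.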

Troubleshooting

Common Issues

  • No audio being recorded: Check your microphone permissions and settings
  • Model download errors: Ensure you have a stable internet connection for the initial model download
  • Integration with Goose: Make sure the command path is correct
  • Audio quality issues: Try adjusting the sample rate (default: 16000 Hz), for example with --sample-rate 44100

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add some amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.
