audio-transcription-mcp
Captures and transcribes system audio in real-time using OpenAI Whisper, enabling meeting transcription, content creation, and accessibility through natural language.
Audio Transcription MCP Server
Real-time audio transcription using OpenAI Whisper. Capture and transcribe system audio (meetings, videos, music) automatically with AI assistance through Cursor or Claude Desktop.
Features
- Real-time transcription - Captures and transcribes audio as it plays
- Zero installation - Use with `npx`, no global install needed
- AI-powered - Uses OpenAI's Whisper API for accurate transcription
- Timestamped transcripts - Every entry is timestamped in markdown format
- Session isolation - Each session gets its own unique transcript file
- Smart silence detection - Automatically pauses when no audio is detected
- Automated setup - One command sets up audio routing
- Built-in testing - Verify your setup before starting
Quick Start (5 Minutes)
Step 1: Run Automated Setup
The setup script installs everything you need and guides you through configuration:
npx audio-transcription-mcp setup
What this does:
- Installs Homebrew (if needed)
- Installs ffmpeg for audio processing
- Installs BlackHole virtual audio driver
- Guides you through creating a Multi-Output Device (or does it automatically!)
- Takes 5 minutes, mostly automated
First time? The script will walk you through everything with clear instructions. Don't worry if it asks for your Mac password - that's normal for installing software!
Step 2: Test Your Setup
Verify everything works before using it:
npx audio-transcription-mcp test
This captures 5 seconds of audio and shows you if it's working correctly.
Step 3: Configure Your AI Assistant
Add to your Cursor or Claude Desktop config:
<details> <summary><b>Cursor Configuration</b> (click to expand)</summary>
Edit ~/.cursor/mcp.json:
{
  "mcpServers": {
    "audio-transcription": {
      "command": "npx",
      "args": ["-y", "audio-transcription-mcp"],
      "env": {
        "OPENAI_API_KEY": "sk-your-key-here",
        "INPUT_DEVICE_NAME": "BlackHole"
      }
    }
  }
}
Then restart Cursor and ask:
"Start transcribing audio"
</details>
<details> <summary><b>Claude Desktop Configuration</b> (click to expand)</summary>
Edit ~/Library/Application Support/Claude/claude_desktop_config.json:
{
  "mcpServers": {
    "audio-transcription": {
      "command": "npx",
      "args": ["-y", "audio-transcription-mcp"],
      "env": {
        "OPENAI_API_KEY": "sk-your-key-here",
        "INPUT_DEVICE_NAME": "BlackHole",
        "OUTFILE_DIR": "/Users/yourname/Documents/Transcripts"
      },
      "allowedDirectories": [
        "/Users/yourname/Documents/Transcripts"
      ]
    }
  }
}
Important:
- Create the directory: `mkdir -p ~/Documents/Transcripts`
- Replace `yourname` with your actual username
- Restart Claude Desktop
Then ask:
"Start transcribing audio"
</details>
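If you would rather generate the Claude Desktop entry than hand-edit JSON, a small script along these lines could build it. This is only a sketch and not part of the package: the helper name `build_server_entry` is hypothetical, and it simply mirrors the keys shown in the Claude Desktop example, expanding `~` so the two directory values always match.

```python
import json
from pathlib import Path


def build_server_entry(api_key: str, transcript_dir: Path) -> dict:
    """Return an mcpServers entry mirroring the Claude Desktop example (hypothetical helper)."""
    directory = str(transcript_dir.expanduser())
    return {
        "audio-transcription": {
            "command": "npx",
            "args": ["-y", "audio-transcription-mcp"],
            "env": {
                "OPENAI_API_KEY": api_key,
                "INPUT_DEVICE_NAME": "BlackHole",
                "OUTFILE_DIR": directory,
            },
            # Same path as OUTFILE_DIR, as in the example configuration.
            "allowedDirectories": [directory],
        }
    }


if __name__ == "__main__":
    entry = build_server_entry("sk-your-key-here", Path("~/Documents/Transcripts"))
    print(json.dumps({"mcpServers": entry}, indent=2))
```

Paste the printed object into the config file (merging with any existing `mcpServers` entries), then restart the application.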
Step 4: Set System Output
Go to System Settings > Sound > Output and select "Multi-Output Device"
This routes audio to both your speakers (so you can hear) and BlackHole (for transcription).
Step 5: Start Transcribing!
In Cursor or Claude Desktop, just ask:
"Start transcribing audio"
Your AI assistant will start capturing and transcribing audio in real-time!
What You Need
- macOS 10.15+ (Catalina or later)
- OpenAI API key (pay-as-you-go, about $0.36/hour; see Costs & Performance)
- 5 minutes for setup
Use Cases
- Meeting transcription - Zoom, Google Meet, Teams calls
- Content creation - Transcribe videos, podcasts, or music
- Accessibility - Real-time captions for any audio
- Note-taking - Automatic transcripts of lectures or presentations
- Research - Transcribe interviews or focus groups
Troubleshooting
Audio Not Being Captured
Problem: Test shows silent or very low audio levels
Solution:
- Check System Settings > Sound > Output is set to "Multi-Output Device"
- Open Audio MIDI Setup and verify both outputs are checked:
  - Built-in Output
  - BlackHole 2ch
- Play some audio and run `npx audio-transcription-mcp test` again
BlackHole Not Showing Up
Problem: BlackHole doesn't appear in device list
Solution: Restart your Mac. Audio drivers require a restart to be recognized by the system.
Setup Script Fails
Problem: Automated setup doesn't work
Solution: The script will fall back to manual mode with clear instructions. This is normal on first run if accessibility permissions aren't granted. Just follow the 4-step guide shown.
Want to Start Over?
If you need to remove everything and start fresh:
# Uninstall BlackHole and ffmpeg
brew uninstall blackhole-2ch ffmpeg
# Delete Multi-Output Device
# 1. Open Audio MIDI Setup
# 2. Select "Multi-Output Device" in left sidebar
# 3. Press Delete key
# Then run setup again
npx audio-transcription-mcp setup
Need More Help?
- Detailed Setup Guide
- Report an Issue
- Discussions
Additional Documentation
Advanced Usage
Standalone CLI Mode
You can use this as a standalone CLI without MCP:
# Start transcription (saves to meeting_transcript.md)
npx audio-transcription-mcp start
# Press Ctrl+C to stop
Configure via .env file:
OPENAI_API_KEY=sk-your-key-here
INPUT_DEVICE_NAME=BlackHole
CHUNK_SECONDS=8
OUTFILE=meeting_transcript.md
MCP Server Tools
When used with Cursor or Claude Desktop, these tools are available:
- `start_transcription` - Start capturing and transcribing audio
- `pause_transcription` - Pause transcription temporarily
- `resume_transcription` - Resume after pause
- `stop_transcription` - Stop and get session stats
- `get_status` - Check if transcription is running
- `get_transcript` - Retrieve current transcript content
- `clear_transcript` - Clear and start fresh
- `cleanup_transcript` - Delete transcript file
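One way to picture how the lifecycle tools relate is as a small state machine: a session is idle, running, or paused, and each tool only makes sense in some of those states. The sketch below is illustrative only. The `Session` class, the state names, and the choice to reject out-of-order calls are assumptions, not the server's actual implementation.

```python
from enum import Enum


class State(Enum):
    IDLE = "idle"
    RUNNING = "running"
    PAUSED = "paused"


# Allowed transitions, keyed by (current state, tool name).
TRANSITIONS = {
    (State.IDLE, "start_transcription"): State.RUNNING,
    (State.RUNNING, "pause_transcription"): State.PAUSED,
    (State.PAUSED, "resume_transcription"): State.RUNNING,
    (State.RUNNING, "stop_transcription"): State.IDLE,
    (State.PAUSED, "stop_transcription"): State.IDLE,
}


class Session:
    """Hypothetical model of a transcription session's lifecycle."""

    def __init__(self) -> None:
        self.state = State.IDLE

    def call(self, tool: str) -> State:
        """Apply a lifecycle tool, raising if it is not valid in the current state."""
        key = (self.state, tool)
        if key not in TRANSITIONS:
            raise ValueError(f"{tool} is not valid while {self.state.value}")
        self.state = TRANSITIONS[key]
        return self.state
```

Read-only tools such as `get_status` and `get_transcript` are left out because they would not change the state in this model.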
Configuration Options
Environment variables you can customize:
| Variable | Default | Description |
|---|---|---|
| `OPENAI_API_KEY` | (required) | Your OpenAI API key |
| `INPUT_DEVICE_NAME` | `BlackHole` | Audio input device name |
| `CHUNK_SECONDS` | `8` | Seconds of audio per chunk |
| `MODEL` | `whisper-1` | OpenAI Whisper model |
| `OUTFILE_DIR` | `process.cwd()` | Output directory for transcripts |
| `SAMPLE_RATE` | `16000` | Audio sample rate (Hz) |
| `CHANNELS` | `1` | Number of audio channels |
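The sketch below shows how these variables and their defaults might be read and turned into an ffmpeg capture command. It is an assumption about how the pieces fit together, not the package's actual code: `Config`, `load_config`, and `ffmpeg_args` are hypothetical names, and the command assumes ffmpeg's `avfoundation` input with the device selected by name and raw PCM written to stdout.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping


@dataclass(frozen=True)
class Config:
    api_key: str
    input_device: str
    chunk_seconds: int
    model: str
    outfile_dir: str
    sample_rate: int
    channels: int


def load_config(env: Mapping[str, str]) -> Config:
    """Apply the documented defaults; OPENAI_API_KEY has none and is required."""
    if not env.get("OPENAI_API_KEY"):
        raise KeyError("OPENAI_API_KEY is required")
    return Config(
        api_key=env["OPENAI_API_KEY"],
        input_device=env.get("INPUT_DEVICE_NAME", "BlackHole"),
        chunk_seconds=int(env.get("CHUNK_SECONDS", "8")),
        model=env.get("MODEL", "whisper-1"),
        outfile_dir=env.get("OUTFILE_DIR", os.getcwd()),
        sample_rate=int(env.get("SAMPLE_RATE", "16000")),
        channels=int(env.get("CHANNELS", "1")),
    )


def ffmpeg_args(cfg: Config) -> List[str]:
    """Build an ffmpeg command that streams raw 16-bit PCM to stdout (assumed invocation)."""
    return [
        "ffmpeg",
        "-f", "avfoundation",          # macOS capture framework
        "-i", f":{cfg.input_device}",  # ":name" selects an audio-only input
        "-ac", str(cfg.channels),
        "-ar", str(cfg.sample_rate),
        "-f", "s16le",                 # raw signed 16-bit little-endian samples
        "-",
    ]
```

Calling `ffmpeg_args(load_config(os.environ))` would give an argument list suitable for `subprocess.Popen`.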
How It Works
- Audio Routing: Multi-Output Device sends system audio to both your speakers and BlackHole
- Capture: ffmpeg captures audio from BlackHole in 8-second chunks
- Processing: Audio is converted to WAV format suitable for Whisper API
- Transcription: Each chunk is sent to OpenAI Whisper for transcription
- Output: Timestamped text is appended to a markdown file in real-time
- Silence Detection: Automatically pauses after 32 seconds of silence to save API costs
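The silence-handling part of this pipeline can be sketched as a per-chunk decision: send the chunk, skip it, or pause. The code below is a minimal illustration under stated assumptions: audio arrives as mono signed 16-bit PCM, loudness is measured as RMS, and the threshold `SILENCE_RMS` is a made-up value. The README does not say how the tool measures silence, so treat the function names and the threshold as hypothetical.

```python
import math
import struct

CHUNK_SECONDS = 8        # default chunk length
AUTO_PAUSE_SECONDS = 32  # auto-pause after this much continuous silence
SILENCE_RMS = 500.0      # hypothetical threshold on 16-bit sample values


def rms(pcm: bytes) -> float:
    """Root-mean-square level of mono signed 16-bit little-endian PCM."""
    count = len(pcm) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack(f"<{count}h", pcm[: count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


def plan_chunks(chunks):
    """Yield 'send', 'skip', or 'pause' for each chunk of PCM bytes."""
    silent_run = 0
    for pcm in chunks:
        if rms(pcm) >= SILENCE_RMS:
            silent_run = 0
            yield "send"      # audible: would be sent for transcription
            continue
        silent_run += 1
        if silent_run * CHUNK_SECONDS >= AUTO_PAUSE_SECONDS:
            yield "pause"     # enough consecutive silence to auto-pause
        else:
            yield "skip"      # silent: never sent to the API
```

With 8-second chunks, 32 seconds of silence corresponds to four consecutive silent chunks, so the fourth silent chunk in a row is the one that triggers the pause in this sketch.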
Costs & Performance
What You're Paying For
You ONLY pay for OpenAI Whisper API calls - everything else runs locally for free!
FREE (runs locally on your machine):
- Audio capture with ffmpeg
- Audio processing and buffer management
- Silence detection and level analysis
- File operations (writing/reading transcripts)
- All MCP server operations
PAID (OpenAI API):
- Only the transcription API calls to OpenAI Whisper
- $0.006 per minute of audio transcribed
- Silent chunks are automatically skipped to save money
Actual Costs
With default 8-second chunks:
| Duration | API Calls | Approximate Cost |
|---|---|---|
| 1 minute | ~7.5 chunks | $0.006 |
| 1 hour | ~450 chunks | $0.36 |
| 8-hour workday | ~3,600 chunks | $2.88 |
Cost per chunk: ~$0.0008 (less than a tenth of a cent!)
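The figures in this table follow directly from the per-minute price and the chunk length. A short script to reproduce them, or to try other durations, might look like this (the function names are illustrative):

```python
PRICE_PER_MINUTE = 0.006  # USD per minute of audio transcribed
CHUNK_SECONDS = 8         # default chunk length


def chunks_for(seconds: float, chunk_seconds: int = CHUNK_SECONDS) -> float:
    """Approximate number of chunks needed to cover the given duration."""
    return seconds / chunk_seconds


def cost_for(seconds: float) -> float:
    """Approximate API cost in USD for the given duration of audio."""
    return seconds / 60 * PRICE_PER_MINUTE


if __name__ == "__main__":
    rows = [("1 minute", 60), ("1 hour", 3600), ("8-hour workday", 8 * 3600)]
    for label, seconds in rows:
        print(f"{label}: ~{chunks_for(seconds):g} chunks, ${cost_for(seconds):.3f}")
    print(f"per chunk: ${cost_for(CHUNK_SECONDS):.4f}")
```

These are upper bounds for continuous audio; skipped silent chunks reduce the actual bill.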
Built-in Cost Savings
The tool includes smart silence detection that saves you money:
- Silent audio chunks are NEVER sent to OpenAI
- Automatically tracks cost savings in the debug log
- Auto-pauses after 32 seconds of silence
- View statistics with `get_status` to see chunks skipped
Example: In a 1-hour meeting with 15 minutes of silence, you save ~$0.09 automatically!
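The arithmetic behind this example is simply the silent minutes multiplied by the per-minute price. As a small sketch (the function names are illustrative, and it assumes every silent minute is skipped):

```python
PRICE_PER_MINUTE = 0.006  # USD per minute of audio transcribed


def billed_cost(total_minutes: float, silent_minutes: float) -> float:
    """Cost in USD when silent audio is skipped and never sent to the API."""
    return (total_minutes - silent_minutes) * PRICE_PER_MINUTE


def savings(total_minutes: float, silent_minutes: float) -> float:
    """Amount saved in USD compared with transcribing the whole recording."""
    return total_minutes * PRICE_PER_MINUTE - billed_cost(total_minutes, silent_minutes)
```

For the 1-hour meeting with 15 minutes of silence, `savings(60, 15)` comes to about $0.09, leaving roughly $0.27 billed for the 45 minutes of speech.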
Performance
- Memory usage: 50-100 MB per session
- CPU usage: Minimal (ffmpeg handles audio processing)
- API latency: 1-3 seconds per chunk
- Accuracy: 90-95% for clear speech
- Network: Only during transcription API calls
Cost Optimization Tips
- Increase chunk size - Fewer API calls (set `CHUNK_SECONDS=15`)
- Use silence detection - Enabled by default, saves money automatically
- Pause when not needed - Use `pause_transcription` during breaks
- Monitor usage - Check OpenAI dashboard for actual costs
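To see what a larger chunk size changes, the sketch below counts the API requests needed for an hour of continuous audio at two chunk lengths. The helper name is illustrative. Note that, at the per-minute price quoted earlier, the amount of audio billed is the same either way; what drops is the number of requests, and each piece of transcript appears later because a chunk must finish recording before it is sent.

```python
import math


def api_calls(duration_seconds: int, chunk_seconds: int) -> int:
    """Number of chunks, and therefore API requests, needed to cover the duration."""
    return math.ceil(duration_seconds / chunk_seconds)


if __name__ == "__main__":
    for chunk_seconds in (8, 15):
        calls = api_calls(3600, chunk_seconds)
        print(f"CHUNK_SECONDS={chunk_seconds}: {calls} requests per hour")
```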
Bottom line: Transcription is cheap (~36¢/hour), runs mostly locally, and automatically saves money by skipping silence. You're only charged when actual speech is being transcribed.
Development & Testing
For contributors and developers:
See MCP_SETUP.md for complete setup instructions
Just add to your config and restart - that's it!
See the npx configuration at the top of this README for Cursor and Claude Desktop.
For Standalone CLI (Local Development)
See GETTING_STARTED.md for complete setup instructions
# Install dependencies
npm install
npm run build
# Configure environment
cp env.example .env # Then add your OpenAI API key
# Run standalone CLI
npm start
License & Contributing
This project is licensed under the MIT License - see the LICENSE file for details.
Contributions are welcome! Please feel free to submit a Pull Request.
Development Resources
- Getting Started Guide
- Testing Documentation
- MCP Setup Guide
- Installation Guide
Made with ❤️ for transcribing meetings, content, and conversations.
Star ⭐ this repo if you find it useful!