phonecall-mcp
Enables Claude to make and manage real phone calls through Twilio, allowing it to think, search, and respond in real-time conversations.
README
phonecall-mcp
An MCP server that enables Claude to make and manage real phone calls through Twilio. Claude becomes the brain behind the conversation — thinking, searching, and responding — while the phone is just an I/O channel.
What makes this different? Most phone AI solutions put a pre-prompted, standalone agent on the line. Here, your full Claude instance — with all its tools (web search, calendar, Drive, email, etc.) — handles the call. It's not a script-following bot, but a reflective assistant that adapts to the conversation in real time.
How It Works
Claude Desktop (MCP client)
↕ MCP protocol (stdio)
phonecall-mcp server (Python, local)
↕ WebSocket (Twilio Media Streams)
Twilio (PSTN telephone network)
↕
Callee's phone
Parallel services:
phonecall-mcp server
→ ElevenLabs Scribe v2 Realtime (STT: speech → text)
→ ElevenLabs TTS (text → speech)
The entire audio pipeline runs natively in μ-law 8kHz — no format conversion needed.
Features
- Outbound calls — Claude initiates calls on your behalf, with natural TTS voice
- Real-time transcription — ElevenLabs Scribe v2 transcribes the callee's speech live
- GDPR consent — Built-in consent mechanism: no transcription occurs until the callee explicitly agrees (DTMF "5"). Enforced at the server level.
- DTMF turn-taking — Callee presses "1" when done speaking, press "1" to interrupt (barge-in)
- Tool use during calls — Claude can search the web, check calendars, look up documents — all while the callee hears a hold tone
- Inbound voicemail — Incoming calls are handled as a voicemail recorder
- Multi-language — Works in any language supported by ElevenLabs (tested: Hungarian, English, German)
- Full transcript — Every call produces a timestamped transcript with speaker labels
Prerequisites
- Python 3.12+
- Twilio account with a phone number (console.twilio.com)
- ElevenLabs account with API access (elevenlabs.io)
- ngrok account with a static domain (ngrok.com) — free tier works
- Claude Desktop app
Setup
1. Clone the repository
git clone https://github.com/leszini/phonecall-mcp.git
cd phonecall-mcp
2. Install dependencies
pip install -r requirements.txt
On Windows, the tzdata package (included in requirements.txt) is required for timezone support.
3. Download ngrok
Download the ngrok binary for your platform from ngrok.com/download and place ngrok.exe (or ngrok) in the project directory. Authenticate it with your ngrok auth token:
ngrok config add-authtoken YOUR_AUTH_TOKEN
If you have a static ngrok domain (recommended), note it for the next step.
4. Configure environment variables
Copy the example files:
cp .env.example .env
cp config.example.json config.json
Edit .env with your credentials:
TWILIO_ACCOUNT_SID=your_account_sid
TWILIO_AUTH_TOKEN=your_auth_token
TWILIO_PHONE_NUMBER=+1234567890
ELEVENLABS_API_KEY=your_elevenlabs_api_key
NGROK_URL=https://your-subdomain.ngrok-free.app
5. Configure config.json
Edit config.json to set your preferences:
{
"timezone": "UTC",
"tts": {
"voice_id": "YOUR_VOICE_ID_HERE",
"model_id": "eleven_v3",
"language_code": "en"
},
"stt": {
"model_id": "scribe_v2_realtime",
"language_code": "en",
"vad_silence_threshold": 1.5
},
"server": {
"host": "0.0.0.0",
"port": 8765
},
"call": {
"max_duration_seconds": 900
},
"voicemail": {
"default_language": "en",
"local_prefixes": [],
"greeting": {
"en": "Hello! You've reached the voicemail. Please leave your message after the tone. When you're done, press 1 on your phone. Thank you!"
},
"thanks": {
"en": "Thank you for your message. Goodbye!"
}
}
}
Key settings to customize:
| Setting | Description |
|---|---|
timezone |
Your local timezone (e.g. "America/New_York", "Europe/London") |
tts.voice_id |
Your ElevenLabs voice ID (find it in your ElevenLabs dashboard) |
tts.language_code |
Default TTS language code |
stt.language_code |
Default STT language code |
voicemail.default_language |
Language for voicemail greetings |
voicemail.local_prefixes |
Phone prefixes for your country (e.g. ["+1"] for US, ["+44"] for UK). Calls from these prefixes use default_language; others get English. |
voicemail.greeting |
Add greetings in any language you need |
voicemail.thanks |
Add thank-you messages in matching languages |
6. Configure Claude Desktop
Add the MCP server to your Claude Desktop configuration. Open Claude Desktop's settings and add a new MCP server, or edit the config file directly:
Config file location:
- Windows:
%APPDATA%\Claude\claude_desktop_config.json - macOS:
~/Library/Application Support/Claude/claude_desktop_config.json
Add the phonecall-mcp entry under mcpServers:
{
"mcpServers": {
"phonecall-mcp": {
"command": "python",
"args": [
"/absolute/path/to/phonecall-mcp/server.py"
]
}
}
}
Replace /absolute/path/to/phonecall-mcp/server.py with the actual path on your system.
7. Start ngrok
Before making calls, start the ngrok tunnel. You can use the included launcher (Windows):
Double-click ngrok_launcher.pyw
Or start it manually:
ngrok http 8765 --domain=your-subdomain.ngrok-free.app
8. Restart Claude Desktop
Restart Claude Desktop so it picks up the new MCP server configuration. You should see phonecall-mcp in the available tools.
Usage
Making a call
Simply ask Claude to call someone:
"Call +1234567890 and ask about their business hours."
Claude will:
- Tell you who it's calling and why, and wait for your approval
- Start the call with a GDPR-compliant greeting
- Wait for the callee's consent (DTMF "5")
- Conduct the conversation
- Provide you with a transcript and summary after hanging up
The call flow
1. Claude plays GDPR notice + "press 5 to consent"
2. Callee presses 5 → transcription starts, beeping confirms
3. Claude sends confirmation + explains "press 1 when done"
4. Callee speaks → presses 1 → Claude receives transcript
5. Claude thinks (callee hears beeping) → responds via TTS
6. Repeat 4-5 until conversation ends
7. Claude hangs up → provides transcript + summary
GDPR consent mechanism
Every outbound call includes a mandatory consent step:
- The callee hears who is calling, why, and that the conversation will be transcribed
- The callee must press 5 on their phone to consent
- No transcription occurs before consent — this is enforced at the server level (STT engine does not start until DTMF "5" is received)
- If the callee doesn't press 5 within 30 seconds, Claude politely ends the call
- If the callee hangs up, nothing was recorded
Inbound calls (voicemail)
When someone calls your Twilio number, the server automatically handles it as a voicemail:
- Plays a greeting in the configured language
- Records the caller's message
- Logs the transcript
You can ask Claude: "Did I get any calls?" and it will check the call log.
Using the skill file
The SKILL_EN.md file teaches Claude how to use phonecall-mcp effectively — including GDPR notice templates, conversation flow, and best practices. To use it:
- Copy
SKILL_EN.mdto your Claude Desktop skills directory - Customize the templates with your name and preferences
- Claude will automatically follow the skill guidelines when making calls
MCP Tools
| Tool | Description |
|---|---|
phone_call_start |
Initiate an outbound call |
phone_call_listen |
Wait for callee to speak + press DTMF "1" |
phone_call_respond |
Send a spoken response via TTS |
phone_call_control |
Check call status |
phone_call_end |
Hang up and get transcript |
Architecture
State machine (AudioBridge)
IDLE → CONSENT_PENDING → LISTENING ⇄ PROCESSING → SPEAKING → LISTENING
↑ |
└────────────────────────────────────┘
- IDLE — Call not yet connected
- CONSENT_PENDING — Waiting for GDPR consent (DTMF "5"), no STT active
- LISTENING — Routing callee audio to STT, waiting for DTMF "1"
- PROCESSING — DTMF received, Claude is thinking (hold tone plays)
- SPEAKING — TTS audio being sent to Twilio
Key design decisions
- DTMF-based turn-taking instead of silence detection — more reliable on phone lines with background noise
- Server-enforced GDPR consent — STT physically cannot start before consent, regardless of what Claude sends
- Pre-rendered audio — First message TTS is synthesized while the phone rings, eliminating the initial silence gap
- Scribe audio throttling — During PROCESSING state, only keepalive packets are sent to STT to prevent
resource_exhaustederrors - Barge-in support — Callee can interrupt Claude by pressing "1" during TTS playback
Security Considerations
Caller data extraction risk
⚠️ Important: During a phone call, Claude has access to all its connected tools — web search, Google Drive, email, calendar, and any other MCP connectors you have enabled. This means the callee could potentially extract sensitive information from Claude through social engineering (e.g., "Can you check if there's a document about..." or "What's on the calendar for...").
Recommended mitigations:
- Disable sensitive connectors during calls — temporarily turn off MCP connectors that access private data (Drive, email, calendar) if the call doesn't require them
- Define boundaries in your skill file — explicitly instruct Claude what it should NOT do during calls (e.g., "Never read emails or share calendar details during a call", "Only use web search, no internal tools")
- Review transcripts — check call transcripts for any unexpected information disclosure
The tool use capability during calls is powerful — Claude can look up information in real time to help the conversation. But that same power means you should carefully consider which tools Claude should have access to for each call.
API keys and credentials
- All secrets are stored in
.env(excluded from git via.gitignore) - Never commit
.envorconfig.jsonto version control - The Twilio auth token is used for request validation — keep it secure
GDPR consent
The built-in consent mechanism ensures no transcription occurs without the callee's explicit agreement. However, this is a technical safeguard — you are responsible for ensuring your use of this tool complies with applicable privacy laws in your jurisdiction.
Troubleshooting
Call fails immediately
- Check that ngrok is running and the URL in
.envmatches - Verify your Twilio credentials and phone number
- Check
logs/phonecall-mcp.logfor error details
No audio / silence after connecting
- Verify your ElevenLabs API key and voice ID
- Check that the voice ID exists in your ElevenLabs account
- Look for TTS errors in the log file
Callee can't hear the greeting
- Ensure ngrok tunnel is active (the TwiML webhook URL must be reachable)
- Check Twilio console for call logs and error codes
Empty transcripts
- Check for
resource_exhaustederrors in the log — this means Scribe is being overloaded - Verify
stt.language_codematches the language being spoken
Timezone shows UTC
- Install
tzdata:pip install tzdata - Set
timezoneinconfig.jsonto your timezone (e.g."America/New_York")
File Structure
phonecall-mcp/
├── server.py # MCP server entry point
├── audio_bridge.py # Bidirectional audio management + state machine
├── call_manager.py # Call lifecycle (start, listen, respond, end)
├── twilio_handler.py # Twilio webhooks + WebSocket handler
├── stt.py # ElevenLabs Scribe v2 Realtime STT client
├── tts.py # ElevenLabs TTS synthesis
├── config.py # Configuration loader
├── log_setup.py # Centralized logging
├── models.py # Data models (CallState, TranscriptEntry)
├── ngrok_launcher.pyw # Windows ngrok launcher with notifications
├── ngrok_stop.pyw # Windows ngrok stop utility
├── config.example.json # Configuration template
├── .env.example # Environment variables template
├── .gitignore
├── requirements.txt
└── SKILL_EN.md # Claude skill file for phonecall-mcp
Requirements
- Python 3.12+
- Twilio account (trial accounts work but play a short message before connecting)
- ElevenLabs account with API access
- ngrok (free tier with static domain)
- Claude Desktop
License
MIT License — see LICENSE for details.
Acknowledgments
Built with Twilio Media Streams, ElevenLabs (TTS + Scribe STT), and the Model Context Protocol.
Recommended Servers
playwright-mcp
A Model Context Protocol server that enables LLMs to interact with web pages through structured accessibility snapshots without requiring vision models or screenshots.
Magic Component Platform (MCP)
An AI-powered tool that generates modern UI components from natural language descriptions, integrating with popular IDEs to streamline UI development workflow.
Audiense Insights MCP Server
Enables interaction with Audiense Insights accounts via the Model Context Protocol, facilitating the extraction and analysis of marketing insights and audience data including demographics, behavior, and influencer engagement.
VeyraX MCP
Single MCP tool to connect all your favorite tools: Gmail, Calendar and 40 more.
graphlit-mcp-server
The Model Context Protocol (MCP) Server enables integration between MCP clients and the Graphlit service. Ingest anything from Slack to Gmail to podcast feeds, in addition to web crawling, into a Graphlit project - and then retrieve relevant contents from the MCP client.
Kagi MCP Server
An MCP server that integrates Kagi search capabilities with Claude AI, enabling Claude to perform real-time web searches when answering questions that require up-to-date information.
E2B
Using MCP to run code via e2b.
Neon Database
MCP server for interacting with Neon Management API and databases
Exa Search
A Model Context Protocol (MCP) server lets AI assistants like Claude use the Exa AI Search API for web searches. This setup allows AI models to get real-time web information in a safe and controlled way.
Qdrant Server
This repository is an example of how to create a MCP server for Qdrant, a vector search engine.