avatar-mcp
A desktop avatar companion for Claude Code that provides a visible on-screen presence with text-to-speech, speech-to-text, and an interactive avatar overlay.
README
avatar-mcp
Desktop avatar companion for Claude Code. Gives Claude a visible on-screen presence with text-to-speech, speech-to-text, and an interactive avatar overlay.
Runs as an MCP server — Claude Code connects to it automatically and gets access to voice/avatar tools.
What it does
- Avatar overlay — A draggable, always-on-top transparent window that displays poses and emotions (PyQt6)
- Text-to-speech — Claude can speak aloud with emotional prosody. Three engines:
- Kokoro (default) — Free, local, runs via ONNX runtime. Auto-downloads ~350MB model on first use
- Edge TTS — Free, uses Microsoft's neural voices, requires internet
- ElevenLabs — Premium quality, requires API key
- Speech-to-text — Voice input with two engines:
- RealtimeSTT (recommended) — Local Whisper model via faster-whisper, GPU-accelerated, real-time streaming, no API keys
- Google Speech API — Cloud-based fallback, no GPU required, higher latency
- Emotion system — 7 emotions (neutral, happy, sad, excited, angry, shy, smug) that affect avatar pose and voice prosody
- Automatic pose changes — Avatar reacts to what Claude is doing (coding, thinking, planning, listening) via Claude Code hooks. No manual tool calls needed
Why?
Claude Code runs in a terminal — if you alt-tab away, switch monitors, or just glance at another screen, you lose all visual feedback. The avatar sits on top of everything so you always know what Claude is doing: coding, thinking, speaking, or waiting for input. Useful for multi-monitor setups, long-running tasks, and voice-driven workflows where the terminal isn't in focus.
Setup
Requirements
- Python 3.11+
- A microphone (for STT)
- Speakers/headphones (for TTS)
Install
git clone <this-repo>
cd avatar-mcp
pip install -e .
For RealtimeSTT (local Whisper, recommended if you have a GPU):
pip install -e ".[realtime-stt]"
# For CUDA acceleration (replace cu128 with your CUDA version):
pip install --force-reinstall torch torchaudio --index-url https://download.pytorch.org/whl/cu128
For ElevenLabs support:
pip install -e ".[premium]"
For development:
pip install -e ".[dev]"
Configure Claude Code
Add to your project's .mcp.json (or global MCP config):
{
"mcpServers": {
"avatar-mcp": {
"command": "python",
"args": ["-m", "avatar_mcp.server"],
"cwd": "/path/to/avatar-mcp"
}
}
}
Restart Claude Code. The avatar window should appear and tools will be available.
Automatic Poses via Hooks (recommended)
The avatar can change poses automatically based on what Claude is doing — no manual set_pose() calls needed. Add hooks to your Claude Code settings (~/.claude/settings.json):
{
"hooks": {
"UserPromptSubmit": [
{
"hooks": [{ "type": "command", "command": "echo thinking > \"$HOME/.claude/avatar-pose\"" }]
}
],
"PreToolUse": [
{
"matcher": "Edit|Write|NotebookEdit|Bash",
"hooks": [{ "type": "command", "command": "echo coding > \"$HOME/.claude/avatar-pose\"" }]
},
{
"matcher": "Read|Grep|Glob",
"hooks": [{ "type": "command", "command": "echo thinking > \"$HOME/.claude/avatar-pose\"" }]
},
{
"matcher": "Task|EnterPlanMode",
"hooks": [{ "type": "command", "command": "echo planning > \"$HOME/.claude/avatar-pose\"" }]
}
],
"PermissionRequest": [
{
"hooks": [{ "type": "command", "command": "echo listening > \"$HOME/.claude/avatar-pose\"" }]
}
],
"Stop": [
{
"hooks": [{ "type": "command", "command": "echo listening > \"$HOME/.claude/avatar-pose\"" }]
}
]
}
}
This maps avatar poses to Claude's activity:
- User sends message → thinking pose (UserPromptSubmit)
- Edit/Write/Bash → coding pose (PreToolUse)
- Read/Grep/Glob → thinking pose (PreToolUse)
- Task/Plan mode → planning pose (PreToolUse)
- Waiting for approval → listening pose (PermissionRequest)
- Turn complete → listening pose (Stop)
- speak(text, emotion) → emotion-matched pose while speaking (built-in, no hook needed)
The MCP server watches the ~/.claude/avatar-pose file for changes and updates the avatar automatically.
Auto-allow tools (optional)
To skip approval prompts, add to .claude/settings.local.json:
{
"permissions": {
"allow": [
"mcp__avatar-mcp__speak",
"mcp__avatar-mcp__show_avatar",
"mcp__avatar-mcp__hide_avatar"
]
}
}
Configuration
Edit config.toml in the project root:
[avatar]
start_visible = true
start_x = 100 # initial window position
start_y = 100
sprite_scale = 1.0
sprite_directory = "" # empty = use built-in placeholders; set a path to use custom PNGs
poll_interval_ms = 50
[tts]
engine = "kokoro" # "edge", "kokoro", or "elevenlabs"
voice = "jf_alpha" # voice ID (run list_voices tool to see options)
kokoro_lang = "en-us" # language override for Kokoro (auto-detected from voice prefix if empty)
elevenlabs_api_key = ""
elevenlabs_voice_id = ""
elevenlabs_model = "eleven_flash_v2_5"
[stt]
enabled = true
engine = "realtime" # "google" or "realtime" (local whisper)
language = "en-US"
cooldown_seconds = 3.0
pause_threshold = 1.2
wake_words = ["claude", "hey claude"]
# realtime engine (faster-whisper via RealtimeSTT)
realtime_model = "base" # tiny / base / small / medium / large-v3
realtime_device = "cuda" # "cuda" or "cpu"
realtime_silero_sensitivity = 0.4
# google engine (fallback)
energy_threshold = 150
phrase_threshold = 0.1
non_speaking_duration = 0.5
[behavior]
auto_speak = true
MCP Tools
Once connected, Claude Code has access to these tools:
| Tool | Description |
|---|---|
speak(text, emotion) |
Speak text aloud with emotional prosody. Shows emotion-matched pose while speaking |
show_avatar() |
Show the avatar window |
hide_avatar() |
Hide the avatar window |
start_listening() |
Start speech recognition |
stop_listening() |
Stop speech recognition |
set_voice(voice_id, engine) |
Change TTS voice or engine |
list_voices(engine) |
List available voices |
Emotions
neutral, happy, sad, excited, angry, shy, smug
Poses
idle, thinking, coding, angry, smug, shy, planning, speaking, listening, drag
Custom Sprites
To use your own avatar sprites, create a directory with PNG files named after poses:
my-sprites/
idle.png
thinking.png
coding.png
angry.png
smug.png
shy.png
planning.png
speaking.png
listening.png
drag.png
Then set sprite_directory = "path/to/my-sprites" in config.toml.
Voice Input
Speech-to-text uses wake word activation by default. Say "Claude" or "Hey Claude" followed by your message. Text is injected into Claude Code's input as [VOICE] messages.
STT Engines
RealtimeSTT (recommended) — Uses faster-whisper for local, GPU-accelerated transcription. Streams results in real-time with built-in Silero VAD. No network calls, no API keys, no truncation. Requires pip install -e ".[realtime-stt]" and CUDA-enabled PyTorch.
Google Speech API (fallback) — Cloud-based, works without a GPU but has higher latency and may drop long utterances. Set engine = "google" in config to use.
Configure wake words in config.toml under [stt]. Set wake_words = [] to disable filtering (all speech passes through).
Running Tests
pip install -e ".[dev]"
pytest tests/ -v
Process Cleanup
The MCP server spawns several child processes (avatar display, multiprocessing Manager, STT workers). Multiple mechanisms ensure these are cleaned up when Claude Code exits:
- Parent watchdog — A daemon thread polls every 2s to check if the parent process (Claude Code) is alive. If the parent dies, all children are force-killed immediately. This is the most reliable mechanism since it doesn't depend on signals or atexit.
- Job Objects (Windows) — All child processes are assigned to a Win32 Job Object with
JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE, so the OS kills them when the MCP server exits. - Display self-monitoring — The avatar window checks its parent PID every ~2s and self-terminates if the parent is gone.
- atexit + signal handlers — Standard cleanup on normal interpreter shutdown and SIGINT/SIGTERM.
- PID file —
~/.claude/avatar-mcp-children.pidtracks child PIDs for stale process cleanup on next startup.
Architecture
src/avatar_mcp/
server.py # MCP server, pose file watcher, parent watchdog
lifecycle.py # Process lifecycle, TTS/STT init, hook pose logic
config.py # TOML config parsing
state.py # Shared state (multiprocessing.Manager)
avatar/
display.py # PyQt6 overlay window (child process)
sprites.py # Sprite loading and placeholder generation
voice/
tts_base.py # Abstract TTS engine interface
tts_edge.py # Edge TTS (free, cloud)
tts_kokoro.py # Kokoro TTS (free, local ONNX)
tts_eleven.py # ElevenLabs TTS (premium)
audio.py # Playback queue
emotions.py # Emotion → prosody mapping
stt_base.py # Abstract STT engine interface
stt_google.py # Google Speech API (cloud fallback)
stt_realtime.py # RealtimeSTT / faster-whisper (local, GPU)
input/
sender.py # Injects voice text into Claude Code via clipboard
License
MIT
Recommended Servers
playwright-mcp
A Model Context Protocol server that enables LLMs to interact with web pages through structured accessibility snapshots without requiring vision models or screenshots.
Magic Component Platform (MCP)
An AI-powered tool that generates modern UI components from natural language descriptions, integrating with popular IDEs to streamline UI development workflow.
Audiense Insights MCP Server
Enables interaction with Audiense Insights accounts via the Model Context Protocol, facilitating the extraction and analysis of marketing insights and audience data including demographics, behavior, and influencer engagement.
VeyraX MCP
Single MCP tool to connect all your favorite tools: Gmail, Calendar and 40 more.
graphlit-mcp-server
The Model Context Protocol (MCP) Server enables integration between MCP clients and the Graphlit service. Ingest anything from Slack to Gmail to podcast feeds, in addition to web crawling, into a Graphlit project - and then retrieve relevant contents from the MCP client.
Kagi MCP Server
An MCP server that integrates Kagi search capabilities with Claude AI, enabling Claude to perform real-time web searches when answering questions that require up-to-date information.
E2B
Using MCP to run code via e2b.
Neon Database
MCP server for interacting with Neon Management API and databases
Exa Search
A Model Context Protocol (MCP) server lets AI assistants like Claude use the Exa AI Search API for web searches. This setup allows AI models to get real-time web information in a safe and controlled way.
Qdrant Server
This repository is an example of how to create a MCP server for Qdrant, a vector search engine.