Narrator MCP
Captures screenshots, analyzes changes with Gemini, generates humorous narrations with sound effects, and plays them as speech via ElevenLabs, optionally integrating with a Minecraft mod for in-game events.
A fun MCP server that takes screenshots, describes what changed, generates hilarious narration, and plays it back as audio!
Datathon2025
We built this project at Texas A&M's DATATHON 2025, a hackathon focused on AI, ML, Computer Science, Data Science, and Statistics!
Results:
Agent: 2nd Place
MCP (this repo): Best use of ElevenLabs
Contributors:
Jonathan Kalsky '29 (CS):
www.linkedin.com/in/jonathan-kalsky
Aaron Yang '29 (CS):
https://www.linkedin.com/in/nianjin-yang/
Ethan Hince '29 (CS):
https://www.linkedin.com/in/ethan-hince-a831a5381/
Brief Demo:
https://youtu.be/umaCNd4jPfY?si=QRRDb36p3LI-7tyr
License
We do not allow reproducing, forking, or stealing our idea, code, or intellectual property. For more information, email jonathan.kalsky@gmail.com.
Features
Connects to our custom-made Minecraft mod, developed alongside this server, which sends gameplay data to an HTTP port
View: https://github.com/nianjindev/MinecraftMCPSender
There is a UI to view the data on the port. To host the UI, run:
python3 minecraft_reciever.py
python minecraft_reciever.py (Windows)
Other features:
- Get Screenshot: Retrieves the last two screenshots from a directory
- Describe: Uses Gemini to analyze screenshots and describe changes
- Narrate: Generates funny, sarcastic narration about what you're doing
- Sound Effects: Automatically adds comedic sound effects from the MyInstants API
- TTS: Converts narration to speech with ElevenLabs TTS
- Auto-cleanup: Only keeps the last 5 screenshots
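The Get Screenshot and Auto-cleanup behaviors can be sketched as two small helpers. This assumes screenshots are saved with timestamped names like screenshot_20251108_143022.png (as in the example output further down), so sorting by filename is also chronological. get_last_screenshots and cleanup_screenshots are hypothetical names, not functions from the repo.

```python
from pathlib import Path


def get_last_screenshots(directory, n=2):
    """Return the n most recent screenshots, oldest first (hypothetical helper)."""
    # Assumes zero-padded timestamped names, so lexical order == time order.
    shots = sorted(Path(directory).glob("screenshot_*.png"))
    return shots[-n:] if n > 0 else []


def cleanup_screenshots(directory, keep=5):
    """Delete all but the `keep` newest screenshots and return the deleted paths."""
    shots = sorted(Path(directory).glob("screenshot_*.png"))
    stale = shots[:-keep] if keep > 0 else shots
    for path in stale:
        path.unlink()
    return stale
```

Calling cleanup_screenshots after each capture keeps the directory bounded at five files.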
Setup
- Install dependencies:
pip install -r requirements.txt
- Create a .env file with your API keys:
cp .env.example .env
# Edit .env and add your API keys
- Get API keys:
- Gemini API key: https://aistudio.google.com/app/apikey (FREE!)
- ElevenLabs API key: https://elevenlabs.io/app/settings/api-keys (10k chars/month free)
- In config/mcp/servers.json:
  - You may need to change "python3" to "python" in the command field (for example, on Windows)
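For reference, here is roughly what loading the .env file involves. This stdlib-only sketch assumes simple KEY=VALUE lines; the project may well use a library such as python-dotenv instead, and load_env_file is a hypothetical helper.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Parse simple KEY=VALUE lines into os.environ (already-set variables win)."""
    loaded = {}
    p = Path(path)
    if not p.exists():
        return loaded
    for line in p.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blank lines, comments, and malformed lines
        key, value = line.split("=", 1)
        key = key.strip()
        value = value.strip().strip('"').strip("'")  # drop optional quotes
        loaded[key] = value
        os.environ.setdefault(key, value)
    return loaded
```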
Usage
Run the Client
Minecraft Mod Integration (Preferred)
To include Minecraft gameplay events in the narration:
- Install the Minecraft Fabric mod (see MinecraftMCP.java)
- Run the screenshot client (it automatically starts the receiver):
python mincraft_client_only.py
- Launch Minecraft; the mod will send events automatically
The narrator will describe both what's happening on screen AND in-game events like:
- Blocks placed/broken
- Damage taken
- Biome changes
- Day/night cycle
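Before these events reach the narration prompt, they have to be read back from the shared file and condensed. A minimal sketch, assuming minecraft_data.json holds a JSON list of objects with a "type" field (e.g. "block_placed", "damage_taken"); the real event schema is defined by the mod and may differ, and read_events / summarize_events are hypothetical names.

```python
import json
from collections import Counter
from pathlib import Path


def read_events(path="minecraft_data.json"):
    """Load the receiver's event list; a missing or corrupt file yields []."""
    try:
        data = json.loads(Path(path).read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        return []
    return data if isinstance(data, list) else [data]


def summarize_events(events):
    """Collapse raw events into 'type xN' counts, most frequent first."""
    counts = Counter(e.get("type", "unknown") for e in events)
    return "; ".join(f"{kind} x{n}" for kind, n in counts.most_common())
```

A compact summary like this keeps the prompt short even when the mod reports many block events in one interval.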
Alternate Screenshot Context-based Client
The client takes screenshots every 5 seconds and generates narrated audio:
python screenshot_client.py
This will:
- Take a screenshot every 5 seconds
- After 2 screenshots, compare them
- Generate a funny narration about what changed
- Convert to speech and play it automatically
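The loop above can be sketched as follows, with screenshot capture and the describe/narrate/speak pipeline passed in as functions so the loop itself stays testable. Comparing each new screenshot with the one before it (a sliding pair) is an assumption about the client's behavior; run_client and its parameters are hypothetical.

```python
import time


def run_client(capture, process, interval=5.0, max_iterations=None, sleep=time.sleep):
    """Capture every `interval` seconds and process each consecutive pair of shots."""
    shots = []
    i = 0
    while max_iterations is None or i < max_iterations:
        shots.append(capture())
        shots = shots[-2:]  # only the newest pair is needed for comparison
        if len(shots) == 2:
            process(shots[0], shots[1])  # e.g. describe -> narrate -> TTS -> play
        i += 1
        sleep(interval)
    return shots
```

With max_iterations left as None, the loop runs until interrupted with Ctrl+C.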
Note: The Minecraft receiver runs automatically in the background. No need to start it separately!
┌─────────────────────┐
│   Minecraft Mod     │ (Java)
│     (in game)       │
└─────────┬───────────┘
          │ HTTP POST
          ▼
┌─────────────────────┐
│ minecraft_receiver  │ (Flask HTTP server)
│     Port 8080       │ Saves to minecraft_data.json
└─────────┬───────────┘
          │ File write
          ▼
┌─────────────────────┐
│ minecraft_data.json │ (Shared file)
└─────────┬───────────┘
          │ File read
          ▼
┌─────────────────────┐
│  screenshot_client  │ Reads file, calls MCP tool
└─────────┬───────────┘
          │ MCP call
          ▼
┌─────────────────────┐
│    mcp_server.py    │ Processes data via get_minecraft_input tool
└─────────────────────┘
Use as MCP Server
You can also use this as an MCP server in Kiro or other MCP clients:
{
"mcpServers": {
"screenshot-narrator": {
"command": "python",
"args": ["mcp_server.py"],
"env": {
"GEMINI_API_KEY": "your_key",
"ELEVENLABS_API_KEY": "your_key",
"SCREENSHOT_DIR": "./screenshots"
}
}
}
}
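On the server side, the values from the "env" block arrive as ordinary environment variables. One way to read them and fail fast when a key is missing, assuming the variable names shown in the config and a ./screenshots default for SCREENSHOT_DIR (Settings and load_settings are hypothetical):

```python
import os
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Settings:
    gemini_api_key: str
    elevenlabs_api_key: str
    screenshot_dir: Path


def load_settings(env=os.environ):
    """Build settings from environment variables, raising if an API key is absent."""
    missing = [k for k in ("GEMINI_API_KEY", "ELEVENLABS_API_KEY") if not env.get(k)]
    if missing:
        raise RuntimeError(f"Missing required environment variables: {', '.join(missing)}")
    return Settings(
        gemini_api_key=env["GEMINI_API_KEY"],
        elevenlabs_api_key=env["ELEVENLABS_API_KEY"],
        # Same default directory as the example config.
        screenshot_dir=Path(env.get("SCREENSHOT_DIR", "./screenshots")),
    )
```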
Available MCP Tools
- get_screenshot: Get the last N screenshots
- get_minecraft_input: Receive Minecraft gameplay events
- describe: Analyze screenshots and/or Minecraft data
- narrate: Generate funny narration from a description
- describe_for_narration: Combined tool (faster) - analyze and narrate in one step
- summarize_narrations: Combine multiple narrations into one sentence
- get_sfx: Search for sound effects from MyInstants API
- tts: Convert text to speech with ElevenLabs
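Under the hood, an MCP client invokes any of these tools with a JSON-RPC 2.0 "tools/call" request, and over the stdio transport each message is a single line of JSON. The sketch below only builds that message. The argument names (such as n for get_screenshot) are hypothetical, since the tools' input schemas are not documented here, and a real client must complete the MCP initialize handshake before calling tools.

```python
import itertools
import json

_ids = itertools.count(1)  # JSON-RPC request ids must be unique per session


def tool_call_message(name, arguments):
    """Serialize an MCP tools/call request as one newline-terminated JSON line."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    # json.dumps without indent emits no raw newlines, as stdio framing requires.
    return json.dumps(request) + "\n"
```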
How It Works
- Screenshots: Grabs screenshots with native tools (screencapture on macOS; see the platform-specific details below for Windows and Linux)
- Analysis: Gemini 2.5 Flash analyzes images and describes changes (fast & free!)
- Narration + SFX Keyword: Gemini generates sarcastic commentary AND extracts the best sound effect keyword in ONE API call (super fast!)
- Sound Effect Search: Searches MyInstants API with the AI-selected keyword and randomly picks from top results
- Speech: ElevenLabs TTS converts text to audio with high-quality voice
- Playback: Plays sound effect and narration audio in parallel for perfect timing
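The parallel playback step can be sketched with two threads that start together and are both joined. The play function is injected and hypothetical here; on macOS it could be something like subprocess.run(["afplay", path]). This illustrates the idea and is not the project's playback code.

```python
import threading


def play_in_parallel(play, sfx_path, narration_path):
    """Start the sound effect and narration together; wait for both to finish."""
    threads = [
        threading.Thread(target=play, args=(path,))
        for path in (sfx_path, narration_path)
        if path  # the sound effect may be missing if no match was found
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(threads)
```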
Example Output
Screenshot saved: screenshots/screenshot_20251108_143022.png
Screenshot saved: screenshots/screenshot_20251108_143027.png
==================================================
Processing screenshots...
==================================================
Getting screenshots...
Describing changes...
Description: The user has switched from their code editor to a web browser,
apparently giving up on debugging to search Stack Overflow instead.
Generating funny narration...
Narration: And here we observe the developer in their natural habitat,
abandoning all hope of solving the problem themselves and turning to the
ancient wisdom of strangers on the internet. Truly magnificent.
Converting to speech...
Playing audio: screenshots/narration_20251108_143030.mp3
Documentation
- QUICKSTART_SFX.md: Quick start guide for sound effects (3 minutes!)
- SFX_INTEGRATION.md: Complete guide to the sound effects system
- IMPLEMENTATION_SUMMARY.md: Technical implementation details
- myinstants-api/README.md: MyInstants API documentation
Notes
- Cross-platform: Works on macOS, Windows, and Linux
- Costs: Gemini is free, and ElevenLabs has a free tier of 10k characters/month
- Gemini 2.5 Flash is super fast for vision tasks
- ElevenLabs voices are incredibly realistic and expressive
- MyInstants API provides free sound effects
- Press Ctrl+C to stop the client
Platform-specific details:
- macOS: Uses native screencapture and afplay commands
- Windows: Uses PIL for screenshots and pygame for audio
- Linux: Uses scrot/gnome-screenshot for screenshots and various audio players
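Choosing the capture command per platform might look like this. The flags are real (screencapture -x suppresses the shutter sound; gnome-screenshot -f names the output file), but preferring scrot over gnome-screenshot and the capture_command helper itself are assumptions. On Windows it returns None, leaving capture to PIL (for example ImageGrab.grab().save(path)).

```python
import platform
import shutil


def capture_command(path, system=None, which=shutil.which):
    """Return an argv list that saves a screenshot to `path`, or None to use PIL."""
    system = system or platform.system()
    path = str(path)
    if system == "Darwin":
        return ["screencapture", "-x", path]  # -x: don't play the shutter sound
    if system == "Linux":
        if which("scrot"):
            return ["scrot", path]
        return ["gnome-screenshot", "-f", path]
    return None  # Windows (and anything else): fall back to PIL's ImageGrab
```

A non-None result can be passed straight to subprocess.run(cmd, check=True).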
License & Attribution
This project uses sound effects from:
- MyInstants API by abdiputranar: https://github.com/abdipr/myinstants-api
- MyInstants.com: https://www.myinstants.com
Sound effects are obtained via web scraping from MyInstants.com. This project:
- Provides proper attribution to the MyInstants API and MyInstants.com
- Is used for non-commercial, educational, and entertainment purposes only
- Complies with the MyInstants API usage requirements
- Does not abuse the API for personal commercial benefits
If you use this project, please maintain this attribution and follow the same guidelines.
Customization
- Change INTERVAL in screenshot_client.py to adjust screenshot frequency
- Modify the narration prompt in mcp_server.py for different comedy styles
- Change the TTS voice in mcp_server.py (ElevenLabs voices: Adam, Antoni, Arnold, Bella, Domi, Elli, Josh, Rachel, Sam, and more)
- Customize the SFX selection logic in the get_sfx_for_narration() function to match different keywords
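Changing the voice ultimately means changing the voice ID sent to ElevenLabs. As a sketch that calls the public text-to-speech REST endpoint directly with urllib (the project may use the official SDK instead), assuming you look up the ID for a voice such as Rachel or Adam in your ElevenLabs voice library: the voice_id argument is a placeholder you fill in, and build_tts_request / synthesize are hypothetical names.

```python
import json
import urllib.request

TTS_URL = "https://api.elevenlabs.io/v1/text-to-speech/{voice_id}"


def build_tts_request(text, api_key, voice_id, model_id=None):
    """Build (but do not send) an ElevenLabs text-to-speech POST request."""
    body = {"text": text}
    if model_id:
        body["model_id"] = model_id  # optional, e.g. "eleven_multilingual_v2"
    return urllib.request.Request(
        TTS_URL.format(voice_id=voice_id),
        data=json.dumps(body).encode("utf-8"),
        headers={
            "xi-api-key": api_key,
            "Content-Type": "application/json",
            "Accept": "audio/mpeg",
        },
        method="POST",
    )


def synthesize(text, api_key, voice_id, out_path):
    """Send the request and write the returned MP3 bytes to out_path."""
    with urllib.request.urlopen(build_tts_request(text, api_key, voice_id)) as resp:
        with open(out_path, "wb") as f:
            f.write(resp.read())
    return out_path
```

Usage might look like synthesize("Truly magnificent.", os.environ["ELEVENLABS_API_KEY"], "<voice-id>", "narration.mp3").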
Sound Effects
The system uses AI to automatically select a fitting sound effect for each narration:
- AI Keyword Extraction: Gemini analyzes the narration and extracts the single best keyword for a sound effect (e.g., "crash", "laugh", "explosion", "oof", "bruh", "scream")
- MyInstants Search: Searches the MyInstants API with that keyword
- Random Selection: Picks a random sound from the top 10 results for variety
- Parallel Playback: Plays sound effect and narration simultaneously for perfect comedic timing
This keeps the effects varied: each narration gets a contextually chosen sound effect, picked at random from several matches!
Example keywords extracted by AI: crash, laugh, explosion, scream, bell, drum, bruh, oof, yikes, gasp, applause, horn, punch, falling, and many more!
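The keyword step and the random pick can be sketched as below. It assumes the prompt asks Gemini to reply with a JSON object containing "narration" and "sfx_keyword" fields (an assumed format, not documented here) and that MyInstants search results arrive as a list in ranked order. The fallback keyword "bruh" is an arbitrary choice from the examples above; parse_narration_response and pick_sound are hypothetical names.

```python
import json
import random
import re


def parse_narration_response(text, default_keyword="bruh"):
    """Extract (narration, sfx_keyword) from a model reply containing a JSON object."""
    match = re.search(r"\{.*\}", text, re.DOTALL)  # tolerate prose around the JSON
    try:
        data = json.loads(match.group(0)) if match else {}
    except json.JSONDecodeError:
        data = {}
    narration = data.get("narration") or text.strip()  # fall back to the raw reply
    keyword = str(data.get("sfx_keyword") or default_keyword).strip().lower()
    return narration, keyword


def pick_sound(results, top_k=10, rng=random):
    """Choose one of the first top_k search results at random (None if empty)."""
    candidates = results[:top_k]
    return rng.choice(candidates) if candidates else None
```

Getting both fields from one reply is what lets narration and keyword come out of a single API call.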
Credits
Sound effects are provided by:
- MyInstants API: https://github.com/abdipr/myinstants-api (by abdiputranar)
- MyInstants.com: https://www.myinstants.com (original sound library)
Sounds are obtained via web scraping from MyInstants.com. This project complies with the API's usage requirements by providing proper attribution and is used for non-commercial, educational purposes only.