Gemini Audio MCP

Gemini Audio MCP is a Model Context Protocol (MCP) server that uses the Gemini 2.0 Multimodal Live API to generate high-fidelity environmental soundscapes on demand.

🚀 Mission Statement

Our mission is to provide an immersive, AI-powered audio generation layer for any MCP-compatible environment, enabling the creation of dynamic, seamless, and high-quality environmental audio through simple text prompts.


✨ Key Features

  • 🌊 Dynamic Soundscapes: Generate complex environmental audio using the latest Gemini 2.5 Native Audio models.
  • 🎵 Professional Music: High-fidelity music production via Google's Lyria 3 models:
    • Lyria 3 Pro: Full song generation with structural coherence ($0.08/req).
    • Lyria 3 Clip: Low-latency clips and rhythmic loops ($0.04/req).
  • 🔁 Infinite Looping: Seamless, click-free looping with 100ms micro-crossfades.
  • 🔀 Smooth Crossfades: Transition between two different soundscapes with customizable crossfade durations.
  • 📂 Universal Formats: Export audio to a variety of formats (WAV, MP3, OGG, FLAC) powered by FFmpeg.
  • ▶️ Auto-play Integration: Instantly play generated audio through your system's default player upon completion.
  • ⚙️ Persistent Configuration: Fine-tune default bitrates, sample rates, and durations once and reuse them across sessions.
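
The looping feature above can be illustrated with a short sketch. This is not the server's actual algorithm (that lives in the Rust source); it is one common way to get a click-free loop, assuming mono floating-point samples in a plain list. The function name `make_seamless_loop` and its parameters are hypothetical.

```python
def make_seamless_loop(samples, sample_rate, fade_ms=100):
    """Return a shorter clip whose end flows into its start when repeated.

    The last `fade_ms` of audio is blended into the first `fade_ms`,
    so the loop point no longer jumps between unrelated sample values.
    """
    n = int(sample_rate * fade_ms / 1000)  # fade length in samples
    if n == 0 or len(samples) < 2 * n:
        return list(samples)  # too short to crossfade; return unchanged

    head = samples[:n]
    tail = samples[-n:]
    blended = []
    for i in range(n):
        t = i / n  # ramps from 0 towards 1 across the fade
        # Start on the tail (continuing the clip's end), finish on the head.
        blended.append(tail[i] * (1.0 - t) + head[i] * t)

    # The blended region replaces the head; the raw tail is dropped.
    return blended + list(samples[n:-n])


ramp = [float(i) for i in range(1000)]
loop = make_seamless_loop(ramp, sample_rate=1000, fade_ms=100)
```

In this sketch the output is one fade length shorter than the input, which is the usual trade-off for this kind of loop-point crossfade.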

🛠 Installation Guide

Prerequisites

  1. FFmpeg: Required for audio conversion and processing.
    • macOS: brew install ffmpeg
    • Ubuntu/Debian: sudo apt install ffmpeg
    • Windows: Download from ffmpeg.org.
  2. Rust Toolchain: Required only for the manual installation below; provides cargo for building the project.
  3. Gemini API Key: Obtain your key from Google AI Studio.

1. NPM / NPX (Recommended for non-Rust users)

Add the server directly to your MCP client configuration using npx:

{
  "mcpServers": {
    "gemini-audio": {
      "command": "npx",
      "args": ["-y", "gemini-audio-mcp"],
      "env": {
        "GEMINI_API_KEY": "YOUR_API_KEY"
      }
    }
  }
}

2. Manual Installation (Rust)

  1. Clone the repository:
    git clone https://github.com/mcp-servers/gemini-audio-mcp.git
    cd gemini-audio-mcp
    
  2. Build the project:
    cargo build --release
    
  3. Configure your environment: Set the GEMINI_API_KEY environment variable in your MCP client or system.

3. Docker (Cloud / Self-hosted)

The server is available as a Docker image for easy deployment:

docker run -it \
  -e GEMINI_API_KEY="YOUR_API_KEY" \
  -v gemini-audio-data:/root/.local/share/gemini-audio-mcp \
  ghcr.io/jxoesneon/gemini-audio-mcp:latest

To use it in your MCP client configuration:

{
  "mcpServers": {
    "gemini-audio-docker": {
      "command": "docker",
      "args": [
        "run", "-i", "--rm",
        "-e", "GEMINI_API_KEY=YOUR_API_KEY",
        "ghcr.io/jxoesneon/gemini-audio-mcp:latest"
      ]
    }
  }
}
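
The client configuration above omits the volume mount used in the `docker run` example, so with `--rm` any data the server writes inside the container is discarded when it exits. The sketch below generates an equivalent entry that keeps the mount. It assumes the container path from the `docker run` example is where the server stores its persistent data; the helper name `docker_server_entry` is hypothetical.

```python
import json


def docker_server_entry(api_key, volume="gemini-audio-data"):
    """Build an MCP client entry that runs the server in Docker with a named volume."""
    # Container path copied from the docker run example; assumed to hold persistent data.
    mount = f"{volume}:/root/.local/share/gemini-audio-mcp"
    return {
        "command": "docker",
        "args": [
            "run", "-i", "--rm",
            "-e", f"GEMINI_API_KEY={api_key}",
            "-v", mount,
            "ghcr.io/jxoesneon/gemini-audio-mcp:latest",
        ],
    }


config = {"mcpServers": {"gemini-audio-docker": docker_server_entry("YOUR_API_KEY")}}
print(json.dumps(config, indent=2))
```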

🎮 Use Cases for Game Developers & Creators

Gemini Audio MCP is designed to integrate seamlessly into modern creative workflows, particularly for those using Unreal Engine 5, Godot, or Blender:

  • 🎲 Procedural Soundscapes: Generate unique, non-repeating environmental audio for open-world games or dynamic levels.
  • 🗣️ Dynamic Character Dialogue: Use generate_voice with expressive direction to prototype character lines or create infinite NPC dialogue for RPGs.
  • 🎥 Automated Sound Design: Perfect for Blender artists looking to generate high-quality foley and background textures for animations directly through an AI-assisted pipeline.
  • ⚡ Rapid Prototyping: Instantly generate rhythmic loops and musical stings for game jams or early-stage development.

🔧 Tool Usage Examples

Generate a Soundscape

Create an immersive 30-second soundscape of a rainy cyberpunk city, export it as MP3, and play it on completion.

{
  "name": "generate_soundscape",
  "arguments": {
    "prompt": "Heavy rain on neon-lit cyberpunk city streets, distant hover-car hums, muffled holographic advertisements.",
    "duration": 30,
    "format": "mp3",
    "auto_play": true
  }
}
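
MCP clients normally send this payload for you. For anyone scripting against the server directly, the sketch below shows how the tool name and arguments are wrapped in a JSON-RPC 2.0 `tools/call` request, as defined by the MCP specification. The helper name `build_tool_call` is hypothetical, and the sketch skips the initialization handshake that must happen before any tool call.

```python
import json


def build_tool_call(request_id, name, arguments):
    """Wrap a tool invocation in the JSON-RPC 2.0 envelope used by MCP."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


request = build_tool_call(
    1,
    "generate_soundscape",
    {
        "prompt": "Heavy rain on neon-lit cyberpunk city streets",
        "duration": 30,
        "format": "mp3",
        "auto_play": True,
    },
)

# Over the stdio transport each message is a single line of JSON.
line = json.dumps(request)
```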

Transition Between Environments

Seamlessly shift from a peaceful forest to a roaring thunderstorm.

{
  "name": "transition_soundscape",
  "arguments": {
    "from_prompt": "Quiet morning forest with chirping birds and rustling leaves.",
    "to_prompt": "Intense tropical thunderstorm with loud thunder claps and heavy downpour.",
    "transition_duration": 10,
    "auto_play": true
  }
}
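
To make the idea of a transition concrete, here is a minimal equal-power crossfade between two clips. It is a sketch only: the blending curve the server actually uses is not documented here, and the function name, parameters, and mono list-of-floats representation are all assumptions.

```python
import math


def crossfade(a, b, sample_rate, fade_seconds):
    """Join clip `a` to clip `b`, overlapping them for `fade_seconds`."""
    n = min(int(sample_rate * fade_seconds), len(a), len(b))
    if n == 0:
        return list(a) + list(b)  # nothing to overlap; plain concatenation

    mixed = []
    for i in range(n):
        t = i / n
        # Equal-power gains: the squared gains sum to 1, which keeps
        # perceived loudness steadier than a linear fade.
        gain_out = math.cos(t * math.pi / 2)
        gain_in = math.sin(t * math.pi / 2)
        mixed.append(a[len(a) - n + i] * gain_out + b[i] * gain_in)

    return list(a[:len(a) - n]) + mixed + list(b[n:])


forest = [1.0] * 10
storm = [0.0] * 10
result = crossfade(forest, storm, sample_rate=1, fade_seconds=4)
```

The combined clip is shorter than the two inputs laid end to end, because the fade region is shared between them.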

Update Server Defaults

Set the default output format to FLAC and the default sample rate to 48 kHz for higher-quality output.

{
  "name": "configure",
  "arguments": {
    "default_format": "flac",
    "default_sample_rate": 48000
  }
}
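
One way to picture persistent defaults is a JSON file that is read, merged with the new values, and written back. The sketch below illustrates that pattern only; the server's real file location, schema, and built-in defaults are not documented here, so `DEFAULTS` and `update_config` are hypothetical.

```python
import json
from pathlib import Path

# Hypothetical built-in defaults, used when no config file exists yet.
DEFAULTS = {"default_format": "wav", "default_sample_rate": 44100}


def update_config(path, updates):
    """Merge `updates` into the stored config and write the result back."""
    path = Path(path)
    config = dict(DEFAULTS)
    if path.exists():
        config.update(json.loads(path.read_text()))  # previously saved values
    config.update(updates)  # the newest values win
    path.write_text(json.dumps(config, indent=2))
    return config
```

Because each call starts from the saved file, settings changed in one session remain in effect in the next.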

🏛 Architecture Overview

The server is built with a modular Rust architecture designed for efficiency and reliability:

  • main.rs: The core MCP protocol engine handling tool registration and request dispatching.
  • gemini.rs: Manages low-level WebSocket communication with the Gemini 2.0 Multimodal Live API.
  • audio.rs: Handles PCM data manipulation, including seamless looping algorithms and FFmpeg integration for format transcoding.
  • mixer.rs: Implements audio processing logic for crossfading and blending multiple audio streams.
  • config.rs: Provides a persistent JSON-based configuration layer for user preferences.
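
The transcoding step handled by audio.rs can be approximated from the command line. The sketch below builds an FFmpeg command that converts raw PCM to a target format. It assumes the raw audio is 16-bit little-endian mono at 24 kHz; verify this against the actual stream before relying on it. The function name and file paths are hypothetical.

```python
def ffmpeg_transcode_command(pcm_path, out_path, sample_rate=24000, channels=1):
    """Build an FFmpeg command that converts raw PCM into the format implied by out_path."""
    return [
        "ffmpeg",
        "-y",                     # overwrite the output file if it exists
        "-f", "s16le",            # input is headerless 16-bit little-endian PCM (assumed)
        "-ar", str(sample_rate),  # input sample rate in Hz (assumed 24 kHz)
        "-ac", str(channels),     # input channel count (assumed mono)
        "-i", pcm_path,
        out_path,                 # FFmpeg picks the encoder from the file extension
    ]


command = ffmpeg_transcode_command("soundscape.pcm", "soundscape.flac")
```

Passing the list to `subprocess.run(command, check=True)` would run the conversion, provided FFmpeg is installed and on the PATH.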

📄 License

Distributed under the MIT License. See LICENSE for more information.
