Gemini Multimodal Audio Upload
This project provides a Model Context Protocol (MCP) server that enables audio analysis using Google's Gemini models. It allows you to upload audio files, provide optional context (JSON), and receive detailed analysis based on your prompts.
Features
- Audio Analysis: Upload and analyze audio files (WAV, MP3, etc.) using Google Gemini.
- Multimodal Context: Support for providing additional context via JSON files or strings.
- System Instructions: Ability to provide system instructions (e.g., "Gem" definitions) to guide the model's behavior.
- MCP Server: Exposes functionality as an MCP tool, making it compatible with MCP clients like Claude Desktop or VS Code extensions.
Prerequisites
- Python 3.10 or higher
- A Google Cloud Project with the Gemini API enabled.
- An API key for the Gemini API.
Installation
- Clone the repository:

  git clone https://github.com/unscene/gemini-audio-upload.git
  cd gemini-audio-upload

- Install dependencies with uv:

  uv sync
Configuration
- Create a .env file in the root directory:

  cp .env.example .env  # If .env.example exists, otherwise create a new file

- Add your Google API key to the .env file:

  GOOGLE_API_KEY=your_api_key_here
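As a quick sanity check before launching the server, you can confirm the key is actually visible to the process. This snippet is illustrative and not part of the project; it assumes GOOGLE_API_KEY has been exported in your shell or loaded from .env by your tooling:

```python
import os

# Check whether the key is visible to the current process.
api_key = os.environ.get("GOOGLE_API_KEY", "")
if api_key:
    print("GOOGLE_API_KEY found.")
else:
    print("GOOGLE_API_KEY is not set; add it to .env or export it.")
```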
Usage
Running the MCP Server
You can run the MCP server directly using uv:
uv run gemini_audio/mcp_server.py
However, it is typically run by an MCP client.
MCP Tool: analyze_audio
The server exposes a single tool: analyze_audio.
Arguments:
- audio_path (string, required): The absolute path to the audio file you want to analyze.
- prompt (string, optional): The prompt to guide the analysis. Default: "Describe this audio."
- json_path (string, optional): Path to a JSON file containing context data.
- json_context (string, optional): A JSON string containing context data (overrides json_path).
- instruction_file (string, optional): Path to a text file containing system instructions.
- model (string, optional): The Gemini model to use. Default: "gemini-1.5-pro".
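The precedence between json_context and json_path can be sketched as a small helper. This is not the server's actual implementation, just an illustration of the documented behavior (the function name and the meeting.json path are hypothetical):

```python
import json

def load_context(json_path=None, json_context=None):
    """Resolve the optional context arguments: an inline JSON string
    takes precedence over a JSON file path (illustrative only)."""
    if json_context is not None:
        return json.loads(json_context)
    if json_path is not None:
        with open(json_path, "r", encoding="utf-8") as f:
            return json.load(f)
    return None

# The inline string wins, so the file path is never opened:
ctx = load_context(json_path="meeting.json",
                   json_context='{"speaker": "Alice"}')
# ctx == {"speaker": "Alice"}
```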
Example Usage (Conceptual)
If you are using an MCP client, you might ask:
"Analyze the audio file at C:\path\to\recording.wav and tell me if the speaker sounds happy."
The client would call the analyze_audio tool with:
- audio_path: C:\path\to\recording.wav
- prompt: "Tell me if the speaker sounds happy."
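On the wire, MCP tool invocations use the standard JSON-RPC tools/call method. The request the client would send for the example above might look like this (the id value and indentation are arbitrary):

```python
import json

# JSON-RPC payload an MCP client would send to invoke analyze_audio.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "analyze_audio",
        "arguments": {
            "audio_path": "C:\\path\\to\\recording.wav",
            "prompt": "Tell me if the speaker sounds happy.",
        },
    },
}
print(json.dumps(request, indent=2))
```

Optional arguments such as json_context or model are simply added to the arguments object.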
Client Configuration
Claude Desktop App
To use this server with the Claude Desktop App, add the following configuration to your claude_desktop_config.json file.
Windows Location: %APPDATA%\Claude\claude_desktop_config.json
macOS Location: ~/Library/Application Support/Claude/claude_desktop_config.json
{
"mcpServers": {
"gemini-audio": {
"command": "uv",
"args": [
"--directory",
"/absolute/path/to/gemini-audio-upload",
"run",
"gemini_audio/mcp_server.py"
],
"env": {
"GOOGLE_API_KEY": "your_api_key_here"
}
}
}
}
Note: Replace /absolute/path/to/gemini-audio-upload with the actual path to where you cloned this repository. Alternatively, you can set GOOGLE_API_KEY in the project's .env file instead of in the config JSON, provided uv loads it correctly (or you invoke the Python executable directly).
VS Code (MCP Extension)
If you are using an MCP extension in VS Code (like the official "Model Context Protocol" extension), you can typically configure it in your VS Code settings.json:
"mcp.servers": {
"gemini-audio": {
"command": "uv",
"args": [
"--directory",
"C:\\absolute\\path\\to\\gemini-audio-upload",
"run",
"gemini_audio/mcp_server.py"
],
"env": {
"GOOGLE_API_KEY": "your_api_key_here"
}
}
}
License