MCP Audio RAG Server

Enables transcription of audio files using Google Gemini AI and creates a searchable knowledge base, allowing users to ask natural language questions about content in meetings, podcasts, lectures, and other audio recordings.

Transform your audio files into a searchable knowledge base using AI. Ask Claude questions about your meetings, podcasts, lectures, or any audio content.

<p align="center"> <a href="https://www.buymeacoffee.com/matheusslg" target="_blank"> <img src="https://cdn.buymeacoffee.com/buttons/v2/default-yellow.png" alt="Buy Me A Coffee" height="50"> </a> </p>

What is this?

This is an MCP (Model Context Protocol) server that lets you:

  1. Transcribe any audio file using Google's Gemini AI
  2. Store the transcriptions in a searchable database
  3. Search through all your audio content using natural language

Once set up, you can simply ask Claude things like:

  • "What did they discuss about the budget in my meeting recording?"
  • "Find mentions of machine learning in my podcast collection"
  • "What were the key points from yesterday's lecture?"

How It Works

┌─────────────┐     ┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│ Audio File  │ ──▶ │   Gemini    │ ──▶ │  Chunking   │ ──▶ │  Supabase   │
│ (.mp3, etc) │     │ Transcribe  │     │ + Embedding │     │  (pgvector) │
└─────────────┘     └─────────────┘     └─────────────┘     └─────────────┘
                                                                   │
┌─────────────┐     ┌─────────────┐     ┌─────────────┐            │
│   Claude    │ ◀── │   Results   │ ◀── │   Search    │ ◀──────────┘
│  Response   │     │ + Snippets  │     │   Query     │
└─────────────┘     └─────────────┘     └─────────────┘
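
The chunking stage in the diagram can be pictured with a small sketch. This README does not state the project's chunk size, overlap, or splitting rule, so the function name and the default values below are assumptions for illustration only.

```typescript
// Hypothetical sketch: split a transcript into overlapping character windows.
// chunkSize and overlap are illustrative defaults, not the project's settings.
function chunkTranscript(text: string, chunkSize = 1000, overlap = 200): string[] {
  if (chunkSize <= 0) throw new Error("chunkSize must be positive");
  if (overlap < 0 || overlap >= chunkSize) {
    throw new Error("overlap must be in [0, chunkSize)");
  }
  const chunks: string[] = [];
  const step = chunkSize - overlap; // how far each window advances
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // this window reached the end
  }
  return chunks;
}
```

Overlapping windows keep a sentence that straddles a boundary retrievable from at least one chunk, at the cost of storing some text twice.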

Quick Start

Prerequisites

  • Node.js and npm (used by the install and run commands below)
  • A Supabase account
  • A Google Gemini API key

Step 1: Clone & Install

git clone https://github.com/matheusslg/mcp-audio-rag.git
cd mcp-audio-rag
npm install

Step 2: Set Up Supabase Database

  1. Create a new project at supabase.com
  2. Go to SQL Editor in your dashboard
  3. Paste and run the contents of supabase/schema.sql

Step 3: Get Your API Keys

Supabase (Settings → API):

  • Copy Project URL → SUPABASE_URL
  • Copy service_role key → SUPABASE_SERVICE_KEY

Google AI Studio:

  • Create an API key → GEMINI_API_KEY

Step 4: Configure

cp .env.example .env

Edit .env:

GEMINI_API_KEY=your-key-here
SUPABASE_URL=https://your-project.supabase.co
SUPABASE_SERVICE_KEY=your-service-role-key
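
All three variables are required. As a rough sketch of how a startup check might report missing ones (the helper name `findMissingEnv` is hypothetical, and the server's own validation may differ):

```typescript
// The three keys this README asks you to configure.
const REQUIRED_KEYS = ["GEMINI_API_KEY", "SUPABASE_URL", "SUPABASE_SERVICE_KEY"] as const;

// Hypothetical helper: returns the names of required keys that are unset or blank.
function findMissingEnv(env: Record<string, string | undefined>): string[] {
  return REQUIRED_KEYS.filter((key) => (env[key] ?? "").trim() === "");
}

// Example with an incomplete configuration.
const missing = findMissingEnv({ GEMINI_API_KEY: "your-key-here", SUPABASE_URL: "" });
console.log(missing); // lists SUPABASE_URL and SUPABASE_SERVICE_KEY
```

In a real Node.js process you would pass `process.env` instead of the literal object.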

Step 5: Add to Claude

For Claude Code CLI (~/.claude.json):

{
  "mcpServers": {
    "audio-rag": {
      "command": "npx",
      "args": ["tsx", "/full/path/to/mcp-audio-rag/src/server.ts"],
      "env": {
        "GEMINI_API_KEY": "your-key",
        "SUPABASE_URL": "https://your-project.supabase.co",
        "SUPABASE_SERVICE_KEY": "your-service-role-key"
      }
    }
  }
}

For Claude Desktop (~/Library/Application Support/Claude/claude_desktop_config.json on Mac):

Use the same mcpServers configuration as in the Claude Code example above.

Usage

Transcribe Audio

Just tell Claude to transcribe a file:

Transcribe /path/to/meeting.mp3

Want to use a specific model? Just ask:

Transcribe /path/to/lecture.m4a using gemini-2.5-pro

Search Your Audio

Ask natural questions:

What did they say about the project timeline?
Search for mentions of "budget" in my recordings
Find discussions about AI in my podcasts
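
Behind a search, the query is presumably embedded and compared against the stored chunk embeddings, with pgvector doing the comparison inside Supabase. The in-memory sketch below only illustrates ranking by cosine similarity, one common metric for this. The `Chunk` type and function names are hypothetical, and this README does not say which metric the server uses.

```typescript
// Hypothetical shape of a stored transcript chunk.
type Chunk = { text: string; embedding: number[] };

// Cosine similarity: dot product divided by the product of vector lengths.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vectors must have equal length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid dividing by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the k chunks most similar to the query embedding, best first.
function topMatches(query: number[], chunks: Chunk[], k = 3): Chunk[] {
  return [...chunks]
    .sort((x, y) => cosineSimilarity(query, y.embedding) - cosineSimilarity(query, x.embedding))
    .slice(0, k);
}
```

A database-side index avoids scanning every chunk the way this sketch does, which matters once the library grows.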

Manage Your Library

List all my transcribed audio files
Delete the recording from last week
Get the full transcript of meeting.mp3
Summarize the podcast episode

Available Models

Model                    Best For
gemini-2.5-flash         Default - Fast & accurate, great balance
gemini-2.5-flash-lite    Fastest, cheapest - good for bulk processing
gemini-2.5-pro           Best quality - complex audio, multiple speakers
gemini-3-pro-preview     Newest - cutting edge capabilities
gemini-2.0-flash         Reliable - previous generation
gemini-2.0-flash-lite    Fast - previous generation
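
A minimal sketch of how a requested model name could be checked against this list, falling back to the default when none is given. The function name `resolveModel` is hypothetical, and rejecting unknown names is an assumption; the server may handle them differently.

```typescript
// Model names taken from the table above.
const MODELS = [
  "gemini-2.5-flash",
  "gemini-2.5-flash-lite",
  "gemini-2.5-pro",
  "gemini-3-pro-preview",
  "gemini-2.0-flash",
  "gemini-2.0-flash-lite",
] as const;

type ModelName = (typeof MODELS)[number];

// The table marks gemini-2.5-flash as the default.
const DEFAULT_MODEL: ModelName = "gemini-2.5-flash";

// Hypothetical helper: pick the default when nothing is requested,
// and reject names that are not in the list (an assumed policy).
function resolveModel(requested?: string): ModelName {
  if (requested === undefined || requested.trim() === "") return DEFAULT_MODEL;
  const wanted = requested.trim();
  const match = MODELS.find((m) => m === wanted);
  if (!match) {
    throw new Error(`Unknown model "${wanted}". Expected one of: ${MODELS.join(", ")}`);
  }
  return match;
}
```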

Supported Audio Formats

.mp3 .mp4 .m4a .wav .webm .mpeg .mpga
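
If you want to pre-filter files before asking Claude to transcribe them, a check along these lines may help. It is a sketch based only on the extension list above; `isSupportedAudio` is a hypothetical helper, and the server may validate files differently (for example by MIME type).

```typescript
// Extensions listed in this README.
const SUPPORTED_EXTENSIONS = new Set([".mp3", ".mp4", ".m4a", ".wav", ".webm", ".mpeg", ".mpga"]);

// Hypothetical helper: true when the file name ends in a listed extension.
function isSupportedAudio(filePath: string): boolean {
  // Take the last path segment, accepting both / and \ separators.
  const fileName = filePath.split(/[\\/]/).pop() ?? "";
  const dot = fileName.lastIndexOf(".");
  if (dot <= 0) return false; // no extension, or a dotfile such as ".mp3"
  return SUPPORTED_EXTENSIONS.has(fileName.slice(dot).toLowerCase());
}
```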

Available Tools

Tool                   Description
ingest_audio           Transcribe and store an audio file
search_transcripts     Search through your audio using natural language
list_transcripts       List all transcribed audio files
get_full_transcript    Get the complete transcript of a file
summarize_audio        Generate an AI summary of a transcript
delete_transcript      Remove a transcribed file from the database
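
Claude calls these tools for you, but it can help to see roughly what a call looks like on the wire. MCP tool invocations are JSON-RPC requests with the method `tools/call`. The sketch below builds one; the `query` argument name is an assumption, since this README does not list each tool's input schema.

```typescript
// Shape of an MCP tool invocation (a JSON-RPC 2.0 request).
type ToolCallRequest = {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
};

// Hypothetical helper that assembles the request object.
function buildToolCall(id: number, name: string, args: Record<string, unknown>): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// "query" is an assumed argument name; check the server's tool schema for the real one.
const request = buildToolCall(1, "search_transcripts", { query: "project timeline" });
console.log(JSON.stringify(request, null, 2));
```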

Troubleshooting

Problem                           Solution
"No relevant segments found"      Try rephrasing your search, or check that the audio was ingested
"Missing environment variable"    Check that your .env file or Claude config has all 3 keys
Supabase errors                   Make sure you're using the service_role key, not the anon key
Slow transcription                Use gemini-2.5-flash-lite for faster processing

Support This Project

If this project saved you time or helped you out, consider buying me a coffee!

<a href="https://www.buymeacoffee.com/matheusslg" target="_blank"> <img src="https://cdn.buymeacoffee.com/buttons/v2/default-yellow.png" alt="Buy Me A Coffee" height="50"> </a>

License

MIT - Use it however you want!


<p align="center"> Made with Gemini + Supabase + Claude </p>
