AI Sound MCP Server
Enables AI assistants to programmatically edit, analyze, and export audio projects through MCP tools, including multi-track editing, effects, transcription, and semantic search.
<p align="center"> <img src="docs/header.png" alt="AI Sound — AI-Native Audio Editor" width="100%"> </p>
<p align="center"> <strong>An AI-native audio editor that works with any OpenAI-compatible LLM.</strong> </p>
<p align="center"> <img src="https://img.shields.io/badge/runtime-Bun-f9f1e1?logo=bun" alt="Bun"> <img src="https://img.shields.io/badge/React-19-61dafb?logo=react" alt="React 19"> <img src="https://img.shields.io/badge/MCP-compatible-blue" alt="MCP Compatible"> <img src="https://img.shields.io/badge/license-MIT-green" alt="MIT License"> </p>
<p align="center"> <img src="assets/ai-audio-screenshot.png" alt="AI Sound Screenshot" width="90%"> </p>
What is AI Sound?
AI Sound is an AI-native audio editor designed as a modern replacement for desktop tools like Audacity. Instead of bolting AI onto an existing app, AI Sound is built from the ground up with LLM integration at its core — enabling conversational audio editing, automatic transcription, speaker diarization, and semantic search across your audio content.
It works with any OpenAI-compatible API — run it fully local with Ollama, or connect to OpenAI, Groq, or any other compatible provider. No vendor lock-in, no API keys required for local use.
AI Sound also exposes a full MCP (Model Context Protocol) server, letting AI assistants like Claude Desktop directly edit, analyze, and export your audio projects.
Key Features
- Multi-track editing — import, arrange, and mix multiple audio tracks
- AI-powered transcription — speech-to-text with word-level timestamps
- Speaker diarization — automatically identify and split by speaker
- Semantic search — find content by meaning, not just keywords
- Audio effects — normalize, compress, EQ, reverb, noise reduction, fade, pitch shift, speed
- Non-destructive editing — full undo/redo history
- Export — export individual tracks or full project mixes
- MCP integration — expose all editing tools to AI assistants
Quickstart
Prerequisites
- Bun — JavaScript/TypeScript runtime
- FFmpeg — audio processing (brew install ffmpeg on macOS)
- An OpenAI-compatible LLM — Ollama for local, or any cloud provider
Install & Run
git clone https://github.com/your-username/ai-sound.git
cd ai-sound
bun install
bun run dev
Open http://localhost:5175 in your browser.
On first launch, configure your LLM provider in Settings (gear icon). The default is Ollama at localhost:11434.
LLM Configuration
AI Sound works with any OpenAI-compatible API. Configure your provider in-app via Settings — no .env files needed.
| Provider | Base URL | Model Example |
|---|---|---|
| Ollama (local) | http://localhost:11434/v1 | llama3.2 |
| OpenAI | https://api.openai.com/v1 | gpt-4o |
| Anthropic (via proxy) | provider-specific | claude-sonnet-4-20250514 |
| Groq | https://api.groq.com/openai/v1 | llama-3.3-70b |
For cloud providers, enter your API key in the Settings panel. For Ollama, no API key is needed.
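To illustrate what "OpenAI-compatible" means in practice, here is a minimal sketch of the request a client would send to any of the providers in the table. It is not AI Sound's actual client code: `LlmSettings` and `buildChatRequest` are hypothetical names, and the sketch assumes the provider serves the standard `/chat/completions` route under its base URL.

```typescript
// Hypothetical settings shape; AI Sound's real settings schema is not documented here.
interface LlmSettings {
  baseUrl: string; // e.g. "http://localhost:11434/v1"
  model: string;   // e.g. "llama3.2"
  apiKey?: string; // left out for local Ollama
}

interface ChatRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Build (but do not send) a chat completion request for an OpenAI-compatible API.
function buildChatRequest(settings: LlmSettings, prompt: string): ChatRequest {
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  if (settings.apiKey) {
    // Cloud providers expect the key as a bearer token.
    headers["Authorization"] = `Bearer ${settings.apiKey}`;
  }
  return {
    // Strip trailing slashes so the route is appended exactly once.
    url: `${settings.baseUrl.replace(/\/+$/, "")}/chat/completions`,
    init: {
      method: "POST",
      headers,
      body: JSON.stringify({
        model: settings.model,
        messages: [{ role: "user", content: prompt }],
      }),
    },
  };
}

const ollama = buildChatRequest(
  { baseUrl: "http://localhost:11434/v1", model: "llama3.2" },
  "Summarize this transcript.",
);
// To send it: const res = await fetch(ollama.url, ollama.init);
```

Only the base URL, model name, and optional key change between providers, which is why switching from Ollama to a cloud provider is a settings change rather than a code change.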
MCP Server
AI Sound includes a built-in Model Context Protocol server, allowing AI assistants like Claude Desktop to interact with your audio projects programmatically.
Claude Desktop Configuration
Add the following to your Claude Desktop config (~/Library/Application Support/Claude/claude_desktop_config.json):
{
  "mcpServers": {
    "ai-sound": {
      "command": "bun",
      "args": ["run", "/absolute/path/to/ai-sound/server/lib/mcp/server.ts"]
    }
  }
}
Replace /absolute/path/to/ai-sound with the actual path to your installation.
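If you would rather generate that entry than edit the path by hand, a small script can do it. This is a sketch only: `buildClaudeConfig` is a hypothetical helper, and `/Users/me/code/ai-sound` is a placeholder install directory, not a real default.

```typescript
import { posix } from "node:path";

// Build the "mcpServers" entry from the config above for a given install directory.
// posix.join is used because the documented config path is a macOS one.
function buildClaudeConfig(installDir: string) {
  return {
    mcpServers: {
      "ai-sound": {
        command: "bun",
        args: ["run", posix.join(installDir, "server", "lib", "mcp", "server.ts")],
      },
    },
  };
}

// Placeholder path; substitute the directory where you cloned the repository.
const config = buildClaudeConfig("/Users/me/code/ai-sound");
console.log(JSON.stringify(config, null, 2));
```

Merge the printed object into any existing `mcpServers` section instead of overwriting the file, so other configured servers are kept.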
Available Tools
Project Management
- list_projects — List all projects
- set_active_project — Set the active project for subsequent operations
- get_project_status — Get all tracks, durations, regions, and transcriptions
- get_track_info — Get detailed info about a specific track
Audio Effects & Processing
- normalize_audio — Normalize audio levels
- adjust_volume — Adjust volume by relative dB amount
- trim_audio — Trim to a specific time range
- remove_silence — Detect and remove silent sections
- apply_fade — Apply fade in/out
- apply_effect — Apply effects: noise reduction, compressor, EQ, reverb, speed, pitch shift
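Since adjust_volume takes a relative amount in decibels, it helps to know what a given value does to the signal. This README does not describe how the server applies the change, but the standard conversion from a dB change to a linear amplitude factor is 10^(dB/20). A minimal sketch, with `dbToGain` and `applyGain` as hypothetical helpers that are not part of AI Sound's API:

```typescript
// Convert a relative dB change into a linear amplitude multiplier.
function dbToGain(db: number): number {
  return Math.pow(10, db / 20);
}

// Scale raw samples in the range [-1, 1], clamping so the result cannot exceed full scale.
function applyGain(samples: Float32Array, db: number): Float32Array {
  const gain = dbToGain(db);
  return samples.map((s) => Math.max(-1, Math.min(1, s * gain)));
}

// +6 dB roughly doubles the amplitude; -6 dB roughly halves it.
const louder = applyGain(new Float32Array([0.1, -0.25, 0.5]), 6);
```

Clamping is the simplest way to keep samples in range, but it distorts peaks; large boosts are usually better handled by normalizing or compressing first.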
Segment Operations
- remove_segments — Remove multiple time ranges from a track
- replace_audio_segment — Replace a time range with silence or a beep
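Removing several time ranges in one call raises the question of what happens when the ranges are unsorted or overlap. The sketch below shows one way to normalize a list of ranges and derive the spans of audio that remain. It illustrates the idea with hypothetical names (`TimeRange`, `mergeRanges`, `keptRanges`) and is not the server's actual logic.

```typescript
interface TimeRange { start: number; end: number } // seconds

// Sort ranges by start time and merge any that overlap or touch.
function mergeRanges(ranges: TimeRange[]): TimeRange[] {
  const sorted = [...ranges].sort((a, b) => a.start - b.start);
  const merged: TimeRange[] = [];
  for (const r of sorted) {
    const last = merged[merged.length - 1];
    if (last && r.start <= last.end) {
      last.end = Math.max(last.end, r.end);
    } else {
      merged.push({ ...r });
    }
  }
  return merged;
}

// Given the track duration and the ranges to remove, return the ranges that remain.
function keptRanges(duration: number, remove: TimeRange[]): TimeRange[] {
  const kept: TimeRange[] = [];
  let cursor = 0;
  for (const r of mergeRanges(remove)) {
    // Clamp each range to the track so out-of-bounds input cannot produce bogus spans.
    const start = Math.min(duration, Math.max(0, r.start));
    const end = Math.max(0, Math.min(duration, r.end));
    if (start > cursor) kept.push({ start: cursor, end: start });
    cursor = Math.max(cursor, end);
  }
  if (cursor < duration) kept.push({ start: cursor, end: duration });
  return kept;
}

const kept = keptRanges(60, [
  { start: 30, end: 40 },
  { start: 5, end: 10 },
  { start: 8, end: 12 },
]);
// kept covers 0-5 s, 12-30 s, and 40-60 s
```

Working from the kept spans rather than deleting ranges one at a time avoids the timestamp drift that occurs when each removal shifts everything after it.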
Track Operations
- rename_track — Rename a track
- delete_track — Delete a track and its audio
- merge_tracks — Merge multiple tracks into one
- duplicate_track — Duplicate a track
- export_audio — Export a track or full project mix
Transcription & Search
- transcribe_track — Transcribe audio using speech-to-text
- split_by_speaker — Split a track by speaker
- rename_speaker — Rename a speaker label
- search_transcription — Search transcription text by pattern
- search_transcript_semantic — Semantic search across transcriptions
- copy_transcriptions — Copy transcription data between tracks
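MCP clients invoke these tools with JSON-RPC 2.0 `tools/call` requests. The sketch below builds such messages by hand to show their shape; in practice a client such as Claude Desktop or the MCP SDK does this for you. The tool names come from the lists above, but the argument names (`projectId`, `trackId`) and their values are assumptions, because this README does not document each tool's input schema.

```typescript
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

let nextId = 1;

// Build a JSON-RPC request that asks an MCP server to run one tool.
function toolCall(name: string, args: Record<string, unknown> = {}): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id: nextId++,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// Hypothetical arguments; ask the server for the real schemas via `tools/list`.
const requests = [
  toolCall("list_projects"),
  toolCall("set_active_project", { projectId: "demo" }),
  toolCall("normalize_audio", { trackId: "track-1" }),
];

// Over the stdio transport, each message is written as a single line of JSON.
const lines = requests.map((r) => JSON.stringify(r));
```

Setting the active project first mirrors the tool description above, which says it applies to subsequent operations.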
Tech Stack
| Layer | Technology |
|---|---|
| Runtime | Bun |
| Server | Hono |
| Database | SQLite via Drizzle ORM |
| Frontend | React 19 + Tailwind CSS 4 |
| Audio | wavesurfer.js + FFmpeg |
| AI Integration | OpenAI-compatible API + MCP SDK |
License
MIT