Augent

MCP server that turns any audio or video source into structured, searchable intelligence for agents, enabling download, transcription, semantic search, speaker identification, and more.

Augent — The Audio Layer for Agents

The wormhole stays open. Fully local. Fully private.

<a href="#mcp-tools">MCP Tools</a> · <a href="#cli">CLI</a> · <a href="#claude-code-skill">Claude Code Skill</a> · <a href="#openclaw">OpenClaw</a> · <a href="#web-ui">Web UI</a> · <a href="https://augent.app">Website</a> · <a href="https://docs.augent.app">Docs</a> · <a href="CHANGELOG.md">Changelog</a> · <a href="mailto:hello@augent.app">Contact</a>

If the answer is trapped in audio or video, this is the way through.

Augent turns any audio or video source into structured, searchable intelligence for agents. Give it URLs or files. It downloads, transcribes, indexes, and stores everything in persistent memory. Search by keyword or meaning, find where concepts intersect, identify speakers, generate chapters and notes, batch process entire libraries, and more. One install, full pipeline, entirely on your machine.

If you want the useful information from content without sitting through it, this is the fastest way.

Preferred setup: run the one-line installer in your terminal. One command installs Augent, all dependencies, and the MCP server config. Works on macOS and Linux. Windows: install via pip. Works with Claude Code, Codex, and any MCP client. New install? Start here: Getting started.

Install

curl -fsSL https://augent.app/install.sh | bash

Works on macOS and Linux. Installs everything automatically.

Windows: pip install "augent[all] @ git+https://github.com/AugentDevs/Augent.git"

<details> <summary>What does the installer do?</summary>

The installer is a single bash script (source). Every dependency is open source:

Dependency	What it does
Python	Runtime
FFmpeg	Audio processing
yt-dlp	Media downloads
aria2	Parallel downloads
espeak-ng	TTS phonemizer
faster-whisper	Speech-to-text
PyTorch	ML framework
sentence-transformers	Semantic search
pyannote-audio	Speaker diarization
Kokoro	Text-to-speech
Demucs	Audio source separation
FastAPI	Local web UI

No background services. No telemetry. No sudo on macOS.


Full breakdown	What each phase installs and why
Manual install	Step-by-step for macOS, Linux, and Windows
Uninstall	How to fully remove Augent

</details>

How it works (short)

graph TB
    A["URL / File"] --> B["Download + Separate"]
    B --> C["Transcribe"]
    C --> D["Memory + Tag"]

    D --> E["Search"]
    D --> F["Analyze"]
    D --> G["Export"]

    style A fill:#0d2618,stroke:#00f060,color:#00f060,stroke-width:2px
    style B fill:#0d2618,stroke:#00f060,color:#00f060,stroke-width:2px
    style C fill:#0d2618,stroke:#00f060,color:#00f060,stroke-width:2px
    style D fill:#0d2618,stroke:#00f060,color:#00f060,stroke-width:2px
    style E fill:#0a0a0a,stroke:#00f060,color:#00f060,stroke-width:2px
    style F fill:#0a0a0a,stroke:#00f060,color:#00f060,stroke-width:2px
    style G fill:#0a0a0a,stroke:#00f060,color:#00f060,stroke-width:2px

    linkStyle default stroke:#00f060,stroke-width:1.5px

Full architecture →

Project Structure

augent/
├── mcp.py          # MCP server — 22 tools for agents
├── config.py       # User configuration (~/.augent/config.yaml)
├── core.py         # Transcription engine (faster-whisper)
├── search.py       # Keyword search
├── embeddings.py   # Semantic search, chapters, visual scoring
├── speakers.py     # Speaker diarization (pyannote-audio)
├── separator.py    # Audio source separation (Demucs v4)
├── tts.py          # Text-to-speech (Kokoro)
├── memory.py       # Three-layer memory (SQLite)
├── graph.py        # Obsidian graph view (wikilinks, MOCs, frontmatter)
├── clips.py        # CLI clip extraction (audio segments around matches)
├── export.py       # Export formats (JSON, CSV, SRT, VTT, MD)
├── cli.py          # CLI interface
└── web.py          # Web UI (FastAPI)

MCP Tools

The primary way to use Augent. Any MCP client gets direct access to all tools.

Add to ~/.claude.json (global) or .mcp.json (project):

{
  "mcpServers": {
    "augent": {
      "command": "augent-mcp"
    }
  }
}

Restart Claude Code. Run /mcp to verify connection.
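If the config file already lists other servers, the entry has to be merged in rather than pasted over the file. A minimal sketch of that merge, assuming the file is plain JSON with a top-level `mcpServers` object; the helper names `with_augent` and `update_config_file` are hypothetical and not part of Augent:

```python
import json
from pathlib import Path


def with_augent(config: dict) -> dict:
    """Return a copy of an MCP client config with the augent server entry added."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    # Same entry as the JSON snippet above; other servers are left untouched.
    servers["augent"] = {"command": "augent-mcp"}
    merged["mcpServers"] = servers
    return merged


def update_config_file(path: Path) -> None:
    """Rewrite the config file at `path` in place, creating it if needed."""
    existing = json.loads(path.read_text()) if path.exists() else {}
    path.write_text(json.dumps(with_augent(existing), indent=2) + "\n")
```

For a project-level setup this would be called as `update_config_file(Path(".mcp.json"))`.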

Tool	Description
`download_audio`	Download audio from video URLs at maximum speed (1,000+ supported sites)
`transcribe_audio`	Full transcription with metadata
`search_audio`	Find keywords with timestamps and context snippets
`deep_search`	Search audio by meaning, not just keywords (semantic search)
`take_notes`	Take notes from any URL with style presets
`chapters`	Auto-detect topic chapters in audio with timestamps
`batch_search`	Search multiple files in parallel, built for batch workflows and agent swarms
`text_to_speech`	Convert text to natural speech audio (Kokoro TTS, 54 voices, 9 languages)
`search_proximity`	Find where keywords appear near each other
`identify_speakers`	Identify who speaks when in audio (speaker diarization)
`separate_audio`	Isolate vocals from music and background noise (Demucs v4)
`clip_export`	Export a video clip from a URL for a specific time range
`highlights`	Export MP4 clips of specific moments, auto-pick the best or target exactly what you want
`tag`	Add, remove, or list tags on transcriptions for organized filtering
`visual`	Extract visual context from video at moments that matter (query, auto, or manual)
`rebuild_graph`	Rebuild Obsidian graph view data for all transcriptions
`search_memory`	Search across ALL stored transcriptions by keyword or meaning
`list_files`	List media files in a directory
`list_memories`	List stored transcriptions by title
`memory_stats`	View transcription memory statistics
`clear_memory`	Clear stored transcriptions
`spaces`	Download, check, or stop X/Twitter Spaces recordings

Full tool reference →
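To make "keywords with timestamps and context snippets" concrete, here is a rough sketch of what a keyword search over a word-level transcript can look like. It is an illustration only, not Augent's implementation: the `(start_seconds, text)` word format and the `search_words` function are assumptions, and the default of 25 context words simply mirrors the `context_words` setting in the Configuration section.

```python
def search_words(words, keyword, context_words=25):
    """Find `keyword` in a transcript and return each hit with its surrounding words.

    `words` is a list of (start_seconds, text) tuples in spoken order.
    """
    keyword = keyword.lower()
    hits = []
    for i, (start, text) in enumerate(words):
        # Ignore punctuation attached to the word, e.g. "pricing," matches "pricing".
        if text.lower().strip(".,!?;:\"'") == keyword:
            lo = max(0, i - context_words)
            hi = i + context_words + 1
            snippet = " ".join(t for _, t in words[lo:hi])
            hits.append({"time": start, "snippet": snippet})
    return hits
```

Each hit carries the timestamp of the matched word, which is what lets an agent jump straight to that moment in the source.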

<details> <summary>Example prompt</summary>

"Download these 10 podcasts and find every moment a host covers a product in a positive or unique way. Not just brand mentions, only real endorsements or life-changing recommendations. Give me the timestamps and exactly what they said: url1, url2, url3, url4, url5, url6, url7, url8, url9, url10"

</details>

CLI

Full CLI for terminal-based workflows. Works standalone or with any agent.

Command	Description
`audio-downloader "URL"`	Download audio from video URL (speed-optimized)
`augent search audio.mp3 "keyword"`	Search for keywords
`augent transcribe audio.mp3`	Full transcription
`augent proximity audio.mp3 "A" "B"`	Find keyword A near keyword B
`augent memory search "query"`	Search across all stored transcriptions
`augent memory stats`	View memory statistics
`augent memory list`	List stored transcriptions
`augent memory clear`	Clear memory
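The idea behind `augent proximity` can be sketched in a few lines. This is a hedged illustration rather than the project's code: the `Word` type, the `find_proximity` function, and the 30-word default window are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds from the start of the audio


def find_proximity(words, a, b, max_distance=30):
    """Return (time_a, time_b) pairs where `b` occurs within `max_distance` words of `a`."""
    a, b = a.lower(), b.lower()
    pos_a = [i for i, w in enumerate(words) if w.text.lower() == a]
    pos_b = [i for i, w in enumerate(words) if w.text.lower() == b]
    return [
        (words[i].start, words[j].start)
        for i in pos_a
        for j in pos_b
        if abs(i - j) <= max_distance
    ]
```

Distance is measured in words rather than seconds here, so the result does not depend on how fast the speaker talks; a time-based window would be an equally reasonable choice.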

Eyes & Ears

Someone explains their entire workflow in a video. Augent transcribes it, builds the workflow files, maps the sequencing, the decision points, the tool stack. Every piece structured into something an agent can act on.

But some steps are inherently visual. Augent detects where visual context is needed and exports multiple screenshots at those moments, giving the agent frame-by-frame context of the flow being described. Audio intelligence plus visual context equals a complete, replicable system.

graph TB
    A["Expert explains workflow or automation"] --> B["Augent transcribes + structures"]
    B --> C["Builds workflow files + sequencing"]

    C --> D["Maps decision logic"]
    C --> E["Identifies tools + platforms"]
    C --> F["Flags visual gaps"]

    F --> G["Exports screenshots for context"]
    D --> H["Ready to run"]
    E --> H
    G --> H

    style A fill:#0d2618,stroke:#00f060,color:#00f060,stroke-width:2px
    style B fill:#0d2618,stroke:#00f060,color:#00f060,stroke-width:2px
    style C fill:#0d2618,stroke:#00f060,color:#00f060,stroke-width:2px
    style D fill:#0a0a0a,stroke:#00f060,color:#00f060,stroke-width:2px
    style E fill:#0a0a0a,stroke:#00f060,color:#00f060,stroke-width:2px
    style F fill:#0a0a0a,stroke:#00f060,color:#00f060,stroke-width:2px
    style G fill:#0a0a0a,stroke:#00f060,color:#00f060,stroke-width:2px
    style H fill:#0d2618,stroke:#00f060,color:#00f060,stroke-width:2px

Read more →

X/Twitter Spaces

Download or live-record X/Twitter Spaces audio. Augent auto-detects whether a Space is live or ended and handles both: live Spaces record from the current moment using ffmpeg, and ended Spaces download the full recording via yt-dlp. All downloads run in the background so your agent keeps working.

One-time setup

X/Twitter requires authentication to access Space audio. Any account works, including a burner.

Log into x.com in any browser
Open DevTools (F12 or Cmd+Option+I) > Application > Cookies > https://x.com
Copy the auth_token and ct0 values
Create ~/.augent/auth.json:

{"auth_token": "PASTE_HERE", "ct0": "PASTE_HERE"}

Tokens are stored locally and only sent to Twitter's servers to fetch audio. Augent never posts, DMs, follows, or modifies anything on your account. To revoke access, log out of Twitter or delete ~/.augent/auth.json.
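A quick way to check the file before the first Spaces download is to confirm that both cookie values are present and no longer placeholders. The sketch below assumes only the two keys shown above; `validate_auth` and `load_auth` are hypothetical helpers, not Augent functions.

```python
import json
from pathlib import Path

REQUIRED_KEYS = ("auth_token", "ct0")


def validate_auth(data: dict) -> dict:
    """Return only the required cookie values, failing early if one is missing or unset."""
    for key in REQUIRED_KEYS:
        value = data.get(key, "")
        if not value or value == "PASTE_HERE":
            raise ValueError(f"{key} is missing or still a placeholder")
    return {key: data[key] for key in REQUIRED_KEYS}


def load_auth(path: Path = Path("~/.augent/auth.json")) -> dict:
    """Read and validate the auth file created in the setup steps."""
    return validate_auth(json.loads(path.expanduser().read_text()))
```

Since the file holds live session cookies, restricting it to your own user (for example with `chmod 600 ~/.augent/auth.json`) is a sensible extra precaution.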

Claude Code Skill

Claude and Codex already know how to use Augent's tools from their descriptions. The skill adds advanced workflows on top: multi-step note-taking pipelines, auto-tagging rules, translation flows, quiz formatting, and optimal search strategies.

mkdir -p ~/.claude/skills/augent
curl -o ~/.claude/skills/augent/SKILL.md \
  https://raw.githubusercontent.com/AugentDevs/Augent/main/skills/augent/SKILL.md

Works globally across all projects. One install, every conversation benefits.

OpenClaw

Augent is available as an OpenClaw skill on ClawHub.

Install via ClawHub:

npx clawhub@latest install augent

Or set up manually:

augent setup openclaw

If you installed Augent with curl -fsSL https://augent.app/install.sh | bash, OpenClaw is detected and configured automatically. These commands are only needed for manual setup or pip installs.

Obsidian Graph View

Every transcription builds a node. Every shared tag builds a connection. Your audio memory becomes a navigable knowledge graph, entirely automatic.

Every take_notes call, every transcription, every tag creates structure: YAML frontmatter, [[wikilinks]] between related content, and MOC hub files that cluster topics. Run rebuild_graph once to upgrade existing memory. The graph grows on its own from there.
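To show how shared tags turn into graph edges, here is a small sketch that links notes with a common tag and renders one note with YAML frontmatter and `[[wikilinks]]`. The note layout, the `Related:` section, and both function names are assumptions for illustration; the files Augent actually writes may be structured differently.

```python
def related_by_tag(notes, title):
    """Titles of notes sharing at least one tag with `title`.

    `notes` maps each note title to its set of tags.
    """
    own = notes[title]
    return sorted(t for t, tags in notes.items() if t != title and tags & own)


def render_note(title, tags, related, body):
    """Render a Markdown note with YAML frontmatter and wikilinks to related notes."""
    lines = ["---", f'title: "{title}"', "tags:"]
    lines += [f"  - {tag}" for tag in tags]
    lines += ["---", "", body, "", "Related:"]
    # Each wikilink becomes an edge in Obsidian's graph view.
    lines += [f"- [[{name}]]" for name in related]
    return "\n".join(lines) + "\n"
```

A hub (MOC) file follows the same pattern: one note per tag whose body is just the list of wikilinks to every transcription carrying that tag.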

Use Augent inside your main Obsidian vault, alongside your personal notes, journals, and projects. Everything compounds together. Full guide.

Using Claude Code or Codex with Obsidian? Set up augent-obsidian to make every .txt and .md file on your Mac open directly in Obsidian, with automatic sync for external edits.

Multilingual

Augent transcribes audio in its original language, powered by OpenAI's Whisper, which supports 99 languages including Chinese, French, Spanish, Japanese, Arabic, Hindi, Korean, German, Russian, Portuguese, and many more. Language is auto-detected, so no configuration is needed. Translation to English is handled by Claude (or your LLM), producing far better translations than any local model.

When a transcription returns a non-English language, the MCP response includes a translation offer
Accepting stores a clean English (eng) sibling file in memory alongside the original
Both the original and translated versions appear in the Memory Explorer

Model Sizes

tiny is the default. Handles everything from clean studio recordings to noisy field audio. Use small or above for heavy accents, poor audio, or lyrics.

Model	Speed	Accuracy
tiny	Fastest	Excellent (default)
base	Fast	Excellent
small	Medium	Superior
medium	Slow	Outstanding
large	Slowest	Maximum

Configuration

Customize defaults and disable tools you don't need via ~/.augent/config.yaml:

# ~/.augent/config.yaml
model_size: tiny           # Default Whisper model
output_dir: ~/Downloads    # Default download directory
notes_output_dir: ~/Desktop # Notes, clips, TTS output
clip_padding: 15           # Seconds of padding around clips
context_words: 25          # Words of context in search results
tts_voice: af_heart        # Default TTS voice
tts_speed: 1.0             # TTS speed multiplier
disabled_tools: []         # Hide tools from MCP clients

Per-call arguments always override config. No config file is needed; all values have sensible defaults.
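The precedence described here (built-in defaults, then the config file, then per-call arguments) can be sketched as a simple layered merge. This is an assumption about how such layering is typically done, not Augent's code; `DEFAULTS` copies three values from the example file above and `resolve_options` is a hypothetical name.

```python
DEFAULTS = {
    "model_size": "tiny",
    "clip_padding": 15,
    "context_words": 25,
}


def resolve_options(file_config, call_args):
    """Layer settings: built-in defaults, then config file values, then per-call arguments."""
    resolved = dict(DEFAULTS)
    resolved.update(file_config)
    # Only arguments the caller actually supplied (not None) override the config.
    resolved.update({k: v for k, v in call_args.items() if v is not None})
    return resolved
```

With an empty config and no arguments the result is just the defaults, which is why the config file is optional.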

Configuration docs →

Web UI

Local web interface. Runs 100% locally: no API keys, and no data leaves your machine. An internet connection is only needed when you download audio from a URL.

augent-web

Open: http://127.0.0.1:8282

Search view:

Upload an audio file or paste a YouTube/video URL to download audio directly
Enter keywords separated by commas
Click SEARCH and results stream live with timestamps and context
YouTube timestamps are automatically hyperlinked when the source is YouTube
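Timestamp hyperlinks of this kind rely on YouTube's `t` URL parameter, which starts playback at a given offset. A minimal sketch of building such a link from a source URL and a match time; the function name `youtube_link_at` is hypothetical and the Web UI may construct its links differently.

```python
from urllib.parse import parse_qs, urlencode, urlparse, urlunparse


def youtube_link_at(url: str, seconds: float) -> str:
    """Return `url` with a t=<seconds>s parameter so the player opens at that moment."""
    parts = urlparse(url)
    # Keep existing parameters such as the video id, replacing any previous t value.
    query = {k: v[0] for k, v in parse_qs(parts.query).items()}
    query["t"] = f"{int(seconds)}s"
    return urlunparse(parts._replace(query=urlencode(query)))
```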

Clip export:

Click the film icon on any search result to create a visual region on the waveform, or drag on the waveform to select any range manually
Nudge buttons (±1s / ±5s) on each edge for precise boundary adjustment
Preview plays only the selected range so you hear exactly what will be exported
Export MP4 downloads only the selected segment, not the full video
Keyboard shortcuts: Space preview, Enter export, Esc close

Memory Explorer:

Browse all stored transcriptions, including files transcribed via MCP or CLI. Every tool writes to the same memory.
View full transcripts with clickable YouTube timestamps
Delete individual transcriptions from memory
Show Audio to reveal the source audio file in Finder
Show Transcript to reveal the .md transcript file in Finder. Drag it into a Claude Code session to run the full MCP pipeline on a previously transcribed file.
Share as HTML to download a self-contained, shareable transcript page
Search across all memories by keyword to find matches across every transcription in your library

Source URL persistence: When audio is downloaded from any URL (YouTube, Twitter/X, TikTok, Instagram, SoundCloud, and 1,000+ sites), the source URL is permanently stored by file hash. Any future search or transcription of that file, even weeks later or from a different path, automatically links back to the original source. No need to re-enter the URL.
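Keying on a content hash is what makes the link survive renames and moves: the same bytes always produce the same key. The sketch below illustrates the idea with SQLite, which the project lists as its memory backend. The choice of SHA-256, the `sources` table, and all function names are assumptions, not Augent's actual schema.

```python
import hashlib
import sqlite3


def file_hash(data: bytes) -> str:
    """Content hash: the same bytes give the same key regardless of file name or path."""
    return hashlib.sha256(data).hexdigest()


def open_store(path=":memory:"):
    """Open (or create) the hash-to-URL store."""
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS sources (hash TEXT PRIMARY KEY, url TEXT NOT NULL)")
    return db


def remember_source(db, data, url):
    """Record where a downloaded file came from."""
    db.execute("INSERT OR REPLACE INTO sources (hash, url) VALUES (?, ?)", (file_hash(data), url))
    db.commit()


def lookup_source(db, data):
    """Return the stored source URL for these bytes, or None if the file is unknown."""
    row = db.execute("SELECT url FROM sources WHERE hash = ?", (file_hash(data),)).fetchone()
    return row[0] if row else None
```

In practice `data` would be the bytes of the audio file (read in chunks for large files), so a copy of the file in another folder resolves to the same URL.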

<details> <summary>Web UI options</summary>

Command	Description
`augent-web`	Start on port 8282
`augent-web --port 8585`	Custom port

</details>

Contributing

PRs welcome. Open an issue for bugs or feature requests.

License

MIT
