Augent

MCP server that turns any audio or video source into structured, searchable intelligence for agents, enabling download, transcription, semantic search, speaker identification, and more.


Augent — The Audio Layer for Agents

<p align="center"> <picture> <img src="./images/logo.png" width="600" alt="Augent"> </picture> </p>

<p align="center"> <strong>The wormhole stays open. Fully local. Fully private.</strong> </p>

<p align="center"> <a href="https://github.com/AugentDevs/Augent/actions/workflows/tests.yml"><img src="https://img.shields.io/github/actions/workflow/status/AugentDevs/Augent/tests.yml?label=build&style=for-the-badge" alt="Build"></a> <img src="https://img.shields.io/badge/dynamic/toml?url=https://raw.githubusercontent.com/AugentDevs/Augent/main/pyproject.toml&query=$.project.version&label=version&style=for-the-badge" alt="Version"> <a href="https://www.python.org/downloads/"><img src="https://img.shields.io/badge/python-3.10+-3776AB.svg?style=for-the-badge" alt="Python 3.10+"></a> <a href="https://discord.com/invite/DNmaZtaE7b"><img src="https://img.shields.io/badge/Discord-Join-5865F2.svg?style=for-the-badge&logo=discord&logoColor=white" alt="Discord"></a> <a href="https://opensource.org/licenses/MIT"><img src="https://img.shields.io/badge/License-MIT-blue.svg?style=for-the-badge" alt="License: MIT"></a> </p>

<p align="center"> <a href="#mcp-tools">MCP Tools</a> · <a href="#cli">CLI</a> · <a href="#claude-code-skill">Claude Code Skill</a> · <a href="#openclaw">OpenClaw</a> · <a href="#web-ui">Web UI</a> · <a href="https://augent.app">Website</a> · <a href="https://docs.augent.app">Docs</a> · <a href="CHANGELOG.md">Changelog</a> · <a href="mailto:hello@augent.app">Contact</a> </p>

If the answer is trapped in audio or video, this is the way through.

Augent turns any audio or video source into structured, searchable intelligence for agents. Give it URLs or files. It downloads, transcribes, indexes, and stores everything in persistent memory. Search by keyword or meaning, find where concepts intersect, identify speakers, generate chapters and notes, batch process entire libraries, and more. One install, full pipeline, entirely on your machine.

If you want the substance of a recording without sitting through it, this is the fastest way.

Preferred setup: run the one-line installer in your terminal. One command installs Augent, all dependencies, and the MCP server config. Works on macOS and Linux. Windows: install via pip. Works with Claude Code, Codex, and any MCP client. New install? Start here: Getting started.

<br />

Install

curl -fsSL https://augent.app/install.sh | bash

Works on macOS and Linux. Installs everything automatically.

Windows: pip install "augent[all] @ git+https://github.com/AugentDevs/Augent.git"

<details> <summary><strong>What does the installer do?</strong></summary>

<br />

The installer is a single bash script (source). Every dependency is open source:

| Dependency | What it does |
| --- | --- |
| Python | Runtime |
| FFmpeg | Audio processing |
| yt-dlp | Media downloads |
| aria2 | Parallel downloads |
| espeak-ng | TTS phonemizer |
| faster-whisper | Speech-to-text |
| PyTorch | ML framework |
| sentence-transformers | Semantic search |
| pyannote-audio | Speaker diarization |
| Kokoro | Text-to-speech |
| Demucs | Audio source separation |
| FastAPI | Local web UI |

No background services. No telemetry. No sudo on macOS.

  • Full breakdown: What each phase installs and why
  • Manual install: Step-by-step for macOS, Linux, and Windows
  • Uninstall: How to fully remove Augent

</details>

<br />

<p align="center"> <picture> <img src="./images/install-demo.svg" alt="Install demo"> </picture> </p>

<br />

How it works (short)

graph TB
    A["URL / File"] --> B["Download + Separate"]
    B --> C["Transcribe"]
    C --> D["Memory + Tag"]

    D --> E["Search"]
    D --> F["Analyze"]
    D --> G["Export"]

    style A fill:#0d2618,stroke:#00f060,color:#00f060,stroke-width:2px
    style B fill:#0d2618,stroke:#00f060,color:#00f060,stroke-width:2px
    style C fill:#0d2618,stroke:#00f060,color:#00f060,stroke-width:2px
    style D fill:#0d2618,stroke:#00f060,color:#00f060,stroke-width:2px
    style E fill:#0a0a0a,stroke:#00f060,color:#00f060,stroke-width:2px
    style F fill:#0a0a0a,stroke:#00f060,color:#00f060,stroke-width:2px
    style G fill:#0a0a0a,stroke:#00f060,color:#00f060,stroke-width:2px

    linkStyle default stroke:#00f060,stroke-width:1.5px

Full architecture →

Project Structure

augent/
├── mcp.py          # MCP server — 22 tools for agents
├── config.py       # User configuration (~/.augent/config.yaml)
├── core.py         # Transcription engine (faster-whisper)
├── search.py       # Keyword search
├── embeddings.py   # Semantic search, chapters, visual scoring
├── speakers.py     # Speaker diarization (pyannote-audio)
├── separator.py    # Audio source separation (Demucs v4)
├── tts.py          # Text-to-speech (Kokoro)
├── memory.py       # Three-layer memory (SQLite)
├── graph.py        # Obsidian graph view (wikilinks, MOCs, frontmatter)
├── clips.py        # CLI clip extraction (audio segments around matches)
├── export.py       # Export formats (JSON, CSV, SRT, VTT, MD)
├── cli.py          # CLI interface
└── web.py          # Web UI (FastAPI)

<br />

MCP Tools

The primary way to use Augent. Any MCP client gets direct access to all tools.

Add to ~/.claude.json (global) or .mcp.json (project):

{
  "mcpServers": {
    "augent": {
      "command": "augent-mcp"
    }
  }
}

Restart Claude Code. Run /mcp to verify connection.
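
If the config file already lists other servers, the entry has to be merged in rather than pasted over them. A minimal sketch of that merge, assuming only the `mcpServers` layout shown above; the helper name `add_augent_server` is hypothetical and not part of Augent:

```python
import json
from pathlib import Path


def add_augent_server(config_path: Path) -> dict:
    """Add the Augent entry to an MCP client config, keeping existing servers."""
    config = {}
    if config_path.exists():
        # Treat an empty file the same as a missing one.
        config = json.loads(config_path.read_text(encoding="utf-8") or "{}")
    servers = config.setdefault("mcpServers", {})
    servers["augent"] = {"command": "augent-mcp"}
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Call it with `Path(".mcp.json")` for a project config or `Path.home() / ".claude.json"` for the global one, then restart the client as described above.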

| Tool | Description |
| --- | --- |
| download_audio | Download audio from video URLs at maximum speed (1,000+ supported sites) |
| transcribe_audio | Full transcription with metadata |
| search_audio | Find keywords with timestamps and context snippets |
| deep_search | Search audio by meaning, not just keywords (semantic search) |
| take_notes | Take notes from any URL with style presets |
| chapters | Auto-detect topic chapters in audio with timestamps |
| batch_search | Search multiple files in parallel, built for batch workflows and agent swarms |
| text_to_speech | Convert text to natural speech audio (Kokoro TTS, 54 voices, 9 languages) |
| search_proximity | Find where keywords appear near each other |
| identify_speakers | Identify who speaks when in audio (speaker diarization) |
| separate_audio | Isolate vocals from music and background noise (Demucs v4) |
| clip_export | Export a video clip from a URL for a specific time range |
| highlights | Export MP4 clips of specific moments, auto-pick the best or target exactly what you want |
| tag | Add, remove, or list tags on transcriptions for organized filtering |
| visual | Extract visual context from video at moments that matter (query, auto, or manual) |
| rebuild_graph | Rebuild Obsidian graph view data for all transcriptions |
| search_memory | Search across ALL stored transcriptions by keyword or meaning |
| list_files | List media files in a directory |
| list_memories | List stored transcriptions by title |
| memory_stats | View transcription memory statistics |
| clear_memory | Clear stored transcriptions |
| spaces | Download, check, or stop X/Twitter Spaces recordings |

Full tool reference →

<details> <summary>Example prompt</summary>

"Download these 10 podcasts and find every moment a host covers a product in a positive or unique way. Not just brand mentions, only real endorsements or life-changing recommendations. Give me the timestamps and exactly what they said: url1, url2, url3, url4, url5, url6, url7, url8, url9, url10"

<p align="center"> <picture> <img src="./images/pipeline.png" alt="Augent Pipeline — From URLs to insights in one prompt" width="100%"> </picture> </p>

</details>

<br />

CLI

Full CLI for terminal-based workflows. Works standalone or with any agent.

<picture> <img src="./images/cli-help.png" alt="Augent CLI"> </picture>

| Command | Description |
| --- | --- |
| audio-downloader "URL" | Download audio from video URL (speed-optimized) |
| augent search audio.mp3 "keyword" | Search for keywords |
| augent transcribe audio.mp3 | Full transcription |
| augent proximity audio.mp3 "A" "B" | Find keyword A near keyword B |
| augent memory search "query" | Search across all stored transcriptions |
| augent memory stats | View memory statistics |
| augent memory list | List stored transcriptions |
| augent memory clear | Clear memory |
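
To make the idea behind the proximity command concrete, here is a rough sketch of finding keyword A near keyword B in a timestamped transcript. It is not Augent's implementation: the `(word, start_seconds)` input shape, the `max_distance` parameter, and the function name are assumptions for illustration, and it only handles single-word keywords.

```python
from typing import List, Tuple

Word = Tuple[str, float]  # (token, start time in seconds); hypothetical shape


def find_proximity(words: List[Word], a: str, b: str, max_distance: int = 30):
    """Return (time_a, time_b, distance_in_words) for every occurrence of `a`
    that lies within `max_distance` words of an occurrence of `b`."""
    a, b = a.lower(), b.lower()
    # Normalize tokens: lowercase and strip surrounding punctuation.
    norm = [w.lower().strip(".,!?;:\"'") for w, _ in words]
    positions_b = [i for i, w in enumerate(norm) if w == b]
    hits = []
    for i, w in enumerate(norm):
        if w != a:
            continue
        for j in positions_b:
            if abs(i - j) <= max_distance:
                hits.append((words[i][1], words[j][1], abs(i - j)))
    return hits
```

Measuring distance in words rather than seconds keeps the result independent of speaking pace; a time-based window would be an equally reasonable choice.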

<br />

Eyes & Ears

Someone explains their entire workflow in a video. Augent transcribes it, builds the workflow files, maps the sequencing, the decision points, the tool stack. Every piece structured into something an agent can act on.

But some steps are inherently visual. Augent detects where visual context is needed and exports multiple screenshots at those moments, giving the agent frame-by-frame context of the flow being described. Audio intelligence plus visual context equals a complete, replicable system.

graph TB
    A["Expert explains workflow or automation"] --> B["Augent transcribes + structures"]
    B --> C["Builds workflow files + sequencing"]

    C --> D["Maps decision logic"]
    C --> E["Identifies tools + platforms"]
    C --> F["Flags visual gaps"]

    F --> G["Exports screenshots for context"]
    D --> H["Ready to run"]
    E --> H
    G --> H

    style A fill:#0d2618,stroke:#00f060,color:#00f060,stroke-width:2px
    style B fill:#0d2618,stroke:#00f060,color:#00f060,stroke-width:2px
    style C fill:#0d2618,stroke:#00f060,color:#00f060,stroke-width:2px
    style D fill:#0a0a0a,stroke:#00f060,color:#00f060,stroke-width:2px
    style E fill:#0a0a0a,stroke:#00f060,color:#00f060,stroke-width:2px
    style F fill:#0a0a0a,stroke:#00f060,color:#00f060,stroke-width:2px
    style G fill:#0a0a0a,stroke:#00f060,color:#00f060,stroke-width:2px
    style H fill:#0d2618,stroke:#00f060,color:#00f060,stroke-width:2px

Read more →

<br />

X/Twitter Spaces

Download or live-record Twitter/X Spaces audio. Augent auto-detects whether a Space is live or ended and handles both: live Spaces are recorded from the current moment using ffmpeg, and ended Spaces are downloaded in full via yt-dlp. All downloads run in the background so your agent keeps working.

One-time setup

X/Twitter requires authentication to access Space audio. Any account works, including a burner.

  1. Log into x.com in any browser
  2. Open DevTools (F12 or Cmd+Option+I) > Application > Cookies > https://x.com
  3. Copy the auth_token and ct0 values
  4. Create ~/.augent/auth.json:
{"auth_token": "PASTE_HERE", "ct0": "PASTE_HERE"}

Tokens are stored locally and only sent to Twitter's servers to fetch audio. Augent never posts, DMs, follows, or modifies anything on your account. To revoke access, log out of Twitter or delete ~/.augent/auth.json.
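
Because the file holds live session cookies, it is worth creating it with owner-only permissions. A small sketch of step 4 done from Python; the helper name `write_auth_file` is hypothetical, and the `0o600` mode is a suggestion of this example rather than something Augent requires:

```python
import json
import os
from pathlib import Path


def write_auth_file(auth_token: str, ct0: str, path: Path) -> Path:
    """Write the two cookie values as JSON, readable only by the current user."""
    path.parent.mkdir(parents=True, exist_ok=True)
    payload = json.dumps({"auth_token": auth_token, "ct0": ct0})
    # Create the file with mode 0o600 (enforced on POSIX systems only).
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        fh.write(payload + "\n")
    # If the file already existed, os.open kept its old mode, so tighten it.
    os.chmod(path, 0o600)
    return path


# Usage (not executed here):
# write_auth_file("PASTE_HERE", "PASTE_HERE", Path.home() / ".augent" / "auth.json")
```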

<br />

Claude Code Skill

Claude and Codex already know how to use Augent's tools from their descriptions. The skill adds advanced workflows on top: multi-step note-taking pipelines, auto-tagging rules, translation flows, quiz formatting, and optimal search strategies.

mkdir -p ~/.claude/skills/augent
curl -o ~/.claude/skills/augent/SKILL.md \
  https://raw.githubusercontent.com/AugentDevs/Augent/main/skills/augent/SKILL.md

Works globally across all projects. One install, every conversation benefits.

<br />

OpenClaw

Augent is available as an OpenClaw skill on ClawHub.

Install via ClawHub:

npx clawhub@latest install augent

Or set up manually:

augent setup openclaw

If you installed Augent with curl -fsSL https://augent.app/install.sh | bash, OpenClaw is detected and configured automatically. These commands are only needed for manual setup or pip installs.

<br />

Obsidian Graph View

Every transcription builds a node. Every shared tag builds a connection. Your audio memory becomes a navigable knowledge graph, entirely automatic.

<picture> <img src="./images/obsidian-graph.png" alt="Augent knowledge graph in Obsidian"> </picture>

Every take_notes call, every transcription, every tag creates structure: YAML frontmatter, [[wikilinks]] between related content, and MOC hub files that cluster topics. Run rebuild_graph once to upgrade existing memory. The graph grows on its own from there.
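
For readers unfamiliar with the format, the sketch below renders a note that Obsidian can link in its graph: YAML frontmatter on top and `[[wikilinks]]` to related notes at the bottom. The field names, the "Related" heading, and the function itself are hypothetical; the files Augent actually writes may be laid out differently.

```python
import json
from typing import Iterable


def render_note(title: str, tags: Iterable[str], related: Iterable[str], body: str) -> str:
    """Render one Markdown note with YAML frontmatter and a wikilink list."""
    tag_list = list(tags)
    # json.dumps yields a double-quoted string, which is also a valid YAML scalar.
    lines = ["---", f"title: {json.dumps(title)}"]
    if tag_list:
        lines.append("tags:")
        # Assumes simple tags (no colons or leading special characters).
        lines += [f"  - {tag}" for tag in tag_list]
    else:
        lines.append("tags: []")
    lines.append("---")
    lines += ["", body, "", "## Related"]
    lines += [f"- [[{name}]]" for name in related]
    return "\n".join(lines) + "\n"
```

Two notes that both link to the same hub note show up in graph view as neighbors of that hub, which is how hub files cluster topics.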

Use Augent inside your main Obsidian vault, alongside your personal notes, journals, and projects. Everything compounds together. Full guide.

Using Claude Code or Codex with Obsidian? Set up augent-obsidian to make every .txt and .md file on your Mac open directly in Obsidian, with automatic sync for external edits.

<br />

Multilingual

Augent transcribes audio in its original language using OpenAI's Whisper models, which support 99 languages including Chinese, French, Spanish, Japanese, Arabic, Hindi, Korean, German, Russian, Portuguese, and many more. The language is auto-detected; no configuration is needed. Translation to English is handled by Claude (or your LLM), producing far better translations than any local model.

  • When a transcription returns a non-English language, the MCP response includes a translation offer
  • Accepting stores a clean English (eng) sibling file in memory alongside the original
  • Both the original and translated versions appear in the Memory Explorer

<br />

Model Sizes

tiny is the default. Handles everything from clean studio recordings to noisy field audio. Use small or above for heavy accents, poor audio, or lyrics.

| Model | Speed | Accuracy |
| --- | --- | --- |
| tiny | Fastest | Excellent (default) |
| base | Fast | Excellent |
| small | Medium | Superior |
| medium | Slow | Outstanding |
| large | Slowest | Maximum |

<br />

Configuration

Customize defaults and disable tools you don't need via ~/.augent/config.yaml:

# ~/.augent/config.yaml
model_size: tiny           # Default Whisper model
output_dir: ~/Downloads    # Default download directory
notes_output_dir: ~/Desktop # Notes, clips, TTS output
clip_padding: 15           # Seconds of padding around clips
context_words: 25          # Words of context in search results
tts_voice: af_heart        # Default TTS voice
tts_speed: 1.0             # TTS speed multiplier
disabled_tools: []         # Hide tools from MCP clients

Per-call arguments always override config. No config file is needed; all values have sensible defaults.
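
The precedence described here (built-in defaults, then the config file, then per-call arguments) can be sketched as a layered dictionary merge. This illustrates the rule and is not Augent's code: `resolve_settings` is a hypothetical name, and the default values are simply copied from the sample file above.

```python
import copy

# Values copied from the sample config above; assumed to match the built-in defaults.
DEFAULTS = {
    "model_size": "tiny",
    "output_dir": "~/Downloads",
    "notes_output_dir": "~/Desktop",
    "clip_padding": 15,
    "context_words": 25,
    "tts_voice": "af_heart",
    "tts_speed": 1.0,
    "disabled_tools": [],
}


def resolve_settings(file_config=None, call_args=None):
    """Merge settings so that per-call arguments beat the file, and the file beats defaults."""
    settings = copy.deepcopy(DEFAULTS)
    settings.update(file_config or {})
    # Skip None so an omitted per-call argument does not mask the config value.
    settings.update({k: v for k, v in (call_args or {}).items() if v is not None})
    return settings
```

For example, with `model_size: small` in the file and a call that passes `model_size="medium"`, the call wins and the result is `medium`.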

Configuration docs →

<br />

Web UI

Local web interface. Runs 100% locally. No internet, no API keys, no data leaves your machine.

augent-web

Open: http://127.0.0.1:8282

Search view:

  1. Upload an audio file or paste a YouTube/video URL to download audio directly
  2. Enter keywords separated by commas
  3. Click SEARCH and results stream live with timestamps and context
  4. YouTube timestamps are automatically hyperlinked when the source is YouTube
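
A timestamp link of this kind can be built by setting YouTube's `t` query parameter on the source URL. A minimal sketch using only the standard library; the function name is hypothetical, and the exact link format the Web UI produces is not documented here:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def youtube_link_at(url: str, seconds: float) -> str:
    """Return `url` with its `t` parameter set to the given offset in whole seconds."""
    parts = urlsplit(url)
    # Drop any existing `t` so the new offset is the only one.
    query = [(k, v) for k, v in parse_qsl(parts.query) if k != "t"]
    query.append(("t", f"{int(seconds)}s"))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

Fractional offsets are truncated to whole seconds, so a match at 83.6 seconds links to `t=83s`.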

Clip export:

  • Click the film icon on any search result to create a visual region on the waveform, or drag on the waveform to select any range manually
  • Nudge buttons (±1s / ±5s) on each edge for precise boundary adjustment
  • Preview plays only the selected range so you hear exactly what will be exported
  • Export MP4 downloads only the selected segment, not the full video
  • Keyboard shortcuts: Space preview, Enter export, Esc close

Memory Explorer:

  • Browse all stored transcriptions, including files transcribed via MCP or CLI. Every tool writes to the same memory.
  • View full transcripts with clickable YouTube timestamps
  • Delete individual transcriptions from memory
  • Show Audio to reveal the source audio file in Finder
  • Show Transcript to reveal the .md transcript file in Finder. Drag it into a Claude Code session to run the full MCP pipeline on a previously transcribed file.
  • Share as HTML to download a self-contained, shareable transcript page
  • Search across all memories by keyword to find matches across every transcription in your library

Source URL persistence: When audio is downloaded from any URL (YouTube, Twitter/X, TikTok, Instagram, SoundCloud, or any of the 1,000+ supported sites), the source URL is permanently stored by file hash. Any future search or transcription of that file, even weeks later or from a different path, automatically links back to the original source. No need to re-enter the URL.
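
Keying on a content hash is what makes the link survive renames and moves: the same bytes always produce the same key, whatever the path. The sketch below shows the idea with SHA-256 and SQLite. The hash algorithm, table schema, and function names are assumptions for illustration, not Augent's actual storage layout.

```python
import hashlib
import sqlite3
from pathlib import Path


def file_hash(path: Path) -> str:
    """SHA-256 of the file contents, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def open_store(db_path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sources (hash TEXT PRIMARY KEY, url TEXT NOT NULL)"
    )
    return conn


def remember_source(conn: sqlite3.Connection, path: Path, url: str) -> None:
    """Record which URL a downloaded file came from."""
    conn.execute(
        "INSERT OR REPLACE INTO sources (hash, url) VALUES (?, ?)",
        (file_hash(path), url),
    )
    conn.commit()


def lookup_source(conn: sqlite3.Connection, path: Path):
    """Return the stored URL for a file with identical contents, or None."""
    row = conn.execute(
        "SELECT url FROM sources WHERE hash = ?", (file_hash(path),)
    ).fetchone()
    return row[0] if row else None
```

A file copied to another folder under a new name still resolves to its URL, while any file with different contents returns `None`.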

<details> <summary>Web UI options</summary>

| Command | Description |
| --- | --- |
| augent-web | Start on port 8282 |
| augent-web --port 8585 | Custom port |

</details>

<picture> <img src="./images/webui-1.png" alt="Augent Web UI - Upload"> </picture> <picture> <img src="./images/webui-2.png" alt="Augent Web UI - Results"> </picture>

<br />

Star History

<a href="https://www.star-history.com/#AugentDevs/Augent&type=date&legend=top-left"> <picture> <source media="(prefers-color-scheme: dark)" srcset="https://api.star-history.com/svg?repos=AugentDevs/Augent&type=date&theme=dark&legend=top-left" /> <source media="(prefers-color-scheme: light)" srcset="https://api.star-history.com/svg?repos=AugentDevs/Augent&type=date&legend=top-left" /> <img alt="Star History Chart" src="https://api.star-history.com/svg?repos=AugentDevs/Augent&type=date&legend=top-left" /> </picture> </a>

<br />

Contributing

PRs welcome. Open an issue for bugs or feature requests.

<br />

License

MIT
