ff-toolkit

ff-toolkit

Enables LLMs to perform FFmpeg operations like clipping, merging, extracting audio, adding subtitles, and transcoding videos via a set of tools exposed as an MCP server.

Category
Visit Server

README

ff-toolkit

FFmpeg operations as LLM-callable tools.

<p align="center"> <img src="https://raw.githubusercontent.com/inthepond/ff-toolkit/main/docs/demo.svg" alt="ff-toolkit demo" width="720"> </p>

Stop hand-writing FFmpeg subprocess calls and JSON tool schemas. ff-toolkit gives you 5 production-ready media operations, dual-format LLM schemas (OpenAI + Anthropic), and an MCP server — all in one pip install.

Real-World Use Cases

"My agent pipeline needs to process uploaded videos" — Give your agent openai_tools() or anthropic_tools() and let it decide how to clip, transcode, or extract audio. The dispatch() function handles execution.

"I need to batch-extract 16kHz WAV for ASR" — One line: extract_audio("video.mp4", "out.wav", codec="pcm_s16le", sample_rate=16000, channels=1)

"I want FFmpeg tools in Claude Desktop / Cursor" — Add the MCP server config (3 lines of JSON) and Claude can edit your videos directly.

"I'm tired of writing the same FFmpeg commands" — Use the CLI: ffkit clip input.mp4 output.mp4 --start 00:01:00 --duration 30

60-Second Quick Start

# Install (requires FFmpeg on PATH)
pip install ff-toolkit

# Verify it works — no API keys needed
ffkit probe some_video.mp4

# Or run the full demo with a generated test video
python -m ff_kit.examples.local

Python API

from ff_kit import clip, extract_audio, merge, transcode

# Trim seconds 60-90
clip("raw.mp4", "highlight.mp4", start="00:01:00", duration="30")

# Extract 16kHz mono audio for Whisper/Paraformer
extract_audio("raw.mp4", "speech.wav", codec="pcm_s16le", sample_rate=16000, channels=1)

# Concatenate intro + main + outro
merge(["intro.mp4", "main.mp4", "outro.mp4"], "final.mp4")

# Compress to 720p WebM for web delivery
transcode("raw.mp4", "web.webm", video_codec="libvpx-vp9", resolution="1280x720", crf=30)

CLI

ffkit clip raw.mp4 highlight.mp4 --start 00:01:00 --duration 30
ffkit extract-audio raw.mp4 speech.wav --codec pcm_s16le --sample-rate 16000 --channels 1
ffkit merge intro.mp4 main.mp4 outro.mp4 -o final.mp4
ffkit transcode raw.mp4 web.webm --video-codec libvpx-vp9 --resolution 1280x720 --crf 30
ffkit probe video.mp4

With OpenAI (3 lines to integrate)

from ff_kit.schemas.openai import openai_tools
from ff_kit.dispatch import dispatch

# 1. Pass tools to the model
response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    tools=openai_tools(),       # ← that's it
)

# 2. Execute whatever the model calls
tc = response.choices[0].message.tool_calls[0]
result = dispatch(tc.function.name, json.loads(tc.function.arguments))

With Anthropic (3 lines to integrate)

from ff_kit.schemas.anthropic import anthropic_tools
from ff_kit.dispatch import dispatch

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=anthropic_tools(),    # ← that's it
    messages=messages,
)

for block in response.content:
    if block.type == "tool_use":
        result = dispatch(block.name, block.input)

As an MCP Server (Claude Desktop / Cursor)

Add to your config (claude_desktop_config.json or Cursor settings):

{
  "mcpServers": {
    "ff-toolkit": {
      "command": "ffkit-mcp",
      "args": []
    }
  }
}

That's it. Claude can now clip, merge, extract audio, add subtitles, and transcode your files.

Operations

Tool What it does Example
ffkit_clip Trim a segment by start + end/duration Cut highlight reel from raw footage
ffkit_merge Concatenate multiple files Join intro + content + outro
ffkit_extract_audio Extract audio, optionally re-encode Get 16kHz WAV for speech recognition
ffkit_add_subtitles Burn or embed subtitles (.srt/.ass/.vtt) Hard-sub a translated SRT into video
ffkit_transcode Convert format, codec, resolution, bitrate Compress 4K MP4 to 720p WebM for web

How It Works

Your Agent                    ff-toolkit                         FFmpeg
    │                           │                              │
    ├─ openai_tools() ──────────┤                              │
    │  or anthropic_tools()     │                              │
    │                           │                              │
    ├─ LLM returns tool call ──►│                              │
    │                           │                              │
    ├─ dispatch(name, args) ───►├─ validates & builds cmd ────►│
    │                           │                              │
    │◄── FFmpegResult ─────────┤◄── subprocess result ────────┤
    │                           │                              │

Project Structure

ff-toolkit/
├── src/ff_kit/
│   ├── __init__.py          # Public API: clip, merge, extract_audio, ...
│   ├── cli.py               # CLI entry point (ffkit command)
│   ├── executor.py          # FFmpeg subprocess runner + probe
│   ├── dispatch.py          # Tool name → function router
│   ├── core/                # One module per operation
│   │   ├── clip.py
│   │   ├── merge.py
│   │   ├── extract_audio.py
│   │   ├── add_subtitles.py
│   │   └── transcode.py
│   ├── schemas/             # LLM tool definitions
│   │   ├── openai.py        # OpenAI function-calling format
│   │   └── anthropic.py     # Anthropic tool-use format
│   └── mcp/                 # MCP server (stdio JSON-RPC)
│       └── server.py
├── examples/
│   ├── local_example.py     # ← Run this first! No API key needed
│   ├── openai_example.py
│   ├── anthropic_example.py
│   └── agent_loop_example.py
└── tests/                   # 30 tests, all mocked (no FFmpeg needed)

Development

git clone https://github.com/inthepond/ff-toolkit.git
cd ff-toolkit
pip install -e ".[dev]"
pytest -v                    # 30 tests, runs in <1s

FAQ

Q: Do I need FFmpeg installed? Yes, for actual media operations. Tests are fully mocked and don't need FFmpeg. Install it from ffmpeg.org/download or brew install ffmpeg / apt install ffmpeg.

Q: Can I add custom operations? Yes — add a function in core/, register it in dispatch.py's _REGISTRY, and add schema entries in schemas/openai.py and schemas/anthropic.py. See any existing operation as a template.

Q: Why not just use LangChain / CrewAI tools? Those frameworks are great, but they're heavy dependencies. ff-toolkit is zero-dependency (beyond Python stdlib) and works with any LLM provider. You can use it inside LangChain if you want, or standalone.

Q: What about streaming / progress callbacks? Not in v0.1. FFmpeg progress parsing is planned for v0.2.

License

MIT

Recommended Servers

playwright-mcp

playwright-mcp

A Model Context Protocol server that enables LLMs to interact with web pages through structured accessibility snapshots without requiring vision models or screenshots.

Official
Featured
TypeScript
Magic Component Platform (MCP)

Magic Component Platform (MCP)

An AI-powered tool that generates modern UI components from natural language descriptions, integrating with popular IDEs to streamline UI development workflow.

Official
Featured
Local
TypeScript
Audiense Insights MCP Server

Audiense Insights MCP Server

Enables interaction with Audiense Insights accounts via the Model Context Protocol, facilitating the extraction and analysis of marketing insights and audience data including demographics, behavior, and influencer engagement.

Official
Featured
Local
TypeScript
VeyraX MCP

VeyraX MCP

Single MCP tool to connect all your favorite tools: Gmail, Calendar and 40 more.

Official
Featured
Local
graphlit-mcp-server

graphlit-mcp-server

The Model Context Protocol (MCP) Server enables integration between MCP clients and the Graphlit service. Ingest anything from Slack to Gmail to podcast feeds, in addition to web crawling, into a Graphlit project - and then retrieve relevant contents from the MCP client.

Official
Featured
TypeScript
Kagi MCP Server

Kagi MCP Server

An MCP server that integrates Kagi search capabilities with Claude AI, enabling Claude to perform real-time web searches when answering questions that require up-to-date information.

Official
Featured
Python
E2B

E2B

Using MCP to run code via e2b.

Official
Featured
Neon Database

Neon Database

MCP server for interacting with Neon Management API and databases

Official
Featured
Exa Search

Exa Search

A Model Context Protocol (MCP) server lets AI assistants like Claude use the Exa AI Search API for web searches. This setup allows AI models to get real-time web information in a safe and controlled way.

Official
Featured
Qdrant Server

Qdrant Server

This repository is an example of how to create a MCP server for Qdrant, a vector search engine.

Official
Featured