sanzaru
Stateless MCP server that wraps OpenAI's Sora, Whisper, GPT-4o Audio, and TTS APIs for generating videos, images, and processing audio.
<div align="center"> <img src="https://raw.githubusercontent.com/TJC-LP/sanzaru/main/assets/logo.png" alt="sanzaru logo" width="400"> </div>
A stateless, lightweight MCP server that wraps OpenAI's Sora Video API, Whisper, GPT-4o Audio, and TTS APIs via the OpenAI Python SDK.
Features
Video Generation (Sora)
- Create videos with sora-2 or sora-2-pro models
- Use reference images to guide generation
- Remix and refine existing videos
- Download variants (video, thumbnail, spritesheet)
Image Generation
- Generate images with gpt-image-2 (recommended), gpt-image-1.5, or GPT-5
- Edit and compose images with up to 16 inputs
- Iterative refinement via Responses API
- Automatic resizing for Sora compatibility
Audio Processing
- Transcription: Whisper and GPT-4o models
- Audio Chat: Interactive analysis with GPT-4o
- Text-to-Speech: Multi-voice TTS generation
- Processing: Format conversion, compression, file management
Podcast Generation
- Multi-voice podcasts with up to 4 speakers and 10 TTS voices
- Parallel segment generation with configurable pacing
- MP3/WAV output with loudness normalization
Note: Content guardrails are enforced by OpenAI. This server does not run local moderation.
Requirements
- Python 3.10+
- OPENAI_API_KEY environment variable
Media storage (choose one):
# Recommended: unified path (auto-creates videos/, images/, audio/ subdirs)
SANZARU_MEDIA_PATH="/path/to/media"
# Or individual paths (legacy, still supported)
VIDEO_PATH="/path/to/videos"
IMAGE_PATH="/path/to/images"
AUDIO_PATH="/path/to/audio"
Features are auto-detected based on configured paths. Set only what you need.
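As a rough illustration of path-based feature detection, the sketch below maps the variables listed above to enabled features. The function name, the feature names, and the rule that an individual path overrides the unified one are assumptions for illustration, not sanzaru's actual implementation.

```python
from typing import Mapping

# Assumed mapping: feature -> (legacy variable, subdirectory under SANZARU_MEDIA_PATH)
FEATURES = {
    "video": ("VIDEO_PATH", "videos"),
    "image": ("IMAGE_PATH", "images"),
    "audio": ("AUDIO_PATH", "audio"),
}


def detect_features(env: Mapping[str, str]) -> dict[str, str]:
    """Return {feature: directory} for each feature that has a configured path."""
    unified = env.get("SANZARU_MEDIA_PATH")
    enabled: dict[str, str] = {}
    for feature, (legacy_var, subdir) in FEATURES.items():
        if env.get(legacy_var):
            # Assumption: an explicit per-feature path takes precedence.
            enabled[feature] = env[legacy_var]
        elif unified:
            enabled[feature] = f"{unified.rstrip('/')}/{subdir}"
    return enabled


print(detect_features({"AUDIO_PATH": "/path/to/audio"}))
```

With only AUDIO_PATH set, this sketch reports audio as the single enabled feature, which mirrors the "set only what you need" behavior described above.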
Quick Start
1. Clone the repository:
git clone https://github.com/TJC-LP/sanzaru.git
cd sanzaru
2. Run the setup script:
./setup.sh
The script will:
- Prompt for your OpenAI API key
- Create directories and .env configuration
- Install dependencies with uv sync --all-extras --dev
3. Start using:
claude
That's it! Claude Code will automatically connect and you can start generating videos, images, and processing audio.
Installation
Claude Code Plugin (Recommended)
Install as a plugin, which auto-configures the MCP server and includes prompting guidance:
/plugin marketplace add TJC-LP/sanzaru
Requires OPENAI_API_KEY and SANZARU_MEDIA_PATH environment variables to be set.
Quick Install
# All features
uv add "sanzaru[all]"
# Specific features
uv add "sanzaru[audio]" # With audio support
uv add sanzaru # Base (video + image only)
<details> <summary><strong>Alternative Installation Methods</strong></summary>
From Source
git clone https://github.com/TJC-LP/sanzaru.git
cd sanzaru
uv sync --all-extras
Claude Desktop
Add to your claude_desktop_config.json:
{
"mcpServers": {
"sanzaru": {
"command": "uvx",
"args": ["sanzaru[all]"],
"env": {
"OPENAI_API_KEY": "your-api-key-here",
"SANZARU_MEDIA_PATH": "/absolute/path/to/media"
}
}
}
}
Or from source:
{
"mcpServers": {
"sanzaru": {
"command": "uv",
"args": ["run", "--directory", "/path/to/sanzaru", "sanzaru"]
}
}
}
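If claude_desktop_config.json already lists other servers, the sanzaru entry has to be merged in rather than pasted over the file. A minimal sketch of that merge, using only the standard library, follows; the helper name `add_sanzaru_server` and the sample "other" server are hypothetical.

```python
import copy
import json


def add_sanzaru_server(config: dict, api_key: str, media_path: str) -> dict:
    """Return a copy of a claude_desktop_config.json dict with a sanzaru entry added."""
    merged = copy.deepcopy(config)  # leave the caller's dict untouched
    servers = merged.setdefault("mcpServers", {})
    servers["sanzaru"] = {
        "command": "uvx",
        "args": ["sanzaru[all]"],
        "env": {"OPENAI_API_KEY": api_key, "SANZARU_MEDIA_PATH": media_path},
    }
    return merged


# Hypothetical existing config with one unrelated server.
existing = {"mcpServers": {"other": {"command": "node", "args": ["server.js"]}}}
merged = add_sanzaru_server(existing, "your-api-key-here", "/absolute/path/to/media")
print(json.dumps(merged, indent=2))
```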
Codex MCP
# Using uvx (from PyPI)
codex mcp add sanzaru \
--env OPENAI_API_KEY="sk-..." \
--env SANZARU_MEDIA_PATH="$HOME/sanzaru-media" \
-- uvx "sanzaru[all]"
Manual Setup
uv venv
uv sync
# Set required environment variables
export OPENAI_API_KEY=sk-...
export SANZARU_MEDIA_PATH=~/sanzaru-media
# Run server (stdio for MCP clients)
uv run sanzaru
# Or HTTP mode (for remote access)
uv run sanzaru --transport http --port 8000
</details>
Available Tools
| Category | Tools | Description |
|---|---|---|
| Video | create_video, get_video_status, download_video, list_videos, list_local_videos, delete_video, remix_video | Generate and manage Sora videos with optional reference images |
| Image | generate_image, edit_image, create_image, get_image_status, download_image | Generate with gpt-image-2 (default, sync) or GPT-5 (polling) |
| Reference | list_reference_images, prepare_reference_image | Manage and resize images for Sora compatibility |
| Audio | transcribe_audio, chat_with_audio, create_audio, convert_audio, compress_audio, list_audio_files, get_latest_audio, transcribe_with_enhancement | Transcription, analysis, TTS, and file management |
| Podcast | generate_podcast | Multi-voice podcast generation with parallel TTS and audio stitching |
| Media | view_media | Interactive media player via MCP App protocol |
Full API documentation: See docs/api-reference.md
Basic Workflows
Generate a Video
# Create video from text
video = create_video(
prompt="A serene mountain landscape at sunrise",
model="sora-2",
seconds="8",
size="1280x720"
)
# Poll for completion
status = get_video_status(video.id)
# Download when ready
download_video(video.id, filename="mountain_sunrise.mp4")
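The workflow above calls get_video_status once; in practice the status is usually checked repeatedly until the job finishes. The sketch below shows one way to structure that loop. It assumes the status result is a dict with a "status" field that eventually becomes "completed" or "failed"; the helper name `wait_for_video` and the stand-in status function are hypothetical, so adapt them to the actual tool response.

```python
import time
from typing import Callable


def wait_for_video(
    get_status: Callable[[str], dict],
    video_id: str,
    interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll get_status until the job reaches a terminal state or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status(video_id)
        # Assumed terminal states; anything else is treated as still running.
        if status["status"] in ("completed", "failed"):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"video {video_id} not finished after {timeout}s")
        sleep(interval)


# Stand-in for the real tool call so the sketch runs on its own.
_states = iter(["queued", "in_progress", "completed"])


def fake_get_status(video_id: str) -> dict:
    return {"id": video_id, "status": next(_states)}


result = wait_for_video(fake_get_status, "video_123", interval=0, sleep=lambda s: None)
print(result["status"])
```

Passing the status function and the sleep function as arguments keeps the loop independent of any particular client and easy to test.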
Generate with Reference Image
# 1. Generate reference image (gpt-image-2, synchronous)
generate_image(
prompt="futuristic pilot in mech cockpit",
size="1536x1024",
filename="pilot.png"
)
# 2. Prepare for video (resize to Sora dimensions)
prepare_reference_image("pilot.png", "1280x720", resize_mode="crop")
# 3. Animate
video = create_video(
prompt="The pilot looks up and smiles",
size="1280x720",
input_reference_filename="pilot_1280x720.png"
)
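To make the resize step concrete, the sketch below computes the region of a source image that a centered crop would keep before scaling to the target size. Treating resize_mode="crop" as a centered crop is an assumption here, and `center_crop_box` is a hypothetical helper, not part of sanzaru.

```python
def center_crop_box(src_w: int, src_h: int, dst_w: int, dst_h: int) -> tuple[int, int, int, int]:
    """Return (left, top, right, bottom) of the largest centered region of the
    source that has the target aspect ratio."""
    if src_w * dst_h > dst_w * src_h:
        # Source is wider than the target: keep full height, trim the sides.
        crop_w = round(src_h * dst_w / dst_h)
        left = (src_w - crop_w) // 2
        return (left, 0, left + crop_w, src_h)
    # Source is taller than (or matches) the target: keep full width, trim top and bottom.
    crop_h = round(src_w * dst_h / dst_w)
    top = (src_h - crop_h) // 2
    return (0, top, src_w, top + crop_h)


# The 1536x1024 image from step 1, prepared for a 1280x720 video.
print(center_crop_box(1536, 1024, 1280, 720))  # → (0, 80, 1536, 944)
```

In this example 80 pixels are trimmed from the top and bottom, leaving a 1536x864 region that scales to 1280x720 without distortion.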
Audio Transcription
# List available audio files
files = list_audio_files(format="mp3")
# Transcribe
result = transcribe_audio("interview.mp3")
# Or analyze with GPT-4o
analysis = chat_with_audio(
"meeting.mp3",
user_prompt="Summarize key decisions and action items"
)
Generate a Podcast
generate_podcast(script={
"title": "AI Weekly",
"speakers": [
{"id": "host", "name": "Alex", "voice": "nova"},
{"id": "guest", "name": "Sam", "voice": "echo"}
],
"segments": [
{"speaker": "host", "text": "Welcome to AI Weekly!"},
{"speaker": "guest", "text": "Thanks for having me."}
]
})
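Before sending a longer script, it can help to check it locally. The sketch below validates the structure shown above against the 4-speaker limit stated in Features and checks that every segment refers to a declared speaker. `validate_script` is a hypothetical helper and the exact rules enforced by generate_podcast may differ.

```python
def validate_script(script: dict, max_speakers: int = 4) -> list[str]:
    """Return a list of problems found in a podcast script (empty if none)."""
    errors: list[str] = []
    speakers = script.get("speakers", [])
    if not 1 <= len(speakers) <= max_speakers:
        errors.append(f"expected 1 to {max_speakers} speakers, got {len(speakers)}")
    ids = [speaker.get("id") for speaker in speakers]
    if len(set(ids)) != len(ids):
        errors.append("speaker ids must be unique")
    for index, segment in enumerate(script.get("segments", [])):
        if segment.get("speaker") not in ids:
            errors.append(f"segment {index}: unknown speaker {segment.get('speaker')!r}")
        if not str(segment.get("text", "")).strip():
            errors.append(f"segment {index}: empty text")
    return errors


script = {
    "title": "AI Weekly",
    "speakers": [
        {"id": "host", "name": "Alex", "voice": "nova"},
        {"id": "guest", "name": "Sam", "voice": "echo"},
    ],
    "segments": [
        {"speaker": "host", "text": "Welcome to AI Weekly!"},
        {"speaker": "guest", "text": "Thanks for having me."},
    ],
}
print(validate_script(script))  # → []
```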
Documentation
- API Reference - Complete tool documentation with parameters and examples
- Reference Images Guide - Working with reference images and resizing
- Image Generation Guide - Generating and editing reference images
- Sora Prompting Guide - Crafting effective video prompts
- Audio Features - Audio transcription, chat, and TTS
- Performance & Architecture - Technical details and benchmarks
Transport Modes
| Mode | Command | Use Case |
|---|---|---|
| stdio (default) | uv run sanzaru | Claude Desktop, Claude Code, local MCP clients |
| HTTP | uv run sanzaru --transport http | Remote access, Databricks Apps, web clients |
Storage Backends
| Backend | Config | Use Case |
|---|---|---|
| Local (default) | SANZARU_MEDIA_PATH=/path/to/media | Development, local deployments |
| Databricks | STORAGE_BACKEND=databricks | Databricks Apps with Unity Catalog Volumes |
The Databricks backend supports per-user storage isolation via the user_context module, enabling multi-tenant deployments where each user's media is stored under their own volume prefix.
See CLAUDE.md for full configuration details.
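The idea of a per-user volume prefix can be sketched as follows. This is a hypothetical illustration of prefix-based isolation, not the user_context module's real API; the function name, the sanitization rule, and the example volume path are all assumptions.

```python
import re
from pathlib import PurePosixPath

# Assumption: anything outside this set is replaced so a user id cannot
# introduce path separators or other special characters.
_UNSAFE = re.compile(r"[^A-Za-z0-9._-]")


def user_media_path(volume_root: str, user_id: str, kind: str, filename: str) -> str:
    """Build <volume_root>/<user>/<kind>/<filename>, confined to the user's prefix."""
    safe_user = _UNSAFE.sub("_", user_id)
    if not safe_user.strip("._"):
        raise ValueError("user_id must contain at least one letter or digit")
    name = PurePosixPath(filename).name  # drop any directory components
    if not name:
        raise ValueError("filename must not be empty")
    return str(PurePosixPath(volume_root) / safe_user / kind / name)


print(user_media_path("/Volumes/main/default/media", "alice@example.com", "videos", "clip.mp4"))
```

Stripping directory components from the filename and sanitizing the user id keeps every write inside that user's prefix, which is the property a multi-tenant deployment relies on.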
Performance
Fully asynchronous architecture with proven scalability:
- ✅ 32+ concurrent operations verified
- ✅ 8-10x speedup for parallel tasks
- ✅ Non-blocking I/O with aiofiles + anyio
- ✅ Python 3.14 free-threading ready
See docs/async-optimizations.md for technical details.
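The general pattern behind concurrent operations like these is to launch many awaitable jobs while capping how many run at once. The sketch below shows that pattern with the standard library's asyncio rather than anyio; `run_bounded` and `fake_segment` are hypothetical names, and the sleep stands in for a real API call.

```python
import asyncio
from typing import Awaitable, Callable


async def run_bounded(jobs: list[Callable[[], Awaitable[str]]], limit: int) -> list[str]:
    """Run all jobs concurrently, with at most `limit` in flight at a time."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(job: Callable[[], Awaitable[str]]) -> str:
        async with semaphore:
            return await job()

    # gather returns results in the order the jobs were submitted.
    return await asyncio.gather(*(guarded(job) for job in jobs))


async def fake_segment(index: int) -> str:
    await asyncio.sleep(0.01)  # stand-in for a network-bound call
    return f"segment-{index}"


def make_job(index: int) -> Callable[[], Awaitable[str]]:
    return lambda: fake_segment(index)


results = asyncio.run(run_bounded([make_job(i) for i in range(8)], limit=4))
print(results)
```

Because the jobs spend most of their time waiting on I/O, running them concurrently shortens total wall-clock time, while the semaphore keeps the number of simultaneous requests under control.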
License