mcp-multi-model
One MCP server that routes to 12+ AI providers for text, image, and video generation, with smart task delegation, web search, and model comparison—all from the terminal.
Give Claude Code superpowers — image gen, video gen, web search, and smart multi-model routing.
One MCP server. All the models you need. Zero tab-switching.

npx mcp-multi-model
If you find this useful, please give it a ⭐ — it helps others discover the project!
What can it do?
🎨 Generate images and videos — right in the terminal
"Generate a macOS app icon with a glowing indigo orb"
Claude calls Imagen 4 / GPT Image / Nano Banana, saves the PNG, and opens it. No browser, no Figma, no context switch.
Video too — Veo 3.1 generates short clips from a text prompt.
🧠 Smart routing — the right model for the job
Need reasoning / agentic coding → it routes to OpenAI GPT-5 / o-series (auto-handles max_completion_tokens, skips temperature where unsupported).
Tell Claude to research something → it routes to Gemini (Google Search grounding).
Ask it to write code cheaply → it routes to DeepSeek (fast, cheap, great at code).
Need real-time info in Chinese → it routes to Kimi (web search).
You don't pick the model. The routing does it for you.
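The routing rules above can be pictured as a small lookup table. The following is a minimal sketch under stated assumptions, not the project's actual delegate logic: `ROUTES`, the keyword patterns, and `pickModel` are hypothetical names made up for illustration, and only the provider keys mirror the list above.

```javascript
// Hypothetical routing table: each rule pairs a keyword test with a model key.
// The patterns are illustrative guesses, not the server's real heuristics.
const ROUTES = [
  { model: "kimi", test: (t) => /[\u4e00-\u9fff]/.test(t) }, // contains Chinese characters
  { model: "gemini", test: (t) => /\b(research|search|latest|news)\b/i.test(t) },
  { model: "openai", test: (t) => /\b(reason|prove|plan|agent)\w*/i.test(t) },
  { model: "deepseek", test: (t) => /\b(code|function|bug|refactor|implement)\w*/i.test(t) },
];

// Return the first model whose rule matches and that is actually available;
// otherwise fall back to the first available model.
function pickModel(task, available) {
  for (const { model, test } of ROUTES) {
    if (available.includes(model) && test(task)) return model;
  }
  return available[0];
}
```

A first-match table like this keeps rule priority explicit: reordering `ROUTES` changes which model wins when a task matches several rules.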
⚖️ Compare models side by side
"Ask both DeepSeek and Gemini how to implement a B-tree"
Two answers, one terminal. See which model gives you a better solution.
🌐 Web search built in
Gemini uses Google Search grounding. Kimi searches the Chinese web. No separate browser-use MCP needed.
🔧 One-line install
{
  "mcpServers": {
    "multi-model": {
      "command": "npx",
      "args": ["-y", "mcp-multi-model"],
      "env": {
        "DEEPSEEK_API_KEY": "sk-...",
        "GEMINI_API_KEY": "AI..."
      }
    }
  }
}
That's it. No git clone, no build step.
Supported Models
12+ providers preconfigured in config.example.yaml. Models without an API key are skipped automatically.
| Provider | Adapter | Why use it |
|---|---|---|
| OpenAI | openai | GPT-5 / GPT-5.5 reasoning, o1 / o3 / o4 series, GPT Image. Reasoning param handling is automatic (max_completion_tokens, temperature skipped where unsupported). |
| Gemini | gemini | Long context, Google Search grounding. Image (Imagen 4 Fast / Ultra, Nano Banana 2) and video (Veo 3.1) generation built in. |
| DeepSeek | openai | Code, math, logic at extremely low cost |
| Kimi (Moonshot) | openai | Chinese web search, real-time info, tool-calling loop |
| Grok (xAI) | openai | Real-time X/Twitter context, reasoning |
| Perplexity | openai | Sonar models with built-in web search and citations |
| Anthropic (via OpenRouter) | openai | Claude models routed through OpenRouter |
| Mistral / Groq / Qwen / GLM / Together | openai | EU AI, ultra-fast inference, Chinese-native, open-source aggregators |
| Ollama / LM Studio / llama.cpp / vLLM | openai | Local: no API key, no cost, full privacy |
Adding a new model is one block in config.yaml — see Configuration.
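The automatic reasoning-parameter handling mentioned for the OpenAI adapter could look roughly like the sketch below. It assumes that model IDs starting with `o<digit>` or `gpt-5` are reasoning models. That detection rule, `isReasoningModel`, and `buildChatBody` are illustrative names, not taken from the project's source.

```javascript
// Assumption: model IDs starting with "o<digit>" or "gpt-5" are treated as
// reasoning models. The real adapter may detect them differently.
const isReasoningModel = (model) => /^(o\d|gpt-5)/.test(model);

// Build a Chat Completions request body from generic options.
function buildChatBody(model, prompt, { maxTokens = 4000, temperature = 0.7 } = {}) {
  const body = { model, messages: [{ role: "user", content: prompt }] };
  if (isReasoningModel(model)) {
    // Reasoning models take max_completion_tokens; temperature is left out.
    body.max_completion_tokens = maxTokens;
  } else {
    body.max_tokens = maxTokens;
    body.temperature = temperature;
  }
  return body;
}
```

Keeping the branch in one place means callers can pass the same options for every model and let the builder drop what a given model would reject.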
MCP Tools
Tools are dynamically generated from your config. With the default setup:
| Tool | What it does |
|---|---|
| ask_ai | Query any model: unified entry with temperature / top_p control |
| ask_deepseek | Query DeepSeek directly |
| ask_gemini | Query Gemini directly |
| ask_kimi | Query Kimi directly |
| ask_all | Query all models in parallel, compare results |
| ask_both | Query any two models in parallel |
| delegate | Smart routing: auto-picks the best model for the task |
| generate_image | Text → image via Gemini Imagen |
| generate_video | Text → video via Gemini Veo |
| translate | CN ↔ EN translation |
| research | Deep research with web search |
| check_health | Ping all models, report status and latency |
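As an illustration of how a parallel tool such as ask_all might fan out a prompt, here is a minimal sketch built on `Promise.allSettled`, so one failing provider does not discard the other answers. The `askAll` function and the `callers` map are hypothetical. The real server would call provider APIs where this sketch takes placeholder functions.

```javascript
// Query several models concurrently and collect per-model outcomes.
// `callers` maps a model key to a function (prompt) => string or Promise<string>.
// These callers are placeholders for real provider requests.
async function askAll(prompt, callers) {
  const keys = Object.keys(callers);
  const settled = await Promise.allSettled(
    // Promise.resolve().then(...) also captures synchronous throws.
    keys.map((k) => Promise.resolve().then(() => callers[k](prompt)))
  );
  return keys.map((key, i) => {
    const r = settled[i];
    return r.status === "fulfilled"
      ? { model: key, ok: true, text: r.value }
      : { model: key, ok: false, error: String(r.reason?.message ?? r.reason) };
  });
}
```

Results come back in the same order as the keys of `callers`, which makes side-by-side comparison straightforward.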
Installation
Option 1: npx (recommended)
Add to your Claude Code MCP config (~/.mcp.json):
{
  "mcpServers": {
    "multi-model": {
      "command": "npx",
      "args": ["-y", "mcp-multi-model"],
      "env": {
        "DEEPSEEK_API_KEY": "sk-...",
        "GEMINI_API_KEY": "AI..."
      }
    }
  }
}
Option 2: Clone and run locally
git clone https://github.com/K1vin1906/mcp-multi-model.git
cd mcp-multi-model
npm install
npm run setup # Interactive setup wizard — validates your API keys
Then add to your MCP config:
{
  "mcpServers": {
    "multi-model": {
      "command": "node",
      "args": ["/path/to/mcp-multi-model/index.js"]
    }
  }
}
API keys can be set via env in the config above, or in a .env file in the project directory.
Configuration
cp config.example.yaml config.yaml
defaults:
  max_tokens: 4000
  temperature: 0.7
  timeout_ms: 60000
  max_retries: 2
  # cache_ttl_ms: 300000    # Cache identical prompts for 5 min
  # daily_budget_usd: 5.0   # Daily spending limit in USD
models:
  deepseek:
    name: DeepSeek
    adapter: openai
    endpoint: https://api.deepseek.com/chat/completions
    api_key_env: DEEPSEEK_API_KEY
    model: deepseek-chat
    description: "Code, math, logic. Low cost."
    fallback_to: gemini
    pricing:
      input: 0.14   # $/M tokens
      output: 0.28
  gemini:
    name: Gemini
    adapter: gemini
    endpoint: https://generativelanguage.googleapis.com/v1beta
    api_key_env: GEMINI_API_KEY
    model: gemini-2.5-flash-preview-04-17
    description: "Long context, broad knowledge, Google Search."
    features:
      - google_search
    pricing:
      input: 0.10
      output: 0.40
  # Local models (no API key needed):
  # ollama:
  #   name: Ollama
  #   adapter: openai
  #   endpoint: http://localhost:11434/v1/chat/completions
  #   model: llama3.2
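The pricing fields in the config are given in dollars per million tokens, so a per-call cost estimate is simple arithmetic. The sketch below shows one way to compute it. `estimateCost` is a hypothetical helper name, and how the server actually obtains token counts is not covered here.

```javascript
// Estimate the USD cost of one call from token counts and a model's pricing
// block (prices are in dollars per million tokens, as in the config above).
function estimateCost(pricing, inputTokens, outputTokens) {
  if (!pricing) return 0; // e.g. a local model with no pricing entry
  const { input = 0, output = 0 } = pricing;
  return (inputTokens * input + outputTokens * output) / 1_000_000;
}

// DeepSeek example: 2,000 prompt tokens and 500 completion tokens.
// (2000 * 0.14 + 500 * 0.28) / 1,000,000 = 0.00042
const cost = estimateCost({ input: 0.14, output: 0.28 }, 2000, 500);
console.log(cost.toFixed(6)); // prints 0.000420
```

Summing these estimates over a day and comparing the total with daily_budget_usd is one plausible way to enforce a spending cap.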
Image Generation
Two endpoint families are routed automatically based on the model ID:
Gemini family (uses GEMINI_API_KEY)
| Model ID | Endpoint | Notes |
|---|---|---|
| imagen-4-fast | :predict | Default, ~$0.02/image |
| imagen-4-ultra | :predict | 2K quality, ~$0.06/image |
| gemini-2.5-flash-image (Nano Banana) | :generateContent | Fast (~3s), 2,000 RPM free tier |
| gemini-3-pro-image-preview (Nano Banana 2) | :generateContent | High quality, 500 RPM |
OpenAI family (uses OPENAI_API_KEY)
| Model ID | Endpoint | Notes |
|---|---|---|
| gpt-image-2 | /v1/images/generations | Best text rendering. Requires OpenAI org verification. |
Supported aspect_ratio values: 1:1, 3:2, 4:3, 16:9, 9:16. The quality and size parameters are forwarded to OpenAI image endpoints.
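Routing by model ID, as described for the two endpoint families, could be sketched as a prefix check. The function name `imageRouteFor`, its return shape, and the prefix rules are assumptions for illustration. Only the endpoint suffixes and key names come from the tables above.

```javascript
// Map an image model ID to its endpoint family, following the tables above.
// Prefix matching is an assumption; the server may use an explicit list.
function imageRouteFor(modelId) {
  if (modelId.startsWith("gpt-image")) {
    return { family: "openai", keyEnv: "OPENAI_API_KEY", endpoint: "/v1/images/generations" };
  }
  if (modelId.startsWith("imagen-")) {
    return { family: "gemini", keyEnv: "GEMINI_API_KEY", endpoint: ":predict" };
  }
  if (modelId.startsWith("gemini-") && modelId.includes("image")) {
    return { family: "gemini", keyEnv: "GEMINI_API_KEY", endpoint: ":generateContent" };
  }
  throw new Error(`Unknown image model: ${modelId}`);
}
```

Throwing on an unknown ID surfaces typos early instead of silently sending a request to the wrong provider.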
Video Generation
Generate short video clips using Gemini Veo 3.1 (uses GEMINI_API_KEY).
| Parameter | Type | Notes |
|---|---|---|
| prompt | string | Text description of the desired video |
| aspect_ratio | 16:9 / 9:16 / 1:1 | |
| duration | 4 / 6 / 8 (seconds) | Must be even; Veo only accepts even durations |
| save_path | string? | Defaults to /tmp/mcp-media/videos/ |
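The parameter constraints in the table can be checked before a request is sent. The sketch below is one possible validator. `validateVideoParams` is a hypothetical name, and the defaults it applies for aspect_ratio and duration are assumptions, since the table does not state them.

```javascript
const ASPECT_RATIOS = ["16:9", "9:16", "1:1"];
const DURATIONS = [4, 6, 8];

// Validate generate_video arguments against the parameter table.
// The 16:9 and 8-second defaults are assumptions made for this sketch only.
function validateVideoParams({ prompt, aspect_ratio = "16:9", duration = 8, save_path } = {}) {
  if (typeof prompt !== "string" || prompt.trim() === "") {
    throw new Error("prompt must be a non-empty string");
  }
  if (!ASPECT_RATIOS.includes(aspect_ratio)) {
    throw new Error(`aspect_ratio must be one of ${ASPECT_RATIOS.join(", ")}`);
  }
  if (!DURATIONS.includes(duration)) {
    throw new Error(`duration must be 4, 6, or 8 seconds (got ${duration})`);
  }
  // save_path falls back to the documented default directory.
  return { prompt, aspect_ratio, duration, save_path: save_path ?? "/tmp/mcp-media/videos/" };
}
```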
Local Models
Any OpenAI-compatible local runner works — Ollama, LM Studio, llama.cpp, vLLM:
models:
  ollama:
    name: Ollama
    adapter: openai
    endpoint: http://localhost:11434/v1/chat/completions
    model: llama3.2
Mix local and cloud models freely — use ask_all to compare Ollama vs DeepSeek vs Gemini in one call.
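Because local entries carry no api_key_env while cloud entries do, skipping models without a key can be expressed as a single filter. This is a minimal sketch of that idea. `activeModels` is a hypothetical name, and the server's real loading code may differ.

```javascript
// Keep only models that can actually be called: cloud entries need their
// api_key_env variable set, while local entries (no api_key_env) always stay.
function activeModels(models, env) {
  return Object.fromEntries(
    Object.entries(models).filter(
      ([, m]) => !m.api_key_env || Boolean(env[m.api_key_env])
    )
  );
}
```

In practice `env` would be `process.env`, and `models` the parsed models section of config.yaml.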
Built-in Features
- Auto-retry & fallback: exponential backoff on 429/5xx, automatic fallback to a backup model
- Conversation history: multi-turn context with conversation_id (30-minute expiry, up to 10 turns)
- Cost tracking: per-call token usage and cost estimation
- Response caching: cache identical prompts with a configurable TTL
- Daily budget limit: set a spending cap; calls are blocked when it is exceeded
- Streaming: real-time SSE streaming for all adapters
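The retry-and-fallback behavior in the list can be sketched as follows. This is an illustration under assumptions, not the project's code: `withRetry`, `baseDelayMs`, and the convention that errors carry a numeric `status` property are all hypothetical, and the fallback here is tried only after retries are exhausted.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Retry on 429 and 5xx with exponential backoff, then try a fallback caller.
// `call` and `fallback` are placeholder async functions that either return a
// result or throw an error carrying a numeric `status` property.
async function withRetry(call, { maxRetries = 2, baseDelayMs = 500, fallback } = {}) {
  let lastError;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await call();
    } catch (err) {
      const retryable = err.status === 429 || (err.status >= 500 && err.status < 600);
      if (!retryable) throw err; // e.g. 400 or 401: retrying would not help
      lastError = err;
      // Delay doubles each attempt: baseDelayMs, 2x, 4x, ...
      if (attempt < maxRetries) await sleep(baseDelayMs * 2 ** attempt);
    }
  }
  if (fallback) return fallback();
  throw lastError;
}
```

With max_retries: 2 from the config defaults, a call is attempted at most three times before the fallback model is used.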
Privacy
This is a local relay. No telemetry, no analytics, no data sent to the extension author. Prompts go directly from your machine to the LLM provider you configured.
Full policy: k1vin1906.github.io/mcp-multi-model/privacy.html
License
MIT