mcp-multi-model

One MCP server that routes to 12+ AI providers for text, image, and video generation, with smart task delegation, web search, and model comparison—all from the terminal.

Give Claude Code superpowers — image gen, video gen, web search, and smart multi-model routing.

One MCP server. All the models you need. Zero tab-switching.

npx mcp-multi-model

If you find this useful, please give it a ⭐ — it helps others discover the project!

What can it do?

🎨 Generate images and videos — right in the terminal

"Generate a macOS app icon with a glowing indigo orb"

Claude calls Imagen 4 / GPT Image / Nano Banana, saves the PNG, and opens it. No browser, no Figma, no context switch.

Video too — Veo 3.1 generates short clips from a text prompt.

🧠 Smart routing — the right model for the job

Need reasoning or agentic coding → it routes to OpenAI GPT-5 / o-series (auto-handles max_completion_tokens, skips temperature where unsupported).
Need research → it routes to Gemini (Google Search grounding).
Need code written cheaply → it routes to DeepSeek (fast, cheap, great at code).
Need real-time info in Chinese → it routes to Kimi (web search).

You don't pick the model. The routing does it for you.
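
One way to picture the routing described above is an ordered rule table where the first match wins. This is only an illustrative sketch: the `ROUTES` table, its regular expressions, and the `pickModel` function are hypothetical and not taken from the project's source, which may weigh tasks quite differently.

```javascript
// Hypothetical rule table: the first matching rule wins.
// The patterns are illustrative guesses, not the project's real heuristics.
const ROUTES = [
  // Chinese text or explicit mentions of Chinese sources
  { model: "kimi", pattern: /[\u4e00-\u9fff]|chinese/i },
  // Tasks that need web grounding
  { model: "gemini", pattern: /\b(research|search|latest|news)\b/i },
  // Heavier reasoning or agentic coding
  { model: "openai", pattern: /\b(reason\w*|prove|plan|agentic|refactor)\b/i },
  // Everyday, low-cost code work
  { model: "deepseek", pattern: /\b(code|function|bug|implement|script)\b/i },
];

// Pick a model ID for a task, falling back to a default when nothing matches.
function pickModel(task, fallback = "deepseek") {
  const rule = ROUTES.find((r) => r.pattern.test(task));
  return rule ? rule.model : fallback;
}

console.log(pickModel("Research the latest MCP spec changes")); // gemini
console.log(pickModel("Implement a B-tree in Rust"));           // deepseek
```

Rule order matters in a table like this: a prompt written in Chinese that also asks for code would go to Kimi because that rule is checked first.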

⚖️ Compare models side by side

"Ask both DeepSeek and Gemini how to implement a B-tree"

Two answers, one terminal. See which model gives you a better solution.
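
A side-by-side comparison is essentially a parallel fan-out. The sketch below shows the idea under stated assumptions: `askModel` is a hypothetical stub standing in for a real provider call, and `askBoth` here is an illustration, not the server's actual tool implementation.

```javascript
// Stand-in for a real provider call; a real server would send an HTTP request here.
// `askModel` is hypothetical and simply echoes the prompt.
async function askModel(modelId, prompt) {
  if (modelId === "broken") throw new Error("provider unavailable");
  return `[${modelId}] answer to: ${prompt}`;
}

// Query two models in parallel and report each outcome separately.
// Promise.allSettled is used so one failing provider does not hide the other answer.
async function askBoth(modelA, modelB, prompt) {
  const ids = [modelA, modelB];
  const settled = await Promise.allSettled(ids.map((id) => askModel(id, prompt)));
  return settled.map((result, i) => ({
    model: ids[i],
    ok: result.status === "fulfilled",
    text: result.status === "fulfilled" ? result.value : String(result.reason.message),
  }));
}

askBoth("deepseek", "gemini", "How do I implement a B-tree?").then((answers) => {
  for (const a of answers) console.log(a.model, a.ok ? "OK" : "FAILED", a.text);
});
```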

🌐 Web search built in

Gemini uses Google Search grounding. Kimi searches the Chinese web. No separate browser-use MCP needed.

🔧 One-line install

{
  "mcpServers": {
    "multi-model": {
      "command": "npx",
      "args": ["-y", "mcp-multi-model"],
      "env": {
        "DEEPSEEK_API_KEY": "sk-...",
        "GEMINI_API_KEY": "AI..."
      }
    }
  }
}

That's it. No git clone, no build step.

Supported Models

12+ providers preconfigured in config.example.yaml. Models without an API key are skipped automatically.

Provider	Adapter	Why use it
OpenAI	`openai`	GPT-5 / GPT-5.5 reasoning, o1 / o3 / o4 series, GPT Image. Reasoning param handling is automatic (`max_completion_tokens`, temperature skipped where unsupported).
Gemini	`gemini`	Long context, Google Search grounding. Image (Imagen 4 Fast / Ultra, Nano Banana 2) and video (Veo 3.1) generation built in.
DeepSeek	`openai`	Code, math, logic — extremely low cost
Kimi (Moonshot)	`openai`	Chinese web search, real-time info, tool-calling loop
Grok (xAI)	`openai`	Real-time X/Twitter context, reasoning
Perplexity	`openai`	Sonar models with built-in web search and citations
Anthropic (via OpenRouter)	`openai`	Claude models routed through OpenRouter
Mistral / Groq / Qwen / GLM / Together	`openai`	EU AI, ultra-fast inference, Chinese-native, open-source aggregators
Ollama / LM Studio / llama.cpp / vLLM	`openai`	Local — no API key, no cost, full privacy
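
The automatic reasoning-parameter handling mentioned for the `openai` adapter could look roughly like the following. This is a simplified sketch, not the adapter's code: `isReasoningModel`, `buildChatBody`, and the prefix list are hypothetical, and the sketch drops `temperature` for every reasoning model rather than checking support per model.

```javascript
// Heuristic for "reasoning" model IDs. The prefix list is an assumption for illustration.
function isReasoningModel(model) {
  return /^(o1|o3|o4|gpt-5)/.test(model);
}

// Build an OpenAI-style chat completions request body.
// Defaults mirror the max_tokens and temperature values in the example config.
function buildChatBody(model, prompt, { maxTokens = 4000, temperature = 0.7 } = {}) {
  const body = { model, messages: [{ role: "user", content: prompt }] };
  if (isReasoningModel(model)) {
    // Reasoning models take max_completion_tokens; temperature is left out entirely.
    body.max_completion_tokens = maxTokens;
  } else {
    body.max_tokens = maxTokens;
    body.temperature = temperature;
  }
  return body;
}

console.log(buildChatBody("o3-mini", "Prove the invariant holds"));
console.log(buildChatBody("deepseek-chat", "Write a tokenizer"));
```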

Adding a new model is one block in config.yaml — see Configuration.
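
Skipping models that have no API key can be sketched as a filter over the configured models. The `usableModels` helper below is hypothetical; it assumes each model entry names its key through `api_key_env` (as in the example config) and that entries without that field are local models that need no key.

```javascript
// Keep only models that can actually be called.
// A model with no `api_key_env` (for example a local runner) is always kept.
function usableModels(models, env) {
  return Object.fromEntries(
    Object.entries(models).filter(
      ([, cfg]) => !cfg.api_key_env || Boolean(env[cfg.api_key_env])
    )
  );
}

const models = {
  deepseek: { api_key_env: "DEEPSEEK_API_KEY" },
  gemini: { api_key_env: "GEMINI_API_KEY" },
  ollama: {}, // local, no key required
};

// Only DEEPSEEK_API_KEY is set in this fake environment.
console.log(Object.keys(usableModels(models, { DEEPSEEK_API_KEY: "sk-test" })));
// [ 'deepseek', 'ollama' ]
```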

MCP Tools

Tools are dynamically generated from your config. With the default setup:

Tool	What it does
`ask_ai`	Query any model — unified entry with `temperature` / `top_p` control
`ask_deepseek`	Query DeepSeek directly
`ask_gemini`	Query Gemini directly
`ask_kimi`	Query Kimi directly
`ask_all`	Query all models in parallel, compare results
`ask_both`	Query any two models in parallel
`delegate`	Smart routing — auto-picks the best model for the task
`generate_image`	Text → image via Gemini (Imagen, Nano Banana) or OpenAI GPT Image
`generate_video`	Text → video via Gemini Veo
`translate`	CN ↔ EN translation
`research`	Deep research with web search
`check_health`	Ping all models, report status and latency
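
To illustrate what "dynamically generated from your config" might mean in practice, here is a minimal sketch that derives one `ask_<id>` tool per configured model. The `toolNames` function and the idea that the remaining tools form a fixed list are assumptions; the real server may register tools conditionally.

```javascript
// Tools assumed to exist regardless of which models are configured
// (names taken from the table above).
const FIXED_TOOLS = [
  "ask_ai", "ask_all", "ask_both", "delegate", "generate_image",
  "generate_video", "translate", "research", "check_health",
];

// Derive one `ask_<id>` tool per configured model, then append the fixed tools.
function toolNames(models) {
  const perModel = Object.keys(models).map((id) => `ask_${id}`);
  return [...perModel, ...FIXED_TOOLS];
}

console.log(toolNames({ deepseek: {}, gemini: {}, kimi: {} }));
```

Under this assumption, adding a `grok` block to the config would surface an `ask_grok` tool without further changes.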

Installation

Option 1: npx (recommended)

Add to your Claude Code MCP config (~/.mcp.json):

{
  "mcpServers": {
    "multi-model": {
      "command": "npx",
      "args": ["-y", "mcp-multi-model"],
      "env": {
        "DEEPSEEK_API_KEY": "sk-...",
        "GEMINI_API_KEY": "AI..."
      }
    }
  }
}

Option 2: Clone and run locally

git clone https://github.com/K1vin1906/mcp-multi-model.git
cd mcp-multi-model
npm install
npm run setup   # Interactive setup wizard — validates your API keys

Then add to your MCP config:

{
  "mcpServers": {
    "multi-model": {
      "command": "node",
      "args": ["/path/to/mcp-multi-model/index.js"]
    }
  }
}

API keys can be set via env in the config above, or in a .env file in the project directory.

Configuration

cp config.example.yaml config.yaml

defaults:
  max_tokens: 4000
  temperature: 0.7
  timeout_ms: 60000
  max_retries: 2
  # cache_ttl_ms: 300000   # Cache identical prompts for 5 min
  # daily_budget_usd: 5.0  # Daily spending limit in USD

models:
  deepseek:
    name: DeepSeek
    adapter: openai
    endpoint: https://api.deepseek.com/chat/completions
    api_key_env: DEEPSEEK_API_KEY
    model: deepseek-chat
    description: "Code, math, logic. Low cost."
    fallback_to: gemini
    pricing:
      input: 0.14    # $/M tokens
      output: 0.28

  gemini:
    name: Gemini
    adapter: gemini
    endpoint: https://generativelanguage.googleapis.com/v1beta
    api_key_env: GEMINI_API_KEY
    model: gemini-2.5-flash-preview-04-17
    description: "Long context, broad knowledge, Google Search."
    features:
      - google_search
    pricing:
      input: 0.10
      output: 0.40

  # Local models — no API key needed:
  # ollama:
  #   name: Ollama
  #   adapter: openai
  #   endpoint: http://localhost:11434/v1/chat/completions
  #   model: llama3.2
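
The `pricing` and `daily_budget_usd` settings suggest a simple calculation: price per million tokens times tokens used, checked against a daily cap. The helpers below (`estimateCostUsd`, `withinBudget`) are a hypothetical sketch of that arithmetic, not the server's accounting code.

```javascript
// Estimate the USD cost of one call from token counts and $/M-token prices.
function estimateCostUsd(pricing, inputTokens, outputTokens) {
  return (inputTokens * pricing.input + outputTokens * pricing.output) / 1_000_000;
}

// Return true when a call costing `costUsd` still fits under the daily cap.
// An undefined cap means no limit, mirroring the commented-out default.
function withinBudget(spentTodayUsd, costUsd, dailyBudgetUsd) {
  if (dailyBudgetUsd === undefined) return true;
  return spentTodayUsd + costUsd <= dailyBudgetUsd;
}

const deepseekPricing = { input: 0.14, output: 0.28 }; // values from the config above
const cost = estimateCostUsd(deepseekPricing, 12_000, 3_000);

console.log(cost.toFixed(6));                // 0.002520
console.log(withinBudget(4.999, cost, 5.0)); // false: the call would exceed the cap
```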

Image Generation

Two endpoint families are routed automatically based on the model ID:

Gemini family (uses `GEMINI_API_KEY`)

Model ID	Endpoint	Notes
`imagen-4-fast`	`:predict`	Default, ~$0.02/image
`imagen-4-ultra`	`:predict`	2K quality, ~$0.06/image
`gemini-2.5-flash-image` (Nano Banana)	`:generateContent`	Fast (~3s), 2,000 RPM free tier
`gemini-3-pro-image-preview` (Nano Banana 2)	`:generateContent`	High quality, 500 RPM

OpenAI family (uses `OPENAI_API_KEY`)

Model ID	Endpoint	Notes
`gpt-image-2`	`/v1/images/generations`	Best text rendering. Requires OpenAI org verification.

Supported `aspect_ratio` values: 1:1, 3:2, 4:3, 16:9, 9:16. The `quality` and `size` parameters are forwarded to OpenAI image endpoints.
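
Routing by model ID, as listed in the two tables above, could be sketched as a prefix check. The `imageRouteFor` function and its prefix matching are assumptions made for illustration; only the model IDs, endpoint suffixes, and key names come from the tables.

```javascript
// Map an image model ID to its endpoint family, following the tables above.
// Matching on ID prefixes is an assumption about how the routing might work.
function imageRouteFor(modelId) {
  if (modelId.startsWith("imagen-")) {
    return { family: "gemini", keyEnv: "GEMINI_API_KEY", endpoint: ":predict" };
  }
  if (modelId.startsWith("gemini-")) {
    return { family: "gemini", keyEnv: "GEMINI_API_KEY", endpoint: ":generateContent" };
  }
  if (modelId.startsWith("gpt-image-")) {
    return { family: "openai", keyEnv: "OPENAI_API_KEY", endpoint: "/v1/images/generations" };
  }
  throw new Error(`Unknown image model: ${modelId}`);
}

// Aspect ratios listed as supported for image generation.
const IMAGE_ASPECT_RATIOS = ["1:1", "3:2", "4:3", "16:9", "9:16"];

function isSupportedAspectRatio(ratio) {
  return IMAGE_ASPECT_RATIOS.includes(ratio);
}

console.log(imageRouteFor("imagen-4-fast").endpoint); // :predict
console.log(imageRouteFor("gpt-image-2").keyEnv);     // OPENAI_API_KEY
console.log(isSupportedAspectRatio("21:9"));          // false
```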

Video Generation

Generate short video clips using Gemini Veo 3.1 (uses GEMINI_API_KEY).

Parameter	Type	Notes
`prompt`	string	Text description of the desired video
`aspect_ratio`	`16:9` / `9:16` / `1:1`
`duration`	`4` / `6` / `8` (seconds)	Veo only accepts even durations
`save_path`	string?	Defaults to `/tmp/mcp-media/videos/`
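
The parameter constraints in the table above lend themselves to a small pre-flight check before a request is sent. The `validateVideoParams` function below is a hypothetical sketch; treating `aspect_ratio` and `duration` as optional is an assumption, since the table does not say whether they are required.

```javascript
const VIDEO_ASPECT_RATIOS = ["16:9", "9:16", "1:1"];
const VIDEO_DURATIONS = [4, 6, 8]; // seconds

// Collect human-readable problems; an empty array means the request looks valid.
// Treating aspect_ratio and duration as optional is an assumption of this sketch.
function validateVideoParams({ prompt, aspect_ratio, duration } = {}) {
  const problems = [];
  if (typeof prompt !== "string" || prompt.trim() === "") {
    problems.push("prompt must be a non-empty string");
  }
  if (aspect_ratio !== undefined && !VIDEO_ASPECT_RATIOS.includes(aspect_ratio)) {
    problems.push(`aspect_ratio must be one of ${VIDEO_ASPECT_RATIOS.join(", ")}`);
  }
  if (duration !== undefined && !VIDEO_DURATIONS.includes(duration)) {
    problems.push(`duration must be one of ${VIDEO_DURATIONS.join(", ")} seconds`);
  }
  return problems;
}

console.log(validateVideoParams({ prompt: "A paper boat on a pond", duration: 5 }));
// [ 'duration must be one of 4, 6, 8 seconds' ]
```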

Local Models

Any OpenAI-compatible local runner works — Ollama, LM Studio, llama.cpp, vLLM:

models:
  ollama:
    name: Ollama
    adapter: openai
    endpoint: http://localhost:11434/v1/chat/completions
    model: llama3.2

Mix local and cloud models freely — use ask_all to compare Ollama vs DeepSeek vs Gemini in one call.

Built-in Features

Auto-retry & fallback — Exponential backoff on 429/5xx, automatic fallback to backup model
Conversation history — Multi-turn context with conversation_id (30min expiry, up to 10 turns)
Cost tracking — Per-call token usage and cost estimation
Response caching — Cache identical prompts with configurable TTL
Daily budget limit — Set a spending cap; calls are blocked when exceeded
Streaming — Real-time SSE streaming for all adapters
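
The auto-retry and fallback behaviour listed above can be sketched as a retry loop with exponential backoff that hands over to a backup model once retries run out. Everything here is illustrative: `callWithRetry`, `backoffDelayMs`, the 500 ms base delay, and the 8 s cap are assumptions, and `call` is a hypothetical function that resolves with text or throws an error carrying an HTTP `status`. In this sketch a non-retryable error goes straight to the fallback, which is a design choice rather than documented behaviour.

```javascript
// Delay before retry number `attempt` (0-based): base, 2x base, 4x base, ... up to a cap.
function backoffDelayMs(attempt, baseMs = 500, capMs = 8000) {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// Only rate limits (429) and server errors (5xx) are treated as retryable.
function isRetryable(status) {
  return status === 429 || (status >= 500 && status <= 599);
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Try `call(modelId)` with retries, then hand over to the fallback model once.
// The fallback call gets no further fallback, so the chain cannot loop.
async function callWithRetry(call, modelId, { maxRetries = 2, fallbackTo, wait = sleep } = {}) {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await call(modelId);
    } catch (err) {
      if (isRetryable(err.status) && attempt < maxRetries) {
        await wait(backoffDelayMs(attempt));
        continue;
      }
      if (fallbackTo) return callWithRetry(call, fallbackTo, { maxRetries, wait });
      throw err;
    }
  }
}

console.log([0, 1, 2, 3, 4, 5].map((n) => backoffDelayMs(n)));
// [ 500, 1000, 2000, 4000, 8000, 8000 ]
```

With `max_retries: 2` and `fallback_to: gemini` from the example config, a DeepSeek call that keeps returning 503 would be attempted three times in this sketch before Gemini is tried.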

Privacy

This is a local relay. No telemetry, no analytics, no data sent to the extension author. Prompts go directly from your machine to the LLM provider you configured.

Full policy: k1vin1906.github.io/mcp-multi-model/privacy.html

License

MIT
