Cerebras Multi-Model MCP Server
Use multiple Cerebras models from Claude Desktop & Claude Code with automatic model selection.
The Problem
The official Cerebras MCP package supports only one model per session: you pick a model via an environment variable, and you're stuck with it until you restart. Want to use the fast 8B model for boilerplate and the 357B model for complex reasoning? You'd need two separate MCP server configs.
The Solution
cerebras-multi-mcp exposes 5 tools (one for each Cerebras model, plus an auto-selector) so you (or Claude) can pick the right model per task, in the same session, with zero restarts.
┌─────────────────────────────────────────────────┐
│              Claude Desktop / Code              │
├─────────────────────────────────────────────────┤
│                                                 │
│  cerebras_quick      →  llama3.1-8b   (8B)      │
│  cerebras_complex    →  gpt-oss-120b  (120B)    │
│  cerebras_reasoning  →  zai-glm-4.7   (357B)    │
│  cerebras_instruct   →  qwen-3-235b   (235B)    │
│  cerebras_auto       →  picks the best one      │
│                                                 │
├─────────────────────────────────────────────────┤
│      Cerebras API  ←→  OpenRouter Fallback      │
└─────────────────────────────────────────────────┘
Models
| Tool | Model | Params | Best For |
|---|---|---|---|
| cerebras_quick | llama3.1-8b | 8B | Simple edits, boilerplate, single functions. Fastest. |
| cerebras_complex | gpt-oss-120b | 120B | Multi-file features, CRUD APIs, complex components. |
| cerebras_reasoning | zai-glm-4.7 | 357B | Algorithms, architecture, advanced logic, deep reasoning. |
| cerebras_instruct | qwen-3-235b | 235B | Precise instructions, documentation, typed interfaces, specs. |
| cerebras_auto | auto-selected | n/a | Analyzes your prompt and picks the best model automatically. |
Auto-Selection Logic
cerebras_auto analyzes your prompt keywords and complexity:
- Reasoning keywords (algorithm, optimize, recursive, big-o…) → 357B
- Instruct keywords (document, jsdoc, schema, openapi…) → 235B
- Complex keywords (crud, rest api, multi-file, database…) → 120B
- Everything else or short prompts → 8B (fastest)
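The routing above can be sketched as a first-match keyword scan. This is a minimal illustration, not the server's actual code: `ROUTES` and `selectTool` are hypothetical names, the keyword lists contain only the examples named above (the real lists are longer), and the rule order is assumed to follow the order of the bullets. The short-prompt rule is left out because the length threshold is not stated here.

```javascript
// Hypothetical routing table: only the keywords named in this README.
const ROUTES = [
  { tool: "cerebras_reasoning", keywords: ["algorithm", "optimize", "recursive", "big-o"] },
  { tool: "cerebras_instruct", keywords: ["document", "jsdoc", "schema", "openapi"] },
  { tool: "cerebras_complex", keywords: ["crud", "rest api", "multi-file", "database"] },
];

// Return the first tool whose keyword list matches the prompt (case-insensitive
// substring match); fall back to the 8B model when nothing matches.
function selectTool(prompt) {
  const text = prompt.toLowerCase();
  for (const route of ROUTES) {
    if (route.keywords.some((keyword) => text.includes(keyword))) {
      return route.tool;
    }
  }
  return "cerebras_quick";
}
```

Because matching is by substring, "document" also matches "documentation". A real implementation might prefer word-boundary matching to avoid accidental hits.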
Installation
Prerequisites
- Node.js 18+
- A Cerebras API key (free tier available)
- (Optional) An OpenRouter API key for fallback
Setup
git clone https://github.com/khansabassem/cerebras-multi-mcp.git
cd cerebras-multi-mcp
npm install
Configuration
Claude Desktop
Edit your Claude Desktop config file:
- macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
- Windows: %APPDATA%\Claude\claude_desktop_config.json
Add the cerebras-multi entry:
{
  "mcpServers": {
    "cerebras-multi": {
      "command": "node",
      "args": ["<path-to>/cerebras-multi-mcp/src/index.js"],
      "env": {
        "CEREBRAS_API_KEY": "your-cerebras-api-key",
        "OPENROUTER_API_KEY": "your-openrouter-api-key"
      }
    }
  }
}
Restart Claude Desktop to load the new server.
Claude Code
claude mcp add cerebras-multi \
-e CEREBRAS_API_KEY=your-cerebras-api-key \
-e OPENROUTER_API_KEY=your-openrouter-api-key \
-- node /path/to/cerebras-multi-mcp/src/index.js
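Both configurations pass the same two environment variables to the server process. As a rough sketch of how a server might read them, assuming (per the prerequisites) that the Cerebras key is mandatory and the OpenRouter key is optional; `loadKeys` is a hypothetical helper, not a function from this project:

```javascript
// Hypothetical helper: reads the two keys from an env-like object.
function loadKeys(env) {
  const cerebrasKey = env.CEREBRAS_API_KEY;
  if (!cerebrasKey) {
    // Assumption: a missing Cerebras key is treated as a fatal config error.
    throw new Error("CEREBRAS_API_KEY is not set");
  }
  // OpenRouter is optional; without it there is simply no fallback.
  const openRouterKey = env.OPENROUTER_API_KEY || null;
  return { cerebrasKey, openRouterKey, hasFallback: openRouterKey !== null };
}

// Typical call site: const keys = loadKeys(process.env);
```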
Usage
Once configured, you'll see 5 new tools in Claude. Each tool accepts:
| Parameter | Required | Description |
|---|---|---|
| file_path | Yes | Absolute path to the file to create or modify |
| prompt | Yes | Detailed code generation instructions |
| context_files | No | Array of file paths to read as context |
| temperature | No | Sampling temperature (default: 0.1) |
| max_tokens | No | Maximum tokens in the response |
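MCP tools describe their arguments with a JSON Schema, so the parameter table above maps naturally onto one shared schema. The sketch below shows one way that could look. `makeInputSchema` and `missingRequired` are hypothetical names, and the exact types (for example `integer` for `max_tokens`) are assumptions inferred from the descriptions.

```javascript
// Hypothetical schema factory: one JSON Schema shared by all five tools.
function makeInputSchema() {
  return {
    type: "object",
    properties: {
      file_path: { type: "string", description: "Absolute path to the file to create or modify" },
      prompt: { type: "string", description: "Detailed code generation instructions" },
      context_files: {
        type: "array",
        items: { type: "string" },
        description: "Array of file paths to read as context",
      },
      temperature: { type: "number", default: 0.1, description: "Sampling temperature" },
      max_tokens: { type: "integer", description: "Maximum tokens in the response" },
    },
    required: ["file_path", "prompt"],
  };
}

// Checks only the required fields; this is not a full JSON Schema validator.
function missingRequired(args, schema = makeInputSchema()) {
  return schema.required.filter((key) => args[key] === undefined);
}
```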
Examples
Quick boilerplate with the 8B model:
Tool: cerebras_quick
file_path: /project/src/server.js
prompt: Create an Express server with health check endpoint on port 3000
Complex feature with the 120B model:
Tool: cerebras_complex
file_path: /project/src/auth/middleware.ts
prompt: Create JWT authentication middleware with refresh token rotation
context_files: ["/project/src/types/auth.ts", "/project/src/config/env.ts"]
Algorithm design with the 357B model:
Tool: cerebras_reasoning
file_path: /project/src/utils/graph.ts
prompt: Implement Dijkstra's shortest path with a priority queue, supporting weighted directed graphs
Documentation with the 235B model:
Tool: cerebras_instruct
file_path: /project/src/types/api.ts
prompt: Generate TypeScript interfaces for a REST API with OpenAPI-compatible JSDoc annotations
Let the server decide:
Tool: cerebras_auto
file_path: /project/src/cache.ts
prompt: Build an LRU cache with O(1) get and put using a doubly linked list
Features
- Per-call model selection: no restarts, no env var juggling
- Auto-select mode: keyword analysis picks the right model for you
- OpenRouter fallback: if Cerebras is unavailable, requests fall through to OpenRouter
- Smart file handling: reads existing files for context when editing, creates directories as needed
- Diff summaries: shows additions/removals when updating existing files
- Code cleaning: strips markdown fences from model output automatically
- Context files: pass related files for cross-file awareness
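To illustrate the code-cleaning feature, here is a minimal sketch of stripping a wrapping markdown fence from model output before writing it to disk. `stripFences` is a hypothetical name, and the project's actual cleaning rules may differ (for instance, in how they treat text before or after the fence).

```javascript
// Three backticks, built at runtime so this example does not contain a literal fence.
const FENCE = "`".repeat(3);

// Remove one wrapping fence (with an optional language tag on the opening line).
// Output that is not wrapped in a fence is returned trimmed but otherwise unchanged.
function stripFences(output) {
  const trimmed = output.trim();
  const lines = trimmed.split("\n");
  const first = lines[0];
  const last = lines[lines.length - 1].trim();
  if (lines.length >= 2 && first.startsWith(FENCE) && last === FENCE) {
    return lines.slice(1, -1).join("\n");
  }
  return trimmed;
}
```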
Architecture
src/index.js          - Single-file MCP server (~350 lines)
├── Config            - Model definitions, keyword lists, language detection
├── File helpers      - Safe read/write with path resolution
├── HTTP layer        - Cerebras API + OpenRouter fallback
├── Auto-selector     - Keyword-based model routing
├── Tool handler      - Unified handler for all 5 tools
└── MCP server        - ListTools + CallTool with schema factory
Built with @modelcontextprotocol/sdk using stdio transport.
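The fallback behavior of the HTTP layer can be pictured as a small wrapper around two provider calls. The sketch below is an assumption about the general pattern, not the project's code: `withFallback` is a hypothetical helper, and the two provider functions are passed in as plain async callbacks, so no endpoint or request format is implied.

```javascript
// Hypothetical helper: try the primary provider, then the fallback on any error.
// `primary` and `fallback` are async functions that resolve to generated text;
// `fallback` may be null when no OpenRouter key is configured.
async function withFallback(primary, fallback) {
  try {
    return { provider: "cerebras", text: await primary() };
  } catch (primaryError) {
    if (!fallback) {
      throw primaryError; // No fallback configured: surface the original error.
    }
    return { provider: "openrouter", text: await fallback() };
  }
}
```

Returning the provider name alongside the text makes it easy to report which backend actually served a request.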
Why Cerebras?
Cerebras inference runs on purpose-built wafer-scale hardware, delivering up to 20x faster inference than traditional GPU setups. Combined with MCP, you get near-instant code generation directly inside Claude.
Author
Bassem EL KHANSAA (@ask.bassem)
License
MIT
Contributing
Issues and PRs welcome. If you add a new model, just extend the MODELS object and add a tool entry in the ListToolsRequestSchema handler.
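For orientation, a model registry of this kind might look like the sketch below. The shape of `MODELS` is a guess (only the model IDs and sizes come from the Models table), and `listTools` is a hypothetical function. Note that the sketch derives the tool list from the object for brevity, whereas the note above says the real server needs a separate tool entry in the ListToolsRequestSchema handler.

```javascript
// Hypothetical shape of the MODELS object; IDs and sizes are from the Models table.
const MODELS = {
  quick: { id: "llama3.1-8b", params: "8B" },
  complex: { id: "gpt-oss-120b", params: "120B" },
  reasoning: { id: "zai-glm-4.7", params: "357B" },
  instruct: { id: "qwen-3-235b", params: "235B" },
};

// Build one tool entry per model, plus the auto-selector.
function listTools(models) {
  const tools = Object.entries(models).map(([key, model]) => ({
    name: `cerebras_${key}`,
    description: `Generate code with ${model.id} (${model.params})`,
  }));
  tools.push({
    name: "cerebras_auto",
    description: "Pick a model automatically from the prompt",
  });
  return tools;
}
```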