Cerebras Multi-Model MCP Server

Use multiple Cerebras models from Claude Desktop & Claude Code with automatic model selection.

The Problem

The official Cerebras MCP package only supports one model per session: you pick a model via an environment variable, and you're stuck with it until you restart. Want to use the fast 8B model for boilerplate and the 357B model for complex reasoning? You'd need two separate MCP server configs.

The Solution

cerebras-multi-mcp exposes 5 tools (one for each of the four Cerebras models, plus an auto-selector) so you (or Claude) can pick the right model per task, in the same session, with zero restarts.

┌──────────────────────────────────────────────────┐
│              Claude Desktop / Code               │
├──────────────────────────────────────────────────┤
│                                                  │
│  cerebras_quick     → llama3.1-8b      (8B)      │
│  cerebras_complex   → gpt-oss-120b     (120B)    │
│  cerebras_reasoning → zai-glm-4.7      (357B)    │
│  cerebras_instruct  → qwen-3-235b      (235B)    │
│  cerebras_auto      → picks the best one         │
│                                                  │
├──────────────────────────────────────────────────┤
│         Cerebras API  ←→  OpenRouter Fallback    │
└──────────────────────────────────────────────────┘

Models

| Tool | Model | Params | Best For |
|------|-------|--------|----------|
| cerebras_quick | llama3.1-8b | 8B | Simple edits, boilerplate, single functions. Fastest. |
| cerebras_complex | gpt-oss-120b | 120B | Multi-file features, CRUD APIs, complex components. |
| cerebras_reasoning | zai-glm-4.7 | 357B | Algorithms, architecture, advanced logic, deep reasoning. |
| cerebras_instruct | qwen-3-235b | 235B | Precise instructions, documentation, typed interfaces, specs. |
| cerebras_auto | auto-selected | n/a | Analyzes your prompt and picks the best model automatically. |

Auto-Selection Logic

cerebras_auto analyzes your prompt keywords and complexity:

  • Reasoning keywords (algorithm, optimize, recursive, big-o…) → 357B
  • Instruct keywords (document, jsdoc, schema, openapi…) → 235B
  • Complex keywords (crud, rest api, multi-file, database…) → 120B
  • Everything else or short prompts → 8B (fastest)
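
The routing rules above can be sketched roughly as follows. This is an illustration, not the code in src/index.js: the keyword lists contain only the examples named in the bullets (the real lists are longer), the `selectTool` name is hypothetical, and the check order (reasoning, then instruct, then complex) is an assumption taken from the order of the bullets. The README does not say how prompt length is measured, so this sketch leaves out any length threshold and simply falls back to the 8B tool when no keyword matches.

```javascript
// Keyword lists copied from the README bullets; the real lists are longer.
const KEYWORDS = {
  reasoning: ["algorithm", "optimize", "recursive", "big-o"],
  instruct: ["document", "jsdoc", "schema", "openapi"],
  complex: ["crud", "rest api", "multi-file", "database"],
};

// Tool names from the Models table.
const TOOL_FOR = {
  reasoning: "cerebras_reasoning",
  instruct: "cerebras_instruct",
  complex: "cerebras_complex",
};

// Hypothetical selector. Assumption: categories are checked in the order
// the README lists them, and the first category with a match wins.
function selectTool(prompt) {
  const text = prompt.toLowerCase();
  for (const category of ["reasoning", "instruct", "complex"]) {
    if (KEYWORDS[category].some((keyword) => text.includes(keyword))) {
      return TOOL_FOR[category];
    }
  }
  // No keyword matched: use the fastest model.
  return "cerebras_quick";
}

console.log(selectTool("Implement a recursive descent parser"));
console.log(selectTool("Rename a variable"));
```

Matching is by plain substring, so "document" also matches "documentation". A real implementation might prefer word boundaries or scoring across categories.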

Installation

Prerequisites

Setup

git clone https://github.com/khansabassem/cerebras-multi-mcp.git
cd cerebras-multi-mcp
npm install

Configuration

Claude Desktop

Edit your Claude Desktop config file:

  • macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
  • Windows: %APPDATA%\Claude\claude_desktop_config.json

Add the cerebras-multi entry:

{
  "mcpServers": {
    "cerebras-multi": {
      "command": "node",
      "args": ["<path-to>/cerebras-multi-mcp/src/index.js"],
      "env": {
        "CEREBRAS_API_KEY": "your-cerebras-api-key",
        "OPENROUTER_API_KEY": "your-openrouter-api-key"
      }
    }
  }
}

Restart Claude Desktop to load the new server.

Claude Code

claude mcp add cerebras-multi \
  -e CEREBRAS_API_KEY=your-cerebras-api-key \
  -e OPENROUTER_API_KEY=your-openrouter-api-key \
  -- node /path/to/cerebras-multi-mcp/src/index.js

Usage

Once configured, you'll see 5 new tools in Claude. Each tool accepts:

| Parameter | Required | Description |
|-----------|----------|-------------|
| file_path | Yes | Absolute path to the file to create or modify |
| prompt | Yes | Detailed code generation instructions |
| context_files | No | Array of file paths to read as context |
| temperature | No | Sampling temperature (default: 0.1) |
| max_tokens | No | Maximum tokens in the response |
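
To make the parameter rules concrete, here is a minimal sketch of how a handler could validate these arguments and apply the documented default. `normalizeArgs` and `looksAbsolute` are hypothetical names, not functions from this project, and the absolute-path check is deliberately simplified. In Node.js, `path.isAbsolute` is the more robust choice.

```javascript
const DEFAULT_TEMPERATURE = 0.1; // default stated in the parameter table

// Simplified absolute-path check for POSIX ("/x") and Windows ("C:\x") paths.
function looksAbsolute(filePath) {
  return filePath.startsWith("/") || /^[A-Za-z]:[\\/]/.test(filePath);
}

// Hypothetical helper: validates tool arguments and fills in defaults.
function normalizeArgs(args) {
  const {
    file_path,
    prompt,
    context_files = [],
    temperature = DEFAULT_TEMPERATURE,
    max_tokens,
  } = args;

  if (typeof file_path !== "string" || !looksAbsolute(file_path)) {
    throw new Error("file_path must be an absolute path");
  }
  if (typeof prompt !== "string" || prompt.trim() === "") {
    throw new Error("prompt is required");
  }
  if (!Array.isArray(context_files)) {
    throw new Error("context_files must be an array of file paths");
  }

  const normalized = { file_path, prompt, context_files, temperature };
  // max_tokens has no documented default, so it is only passed through when given.
  if (max_tokens !== undefined) normalized.max_tokens = max_tokens;
  return normalized;
}

console.log(normalizeArgs({ file_path: "/project/src/server.js", prompt: "Create a server" }));
```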

Examples

Quick boilerplate with the 8B model:

Tool: cerebras_quick
file_path: /project/src/server.js
prompt: Create an Express server with health check endpoint on port 3000

Complex feature with the 120B model:

Tool: cerebras_complex
file_path: /project/src/auth/middleware.ts
prompt: Create JWT authentication middleware with refresh token rotation
context_files: ["/project/src/types/auth.ts", "/project/src/config/env.ts"]

Algorithm design with the 357B model:

Tool: cerebras_reasoning
file_path: /project/src/utils/graph.ts
prompt: Implement Dijkstra's shortest path with a priority queue, supporting weighted directed graphs

Documentation with the 235B model:

Tool: cerebras_instruct
file_path: /project/src/types/api.ts
prompt: Generate TypeScript interfaces for a REST API with OpenAPI-compatible JSDoc annotations

Let the server decide:

Tool: cerebras_auto
file_path: /project/src/cache.ts
prompt: Build an LRU cache with O(1) get and put using a doubly linked list
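
Claude issues these calls for you, so you normally never write them by hand. For reference, the sketch below shows what the first example could look like on the wire, using the `tools/call` request shape defined by the Model Context Protocol. The `buildToolCall` helper and the request `id` are illustrative only.

```javascript
// Builds a JSON-RPC 2.0 request for the MCP "tools/call" method.
function buildToolCall(id, name, args) {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// The "Quick boilerplate" example expressed as a tools/call request.
const request = buildToolCall(1, "cerebras_quick", {
  file_path: "/project/src/server.js",
  prompt: "Create an Express server with health check endpoint on port 3000",
});

console.log(JSON.stringify(request, null, 2));
```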

Features

  • Per-call model selection: no restarts, no env var juggling
  • Auto-select mode: keyword analysis picks the right model for you
  • OpenRouter fallback: if Cerebras is unavailable, requests fall through to OpenRouter
  • Smart file handling: reads existing files for context when editing, creates directories as needed
  • Diff summaries: shows additions/removals when updating existing files
  • Code cleaning: strips markdown fences from model output automatically
  • Context files: pass related files for cross-file awareness
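
Two of these features, code cleaning and diff summaries, are easy to picture with a small sketch. The helpers below are hypothetical and are not taken from src/index.js: `stripFences` removes a single wrapping markdown fence, and `diffSummary` is a naive line-count comparison rather than a true diff algorithm.

```javascript
// Built with repeat() so this example contains no literal fence of its own.
const FENCE = "`".repeat(3);
const FENCED = new RegExp("^" + FENCE + "[^\\n]*\\n([\\s\\S]*?)\\n?" + FENCE + "$");

// Removes one wrapping markdown fence (with optional language tag), if present.
function stripFences(output) {
  const trimmed = output.trim();
  const match = trimmed.match(FENCED);
  return match ? match[1] : trimmed;
}

// Counts lines that appear only in `after` (added) or only in `before` (removed).
// Lines are compared as a multiset, so moved lines are not counted as changes.
function diffSummary(before, after) {
  const counts = new Map();
  for (const line of before.split("\n")) {
    counts.set(line, (counts.get(line) || 0) + 1);
  }
  let added = 0;
  for (const line of after.split("\n")) {
    const remaining = counts.get(line) || 0;
    if (remaining > 0) counts.set(line, remaining - 1);
    else added += 1;
  }
  let removed = 0;
  for (const remaining of counts.values()) removed += remaining;
  return { added, removed };
}

const raw = FENCE + "js\nconst a = 1;\n" + FENCE;
console.log(stripFences(raw));
console.log(diffSummary("a\nb\nc", "a\nc\nd"));
```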

Architecture

src/index.js          - Single-file MCP server (~350 lines)
├── Config            - Model definitions, keyword lists, language detection
├── File helpers      - Safe read/write with path resolution
├── HTTP layer        - Cerebras API + OpenRouter fallback
├── Auto-selector     - Keyword-based model routing
├── Tool handler      - Unified handler for all 5 tools
└── MCP server        - ListTools + CallTool with schema factory

Built with @modelcontextprotocol/sdk using stdio transport.
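
As an illustration of the fallback behaviour in the HTTP layer, the sketch below tries a list of providers in order and returns the first result that succeeds. It is written under assumptions: `withFallback` is a hypothetical name, the providers are stubs rather than real HTTP calls, and the README does not say which errors trigger the fallback, so this version falls through on any error.

```javascript
// Tries each provider in order and returns the first successful result.
// `providers` is an array of { name, call }, where call(request) returns a promise.
async function withFallback(providers, request) {
  const errors = [];
  for (const { name, call } of providers) {
    try {
      return { provider: name, result: await call(request) };
    } catch (err) {
      // Remember the failure and move on to the next provider.
      errors.push(`${name}: ${err.message}`);
    }
  }
  throw new Error(`All providers failed (${errors.join("; ")})`);
}

// Stub providers standing in for the Cerebras and OpenRouter HTTP calls.
const stubProviders = [
  { name: "cerebras", call: async () => { throw new Error("service unavailable"); } },
  { name: "openrouter", call: async (request) => `stub response for: ${request}` },
];

withFallback(stubProviders, "hello").then((outcome) => {
  console.log(outcome.provider, outcome.result);
});
```

Passing the providers in as plain functions keeps the retry logic independent of any particular HTTP client, which also makes it straightforward to test.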


Why Cerebras?

Cerebras inference runs on purpose-built wafer-scale hardware, delivering up to 20x faster inference than traditional GPU setups. Combined with MCP, you get near-instant code generation directly inside Claude.


Author

Bassem EL KHANSAA (@ask.bassem)


License

MIT


Contributing

Issues and PRs welcome. If you add a new model, just extend the MODELS object and add a tool entry in the ListToolsRequestSchema handler.
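
For orientation, here is a rough sketch of what extending the model list might involve. The shape of `MODELS` shown here is an assumption (the real object in src/index.js may differ), `listTools` is a hypothetical helper, and `cerebras_example` / `example-model` are placeholder names, not a real model. Unlike the two manual steps described above, this variant derives the tool entries from `MODELS`, which is one possible way to keep the two in sync.

```javascript
// Assumed shape: tool name -> model metadata. Values come from the Models table.
const MODELS = {
  cerebras_quick: { id: "llama3.1-8b", params: "8B" },
  cerebras_complex: { id: "gpt-oss-120b", params: "120B" },
  cerebras_reasoning: { id: "zai-glm-4.7", params: "357B" },
  cerebras_instruct: { id: "qwen-3-235b", params: "235B" },
};

// Input schema shared by all tools, based on the Usage parameter table.
// The JSON Schema types are assumptions.
function inputSchema() {
  return {
    type: "object",
    properties: {
      file_path: { type: "string" },
      prompt: { type: "string" },
      context_files: { type: "array", items: { type: "string" } },
      temperature: { type: "number" },
      max_tokens: { type: "integer" },
    },
    required: ["file_path", "prompt"],
  };
}

// Hypothetical helper: builds MCP tool entries (name, description, inputSchema).
function listTools(models) {
  return Object.entries(models).map(([name, model]) => ({
    name,
    description: `Generate code with ${model.id} (${model.params})`,
    inputSchema: inputSchema(),
  }));
}

// Adding a model then becomes a single new entry (placeholder values).
const extended = { ...MODELS, cerebras_example: { id: "example-model", params: "?" } };
console.log(listTools(extended).map((tool) => tool.name));
```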