# @computegauge/mcp
The cost intelligence + agent credibility layer for AI agents. Install once, and every session becomes cost-aware and credibility-building.
```bash
npm install -g @computegauge/mcp
```
## Why This Exists
AI agents are powerful but expensive. A single Claude Code session can cost $5-50+. A multi-agent workflow can burn through $100 in minutes. And nobody tells the agent — or the human — until the bill arrives.
ComputeGauge MCP makes agents cost-aware AND gives them a credibility score. Any MCP-compatible agent (Claude, Cursor, Windsurf, custom agents) gets:
- Cost Intelligence — Know which model is optimal for every task, track spend in real-time
- Agent Credibility — Build a reputation score (0-1000) by making smart decisions. Compete on a leaderboard.
- Local→Cloud Routing — Detect when local inference isn't good enough, route to cloud, earn credibility for smart routing
The result: agents that spend 40-70% less, build visible credibility, and know when to route to cloud.
## Setup — 30 Seconds
### Claude Desktop / Claude Code
Add to `~/.config/claude/claude_desktop_config.json`:
```json
{
  "mcpServers": {
    "computegauge": {
      "command": "npx",
      "args": ["-y", "@computegauge/mcp"]
    }
  }
}
```
That's it. Restart Claude. Every conversation now has cost intelligence + credibility tracking.
### With Provider API Keys (Enhanced)
```json
{
  "mcpServers": {
    "computegauge": {
      "command": "npx",
      "args": ["-y", "@computegauge/mcp"],
      "env": {
        "ANTHROPIC_API_KEY": "sk-ant-...",
        "OPENAI_API_KEY": "sk-...",
        "COMPUTEGAUGE_BUDGET_TOTAL": "50"
      }
    }
  }
}
```
### With Local Inference (Ollama, vLLM, etc.)
```json
{
  "mcpServers": {
    "computegauge": {
      "command": "npx",
      "args": ["-y", "@computegauge/mcp"],
      "env": {
        "OLLAMA_HOST": "http://localhost:11434",
        "OLLAMA_MODELS": "llama3.3:70b,qwen2.5:7b,deepseek-r1:14b",
        "ANTHROPIC_API_KEY": "sk-ant-...",
        "COMPUTEGAUGE_BUDGET_TOTAL": "50"
      }
    }
  }
}
```
### Cursor
Add to Cursor MCP settings:
```json
{
  "computegauge": {
    "command": "npx",
    "args": ["-y", "@computegauge/mcp"]
  }
}
```
## Tools Reference
### Agent-Native Tools (use automatically every session)
| Tool | When to Call | What It Does | Credibility |
|---|---|---|---|
| `pick_model` | Before any API request | Returns the optimal model for a task | +8 Routing Intelligence |
| `log_request` | After any API request | Logs the request cost | +3 Honest Reporting |
| `session_cost` | Every 5-10 requests | Shows cumulative cost and budget | — |
| `rate_recommendation` | After completing a task | Rate how well the model performed | +5 Quality Contribution |
| `model_ratings` | When curious about quality | View model quality leaderboard | — |
| `improvement_cycle` | At session end | Run continuous improvement engine | +15 Quality Contribution |
| `integrity_report` | For transparency | View rating acceptance/rejection stats | — |
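As a sketch of what these tools look like from the agent side, here is a minimal client using the official TypeScript SDK (`@modelcontextprotocol/sdk`). The argument names mirror the call signatures shown later in this README, but the exact tool schemas are an assumption here; discover them at runtime with `listTools`.

```typescript
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

// Spawn the server the same way the config above does.
const transport = new StdioClientTransport({
  command: "npx",
  args: ["-y", "@computegauge/mcp"],
});
const client = new Client({ name: "demo-agent", version: "1.0.0" });
await client.connect(transport);

// Ask for the optimal model before making an API request (+8 credibility).
const recommendation = await client.callTool({
  name: "pick_model",
  arguments: { task_type: "code_generation", priority: "balanced" }, // assumed schema
});
console.log(recommendation.content);

// Report what the request actually cost afterwards (+3 credibility).
await client.callTool({
  name: "log_request",
  arguments: { provider: "anthropic", model: "claude-sonnet-4", tokens: 1500 }, // assumed schema
});
```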
### Credibility Tools (the reputation protocol)
| Tool | When to Call | What It Does | Credibility |
|---|---|---|---|
| `credibility_profile` | Anytime | View your 0-1000 credibility score, tier, badges | — |
| `credibility_leaderboard` | To compete | See how you rank vs other agents | — |
| `route_to_cloud` | After local→cloud routing | Report smart routing decision | +70 Cloud Routing |
| `assess_routing` | Before choosing local vs cloud | Should this task stay local? | — |
| `cluster_status` | To check local capabilities | View local endpoints, models, hardware | — |
### Intelligence Tools (for user questions)
| Tool | Description |
|---|---|
| `get_spend_summary` | User's total AI spend across all providers |
| `get_budget_status` | Budget utilization and alerts |
| `get_model_pricing` | Current pricing for any model |
| `get_cost_comparison` | Compare costs for specific workloads |
| `suggest_savings` | Actionable cost optimization recommendations |
| `get_usage_trend` | Spend trends and anomaly detection |
## Resources
| Resource | URI | Description |
|---|---|---|
| Config | `computegauge://config` | Current server configuration |
| Session | `computegauge://session` | Real-time session cost data |
| Ratings | `computegauge://ratings` | Model quality leaderboard |
| Credibility | `computegauge://credibility` | Agent credibility profile + leaderboard |
| Cluster | `computegauge://cluster` | Local inference cluster status |
| Quickstart | `computegauge://quickstart` | Agent onboarding guide |
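Resources are read over the same connection. A minimal sketch, reusing the `client` from the tools example above and assuming the resource body is returned as text:

```typescript
// Read the live session cost data exposed at computegauge://session.
const session = await client.readResource({ uri: "computegauge://session" });
console.log(session.contents[0]?.text); // assumed to be a text payload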
## Prompts
| Prompt | Description |
|---|---|
| `cost_aware_system` | System prompt that makes any agent cost-aware + credibility-building |
| `daily_cost_report` | Generate a quick daily cost report |
| `optimize_workflow` | Analyze and optimize a described AI workflow |
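Prompts are fetched the same way. For example, again reusing `client`, with the returned shape following the standard MCP prompt response:

```typescript
// Fetch the daily cost report prompt and hand its messages to your LLM.
const report = await client.getPrompt({ name: "daily_cost_report" });
console.log(report.messages);
```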
## Agent Credibility System
Every smart decision earns credibility points on a 0-1000 scale:
| Category | How to Earn | Points |
|---|---|---|
| 🧠 Routing Intelligence | Using `pick_model` wisely, avoiding over-specced models | +8 to +15 per event |
| 💰 Cost Efficiency | Staying under budget, significant savings | +5 to +30 per event |
| ✅ Task Success | Completing tasks successfully | +10 to +25 per event |
| 📊 Honest Reporting | Logging requests, reporting failures honestly | +3 to +10 per event |
| ☁️ Cloud Routing | Smart local→cloud routing via ComputeGauge | +25 to +70 per event |
| ⭐ Quality Contribution | Rating models, running improvement cycles | +5 to +15 per event |
### Credibility Tiers
| Tier | Score | What It Means |
|---|---|---|
| ⚪ Unrated | 0-99 | Just getting started |
| 🥉 Bronze | 100-299 | Learning the ropes |
| 🥈 Silver | 300-499 | Competent and cost-aware |
| 🥇 Gold | 500-699 | Skilled optimizer |
| 💎 Platinum | 700-849 | Elite decision-maker |
| 👑 Diamond | 850-1000 | Best in class |
### Earnable Badges
| Badge | How to Earn |
|---|---|
| 🌱 First Steps | Complete first session |
| 💰 Cost Optimizer | Save >$10 through smart model selection |
| 📊 Transparency Champion | Log 50+ requests accurately |
| ☁️ Smart Router | Successfully route 10+ tasks to cloud |
| ⭐ Quality Pioneer | Submit 25+ model ratings |
| 🔥 Streak Master | 20+ consecutive successful tasks |
| 🥇 Gold Agent | Reach Gold tier (500+ score) |
| 💎 Platinum Agent | Reach Platinum tier (700+ score) |
| 👑 Diamond Agent | Reach Diamond tier (850+ score) |
| 🌐 Hybrid Intelligence | Use both local and cloud models in one session |
## Local Cluster Integration
ComputeGauge auto-detects local inference endpoints:
| Platform | Environment Variable | Default |
|---|---|---|
| Ollama | `OLLAMA_HOST` | `http://localhost:11434` |
| vLLM | `VLLM_HOST` | — |
| llama.cpp | `LLAMACPP_HOST` | — |
| TGI | `TGI_HOST` | — |
| LocalAI | `LOCALAI_HOST` | — |
| Custom | `LOCAL_LLM_ENDPOINT` | — |
Set `OLLAMA_MODELS="llama3.3:70b,qwen2.5:7b"` (comma-separated) to declare available models.
### The Local→Cloud Routing Flow
1. Agent calls `assess_routing("code_generation", quality="good")`
2. ComputeGauge checks: local `llama3.3:70b` quality for `code_generation` = 80/100
3. "Good" quality threshold = 78 → local model is sufficient!
4. Agent uses the local model → saves money → earns credibility for honest assessment

OR:

1. Agent calls `assess_routing("complex_reasoning", quality="excellent")`
2. ComputeGauge checks: local `llama3.3:70b` quality for `complex_reasoning` = 78/100
3. "Excellent" quality threshold = 88 → quality gap of 10 points → route to cloud!
4. Agent calls `pick_model` → gets Claude Sonnet 4 → executes → calls `route_to_cloud`
5. Agent earns +70 credibility points for the smart routing decision
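A minimal sketch of this decision rule, using the example scores and thresholds from the walkthrough above. The function and table names are hypothetical, not the server's internals:

```typescript
type QualityTarget = "good" | "excellent";

// Thresholds taken from the walkthrough above (assumed to be fixed values).
const QUALITY_THRESHOLDS: Record<QualityTarget, number> = {
  good: 78,
  excellent: 88,
};

// Per-task quality scores for the local model, e.g. llama3.3:70b.
const LOCAL_QUALITY: Record<string, number> = {
  code_generation: 80,
  complex_reasoning: 78,
};

// Stay local when the local score meets the quality bar; otherwise route
// to cloud (and report it via route_to_cloud to earn the +70 credibility).
function assessRouting(taskType: string, target: QualityTarget) {
  const localScore = LOCAL_QUALITY[taskType] ?? 0;
  const gap = QUALITY_THRESHOLDS[target] - localScore;
  return gap <= 0 ? { route: "local", gap: 0 } : { route: "cloud", gap };
}

assessRouting("code_generation", "good");        // { route: "local", gap: 0 }
assessRouting("complex_reasoning", "excellent"); // { route: "cloud", gap: 10 }
```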
## How `pick_model` Works
The decision engine scores every model across three dimensions:
- Quality — Per-task-type scores for 14 task types
- Cost — Real pricing from 8 providers, 20+ models, calculated per-call (log-scale normalization)
- Speed — Relative inference speed scores
| Priority | Quality | Cost | Speed |
|---|---|---|---|
| `cheapest` | 20% | 70% | 10% |
| `balanced` | 45% | 35% | 20% |
| `best_quality` | 70% | 10% | 20% |
| `fastest` | 25% | 15% | 60% |
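In other words, the final score is a weighted sum of the three normalized dimensions. A hedged sketch under those assumptions follows; the weights come from the table above, but the exact log-scale cost normalization is not documented, so the constants below are illustrative:

```typescript
interface ModelInfo {
  name: string;
  quality: number;        // 0-100, for the task type in question
  costPerCallUsd: number; // estimated cost of this call
  speed: number;          // 0-100 relative inference speed
}

// Weights from the priority table above.
const WEIGHTS = {
  cheapest:     { quality: 0.20, cost: 0.70, speed: 0.10 },
  balanced:     { quality: 0.45, cost: 0.35, speed: 0.20 },
  best_quality: { quality: 0.70, cost: 0.10, speed: 0.20 },
  fastest:      { quality: 0.25, cost: 0.15, speed: 0.60 },
} as const;

function scoreModel(m: ModelInfo, priority: keyof typeof WEIGHTS): number {
  const w = WEIGHTS[priority];
  // Illustrative log-scale normalization: ~$0.0001/call maps to ~100 and
  // ~$1/call to ~0, so cheap models aren't drowned out by expensive ones.
  const costScore = Math.max(0, Math.min(100, 100 - 25 * Math.log10(m.costPerCallUsd / 0.0001)));
  return w.quality * m.quality + w.cost * costScore + w.speed * m.speed;
}

// pick_model then amounts to ranking the candidates by this score:
const best = (models: ModelInfo[], p: keyof typeof WEIGHTS) =>
  [...models].sort((a, b) => scoreModel(b, p) - scoreModel(a, p))[0];
```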
### Model Coverage
| Provider | Models | Tier Range |
|---|---|---|
| Anthropic | Claude Opus 4, Sonnet 4, Sonnet 3.5, Haiku 3.5 | Frontier → Budget |
| OpenAI | o1, GPT-4o, o3-mini, GPT-4o-mini | Frontier → Budget |
| Google | Gemini 2.0 Pro, 1.5 Pro, 2.0 Flash | Premium → Budget |
| DeepSeek | Reasoner, Chat | Value → Budget |
| Groq | Llama 3.3 70B, Llama 3.1 8B | Value → Budget |
| Together | Llama 3.3 70B Turbo, Qwen 2.5 72B | Value |
| Mistral | Large, Small | Premium → Budget |
### Local Models Supported
| Model | Quality (general) | Best For |
|---|---|---|
| llama3.3:70b | 79/100 | General tasks, code |
| qwen2.5:72b | 81/100 | Code, math, translation |
| deepseek-r1:70b | 80/100 | Reasoning, math, code |
| deepseek-r1:14b | 68/100 | Budget reasoning |
| phi3:14b | 60/100 | Simple tasks |
| llama3.1:8b | 58/100 | Classification, simple QA |
| mistral:7b | 58/100 | Simple tasks |
## Environment Variables
| Variable | Required | Description |
|---|---|---|
| `COMPUTEGAUGE_DASHBOARD_URL` | No | URL of ComputeGauge dashboard |
| `COMPUTEGAUGE_API_KEY` | No | API key for dashboard access |
| `COMPUTEGAUGE_BUDGET_TOTAL` | No | Session budget limit in USD |
| `COMPUTEGAUGE_BUDGET_ANTHROPIC` | No | Per-provider monthly budget |
| `COMPUTEGAUGE_BUDGET_OPENAI` | No | Per-provider monthly budget |
| `ANTHROPIC_API_KEY` | No | Enables Anthropic provider detection |
| `OPENAI_API_KEY` | No | Enables OpenAI provider detection |
| `GOOGLE_API_KEY` | No | Enables Google provider detection |
| `OLLAMA_HOST` | No | Ollama inference endpoint |
| `OLLAMA_MODELS` | No | Comma-separated local model names |
| `VLLM_HOST` | No | vLLM inference endpoint |
| `COMPUTEGAUGE_GPU` | No | GPU name for hardware detection |
| `COMPUTEGAUGE_VRAM_GB` | No | VRAM in GB |
| `COMPUTEGAUGE_COST_PER_HOUR` | No | Amortized hardware cost per hour |
## For Agent Developers
If you're building AI agents (via Claude Agent SDK, LangChain, CrewAI, AutoGen, etc.), ComputeGauge MCP is the easiest way to add cost awareness AND agent credibility:
- Zero integration effort — Just add the MCP server to your agent's config
- No code changes — The agent discovers all 18 tools automatically via the MCP protocol
- Immediate value — `pick_model` returns recommendations on the first call, and credibility tracking starts automatically
- Session tracking built-in — Full cost visibility per agent run
- Credibility system — Your agent earns a visible reputation score that users can see
- Local cluster support — Auto-detect and leverage on-prem inference
- Budget guardrails — Warnings when approaching limits
### Pattern: Cost-Aware + Credibility-Building Agent Loop
1. Agent receives task
2. Agent calls `assess_routing(task_type)` → local or cloud?
3. Agent calls `pick_model(task_type, priority="balanced")`
4. Agent uses the recommended model for the task
5. Agent calls `log_request(provider, model, tokens)`
6. Agent calls `rate_recommendation(model, rating, success)`
7. If cloud-routed: agent calls `route_to_cloud(task_type, reason, model)`
8. Every 5 requests, agent calls `session_cost()`
9. If session cost > 80% of budget, switch to `priority="cheapest"`
10. At session end: check `credibility_profile()`
This pattern reduces costs by 40-70% while building a credibility score that makes users trust the agent more.
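A compact sketch of that loop as agent code, using the MCP TypeScript SDK. The argument shapes follow the names in the steps above but are assumptions about the actual schemas:

```typescript
import { Client } from "@modelcontextprotocol/sdk/client/index.js";

async function runCostAwareTask(client: Client, taskType: string) {
  const call = (name: string, args: Record<string, unknown> = {}) =>
    client.callTool({ name, arguments: args });

  // Steps 2-3: route the task, then pick a model.
  await call("assess_routing", { task_type: taskType });
  await call("pick_model", { task_type: taskType, priority: "balanced" });

  // Step 4: ...execute the task against the recommended model here...

  // Steps 5-6: log the real cost, then rate the recommendation.
  await call("log_request", { provider: "anthropic", model: "claude-sonnet-4", tokens: 1500 });
  await call("rate_recommendation", { model: "claude-sonnet-4", rating: 5, success: true });

  // Steps 8 and 10: periodic cost check, end-of-session credibility check.
  await call("session_cost");
  await call("credibility_profile");
}
```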
## License
Apache-2.0 — Free to use, modify, and distribute.