tokenomics
Provides live LLM pricing data from OpenRouter, enabling agents to search models, get pricing, estimate costs, and compare models.
README
<div align="center">
💸 tokenomics
Fresh, live LLM pricing for AI agents — as a CLI and an MCP server.
No scraping. No database. No API keys. Every call returns current pricing for 400+ models, straight from the OpenRouter Models API and normalized to USD per 1M tokens.
</div>
Why
LLMs in agents need to reason about cost — which model is cheapest for this job, what will this prompt cost, is the cheaper model worth it? — but pricing data goes stale the moment you hardcode it. tokenomics gives an agent a single, always-fresh source of truth and does the per-million math so the agent never fumbles a 1K vs 1M conversion.
It's built the way Anthropic recommends agent tools should be built: a small set of deep, workflow-level tools (not a CRUD wrapper), JSON-by-default output, runtime schema introspection, input hardening, and field masks to keep responses small.
Features
| 🔴 Always live | Fetches OpenRouter on every call — data is fresh by construction. No DB to go stale. |
| 🔌 Two surfaces, one core | The same engine powers an agent-friendly CLI and an MCP server. Identical behavior. |
| 🧮 Does the math | estimate and compare compute real USD cost from token counts and request volume. |
| 🤖 Agent-first | JSON when piped, field masks, NDJSON, machine-readable errors, schema introspection. |
| 🧰 MCP best-practices | Rich server instructions, structured tool descriptions, graceful timeouts, soft "not found". |
| 🛡️ Hardened input | Rejects control chars, path traversal, and embedded query params at the boundary. |
| ⚡ Typed & tested | Built on Effect with tagged errors; Prettier + oxlint + unit tests. |
Quickstart
One-command install (clone, build, and put tokenomics + tokenomics-mcp on your PATH):
git clone https://github.com/tylergibbs1/tokenomics.git && cd tokenomics && npm install && npm run build && npm link
<details> <summary>Prefer it step by step?</summary>
git clone https://github.com/tylergibbs1/tokenomics.git
cd tokenomics
npm install
npm run build
npm link # optional: puts the binaries on your PATH
</details>
Requires Node 20+. No API key needed — the OpenRouter /models endpoint is public.
CLI
Output is JSON when piped (agent-friendly) and a table at a terminal. Override with -o json|ndjson|table.
# 🔎 Search + filter, and trim the response with a field mask
tokenomics search "claude" --max-input 5 --min-context 200000 \
--fields model_id,pricing,context_window
# 📄 One model's full pricing + provenance
tokenomics get openai/gpt-4o
# 🧮 Cost of a workload (raw token counts) × N requests
tokenomics estimate openai/gpt-4o --input-tokens 1000000 --output-tokens 200000 --requests 10
# ⚖️ Rank candidates by total cost for the same workload (cheapest first)
tokenomics compare --models openai/gpt-4o,google/gemini-2.5-flash \
--input-tokens 1000000 --output-tokens 500000
# 🏷️ Providers present in the live data, with counts
tokenomics providers
# 📐 Runtime schema introspection — the CLI documents itself
tokenomics schema estimate
<details> <summary><b>Example output</b> — <code>tokenomics get openai/gpt-4o</code></summary>
{
"provider": "openai",
"model_id": "openai/gpt-4o",
"display_name": "OpenAI: GPT-4o",
"modality": "multimodal",
"pricing": {
"input_per_mtok": 2.5,
"output_per_mtok": 10,
"cached_input_per_mtok": null,
"cache_write_per_mtok": null
},
"context_window": 128000,
"max_output_tokens": 16384,
"unit": "USD per 1M tokens",
"currency": "USD",
"source_url": "https://openrouter.ai/models/openai/gpt-4o",
"fetched_at": "2026-06-25T16:00:21.174Z",
"source": "openrouter"
}
</details>
Errors are machine-readable on stderr with a stable code, an actionable suggestion, and a non-zero exit:
{
"error": true,
"code": "MODEL_NOT_FOUND",
"message": "No model matching 'gtp-4o'.",
"suggestion": "Use 'tokenomics search' to list available models.",
"details": { "suggestions": [] }
}
MCP server
Four read-only, workflow-level tools — each bundles the live fetch, model matching, and cost math so an agent needs one call, not three.
| Tool | Use it for |
|---|---|
search_models |
Discover / shortlist models by price, modality, or context window |
get_model_pricing |
One known model's full pricing (a miss returns candidates, not an error) |
estimate_cost |
The USD cost of a workload on a single model |
compare_models |
Rank candidate models by total cost for the same workload |
Add to Claude Code
claude mcp add tokenomics -- node /absolute/path/to/tokenomics/dist/bin/tokenomics-mcp.js
Add to Claude Desktop
// claude_desktop_config.json
{
"mcpServers": {
"tokenomics": {
"command": "node",
"args": ["/absolute/path/to/tokenomics/dist/bin/tokenomics-mcp.js"],
},
},
}
The server ships rich instructions (purpose, the units convention, when to use which tool) that clients surface to the model automatically.
Units (the one thing to remember)
- Every price is USD per 1,000,000 tokens.
2.5means $2.50 per 1M tokens. estimate/comparetake raw token counts (e.g.1000000), not millions.output_per_mtok: null⇒ non-generative model (embeddings/rerankers); output tokens cost $0.- Model ids are
provider/model, e.g.openai/gpt-4o,anthropic/claude-3.5-sonnet.
How it works
flowchart LR
OR[OpenRouter<br/>/api/v1/models] -->|live fetch + timeout| MAP[map → ModelPricing<br/>USD per 1M tokens]
MAP --> Q{query}
Q --> S[search]
Q --> G[get]
Q --> E[estimate]
Q --> C[compare]
S & G & E & C --> CLI[CLI]
S & G & E & C --> MCP[MCP server]
A single OpenRouter Effect service fetches and normalizes the catalog (cached in-process for TOKENOMICS_CACHE_TTL_SECONDS, default 60s). Both the CLI and the MCP server call one shared operations layer, so they behave identically and share typed, tagged errors.
Configuration
All optional — sensible defaults work out of the box.
| Variable | Default | Description |
|---|---|---|
OPENROUTER_API_URL |
https://openrouter.ai/api/v1/models |
Source endpoint (override to proxy) |
TOKENOMICS_CACHE_TTL_SECONDS |
60 |
In-process reuse window; 0 = fetch fresh every call |
TOKENOMICS_FETCH_TIMEOUT_MS |
15000 |
Hard timeout so a hung network fails fast |
TOKENOMICS_STRICT_LOOKUP |
false |
1 makes a get_model_pricing miss a hard error instead of returning candidates |
Development
npm run dev:cli -- search "gpt" # run the CLI from source (tsx)
npm run dev:mcp # run the MCP server from source
npm run check # prettier --check + oxlint + tsc --noEmit
npm test # unit tests for the pricing math
Roadmap
- [ ] Additional pricing sources (direct provider pages) as sibling Effect services, merged transparently
- [ ] Token counting from raw text/files so
estimatecan price an actual prompt - [ ] Server-side OpenRouter sorts/filters (throughput, tool-calling support)
- [ ] Publish to npm +
.mcpbbundle for one-click Desktop install
License
MIT © Tyler Gibbs
<div align="center"> <sub>Built for agents, with <a href="https://effect.website">Effect</a> + the <a href="https://modelcontextprotocol.io">Model Context Protocol</a>.</sub> </div>
Recommended Servers
playwright-mcp
A Model Context Protocol server that enables LLMs to interact with web pages through structured accessibility snapshots without requiring vision models or screenshots.
Magic Component Platform (MCP)
An AI-powered tool that generates modern UI components from natural language descriptions, integrating with popular IDEs to streamline UI development workflow.
Audiense Insights MCP Server
Enables interaction with Audiense Insights accounts via the Model Context Protocol, facilitating the extraction and analysis of marketing insights and audience data including demographics, behavior, and influencer engagement.
VeyraX MCP
Single MCP tool to connect all your favorite tools: Gmail, Calendar and 40 more.
graphlit-mcp-server
The Model Context Protocol (MCP) Server enables integration between MCP clients and the Graphlit service. Ingest anything from Slack to Gmail to podcast feeds, in addition to web crawling, into a Graphlit project - and then retrieve relevant contents from the MCP client.
Kagi MCP Server
An MCP server that integrates Kagi search capabilities with Claude AI, enabling Claude to perform real-time web searches when answering questions that require up-to-date information.
E2B
Using MCP to run code via e2b.
Neon Database
MCP server for interacting with Neon Management API and databases
Exa Search
A Model Context Protocol (MCP) server lets AI assistants like Claude use the Exa AI Search API for web searches. This setup allows AI models to get real-time web information in a safe and controlled way.
Qdrant Server
This repository is an example of how to create a MCP server for Qdrant, a vector search engine.