fittok
An MCP server that filters and compresses context by 80-90% before sending it to an LLM, using code knowledge graphs and compression.
Retrieve only the relevant source code for a question — instead of the model reading whole files — so an LLM answers codebase questions on a small, focused slice of context. Less input = fewer tokens, lower cost, faster answers.
Works three ways from one install: an MCP server, a CLI, and a Python library — plus a Claude Code plugin that injects context automatically.
How it works
codebase ──▶ graphify ──▶ slurp ──▶ readable slice ──▶ LLM answers
             (parse)      (select)  (trim to budget)
- graphify: parses the repo with tree-sitter into a knowledge graph of functions / classes / methods (Python, JS, JSX, TS, TSX, Java, Go, Rust).
- slurp: scores every node against the question with **semantic embeddings + TF-IDF + PageRank**, then selects only the genuinely relevant nodes via a relevance cliff (no budget-padding with noise).
- readable output: returns the actual source code of those nodes, top-ranked in full and the supporting tail as signatures, trimmed to a budget. The model answers directly from it.
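The "relevance cliff" in the slurp step can be pictured with a small sketch. This is not fittok's actual selection code: it assumes the cliff is simply the largest drop between consecutive scores, and the function name `select_by_cliff`, the `min_keep` parameter, and the sample scores are all hypothetical.

```python
from typing import List, Tuple


def select_by_cliff(scored: List[Tuple[str, float]], min_keep: int = 1) -> List[str]:
    """Keep the best-scoring nodes up to the largest drop between neighbours.

    Assumption for illustration only: the "cliff" is the biggest gap between
    two consecutive scores in the ranked list.
    """
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    if len(ranked) <= min_keep:
        return [name for name, _ in ranked]

    best_cut, best_gap = len(ranked), 0.0
    for i in range(min_keep, len(ranked)):
        gap = ranked[i - 1][1] - ranked[i][1]
        if gap > best_gap:
            best_cut, best_gap = i, gap
    # Everything after the cliff is treated as noise and dropped.
    return [name for name, _ in ranked[:best_cut]]


# Made-up scores for a question such as "how does auth work".
scores = [
    ("login", 0.91),
    ("verify_token", 0.87),
    ("hash_password", 0.80),
    ("render_footer", 0.22),
    ("parse_args", 0.19),
]
selected = select_by_cliff(scores)
print(selected)  # ['login', 'verify_token', 'hash_password']
```

Cutting at the cliff rather than at a fixed count is what avoids padding the budget with weakly related nodes: if only three nodes score well, only three are returned.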
Note: an earlier design compressed the slice with LLMLingua, but that produced unreadable token salad that the model ignored (it then re-read the files). fittok returns real, readable code instead. LLMLingua remains available only as the standalone compress_context tool.
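The readable output described above (top-ranked nodes in full, the tail as signatures, all trimmed to a budget) might look roughly like the following. This is a sketch under stated assumptions, not fittok's implementation: `render_slice`, `full_count`, and the four-characters-per-token estimate in `approx_tokens` are hypothetical stand-ins.

```python
from typing import List, Tuple


def approx_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: roughly 4 characters per token.
    return max(1, len(text) // 4)


def render_slice(nodes: List[Tuple[str, str]], budget: int, full_count: int = 2) -> str:
    """Render (signature, full_source) pairs, best-ranked first, within a token budget.

    The first `full_count` nodes are emitted in full when they fit; every other
    node (or a top node that is too large) is emitted as its signature only.
    """
    parts: List[str] = []
    used = 0
    for rank, (signature, source) in enumerate(nodes):
        candidates = [source, signature] if rank < full_count else [signature]
        chosen = next((c for c in candidates if used + approx_tokens(c) <= budget), None)
        if chosen is None:
            break  # budget exhausted; drop the rest of the tail
        parts.append(chosen)
        used += approx_tokens(chosen)
    return "\n\n".join(parts)


nodes = [
    ("def login(user, password):",
     "def login(user, password):\n    token = verify(user, password)\n    return token"),
    ("def verify(user, password):",
     "def verify(user, password):\n    return hash_password(password) == user.hash"),
    ("def hash_password(password):",
     "def hash_password(password):\n    return sha256(password)"),
]
roomy = render_slice(nodes, budget=1000)
tight = render_slice(nodes, budget=10)
print(roomy)
```

With a roomy budget the first two functions appear with their bodies and the third as a bare signature; with a very tight budget only the first signature survives. Either way the output is ordinary source text rather than compressed tokens.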
Graphs and embeddings are cached on disk (~/.cache/fittok), keyed by content —
so after a code change only the changed functions re-embed.
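Keying the cache by content is what lets unchanged functions skip re-embedding. A minimal sketch of the idea, assuming a SHA-256 hash of each function's source as the key and an in-memory dict in place of the on-disk cache; `content_key`, `embed_all`, and `fake_embed` are hypothetical names, and the real cache layout may differ.

```python
import hashlib
from typing import Callable, Dict, List


def content_key(source: str) -> str:
    # The key depends only on the text, so identical code always hits the cache.
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


def embed_all(
    functions: Dict[str, str],
    cache: Dict[str, List[float]],
    embed: Callable[[str], List[float]],
) -> Dict[str, List[float]]:
    """Return an embedding per function, computing only those missing from the cache."""
    result = {}
    for name, source in functions.items():
        key = content_key(source)
        if key not in cache:
            cache[key] = embed(source)
        result[name] = cache[key]
    return result


calls: List[str] = []


def fake_embed(source: str) -> List[float]:
    # Stand-in for a real embedding model; records each call it receives.
    calls.append(source)
    return [float(len(source))]


cache: Dict[str, List[float]] = {}
repo = {"login": "def login(): ...", "logout": "def logout(): ..."}
embed_all(repo, cache, fake_embed)       # first run embeds both functions
repo["login"] = "def login(user): ..."   # edit one function
embed_all(repo, cache, fake_embed)       # second run embeds only the edited one
print(len(calls))  # 3
```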
Install & use
As an MCP server (recommended — for Claude Code / Cursor)
Add one entry to your client's MCP config:
{ "mcpServers": { "fittok": { "command": "uvx", "args": ["fittok"] } } }
Then ask codebase questions normally. To make it trigger without mentioning it,
add one line to your client's CLAUDE.md:
"For any codebase question, call fittok first and answer from its output."
As a CLI (no MCP needed)
pip install fittok
fittok index <repo> # optional one-time pre-warm
fittok query <repo> "how does auth work" # prints the relevant code slice
As a library
from fittok import optimize
result = optimize("/path/to/repo", "how does authentication work")
print(result["optimized_context"])
The first query on a repo indexes it automatically (~15 s, one time, then cached); subsequent queries skip indexing and return almost immediately.
Token savings — honest numbers
fittok cuts the input/exploration cost of a codebase question. On a real
Next.js/TS repo (~5k functions) it returns a ~1.5–3.5k-token slice instead of
the model reading 15–20k+ tokens of files — an ~80–90% reduction on input,
deterministic and reported in the tool's savings footer.
How to measure it honestly:
- ✅ Use the savings footer (e.g. 84% — 2,494 vs 15,631 tokens) or your API bill (total tokens, which counts the subagent crawls fittok avoids).
- ⚠️ Do not judge by Claude Code's /context "Messages" number: it excludes subagent tokens and is dominated by the model's own reasoning, which fittok doesn't touch. On thorough models the real saving (e.g. ~84k → ~27k total tokens, by avoiding an Explore subagent) is invisible there but clear on the bill.
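The percentages quoted above follow from simple arithmetic on the token counts. A small sketch for reproducing them from your own footer or bill; the helper name `percent_saved` is hypothetical, and the 84k and 27k totals are the approximate figures from the example above.

```python
def percent_saved(slice_tokens: int, baseline_tokens: int) -> float:
    """Percentage of tokens avoided relative to the baseline."""
    if baseline_tokens <= 0:
        raise ValueError("baseline_tokens must be positive")
    return 100.0 * (1 - slice_tokens / baseline_tokens)


# Footer example: 2,494 tokens returned instead of 15,631 tokens of files.
footer = percent_saved(2_494, 15_631)
# Bill example: roughly 27k total tokens instead of roughly 84k.
bill = percent_saved(27_000, 84_000)

print(f"{footer:.0f}%")  # 84%
print(f"{bill:.0f}%")    # 68%
```

The two figures measure different things: the footer compares input context only, while the bill covers total tokens, including the model's own reasoning, so its percentage is lower.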
Where it shines: broad / multi-file questions, large files, unfamiliar repos, and thorough models that would otherwise explore heavily. On a tiny question a capable model can answer from one small file, so the win is marginal there.
Configuration (env vars)
| Variable | Default | Purpose |
|---|---|---|
| FITTOK_SHOW_SAVINGS | false | Append a 🪙 saved X% footer to answers |
| CONTEXT_OPTIMIZER_EMBED_MODEL | all-MiniLM-L6-v2 | Embedding model |
| CONTEXT_OPTIMIZER_DEVICE | auto | auto / cuda / mps / cpu |
| CONTEXT_OPTIMIZER_CACHE_DIR | ~/.cache/fittok | Cache location |
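To see how these variables and defaults fit together, here is a sketch that reads them the way a wrapper script might. It mirrors the table only: how fittok itself parses the values (for example, which strings count as true for FITTOK_SHOW_SAVINGS) is an assumption, and `Settings` and `load_settings` are hypothetical names.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping

VALID_DEVICES = {"auto", "cuda", "mps", "cpu"}


@dataclass(frozen=True)
class Settings:
    show_savings: bool
    embed_model: str
    device: str
    cache_dir: Path


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the documented variables, falling back to the defaults in the table."""
    device = env.get("CONTEXT_OPTIMIZER_DEVICE", "auto")
    if device not in VALID_DEVICES:
        raise ValueError(f"unsupported device: {device!r}")
    # Assumption: "1", "true" and "yes" (any case) enable the footer.
    show = env.get("FITTOK_SHOW_SAVINGS", "false").strip().lower() in {"1", "true", "yes"}
    cache_dir = os.path.expanduser(env.get("CONTEXT_OPTIMIZER_CACHE_DIR", "~/.cache/fittok"))
    return Settings(
        show_savings=show,
        embed_model=env.get("CONTEXT_OPTIMIZER_EMBED_MODEL", "all-MiniLM-L6-v2"),
        device=device,
        cache_dir=Path(cache_dir),
    )


defaults = load_settings({})
custom = load_settings({"FITTOK_SHOW_SAVINGS": "true", "CONTEXT_OPTIMIZER_DEVICE": "cpu"})
print(defaults.embed_model, defaults.device)
print(custom.show_savings, custom.device)
```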
Requirements
Python ≥ 3.10. First run downloads a ~90 MB embedding model. Optional extras:
pip install "fittok[ui]" (graph visualizer), "fittok[gpu]" (torch/CUDA).
License
MIT.