fittok

An MCP server that filters and compresses context by 80-90% before it is sent to an LLM, using a code knowledge graph and compression.

Retrieve only the relevant source code for a question — instead of the model reading whole files — so an LLM answers codebase questions on a small, focused slice of context. Less input = fewer tokens, lower cost, faster answers.

Works three ways from one install: an MCP server, a CLI, and a Python library — plus a Claude Code plugin that injects context automatically.


How it works

codebase ──▶ graphify ──▶ slurp ──▶ readable slice ──▶ LLM answers
             (parse)      (select)   (trim to budget)
  1. graphify: parses the repo with tree-sitter into a knowledge graph of functions / classes / methods (Python, JS, JSX, TS, TSX, Java, Go, Rust).
  2. slurp: scores every node against the question with semantic embeddings + TF-IDF + PageRank, then selects only the genuinely relevant nodes via a relevance cliff (no budget-padding with noise).
  3. readable output: returns the actual source code of those nodes, with the top-ranked nodes in full and the supporting tail as signatures, trimmed to a budget. The model answers directly from it.
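
The select and trim steps can be illustrated with a minimal sketch. It is not fittok's implementation: the README does not say how the relevance cliff is detected or how the budget is counted, so this version assumes the cliff is the largest drop between consecutive scores and counts the budget in characters rather than tokens. The `Node` type, `select_by_cliff`, and `render` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str        # hypothetical: function or class name
    signature: str   # first line of the definition
    source: str      # full source text of the definition
    score: float     # relevance to the question (higher is better)


def select_by_cliff(nodes: list[Node], min_keep: int = 1) -> list[Node]:
    """Keep the nodes ranked above the largest drop between consecutive scores."""
    ranked = sorted(nodes, key=lambda n: n.score, reverse=True)
    min_keep = max(min_keep, 1)
    if len(ranked) <= min_keep:
        return ranked
    # gaps[i] is the score drop from ranked[i] to ranked[i + 1]
    gaps = [ranked[i].score - ranked[i + 1].score for i in range(len(ranked) - 1)]
    # Only consider cut points that keep at least min_keep nodes.
    cut = max(range(min_keep - 1, len(gaps)), key=lambda i: gaps[i])
    return ranked[: cut + 1]


def render(selected: list[Node], budget_chars: int, full_top: int = 2) -> str:
    """Emit the first full_top nodes in full and the rest as signatures.

    The budget counts node text only (assumption: characters stand in for tokens).
    """
    parts: list[str] = []
    used = 0
    for rank, node in enumerate(selected):
        text = node.source if rank < full_top else node.signature
        if used + len(text) > budget_chars:
            # Fall back to the signature when the full body does not fit.
            text = node.signature
            if used + len(text) > budget_chars:
                break
        parts.append(text)
        used += len(text)
    return "\n\n".join(parts)
```

With scores 0.9, 0.85, 0.3 and 0.25, the largest drop sits between the second and third node, so only the first two are kept. Nothing is added just to fill the budget.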

Note: an earlier design compressed the slice with LLMLingua, but the result was unreadable token salad that the model ignored before re-reading the files itself. fittok now returns real, readable code instead. LLMLingua remains available only through the standalone compress_context tool.

Graphs and embeddings are cached on disk (~/.cache/fittok), keyed by content — so after a code change only the changed functions re-embed.
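
A content-keyed cache of this kind can be sketched as follows. This is an illustration under assumptions, not fittok's code: the README does not name the hash function or the on-disk format, so the sketch uses SHA-256 and a plain dict in place of the disk cache, and `content_key` and `embed_changed` are hypothetical helpers.

```python
import hashlib
from typing import Callable


def content_key(source: str) -> str:
    """Hash of a function's source text; identical text gives an identical key."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


def embed_changed(
    functions: dict[str, str],
    cache: dict[str, list[float]],
    embed: Callable[[str], list[float]],
) -> list[str]:
    """Embed only functions whose content hash is missing from the cache.

    Returns the names of the functions that were (re-)embedded.
    """
    recomputed = []
    for name, source in functions.items():
        key = content_key(source)
        if key not in cache:
            cache[key] = embed(source)
            recomputed.append(name)
    return recomputed
```

Because the key depends only on the text, editing one function changes one key and leaves every other entry valid. Entries for old versions stay in the cache until something evicts them.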


Install & use

As an MCP server (recommended — for Claude Code / Cursor)

Add one entry to your client's MCP config:

{ "mcpServers": { "fittok": { "command": "uvx", "args": ["fittok"] } } }

Then ask codebase questions as usual. To have the client call fittok without you naming it in each prompt, add one line to your client's CLAUDE.md:

"For any codebase question, call fittok first and answer from its output."

As a CLI (no MCP needed)

pip install fittok
fittok index <repo>                       # optional one-time pre-warm
fittok query <repo> "how does auth work"  # prints the relevant code slice

As a library

from fittok import optimize
result = optimize("/path/to/repo", "how does authentication work")
print(result["optimized_context"])

The first query on a repo indexes it automatically (about 15 s, once, then cached); later queries skip indexing and return almost immediately.


Token savings — honest numbers

fittok cuts the input/exploration cost of a codebase question. On a real Next.js/TS repo (~5k functions) it returns a ~1.5–3.5k-token slice instead of the model reading 15–20k+ tokens of files — an ~80–90% reduction on input, deterministic and reported in the tool's savings footer.

How to measure it honestly:

  • ✅ Use the savings footer (e.g. 84% — 2,494 vs 15,631 tokens) or your API bill (total tokens — which counts the subagent crawls fittok avoids).
  • ⚠️ Do not judge by Claude Code's /context "Messages" number — it excludes subagent tokens and is dominated by the model's own reasoning, which fittok doesn't touch. On thorough models the real saving (e.g. ~84k → ~27k total tokens, by avoiding an Explore subagent) is invisible there but clear on the bill.
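
The percentages quoted above follow from simple arithmetic, which you can reproduce for your own runs. The helper name `savings_percent` is hypothetical; the token counts are the ones given in this section, and the 84k and 27k figures are approximate in the original.

```python
def savings_percent(slice_tokens: int, baseline_tokens: int) -> float:
    """Percentage of baseline tokens avoided by sending the slice instead."""
    if baseline_tokens <= 0:
        raise ValueError("baseline_tokens must be positive")
    return 100.0 * (1 - slice_tokens / baseline_tokens)


# Savings-footer example: 2,494 tokens sent instead of 15,631.
print(round(savings_percent(2_494, 15_631)))   # prints 84

# Whole-session example: roughly 27k total tokens instead of roughly 84k.
print(round(savings_percent(27_000, 84_000)))  # prints 68
```

The second figure is lower because total tokens include the model's own reasoning and output, which fittok does not reduce.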

Where it shines: broad / multi-file questions, large files, unfamiliar repos, and thorough models that would otherwise explore heavily. On a tiny question a capable model can answer from one small file, so the win is marginal there.


Configuration (env vars)

| Variable | Default | Purpose |
| --- | --- | --- |
| FITTOK_SHOW_SAVINGS | false | Append a 🪙 saved X% footer to answers |
| CONTEXT_OPTIMIZER_EMBED_MODEL | all-MiniLM-L6-v2 | Embedding model |
| CONTEXT_OPTIMIZER_DEVICE | auto | auto / cuda / mps / cpu |
| CONTEXT_OPTIMIZER_CACHE_DIR | ~/.cache/fittok | Cache location |
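
As a rough sketch of how these variables and their defaults fit together, the snippet below resolves them from the environment. It only illustrates the table: `load_settings` is a hypothetical helper, and the set of strings treated as true for FITTOK_SHOW_SAVINGS is an assumption, since the README does not say how fittok parses that value.

```python
import os
from pathlib import Path

# Defaults copied from the configuration table.
DEFAULTS = {
    "FITTOK_SHOW_SAVINGS": "false",
    "CONTEXT_OPTIMIZER_EMBED_MODEL": "all-MiniLM-L6-v2",
    "CONTEXT_OPTIMIZER_DEVICE": "auto",
    "CONTEXT_OPTIMIZER_CACHE_DIR": "~/.cache/fittok",
}


def load_settings(env=None) -> dict:
    """Resolve each variable from env, falling back to the documented default."""
    env = os.environ if env is None else env
    raw = {name: env.get(name, default) for name, default in DEFAULTS.items()}
    return {
        # Assumption: "1", "true" and "yes" (any case) enable the footer.
        "show_savings": raw["FITTOK_SHOW_SAVINGS"].strip().lower() in {"1", "true", "yes"},
        "embed_model": raw["CONTEXT_OPTIMIZER_EMBED_MODEL"],
        "device": raw["CONTEXT_OPTIMIZER_DEVICE"],
        "cache_dir": Path(raw["CONTEXT_OPTIMIZER_CACHE_DIR"]).expanduser(),
    }
```

Calling `load_settings({})` returns the defaults, and passing a dict such as `{"CONTEXT_OPTIMIZER_DEVICE": "cpu"}` overrides a single value without touching the real environment.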

Requirements

Python ≥ 3.10. First run downloads a ~90 MB embedding model. Optional extras: pip install "fittok[ui]" (graph visualizer), "fittok[gpu]" (torch/CUDA).

License

MIT.
