Knit Brain
Local-first MCP server that gives any AI coding agent per-project memory, workflow intelligence, and always-on, lossless token & context optimization — entirely on your machine, zero cloud.
Pure TypeScript. Two runtime dependencies. No Python, no native binaries, no network beyond npm install.
Why
Coding agents burn context on bulk they rarely re-read in full — large files, logs, JSON, stale tool output, old turns. Knit Brain shrinks that bulk to a navigable skeleton while keeping the exact original one lookup away:
- your context window lasts dramatically longer,
- nothing is ever lost — compression is reversible via a local content-addressed store (CCR),
- your instructions and governance text are never touched (protected verbatim).
Measured on real mixed files: 106,268 → 15,316 tokens (85.6% saved), every byte recoverable. Savings are workload-dependent; redundant JSON/code compresses hardest, declaration-only files pass through untouched (output is never larger than input — enforced).
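The headline figure can be checked by hand. The snippet below is only a sketch of that arithmetic; `percentSaved` is a hypothetical helper name, not part of the Knit Brain API.

```typescript
// Percentage of tokens saved, rounded to one decimal place.
// `percentSaved` is an illustrative helper, not a Knit Brain export.
function percentSaved(originalTokens: number, compressedTokens: number): number {
  if (originalTokens <= 0) {
    throw new RangeError("originalTokens must be positive");
  }
  const savedFraction = (originalTokens - compressedTokens) / originalTokens;
  return Math.round(savedFraction * 1000) / 10;
}

// Figures quoted above: 106,268 tokens in, 15,316 tokens out.
console.log(percentSaved(106268, 15316)); // 85.6
```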
How it works
One brain, two doors, one lossless store:
- MCP server (`knitbrain`): 21 tools covering memory (learnings, session handoff), knowledge graph (imports/exports/dependents), workflow classification, project-specific agent generation, a shared team board, a context-window meter (warns and tells the agent to save a handoff before the window blows), and explicit `optimize`/`retrieve`. Every data payload flows through one dispatch chokepoint where it's compressed structure-preservingly (JSON keeps its schema, code keeps its signatures) and tagged with a `⟨ccr:hash⟩` handle.
- Proxy (`knitbrain-proxy`): a loopback HTTP proxy in front of the LLM API (provider auto-detected per request: Anthropic `/v1/messages`, OpenAI `/v1/chat/completions`). It compresses the full request (old turns harder than recent ones, pasted bulk inside your message compressed while your directive stays verbatim) and streams the response back.
- CCR store: content-addressed (SHA-256 = handle), integrity-checked on every read, atomic writes, tiered retention (hot → cold gzip archive → budgeted purge). The pristine original is always one `retrieve` away, which is what makes aggressive compression safe.
- Self-tuning: a feedback loop watches which compressed payloads actually get retrieved and backs off per content-kind. A wrong tuning only costs efficiency, never correctness.
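To make the content-addressed idea concrete, here is a minimal sketch of a store whose handle is the SHA-256 digest of the payload and which re-hashes on every read. It is an in-memory stand-in written for illustration: the `ContentStore` class and its method names are assumptions, and the real CCR store's on-disk layout, atomic writes, and retention tiers are not shown.

```typescript
import { createHash } from "node:crypto";

// SHA-256 of the raw bytes, as a 64-character hex string.
function sha256Hex(bytes: Buffer): string {
  return createHash("sha256").update(bytes).digest("hex");
}

// Hypothetical in-memory stand-in for a content-addressed store.
class ContentStore {
  private readonly blobs = new Map<string, Buffer>();

  // The handle is derived from the content, so identical payloads share one entry.
  put(content: string): string {
    const bytes = Buffer.from(content, "utf8");
    const handle = sha256Hex(bytes);
    this.blobs.set(handle, bytes);
    return handle;
  }

  // Re-hash on every read; a mismatch means the stored bytes changed.
  get(handle: string): string {
    const bytes = this.blobs.get(handle);
    if (bytes === undefined) {
      throw new Error(`unknown handle: ${handle}`);
    }
    if (sha256Hex(bytes) !== handle) {
      throw new Error(`integrity check failed: ${handle}`);
    }
    return bytes.toString("utf8");
  }
}

const store = new ContentStore();
const original = '{"items":[1,2,3],"note":"a large payload would go here"}';
const handle = store.put(original);

console.log(handle.length);                  // 64
console.log(store.get(handle) === original); // true
```

Because the handle is a pure function of the content, a compressed payload only needs to carry the handle for the original to be recovered later.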
Quickstart
npm install -g knitbrain
# in your project:
knitbrain setup # detects your platform (Claude Code / Cursor / VS Code / Codex)
# and writes its NATIVE integration: .mcp.json, slash commands,
# rules files — non-clobbering
knitbrain dashboard # live local dashboard (127.0.0.1:8790): context meter,
# tokens saved, CCR tiers, self-tuning stats, team board
# optional — route LLM requests through the optimizer (API-key setups):
knitbrain-proxy # listens on 127.0.0.1:8788
export ANTHROPIC_BASE_URL=http://127.0.0.1:8788
# teams — shared optimized sessions (one URL + one token):
knitbrain hub # start the team hub (host runs this once)
knitbrain join <hub-url> <token> <name> # everyone else; postings mirror automatically
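The `ANTHROPIC_BASE_URL` export works because a client that honors that variable builds its request URL from it. The sketch below shows the idea under that assumption; `messagesUrl` is a hypothetical helper, and real SDKs resolve the base URL in their own way.

```typescript
// Public Anthropic API origin, used when no override is set.
const DEFAULT_BASE_URL = "https://api.anthropic.com";

// Hypothetical helper: resolve the Messages endpoint from an environment map.
// An unset or empty ANTHROPIC_BASE_URL falls back to the public origin.
function messagesUrl(env: Record<string, string | undefined>): string {
  const base = env["ANTHROPIC_BASE_URL"] || DEFAULT_BASE_URL;
  return new URL("/v1/messages", base).toString();
}

// With the export from the quickstart, requests go to the local proxy.
console.log(messagesUrl({ ANTHROPIC_BASE_URL: "http://127.0.0.1:8788" }));
// http://127.0.0.1:8788/v1/messages

// Without it, they go straight to the provider.
console.log(messagesUrl({}));
// https://api.anthropic.com/v1/messages
```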
Guarantees (enforced by gated tests, not promises)
- Lossless — every compressed payload recovers byte-for-byte from CCR; the round-trip test gates the build.
- Never-expand — output tokens ≤ input tokens, always.
- Governance verbatim — protocol/classification text is never skeletonized.
- Local-first: the proxy binds `127.0.0.1` only; nothing leaves your machine.
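One way to picture the never-expand guarantee is as a wrapper that falls back to the input whenever a compressor fails to shrink it. This is a sketch under stated assumptions, not the project's implementation: `neverExpand`, `countTokens`, and both toy compressors are hypothetical, and whitespace-separated words stand in for a real tokenizer.

```typescript
// Hypothetical token counter: whitespace-separated words stand in for real tokens.
function countTokens(text: string): number {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

type Compressor = (input: string) => string;

// Wrap a compressor so its output never has more tokens than its input.
function neverExpand(compress: Compressor): Compressor {
  return (input) => {
    const candidate = compress(input);
    return countTokens(candidate) <= countTokens(input) ? candidate : input;
  };
}

// A deliberately bad compressor that prepends a header to everything.
const addsHeader: Compressor = (input) => `[compressed payload follows] ${input}`;

// A toy compressor that keeps only the first line.
const firstLineOnly: Compressor = (input) => input.split("\n", 1)[0] ?? "";

const declaration = "export interface A {}";
console.log(neverExpand(addsHeader)(declaration) === declaration); // true: passed through untouched
console.log(neverExpand(firstLineOnly)("a b\nc d"));               // a b
```

A guard like this is only safe alongside a lossless store, since the shortened output still has to lead back to the full original.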
Development
npm install
npm run verify # typecheck → lint → test → build → bench (all must pass)
npm run e2e # built-artifact E2E: stdio session + real-file compression
npm run audit:prod # cold-start proof: clone → install → pack → installed binaries → all 21 tools
Current proof status: 106 tests passing, and the production audit (audit:prod) passes — fresh clone, clean install, packed tarball installed into a new project, all 21 tools and both binaries verified working. One opt-in test (live LLM endpoint) requires your own API key: KNITBRAIN_LIVE_TEST=1 ANTHROPIC_API_KEY=… npm test.
License
MIT © Piyush Dua