mcp-brain
A Model Context Protocol server that gives Claude Code persistent, structured awareness of your project, without burning tokens on context rebuilding.
<p align="center"> <img src="assets/banner.svg" width="900" alt="mcp-brain banner"/> </p>
<p align="center"> <a href="#-benchmark-results"><img src="https://img.shields.io/badge/SWE--bench-Hit%4010%3A%2063.4%25-success?style=for-the-badge" alt="SWE-bench Hit@10"/></a> <a href="#-token-efficiency"><img src="https://img.shields.io/badge/token%20saving-65%25%20avg-blue?style=for-the-badge" alt="Token saving"/></a> <a href="#-quick-start"><img src="https://img.shields.io/badge/zero%20LLM%20cost-%E2%9C%93-purple?style=for-the-badge" alt="Zero LLM cost"/></a> <a href="#-quick-start"><img src="https://img.shields.io/badge/local--first-%E2%9C%93-22c55e?style=for-the-badge" alt="Local-first"/></a> <a href="LICENSE"><img src="https://img.shields.io/badge/license-MIT-green?style=for-the-badge" alt="MIT License"/></a> <img src="https://img.shields.io/badge/python-3.10+-yellow?style=for-the-badge" alt="Python 3.10+"/> </p>
<p align="center"> <b>The repo-aware, team-aware, token-efficient memory layer for Claude Code.</b> </p>
<p align="center"> <i>Claude Code doesn't fail because it lacks intelligence.<br/> It fails because it has zero awareness of your repo and your team.</i> </p>
TL;DR
mcp-brain is a Model Context Protocol (MCP) server that gives Claude Code persistent, structured awareness of your project, without burning tokens on context rebuilding.
| 🧠 | Compressed awareness in ~100 tokens instead of ~2000 |
|---|---|
| 🎯 | 63.4% Hit@10 on SWE-bench Full (2294 real GitHub issues), zero LLM cost |
| ⚡ | Sub-100ms file prediction (BM25 + code graph + optional semantic reranker) |
| 👥 | Team-aware: soft claims, conflict detection, ownership tracking |
| 🔄 | Self-healing: decision lifecycle, automatic staleness, feedback loop |
| 🛡️ | Local-first: SQLite, no cloud, no embeddings required, GDPR-friendly |
Table of Contents
- The Problem
- What mcp-brain Changes
- In 60 seconds
- How It Works
- Memory Hierarchy
- Prediction Pipeline
- Decision Lifecycle
- Architecture
- Benchmark Results
- Token Efficiency
- Quick Start
- MCP Tools
- Use Cases
- FAQ
- Trade-offs
- Roadmap
- License
🚨 The Problem
<p align="center"> <img src="assets/workflow.svg" width="850" alt="Workflow comparison: without mcp-brain Claude explores blindly; with mcp-brain Claude starts from structured repo and team awareness"/> </p>
Without persistent awareness, Claude Code operates blindly at the start of every session:
| Without mcp-brain | With mcp-brain |
|---|---|
| ❌ No idea which files matter | ✅ Predicted files in top-K |
| ❌ Re-explores the repo every session | ✅ Compressed context in ~100 tokens |
| ❌ No visibility into teammates' WIP | ✅ Soft claims + conflict detection |
| ❌ Acts on outdated decisions | ✅ Decision lifecycle (active → stale) |
| ❌ Burns 2000–5000 tokens just to "orient" | ✅ One YAML block, ready to act |
Result without mcp-brain: wrong file exploration → outdated suggestions → merge conflicts → massive token waste.
⚡ What mcp-brain Changes
┌──────────────────────────────────────────────────────┐
│                                                      │
│   Without:  Claude → explores → guesses → retries    │
│             → conflicts → high token usage           │
│                                                      │
│   With:     Claude → predicts → verifies → acts      │
│             → aligned → low token usage              │
│                                                      │
└──────────────────────────────────────────────────────┘
🧬 Core idea
Instead of giving Claude more context, we give it structured awareness of reality.
We track:
- what changed (signal extraction from git)
- what matters (scoring + lifecycle)
- who's working on what (team claims)
- where to act (issue → file prediction)
...and we deliver it in ~100 tokens.
⏱️ In 60 seconds
You drop a one-line ticket into Claude Code:
> work on ticket #42: JWT login broken
Without mcp-brain, Claude starts grep-walking the repo (reading directory listings, opening the README, sampling files) and burns 2000+ tokens before producing the first useful sentence.
With mcp-brain, in <100ms Claude receives:
predictions:
  - file: src/auth.py
    confidence: high
    why: "path + symbol match: login, jwt"
  - file: src/middleware.py
    confidence: medium
    why: "imports auth (hop 1)"
  - file: src/jwt_utils.py
    confidence: medium
    why: "called_by auth.login"
team_claims:
  - { ticket: 39, author: dev-B, files: [middleware.py] }   # ⚠️ overlap
avoid:
  - "HS256: vulnerable to key confusion. Migrated to RS256 in commit a1b2c3."
decisions:
  - "tokens stored in httpOnly cookie, never localStorage"
It's structured reality, not regenerated context. Claude can act on the first turn.
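The overlap flagged in the `team_claims` entry above can be pictured as a simple comparison between predicted files and claimed files. The following is a minimal sketch, not mcp-brain's actual implementation: the function name `find_overlaps` and the fallback to file-name matching (the claim lists `middleware.py` while the prediction lists `src/middleware.py`) are assumptions made for illustration.

```python
from pathlib import PurePosixPath


def find_overlaps(predictions, team_claims):
    """Return (ticket, author, file) triples where a claimed file matches a predicted one."""
    predicted = [p["file"] for p in predictions]
    overlaps = []
    for claim in team_claims:
        for claimed in claim["files"]:
            for path in predicted:
                # Assumption: a claim may carry a bare file name, so compare
                # full paths first and fall back to the final path component.
                same_path = path == claimed
                same_name = PurePosixPath(path).name == PurePosixPath(claimed).name
                if same_path or same_name:
                    overlaps.append((claim["ticket"], claim["author"], path))
    return overlaps


# Data copied from the YAML example above.
predictions = [
    {"file": "src/auth.py", "confidence": "high"},
    {"file": "src/middleware.py", "confidence": "medium"},
    {"file": "src/jwt_utils.py", "confidence": "medium"},
]
team_claims = [{"ticket": 39, "author": "dev-B", "files": ["middleware.py"]}]

print(find_overlaps(predictions, team_claims))
# [(39, 'dev-B', 'src/middleware.py')]
```

Matching on file name alone can produce false positives in repos with many same-named files; a real check would likely normalise both sides to repo-relative paths first.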
How It Works
flowchart TD
subgraph Capture[Capture signals]
A[Git commit] -->|filtered signals| B[mcp-brain memory]
C[Session end] -->|structured snapshot| B
end
subgraph Predict[Predict where to act]
E[Ticket opened] --> F[File predictor]
F -->|top-K files + confidence + why| D[Claude Code]
end
subgraph Coordinate[Coordinate team work]
F -->|overlap check| G[Team claims]
G -->|conflict warnings| D
end
subgraph Learn[Learn from outcomes]
H[Outcome recorded] -->|precision / recall| I[Feedback loop]
I -->|demote noisy memories| B
I -->|supersede stale decisions| B
end
B -->|~100-token YAML context| D
- Capture: git hooks promote only high-signal events (decisions, patterns, things to avoid). Docs, chore, test, and CI commits are ignored as noise.
- Compress: three-level memory (L1/L2/L3), auto-assigned by a scoring function (recency 35% + frequency 30% + impact 20% + explicit 15%).
- Predict: issue title/body → ranked file list via BM25 + code graph expansion + optional semantic reranker.
- Coordinate: soft claims warn before two devs touch the same files.
- Self-correct: every closed ticket feeds precision/recall stats; noisy memories are auto-demoted.
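To make the capture step more concrete, here is a rough sketch of how a git hook might separate high-signal commits from noise. It assumes Conventional Commit style subjects (`type(scope): subject`); the `NOISE_TYPES` set, the regular expression, and the `extract_signal` name are hypothetical and are not taken from the mcp-brain source.

```python
import re

# Hypothetical rule set: commit types treated as noise, following the
# README's "docs, chore, tests, CI" list.
NOISE_TYPES = {"docs", "chore", "test", "tests", "ci"}

# Conventional Commit header: type, optional (scope), optional "!", then subject.
HEADER = re.compile(
    r"^(?P<type>[a-z]+)(\([^)]*\))?(?P<breaking>!)?:\s*(?P<subject>.+)$"
)


def extract_signal(commit_subject):
    """Return a small dict for high-signal commits, or None for noise."""
    match = HEADER.match(commit_subject.strip())
    if match is None:
        # Not a recognisable header: this sketch simply skips it.
        return None
    if match["type"] in NOISE_TYPES and not match["breaking"]:
        return None
    return {"type": match["type"], "subject": match["subject"]}


print(extract_signal("refactor: JWT moved to RS256"))
# {'type': 'refactor', 'subject': 'JWT moved to RS256'}
print(extract_signal("docs: fix typo in README"))
# None
```

Skipping unparseable subjects is one reason commit hygiene matters for a filter like this: free-form messages carry no type to classify on.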
🧠 Memory Hierarchy
<p align="center"> <img src="assets/memory-levels.svg" width="900" alt="Three-level memory hierarchy: L1 hot context, L2 warm context, L3 cold archive"/> </p>
Memories aren't dumped into one bag. They're scored and tiered, so the high-token slot in your prompt only carries what's signal-dense for this moment:
- L1 (hot context) loads automatically every session. Stack, conventions, current branch, recent commits, team claims, active high-confidence decisions. Capped at ~70 tokens.
- L2 (warm context) loads only on demand (`brain_get_decisions`). Historical reasoning, superseded patterns, the why behind a past trade-off.
- L3 (cold archive) is never sent to the model. Kept for audit, transparency, and the lifecycle's "undo" path.
The score is a transparent linear formula, not a black-box embedding similarity. Every memory's level is reproducible and explainable.
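As an illustration of the linear formula, the sketch below applies the stated weights (recency 35%, frequency 30%, impact 20%, explicit 15%) to four signals. Two things are assumptions here: that each signal is already normalised to the range 0..1, and the `L1_MIN` / `L2_MIN` cut-offs, which this README does not specify.

```python
from dataclasses import dataclass

# Weights as stated in "How It Works".
WEIGHTS = {"recency": 0.35, "frequency": 0.30, "impact": 0.20, "explicit": 0.15}

# Hypothetical cut-offs: the README does not state the real thresholds.
L1_MIN = 0.60
L2_MIN = 0.30


@dataclass
class Signals:
    """Per-memory signals, each assumed to be normalised to 0..1."""
    recency: float
    frequency: float
    impact: float
    explicit: float


def score(signals: Signals) -> float:
    """Weighted linear combination of the four signals."""
    return sum(weight * getattr(signals, name) for name, weight in WEIGHTS.items())


def level(signals: Signals) -> str:
    """Map a score to a memory tier using the hypothetical cut-offs."""
    value = score(signals)
    if value >= L1_MIN:
        return "L1"
    if value >= L2_MIN:
        return "L2"
    return "L3"


print(level(Signals(recency=0.9, frequency=0.8, impact=0.5, explicit=1.0)))  # L1
print(level(Signals(recency=0.1, frequency=0.1, impact=0.1, explicit=0.0)))  # L3
```

Because the score is a plain weighted sum, the contribution of each signal can be read off directly, which is what makes a tier assignment explainable.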
Prediction Pipeline
<p align="center"> <img src="assets/prediction-pipeline.svg" width="950" alt="Prediction pipeline: BM25 and IDF scoring, graph expansion, optional semantic rerank"/> </p>
The predictor is three deterministic stages:
| Stage | What it does | Cost |
|---|---|---|
| 1. BM25 + IDF | Tokenize issue, match against symbols / identifiers / paths in an inverted index | ~5 ms |
| 2. Graph expansion | Walk imports / imported_by / called_by from seeds. Score decays per hop (×0.5, ×0.25) | ~10 ms |
| 3. Semantic rerank (optional) | MiniLM (80 MB, CPU/GPU) embeds query + candidates, blends 30% cosine sim with 70% BM25 | ~50 ms |
Every prediction comes back with a why field and a full breakdown, so you can audit why a file was suggested instead of trusting an opaque ranking.
💡 The semantic reranker is ON by default. To run lean (CI / containers without PyTorch), set `MCP_BRAIN_SEMANTIC=0` and the pipeline degrades gracefully to BM25 + graph.
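Stages 2 and 3 can be sketched in a few lines. This is an illustrative approximation under stated assumptions, not the project's code: `expand`, `blend`, and `semantic_enabled` are hypothetical names, the graph is a plain adjacency dict, lexical scores are normalised by dividing by the top score, and cosine similarities are passed in rather than computed by a model. The ×0.5 per-hop decay, the 30/70 blend, and the `MCP_BRAIN_SEMANTIC=0` switch come from the text above.

```python
import os
from collections import deque

HOP_DECAY = 0.5          # x0.5 at hop 1, x0.25 at hop 2
SEMANTIC_WEIGHT = 0.30   # 30% cosine similarity, 70% lexical score


def expand(seeds, graph, max_hops=2):
    """Spread seed scores to graph neighbours, halving the score at each hop."""
    scores = dict(seeds)
    queue = deque((path, value, 0) for path, value in seeds.items())
    while queue:
        path, value, hop = queue.popleft()
        if hop == max_hops:
            continue
        for neighbour in graph.get(path, ()):
            candidate = value * HOP_DECAY
            # Keep the best score seen for each file.
            if candidate > scores.get(neighbour, 0.0):
                scores[neighbour] = candidate
                queue.append((neighbour, candidate, hop + 1))
    return scores


def semantic_enabled():
    """The reranker is on unless MCP_BRAIN_SEMANTIC is set to 0."""
    return os.environ.get("MCP_BRAIN_SEMANTIC", "1") != "0"


def blend(lexical, cosine):
    """Blend normalised lexical scores with cosine similarities, per file."""
    if not lexical:
        return {}
    if not semantic_enabled() or not cosine:
        return dict(lexical)  # graceful fallback: lexical + graph only
    top = max(lexical.values()) or 1.0  # assumption: normalise by the top score
    return {
        path: (1 - SEMANTIC_WEIGHT) * (value / top)
        + SEMANTIC_WEIGHT * cosine.get(path, 0.0)
        for path, value in lexical.items()
    }


# Hypothetical import graph; src/app.py is made up for the second hop.
graph = {
    "src/auth.py": ["src/middleware.py", "src/jwt_utils.py"],
    "src/middleware.py": ["src/app.py"],
}
print(expand({"src/auth.py": 1.0}, graph))
# {'src/auth.py': 1.0, 'src/middleware.py': 0.5, 'src/jwt_utils.py': 0.5, 'src/app.py': 0.25}
```

Keeping the reranker as a late, optional blend means the candidate set is always produced by the cheap stages, so disabling it changes ordering but not which files are considered.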
Decision Lifecycle
<p align="center"> <img src="assets/decision-lifecycle.svg" width="950" alt="Decision lifecycle: active memories become suspect, stale, or superseded over time and through feedback"/> </p>
Memories aren't immortal. mcp-brain assumes you'll change your mind and bakes the lifecycle in:
- Age-based decay: after `SUSPECT_DAYS` a memory gets flagged for re-verification. After `STALE_DAYS` it's hidden from prompts.
- Semantic supersession: write a new memory similar (cosine ≥ 0.85) to an old one and the old one is auto-marked `superseded`.
- Feedback loop: when a memory is shown 3+ times before a reverted ticket, it gets demoted automatically. Noisy memories die fast.
This is what makes mcp-brain safe to leave running for months without manual cleanup. The L1 stays small and trustworthy; the L3 archives the audit trail.
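The three lifecycle rules can be expressed as small predicates. The sketch below is an assumption-laden illustration: the numeric values of `SUSPECT_DAYS` and `STALE_DAYS` are placeholders (the README names the settings but not their defaults), the function names are hypothetical, and embeddings are plain lists of floats. The 0.85 cosine threshold and the "3+ times" rule come from the list above.

```python
import math
from datetime import datetime, timedelta

# Placeholder values: the README names these settings but not their defaults.
SUSPECT_DAYS = 30
STALE_DAYS = 90

SUPERSEDE_COSINE = 0.85   # stated threshold for semantic supersession
DEMOTE_AFTER_SHOWN = 3    # "shown 3+ times before a reverted ticket"


def age_status(created_at, now):
    """Classify a memory by age alone."""
    age = now - created_at
    if age >= timedelta(days=STALE_DAYS):
        return "stale"
    if age >= timedelta(days=SUSPECT_DAYS):
        return "suspect"
    return "active"


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def supersedes(new_vec, old_vec):
    """True when a new memory is similar enough to replace an old one."""
    return cosine(new_vec, old_vec) >= SUPERSEDE_COSINE


def should_demote(times_shown_before_revert):
    """True when a memory keeps appearing ahead of reverted tickets."""
    return times_shown_before_revert >= DEMOTE_AFTER_SHOWN


print(age_status(datetime(2024, 1, 1), datetime(2024, 3, 1)))  # suspect
```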
🏗️ Architecture
<p align="center"> <img src="assets/architecture.svg" width="850" alt="mcp-brain architecture: Claude Code talks to the MCP tools layer, which uses memory retrieval, file prediction, feedback, observability, and local SQLite storage"/> </p>
flowchart TB
subgraph Client
CC[Claude Code]
end
subgraph Server[mcp-brain server]
T[MCP Tools layer<br/>brain_init, brain_get_context,<br/>brain_predict_files, ...]
R[Retriever<br/>+ Compressor]
P[File Predictor<br/>BM25 + Graph + Semantic]
F[Feedback Reconciler]
O[Observability<br/>p50/p95/p99]
end
subgraph Storage[Local storage ~/.mcp-brain/]
DB[(SQLite<br/>memories, sessions,<br/>projects, feedback)]
IDX[Inverted Index<br/>BM25]
G[Code Graph<br/>imports/calls]
Y[YAML claims]
end
CC <-->|MCP/stdio| T
T --> R
T --> P
T --> F
T --> O
R --> DB
P --> IDX
P --> G
F --> DB
O --> DB
Repo layout
mcp-brain/
├── src/
│   ├── brain/       # core logic: retriever, compressor, scorer, predictor
│   │                # code_graph, file_indexer, semantic_reranker,
│   │                # staleness, similarity, feedback loop, observability
│   ├── capture/     # git hook signal extraction
│   ├── storage/     # SQLite layer
│   └── tools/       # MCP tool definitions (FastMCP)
├── benchmark/       # SWE-bench Lite/Full, Bench4BL, BugLocator harness
├── tests/           # pytest suite (predictor, feedback, observability, ...)
└── assets/          # SVG diagrams used in this README
Benchmark Results
<p align="center"> <img src="assets/benchmark-results.svg" width="950" alt="SWE-bench Full benchmark results: Hit@K, Recall@K, MAP@K, and comparison vs literature"/> </p>
We benchmark file localization: given a real GitHub issue, can mcp-brain rank the production files the accepted patch actually modified?
Dataset: SWE-bench Full
- 2294 real Python bug-fix tasks from major OSS projects (astropy, django, flask, matplotlib, pylint, pytest, requests, scikit-learn, seaborn, sphinx, sympy, xarray)
- Ground truth = files modified in the accepted reference patch (test files excluded by default, for a strict production-file evaluation)
Results: mcp-brain v1.4.0 (BM25 + graph + semantic)
| Metric | @1 | @3 | @5 | @10 |
|---|---|---|---|---|
| Hit | 24.5% | 43.4% | 53.7% | 63.4% |
| Recall | 20.1% | 36.6% | 46.1% | 55.8% |
| MAP | 24.5% | 28.4% | 30.4% | 31.8% |
- Instances evaluated: 2294
- Errors: 5 (0.2% failure rate)
- Avg gold files per issue: 1.66
- Avg predicted files: 9.98 (top-10)
Honest comparison vs. literature
| System | Hit@10 (file loc.) | Cost per query | Notes |
|---|---|---|---|
| BM25 baseline (vanilla) | ~45–55% | free | symbol search only |
| mcp-brain v1.4.0 | 63.4% | free | BM25 + graph + semantic, zero LLM |
| Agentless / SWE-agent | ~70–85% | $0.10–$2 | LLM-based, multi-step |
Reading the numbers:
- `Hit@5 = 53.7%`: in more than half of real issues, the right production file is in the top 5 before Claude reads a single byte.
- `Hit@10 = 63.4%`: expanded to the top 10, almost 2 issues out of 3 have the right file ranked.
- `MAP@1 = 24.5%`: the very first prediction is dead-on for 1 issue out of 4.
- `0.2% error rate` over 2294 runs: robust pipeline.
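For readers who want to check how such numbers are computed, here is a minimal sketch of the three per-issue metrics. The function names are illustrative, and the average-precision normalisation by `min(len(gold), k)` is an assumption (the harness may define it differently), although it is consistent with MAP@1 equalling Hit@1 in the table. The reported figures would be these values averaged over all evaluated issues.

```python
def hit_at_k(ranked, gold, k):
    """1.0 if any gold file appears in the top k predictions, else 0.0."""
    return 1.0 if any(path in gold for path in ranked[:k]) else 0.0


def recall_at_k(ranked, gold, k):
    """Fraction of gold files found in the top k predictions."""
    return len(set(ranked[:k]) & gold) / len(gold)


def ap_at_k(ranked, gold, k):
    """Average precision over the top k (assumes no duplicate predictions)."""
    hits = 0
    total = 0.0
    for rank, path in enumerate(ranked[:k], start=1):
        if path in gold:
            hits += 1
            total += hits / rank  # precision at this rank
    # Assumed normalisation: best achievable number of hits within k.
    return total / min(len(gold), k)


# Toy example with two gold files, found at ranks 2 and 4.
ranked = ["a.py", "b.py", "c.py", "d.py"]
gold = {"b.py", "d.py"}

print(hit_at_k(ranked, gold, 1))     # 0.0
print(hit_at_k(ranked, gold, 3))     # 1.0
print(recall_at_k(ranked, gold, 3))  # 0.5
print(ap_at_k(ranked, gold, 4))      # 0.5
```

Hit@K only asks whether at least one gold file was found, which is why it sits above Recall@K whenever an issue touches more than one file.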
Reproduce it yourself
# One-time online setup
pip install -e .
pip install -r benchmark/requirements-benchmark.txt
python -m benchmark.adapters.swebench --dataset-name princeton-nlp/SWE-bench \
--output benchmark/datasets/cache/swebench_full.jsonl
python -m benchmark.prepare_repos \
--dataset benchmark/datasets/cache/swebench_full.jsonl \
--repo-cache benchmark/repos
# Offline evaluation (full)
python -m benchmark.run_eval \
--dataset benchmark/datasets/cache/swebench_full.jsonl \
--repo-cache benchmark/repos \
--out benchmark/results/swebench_full.json \
--report-dir benchmark/reports \
--top-k 10 --max-hops 2 --use-semantic
Reports are emitted as Markdown + HTML in benchmark/reports/.
The harness also supports SWE-bench Lite (300 instances), SWE-bench Verified, Bench4BL, and BugLocator; see benchmark/README.md.
💰 Token Efficiency
<p align="center"> <img src="assets/cost-optimization.svg" width="950" alt="Cost optimization: from 2000-5500 orientation tokens per session to roughly 650 tokens with mcp-brain"/> </p>
The math
A typical Claude Code session without mcp-brain spends thousands of tokens just to orient itself:
| Phase (no mcp-brain) | Action | ~Tokens |
|---|---|---|
| Session start | List directory, read README, sample files | 800–2000 |
| Issue handling | Grep symbols, follow imports, retry wrong files | 1000–3000 |
| Context restore | Re-explain project conventions | 200–500 |
| Total per session | | 2000–5500 |
A session with mcp-brain:
| Phase (with mcp-brain) | Action | ~Tokens |
|---|---|---|
| Session start | `brain_get_context` returns compressed L1 YAML | ~100 |
| Issue handling | `brain_predict_files` returns ranked top-K + why | ~250 |
| Decision recall | `brain_get_decisions` (only when needed) | ~300 |
| Total per session | | ~650 |
Estimated saving
                  Without           With mcp-brain      Saving
Session start:    2000         ──►  100 tokens          ~95%
Per session:      2000–5500    ──►  450–950 tokens      40–80%
Per developer*:   ~1.2M/month  ──►  ~400k/month         ~65%
<sub>*assuming 100 sessions/month/dev</sub>
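The percentage column is plain relative-reduction arithmetic. A minimal sketch, using the session-start and per-developer rows as inputs (`saving_pct` is an illustrative helper name, not part of mcp-brain):

```python
def saving_pct(before, after):
    """Relative token reduction, as a percentage rounded to one decimal."""
    return round(100 * (before - after) / before, 1)


print(saving_pct(2000, 100))           # session start: 95.0
print(saving_pct(1_200_000, 400_000))  # per developer per month: 66.7
```

The second result is in line with the rounded ~65% figure. Plugging in token counts measured on your own sessions gives a more reliable estimate than the ranges above.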
Why this works
- ✅ No embeddings required for retrieval (BM25 + code graph)
- ✅ No vector DB to query (zero round-trip cost)
- ✅ No history replay: context is reconstructed, not re-scrolled
- ✅ YAML compression with `default_flow_style=True` and empty-key stripping
- ✅ L1/L2 split: heavy memory only loaded on demand
💡 The semantic reranker (`use_semantic=True`) is on by default and runs locally on CPU/GPU. It does not add LLM cost. Disable with `MCP_BRAIN_SEMANTIC=0` for lean CI.
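The YAML compression point can be illustrated with a short sketch. It assumes PyYAML is installed for the final dump; `strip_empty` and `to_compact_yaml` are hypothetical helper names, and the exact rules mcp-brain uses to decide what counts as "empty" are not documented here, so the ones below (None, empty string, empty list, empty dict) are a guess.

```python
def strip_empty(value):
    """Recursively drop dict keys whose value is None, '', [] or {}."""
    if isinstance(value, dict):
        cleaned = {key: strip_empty(item) for key, item in value.items()}
        return {
            key: item
            for key, item in cleaned.items()
            if item not in (None, "", [], {})
        }
    if isinstance(value, list):
        return [strip_empty(item) for item in value]
    return value


def to_compact_yaml(data):
    """Dump data as flow-style YAML after removing empty keys."""
    import yaml  # PyYAML (third-party); imported lazily so strip_empty works without it

    return yaml.safe_dump(strip_empty(data), default_flow_style=True, sort_keys=False)


context = {
    "p": {"name": "my-api", "stack": []},
    "s": {"wip": "", "next": None},
    "avoid": ["HS256"],
}
print(strip_empty(context))
# {'p': {'name': 'my-api'}, 'avoid': ['HS256']}
```

Flow style puts mappings and sequences on one line with braces and brackets, which avoids the indentation and newline overhead of block style; dropping empty keys removes fields that would cost tokens without carrying information.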
Quick Start
Install: one command, batteries included
git clone https://github.com/PierfrancescoLijoi/mcp-brain.git
cd mcp-brain
pip install -e ".[all]"
The [all] extra installs:
- language parsers (Python, JS, TS, Go, Rust, Java, C#) for the code graph
- semantic reranker (sentence-transformers + numpy)
- dev tooling (pytest, pytest-cov)
Lean install paths
If you want a smaller footprint, you can pick exactly what you need:
pip install -e . # core only: BM25 + graph (no semantic, no parsers)
pip install -e ".[parsers]" # + multi-language parsers
pip install -e ".[semantic]" # + semantic reranker (~700 MB w/ PyTorch)
pip install -e ".[dev]" # + dev tooling
Register with Claude Code
claude mcp add mcp-brain python /absolute/path/to/run.py
On Windows PowerShell:
claude mcp add mcp-brain python "C:\path\to\mcp-brain\run.py"
Initialize your project
mcp-brain init
That's it. Open Claude Code in your repo and the L1 context is automatically available via brain_get_context.
🔧 MCP Tools
| Tool | Purpose | When Claude calls it |
|---|---|---|
| `brain_init` | Register project, stack, conventions | Once per repo |
| `brain_get_context` | Load L1 context (~70 tokens) | Every session start |
| `brain_get_decisions` | Load L2 decisions on demand | When historical context needed |
| `brain_remember` | Store a memory; level auto-assigned | When user makes a decision |
| `brain_save_session` | Save end-of-session snapshot | At session end |
| `brain_predict_files` | Issue → ranked file list with `why` | When opening a ticket |
| `brain_start_ticket` | Start ticket workflow + conflict check | Workflow orchestration |
| `brain_record_outcome` | Log ticket outcome (completed/reverted/...) | After ticket closed |
| `brain_feedback_stats` | Precision/recall window | Health checks |
| `brain_memory_health` | Surface noisy memories | Debugging |
| `brain_observability` | Full unified dashboard (YAML) | Ops / CI |
Example L1 context output (~100 tokens)
p: {name: my-api, stack: [FastAPI, PostgreSQL]}
s: {branch: feat/auth, wip: "JWT refactor", next: "add refresh token"}
git:
  recent: ["refactor: JWT moved to RS256"]
  changed: [auth.py, middleware.py]
team_claims:
  - {ticket: 42, author: dev-B, files: [middleware.py]}
avoid:
  - "avoid: HS256 (vulnerable to key confusion)"
decisions:
  - "decision: tokens stored httpOnly cookie, never localStorage"
Claude already knows where to act before reading a single source file.
💼 Use Cases
🎯 Solo developer
- Cuts session-start exploration: about 90% fewer tokens on the first turn
- Remembers your "I always do it this way" patterns
- Auto-supersedes decisions when you change your mind
👥 Small team (3–10 devs)
- Conflict detection before two devs touch the same files
- Shared decision log with lifecycle (no more "wait, didn't we decide...?")
- File ownership inference from git history
🏢 Enterprise (with caveats)
- Local-first, no data leaves the machine: GDPR / SOC2-friendly
- Compatible with Managed Identity / on-prem deployments (no cloud calls)
- Token saving compounds: 65% × 100 devs × 100 sessions/month adds up to measurable infra savings
FAQ
<details> <summary><b>Is this a RAG system or a vector DB?</b></summary>
No, and on purpose. mcp-brain is a structured awareness layer, not a retrieval-over-embeddings layer. The core retrieval is BM25 + code graph expansion: fully deterministic, sub-100ms, no vector DB to maintain. The semantic reranker is an optional 30% blend on top, used only as a tiebreaker. This is why token cost stays predictable and infra is local-first.
</details>
<details> <summary><b>Why not just use Claude's native context window? It's huge now.</b></summary>
A long context window doesn't fix the problem; it makes it cheaper to waste. The bottleneck isn't capacity, it's signal density. Pasting your whole repo into the context still leaves Claude searching for the right file linearly. mcp-brain pre-ranks reality so the model spends its attention on the right 3 files, not the wrong 30.
</details>
<details> <summary><b>Will it leak my code or memories anywhere?</b></summary>
No. Storage is SQLite under ~/.mcp-brain/ (local) and <repo>/.brain/shared/ (versioned with git if you choose). No outbound network calls, no telemetry, no cloud component. The semantic model runs on your CPU/GPU. This makes mcp-brain compatible with GDPR-restricted and air-gapped environments.
</details>
<details> <summary><b>What if I disagree with a decision mcp-brain remembers?</b></summary>
Write a new memory that contradicts it. Semantic supersession (cosine ≥ 0.85) will auto-mark the old one as superseded. You can also manually demote via brain_memory_health or wait for age-based decay (SUSPECT_DAYS / STALE_DAYS). The lifecycle assumes you'll change your mind.
</details>
<details> <summary><b>Does it work with languages other than Python?</b></summary>
Yes for indexing/predicting (BM25 is language-agnostic). The code graph currently supports Python, JavaScript, TypeScript, Go, Rust, Java, C# via tree-sitter parsers. Adding a new language is a single registry entry; see src/brain/parsers.py.
</details>
<details> <summary><b>How does it compare to SWE-agent / Aider / Cursor?</b></summary>
Different layer of the stack. SWE-agent and similar tools are autonomous coders: they read, plan, and patch via LLM calls. mcp-brain is the awareness layer underneath them. You could pair it with Aider or any MCP-compatible client; it makes whatever LLM you use start from a smarter zero.
</details>
<details> <summary><b>What's the catch?</b></summary>
Honest answer: file prediction is heuristic. Hit@1 = 24.5% means 3 issues out of 4 still need Claude to validate the prediction before acting. mcp-brain orients, it doesn't replace exploration. That's also why it's free: it's a force multiplier, not an oracle.
</details>
⚠️ Trade-offs
I'm honest about what this is and isn't.
| Strength | Limitation |
|---|---|
| ✅ Zero LLM cost for retrieval | ⚠️ Heuristic-based: edge cases with no symbol/path overlap can miss |
| ✅ Sub-100ms predictions | ⚠️ Requires good commit hygiene (semantic commit messages help) |
| ✅ Local-first, no cloud | ⚠️ No cross-machine sync out of the box (use git for `.brain/shared/`) |
| ✅ Deterministic (replays produce same output) | ⚠️ Hit@1 = 24.5%: orients, doesn't replace exploration |
| ✅ Works on any size repo | ⚠️ Best on medium/large repos (small repos don't benefit much) |
This is NOT:
- ❌ a vector DB memory
- ❌ a RAG system
- ❌ an SWE-agent / autonomous coder
- ❌ a checkpoint / replay tool
This IS:
- ✅ a repo-aware, team-aware, token-efficient awareness layer
- ✅ a force multiplier for Claude Code, not a replacement
🛣️ Roadmap
- [x] BM25 + code graph + semantic reranker
- [x] Decision lifecycle with semantic supersession
- [x] Feedback loop with precision/recall reconciliation
- [x] Observability dashboard
- [x] SWE-bench Full benchmark (2294 instances)
- [x] Multi-language code graph (Python, JS, TS, Go, Rust, Java, C#)
- [ ] Cross-repo memory federation (opt-in)
- [ ] Real-time conflict push (currently pull-based)
- [ ] VS Code extension companion
- [ ] Hosted shared `.brain/` for distributed teams (still local-first per dev)
🧪 Run the test suite
pip install -e ".[dev]"
pytest tests/ -v
Expected: full pass on Python 3.10, 3.11, 3.12.
🤝 Contributing
PRs welcome. Before opening one:
- `pytest tests/ -v` must pass
- New behavior needs new tests
- New MCP tools must be wrapped with `@observed("brain_<name>")`
- Avoid heavy dependencies for the default install path: anything ML-flavored goes behind an optional extra
License
MIT. See LICENSE.
<p align="center"> <b>Built for Claude Code, but the architecture is MCP-standard, so any MCP-compatible client works.</b> </p>
<p align="center"> <sub>If mcp-brain saved you tokens, ⭐ the repo. That's the only payment I ask for.</sub> </p>