mcp-brain

A Model Context Protocol server that gives Claude Code persistent, structured awareness of your project — without burning tokens on context rebuilding.

The repo-aware, team-aware, token-efficient memory layer for Claude Code.

Claude Code doesn't fail because it lacks intelligence. It fails because it has zero awareness of your repo and your team.

🚀 TL;DR

mcp-brain is a Model Context Protocol (MCP) server that gives Claude Code persistent, structured awareness of your project — without burning tokens on context rebuilding.

🧠	Compressed awareness in ~100 tokens instead of ~2000
🎯	63.4% Hit@10 on SWE-bench Full (2294 real GitHub issues) — zero LLM cost
⚡	Sub-100ms file prediction (BM25 + code graph + optional semantic reranker)
👥	Team-aware: soft claims, conflict detection, ownership tracking
🔄	Self-healing: decision lifecycle, automatic staleness, feedback loop
🛡️	Local-first: SQLite, no cloud, no embeddings required, GDPR-friendly

📑 Table of Contents

The Problem
What mcp-brain Changes
In 60 seconds
How It Works
Memory Hierarchy
Prediction Pipeline
Decision Lifecycle
Architecture
Benchmark Results
Token Efficiency
Quick Start
MCP Tools
Use Cases
FAQ
Trade-offs
Roadmap
License

🚨 The Problem

Without persistent awareness, Claude Code operates blindly at the start of every session:

Without mcp-brain	With mcp-brain
❌ No idea which files matter	✅ Predicted files in top-K
❌ Re-explores the repo every session	✅ Compressed context in ~100 tokens
❌ No visibility into teammates' WIP	✅ Soft claims + conflict detection
❌ Acts on outdated decisions	✅ Decision lifecycle (active → stale)
❌ Burns 2000–5000 tokens just to "orient"	✅ One YAML block, ready to act

Result without mcp-brain: wrong file exploration → outdated suggestions → merge conflicts → massive token waste.

⚡ What mcp-brain Changes

┌──────────────────────────────────────────────────────┐
│                                                      │
│   Without:  Claude → explores → guesses → retries    │
│             → conflicts → high token usage           │
│                                                      │
│   With:     Claude → predicts → verifies → acts      │
│             → aligned → low token usage              │
│                                                      │
└──────────────────────────────────────────────────────┘

🧬 Core idea

Instead of giving Claude more context, we give it structured awareness of reality.

We track:

📌 what changed (signal extraction from git)
🎯 what matters (scoring + lifecycle)
👥 who's working on what (team claims)
🧭 where to act (issue → file prediction)

…and we deliver it in ~100 tokens.

⏱️ In 60 seconds

You drop a one-line ticket into Claude Code:

> work on ticket #42 — JWT login broken

Without mcp-brain, Claude starts grep-walking the repo, reading directory listings, opening the README, and sampling files, burning 2000+ tokens before producing the first useful sentence.

With mcp-brain, in <100ms Claude receives:

predictions:
  - file: src/auth.py
    confidence: high
    why: "path + symbol match: login, jwt"
  - file: src/middleware.py
    confidence: medium
    why: "imports auth (hop 1)"
  - file: src/jwt_utils.py
    confidence: medium
    why: "called_by auth.login"
team_claims:
  - { ticket: 39, author: dev-B, files: [middleware.py] }   # ⚠️ overlap
avoid:
  - "HS256 — vulnerable to key confusion. Migrated to RS256 in commit a1b2c3."
decisions:
  - "tokens stored in httpOnly cookie, never localStorage"

It's structured reality, not regenerated context. Claude can act on the first turn.
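
The `# ⚠️ overlap` marker in the payload comes down to comparing predicted files against teammates' claims. Here is a minimal sketch of that comparison. The function name `find_overlaps` and the basename matching rule are assumptions made for illustration (the payload mixes `src/middleware.py` with a bare `middleware.py`), not mcp-brain's actual implementation.

```python
from pathlib import PurePosixPath


def find_overlaps(predictions, team_claims):
    """Return (ticket, author, file) triples where a claim touches a predicted file.

    Assumption: files are matched by basename, because the example payload
    mixes "src/middleware.py" (prediction) with "middleware.py" (claim).
    """
    predicted = {PurePosixPath(p["file"]).name: p["file"] for p in predictions}
    overlaps = []
    for claim in team_claims:
        for claimed in claim["files"]:
            name = PurePosixPath(claimed).name
            if name in predicted:
                overlaps.append((claim["ticket"], claim["author"], predicted[name]))
    return overlaps


predictions = [
    {"file": "src/auth.py", "confidence": "high"},
    {"file": "src/middleware.py", "confidence": "medium"},
    {"file": "src/jwt_utils.py", "confidence": "medium"},
]
team_claims = [{"ticket": 39, "author": "dev-B", "files": ["middleware.py"]}]

overlaps = find_overlaps(predictions, team_claims)
print(overlaps)  # [(39, 'dev-B', 'src/middleware.py')]
```

Basename matching can produce false positives in repos with repeated file names, so a real check would probably compare repo-relative paths.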

🔑 How It Works

flowchart TD
    subgraph Capture[Capture signals]
        A[Git commit] -->|filtered signals| B[mcp-brain memory]
        C[Session end] -->|structured snapshot| B
    end

    subgraph Predict[Predict where to act]
        E[Ticket opened] --> F[File predictor]
        F -->|top-K files + confidence + why| D[Claude Code]
    end

    subgraph Coordinate[Coordinate team work]
        F -->|overlap check| G[Team claims]
        G -->|conflict warnings| D
    end

    subgraph Learn[Learn from outcomes]
        H[Outcome recorded] -->|precision / recall| I[Feedback loop]
        I -->|demote noisy memories| B
        I -->|supersede stale decisions| B
    end

    B -->|~100-token YAML context| D

Capture — git hooks promote only high-signal events (decisions, patterns, things to avoid). Ignored: docs, chore, tests, CI noise.
Compress — three-level memory (L1/L2/L3) auto-assigned by a scoring function (recency 35% + frequency 30% + impact 20% + explicit 15%).
Predict — issue title/body → ranked file list via BM25 + code graph expansion + optional semantic reranker.
Coordinate — soft claims warn before two devs touch the same files.
Self-correct — every closed ticket feeds precision/recall stats; noisy memories are auto-demoted.
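
To make the Capture step concrete, here is a rough sketch of a commit filter. It assumes commits follow a Conventional Commits style prefix (`type(scope): subject`). The regex, the `NOISE_TYPES` set, the `extract_signal` helper and the sample commit messages are all hypothetical, and the real hook in `src/capture/` may work differently.

```python
import re

# Assumption: commits follow the Conventional Commits "type(scope): subject" shape.
COMMIT_RE = re.compile(r"^(?P<type>[a-z]+)(\([^)]*\))?!?:\s*(?P<subject>.+)$")

# Types treated as noise, mirroring the ignored list above (docs, chore, tests, CI).
NOISE_TYPES = {"docs", "chore", "test", "tests", "ci"}


def extract_signal(message):
    """Return (type, subject) for a high-signal commit, or None for noise."""
    stripped = message.strip()
    first_line = stripped.splitlines()[0] if stripped else ""
    match = COMMIT_RE.match(first_line)
    if match is None:
        return None  # unstructured message: nothing reliable to promote
    if match.group("type") in NOISE_TYPES:
        return None
    return match.group("type"), match.group("subject")


commits = [
    "refactor: JWT moved to RS256",
    "docs: update README",
    "chore(deps): bump pytest",
    "fix(auth): reject HS256 tokens",
    "wip",
]
signals = [s for s in map(extract_signal, commits) if s is not None]
print(signals)
# [('refactor', 'JWT moved to RS256'), ('fix', 'reject HS256 tokens')]
```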

🧠 Memory Hierarchy

Memories aren't dumped into one bag. They're scored and tiered, so the limited token budget in your prompt carries only what is signal-dense right now:

L1 — hot context loads automatically every session. Stack, conventions, current branch, recent commits, team claims, active high-confidence decisions. Capped at ~70 tokens.
L2 — warm context loads only on demand (brain_get_decisions). Historical reasoning, superseded patterns, the why behind a past trade-off.
L3 — cold archive is never sent to the model. Kept for audit, transparency, and the lifecycle's "undo" path.

The score is a transparent linear formula — no black-box embedding similarity. Every memory's level is reproducible and explainable.
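
As an illustration of that linear formula, the sketch below applies the stated weights (recency 35%, frequency 30%, impact 20%, explicit 15%) to features normalised to the range 0 to 1. The tier cut-offs `L1_MIN` and `L2_MIN`, and the assumption that inputs are already normalised, are hypothetical; the README does not state the real thresholds.

```python
WEIGHTS = {"recency": 0.35, "frequency": 0.30, "impact": 0.20, "explicit": 0.15}

# Hypothetical cut-offs; the README does not state the real thresholds.
L1_MIN, L2_MIN = 0.6, 0.3


def memory_score(features):
    """Weighted sum of features, each expected in [0, 1]."""
    return sum(WEIGHTS[name] * features.get(name, 0.0) for name in WEIGHTS)


def assign_level(score):
    """Map a score to a memory tier using the hypothetical cut-offs."""
    if score >= L1_MIN:
        return "L1"
    if score >= L2_MIN:
        return "L2"
    return "L3"


fresh_decision = {"recency": 1.0, "frequency": 0.5, "impact": 0.8, "explicit": 1.0}
old_note = {"recency": 0.1, "frequency": 0.2, "impact": 0.3, "explicit": 0.0}

for name, feats in [("fresh_decision", fresh_decision), ("old_note", old_note)]:
    score = memory_score(feats)
    print(name, round(score, 3), assign_level(score))
# fresh_decision 0.81 L1
# old_note 0.155 L3
```

Because every term is visible, anyone can recompute by hand why a memory landed in a given tier.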

🔍 Prediction Pipeline

The predictor is three deterministic stages:

Stage	What it does	Cost
1. BM25 + IDF	Tokenize issue, match against symbols / identifiers / paths in an inverted index	~5 ms
2. Graph expansion	Walk `imports` / `imported_by` / `called_by` from seeds. Score decays per hop (`×0.5`, `×0.25`)	~10 ms
3. Semantic rerank (optional)	MiniLM (80 MB, CPU/GPU) embeds query + candidates, blends 30% cosine sim with 70% BM25	~50 ms
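
For readers unfamiliar with stage 1, here is a compact sketch of standard Okapi BM25 over file paths and symbol names. The tokenizer, the `K1`/`B` constants (common defaults) and the toy corpus are assumptions; mcp-brain's inverted index is certainly more elaborate than this in-memory version.

```python
import math
import re
from collections import Counter

K1, B = 1.5, 0.75  # common BM25 defaults, not values taken from mcp-brain


def tokenize(text):
    """Lowercase and split on non-alphanumerics, so 'jwt_utils.py' -> jwt, utils, py."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def bm25_rank(query, docs):
    """Rank docs (path -> text of symbols/identifiers) against the query."""
    tokens = {path: tokenize(path + " " + text) for path, text in docs.items()}
    n_docs = len(tokens)
    avg_len = sum(len(t) for t in tokens.values()) / n_docs
    doc_freq = Counter(term for t in tokens.values() for term in set(t))

    scores = {}
    for path, terms in tokens.items():
        tf = Counter(terms)
        score = 0.0
        for term in set(tokenize(query)):
            if term not in tf:
                continue
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            norm = tf[term] + K1 * (1 - B + B * len(terms) / avg_len)
            score += idf * tf[term] * (K1 + 1) / norm
        scores[path] = score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


docs = {
    "src/auth.py": "login logout verify_jwt",
    "src/middleware.py": "request_hook cors",
    "src/billing.py": "invoice charge refund",
}
ranking = bm25_rank("JWT login broken", docs)
print(ranking[0][0])  # src/auth.py
```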

Every prediction comes back with a why field and a full breakdown, so you can audit why a file was suggested — no opaque ranking.
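
A sketch of how stage 2 could attach a `why` to each expanded file: a breadth-first walk from the BM25 seeds, halving the score at each hop (×0.5, then ×0.25). The edge format, the `expand` function and the sample graph are hypothetical and only meant to show the decay and the audit string.

```python
from collections import deque

HOP_DECAY = 0.5  # hop 1 -> x0.5, hop 2 -> x0.25


def expand(seeds, graph, max_hops=2):
    """Breadth-first expansion from BM25 seeds.

    seeds: {file: bm25_score}
    graph: {file: [(relation, neighbour), ...]} read as "neighbour <relation> file"
           (hypothetical edge format)
    Returns {file: (score, why)} keeping the best score per file.
    """
    results = {f: (s, "direct BM25 match") for f, s in seeds.items()}
    queue = deque((f, s, 0) for f, s in seeds.items())
    while queue:
        current, score, hop = queue.popleft()
        if hop == max_hops:
            continue
        for relation, neighbour in graph.get(current, []):
            new_score = score * HOP_DECAY
            if neighbour in results and results[neighbour][0] >= new_score:
                continue  # already reached with an equal or better score
            results[neighbour] = (new_score, f"{relation} {current} (hop {hop + 1})")
            queue.append((neighbour, new_score, hop + 1))
    return results


graph = {
    "src/auth.py": [("imports", "src/middleware.py"), ("called_by", "src/jwt_utils.py")],
    "src/jwt_utils.py": [("called_by", "src/crypto.py")],
}
expanded = expand({"src/auth.py": 4.0}, graph)
for path, (score, why) in sorted(expanded.items(), key=lambda kv: -kv[1][0]):
    print(f"{score:.2f}  {path}  <- {why}")
```

With a seed score of 4.0, the hop-1 neighbours receive 2.0 and the hop-2 neighbour receives 1.0, each carrying the edge that justified it.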

💡 The semantic reranker is on by default. To run lean (CI or containers without PyTorch), set MCP_BRAIN_SEMANTIC=0 and the pipeline degrades gracefully to BM25 + graph.
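
The 30/70 blend and the environment toggle can be sketched as follows. Dividing lexical scores by their maximum before blending is an assumption (the README does not say how scores are normalised). The helper names `semantic_enabled` and `blend` are hypothetical too, and the cosine values are made up rather than produced by MiniLM.

```python
import os

SEMANTIC_WEIGHT = 0.30  # share given to cosine similarity
BM25_WEIGHT = 0.70      # share kept by the lexical + graph score


def semantic_enabled(env=None):
    """The reranker is on unless MCP_BRAIN_SEMANTIC is set to "0"."""
    env = os.environ if env is None else env
    return env.get("MCP_BRAIN_SEMANTIC", "1") != "0"


def blend(lexical, cosine, use_semantic=True):
    """Combine scores per file; lexical scores are scaled to [0, 1] first.

    Assumption: dividing by the maximum is one plausible normalisation.
    """
    top = max(lexical.values(), default=0.0) or 1.0
    scaled = {f: s / top for f, s in lexical.items()}
    if not use_semantic:
        return scaled  # graceful fallback: BM25 + graph only
    return {
        f: BM25_WEIGHT * s + SEMANTIC_WEIGHT * cosine.get(f, 0.0)
        for f, s in scaled.items()
    }


lexical = {"src/auth.py": 4.0, "src/middleware.py": 2.0, "src/jwt_utils.py": 2.0}
cosine = {"src/auth.py": 0.8, "src/middleware.py": 0.2, "src/jwt_utils.py": 0.7}

blended = blend(lexical, cosine, use_semantic=semantic_enabled({}))
order = sorted(blended, key=blended.get, reverse=True)
print(order)  # ['src/auth.py', 'src/jwt_utils.py', 'src/middleware.py']
```

In this toy case the two files tied on lexical score are separated only by the semantic term, which is the tiebreaker role the reranker is meant to play.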

🔄 Decision Lifecycle

Memories aren't immortal. mcp-brain assumes you'll change your mind and bakes the lifecycle in:

Age-based decay — after SUSPECT_DAYS a memory gets flagged for re-verification. After STALE_DAYS it's hidden from prompts.
Semantic supersession — write a new memory similar (cosine ≥ 0.85) to an old one and the old one is auto-marked superseded.
Feedback loop — when a memory is shown 3+ times before a reverted ticket, it gets demoted automatically. Noisy memories die fast.
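
The three rules above can be sketched in a few lines. The numeric values of `SUSPECT_DAYS` and `STALE_DAYS` below are placeholders (the README names the settings but not their defaults). The toy vectors stand in for real embeddings, and the function names are hypothetical.

```python
import math
from datetime import date

SUSPECT_DAYS = 30        # placeholder value; the README names the setting only
STALE_DAYS = 90          # placeholder value
SUPERSEDE_COSINE = 0.85  # threshold quoted above
DEMOTE_AFTER_SHOWS = 3   # "shown 3+ times before a reverted ticket"


def age_status(created, today):
    """Classify a memory by age alone."""
    age = (today - created).days
    if age >= STALE_DAYS:
        return "stale"      # hidden from prompts
    if age >= SUSPECT_DAYS:
        return "suspect"    # flagged for re-verification
    return "active"


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def supersedes(new_vec, old_vec):
    """A new memory supersedes an old one when they are similar enough."""
    return cosine(new_vec, old_vec) >= SUPERSEDE_COSINE


def should_demote(shows_before_reverted_tickets):
    """Demote a memory that kept appearing ahead of reverted tickets."""
    return shows_before_reverted_tickets >= DEMOTE_AFTER_SHOWS


today = date(2025, 6, 1)
print(age_status(date(2025, 5, 20), today))  # active
print(age_status(date(2025, 4, 1), today))   # suspect
print(age_status(date(2025, 1, 1), today))   # stale
print(supersedes([1.0, 0.0, 1.0], [0.9, 0.1, 1.0]))  # True
```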

This is what makes mcp-brain safe to leave running for months without manual cleanup. The L1 stays small and trustworthy; the L3 archives the audit trail.

🏗️ Architecture

flowchart TB
    subgraph Client
        CC[Claude Code]
    end
    subgraph Server[mcp-brain server]
        T[MCP Tools layer<br/>brain_init, brain_get_context,<br/>brain_predict_files, ...]
        R[Retriever<br/>+ Compressor]
        P[File Predictor<br/>BM25 + Graph + Semantic]
        F[Feedback Reconciler]
        O[Observability<br/>p50/p95/p99]
    end
    subgraph Storage[Local storage ~/.mcp-brain/]
        DB[(SQLite<br/>memories, sessions,<br/>projects, feedback)]
        IDX[Inverted Index<br/>BM25]
        G[Code Graph<br/>imports/calls]
        Y[YAML claims]
    end
    CC <-->|MCP/stdio| T
    T --> R
    T --> P
    T --> F
    T --> O
    R --> DB
    P --> IDX
    P --> G
    F --> DB
    O --> DB

Repo layout

mcp-brain/
├── src/
│   ├── brain/         # core logic: retriever, compressor, scorer, predictor
│   │                  # code_graph, file_indexer, semantic_reranker,
│   │                  # staleness, similarity, feedback loop, observability
│   ├── capture/       # git hook signal extraction
│   ├── storage/       # SQLite layer
│   └── tools/         # MCP tool definitions (FastMCP)
├── benchmark/         # SWE-bench Lite/Full, Bench4BL, BugLocator harness
├── tests/             # pytest suite (predictor, feedback, observability, ...)
└── assets/            # SVG diagrams used in this README

📊 Benchmark Results

We benchmark file localization — given a real GitHub issue, can mcp-brain rank the production files the accepted patch actually modified?

Dataset: SWE-bench Full

2294 real Python bug-fix tasks from major OSS projects (astropy, django, flask, matplotlib, pandas, pytest, requests, scikit-learn, sphinx, sympy, xarray)
Ground truth = files modified in the accepted reference patch (test files excluded by default — strict production-file evaluation)

Results — `mcp-brain` v1.4.0 (BM25 + graph + semantic)

Metric	@1	@3	@5	@10
Hit	24.5%	43.4%	53.7%	63.4%
Recall	20.1%	36.6%	46.1%	55.8%
MAP	24.5%	28.4%	30.4%	31.8%

Instances evaluated: 2294
Errors: 5 (0.2% failure rate)
Avg gold files per issue: 1.66
Avg predicted files: 9.98 (top-10)

Honest comparison vs. literature

System	Hit@10 (file loc.)	Cost per query	Notes
BM25 baseline (vanilla)	~45–55%	free	symbol search only
mcp-brain v1.4.0	63.4%	free	BM25 + graph + semantic, zero LLM
Agentless / SWE-agent	~70–85%	$0.10–$2	LLM-based, multi-step

Reading the numbers:

Hit@5 = 53.7% → in more than half of real issues, the right production file is in top-5 before Claude reads a single byte.
Hit@10 = 63.4% → expanded to top-10, almost 2 issues out of 3 have the right file ranked.
Hit@1 = MAP@1 = 24.5% → the very first prediction is dead-on for roughly 1 issue out of 4.
0.2% error rate over 2294 runs → robust pipeline.
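
For readers who want to check the metric definitions, here is a small sketch of Hit@K, Recall@K and MAP@K over a list of (predicted, gold) pairs. Normalising average precision by `min(len(gold), k)` is an assumption: it is the variant under which MAP@1 equals Hit@1, as in the table, but the benchmark harness may define it differently. The three toy instances are invented.

```python
def hit_at_k(predicted, gold, k):
    """1.0 if any gold file appears in the top-k predictions."""
    return 1.0 if set(predicted[:k]) & set(gold) else 0.0


def recall_at_k(predicted, gold, k):
    """Share of gold files that appear in the top-k predictions."""
    return len(set(predicted[:k]) & set(gold)) / len(gold)


def average_precision_at_k(predicted, gold, k):
    """Average of precision values at each rank where a gold file is found."""
    gold_set, hits, total = set(gold), 0, 0.0
    for rank, path in enumerate(predicted[:k], start=1):
        if path in gold_set:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / min(len(gold_set), k)


def evaluate(instances, k):
    """instances: list of (predicted_files, gold_files) pairs."""
    n = len(instances)
    return {
        "hit": sum(hit_at_k(p, g, k) for p, g in instances) / n,
        "recall": sum(recall_at_k(p, g, k) for p, g in instances) / n,
        "map": sum(average_precision_at_k(p, g, k) for p, g in instances) / n,
    }


instances = [
    (["a.py", "b.py", "c.py"], ["a.py"]),          # gold file ranked first
    (["x.py", "y.py", "z.py"], ["z.py", "q.py"]),  # one of two gold files at rank 3
    (["m.py", "n.py", "o.py"], ["k.py"]),          # miss
]
print(evaluate(instances, k=1))
print(evaluate(instances, k=3))
```

On these three instances, Hit@3 is 2/3, Recall@3 is 0.5 and MAP@3 is 7/18, which shows how Recall and MAP can sit well below Hit when issues touch several files or the right file is ranked low.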

Reproduce it yourself

# One-time online setup
pip install -e .
pip install -r benchmark/requirements-benchmark.txt
python -m benchmark.adapters.swebench --dataset-name princeton-nlp/SWE-bench \
  --output benchmark/datasets/cache/swebench_full.jsonl
python -m benchmark.prepare_repos \
  --dataset benchmark/datasets/cache/swebench_full.jsonl \
  --repo-cache benchmark/repos

# Offline evaluation (full)
python -m benchmark.run_eval \
  --dataset benchmark/datasets/cache/swebench_full.jsonl \
  --repo-cache benchmark/repos \
  --out benchmark/results/swebench_full.json \
  --report-dir benchmark/reports \
  --top-k 10 --max-hops 2 --use-semantic

Reports are emitted as Markdown + HTML in benchmark/reports/.

The harness also supports SWE-bench Lite (300 instances), SWE-bench Verified, Bench4BL, and BugLocator — see benchmark/README.md.

💰 Token Efficiency

The math

A typical Claude Code session without mcp-brain spends thousands of tokens just to orient itself:

Phase (no mcp-brain)	Action	~Tokens
Session start	List directory, read README, sample files	800–2000
Issue handling	Grep symbols, follow imports, retry wrong files	1000–3000
Context restore	Re-explain project conventions	200–500
Total per session		2000–5500

A session with mcp-brain:

Phase (with mcp-brain)	Action	~Tokens
Session start	`brain_get_context` returns compressed L1 YAML	~100
Issue handling	`brain_predict_files` returns ranked top-K + why	~250
Decision recall	`brain_get_decisions` (only when needed)	~300
Total per session		~650

Estimated saving

                        Without          With mcp-brain     Saving
  Session start:    2000 ─────────►       100 tokens        ~95%
  Per session:      2000–5500 ──►       450–950 tokens      40–80%
  Per developer*:   ~1.2M/month ──►    ~400k/month          ~65%

*assuming 100 sessions/month/dev
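
The percentages are plain ratios, and the snippet below recomputes them from the figures quoted in this section. The per-session line uses the two "Total per session" rows (2000–5500 versus ~650). That yields a higher range than the 40–80% quoted, so treat the quoted range as the more cautious figure.

```python
def saving(before, after):
    """Fraction of tokens saved when going from `before` to `after`."""
    return 1 - after / before


# Session start: ~2000 tokens of exploration vs ~100 tokens of L1 YAML.
print(f"session start: {saving(2000, 100):.1%}")           # 95.0%

# Per developer per month, using the figures from the estimate above.
print(f"per developer: {saving(1_200_000, 400_000):.1%}")  # 66.7%

# Per session, using the two "Total per session" rows: 2000-5500 vs ~650.
low, high = saving(2000, 650), saving(5500, 650)
print(f"per session:   {low:.1%} to {high:.1%}")           # 67.5% to 88.2%
```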

Why this works

✅ No embeddings required for retrieval (BM25 + code graph)
✅ No vector DB to query (zero round-trip cost)
✅ No history replay — context is reconstructed, not re-scrolled
✅ YAML compression with default_flow_style=True and empty-key stripping
✅ L1/L2 split — heavy memory only loaded on demand
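
A sketch of the compression step from the list above: drop empty keys, then emit flow-style YAML. The `strip_empty` helper is hypothetical. `yaml.dump(..., default_flow_style=True)` is the PyYAML call the list refers to, guarded here so the snippet still runs where PyYAML is not installed.

```python
def strip_empty(value):
    """Recursively drop dict keys whose value is None, "", [] or {}."""
    if isinstance(value, dict):
        cleaned = {k: strip_empty(v) for k, v in value.items()}
        return {k: v for k, v in cleaned.items() if v not in (None, "", [], {})}
    if isinstance(value, list):
        return [strip_empty(v) for v in value]
    return value


context = {
    "p": {"name": "my-api", "stack": ["FastAPI", "PostgreSQL"]},
    "s": {"branch": "feat/auth", "wip": "JWT refactor", "next": ""},
    "team_claims": [],
    "avoid": ["HS256"],
}
compact = strip_empty(context)
print(compact)

try:
    import yaml  # PyYAML, optional for this sketch

    # Flow style keeps mappings and lists on one line, which saves tokens.
    print(yaml.dump(compact, default_flow_style=True, sort_keys=False))
except ImportError:
    pass  # the stripping step above works without PyYAML
```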

💡 The semantic reranker (use_semantic=True) is on by default and runs locally on CPU/GPU. It does not add LLM cost. Disable with MCP_BRAIN_SEMANTIC=0 for lean CI.

🚀 Quick Start

Install — one command, batteries included

git clone https://github.com/PierfrancescoLijoi/mcp-brain.git
cd mcp-brain
pip install -e ".[all]"

The [all] extra installs:

language parsers (Python, JS, TS, Go, Rust, Java, C#) for the code graph
semantic reranker (sentence-transformers + numpy)
dev tooling (pytest, pytest-cov)

Lean install paths

If you want a smaller footprint, you can pick exactly what you need:

pip install -e .                      # core only — BM25 + graph (no semantic, no parsers)
pip install -e ".[parsers]"           # + multi-language parsers
pip install -e ".[semantic]"          # + semantic reranker (~700 MB w/ PyTorch)
pip install -e ".[dev]"               # + dev tooling

Register with Claude Code

claude mcp add mcp-brain python /absolute/path/to/run.py

On Windows PowerShell:

claude mcp add mcp-brain python "C:\path\to\mcp-brain\run.py"

Initialize your project

mcp-brain init

That's it. Open Claude Code in your repo and the L1 context is automatically available via brain_get_context.

🧠 MCP Tools

Tool	Purpose	When Claude calls it
`brain_init`	Register project, stack, conventions	Once per repo
`brain_get_context`	Load L1 context (~70 tokens)	Every session start
`brain_get_decisions`	Load L2 decisions on demand	When historical context needed
`brain_remember`	Store a memory; level auto-assigned	When user makes a decision
`brain_save_session`	Save end-of-session snapshot	At session end
`brain_predict_files`	Issue → ranked file list with `why`	When opening a ticket
`brain_start_ticket`	Start ticket workflow + conflict check	Workflow orchestration
`brain_record_outcome`	Log ticket outcome (completed/reverted/...)	After ticket closed
`brain_feedback_stats`	Precision/recall window	Health checks
`brain_memory_health`	Surface noisy memories	Debugging
`brain_observability`	Full unified dashboard (YAML)	Ops / CI

Example L1 context output (~100 tokens)

p: {name: my-api, stack: [FastAPI, PostgreSQL]}
s: {branch: feat/auth, wip: "JWT refactor", next: "add refresh token"}

git:
  recent: ["refactor: JWT moved to RS256"]
  changed: [auth.py, middleware.py]

team_claims:
  - {ticket: 42, author: dev-B, files: [middleware.py]}

avoid:
  - "avoid: HS256 — vulnerable to key confusion"

decisions:
  - "decision: tokens stored httpOnly cookie, never localStorage"

👉 Claude already knows where to act before reading a single source file.

💼 Use Cases

🎯 Solo developer

Cuts session-start exploration: −90% tokens on the first turn
Remembers your "I always do it this way" patterns
Auto-supersedes decisions when you change your mind

👥 Small team (3–10 devs)

Conflict detection before two devs touch the same files
Shared decision log with lifecycle (no more "wait, didn't we decide…?")
File ownership inference from git history

🏢 Enterprise (with caveats)

Local-first, no data leaves the machine → GDPR / SOC2-friendly
Compatible with Managed Identity / on-prem deployments (no cloud calls)
Token saving compounds: 65% × 100 devs × 100 sessions/month → measurable infra savings

❓ FAQ

<details> <summary>Is this a RAG system or a vector DB?</summary>

No, and on purpose. mcp-brain is a structured awareness layer, not a retrieval-over-embeddings layer. The core retrieval is BM25 + code graph expansion — fully deterministic, sub-100ms, no vector DB to maintain. The semantic reranker is an optional 30% blend on top, used only as a tiebreaker. This is why token cost stays predictable and infra is local-first.

</details>

<details> <summary>Why not just use Claude's native context window? It's huge now.</summary>

A long context window doesn't fix the problem — it makes it cheaper to waste. The bottleneck isn't capacity, it's signal density. Pasting your whole repo into the context still leaves Claude searching for the right file linearly. mcp-brain pre-ranks reality so the model spends its attention on the right 3 files, not the wrong 30.

</details>

<details> <summary>Will it leak my code or memories anywhere?</summary>

No. Storage is SQLite under ~/.mcp-brain/ (local) and <repo>/.brain/shared/ (versioned with git if you choose). No outbound network calls, no telemetry, no cloud component. The semantic model runs on your CPU/GPU. This makes mcp-brain compatible with GDPR-restricted and air-gapped environments.

</details>

<details> <summary>What if I disagree with a decision mcp-brain remembers?</summary>

Write a new memory that contradicts it. Semantic supersession (cosine ≥ 0.85) will auto-mark the old one as superseded. You can also manually demote via brain_memory_health or wait for age-based decay (SUSPECT_DAYS / STALE_DAYS). The lifecycle assumes you'll change your mind.

</details>

<details> <summary>Does it work with languages other than Python?</summary>

Yes for indexing/predicting (BM25 is language-agnostic). The code graph currently supports Python, JavaScript, TypeScript, Go, Rust, Java, C# via tree-sitter parsers. Adding a new language is a single registry entry — see src/brain/parsers.py.

</details>

<details> <summary>How does it compare to SWE-agent / Aider / Cursor?</summary>

Different layer of the stack. SWE-agent and similar tools are autonomous coders — they read, plan, and patch via LLM calls. mcp-brain is the awareness layer underneath them. You could pair it with Aider or any MCP-compatible client; it makes whatever LLM you use start from a smarter zero.

</details>

<details> <summary>What's the catch?</summary>

Honest answer: file prediction is heuristic. Hit@1 = 24.5% means 3 issues out of 4 still need Claude to validate the prediction before acting. mcp-brain orients, it doesn't replace exploration. That's also why it's free — it's a force multiplier, not an oracle.

</details>

⚠️ Trade-offs

I'm honest about what this is and isn't.

Strength	Limitation
✅ Zero LLM cost for retrieval	⚠️ Heuristic-based: edge cases with no symbol/path overlap can miss
✅ Sub-100ms predictions	⚠️ Requires good commit hygiene (semantic commit messages help)
✅ Local-first, no cloud	⚠️ No cross-machine sync out of the box (use git for `.brain/shared/`)
✅ Deterministic (replays produce same output)	⚠️ Hit@1 = 24.5% → orients, doesn't replace exploration
✅ Works on any size repo	⚠️ Best on medium/large repos (small repos don't benefit much)

This is NOT:

❌ a vector DB memory
❌ a RAG system
❌ an SWE-agent / autonomous coder
❌ a checkpoint / replay tool

This IS:

✅ a repo-aware, team-aware, token-efficient awareness layer
✅ a force multiplier for Claude Code, not a replacement

🛣️ Roadmap

[x] BM25 + code graph + semantic reranker
[x] Decision lifecycle with semantic supersession
[x] Feedback loop with precision/recall reconciliation
[x] Observability dashboard
[x] SWE-bench Full benchmark (2294 instances)
[x] Multi-language code graph (Python, JS, TS, Go, Rust, Java, C#)
[ ] Cross-repo memory federation (opt-in)
[ ] Real-time conflict push (currently pull-based)
[ ] VS Code extension companion
[ ] Hosted shared .brain/ for distributed teams (still local-first per dev)

🧪 Run the test suite

pip install -e ".[dev]"
pytest tests/ -v

Expected: full pass on Python 3.10, 3.11, 3.12.

🤝 Contributing

PRs welcome. Before opening one:

pytest tests/ -v must pass
New behavior needs new tests
New MCP tools must be wrapped with @observed("brain_<name>")
Avoid heavy dependencies for the default install path — anything ML-flavored goes behind an optional extra

📄 License

MIT — see LICENSE.

Built for Claude Code — but the architecture is MCP-standard, so any MCP-compatible client works.

If mcp-brain saved you tokens, ⭐ the repo. That's the only payment I ask for.
