mcp-reposkein
Deterministic code-graph (GraphRAG) over your repo for LLM agents — local-first, git-native, zero-infra, served via MCP. Python, TS/JS, Rust, Go, Java, C#.
<div align="center">
<!-- animated gradient name banner (deep-navy → teal → amber) --> <img src="https://capsule-render.vercel.app/api?type=waving&color=0:070A12,45:2DD4BF,100:F2B84B&height=200&section=header&text=RepoSkein&fontColor=EAE7DC&fontSize=72&fontAlignY=38&animation=fadeIn&desc=Thread%20your%20repo%20into%20agent-ready%20context&descSize=17&descAlignY=60" width="100%" alt="RepoSkein: thread your repo into agent-ready context" />
<!-- animated typing tagline --> <a href="https://github.com/reposkein/reposkein"> <img src="https://readme-typing-svg.demolab.com?font=JetBrains+Mono&weight=600&size=21&pause=1200&color=2DD4BF&center=true&vCenter=true&width=780&height=38&lines=A+deterministic+code+graph+for+AI+agents;Navigate+structure%2C+not+grep-and-guess;7+languages+%C2%B7+zero-infra+%C2%B7+git-native;~8.4x+fewer+context+tokens+than+grep" alt="A deterministic code graph for AI agents" /> </a>
<br /><br />
🔭 Live demo → RepoSkein's own graph, rendered as an interactive 3D constellation in your browser.
</div>
Introduction
RepoSkein gives your AI coding agent a map of your codebase — so it navigates structure instead of grepping and guessing.
It uses Tree-sitter to build a deterministic Code Property Graph of your repo — files, classes, functions, imports, and call edges — and serves it to any MCP-capable agent (Claude Code, Cursor, Codex, …). As the agent works, it writes short natural-language summaries onto graph nodes; those summaries are versioned in git alongside the code, so an agent's understanding becomes shared team memory that the next agent — or teammate — starts from.
Who it's for: developers using AI coding agents on real, large, or nested/polyglot codebases, who are tired of the agent burning its context window on grep; and teams who want that hard-won understanding to persist and be shared rather than re-derived every session.
- ⚡ Zero-infra: no database, no Docker. The graph lives in committed `.reposkein/*.jsonl` files.
- 🔒 Deterministic: same code → byte-identical graph. No LLM in the construction path.
- 🌐 7 languages: Python, TypeScript, JavaScript, Rust, Go, Java, C#.
- 🧩 Local-first & git-native: the graph and its summaries travel with your code.
| Your agent asks | RepoSkein answers — directly from the graph |
|---|---|
| "Who calls `charge()`?" | the exact callers, with one-line summaries |
| "What breaks if I change this?" | the impacted callers + the tests that cover them |
| "Where do I even start?" | ranked entry-point functions by meaning, not filename |
| "What usually changes with this file?" | co-change history from git |
In a deterministic, no-LLM benchmark, RepoSkein surfaces the right functions with a mean ~8.4× fewer context tokens than a grep-based agent on structural queries.
Table of contents
- Prerequisites
- Installation
- Usage — working with your agent
- Supported languages
- How it works
- Visualize the graph — the constellation viewer
- MCP tools
- Optional: semantic embeddings
- Optional: Neo4j backend
- Benchmarks
- Build from source
- Documentation
- Contributing
- Acknowledgements
- Contact
- License
Prerequisites
- Node.js 18+ to run `npx @reposkein/mcp` (the indexer binary is fetched automatically).
- An MCP-capable agent: Claude Code, Cursor, Codex, Zed, etc.
- A git repository to index (RepoSkein installs git hooks and commits the graph).
- Optional: Docker (only for the embeddings server or the Neo4j backend); Rust (only to build from source).
Installation
In the repo you want your agent to understand:
npx @reposkein/mcp init
This downloads the indexer for your platform, installs git hooks + the navigation skill, builds the initial code graph, and prints an MCP config block. Then:
- Add the printed config to your agent (e.g. Claude Code's `.mcp.json`):

      {
        "mcpServers": {
          "reposkein": {
            "command": "reposkein-mcp",
            "env": { "REPOSKEIN_REPO_PATH": "/path/to/your/repo" }
          }
        }
      }

- Verify and commit the graph (`init` already built it):

      reposkein-mcp doctor .    # ✓ binary  ✓ indexed (N nodes)  ✓ ready
      git add .reposkein && git commit -m "add RepoSkein code graph"

  Re-index after big changes with `reposkein-mcp index .` (or the agent's `reindex_file` tool).
- Ask your agent "what calls this function?" or "what breaks if I change X?" and it answers from the graph.
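If your agent already has an `.mcp.json` with other servers in it, you'll want to merge the printed block rather than overwrite the file. A minimal sketch of that merge, assuming the config shape printed by `init`; the helper name `add_reposkein_server` is hypothetical and not part of RepoSkein:

```python
import json
from pathlib import Path


def add_reposkein_server(config_path: Path, repo_path: str) -> dict:
    """Merge the RepoSkein entry into an MCP config file, keeping other servers."""
    # Start from the existing config if there is one, else an empty document.
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    servers = config.setdefault("mcpServers", {})
    # Same shape as the block printed by `npx @reposkein/mcp init`.
    servers["reposkein"] = {
        "command": "reposkein-mcp",
        "env": {"REPOSKEIN_REPO_PATH": repo_path},
    }
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

Calling `add_reposkein_server(Path(".mcp.json"), "/path/to/your/repo")` leaves any other `mcpServers` entries untouched.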
Prefer to let your agent set it up? Install the skills and tell it to run the `reposkein-setup` skill, which installs, indexes, and verifies everything: `npx skills add reposkein/reposkein --all`
Platforms: prebuilt binaries for macOS (Apple Silicon), Linux (x64/arm64), and Windows (x64). Elsewhere, point REPOSKEIN_INDEXER_BIN at a from-source build.
Usage — working with your agent
You ask in plain language; the bundled reposkein-graph-rag skill drives the tools. The natural loop:
- Find where to start: `semantic_find("jwt auth validation")` ranks the right functions by meaning, no symbol name needed. → "where's the rate limiter?"
- Understand it: `get_context_profile` returns the node's callers + callees as ready-to-read prose (`hops: 2` widens, `federated: true` spans nested repos).
- Before you change it: `impact` lists transitive callers (what could break) split from the tests that cover it (what to run). → "what breaks if I change `charge()`?"
- What moves with it: `get_temporal_context` surfaces files that historically change together, plus churn and ownership. → "what usually changes with the auth config?"
- Record what you learned: `write_semantic_summary` attaches a 1–3 sentence note to the node, committed to git for the next agent/teammate.
- After editing: `reindex_file` refreshes the graph for the changed file.
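To make the `impact` step concrete: it amounts to a reverse traversal of the call graph, with the reached nodes split into code and tests. The sketch below is an illustration only, not RepoSkein's implementation; the `calls` mapping, the node names, and the `is_test` predicate are all made up for the example.

```python
from collections import deque
from typing import Callable


def impact(calls: dict[str, set[str]], target: str,
           is_test: Callable[[str], bool]) -> tuple[list[str], list[str]]:
    """Return (impacted code, covering tests) among transitive callers of target."""
    # Invert caller -> callees into callee -> callers.
    callers: dict[str, set[str]] = {}
    for caller, callees in calls.items():
        for callee in callees:
            callers.setdefault(callee, set()).add(caller)

    # Breadth-first walk up the caller edges; `seen` also guards against cycles.
    seen = {target}
    queue = deque([target])
    while queue:
        node = queue.popleft()
        for caller in sorted(callers.get(node, ())):
            if caller not in seen:
                seen.add(caller)
                queue.append(caller)

    reached = sorted(seen - {target})
    code = [n for n in reached if not is_test(n)]
    tests = [n for n in reached if is_test(n)]
    return code, tests


# Hypothetical call graph: caller -> set of callees.
calls = {
    "api.login": {"auth.validate_jwt"},
    "api.refresh": {"auth.validate_jwt"},
    "middleware.guard": {"api.login"},
    "tests.test_jwt.test_expired": {"auth.validate_jwt"},
    "tests.test_api.test_login": {"api.login"},
}
code, tests = impact(calls, "auth.validate_jwt", lambda n: n.startswith("tests."))
```

Here `code` holds the three non-test callers (what could break) and `tests` the two test functions (what to run).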
<details> <summary><b>Example interaction</b></summary>
You: Refactor how we validate auth tokens — what's involved?
Agent: → semantic_find("validate auth token") → resolves auth.validate_jwt
→ get_context_profile(auth.validate_jwt) → 3 callers, 2 callees (+ summaries)
→ impact(auth.validate_jwt) → 6 impacted functions, 4 covering tests
→ get_temporal_context("src/auth/jwt.py") → co-changes with config/keys.py (conf 0.8)
Agent: "validate_jwt is called by the login, refresh, and middleware paths;
changing its signature touches 6 functions and 4 tests (test_jwt.py,
test_middleware.py, …). Heads up: jwt.py historically changes together
with config/keys.py — you'll likely need to update both."
🎥 A short screen recording is on the roadmap — see Documentation. </details>
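The "conf 0.8" in the example is a co-change confidence. One common way to define such a number, assumed here and not confirmed by the source, is the share of commits touching a file that also touched the other file. A minimal sketch over a hypothetical commit history:

```python
from collections import Counter


def co_change(commits: list[set[str]], path: str) -> dict[str, float]:
    """confidence(other) = commits touching both / commits touching `path`."""
    touched = sum(1 for files in commits if path in files)
    if touched == 0:
        return {}
    together: Counter = Counter()
    for files in commits:
        if path in files:
            together.update(f for f in files if f != path)
    return {other: n / touched for other, n in sorted(together.items())}


# Hypothetical history: each set is the files changed by one commit.
history = [
    {"src/auth/jwt.py", "config/keys.py"},
    {"src/auth/jwt.py", "config/keys.py"},
    {"src/auth/jwt.py", "config/keys.py", "README.md"},
    {"src/auth/jwt.py", "config/keys.py"},
    {"src/auth/jwt.py"},
    {"README.md"},
]
result = co_change(history, "src/auth/jwt.py")
```

With this history, `config/keys.py` changed in 4 of the 5 commits that touched `jwt.py`, so its confidence is 0.8.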
Agent skills
RepoSkein ships two cross-agent Agent Skills — npx skills add reposkein/reposkein --all installs both into Claude Code, Cursor, Codex, and 70+ agents:
- `reposkein-setup`: installs RepoSkein in a repo and verifies it's running (binary → index → MCP reachability). Ask your agent to run it.
- `reposkein-graph-rag`: teaches your agent when to use each tool (the loop above). `reposkein-mcp init` installs it automatically for Claude Code.
Supported languages
| Language | Definitions | Imports → edges | Cross-file calls |
|---|---|---|---|
| Python | functions, classes, methods, nested defs, vars | ✅ relative / absolute / aliased | import-resolved (exact) |
| TypeScript / TSX | classes, interfaces, enums, methods, arrows | ✅ named / default / aliased / `* as ns` | import-resolved (exact) |
| JavaScript / JSX | (via the TS grammar) | ✅ ES imports (no CommonJS yet) | import-resolved (exact) |
| Rust | fns, structs, traits, enums, `impl` methods | ✅ `use` (groups, aliases, globs, `pub use` chains; workspace-aware) | import-resolved (exact) |
| Go | funcs, methods (`Type.method`), structs, interfaces | not yet (cross-package planned) | same-package (same-dir); cross-package by name |
| Java | classes, records, interfaces, enums, methods, constructors, fields | ✅ package-path (no wildcard/static yet) | import-resolved (exact) |
| C# | classes, structs, records, interfaces, enums, methods, properties | not yet (cross-namespace planned) | same-dir; cross-namespace by name |
What resolves, honestly. Every edge carries a `resolution` (`exact` / `name_match` / `ambiguous`) + `confidence`, so your agent knows what to trust. Same-file calls, `self`/`this` methods, and import-followed free-function calls resolve `exact`. Python module-alias calls (`import foo as f; f.bar()`) resolve `exact` to the target module's function.

Cross-file `INHERITS`/`IMPLEMENTS` edges are resolved repo-wide: import-followed bases resolve `exact` (confidence 1.0); unique same-directory or repo-wide bases resolve `name_match` (0.8/0.7); ambiguous bases are skipped to avoid false hierarchy edges; and bases that live in a federated child repo are stitched into cross-repo heritage edges at load time. Go's struct/interface embedding (`type Dog struct { Animal }`) is captured as `INHERITS`.

Constructors emit a distinct `INSTANTIATES` edge (`new Foo()` in TS/Java/C#, `Foo { .. }` and `Foo::new()` in Rust, `Foo{}` / `&Foo{}` composite literals in Go, and Python `Foo()` whose name resolves to a class) so an agent can ask who creates instances of this type. These edges are resolved against the type index and skipped when ambiguous.

The graph is type-free by design (deterministic, no compiler in the loop), but it does track types where it can do so soundly from source alone: when a local is assigned a constructor (`x = Foo(); x.bar()`), that `x.bar()` resolves `exact` to `Foo.bar` (intraprocedural receiver typing). Method calls on receivers it can't trace that way (parameters, fields, return values) resolve by name (≤ `name_match`), and overloaded calls are flagged `ambiguous`.

Go and C# don't emit import edges yet, so their cross-package/namespace calls resolve by name (same-package/-directory calls do resolve). These limits are inherent to the zero-infra, type-free design; a deeper optional type-aware layer (SCIP) is gated on benchmark evidence. Adding a language is a well-trodden path, and contributions are welcome.
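The tiers for heritage edges can be read as a small decision procedure. The sketch below is one interpretation of the rules above, not the indexer's code: it assumes 0.8 applies to a unique same-directory match and 0.7 to a unique repo-wide match, and the `Resolution` type, the three lookup tables, and the 0.0 confidence for ambiguous results are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Resolution:
    target: Optional[str]   # node id, or None when no edge should be emitted
    kind: str               # "exact" | "name_match" | "ambiguous"
    confidence: float


def resolve_base(name: str,
                 imported: dict[str, str],
                 same_dir: dict[str, list[str]],
                 repo_wide: dict[str, list[str]]) -> Optional[Resolution]:
    """Resolve a base-class name.

    imported:  name -> node id, for names followed through an import
    same_dir:  name -> node ids defined in the same directory
    repo_wide: name -> node ids defined anywhere in the repo
    """
    if name in imported:
        return Resolution(imported[name], "exact", 1.0)
    local = same_dir.get(name, [])
    if len(local) == 1:
        return Resolution(local[0], "name_match", 0.8)
    anywhere = repo_wide.get(name, [])
    if not local and len(anywhere) == 1:
        return Resolution(anywhere[0], "name_match", 0.7)
    if local or anywhere:
        # Several candidates: report the ambiguity and emit no edge.
        return Resolution(None, "ambiguous", 0.0)
    return None  # unknown name, e.g. a base from an external library
```

The point of the ordering is that a weaker tier is only consulted when every stronger tier has nothing to say, so an import always wins over a name coincidence.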
How it works
Your agent (Claude Code / Cursor / …) ── guided by the reposkein skill
│ MCP
▼
@reposkein/mcp semantic_find · get_context_profile · impact · get_temporal_context
(TypeScript) read_cypher · write_semantic_summary · init_cpg_skeleton · reindex_file
CLI: init · doctor · index · view
│ reads
▼
.reposkein/*.jsonl ← the code graph, committed to git (zero-infra, in-memory store)
▲ writes
│
reposkein-indexer Tree-sitter parse → stable IDs → canonical JSONL
(Rust) + git hooks & a 3-way merge driver for conflict-free summaries
- Structure is static. The skeleton comes only from parsing — identical code produces a byte-identical graph (a CI-tested invariant), independent of who runs it.
- Meaning is just-in-time. Summaries are written as the agent visits nodes; they're content-hash-stamped (so they flag stale when code changes) and committed to git.
- Local-first. The committed JSONL is the source of truth; the optional Neo4j backend is a reconstructable projection most users never need.
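The byte-identical guarantee rests on two ideas: IDs derived from what a node is (not from when it was indexed), and a serialization with no freedom in ordering or whitespace. A minimal sketch of both, assuming a hypothetical ID scheme and record fields; the real format lives in the Rust indexer and may differ:

```python
import hashlib
import json


def stable_id(kind: str, path: str, qualified_name: str) -> str:
    """Derive an ID from the node's identity only (hypothetical scheme)."""
    key = f"{kind}\x00{path}\x00{qualified_name}".encode("utf-8")
    return hashlib.sha256(key).hexdigest()[:16]


def to_canonical_jsonl(nodes: list[dict]) -> bytes:
    """Serialize nodes so that equal inputs give byte-identical output."""
    lines = [
        # Sorted keys and fixed separators remove all formatting freedom.
        json.dumps(n, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
        # Sorting records by ID removes dependence on traversal order.
        for n in sorted(nodes, key=lambda n: n["id"])
    ]
    return "".join(line + "\n" for line in lines).encode("utf-8")
```

Two runs that visit files in a different order, or build dicts with keys in a different order, still produce the same bytes, which is what makes the committed graph diff cleanly in git.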
Cross-repo federation
Got nested repositories (a monorepo of indexed repos)? RepoSkein discovers them, links them with FEDERATES_TO, and stitches cross-repo call, import, and heritage edges (INHERITS/IMPLEMENTS to a base in a child repo) at load time. Pass federated: true to traverse across repo boundaries. Federation edges are derived at load (never committed), so each repo stays independently deterministic.
Visualize the graph — the constellation viewer
reposkein-mcp view . # opens http://127.0.0.1:<port> in your browser
view starts a local, read-only, zero-infra web app (React + three.js, bound to 127.0.0.1) that renders the committed .reposkein graph as an interactive 3D astronomy-style constellation. There's no Neo4j and no external service — it reads the committed JSONL directly and never mutates it. Try the live demo → (RepoSkein viewing its own multi-language graph).
The map is deterministic: a seeded force layout means the same graph always lays out the same way (cached in IndexedDB for instant reloads), and the layout is render-time only — it never touches the committed JSONL. Levels of detail map onto an astronomy metaphor — Repository → Directory → File → Symbol become galaxy → constellation → solar-system → star — so you zoom or click to expand a cluster (a brief supernova animation) and click a star to inspect it. Federation galaxies and agent-written summaries render when present.
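The "same graph, same layout" property comes from seeding the random number generator from the graph itself. The viewer is a React + three.js app, so the Python below is only a language-neutral illustration of the idea, using a toy attraction-only relaxation rather than a full force layout; the seed derivation and the 5% step are assumptions.

```python
import hashlib
import random


def layout(node_ids: list[str], edges: list[tuple[str, str]], steps: int = 50):
    """Seeded toy layout: the same graph always yields the same coordinates."""
    ids = sorted(node_ids)
    # Seed from the graph's node IDs, so input order does not matter.
    digest = hashlib.sha256("\n".join(ids).encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    pos = {n: [rng.uniform(-1, 1) for _ in range(3)] for n in ids}
    for _ in range(steps):
        # Visit edges in sorted order so each step is order-independent too.
        for a, b in sorted(edges):
            for axis in range(3):
                # Pull the two endpoints 5% of their distance toward each other.
                delta = (pos[b][axis] - pos[a][axis]) * 0.05
                pos[a][axis] += delta
                pos[b][axis] -= delta
    return {n: tuple(p) for n, p in pos.items()}
```

Because every source of variation (seed, node order, edge order) is derived from the graph content, the result can be cached and reused safely.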
- Legible: per-edge-type colors + legend, importance-sized stars, adaptive labels, breadcrumb, per-language galaxy coloring, depth fog / bloom / nebula halos.
- Edges encode resolution: color = edge type (`CALLS`/`IMPORTS`/`INHERITS`/`IMPLEMENTS`/`INSTANTIATES`), opacity = confidence (`exact`/`name_match`/`ambiguous`), and flow particles show call direction.
- Analytical: one-click lenses (call graph / type hierarchy / imports / tests), an impact overlay (transitive callers + covering tests), a confidence-audit mode (see where the type-free resolver guesses), and a temporal-coupling overlay (git co-change).
- Explorable: ranked search-to-fly, N-hop neighborhood focus, source peek in the detail panel (a path-guarded read-only file slice + an "Open in editor" `vscode://` link), keyboard nav (`/` search, `f` frame-all, arrows to hop neighbors, `Esc` back), a minimap, and PNG screenshot export.
- Guided tour: a cinematic, deterministically-derived flythrough (overview → largest modules → busiest hub → type hierarchy → entry point) with captions.
reposkein-mcp view --export ./site . # write a self-contained static site
--export bakes the graph into graph-data.js (as window.__REPOSKEIN_GRAPH__) and emits a self-contained static site — it works from file:// or any static host with no server, which is exactly how the live demo above is published. Handy for sharing a snapshot, embedding in docs, or a project landing page.
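Baking the data into a script is what lets the export open from `file://`: a page can load a local `.js` file with a `<script>` tag, while fetching a local JSON file is typically blocked by browsers. A rough sketch of the baking step, assuming the payload is a flat list of JSONL records (the real `graph-data.js` shape is not documented here) and using a hypothetical helper name:

```python
import json
from pathlib import Path


def bake_graph(jsonl_dir: Path, out_file: Path) -> int:
    """Inline every JSONL record into a script that sets window.__REPOSKEIN_GRAPH__."""
    records = []
    # Sorted file order keeps the output stable from run to run.
    for path in sorted(jsonl_dir.glob("*.jsonl")):
        for line in path.read_text(encoding="utf-8").splitlines():
            if line.strip():
                records.append(json.loads(line))
    payload = json.dumps(records, separators=(",", ":"))
    out_file.write_text(f"window.__REPOSKEIN_GRAPH__ = {payload};\n", encoding="utf-8")
    return len(records)
```

The function returns the number of records written, which is handy as a quick sanity check against the node count reported by `reposkein-mcp doctor .`.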
MCP tools
| Tool | What it does |
|---|---|
| `semantic_find` | find where to start: rank functions/classes by meaning (lexical BM25F; optional embeddings) |
| `get_context_profile` | resolve a function/class → its caller/callee neighborhood as ready-to-read prose |
| `impact` | transitive callers split into impacted code vs covering tests |
| `get_temporal_context` | git-derived co-change, churn, and ownership for a file |
| `read_cypher` | read-only graph queries (writes rejected, results capped) |
| `write_semantic_summary` | attach a hash-stamped summary to a node |
| `init_cpg_skeleton` | build/rebuild the graph |
| `reindex_file` | refresh after editing a file |
The reposkein-mcp CLI adds init (set up a repo), doctor (health check), index (rebuild the graph), and view (the constellation viewer; --export <dir> writes a self-contained static site).
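"Hash-stamped" in the `write_semantic_summary` row means a summary remembers the hash of the code it was written about, so a later edit can flag it as stale. A minimal sketch of that check; the field names and the choice of SHA-256 are assumptions, not RepoSkein's actual record format:

```python
import hashlib


def content_hash(source: str) -> str:
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


def write_summary(node: dict, source: str, text: str) -> dict:
    """Attach a summary stamped with the hash of the code it describes."""
    return {**node, "summary": {"text": text, "content_hash": content_hash(source)}}


def is_stale(node: dict, current_source: str) -> bool:
    """A summary is stale once the node's code no longer matches its stamp."""
    summary = node.get("summary")
    return summary is not None and summary["content_hash"] != content_hash(current_source)
```

A stale summary is still useful as a hint, but an agent should re-read the code before trusting it.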
Optional: semantic embeddings
By default semantic_find is deterministic and lexical (BM25F — zero-infra, no keys). You can opt into a hybrid tier (lexical + embedding cosine, fused via RRF) for fuzzier queries. It's default-off, vectors are cached in .reposkein/local/embeddings/ (gitignored, never committed), and it falls back to lexical automatically on any error. Set env vars on the MCP server and pick one:
A) Voyage AI — cloud, easiest, best for code
Get a key, then:
REPOSKEIN_EMBED_PROVIDER=voyage
VOYAGE_API_KEY=pa-...
# optional: REPOSKEIN_EMBED_MODEL=voyage-code-3 # default — code-specialized
Sends document strings (qualified names, signatures, summaries) to Voyage's API. Use B or C if you can't egress code.
B) Ollama — local, off-the-shelf, no key
ollama pull nomic-embed-text # 768-dim (or mxbai-embed-large=1024, bge-m3=1024)
REPOSKEIN_EMBED_PROVIDER=http
REPOSKEIN_EMBED_URL=http://127.0.0.1:11434/v1/embeddings
REPOSKEIN_EMBED_MODEL=nomic-embed-text
REPOSKEIN_EMBED_DIMS=768 # must match the model
C) Voyage's open model, self-hosted — offline + Voyage quality
voyage-4-nano (Apache-2.0) is a custom Qwen3-based model Ollama can't run, so RepoSkein ships a prebuilt server. The image is published to GHCR — public and multi-arch (amd64/arm64) — so there's nothing to build:
docker run -p 8080:8080 -v reposkein-hf:/root/.cache/huggingface \
ghcr.io/reposkein/reposkein-embed # auto-picks your architecture; first run downloads the model
REPOSKEIN_EMBED_PROVIDER=http
REPOSKEIN_EMBED_URL=http://127.0.0.1:8080/v1/embeddings
REPOSKEIN_EMBED_MODEL=voyage-4-nano
REPOSKEIN_EMBED_DIMS=1024 # must equal the server's EMBED_DIMS
Everything stays on your machine. The image is CPU-only and runs with no NVIDIA GPU on Apple Silicon / ARM unified-memory, x64 Linux, and Windows (CI builds + smoke-tests both arches). Docker can't use Apple's Metal/MPS — for that, run the server natively with EMBED_DEVICE=mps. Full details (root docker compose up, GPU, other models): embed-server/README.md.
Note: `REPOSKEIN_EMBED_DIMS` on the client must match the model's output dimension, or cosine scoring is skipped.
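For intuition on the hybrid tier: Reciprocal Rank Fusion combines two ranked lists using only ranks, so lexical and cosine scores never need to share a scale. The sketch below shows the standard formula with the conventional `k = 60`; whether RepoSkein uses that constant, and how it breaks ties, are assumptions. The `cosine` helper mirrors the dimension rule above by returning `None` on a mismatch.

```python
import math


def cosine(a: list[float], b: list[float]):
    """Cosine similarity, or None when dimensions differ (scoring is skipped)."""
    if len(a) != len(b):
        return None
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: score(d) = sum over rankings of 1 / (k + rank)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    # Sort by score, then by ID, so ties break deterministically.
    return sorted(scores, key=lambda d: (-scores[d], d))


lexical = ["auth.validate_jwt", "auth.refresh", "config.load_keys"]
embedding = ["auth.refresh", "config.load_keys", "auth.validate_jwt"]
fused = rrf([lexical, embedding])
```

In this example `auth.refresh` comes out first: it is second in one list and first in the other, which beats being first in one and last in the other.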
Optional: Neo4j backend
The zero-infra JSONL store is the default. Neo4j is an optional projection for very large graphs and raw Cypher at scale:
docker compose --profile neo4j up -d # from the repo root
NEO4J_PASSWORD=reposkeintest reposkein-indexer load .
Then set REPOSKEIN_STORE=neo4j + the NEO4J_* env vars on the MCP server. (REPOSKEIN_STORE=auto, the default, uses JSONL when present and falls back to Neo4j only if configured.)
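The `auto` rule can be pictured as a short selection function. This is a reading of the sentence above, not the server's code: `jsonl` as an explicit value and the use of `NEO4J_PASSWORD` as the "is Neo4j configured" signal are both assumptions made for the sketch.

```python
import os
from pathlib import Path


def pick_store(repo: Path, env=os.environ) -> str:
    """Choose a graph-store backend following the REPOSKEIN_STORE rules."""
    mode = env.get("REPOSKEIN_STORE", "auto")
    if mode in ("jsonl", "neo4j"):  # "jsonl" as an explicit value is assumed
        return mode
    # auto: prefer the committed JSONL graph when it exists.
    if any((repo / ".reposkein").glob("*.jsonl")):
        return "jsonl"
    # Otherwise fall back to Neo4j, but only if it looks configured (assumed check).
    if env.get("NEO4J_PASSWORD"):
        return "neo4j"
    raise RuntimeError("no graph found: run `reposkein-mcp index .` first")
```

The ordering keeps the zero-infra path as the default: a repo with a committed graph never needs a running database.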
Benchmarks
Two tracks, both under mcp/bench/:
- Track 1 — retrieval efficiency (deterministic, no LLM): RepoSkein vs a grep agent on hand-labeled tasks → mean ~8.4× fewer context tokens on structural queries, at F0.5 = 1.00 vs grep 0.11–0.71. Details.
- Track 2 — end-task (SWE-bench-Verified): a minimal agent loop where the only difference is the navigation toolset (RepoSkein vs grep), graded on resolve-rate + tokens + turns. Built + unit-tested; the API+Docker run is opt-in.
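For readers unfamiliar with the metric in Track 1: F0.5 is the F-beta score with beta = 0.5, which weights precision more heavily than recall, so returning extra irrelevant functions is penalized more than missing one. The standard formula, as a small function:

```python
def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """F-beta = (1 + beta^2) * P * R / (beta^2 * P + R)."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```

A score of 1.00 requires both perfect precision and perfect recall; with perfect recall but only half the returned items relevant, `f_beta(0.5, 1.0)` drops to roughly 0.56.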
Build from source
Requirements: Rust (stable), Node 24. Docker only for the optional Neo4j backend.
cd indexer && cargo build --release # → indexer/target/release/reposkein-indexer
cd ../mcp && npm install && npm run build
Wire it into your agent with command: node, args: [".../mcp/dist/index.js"], env REPOSKEIN_REPO_PATH + REPOSKEIN_INDEXER_BIN. Tests: cd indexer && cargo test && cargo clippy --all-targets -- -D warnings; cd mcp && npm test.
Repository layout
indexer/ Rust workspace: core, lang-{python,ts,rust,go,java,csharp}, lang-common, neo4j-io, cli
mcp/ @reposkein/mcp — the TypeScript MCP server (tools + graph-store backends)
mcp/bench/ benchmarks: retrieval efficiency (Track 1) + end-task SWE-bench harness (Track 2)
skills/ reposkein-graph-rag + reposkein-setup — cross-agent skills (skills.sh)
embed-server/ one-command local embedding server (voyage-4-nano) for hybrid semantic_find
viz/ @reposkein/viz — the 3D constellation viewer SPA (served by `reposkein-mcp view`)
Documentation
| Doc | What's in it |
|---|---|
| `mcp/README.md` | the `@reposkein/mcp` package: tools, config, env vars |
| `viz/README.md` | the `@reposkein/viz` constellation viewer: architecture, dev/build |
| `embed-server/README.md` | the local embedding server: Docker/GHCR, platforms, GPU |
| `mcp/bench/README.md` | Track 1 retrieval benchmark: method + results |
| `mcp/bench/track2/README.md` | Track 2 end-task (SWE-bench) harness |
| `CHANGELOG.md` | release history (Keep a Changelog) |
| `skills/` | the two cross-agent skills |
Contributing
Contributions are welcome — bug fixes, new languages, docs. See CONTRIBUTING.md for the dev setup, the determinism invariants you must preserve, and the step-by-step recipe for adding a new language (it's a well-trodden path — Go, Java, and C# were each added the same way). RepoSkein uses Conventional Commits and keeps CI green (determinism gates + clippy + tests).
Acknowledgements
- Tree-sitter: the parsers behind every language extractor.
- Model Context Protocol: the agent integration standard.
- Voyage AI: `voyage-code-3` and the open-weight `voyage-4-nano` powering the optional embeddings tier.
- Discovery via Glama, skills.sh, mcpservers.org, and the awesome-mcp community lists.
- README header by capsule-render + readme-typing-svg.
Contact
- 🐛 Bugs / features: open an issue
- 💬 Questions / ideas: GitHub Discussions
License
<div align="center"> <img src="https://capsule-render.vercel.app/api?type=waving&color=0:F2B84B,55:2DD4BF,100:070A12&height=120&section=footer" width="100%" alt="" /> <sub>Built for agents that read structure, not noise.</sub> </div>