ASTrograph
<p align="center"> <img src="astrograph_poster.jpg" alt="ASTrograph" width="400"> </p>
An MCP server that helps AI agents detect duplicate code before writing it. It provides write and edit tools that compare new code against existing functions in your codebase using AST graph isomorphism. The comparison is purely algorithmic and spends no LLM tokens. When a structural duplicate is found, the operation is blocked with a pointer to the existing code. Variable names, formatting, and comments are ignored: if two pieces of code share the same abstract structure, ASTrograph flags them as duplicates.
Installation
Add .mcp.json to your project root:
{
"mcpServers": {
"astrograph": {
"command": "docker",
"args": [
"run", "--rm", "-i", "--pull", "missing",
"--add-host", "host.docker.internal:host-gateway",
"-v", ".:/workspace",
"thaylo/astrograph:latest"
]
}
}
}
The image is multi-arch (amd64, arm64). The codebase is indexed at startup. Metadata is stored outside the project directory (in the user data dir) so it never interferes with your codebase.
To update to a new release:
docker pull thaylo/astrograph:latest
The running version is always visible in the MCP serverInfo.version field on connect.
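To check which release a client is actually talking to, the version can be read from the server's reply to the MCP `initialize` request. The snippet below is a minimal sketch, assuming the client gives you that reply as a JSON string; the `sample` message, including the server name and the `0.0.0` version, is a made-up placeholder and not real ASTrograph output.

```python
import json


def server_version(initialize_response: str) -> str:
    """Extract serverInfo.version from a JSON-RPC initialize response."""
    message = json.loads(initialize_response)
    return message["result"]["serverInfo"]["version"]


# Hypothetical response for illustration only; real values will differ.
sample = json.dumps({
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "protocolVersion": "2025-03-26",
        "capabilities": {},
        "serverInfo": {"name": "astrograph", "version": "0.0.0"},
    },
})

print(server_version(sample))
```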
<details> <summary><strong>Claude Desktop</strong></summary>
macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Windows: %APPDATA%\Claude\claude_desktop_config.json
{
"mcpServers": {
"astrograph": {
"command": "docker",
"args": [
"run", "--rm", "-i", "--pull", "missing",
"--add-host", "host.docker.internal:host-gateway",
"-v", "/absolute/path/to/project:/workspace",
"thaylo/astrograph:latest"
]
}
}
}
</details>
<details> <summary><strong>Codex</strong></summary>
~/.codex/config.toml:
[mcp_servers.astrograph]
command = "docker"
args = [
"run", "--rm", "-i", "--pull", "missing",
"--add-host", "host.docker.internal:host-gateway",
"-v", "/absolute/path/to/project:/workspace",
"thaylo/astrograph:latest"
]
</details>
<details> <summary><strong>wmark</strong></summary>
~/.config/wmark/.mcp.json (user-level, applies to all projects on macOS):
{
"mcpServers": {
"astrograph": {
"command": "docker",
"args": [
"run", "--rm", "-i", "--pull", "missing",
"--add-host", "host.docker.internal:host-gateway",
"-v", "/Users:/Users:rw",
"thaylo/astrograph:latest"
]
}
}
}
Mounting /Users makes all macOS home paths accessible inside the container unchanged. Call set_workspace with the full host path (e.g. /Users/yourname/project) to index a project.
For Linux, replace /Users:/Users:rw with /home:/home:rw.
</details>
<details> <summary><strong>Without Docker</strong></summary>
From a local clone of the repository:
pip install .
{
"mcpServers": {
"astrograph": {
"command": "python",
"args": ["-m", "astrograph.server"],
"cwd": "/path/to/astrograph"
}
}
}
</details>
How it works
Your codebase already contains:
# src/math.py
def calculate_sum(a, b):
return a + b
An AI agent tries to write:
# src/utils.py
def add_numbers(x, y):
return x + y
ASTrograph detects the duplicate and blocks the write:
BLOCKED: Cannot write - identical code exists at src/math.py:calculate_sum (lines 1-2).
Reuse the existing implementation instead.
Different variable names, identical structure. Source code is converted into labeled directed graphs and compared using Weisfeiler-Leman hashing with VF2 isomorphism verification — all algorithmic, no LLM tokens spent on the search.
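The idea can be illustrated with a short, standard-library-only sketch. It is not ASTrograph's implementation: `wl_hash` and its helpers are hypothetical names, the graph is just the Python AST with parent-to-child edges, and the VF2 verification step is left out, so equal hashes here only mean "likely isomorphic".

```python
import ast
import hashlib


def _digest(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]


def _node_label(node: ast.AST) -> str:
    # Identifiers live in string attributes (FunctionDef.name, Name.id, arg.arg),
    # so labelling by node type alone makes the hash blind to renaming.
    if isinstance(node, ast.Constant):
        return f"Constant:{node.value!r}"  # literal values stay distinct
    return type(node).__name__


def wl_hash(source: str, rounds: int = 3) -> str:
    """Weisfeiler-Leman style hash of the AST of `source` (assumed scheme)."""
    labels: list[str] = []
    children: list[list[int]] = []

    def visit(node: ast.AST) -> int:
        index = len(labels)
        labels.append(_node_label(node))
        children.append([])
        for child in ast.iter_child_nodes(node):
            children[index].append(visit(child))
        return index

    visit(ast.parse(source))

    # Each round folds the sorted labels of a node's children into its own label.
    for _ in range(rounds):
        labels = [
            _digest(labels[i] + "|" + ",".join(sorted(labels[c] for c in children[i])))
            for i in range(len(labels))
        ]
    # The graph hash is a digest of the sorted multiset of final node labels.
    return _digest("".join(sorted(labels)))


existing = "def calculate_sum(a, b):\n    return a + b\n"
candidate = "def add_numbers(x, y):\n    return x + y\n"
print(wl_hash(existing) == wl_hash(candidate))
```

Because comments and whitespace never reach the AST, and names are not part of any label, the two functions from the example above hash identically, while a function using a different operator does not.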
Detection types
ASTrograph detects four types of structural duplication:
| Type | What it catches | How it works |
|---|---|---|
| Exact | Identical AST structure with renamed variables or different formatting | WL hash identity + VF2 graph isomorphism verification |
| Pattern | Same control flow with different operators or constants | Operator-normalized graph hashing |
| Block | Duplicate inner blocks (for/if/while/try) within functions | Block-level AST extraction + hash matching |
| Near-duplicate | ~80% structural similarity — copy-paste-modify patterns | Hierarchy hash prefix matching at 4/5 depth levels |
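As a rough illustration of the Pattern row, the sketch below replaces every operator and constant with a placeholder before hashing, so code that differs only in those details collapses to the same signature. The function name `pattern_hash` and the `OP`/`CONST` placeholders are assumptions made for this example, not ASTrograph's actual normalization.

```python
import ast
import hashlib

# Base classes for all binary, unary, boolean, and comparison operator nodes.
OPERATOR_BASES = (ast.operator, ast.unaryop, ast.boolop, ast.cmpop)


def pattern_hash(source: str) -> str:
    """Hash an AST after normalizing operators and constants (assumed scheme)."""

    def signature(node: ast.AST) -> str:
        if isinstance(node, OPERATOR_BASES):
            label = "OP"  # +, *, <, and, not ... all look the same
        elif isinstance(node, ast.Constant):
            label = "CONST"  # 2 and 10 look the same
        else:
            label = type(node).__name__
        kids = ",".join(signature(child) for child in ast.iter_child_nodes(node))
        return f"{label}({kids})"

    return hashlib.sha256(signature(ast.parse(source)).encode("utf-8")).hexdigest()


scale = "def scale(v):\n    return v * 2\n"
shift = "def shift(w):\n    return w + 10\n"
print(pattern_hash(scale) == pattern_hash(shift))
```

Here `scale` and `shift` share a signature because they differ only in the operator and the constant; changing the control flow, for example by adding an `if`, produces a different one.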
Near-duplicate detection catches Type-3 clones that exact and pattern detection miss. For example, Flask's TagBytes, TagDateTime, TagTuple, and TagUUID classes share 80%+ identical structure but differ in leaf-level details.
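One way to picture hierarchy hash prefix matching is to hash the AST one depth level at a time and compare only the first four of five levels. The sketch below does exactly that under stated assumptions: `hierarchy_hashes` and `is_near_duplicate` are hypothetical helpers, each level hash covers the node types at that depth, and the real tool may define its levels differently.

```python
import ast
import hashlib


def hierarchy_hashes(source: str, levels: int = 5) -> list[str]:
    """Return one hash per AST depth level, starting at the first definition."""
    current = [ast.parse(source).body[0]]
    hashes = []
    for _ in range(levels):
        labels = ",".join(type(node).__name__ for node in current)
        hashes.append(hashlib.sha256(labels.encode("utf-8")).hexdigest()[:12])
        # Descend one level: collect the children of every node at this depth.
        current = [child for node in current for child in ast.iter_child_nodes(node)]
    return hashes


def is_near_duplicate(a: str, b: str, levels: int = 5, required: int = 4) -> bool:
    """True when the first `required` level hashes of both snippets agree."""
    return hierarchy_hashes(a, levels)[:required] == hierarchy_hashes(b, levels)[:required]


tag_a = 'def tag_a(value):\n    return {"k": str(value)}\n'
tag_b = 'def tag_b(value):\n    return {"k": str(value.hex)}\n'
print(is_near_duplicate(tag_a, tag_b))
```

The two functions differ only at the deepest of the five levels (`value` versus `value.hex`), so their four-level prefixes match even though their full hash lists do not.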
Language support
Python, JavaScript, and TypeScript work out of the box. C, C++, Java, and Go attach to an already-running language server over TCP.
| Language | Versions | Mode | Default endpoint |
|---|---|---|---|
| Python | 3.11 -- 3.14 | bundled | pylsp |
| JavaScript | ES2021+, Node 20/22/24 LTS | bundled | typescript-language-server --stdio |
| TypeScript | TypeScript 5.x, Node 20/22/24 LTS | bundled | typescript-language-server --stdio |
| Go | 1.21 -- 1.25 | attach | tcp://127.0.0.1:2091 |
| C | C11, C17, C23 | attach | tcp://127.0.0.1:2087 |
| C++ | C++17, C++20, C++23 | attach | tcp://127.0.0.1:2088 |
| Java | 11, 17, 21, 25 | attach | tcp://127.0.0.1:2089 |
The Docker image bundles Python and JS/TS LSP runtimes. For attach-based languages, expose the language server on a TCP port using socat and configure via your MCP JSON:
{
"mcpServers": {
"astrograph": {
"command": "docker",
"args": ["run", "--rm", "-i", "--add-host", "host.docker.internal:host-gateway", "-v", ".:/workspace", "thaylo/astrograph:latest"],
"env": {
"ASTROGRAPH_CPP_LSP_COMMAND": "tcp://host.docker.internal:2088",
"ASTROGRAPH_GO_LSP_COMMAND": "tcp://host.docker.internal:2091",
"ASTROGRAPH_JAVA_LSP_COMMAND": "tcp://host.docker.internal:2089",
"ASTROGRAPH_C_LSP_COMMAND": "tcp://host.docker.internal:2087"
}
}
}
}
| Language | Env var | Socat bridge example |
|---|---|---|
| C | ASTROGRAPH_C_LSP_COMMAND | socat TCP-LISTEN:2087,reuseaddr,fork EXEC:clangd |
| C++ | ASTROGRAPH_CPP_LSP_COMMAND | socat TCP-LISTEN:2088,reuseaddr,fork EXEC:clangd |
| Java | ASTROGRAPH_JAVA_LSP_COMMAND | socat TCP-LISTEN:2089,reuseaddr,fork EXEC:jdtls |
| Go | ASTROGRAPH_GO_LSP_COMMAND | socat TCP-LISTEN:2091,reuseaddr,fork EXEC:"gopls serve" |
| Python | ASTROGRAPH_PY_LSP_COMMAND | (bundled, override if needed) |
| JS | ASTROGRAPH_JS_LSP_COMMAND | (bundled, override if needed) |
| TS | ASTROGRAPH_TS_LSP_COMMAND | (bundled, override if needed) |
Run lsp_setup(mode='inspect') to see which languages are available and what's missing.
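If an attach-based language still shows up as missing, it can help to confirm from the host that the socat bridge is actually listening. The following is a small sketch that does not use any ASTrograph API: `endpoint_reachable` is a hypothetical helper that simply attempts a TCP connection to each default endpoint from the table above. Inside the container the same ports are reached through `host.docker.internal` instead of `127.0.0.1`.

```python
import socket

# Default attach endpoints from the language table, as seen from the host.
DEFAULT_ENDPOINTS = {
    "C": "tcp://127.0.0.1:2087",
    "C++": "tcp://127.0.0.1:2088",
    "Java": "tcp://127.0.0.1:2089",
    "Go": "tcp://127.0.0.1:2091",
}


def endpoint_reachable(endpoint: str, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to a tcp://host:port endpoint succeeds."""
    host, _, port = endpoint.removeprefix("tcp://").rpartition(":")
    try:
        with socket.create_connection((host, int(port)), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    for language, endpoint in DEFAULT_ENDPOINTS.items():
        status = "listening" if endpoint_reachable(endpoint) else "not reachable"
        print(f"{language:5} {endpoint} {status}")
```

A successful connection only shows that something accepts connections on the port; it does not prove that a language server is answering LSP requests behind it.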
Real-world results
Tested on popular open-source projects:
| Project | Language | Files | Code Units | Duplicates Found |
|---|---|---|---|---|
| Redis | C | 208 | 18,272 | 556 groups |
| TypeORM | TypeScript | 492 | 7,107 | 511 groups |
| Express.js | JavaScript | 141 | 3,866 | 468 groups |
| nlohmann/json | C++ | 488 | 9,103 | 959 groups |
| Gin | Go | 99 | 1,557 | 141 groups |
| Flask | Python | 24 | 910 | 48 groups |
| Spring PetClinic | Java | 47 | 270 | 17 groups |
Exact, pattern, and block findings are verified via VF2 graph isomorphism. Near-duplicates are matched via hierarchy hash prefix similarity (~80% structural identity).
License
MIT