repo-graphrag-mcp

An MCP server that builds a knowledge graph from code and text documents, enabling Q&A and implementation planning via tools like graph_create, graph_plan, and graph_query.

Repo GraphRAG MCP Server

Repo GraphRAG MCP Server is an MCP (Model Context Protocol) server that uses LightRAG and Tree-sitter to build a knowledge graph from code and text-based documents (text-only; PDFs/Word/Excel are not parsed) in a repository/directory, and leverages it for Q&A and implementation planning. It provides tools for graph building (graph_create), implementation planning (graph_plan), and Q&A (graph_query).

  • 📊 Knowledge graph creation (graph_create): Analyze code/documents to build a knowledge graph and embedding index (supports incremental updates)
  • 🔧 Implementation planning (graph_plan): Output implementation plans and concrete change steps for modification/addition requests based on the knowledge graph (optionally combined with vector search)
  • 🔍 Q&A (graph_query): Answer questions based on the knowledge graph (optionally combined with vector search)

🚀 Quick Start

Prerequisites

  • Python 3.11+
  • uv package manager
  • Credentials for your chosen LLM provider (set the required environment variables; see the LLM Providers section below)

1. Installation

# Clone from GitHub
git clone https://github.com/yumeiriowl/repo-graphrag-mcp.git
cd repo-graphrag-mcp

# Install dependencies
uv sync

2. Environment Setup

# Copy the settings file
cp .env.example .env

# Edit the settings file
nano .env  # or any editor

3. Environment Variables (LLM Setup)

Configure settings in the .env file:

Example: Using Anthropic models

# LLM provider for graph creation
GRAPH_CREATE_PROVIDER=anthropic  # or openai, gemini, azure_openai

# Provider for planning and Q&A
GRAPH_ANALYSIS_PROVIDER=anthropic # or openai, gemini, azure_openai

# API keys (set the variables corresponding to your chosen provider)
ANTHROPIC_API_KEY=your_anthropic_api_key  # for other providers, set the matching variable(s) below instead

# AZURE_OPENAI_API_KEY=your_azure_openai_api_key
# AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com/
# AZURE_API_VERSION=azure_openai_api_version

# OPENAI_API_KEY=your_openai_api_key
# OPENAI_BASE_URL=http://localhost:1234/v1  # For LM Studio or other OpenAI-compatible local servers

# GEMINI_API_KEY=your_gemini_api_key

# LLM model for graph creation
GRAPH_CREATE_MODEL_NAME=claude-haiku-4-5

# LLM model for planning and Q&A
GRAPH_ANALYSIS_MODEL_NAME=claude-sonnet-4-5

4. MCP Client Setup

Claude Code

claude mcp add repo-graphrag \
-- uv --directory /absolute/path/to/repo-graphrag-mcp run server.py

VS Code GitHub Copilot Extensions

mcp.json:

{
  "servers": {
    "repo-graphrag-server": {
      "type": "stdio",
      "command": "uv",
      "args": [
        "--directory",
        "/absolute/path/to/repo-graphrag-mcp",
        "run",
        "server.py"
      ]
    }
  }
}
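
The mcp.json above hard-codes the repository path. As a minimal sketch, the same structure could be generated from Python so the absolute path is filled in for you. The function name build_mcp_config is hypothetical and not part of this project; only the JSON layout comes from the example above.

```python
import json
from pathlib import Path


def build_mcp_config(repo_dir: str, server_name: str = "repo-graphrag-server") -> dict:
    """Return the mcp.json structure for a stdio server launched via uv.

    build_mcp_config is a hypothetical helper for illustration only.
    """
    path = Path(repo_dir)
    if not path.is_absolute():
        # The README asks for an absolute path to the cloned repository.
        raise ValueError("repo_dir must be an absolute path")
    return {
        "servers": {
            server_name: {
                "type": "stdio",
                "command": "uv",
                "args": ["--directory", str(path), "run", "server.py"],
            }
        }
    }


if __name__ == "__main__":
    # Assumes the script is run from inside the cloned repository.
    print(json.dumps(build_mcp_config(str(Path.cwd())), indent=2))
```

Redirecting the printed output into your client's mcp.json avoids typos in the path.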

Other MCP Clients

Any client that supports the MCP protocol can be used.

5. Usage

The following tools are available in MCP clients. All commands must start with graph:.

graph_create - Build/Update Knowledge Graph

Analyze the target repository/directory and build a knowledge graph and vector embedding index (supports incremental updates). Uses GRAPH_CREATE_PROVIDER and GRAPH_CREATE_MODEL_NAME.

Elements:

  • graph: (required)
  • Directory path to analyze (absolute path recommended)
  • Storage name to create (default: "storage")

Examples:

graph: /absolute/path/to/your/repository my_project
graph: /absolute/path/to/your/repository my_project graphify
graph: C:\\projects\\myapp webapp_storage please create storage

About Incremental Updates: When you run graph_create again with an existing storage name, only changed/added/deleted files are reanalyzed; others are skipped. If you want to rebuild after changing the embedding model or extraction settings (DOC_DEFINITION_LIST, NO_PROCESS_LIST, target extensions, etc.), delete the existing storage or specify a new storage name and recreate with graph_create or standalone_graph_creator.py.
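
This README does not describe how the server detects changed files. One common approach, sketched here purely as an assumption, is to keep a content hash per file and compare it against the previous run. The names snapshot and diff_snapshots are hypothetical.

```python
import hashlib
from pathlib import Path


def snapshot(root: str) -> dict[str, str]:
    """Map each file path (relative to root) to a SHA-256 digest of its content."""
    base = Path(root)
    return {
        p.relative_to(base).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(base.rglob("*"))
        if p.is_file()
    }


def diff_snapshots(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Classify files as added, deleted, changed, or unchanged between two snapshots."""
    common = old.keys() & new.keys()
    return {
        "added": sorted(new.keys() - old.keys()),
        "deleted": sorted(old.keys() - new.keys()),
        "changed": sorted(k for k in common if old[k] != new[k]),
        "unchanged": sorted(k for k in common if old[k] == new[k]),
    }
```

Under this scheme only the added, changed, and deleted entries would need reanalysis, and the unchanged entries could be skipped, which is the behavior described above.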

Note (Performance): The first graph creation takes longer as the number of files increases. As a guideline, if there are more than 1,000 files, consider narrowing the target directory (processing time depends on environment and file sizes). Incremental updates reanalyze only the diffs, so the above guideline does not necessarily apply to updates.
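
To check a directory against the 1,000-file guideline before the first run, a rough count is usually enough. The sketch below is an illustration only: the skipped directory names are assumptions, and the files the server actually processes depend on NO_PROCESS_LIST and the configured extensions.

```python
import os

FILE_GUIDELINE = 1000  # guideline from the performance note above


def count_files(
    root: str,
    skip_dirs: frozenset[str] = frozenset({".git", "node_modules", ".venv"}),
) -> int:
    """Count files under root, ignoring directories named in skip_dirs (hypothetical defaults)."""
    total = 0
    for _dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk does not descend into skipped directories.
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        total += len(filenames)
    return total


def exceeds_guideline(root: str, limit: int = FILE_GUIDELINE) -> bool:
    """Return True when the file count is above the guideline."""
    return count_files(root) > limit
```

If exceeds_guideline returns True, consider pointing graph_create at a subdirectory instead of the repository root.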

Note (First download): If the specified embedding model is not cached on first graph creation, it will be automatically downloaded (subsequent runs use the cache).

graph_plan - Implementation Support

Based on the knowledge graph (optionally combined with vector search), provide a detailed implementation plan and instructions so that the MCP client (agent) can perform actual work. Uses GRAPH_ANALYSIS_PROVIDER and GRAPH_ANALYSIS_MODEL_NAME.

Elements:

  • graph: (required)
  • Implementation/modification request
  • Storage name (default: "storage")

Examples:

graph: I want to add user authentication my_project
graph: my_project Add GraphQL support to the REST API
graph: Improve API performance under high load webapp_storage

graph_query - Q&A

Based on the knowledge graph (optionally combined with vector search), answer questions about the target repository/directory. Uses GRAPH_ANALYSIS_PROVIDER and GRAPH_ANALYSIS_MODEL_NAME.

Elements:

  • graph: (required)
  • Question content
  • Storage name (default: "storage")

Examples:

graph: Tell me about this project's API endpoints my_project
graph: my_project Explain the main classes and their roles
graph: About the database design webapp_storage

⚙️ Configuration Options

LLM Providers

Supported providers and required environment variables

Provider          Identifier    Required environment variables
Anthropic Claude  anthropic     ANTHROPIC_API_KEY
OpenAI GPT        openai        OPENAI_API_KEY
Google Gemini     gemini        GEMINI_API_KEY
Azure OpenAI      azure_openai  AZURE_OPENAI_API_KEY, AZURE_OPENAI_ENDPOINT, AZURE_API_VERSION

Specify the identifiers in .env as GRAPH_CREATE_PROVIDER / GRAPH_ANALYSIS_PROVIDER.
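
The provider-to-variable mapping above can be checked before starting the server. This is a minimal sketch rather than the server's own validation logic; missing_env_vars is a hypothetical helper, and the mapping is copied from the list of supported providers.

```python
import os
from collections.abc import Mapping

# Provider identifier -> required environment variables (from the README).
REQUIRED_ENV = {
    "anthropic": ("ANTHROPIC_API_KEY",),
    "openai": ("OPENAI_API_KEY",),
    "gemini": ("GEMINI_API_KEY",),
    "azure_openai": (
        "AZURE_OPENAI_API_KEY",
        "AZURE_OPENAI_ENDPOINT",
        "AZURE_API_VERSION",
    ),
}


def missing_env_vars(provider: str, env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variables for provider that are unset or empty in env."""
    if provider not in REQUIRED_ENV:
        raise ValueError(f"unknown provider: {provider!r}")
    return [name for name in REQUIRED_ENV[provider] if not env.get(name)]


if __name__ == "__main__":
    for key in ("GRAPH_CREATE_PROVIDER", "GRAPH_ANALYSIS_PROVIDER"):
        provider = os.environ.get(key, "")
        if provider in REQUIRED_ENV:
            print(key, provider, "missing:", missing_env_vars(provider))
```

Note that the script reads the process environment, so values stored only in .env must be exported or loaded first.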

Embedding Model

  • Default: voyageai/voyage-4-nano

  • Compatibility: Supports Hugging Face sentence-transformers compatible models

  • First run: If the specified embedding model is not cached, it will be downloaded automatically. Cache location depends on environment/settings. Download time and disk space depend on model size.

  • Authenticated models: For Hugging Face models that require authentication, set HUGGINGFACE_HUB_TOKEN in .env.

    HUGGINGFACE_HUB_TOKEN=your_hf_token
    

Planning/Query Settings for graph_plan and graph_query

Implementation note: The settings in this section are passed directly to LightRAG's built-in QueryParam. This MCP does not implement custom retrieval or token-budgeting logic; it reuses LightRAG's behavior as-is.

Retrieval/Search Modes

Search modes follow LightRAG. Set SEARCH_MODE in .env to one of the following.

  • mix: Combination of vector search and knowledge graph search (recommended)
  • hybrid: Combination of local and global search
  • naive: Simple vector search
  • local: Entity-focused search (context around specific entities)
  • global: Relationship-focused search (broader themes across the graph)

Token Budgets (Input-side)

Input-side token budgets control how much context is assembled for planning and Q&A (LightRAG QueryParam). These are independent from model output token limits.

  • MAX_TOTAL_TOKENS: Overall input context budget per query (entities + relations + retrieved chunks + system prompt). Default: 30000.
  • MAX_ENTITY_TOKENS: Budget for entity context (input-side). Default: 6000.
  • MAX_RELATION_TOKENS: Budget for relation context (input-side). Default: 8000.

Note: Output token limits are controlled separately via GRAPH_ANALYSIS_MAX_TOKEN_SIZE (for planning/Q&A) and GRAPH_CREATE_MAX_TOKEN_SIZE (for graph creation tasks). If you increase input budgets significantly, ensure your model's total context window can accommodate both input and output.
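
The arithmetic behind these budgets can be sketched as follows. The defaults are the ones listed above; the TokenBudget class, its method names, and the context-window and output values in the usage note are hypothetical and only illustrate the check. It assumes, per the description of MAX_TOTAL_TOKENS, that whatever the entity and relation budgets do not use is left for retrieved chunks and the system prompt.

```python
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenBudget:
    """Input-side budgets with the README defaults (hypothetical helper class)."""

    max_total: int = 30000
    max_entity: int = 6000
    max_relation: int = 8000

    @classmethod
    def from_env(cls, env: Mapping[str, str]) -> "TokenBudget":
        """Read the three budgets from env, falling back to the defaults."""
        return cls(
            max_total=int(env.get("MAX_TOTAL_TOKENS", cls.max_total)),
            max_entity=int(env.get("MAX_ENTITY_TOKENS", cls.max_entity)),
            max_relation=int(env.get("MAX_RELATION_TOKENS", cls.max_relation)),
        )

    def remaining_for_chunks_and_prompt(self) -> int:
        """Tokens left after the entity and relation budgets are reserved."""
        return self.max_total - self.max_entity - self.max_relation

    def fits(self, context_window: int, max_output_tokens: int) -> bool:
        """Check that input budget plus output limit fits the model's context window."""
        return self.max_total + max_output_tokens <= context_window
```

With the defaults, remaining_for_chunks_and_prompt() is 30000 - 6000 - 8000 = 16000. For a model with a 32,000-token window and a 4,000-token output limit (both values made up for illustration), fits(32000, 4000) is False because 30000 + 4000 exceeds the window.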

Entity Merge

This MCP can merge entities extracted from documents with entities extracted from code based on semantic similarity. The goal is to unify references (e.g., a class or function defined in code and mentioned in documentation) into a single consolidated entity.

  • How it works: Names are normalized and filtered via exclusion rules; document entities and current-pass code entities are embedded and compared using cosine similarity (FAISS). Pairs above the threshold are merged, consolidating descriptions and file paths.
  • Controls:
    • MERGE_ENABLED (default: true): Toggle entity merge.
    • MERGE_SCORE_THRESHOLD (default: 0.95): Cosine similarity threshold for merging.
    • Exclusion settings: MERGE_EXCLUDE_* lists, private name exclusion, name length bounds, and custom patterns.
  • Execution:
    • When enabled, merge runs within the graph creation/update flow (after entity extraction).
    • You can also run the standalone tool: uv run standalone_entity_merger.py <storage_dir_path>
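
The merge step described above can be illustrated with a small, self-contained sketch. It is not the project's code: it uses brute-force cosine similarity in pure Python instead of FAISS, toy vectors instead of real embeddings, and hypothetical helper names and length bounds. Only the 0.95 threshold and the private-name exclusion come from the settings listed above.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def is_excluded(name: str, min_len: int = 3, max_len: int = 100,
                exclude_private: bool = True) -> bool:
    """Apply simple exclusion rules; the length bounds here are made-up defaults."""
    if exclude_private and name.startswith("_"):
        return True
    return not (min_len <= len(name) <= max_len)


def find_merge_pairs(
    doc_entities: dict[str, list[float]],
    code_entities: dict[str, list[float]],
    threshold: float = 0.95,
) -> list[tuple[str, str, float]]:
    """Return (doc_name, code_name, score) for pairs at or above threshold, best first."""
    pairs = []
    for d_name, d_vec in doc_entities.items():
        if is_excluded(d_name):
            continue
        for c_name, c_vec in code_entities.items():
            if is_excluded(c_name):
                continue
            score = cosine(d_vec, c_vec)
            if score >= threshold:
                pairs.append((d_name, c_name, score))
    return sorted(pairs, key=lambda p: p[2], reverse=True)
```

Each returned pair is a candidate for consolidation into one entity. Raising MERGE_SCORE_THRESHOLD makes merging stricter, and lowering it merges more aggressively at the risk of joining unrelated names.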

Detailed Environment Variables

All environment variables and defaults can be configured by copying .env.example to .env.

Quick reference for all items

Variable                                Purpose/Description
GRAPH_CREATE_PROVIDER                   LLM provider for graph creation
GRAPH_ANALYSIS_PROVIDER                 LLM provider for planning/Q&A
ANTHROPIC_API_KEY                       Anthropic API key
AZURE_OPENAI_API_KEY                    Azure OpenAI API key
AZURE_OPENAI_ENDPOINT                   Azure OpenAI endpoint URL
AZURE_API_VERSION                       Azure OpenAI API version
OPENAI_API_KEY                          OpenAI API key
OPENAI_BASE_URL                         OpenAI-compatible endpoint base URL (e.g. LM Studio http://localhost:1234/v1)
GEMINI_API_KEY                          Google Gemini API key
GRAPH_CREATE_MODEL_NAME                 LLM model name for graph creation
GRAPH_ANALYSIS_MODEL_NAME               LLM model name for planning/Q&A
GRAPH_CREATE_MAX_TOKEN_SIZE             Max output tokens for LLM during graph creation
GRAPH_ANALYSIS_MAX_TOKEN_SIZE           Max output tokens for LLM during planning/Q&A
MAX_TOTAL_TOKENS                        Overall input-side token budget per planning/query (entities + relations + chunks + system prompt)
MAX_ENTITY_TOKENS                       Input-side token budget for entity context
MAX_RELATION_TOKENS                     Input-side token budget for relation context
EMBEDDING_MODEL_NAME                    Embedding model name (Hugging Face)
EMBEDDING_DIM                           Embedding vector dimension
EMBEDDING_MAX_TOKEN_SIZE                Max token length for embedding
HUGGINGFACE_HUB_TOKEN                   HF auth token (optional)
PARALLEL_NUM                            Parallelism (concurrent LLM/embedding tasks)
CHUNK_MAX_TOKENS                        Max tokens per chunk
MAX_DEPTH                               Max Tree-sitter traversal depth
RATE_LIMIT_MIN_INTERVAL                 Minimum interval between API calls (seconds)
RATE_LIMIT_ERROR_WAIT_TIME              Wait time on rate limit errors (seconds)
SEARCH_TOP_K                            Number of results to retrieve in search
SEARCH_MODE                             Search mode (naive/local/global/hybrid/mix)
DOC_EXT_TEXT_FILES                      Extensions treated as document (text) files (comma-separated)
DOC_EXT_SPECIAL_FILES                   Special filenames without extension (text) (comma-separated)
DOC_DEFINITION_LIST                     Entity types to extract from documents
NO_PROCESS_LIST                         Files/directories to exclude (comma-separated)
MERGE_ENABLED                           Enable entity merge (true/false)
MERGE_SCORE_THRESHOLD                   Cosine similarity threshold for merge
MERGE_EXCLUDE_MAGIC_METHODS             Exclusion list for magic methods
MERGE_EXCLUDE_GENERIC_TERMS             Exclusion list for generic terms
MERGE_EXCLUDE_TEST_RELATED              Exclusion list for test-related terms
MERGE_EXCLUDE_PRIVATE_ENTITIES_ENABLED  Exclude private entities (leading underscore) (true/false)
MERGE_EXCLUDE_CUSTOM_PATTERNS           Additional exclusion patterns (wildcards allowed)
MERGE_MIN_NAME_LENGTH                   Minimum entity name length for merge
MERGE_MAX_NAME_LENGTH                   Maximum entity name length for merge

🧬 Supported Languages (v0.2.5)

The following 13 languages are supported:

  • Python
  • C
  • C++
  • Rust
  • C#
  • Go
  • Ruby
  • Java
  • Kotlin
  • JavaScript
  • TypeScript
  • HTML
  • CSS

🏗️ MCP Structure

repo-graphrag-mcp/
├── README.md
├── AGENTS.md                 # MCP usage guide (for agents)
├── CHANGELOG.md              # Changelog
├── LICENSE                   # License (MIT)
├── pyproject.toml            # Package settings
├── server.py                 # MCP server entrypoint
├── .env.example              # Environment variable template
├── standalone_graph_creator.py   # Standalone graph builder
├── standalone_entity_merger.py   # Standalone entity merger
├── repo_graphrag/            # Package
│   ├── config/               # Configuration
│   ├── initialization/       # Initialization
│   ├── llm/                  # LLM clients
│   ├── processors/           # Analysis/graph building
│   ├── utils/                # Utilities
│   ├── graph_storage_creator.py  # Storage creation
│   └── prompts.py            # Prompts
└── logs/                     # Log output

🛠️ Standalone Execution

You can also run without an MCP client:

standalone_graph_creator.py - Build Knowledge Graph

Analyze a repository and create a knowledge graph:

uv run standalone_graph_creator.py <read_dir_path> <storage_name>

Examples:

uv run standalone_graph_creator.py /home/user/myproject my_storage
uv run standalone_graph_creator.py C:\\projects\\webapp webapp_storage

standalone_entity_merger.py - Entity Merge

Merge entities within an existing storage:

uv run standalone_entity_merger.py <storage_dir_path>

Examples:

uv run standalone_entity_merger.py /home/user/myproject/my_storage
uv run standalone_entity_merger.py C:\\projects\\webapp\\webapp_storage

Note:

  • The storage directory must be created beforehand by graph_create or standalone_graph_creator.py.

🙏 Acknowledgments

This MCP is built on the following libraries:

  • LightRAG
  • Tree-sitter

📄 License

This MCP is released under the MIT License. See the LICENSE file for details.
