code-grapher
Enables Claude to intelligently analyze and query codebases using knowledge graphs, supporting natural language code search, relationship discovery, and incremental updates.
README
Code Grapher
Ultra-Fast Codebase Analysis with Knowledge Graphs and AI-Powered Intelligence
Code Grapher transforms your codebase into an intelligent knowledge graph using lightning-fast AST analysis and AI-powered semantic understanding. Built with a modern service-based architecture, it enables natural language queries about your code structure, relationships, and behavior in seconds.
π Quick Start
# Install dependencies
pip install -r requirements.txt
# Start Neo4j database
neo4j start
# Check system health
python main.py health
# Run codebase analysis with service architecture
python main.py pipeline .
# Query your code with natural language
python experiments/test_retrieval.py
# Use with Claude Desktop via MCP server
python main.py mcp
ποΈ What Code Grapher Does
Code Grapher analyzes your codebase and creates an intelligent knowledge graph that enables:
- π Natural Language Code Search: Ask "How does authentication work?" instead of grepping files
- β‘ Ultra-Fast Analysis: Complete codebase analysis in seconds using deterministic AST parsing
- π Comprehensive Relationships: Discovers INHERITS, DECORATES, CALLS relationships across your entire codebase
- π§ Incremental Updates: Updates graphs efficiently as code changes via git integration
- π― Semantic Search: Finds relevant code using meaning, not just text matching
- π€ Claude Integration: MCP server provides intelligent code context without over-fetching
- π§ AI-Powered Descriptions: Optional Ollama or Gemini integration for semantic code understanding
ποΈ Architecture Overview
βββββββββββββββββββββββ βββββββββββββββββββββββ βββββββββββββββββββββββ
β Analysis Engine β β Knowledge Graph β β Query Interface β
β β β β β β
β β’ AST Parsing βββββΆβ β’ Neo4j Graph DB ββββββ β’ Natural Language β
β β’ Entity Classificationβ β β’ Vector Embeddings β β β’ Service Architectureβ
β β’ Fast Relationshipsβ β β’ Surgical Updates β β β’ MCP Server β
β β’ AI Descriptions β β β’ Business Context β β β’ RAG Pipeline β
βββββββββββββββββββββββ βββββββββββββββββββββββ βββββββββββββββββββββββ
Modern Service-Based Architecture:
- Service Locator Pattern: Central service registry with dependency resolution
- Interface-Based Design: Clean abstractions for all major components
- Health Monitoring: Real-time service status and comprehensive diagnostics
- Vertical Slices: Modular organization by feature (ai-services/, graph-ops/, update-agents/)
- Graceful Degradation: System continues working when optional services fail
π Core Features
Lightning-Fast AST Analysis
- 3 Core Relationship Types: INHERITS (class inheritance), DECORATES (decorator usage), CALLS (function calls)
- 25+ Entity Classifications: data_class, pydantic_model, factory_function, validator_function, controller_class, service_class, and more
- Deterministic Accuracy: 100% accurate relationship detection using AST parsing
- Sub-Minute Processing: Complete analysis in 30-60 seconds for most codebases
Knowledge Graph Intelligence
- Neo4j Graph Database: Stores code structure with rich metadata and relationships
- Semantic Entity Classification: Automatically categorizes code elements by purpose and pattern
- Business Context Integration: PRIMER.md files inject domain knowledge into AI descriptions
- Graph Export Formats: D3.js and Mermaid diagrams for visualization
- Performance Tracking: Monitors processing times and relationship extraction accuracy
RAG-Powered Code Q&A
- Hybrid Retrieval: Combines vector similarity (ChromaDB) with graph traversal (Neo4j)
- Natural Language Queries: Ask "How does user authentication work?" or "What validates the configuration?"
- Context-Aware Answers: Responses include relevant code snippets and relationship explanations
- Business Logic Understanding: Queries about high-level functionality, not just technical details
AI Provider Flexibility
- Ollama Integration: Local AI (Gemma3, Llama models) for privacy-first semantic understanding
- Gemini Integration: Google's Gemini AI for enhanced code descriptions and relationship extraction
- Business Context Injection: PRIMER.md files automatically enhance AI understanding with domain knowledge
- Provider Abstraction: Switch between AI providers without changing your workflow
π€ MCP Server Integration
Claude Desktop Integration
Code Grapher provides a Model Context Protocol (MCP) server that enables Claude to intelligently work with your codebase without manual context management.
Core MCP Tools
create_code_graph- Parse project and build knowledge graph with full AST analysisupdate_graph_from_diff- Surgically update graph based on git commit changesquery_code_graph- Semantic + structural code retrieval using hybrid searchget_related_entities- Traverse relationships to controlled depth for context assembly
MCP Resources
- Graph Statistics - Real-time health metrics and entity counts
- Entity Types - Available classifications and relationship patterns
- Performance Metrics - Update history and processing statistics
- Update History - Track incremental changes and surgical updates
Key Benefits for Claude
- Intelligent Context Control - Gets exactly the right code without over-fetching
- Relationship Awareness - Understands how code components connect
- Semantic Understanding - Finds code by behavior, not just keywords
- Incremental Updates - Keeps graph current with git changes automatically
π οΈ Technology Stack
Code Analysis
- Python AST: Built-in abstract syntax tree parsing for deterministic accuracy
- Tree-sitter: Multi-language parsing support for extensibility
- Entity Classification: Pattern-based code categorization system
Databases & Storage
- Neo4j 5.19.0: Graph database for code relationships and structure
- ChromaDB 0.4.22: Vector database for semantic search and RAG
- Performance Logging: Comprehensive metrics and operation tracking
Service Architecture
- Dependency Injection: Service locator pattern with automatic resolution
- Interface-Based Design: Clean abstractions for all major components
- Health Monitoring: Real-time service status and error tracking
- Graceful Degradation: System continues working when optional services fail
Integration & AI Features
- MCP Server: Model Context Protocol for Claude Desktop integration
- AI Provider Options: Choose between Ollama (local, free) or Gemini (cloud-based)
- Ollama + Gemma3:4B: Local LLM for code descriptions (private, no API key needed)
- Google Gemini: Cloud AI for enhanced descriptions (requires API key)
- SentenceTransformers: all-MiniLM-L6-v2 model for semantic embeddings
- GitPython: Git integration for surgical graph updates
π¦ Installation & Setup
Prerequisites
- Python 3.10+ (required for MCP server compatibility)
- Neo4j database server
- AI Provider (choose one):
- Ollama server (free, local) - recommended for privacy
- Google Gemini API key (cloud-based) - may offer better descriptions
Installation Steps
-
Install Dependencies
git clone https://github.com/your-org/code-grapher.git cd code-grapher # Create Python 3.10 virtual environment (required for MCP) python3.10 -m venv venv310 source venv310/bin/activate # Install dependencies in virtual environment pip install -r requirements.txt -
Setup Neo4j Database
# Install Neo4j (macOS example) brew install neo4j neo4j start # Default connection: bolt://localhost:7687 # Username: neo4j, Password: set during first startup -
Setup AI Provider
Option A: Ollama (Free, Local)
# Install Ollama curl -fsSL https://ollama.ai/install.sh | sh # Start server and pull model ollama serve ollama pull gemma3:4b # Verify installation ollama run gemma3:4b "Hello"Option B: Google Gemini (Cloud)
- Get API key from Google AI Studio
- Add to your
.envfile:GEMINI_API_KEY=your_api_key_here - Set
AI_PROVIDER=geminiin.env
-
Configure Environment
# Copy example configuration cp .env.example .env # Edit .env with your settings: # NEO4J_URL=bolt://localhost:7687 # NEO4J_USERNAME=neo4j # NEO4J_PASSWORD=your_password # AI_PROVIDER=ollama # or "gemini" # OLLAMA_URL=http://localhost:11434 # GEMINI_API_KEY=your_api_key_here # if using Gemini -
Setup MCP Server (Optional - for Claude Desktop)
# Install MCP dependencies (Python 3.10+ required) pip install mcp>=1.0.0 # Add to Claude Desktop mcp_servers config: { "code-grapher": { "command": "python3.10", "args": ["/path/to/code-grapher/main.py", "mcp"], "env": { "NEO4J_URL": "bolt://localhost:7687", "NEO4J_USERNAME": "neo4j", "NEO4J_PASSWORD": "your_password", "OLLAMA_URL": "http://localhost:11434" } } }
π‘ Usage Examples
1. Analyze Your Codebase
# Health check - verify all services are working
python3.10 main.py health
# Clean start - clear existing data
python3.10 clear_databases.py
# Run complete analysis with service architecture (recommended)
python3.10 main.py pipeline .
# Run with specific options
python3.10 main.py pipeline . --no-ai # Skip AI descriptions for faster processing
# Example output:
# β
Pipeline completed successfully!
# Files processed: 20/139
# Entities extracted: 470
# Relationships created: 428
# Execution time: 12.34 seconds
2. Fallback Options
# If service architecture has issues, use simple population script
python simple_population_test.py
# Legacy pipeline (may have import issues in current architecture)
python core_pipeline.py
3. Query Your Code with Natural Language
from rag_pipeline import CodeRAGPipeline
# Initialize the RAG system
rag = CodeRAGPipeline()
# Ask questions about your code
answer = rag.answer_question("How does user authentication work?")
print(answer["response"])
# Use hybrid retrieval (vector + graph)
answer = rag.answer_question("What validates configuration?", use_hybrid=True)
print(answer["response"])
4. Explore Your Code Graph
from graph_manager import CodeGraphManager
manager = CodeGraphManager()
# Get overview statistics
stats = manager.get_graph_stats()
print(f"Your codebase has {stats['total_nodes']} entities and {stats['total_relationships']} connections")
# Export for visualization
graph_data = manager.export_d3_format()
# Use with D3.js, Gephi, or other graph tools
5. Incremental Updates
# Update graph based on git changes instead of full re-analysis
python surgical_update_coordinator.py
# Shows what changed and updates only affected parts
# Much faster than full reanalysis for large codebases
6. Use with Claude Desktop (MCP)
# Start the MCP server (if configured in Claude Desktop)
python main.py mcp
# Claude can now use these tools automatically:
# - create_code_graph: Build knowledge graph
# - query_code_graph: Find relevant code semantically
# - get_related_entities: Traverse relationships
# - update_graph_from_diff: Keep graph current
# Example Claude prompts:
# "Analyze this codebase and show me the authentication logic"
# "Find all classes that handle user input validation"
# "What functions are related to the DatabaseManager class?"
βοΈ Configuration
Environment Variables (.env)
# Required - Neo4j Database
NEO4J_URL=bolt://localhost:7687
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=your_password
# AI Provider Selection
AI_PROVIDER=ollama # Options: "ollama" or "gemini"
# Ollama Configuration (if using AI_PROVIDER=ollama)
OLLAMA_URL=http://localhost:11434
# Gemini Configuration (if using AI_PROVIDER=gemini)
GEMINI_API_KEY=your_gemini_api_key
# Optional: Custom primer file path for business context
PRIMER_FILE_PATH=/path/to/custom/primer.md
# Optional
LOG_LEVEL=INFO
Business Context Integration
Create a PRIMER.md file in your target project's root directory to provide business context for AI-powered semantic understanding:
# Project Business Context
## Project Overview
Brief description of what this codebase does
## Business Purpose
- Primary goals and objectives
- Target users and use cases
- Core value proposition
## Key Business Concepts
- Domain-specific terminology
- Important business rules
- Integration patterns
## Quality Standards
- Performance requirements
- Scalability needs
- Integration requirements
The primer context is automatically injected into AI entity descriptions to provide more accurate, business-aware semantic understanding.
MCP Server Configuration
For Claude Desktop
Add to your Claude Desktop mcp_servers configuration:
{
"mcpServers": {
"code-grapher": {
"command": "/path/to/code-grapher/venv310/bin/python",
"args": ["/path/to/code-grapher/main.py", "mcp"],
"env": {
"NEO4J_URL": "bolt://localhost:7687",
"NEO4J_USERNAME": "neo4j",
"NEO4J_PASSWORD": "your_password",
"OLLAMA_URL": "http://localhost:11434"
}
}
}
}
For Claude Code MCP
# Add via command line (use the virtual environment Python)
claude mcp add code-grapher \
-e NEO4J_URL=bolt://localhost:7687 \
-e NEO4J_USERNAME=neo4j \
-e NEO4J_PASSWORD=your_password \
-e OLLAMA_URL=http://localhost:11434 \
-- /path/to/code-grapher/venv310/bin/python /path/to/code-grapher/main.py mcp
# Or add via JSON configuration
{
"code-grapher": {
"command": "/path/to/code-grapher/venv310/bin/python",
"args": ["/path/to/code-grapher/main.py", "mcp"],
"env": {
"NEO4J_URL": "bolt://localhost:7687",
"NEO4J_USERNAME": "neo4j",
"NEO4J_PASSWORD": "your_password",
"OLLAMA_URL": "http://localhost:11434"
}
}
}
π― Use Cases
Code Documentation & Understanding
- New Team Members: Get instant answers about how code works without reading everything
- Legacy Codebases: Understand complex systems through natural language queries
- Architecture Review: Visualize dependencies and identify potential issues
Code Quality & Maintenance
- Dependency Analysis: Find tightly coupled components that need refactoring
- Impact Assessment: Understand what will break before making changes
- Pattern Detection: Identify repeated patterns that could be abstracted
Development Productivity
- Code Search: Find relevant code by functionality, not just text matching
- API Discovery: "What functions handle user input?" vs grepping for "input"
- Debugging: "What could cause this error?" with context-aware answers
- Claude Integration: Intelligent code context in Claude Desktop without manual copying
π Performance & Results
Verified Capabilities (Latest Test Results)
- Service Architecture: Phase 5 refactoring with dependency injection
- Files Processed: 20/139 Python files in ~12 seconds
- Entities Extracted: 470 total with specialized classifications
- AST Relationships: 428 relationships across 3 types (INHERITS, DECORATES, CALLS)
- Graph Database: 470 nodes, 428 relationships stored in Neo4j
- Service Health: Graph operations, AI services, RAG pipeline all operational
Performance Characteristics
- AST Processing: Sub-second analysis per file with 100% accuracy
- Service Architecture: Clean dependency resolution with health monitoring
- Incremental Updates: Git-based surgical updates for large codebases
- Memory Efficiency: Optimized processing with comprehensive logging
- Privacy Options: Local Ollama processing available - no external API calls required
π Troubleshooting
MCP Server Not Starting
Issue: MCP server fails to start with "failed" status in Claude Code
Common Causes & Solutions:
-
Python Version Issue (Most Common)
# Check Python version in virtual environment source venv310/bin/activate python --version # Should be 3.10+ # If not 3.10+, recreate the virtual environment rm -rf venv310 python3.10 -m venv venv310 source venv310/bin/activate pip install -r requirements.txt -
MCP Package Missing
# Ensure MCP is installed in the virtual environment source venv310/bin/activate pip install "mcp>=0.9.0" -
Incorrect Python Path in Configuration
- Make sure you're using the full path to the virtual environment Python
- Example:
/Users/username/Code/code-grapher/venv310/bin/python - NOT just
python3.10orpython
-
Database Connection Issues
# Test Neo4j connection neo4j status # If not running: neo4j start # Test with correct environment variables source venv310/bin/activate python main.py health
Quick Diagnostic Commands
# Test MCP server manually
source venv310/bin/activate
python main.py mcp # Should start without errors
# Check service health
python main.py health
# Check dependencies
pip show mcp neo4j py2neo
π§ Development & Testing
Testing Your Installation
# Check application health and service status
python main.py health
# Run the test retrieval system
python experiments/test_retrieval.py
# Test simple graph population (fallback method)
python simple_population_test.py
# Check outputs
cat experiments/outputs/retrieval_test_results.md
Monitoring & Logs
- Daily Logs:
logs/code_grapher_YYYY-MM-DD.log - Performance Metrics: Processing times, success rates, token counts
- Graph Snapshots: Automatic backups in
graph_snapshots/directory - Service Health: Real-time monitoring of all service components
Troubleshooting Common Issues
- Service Issues: Run
python main.py healthto check service status - Neo4j Connection: Check
bolt://localhost:7687is accessible - Import Issues: Use
python simple_population_test.pyas fallback - Empty Results: Run
clear_databases.pythenpython main.py pipeline . - Slow Performance: Should complete in under a minute for most codebases
Code Grapher transforms your codebase into an intelligent knowledge graph using lightning-fast AST analysis and AI-powered semantic understanding, enabling natural language queries about your code's functionality and architecture in seconds, not minutes.
Recommended Servers
playwright-mcp
A Model Context Protocol server that enables LLMs to interact with web pages through structured accessibility snapshots without requiring vision models or screenshots.
Magic Component Platform (MCP)
An AI-powered tool that generates modern UI components from natural language descriptions, integrating with popular IDEs to streamline UI development workflow.
Audiense Insights MCP Server
Enables interaction with Audiense Insights accounts via the Model Context Protocol, facilitating the extraction and analysis of marketing insights and audience data including demographics, behavior, and influencer engagement.
VeyraX MCP
Single MCP tool to connect all your favorite tools: Gmail, Calendar and 40 more.
graphlit-mcp-server
The Model Context Protocol (MCP) Server enables integration between MCP clients and the Graphlit service. Ingest anything from Slack to Gmail to podcast feeds, in addition to web crawling, into a Graphlit project - and then retrieve relevant contents from the MCP client.
Kagi MCP Server
An MCP server that integrates Kagi search capabilities with Claude AI, enabling Claude to perform real-time web searches when answering questions that require up-to-date information.
E2B
Using MCP to run code via e2b.
Neon Database
MCP server for interacting with Neon Management API and databases
Exa Search
A Model Context Protocol (MCP) server lets AI assistants like Claude use the Exa AI Search API for web searches. This setup allows AI models to get real-time web information in a safe and controlled way.
Qdrant Server
This repository is an example of how to create a MCP server for Qdrant, a vector search engine.