Code Graph Knowledge System
Enterprise knowledge management platform with Neo4j graph database, multi-interface architecture (MCP/Web/REST), and intelligent code analysis for modern software development teams.
Overview
Code Graph Knowledge System is a production-ready platform that transforms code repositories and development documentation into a queryable knowledge graph. Built on Neo4j's graph database technology and powered by large language models, the system provides three distinct interfaces for different use cases: MCP protocol for AI assistants, Web UI for human users, and REST API for programmatic access.
The platform combines vector search, graph traversal, and LLM-driven analysis to deliver code intelligence capabilities including repository analysis, dependency mapping, impact assessment, and automated documentation generation.
Core Capabilities
Multi-Interface Architecture
MCP Protocol (Port 8000) - Model Context Protocol server for AI assistant integration
- Direct integration with Claude Desktop, Cursor, and other MCP-compatible tools
- 25+ specialized tools for code analysis and knowledge management
- Real-time task monitoring via Server-Sent Events
- Supports stdio and SSE transport modes
Web UI (Port 8080) - Browser-based interface for team collaboration
- Real-time task monitoring dashboard
- Repository ingestion and management
- Metrics visualization with interactive charts
- Built with React 18, TypeScript, and shadcn/ui components
REST API (Ports 8000, 8080) - HTTP endpoints for system integration
- Document ingestion and knowledge querying
- Task management and monitoring
- Prometheus metrics export
- OpenAPI/Swagger documentation
Knowledge Graph Engine
Code Intelligence - Graph-based code analysis without requiring LLMs
- Repository structure mapping and dependency tracking
- Function and class relationship analysis
- Impact analysis for code changes
- Context pack generation for AI assistants
- Support for 15+ programming languages
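At its core, impact analysis of the kind listed above reduces to a reverse-dependency traversal over the code graph. A minimal sketch in plain Python (the element names and dependency data are illustrative, not the project's actual graph model):

```python
from collections import deque

def impact_set(dependents: dict[str, list[str]], changed: str) -> set[str]:
    """Breadth-first walk over reverse dependencies: everything that
    transitively depends on the changed element is potentially impacted."""
    impacted: set[str] = set()
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for dep in dependents.get(node, []):
            if dep not in impacted:
                impacted.add(dep)
                queue.append(dep)
    return impacted

# Edges point from an element to the elements that depend on it (hypothetical names)
dependents = {
    "auth.verify": ["api.login", "api.refresh"],
    "api.login": ["web.session"],
}
print(sorted(impact_set(dependents, "auth.verify")))
```

In the real system this traversal runs inside Neo4j via Cypher rather than in application code, which is what makes it possible without an LLM.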
Memory Store - Project knowledge tracking with temporal awareness
- Fact, decision, pattern, and insight recording
- Memory evolution with superseding relationships
- Automatic extraction from conversations, commits, and code
- Vector search with embedding-based retrieval
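The embedding-based retrieval above boils down to ranking stored memory vectors by similarity to a query vector. A toy sketch with cosine similarity (the memory labels and 3-dimensional vectors are invented for illustration; real embeddings have hundreds of dimensions, e.g. the configured VECTOR_DIMENSION of 384):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product normalized by both vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def top_k(query: list[float], memories: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the k stored memories most similar to the query embedding."""
    ranked = sorted(memories.items(), key=lambda kv: cosine(query, kv[1]), reverse=True)
    return [name for name, _ in ranked[:k]]

memories = {
    "decision: use Neo4j": [0.9, 0.1, 0.0],
    "fact: API uses JWT": [0.1, 0.9, 0.1],
    "pattern: repository layer": [0.8, 0.2, 0.1],
}
print(top_k([1.0, 0.0, 0.0], memories))
```

In production this ranking is delegated to Neo4j's native vector index rather than computed in Python.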
Knowledge RAG - Document processing with hybrid search
- Multi-format document ingestion (Markdown, PDF, code files)
- Neo4j native vector indexing
- Hybrid search combining vector similarity and graph traversal
- Configurable chunking and embedding strategies
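The chunking strategy mirrors the CHUNK_SIZE / CHUNK_OVERLAP settings: fixed-size windows whose tails overlap so that context is not lost at chunk boundaries. A simplified character-based sketch (the actual pipeline chunks via LlamaIndex, typically on token or sentence boundaries):

```python
def chunk(text: str, size: int = 512, overlap: int = 50) -> list[str]:
    """Split text into fixed-size chunks; each chunk repeats the last
    `overlap` characters of its predecessor."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

pieces = chunk("a" * 1000, size=512, overlap=50)
print(len(pieces), len(pieces[0]))
```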
SQL Schema Parser - Database schema analysis with business domain classification
- Multi-dialect support (Oracle, MySQL, PostgreSQL, SQL Server)
- Configurable business domain templates (Insurance, E-commerce, Banking, Healthcare)
- Automated relationship detection and documentation generation
- Integration with knowledge graph for cross-referencing
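Relationship detection in DDL can be illustrated with a naive pattern match for foreign-key references. This is only a sketch of the idea; a real multi-dialect parser (Oracle, MySQL, PostgreSQL, SQL Server) needs a proper grammar, and the table below is invented:

```python
import re

DDL = """
CREATE TABLE orders (
    id INT PRIMARY KEY,
    customer_id INT REFERENCES customers(id),
    CONSTRAINT fk_item FOREIGN KEY (item_id) REFERENCES items(id)
);
"""

# Matches both inline and constraint-level REFERENCES clauses
FK = re.compile(r"REFERENCES\s+(\w+)\s*\((\w+)\)", re.IGNORECASE)

print(FK.findall(DDL))
```

Each `(table, column)` pair found this way becomes a relationship edge that can be cross-referenced in the knowledge graph.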
Technology Stack
Backend Infrastructure
- FastAPI - High-performance async web framework
- Neo4j 5.x - Graph database with native vector indexing
- Python 3.13+ - Modern Python with type hints
- Uvicorn - ASGI server with WebSocket support
AI and ML Integration
- LlamaIndex - Document processing and retrieval pipeline
- Multiple LLM providers (Ollama, OpenAI, Gemini, OpenRouter)
- Flexible embedding models (HuggingFace, Ollama, OpenAI)
- Model Context Protocol (MCP) for AI assistant integration
Frontend Technology
- React 18 - Modern UI library with concurrent features
- TypeScript - Type-safe development
- TanStack Router - Type-safe routing
- shadcn/ui - Accessible component library
- Vite - Fast build tooling
Quick Start
Prerequisites
- Python 3.13 or higher
- Neo4j 5.0 or higher
- Docker (optional, for containerized deployment)
- Node.js 18+ (for frontend development)
Querying Knowledge
import httpx

# Query the knowledge base
response = httpx.post("http://localhost:8000/api/v1/knowledge/query", json={
"question": "How does the authentication system work?",
"mode": "hybrid", # or "graph_only", "vector_only"
"use_tools": False,
"top_k": 5
})
# Search similar documents
response = httpx.post("http://localhost:8000/api/v1/knowledge/search", json={
"query": "user authentication",
"top_k": 10
})
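A small helper can assemble and validate the query payload before sending it, catching an invalid mode client-side instead of waiting for an HTTP error. The valid modes are taken from the example above; the helper itself is a hypothetical convenience, not part of the API:

```python
VALID_MODES = {"hybrid", "graph_only", "vector_only"}

def build_query(question: str, mode: str = "hybrid",
                top_k: int = 5, use_tools: bool = False) -> dict:
    """Assemble the JSON body for POST /api/v1/knowledge/query."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}")
    return {"question": question, "mode": mode,
            "use_tools": use_tools, "top_k": top_k}

payload = build_query("How does the authentication system work?")
# httpx.post("http://localhost:8000/api/v1/knowledge/query", json=payload)
```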
Installation
Clone the repository and install dependencies:
git clone https://github.com/royisme/codebase-rag.git
cd codebase-rag
pip install -r requirements.txt
# or using uv (recommended)
uv pip install -e .
Configure environment variables:
cp env.example .env
# Edit .env with your Neo4j credentials and LLM provider settings
Start Neo4j database:
docker run --name neo4j-code-graph \
-p 7474:7474 -p 7687:7687 \
-e NEO4J_AUTH=neo4j/password \
-e NEO4J_PLUGINS='["apoc"]' \
neo4j:5.15
Running the System
Complete System (MCP + Web UI + REST API)
python start.py
Access points:
- MCP SSE Service: http://localhost:8000/sse
- Web UI: http://localhost:8080
- REST API Documentation: http://localhost:8080/docs
- Prometheus Metrics: http://localhost:8080/metrics
MCP Server Only
python start_mcp.py
Docker Deployment
Three deployment modes are available:
Minimal Mode - Code Graph only (no LLM required)
make docker-minimal
Standard Mode - Code Graph + Memory Store (embedding model required)
make docker-standard
Full Mode - All features (LLM + embedding required)
make docker-full
Usage Examples
MCP Integration
Configure in Claude Desktop or compatible MCP client:
{
"mcpServers": {
"code-graph": {
"command": "python",
"args": ["/path/to/start_mcp.py"],
"cwd": "/path/to/codebase-rag"
}
}
}
Available MCP tools include:
- code_graph_ingest_repo - Ingest code repository
- code_graph_related - Find related code elements
- code_graph_impact - Analyze change impact
- query_knowledge - Query knowledge base
- add_memory - Store project knowledge
- extract_from_conversation - Extract insights from chat
- watch_task - Monitor task progress
REST API
Ingest a repository:
curl -X POST http://localhost:8080/api/v1/repositories/ingest \
-H "Content-Type: application/json" \
-d '{
"url": "https://github.com/user/repo.git",
"mode": "incremental",
"languages": ["python", "typescript"]
}'
Query knowledge base:
curl -X POST http://localhost:8080/api/v1/knowledge/query \
-H "Content-Type: application/json" \
-d '{
"question": "How does authentication work in this codebase?",
"mode": "hybrid",
"top_k": 5
}'
Monitor tasks:
curl "http://localhost:8080/api/v1/tasks?status=processing"
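For long-running ingestion jobs, a client can poll the tasks endpoint until the task settles. A minimal polling sketch, where the `get_status` callable stands in for an HTTP call and the status names ("pending", "processing") are assumptions about the API's task states:

```python
import time

def wait_for(get_status, timeout: float = 60.0, interval: float = 0.0) -> str:
    """Poll a status callable until the task leaves the pending/processing
    states, or raise if the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_status()
        if status not in ("pending", "processing"):
            return status
        time.sleep(interval)
    raise TimeoutError("task still processing")

# Stand-in for an HTTP call: completes on the third poll
states = iter(["pending", "processing", "completed"])
print(wait_for(lambda: next(states)))
```

For push-style updates, the MCP SSE endpoint on port 8000 avoids polling entirely.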
Web UI
Navigate to http://localhost:8080 to access:
- Dashboard - System health and quick actions
- Tasks - Real-time task monitoring with progress indicators
- Repositories - Repository management and ingestion
- Metrics - System performance and usage metrics
Configuration
Key environment variables:
# Server Ports
MCP_PORT=8000 # MCP SSE service
WEB_UI_PORT=8080 # Web UI and REST API
# Neo4j Configuration
NEO4J_URI=bolt://localhost:7687
NEO4J_USER=neo4j
NEO4J_PASSWORD=password
NEO4J_DATABASE=neo4j
# LLM Provider (ollama, openai, gemini, openrouter)
LLM_PROVIDER=ollama
OLLAMA_HOST=http://localhost:11434
OLLAMA_MODEL=llama3.2
# Embedding Provider (ollama, openai, gemini, openrouter)
EMBEDDING_PROVIDER=ollama
OLLAMA_EMBEDDING_MODEL=nomic-embed-text
# Processing Configuration
CHUNK_SIZE=512
CHUNK_OVERLAP=50
TOP_K=5
VECTOR_DIMENSION=384
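A sketch of how these variables might be consumed in application code, with the documented values as fallbacks. This is illustrative only; the project may use a dedicated settings layer (e.g. pydantic-settings) rather than raw `os.getenv`:

```python
import os
from dataclasses import dataclass

@dataclass
class Settings:
    """Reads the environment variables above; env values are always strings,
    so numeric settings must be converted explicitly."""
    neo4j_uri: str = os.getenv("NEO4J_URI", "bolt://localhost:7687")
    llm_provider: str = os.getenv("LLM_PROVIDER", "ollama")
    chunk_size: int = int(os.getenv("CHUNK_SIZE", "512"))
    chunk_overlap: int = int(os.getenv("CHUNK_OVERLAP", "50"))

settings = Settings()
print(settings.llm_provider, settings.chunk_size)
```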
For complete configuration options, see Configuration Guide.
Architecture
Dual-Server Design
The system employs a dual-server architecture optimized for different access patterns:
Port 8000 (Primary) - MCP SSE Service
- Server-Sent Events endpoint for real-time communication
- Optimized for AI assistant integration
- Handles long-running task monitoring
- WebSocket support for bidirectional communication
Port 8080 (Secondary) - Web UI + REST API
- React-based monitoring interface
- RESTful API for external integrations
- Prometheus metrics endpoint
- Static file serving for frontend
Both servers share the same backend services and Neo4j database, ensuring consistency across all interfaces.
Component Architecture
┌─────────────────────────────────────────────────────────┐
│ Client Interfaces │
├──────────────┬──────────────┬──────────────────────────┤
│ MCP Client │ Web UI │ REST API │
│ (AI Tools) │ (Browser) │ (External Systems) │
└──────┬───────┴──────┬───────┴──────────┬───────────────┘
│ │ │
└──────────────┼──────────────────┘
│
┌──────────────▼──────────────┐
│ FastAPI Application │
├──────────────┬──────────────┤
│ Services │ Task Queue │
└──────┬───────┴──────┬───────┘
│ │
┌──────▼──────┐ ┌───▼────┐
│ Neo4j │ │ LLM │
│ Database │ │Provider│
└─────────────┘ └────────┘
Development
Project Structure
codebase-rag/
├── src/codebase_rag/
│ ├── api/ # FastAPI routes
│ ├── core/ # Application core
│ ├── services/ # Business logic
│ │ ├── code_ingestor.py # Code repository processing
│ │ ├── graph_service.py # Graph operations
│ │ ├── memory_store.py # Project memory management
│ │ ├── neo4j_knowledge_service.py # Knowledge base
│ │ ├── task_queue.py # Async task processing
│ │ └── sql/ # SQL parsing services
│ └── mcp/ # MCP protocol handlers
├── frontend/ # React Web UI
│ ├── src/
│ │ ├── components/ # UI components
│ │ ├── routes/ # Page routes
│ │ └── lib/ # API client
│ └── package.json
├── tests/ # Test suite
├── docs/ # Documentation
└── scripts/ # Utility scripts
Running Tests
# Backend tests
pytest tests/ -v
# Frontend tests
cd frontend && npm test
# Integration tests (requires Neo4j)
pytest tests/ -m integration
# Coverage report
pytest tests/ --cov=src --cov-report=html
Code Quality
# Format code
black .
isort .
# Linting
ruff check .
ruff check . --fix
# Type checking
mypy src/
Frontend Development
cd frontend
npm install
npm run dev # Start dev server at http://localhost:3000
npm run build # Build for production
npm run lint # Check for issues
npm test # Run tests
Deployment
Production Deployment
See Docker Deployment Guide for production deployment configurations including:
- Multi-stage Docker builds
- Environment-specific configurations
- Scaling and load balancing
- Security best practices
- Monitoring and logging setup
System Requirements
Minimum Configuration
- CPU: 2 cores
- RAM: 4 GB
- Storage: 10 GB
Recommended Configuration
- CPU: 4+ cores
- RAM: 8+ GB
- Storage: 50+ GB SSD
- Network: 100 Mbps+
Documentation
Complete documentation is available at https://vantagecraft.dev/docs/code-graph
Key Documentation Sections
- Quick Start Guide - Get up and running in 5 minutes
- Architecture Overview - System design and components
- MCP Integration - AI assistant integration
- REST API Reference - Complete API documentation
- Deployment Guide - Production deployment
- Development Guide - Contributing and development
Community and Support
- Documentation: Complete Documentation
- Neo4j Guide: README_Neo4j.md
- Issues: GitHub Issues
- Discussions: GitHub Discussions
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
Built with excellent open source technologies:
- Neo4j - Graph database platform
- LlamaIndex - Data framework for LLM applications
- FastAPI - Modern web framework for Python
- React - Library for building user interfaces
- Model Context Protocol - AI assistant integration standard