MCP Codebase RAG Server
Provides semantic vector search over local codebases via MCP, enabling hybrid search (dense + sparse + RRF) for any MCP client like GitHub Copilot or Claude Desktop.
Self-hosted MCP server that adds semantic vector search over your local codebases to any MCP-capable client (GitHub Copilot, Cline, Claude Desktop, etc.).
Goal: Robust RAG for Copilot (or any MCP client) without paying for Cursor/Windsurf.
Zero cost. Zero limits. Full control.
Overview
Problem Solved
- GitHub Copilot Pro has an excellent model but limited codebase RAG
- Cursor/Windsurf have good RAG but cost $15-20/month
- Continue.dev has RAG but doesn't integrate natively with MCP-aware agents
Solution
This MCP server provides:
- Indexing of local codebases using vector embeddings
- Hybrid semantic search via the search_codebase tool: combines dense (embeddings) + sparse (BM25) results with RRF fusion
- Multi-project support with isolated ChromaDB collections
- Universal integration with any MCP client
Tech Stack
| Component | Technology | Why |
|---|---|---|
| Embeddings | sentence-transformers (all-MiniLM-L6-v2) | Fast, lightweight, 384-dim |
| Code embeddings | microsoft/unixcoder-base (optional) | Code-specific model, activated via EMBEDDING_MODEL |
| Vector DB | ChromaDB | Simple, persistent, zero config |
| Code parsing | Tree-sitter + BM25 + RRF | Universal language-agnostic chunking and hybrid search |
| MCP SDK | modelcontextprotocol/python-sdk | Official standard |
| Runtime | Python 3.11+ | - |
Installation
Prerequisites
- Python 3.11+
- pip or uv
Install
git clone https://github.com/di5rupt0r/codebase-rag.git
cd codebase-rag
# Install as a package (adds the `codebase-rag` command to ~/.local/bin)
pip install -e .
Health Check
python scripts/health_check.py
Expected output:
MCP Codebase RAG Server Health Check
==================================================
Checking Embedding Provider... OK (3.03s)
Checking ChromaDB Connection... OK (0.14s)
Checking Search Functionality... OK (2.36s)
Checking Data Directory... OK (0.00s)
==================================================
Health Check Summary: 4/4 checks passed
All systems operational!
Quick Start
1. Index a Project
# Index the current directory
python scripts/index_project.py . --name my-project
# Index a specific path
python scripts/index_project.py ~/projects/api --name api-backend
# Force full reindex
python scripts/index_project.py . --name my-project --force
# Dry run to preview what will be indexed
python scripts/index_project.py . --name my-project --dry-run
2. Start the MCP Server
stdio (default, for local clients)
codebase-rag
HTTP (for remote clients or always-on service)
MCP_TRANSPORT=streamable-http MCP_PORT=8080 codebase-rag
3. Configure Your MCP Client
VS Code (GitHub Copilot / Cline), stdio mode
Add to your VS Code mcp.json:
{
  "servers": {
    "codebase-rag": {
      "type": "stdio",
      "command": "codebase-rag"
    }
  }
}
VS Code, HTTP mode (when running as a service)
{
  "servers": {
    "codebase-rag": {
      "type": "http",
      "url": "http://127.0.0.1:8080/mcp"
    }
  }
}
Claude Desktop
{
  "mcpServers": {
    "codebase-rag": {
      "command": "codebase-rag"
    }
  }
}
Search Capabilities
Hybrid Search Architecture
The server implements a hybrid search system that combines:
- Dense Search (Vector Embeddings)
  - Semantic similarity using sentence-transformers
  - Finds conceptually similar code
  - Base: ChromaDB vector similarity
- Sparse Search (BM25)
  - Exact lexical term matching
  - Finds precise identifiers and keywords
  - Base: rank-bm25 with regex tokenization
- Reciprocal Rank Fusion (RRF)
  - Intelligent fusion of dense + sparse results
  - k=60 (standard literature value)
  - Improves both precision and recall
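As a rough illustration of the fusion step, the sketch below applies Reciprocal Rank Fusion to two ranked lists of chunk IDs with k=60. The helper name `rrf_fuse` and the chunk IDs are hypothetical and not taken from this project's source; the server's own implementation may differ in detail.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse ranked lists of chunk IDs with Reciprocal Rank Fusion.

    Each chunk scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Hypothetical helper, not the project's API.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by chunk ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


dense = ["auth.py:15", "session.py:3", "db.py:40"]    # e.g. from vector search
sparse = ["utils.py:8", "auth.py:15", "session.py:3"]  # e.g. from BM25
fused = rrf_fuse([dense, sparse])

for chunk_id, score in fused:
    print(f"{chunk_id}\t{score:.4f}")
```

A chunk ranked first by one retriever and second by the other scores 1/61 + 1/62 ≈ 0.0325, the same magnitude as the `score` in the Search Results example. RRF scores are therefore small numbers that only matter relative to each other, not similarity values between 0 and 1.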
Search Results
{
  "results": [
    {
      "path": "src/auth.py",
      "content": "def authenticate_user(user, password): ...",
      "score": 0.0325,
      "type": "function",
      "name": "authenticate_user",
      "line_start": 15,
      "line_end": 25
    }
  ],
  "total_indexed_chunks": 1247,
  "query_time_ms": 23.4,
  "search_type": "hybrid_rrf"
}
Performance Characteristics
| Metric | Target | Description |
|---|---|---|
| Tree-sitter parsing | < 50ms/file | Universal language parsing |
| BM25 indexing | < 10ms/query | In-memory reconstruction |
| RRF fusion | < 1ms | In-memory score calculation |
| Total query time | < 100ms | End-to-end hybrid search |
| Memory overhead | < 50MB | For 5k chunks |
Fallback Behavior
- Tree-sitter unavailable → Line-based chunking
- BM25 unavailable → Dense-only search
- Both unavailable → Original dense search with keyword reranking
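The fallback rules above can be sketched as a small decision function. This is only an illustration: the mode labels are made up for the example, and the check assumes the optional packages import as `tree_sitter` and `rank_bm25`.

```python
import importlib.util


def has_module(name):
    """Return True if the module can be imported in this environment."""
    return importlib.util.find_spec(name) is not None


def select_strategies(tree_sitter_ok, bm25_ok):
    """Map dependency availability to (chunking, search) modes.

    The mode names are illustrative labels, not identifiers from the project.
    """
    chunking = "tree_sitter" if tree_sitter_ok else "line_based"
    if bm25_ok:
        search = "hybrid_rrf"
    elif tree_sitter_ok:
        search = "dense_only"
    else:
        search = "dense_keyword_rerank"
    return chunking, search


# Probe the optional dependencies once at startup.
modes = select_strategies(has_module("tree_sitter"), has_module("rank_bm25"))
print(modes)
```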
MCP Tools
search_codebase
Hybrid semantic search over an indexed project using vector embeddings + BM25 + RRF fusion.
Input:
{
  "query": "where is the authentication logic?",
  "top_k": 5,
  "project": "my-project",
  "file_types": [".py", ".js"]
}
Output:
{
  "results": [
    {
      "path": "src/auth.py",
      "content": "def authenticate_user(user, password):\n ...",
      "score": 0.89
    }
  ],
  "total_indexed_chunks": 1247,
  "query_time_ms": 23
}
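MCP clients normally build the request for you, but it can help to see what goes over the wire. The sketch below assembles the JSON-RPC 2.0 `tools/call` message a client would send for the input above. `build_tool_call` is a hypothetical helper for illustration only, not part of this server or the MCP SDK.

```python
import json


def build_tool_call(request_id, tool_name, arguments):
    """Build an MCP `tools/call` JSON-RPC 2.0 request as a dict."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


request = build_tool_call(
    1,
    "search_codebase",
    {
        "query": "where is the authentication logic?",
        "top_k": 5,
        "project": "my-project",
        "file_types": [".py", ".js"],
    },
)
print(json.dumps(request, indent=2))
```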
reindex_project
Re-index a project after large changes.
Input:
{
  "project_path": "/path/to/your/project",
  "project_name": "my-project",
  "force": false
}
list_indexed_projects
List all indexed projects.
get_files
List indexed files in a project.
Input: { "project": "my-project" }
get_file_content
Return the full content of an indexed file.
Input: { "path": "src/main.py" }
Configuration
Environment Variables
# ChromaDB path (default: ./data/chroma_db relative to install dir)
export CHROMA_DB_PATH="/custom/path/to/chroma"
# Embedding model (default: all-MiniLM-L6-v2)
# Use microsoft/unixcoder-base for better code-specific embeddings (~2GB, requires torch)
export EMBEDDING_MODEL="microsoft/unixcoder-base"
# HTTP transport settings (only needed in HTTP/service mode)
export MCP_TRANSPORT="streamable-http"
export MCP_HOST="127.0.0.1"
export MCP_PORT="8080"
# Set this when exposing via reverse proxy or Tailscale Funnel
export MCP_ALLOWED_HOST="your-hostname.example.com"
# Log level (default: INFO)
export LOG_LEVEL="DEBUG"
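For orientation, here is a minimal sketch of how these variables could be read with defaults. The `Settings` class and `load_settings` function are hypothetical. The defaults for `CHROMA_DB_PATH`, `EMBEDDING_MODEL`, `LOG_LEVEL`, and the stdio transport come from this README, while the host and port defaults are assumptions based on the example values above.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    chroma_db_path: str
    embedding_model: str
    transport: str
    host: str
    port: int
    log_level: str


def load_settings(env=None):
    """Read settings from an environment mapping, applying defaults."""
    env = os.environ if env is None else env
    return Settings(
        chroma_db_path=env.get("CHROMA_DB_PATH", "./data/chroma_db"),
        embedding_model=env.get("EMBEDDING_MODEL", "all-MiniLM-L6-v2"),
        transport=env.get("MCP_TRANSPORT", "stdio"),
        host=env.get("MCP_HOST", "127.0.0.1"),   # assumed default
        port=int(env.get("MCP_PORT", "8080")),   # assumed default
        log_level=env.get("LOG_LEVEL", "INFO").upper(),
    )


settings = load_settings({"MCP_TRANSPORT": "streamable-http", "MCP_PORT": "9000"})
print(settings)
```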
Chunking (Advanced)
Edit src/codebase_rag/config.py:
CHUNK_SIZE = 500 # characters per chunk
CHUNK_OVERLAP = 50 # overlap between chunks
DEFAULT_TOP_K = 5 # default results per search
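To show how `CHUNK_SIZE` and `CHUNK_OVERLAP` interact, the sketch below splits text into fixed-size character windows that share an overlap. It is an assumption about how a character-based chunker would use these values, and `chunk_text` is a hypothetical name; Tree-sitter chunking presumably splits along syntax nodes instead.

```python
CHUNK_SIZE = 500    # characters per chunk
CHUNK_OVERLAP = 50  # characters shared between consecutive chunks


def chunk_text(text, size=CHUNK_SIZE, overlap=CHUNK_OVERLAP):
    """Split text into fixed-size character windows with overlap."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


chunks = chunk_text("x" * 1200)
print([len(c) for c in chunks])
```

With the defaults, each window starts 450 characters after the previous one, so a larger overlap yields more chunks and a larger index for the same file.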
Supported File Types
Python, JavaScript, TypeScript, JSX, TSX, Java, C, C++, Go, Rust, Ruby, PHP, C#, Shell, YAML, JSON.
Ignored Patterns
*.pyc, __pycache__, .git, node_modules, .venv, venv, *.egg-info, .pytest_cache
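One plausible way to apply these patterns is to test every path component against each glob, as sketched below. `is_ignored` is a hypothetical helper; the indexer's actual matching rules may differ.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

IGNORED_PATTERNS = [
    "*.pyc", "__pycache__", ".git", "node_modules",
    ".venv", "venv", "*.egg-info", ".pytest_cache",
]


def is_ignored(path, patterns=IGNORED_PATTERNS):
    """Return True if any path component matches an ignore pattern."""
    parts = PurePosixPath(path).parts
    return any(fnmatch(part, pattern) for part in parts for pattern in patterns)


for candidate in ["src/auth.py", "node_modules/lodash/index.js", "src/venv_utils.py"]:
    print(candidate, is_ignored(candidate))
```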
Benchmarks
| Operation | Expected Time | Notes |
|---|---|---|
| Index 20 .py files (~5k LOC) | ~5-8s | First run; incremental is much faster |
| Vector search (top_k=5) | ~20-50ms | ChromaDB in-process |
| Query embedding | ~10-20ms | sentence-transformers, CPU |
| Server cold start | ~2-3s | Model loaded into memory |
Automation Scripts
Auto-discovery
Scan a directory for Git repositories and index them all automatically:
python scripts/auto_index.py ~/projects
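Conceptually, discovery means walking the directory tree and treating every folder that contains a `.git` entry as a project root. The sketch below shows that idea with the standard library only. `find_git_repos` is a hypothetical function and not necessarily how `auto_index.py` works.

```python
import os
from pathlib import Path


def find_git_repos(root):
    """Yield directories under root that contain a .git entry."""
    for dirpath, dirnames, filenames in os.walk(root):
        # .git is a directory in normal clones and a file in worktrees/submodules.
        if ".git" in dirnames or ".git" in filenames:
            yield Path(dirpath)
            dirnames[:] = []  # do not descend into a repository


for repo in find_git_repos(Path.home() / "projects"):
    print(repo.name, repo)
```

Pruning `dirnames` after a match keeps vendored or nested repositories from being indexed as separate projects.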
Watch Mode
Watch a project for file changes and reindex incrementally (debounced, 5s):
python scripts/watch.py /path/to/project --name my-project
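Debouncing here means waiting until no new change has arrived for 5 seconds before reindexing, so a burst of saves triggers a single run. A minimal sketch of that logic follows; the `Debouncer` class is hypothetical and independent of whatever file-watching library `watch.py` uses.

```python
import time


class Debouncer:
    """Collect changed paths and release them after a quiet period."""

    def __init__(self, delay=5.0, clock=time.monotonic):
        self.delay = delay
        self.clock = clock
        self.pending = set()
        self.last_event = None

    def record(self, path):
        """Register a change and restart the quiet-period timer."""
        self.pending.add(path)
        self.last_event = self.clock()

    def flush_if_quiet(self):
        """Return the sorted batch once `delay` seconds pass without events."""
        if not self.pending or self.clock() - self.last_event < self.delay:
            return []
        batch = sorted(self.pending)
        self.pending.clear()
        return batch


debouncer = Debouncer(delay=5.0)
debouncer.record("src/auth.py")
print(debouncer.flush_if_quiet())  # still inside the quiet period
```

A watcher loop would call `record` from its change callback and poll `flush_if_quiet` periodically, reindexing only the files in the returned batch.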
Git Hook (post-commit reindex)
Install a post-commit hook so changed files are reindexed automatically after every commit:
python scripts/setup_git_hook.py /path/to/your/repo my-project
Tests
# All tests (116 passing)
pytest -v
# Specific modules
pytest tests/test_config.py -v
pytest tests/test_embeddings.py -v
pytest tests/test_indexer.py -v
pytest tests/test_server.py -v
# With coverage
pytest --cov=codebase_rag --cov-report=html
Deploy as a systemd Service (Linux)
A template service file is provided at systemd/codebase-rag-server.service.
Replace YOUR_USERNAME with your actual Linux username before installing:
# Substitute your username in-place
sed -i "s/YOUR_USERNAME/$USER/g" systemd/codebase-rag-server.service
# Install and start
sudo cp systemd/codebase-rag-server.service /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable codebase-rag-server
sudo systemctl start codebase-rag-server
# Check
sudo systemctl status codebase-rag-server
sudo journalctl -u codebase-rag-server -f
Exposing Remotely via Tailscale Funnel (optional)
To use the server from a remote machine (Codespaces, company laptop, etc.):
# Expose port 8080 via Tailscale Funnel
tailscale funnel 8080
# Add to your service file:
# Environment="MCP_ALLOWED_HOST=your-machine.your-tailnet.ts.net"
# Then in your remote mcp.json:
# "url": "https://your-machine.your-tailnet.ts.net/mcp"
Troubleshooting
Slow first start: The embedding model (~100MB) is downloaded on first use. Run health_check.py to pre-load it.
High memory usage: The default model uses ~500MB RAM. If needed, use an even smaller model via EMBEDDING_MODEL.
Permission errors: Ensure the running user has write access to data/chroma_db/.
Debug mode:
LOG_LEVEL=DEBUG codebase-rag
Contributing
- Fork the project
- Create a feature branch: git checkout -b feature/your-feature
- Follow strict TDD: RED → GREEN → REFACTOR
- Atomic, descriptive commits
- Open a pull request with tests
# Dev setup
pip install -e ".[dev]"
pytest -v --cov=codebase_rag
License
MIT License; see LICENSE.