
MCP Indexer

Enables semantic code search across multiple repositories using natural language queries. Provides intelligent code discovery, symbol lookups, and cross-repo dependency analysis for AI coding agents.

Semantic code search indexer for AI tools via the Model Context Protocol (MCP).

For AI Coding Agents

If you're an AI agent working on this project, please read AGENTS.MD first. It contains instructions for using Beads issue tracking to manage tasks systematically across sessions.

Overview

MCP Indexer provides code search to any MCP-compatible LLM client (such as Claude). It indexes your repositories using semantic embeddings, enabling natural language code search, symbol lookups, and cross-repo dependency analysis.

Features

Semantic Search: Natural language queries find relevant code by meaning, not just keywords
Multi-Language Support: Python, JavaScript, TypeScript, Ruby, Go
Cross-Repo Analysis: Detect dependencies and suggest missing repos
Incremental Updates: Track git commits and reindex only when needed
MCP Integration: Works with any MCP-compatible LLM client
Stack Management: Persistent configuration for repo collections

Installation

Prerequisites

Python 3.8 or higher
pip

Steps

Clone the repository:

git clone https://github.com/gkatechis/mcpIndexer.git
cd mcpIndexer

Install dependencies:

pip install -r requirements.txt

Set up environment variables:

export PYTHONPATH=/absolute/path/to/mcpIndexer/src
export MCP_INDEXER_DB_PATH=~/.mcpindexer/db  # Optional, defaults to this location

Configure MCP integration (for Claude Code or other MCP clients):

cp .mcp.json.example .mcp.json
# Edit .mcp.json and update paths to your installation directory

Quick Start

1. Try the Demo

Run the demo to see mcpIndexer in action:

python3 examples/demo.py

2. Index Your Repositories

import os
from mcpindexer.indexer import MultiRepoIndexer
from mcpindexer.embeddings import EmbeddingStore

# Initialize with your database path
db_path = os.getenv("MCP_INDEXER_DB_PATH", os.path.expanduser("~/.mcpindexer/db"))
store = EmbeddingStore(db_path=db_path, collection_name='mcp_code_index')
indexer = MultiRepoIndexer(store)

# Add and index your repository
indexer.add_repo(
    repo_path='/path/to/your/repo',
    repo_name='my-repo',
    auto_index=True
)

3. Use with MCP Clients

Once configured in .mcp.json, the MCP server automatically starts when you use an MCP client like Claude Code.

The MCP server exposes 12 tools:

Search Tools:

semantic_search - Natural language code search
find_definition - Find where symbols are defined
find_references - Find where symbols are used
find_related_code - Find architecturally related files

Repository Management:

add_repo_to_stack - Add a new repository
remove_repo - Remove a repository
list_repos - List all indexed repos
get_repo_stats - Get detailed repo statistics
reindex_repo - Force reindex a repository

Cross-Repo Analysis:

get_cross_repo_dependencies - Find dependencies between repos
suggest_missing_repos - Suggest repos to add based on imports

Stack Management:

get_stack_status - Get overall indexing status

CLI Commands

Check for Updates

Check which repos need reindexing:

python3 -m mcpindexer check-updates

Reindex Changed Repos

Automatically reindex repos with new commits:

python3 -m mcpindexer reindex-changed
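Both commands depend on knowing whether a repo has moved past the commit it was last indexed at. The stack configuration records a `last_commit` per repo, so a plausible check compares that value with the repo's current `HEAD`. This is a sketch under that assumption, not the CLI's actual logic; `current_head` and `repos_needing_reindex` are hypothetical names.

```python
import subprocess
from typing import Dict, List


def current_head(repo_path: str) -> str:
    """Return the commit hash of HEAD for the git repository at repo_path."""
    result = subprocess.run(
        ["git", "-C", repo_path, "rev-parse", "HEAD"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def repos_needing_reindex(stack: dict, heads: Dict[str, str]) -> List[str]:
    """Return repo names whose current HEAD differs from the recorded last_commit.

    `stack` has the shape of ~/.mcpindexer/stack.json; `heads` maps each repo
    name to its current commit (e.g. from current_head). Repos with no recorded
    commit, or whose current HEAD is unknown, are treated as stale.
    """
    stale = []
    for name, info in stack.get("repos", {}).items():
        recorded = info.get("last_commit")
        if recorded is None or heads.get(name) != recorded:
            stale.append(name)
    return stale
```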

Stack Status

View current stack status:

python3 -m mcpindexer status

Install Git Hooks

Auto-reindex on git pull:

python3 -m mcpindexer install-hook /path/to/repo

This installs a post-merge hook that triggers reindexing after pulls.
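What such a hook involves can be sketched in Python. This is an illustration, not the installer's actual code: the hook body (calling the README's `reindex-changed` command) and the function name `install_post_merge_hook` are assumptions. Note that this sketch overwrites any existing `post-merge` hook.

```python
import stat
from pathlib import Path

# Hypothetical hook body. PYTHONPATH must include mcpIndexer/src in the
# environment git runs the hook with, or the module import will fail.
HOOK_SCRIPT = """#!/bin/sh
# post-merge: runs after a successful `git pull` or `git merge`
python3 -m mcpindexer reindex-changed
"""


def install_post_merge_hook(repo_path: str) -> Path:
    """Write .git/hooks/post-merge in repo_path and mark it executable."""
    hook = Path(repo_path) / ".git" / "hooks" / "post-merge"
    hook.parent.mkdir(parents=True, exist_ok=True)
    hook.write_text(HOOK_SCRIPT)  # replaces any existing post-merge hook
    # Add execute bits: git does not run hooks that are not executable
    mode = hook.stat().st_mode
    hook.chmod(mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return hook
```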

Usage Examples

Semantic Search

import os
from mcpindexer.embeddings import EmbeddingStore

db_path = os.getenv("MCP_INDEXER_DB_PATH", os.path.expanduser("~/.mcpindexer/db"))
store = EmbeddingStore(db_path=db_path, collection_name='mcp_code_index')

# Natural language queries
results = store.semantic_search(
    query="authentication logic",
    n_results=10
)

for result in results:
    print(f"{result.file_path}:{result.metadata['start_line']}")
    print(f"  {result.symbol_name} - Score: {result.score:.4f}")
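Conceptually, semantic search embeds the query and returns the stored chunks whose vectors are closest to it. The toy sketch below ranks hand-made vectors by cosine similarity to show the idea. It is not how `EmbeddingStore` computes `result.score`: the vectors stand in for real sentence-transformers embeddings, and ChromaDB's distance metric depends on how the collection is configured.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(
    query_vec: Sequence[float],
    chunks: List[Tuple[str, Sequence[float]]],
    n_results: int,
) -> List[Tuple[str, float]]:
    """Return the n_results chunk ids most similar to the query, best first."""
    scored = [(chunk_id, cosine(query_vec, vec)) for chunk_id, vec in chunks]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:n_results]


# Toy 3-dimensional "embeddings"; real model vectors have hundreds of dimensions
chunks = [
    ("auth.py:login", [0.9, 0.1, 0.0]),
    ("db.py:connect", [0.1, 0.9, 0.2]),
    ("auth.py:verify_jwt", [0.8, 0.2, 0.1]),
]
results = rank([1.0, 0.0, 0.0], chunks, n_results=2)
print(results)
```

With this query vector the two `auth.py` chunks rank first and the database chunk is cut off by `n_results`.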

Find Symbol Definitions

results = store.find_by_symbol(
    symbol_name="authenticate_user",
    repo_filter=["my-backend"]
)
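A lookup by symbol name does not need embeddings at all; it can be served by a plain mapping from names to locations. The in-memory sketch below shows a name lookup with a repo filter. `SymbolIndex` and `SymbolLocation` are hypothetical and do not describe how `find_by_symbol` is implemented.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class SymbolLocation:
    repo: str
    file_path: str
    start_line: int


class SymbolIndex:
    """In-memory map from symbol name to the places it is defined."""

    def __init__(self) -> None:
        self._defs: Dict[str, List[SymbolLocation]] = defaultdict(list)

    def add(self, symbol_name: str, location: SymbolLocation) -> None:
        self._defs[symbol_name].append(location)

    def find(
        self, symbol_name: str, repo_filter: Optional[Iterable[str]] = None
    ) -> List[SymbolLocation]:
        """Exact-name lookup, optionally restricted to the given repos."""
        hits = self._defs.get(symbol_name, [])
        if repo_filter is None:
            return list(hits)
        allowed = set(repo_filter)
        return [loc for loc in hits if loc.repo in allowed]
```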

Cross-Repo Dependencies

from mcpindexer.indexer import MultiRepoIndexer

indexer = MultiRepoIndexer(store)

# Find dependencies between repos
cross_deps = indexer.get_cross_repo_dependencies()

# Suggest missing repos to add
suggestions = indexer.suggest_missing_repos()
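One way to picture what these calls compute: given the packages each repo imports and the packages each repo provides, an import that resolves to another indexed repo is a cross-repo dependency, and an import that no indexed repo provides (and that is not a known third-party package) points at a repo you may be missing. The sketch below is a hypothetical model of that idea, not the `dependency_analyzer.py` implementation; the input dictionaries and function names are assumptions.

```python
from typing import Dict, List, Set, Tuple


def cross_repo_edges(
    imports: Dict[str, Set[str]], provides: Dict[str, Set[str]]
) -> List[Tuple[str, str, str]]:
    """Return (importing_repo, providing_repo, package) for imports another indexed repo satisfies."""
    # Map each package to the repo that provides it
    owner = {pkg: repo for repo, pkgs in provides.items() for pkg in pkgs}
    edges = []
    for repo, pkgs in sorted(imports.items()):
        for pkg in sorted(pkgs):
            target = owner.get(pkg)
            if target is not None and target != repo:
                edges.append((repo, target, pkg))
    return edges


def missing_packages(
    imports: Dict[str, Set[str]], provides: Dict[str, Set[str]], known_external: Set[str]
) -> Set[str]:
    """Imported packages that no indexed repo provides and that are not known third-party packages."""
    imported = set().union(*imports.values())
    provided = set().union(*provides.values())
    return imported - provided - known_external
```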

Configuration

Environment Variables

MCP_INDEXER_DB_PATH - Database path (default: ~/.mcpindexer/db)
PYTHONPATH - Must include the src/ directory of your installation
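The Python examples in this README resolve the database location with the same fallback. As a standalone helper (the name `resolve_db_path` is hypothetical, not part of mcpindexer), it also expands a literal `~`, which the shell leaves unexpanded if the variable's value was quoted:

```python
import os


def resolve_db_path() -> str:
    """Return MCP_INDEXER_DB_PATH if set, else ~/.mcpindexer/db, with ~ expanded."""
    default = os.path.expanduser("~/.mcpindexer/db")
    return os.path.expanduser(os.getenv("MCP_INDEXER_DB_PATH", default))


if __name__ == "__main__":
    print(resolve_db_path())
```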

Stack Configuration

Configuration is stored at ~/.mcpindexer/stack.json:

{
  "version": "1.0",
  "repos": {
    "my-repo": {
      "name": "my-repo",
      "path": "/path/to/repo",
      "status": "indexed",
      "last_indexed": "2025-10-14T12:34:56.789Z",
      "last_commit": "abc123...",
      "files_indexed": 162,
      "chunks_indexed": 302,
      "auto_reindex": true
    }
  }
}
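Because the file is plain JSON, it can be inspected without the indexer. A minimal sketch that reads the fields shown above; `summarize_stack` is a hypothetical helper, and only the keys in the example are assumed to exist.

```python
import json
from pathlib import Path
from typing import List, Tuple

DEFAULT_STACK = Path.home() / ".mcpindexer" / "stack.json"


def summarize_stack(path: Path = DEFAULT_STACK) -> List[Tuple[str, str, int]]:
    """Return (name, status, chunks_indexed) for every repo in a stack.json file."""
    data = json.loads(Path(path).read_text())
    return [
        (name, info.get("status", "unknown"), info.get("chunks_indexed", 0))
        for name, info in data.get("repos", {}).items()
    ]
```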

Architecture

Components

Parser (parser.py) - Tree-sitter based multi-language AST parsing
Chunker (chunker.py) - Intelligent code chunking respecting AST boundaries
Embeddings (embeddings.py) - ChromaDB + sentence-transformers for semantic search
Indexer (indexer.py) - Orchestrates parsing → chunking → embedding → storage
Dependency Analyzer (dependency_analyzer.py) - Tracks imports and dependencies
Stack Config (stack_config.py) - Persistent configuration management
MCP Server (server.py) - Exposes tools via Model Context Protocol
CLI (cli.py) - Command-line interface
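The chunker's job is to cut files along syntactic boundaries so each chunk is a complete definition rather than an arbitrary window of lines. The project uses Tree-sitter for this; the sketch below swaps in Python's standard-library `ast` module to show the same idea for Python files only. `Chunk` and `chunk_python_source` are hypothetical names, and the real chunker's size targets and handling of module-level code are not modeled.

```python
import ast
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    symbol_name: str
    start_line: int
    end_line: int
    text: str


def chunk_python_source(source: str) -> List[Chunk]:
    """Split Python source into one chunk per top-level function or class.

    Uses ast as a stand-in for Tree-sitter; end_lineno requires Python 3.8+.
    """
    lines = source.splitlines()
    chunks = []
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            start = node.lineno
            # Include decorators so the chunk is a complete definition
            if node.decorator_list:
                start = min(d.lineno for d in node.decorator_list)
            end = node.end_lineno
            chunks.append(Chunk(node.name, start, end, "\n".join(lines[start - 1:end])))
    return chunks
```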

Indexing Pipeline

Code File → Parser → AST → Chunker → Semantic Chunks
                                            ↓
                                      Embeddings
                                            ↓
                                      ChromaDB Store

Performance

Based on testing with real-world repos:

Speed: ~56 files/sec
Zendesk App Framework: 162 files, 302 chunks in 1.86s
3 Repos: 255 files, 595 chunks in 4.58s
Search Latency: ~100-200ms per query
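The headline throughput follows from the runs listed above: the ~56 files/sec figure matches the 3-repo run, while the single Zendesk App Framework run works out faster. The arithmetic, using only the reported numbers:

```python
# Reported runs from the Performance section: (files, chunks, seconds)
runs = {
    "Zendesk App Framework": (162, 302, 1.86),
    "3 repos": (255, 595, 4.58),
}

for name, (files, chunks, seconds) in runs.items():
    print(f"{name}: {files / seconds:.1f} files/sec, {chunks / seconds:.1f} chunks/sec")
```

This prints 87.1 files/sec (162.4 chunks/sec) for the Zendesk App Framework run and 55.7 files/sec (129.9 chunks/sec) for the 3-repo run.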

Troubleshooting

Issue: "ModuleNotFoundError: No module named 'tree_sitter'"

Solution: Install dependencies

pip install -r requirements.txt

Issue: Slow indexing

Causes:

Large files with many symbols
Complex nested structures
First-time embedding generation

Solutions:

Use file filters to skip test/build directories
Increase chunk size target
Use GPU-accelerated embeddings (if available)
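Skipping test and build directories is cheapest when they are pruned before the walk descends into them, rather than filtered file by file afterwards. A minimal sketch using `os.walk`; whether and how mcpindexer exposes file filters is not documented here, so `iter_source_files`, `SKIP_DIRS`, and the extension list are assumptions.

```python
import os
from typing import Iterator, Set

# Example directory names to skip; not mcpindexer's actual defaults
SKIP_DIRS: Set[str] = {"test", "tests", "build", "dist", "node_modules", ".git", "__pycache__"}
# Assumed extensions for the languages listed under Features
EXTENSIONS: Set[str] = {".py", ".js", ".ts", ".rb", ".go"}


def iter_source_files(root: str, skip_dirs: Set[str] = SKIP_DIRS) -> Iterator[str]:
    """Yield source files under root, pruning skipped directories before descending."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Editing dirnames in place stops os.walk from entering those directories
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for filename in filenames:
            if os.path.splitext(filename)[1] in EXTENSIONS:
                yield os.path.join(dirpath, filename)
```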

Issue: Poor search results

Causes:

Query too generic
Code not indexed
Wrong language filter

Solutions:

Use more specific queries ("JWT token validation" vs "auth")
Check list_repos to verify indexing
Try without language filter
Increase n_results parameter

Issue: Out of memory

Causes:

Indexing too many repos at once
Very large monoliths

Solutions:

Index repos individually
Increase system memory
Use incremental indexing (git commit-based)

Issue: Git hooks not triggering

Causes:

Hook not executable
PYTHONPATH not set
Hook overwritten

Solutions:

# Check hook exists and is executable
ls -la /path/to/repo/.git/hooks/post-merge

# Make executable
chmod +x /path/to/repo/.git/hooks/post-merge

# Test manually
cd /path/to/repo && .git/hooks/post-merge

Issue: Stale results after code changes

Solutions:

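# Force reindex a specific repo
python3 -c "
import os
from mcpindexer.indexer import MultiRepoIndexer
from mcpindexer.embeddings import EmbeddingStore
db_path = os.getenv('MCP_INDEXER_DB_PATH', os.path.expanduser('~/.mcpindexer/db'))
store = EmbeddingStore(db_path=db_path, collection_name='mcp_code_index')
indexer = MultiRepoIndexer(store)
indexer.repo_indexers['my-repo'].reindex(force=True)
"

# Or use the CLI (only reindexes repos with new commits)
python3 -m mcpindexer reindex-changed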

Example Queries

Finding Implementations

"password hashing"
"JWT token validation"
"database connection pool"
"API rate limiting"

Finding Patterns

"error handling"
"logging configuration"
"caching strategy"
"retry logic"

Finding Components

"user authentication"
"payment processing"
"email sending"
"file upload handling"

Architecture Understanding

"dependency injection setup"
"middleware configuration"
"router registration"
"database migration"

Testing

# Run all tests
export PYTHONPATH=/path/to/mcpIndexer/src
python3 -m pytest tests/ -v

# Run specific test file
python3 -m pytest tests/test_embeddings.py -v

# Run example scripts
python3 examples/demo.py

See the examples/ directory for more usage examples.

Contributing

The codebase is organized by component:

src/mcpindexer/ - Main source code
tests/ - Test suite (130+ tests)
test_*.py - Integration test scripts

All components are independently tested with comprehensive coverage.

License

MIT License - see LICENSE file for details.

Support

For issues or questions, please open an issue on the repository.
