simple-index-mcp
A Python MCP server for indexing projects using embeddings, providing semantic search to help AI agents navigate codebases.
Simple-Index MCP Server
A Python Model Context Protocol (MCP) server for indexing projects, files, and folders using embeddings. Provides semantic search capabilities to help AI agents understand and navigate your codebase.
Features
- Semantic Indexing: Uses embeddings (via Ollama) to create a searchable index of your project files
- Single Index File: All indexes are stored in projectIndex.si (in project root or global location)
- Incremental Updates: Only re-indexes files that have changed (based on content hash)
- Extensible Provider System: Easy to add new embedding providers beyond Ollama
- MCP Tools: Exposes powerful tools for indexing, searching, and retrieving context
Installation
Prerequisites
- Python 3.10 or higher
- Ollama installed and running locally
- nomic-embed-text model pulled in Ollama: ollama pull nomic-embed-text
Setup
- Create a virtual environment:
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
- Install dependencies:
pip install "mcp>=1.2.0" aiohttp
- Place simple_index_server.py in your project directory
Configuration
Claude Desktop Configuration
Add to your Claude Desktop config (~/Library/Application Support/Claude/claude_desktop_config.json on macOS):
{
"mcpServers": {
"simple-index": {
"command": "python",
"args": [
"/path/to/simple_index_server.py",
"/path/to/your/project"
],
"env": {
"OLLAMA_MODEL": "nomic-embed-text",
"OLLAMA_URL": "http://localhost:11434"
}
}
}
}
Global Index Mode (Optional)
If you prefer to store the index file in a central location (instead of the project root), set the SIMPLE_INDEX_ROOT environment variable:
{
"mcpServers": {
"simple-index": {
"command": "python",
"args": ["/path/to/simple_index_server.py"],
"env": {
"SIMPLE_INDEX_ROOT": "/path/to/central/indexes",
"OLLAMA_MODEL": "nomic-embed-text"
}
}
}
}
Alternative Configuration Options
You can also specify the Ollama model and URL as command-line arguments:
{
"mcpServers": {
"simple-index": {
"command": "python",
"args": [
"/path/to/simple_index_server.py",
"/path/to/your/project",
"nomic-embed-text",
"http://localhost:11434"
]
}
}
}
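The options above (project path, model, and URL as positional arguments or environment variables, plus SIMPLE_INDEX_ROOT) could be resolved roughly as in the following sketch. The names `ServerConfig` and `resolve_config` are hypothetical, and both the precedence (argument, then environment variable, then default) and the fallback defaults are assumptions rather than documented behavior of simple_index_server.py.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Optional, Sequence

INDEX_FILENAME = "projectIndex.si"


@dataclass
class ServerConfig:
    project_root: Optional[Path]
    model: str
    url: str
    index_path: Path


def resolve_config(argv: Sequence[str], env: Mapping[str, str]) -> ServerConfig:
    """Resolve settings from the arguments that follow the script path."""
    project_root = Path(argv[0]) if len(argv) > 0 else None
    # Assumed precedence: positional argument, then environment, then default.
    model = argv[1] if len(argv) > 1 else env.get("OLLAMA_MODEL", "nomic-embed-text")
    url = argv[2] if len(argv) > 2 else env.get("OLLAMA_URL", "http://localhost:11434")

    # Assumption: a central index location, when set, replaces the project root.
    central = env.get("SIMPLE_INDEX_ROOT")
    if central:
        index_dir = Path(central)
    elif project_root is not None:
        index_dir = project_root
    else:
        index_dir = Path.cwd()
    return ServerConfig(project_root, model, url, index_dir / INDEX_FILENAME)
```

Calling `resolve_config(sys.argv[1:], os.environ)` at startup would cover all three configuration styles shown above with one code path.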
Available Tools
1. index_file
Index a single file with embeddings.
Parameters:
- file_path (string, required): Path to the file to index
- force (boolean, optional): Force reindexing even if the file hasn't changed
Example:
Claude, please index the file /path/to/my/script.py
2. index_directory
Index all matching files in a directory recursively.
Parameters:
- directory (string, required): Path to the directory to index
- patterns (array of strings, optional): File patterns to match (default: ["*.py", "*.js", "*.ts", "*.md", "*.txt"])
- exclude_patterns (array of strings, optional): Patterns to exclude (default: ["*/node_modules/*", "*/.git/*", "*/venv/*"])
Example:
Claude, index all Python and JavaScript files in my project, excluding the tests directory
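To illustrate how patterns and exclude_patterns might interact, here is a minimal sketch using the standard fnmatch module. The function name `collect_files` is hypothetical, and matching include patterns against the file name while matching exclude patterns against the path relative to the directory is an assumption, not a description of the server's actual matching logic.

```python
import fnmatch
from pathlib import Path
from typing import List, Optional

DEFAULT_PATTERNS = ["*.py", "*.js", "*.ts", "*.md", "*.txt"]
DEFAULT_EXCLUDES = ["*/node_modules/*", "*/.git/*", "*/venv/*"]


def collect_files(
    directory,
    patterns: Optional[List[str]] = None,
    exclude_patterns: Optional[List[str]] = None,
) -> List[Path]:
    """Return files under `directory` that match a pattern and no exclusion."""
    patterns = patterns or DEFAULT_PATTERNS
    if exclude_patterns is None:
        exclude_patterns = DEFAULT_EXCLUDES
    root = Path(directory)
    matches = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        # A leading "/" lets "*/node_modules/*" match a top-level node_modules.
        relative = "/" + path.relative_to(root).as_posix()
        if any(fnmatch.fnmatch(relative, pat) for pat in exclude_patterns):
            continue
        if any(fnmatch.fnmatch(path.name, pat) for pat in patterns):
            matches.append(path)
    return matches
```

Passing an empty list for exclude_patterns disables the default exclusions, whereas omitting it keeps them.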
3. search
Search for files similar to a query using semantic search.
Parameters:
- query (string, required): Search query describing what you're looking for
- top_k (integer, optional): Number of results to return (default: 10)
Example:
Claude, search for files related to "database connection logic"
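Semantic search over stored embeddings is commonly implemented as a cosine-similarity ranking. The sketch below assumes that metric and an in-memory `files` mapping shaped like the "files" section of projectIndex.si; `cosine_similarity` and `rank_files` are hypothetical names, and the server may score results differently.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_files(
    query_embedding: List[float],
    files: Dict[str, dict],
    top_k: int = 10,
) -> List[Tuple[str, float]]:
    """Return the top_k (path, similarity) pairs, most similar first."""
    scored = [
        (path, cosine_similarity(query_embedding, entry["embedding"]))
        for path, entry in files.items()
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]
```

The query text would first be embedded with the same model as the indexed files, since similarities between vectors from different models are not meaningful.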
4. get_context
Get full file contents for the most relevant files to a query.
Parameters:
- query (string, required): Query describing what context you need
- top_k (integer, optional): Number of files to include in context (default: 5)
Example:
Claude, get context for "authentication implementation"
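As a rough illustration of what get_context could return, the sketch below joins the full contents of ranked files under per-file headers. The header format mirrors the sample output in Usage Examples; the function name `build_context` and the placeholder for unreadable files are assumptions.

```python
from pathlib import Path
from typing import List, Tuple


def build_context(project_root, ranked: List[Tuple[str, float]]) -> str:
    """Concatenate file contents for (relative_path, similarity) pairs."""
    sections = []
    for number, (rel_path, score) in enumerate(ranked, start=1):
        file_path = Path(project_root) / rel_path
        try:
            body = file_path.read_text(encoding="utf-8")
        except (OSError, UnicodeDecodeError):
            # Hypothetical fallback for deleted or non-text files.
            body = "[unreadable file]"
        header = f"## File {number}: {rel_path} (similarity: {score:.2f})"
        sections.append(f"{header}\n{body}")
    return "\n\n".join(sections)
```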
5. list_indexed_files
List all files currently in the index.
Example:
Claude, show me all indexed files
6. get_index_stats
Get statistics about the current index.
Example:
Claude, what are the index statistics?
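The statistics can be derived directly from the index contents. This is a minimal sketch assuming the index has already been loaded into a dict with the structure described under Index File Format; `index_stats` is a hypothetical helper, not the server's function.

```python
def index_stats(index: dict) -> dict:
    """Summarize an in-memory index shaped like projectIndex.si."""
    files = index.get("files", {})
    return {
        "total_files": len(files),
        # Sizes are in bytes, as recorded per file at indexing time.
        "total_size": sum(entry.get("size", 0) for entry in files.values()),
        "embedding_model": index.get("metadata", {}).get("embedding_model"),
        "updated_at": index.get("updated_at"),
    }
```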
7. remove_file
Remove a file from the index.
Parameters:
- file_path (string, required): Path to the file to remove from the index
Example:
Claude, remove the file /path/to/old/file.py from the index
Index File Format
The projectIndex.si file is stored in JSON format with the following structure:
{
"version": "1.0",
"created_at": "2026-01-31T10:00:00Z",
"updated_at": "2026-01-31T12:30:00Z",
"project_root": "/path/to/project",
"metadata": {
"total_files": 42,
"total_size": 150000,
"embedding_model": "ollama:nomic-embed-text"
},
"files": {
"src/main.py": {
"path": "src/main.py",
"absolute_path": "/path/to/project/src/main.py",
"hash": "abc123...",
"embedding": [0.1, 0.2, ...],
"indexed_at": "2026-01-31T12:30:00Z",
"size": 1024,
"metadata": {
"extension": ".py",
"name": "main.py"
}
}
}
}
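A single entry under "files" could be assembled as in the sketch below. The field names come from the structure above; the helper name `make_file_entry` and the use of SHA-256 for "hash" are assumptions, since the hashing algorithm is not stated here.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path
from typing import List


def make_file_entry(project_root: Path, file_path: Path, embedding: List[float]) -> dict:
    """Build one "files" entry for a file located under project_root."""
    data = file_path.read_bytes()
    return {
        "path": file_path.relative_to(project_root).as_posix(),
        "absolute_path": str(file_path),
        "hash": hashlib.sha256(data).hexdigest(),  # assumed algorithm
        "embedding": embedding,
        "indexed_at": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "size": len(data),
        "metadata": {"extension": file_path.suffix, "name": file_path.name},
    }
```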
Index Management
- Atomic Writes: The index is written atomically using a temporary file to prevent corruption
- Change Detection: Files are only re-indexed if their content hash changes
- Incremental Updates: You can index new files without affecting existing entries
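The change-detection and atomic-write behavior listed above can be sketched as follows. The function names are hypothetical, and writing to a temporary file in the same directory followed by os.replace is one common way to get an atomic update, not necessarily the exact approach the server takes.

```python
import json
import os
import tempfile
from pathlib import Path


def needs_reindex(index: dict, rel_path: str, current_hash: str, force: bool = False) -> bool:
    """True if the file is new, its content hash changed, or force is set."""
    entry = index.get("files", {}).get(rel_path)
    return force or entry is None or entry.get("hash") != current_hash


def save_index_atomically(index: dict, index_path) -> None:
    """Write the index to a temporary file, then swap it into place."""
    index_path = Path(index_path)
    # The temporary file must be on the same filesystem for os.replace to be atomic.
    fd, tmp_name = tempfile.mkstemp(dir=index_path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(index, handle)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_name, index_path)
    except BaseException:
        # Leave any existing index untouched and clean up the partial file.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```

Because the existing file is only replaced after the new one is fully written, a crash mid-write leaves the previous index readable.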
Usage Examples
Initial Project Indexing
You: Claude, index my entire project directory at /Users/me/myproject
Claude: [Uses index_directory tool]
I've indexed your project. Found 45 files, indexed 42, skipped 3 binary files.
Searching for Relevant Files
You: Find files related to user authentication
Claude: [Uses search tool]
I found these relevant files:
1. src/auth/login.py (similarity: 0.89)
2. src/middleware/auth_check.py (similarity: 0.85)
3. tests/test_auth.py (similarity: 0.78)
Getting Context for Development
You: I need to modify the payment processing logic. Show me the relevant code.
Claude: [Uses get_context tool]
Here's the relevant code from 3 files:
## File 1: src/payments/processor.py (similarity: 0.92)
[Full file contents...]
## File 2: src/payments/validators.py (similarity: 0.87)
[Full file contents...]
Checking Index Status
You: What's the status of the index?
Claude: [Uses get_index_stats tool]
Index Statistics:
- Total files: 42
- Total size: 150 KB
- Last updated: 2026-01-31T12:30:00Z
- Embedding model: ollama:nomic-embed-text
Architecture
Components
- EmbeddingProvider: Abstract base class for embedding providers
  - OllamaProvider: Implementation using Ollama API
  - Easy to extend with OpenAI, Cohere, etc.
- ProjectIndex: Manages the projectIndex.si file
  - Loading and saving with atomic writes
  - Adding, removing, and querying files
  - Computing statistics
- SimpleIndexServer: Main MCP server implementation
  - File reading and hashing
  - Orchestrating indexing operations
  - Semantic search functionality
- MCP Integration: Standard Model Context Protocol server
  - Tool registration and handling
  - STDIO transport for communication
Adding New Embedding Providers
To add a new provider (e.g., OpenAI):
from typing import List

class OpenAIProvider(EmbeddingProvider):
    def __init__(self, api_key: str, model: str = "text-embedding-3-small"):
        self.api_key = api_key
        self.model = model

    async def embed(self, text: str) -> List[float]:
        # Call the OpenAI embeddings API here and return the vector
        raise NotImplementedError
Then modify the main() function to support the new provider.
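One way to wire a new provider into startup is a small name-to-class registry, sketched below. Everything here is illustrative: the `EmbeddingProvider` class is a stand-in with the same `embed` signature as above, and `FakeProvider`, `PROVIDERS`, and `create_provider` are hypothetical names that do not exist in simple_index_server.py.

```python
import asyncio
import hashlib
from abc import ABC, abstractmethod
from typing import Callable, Dict, List


class EmbeddingProvider(ABC):
    """Stand-in for the server's abstract base class."""

    @abstractmethod
    async def embed(self, text: str) -> List[float]:
        ...


class FakeProvider(EmbeddingProvider):
    """Deterministic placeholder for tests; not a real embedding model."""

    def __init__(self, dimensions: int = 8):
        self.dimensions = dimensions

    async def embed(self, text: str) -> List[float]:
        # Derive up to 32 stable pseudo-features from a SHA-256 digest.
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        return [byte / 255.0 for byte in digest[: self.dimensions]]


# Hypothetical registry: main() could look providers up here by name.
PROVIDERS: Dict[str, Callable[..., EmbeddingProvider]] = {"fake": FakeProvider}


def create_provider(name: str, **kwargs) -> EmbeddingProvider:
    try:
        factory = PROVIDERS[name]
    except KeyError:
        raise ValueError(f"Unknown embedding provider: {name!r}") from None
    return factory(**kwargs)


if __name__ == "__main__":
    provider = create_provider("fake", dimensions=4)
    print(asyncio.run(provider.embed("hello")))
```

With this layout, supporting a new backend means adding one entry to the registry rather than editing branching logic in main().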
Best Practices
- Index Regularly: Run indexing after significant code changes
- Use Exclude Patterns: Exclude node_modules, venv, build artifacts
- Semantic Queries: Use descriptive queries like "error handling for API requests" rather than just "error"
- Monitor Index Size: Large projects may need chunking strategies for very large files
Troubleshooting
Ollama Connection Issues
# Check if Ollama is running
curl http://localhost:11434/api/version
# Pull the embedding model if not available
ollama pull nomic-embed-text
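The same connectivity check can be done from Python with only the standard library, which is handy inside a startup script. This sketch queries the /api/version endpoint used in the curl command above; the function name `ollama_version` and the two-second timeout are arbitrary choices.

```python
import json
import urllib.request
from typing import Optional


def ollama_version(base_url: str = "http://localhost:11434", timeout: float = 2.0) -> Optional[str]:
    """Return the Ollama version string, or None if the server is unreachable."""
    url = base_url.rstrip("/") + "/api/version"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            payload = json.load(response)
    except (OSError, ValueError):
        # OSError covers refused connections and timeouts; ValueError covers bad JSON.
        return None
    return payload.get("version")


if __name__ == "__main__":
    version = ollama_version()
    print(f"Ollama {version}" if version else "Ollama is not reachable")
```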
Index Corruption
If projectIndex.si becomes corrupted, simply delete it and re-index:
rm projectIndex.si
# Then ask Claude to re-index the directory
Logging
The server uses Python's logging module and writes to stderr, because the STDIO transport reserves stdout for MCP protocol messages. Check your MCP client's logs for debugging information.
Performance Considerations
- Embedding Generation: ~100-500ms per file depending on size
- Index Size: ~4KB per file (768-dim embeddings + metadata)
- Search Speed: <100ms for typical project sizes (hundreds of files)
License
MIT License - Feel free to use and modify as needed.
Contributing
This is a reference implementation. Feel free to fork and extend with:
- Additional embedding providers
- Chunking strategies for large files
- Multi-language support
- Custom metadata extraction
- Integration with other tools
Credits
Built with: