simple-index-mcp

A Python MCP server for indexing projects using embeddings, providing semantic search to help AI agents navigate codebases.


Simple-Index MCP Server

A Python Model Context Protocol (MCP) server for indexing projects, files, and folders using embeddings. It provides semantic search to help AI agents understand and navigate your codebase.

Features

  • Semantic Indexing: Uses embeddings (via Ollama) to create a searchable index of your project files
  • Single Index File: All index data is stored in one file, projectIndex.si, in the project root or in a central location
  • Incremental Updates: Only re-indexes files that have changed (based on content hash)
  • Extensible Provider System: Easy to add new embedding providers beyond Ollama
  • MCP Tools: Exposes seven tools for indexing, searching, and retrieving context

Installation

Prerequisites

  • Python 3.10 or higher
  • Ollama installed and running locally
  • nomic-embed-text model pulled in Ollama: ollama pull nomic-embed-text

Setup

  1. Create a virtual environment:
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
  2. Install dependencies:
pip install "mcp>=1.2.0" aiohttp
  3. Place simple_index_server.py in your project directory

Configuration

Claude Desktop Configuration

Add the following to your Claude Desktop config file (~/Library/Application Support/Claude/claude_desktop_config.json on macOS):

{
  "mcpServers": {
    "simple-index": {
      "command": "python",
      "args": [
        "/path/to/simple_index_server.py",
        "/path/to/your/project"
      ],
      "env": {
        "OLLAMA_MODEL": "nomic-embed-text",
        "OLLAMA_URL": "http://localhost:11434"
      }
    }
  }
}


Global Index Mode (Optional)

If you prefer to store the index file in a central location instead of the project root, set the SIMPLE_INDEX_ROOT environment variable:

{
  "mcpServers": {
    "simple-index": {
      "command": "python",
      "args": ["/path/to/simple_index_server.py"],
      "env": {
        "SIMPLE_INDEX_ROOT": "/path/to/central/indexes",
        "OLLAMA_MODEL": "nomic-embed-text"
      }
    }
  }
}

Alternative Configuration Options

You can also specify the Ollama model and URL as command-line arguments:

{
  "mcpServers": {
    "simple-index": {
      "command": "python",
      "args": [
        "/path/to/simple_index_server.py",
        "/path/to/your/project",
        "nomic-embed-text",
        "http://localhost:11434"
      ]
    }
  }
}
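The README does not say how the positional arguments and the environment variables interact. The sketch below shows one plausible resolution order, assuming CLI arguments take precedence over OLLAMA_MODEL and OLLAMA_URL, which in turn fall back to the values used in the examples above. resolve_config, ServerConfig, the current-directory fallback when no project path is given, and the per-project file naming under SIMPLE_INDEX_ROOT are all hypothetical.

```python
import hashlib
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Sequence

# Defaults copied from the configuration examples above.
DEFAULT_MODEL = "nomic-embed-text"
DEFAULT_URL = "http://localhost:11434"


@dataclass
class ServerConfig:
    project_root: Path
    model: str
    ollama_url: str
    index_path: Path


def resolve_config(args: Sequence[str], env: Mapping[str, str]) -> ServerConfig:
    """Hypothetical: positional args win over env vars, which win over defaults."""
    # args excludes the script name: [project_root?, model?, url?]
    project_root = Path(args[0]) if len(args) > 0 else Path.cwd()  # cwd fallback is assumed
    model = args[1] if len(args) > 1 else env.get("OLLAMA_MODEL", DEFAULT_MODEL)
    url = args[2] if len(args) > 2 else env.get("OLLAMA_URL", DEFAULT_URL)

    index_root = env.get("SIMPLE_INDEX_ROOT")
    if index_root:
        # Global mode: keep one file per project in the central directory.
        # Naming the file after a hash of the project path is a hypothetical scheme.
        digest = hashlib.sha256(str(project_root.resolve()).encode("utf-8")).hexdigest()[:12]
        index_path = Path(index_root) / f"{digest}-projectIndex.si"
    else:
        index_path = project_root / "projectIndex.si"
    return ServerConfig(project_root, model, url, index_path)
```

Inside a server entry point this would be called as resolve_config(sys.argv[1:], os.environ).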

Available Tools

1. index_file

Index a single file with embeddings.

Parameters:

  • file_path (string, required): Path to the file to index
  • force (boolean, optional): Force reindexing even if file hasn't changed

Example:

Claude, please index the file /path/to/my/script.py
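A minimal sketch of the change check behind index_file and its force flag, assuming the content hash is SHA-256 over the raw file bytes and that entries live in a dict keyed by path (both are assumptions; the README only says "content hash"). The embed argument stands in for whatever provider the server is configured with, and the trimmed entry shape is illustrative.

```python
import hashlib
from pathlib import Path
from typing import Awaitable, Callable, Dict, List

# Any async callable that turns text into a vector (e.g. an Ollama client).
Embedder = Callable[[str], Awaitable[List[float]]]


async def index_file(files: Dict[str, dict], path: Path, embed: Embedder,
                     force: bool = False) -> str:
    """Hypothetical index_file flow: re-embed only when the hash changed or force=True."""
    data = path.read_bytes()
    new_hash = hashlib.sha256(data).hexdigest()  # assumption: SHA-256 as the content hash
    key = str(path)
    old = files.get(key)
    if old is not None and old["hash"] == new_hash and not force:
        return "skipped"  # unchanged since the last run
    text = data.decode("utf-8", errors="replace")
    files[key] = {"path": key, "hash": new_hash, "size": len(data),
                  "embedding": await embed(text)}
    return "indexed"
```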

2. index_directory

Index all matching files in a directory recursively.

Parameters:

  • directory (string, required): Path to directory to index
  • patterns (array of strings, optional): File patterns to match (default: ["*.py", "*.js", "*.ts", "*.md", "*.txt"])
  • exclude_patterns (array of strings, optional): Patterns to exclude (default: ["*/node_modules/*", "*/.git/*", "*/venv/*"])

Example:

Claude, index all Python and JavaScript files in my project, excluding the tests directory
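A sketch of the include/exclude filtering for index_directory using Python's fnmatch, with the defaults listed above. It assumes include patterns are matched against the file name and exclude patterns against the full absolute path (the leading */ in the default excludes suggests path matching); should_index and collect_files are hypothetical names. Because fnmatch's * also matches /, a pattern like */node_modules/* excludes files at any depth below a node_modules directory.

```python
from fnmatch import fnmatch
from pathlib import Path
from typing import List, Sequence

# Defaults as documented for index_directory.
DEFAULT_PATTERNS = ["*.py", "*.js", "*.ts", "*.md", "*.txt"]
DEFAULT_EXCLUDES = ["*/node_modules/*", "*/.git/*", "*/venv/*"]


def should_index(path: str, patterns: Sequence[str] = DEFAULT_PATTERNS,
                 exclude_patterns: Sequence[str] = DEFAULT_EXCLUDES) -> bool:
    """Exclude patterns are checked against the whole path, include patterns against the name."""
    p = Path(path)
    if any(fnmatch(p.as_posix(), pat) for pat in exclude_patterns):
        return False
    return any(fnmatch(p.name, pat) for pat in patterns)


def collect_files(directory: str, patterns: Sequence[str] = DEFAULT_PATTERNS,
                  exclude_patterns: Sequence[str] = DEFAULT_EXCLUDES) -> List[Path]:
    # Resolve first so paths are absolute and "*/node_modules/*" can match at the top level.
    root = Path(directory).resolve()
    return sorted(f for f in root.rglob("*")
                  if f.is_file() and should_index(str(f), patterns, exclude_patterns))
```

The "excluding the tests directory" request in the example would translate to adding "*/tests/*" to exclude_patterns.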

3. search

Search for files similar to a query using semantic search.

Parameters:

  • query (string, required): Search query describing what you're looking for
  • top_k (integer, optional): Number of results to return (default: 10)

Example:

Claude, search for files related to "database connection logic"
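The README reports a similarity score per result but does not name the metric. Cosine similarity with a brute-force scan is a common choice for an index of this size, so here is a sketch under that assumption: the query is embedded with the same model as the files, every stored vector is scored, and the top_k highest are kept with heapq.nlargest. cosine_similarity and top_k are hypothetical helper names.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # avoid division by zero for empty or all-zero vectors
    return dot / (norm_a * norm_b)


def top_k(query_vec: Sequence[float], files: Dict[str, dict],
          k: int = 10) -> List[Tuple[str, float]]:
    """Rank indexed files by cosine similarity to the query embedding (brute force)."""
    scored = ((path, cosine_similarity(query_vec, entry["embedding"]))
              for path, entry in files.items())
    return heapq.nlargest(k, scored, key=lambda item: item[1])
```

A linear scan over a few hundred 768-dim vectors is cheap, which fits the sub-100 ms search figure under Performance Considerations; a vector database only pays off at much larger scales.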

4. get_context

Return the full contents of the files most relevant to a query.

Parameters:

  • query (string, required): Query describing what context you need
  • top_k (integer, optional): Number of files to include in context (default: 5)

Example:

Claude, get context for "authentication implementation"
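get_context could be assembled roughly like this: given (relative path, similarity) pairs from a search step, read each file from the project root and join the contents under numbered headers, in the shape of the get_context transcript under Usage Examples. build_context is a hypothetical name, and the two-decimal score formatting is copied from that transcript rather than from the server code.

```python
from pathlib import Path
from typing import List, Tuple


def build_context(results: List[Tuple[str, float]], project_root: Path) -> str:
    """Concatenate full file contents under numbered headers."""
    sections = []
    for i, (rel_path, score) in enumerate(results, start=1):
        try:
            body = (project_root / rel_path).read_text(encoding="utf-8")
        except (OSError, UnicodeDecodeError) as exc:
            # Keep going if a file was deleted or is not valid UTF-8 since indexing.
            body = f"[could not read file: {exc}]"
        sections.append(f"## File {i}: {rel_path} (similarity: {score:.2f})\n{body}")
    return "\n\n".join(sections)
```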

5. list_indexed_files

List all files currently in the index.

Example:

Claude, show me all indexed files

6. get_index_stats

Get statistics about the current index.

Example:

Claude, what are the index statistics?
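One way to derive these statistics from an index shaped like the projectIndex.si format described under Index File Format: count the entries under files, sum their size fields, and read the timestamp and model name. Whether the server recomputes these values or reads the stored metadata block is not stated; index_stats is a hypothetical helper.

```python
from typing import Any, Dict


def index_stats(index: Dict[str, Any]) -> Dict[str, Any]:
    """Summarize an index dict shaped like the projectIndex.si format."""
    files = index.get("files", {})
    return {
        "total_files": len(files),
        "total_size": sum(entry.get("size", 0) for entry in files.values()),  # bytes
        "last_updated": index.get("updated_at"),
        "embedding_model": index.get("metadata", {}).get("embedding_model"),
    }
```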

7. remove_file

Remove a file from the index.

Parameters:

  • file_path (string, required): Path to file to remove from index

Example:

Claude, remove the file /path/to/old/file.py from the index

Index File Format

The projectIndex.si file is stored in JSON format with the following structure:

{
  "version": "1.0",
  "created_at": "2026-01-31T10:00:00Z",
  "updated_at": "2026-01-31T12:30:00Z",
  "project_root": "/path/to/project",
  "metadata": {
    "total_files": 42,
    "total_size": 150000,
    "embedding_model": "ollama:nomic-embed-text"
  },
  "files": {
    "src/main.py": {
      "path": "src/main.py",
      "absolute_path": "/path/to/project/src/main.py",
      "hash": "abc123...",
      "embedding": [0.1, 0.2, ...],
      "indexed_at": "2026-01-31T12:30:00Z",
      "size": 1024,
      "metadata": {
        "extension": ".py",
        "name": "main.py"
      }
    }
  }
}
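A sketch of building one entry under files in the shape shown above. The hash algorithm (SHA-256 here), the UTC timestamp format, and the helper name make_file_entry are assumptions chosen to match the example values; paths are stored relative to the project root with forward slashes, as in src/main.py.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Dict, List


def make_file_entry(project_root: Path, file_path: Path, data: bytes,
                    embedding: List[float]) -> Dict[str, Any]:
    """Build one "files" entry matching the documented structure."""
    abs_path = file_path.resolve()
    rel_path = abs_path.relative_to(project_root.resolve()).as_posix()
    return {
        "path": rel_path,
        "absolute_path": str(abs_path),
        "hash": hashlib.sha256(data).hexdigest(),  # assumption: SHA-256
        "embedding": embedding,
        "indexed_at": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "size": len(data),  # bytes on disk
        "metadata": {"extension": file_path.suffix, "name": file_path.name},
    }
```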

Index Management

  • Atomic Writes: The index is written atomically using a temporary file to prevent corruption
  • Change Detection: Files are only re-indexed if their content hash changes
  • Incremental Updates: You can index new files without affecting existing entries
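The atomic-write pattern usually looks like this: write the JSON to a temporary file in the same directory, flush and fsync it, then rename it over the old index with os.replace, so a crash leaves either the old file or the new one but never a half-written one. This is a generic sketch of the technique, not the server's code; save_index_atomically is a hypothetical name, and os.replace is only guaranteed atomic on POSIX when both paths are on the same filesystem.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Any, Dict


def save_index_atomically(index: Dict[str, Any], index_path: Path) -> None:
    """Write JSON to a temp file next to the index, then swap it into place."""
    index_path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=index_path.parent,
                                    prefix=".projectIndex-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(index, tmp, indent=2)
            tmp.flush()
            os.fsync(tmp.fileno())  # make sure the bytes reach disk before the rename
        os.replace(tmp_name, index_path)  # the old index is replaced in one step
    except BaseException:
        # Remove the partial temp file; the previous index stays untouched.
        if os.path.exists(tmp_name):
            os.remove(tmp_name)
        raise
```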

Usage Examples

Initial Project Indexing

You: Claude, index my entire project directory at /Users/me/myproject

Claude: [Uses index_directory tool]
I've indexed your project. Found 45 files, indexed 42, skipped 3 binary files.

Searching for Relevant Files

You: Find files related to user authentication

Claude: [Uses search tool]
I found these relevant files:
1. src/auth/login.py (similarity: 0.89)
2. src/middleware/auth_check.py (similarity: 0.85)
3. tests/test_auth.py (similarity: 0.78)

Getting Context for Development

You: I need to modify the payment processing logic. Show me the relevant code.

Claude: [Uses get_context tool]
Here's the relevant code from 3 files:

## File 1: src/payments/processor.py (similarity: 0.92)
[Full file contents...]

## File 2: src/payments/validators.py (similarity: 0.87)
[Full file contents...]

Checking Index Status

You: What's the status of the index?

Claude: [Uses get_index_stats tool]
Index Statistics:
- Total files: 42
- Total size: 150 KB
- Last updated: 2026-01-31T12:30:00Z
- Embedding model: ollama:nomic-embed-text

Architecture

Components

  1. EmbeddingProvider: Abstract base class for embedding providers

    • OllamaProvider: Implementation using Ollama API
    • Easy to extend with OpenAI, Cohere, etc.
  2. ProjectIndex: Manages the projectIndex.si file

    • Loading and saving with atomic writes
    • Adding, removing, and querying files
    • Computing statistics
  3. SimpleIndexServer: Main MCP server implementation

    • File reading and hashing
    • Orchestrating indexing operations
    • Semantic search functionality
  4. MCP Integration: Standard Model Context Protocol server

    • Tool registration and handling
    • STDIO transport for communication
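A sketch of what the provider abstraction might look like: an abstract base class with a single async embed method, and an Ollama implementation. The real definitions in simple_index_server.py may differ. This version calls Ollama's /api/embeddings endpoint (body {"model", "prompt"}, response {"embedding": [...]}); newer Ollama releases also offer /api/embed, and which one the server uses is not documented. To stay dependency-free it uses urllib in a worker thread via asyncio.to_thread, whereas the server itself lists aiohttp as a dependency.

```python
import asyncio
import json
import urllib.request
from abc import ABC, abstractmethod
from typing import List


class EmbeddingProvider(ABC):
    """Interface every embedding backend implements."""

    @abstractmethod
    async def embed(self, text: str) -> List[float]:
        """Return the embedding vector for text."""


class OllamaProvider(EmbeddingProvider):
    def __init__(self, model: str = "nomic-embed-text",
                 base_url: str = "http://localhost:11434"):
        self.model = model
        self.base_url = base_url.rstrip("/")

    def _post(self, text: str) -> List[float]:
        # Blocking HTTP POST to Ollama; passing data= makes urllib use POST.
        body = json.dumps({"model": self.model, "prompt": text}).encode("utf-8")
        req = urllib.request.Request(f"{self.base_url}/api/embeddings", data=body,
                                     headers={"Content-Type": "application/json"})
        with urllib.request.urlopen(req, timeout=60) as resp:
            return json.loads(resp.read())["embedding"]

    async def embed(self, text: str) -> List[float]:
        # Run the blocking call in a worker thread so the MCP event loop stays responsive.
        return await asyncio.to_thread(self._post, text)
```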

Adding New Embedding Providers

To add a new provider (e.g., OpenAI):

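from typing import List
import aiohttp

class OpenAIProvider(EmbeddingProvider):
    def __init__(self, api_key: str, model: str = "text-embedding-3-small"):
        self.api_key = api_key
        self.model = model

    async def embed(self, text: str) -> List[float]:
        # POST to OpenAI's embeddings endpoint and return the first vector
        headers = {"Authorization": f"Bearer {self.api_key}"}
        payload = {"model": self.model, "input": text}
        async with aiohttp.ClientSession() as session:
            async with session.post("https://api.openai.com/v1/embeddings", json=payload, headers=headers) as resp:
                resp.raise_for_status()
                return (await resp.json())["data"][0]["embedding"]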

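Then modify the main() function to support the new provider. Different models produce vectors of different sizes (nomic-embed-text: 768 dimensions; text-embedding-3-small: 1536 by default), so re-index the project after switching providers.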

Best Practices

  1. Index Regularly: Run indexing after significant code changes
  2. Use Exclude Patterns: Exclude node_modules, venv, build artifacts
  3. Semantic Queries: Use descriptive queries like "error handling for API requests" rather than just "error"
  4. Monitor Index Size: Large projects may need chunking strategies for very large files
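The server does not chunk files today (chunking is listed under Contributing), so this is a hypothetical sketch of one simple strategy: split a large file into overlapping line windows, each of which could then be embedded separately so a file is scored by its best-matching window. chunk_lines and its default sizes are illustrative, not tuned values.

```python
from typing import List, Tuple


def chunk_lines(text: str, max_lines: int = 200,
                overlap: int = 20) -> List[Tuple[int, int, str]]:
    """Split text into overlapping windows: (start_line, end_line, chunk_text), 1-based inclusive."""
    if max_lines <= 0 or not 0 <= overlap < max_lines:
        raise ValueError("need max_lines > 0 and 0 <= overlap < max_lines")
    lines = text.splitlines(keepends=True)
    chunks = []
    step = max_lines - overlap
    for start in range(0, len(lines), step):
        window = lines[start:start + max_lines]
        chunks.append((start + 1, start + len(window), "".join(window)))
        if start + max_lines >= len(lines):
            break  # this window already reaches the end of the file
    return chunks
```

Overlap keeps a function that straddles a window boundary fully inside at least one chunk, at the cost of embedding some lines twice.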

Troubleshooting

Ollama Connection Issues

# Check if Ollama is running
curl http://localhost:11434/api/version

# Pull the embedding model if not available
ollama pull nomic-embed-text

Index Corruption

If projectIndex.si becomes corrupted, simply delete it and re-index:

rm projectIndex.si
# Then ask Claude to re-index the directory

Logging

The server uses Python's logging module and writes to stderr (MCP requirement). Check your MCP client's logs for debugging information.

Performance Considerations

  • Embedding Generation: ~100-500ms per file depending on size
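  • Index Size: a 768-dim embedding is ~3KB when packed as float32, but stored as JSON text in projectIndex.si each value takes roughly 20 characters at full precision, so expect on the order of 15KB per file plus metadata (less if values are rounded before saving)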
  • Search Speed: <100ms for typical project sizes (hundreds of files)
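To sanity-check the per-file size estimate, this snippet compares a 768-dimensional vector packed as float32 with the same vector serialized as JSON, the format projectIndex.si uses. It assumes full-precision floats written by Python's json module; seeded random values stand in for real embeddings, so exact byte counts will differ slightly from a real index.

```python
import json
import random
import struct

DIM = 768  # nomic-embed-text vector size, per the figure above

rng = random.Random(0)  # seeded so the measurement is reproducible
vector = [rng.uniform(-1.0, 1.0) for _ in range(DIM)]

raw_bytes = len(struct.pack(f"{DIM}f", *vector))  # packed float32: 4 bytes per value
compact_json = len(json.dumps(vector))  # single-line JSON text
indented_json = len(json.dumps({"embedding": vector}, indent=2))  # one value per line

print(f"float32: {raw_bytes} B, JSON: {compact_json} B, indented JSON: {indented_json} B")
```

The packed form is exactly 3,072 bytes, while the JSON text is several times larger, which is why a JSON index grows faster than the raw vector size suggests.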

License

MIT License - Feel free to use and modify as needed.

Contributing

This is a reference implementation. Feel free to fork and extend with:

  • Additional embedding providers
  • Chunking strategies for large files
  • Multi-language support
  • Custom metadata extraction
  • Integration with other tools

Credits

Built with:
