mcp-retrieve

An MCP server that exposes late-interaction document retrieval (ColBERT-style MaxSim) over a local folder. Point it at a directory of text/markdown/code, and any MCP client — Claude Desktop, an IDE agent, your own host — can index it and search it with token-level relevance.

It ships with a deterministic, model-free default embedder, so the server and its full test suite run offline with no model weights, no API key, and no network access. When you want production-grade semantics, drop in a real ColBERT or ColQwen encoder behind a small protocol; the ranking code does not change.

What is MCP?

The Model Context Protocol is an open standard that lets LLM applications connect to external tools and data through a uniform server interface. A host (e.g. Claude Desktop) launches MCP servers and calls the tools they advertise. mcp-retrieve is such a server; it advertises two tools:

| Tool | Purpose |
| --- | --- |
| index_folder(folder) | Read text files under folder, chunk them, embed each chunk into a multi-vector representation, and build an in-memory index. |
| search(query, k=5) | Rank indexed chunks by MaxSim late interaction and return the top k with source path, score, and snippet. |
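
To make the "chunk them" step of index_folder concrete, here is a minimal sketch of one common chunking strategy: fixed-size word windows with a small overlap. The function name `chunk_words` and the default sizes are assumptions for illustration; the chunker in mcp_retrieve.retrieval may split text differently.

```python
from typing import Iterator


def chunk_words(text: str, size: int = 120, overlap: int = 20) -> Iterator[str]:
    """Yield overlapping word windows (hypothetical helper, not the project's API)."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    for start in range(0, len(words), step):
        yield " ".join(words[start:start + size])
        # Stop once a window has reached the end of the text,
        # so no trailing chunk is fully contained in the previous one.
        if start + size >= len(words):
            break


sample = " ".join(f"w{i}" for i in range(10))
chunks = list(chunk_words(sample, size=4, overlap=1))
```

Each chunk would then be embedded into one vector per token and stored alongside its source path, which is what lets search return a path and snippet with every hit.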

The retrieval approach: late interaction (ColBERT)

Most dense retrievers compress a passage into one vector and compare it to one query vector — cheap, but lossy. Late interaction, introduced by ColBERT (Khattab & Zaharia, ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT, SIGIR 2020, arXiv:2004.12832), keeps one vector per token for both the query and the document and defers their interaction to scoring time via the MaxSim operator:

score(q, d) = Σ_i  max_j  sim(q_i, d_j)

Each query token q_i is matched to its single most similar document token d_j, and those per-token maxima are summed. This preserves fine-grained term matching (a query term can find its evidence anywhere in the passage), and because document vectors do not depend on the query, they can be computed once at indexing time. With L2-normalised vectors, the dot product equals cosine similarity, so MaxSim reduces to a matrix product followed by a row-wise max and a sum, which is exactly what mcp_retrieve.retrieval.maxsim computes.
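
As a worked illustration of the formula, here is a minimal pure-Python sketch of MaxSim over small hand-made unit vectors. The name `maxsim_score` and the toy vectors are assumptions for illustration and are not the project's mcp_retrieve.retrieval.maxsim; with NumPy arrays `Q` and `D` of L2-normalised rows, the same computation is `(Q @ D.T).max(axis=1).sum()`.

```python
from typing import Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product; equals cosine similarity when both vectors have unit length."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vecs: Sequence[Vector], doc_vecs: Sequence[Vector]) -> float:
    """Sum, over query tokens, of the best similarity to any document token."""
    if not query_vecs or not doc_vecs:
        return 0.0
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


# Toy 2-d unit vectors standing in for token embeddings.
query = [(1.0, 0.0), (0.0, 1.0)]
doc_a = [(1.0, 0.0), (0.6, 0.8)]   # best matches: 1.0 and 0.8
doc_b = [(0.0, -1.0), (0.8, 0.6)]  # best matches: 0.8 and 0.6

score_a = maxsim_score(query, doc_a)
score_b = maxsim_score(query, doc_b)
```

Here doc_a outranks doc_b because each query token finds a closer document token in it, even though the two matches come from different document tokens.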

Install

pip install -e .          # core: mcp + numpy
pip install -e ".[dev]"   # plus pytest for the test suite

Register with an MCP client

For Claude Desktop, add the server to its mcpServers config (claude_desktop_config.json):

{
  "mcpServers": {
    "mcp-retrieve": {
      "command": "mcp-retrieve"
    }
  }
}

If you installed into a virtual environment, use the absolute path to the mcp-retrieve console script (or "command": "python", "args": ["-m", "mcp_retrieve"]). Restart the client, then ask it to index a folder and search it — the model will call the index_folder and search tools.
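
If you would rather generate that entry than type it, the following sketch builds the mcpServers dictionary with the standard library only. The helper name `build_server_entry` is hypothetical and not part of mcp-retrieve; it uses the console script path when one is given or found on PATH, and otherwise falls back to running the module with the current interpreter.

```python
import json
import shutil
import sys
from typing import Optional


def build_server_entry(script_path: Optional[str] = None) -> dict:
    """Return an mcpServers config dict for mcp-retrieve (hypothetical helper)."""
    path = script_path or shutil.which("mcp-retrieve")
    if path:
        entry = {"command": path}
    else:
        # No console script found: run the package as a module instead.
        entry = {"command": sys.executable, "args": ["-m", "mcp_retrieve"]}
    return {"mcpServers": {"mcp-retrieve": entry}}


if __name__ == "__main__":
    print(json.dumps(build_server_entry(), indent=2))
```

Run it inside the virtual environment where mcp-retrieve is installed, then merge the printed entry into claude_desktop_config.json by hand.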

Usage from Python

from mcp_retrieve import RetrievalIndex

index = RetrievalIndex()                 # deterministic default embedder
index.index_folder("./docs")
for hit in index.search("late interaction maxsim", k=5):
    print(f"{hit.score:.3f}  {hit.chunk.source}  {hit.snippet}")

Plugging in a real late-interaction model

The default HashingEmbedder makes the project run anywhere, but it matches on character n-grams, not meaning. For real semantics, implement the Embedder protocol around a trained encoder and pass it in:

import numpy as np
from mcp_retrieve import RetrievalIndex
from mcp_retrieve.server import create_server

class ColbertEmbedder:
    """Wrap a ColBERT checkpoint as a multi-vector Embedder."""
    def __init__(self, checkpoint: str) -> None:
        from colbert.modeling.checkpoint import Checkpoint
        from colbert.infra import ColBERTConfig
        self._ckpt = Checkpoint(checkpoint, ColBERTConfig())

    @property
    def dim(self) -> int:
        return 128

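    def embed(self, text: str) -> np.ndarray:
        # docFromText returns a torch tensor for the batch; take the single document.
        vecs = self._ckpt.docFromText([text])[0]   # (num_tokens, 128)
        # Move to CPU and cast to float32 first, in case the tensor is on GPU or in half precision.
        return vecs.detach().cpu().float().numpy()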

# Use it from Python …
index = RetrievalIndex(embedder=ColbertEmbedder("colbert-ir/colbertv2.0"))

# … or run the MCP server with it.
server = create_server(embedder=ColbertEmbedder("colbert-ir/colbertv2.0"))
server.run()

Any object exposing dim: int and embed(text) -> ndarray[num_tokens, dim] with L2-normalised rows satisfies the protocol — ColBERT, ColQwen, ColPali, or your own. The ranking and chunking code is encoder-agnostic.
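
To show how little the protocol asks for, here is a self-contained toy embedder that satisfies the same shape contract using only the standard library. Everything in it is an assumption for illustration: `EmbedderLike` and `ToyNgramEmbedder` are hypothetical names, the hashing scheme is not the project's HashingEmbedder, and rows are plain lists of floats where the real protocol returns a NumPy array.

```python
import hashlib
import math
from typing import List, Protocol, runtime_checkable


@runtime_checkable
class EmbedderLike(Protocol):
    """Structural stand-in for the Embedder protocol (lists instead of ndarray)."""

    @property
    def dim(self) -> int: ...

    def embed(self, text: str) -> List[List[float]]: ...


class ToyNgramEmbedder:
    """One L2-normalised vector per whitespace token, from hashed character n-grams."""

    def __init__(self, dim: int = 16, n: int = 3) -> None:
        self._dim = dim
        self._n = n

    @property
    def dim(self) -> int:
        return self._dim

    def _token_vector(self, token: str) -> List[float]:
        vec = [0.0] * self._dim
        padded = f"#{token.lower()}#"
        for i in range(max(1, len(padded) - self._n + 1)):
            gram = padded[i:i + self._n]
            digest = hashlib.sha256(gram.encode("utf-8")).digest()
            bucket = int.from_bytes(digest[:4], "big") % self._dim
            sign = 1.0 if digest[4] % 2 == 0 else -1.0
            vec[bucket] += sign
        norm = math.sqrt(sum(x * x for x in vec))
        if norm == 0.0:
            # Signed contributions cancelled out; fall back to a fixed unit vector.
            vec[0], norm = 1.0, 1.0
        return [x / norm for x in vec]

    def embed(self, text: str) -> List[List[float]]:
        return [self._token_vector(tok) for tok in text.split()]


embedder = ToyNgramEmbedder(dim=16)
rows = embedder.embed("late interaction retrieval")
```

Because the hash is a fixed function of the text, the output is deterministic across runs and machines, which is the property that lets a model-free default run offline.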

Architecture

src/mcp_retrieve/
  embedder.py    # Embedder protocol + deterministic HashingEmbedder default
  retrieval.py   # chunking, MaxSim, RetrievalIndex (pure — no MCP dependency)
  server.py      # FastMCP server exposing index_folder + search (only MCP import)

The retrieval and embedding cores import nothing MCP-related, so they are testable and reusable on their own; the SDK is isolated to server.py and imported lazily.
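
The lazy-import pattern described above can be sketched as follows. This is an illustration under assumptions, not the contents of server.py: the function name `create_server_sketch` is hypothetical, and `index` is any object with the index_folder and search methods shown in the usage section. The FastMCP import and the tool decorator come from the MCP Python SDK.

```python
from typing import Any, Dict, List


def create_server_sketch(index: Any, name: str = "mcp-retrieve") -> Any:
    """Build a FastMCP server around an index object (hypothetical sketch)."""
    # Imported inside the function, so importing this module never requires
    # the mcp package; only calling the function does.
    from mcp.server.fastmcp import FastMCP

    server = FastMCP(name)

    @server.tool()
    def index_folder(folder: str) -> str:
        """Index the text files under `folder`."""
        index.index_folder(folder)
        return f"Indexed {folder}"

    @server.tool()
    def search(query: str, k: int = 5) -> List[Dict[str, Any]]:
        """Return the top-k hits as plain dictionaries."""
        return [
            {
                "source": str(hit.chunk.source),
                "score": float(hit.score),
                "snippet": hit.snippet,
            }
            for hit in index.search(query, k=k)
        ]

    return server
```

With this layout, tests for chunking, embedding, and MaxSim can import the retrieval code on a machine that has no MCP SDK installed.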

Testing

python -m pytest

All retrieval and embedder tests run offline with the default embedder. The end-to-end FastMCP tool test is skipped automatically when the mcp package is not installed.

License

MIT © 2026 Max Baluev
