mcp-docpilot-server
An MCP server that exposes document retrieval as tools (semantic search and source listing) for any LLM, using a vector index built from DocPilot's ingestion pipeline.
An MCP server that exposes document retrieval as tools any LLM provider can
call. It puts a single, stable interface in front of a vector index (built from
DocPilot's ingestion pipeline) so a model never has to know how the documents
are stored or which embedding backend is in use - it just calls docpilot_search.
The server provides the tools and data access; the connected model does the generation. That split is what makes it provider-agnostic: Claude Desktop, or any client that speaks MCP, gets the same retrieval tools.
Tools
| Tool | What it does |
|---|---|
| docpilot_search | Semantic search over the corpus; returns ranked chunks with source and score |
| docpilot_list_sources | Lists indexed source documents with per-source chunk counts |
Both tools are read-only.
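As a rough illustration of what the two tools return, the sketch below mimics them over an in-memory list of chunks. Everything in it is hypothetical: the Chunk type, the word-overlap scoring, the top_k parameter, and the result field names stand in for the real ChromaDB query and are not taken from server.py.

```python
from collections import Counter
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class Chunk:
    source: str  # file the chunk came from
    text: str


def _vector(text: str) -> Counter:
    # Toy bag-of-words vector; the real server uses an embedding model.
    return Counter(text.lower().split())


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def docpilot_search(index: list[Chunk], query: str, top_k: int = 3) -> list[dict]:
    """Return the top_k chunks ranked by similarity, each with source and score."""
    q = _vector(query)
    scored = [
        {"source": c.source, "text": c.text, "score": _cosine(q, _vector(c.text))}
        for c in index
    ]
    return sorted(scored, key=lambda r: r["score"], reverse=True)[:top_k]


def docpilot_list_sources(index: list[Chunk]) -> dict[str, int]:
    """Return each indexed source with its chunk count."""
    return dict(Counter(c.source for c in index))


# Hypothetical three-chunk index, for illustration only.
INDEX = [
    Chunk("setup.md", "create a virtual environment and install requirements"),
    Chunk("setup.md", "build the index with ingest"),
    Chunk("transport.md", "run the server over stdio or http"),
]
```

Neither function mutates the index, which is the property the read-only note refers to.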
How it works
docs/*.md ──ingest.py──> chunk + embed ──> ChromaDB (persistent)
                                              │
                          server.py exposes ──┤── docpilot_search
                          MCP tools over      └── docpilot_list_sources
                          stdio or HTTP
                                              │
            Claude Desktop / any MCP client ──┘ (model calls the tools)
Setup
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
# Build the index from the docs folder (swap in your own .txt/.md files)
python ingest.py ./docs
Embeddings use ChromaDB's local default model, so ingestion and search run with
no API key. To point the server at a hosted embedding provider instead, set a
ChromaDB embedding function in ingest.py and server.py - the rest of the
pipeline is unchanged.
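The chunking step that ingest.py performs before embedding can be pictured as a sliding character window. This is a sketch under assumptions, not the project's code: chunk_text is a hypothetical helper, and its defaults simply mirror DOCPILOT_CHUNK_SIZE and DOCPILOT_CHUNK_OVERLAP from the Configuration table.

```python
import os


def chunk_text(text: str, size: int = 800, overlap: int = 100) -> list[str]:
    """Split text into windows of `size` characters that share `overlap` characters."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this window already reached the end of the text
    return chunks


# Read the documented env vars, falling back to the documented defaults.
CHUNK_SIZE = int(os.environ.get("DOCPILOT_CHUNK_SIZE", "800"))
CHUNK_OVERLAP = int(os.environ.get("DOCPILOT_CHUNK_OVERLAP", "100"))
```

With the defaults, a 2,000-character document yields three chunks starting at offsets 0, 700, and 1400; each chunk repeats the last 100 characters of the one before it, so a sentence cut at a boundary still appears whole in one of them.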
Run
stdio (local clients like Claude Desktop):
python server.py
Streamable HTTP (remote server):
DOCPILOT_TRANSPORT=http python server.py
# serves MCP at http://localhost:8000/mcp
The SDK's Streamable HTTP transport supersedes the older SSE transport; point
HTTP-based MCP clients at the /mcp endpoint.
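One way the DOCPILOT_TRANSPORT switch might be wired is sketched below. resolve_transport is a hypothetical helper, not a function from server.py. The sketch assumes the server is built on the MCP Python SDK's FastMCP class, whose run() method takes a transport name such as "stdio" or "streamable-http".

```python
import os

# Map the README's DOCPILOT_TRANSPORT values to transport names
# (assumed to match what FastMCP.run(transport=...) expects).
_TRANSPORTS = {"stdio": "stdio", "http": "streamable-http"}


def resolve_transport(env=None) -> str:
    """Pick the transport from DOCPILOT_TRANSPORT, defaulting to stdio."""
    env = os.environ if env is None else env
    choice = env.get("DOCPILOT_TRANSPORT", "stdio").strip().lower()
    if choice not in _TRANSPORTS:
        raise ValueError(
            f"DOCPILOT_TRANSPORT must be 'stdio' or 'http', got {choice!r}"
        )
    return _TRANSPORTS[choice]


# In server.py this could feed something like:
#     mcp.run(transport=resolve_transport())
```

Rejecting unknown values up front means a typo in the variable fails at startup instead of silently falling back to stdio.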
Connect to Claude Desktop
Add this to claude_desktop_config.json:
{
"mcpServers": {
"docpilot": {
"command": "python",
"args": ["/absolute/path/to/mcp-docpilot-server/server.py"],
"env": {
"DOCPILOT_CHROMA_PATH": "/absolute/path/to/mcp-docpilot-server/chroma"
}
}
}
}
Or, for an HTTP server, register it with the Claude Code CLI:
claude mcp add --transport http docpilot http://localhost:8000/mcp
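Because the stdio entry needs absolute paths, generating it can be less error-prone than typing it. The sketch below prints the JSON block for a given checkout directory. desktop_entry is a hypothetical helper, and it assumes the chroma store sits inside the repository, as in the example config above.

```python
import json
from pathlib import Path


def desktop_entry(repo_dir: str) -> dict:
    """Build the mcpServers entry with absolute paths for a local checkout."""
    repo = Path(repo_dir).resolve()
    return {
        "mcpServers": {
            "docpilot": {
                "command": "python",
                "args": [str(repo / "server.py")],
                "env": {"DOCPILOT_CHROMA_PATH": str(repo / "chroma")},
            }
        }
    }


if __name__ == "__main__":
    # Merge the printed block into claude_desktop_config.json by hand,
    # keeping any servers that are already listed there.
    print(json.dumps(desktop_entry("."), indent=2))
```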
Configuration
| Env var | Default | Meaning |
|---|---|---|
| DOCPILOT_CHROMA_PATH | ./chroma | Persistent ChromaDB store |
| DOCPILOT_COLLECTION | docpilot | Collection name |
| DOCPILOT_TRANSPORT | stdio | stdio or http |
| DOCPILOT_CHUNK_SIZE | 800 | Characters per chunk (ingest) |
| DOCPILOT_CHUNK_OVERLAP | 100 | Overlap between chunks (ingest) |
| DOCPILOT_EMBEDDINGS | default | default (local ONNX model) or hash (offline, for CI/tests) |
Switching the embedding backend changes the vector space, so re-ingest into a
fresh store when you change it (rm -rf chroma && python ingest.py ./docs). All
backend selection lives in embeddings.py - that one file is the seam for the
embedding lifecycle.
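To make the re-ingest advice concrete: vectors from different backends are not comparable, even for the same text. The sketch below is a hypothetical feature-hashing embedder in the spirit of the hash option. The function name, the dimension, the salt parameter, and the hashing scheme are all assumptions for illustration, not the contents of embeddings.py.

```python
import hashlib
import math


def hash_embed(text: str, dim: int = 64, salt: str = "") -> list[float]:
    """Deterministic bag-of-words embedding: hash each token into one of `dim` buckets."""
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.sha256((salt + token).encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:4], "big") % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    # Unit length, or all zeros when the text has no tokens.
    return [v / norm for v in vec] if norm else vec


def cosine(a: list[float], b: list[float]) -> float:
    # Inputs are already unit length, so the dot product is the cosine similarity.
    return sum(x * y for x, y in zip(a, b))


doc = "run the server over stdio or http"
same_backend = cosine(hash_embed(doc), hash_embed(doc))
# A different salt stands in for "a different backend":
# same text, but the tokens land in unrelated buckets.
other_backend = cosine(hash_embed(doc), hash_embed(doc, salt="v2"))
```

Within one backend the document matches itself exactly, while across the two salted variants the similarity is essentially arbitrary. A store that mixes vectors from two backends would rank on that noise, hence the fresh store.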
Test
pytest -q
The test ingests a tiny corpus and confirms retrieval ranks the expected
document first. CI runs it on every push (.github/workflows/ci.yml).