mcp-ai-workspace
Exposes document retrieval as an MCP tool, enabling LLMs to search a local vector store of markdown documents. Includes a retrieval evaluation harness to measure hit rate and MRR.
Give an LLM a tool that searches your own documents — and measure whether it retrieves the right ones.
A small, from-scratch implementation of the pattern behind every serious "chat with your data" product: an MCP server that exposes document retrieval as a tool, a vector-RAG pipeline underneath it, and an evals harness that scores retrieval quality. No framework magic — ~200 lines you can read in one sitting.
The problem it solves
An LLM on its own can't see your private documents, and when asked about them it will confidently make things up. The fix is retrieval: look up the relevant real passages first, then answer from them, with citations.
The interesting question is how an agent gets access to that retrieval. The answer here is MCP (the Model Context Protocol) — the emerging standard for giving models tools. This repo wires retrieval up as an MCP tool, so any MCP-capable client (Claude Desktop, an agent framework, a self-hosted chat UI) can call it and answer grounded in your documents instead of guessing.
What it does
- Indexes a folder of markdown docs into a local vector store.
- Serves a single MCP tool, search_knowledge_base, that returns the passages most relevant to a question, each with its source file and a similarity score.
- Ships an evals harness that checks, over a set of known question→document pairs, whether retrieval actually surfaces the right source (hit@k + MRR).
The bundled corpus is a fictional company handbook (corpus/), so the whole
thing runs end-to-end with no setup beyond pip install.
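The indexing step starts by cutting each document into passages. As a rough sketch of chunking by character budget (the function name `chunk_text`, the 500-character default, and the paragraph-boundary rule are assumptions for illustration, not necessarily what ingest.py does):

```python
def chunk_text(text: str, max_chars: int = 500) -> list[str]:
    """Split text into passages of at most max_chars, preferring paragraph breaks."""
    chunks: list[str] = []
    current = ""
    for para in text.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        # A paragraph longer than the budget is hard-split into budget-sized pieces.
        while len(para) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(para[:max_chars])
            para = para[max_chars:]
        # Otherwise, pack paragraphs together until the budget would be exceeded.
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    sample = "First paragraph.\n\nSecond paragraph.\n\n" + "x" * 40
    for passage in chunk_text(sample, max_chars=36):
        print(repr(passage))
```

Packing whole paragraphs keeps most passages readable on their own, which matters because each passage is embedded and returned independently.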
Architecture
flowchart LR
subgraph Offline["Indexing (ingest.py)"]
D[corpus/*.md] --> C[chunk into passages]
C --> E1[embed]
E1 --> Q[(Qdrant<br/>vector store)]
end
subgraph Online["Serving (server.py)"]
U[MCP client / LLM] -->|calls tool| T[search_knowledge_base]
T --> E2[embed query]
E2 --> Q
Q -->|top-k passages + sources| T
T -->|grounded context| U
end
EV[evals/run_evals.py] -.->|same retrieval path| Q
The model on the left never talks to the vector store directly. It calls the tool; the tool does the retrieval. That indirection is the whole point of MCP.
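To make the retrieval path concrete, here is a toy version that runs with the standard library only. It is a sketch under loud assumptions: word-count vectors stand in for the real embedding model, an in-memory list stands in for Qdrant, and the passages and file names are hypothetical. Only the shape of the flow (embed the query, score against stored vectors, return the top-k with sources) mirrors the diagram.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(index: list[dict], query: str, top_k: int = 3) -> list[dict]:
    """Return the top_k passages most similar to the query, with source and score."""
    q = embed(query)
    scored = [
        {"text": p["text"], "source": p["source"], "score": cosine(q, p["vector"])}
        for p in index
    ]
    scored.sort(key=lambda r: r["score"], reverse=True)
    return scored[:top_k]


# Hypothetical passages; in the repo these would come from corpus/*.md.
passages = [
    {"text": "Employees accrue 25 days of paid vacation per year.", "source": "vacation.md"},
    {"text": "Expense reports are due by the fifth of each month.", "source": "expenses.md"},
]
# "Indexing": embed every passage once and keep the vector next to it.
index = [{**p, "vector": embed(p["text"])} for p in passages]

# "Serving": the tool embeds the query and returns the best matches.
results = search(index, "How many vacation days do employees get?", top_k=1)
```

The caller only ever sees `search`, which is the role the MCP tool plays for the model.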
How the agent knows what it can do
An MCP tool is defined by three things, and the model reads all three to decide when and how to call it:
| Part | In this repo | What it's for |
|---|---|---|
| name | search_knowledge_base | how the model refers to the tool |
| description | the tool's docstring in server.py | the model reads this to decide when to call it |
| input schema | the typed arguments (query: str, top_k: int) | tells the model how to call it |
That contract — name + description + schema — is the entire interface between the model and your code. Get the description right and the model uses the tool well; that's most of the "prompt engineering" in an agentic system.
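One way to see the contract is to derive it from a plain Python function, which is roughly what decorator-based SDKs such as FastMCP do from the function name, docstring, and type hints. The sketch below uses only the standard library; `describe_tool`, the docstring text, and the `top_k` default of 3 are assumptions for illustration, and the real schema the SDK generates may carry more detail.

```python
import inspect
from typing import get_type_hints

# Minimal mapping from Python types to JSON Schema type names.
JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def describe_tool(fn) -> dict:
    """Build a name + description + input schema record from a function."""
    hints = get_type_hints(fn)
    hints.pop("return", None)  # the return type is not part of the input schema
    params = inspect.signature(fn).parameters
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn) or "",
        "inputSchema": {
            "type": "object",
            "properties": {name: {"type": JSON_TYPES[tp]} for name, tp in hints.items()},
            # Parameters without a default must be supplied by the caller.
            "required": [n for n, p in params.items() if p.default is inspect.Parameter.empty],
        },
    }


def search_knowledge_base(query: str, top_k: int = 3) -> list:
    """Search the company handbook and return the most relevant passages with sources."""
    return []  # retrieval omitted in this sketch


spec = describe_tool(search_knowledge_base)
```

Printing `spec` shows everything the model gets to read, which is why the docstring deserves as much care as the code behind it.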
Run it
make install # create a venv, install deps
make ingest # build the vector index from corpus/
make evals # score retrieval quality
make serve # run the MCP server (stdio)
# or just:
make demo # ingest + evals, end to end
To use it from an MCP client (e.g. Claude Desktop), register the server with the
example in mcp-client-config.example.json (fix the absolute path), restart the
client, and the model gains a search_knowledge_base tool.
Evals
make evals runs evals/evalset.json — questions whose correct source document
is known — and reports:
- hit@k — fraction of questions where the right document is in the top-k
- MRR — mean reciprocal rank, which rewards ranking the right doc first
The run exits non-zero if hit@k falls below the threshold, so it can gate CI.
"I built RAG" is cheap; "I measure RAG, and here's the number" is the point.
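For reference, both metrics fit in a few lines. This is a sketch, not the code in evals/run_evals.py: the function names, the input shape (one ranked list of source files per question), and the 0.8 threshold are assumptions, and the reciprocal rank here is computed within the top-k, with a miss counted as 0.

```python
def score_retrieval(
    ranked_sources: list[list[str]], expected: list[str], k: int = 3
) -> tuple[float, float]:
    """Return (hit@k, MRR) given each question's ranked sources and its correct source."""
    if not expected:
        raise ValueError("evalset is empty")
    hits = 0
    reciprocal_ranks: list[float] = []
    for ranked, target in zip(ranked_sources, expected):
        top = ranked[:k]
        if target in top:
            hits += 1
            # Rank is 1-based: first place scores 1.0, second 0.5, and so on.
            reciprocal_ranks.append(1.0 / (top.index(target) + 1))
        else:
            reciprocal_ranks.append(0.0)
    n = len(expected)
    return hits / n, sum(reciprocal_ranks) / n


def exit_code_for(hit_at_k: float, threshold: float = 0.8) -> int:
    """Process exit code for CI: 0 when the threshold is met, 1 otherwise."""
    return 0 if hit_at_k >= threshold else 1


# Three hypothetical questions: right doc first, right doc second, right doc missing.
ranked = [["a.md", "b.md"], ["b.md", "a.md"], ["c.md", "d.md"]]
expected = ["a.md", "a.md", "z.md"]
hit_at_k, mrr = score_retrieval(ranked, expected, k=2)
print(f"hit@2={hit_at_k:.3f}  MRR={mrr:.3f}  exit={exit_code_for(hit_at_k)}")
```

hit@k only asks whether the right document appears at all, while MRR also penalises ranking it second or third, so the two together separate "found it" from "found it first".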
Stack
| Layer | Choice | Why |
|---|---|---|
| Tool protocol | MCP (mcp SDK, FastMCP) | the standard way to expose tools to an LLM |
| Vector store | Qdrant (local, on-disk) | a real vector DB API with no service to run |
| Embeddings | fastembed (BAAI/bge-small-en-v1.5) | ONNX, CPU-only, no torch, no API key |
Every choice is swappable: a stronger embedding model, a hosted Qdrant, or a synthesis step that calls an LLM to write the final answer from the retrieved passages.
What I'd add next
- An answer tool that calls an LLM to synthesise a cited answer from the retrieved passages (kept out of the core so the repo runs with no API key).
- Chunking by semantics rather than character budget.
- A reranker, and reporting precision/recall per document, not just hit@k.
Acknowledgements
The architecture here — a self-hosted LLM fronted by MCP tools and a vector-RAG layer — follows the pattern I learned from my DevOps professor, Oriol Rius, whose course stack first showed me how these pieces fit together. This repo is my own from-scratch, minimal re-implementation, written to internalise the concepts and demonstrate them honestly in code I wrote myself.
License
MIT — see LICENSE.