mcp-ai-workspace

Exposes document retrieval as an MCP tool, enabling LLMs to search a local vector store of markdown documents. Includes a retrieval evaluation harness to measure hit rate and MRR.

Give an LLM a tool that searches your own documents — and measure whether it retrieves the right ones.

A small, from-scratch implementation of the pattern behind most "chat with your data" products: an MCP server that exposes document retrieval as a tool, a vector-RAG pipeline underneath it, and an evals harness that scores retrieval quality. No framework magic: about 200 lines you can read in one sitting.

The problem it solves

An LLM on its own can't see your private documents, and when asked about them it will confidently make things up. The fix is retrieval: look up the relevant real passages first, then answer from them, with citations.

The interesting question is how an agent gets access to that retrieval. The answer here is MCP (the Model Context Protocol) — the emerging standard for giving models tools. This repo wires retrieval up as an MCP tool, so any MCP-capable client (Claude Desktop, an agent framework, a self-hosted chat UI) can call it and answer grounded in your documents instead of guessing.

What it does

Indexes a folder of markdown docs into a local vector store.
Serves a single MCP tool, search_knowledge_base, that returns the passages most relevant to a question, each with its source file and a similarity score.
Ships an evals harness that checks, over a set of known question→document pairs, whether retrieval actually surfaces the right source (hit@k + MRR).

The bundled corpus is a fictional company handbook (corpus/), so the whole thing runs end-to-end with no setup beyond pip install.

Architecture

flowchart LR
    subgraph Offline["Indexing (ingest.py)"]
        D[corpus/*.md] --> C[chunk into passages]
        C --> E1[embed]
        E1 --> Q[(Qdrant<br/>vector store)]
    end

    subgraph Online["Serving (server.py)"]
        U[MCP client / LLM] -->|calls tool| T[search_knowledge_base]
        T --> E2[embed query]
        E2 --> Q
        Q -->|top-k passages + sources| T
        T -->|grounded context| U
    end

    EV[evals/run_evals.py] -.->|same retrieval path| Q

The model on the left never talks to the vector store directly. It calls the tool; the tool does the retrieval. That indirection is the whole point of MCP.
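The two paths in the diagram can be sketched in plain Python. This is a toy illustration, not the code in ingest.py or server.py: a bag-of-words counter stands in for the fastembed model, a Python list stands in for Qdrant, and the names `chunk`, `embed`, `build_index`, and `search`, the 200-character budget, and the sample documents are all assumptions made for the example.

```python
import math
import re
from collections import Counter

def chunk(text, budget=200):
    """Group paragraphs into passages of roughly `budget` characters.
    A single paragraph longer than the budget is kept whole."""
    passages, current = [], ""
    for para in text.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        if current and len(current) + len(para) + 1 > budget:
            passages.append(current)
            current = para
        else:
            current = f"{current}\n{para}" if current else para
    if current:
        passages.append(current)
    return passages

def embed(text):
    """Toy embedding: word counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def build_index(docs):
    """Offline path: chunk each document and store (source, passage, vector)."""
    return [(src, p, embed(p)) for src, text in docs.items() for p in chunk(text)]

def search(index, query, top_k=3):
    """Online path: embed the query and return the top-k passages with sources and scores."""
    qv = embed(query)
    scored = [{"source": s, "text": p, "score": cosine(qv, v)} for s, p, v in index]
    return sorted(scored, key=lambda r: r["score"], reverse=True)[:top_k]

# Hypothetical two-document corpus.
docs = {
    "vacation.md": "Employees accrue vacation days monthly.\n\nUnused vacation days roll over once.",
    "expenses.md": "Submit expense reports within 30 days.\n\nReceipts are required for every expense.",
}
index = build_index(docs)
results = search(index, "how many vacation days do I get", top_k=2)
for r in results:
    print(r["source"], round(r["score"], 3))
```

The tool would hand a list like `results` back to the client; the model sees only that list, never the index itself.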

How the agent knows what it can do

An MCP tool is defined by three things, and the model reads all three to decide when and how to call it:

| Part | In this repo | What it's for |
| --- | --- | --- |
| name | `search_knowledge_base` | how the model refers to the tool |
| description | the tool's docstring in `server.py` | the model reads this to decide when to call it |
| input schema | the typed arguments (`query: str`, `top_k: int`) | tells the model how to call it |

That contract — name + description + schema — is the entire interface between the model and your code. Get the description right and the model uses the tool well; that's most of the "prompt engineering" in an agentic system.
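To make the contract concrete, here is a rough sketch of how a name, description, and input schema can be derived from an ordinary Python function. FastMCP builds tool definitions from a function's signature and docstring in a broadly similar spirit, but this is a standard-library illustration, not FastMCP's implementation. The helper `describe_tool`, the docstring wording, and the default `top_k=5` are assumptions made for the example; the output keys follow the `name`, `description`, and `inputSchema` fields of an MCP tool definition.

```python
import inspect
from typing import get_type_hints

# Illustrative subset of Python annotations mapped to JSON Schema type names.
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}

def describe_tool(fn):
    """Build an MCP-style tool definition from a plain function (hypothetical helper)."""
    hints = get_type_hints(fn)
    properties, required = {}, []
    for name, param in inspect.signature(fn).parameters.items():
        properties[name] = {"type": _JSON_TYPES.get(hints.get(name), "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)  # no default means the caller must supply it
        else:
            properties[name]["default"] = param.default
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn) or "",
        "inputSchema": {"type": "object", "properties": properties, "required": required},
    }

def search_knowledge_base(query: str, top_k: int = 5) -> list:
    """Search the company handbook and return the most relevant passages.

    Use this whenever a question concerns company policy or procedures.
    """
    return []  # placeholder body; the real tool would run the vector search

tool = describe_tool(search_knowledge_base)
print(tool["name"])
print(tool["inputSchema"])
```

The second sentence of the docstring is the part doing the "prompt engineering": it tells the model when the tool applies, not just what it does.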

Run it

make install      # create a venv, install deps
make ingest       # build the vector index from corpus/
make evals        # score retrieval quality
make serve        # run the MCP server (stdio)
# or just:
make demo         # ingest + evals, end to end

To use it from an MCP client (e.g. Claude Desktop), register the server using the example in mcp-client-config.example.json, replacing the absolute path with the one on your machine, then restart the client. The model then has a search_knowledge_base tool available.
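For orientation, Claude Desktop style clients read server registrations from a JSON object keyed by `mcpServers`. The snippet below prints an entry of that general shape. It is a guess at what such a config might look like, not the contents of mcp-client-config.example.json: the server key `ai-workspace`, the `python` command, and the path are placeholders.

```python
import json

# Hypothetical absolute path; replace with where the repo lives on your machine.
SERVER_PATH = "/absolute/path/to/mcp-ai-workspace/server.py"

config = {
    "mcpServers": {
        "ai-workspace": {            # hypothetical server name shown in the client
            "command": "python",     # assumption: may need to be the venv's interpreter
            "args": [SERVER_PATH],   # the client launches the server over stdio
        }
    }
}

print(json.dumps(config, indent=2))
```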

Evals

make evals runs evals/evalset.json — questions whose correct source document is known — and reports:

hit@k: the fraction of questions where the right document appears in the top-k results
MRR: mean reciprocal rank, the average of 1/rank of the right document across questions, so ranking it first scores highest

The run exits non-zero if hit@k falls below the threshold, so it can gate CI. "I built RAG" is cheap; "I measure RAG, and here's the number" is the point.
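Both metrics and the CI gate fit in a few lines. The sketch below is not the code in evals/run_evals.py: the sample cases, the 0.7 threshold, and the choice to count a document ranked outside the top-k as contributing 0 to MRR are assumptions made for the example.

```python
def first_hit_rank(retrieved, expected, k):
    """1-based rank of the expected source within the top-k, or None if absent."""
    for rank, source in enumerate(retrieved[:k], start=1):
        if source == expected:
            return rank
    return None

def score(cases, k=3):
    """cases: list of (retrieved_sources, expected_source). Returns (hit@k, MRR)."""
    ranks = [first_hit_rank(retrieved, expected, k) for retrieved, expected in cases]
    hit_at_k = sum(r is not None for r in ranks) / len(ranks)
    mrr = sum(1.0 / r for r in ranks if r is not None) / len(ranks)  # a miss adds 0
    return hit_at_k, mrr

# Hypothetical retrieval results: ranked source files plus the known right answer.
cases = [
    (["vacation.md", "expenses.md", "security.md"], "vacation.md"),      # rank 1
    (["expenses.md", "security.md", "vacation.md"], "security.md"),      # rank 2
    (["vacation.md", "expenses.md", "onboarding.md"], "security.md"),    # miss
    (["onboarding.md", "vacation.md", "expenses.md"], "onboarding.md"),  # rank 1
]

hit_at_k, mrr = score(cases, k=3)
THRESHOLD = 0.7  # hypothetical gate value
exit_code = 0 if hit_at_k >= THRESHOLD else 1  # a runner would pass this to sys.exit

print(f"hit@3={hit_at_k:.2f} MRR={mrr:.3f} exit={exit_code}")
```

Here three of four questions hit (hit@3 = 0.75) and the reciprocal ranks 1, 1/2, 0, 1 average to an MRR of 0.625, so the gate passes.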

Stack

| Layer | Choice | Why |
| --- | --- | --- |
| Tool protocol | MCP (`mcp` SDK, FastMCP) | the standard way to expose tools to an LLM |
| Vector store | Qdrant (local, on-disk) | a real vector DB API with no service to run |
| Embeddings | fastembed (`BAAI/bge-small-en-v1.5`) | ONNX, CPU-only, no torch, no API key |

Every choice is swappable, for example for a stronger embedding model or a hosted Qdrant, and a synthesis step that calls an LLM to write the final answer from the retrieved passages can be added on top.

What I'd add next

An answer tool that calls an LLM to synthesise a cited answer from the retrieved passages (kept out of the core so the repo runs with no API key).
Chunking by semantics rather than character budget.
A reranker, and reporting precision/recall per document, not just hit@k.

Acknowledgements

The architecture here — a self-hosted LLM fronted by MCP tools and a vector-RAG layer — follows the pattern I learned from my DevOps professor, Oriol Rius, whose course stack first showed me how these pieces fit together. This repo is my own from-scratch, minimal re-implementation, written to internalise the concepts and demonstrate them honestly in code I wrote myself.

License

MIT — see LICENSE.
