Knowledge Assistant MCP Server
A multi-agent RAG MCP server that answers questions from your documents with a human-in-the-loop approval step, using coordinator, retriever, and synthesizer agents.
A multi-agent RAG (Retrieval-Augmented Generation) MCP server built with FastMCP in Python. It answers questions from your documents using coordinator, retriever, and synthesizer agents, and includes a human-in-the-loop step where you approve the proposed answer or request edits before it is finalized.
What it does
- Query your knowledge base: Ask questions in natural language; the server retrieves relevant chunks and proposes an answer with citations.
- Multi-agent pipeline: A coordinator decides whether to use the knowledge base, a retriever (RAG) fetches relevant documents, and a synthesizer produces a structured answer proposal.
- Human-in-the-loop: You review the proposed answer and either approve it or request edits before the answer is finalized.
- Add documents: Ingest text into the vector store (ChromaDB) so the assistant can answer from your own content.
Use cases: Internal knowledge assistant, FAQ over your docs, Q&A over notes or wikis, and similar RAG workflows that require a human approval step.
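The coordinator → retriever → synthesizer flow described above can be sketched in plain Python. This is a minimal illustration, not the code in `src/app/orchestrator.py`: the dataclass fields, the greeting heuristic in `coordinator`, and `demo_retriever` are all assumptions made for the example, and the real synthesizer would call an LLM rather than join chunk texts.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class RetrievedChunk:
    # Field names are hypothetical; the project's Pydantic model may differ.
    text: str
    source: str


@dataclass
class AnswerProposal:
    # Field names are hypothetical; the project's Pydantic model may differ.
    answer: str
    citations: list[str] = field(default_factory=list)
    used_knowledge_base: bool = False


def coordinator(question: str) -> bool:
    """Decide whether to consult the knowledge base (toy rule: skip greetings)."""
    return question.strip().lower() not in {"hi", "hello", "thanks"}


def synthesizer(question: str, chunks: list[RetrievedChunk]) -> AnswerProposal:
    """Build a proposal from retrieved chunks, citing each distinct source."""
    if not chunks:
        return AnswerProposal(answer="No supporting documents found.")
    answer = " ".join(chunk.text for chunk in chunks)
    citations = sorted({chunk.source for chunk in chunks})
    return AnswerProposal(answer=answer, citations=citations, used_knowledge_base=True)


def run_pipeline(
    question: str,
    retriever: Callable[[str], list[RetrievedChunk]],
) -> AnswerProposal:
    """Run coordinator, then retriever, then synthesizer."""
    if not coordinator(question):
        return AnswerProposal(answer="No knowledge base lookup needed.")
    return synthesizer(question, retriever(question))


def demo_retriever(question: str) -> list[RetrievedChunk]:
    """Stand-in for the RAG retriever: always returns one fixed chunk."""
    return [RetrievedChunk(text="Chunks are stored in ChromaDB.", source="README")]


proposal = run_pipeline("Where are chunks stored?", demo_retriever)
print(proposal)
```

The proposal is returned rather than treated as final, which is what leaves room for the approval step.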
Project Structure
knowledge-assistant-mcp/
├── src/
│ ├── server.py # FastMCP app entry point
│ ├── config/
│ │ └── settings.py # pydantic-settings (server name, API keys, model, RAG settings)
│ ├── routers/
│ │ ├── tools.py # Register MCP tools
│ │ ├── resources.py # Register MCP resources
│ │ └── prompts.py # Register MCP prompts
│ ├── tools/ # Tool implementations
│ ├── resources/ # Resource implementations
│ ├── prompts/ # Prompt content (workflow with human-in-the-loop)
│ ├── app/ # Core logic: RAG, LLM, orchestrator (coordinator/retriever/synthesizer)
│ ├── models/ # Pydantic schemas (structured outputs)
│ └── utils/ # Helpers (e.g. Opik)
├── pyproject.toml
├── .env.sample
├── Dockerfile
└── README.md
Setup
Prerequisites: Python 3.13, uv.
Clone the repository
git clone https://github.com/YOUR_USERNAME/knowledge-assistant-mcp.git
cd knowledge-assistant-mcp
Install dependencies with uv
uv sync
This creates a virtual environment (Python 3.13) and installs dependencies from pyproject.toml.
Configure environment variables
cp .env.sample .env
Edit .env and set at least:
- `GOOGLE_API_KEY` (required): Used for Gemini (LLM and embeddings). Get it from Google AI Studio.

Optional:
- `OPIK_API_KEY`: For observability (tracing). Get it from Opik.
- `OPIK_PROJECT_NAME`: Opik project name (default: `knowledge-assistant`).
- `MODEL_NAME`: Gemini model (default: `gemini-2.0-flash`).
- `CHROMA_PERSIST_DIR`: Directory for ChromaDB (default: `./chroma_data`).
- `CHROMA_COLLECTION`: Collection name (default: `knowledge_base`).
- `RAG_TOP_K`: Number of chunks to retrieve (default: `5`).
- `EMBEDDING_MODEL`: Google embedding model for RAG (default: `models/gemini-embedding-001`). Override if your API uses a different model.
Run the server
Stdio (for Cursor / Claude Desktop):
uv run python -m src.server --transport stdio
HTTP:
uv run python -m src.server --transport http --port 8000
Or use the entry point:
uv run knowledge-assistant-mcp --transport stdio
You should see the FastMCP banner and the process waiting for connections; stop with Ctrl+C.
Environment variables
Variables you can set in .env, and where to get API keys:
Environment variables summary
| Variable | Required | Description |
|---|---|---|
| `GOOGLE_API_KEY` | Yes | Google AI (Gemini) API key (from Google AI Studio) |
| `OPIK_API_KEY` | No | Opik API key for observability (from Opik) |
| `OPIK_PROJECT_NAME` | No | Opik project name (default: `knowledge-assistant`) |
| `MODEL_NAME` | No | Gemini model (default: `gemini-2.0-flash`) |
| `CHROMA_PERSIST_DIR` | No | ChromaDB persistence directory (default: `./chroma_data`) |
| `CHROMA_COLLECTION` | No | ChromaDB collection name (default: `knowledge_base`) |
| `RAG_TOP_K` | No | Number of chunks to retrieve (default: `5`) |
| `EMBEDDING_MODEL` | No | Google embedding model for RAG (default: `models/gemini-embedding-001`) |
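To make the defaults concrete, here is a rough sketch of how these variables could resolve at startup. The project tree indicates that `src/config/settings.py` uses pydantic-settings; this example swaps in a standard-library dataclass so it runs without dependencies. The `Settings` field names and the `load_settings` helper are hypothetical.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    # Hypothetical field names; defaults mirror the documented ones.
    google_api_key: str
    model_name: str
    chroma_persist_dir: str
    chroma_collection: str
    rag_top_k: int
    embedding_model: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from a mapping (os.environ by default), applying defaults."""
    env = os.environ if env is None else env
    api_key = env.get("GOOGLE_API_KEY", "")
    if not api_key:
        # GOOGLE_API_KEY is the only required variable.
        raise ValueError("GOOGLE_API_KEY is required")
    return Settings(
        google_api_key=api_key,
        model_name=env.get("MODEL_NAME", "gemini-2.0-flash"),
        chroma_persist_dir=env.get("CHROMA_PERSIST_DIR", "./chroma_data"),
        chroma_collection=env.get("CHROMA_COLLECTION", "knowledge_base"),
        rag_top_k=int(env.get("RAG_TOP_K", "5")),
        embedding_model=env.get("EMBEDDING_MODEL", "models/gemini-embedding-001"),
    )


# Override one optional value and let the rest fall back to defaults.
settings = load_settings({"GOOGLE_API_KEY": "demo-key", "RAG_TOP_K": "3"})
print(settings.model_name, settings.rag_top_k)
```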
Connecting from Cursor (or another MCP client)
Add this to your Cursor MCP settings (e.g. .cursor/mcp.json), replacing the path and API key as needed:
{
  "mcpServers": {
    "knowledge-assistant": {
      "command": "uv",
      "args": [
        "--directory",
        "/absolute/path/to/knowledge-assistant-mcp",
        "run",
        "python",
        "-m",
        "src.server",
        "--transport",
        "stdio"
      ],
      "env": {
        "GOOGLE_API_KEY": "your-google-api-key-here"
      }
    }
  }
}
Alternatively, rely on a .env file in the project directory and omit the env block, or set only ENV_FILE_PATH if your client supports it.
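If you would rather generate the client entry than edit it by hand, a small script along these lines could produce the same JSON. The `build_mcp_config` helper is hypothetical and simply mirrors the structure shown above; where you write the output (for example `.cursor/mcp.json`) depends on your client.

```python
import json


def build_mcp_config(project_dir: str, google_api_key: str) -> dict:
    """Return an MCP client entry that launches the server over stdio via uv."""
    return {
        "mcpServers": {
            "knowledge-assistant": {
                "command": "uv",
                "args": [
                    "--directory", project_dir,
                    "run", "python", "-m", "src.server",
                    "--transport", "stdio",
                ],
                "env": {"GOOGLE_API_KEY": google_api_key},
            }
        }
    }


config = build_mcp_config(
    "/absolute/path/to/knowledge-assistant-mcp",
    "your-google-api-key-here",
)
print(json.dumps(config, indent=2))
```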
How to use
Once the server is running and connected (e.g. in Cursor):
- Add documents (optional, but needed for RAG answers)
  Use the `add_documents` tool: pass `text` (the content to ingest) and optionally `source` (e.g. "Context Engineering Book"). The server chunks and embeds the text into ChromaDB. You can add more documents anytime.
- Ask a question
  Use the `query_knowledge_base` tool with your question. The server runs the multi-agent pipeline (coordinator → retriever → synthesizer) and returns a proposed answer with citations.
- Human-in-the-loop
  Review the proposal, then call `approve_or_edit_answer`:
  - To accept: `approved=True`, same `proposal_answer` as returned.
  - To request changes: `approved=False`, same `proposal_answer`, and set `user_feedback` to your requested edits. The server can then produce a revised answer.
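The approve-or-edit decision in the last step can be sketched as a small function. The parameter names `proposal_answer`, `approved`, and `user_feedback` come from the tool description above; the `ReviewResult` type, the status strings, and the rule that feedback must be non-empty on rejection are assumptions for illustration, not the server's actual behavior.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReviewResult:
    # Hypothetical return shape for the review step.
    status: str  # "final" or "needs_revision"
    answer: str
    feedback: Optional[str] = None


def review_proposal(
    proposal_answer: str,
    approved: bool,
    user_feedback: Optional[str] = None,
) -> ReviewResult:
    """Finalize an approved proposal, or flag it for revision with feedback."""
    if approved:
        return ReviewResult(status="final", answer=proposal_answer)
    if not user_feedback:
        # Assumption: a rejection without feedback gives nothing to revise against.
        raise ValueError("user_feedback is required when approved is False")
    return ReviewResult(
        status="needs_revision",
        answer=proposal_answer,
        feedback=user_feedback,
    )


accepted = review_proposal("Chunks are stored in ChromaDB.", approved=True)
rejected = review_proposal(
    "Chunks are stored in ChromaDB.",
    approved=False,
    user_feedback="Mention the persistence directory.",
)
print(accepted.status, rejected.status)
```

In both calls the same `proposal_answer` is passed back unchanged, so the server can tell which proposal the decision refers to.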
You can also use search_knowledge_base to only search the vector store (no generated answer), and the knowledge_assistant_workflow prompt as a step-by-step guide. The resource knowledge-assistant://server_info exposes server metadata and RAG settings.
Features
Core:
FastMCP server (src/server.py) with tools (query_knowledge_base, approve_or_edit_answer, add_documents, search_knowledge_base), one workflow prompt (knowledge_assistant_workflow) with a human-in-the-loop step (review proposal → approve or edit via approve_or_edit_answer), uv-based setup, and the structure above. No API keys in the repo; .env.sample and .gitignore are included.
Additional:
- Multi-agent orchestration – Coordinator, retriever (RAG), and synthesizer agents in `src/app/orchestrator.py`.
- RAG with vector database – ChromaDB + LangChain + Google embeddings; `search_knowledge_base` and `add_documents`; persistence via `CHROMA_PERSIST_DIR`.
- MCP resource – `knowledge-assistant://server_info` exposes server name, version, collection, and RAG settings.
- Human-in-the-loop validation – Workflow returns a proposal; the user approves or requests edits with `approve_or_edit_answer` before finalizing.
- Structured outputs – Pydantic models (`AnswerProposal`, `SearchResult`, `RetrievedChunk`, `SynthesisResult`) for synthesizer and API responses.
- Observability (Opik) – Optional tracing when `OPIK_API_KEY` is set.
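To show what chunking and top-k retrieval involve, the sketch below splits text into overlapping word windows and ranks them against a query. It is a dependency-free stand-in: the server uses ChromaDB with Google embeddings, whereas this example substitutes a bag-of-words vector and cosine similarity. The function names, chunk size, and overlap are illustrative assumptions.

```python
import math
import re
from collections import Counter


def chunk_words(text: str, chunk_size: int = 8, overlap: int = 2) -> list[str]:
    """Split text into windows of chunk_size words that share overlap words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


def embed(text: str) -> Counter:
    """Toy embedding: lowercase token counts (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def top_k(query: str, chunks: list[str], k: int = 5) -> list[tuple[float, str]]:
    """Return the k chunks most similar to the query, best first."""
    query_vec = embed(query)
    scored = [(cosine(query_vec, embed(chunk)), chunk) for chunk in chunks]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


document = (
    "ChromaDB stores embedded chunks on disk. The retriever ranks chunks by "
    "similarity to the question. The synthesizer writes an answer with citations."
)
chunks = chunk_words(document, chunk_size=8, overlap=2)
results = top_k("Where does ChromaDB store chunks on disk?", chunks, k=2)
for score, chunk in results:
    print(f"{score:.2f}  {chunk}")
```

The `k` argument plays the role of `RAG_TOP_K`: a larger value gives the synthesizer more context at the cost of more loosely related chunks.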
Docker
Build and run with Docker:
docker build -t knowledge-assistant-mcp .
docker run --rm -e GOOGLE_API_KEY=your-key -v $(pwd)/chroma_data:/app/chroma_data knowledge-assistant-mcp --transport stdio
For HTTP on port 8000:
docker run --rm -p 8000:8000 -e GOOGLE_API_KEY=your-key -v $(pwd)/chroma_data:/app/chroma_data knowledge-assistant-mcp --transport http --port 8000
License
MIT (or your chosen license).