Knowledge Assistant MCP Server

A multi-agent RAG MCP server that answers questions from your documents using coordinator, retriever, and synthesizer agents, with a human-in-the-loop approval step.

A multi-agent RAG (Retrieval-Augmented Generation) MCP server built with FastMCP in Python. It answers questions from your documents using coordinator, retriever, and synthesizer agents, and includes a human-in-the-loop step in which you approve an answer or request edits before it is finalized.

What it does

Query your knowledge base: Ask questions in natural language; the server retrieves relevant chunks and proposes an answer with citations.
Multi-agent pipeline: A coordinator decides whether to use the knowledge base, a retriever (RAG) fetches relevant documents, and a synthesizer produces a structured answer proposal.
Human-in-the-loop: You review the proposed answer and either approve it or request edits before the answer is finalized.
Add documents: Ingest text into the vector store (ChromaDB) so the assistant can answer from your own content.

Use cases: Internal knowledge assistant, FAQ over your docs, Q&A over notes or wikis, and similar RAG workflows that require a human approval step.
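The coordinator, retriever, and synthesizer flow described above can be sketched in plain Python. This is a minimal illustration, not the server's implementation: the function names, the Proposal dataclass, the in-memory KNOWLEDGE_BASE, and the word-overlap scoring are all assumptions that stand in for the LLM calls and the ChromaDB retrieval.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Proposal:
    """Hypothetical proposal shape: an answer plus the sources it cites."""
    answer: str
    citations: list = field(default_factory=list)


# Toy in-memory knowledge base (illustrative data, not from the project).
KNOWLEDGE_BASE = {
    "handbook.md": "Vacation requests must be filed two weeks in advance.",
    "faq.md": "The office is closed on public holidays.",
}


def tokens(text):
    """Lowercase words longer than three characters (a crude stop-word filter)."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if len(w) > 3}


def coordinator(question):
    """Decide whether to consult the knowledge base.

    Assumption: any shared word counts; the real coordinator is an LLM agent.
    """
    q = tokens(question)
    return any(q & tokens(text) for text in KNOWLEDGE_BASE.values())


def retriever(question, top_k=5):
    """Rank sources by word overlap; a stand-in for vector search."""
    q = tokens(question)
    scored = [(len(q & tokens(text)), source, text)
              for source, text in KNOWLEDGE_BASE.items()]
    hits = sorted((s for s in scored if s[0] > 0), key=lambda s: s[0], reverse=True)
    return [(source, text) for _, source, text in hits[:top_k]]


def synthesizer(question, chunks):
    """Build a proposal that cites every retrieved source."""
    answer = " ".join(text for _, text in chunks)
    return Proposal(answer=answer, citations=[source for source, _ in chunks])


def answer_question(question):
    """Run coordinator -> retriever -> synthesizer and return a proposal."""
    if not coordinator(question):
        return Proposal(answer="No relevant documents found.")
    return synthesizer(question, retriever(question))


if __name__ == "__main__":
    print(answer_question("How early must vacation requests be filed?"))
```

The proposal returned here is only a draft; in the server, it still has to pass the human approval step before it becomes a final answer.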

Project Structure

knowledge-assistant-mcp/
├── src/
│   ├── server.py           # FastMCP app entry point
│   ├── config/
│   │   └── settings.py     # pydantic-settings (server name, API keys, model, RAG settings)
│   ├── routers/
│   │   ├── tools.py        # Register MCP tools
│   │   ├── resources.py    # Register MCP resources
│   │   └── prompts.py      # Register MCP prompts
│   ├── tools/              # Tool implementations
│   ├── resources/          # Resource implementations
│   ├── prompts/            # Prompt content (workflow with human-in-the-loop)
│   ├── app/                # Core logic: RAG, LLM, orchestrator (coordinator/retriever/synthesizer)
│   ├── models/             # Pydantic schemas (structured outputs)
│   └── utils/              # Helpers (e.g. Opik)
├── pyproject.toml
├── .env.sample
├── Dockerfile
└── README.md

Setup

Prerequisites: Python 3.13, uv.

Clone the repository

git clone https://github.com/YOUR_USERNAME/knowledge-assistant-mcp.git
cd knowledge-assistant-mcp

Install dependencies with `uv`

uv sync

This creates a virtual environment (Python 3.13) and installs dependencies from pyproject.toml.

Configure environment variables

cp .env.sample .env

Edit .env and set at least:

GOOGLE_API_KEY (required): Used for Gemini (LLM and embeddings).
Get it from Google AI Studio.

Optional:

OPIK_API_KEY: For observability (tracing). Get it from Opik.
OPIK_PROJECT_NAME: Opik project name (default: knowledge-assistant).
MODEL_NAME: Gemini model (default: gemini-2.0-flash).
CHROMA_PERSIST_DIR: Directory for ChromaDB (default: ./chroma_data).
CHROMA_COLLECTION: Collection name (default: knowledge_base).
RAG_TOP_K: Number of chunks to retrieve (default: 5).
EMBEDDING_MODEL: Google embedding model for RAG (default: models/gemini-embedding-001). Override if your API uses a different model.
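To make the defaults above concrete, here is a minimal standard-library sketch of loading these variables. The project itself uses pydantic-settings in src/config/settings.py; the Settings dataclass, its field names, and load_settings below are hypothetical stand-ins, and only the variable names and default values come from the list above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical settings container; field names are assumptions."""
    google_api_key: str
    opik_api_key: Optional[str]
    opik_project_name: str
    model_name: str
    chroma_persist_dir: str
    chroma_collection: str
    rag_top_k: int
    embedding_model: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from a mapping (os.environ by default), applying defaults."""
    env = os.environ if env is None else env
    api_key = env.get("GOOGLE_API_KEY", "")
    if not api_key:
        # GOOGLE_API_KEY is the only required variable.
        raise RuntimeError("GOOGLE_API_KEY is required")
    return Settings(
        google_api_key=api_key,
        opik_api_key=env.get("OPIK_API_KEY") or None,
        opik_project_name=env.get("OPIK_PROJECT_NAME", "knowledge-assistant"),
        model_name=env.get("MODEL_NAME", "gemini-2.0-flash"),
        chroma_persist_dir=env.get("CHROMA_PERSIST_DIR", "./chroma_data"),
        chroma_collection=env.get("CHROMA_COLLECTION", "knowledge_base"),
        rag_top_k=int(env.get("RAG_TOP_K", "5")),
        embedding_model=env.get("EMBEDDING_MODEL", "models/gemini-embedding-001"),
    )


if __name__ == "__main__":
    print(load_settings({"GOOGLE_API_KEY": "example-key", "RAG_TOP_K": "3"}))
```

Passing a plain dict instead of os.environ keeps the function easy to test without touching the real environment.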

Run the server

Stdio (for Cursor / Claude Desktop):

uv run python -m src.server --transport stdio

HTTP:

uv run python -m src.server --transport http --port 8000

Or use the entry point:

uv run knowledge-assistant-mcp --transport stdio

You should see the FastMCP banner and the process waiting for connections; stop with Ctrl+C.
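The command-line flags used above (--transport and --port) could be parsed along these lines. This is a sketch only: the parser layout, the default values, and the function names are assumptions, and the actual entry point in src/server.py may handle arguments differently.

```python
import argparse


def build_parser():
    """Build a parser for the two flags shown in the run commands."""
    parser = argparse.ArgumentParser(prog="knowledge-assistant-mcp")
    parser.add_argument(
        "--transport",
        choices=["stdio", "http"],
        default="stdio",  # assumed default; the README always passes it explicitly
        help="How MCP clients connect to the server",
    )
    parser.add_argument(
        "--port",
        type=int,
        default=8000,  # assumed default; only meaningful for the http transport
        help="Port to listen on when --transport http is used",
    )
    return parser


def parse_args(argv):
    """Parse an explicit argument list (pass sys.argv[1:] in a real entry point)."""
    return build_parser().parse_args(argv)


if __name__ == "__main__":
    import sys

    args = parse_args(sys.argv[1:])
    print(f"transport={args.transport} port={args.port}")
```

Restricting --transport with choices means a typo such as `--transport htpp` fails immediately with a usage message instead of starting the server in an unexpected mode.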

Environment variables

Variables you can set in .env, and where to get API keys:

Environment variables summary

| Variable | Required | Description |
| --- | --- | --- |
| `GOOGLE_API_KEY` | Yes | Google AI (Gemini) API key; get it from Google AI Studio |
| `OPIK_API_KEY` | No | Opik API key for observability; get it from Opik |
| `OPIK_PROJECT_NAME` | No | Opik project name (default: `knowledge-assistant`) |
| `MODEL_NAME` | No | Gemini model (default: `gemini-2.0-flash`) |
| `CHROMA_PERSIST_DIR` | No | ChromaDB persistence directory (default: `./chroma_data`) |
| `CHROMA_COLLECTION` | No | ChromaDB collection name (default: `knowledge_base`) |
| `RAG_TOP_K` | No | Number of chunks to retrieve (default: `5`) |
| `EMBEDDING_MODEL` | No | Google embedding model for RAG (default: `models/gemini-embedding-001`) |

Connecting from Cursor (or another MCP client)

Add this to your Cursor MCP settings (e.g. .cursor/mcp.json), replacing the path and API key as needed:

{
  "mcpServers": {
    "knowledge-assistant": {
      "command": "uv",
      "args": [
        "--directory",
        "/absolute/path/to/knowledge-assistant-mcp",
        "run",
        "python",
        "-m",
        "src.server",
        "--transport",
        "stdio"
      ],
      "env": {
        "GOOGLE_API_KEY": "your-google-api-key-here"
      }
    }
  }
}

Alternatively, rely on a .env file in the project directory and omit the env block, or set only ENV_FILE_PATH in it if your client supports that.
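If you would rather generate this configuration than edit it by hand, a small script can build the same structure. This is a sketch under assumptions: build_mcp_config is a hypothetical helper, not part of the project, and it only mirrors the JSON shown above.

```python
import json
from typing import Optional


def build_mcp_config(project_dir: str, google_api_key: Optional[str] = None) -> dict:
    """Return the mcpServers entry for this server as a Python dict.

    project_dir should be the absolute path to your clone of the repository.
    """
    server = {
        "command": "uv",
        "args": [
            "--directory", project_dir,
            "run", "python", "-m", "src.server",
            "--transport", "stdio",
        ],
    }
    if google_api_key:
        # Omit the env block entirely when the key lives in a .env file instead.
        server["env"] = {"GOOGLE_API_KEY": google_api_key}
    return {"mcpServers": {"knowledge-assistant": server}}


if __name__ == "__main__":
    config = build_mcp_config("/absolute/path/to/knowledge-assistant-mcp")
    print(json.dumps(config, indent=2))
```

Printing the result and pasting it into your client's settings file avoids overwriting entries for other MCP servers you may already have configured.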

How to use

Once the server is running and connected (e.g. in Cursor):

1. Add documents (optional, but required before the server can answer from your own content)
   Use the add_documents tool: pass text (the content to ingest) and optionally source (e.g. "Context Engineering Book"). The server chunks the text and embeds it into ChromaDB. You can add more documents at any time.
2. Ask a question
   Use the query_knowledge_base tool with your question. The server runs the multi-agent pipeline (coordinator → retriever → synthesizer) and returns a proposed answer with citations.
3. Human-in-the-loop
   Review the proposal, then call approve_or_edit_answer:
   - To accept: set approved=True and pass proposal_answer exactly as it was returned.
   - To request changes: set approved=False, pass the same proposal_answer, and set user_feedback to your requested edits. The server can then produce a revised answer.
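The approve-or-edit cycle can be illustrated with a local stand-in. The function below borrows the tool's name and parameter names (approved, proposal_answer, user_feedback) from the description above, but its body, its return shape, and the review_until_approved helper are assumptions for illustration; the real tool revises the answer with an LLM rather than by string formatting.

```python
from typing import Optional


def approve_or_edit_answer(approved: bool, proposal_answer: str,
                           user_feedback: Optional[str] = None) -> dict:
    """Local stand-in for the MCP tool; the return shape is hypothetical."""
    if approved:
        return {"status": "approved", "answer": proposal_answer}
    if not user_feedback:
        # Assumption: a rejection without feedback gives nothing to revise from.
        raise ValueError("user_feedback is required when approved is False")
    revised = f"{proposal_answer} (revised per feedback: {user_feedback})"
    return {"status": "revised", "answer": revised}


def review_until_approved(proposal: str, decisions) -> str:
    """Apply scripted (approved, feedback) decisions until one approves."""
    current = proposal
    for approved, feedback in decisions:
        result = approve_or_edit_answer(approved, current, feedback)
        current = result["answer"]
        if result["status"] == "approved":
            return current
    raise RuntimeError("the reviewer never approved the answer")


if __name__ == "__main__":
    final = review_until_approved(
        "Vacation requests need two weeks of notice.",
        [(False, "mention the source"), (True, None)],
    )
    print(final)
```

Each rejected round feeds the revised text back in as the next proposal, so the reviewer always approves the exact answer that will be finalized.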

You can also use search_knowledge_base to only search the vector store (no generated answer), and the knowledge_assistant_workflow prompt as a step-by-step guide. The resource knowledge-assistant://server_info exposes server metadata and RAG settings.
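Ingestion and search-only retrieval follow a chunk, store, and rank pattern, which the sketch below illustrates. It is a simplification under stated assumptions: the server uses Google embeddings and ChromaDB, whereas this example uses character windows, an in-memory list, and shared-word counts. The chunk_size and overlap values and the local add_documents and search functions are hypothetical.

```python
import re


def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into character windows that overlap by `overlap` characters."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
        start += chunk_size - overlap
    return chunks


def add_documents(store, text, source="unknown"):
    """Append chunk records to an in-memory list (a stand-in for ChromaDB)."""
    for chunk in chunk_text(text):
        store.append({"text": chunk, "source": source})
    return store


def search(store, query, top_k=5):
    """Return up to top_k chunks ranked by shared-word count (not embeddings)."""
    query_words = set(re.findall(r"[a-z0-9]+", query.lower()))
    scored = []
    for record in store:
        words = set(re.findall(r"[a-z0-9]+", record["text"].lower()))
        score = len(query_words & words)
        if score:
            scored.append({**record, "score": score})
    scored.sort(key=lambda r: r["score"], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    store = []
    add_documents(store, "Cats purr when content.", source="cats.txt")
    add_documents(store, "Dogs bark at strangers.", source="dogs.txt")
    for hit in search(store, "why do dogs bark", top_k=1):
        print(hit["source"], hit["text"])
```

The overlap between neighbouring windows keeps a sentence that straddles a chunk boundary retrievable from at least one chunk, at the cost of storing some text twice.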

Features

Core:

A FastMCP server (src/server.py) that exposes four tools (query_knowledge_base, approve_or_edit_answer, add_documents, search_knowledge_base) and one workflow prompt (knowledge_assistant_workflow) with a human-in-the-loop step: review the proposal, then approve or edit it via approve_or_edit_answer. Setup is uv-based and the layout follows the project structure above. No API keys are stored in the repo; .env.sample and .gitignore are included.

Additional:

Multi-agent orchestration – Coordinator, retriever (RAG), and synthesizer agents in src/app/orchestrator.py.
RAG with vector database – ChromaDB + LangChain + Google embeddings; search_knowledge_base and add_documents; persistence via CHROMA_PERSIST_DIR.
MCP resource – knowledge-assistant://server_info exposes server name, version, collection, and RAG settings.
Human-in-the-loop validation – Workflow returns a proposal; the user approves or requests edits with approve_or_edit_answer before finalizing.
Structured outputs – Pydantic models (AnswerProposal, SearchResult, RetrievedChunk, SynthesisResult) for synthesizer and API responses.
Observability (Opik) – Optional tracing when OPIK_API_KEY is set.

Docker

Build and run with Docker (the -i flag keeps stdin open, which the stdio transport needs):

docker build -t knowledge-assistant-mcp .
docker run -i --rm -e GOOGLE_API_KEY=your-key -v "$(pwd)/chroma_data:/app/chroma_data" knowledge-assistant-mcp --transport stdio

For HTTP on port 8000:

docker run --rm -p 8000:8000 -e GOOGLE_API_KEY=your-key -v $(pwd)/chroma_data:/app/chroma_data knowledge-assistant-mcp --transport http --port 8000

License

MIT (or your chosen license).
