Knowledge Assistant MCP Server

A multi-agent RAG (Retrieval-Augmented Generation) MCP server built with FastMCP in Python. It answers questions from your documents using three agents (a coordinator, a retriever, and a synthesizer) and includes a human-in-the-loop step in which you approve the proposed answer or request edits before it is finalized.

What it does

  • Query your knowledge base: Ask questions in natural language; the server retrieves relevant chunks and proposes an answer with citations.
  • Multi-agent pipeline: A coordinator decides whether to use the knowledge base, a retriever (RAG) fetches relevant documents, and a synthesizer produces a structured answer proposal.
  • Human-in-the-loop: You review the proposed answer and either approve it or request edits before the answer is finalized.
  • Add documents: Ingest text into the vector store (ChromaDB) so the assistant can answer from your own content.
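The coordinator → retriever → synthesizer flow described above can be sketched in plain Python. This is a toy illustration under stated assumptions, not the project's implementation: the function names and the RetrievedChunk/AnswerProposal fields are hypothetical, keyword overlap stands in for Gemini embeddings and ChromaDB, and the synthesizer returns retrieved text instead of calling an LLM.

```python
import re
from dataclasses import dataclass, field


# Hypothetical shapes; the real Pydantic models in src/models/ may differ.
@dataclass
class RetrievedChunk:
    text: str
    source: str


@dataclass
class AnswerProposal:
    answer: str
    citations: list[str] = field(default_factory=list)
    used_knowledge_base: bool = True


GREETINGS = {"hi", "hello", "thanks"}


def tokenize(text: str) -> set[str]:
    return set(re.findall(r"\w+", text.lower()))


def coordinator(question: str) -> bool:
    # Toy routing rule: skip retrieval when the message is only small talk.
    return bool(tokenize(question) - GREETINGS)


def retriever(question: str, store: list[RetrievedChunk], top_k: int = 5) -> list[RetrievedChunk]:
    # Keyword overlap stands in for embedding similarity in ChromaDB.
    terms = tokenize(question)
    scored = [(len(terms & tokenize(c.text)), c) for c in store]
    hits = [pair for pair in scored if pair[0] > 0]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in hits[:top_k]]


def synthesizer(question: str, chunks: list[RetrievedChunk]) -> AnswerProposal:
    # A real synthesizer would prompt the LLM with the chunks; this one
    # returns the best chunk verbatim and cites every source it retrieved.
    if not chunks:
        return AnswerProposal(answer="No relevant documents found.", citations=[])
    return AnswerProposal(answer=chunks[0].text,
                          citations=sorted({c.source for c in chunks}))


def run_pipeline(question: str, store: list[RetrievedChunk], top_k: int = 5) -> AnswerProposal:
    if not coordinator(question):
        return AnswerProposal(answer="Hi! Ask me about your documents.",
                              used_knowledge_base=False)
    return synthesizer(question, retriever(question, store, top_k))
```

Keeping the stages separate means swapping keyword overlap for real embeddings only changes retriever, and swapping the stub for an LLM only changes synthesizer.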

Use cases: Internal knowledge assistant, FAQ over your docs, Q&A over notes or wikis, and similar RAG workflows that require a human approval step.


Project Structure

knowledge-assistant-mcp/
├── src/
│   ├── server.py           # FastMCP app entry point
│   ├── config/
│   │   └── settings.py     # pydantic-settings (server name, API keys, model, RAG settings)
│   ├── routers/
│   │   ├── tools.py        # Register MCP tools
│   │   ├── resources.py    # Register MCP resources
│   │   └── prompts.py      # Register MCP prompts
│   ├── tools/              # Tool implementations
│   ├── resources/          # Resource implementations
│   ├── prompts/            # Prompt content (workflow with human-in-the-loop)
│   ├── app/                # Core logic: RAG, LLM, orchestrator (coordinator/retriever/synthesizer)
│   ├── models/             # Pydantic schemas (structured outputs)
│   └── utils/              # Helpers (e.g. Opik)
├── pyproject.toml
├── .env.sample
├── Dockerfile
└── README.md

Setup

Prerequisites: Python 3.13, uv.

Clone the repository

git clone https://github.com/YOUR_USERNAME/knowledge-assistant-mcp.git
cd knowledge-assistant-mcp

Install dependencies with uv

uv sync

This creates a virtual environment (Python 3.13) and installs dependencies from pyproject.toml.

Configure environment variables

cp .env.sample .env

Edit .env and set at least:

  • GOOGLE_API_KEY (required): Used for Gemini (LLM and embeddings).
    Get it from Google AI Studio.

Optional:

  • OPIK_API_KEY: For observability (tracing). Get it from Opik.
  • OPIK_PROJECT_NAME: Opik project name (default: knowledge-assistant).
  • MODEL_NAME: Gemini model (default: gemini-2.0-flash).
  • CHROMA_PERSIST_DIR: Directory for ChromaDB (default: ./chroma_data).
  • CHROMA_COLLECTION: Collection name (default: knowledge_base).
  • RAG_TOP_K: Number of chunks to retrieve (default: 5).
  • EMBEDDING_MODEL: Google embedding model for RAG (default: models/gemini-embedding-001). Override if your API uses a different model.

Run the server

Stdio (for Cursor / Claude Desktop):

uv run python -m src.server --transport stdio

HTTP:

uv run python -m src.server --transport http --port 8000

Or use the entry point:

uv run knowledge-assistant-mcp --transport stdio

You should see the FastMCP banner and the process waiting for connections; stop with Ctrl+C.
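The commands above pass --transport and --port to the server. As a hypothetical sketch of how such flags are commonly parsed with the standard library's argparse, see below; the defaults (stdio, 8000) are assumptions, and the real src/server.py may define its options differently before handing them to FastMCP.

```python
import argparse


def parse_server_args(argv=None) -> argparse.Namespace:
    # Hypothetical reconstruction of the CLI shown above; defaults are assumptions.
    parser = argparse.ArgumentParser(prog="knowledge-assistant-mcp")
    parser.add_argument("--transport", choices=["stdio", "http"], default="stdio",
                        help="stdio for desktop MCP clients, http for network clients")
    parser.add_argument("--port", type=int, default=8000,
                        help="listening port; only meaningful with --transport http")
    return parser.parse_args(argv)
```

Restricting --transport with choices makes a typo such as --transport htp fail immediately with a usage error instead of starting the wrong transport.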


Environment variables

Variables you can set in .env, and where to get API keys:

Environment variables summary

| Variable | Required | Description |
| --- | --- | --- |
| GOOGLE_API_KEY | Yes | Google AI (Gemini) API key, from Google AI Studio |
| OPIK_API_KEY | No | Opik API key for observability, from Opik |
| OPIK_PROJECT_NAME | No | Opik project name (default: knowledge-assistant) |
| MODEL_NAME | No | Gemini model (default: gemini-2.0-flash) |
| CHROMA_PERSIST_DIR | No | ChromaDB persistence directory (default: ./chroma_data) |
| CHROMA_COLLECTION | No | ChromaDB collection name (default: knowledge_base) |
| RAG_TOP_K | No | Number of chunks to retrieve (default: 5) |
| EMBEDDING_MODEL | No | Google embedding model for RAG (default: models/gemini-embedding-001) |
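src/config/settings.py is described as a pydantic-settings class. To show how these variables map to typed configuration without extra dependencies, here is a stdlib stand-in (a deliberate swap, not the project's code) that applies the documented defaults and fails fast when GOOGLE_API_KEY is missing. Unlike pydantic-settings, it does not read the .env file itself; the Settings and load_settings names are hypothetical.

```python
from __future__ import annotations

import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    google_api_key: str
    opik_api_key: str | None
    opik_project_name: str
    model_name: str
    chroma_persist_dir: str
    chroma_collection: str
    rag_top_k: int
    embedding_model: str


def load_settings(env: dict[str, str] | None = None) -> Settings:
    # Read from the given mapping, or from the process environment by default.
    env = dict(os.environ) if env is None else env
    api_key = env.get("GOOGLE_API_KEY")
    if not api_key:
        raise ValueError("GOOGLE_API_KEY is required")
    return Settings(
        google_api_key=api_key,
        opik_api_key=env.get("OPIK_API_KEY") or None,  # tracing stays off when unset
        opik_project_name=env.get("OPIK_PROJECT_NAME", "knowledge-assistant"),
        model_name=env.get("MODEL_NAME", "gemini-2.0-flash"),
        chroma_persist_dir=env.get("CHROMA_PERSIST_DIR", "./chroma_data"),
        chroma_collection=env.get("CHROMA_COLLECTION", "knowledge_base"),
        rag_top_k=int(env.get("RAG_TOP_K", "5")),  # env values are strings
        embedding_model=env.get("EMBEDDING_MODEL", "models/gemini-embedding-001"),
    )
```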

Connecting from Cursor (or another MCP client)

Add this to your Cursor MCP settings (e.g. .cursor/mcp.json), replacing the path and API key as needed:

{
  "mcpServers": {
    "knowledge-assistant": {
      "command": "uv",
      "args": [
        "--directory",
        "/absolute/path/to/knowledge-assistant-mcp",
        "run",
        "python",
        "-m",
        "src.server",
        "--transport",
        "stdio"
      ],
      "env": {
        "GOOGLE_API_KEY": "your-google-api-key-here"
      }
    }
  }
}

Alternatively, you can omit the env block and rely on a .env file in the project directory, or set only ENV_FILE_PATH in env if your client supports it.
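If you maintain several checkouts or machines, generating the JSON configuration can be less error-prone than editing it by hand. A small sketch, assuming the same uv command line as the example; build_cursor_config and write_cursor_config are hypothetical helpers, and the merge step keeps any other servers already listed in the file.

```python
import json
from pathlib import Path
from typing import Optional


def build_cursor_config(project_dir: str, google_api_key: Optional[str] = None) -> dict:
    # Mirrors the JSON example; project_dir should be an absolute path.
    server = {
        "command": "uv",
        "args": ["--directory", project_dir, "run", "python", "-m",
                 "src.server", "--transport", "stdio"],
    }
    # Omit "env" when the key should come from the project's .env file instead.
    if google_api_key:
        server["env"] = {"GOOGLE_API_KEY": google_api_key}
    return {"mcpServers": {"knowledge-assistant": server}}


def write_cursor_config(path: Path, config: dict) -> None:
    # Merge into an existing file so other configured servers are preserved.
    existing = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    existing.setdefault("mcpServers", {}).update(config["mcpServers"])
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(existing, indent=2) + "\n", encoding="utf-8")
```

For example, run from the project root: write_cursor_config(Path(".cursor/mcp.json"), build_cursor_config(str(Path.cwd().resolve()))).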


How to use

Once the server is running and connected (e.g. in Cursor):

  1. Add documents (optional, but the assistant can only give RAG answers once the knowledge base has content)
    Use the add_documents tool: pass text (the content to ingest) and optionally source (e.g. "Context Engineering Book"). The server chunks and embeds the text into ChromaDB. You can add more documents anytime.

  2. Ask a question
    Use the query_knowledge_base tool with your question. The server runs the multi-agent pipeline (coordinator → retriever → synthesizer) and returns a proposed answer with citations.

  3. Human-in-the-loop
    Review the proposal, then call approve_or_edit_answer:

    • To accept: pass approved=True and the proposal_answer exactly as returned.
    • To request changes: pass approved=False, the same proposal_answer, and user_feedback describing the edits you want. The server can then produce a revised answer.
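The approve-or-edit step behaves like a small review loop: approval finalizes the proposal, while feedback produces a revision that goes back to you for another review. The sketch below models that loop. The parameter names approved, proposal_answer, and user_feedback come from the tool described above; ReviewOutcome, its status values, and the revise hook are hypothetical, and placeholder_revise stands in for the LLM call the server would make.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ReviewOutcome:
    status: str  # "final" when approved, "needs_review" after a revision
    answer: str


def placeholder_revise(answer: str, feedback: str) -> str:
    # Stand-in for an LLM call that rewrites the answer using the feedback.
    return f"{answer}\n\n[Revised per feedback: {feedback}]"


def approve_or_edit(proposal_answer: str, approved: bool,
                    user_feedback: Optional[str] = None,
                    revise: Callable[[str, str], str] = placeholder_revise) -> ReviewOutcome:
    if approved:
        # Approval finalizes the proposal unchanged.
        return ReviewOutcome(status="final", answer=proposal_answer)
    if not user_feedback or not user_feedback.strip():
        # Rejecting without saying what to change gives the reviser nothing to act on.
        raise ValueError("user_feedback is required when approved is False")
    revised = revise(proposal_answer, user_feedback.strip())
    # A revision is a new proposal, so it goes back to the user for review.
    return ReviewOutcome(status="needs_review", answer=revised)
```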

You can also use search_knowledge_base to only search the vector store (no generated answer), and the knowledge_assistant_workflow prompt as a step-by-step guide. The resource knowledge-assistant://server_info exposes server metadata and RAG settings.
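search_knowledge_base skips generation and returns the nearest chunks directly. Conceptually that is a top-k similarity ranking; the sketch below shows it with cosine similarity over hand-written two-dimensional vectors in place of Gemini embeddings. ChromaDB performs the equivalent ranking internally (its distance metric is configurable per collection), and k presumably corresponds to RAG_TOP_K; top_k_search is a hypothetical name.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Treat zero vectors as dissimilar to everything instead of dividing by zero.
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k_search(query_vec, chunks, k=5):
    """Return the k (score, text) pairs most similar to query_vec, best first."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```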


Features

Core:

A FastMCP server (src/server.py) exposing four tools (query_knowledge_base, approve_or_edit_answer, add_documents, search_knowledge_base) and one workflow prompt (knowledge_assistant_workflow) that includes a human-in-the-loop step (review the proposal, then approve it or request edits via approve_or_edit_answer), with a uv-based setup and the project structure shown above. No API keys are committed to the repository; .env.sample and .gitignore are included.

Additional:

  • Multi-agent orchestration – Coordinator, retriever (RAG), and synthesizer agents in src/app/orchestrator.py.
  • RAG with vector database – ChromaDB + LangChain + Google embeddings; search_knowledge_base and add_documents; persistence via CHROMA_PERSIST_DIR.
  • MCP resource – knowledge-assistant://server_info exposes the server name, version, collection, and RAG settings.
  • Human-in-the-loop validation – Workflow returns a proposal; the user approves or requests edits with approve_or_edit_answer before finalizing.
  • Structured outputs – Pydantic models (AnswerProposal, SearchResult, RetrievedChunk, SynthesisResult) for synthesizer and API responses.
  • Observability (Opik) – Optional tracing when OPIK_API_KEY is set.
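add_documents splits text into chunks before embedding it. The project's actual splitter and its settings are not documented here (it may well be one of LangChain's text splitters); the sketch below is a simplified fixed-size character splitter with overlap. The chunk_size and overlap defaults and the record shape, which carries the optional source along as metadata, are hypothetical.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # each window starts `step` characters after the last
    chunks = []
    for start in range(0, len(text), step):
        piece = text[start:start + chunk_size]
        if piece.strip():  # skip whitespace-only windows
            chunks.append(piece)
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


def to_records(text: str, source: str = "unknown", **kwargs) -> list[dict]:
    # Each record pairs a chunk with metadata a vector store could filter on.
    return [{"text": c, "source": source, "chunk_index": i}
            for i, c in enumerate(chunk_text(text, **kwargs))]
```

The overlap means a sentence cut at a chunk boundary still appears whole in at least one chunk when it is shorter than the overlap, which helps retrieval at the cost of some duplicated storage.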

Docker

Build and run with Docker:

docker build -t knowledge-assistant-mcp .
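docker run -i --rm -e GOOGLE_API_KEY=your-key -v "$(pwd)/chroma_data:/app/chroma_data" knowledge-assistant-mcp --transport stdio

The -i flag keeps the container's stdin open, which the stdio transport needs to receive messages.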

For HTTP on port 8000:

docker run --rm -p 8000:8000 -e GOOGLE_API_KEY=your-key -v "$(pwd)/chroma_data:/app/chroma_data" knowledge-assistant-mcp --transport http --port 8000

License

MIT (or your chosen license).
