MCP-Knowledge-Toolbox

A local knowledge-base MCP toolbox for Agent workflows, exposing 11 tools for document ingest, retrieval, context reading, citation checking, and evaluation-report reading.

MCP-Knowledge-Toolbox (Project 2) is a local knowledge-base MCP toolbox built on top of the Project 1 DocuPilot-RAG baseline. It leaves the Project 1 core code unmodified and packages local document ingest, retrieval, context reading, citation checking, and evaluation-report reading as MCP-callable tools.

This repository is currently an engineering MVP, not a production multi-tenant RAG platform.

Architecture

flowchart LR
    A[Local Documents] --> B[Parser]
    B --> C[Chunker]
    C --> D[SQLite Metadata Store]
    C --> E[Vector Index]
    C --> F[BM25 Index]
    E --> G[Hybrid Retriever]
    F --> G
    G --> H[Lightweight Reranker]
    H --> I[MCP Tools]
    I --> J[MCP stdio Client]
    I --> K[Citation Verifier]
    I --> L[Eval Report Reader]
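
The Hybrid Retriever node merges the vector and BM25 result lists into one ranking. This README does not say which fusion rule the project uses, so the following is only a minimal sketch that assumes reciprocal rank fusion (RRF); the function name `rrf_fuse`, the constant `k=60`, and the chunk IDs are illustrative, not taken from the repository.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of chunk IDs with reciprocal rank fusion.

    Assumption: RRF is used here for illustration only; the project's
    actual fusion rule is not documented in this README.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            # A chunk earns more credit the higher it ranks in each list.
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by chunk ID for determinism.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


vector_hits = ["c3", "c1", "c7"]  # hypothetical vector-index results
bm25_hits = ["c1", "c9", "c3"]    # hypothetical BM25 results
fused = rrf_fuse([vector_hits, bm25_hits])
print(fused)
```

Chunks that appear in both lists accumulate score from each, which is why a rank-based rule like this needs no score normalization between the two indexes.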

Tech Stack

Python 3.10/3.11 compatible code path
SQLite metadata store
MCP stdio JSON-RPC compatible MVP transport
Optional official MCP Python SDK when installed
sentence-transformers with BAAI/bge-small-zh-v1.5 as the default embedding model
hashing vector fallback when the embedding model is unavailable
PyMuPDF for PDF, python-docx for docx, native readers for Markdown/txt
pytest integration tests
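
To illustrate the hashing vector fallback named above, here is a minimal sketch of a hashing-trick embedding. It is an assumption about how such a fallback could work, not the repository's implementation; `hashing_vector`, `cosine`, and the dimension of 256 are hypothetical.

```python
import hashlib
import math
import re


def hashing_vector(text, dim=256):
    """Map text to a fixed-size, L2-normalized vector via the hashing trick.

    Hypothetical sketch: tokens are split on word boundaries, which suits
    space-delimited text; a Chinese corpus would likely need character
    n-grams instead.
    """
    vec = [0.0] * dim
    for token in re.findall(r"\w+", text.lower()):
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:4], "big") % dim
        # A signed update reduces the bias introduced by bucket collisions.
        sign = 1.0 if digest[4] % 2 == 0 else -1.0
        vec[bucket] += sign
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def cosine(a, b):
    """Cosine similarity for vectors that are already L2-normalized."""
    return sum(x * y for x, y in zip(a, b))


query_vec = hashing_vector("SQLite metadata store")
chunk_vec = hashing_vector("sqlite metadata store")
print(cosine(query_vec, chunk_vec))
```

A vector like this needs no model download, but it captures only token identity, not meaning, which is why it works as a fallback rather than a default.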

Tools

The server exposes 11 tools:

ingest_file, ingest_folder, search_knowledge, read_chunk_neighbors, summarize_document, query_table, verify_citation, get_eval_report, list_documents, delete_document, server_status.

MCP Compatibility

The current implementation is an MCP stdio JSON-RPC compatible MVP. It uses the official MCP Python SDK when the `mcp` package is installed and otherwise falls back to the built-in stdio JSON-RPC transport.
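
One way to implement this kind of optional dependency is to probe for the package at startup. The sketch below assumes the SDK is importable as `mcp`, as the compatibility notes state; the function name `pick_transport` and its return values are hypothetical.

```python
import importlib.util


def pick_transport():
    """Return "sdk" when the `mcp` package is importable, else "builtin".

    Hypothetical helper: the repository's actual selection logic is not
    shown in this README.
    """
    # find_spec returns None for a missing top-level package
    # without importing anything.
    if importlib.util.find_spec("mcp") is not None:
        return "sdk"
    return "builtin"


print(pick_transport())
```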

| MCP capability | Status | Notes |
| --- | --- | --- |
| stdio transport | Supported | Used by `scripts/run_mcp_server.py`. |
| `initialize` | Supported | Returns protocol version, server info, and tool capability. |
| `tools/list` | Supported | Returns all registered tool schemas. |
| `tools/call` | Supported | Returns text content and structuredContent. |
| `notifications/initialized` | Accepted | Notification is ignored safely. |
| resources | Not implemented | No MCP resources are exposed yet. |
| prompts | Not implemented | No MCP prompts are exposed yet. |
| sampling | Not implemented | No LLM sampling bridge. |
| streaming progress | Not verified | Tool calls are request/response only. |
| official SDK mode | Optional | Depends on `mcp` package availability. |
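
The supported methods above correspond to a short JSON-RPC 2.0 exchange. The following sketch builds the client-side messages in the order a session would send them, one JSON object per line as the MCP stdio transport expects. The protocol version string, the client name, and the `search_knowledge` argument names are assumptions for illustration; check the schemas returned by `tools/list` for the real parameters.

```python
import json


def make_request(request_id, method, params=None):
    """Build a JSON-RPC 2.0 request as a single-line JSON string."""
    msg = {"jsonrpc": "2.0", "id": request_id, "method": method}
    if params is not None:
        msg["params"] = params
    return json.dumps(msg)


def make_notification(method):
    """Build a JSON-RPC 2.0 notification (no id, so no response expected)."""
    return json.dumps({"jsonrpc": "2.0", "method": method})


messages = [
    make_request(1, "initialize", {
        "protocolVersion": "2024-11-05",  # assumed version string
        "capabilities": {},
        "clientInfo": {"name": "demo-client", "version": "0.1.0"},  # hypothetical
    }),
    make_notification("notifications/initialized"),
    make_request(2, "tools/list"),
    make_request(3, "tools/call", {
        "name": "search_knowledge",
        # Hypothetical argument names; the real schema comes from tools/list.
        "arguments": {"query": "retrieval evaluation", "collection": "demo"},
    }),
]

for line in messages:
    print(line)  # each line would be written to the server's stdin
```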

Reproduce From Scratch

From a fresh clone:

pip install -r requirements.txt
python scripts/ingest_demo_docs.py --input data/raw --collection demo
python scripts/build_index.py --collection demo
python scripts/run_mcp_stdio_client_demo.py
pytest tests

Expected scale after ingest:

ingested files: 20
success: 20
failed: 0
chunks: 1201
documents: 20
collections: demo
embedding_provider: sentence-transformers

End-to-End Demo

Generate the full E2E MCP log:

python scripts/run_e2e_demo.py --collection e2e --input data/raw --output docs/e2e_demo_log.md

The log records:

MCP server startup through stdio subprocess
stdio client initialize
tools/list
tools/call ingest_folder
tools/call list_documents
tools/call search_knowledge
tools/call read_chunk_neighbors
tools/call verify_citation
final answer with citations

See docs/e2e_demo_log.md.

Retrieval Evaluation

Generate 50 QA samples and evaluate four retrieval strategies:

python scripts/run_retrieval_eval.py --collection demo

Outputs:

data/eval/demo_qa.jsonl
docs/retrieval_eval_report.md

Current measured metrics:

| Strategy | Hit@3 | Hit@5 | MRR | Avg Latency (ms) |
| --- | --- | --- | --- | --- |
| bm25 | 0.400 | 0.400 | 0.400 | 193.55 |
| vector | 0.340 | 0.340 | 0.340 | 82.97 |
| hybrid | 0.460 | 0.460 | 0.460 | 84.71 |
| hybrid_rerank | 0.460 | 0.460 | 0.460 | 80.97 |

Hybrid retrieval outperformed both single-mode strategies on this demo set (Hit@3 of 0.460, versus 0.400 for BM25 and 0.340 for vector). Adding the reranker did not improve on hybrid; the report explains that the corpus is synthetic and repetitive, so first-stage retrieval already ranks many expected documents at the top.
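
For readers unfamiliar with the metrics, the sketch below shows the standard definitions of Hit@k and MRR for the case of one expected document per question. It is not the code in `scripts/run_retrieval_eval.py`; the function names and the sample data are hypothetical.

```python
def hit_at_k(ranked_ids, expected_id, k):
    """1.0 if the expected document appears in the top k results, else 0.0."""
    return 1.0 if expected_id in ranked_ids[:k] else 0.0


def reciprocal_rank(ranked_ids, expected_id):
    """1/rank of the expected document, or 0.0 if it was not retrieved."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id == expected_id:
            return 1.0 / rank
    return 0.0


def evaluate(samples, k_values=(3, 5)):
    """Average Hit@k and MRR over (ranked_ids, expected_id) pairs."""
    n = len(samples)
    report = {
        f"hit@{k}": sum(hit_at_k(r, e, k) for r, e in samples) / n
        for k in k_values
    }
    report["mrr"] = sum(reciprocal_rank(r, e) for r, e in samples) / n
    return report


# Hypothetical samples: a hit at rank 1, a hit at rank 4, and a miss.
samples = [
    (["d1", "d2", "d3"], "d1"),
    (["d4", "d5", "d6", "d7"], "d7"),
    (["d8"], "d9"),
]
report = evaluate(samples)
print(report)
```

Under these definitions, identical Hit@3, Hit@5, and MRR values are what you would see if, for instance, every hit landed at rank 1 and every miss fell outside the returned list.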

Final Acceptance Artifacts

docs/e2e_demo_log.md
docs/retrieval_eval_report.md
docs/final_acceptance.md
data/eval/demo_qa.jsonl

Limitations

The hashing vector is only a fallback for when the sentence-transformers model is unavailable.
verify_citation is a lightweight keyword/similarity check, not an LLM judge.
query_table matches Markdown table captions and content; it does not perform complex table reasoning.
Reranking uses lightweight token overlap, not a cross-encoder reranker.
summarize_document uses extractive summarization when no LLM is configured.
Storage is local SQLite plus local JSON indexes, not a distributed vector database.
MCP support covers tools over stdio only, not resources, prompts, or sampling.
This is not a production-grade multi-tenant platform.
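
To make the reranking limitation concrete, here is a minimal sketch of what token-overlap reranking can look like. It is an assumption about the general technique, not the repository's reranker; `overlap_rerank`, the candidate tuple layout, and the sample chunks are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase word tokens as a set; enough for an overlap score."""
    return set(re.findall(r"\w+", text.lower()))


def overlap_rerank(query, candidates):
    """Reorder (chunk_id, text, first_stage_score) tuples.

    Primary key: fraction of query tokens found in the chunk text.
    Secondary key: the first-stage retrieval score.
    """
    query_tokens = tokenize(query)

    def sort_key(item):
        _chunk_id, text, score = item
        if not query_tokens:
            return (0.0, -score)
        overlap = len(query_tokens & tokenize(text)) / len(query_tokens)
        return (-overlap, -score)

    return sorted(candidates, key=sort_key)


candidates = [  # hypothetical first-stage results
    ("c1", "Vector index built from chunks", 0.9),
    ("c2", "SQLite metadata store schema", 0.5),
    ("c3", "Metadata for each document", 0.7),
]
reranked = overlap_rerank("sqlite metadata store", candidates)
print([chunk_id for chunk_id, _, _ in reranked])
```

A scorer like this sees only exact token matches, so it cannot promote a paraphrased passage the way a cross-encoder could.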

Resume Wording

MCP-Knowledge-Toolbox: a local knowledge-base MCP toolbox for Agent workflows. Built an MCP stdio JSON-RPC compatible server exposing 11 tools for document ingest, SQLite metadata management, sentence-transformers vector retrieval, BM25, hybrid retrieval, context reading, citation verification, document deletion sync, and evaluation report reading. Added an end-to-end stdio client demo, 50-sample retrieval evaluation, and 37 pytest tests. Demo acceptance reached 20 documents and 1201 chunks across Markdown, txt, docx, and PDF.
