MMWRAG

Bilingual RAG for scientific literature with a search-only MCP server. Enables hybrid retrieval and reranking over indexed textbooks and papers.

MMWRAG is a bilingual (Russian/English) RAG system over scientific literature (textbooks and papers). It combines vision-based PDF parsing, BGE-M3 hybrid (dense + sparse) retrieval, and a cross-encoder reranker, and exposes the result as an MCP tool. The tool is search-only: the consumer composes the answer. Every retrieval decision here is backed by measurements; see DECISIONS.md.

Features

Vision PDF parsing behind a swappable interface (cloud PaddleOCR-VL / local PP-StructureV3) — required because the text layer doesn't encode formula structure.
Structure-aware chunking (~512-token packing over blocks, page spans kept for citations).
BGE-M3 dense + sparse embeddings; Qdrant hybrid search with server-side RRF.
Cross-encoder reranker (bge-reranker-v2-m3) over the top-N pool.
Book-aware cross-lingual routing — search(book_id=...) targets a specific book/language.
MCP server (search, list_books) over streamable HTTP — no answer generation.
Eval harness — page-level hit@k / MRR / recall@k, cross-book and cross-lingual.
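The structure-aware chunking bullet above can be illustrated with a minimal sketch. It assumes parsed blocks arrive in reading order with a page number attached, and it counts whitespace-separated words as a stand-in for real tokens. The names `Block`, `Chunk`, `count_tokens`, and `pack_blocks` are hypothetical and are not taken from the repository.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """One parsed layout block (paragraph, formula, heading) and its page."""
    text: str
    page: int


@dataclass
class Chunk:
    """A packed chunk that remembers which pages it spans, for citations."""
    text: str
    page_start: int
    page_end: int


def count_tokens(text: str) -> int:
    # Assumption: whitespace words approximate tokens. A real pipeline
    # would count with the embedding model's own tokenizer.
    return len(text.split())


def _flush(blocks: list[Block]) -> Chunk:
    pages = [b.page for b in blocks]
    return Chunk(
        text="\n\n".join(b.text for b in blocks),
        page_start=min(pages),
        page_end=max(pages),
    )


def pack_blocks(blocks: list[Block], max_tokens: int = 512) -> list[Chunk]:
    """Greedily pack whole blocks into chunks of at most max_tokens.

    Blocks are never split, so a single oversized block becomes its own chunk.
    """
    chunks: list[Chunk] = []
    current: list[Block] = []
    current_tokens = 0
    for block in blocks:
        n = count_tokens(block.text)
        # Close the current chunk if adding this block would overflow it.
        if current and current_tokens + n > max_tokens:
            chunks.append(_flush(current))
            current, current_tokens = [], 0
        current.append(block)
        current_tokens += n
    if current:
        chunks.append(_flush(current))
    return chunks
```

Packing whole blocks rather than cutting at a fixed character offset keeps formulas and paragraphs intact, and the page span travels with each chunk so a search hit can be cited by page.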

Architecture

INDEXING   PDF ─parse─> Page[] ─chunk─> Chunk[] ─BGE-M3 (dense+sparse)─> Qdrant
QUERY      question ─HybridRetriever (RRF)─> top-N ─cross-encoder rerank─> top-k Source[]
MCP        client ─/mcp─> search(query, top_k, book_id) ─> fragments {book_id, pages, text, score}
                          list_books() ─> indexed books + language

Details in ARCHITECTURE.md.
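For readers unfamiliar with RRF (reciprocal rank fusion), the sketch below shows the idea in plain Python. In this project the fusion runs server-side in Qdrant, so this is an illustration of the technique only, not the project's code. The function name `rrf_fuse` and the constant `k=60` (the value commonly used in the RRF literature) are assumptions.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60, top_n: int = 10) -> list[tuple[str, float]]:
    """Fuse several ranked lists of ids with reciprocal rank fusion.

    Each list contributes 1 / (k + rank) for every id it contains,
    where rank starts at 1. Ids missing from a list contribute nothing.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score, breaking ties by id for a deterministic order.
    fused = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return fused[:top_n]


# Hypothetical chunk ids from a dense and a sparse retriever.
dense = ["c1", "c2", "c3"]
sparse = ["c3", "c1", "c4"]
fused = rrf_fuse([dense, sparse])
```

RRF uses only ranks, so dense similarity scores and sparse lexical scores never need to be put on a common scale before they are combined.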

Quickstart

# 1. dependencies (paddlepaddle-gpu is a manual prereq for the PARSING path only)
uv sync

# 2. vector database
docker compose up -d            # Qdrant on :6333

# 3. bring your own PDF and index it
#    parsing needs PADDLEOCR_TOKEN in .env (see .env.example);
#    pipeline: parse(pdf) -> chunk_pages(...) -> index_chunks(...)  (see notebooks/ for examples)

# 4. run the MCP server
uv run python -m src.mcp.server # streamable-http on 127.0.0.1:8000

The corpus is not included (copyright). Search and the MCP server need Qdrant and the local models (BGE-M3 and the reranker); they run on CPU, though a GPU is faster. Parsing additionally needs a PaddleOCR-VL cloud token.

Demo

A real session against the MCP server (notebooks/mcp_smoke.py, output trimmed to metadata):

tools: ['search', 'list_books']

list_books:
  {'book_id': 'zorich_v1', 'title': 'Zorich — Mathematical Analysis I', 'language': 'ru', 'chunks': 1472}
  {'book_id': 'zorich_v2', 'title': 'Zorich — Mathematical Analysis II', 'language': 'ru', 'chunks': 2526}
  {'book_id': 'lebl', 'title': 'Lebl — Basic Analysis I', 'language': 'en', 'chunks': 722}

search RU (all books), top 3:
  zorich_v1 159 2.125
  zorich_v1 158–159 0.297
  zorich_v2 517 -0.357

search RU routed to lebl (cross-lingual), top 3:
  lebl 135–136 0.123
  lebl 167 -0.047
  lebl 208 -0.141

The last call shows book-aware cross-lingual routing: a Russian query with book_id="lebl" returns the English source (Lebl, pp. 135–136) that a plain cross-book search buries behind the Russian equivalent (see DECISIONS.md §5).

Project structure

src/
  parse/   vision PDF -> Page[]   (cloud / local engines, idempotent cache)
  chunk/   Page[] -> Chunk[]      (structure-aware packing, page spans)
  index/   Chunk[] -> BGE-M3 -> Qdrant   (Embedder / VectorStore interfaces)
  query/   HybridRetriever + RerankingRetriever; answer() with citations
  mcp/     MCP server: search / list_books (pure core + thin FastMCP server)
  eval/    page-level hit@k / MRR / recall@k; cross-book & cross-lingual
tests/     unit tests (pure logic on fakes; integration tests skip offline)
notebooks/ runnable examples & measurement runners (mcp_smoke, eval_*, diag_*)
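The page-level metrics named under eval/ can be sketched as follows. This is an illustration under assumptions, not the harness itself: a retrieved fragment is modelled as a `(book_id, first_page, last_page)` span, gold labels are a set of `(book_id, page)` pairs, and a span counts as a hit if it covers any gold page. The function names and the gold pages in the example are hypothetical.

```python
Span = tuple[str, int, int]   # (book_id, first_page, last_page)
Page = tuple[str, int]        # (book_id, page)


def span_hits(span: Span, relevant: set[Page]) -> bool:
    """True if the span covers at least one relevant page of the same book."""
    book_id, first, last = span
    return any((book_id, page) in relevant for page in range(first, last + 1))


def hit_at_k(retrieved: list[Span], relevant: set[Page], k: int) -> float:
    """1.0 if any of the top-k spans covers a relevant page, else 0.0."""
    return float(any(span_hits(s, relevant) for s in retrieved[:k]))


def reciprocal_rank(retrieved: list[Span], relevant: set[Page]) -> float:
    """1 / rank of the first hitting span, or 0.0 if nothing hits."""
    for rank, span in enumerate(retrieved, start=1):
        if span_hits(span, relevant):
            return 1.0 / rank
    return 0.0


def recall_at_k(retrieved: list[Span], relevant: set[Page], k: int) -> float:
    """Fraction of relevant pages covered by at least one top-k span."""
    if not relevant:
        return 0.0
    covered = {p for p in relevant if any(span_hits(s, {p}) for s in retrieved[:k])}
    return len(covered) / len(relevant)


def mean(values: list[float]) -> float:
    """Average per-query scores; MRR is the mean of reciprocal_rank values."""
    return sum(values) / len(values) if values else 0.0


# Spans shaped like the demo output; the gold pages are made up.
retrieved = [("zorich_v1", 159, 159), ("zorich_v1", 158, 159), ("zorich_v2", 517, 517)]
relevant = {("zorich_v1", 158), ("zorich_v1", 160)}
```

Scoring at page level rather than chunk level keeps the numbers comparable when the chunking strategy changes, since the gold labels do not depend on chunk boundaries.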

Status & roadmap

Pipeline (parse → chunk → index → query) and a measured retrieval stack (hybrid + reranker) are done; the MCP search server is done. Next: a network model-serving backend, client ingestion, and an agent layer over MCP. The reasoning and numbers behind each choice are in DECISIONS.md.

License
