MMWRAG

Bilingual RAG for scientific literature with search-only MCP server. Enables hybrid retrieval and reranking over indexed textbooks/papers.

A bilingual (Russian/English) RAG over scientific literature (textbooks/papers): vision PDF parsing, BGE-M3 hybrid (dense+sparse) retrieval with a cross-encoder reranker, exposed as an MCP tool (search only — the consumer composes the answer). Every retrieval decision here is measurement-driven — see DECISIONS.md.

Features

  • Vision PDF parsing behind a swappable interface (cloud PaddleOCR-VL / local PP-StructureV3) — required because the text layer doesn't encode formula structure.
  • Structure-aware chunking (~512-token packing over blocks, page spans kept for citations).
  • BGE-M3 dense + sparse embeddings; Qdrant hybrid search with server-side RRF.
  • Cross-encoder reranker (bge-reranker-v2-m3) over the top-N pool.
  • Book-aware cross-lingual routing: search(book_id=...) targets a specific book/language.
  • MCP server (search, list_books) over streamable HTTP — no answer generation.
  • Eval harness — page-level hit@k / MRR / recall@k, cross-book and cross-lingual.
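The structure-aware chunking bullet can be illustrated with a minimal sketch of greedy packing over parsed blocks. This is not the project's implementation: the `Block` and `Chunk` types, the `pack_blocks` function, and the whitespace token count are all hypothetical stand-ins (the real pipeline presumably counts tokens with the embedding model's tokenizer).

```python
from dataclasses import dataclass


@dataclass
class Block:
    page: int  # page the block was parsed from (hypothetical field)
    text: str


@dataclass
class Chunk:
    text: str
    page_start: int  # first page covered, kept for citations
    page_end: int    # last page covered


def count_tokens(text: str) -> int:
    # Assumption: whitespace-separated words stand in for real tokens.
    return len(text.split())


def _close(blocks: list[Block]) -> Chunk:
    # Join the packed blocks and record the page span they cover.
    return Chunk(
        text="\n\n".join(b.text for b in blocks),
        page_start=blocks[0].page,
        page_end=blocks[-1].page,
    )


def pack_blocks(blocks: list[Block], budget: int = 512) -> list[Chunk]:
    """Greedily pack whole blocks into chunks of at most `budget` tokens.

    Blocks are never split, so a single block larger than the budget
    becomes its own oversized chunk.
    """
    chunks: list[Chunk] = []
    current: list[Block] = []
    used = 0
    for block in blocks:
        size = count_tokens(block.text)
        # Close the current chunk if this block would overflow the budget.
        if current and used + size > budget:
            chunks.append(_close(current))
            current, used = [], 0
        current.append(block)
        used += size
    if current:
        chunks.append(_close(current))
    return chunks
```

Packing whole blocks keeps formulas and paragraphs intact, and the page span on each chunk is what lets a search result cite pages such as 158–159.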

Architecture

INDEXING   PDF ─parse─> Page[] ─chunk─> Chunk[] ─BGE-M3 (dense+sparse)─> Qdrant
QUERY      question ─HybridRetriever (RRF)─> top-N ─cross-encoder rerank─> top-k Source[]
MCP        client ─/mcp─> search(query, top_k, book_id) ─> fragments {book_id, pages, text, score}
                          list_books() ─> indexed books + language

Details in ARCHITECTURE.md.
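For readers unfamiliar with RRF (reciprocal rank fusion), the step that merges the dense and sparse result lists can be sketched as below. In this project the fusion runs server-side in Qdrant, so this pure-Python version is only an illustration of the formula; the function name `rrf_fuse` is hypothetical and the constant `k = 60` is a commonly used default, assumed here rather than taken from the project.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of ids with reciprocal rank fusion.

    Each id receives 1 / (k + rank) from every list it appears in
    (rank starts at 1); ids are returned by descending fused score.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Break ties by id so the output order is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Illustrative ids only: one list from dense search, one from sparse search.
dense = ["a", "b", "c"]
sparse = ["b", "d", "a"]
fused = rrf_fuse([dense, sparse])
print([doc_id for doc_id, _ in fused])
```

RRF uses only ranks, not raw scores, which is why it can combine dense and sparse results whose scores are on different scales.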

Quickstart

# 1. dependencies (paddlepaddle-gpu is a manual prereq for the PARSING path only)
uv sync

# 2. vector database
docker compose up -d            # Qdrant on :6333

# 3. bring your own PDF and index it
#    parsing needs PADDLEOCR_TOKEN in .env (see .env.example);
#    pipeline: parse(pdf) -> chunk_pages(...) -> index_chunks(...)  (see notebooks/ for examples)

# 4. run the MCP server
uv run python -m src.mcp.server # streamable-http on 127.0.0.1:8000

The corpus is not included (copyright). Search and the MCP server need Qdrant plus the local models (BGE-M3 and the reranker); both run on CPU, though a GPU is faster. Parsing additionally needs a PaddleOCR-VL cloud token.

Demo

A real session against the MCP server (notebooks/mcp_smoke.py, output trimmed to metadata):

tools: ['search', 'list_books']

list_books:
  {'book_id': 'zorich_v1', 'title': 'Zorich — Mathematical Analysis I', 'language': 'ru', 'chunks': 1472}
  {'book_id': 'zorich_v2', 'title': 'Zorich — Mathematical Analysis II', 'language': 'ru', 'chunks': 2526}
  {'book_id': 'lebl', 'title': 'Lebl — Basic Analysis I', 'language': 'en', 'chunks': 722}

search RU (all books), top 3:
  zorich_v1 159 2.125
  zorich_v1 158–159 0.297
  zorich_v2 517 -0.357

search RU routed to lebl (cross-lingual), top 3:
  lebl 135–136 0.123
  lebl 167 -0.047
  lebl 208 -0.141

The last call shows book-aware cross-lingual routing: a Russian query with book_id="lebl" returns the English source (Lebl, p.135–136) that a plain cross-book search buries behind the Russian equivalent (see DECISIONS.md §5).

Project structure

src/
  parse/   vision PDF -> Page[]   (cloud / local engines, idempotent cache)
  chunk/   Page[] -> Chunk[]      (structure-aware packing, page spans)
  index/   Chunk[] -> BGE-M3 -> Qdrant   (Embedder / VectorStore interfaces)
  query/   HybridRetriever + RerankingRetriever; answer() with citations
  mcp/     MCP server: search / list_books (pure core + thin FastMCP server)
  eval/    page-level hit@k / MRR / recall@k; cross-book & cross-lingual
tests/     unit tests (pure logic on fakes; integration tests skip offline)
notebooks/ runnable examples & measurement runners (mcp_smoke, eval_*, diag_*)
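The page-level metrics listed for eval/ can be sketched as follows. The definitions are assumptions, not the harness's actual code: a result counts as relevant if its page span overlaps the gold pages, recall is measured over gold pages, and book_id matching is ignored for brevity. All function names are hypothetical.

```python
def hit_at_k(results: list[set[int]], gold: set[int], k: int) -> float:
    """1.0 if any of the top-k results overlaps the gold pages, else 0.0."""
    return 1.0 if any(pages & gold for pages in results[:k]) else 0.0


def reciprocal_rank(results: list[set[int]], gold: set[int]) -> float:
    """1 / rank of the first result overlapping the gold pages (0.0 if none)."""
    for rank, pages in enumerate(results, start=1):
        if pages & gold:
            return 1.0 / rank
    return 0.0


def recall_at_k(results: list[set[int]], gold: set[int], k: int) -> float:
    """Fraction of gold pages covered by the union of the top-k results."""
    if not gold:
        return 0.0
    covered = set().union(*results[:k]) & gold
    return len(covered) / len(gold)


def mean_reciprocal_rank(runs: list[tuple[list[set[int]], set[int]]]) -> float:
    """MRR: the mean of reciprocal_rank over (results, gold) pairs."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(r, g) for r, g in runs) / len(runs)


# Illustrative query: each result is the set of pages its chunk spans.
results = [{10}, {158, 159}, {517}]
gold = {159, 160}
print(hit_at_k(results, gold, 3), reciprocal_rank(results, gold), recall_at_k(results, gold, 3))
```

Scoring at page level rather than chunk level keeps the numbers comparable when chunking parameters change, since the gold labels do not depend on chunk boundaries.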

Status & roadmap

Pipeline (parse → chunk → index → query) and a measured retrieval stack (hybrid + reranker) are done; the MCP search server is done. Next: a network model-serving backend, client ingestion, and an agent layer over MCP. The reasoning and numbers behind each choice are in DECISIONS.md.

License

MIT © 2026 mikrominiw
