MMWRAG
Bilingual RAG for scientific literature with search-only MCP server. Enables hybrid retrieval and reranking over indexed textbooks/papers.
A bilingual (Russian/English) RAG over scientific literature (textbooks/papers): vision PDF parsing, BGE-M3 hybrid (dense+sparse) retrieval with a cross-encoder reranker, exposed as an MCP tool (search only — the consumer composes the answer). Every retrieval decision here is measurement-driven — see DECISIONS.md.
Features
- Vision PDF parsing behind a swappable interface (cloud PaddleOCR-VL / local PP-StructureV3) — required because the text layer doesn't encode formula structure.
- Structure-aware chunking (~512-token packing over blocks, page spans kept for citations).
- BGE-M3 dense + sparse embeddings; Qdrant hybrid search with server-side RRF.
- Cross-encoder reranker (bge-reranker-v2-m3) over the top-N pool.
- Book-aware cross-lingual routing: search(book_id=...) targets a specific book/language.
- MCP server (search, list_books) over streamable HTTP; no answer generation.
- Eval harness: page-level hit@k/MRR/recall@k, cross-book and cross-lingual.
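The structure-aware chunking feature can be pictured as greedy packing over parsed blocks. The following is a minimal Python sketch under stated assumptions, not the repository's implementation: Block, Chunk, count_tokens, and pack_blocks are hypothetical names, and whitespace word counting stands in for the real tokenizer.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """One parsed layout block (hypothetical shape, not the repo's type)."""
    text: str
    page: int


@dataclass
class Chunk:
    """A packed chunk that remembers the page span it came from."""
    text: str
    page_start: int
    page_end: int


def count_tokens(text: str) -> int:
    # Stand-in tokenizer: whitespace-separated words. A real pipeline would
    # presumably count tokens with the embedding model's tokenizer.
    return len(text.split())


def _close(blocks: list[Block]) -> Chunk:
    # Join the block texts and keep the first/last page for citations.
    return Chunk(
        text="\n".join(b.text for b in blocks),
        page_start=blocks[0].page,
        page_end=blocks[-1].page,
    )


def pack_blocks(blocks: list[Block], budget: int = 512) -> list[Chunk]:
    """Greedily pack consecutive blocks into chunks of at most `budget` tokens.

    Blocks are never split, so a single block larger than the budget
    becomes its own oversized chunk.
    """
    chunks: list[Chunk] = []
    current: list[Block] = []
    used = 0
    for block in blocks:
        size = count_tokens(block.text)
        if current and used + size > budget:
            chunks.append(_close(current))
            current, used = [], 0
        current.append(block)
        used += size
    if current:
        chunks.append(_close(current))
    return chunks


# Toy input: three blocks across two pages, packed with a tiny budget.
blocks = [Block("a b c", 1), Block("d e", 2), Block("f g h i", 2)]
chunks = pack_blocks(blocks, budget=5)
```

Keeping whole blocks together is what makes the packing "structure-aware" in this sketch: a formula or paragraph is never cut in the middle, and the page span stays available for citations.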
Architecture
INDEXING PDF ─parse─> Page[] ─chunk─> Chunk[] ─BGE-M3 (dense+sparse)─> Qdrant
QUERY question ─HybridRetriever (RRF)─> top-N ─cross-encoder rerank─> top-k Source[]
MCP client ─/mcp─> search(query, top_k, book_id) ─> fragments {book_id, pages, text, score}
list_books() ─> indexed books + language
Details in ARCHITECTURE.md.
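For readers unfamiliar with the RRF step in the query path, here is a small client-side illustration of reciprocal rank fusion in Python. In this project the fusion runs server-side in Qdrant, so this is only a sketch of the formula: rrf_fuse is a hypothetical helper, and k=60 is the constant commonly quoted for RRF, which may differ from the one Qdrant uses.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked ID lists with reciprocal rank fusion.

    Each list contributes 1 / (k + rank) per item, with rank starting at 1.
    Items missing from a list simply get no contribution from it.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Toy rankings standing in for the dense and sparse result lists.
dense = ["c1", "c2", "c3"]
sparse = ["c3", "c1", "c4"]
fused = rrf_fuse([dense, sparse])
```

Because RRF only looks at ranks, the dense and sparse scores never need to be put on a common scale, which is one reason it is a popular default for hybrid search.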
Quickstart
# 1. dependencies (paddlepaddle-gpu is a manual prereq for the PARSING path only)
uv sync
# 2. vector database
docker compose up -d # Qdrant on :6333
# 3. bring your own PDF and index it
# parsing needs PADDLEOCR_TOKEN in .env (see .env.example);
# pipeline: parse(pdf) -> chunk_pages(...) -> index_chunks(...) (see notebooks/ for examples)
# 4. run the MCP server
uv run python -m src.mcp.server # streamable-http on 127.0.0.1:8000
The corpus is not included (copyright). Search/MCP need Qdrant + the local models (BGE-M3, the reranker); CPU works (slower), GPU is faster. Parsing additionally needs a PaddleOCR-VL cloud token.
Demo
A real session against the MCP server (notebooks/mcp_smoke.py, output trimmed to
metadata):
tools: ['search', 'list_books']
list_books:
{'book_id': 'zorich_v1', 'title': 'Zorich — Mathematical Analysis I', 'language': 'ru', 'chunks': 1472}
{'book_id': 'zorich_v2', 'title': 'Zorich — Mathematical Analysis II', 'language': 'ru', 'chunks': 2526}
{'book_id': 'lebl', 'title': 'Lebl — Basic Analysis I', 'language': 'en', 'chunks': 722}
search RU (all books), top 3:
zorich_v1 159 2.125
zorich_v1 158–159 0.297
zorich_v2 517 -0.357
search RU routed to lebl (cross-lingual), top 3:
lebl 135–136 0.123
lebl 167 -0.047
lebl 208 -0.141
The last call shows book-aware cross-lingual routing: a Russian query with
book_id="lebl" returns the English source (Lebl, pp. 135–136) that a plain cross-book
search buries behind the Russian equivalent (see DECISIONS.md §5).
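One plausible reading of why routing helps, sketched in Python on toy data: if the book filter is applied before the first-stage pool is cut to top-N, a cross-lingual fragment that would otherwise fall outside the pool still reaches the reranker. This is an assumption for illustration only; the actual mechanism is documented in DECISIONS.md §5, and Candidate, search, and all scores below are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    book_id: str
    fused_score: float   # first-stage (hybrid) score, toy value
    rerank_score: float  # cross-encoder score, toy value


def search(
    candidates: list[Candidate],
    pool_size: int,
    top_k: int,
    book_id: Optional[str] = None,
) -> list[Candidate]:
    # 1. The optional book filter is applied BEFORE the pool is cut to size.
    eligible = [c for c in candidates if book_id is None or c.book_id == book_id]
    # 2. First stage: keep the best `pool_size` candidates by fused score.
    pool = sorted(eligible, key=lambda c: c.fused_score, reverse=True)[:pool_size]
    # 3. Second stage: order the pool by reranker score and return the top k.
    return sorted(pool, key=lambda c: c.rerank_score, reverse=True)[:top_k]


# Toy pool: same-language fragments dominate the first stage.
candidates = [
    Candidate("ru_book", 0.9, 2.0),
    Candidate("ru_book", 0.8, 0.3),
    Candidate("ru_book", 0.7, -0.4),
    Candidate("en_book", 0.2, 0.1),
]
unrouted = search(candidates, pool_size=3, top_k=3)
routed = search(candidates, pool_size=3, top_k=3, book_id="en_book")
```

In the unrouted call the English fragment never enters the pool, even though its reranker score would have placed it third; in the routed call it is the only eligible candidate and comes back first.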
Project structure
src/
parse/ vision PDF -> Page[] (cloud / local engines, idempotent cache)
chunk/ Page[] -> Chunk[] (structure-aware packing, page spans)
index/ Chunk[] -> BGE-M3 -> Qdrant (Embedder / VectorStore interfaces)
query/ HybridRetriever + RerankingRetriever; answer() with citations
mcp/ MCP server: search / list_books (pure core + thin FastMCP server)
eval/ page-level hit@k / MRR / recall@k; cross-book & cross-lingual
tests/ unit tests (pure logic on fakes; integration tests skip offline)
notebooks/ runnable examples & measurement runners (mcp_smoke, eval_*, diag_*)
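The page-level metrics listed for eval/ can be read as follows. This is a minimal Python sketch of one common definition, not the repository's harness: a result counts as a hit when its page span overlaps the gold pages, and all function names are hypothetical. It covers a single book only; a cross-book variant would also need to compare book_id.

```python
Span = tuple[int, int]  # inclusive (page_start, page_end) of a retrieved fragment


def _hits(retrieved: list[Span], gold: set[int]) -> list[bool]:
    # A fragment hits if any page in its span is a gold page.
    return [any(p in gold for p in range(s, e + 1)) for s, e in retrieved]


def hit_at_k(retrieved: list[Span], gold: set[int], k: int) -> float:
    """1.0 if any of the top-k fragments overlaps the gold pages, else 0.0."""
    return float(any(_hits(retrieved[:k], gold)))


def reciprocal_rank(retrieved: list[Span], gold: set[int]) -> float:
    """1 / rank of the first hitting fragment, or 0.0 if none hits."""
    for rank, hit in enumerate(_hits(retrieved, gold), start=1):
        if hit:
            return 1.0 / rank
    return 0.0


def recall_at_k(retrieved: list[Span], gold: set[int], k: int) -> float:
    """Fraction of gold pages covered by the top-k fragments."""
    if not gold:
        return 0.0
    found = {p for s, e in retrieved[:k] for p in range(s, e + 1) if p in gold}
    return len(found) / len(gold)


def mean_reciprocal_rank(runs: list[tuple[list[Span], set[int]]]) -> float:
    """MRR: the reciprocal rank averaged over all queries."""
    return sum(reciprocal_rank(r, g) for r, g in runs) / len(runs) if runs else 0.0


# Toy query: the first fragment misses, the second and third hit page 159.
retrieved = [(10, 11), (158, 159), (159, 159)]
gold = {159, 160}
```

Scoring at page level rather than chunk level keeps the metrics stable when the chunking changes, since the gold labels refer to pages and not to chunk IDs.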
Status & roadmap
The pipeline (parse → chunk → index → query), the measured retrieval stack (hybrid + reranker), and the MCP search server are done. Next: a network model-serving backend, client ingestion, and an agent layer over MCP. The reasoning and numbers behind each choice are in DECISIONS.md.
License
MIT © 2026 mikrominiw