MCP-based Academic Paper Retrieval RAG System

Enables AI coding assistants to query private academic paper collections via standard MCP tools, with hybrid retrieval, reranking, and inline citations.

A production-oriented, fully pluggable Retrieval-Augmented Generation (RAG) system exposed as an MCP (Model Context Protocol) server — enabling AI coding assistants like GitHub Copilot and Claude Desktop to query private knowledge bases via standard Tool Calling.


Overview

Engineers on R&D teams frequently search academic literature during technical research, but papers are scattered across internal file systems, and keyword search neither captures semantic intent nor integrates with AI coding assistant workflows.

This project solves that by building an MCP-based academic paper retrieval RAG system: ingest your PDFs once, then ask questions directly from GitHub Copilot or Claude Desktop via standard Tool Calling. Responses include inline citations and are grounded in your private document collection. The system supports both API-based LLM backends and fully local Ollama deployment. When configured with local embedding and local LLM backends, it can run without external network dependency, making it suitable for privacy-sensitive or air-gapped environments.

Benchmark results (10 papers, 1,104 chunks, 100 Golden QA pairs):

Metric        Hybrid Search (baseline)   + BGE Reranker v2-m3
Hit Rate@10   89%                        89%
MRR           0.61                       0.83 (+36%)
NDCG@10       0.68                       0.84 (+24%)

Ablation: replacing BGE Reranker v2-m3 with a general-purpose MS MARCO Cross-Encoder degraded MRR on academic text — domain-aligned reranking matters.


Architecture

PDF Documents
     ↓
┌─────────────────────────────────────┐
│         Ingestion Pipeline          │
│  Load → Split → Transform → Embed  │
│           → Upsert                  │
└──────────────────┬──────────────────┘
                   │
         ┌─────────┴──────────┐
         │                    │
   ChromaDB (Dense)      BM25 Index (Sparse)
   BGE-M3 vectors        Term frequencies
         │                    │
         └─────────┬──────────┘
                   │ RRF Fusion
                   ↓
          BGE Reranker v2-m3
                   ↓
┌─────────────────────────────────────┐
│           MCP Server                │
│  JSON-RPC 2.0 + Stdio Transport     │
│                                     │
│  • query_knowledge_hub              │
│  • list_collections                 │
│  • get_document_summary             │
└─────────────────────────────────────┘
         ↓                ↓
  GitHub Copilot    Claude Desktop

Key Features

Two-Stage Hybrid Retrieval

  • Coarse ranking: BGE-M3 dense vectors + BM25 sparse retrieval run in parallel, fused via Reciprocal Rank Fusion (RRF)
  • Fine ranking: BGE Reranker v2-m3 (same model family as the embedder) re-scores the top candidates — domain-aligned reranking for academic text
  • Graceful fallback: reranker failure automatically falls back to fusion order with fallback=True metadata
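The coarse-ranking and fallback behaviour described above can be sketched in a few lines of Python. This is an illustration, not the project's actual implementation: the function names, the chunk IDs, and the constant k = 60 (a commonly used RRF default) are assumptions.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple


def rrf_fuse(rankings: Sequence[Sequence[str]], k: int = 60) -> List[Tuple[str, float]]:
    """Fuse ranked ID lists with RRF: score(d) = sum of 1 / (k + rank) over lists."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID so the order is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


def rerank_with_fallback(
    query: str,
    candidates: List[str],
    reranker: Callable[[str, List[str]], List[str]],
) -> Tuple[List[str], Dict[str, bool]]:
    """Return (ordered_ids, metadata); keep the fusion order if the reranker raises."""
    try:
        return list(reranker(query, candidates)), {"fallback": False}
    except Exception:
        return list(candidates), {"fallback": True}


dense = ["c3", "c1", "c7"]   # hypothetical chunk IDs from the dense retriever
sparse = ["c1", "c9", "c3"]  # hypothetical chunk IDs from BM25
fused_ids = [doc_id for doc_id, _ in rrf_fuse([dense, sparse])]
print(fused_ids)
```

Because RRF uses only ranks, the dense similarity scores and BM25 scores never need to be normalised against each other, which is the usual reason for choosing it as the fusion step.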

MCP-Compliant RAG Server

  • Full JSON-RPC 2.0 protocol over stdio transport
  • Three tools available to any MCP client:
    • query_knowledge_hub — semantic search with inline citations
    • list_collections — enumerate available knowledge bases
    • get_document_summary — retrieve title, summary, tags for a document
  • Structured multi-modal responses with citation blocks
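As a rough illustration of what travels over the stdio transport, the sketch below builds an MCP `tools/call` request for `query_knowledge_hub`. The envelope (`jsonrpc`, `id`, `method`, `params.name`, `params.arguments`) follows the MCP specification; the argument names `query` and `collection` are assumptions, since the tool's input schema is not shown here.

```python
import json


def build_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a single line of JSON-RPC 2.0."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    # The stdio transport delimits messages with newlines, so emit compact
    # JSON that contains no embedded line breaks.
    return json.dumps(message, separators=(",", ":"))


line = build_tool_call(
    1,
    "query_knowledge_hub",
    # Assumed argument names; check the server's tool schema for the real ones.
    {"query": "How does RRF fusion work?", "collection": "research"},
)
print(line)
```

In a real session the client first completes the MCP `initialize` handshake; MCP clients such as GitHub Copilot and Claude Desktop handle that step and the request framing themselves.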

Intelligent Ingestion Pipeline

  • 5 stages: Load → Split → Transform → Embed → Upsert
  • pdfplumber PDF parsing with image extraction and [IMAGE: id] placeholder injection
  • Optional LLM-driven chunk refinement and metadata enrichment (Title / Summary / Tags)
  • SHA-256 content hashing for idempotent incremental ingestion — re-running on unchanged files is a no-op
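The idempotency check in the last bullet might look roughly like the following. It is a minimal sketch that assumes an in-memory dictionary as the hash store; the function names are hypothetical and the project's real bookkeeping may differ.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file's bytes in blocks so large PDFs are not loaded at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def needs_ingestion(path: Path, seen: Dict[str, str]) -> bool:
    """Return True if the file is new or changed, and record its current hash."""
    current = file_sha256(path)
    if seen.get(str(path)) == current:
        return False  # unchanged content: skip the whole pipeline
    seen[str(path)] = current
    return True


with tempfile.TemporaryDirectory() as tmp:
    pdf = Path(tmp) / "paper.pdf"
    pdf.write_bytes(b"%PDF-1.7 example")
    seen: Dict[str, str] = {}
    first = needs_ingestion(pdf, seen)   # new file
    second = needs_ingestion(pdf, seen)  # same bytes, so nothing to do
    pdf.write_bytes(b"%PDF-1.7 revised")
    third = needs_ingestion(pdf, seen)   # content changed
```

A production pipeline would more likely record the hash only after the Upsert stage succeeds, so that a failed run is retried rather than skipped.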

Fully Pluggable Architecture

Every core component is swappable via config/settings.yaml with zero code changes:

Component      Supported Backends
LLM            OpenAI / Azure / DeepSeek / Ollama
Embedding      OpenAI / Azure / Ollama (BGE-M3, nomic-embed-text, …)
Vector Store   ChromaDB (local persistence)
Reranker       BGE / Cross-Encoder / LLM / None
Splitter       Fixed / Recursive / Semantic (TODO)
Evaluator      Custom (Hit Rate / MRR / NDCG) / Ragas

When configured with Ollama-based local LLM and embedding backends, the system can run fully locally with no data leaving the environment.
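One common way to get config-driven swapping is an abstract base class plus a small factory registry, sketched below. All class and function names are hypothetical, and `keyword_overlap` is a toy backend invented for the example rather than one the project ships.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Type


class BaseReranker(ABC):
    """Interface every reranker backend implements."""

    @abstractmethod
    def rerank(self, query: str, candidates: List[str]) -> List[str]:
        ...


_REGISTRY: Dict[str, Type[BaseReranker]] = {}


def register(name: str):
    """Class decorator that maps a config value to a backend class."""
    def decorator(cls: Type[BaseReranker]) -> Type[BaseReranker]:
        _REGISTRY[name] = cls
        return cls
    return decorator


@register("none")
class NoOpReranker(BaseReranker):
    def rerank(self, query: str, candidates: List[str]) -> List[str]:
        return list(candidates)


@register("keyword_overlap")
class KeywordOverlapReranker(BaseReranker):
    """Toy stand-in for a model-based reranker (illustration only)."""

    def rerank(self, query: str, candidates: List[str]) -> List[str]:
        terms = set(query.lower().split())
        return sorted(candidates, key=lambda text: -len(terms & set(text.lower().split())))


def create_reranker(settings: dict) -> BaseReranker:
    """Pick the backend named under settings['rerank']['backend']."""
    backend = settings.get("rerank", {}).get("backend", "none")
    if backend not in _REGISTRY:
        raise ValueError(f"Unknown rerank backend: {backend!r}")
    return _REGISTRY[backend]()


reranker = create_reranker({"rerank": {"backend": "keyword_overlap"}})
ordered = reranker.rerank("rrf fusion", ["bm25 scoring", "rrf fusion explained"])
```

With this shape, adding a backend means registering one new class; callers keep depending only on `BaseReranker`.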

Observability Dashboard

6-page Streamlit management platform:

Page                Function
System Overview     Live component configuration view
Data Browser        Browse documents and chunks with metadata
Ingestion Manager   Upload PDFs, monitor real-time progress, delete documents
Ingestion Traces    Per-run stage timing (load / split / transform / embed / upsert)
Query Traces        Per-query stage breakdown (dense / sparse / fusion / rerank)
Evaluation          Run Golden QA evaluation and view Hit Rate / MRR / NDCG results
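Per-stage timings like those shown on the trace pages could be collected with a small context manager and written as one JSON Lines record per run. The sketch below is an assumption about the general approach; the `Trace` class, the `stages_ms` field, and the run ID are made up for illustration.

```python
import json
import time
from contextlib import contextmanager
from typing import Dict, Iterator


class Trace:
    """Collects wall-clock time per named stage for a single run."""

    def __init__(self, run_id: str) -> None:
        self.run_id = run_id
        self.stages_ms: Dict[str, float] = {}

    @contextmanager
    def stage(self, name: str) -> Iterator[None]:
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the duration even if the stage raised.
            self.stages_ms[name] = round((time.perf_counter() - start) * 1000, 3)

    def to_jsonl(self) -> str:
        """One JSON object per line, suitable for appending to a .jsonl file."""
        return json.dumps({"run_id": self.run_id, "stages_ms": self.stages_ms})


trace = Trace("query-0001")  # hypothetical run ID
for name in ("dense", "sparse", "fusion", "rerank"):
    with trace.stage(name):
        pass  # the real stage would run here
record = json.loads(trace.to_jsonl())
```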

Evaluation Framework

  • Custom evaluator: Hit Rate@K, MRR, NDCG@10
  • Ragas integration (Faithfulness / Answer Relevancy / Context Precision) — optional, requires API key
  • Golden Test Set regression pipeline — every retrieval strategy change produces quantified before/after metrics
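For reference, the three custom metrics can be computed as follows. This sketch assumes binary relevance (a chunk is either in the golden set for a question or not) and averages over questions; the project's evaluator may define relevance or tie handling differently.

```python
import math
from typing import List, Sequence, Set


def hit_rate_at_k(results: Sequence[List[str]], relevant: Sequence[Set[str]], k: int = 10) -> float:
    """Fraction of questions with at least one relevant ID in the top k."""
    hits = sum(1 for ranked, gold in zip(results, relevant) if gold & set(ranked[:k]))
    return hits / len(results)


def mrr(results: Sequence[List[str]], relevant: Sequence[Set[str]]) -> float:
    """Mean of 1 / rank of the first relevant ID (0 when none is retrieved)."""
    total = 0.0
    for ranked, gold in zip(results, relevant):
        for rank, doc_id in enumerate(ranked, start=1):
            if doc_id in gold:
                total += 1.0 / rank
                break
    return total / len(results)


def ndcg_at_k(results: Sequence[List[str]], relevant: Sequence[Set[str]], k: int = 10) -> float:
    """Mean NDCG@k with binary gains: DCG divided by the ideal DCG."""
    total = 0.0
    for ranked, gold in zip(results, relevant):
        dcg = sum(
            1.0 / math.log2(rank + 1)
            for rank, doc_id in enumerate(ranked[:k], start=1)
            if doc_id in gold
        )
        ideal = sum(1.0 / math.log2(rank + 1) for rank in range(1, min(len(gold), k) + 1))
        total += dcg / ideal if ideal else 0.0
    return total / len(results)


# Two toy questions: the first finds its gold chunk at rank 2, the second misses.
results = [["a", "b", "c"], ["x", "y", "z"]]
relevant = [{"b"}, {"q"}]
scores = (hit_rate_at_k(results, relevant), mrr(results, relevant), ndcg_at_k(results, relevant))
```

Hit Rate only asks whether a relevant chunk appears in the top K, while MRR and NDCG reward placing it higher, which is why a reranker can raise the latter two without changing Hit Rate.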

Quick Start

Prerequisites

  • Python 3.10+
  • Ollama (for local embedding) or an OpenAI-compatible API key

Install

git clone <repo-url>
cd Modular-RAG-MCP-Server

python -m venv .venv
source .venv/bin/activate      # Windows: .venv\Scripts\activate
pip install -e ".[dev]"

Configure

Edit config/settings.yaml to set your embedding provider and model:

embedding:
  provider: ollama        # or openai / azure
  model: bge-m3

rerank:
  backend: cross_encoder  # or none / llm
  model: BAAI/bge-reranker-v2-m3

Set API keys if using cloud providers:

export OPENAI_API_KEY="sk-..."

Ingest Documents

python scripts/ingest.py --path /path/to/papers/ --collection research

Query

python scripts/query.py --query "How does RRF fusion work?" --verbose

Launch Dashboard

streamlit run src/observability/dashboard/app.py
# Open http://localhost:8501

Run Evaluation

python scripts/evaluate.py --collection research --output results/eval.json

MCP Integration

GitHub Copilot (VS Code)

Create .vscode/mcp.json:

{
  "servers": {
    "modular-rag": {
      "type": "stdio",
      "command": "python",
      "args": ["src/mcp_server/server.py", "--config", "config/settings.yaml"],
      "env": {
        "OPENAI_API_KEY": "${env:OPENAI_API_KEY}"
      }
    }
  }
}

Claude Desktop

Add to claude_desktop_config.json:

{
  "mcpServers": {
    "modular-rag": {
      "command": "python",
      "args": ["/absolute/path/to/src/mcp_server/server.py"],
      "env": {
        "OPENAI_API_KEY": "your-api-key"
      }
    }
  }
}

Running Tests

pytest -q                        # full suite (649 tests)
pytest -q tests/unit/            # unit tests only
pytest -q tests/integration/     # integration tests
pytest -q tests/e2e/             # E2E: MCP client, dashboard smoke, recall regression

Tech Stack

  • Retrieval: BGE-M3 · BM25 · RRF · BGE Reranker v2-m3
  • Storage: ChromaDB · SQLite · JSON Lines
  • Protocol: MCP / JSON-RPC 2.0 / Stdio Transport
  • LLM Backends: OpenAI · Azure OpenAI · Ollama · DeepSeek
  • Evaluation: Custom (Hit Rate / MRR / NDCG) · Ragas
  • Dashboard: Streamlit
  • Testing: Pytest · Unit / Integration / E2E suites (see Running Tests for the current test count)
  • Design Patterns: Abstract Base Class · Factory Pattern · Dependency Injection · TDD
