arXiv Discovery MCP

MCP server for arXiv paper discovery, triage, and monitoring with inspectable ranking.

What This Is

arXiv Discovery MCP is a research discovery substrate inspired by arxiv-sanity. It helps researchers and AI agents discover, triage, and monitor arXiv papers through the Model Context Protocol (MCP) -- exposing tools, resources, and prompts that integrate directly into Claude Desktop, Claude Code, or any MCP-compatible client.

Unlike "chat with papers" wrappers, this system provides explicit interest modeling, inspectable ranking explanations, and structured workflow state. You build an interest profile from seed papers, followed authors, and saved queries. The system uses that profile to rank search results, surface new papers, and explain why each result scored the way it did.

The project tracks content provenance and respects reuse constraints per content type. All ranking signals are transparent: you can see exactly which interest signals contributed to each paper's score.
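
To make "inspectable ranking" concrete, here is a minimal sketch of a score that records which interest signals contributed to it. It is not the project's ranking code: the Signal and Explanation classes, the explain_score function, the signal kinds, and the weights are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Signal:
    """One interest signal in a profile (hypothetical shape)."""
    kind: str      # "followed_author" or "saved_query" in this sketch
    value: str     # author name or query text
    weight: float  # illustrative weight, not a value taken from the project


@dataclass
class Explanation:
    """A score plus the per-signal contributions that produced it."""
    score: float = 0.0
    contributions: list = field(default_factory=list)


def explain_score(paper: dict, signals: list) -> Explanation:
    """Sum the weights of matching signals and record each match."""
    result = Explanation()
    text = (paper["title"] + " " + paper["abstract"]).lower()
    for s in signals:
        if s.kind == "followed_author":
            matched = s.value in paper["authors"]
        elif s.kind == "saved_query":
            matched = s.value.lower() in text
        else:
            matched = False  # unknown signal kinds contribute nothing
        if matched:
            result.score += s.weight
            result.contributions.append((f"{s.kind}:{s.value}", s.weight))
    return result
```

The returned contributions list is what makes the score inspectable: a caller can show each matching signal next to the weight it added, rather than only the total.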

Features

MCP Tools (13)

Discovery

  • search_papers -- Full-text search with optional profile-ranked results
  • browse_recent -- Browse recent papers by arXiv category
  • find_related_papers -- Find papers related to one or more seed papers
  • get_paper -- Retrieve full metadata for a single paper

Workflow

  • triage_paper -- Mark papers as shortlisted, seen, or dismissed
  • add_to_collection -- Add papers to named collections (auto-creates)
  • create_watch -- Create a saved query that monitors for new papers

Interest & Enrichment

  • add_signal -- Add an interest signal (seed paper, followed author, etc.)
  • batch_add_signals -- Add multiple interest signals at once
  • create_profile -- Create a named interest profile
  • suggest_signals -- Get profile expansion suggestions based on usage patterns
  • enrich_paper -- Fetch citation counts, FWCI, and topics from OpenAlex

Content

  • get_content_variant -- Retrieve paper content (abstract, HTML, or PDF-to-markdown) with rights gating

MCP Resources (4)

  • paper://{arxiv_id} -- Paper metadata, triage state, enrichment, and content variants
  • collection://{slug} -- Collection contents with pagination
  • profile://{slug} -- Interest profile with all signals
  • watch://{slug}/deltas -- New papers since last check
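
The idea behind a "new papers since last check" resource can be sketched as a filter on a timestamp. This is an assumption about the general technique, not the server's implementation: the watch_deltas function and the first_seen field are hypothetical.

```python
from datetime import datetime, timezone


def watch_deltas(papers: list, last_checked: datetime) -> list:
    """Return papers first seen strictly after last_checked, newest first."""
    new = [p for p in papers if p["first_seen"] > last_checked]
    return sorted(new, key=lambda p: p["first_seen"], reverse=True)


if __name__ == "__main__":
    papers = [
        {"arxiv_id": "2301.00001", "first_seen": datetime(2023, 1, 2, tzinfo=timezone.utc)},
        {"arxiv_id": "2301.00002", "first_seen": datetime(2023, 1, 5, tzinfo=timezone.utc)},
    ]
    cutoff = datetime(2023, 1, 3, tzinfo=timezone.utc)
    for p in watch_deltas(papers, cutoff):
        print(p["arxiv_id"])
```

After reading the deltas, a client would typically advance the stored check time so the same papers are not reported twice.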

MCP Prompts (3)

  • daily-digest -- Workflow guidance for reviewing new papers across watches
  • literature-map-from-seeds -- Workflow for building a literature map from seed papers
  • triage-shortlist -- Workflow for reviewing and triaging a collection

CLI

All MCP capabilities are mirrored in a full CLI (arxiv-mcp) for terminal workflows, scripting, and debugging.

Test Coverage

493 tests passing across ingestion, search, workflow, interest modeling, enrichment, content normalization, and MCP integration.

Prerequisites

  • Python 3.13+ (uses 3.13 language features)
  • PostgreSQL 16+ (must be running and accessible)
  • Git (for cloning the repository)

Installation

git clone https://github.com/loganrooks/arxiv-sanity-mcp.git
cd arxiv-sanity-mcp

# Create and activate a virtual environment (recommended)
python3.13 -m venv .venv
source .venv/bin/activate  # Linux/macOS
# .venv\Scripts\activate   # Windows

pip install -e .

For development (tests, linting):

pip install -e ".[dev]"
# Or manually:
pip install pytest pytest-asyncio pytest-cov pytest-timeout respx ruff

Note: The MCP server configuration below requires the absolute path to your venv's Python interpreter. You can find it with which python after activating the venv.

Database Setup

  1. Create the database user and databases:
sudo -u postgres psql -c "CREATE USER arxiv_mcp WITH PASSWORD 'arxiv_mcp_dev';"
sudo -u postgres psql -c "CREATE DATABASE arxiv_mcp OWNER arxiv_mcp;"
sudo -u postgres psql -c "CREATE DATABASE arxiv_mcp_test OWNER arxiv_mcp;"
  2. Create a .env file in the project root (or set environment variables):
DATABASE_URL=postgresql+asyncpg://arxiv_mcp:arxiv_mcp_dev@localhost:5432/arxiv_mcp
  3. Run database migrations:
alembic upgrade head
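
If migrations fail to connect, it can help to check that DATABASE_URL has the parts you expect. The following is a small standard-library sketch for that; describe_database_url is a hypothetical helper, not part of arxiv-mcp.

```python
from urllib.parse import urlsplit


def describe_database_url(url: str) -> dict:
    """Split a SQLAlchemy-style URL into its parts for a quick sanity check."""
    parts = urlsplit(url)
    # The scheme has the form "dialect+driver", e.g. "postgresql+asyncpg".
    dialect, _, driver = parts.scheme.partition("+")
    return {
        "dialect": dialect,
        "driver": driver or None,
        "user": parts.username,
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/"),
    }


if __name__ == "__main__":
    url = "postgresql+asyncpg://arxiv_mcp:arxiv_mcp_dev@localhost:5432/arxiv_mcp"
    for key, value in describe_database_url(url).items():
        print(f"{key}: {value}")
```

The password is deliberately left out of the returned dictionary so the output is safe to paste into a bug report.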

Quick Start

Once installed and the database is set up, try these commands:

# Harvest a paper by arXiv ID
arxiv-mcp harvest fetch 2301.00001

# Search for papers
arxiv-mcp search query "attention mechanism"

# Browse recent papers in a category
arxiv-mcp search browse --category cs.AI

# Create a collection
arxiv-mcp collection create "reading-list"

# Triage a paper
arxiv-mcp triage mark 2301.00001 shortlisted

MCP Server Configuration

Claude Code

Use claude mcp add-json with your venv's absolute Python path:

claude mcp add-json arxiv-discovery --scope local '{
  "command": "/absolute/path/to/arxiv-sanity-mcp/.venv/bin/python",
  "args": ["-m", "arxiv_mcp.mcp"],
  "cwd": "/absolute/path/to/arxiv-sanity-mcp",
  "env": {
    "DATABASE_URL": "postgresql+asyncpg://arxiv_mcp:arxiv_mcp_dev@localhost:5432/arxiv_mcp"
  }
}'

Replace /absolute/path/to/arxiv-sanity-mcp with the actual path to your cloned repository.

Important: Use the absolute path to the venv Python binary (e.g., /home/user/projects/arxiv-sanity-mcp/.venv/bin/python), not just python. This ensures the MCP server uses the correct environment regardless of which directory you launch Claude Code from.

Claude Desktop

Add this to your claude_desktop_config.json:

{
  "mcpServers": {
    "arxiv-discovery": {
      "command": "/absolute/path/to/arxiv-sanity-mcp/.venv/bin/python",
      "args": ["-m", "arxiv_mcp.mcp"],
      "cwd": "/absolute/path/to/arxiv-sanity-mcp",
      "env": {
        "DATABASE_URL": "postgresql+asyncpg://arxiv_mcp:arxiv_mcp_dev@localhost:5432/arxiv_mcp"
      }
    }
  }
}

Replace /absolute/path/to/arxiv-sanity-mcp with the actual path to your cloned repository.
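
To avoid editing the paths by hand, the same entry can be generated from the repository location. This is a sketch that assumes the POSIX venv layout shown above (.venv/bin/python); claude_desktop_entry is a hypothetical helper, and the example path is the one used in the note above.

```python
import json
from pathlib import PurePosixPath


def claude_desktop_entry(repo_dir: str, database_url: str) -> dict:
    """Build the "arxiv-discovery" server entry from an absolute repo path."""
    repo = PurePosixPath(repo_dir)
    if not repo.is_absolute():
        raise ValueError("repo_dir must be an absolute path")
    return {
        "command": str(repo / ".venv" / "bin" / "python"),
        "args": ["-m", "arxiv_mcp.mcp"],
        "cwd": str(repo),
        "env": {"DATABASE_URL": database_url},
    }


if __name__ == "__main__":
    entry = claude_desktop_entry(
        "/home/user/projects/arxiv-sanity-mcp",
        "postgresql+asyncpg://arxiv_mcp:arxiv_mcp_dev@localhost:5432/arxiv_mcp",
    )
    # Print a fragment to merge into claude_desktop_config.json.
    print(json.dumps({"mcpServers": {"arxiv-discovery": entry}}, indent=2))
```

Rejecting relative paths up front mirrors the requirement that the configuration point at the venv interpreter by absolute path.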

Configuration

| Variable | Required | Default | Description |
| --- | --- | --- | --- |
| DATABASE_URL | Yes | postgresql+asyncpg://arxiv_mcp:arxiv_mcp_dev@localhost:5432/arxiv_mcp | PostgreSQL connection string |
| OPENALEX_EMAIL | No | (empty) | Email for OpenAlex polite pool (recommended; increases rate limit from 1 to 10 req/s) |
| OPENALEX_API_KEY | No | (empty) | OpenAlex API key for enrichment |
| DEPLOYMENT_MODE | No | local | local or hosted -- controls content license enforcement |
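
As an illustration of how these variables and defaults fit together, here is a minimal sketch of reading them from the environment. It is not the project's settings module: Settings and load_settings are hypothetical names, and falling back to the listed default for DATABASE_URL is an assumption of this sketch.

```python
import os
from dataclasses import dataclass

# Default taken from the configuration listing above.
DEFAULT_DATABASE_URL = (
    "postgresql+asyncpg://arxiv_mcp:arxiv_mcp_dev@localhost:5432/arxiv_mcp"
)


@dataclass(frozen=True)
class Settings:
    database_url: str
    openalex_email: str
    openalex_api_key: str
    deployment_mode: str


def load_settings(env=None) -> Settings:
    """Read the four documented variables, applying the documented defaults."""
    env = os.environ if env is None else env
    mode = env.get("DEPLOYMENT_MODE", "local")
    if mode not in ("local", "hosted"):
        raise ValueError(f"DEPLOYMENT_MODE must be 'local' or 'hosted', got {mode!r}")
    return Settings(
        database_url=env.get("DATABASE_URL", DEFAULT_DATABASE_URL),
        openalex_email=env.get("OPENALEX_EMAIL", ""),
        openalex_api_key=env.get("OPENALEX_API_KEY", ""),
        deployment_mode=mode,
    )
```

Passing a plain dictionary as env makes the loader easy to exercise in tests without touching the real environment.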

Design Documents

Architectural documentation is in the docs/ directory:

| Document | Description |
| --- | --- |
| 01 - Project Vision | Goals and product values |
| 02 - Product Principles | Design principles and constraints |
| 03 - Design Space | Retrieval and ranking options explored |
| 04 - Reference Designs | Systems studied for design inspiration |
| 05 - Architecture Hypotheses | Architectural bets and rationale |
| 06 - MCP Surface Options | MCP interface design decisions |
| 07 - Data Sources & Content Rights | arXiv data access and licensing |
| 08 - Evaluation & Experiments | Testing methodology |
| 09 - Roadmap | Development phases |
| 10 - Open Questions | Unresolved design questions |
| 11 - Sources | External references |

Architecture Decision Records

  • ADR-0001 -- Exploration-first architecture
  • ADR-0002 -- Metadata-first, lazy enrichment
  • ADR-0003 -- License and provenance first
  • ADR-0004 -- MCP as workflow substrate

See docs/adrs/ for full details.

License

MIT
