arXiv Discovery MCP
MCP server for discovering, triaging, and monitoring arXiv papers with transparent interest modeling and inspectable ranking.
What This Is
arXiv Discovery MCP is a research discovery substrate inspired by arxiv-sanity. It helps researchers and AI agents discover, triage, and monitor arXiv papers through the Model Context Protocol (MCP) -- exposing tools, resources, and prompts that integrate directly into Claude Desktop, Claude Code, or any MCP-compatible client.
Unlike "chat with papers" wrappers, this system provides explicit interest modeling, inspectable ranking explanations, and structured workflow state. You build an interest profile from seed papers, followed authors, and saved queries. The system uses that profile to rank search results, surface new papers, and explain why each result scored the way it did.
The project tracks content provenance and respects reuse constraints per content type. All ranking signals are transparent: you can see exactly which interest signals contributed to each paper's score.
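To make "inspectable ranking" concrete, here is a minimal sketch of a score that carries its own explanation. It is not the project's ranking code: the Signal and Explanation types, the weights, and the matching rules are all hypothetical, chosen only to show how per-signal contributions can be reported alongside a total.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Signal:
    """One hypothetical interest signal in a profile."""
    kind: str      # e.g. "followed_author" or "saved_query"
    value: str     # author name or query term
    weight: float  # illustrative weight, not a project default


@dataclass
class Explanation:
    """A total score plus the per-signal contributions behind it."""
    total: float = 0.0
    contributions: list[tuple[str, float]] = field(default_factory=list)


def score_paper(paper: dict, signals: list[Signal]) -> Explanation:
    """Sum the weights of matching signals and record each match."""
    result = Explanation()
    title = paper["title"].lower()
    authors = {a.lower() for a in paper["authors"]}
    for s in signals:
        if s.kind == "followed_author":
            matched = s.value.lower() in authors
        elif s.kind == "saved_query":
            matched = s.value.lower() in title
        else:
            matched = False  # unknown signal kinds contribute nothing
        if matched:
            result.total += s.weight
            result.contributions.append((f"{s.kind}:{s.value}", s.weight))
    return result


paper = {"title": "Attention Mechanism Variants", "authors": ["A. Researcher"]}
profile = [
    Signal("followed_author", "A. Researcher", 2.0),
    Signal("saved_query", "attention mechanism", 1.5),
    Signal("saved_query", "diffusion", 1.0),
]
explanation = score_paper(paper, profile)
for name, weight in explanation.contributions:
    print(f"{name} contributed {weight}")
print(f"total = {explanation.total}")
```

Keeping the contributions next to the total is what lets a client show why a paper ranked where it did, rather than only the final number.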
Features
MCP Tools (13)
Discovery
- search_papers -- Full-text search with optional profile-ranked results
- browse_recent -- Browse recent papers by arXiv category
- find_related_papers -- Find papers related to one or more seed papers
- get_paper -- Retrieve full metadata for a single paper
Workflow
- triage_paper -- Mark papers as shortlisted, seen, or dismissed
- add_to_collection -- Add papers to named collections (auto-creates)
- create_watch -- Create a saved query that monitors for new papers
Interest & Enrichment
- add_signal -- Add an interest signal (seed paper, followed author, etc.)
- batch_add_signals -- Add multiple interest signals at once
- create_profile -- Create a named interest profile
- suggest_signals -- Get profile expansion suggestions based on usage patterns
- enrich_paper -- Fetch citation counts, FWCI, and topics from OpenAlex
Content
- get_content_variant -- Retrieve paper content (abstract, HTML, or PDF-to-markdown) with rights gating
MCP Resources (4)
- paper://{arxiv_id} -- Paper metadata, triage state, enrichment, and content variants
- collection://{slug} -- Collection contents with pagination
- profile://{slug} -- Interest profile with all signals
- watch://{slug}/deltas -- New papers since last check
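The four URI templates above can be matched with a small parser. This is a sketch of how a client might route such URIs, not the server's implementation: the parse_resource_uri function, the kind labels, and the assumption that slugs contain no slash are all hypothetical.

```python
import re

# Patterns mirror the four documented URI templates. The paper pattern
# accepts any non-empty key so that IDs containing a slash still match.
_PATTERNS = {
    "paper": re.compile(r"^paper://(?P<key>.+)$"),
    "collection": re.compile(r"^collection://(?P<key>[^/]+)$"),
    "profile": re.compile(r"^profile://(?P<key>[^/]+)$"),
    "watch_deltas": re.compile(r"^watch://(?P<key>[^/]+)/deltas$"),
}


def parse_resource_uri(uri: str) -> tuple[str, str]:
    """Return (resource_kind, key), or raise ValueError for unknown URIs."""
    for kind, pattern in _PATTERNS.items():
        match = pattern.match(uri)
        if match:
            return kind, match.group("key")
    raise ValueError(f"unrecognized resource URI: {uri!r}")


print(parse_resource_uri("paper://2301.00001"))
print(parse_resource_uri("watch://reading-list/deltas"))
```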
MCP Prompts (3)
- daily-digest -- Workflow guidance for reviewing new papers across watches
- literature-map-from-seeds -- Workflow for building a literature map from seed papers
- triage-shortlist -- Workflow for reviewing and triaging a collection
CLI
All MCP capabilities are mirrored in a full CLI (arxiv-mcp) for terminal workflows, scripting, and debugging.
Test Coverage
493 tests passing across ingestion, search, workflow, interest modeling, enrichment, content normalization, and MCP integration.
Prerequisites
- Python 3.13+ (uses 3.13 language features)
- PostgreSQL 16+ (must be running and accessible)
- Git (for cloning the repository)
Installation
git clone https://github.com/loganrooks/arxiv-sanity-mcp.git
cd arxiv-sanity-mcp
# Create and activate a virtual environment (recommended)
python3.13 -m venv .venv
source .venv/bin/activate # Linux/macOS
# .venv\Scripts\activate # Windows
pip install -e .
For development (tests, linting):
pip install -e ".[dev]"
# Or manually:
pip install pytest pytest-asyncio pytest-cov pytest-timeout respx ruff
Note: The MCP server configuration below requires the absolute path to your venv's Python interpreter. You can find it with which python after activating the venv.
Database Setup
- Create the database user and databases:
sudo -u postgres psql -c "CREATE USER arxiv_mcp WITH PASSWORD 'arxiv_mcp_dev';"
sudo -u postgres psql -c "CREATE DATABASE arxiv_mcp OWNER arxiv_mcp;"
sudo -u postgres psql -c "CREATE DATABASE arxiv_mcp_test OWNER arxiv_mcp;"
- Create a .env file in the project root (or set environment variables):
DATABASE_URL=postgresql+asyncpg://arxiv_mcp:arxiv_mcp_dev@localhost:5432/arxiv_mcp
- Run database migrations:
alembic upgrade head
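If the migration step fails to connect, a quick check is to confirm that DATABASE_URL parses the way you expect. The sketch below uses only the Python standard library; describe_database_url is a hypothetical helper for illustration, not part of the project, and it deliberately leaves the password out of its result.

```python
from urllib.parse import urlsplit


def describe_database_url(url: str) -> dict:
    """Split a SQLAlchemy-style URL into parts, omitting the password."""
    parts = urlsplit(url)
    if not parts.scheme.startswith("postgresql"):
        raise ValueError(f"expected a postgresql URL, got scheme {parts.scheme!r}")
    return {
        "driver": parts.scheme,
        "user": parts.username,
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/"),
    }


info = describe_database_url(
    "postgresql+asyncpg://arxiv_mcp:arxiv_mcp_dev@localhost:5432/arxiv_mcp"
)
for key, value in info.items():
    print(f"{key}: {value}")
```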
Quick Start
Once installed and the database is set up, try these commands:
# Harvest a paper by arXiv ID
arxiv-mcp harvest fetch 2301.00001
# Search for papers
arxiv-mcp search query "attention mechanism"
# Browse recent papers in a category
arxiv-mcp search browse --category cs.AI
# Create a collection
arxiv-mcp collection create "reading-list"
# Triage a paper
arxiv-mcp triage mark 2301.00001 shortlisted
MCP Server Configuration
Claude Code
Use claude mcp add-json with your venv's absolute Python path:
claude mcp add-json arxiv-discovery --scope local '{
  "command": "/absolute/path/to/arxiv-sanity-mcp/.venv/bin/python",
  "args": ["-m", "arxiv_mcp.mcp"],
  "cwd": "/absolute/path/to/arxiv-sanity-mcp",
  "env": {
    "DATABASE_URL": "postgresql+asyncpg://arxiv_mcp:arxiv_mcp_dev@localhost:5432/arxiv_mcp"
  }
}'
Replace /absolute/path/to/arxiv-sanity-mcp with the actual path to your cloned repository.
Important: Use the absolute path to the venv Python binary (e.g., /home/user/projects/arxiv-sanity-mcp/.venv/bin/python), not just python. This ensures the MCP server uses the correct environment regardless of which directory you launch Claude Code from.
Claude Desktop
Add this to your claude_desktop_config.json:
{
  "mcpServers": {
    "arxiv-discovery": {
      "command": "/absolute/path/to/arxiv-sanity-mcp/.venv/bin/python",
      "args": ["-m", "arxiv_mcp.mcp"],
      "cwd": "/absolute/path/to/arxiv-sanity-mcp",
      "env": {
        "DATABASE_URL": "postgresql+asyncpg://arxiv_mcp:arxiv_mcp_dev@localhost:5432/arxiv_mcp"
      }
    }
  }
}
Replace /absolute/path/to/arxiv-sanity-mcp with the actual path to your cloned repository.
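Because both configurations repeat the same absolute path in two places, it can be less error-prone to generate the entry than to edit it by hand. The following is a sketch under stated assumptions: build_server_entry is a hypothetical helper, the example checkout path is the one used earlier in this README, and the .venv/bin/python layout applies to Linux and macOS only (Windows venvs place the interpreter under Scripts instead).

```python
import json
from pathlib import PurePosixPath


def build_server_entry(repo_root: str, database_url: str) -> dict:
    """Build the arxiv-discovery entry for a given absolute checkout path."""
    root = PurePosixPath(repo_root)
    if not root.is_absolute():
        raise ValueError("repo_root must be an absolute path")
    return {
        "command": str(root / ".venv" / "bin" / "python"),
        "args": ["-m", "arxiv_mcp.mcp"],
        "cwd": str(root),
        "env": {"DATABASE_URL": database_url},
    }


entry = build_server_entry(
    "/home/user/projects/arxiv-sanity-mcp",
    "postgresql+asyncpg://arxiv_mcp:arxiv_mcp_dev@localhost:5432/arxiv_mcp",
)
# Wrap the entry the way claude_desktop_config.json expects it.
print(json.dumps({"mcpServers": {"arxiv-discovery": entry}}, indent=2))
```

The printed JSON can be merged into claude_desktop_config.json; the inner entry alone is the payload passed to claude mcp add-json.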
Configuration
| Variable | Required | Default | Description |
|---|---|---|---|
| DATABASE_URL | Yes | postgresql+asyncpg://arxiv_mcp:arxiv_mcp_dev@localhost:5432/arxiv_mcp | PostgreSQL connection string |
| OPENALEX_EMAIL | No | (empty) | Email for OpenAlex polite pool (recommended; increases rate limit from 1 to 10 req/s) |
| OPENALEX_API_KEY | No | (empty) | OpenAlex API key for enrichment |
| DEPLOYMENT_MODE | No | local | local or hosted -- controls content license enforcement |
Design Documents
Architectural documentation is in the docs/ directory:
| Document | Description |
|---|---|
| 01 - Project Vision | Goals and product values |
| 02 - Product Principles | Design principles and constraints |
| 03 - Design Space | Retrieval and ranking options explored |
| 04 - Reference Designs | Systems studied for design inspiration |
| 05 - Architecture Hypotheses | Architectural bets and rationale |
| 06 - MCP Surface Options | MCP interface design decisions |
| 07 - Data Sources & Content Rights | arXiv data access and licensing |
| 08 - Evaluation & Experiments | Testing methodology |
| 09 - Roadmap | Development phases |
| 10 - Open Questions | Unresolved design questions |
| 11 - Sources | External references |
Architecture Decision Records
- ADR-0001 -- Exploration-first architecture
- ADR-0002 -- Metadata-first, lazy enrichment
- ADR-0003 -- License and provenance first
- ADR-0004 -- MCP as workflow substrate
See docs/adrs/ for full details.
License