arXiv Discovery MCP

MCP server for discovering, triaging, and monitoring arXiv papers with transparent interest modeling and inspectable ranking.

What This Is

arXiv Discovery MCP is a research discovery substrate inspired by arxiv-sanity. It helps researchers and AI agents discover, triage, and monitor arXiv papers through the Model Context Protocol (MCP) -- exposing tools, resources, and prompts that integrate directly into Claude Desktop, Claude Code, or any MCP-compatible client.

Unlike "chat with papers" wrappers, this system provides explicit interest modeling, inspectable ranking explanations, and structured workflow state. You build an interest profile from seed papers, followed authors, and saved queries. The system uses that profile to rank search results, surface new papers, and explain why each result scored the way it did.

The project tracks content provenance and respects reuse constraints per content type. All ranking signals are transparent: you can see exactly which interest signals contributed to each paper's score.
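
To make "inspectable ranking" concrete, here is a minimal sketch of a score that carries its own explanation. The `Signal` shape, the signal kinds, the weights, and the `explain_score` function are all hypothetical illustrations, not the project's actual ranking code, which may work quite differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    """One interest signal in a profile (hypothetical shape)."""
    kind: str      # e.g. "followed_author" or "saved_query"
    value: str     # author name or query term
    weight: float  # illustrative weight, not a project default


def explain_score(paper: dict, signals: list[Signal]) -> dict:
    """Score a paper and record which signals contributed to the score."""
    contributions = []
    for s in signals:
        if s.kind == "followed_author" and s.value in paper["authors"]:
            contributions.append((f"author:{s.value}", s.weight))
        elif s.kind == "saved_query" and s.value.lower() in paper["abstract"].lower():
            contributions.append((f"query:{s.value}", s.weight))
    return {
        "score": sum(w for _, w in contributions),
        "contributions": contributions,
    }


paper = {
    "authors": ["A. Researcher"],
    "abstract": "We study sparse attention for long sequences.",
}
profile = [
    Signal("followed_author", "A. Researcher", 2.0),
    Signal("saved_query", "sparse attention", 1.5),
    Signal("saved_query", "diffusion", 1.0),
]
result = explain_score(paper, profile)
```

Returning the per-signal contributions alongside the total is what lets a caller answer "why did this paper rank here?" without re-running the ranker.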

Features

MCP Tools (13)

Discovery

search_papers -- Full-text search with optional profile-ranked results
browse_recent -- Browse recent papers by arXiv category
find_related_papers -- Find papers related to one or more seed papers
get_paper -- Retrieve full metadata for a single paper

Workflow

triage_paper -- Mark papers as shortlisted, seen, or dismissed
add_to_collection -- Add papers to named collections (auto-creates)
create_watch -- Create a saved query that monitors for new papers

Interest & Enrichment

add_signal -- Add an interest signal (seed paper, followed author, etc.)
batch_add_signals -- Add multiple interest signals at once
create_profile -- Create a named interest profile
suggest_signals -- Get profile expansion suggestions based on usage patterns
enrich_paper -- Fetch citation counts, FWCI, and topics from OpenAlex

Content

get_content_variant -- Retrieve paper content (abstract, HTML, or PDF-to-markdown) with rights gating

MCP Resources (4)

paper://{arxiv_id} -- Paper metadata, triage state, enrichment, and content variants
collection://{slug} -- Collection contents with pagination
profile://{slug} -- Interest profile with all signals
watch://{slug}/deltas -- New papers since last check
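
The resource URIs above follow a simple scheme-plus-identifier shape. As a rough sketch of how a client might split them, the helper below uses only the standard library. The function name `parse_resource_uri` and the handling of old-style arXiv IDs containing a slash are assumptions made for illustration; the server's own URI handling may differ.

```python
from urllib.parse import urlsplit

# Schemes taken from the resource list above.
KNOWN_SCHEMES = {"paper", "collection", "profile", "watch"}


def parse_resource_uri(uri: str) -> tuple[str, str, str]:
    """Split a resource URI into (scheme, identifier, suffix)."""
    parts = urlsplit(uri)
    if parts.scheme not in KNOWN_SCHEMES:
        raise ValueError(f"unknown resource scheme: {parts.scheme!r}")
    if parts.scheme == "watch":
        # watch://{slug}/deltas -> the slug is the netloc, "deltas" the suffix
        return parts.scheme, parts.netloc, parts.path.lstrip("/")
    # For other schemes, keep any path as part of the identifier so that
    # old-style arXiv IDs containing a slash (e.g. hep-th/9901001) survive.
    return parts.scheme, parts.netloc + parts.path, ""
```

For example, `parse_resource_uri("watch://daily-llm/deltas")` yields the scheme `watch`, the slug `daily-llm`, and the suffix `deltas`; the slug here is a made-up name.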

MCP Prompts (3)

daily-digest -- Workflow guidance for reviewing new papers across watches
literature-map-from-seeds -- Workflow for building a literature map from seed papers
triage-shortlist -- Workflow for reviewing and triaging a collection

CLI

All MCP capabilities are mirrored in a full CLI (arxiv-mcp) for terminal workflows, scripting, and debugging.

Test Coverage

493 tests passing across ingestion, search, workflow, interest modeling, enrichment, content normalization, and MCP integration.

Prerequisites

Python 3.13+ (uses 3.13 language features)
PostgreSQL 16+ (must be running and accessible)
Git (for cloning the repository)

Installation

git clone https://github.com/loganrooks/arxiv-sanity-mcp.git
cd arxiv-sanity-mcp

# Create and activate a virtual environment (recommended)
python3.13 -m venv .venv
source .venv/bin/activate  # Linux/macOS
# .venv\Scripts\activate   # Windows

pip install -e .

For development (tests, linting):

pip install -e ".[dev]"
# Or manually:
pip install pytest pytest-asyncio pytest-cov pytest-timeout respx ruff

Note: The MCP server configuration below requires the absolute path to your venv's Python interpreter. To find it, activate the venv and run `which python` (on Windows, `where python`).

Database Setup

Create the database user and databases:

sudo -u postgres psql -c "CREATE USER arxiv_mcp WITH PASSWORD 'arxiv_mcp_dev';"
sudo -u postgres psql -c "CREATE DATABASE arxiv_mcp OWNER arxiv_mcp;"
sudo -u postgres psql -c "CREATE DATABASE arxiv_mcp_test OWNER arxiv_mcp;"

Create a .env file in the project root (or set environment variables):

DATABASE_URL=postgresql+asyncpg://arxiv_mcp:arxiv_mcp_dev@localhost:5432/arxiv_mcp
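
A malformed connection string is an easy mistake to make here, so a quick sanity check before running migrations can help. The sketch below parses the URL with the standard library only. It assumes the project expects the `postgresql+asyncpg` scheme shown in the example; `check_database_url` is a hypothetical helper, not part of the project.

```python
from urllib.parse import urlsplit


def check_database_url(url: str) -> dict:
    """Parse a SQLAlchemy-style URL and report its parts."""
    parts = urlsplit(url)
    dialect, _, driver = parts.scheme.partition("+")
    if dialect != "postgresql":
        raise ValueError(f"expected a postgresql URL, got {dialect!r}")
    if driver != "asyncpg":
        # Assumption: the async driver from the example URL is required.
        raise ValueError("expected the asyncpg driver (postgresql+asyncpg://)")
    return {
        "user": parts.username,
        "host": parts.hostname,
        "port": parts.port or 5432,  # 5432 is PostgreSQL's default port
        "database": parts.path.lstrip("/"),
    }


info = check_database_url(
    "postgresql+asyncpg://arxiv_mcp:arxiv_mcp_dev@localhost:5432/arxiv_mcp"
)
```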

Run database migrations:

alembic upgrade head

Quick Start

Once installed and the database is set up, try these commands:

# Harvest a paper by arXiv ID
arxiv-mcp harvest fetch 2301.00001

# Search for papers
arxiv-mcp search query "attention mechanism"

# Browse recent papers in a category
arxiv-mcp search browse --category cs.AI

# Create a collection
arxiv-mcp collection create "reading-list"

# Triage a paper
arxiv-mcp triage mark 2301.00001 shortlisted
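
For scripted workflows, the same commands can be generated from Python. The sketch below only builds and prints the command lines rather than executing them. It assumes the CLI accepts `seen` and `dismissed` in the same position as `shortlisted` (the three states listed for `triage_paper`); the second arXiv ID and the `triage_commands` helper are made up for illustration.

```python
import shlex

# Triage states listed for the triage_paper tool.
TRIAGE_STATES = {"shortlisted", "seen", "dismissed"}


def triage_commands(decisions: dict[str, str]) -> list[list[str]]:
    """Build one `arxiv-mcp triage mark` argv list per paper."""
    commands = []
    for arxiv_id, state in decisions.items():
        if state not in TRIAGE_STATES:
            raise ValueError(f"unknown triage state: {state!r}")
        commands.append(["arxiv-mcp", "triage", "mark", arxiv_id, state])
    return commands


commands = triage_commands({"2301.00001": "shortlisted", "2301.00002": "dismissed"})
for argv in commands:
    # Print instead of executing; pass argv to subprocess.run(argv, check=True)
    # once the CLI is installed and the database is reachable.
    print(shlex.join(argv))
```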

MCP Server Configuration

Claude Code

Use claude mcp add-json with your venv's absolute Python path:

claude mcp add-json arxiv-discovery --scope local '{
  "command": "/absolute/path/to/arxiv-sanity-mcp/.venv/bin/python",
  "args": ["-m", "arxiv_mcp.mcp"],
  "cwd": "/absolute/path/to/arxiv-sanity-mcp",
  "env": {
    "DATABASE_URL": "postgresql+asyncpg://arxiv_mcp:arxiv_mcp_dev@localhost:5432/arxiv_mcp"
  }
}'

Replace /absolute/path/to/arxiv-sanity-mcp with the actual path to your cloned repository.

Important: Use the absolute path to the venv Python binary (e.g., /home/user/projects/arxiv-sanity-mcp/.venv/bin/python), not just python. This ensures the MCP server uses the correct environment regardless of which directory you launch Claude Code from.

Claude Desktop

Add this to your claude_desktop_config.json:

{
  "mcpServers": {
    "arxiv-discovery": {
      "command": "/absolute/path/to/arxiv-sanity-mcp/.venv/bin/python",
      "args": ["-m", "arxiv_mcp.mcp"],
      "cwd": "/absolute/path/to/arxiv-sanity-mcp",
      "env": {
        "DATABASE_URL": "postgresql+asyncpg://arxiv_mcp:arxiv_mcp_dev@localhost:5432/arxiv_mcp"
      }
    }
  }
}

Replace /absolute/path/to/arxiv-sanity-mcp with the actual path to your cloned repository.
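
If you would rather not hand-edit the paths, the entry can be generated. This is a minimal sketch, assuming the Linux/macOS venv layout (`.venv/bin/python`; on Windows the interpreter lives under `.venv\Scripts`). The helper names `build_server_entry` and `merge_into_config` are hypothetical, and the script prints the JSON instead of writing to `claude_desktop_config.json`, whose location depends on your platform.

```python
import json
from pathlib import Path


def build_server_entry(repo: Path, database_url: str) -> dict:
    """Build the "arxiv-discovery" entry using absolute paths."""
    repo = repo.expanduser().resolve()
    return {
        "command": str(repo / ".venv" / "bin" / "python"),
        "args": ["-m", "arxiv_mcp.mcp"],
        "cwd": str(repo),
        "env": {"DATABASE_URL": database_url},
    }


def merge_into_config(config: dict, entry: dict) -> dict:
    """Return a copy of an existing config with the server entry added."""
    servers = dict(config.get("mcpServers", {}))
    servers["arxiv-discovery"] = entry
    return {**config, "mcpServers": servers}


# A relative path is resolved against the current working directory.
entry = build_server_entry(
    Path("arxiv-sanity-mcp"),
    "postgresql+asyncpg://arxiv_mcp:arxiv_mcp_dev@localhost:5432/arxiv_mcp",
)
print(json.dumps(merge_into_config({}, entry), indent=2))
```

Merging into a copy of the existing config, rather than overwriting it, keeps any other servers you have already registered.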

Configuration

Variable	Required	Default	Description
`DATABASE_URL`	Yes	`postgresql+asyncpg://arxiv_mcp:arxiv_mcp_dev@localhost:5432/arxiv_mcp`	PostgreSQL connection string
`OPENALEX_EMAIL`	No	(empty)	Email for OpenAlex polite pool (recommended; increases rate limit from 1 to 10 req/s)
`OPENALEX_API_KEY`	No	(empty)	OpenAlex API key for enrichment
`DEPLOYMENT_MODE`	No	`local`	`local` or `hosted` -- controls content license enforcement
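
As an illustration of how these variables and defaults fit together, the sketch below reads them from a mapping such as `os.environ`. The `Settings` class and `load_settings` function are hypothetical and do not reflect the project's actual configuration code; only the variable names, defaults, and allowed `DEPLOYMENT_MODE` values come from the table.

```python
from dataclasses import dataclass
from typing import Mapping

DEFAULT_DATABASE_URL = (
    "postgresql+asyncpg://arxiv_mcp:arxiv_mcp_dev@localhost:5432/arxiv_mcp"
)


@dataclass(frozen=True)
class Settings:
    database_url: str
    openalex_email: str
    openalex_api_key: str
    deployment_mode: str


def load_settings(env: Mapping[str, str]) -> Settings:
    """Read the variables from the table, applying its defaults."""
    mode = env.get("DEPLOYMENT_MODE", "local")
    if mode not in ("local", "hosted"):
        raise ValueError(f"DEPLOYMENT_MODE must be 'local' or 'hosted', got {mode!r}")
    return Settings(
        database_url=env.get("DATABASE_URL", DEFAULT_DATABASE_URL),
        openalex_email=env.get("OPENALEX_EMAIL", ""),
        openalex_api_key=env.get("OPENALEX_API_KEY", ""),
        deployment_mode=mode,
    )


# The email address is a placeholder.
settings = load_settings({"OPENALEX_EMAIL": "you@example.org"})
```

In a real script you would call `load_settings(os.environ)` after importing `os`.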

Design Documents

Architectural documentation is in the docs/ directory:

Document	Description
01 - Project Vision	Goals and product values
02 - Product Principles	Design principles and constraints
03 - Design Space	Retrieval and ranking options explored
04 - Reference Designs	Systems studied for design inspiration
05 - Architecture Hypotheses	Architectural bets and rationale
06 - MCP Surface Options	MCP interface design decisions
07 - Data Sources & Content Rights	arXiv data access and licensing
08 - Evaluation & Experiments	Testing methodology
09 - Roadmap	Development phases
10 - Open Questions	Unresolved design questions
11 - Sources	External references

Architecture Decision Records

ADR-0001 -- Exploration-first architecture
ADR-0002 -- Metadata-first, lazy enrichment
ADR-0003 -- License and provenance first
ADR-0004 -- MCP as workflow substrate

See docs/adrs/ for full details.

License

MIT
