arXiv MCP Server

Enables searching, downloading, and managing academic papers from arXiv.org through natural language interactions. Provides tools for paper discovery, PDF downloads, and local paper collection management.


arXiv CLI & MCP Server

A Python toolkit for searching and downloading papers from arXiv.org, with both a command-line interface and a Model Context Protocol (MCP) server for LLM integration.

Command-line agents work best with well-documented CLI tools, MCP servers, or both. This project provides both.

Features

  • Search arXiv papers by title, author, abstract, category, and more
  • Download PDFs automatically with local caching
  • MCP Server for integration with LLM assistants (Claude Desktop, etc.)
  • Typed responses using Pydantic models for clean data handling
  • Rate limiting built-in to respect arXiv API guidelines
  • Tested with 26 integration tests that call the real arXiv API (no mocking)

Installation

Option 1: Install from GitHub (Recommended)

Install directly from the GitHub repository:

# Install the latest version
uv pip install git+https://github.com/LiamConnell/arxiv_for_agents.git

# Or with pip
pip install git+https://github.com/LiamConnell/arxiv_for_agents.git

# Now you can use the arxiv command
arxiv --help

Option 2: Install from Source

Clone the repository and install locally:

# Clone the repository
git clone https://github.com/LiamConnell/arxiv_for_agents.git
cd arxiv_for_agents

# Install in editable mode
uv pip install -e .

# Now you can use the arxiv command
arxiv --help

Option 3: Development Installation

For development with all dependencies:

# Clone and install with dev dependencies
git clone https://github.com/LiamConnell/arxiv_for_agents.git
cd arxiv_for_agents
uv pip install -e ".[dev]"

# Run tests
uv run pytest

Verify Installation

# If installed as package
arxiv --help

# Or if using as module
uv run python -m arxiv --help

Usage

Note: If you installed as a package, use arxiv directly. Otherwise, use uv run python -m arxiv.

Search Papers

Search by title:

# Using installed package
arxiv search "ti:attention is all you need"

# Or using as module
uv run python -m arxiv search "ti:attention is all you need"

Search by author:

arxiv search "au:Hinton" --max-results 20

Search by category:

arxiv search "cat:cs.AI" --max-results 10

Combined search:

arxiv search "ti:transformer AND au:Vaswani"

Get Specific Paper

Get paper metadata and download PDF:

arxiv get 1706.03762

Get metadata only (no download):

arxiv get 1706.03762 --no-download

Force re-download:

arxiv get 1706.03762 --force

Download PDF

Download just the PDF:

arxiv download 1706.03762

List Downloaded PDFs

arxiv list-downloads

JSON Output

Get results as JSON for scripting:

arxiv search "ti:neural" --json
arxiv get 1706.03762 --json --no-download
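
For scripts written in Python rather than shell, the same output can be parsed with the standard json module. The sketch below assumes the search payload has a top-level entries list whose items carry a title field (the shape implied by the jq filter in the CLI examples later in this README). The sample payload is hand-written for illustration, not real CLI output, and titles_from_search_json is a hypothetical helper name.

```python
import json

# Hand-written stand-in for the output of: arxiv search "ti:neural" --json
# (assumed shape: {"entries": [{"title": ...}, ...]})
sample_output = '{"entries": [{"title": "First paper"}, {"title": "Second paper"}]}'


def titles_from_search_json(raw: str) -> list[str]:
    """Return the title of every entry in a search result payload."""
    payload = json.loads(raw)
    # Tolerate a payload with no "entries" key by treating it as empty.
    return [entry["title"] for entry in payload.get("entries", [])]


titles = titles_from_search_json(sample_output)
print(titles)
```

In a real script the string would come from the CLI itself, for example via subprocess.run(["arxiv", "search", "ti:neural", "--json"], capture_output=True, text=True).stdout.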

Search Query Syntax

The arXiv API supports field-specific searches:

  • ti: - Title
  • au: - Author
  • abs: - Abstract
  • cat: - Category (e.g., cs.AI, cs.LG)
  • all: - All fields (default)

You can combine searches with AND, OR, and ANDNOT:

arxiv search "ti:neural AND cat:cs.LG"
arxiv search "au:Hinton OR au:Bengio"

Download Directory

PDFs are downloaded to ./.arxiv by default. Change this with:

arxiv --download-dir ./papers search "ti:transformer"

MCP Server (Model Context Protocol)

The arXiv CLI includes a Model Context Protocol (MCP) server that allows LLM assistants (like Claude Desktop) to search and download arXiv papers programmatically.

Running the MCP Server

# Option 1: Using the script entry point (recommended)
uv run arxiv-mcp

# Option 2: Using the module
uv run python -m arxiv.mcp

The server runs in stdio mode and communicates via JSON-RPC over stdin/stdout.
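
Because the server speaks JSON-RPC over stdin/stdout, a client invokes a tool by writing a request object to the server's stdin. The sketch below builds such a request for search_papers using the MCP tools/call method. The argument names query and max_results are assumptions (this README does not list the tool's parameter names), and a real session must complete the MCP initialize handshake before any tool call. It only shows the message shape.

```python
import json


def tool_call_request(request_id: int, tool: str, arguments: dict) -> dict:
    """Build a JSON-RPC 2.0 request that asks an MCP server to run a tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }


# "query" and "max_results" are assumed parameter names for search_papers.
request = tool_call_request(
    1, "search_papers", {"query": "ti:transformer", "max_results": 5}
)

# The stdio transport sends one JSON message per line, so serialize without newlines.
line = json.dumps(request)
print(line)
```

In practice an MCP client such as Claude Desktop produces these messages; writing them by hand is mainly useful for debugging.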

MCP Tools

The server provides 4 tools for paper discovery and management:

  1. search_papers - Search arXiv with advanced query syntax

    • Supports field prefixes (ti:, au:, abs:, cat:)
    • Boolean operators (AND, OR, ANDNOT)
    • Pagination and sorting options
    • Returns paper metadata including title, authors, abstract, categories
  2. get_paper - Get detailed information about a specific paper

    • Accepts flexible ID formats (1706.03762, arXiv:1706.03762, 1706.03762v1)
    • Optionally downloads PDF automatically
    • Returns complete metadata including DOI, journal references, comments
  3. download_paper - Download PDF for a specific paper

    • Downloads to local .arxiv directory
    • Returns file path and size information
    • Supports force re-download option
  4. list_downloaded_papers - List all locally downloaded PDFs

    • Shows arxiv IDs, file sizes, and paths
    • Useful for managing local paper collection
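
The flexible ID handling described for get_paper can be sketched as a small normalizer. This is an illustration of the idea, not the project's actual code: normalize_arxiv_id is a hypothetical name, the sketch assumes the version suffix is dropped, and it covers only new-style IDs of the form YYMM.NNNNN (old-style IDs such as hep-th/9901001 are not handled).

```python
import re

# Optional "arXiv:" prefix, a new-style ID, and an optional version suffix.
_NEW_STYLE_ID = re.compile(r"^(?:arxiv:)?(\d{4}\.\d{4,5})(?:v\d+)?$", re.IGNORECASE)


def normalize_arxiv_id(raw: str) -> str:
    """Reduce '1706.03762', 'arXiv:1706.03762', or '1706.03762v1' to '1706.03762'."""
    match = _NEW_STYLE_ID.match(raw.strip())
    if match is None:
        raise ValueError(f"not a recognized new-style arXiv ID: {raw!r}")
    return match.group(1)


for candidate in ("1706.03762", "arXiv:1706.03762", "1706.03762v1"):
    print(candidate, "->", normalize_arxiv_id(candidate))
```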

MCP Resources

The server exposes 2 resources for direct access:

  • paper://{arxiv_id} - Get formatted paper metadata in markdown
  • downloads://list - Get markdown table of all downloaded papers

MCP Prompts

Pre-built prompt templates to guide usage:

  • search_arxiv_prompt - Guide for searching arXiv papers
  • download_paper_prompt - Guide for downloading and managing papers

Claude Desktop Configuration

Add to your Claude Desktop config file (~/Library/Application Support/Claude/claude_desktop_config.json on macOS):

If installed from GitHub/pip:

{
  "mcpServers": {
    "arxiv": {
      "command": "arxiv-mcp"
    }
  }
}

If running from source/development:

{
  "mcpServers": {
    "arxiv": {
      "command": "uv",
      "args": ["run", "arxiv-mcp"],
      "cwd": "/path/to/arxiv_for_agents"
    }
  }
}

Or use --directory to avoid needing cwd:

{
  "mcpServers": {
    "arxiv": {
      "command": "uv",
      "args": ["--directory", "/path/to/arxiv_for_agents", "run", "arxiv-mcp"]
    }
  }
}

MCP Use Cases

Once configured, you can ask Claude to:

  • "Search arXiv for recent papers on transformer architectures"
  • "Find papers by Geoffrey Hinton in the cs.AI category"
  • "Download the 'Attention is All You Need' paper"
  • "Show me papers about neural networks from 2023"
  • "List all the papers I've downloaded"
  • "Get the abstract for arXiv:1706.03762"

The MCP integration allows Claude to autonomously search, retrieve, and manage academic papers from arXiv.

Architecture

Module Structure

arxiv/
├── __init__.py       # Package exports
├── __main__.py       # CLI entry point
├── cli.py            # Click commands
├── models.py         # Pydantic models
├── services.py       # API client service
└── mcp/              # MCP server
    ├── __init__.py   # MCP package exports
    ├── __main__.py   # MCP server entry point
    └── server.py     # FastMCP server with tools, resources, prompts

tests/
└── test_services.py  # Integration tests (26 tests)

Pydantic Models

All API responses are typed using Pydantic:

from arxiv import ArxivService

service = ArxivService()
result = service.search("ti:neural", max_results=5)

# result is typed as ArxivSearchResult
print(f"Total: {result.total_results}")

for entry in result.entries:
    # entry is typed as ArxivEntry
    print(f"{entry.arxiv_id}: {entry.title}")
    print(f"Authors: {', '.join(a.name for a in entry.authors)}")

Key Models

  • ArxivSearchResult: Search results with metadata

    • total_results: Total matching papers
    • entries: List of ArxivEntry objects
  • ArxivEntry: Individual paper

    • arxiv_id: Clean ID (e.g., "1706.03762")
    • title, summary: Paper metadata
    • authors: List of Author objects
    • categories: Subject categories
    • pdf_url: Direct PDF link
    • published, updated: Datetime objects
  • Author: Paper author

    • name: Author name
    • affiliation: Optional affiliation
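
To make the shape of these models concrete, here is a rough stand-in written with standard-library dataclasses. The project itself uses Pydantic, so this is not its code: the field names come from the list above, while the exact field types and the sample values are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class Author:
    name: str
    affiliation: Optional[str] = None  # optional, per the list above


@dataclass
class ArxivEntry:
    arxiv_id: str
    title: str
    summary: str
    authors: list[Author]
    categories: list[str]  # assumed to be a list of category strings
    pdf_url: str
    published: datetime
    updated: datetime


@dataclass
class ArxivSearchResult:
    total_results: int
    entries: list[ArxivEntry] = field(default_factory=list)


# Illustrative values only; the summary and dates are placeholders.
entry = ArxivEntry(
    arxiv_id="1706.03762",
    title="Attention Is All You Need",
    summary="(abstract text)",
    authors=[Author(name="Ashish Vaswani")],
    categories=["cs.CL"],
    pdf_url="https://arxiv.org/pdf/1706.03762",
    published=datetime(2017, 6, 12),
    updated=datetime(2017, 6, 12),
)
result = ArxivSearchResult(total_results=1, entries=[entry])
print(result.entries[0].authors[0].name)
```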

Testing

Run all 26 integration tests (these make real API calls):

uv run pytest tests/test_services.py -v

Run specific test class:

uv run pytest tests/test_services.py::TestArxivServiceSearch -v

The tests are integration tests that hit the real arXiv API, ensuring the service works with actual data.

API Rate Limiting

The service enforces a 3-second delay between API requests by default (arXiv's recommendation). You can adjust this:

from arxiv import ArxivService

service = ArxivService(rate_limit_delay=5.0)  # 5 seconds
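
A fixed delay between requests can be enforced with a small helper that remembers when the last request went out. The sketch below is one way to do it under stated assumptions, not the service's actual implementation: MinIntervalLimiter is a hypothetical class, and the clock and sleep parameters exist only so the behavior can be tested without waiting.

```python
import time


class MinIntervalLimiter:
    """Ensure at least `delay` seconds pass between consecutive wait() calls."""

    def __init__(self, delay: float = 3.0, clock=time.monotonic, sleep=time.sleep):
        self.delay = delay
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, None before the first

    def wait(self) -> float:
        """Sleep for whatever remains of the interval; return the seconds slept."""
        slept = 0.0
        if self._last is not None:
            remaining = self.delay - (self._clock() - self._last)
            if remaining > 0:
                self._sleep(remaining)
                slept = remaining
        self._last = self._clock()
        return slept


# Short delay so the demo finishes quickly; the README's default is 3 seconds.
limiter = MinIntervalLimiter(delay=0.05)
limiter.wait()          # first call returns immediately
slept = limiter.wait()  # second call sleeps for the rest of the interval
print(f"slept {slept:.3f}s")
```

Calling wait() immediately before each HTTP request is enough to space requests out; time.monotonic is used so that system clock adjustments do not shorten the delay.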

Examples

Python API

from arxiv import ArxivService

# Initialize service
service = ArxivService(download_dir="./papers")

# Search
results = service.search(
    query="ti:attention is all you need",
    max_results=5,
    sort_by="relevance"
)

print(f"Found {results.total_results} papers")
for entry in results.entries:
    print(f"- {entry.title}")

# Get specific paper
entry = service.get("1706.03762", download_pdf=True)
print(f"Downloaded: {entry.title}")

# Just download PDF
pdf_path = service.download_pdf("1706.03762")
print(f"PDF saved to: {pdf_path}")

CLI Examples

# Find recent papers in a category
arxiv search "cat:cs.AI" \
  --max-results 10 \
  --sort-by submittedDate \
  --sort-order descending

# Search and output as JSON for processing
arxiv search "ti:transformer" --json | jq '.entries[].title'

# Batch download multiple papers
for id in 1706.03762 1810.04805 2010.11929; do
  arxiv download "$id"
done

Development

The codebase follows these principles:

  1. Type safety: Pydantic models for all API responses
  2. Clean architecture: Separation of CLI, service, and models
  3. Real tests: Integration tests with actual API calls (no mocks)
  4. Rate limiting: Respects arXiv API guidelines
  5. Caching: Automatic local caching to avoid re-downloads
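
The caching principle, together with the --force flag shown earlier, amounts to checking for the file on disk before fetching it. The sketch below shows that logic in isolation. It is an assumption-laden illustration, not the project's code: cached_download is a hypothetical function, the <arxiv_id>.pdf file naming is assumed, and fetch is an injected callable standing in for the real HTTP download.

```python
import tempfile
from pathlib import Path
from typing import Callable


def cached_download(
    arxiv_id: str,
    download_dir: str,
    fetch: Callable[[str], bytes],
    force: bool = False,
) -> Path:
    """Return the local PDF path, calling fetch() only when needed."""
    target = Path(download_dir) / f"{arxiv_id}.pdf"  # assumed naming scheme
    if target.exists() and not force:
        return target  # cache hit: skip the network entirely
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(fetch(arxiv_id))
    return target


calls = []


def fake_fetch(arxiv_id: str) -> bytes:
    """Stand-in for an HTTP download; records each call."""
    calls.append(arxiv_id)
    return b"%PDF-1.4 placeholder"


with tempfile.TemporaryDirectory() as tmp:
    cached_download("1706.03762", tmp, fake_fetch)              # fetched
    cached_download("1706.03762", tmp, fake_fetch)              # served from disk
    cached_download("1706.03762", tmp, fake_fetch, force=True)  # fetched again

print(len(calls))  # 2
```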

arXiv API Reference

  • Base URL: https://export.arxiv.org/api/query
  • Format: Atom XML
  • Rate limit: 3 seconds between requests (recommended)
  • Documentation: https://info.arxiv.org/help/api/user-manual.html
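
For readers who want to see what the service wraps, the following sketch builds a query URL for the endpoint above and parses an Atom response with the standard library. It makes no network call: SAMPLE_FEED is a hand-written, heavily trimmed stand-in for a real response, and build_query_url and parse_feed are hypothetical helper names, not functions from this project. The search_query, start, and max_results parameters are those documented in the arXiv API user manual.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

BASE_URL = "https://export.arxiv.org/api/query"
NAMESPACES = {
    "atom": "http://www.w3.org/2005/Atom",
    "opensearch": "http://a9.com/-/spec/opensearch/1.1/",
}

# Hand-written sample; real responses carry many more elements per entry.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">
  <opensearch:totalResults>1</opensearch:totalResults>
  <entry>
    <id>http://arxiv.org/abs/1706.03762v1</id>
    <title>Attention Is All
  You Need</title>
  </entry>
</feed>"""


def build_query_url(search_query: str, start: int = 0, max_results: int = 10) -> str:
    """Return the full request URL with URL-encoded query parameters."""
    params = {"search_query": search_query, "start": start, "max_results": max_results}
    return f"{BASE_URL}?{urlencode(params)}"


def parse_feed(xml_text: str) -> tuple[int, list[str]]:
    """Extract the total result count and the entry titles from an Atom feed."""
    root = ET.fromstring(xml_text)
    total = int(root.findtext("opensearch:totalResults", default="0", namespaces=NAMESPACES))
    titles = [
        # Titles may wrap across lines in the feed, so collapse the whitespace.
        " ".join(entry.findtext("atom:title", default="", namespaces=NAMESPACES).split())
        for entry in root.findall("atom:entry", NAMESPACES)
    ]
    return total, titles


print(build_query_url("ti:neural AND cat:cs.LG", max_results=5))
print(parse_feed(SAMPLE_FEED))
```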

License

This is a personal project for interacting with arXiv's public API.
