arXiv MCP Server
Enables searching, downloading, and managing academic papers from arXiv.org through natural language interactions. Provides tools for paper discovery, PDF downloads, and local paper collection management.
arXiv CLI & MCP Server
A Python toolkit for searching and downloading papers from arXiv.org, with both a command-line interface and a Model Context Protocol (MCP) server for LLM integration.
CLI agents work well with well-documented CLI tools and with MCP servers, so this project provides both.
Features
- Search arXiv papers by title, author, abstract, category, and more
- Download PDFs automatically with local caching
- MCP Server for integration with LLM assistants (Claude Desktop, etc.)
- Typed responses using Pydantic models for clean data handling
- Rate limiting built-in to respect arXiv API guidelines
- Comprehensive tests with 26 integration tests (no mocking)
Installation
Option 1: Install from GitHub (Recommended)
Install directly from the GitHub repository:
# Install the latest version
uv pip install git+https://github.com/LiamConnell/arxiv_for_agents.git
# Or with pip
pip install git+https://github.com/LiamConnell/arxiv_for_agents.git
# Now you can use the arxiv command
arxiv --help
Option 2: Install from Source
Clone the repository and install locally:
# Clone the repository
git clone https://github.com/LiamConnell/arxiv_for_agents.git
cd arxiv_for_agents
# Install in editable mode
uv pip install -e .
# Now you can use the arxiv command
arxiv --help
Option 3: Development Installation
For development with all dependencies:
# Clone and install with dev dependencies
git clone https://github.com/LiamConnell/arxiv_for_agents.git
cd arxiv_for_agents
uv pip install -e ".[dev]"
# Run tests
uv run pytest
Verify Installation
# If installed as package
arxiv --help
# Or if using as module
uv run python -m arxiv --help
Usage
Note: If you installed as a package, use arxiv directly. Otherwise, use uv run python -m arxiv.
Search Papers
Search by title:
# Using installed package
arxiv search "ti:attention is all you need"
# Or using as module
uv run python -m arxiv search "ti:attention is all you need"
Search by author:
arxiv search "au:Hinton" --max-results 20
Search by category:
arxiv search "cat:cs.AI" --max-results 10
Combined search:
arxiv search "ti:transformer AND au:Vaswani"
Get Specific Paper
Get paper metadata and download PDF:
arxiv get 1706.03762
Get metadata only (no download):
arxiv get 1706.03762 --no-download
Force re-download:
arxiv get 1706.03762 --force
Download PDF
Download just the PDF:
arxiv download 1706.03762
List Downloaded PDFs
arxiv list-downloads
JSON Output
Get results as JSON for scripting:
arxiv search "ti:neural" --json
arxiv get 1706.03762 --json --no-download
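When a script consumes the JSON output, the parsing step can be sketched as follows. This is an illustration under assumptions: `titles_from_json` is a hypothetical helper, the `entries` and `title` keys are inferred from the models and the jq example elsewhere in this README, and the sample payload is hand-written rather than captured from the CLI.

```python
import json


def titles_from_json(payload: str) -> list[str]:
    """Extract paper titles from the text printed by `arxiv search --json`."""
    data = json.loads(payload)
    # Assumed layout: a top-level "entries" list whose items carry a "title".
    return [entry["title"] for entry in data.get("entries", [])]


# Hand-written sample standing in for real CLI output.
sample = '{"total_results": 1, "entries": [{"arxiv_id": "1706.03762", "title": "Attention Is All You Need"}]}'
print(titles_from_json(sample))
```

In practice the payload would come from the command's standard output, for example by piping it into the script.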
Search Query Syntax
The arXiv API supports field-specific searches:
- ti: - Title
- au: - Author
- abs: - Abstract
- cat: - Category (e.g., cs.AI, cs.LG)
- all: - All fields (default)
You can combine searches with AND, OR, and ANDNOT:
arxiv search "ti:neural AND cat:cs.LG"
arxiv search "au:Hinton OR au:Bengio"
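The prefixes and operators above compose into plain strings, so queries can also be assembled in code. The sketch below is only an illustration: `field_term` and `combine` are hypothetical helpers, not part of this project.

```python
# Hypothetical helpers for composing arXiv query strings; not part of this project.
FIELDS = {"ti", "au", "abs", "cat", "all"}
OPERATORS = {"AND", "OR", "ANDNOT"}


def field_term(field: str, value: str) -> str:
    """Return a single field-prefixed term such as 'ti:neural'."""
    if field not in FIELDS:
        raise ValueError(f"unknown field prefix: {field!r}")
    return f"{field}:{value}"


def combine(operator: str, *terms: str) -> str:
    """Join terms with one boolean operator, e.g. 'a AND b'."""
    if operator not in OPERATORS:
        raise ValueError(f"unknown operator: {operator!r}")
    return f" {operator} ".join(terms)


query = combine("AND", field_term("ti", "neural"), field_term("cat", "cs.LG"))
print(query)  # ti:neural AND cat:cs.LG
```

The resulting string can be passed to arxiv search exactly like the hand-written queries above.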
Download Directory
PDFs are downloaded to ./.arxiv by default. Change this with:
arxiv --download-dir ./papers search "ti:transformer"
MCP Server (Model Context Protocol)
The arXiv CLI includes a Model Context Protocol (MCP) server that allows LLM assistants (like Claude Desktop) to search and download arXiv papers programmatically.
Running the MCP Server
# Option 1: Using the script entry point (recommended)
uv run arxiv-mcp
# Option 2: Using the module
uv run python -m arxiv.mcp
The server runs in stdio mode and communicates via JSON-RPC over stdin/stdout.
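For readers unfamiliar with stdio transports, the sketch below shows the general shape of a JSON-RPC 2.0 request that an MCP client might write to the server's stdin to call a tool. The `tools/call` method and the `name`/`arguments` parameters come from the MCP specification; the argument names `query` and `max_results` are assumptions about this server's search_papers tool, and the initialization handshake that a real client performs first is omitted.

```python
import json


def make_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize one JSON-RPC 2.0 'tools/call' request as a single line."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    # The stdio transport sends one JSON message per line, with no embedded newlines.
    return json.dumps(message)


# Argument names below are assumed, not taken from the server's schema.
line = make_tool_call(1, "search_papers", {"query": "cat:cs.AI", "max_results": 5})
print(line)
```

MCP clients such as Claude Desktop build and send these messages themselves, so this is useful mainly for debugging or for understanding what travels over the pipe.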
MCP Tools
The server provides 4 tools for paper discovery and management:
- search_papers - Search arXiv with advanced query syntax
  - Supports field prefixes (ti:, au:, abs:, cat:)
  - Boolean operators (AND, OR, ANDNOT)
  - Pagination and sorting options
  - Returns paper metadata including title, authors, abstract, categories
- get_paper - Get detailed information about a specific paper
  - Accepts flexible ID formats (1706.03762, arXiv:1706.03762, 1706.03762v1)
  - Optionally downloads PDF automatically
  - Returns complete metadata including DOI, journal references, comments
- download_paper - Download PDF for a specific paper
  - Downloads to local .arxiv directory
  - Returns file path and size information
  - Supports force re-download option
- list_downloaded_papers - List all locally downloaded PDFs
  - Shows arxiv IDs, file sizes, and paths
  - Useful for managing local paper collection
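The flexible ID handling described for get_paper can be pictured as a small normalization step. This is a sketch under assumptions, not the server's actual code: `normalize_arxiv_id` is a hypothetical name, it covers only new-style IDs of the kinds listed above, and dropping the version suffix is an assumption about what a "clean" ID means.

```python
import re

# New-style arXiv IDs: YYMM.NNNN or YYMM.NNNNN, with an optional version suffix.
_ID_PATTERN = re.compile(r"^(?:arxiv:)?(\d{4}\.\d{4,5})(v\d+)?$", re.IGNORECASE)


def normalize_arxiv_id(raw: str) -> str:
    """Strip an 'arXiv:' prefix and a version suffix, returning the bare ID."""
    match = _ID_PATTERN.match(raw.strip())
    if match is None:
        raise ValueError(f"not a recognized arXiv ID: {raw!r}")
    return match.group(1)


for raw in ("1706.03762", "arXiv:1706.03762", "1706.03762v1"):
    print(normalize_arxiv_id(raw))  # each prints 1706.03762
```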
MCP Resources
The server exposes 2 resources for direct access:
- paper://{arxiv_id} - Get formatted paper metadata in markdown
- downloads://list - Get markdown table of all downloaded papers
MCP Prompts
Pre-built prompt templates to guide usage:
- search_arxiv_prompt - Guide for searching arXiv papers
- download_paper_prompt - Guide for downloading and managing papers
Claude Desktop Configuration
Add to your Claude Desktop config file (~/Library/Application Support/Claude/claude_desktop_config.json on macOS):
If installed from GitHub/pip:
{
  "mcpServers": {
    "arxiv": {
      "command": "arxiv-mcp"
    }
  }
}
If running from source/development:
{
  "mcpServers": {
    "arxiv": {
      "command": "uv",
      "args": ["run", "arxiv-mcp"],
      "cwd": "/path/to/arxiv_for_agents"
    }
  }
}
Or use --directory to avoid needing cwd:
{
  "mcpServers": {
    "arxiv": {
      "command": "uv",
      "args": ["--directory", "/path/to/arxiv_for_agents", "run", "arxiv-mcp"]
    }
  }
}
MCP Use Cases
Once configured, you can ask Claude to:
- "Search arXiv for recent papers on transformer architectures"
- "Find papers by Geoffrey Hinton in the cs.AI category"
- "Download the 'Attention is All You Need' paper"
- "Show me papers about neural networks from 2023"
- "List all the papers I've downloaded"
- "Get the abstract for arXiv:1706.03762"
The MCP integration allows Claude to autonomously search, retrieve, and manage academic papers from arXiv.
Architecture
Module Structure
arxiv/
├── __init__.py          # Package exports
├── __main__.py          # CLI entry point
├── cli.py               # Click commands
├── models.py            # Pydantic models
├── services.py          # API client service
└── mcp/                 # MCP server
    ├── __init__.py      # MCP package exports
    ├── __main__.py      # MCP server entry point
    └── server.py        # FastMCP server with tools, resources, prompts
tests/
└── test_services.py     # Integration tests (26 tests)
Pydantic Models
All API responses are typed using Pydantic:
from arxiv import ArxivService
service = ArxivService()
result = service.search("ti:neural", max_results=5)
# result is typed as ArxivSearchResult
print(f"Total: {result.total_results}")
for entry in result.entries:
    # entry is typed as ArxivEntry
    print(f"{entry.arxiv_id}: {entry.title}")
    print(f"Authors: {', '.join(a.name for a in entry.authors)}")
Key Models
- ArxivSearchResult: Search results with metadata
  - total_results: Total matching papers
  - entries: List of ArxivEntry objects
- ArxivEntry: Individual paper
  - arxiv_id: Clean ID (e.g., "1706.03762")
  - title, summary: Paper metadata
  - authors: List of Author objects
  - categories: Subject categories
  - pdf_url: Direct PDF link
  - published, updated: Datetime objects
- Author: Paper author
  - name: Author name
  - affiliation: Optional affiliation
Testing
Run all 26 integration tests (makes real API calls):
uv run pytest tests/test_services.py -v
Run specific test class:
uv run pytest tests/test_services.py::TestArxivServiceSearch -v
These tests call the live arXiv API rather than mocks, so they require network access and verify the service against real data.
API Rate Limiting
The service enforces a 3-second delay between API requests by default (arXiv's recommendation). You can adjust this:
from arxiv import ArxivService
service = ArxivService(rate_limit_delay=5.0) # 5 seconds
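One way to picture the delay is a small limiter that sleeps until the minimum interval since the previous request has elapsed. The class below is a hypothetical sketch, not ArxivService's implementation: `RateLimiter` and its `wait` method are names invented for illustration, and the clock and sleep functions are injectable only so the behavior can be checked without real waiting.

```python
import time
from typing import Callable, Optional


class RateLimiter:
    """Enforce a minimum delay between consecutive calls to wait()."""

    def __init__(
        self,
        delay: float = 3.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.delay = delay
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None  # time of the previous request

    def wait(self) -> float:
        """Block until a request is allowed; return the seconds slept."""
        now = self._clock()
        slept = 0.0
        if self._last is not None:
            remaining = self.delay - (now - self._last)
            if remaining > 0:
                slept = remaining
                self._sleep(remaining)
        self._last = self._clock()
        return slept


limiter = RateLimiter(delay=0.05)  # short delay so the demo runs quickly
limiter.wait()  # the first call never sleeps
print(f"slept {limiter.wait():.2f}s before the second request")
```

Calling `wait()` immediately before each HTTP request is enough to keep consecutive requests at least `delay` seconds apart.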
Examples
Python API
from arxiv import ArxivService
# Initialize service
service = ArxivService(download_dir="./papers")
# Search
results = service.search(
    query="ti:attention is all you need",
    max_results=5,
    sort_by="relevance"
)
print(f"Found {results.total_results} papers")
for entry in results.entries:
    print(f"- {entry.title}")
# Get specific paper
entry = service.get("1706.03762", download_pdf=True)
print(f"Downloaded: {entry.title}")
# Just download PDF
pdf_path = service.download_pdf("1706.03762")
print(f"PDF saved to: {pdf_path}")
CLI Examples
# Find recent papers in a category
arxiv search "cat:cs.AI" \
  --max-results 10 \
  --sort-by submittedDate \
  --sort-order descending
# Search and output as JSON for processing
arxiv search "ti:transformer" --json | jq '.entries[].title'
# Batch download multiple papers
for id in 1706.03762 1810.04805 2010.11929; do
  arxiv download "$id"
done
Development
The codebase follows these principles:
- Type safety: Pydantic models for all API responses
- Clean architecture: Separation of CLI, service, and models
- Real tests: Integration tests with actual API calls (no mocks)
- Rate limiting: Respects arXiv API guidelines
- Caching: Automatic local caching to avoid re-downloads
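The caching principle amounts to checking for an existing file before fetching. The function below is a minimal sketch of that idea, not the project's code: `cached_download`, the `fetch` callback, and the `<id>.pdf` file naming are all assumptions made for illustration.

```python
from pathlib import Path
from typing import Callable


def cached_download(
    arxiv_id: str,
    download_dir: Path,
    fetch: Callable[[str], bytes],
    force: bool = False,
) -> Path:
    """Return the local PDF path, calling fetch() only when needed."""
    download_dir.mkdir(parents=True, exist_ok=True)
    target = download_dir / f"{arxiv_id}.pdf"  # assumed naming scheme
    if target.exists() and not force:
        return target  # cache hit: skip the network entirely
    target.write_bytes(fetch(arxiv_id))
    return target
```

Passing `force=True` corresponds to the CLI's --force flag: the cache check is bypassed and the file is fetched again.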
arXiv API Reference
- Base URL: https://export.arxiv.org/api/query
- Format: Atom XML
- Rate limit: 3 seconds between requests (recommended)
- Documentation: https://info.arxiv.org/help/api/user-manual.html
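To make the reference concrete, a query URL for this endpoint can be assembled with the standard library alone. The parameter names (`search_query`, `start`, `max_results`, `sortBy`, `sortOrder`) are those documented in the arXiv API user manual; `build_query_url` itself is a hypothetical helper, and nothing here is taken from this project's services.py.

```python
from urllib.parse import urlencode

BASE_URL = "https://export.arxiv.org/api/query"


def build_query_url(
    search_query: str,
    start: int = 0,
    max_results: int = 10,
    sort_by: str = "relevance",
    sort_order: str = "descending",
) -> str:
    """Build an arXiv API query URL; the response is an Atom XML feed."""
    params = {
        "search_query": search_query,
        "start": start,
        "max_results": max_results,
        "sortBy": sort_by,
        "sortOrder": sort_order,
    }
    # urlencode percent-encodes the colon and spaces in the query string.
    return f"{BASE_URL}?{urlencode(params)}"


print(build_query_url("cat:cs.AI", max_results=5, sort_by="submittedDate"))
```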
License
This is a personal project for interacting with arXiv's public API.