arXiv Research MCP Server

Enables natural language search and analysis of arXiv academic papers with AI-powered relevance ranking, full-text extraction, and support for multiple integrations like Claude and LangChain.

A comprehensive Model Context Protocol (MCP) server for searching and analyzing academic papers from arXiv with AI-powered relevance ranking and full-text extraction.

Features

  • Smart Search: Search arXiv with date filtering and relevance ranking
  • Full Text Extraction: Download and extract complete paper content
  • Caching: Intelligent caching to reduce API calls
  • Multiple Integrations: Works with Claude, LangChain, Streamlit, and more
  • Batch Processing: Process multiple research topics efficiently
  • API Wrapper: REST API for easy integration
  • Jupyter Integration: Interactive analysis and visualization tools
  • Relevance Ranking: TF-IDF based ranking for better results
  • PDF Processing: Multi-method text extraction from PDFs

Quick Start

Installation

# Clone the repository
git clone https://github.com/borderlessboy/arxiv-research-mcp
cd arxiv-research-mcp

# Install dependencies
pip install -r requirements.txt

# Create environment configuration
# cp .env.example .env  # Create .env file with your configuration

Basic Usage

# Run the MCP server
python scripts/run_server.py

# Or use the Streamlit dashboard
streamlit run integrations/streamlit_app.py

Docker Usage

The project includes a Dockerfile for easy containerized deployment.

Quick Start with Docker

# Build the Docker image
docker build -t arxiv-research-mcp .

# Run the container
docker run -p 8090:8090 arxiv-research-mcp

Docker with Custom Configuration

# Build with custom tag
docker build -t arxiv-research-mcp:latest .

# Run with custom port mapping
docker run -p 8080:8090 arxiv-research-mcp

# Run with volume for persistent cache
docker run -p 8090:8090 -v $(pwd)/cache:/app/cache arxiv-research-mcp

# Run with environment variables
docker run -p 8090:8090 \
  -e CACHE_ENABLED=true \
  -e CACHE_TTL_HOURS=24 \
  -e LOG_LEVEL=INFO \
  arxiv-research-mcp

Docker Compose (Recommended)

The project includes a docker-compose.yml file for easy deployment:

# Start the service
docker-compose up -d

# View logs
docker-compose logs -f

# Stop the service
docker-compose down

Or create a custom docker-compose.yml:

services:
  arxiv-research-mcp:
    build: .
    ports:
      - "8090:8090"
    volumes:
      - ./cache:/app/cache
    environment:
      - CACHE_ENABLED=true
      - CACHE_TTL_HOURS=24
      - LOG_LEVEL=INFO
    restart: unless-stopped

# Start the service
docker-compose up -d

# View logs
docker-compose logs -f

# Stop the service
docker-compose down

Installation Options

Docker Installation (Recommended)

# Quick start with Docker
docker build -t arxiv-research-mcp .
docker run -p 8090:8090 arxiv-research-mcp

Full Installation

pip install "arxiv-research-mcp[all]"

Specific Components

# API server only
pip install "arxiv-research-mcp[api]"

# Jupyter integration
pip install "arxiv-research-mcp[jupyter]"

# Dashboard
pip install "arxiv-research-mcp[dashboard]"

# LangChain integration
pip install "arxiv-research-mcp[langchain]"

Usage Examples

1. Basic MCP Server Usage

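import asyncio

from src.server import search_arxiv_papers_tool


async def main():
    # Search for papers (the tool is a coroutine, so it must be awaited)
    result = await search_arxiv_papers_tool({
        "query": "transformer models",
        "max_results": 10,
        "years_back": 4,
        "include_full_text": True
    })
    print(result)


asyncio.run(main())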

2. LangChain Integration

from integrations.langchain_tool import ResearchAgent

agent = ResearchAgent()
result = agent.research_topic("quantum machine learning")

3. Jupyter Analysis

import matplotlib.pyplot as plt

from integrations.jupyter_helper import search_papers

# Search and analyze (top-level await works inside a Jupyter cell)
helper = await search_papers("machine learning", max_results=20)

# Create visualizations
fig = helper.create_publication_timeline()
plt.show()

4. Streamlit Dashboard

streamlit run integrations/streamlit_app.py

Configuration

Create a .env file with your settings:

# Server Configuration
SERVER_NAME=arxiv-research-server
LOG_LEVEL=INFO

# arXiv API Configuration
ARXIV_REQUEST_TIMEOUT=30
ARXIV_MAX_RETRIES=3

# Caching
CACHE_ENABLED=true
CACHE_TTL_HOURS=24

# Content Processing
MAX_FULL_TEXT_LENGTH=50000
DEFAULT_MAX_RESULTS=10
DEFAULT_YEARS_BACK=4
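
As a rough illustration of how these settings might be consumed, the sketch below reads the same variable names from the environment and falls back to the values shown above. The Settings class and load_settings function are hypothetical names for this example and are not taken from the project's source.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

_TRUE_VALUES = {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    """Hypothetical container; defaults mirror the sample .env above."""
    server_name: str = "arxiv-research-server"
    log_level: str = "INFO"
    arxiv_request_timeout: int = 30
    arxiv_max_retries: int = 3
    cache_enabled: bool = True
    cache_ttl_hours: int = 24
    max_full_text_length: int = 50000
    default_max_results: int = 10
    default_years_back: int = 4


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from a mapping of variables (os.environ by default)."""
    env = os.environ if env is None else env
    d = Settings()

    def integer(name: str, fallback: int) -> int:
        return int(env.get(name, fallback))

    def boolean(name: str, fallback: bool) -> bool:
        raw = env.get(name)
        return fallback if raw is None else raw.strip().lower() in _TRUE_VALUES

    return Settings(
        server_name=env.get("SERVER_NAME", d.server_name),
        log_level=env.get("LOG_LEVEL", d.log_level),
        arxiv_request_timeout=integer("ARXIV_REQUEST_TIMEOUT", d.arxiv_request_timeout),
        arxiv_max_retries=integer("ARXIV_MAX_RETRIES", d.arxiv_max_retries),
        cache_enabled=boolean("CACHE_ENABLED", d.cache_enabled),
        cache_ttl_hours=integer("CACHE_TTL_HOURS", d.cache_ttl_hours),
        max_full_text_length=integer("MAX_FULL_TEXT_LENGTH", d.max_full_text_length),
        default_max_results=integer("DEFAULT_MAX_RESULTS", d.default_max_results),
        default_years_back=integer("DEFAULT_YEARS_BACK", d.default_years_back),
    )
```

This sketch only reads variables that are already present in the process environment. Loading the .env file itself would need an extra step, for example a dotenv loader, which is an assumption here.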

API Reference

MCP Tools

search_arxiv_papers

Search for academic papers with relevance ranking.

Parameters:

  • query (string): Search query
  • max_results (integer, default: 10): Maximum papers to return
  • years_back (integer, default: 4): Years to search back
  • include_full_text (boolean, default: true): Include full paper text
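
To make the query, max_results, and years_back parameters concrete, here is a minimal sketch of how they could map onto a request to the public arXiv API. The helper names are hypothetical, and the server's actual query construction may differ. The endpoint, the search_query, start, max_results, sortBy, and sortOrder parameters, and the submittedDate range filter are part of the arXiv API.

```python
from datetime import date
from typing import Optional
from urllib.parse import urlencode

ARXIV_API_URL = "http://export.arxiv.org/api/query"


def cutoff_date(today: date, years_back: int) -> date:
    """Return the same calendar day years_back years earlier."""
    try:
        return today.replace(year=today.year - years_back)
    except ValueError:
        # Feb 29 has no counterpart in a non-leap year; fall back to Feb 28.
        return today.replace(year=today.year - years_back, day=28)


def build_search_query(query: str, years_back: int = 4,
                       today: Optional[date] = None) -> str:
    """Combine a phrase search with a submission-date window."""
    today = today or date.today()
    start = cutoff_date(today, years_back)
    date_range = f"submittedDate:[{start:%Y%m%d}0000 TO {today:%Y%m%d}2359]"
    return f'all:"{query}" AND {date_range}'


def build_query_url(query: str, max_results: int = 10, years_back: int = 4,
                    today: Optional[date] = None) -> str:
    """Assemble the full request URL (hypothetical helper for this example)."""
    params = {
        "search_query": build_search_query(query, years_back, today),
        "start": 0,
        "max_results": max_results,
        "sortBy": "relevance",
        "sortOrder": "descending",
    }
    return f"{ARXIV_API_URL}?{urlencode(params)}"
```

Quoting the query makes it a phrase match, which is one design choice among several. A server that re-ranks results locally might prefer a looser match and a larger max_results.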

clear_cache

Clear all cached search results.

get_cache_stats

Get cache statistics and information.

LangChain Tools

ArxivResearchTool

Search arXiv papers with LangChain integration.

ArxivCacheManagementTool

Manage cache with LangChain integration.

Advanced Features

Relevance Ranking

The server uses TF-IDF vectorization and cosine similarity to rank papers by relevance to your query.
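
The idea can be sketched in plain Python. This is an illustration of TF-IDF weighting with cosine similarity under simple assumptions (lowercase word tokens, smoothed IDF), not the server's actual implementation, which may rely on a library vectorizer. The function names are made up for the example.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_by_relevance(query: str, documents: List[str]) -> List[Tuple[int, float]]:
    """Return (document index, score) pairs, highest score first."""
    docs_tokens = [tokenize(doc) for doc in documents]
    n_docs = len(docs_tokens)

    # Document frequency: in how many documents each term appears.
    df: Counter = Counter()
    for tokens in docs_tokens:
        df.update(set(tokens))

    def idf(term: str) -> float:
        # Smoothed inverse document frequency; rarer terms weigh more.
        return math.log((1 + n_docs) / (1 + df[term])) + 1.0

    def vectorize(tokens: List[str]) -> Dict[str, float]:
        return {term: count * idf(term) for term, count in Counter(tokens).items()}

    def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
        dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
        norm_a = math.sqrt(sum(w * w for w in a.values()))
        norm_b = math.sqrt(sum(w * w for w in b.values()))
        if norm_a == 0.0 or norm_b == 0.0:
            return 0.0
        return dot / (norm_a * norm_b)

    query_vec = vectorize(tokenize(query))
    scored = [(i, cosine(query_vec, vectorize(t))) for i, t in enumerate(docs_tokens)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

A document that shares no terms with the query scores 0.0, and documents dominated by the query terms score closest to 1.0.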

PDF Processing

Multiple extraction methods (PyPDF2, pdfplumber) ensure robust text extraction from PDFs.
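
One way to combine several extractors is a fallback chain: try each method in order and keep the first one that yields usable text. The sketch below assumes that ordering and that behavior. The function names are hypothetical, and the PyPDF2 and pdfplumber imports are deferred so the module loads even when a library is missing.

```python
from typing import Callable, Optional, Sequence


def extract_with_pypdf2(path: str) -> str:
    """Extract text page by page with PyPDF2."""
    from PyPDF2 import PdfReader  # imported lazily; optional dependency

    reader = PdfReader(path)
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def extract_with_pdfplumber(path: str) -> str:
    """Extract text page by page with pdfplumber."""
    import pdfplumber  # imported lazily; optional dependency

    with pdfplumber.open(path) as pdf:
        return "\n".join(page.extract_text() or "" for page in pdf.pages)


DEFAULT_EXTRACTORS = (extract_with_pypdf2, extract_with_pdfplumber)


def extract_text_with_fallback(
    path: str,
    extractors: Optional[Sequence[Callable[[str], str]]] = None,
    min_chars: int = 1,
) -> str:
    """Return text from the first extractor that succeeds, else ''."""
    extractors = DEFAULT_EXTRACTORS if extractors is None else extractors
    for extractor in extractors:
        try:
            text = extractor(path)
        except Exception:
            # A missing library or a malformed PDF: move on to the next method.
            continue
        if text and len(text.strip()) >= min_chars:
            return text
    return ""
```

Raising min_chars is a simple guard against scanned PDFs where an extractor returns only a few stray characters.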

Caching System

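Search results are cached to reduce arXiv API calls and improve response times. Caching is toggled with CACHE_ENABLED, and entry lifetime is set with CACHE_TTL_HOURS (24 in the sample configuration).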
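
A time-to-live cache of this kind can be sketched as follows. This is an in-memory illustration only: the TTLCache class is a hypothetical name, and the server's own cache (which the Docker examples persist under a cache directory) is presumably stored differently.

```python
import time
from typing import Any, Callable, Dict, Hashable, Optional, Tuple


class TTLCache:
    """Minimal in-memory cache whose entries expire after ttl_hours."""

    def __init__(self, ttl_hours: float = 24.0,
                 clock: Callable[[], float] = time.time) -> None:
        self._ttl_seconds = ttl_hours * 3600
        self._clock = clock  # injectable so expiry can be tested
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}
        self.hits = 0
        self.misses = 0

    def get(self, key: Hashable) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is not None:
            stored_at, value = entry
            if self._clock() - stored_at <= self._ttl_seconds:
                self.hits += 1
                return value
            del self._entries[key]  # expired: drop it and report a miss
        self.misses += 1
        return None

    def set(self, key: Hashable, value: Any) -> None:
        self._entries[key] = (self._clock(), value)

    def clear(self) -> None:
        self._entries.clear()

    def stats(self) -> Dict[str, int]:
        return {"entries": len(self._entries), "hits": self.hits, "misses": self.misses}
```

A search would typically use a key built from the query and its parameters, for example the tuple (query, max_results, years_back), so that different parameter combinations do not share results.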

Batch Processing

Process multiple research topics efficiently with the batch processor.
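
The following sketch shows one plausible shape for batch processing: run one search per topic concurrently, cap the number of in-flight requests, and keep failures from aborting the whole batch. The process_topics function and its search argument are hypothetical and stand in for whatever the project's batch processor actually exposes.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, Iterable


async def process_topics(
    topics: Iterable[str],
    search: Callable[[str], Awaitable[Any]],
    max_concurrency: int = 3,
) -> Dict[str, Any]:
    """Search every topic, running at most max_concurrency searches at once.

    Each topic maps to its result, or to the exception that search raised.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run_one(topic: str):
        async with semaphore:
            try:
                return topic, await search(topic)
            except Exception as exc:
                # Record the failure and let the other topics finish.
                return topic, exc

    pairs = await asyncio.gather(*(run_one(topic) for topic in topics))
    return dict(pairs)
```

Keeping max_concurrency small is a courtesy to the arXiv API, which asks clients to limit their request rate.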

Docker Deployment

The project includes a production-ready Dockerfile with:

  • Lightweight Python 3.11-slim base image
  • Optimized layer caching for faster builds
  • Pre-configured HTTP server on port 8090
  • Volume support for persistent caching
  • Environment variable configuration

Development

Running Tests

pytest tests/

Code Quality

black src/ tests/
flake8 src/ tests/
mypy src/

Building

python setup.py build

Docker Development

# Build development image
docker build -t arxiv-research-mcp:dev .

# Run with source code mounted for development
docker run -p 8090:8090 \
  -v $(pwd)/src:/app/src \
  -v $(pwd)/config:/app/config \
  -v $(pwd)/cache:/app/cache \
  arxiv-research-mcp:dev

# Run tests in Docker
docker run arxiv-research-mcp:dev pytest tests/

Architecture

arxiv-research-mcp/
├── src/
│   ├── server.py              # Main MCP server
│   ├── models/                # Data models
│   ├── services/              # Core services
│   └── utils/                 # Utility functions
├── integrations/              # External integrations
├── scripts/                   # Utility scripts
├── tests/                     # Test suite
└── examples/                  # Usage examples

Documentation

For detailed documentation and guides, see the Docs/ directory.

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests for new functionality
  5. Submit a pull request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Troubleshooting

Docker Issues

Port already in use:

# Use a different port
docker run -p 8080:8090 arxiv-research-mcp

Permission denied:

# Run with proper permissions
sudo docker run -p 8090:8090 arxiv-research-mcp

Build fails:

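# Clean build (note: docker system prune -a removes all unused images, stopped containers, and unused networks, not only this project's)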
docker system prune -a
docker build --no-cache -t arxiv-research-mcp .

Container exits immediately:

# Check logs
docker logs <container_id>
# Run interactively
docker run -it arxiv-research-mcp /bin/bash

Acknowledgments

  • arXiv for providing the academic paper database
  • MCP (Model Context Protocol) for the server framework
  • The open-source community for the various libraries used
