Paper Search MCP
Enables searching and downloading academic papers from 14 platforms including arXiv, PubMed, Google Scholar, Web of Science, Springer, and Sci-Hub with unified data format and intelligent rate limiting.
Paper Search MCP (Node.js)
A Node.js Model Context Protocol (MCP) server for searching and downloading academic papers from 14 academic platforms, including arXiv, Web of Science, PubMed, Google Scholar, Sci-Hub, ScienceDirect, Springer, Wiley, Scopus, and Crossref.
Key Features
- 14 Academic Platforms: arXiv, Web of Science, PubMed, Google Scholar, bioRxiv, medRxiv, Semantic Scholar, IACR ePrint, Sci-Hub, ScienceDirect, Springer Nature, Wiley, Scopus, Crossref
- MCP Protocol Integration: Seamless integration with Claude Desktop and other AI assistants
- Unified Data Model: Standardized paper format across all platforms
- High-Performance Search: Concurrent search with intelligent rate limiting
- Security First: DOI validation, query sanitization, injection prevention, sensitive data masking
- Type Safety: Complete TypeScript support with extended interfaces
- Academic Papers First: Smart filtering prioritizing academic papers over books
- Smart Error Handling: Unified ErrorHandler with retry logic and platform fallback
Supported Platforms
| Platform | Search | Download | Full Text | Citations | API Key | Special Features |
|---|---|---|---|---|---|---|
| Crossref | β | β | β | β | β | Default search, extensive metadata coverage |
| arXiv | β | β | β | β | β | Physics/CS preprints |
| Web of Science | β | β | β | β | β Required | Multi-topic search, date sorting, year ranges |
| PubMed | β | β | β | β | π‘ Optional | Biomedical literature |
| Google Scholar | β | β | β | β | β | Comprehensive academic search |
| bioRxiv | β | β | β | β | β | Biology preprints |
| medRxiv | β | β | β | β | β | Medical preprints |
| Semantic Scholar | β | β | β | β | π‘ Optional | AI semantic search |
| IACR ePrint | β | β | β | β | β | Cryptography papers |
| Sci-Hub | β | β | β | β | β | Universal paper access via DOI |
| ScienceDirect | β | β | β | β | β Required | Elsevier's full-text database |
| Springer Nature | β | β * | β | β | β Required | Dual API: Meta v2 & OpenAccess |
| Wiley | β | β | β | β | β Required | TDM API: DOI-based PDF download only |
| Scopus | β | β | β | β | β Required | Largest citation database |
β Supported | β Not supported | π‘ Optional | β * Open Access only
Note: The Wiley TDM API does not support keyword search. Use search_crossref to find Wiley articles, then use download_paper with platform: "wiley" to download PDFs by DOI.
Compliance & Ethical Use (Sci-Hub / Google Scholar)
This project includes integrations that may have legal, contractual (ToS), and ethical constraints. You are responsible for ensuring your usage complies with applicable laws, institutional policies, and third-party terms.
- Sci-Hub: May provide access to copyrighted works without authorization in many jurisdictions. Use only when you have the legal right to access the content (e.g., open access, author-provided copies, or licensed institutional access).
- Google Scholar: This integration relies on automated fetching/parsing and may violate Google's Terms of Service or trigger blocking/rate limits. Prefer official APIs or metadata sources (e.g., Crossref, Semantic Scholar) when ToS compliance is required.
Quick Start
System Requirements
- Node.js >= 18.0.0
- npm or yarn
Installation
# Clone repository
git clone https://github.com/your-username/paper-search-mcp-nodejs.git
cd paper-search-mcp-nodejs
# Install dependencies
npm install
# Copy environment template
cp .env.example .env
Configuration
- Get a Web of Science API key
  - Visit the Clarivate Developer Portal
  - Register and apply for Web of Science API access
  - Add the API key to the .env file
- Get a PubMed API key (optional)
  - Without an API key: free usage, limited to 3 requests/second
  - With an API key: 10 requests/second and more stable service
  - To get a key, see NCBI API Keys
- Configure environment variables
# Edit .env file
WOS_API_KEY=your_actual_api_key_here
WOS_API_VERSION=v1
# PubMed API key (optional, recommended for better performance)
PUBMED_API_KEY=your_ncbi_api_key_here
# Semantic Scholar API key (optional, increases rate limits)
SEMANTIC_SCHOLAR_API_KEY=your_semantic_scholar_api_key
# Elsevier API key (required for ScienceDirect and Scopus)
ELSEVIER_API_KEY=your_elsevier_api_key
# Springer Nature API key (required for Springer; used for Metadata API v2)
SPRINGER_API_KEY=your_springer_api_key
# Optional: separate key for the OpenAccess API (if different from the main key)
SPRINGER_OPENACCESS_API_KEY=your_openaccess_api_key
# Wiley TDM token (required for Wiley)
WILEY_TDM_TOKEN=your_wiley_tdm_token
Build and Run
Method 1: NPX (Recommended for MCP)
# Direct run with npx (most common MCP deployment)
npx -y paper-search-mcp-nodejs
# Or install globally
npm install -g paper-search-mcp-nodejs
paper-search-mcp
Method 2: Local Development
# Build TypeScript code
npm run build
# Start server
npm start
# Or run in development mode
npm run dev
MCP Server Configuration
Add the following configuration to your Claude Desktop config file:
macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Windows: %APPDATA%\Claude\claude_desktop_config.json
NPX Configuration (Recommended)
{
"mcpServers": {
"paper-search-nodejs": {
"command": "npx",
"args": ["-y", "paper-search-mcp-nodejs"],
"env": {
"WOS_API_KEY": "your_web_of_science_api_key"
}
}
}
}
Local Installation Configuration
{
"mcpServers": {
"paper_search_nodejs": {
"command": "node",
"args": ["/path/to/paper-search-mcp-nodejs/dist/server.js"],
"env": {
"WOS_API_KEY": "your_web_of_science_api_key"
}
}
}
}
MCP Tools
search_papers
Search academic papers across multiple platforms
// Let the server pick one platform at random
search_papers({
query: "machine learning",
platform: "all", // Randomly selects one platform for efficiency
maxResults: 10,
year: "2023",
sortBy: "date"
})
// Search specific platform
search_papers({
query: "quantum computing",
platform: "webofscience", // Target specific platform
maxResults: 5
})
Platform Selection Behavior:
- platform: "crossref" (default) - Free API with extensive scholarly metadata coverage
- platform: "all" - Randomly selects one platform for efficient, focused results
- Specific platform - Searches only that platform
- Available platforms: crossref, arxiv, webofscience/wos, pubmed, biorxiv, medrxiv, semantic, iacr, googlescholar/scholar, scihub, sciencedirect, springer, scopus
- Note: wiley only supports PDF download by DOI, not keyword search
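The selection rules above could be implemented with a small resolver like the one below. This is a hedged sketch, not the server's actual code: resolvePlatform, the alias table and the injectable random source are hypothetical, and only the platform names, aliases and the crossref default come from this README.

```typescript
// Hypothetical sketch of how a `platform` argument could be resolved.
// Platform names and aliases are taken from the README; the logic is assumed.
const SEARCHABLE_PLATFORMS = [
  "crossref", "arxiv", "webofscience", "pubmed", "biorxiv", "medrxiv",
  "semantic", "iacr", "googlescholar", "scihub", "sciencedirect",
  "springer", "scopus",
] as const;

type SearchablePlatform = (typeof SEARCHABLE_PLATFORMS)[number];

// Short aliases documented in the README.
const ALIASES: Record<string, SearchablePlatform> = {
  wos: "webofscience",
  scholar: "googlescholar",
};

function resolvePlatform(
  platform: string = "crossref",      // "crossref" is the documented default
  random: () => number = Math.random, // injectable for deterministic tests
): SearchablePlatform {
  const name = platform.trim().toLowerCase();
  if (name === "wiley") {
    // Wiley only supports DOI-based PDF download, not keyword search.
    throw new Error('Wiley does not support keyword search; use search_crossref, then download_paper with platform "wiley".');
  }
  if (name === "all") {
    // "all" picks ONE platform at random rather than querying every source.
    const index = Math.floor(random() * SEARCHABLE_PLATFORMS.length);
    return SEARCHABLE_PLATFORMS[index];
  }
  const resolved: string = Object.prototype.hasOwnProperty.call(ALIASES, name) ? ALIASES[name] : name;
  if ((SEARCHABLE_PLATFORMS as readonly string[]).indexOf(resolved) === -1) {
    throw new Error(`Unknown platform: ${platform}`);
  }
  return resolved as SearchablePlatform;
}
```

Injecting the random source keeps the "all" behaviour testable without mocking Math.random.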
search_crossref
Search academic papers from Crossref database (default search platform)
search_crossref({
query: "machine learning",
maxResults: 10,
year: "2023",
author: "Smith",
sortBy: "relevance", // or "date", "citations"
sortOrder: "desc"
})
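For illustration, these options could map onto the public Crossref REST API (https://api.crossref.org/works) roughly as follows. The query, query.author, rows, filter, sort and order parameters are real Crossref parameters; buildCrossrefUrl and the mapping of date and citations to Crossref's published and is-referenced-by-count sort keys are assumptions, so the server's CrossrefSearcher may behave differently.

```typescript
// Hypothetical mapping from search_crossref options to a Crossref REST API URL.
interface CrossrefSearchOptions {
  query: string;
  maxResults?: number;
  year?: string; // e.g. "2023"
  author?: string;
  sortBy?: "relevance" | "date" | "citations";
  sortOrder?: "asc" | "desc";
}

// Assumed mapping onto Crossref's sort keys.
const CROSSREF_SORT: Record<string, string> = {
  relevance: "relevance",
  date: "published",
  citations: "is-referenced-by-count",
};

function buildCrossrefUrl(opts: CrossrefSearchOptions): string {
  const params: Array<[string, string]> = [
    ["query", opts.query],
    ["rows", String(opts.maxResults ?? 10)],
  ];
  if (opts.author) params.push(["query.author", opts.author]);
  if (opts.year) {
    // Restrict results to one publication year with Crossref's date filters.
    params.push(["filter", `from-pub-date:${opts.year}-01-01,until-pub-date:${opts.year}-12-31`]);
  }
  if (opts.sortBy) {
    params.push(["sort", CROSSREF_SORT[opts.sortBy]]);
    params.push(["order", opts.sortOrder ?? "desc"]);
  }
  const queryString = params
    .map(([key, value]) => `${encodeURIComponent(key)}=${encodeURIComponent(value)}`)
    .join("&");
  return `https://api.crossref.org/works?${queryString}`;
}
```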
search_arxiv
Search arXiv preprints specifically
search_arxiv({
query: "transformer neural networks",
maxResults: 10,
category: "cs.AI",
author: "Vaswani",
year: "2023",
sortBy: "date", // relevance, date, citations
sortOrder: "desc" // asc, desc
})
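A hedged sketch of how the same options might translate into an arXiv API request. The all:, cat: and au: field prefixes and the sortBy/sortOrder values belong to the public arXiv API; buildArxivUrl is hypothetical. It quotes the query as a phrase, omits the year filter, and falls back to relevance for citations because the arXiv API has no citation sort.

```typescript
// Hypothetical mapping from search_arxiv options to an arXiv API query URL.
interface ArxivSearchOptions {
  query: string;
  maxResults?: number;
  category?: string; // e.g. "cs.AI"
  author?: string;
  sortBy?: "relevance" | "date" | "citations";
  sortOrder?: "asc" | "desc";
}

function buildArxivUrl(opts: ArxivSearchOptions): string {
  // Combine clauses with AND using arXiv's field prefixes.
  const clauses = [`all:"${opts.query}"`];
  if (opts.category) clauses.push(`cat:${opts.category}`);
  if (opts.author) clauses.push(`au:"${opts.author}"`);

  // arXiv sorts only by relevance, lastUpdatedDate or submittedDate,
  // so "citations" falls back to relevance in this sketch.
  const sortBy = opts.sortBy === "date" ? "submittedDate" : "relevance";
  const sortOrder = opts.sortOrder === "asc" ? "ascending" : "descending";

  const params = [
    `search_query=${encodeURIComponent(clauses.join(" AND "))}`,
    "start=0",
    `max_results=${opts.maxResults ?? 10}`,
    `sortBy=${sortBy}`,
    `sortOrder=${sortOrder}`,
  ];
  return `https://export.arxiv.org/api/query?${params.join("&")}`;
}
```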
search_webofscience
Search Web of Science database specifically
search_webofscience({
query: "CRISPR gene editing",
maxResults: 15,
year: "2022",
journal: "Nature"
})
search_pubmed
Search PubMed/MEDLINE biomedical literature database
search_pubmed({
query: "COVID-19 vaccine efficacy",
maxResults: 20,
year: "2023",
author: "Smith",
journal: "New England Journal of Medicine",
publicationType: ["Journal Article", "Clinical Trial"],
sortBy: "date" // relevance, date
})
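PubMed searches can be issued through NCBI's E-utilities. The sketch below assumes the options are combined into a single esearch query; the endpoint, the db, term, retmax, sort, retmode and api_key parameters, and the [au], [ta], [dp] and [pt] search tags are NCBI's, while buildPubMedSearchUrl and the way the filters are combined are hypothetical. Passing api_key is what raises the limit from 3 to 10 requests per second.

```typescript
// Hypothetical translation of search_pubmed options into an esearch URL.
interface PubMedSearchOptions {
  query: string;
  maxResults?: number;
  year?: string;
  author?: string;
  journal?: string;
  publicationType?: string[];
  sortBy?: "relevance" | "date";
}

function buildPubMedSearchUrl(opts: PubMedSearchOptions, apiKey?: string): string {
  // Combine the free-text query with tagged filters.
  const terms = [`(${opts.query})`];
  if (opts.author) terms.push(`${opts.author}[au]`);
  if (opts.journal) terms.push(`"${opts.journal}"[ta]`);
  if (opts.year) terms.push(`${opts.year}[dp]`);
  if (opts.publicationType && opts.publicationType.length > 0) {
    // Any of the requested publication types may match.
    const types = opts.publicationType.map((t) => `"${t}"[pt]`).join(" OR ");
    terms.push(`(${types})`);
  }

  const params = [
    "db=pubmed",
    `term=${encodeURIComponent(terms.join(" AND "))}`,
    `retmax=${opts.maxResults ?? 10}`,
    `sort=${opts.sortBy === "date" ? "pub_date" : "relevance"}`,
    "retmode=json",
  ];
  // With an API key NCBI allows 10 requests/second instead of 3.
  if (apiKey) params.push(`api_key=${encodeURIComponent(apiKey)}`);
  return `https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?${params.join("&")}`;
}
```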
search_google_scholar
Search Google Scholar academic database
search_google_scholar({
query: "machine learning",
maxResults: 10,
yearLow: 2020,
yearHigh: 2023,
author: "Bengio"
})
search_biorxiv / search_medrxiv
Search biology and medical preprints
search_biorxiv({
query: "CRISPR",
maxResults: 15,
days: 30,
category: "genomics" // neuroscience, genomics, etc.
})
search_medrxiv({
query: "COVID-19",
maxResults: 10,
days: 30,
category: "infectious_diseases"
})
search_semantic_scholar
Search Semantic Scholar AI semantic database
search_semantic_scholar({
query: "deep learning",
maxResults: 10,
fieldsOfStudy: ["Computer Science"],
year: "2023"
})
search_iacr
Search IACR ePrint cryptography archive
search_iacr({
query: "zero knowledge proof",
maxResults: 5,
fetchDetails: true
})
search_scihub
Search and download papers from Sci-Hub using DOI or paper URL
search_scihub({
doiOrUrl: "10.1038/nature12373",
downloadPdf: true,
savePath: "./downloads"
})
search_sciencedirect
Search Elsevier ScienceDirect database
search_sciencedirect({
query: "artificial intelligence",
maxResults: 10,
year: "2023",
author: "Smith",
openAccess: true // Filter for open access articles
})
search_springer
Search Springer Nature database (Metadata API v2 or OpenAccess API)
search_springer({
query: "machine learning",
maxResults: 10,
year: "2023",
openAccess: true, // Use OpenAccess API for downloadable PDFs
type: "Journal" // Filter: Journal, Book, or Chapter
})
search_scopus
Search Scopus citation database
search_scopus({
query: "renewable energy",
maxResults: 10,
year: "2023",
affiliation: "MIT",
documentType: "ar" // ar=article, cp=conference, re=review
})
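These options could be expressed in Scopus advanced-search syntax for the Scopus Search API (https://api.elsevier.com/content/search/scopus), which authenticates with an X-ELS-APIKey header. TITLE-ABS-KEY, PUBYEAR, AFFIL and DOCTYPE are standard Scopus search fields; buildScopusRequest and the way the clauses are combined are assumptions.

```typescript
// Hypothetical mapping from search_scopus options to a Scopus Search API request.
interface ScopusSearchOptions {
  query: string;
  maxResults?: number;
  year?: string;
  affiliation?: string;
  documentType?: string; // "ar" = article, "cp" = conference paper, "re" = review
}

interface ScopusRequest {
  url: string;
  headers: Record<string, string>;
}

function buildScopusRequest(opts: ScopusSearchOptions, apiKey: string): ScopusRequest {
  const clauses = [`TITLE-ABS-KEY(${opts.query})`];
  if (opts.year) clauses.push(`PUBYEAR = ${opts.year}`);
  if (opts.affiliation) clauses.push(`AFFIL(${opts.affiliation})`);
  if (opts.documentType) clauses.push(`DOCTYPE(${opts.documentType})`);

  const query = encodeURIComponent(clauses.join(" AND "));
  return {
    url: `https://api.elsevier.com/content/search/scopus?query=${query}&count=${opts.maxResults ?? 10}`,
    // Elsevier APIs accept the key in this header (ELSEVIER_API_KEY in .env).
    headers: { "X-ELS-APIKey": apiKey, Accept: "application/json" },
  };
}
```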
check_scihub_mirrors
Check health status of Sci-Hub mirror sites
check_scihub_mirrors({
forceCheck: true // Force fresh health check
})
download_paper
Download paper PDF files
download_paper({
paperId: "2106.12345", // or DOI for Sci-Hub
platform: "arxiv", // or "scihub" for Sci-Hub downloads
savePath: "./downloads"
})
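DOIs contain slashes, so a paperId cannot safely be used as a file name as-is. The helper below is a hypothetical sketch of one way to derive a PDF path under savePath; pdfPathFor and its naming scheme are assumptions, not necessarily what download_paper does.

```typescript
// Hypothetical helper: turn a paper ID or DOI into a safe PDF path under savePath.
function pdfPathFor(paperId: string, platform: string, savePath: string = "./downloads"): string {
  // Replace path separators, whitespace and characters Windows forbids in file names.
  // With every separator replaced, the file always lands directly inside savePath.
  const safeId = paperId.trim().replace(/[\/\\:*?"<>|\s]+/g, "_");
  const dir = savePath.replace(/[\/\\]+$/, ""); // drop trailing separators
  return `${dir}/${platform}_${safeId}.pdf`;
}
```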
get_paper_by_doi
Get paper information by DOI
get_paper_by_doi({
doi: "10.1038/s41586-023-12345-6",
platform: "all"
})
get_platform_status
Check platform status and API keys
get_platform_status({})
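get_platform_status reports which platforms are usable with the current API keys. A hypothetical sketch of that check follows, using the environment variable names from the .env template above and the required/optional split from the Supported Platforms table; platformStatus and its output shape are assumptions, since the real tool's output format is not documented here.

```typescript
// Hypothetical status check based on the environment variables listed in .env.
type KeyRequirement = "required" | "optional";

const PLATFORM_KEYS: Record<string, { envVar: string; requirement: KeyRequirement }> = {
  webofscience: { envVar: "WOS_API_KEY", requirement: "required" },
  pubmed: { envVar: "PUBMED_API_KEY", requirement: "optional" },
  semantic: { envVar: "SEMANTIC_SCHOLAR_API_KEY", requirement: "optional" },
  sciencedirect: { envVar: "ELSEVIER_API_KEY", requirement: "required" },
  scopus: { envVar: "ELSEVIER_API_KEY", requirement: "required" },
  springer: { envVar: "SPRINGER_API_KEY", requirement: "required" },
  wiley: { envVar: "WILEY_TDM_TOKEN", requirement: "required" },
};

interface PlatformStatus {
  platform: string;
  keyConfigured: boolean;
  available: boolean; // false only when a required key is missing
}

function platformStatus(env: Record<string, string | undefined>): PlatformStatus[] {
  return Object.keys(PLATFORM_KEYS).map((platform) => {
    const { envVar, requirement } = PLATFORM_KEYS[platform];
    const value = env[envVar];
    const keyConfigured = value !== undefined && value.trim() !== "";
    return {
      platform,
      keyConfigured,
      available: keyConfigured || requirement === "optional",
    };
  });
}
```

In Node.js this could be called as platformStatus(process.env). Platforms that need no key at all (Crossref, arXiv and others) are simply not listed in this sketch.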
Data Model
All platform paper data is converted to a unified format:
interface Paper {
paperId: string; // Unique identifier
title: string; // Paper title
authors: string[]; // Author list
abstract: string; // Abstract
doi: string; // DOI
publishedDate: Date; // Publication date
pdfUrl: string; // PDF link
url: string; // Paper page URL
source: string; // Source platform
citationCount?: number; // Citation count
journal?: string; // Journal name
year?: number; // Publication year
categories?: string[]; // Subject categories
keywords?: string[]; // Keywords
// ... more fields
}
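To show what converting to the unified format can look like, here is a hedged sketch that maps one record from the Crossref REST API into a trimmed-down Paper. The Crossref field names (DOI, URL, title, author, abstract, container-title, issued.date-parts, is-referenced-by-count) come from Crossref's public /works schema; crossrefToPaper and its fallback values are assumptions, and the real CrossrefSearcher may map fields differently.

```typescript
// Subset of the Paper interface above, redeclared so this sketch is self-contained.
interface Paper {
  paperId: string;
  title: string;
  authors: string[];
  abstract: string;
  doi: string;
  publishedDate: Date;
  pdfUrl: string;
  url: string;
  source: string;
  citationCount?: number;
  journal?: string;
  year?: number;
}

// The few Crossref /works item fields this sketch reads.
interface CrossrefItem {
  DOI: string;
  URL?: string;
  title?: string[];
  author?: Array<{ given?: string; family?: string }>;
  abstract?: string;
  "container-title"?: string[];
  issued?: { "date-parts"?: number[][] };
  "is-referenced-by-count"?: number;
}

function crossrefToPaper(item: CrossrefItem): Paper {
  // date-parts is [[year, month?, day?]]; missing parts default to January / day 1.
  const parts = (item.issued && item.issued["date-parts"] && item.issued["date-parts"][0]) || [];
  const [year, month = 1, day = 1] = parts;
  const publishedDate = year ? new Date(Date.UTC(year, month - 1, day)) : new Date(NaN);

  return {
    paperId: item.DOI,
    title: (item.title && item.title[0]) || "",
    authors: (item.author || []).map((a) => [a.given, a.family].filter(Boolean).join(" ")),
    // Crossref abstracts are JATS-tagged; strip the tags for plain text.
    abstract: (item.abstract || "").replace(/<[^>]+>/g, "").trim(),
    doi: item.DOI,
    publishedDate,
    pdfUrl: "", // this sketch does not read Crossref's optional "link" field
    url: item.URL || `https://doi.org/${item.DOI}`,
    source: "crossref",
    citationCount: item["is-referenced-by-count"],
    journal: item["container-title"] && item["container-title"][0],
    year,
  };
}
```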
Development
Project Structure
src/
├── models/
│   └── Paper.ts # Paper data model
├── platforms/
│   ├── PaperSource.ts # Abstract base class
│   ├── ArxivSearcher.ts # arXiv searcher
│   ├── WebOfScienceSearcher.ts # Web of Science searcher
│   ├── PubMedSearcher.ts # PubMed searcher
│   ├── GoogleScholarSearcher.ts # Google Scholar searcher
│   ├── BioRxivSearcher.ts # bioRxiv/medRxiv searcher
│   ├── SemanticScholarSearcher.ts # Semantic Scholar searcher
│   ├── IACRSearcher.ts # IACR ePrint searcher
│   ├── SciHubSearcher.ts # Sci-Hub searcher with mirror management
│   ├── ScienceDirectSearcher.ts # ScienceDirect (Elsevier) searcher
│   ├── SpringerSearcher.ts # Springer Nature searcher (Meta v2 & OpenAccess APIs)
│   ├── WileySearcher.ts # Wiley TDM API (DOI-based PDF download only)
│   ├── ScopusSearcher.ts # Scopus citation database searcher
│   └── CrossrefSearcher.ts # Crossref API searcher (default platform)
├── utils/
│   └── RateLimiter.ts # Token bucket rate limiter
└── server.ts # MCP server main file
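RateLimiter.ts is described as a token bucket. As a generic illustration of that technique (not the project's actual class, whose API is not documented here), the sketch below can be configured with a capacity of 3 and a refill rate of 3 tokens per second, which matches PubMed's 3 requests/second limit without an API key. TokenBucket and its method names are hypothetical, and the clock is injected so the behaviour can be tested deterministically.

```typescript
// Generic token-bucket rate limiter (hypothetical API, for illustration only).
class TokenBucket {
  private tokens: number;
  private lastRefillMs: number;

  constructor(
    private readonly capacity: number,        // maximum burst size
    private readonly refillPerSecond: number, // sustained rate
    private readonly now: () => number = () => Date.now(),
  ) {
    this.tokens = capacity; // start full so an initial burst is allowed
    this.lastRefillMs = now();
  }

  // Add the tokens earned since the last refill, never exceeding capacity.
  private refill(): void {
    const current = this.now();
    const elapsedSeconds = (current - this.lastRefillMs) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSeconds * this.refillPerSecond);
    this.lastRefillMs = current;
  }

  // Take one token if available; returns false when the caller must wait.
  tryRemoveToken(): boolean {
    this.refill();
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }

  // Milliseconds until the next token becomes available (0 if one is ready now).
  msUntilNextToken(): number {
    this.refill();
    return this.tokens >= 1 ? 0 : Math.ceil(((1 - this.tokens) / this.refillPerSecond) * 1000);
  }
}
```

A caller would wait msUntilNextToken() milliseconds whenever tryRemoveToken() returns false, then retry. The bucket allows short bursts up to its capacity while holding the long-run rate at refillPerSecond.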
Adding New Platforms
- Create a new searcher class extending PaperSource
- Implement the required abstract methods
- Register the new searcher in server.ts
- Add the corresponding MCP tool
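The first three steps above might look like the following. This is a hypothetical sketch: the real PaperSource base class and its abstract methods are not listed in this README, so the names used here (name, search, toPaper), the ExampleSearcher class and the Map standing in for registration in server.ts are all assumptions.

```typescript
// Minimal stand-in for the project's Paper type; the real definition lives in src/models.
interface Paper {
  paperId: string;
  title: string;
  authors: string[];
  source: string;
}

// Hypothetical shape of the abstract base class.
abstract class PaperSource {
  abstract readonly name: string;
  abstract search(query: string, maxResults: number): Promise<Paper[]>;
}

// Raw record format of an imaginary new platform.
interface ExampleRecord {
  id: string;
  heading: string;
  creators: string[];
}

// Step 1: a new searcher class extending PaperSource.
class ExampleSearcher extends PaperSource {
  readonly name = "example";

  constructor(private readonly records: ExampleRecord[]) {
    super();
  }

  // Step 2: implement the abstract search method (a real searcher would call an HTTP API).
  async search(query: string, maxResults: number): Promise<Paper[]> {
    const needle = query.toLowerCase();
    return this.records
      .filter((r) => r.heading.toLowerCase().includes(needle))
      .slice(0, maxResults)
      .map((r) => this.toPaper(r));
  }

  // Convert the platform's record into the unified Paper model.
  toPaper(record: ExampleRecord): Paper {
    return { paperId: record.id, title: record.heading, authors: record.creators, source: this.name };
  }
}

// Step 3: register the searcher so a tool handler can look it up by platform name.
const searchers = new Map<string, PaperSource>();
const example = new ExampleSearcher([
  { id: "ex-1", heading: "Quantum error correction", creators: ["A. Author"] },
]);
searchers.set(example.name, example);
```

Step 4, exposing a matching MCP tool, depends on how server.ts wires up the MCP SDK and is not shown.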
Security Features (v0.2.5)
The codebase includes comprehensive security utilities:
src/utils/
├── SecurityUtils.ts # Security utilities
│   ├── sanitizeDoi() # DOI format validation
│   ├── escapeQueryValue() # Query injection prevention
│   ├── validateQueryComplexity() # DoS prevention
│   ├── withTimeout() # Request timeout protection
│   ├── sanitizeRequest() # Sensitive data removal
│   └── maskSensitiveData() # API key masking
├── ErrorHandler.ts # Unified error handling
│   ├── ApiError class # Custom error with metadata
│   ├── HTTP error codes # 400-504 handling
│   └── Retry logic # Exponential backoff
└── RateLimiter.ts # Token bucket rate limiting
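The ErrorHandler's retry logic is described only as exponential backoff with HTTP 400-504 handling. A hedged sketch of such a policy follows; which status codes count as retryable (429 and 5xx here), the base delay, the cap and the function names are assumptions.

```typescript
// Hypothetical retry policy with exponential backoff.
const RETRYABLE_STATUS = new Set([429, 500, 502, 503, 504]);

function isRetryable(status: number): boolean {
  // Client errors such as 400, 401, 403 and 404 will not succeed on retry.
  return RETRYABLE_STATUS.has(status);
}

// Delay before retry number `attempt` (0-based): base * 2^attempt, capped at maxMs.
function backoffDelayMs(attempt: number, baseMs: number = 500, maxMs: number = 8000): number {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt));
}

// Read an HTTP status from whatever the request function threw, if present.
function statusOf(err: unknown): number | undefined {
  const status = (err as { status?: unknown } | null)?.status;
  return typeof status === "number" ? status : undefined;
}

async function withRetry<T>(
  request: () => Promise<T>,
  maxRetries: number = 3,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise<void>((resolve) => setTimeout(() => resolve(), ms)),
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await request();
    } catch (err) {
      const status = statusOf(err);
      // Give up on non-HTTP errors, non-retryable statuses, or when retries run out.
      if (status === undefined || !isRetryable(status) || attempt >= maxRetries) throw err;
      await sleep(backoffDelayMs(attempt));
    }
  }
}
```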
Security Best Practices:
- All DOIs are validated before use in URLs
- Query parameters are escaped to prevent injection
- API keys are masked in all log output
- Request timeouts prevent hanging connections
- Query complexity limits prevent DoS attacks
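Two of these practices can be illustrated briefly. The sketch below is not the code in SecurityUtils.ts: it reuses the sanitizeDoi and maskSensitiveData names from the listing above, but their signatures and behaviour here are assumptions. The DOI pattern is the one Crossref suggests for matching modern DOIs.

```typescript
// Hypothetical versions of two SecurityUtils helpers.

// Pattern Crossref suggests for modern DOIs (matches the vast majority of them).
const DOI_PATTERN = /^10\.\d{4,9}\/[-._;()\/:a-z0-9]+$/i;

// Normalise common DOI forms and reject anything that is not a plausible DOI.
function sanitizeDoi(input: string): string | null {
  const doi = input
    .trim()
    .replace(/^https?:\/\/(dx\.)?doi\.org\//i, "")
    .replace(/^doi:\s*/i, "");
  return DOI_PATTERN.test(doi) ? doi : null;
}

// Replace known secrets in a log line; long secrets keep their last four characters.
function maskSensitiveData(text: string, secrets: string[]): string {
  let masked = text;
  for (const secret of secrets) {
    if (secret.length === 0) continue;
    const visible = secret.length > 8 ? secret.slice(-4) : "";
    masked = masked.split(secret).join(`****${visible}`);
  }
  return masked;
}
```

Returning null instead of throwing lets callers decide whether an invalid DOI is a user error or should trigger a fallback search.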
Testing
# Run tests
npm test
# Run linting
npm run lint
# Code formatting
npm run format
Test Coverage:
- 15 test suites, 144 test cases
- All 13 platform searchers tested
- Security utilities (DOI validation, query sanitization)
- ErrorHandler (error classification, retry logic)
| Test Suite | Coverage |
|---|---|
| Platform Searchers | 13/13 ✅ |
| SecurityUtils | ✅ |
| ErrorHandler | ✅ |
Platform-Specific Features
Springer Nature Dual API System
Springer Nature provides two APIs:
- Metadata API v2 (Main API)
  - Endpoint: https://api.springernature.com/meta/v2/json
  - Searches all Springer content (subscription + open access)
  - Requires an API key from https://dev.springernature.com/
- OpenAccess API (Optional)
  - Endpoint: https://api.springernature.com/openaccess/json
  - Only searches open access content
  - May require a separate API key or special permissions
  - Better for finding downloadable PDFs
// Search all Springer content
search_springer({
query: "machine learning",
maxResults: 10
})
// Search only open access papers
search_springer({
query: "COVID-19",
openAccess: true, // Uses OpenAccess API if available
maxResults: 5
})
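The endpoint choice described above can be sketched as follows. The two endpoint URLs come from this README; the q, p (page size) and api_key parameters follow Springer Nature's public API conventions, while buildSpringerUrl and the key fallback (use SPRINGER_OPENACCESS_API_KEY when set, otherwise SPRINGER_API_KEY) are assumptions based on the .env comments.

```typescript
// Hypothetical selection between the Springer Metadata v2 and OpenAccess APIs.
interface SpringerKeys {
  apiKey: string;            // SPRINGER_API_KEY
  openAccessApiKey?: string; // SPRINGER_OPENACCESS_API_KEY (optional)
}

interface SpringerSearchOptions {
  query: string;
  maxResults?: number;
  openAccess?: boolean;
}

function buildSpringerUrl(opts: SpringerSearchOptions, keys: SpringerKeys): string {
  const base = opts.openAccess
    ? "https://api.springernature.com/openaccess/json" // open access content only
    : "https://api.springernature.com/meta/v2/json";   // subscription + open access
  // Fall back to the main key when no separate OpenAccess key is configured.
  const key = opts.openAccess ? keys.openAccessApiKey || keys.apiKey : keys.apiKey;
  const params = [
    `q=${encodeURIComponent(opts.query)}`,
    `p=${opts.maxResults ?? 10}`,
    `api_key=${encodeURIComponent(key)}`,
  ];
  return `${base}?${params.join("&")}`;
}
```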
Web of Science Advanced Search
WoS Starter API v1/v2 Support: Uses Clarivate's WoS Starter API with full field tag support.
API Version Configuration:
# In .env file (default: v1)
WOS_API_VERSION=v1 # Stable, recommended
# WOS_API_VERSION=v2 # Newer version, same endpoints
// Multi-topic search
search_webofscience({
query: 'oriented structure',
year: '2023-2025',
sortBy: 'date',
sortOrder: 'desc',
maxResults: 10
})
// Year range filtering
search_webofscience({
query: 'machine learning',
year: '2020-2024', // Supports range format
sortBy: 'citations',
sortOrder: 'desc'
})
// Advanced query with filters
search_webofscience({
query: 'blockchain',
author: 'zhang',
journal: 'Nature',
year: '2023',
sortBy: 'date',
sortOrder: 'desc'
})
// Traditional WOS query syntax with field tags
search_webofscience({
query: 'TS="machine learning" AND PY=2023 AND DT="Article"',
maxResults: 20
})
v0.2.5 Improvements:
- 18 Field Tags: Full support for all WoS Starter API field tags
- API Version Selection: Support for both v1 and v2 endpoints
- Enhanced Filtering: ISSN, Volume, Page, Issue, DocType, PMID filters
- Query Validation: Security checks for query complexity and injection prevention
Supported Search Options:
- query: Search terms (supports multi-topic)
- year: Single year "2023" or range "2020-2023"
- author: Author name filtering
- journal: Journal/source filtering
- sortBy: Sort field (date, citations, relevance, title, author, journal)
- sortOrder: Sort direction (asc, desc)
- maxResults: Maximum results (1-50 per page)
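One way these options could be combined into a WoS Starter API query string is sketched below. It is hypothetical: escapeQueryValue and validateQueryComplexity reuse names from the SecurityUtils listing above, but their behaviour here (stripping quotes and backslashes, capping length and operator count) and the limits chosen are assumptions, as is the exact PY=2020-2024 range form. The TS, PY, AU and SO tags are from the WoS field tag table; sorting and paging are separate request parameters and are omitted.

```typescript
// Hypothetical construction of a WoS Starter API query from search options.
interface WosSearchOptions {
  query: string;
  year?: string; // "2023" or "2020-2024"
  author?: string;
  journal?: string;
}

// Remove characters that could break out of a quoted value.
function escapeQueryValue(value: string): string {
  return value.replace(/["\\]/g, "").trim();
}

// Reject overly long or operator-heavy queries before they reach the API.
function validateQueryComplexity(query: string, maxLength = 1000, maxOperators = 20): void {
  const operators = query.match(/\b(AND|OR|NOT|NEAR)\b/g) || [];
  if (query.length > maxLength || operators.length > maxOperators) {
    throw new Error("Query too complex");
  }
}

function buildWosQuery(opts: WosSearchOptions): string {
  // Raw WoS syntax such as 'TS="machine learning" AND PY=2023' is passed through unescaped.
  const isRawQuery = /^[A-Z]{2,4}\s*=/.test(opts.query.trim());
  const clauses = [isRawQuery ? opts.query.trim() : `TS="${escapeQueryValue(opts.query)}"`];

  if (opts.year) {
    // Accept a single year or a range like 2020-2024.
    if (!/^\d{4}(-\d{4})?$/.test(opts.year)) throw new Error(`Invalid year: ${opts.year}`);
    clauses.push(`PY=${opts.year}`);
  }
  if (opts.author) clauses.push(`AU="${escapeQueryValue(opts.author)}"`);
  if (opts.journal) clauses.push(`SO="${escapeQueryValue(opts.journal)}"`);

  const query = clauses.join(" AND ");
  validateQueryComplexity(query); // applies to raw and built queries alike
  return query;
}
```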
Supported WOS Field Tags (18 total):
| Tag | Description | Tag | Description |
|---|---|---|---|
| TS | Topic (title, abstract, keywords) | TI | Title |
| AU | Author | AI | Author Identifier |
| SO | Source/Journal | IS | ISSN/ISBN |
| PY | Publication Year | FPY | Final Publication Year |
| DO | DOI | DOP | Date of Publication |
| VL | Volume | PG | Page |
| CS | Issue | DT | Document Type |
| PMID | PubMed ID | UT | Accession Number |
| OG | Organization | SUR | Source URL |
Example with Field Tags:
// Search by PMID
search_webofscience({ query: 'PMID=12345678' })
// Search by DOI
search_webofscience({ query: 'DO="10.1038/nature12373"' })
// Filter by document type
search_webofscience({ query: 'TS="CRISPR" AND DT="Review"' })
// Search specific volume/issue
search_webofscience({ query: 'SO="Nature" AND VL=580 AND CS=7805' })
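Because raw queries are sent with their field tags, a client might want to check the tags against the 18 in the table above before making a request. A minimal hypothetical sketch: the tag list is copied from the table, while findUnknownFieldTags and its matching rule (an uppercase tag of 2-4 letters followed by =) are assumptions.

```typescript
// The 18 WoS Starter API field tags from the table above.
const WOS_FIELD_TAGS = new Set([
  "TS", "TI", "AU", "AI", "SO", "IS", "PY", "FPY", "DO",
  "DOP", "VL", "PG", "CS", "DT", "PMID", "UT", "OG", "SUR",
]);

// Return any TAG= prefixes in the query that are not documented field tags.
function findUnknownFieldTags(query: string): string[] {
  const unknown: string[] = [];
  const tagPattern = /\b([A-Z]{2,4})\s*=/g;
  let match: RegExpExecArray | null;
  while ((match = tagPattern.exec(query)) !== null) {
    const tag = match[1];
    if (!WOS_FIELD_TAGS.has(tag) && unknown.indexOf(tag) === -1) unknown.push(tag);
  }
  return unknown;
}
```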
Debugging WOS Issues:
# Enable debug logging
export NODE_ENV=development
# In CI, logDebug is enabled automatically when CI=true
Google Scholar Features
- Academic Paper Priority: Automatically filters out books, prioritizes peer-reviewed papers
- Citation Data: Provides citation counts and academic metrics
- Anti-Detection: Smart request patterns to avoid blocking
- Comprehensive Coverage: Searches across all academic publishers
Semantic Scholar Features
- AI-Powered Search: Semantic understanding of queries
- Citation Networks: Paper relationships and influence metrics
- Open Access PDFs: Direct links to freely available papers
- Research Fields: Filter by specific academic disciplines
Sci-Hub Features
- Universal Access: Access papers using DOI or direct URLs
- Mirror Network: Automatic detection and use of fastest available mirror (11+ mirrors)
- Health Monitoring: Continuous monitoring of mirror site availability
- Automatic Failover: Seamless switching between mirrors when one fails
- Smart Retry: Automatic retry with different mirrors on failure
- Response Time Optimization: Mirrors sorted by response time for best performance
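The mirror handling described above amounts to ranking healthy mirrors by response time and falling through the list on failure. A hedged, generic sketch of that idea follows; the MirrorHealth shape, rankMirrors, fetchWithFailover and the example.org hostnames are hypothetical, and real health checks would be asynchronous HTTP requests.

```typescript
// Hypothetical health record for one mirror, e.g. produced by a periodic check.
interface MirrorHealth {
  url: string;
  healthy: boolean;
  responseTimeMs: number;
}

// Keep healthy mirrors only, fastest first (filter copies, so the input is not mutated).
function rankMirrors(mirrors: MirrorHealth[]): string[] {
  return mirrors
    .filter((m) => m.healthy)
    .sort((a, b) => a.responseTimeMs - b.responseTimeMs)
    .map((m) => m.url);
}

// Try each ranked mirror in turn and return the first successful result.
async function fetchWithFailover<T>(
  rankedMirrors: string[],
  attempt: (mirrorUrl: string) => Promise<T>,
): Promise<T> {
  const errors: string[] = [];
  for (const mirror of rankedMirrors) {
    try {
      return await attempt(mirror);
    } catch (err) {
      errors.push(`${mirror}: ${String(err)}`);
    }
  }
  throw new Error(`All mirrors failed:\n${errors.join("\n")}`);
}
```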
License
MIT License - see LICENSE file for details.
Contributing
Contributions welcome! See CONTRIBUTING.md for guidelines.
- Fork the project
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
Issue Reporting
If you encounter issues, please report them at GitHub Issues.
Acknowledgments
- Original paper-search-mcp for the foundation
- MCP community for the protocol standards
If this project helps you, please give it a star!