Academic MCP Server
Enables AI assistants to search across multiple academic databases (PubMed, arXiv, bioRxiv, medRxiv, Semantic Scholar) through a unified interface. Supports advanced filtering, metadata retrieval, PDF downloads, and comprehensive research workflows with citation analysis.
🔍 A unified Model Context Protocol (MCP) server that provides AI assistants access to multiple academic databases through a single, consistent interface.
🌟 Features
Supported Databases
- PubMed 🏥 - Biomedical and life sciences literature (NCBI)
- bioRxiv 🧬 - Biology preprints
- medRxiv 💊 - Medical preprints
- arXiv 🔬 - Physics, mathematics, computer science, and more
- Semantic Scholar 🤖 - AI-powered academic search across disciplines
- Sci-Hub 📚 - Comprehensive academic paper access and download
Core Capabilities
- ✅ Unified Search: Search across all databases with a single query
- ✅ Advanced Filtering: Filter by title, author, date, journal, and more
- ✅ Metadata Access: Retrieve detailed paper information
- ✅ PDF Download: Download open access papers when available
- ✅ Deep Analysis: Generate comprehensive paper analysis prompts
- ✅ Local PDF Analysis: Support for both local and online PDF file analysis
- ✅ Citation Network Analysis: Analyze paper citation relationships and impact
- ✅ Complete Research Workflow: One-click retrieve→analyze→read→summarize
- ✅ Standardized Output: Consistent data format across all sources
🚀 Quick Start
Prerequisites
- Python 3.10+
- MCP library
- Internet connection
Installation
✅ Already Installed! Your Academic MCP Server is fully configured and ready to use.
If you need to set it up on another machine:
1. Clone or download this repository, then change into it:
   cd Academic-MCP-Server
2. Create a virtual environment:
   python -m venv venv
3. Activate the virtual environment:
   - Windows: venv\Scripts\activate
   - Mac/Linux: source venv/bin/activate
4. Install dependencies:
   pip install -r requirements.txt
Note: All PubMed functionality is integrated locally; it needs no external dependencies beyond those in requirements.txt.
Configuration for Cursor
This project provides TWO MCP servers with complementary features:
- academic - Basic search, metadata retrieval, and PDF downloads across 6 databases (PubMed, bioRxiv, medRxiv, arXiv, Semantic Scholar, Sci-Hub)
- academic-research - Advanced features including citation analysis, paper impact evaluation, local PDF analysis, and complete research workflows
Add this configuration to your MCP settings file (~/.cursor/mcp.json or C:\Users\YOUR_USERNAME\.cursor\mcp.json):
Windows:
{
  "mcpServers": {
    "academic": {
      "command": "C:\\Users\\YOUR_USERNAME\\path\\to\\Academic-MCP-Server\\venv\\Scripts\\python.exe",
      "args": [
        "C:\\Users\\YOUR_USERNAME\\path\\to\\Academic-MCP-Server\\academic_server.py"
      ],
      "env": {},
      "disabled": false,
      "autoApprove": []
    },
    "academic-research": {
      "command": "C:\\Users\\YOUR_USERNAME\\path\\to\\Academic-MCP-Server\\venv\\Scripts\\python.exe",
      "args": [
        "C:\\Users\\YOUR_USERNAME\\path\\to\\Academic-MCP-Server\\academic_research_advanced.py"
      ],
      "env": {},
      "disabled": false,
      "autoApprove": []
    }
  }
}
Mac/Linux:
{
  "mcpServers": {
    "academic": {
      "command": "/path/to/Academic-MCP-Server/venv/bin/python",
      "args": [
        "/path/to/Academic-MCP-Server/academic_server.py"
      ],
      "env": {},
      "disabled": false,
      "autoApprove": []
    },
    "academic-research": {
      "command": "/path/to/Academic-MCP-Server/venv/bin/python",
      "args": [
        "/path/to/Academic-MCP-Server/academic_research_advanced.py"
      ],
      "env": {},
      "disabled": false,
      "autoApprove": []
    }
  }
}
Note: Replace YOUR_USERNAME and path/to with your actual paths.
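The two server entries above share the same shape, so they could be generated instead of edited by hand. The following is a minimal sketch rather than part of the project: `build_mcp_config` is a hypothetical helper, and it assumes the virtual environment lives in `venv/` inside the repository, as in the installation steps.

```python
import json
from pathlib import PurePosixPath, PureWindowsPath


def build_mcp_config(repo_dir: str, windows: bool = False) -> dict:
    """Build the "mcpServers" mapping for both servers (hypothetical helper)."""
    path_cls = PureWindowsPath if windows else PurePosixPath
    repo = path_cls(repo_dir)
    # Interpreter inside the virtual environment created during installation.
    if windows:
        python = repo / "venv" / "Scripts" / "python.exe"
    else:
        python = repo / "venv" / "bin" / "python"
    # Server name -> entry script, as listed in this README.
    scripts = {
        "academic": "academic_server.py",
        "academic-research": "academic_research_advanced.py",
    }
    servers = {
        name: {
            "command": str(python),
            "args": [str(repo / script)],
            "env": {},
            "disabled": False,
            "autoApprove": [],
        }
        for name, script in scripts.items()
    }
    return {"mcpServers": servers}


if __name__ == "__main__":
    # Paste the printed JSON into ~/.cursor/mcp.json after adjusting the path.
    print(json.dumps(build_mcp_config("/path/to/Academic-MCP-Server"), indent=2))
```

`json.dumps` takes care of escaping the backslashes in Windows paths, which is the easiest part of the configuration to get wrong by hand.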
📖 Usage
Search Papers
Search across all databases:
search_papers(
    keywords="UCAR-T",
    source="all",
    num_results=15
)
Search specific database:
search_papers(
    keywords="machine learning",
    source="arxiv",
    num_results=10
)
Advanced Search
search_papers_advanced(
    title="neural networks",
    author="Hinton",
    start_date="2020-01-01",
    end_date="2024-12-31",
    source="semantic_scholar",
    num_results=10
)
PubMed-specific advanced search:
search_papers_advanced(
    title="CAR-T",
    author="Wang",
    journal="Nature",
    start_date="2024/01/01",  # PubMed uses YYYY/MM/DD
    end_date="2025/12/31",
    source="pubmed",
    num_results=10
)
Get Paper Metadata
# PubMed
get_paper_metadata(identifier="40883768", source="pubmed")
# bioRxiv
get_paper_metadata(identifier="10.1101/2024.01.001", source="biorxiv")
# arXiv
get_paper_metadata(identifier="2301.00001", source="arxiv")
# Semantic Scholar (Paper ID or DOI)
get_paper_metadata(identifier="DOI:10.1038/s41586-020-1234-5", source="semantic_scholar")
Download PDF
download_paper_pdf(identifier="2301.00001", source="arxiv")
List Available Sources
list_available_sources()
# Returns: ["pubmed", "biorxiv", "medrxiv", "arxiv", "semantic_scholar", "scihub"]
Deep Paper Analysis
deep_paper_analysis(identifier="40883768", source="pubmed")
🛠 MCP Tools Reference
Server: academic (Basic Search & Retrieval)
1. search_papers
Search for papers using keywords.
Parameters:
- keywords (str): Search query
- source (str): "all", "pubmed", "biorxiv", "medrxiv", "arxiv", "semantic_scholar", or "scihub"
- num_results (int): Number of results per source (default: 10)
2. search_papers_advanced
Advanced search with multiple filters.
Parameters:
- title (str, optional): Search in titles
- author (str, optional): Author name
- journal (str, optional): Journal name
- start_date (str, optional): Start date
- end_date (str, optional): End date
- term (str, optional): General search term
- source (str): Database source
- num_results (int): Number of results
3. get_paper_metadata
Get detailed metadata for a specific paper.
Parameters:
- identifier (str): Paper ID (PMID, DOI, arXiv ID, etc.)
- source (str): Database source
4. download_paper_pdf
Download PDF for a paper.
Parameters:
- identifier (str): Paper ID
- source (str): Database source
5. list_available_sources
List all available databases.
6. deep_paper_analysis
Generate comprehensive analysis prompt.
Parameters:
- identifier (str): Paper ID
- source (str): Database source
Server: academic-research (Advanced Analysis & Research)
1. analyze_citation_network
Analyze paper's citation network.
Parameters:
- paper_id (str): Paper identifier (DOI, PMID, etc.)
- source (str): Data source (default: "semantic_scholar")
- max_depth (int): Network depth, 1-3 layers (default: 2)
2. evaluate_paper_impact
Evaluate academic impact of a paper.
Parameters:
- paper_id (str): Paper identifier
- source (str): Data source (default: "semantic_scholar")
3. recommend_related_papers
Recommend related papers using multiple strategies.
Parameters:
- paper_id (str): Source paper identifier
- source (str): Data source (default: "semantic_scholar")
- num_recommendations (int): Number of recommendations (default: 10)
- strategy (str): "comprehensive", "citations", "similar", or "influential"
4. research_workflow_complete
⭐ Recommended Core Feature - Complete research workflow: retrieve → analyze → read → summarize
Parameters:
- topic (str): Research topic (e.g., "CRISPR gene editing")
- num_papers (int): Number of papers to retrieve (default: 5)
- include_analysis (bool): Include deep analysis (default: true)
- include_summary (bool): Include auto-summary (default: true)
5. analyze_local_paper
Comprehensively analyze local or online PDF papers.
Parameters:
- pdf_path (str): PDF file path (local or URL)
- include_figures (bool): Analyze figures (default: true)
- include_summary (bool): Generate summary (default: true)
6. list_all_figures
List all figures from a PDF paper.
Parameters:
- pdf_path (str): PDF file path (local or URL)
7. explain_specific_figure
Explain a specific figure from a PDF.
Parameters:
- pdf_path (str): PDF file path (local or URL)
- figure_number (int): Figure number (e.g., 1, 2, 3)
- provide_context (bool): Include context paragraphs (default: true)
8. extract_text_from_pdf
Extract text content from PDF (supports both local and online URLs).
Parameters:
- pdf_path (str): PDF path (local or URL)
- extract_sections (bool): Whether to extract by sections
- page_range (tuple, optional): Page range, e.g., (1, 10) for pages 1-10
9. batch_analyze_local_papers
Batch analyze all PDF papers in a folder (local folders only).
Parameters:
- folder_path (str): Folder path
- max_papers (int): Maximum number of papers to analyze (default: 10)
- file_pattern (str): File matching pattern (default: "*.pdf")
10. compare_papers
Compare multiple papers.
Parameters:
- paper_ids (list): List of paper IDs to compare (2-5 papers)
- comparison_aspects (list, optional): Comparison dimensions: "methodology", "findings", "impact", "timeline"
11. extract_key_information
Extract key information from papers.
Parameters:
- paper_id (str): Paper identifier
- source (str): Data source (default: "semantic_scholar")
- info_types (list, optional): List of information types to extract
  - "methodology": Research methods
  - "findings": Main findings
  - "limitations": Study limitations
  - "datasets": Used datasets
  - "metrics": Evaluation metrics
  - "contributions": Main contributions
12. generate_paper_summary
Automatically generate paper summaries.
Parameters:
- paper_id (str): Paper identifier
- source (str): Data source (default: "semantic_scholar")
- summary_type (str): Summary type
  - "brief": Brief summary (100-200 words)
  - "comprehensive": Comprehensive summary (500-800 words)
  - "technical": Technical details summary
  - "layman": Easy-to-understand version
13. extract_pdf_fulltext
Extract full text content from PDF.
Parameters:
- pdf_url (str): PDF file URL
- extract_sections (bool): Whether to identify and extract sections (default: true)
📊 Standardized Output Format
All search results return papers in this standardized format:
{
"id": "Unique identifier (PMID, DOI, arXiv ID, etc.)",
"title": "Paper title",
"authors": "Author names (comma-separated)",
"abstract": "Paper abstract",
"publication_date": "Publication date",
"journal": "Journal or venue name",
"url": "Link to paper",
"pdf_url": "PDF link (if available)",
"source": "Database source (pubmed/biorxiv/arxiv/etc.)"
}
Semantic Scholar results include additional fields:
- citation_count: Number of citations
- reference_count: Number of references
- fields_of_study: Research areas
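One way to keep records from different sources in this shape is to pass them through a single normalizing step. The sketch below is illustrative only: `PaperRecord` and `to_standard_record` are hypothetical names, the empty-string defaults are an assumption, and source-specific extras such as `citation_count` are simply dropped here.

```python
from dataclasses import asdict, dataclass, fields


@dataclass
class PaperRecord:
    """The nine standardized fields; defaults are an assumption of this sketch."""
    id: str = ""
    title: str = ""
    authors: str = ""
    abstract: str = ""
    publication_date: str = ""
    journal: str = ""
    url: str = ""
    pdf_url: str = ""
    source: str = ""


def to_standard_record(raw: dict, source: str) -> dict:
    """Map a raw source-specific dict onto the standardized format."""
    known = {f.name for f in fields(PaperRecord)}
    # Keep only the standardized keys; anything else is discarded.
    data = {key: value for key, value in raw.items() if key in known}
    # Authors may arrive as a list; the standardized format is comma-separated.
    if isinstance(data.get("authors"), (list, tuple)):
        data["authors"] = ", ".join(data["authors"])
    data["source"] = source
    return asdict(PaperRecord(**data))
```

Missing fields come back as empty strings, so callers can rely on every key being present regardless of which database produced the record.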
🔧 Architecture
Dual Server Design
This project provides two complementary MCP servers:
- academic_server.py - Core search and retrieval functionality
- academic_research_advanced.py - Advanced analysis and research workflows
Project Structure
Academic-MCP-Server/
├── academic_server.py # Main MCP server (basic search)
├── academic_research_advanced.py # Advanced research server
├── adapters/ # Database adapters
│ ├── base_adapter.py # Abstract base class
│ ├── pubmed_adapter.py # PubMed wrapper
│ ├── biorxiv_adapter.py # bioRxiv/medRxiv
│ ├── arxiv_adapter.py # arXiv
│ ├── semantic_scholar_adapter.py
│ └── scihub_adapter.py # Sci-Hub
├── utils/ # Helper functions
│ ├── helpers.py # General utilities
│ └── pubmed_utils.py # PubMed-specific utilities
├── requirements.txt # Dependencies
└── README.md / README_CN.md # Documentation
Adapter Pattern
Each database is wrapped in an adapter that implements a common interface, the BaseAdapter abstract base class defined in adapters/base_adapter.py.
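A minimal sketch of what such an interface might look like is given here. Only `search_by_keywords(keywords, num_results)` is taken from this README; `get_metadata`, `download_pdf`, and the toy `InMemoryAdapter` are assumed names used for illustration, and the real `BaseAdapter` may differ.

```python
from abc import ABC, abstractmethod


class BaseAdapter(ABC):
    """Common interface each database adapter implements (illustrative sketch)."""

    @abstractmethod
    def search_by_keywords(self, keywords: str, num_results: int) -> list[dict]:
        """Return up to num_results papers in the standardized format."""

    # Assumed method names, mirroring the get_paper_metadata and
    # download_paper_pdf tools; the project's actual names are not shown here.
    @abstractmethod
    def get_metadata(self, identifier: str) -> dict:
        """Return metadata for one paper."""

    @abstractmethod
    def download_pdf(self, identifier: str) -> str:
        """Download the PDF and return a local path."""


class InMemoryAdapter(BaseAdapter):
    """Toy adapter over a fixed list, only to show the interface in use."""

    def __init__(self, papers):
        self._papers = list(papers)

    def search_by_keywords(self, keywords, num_results):
        needle = keywords.lower()
        hits = [p for p in self._papers if needle in p["title"].lower()]
        return hits[:num_results]

    def get_metadata(self, identifier):
        for paper in self._papers:
            if paper["id"] == identifier:
                return paper
        raise KeyError(identifier)

    def download_pdf(self, identifier):
        raise NotImplementedError("this toy adapter has no PDFs")
```

Because the base class is abstract, an adapter that forgets one of the required methods fails at instantiation time rather than at the first search.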
Adding New Databases
To add a new database:
- Create a new adapter in adapters/
- Inherit from BaseAdapter
- Implement all required methods
- Register in academic_server.py
Example:
# adapters/new_database_adapter.py
from .base_adapter import BaseAdapter


class NewDatabaseAdapter(BaseAdapter):
    def search_by_keywords(self, keywords, num_results):
        # Implementation
        pass

    # ... implement other methods


# In academic_server.py
from adapters.new_database_adapter import NewDatabaseAdapter

adapters = {
    # ... existing adapters
    "new_database": NewDatabaseAdapter()
}
🎯 Use Cases
For Researchers
- Search across multiple preprint servers simultaneously
- Find papers by specific authors or topics
- Download open access papers automatically
- Generate literature review materials
- Analyze local PDF collections
- Perform comprehensive citation network analysis
- Generate automated paper summaries
For AI Assistants
- Access comprehensive academic knowledge
- Provide up-to-date research information
- Help with citation and reference management
- Analyze research trends and findings
- Process and explain figures from academic papers
- Conduct complete research workflows automatically
⚠️ Limitations & Notes
API Rate Limits
- PubMed: No API key required, but rate-limited
- bioRxiv/medRxiv: No authentication required
- arXiv: Rate-limited (1 request per 3 seconds recommended)
- Semantic Scholar: Free tier has rate limits; get API key for higher limits at https://www.semanticscholar.org/product/api
- Sci-Hub: No authentication required; use responsibly
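A client that calls these sources in a loop may want to space its requests out. The following is a small sketch of a minimum-interval limiter, using the 3-second arXiv recommendation above as the example value. `MinIntervalLimiter` is a hypothetical class and not part of this project; the `clock` and `sleep` parameters exist only so the behaviour can be tested without waiting.

```python
import time


class MinIntervalLimiter:
    """Enforce a minimum delay between calls (illustrative sketch)."""

    def __init__(self, min_interval: float, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self) -> float:
        """Block until the interval has elapsed; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            # Time still owed since the previous request, never negative.
            delay = max(0.0, self.min_interval - (now - self._last))
            if delay > 0:
                self._sleep(delay)
        self._last = now + delay
        return delay


# Usage: call wait() immediately before each request to the source.
arxiv_limiter = MinIntervalLimiter(3.0)
arxiv_limiter.wait()  # first call returns immediately
```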
PDF Availability
- PubMed: Only PMC open access articles
- bioRxiv/medRxiv: All articles are open access
- arXiv: All articles are open access
- Semantic Scholar: Depends on publisher policies
- Sci-Hub: Wide coverage of academic papers (use for research purposes only)
Local PDF Support
- Full text extraction: Extract complete text from local or online PDFs
- Figure analysis: List and explain figures from PDF papers
- Section parsing: Automatically identify and extract paper sections
- Batch processing: Analyze multiple PDFs in a folder simultaneously
Date Formats
- PubMed: YYYY/MM/DD
- Others: YYYY-MM-DD
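When the same date range is sent to several sources, a small helper can pick the right format per source. This is a sketch under the two rules listed above; `format_date_for_source` is a hypothetical name and not one of the server's tools.

```python
from datetime import date


def format_date_for_source(d: date, source: str) -> str:
    """Format a date the way the given source expects (hypothetical helper)."""
    if source == "pubmed":
        return d.strftime("%Y/%m/%d")  # PubMed: YYYY/MM/DD
    return d.strftime("%Y-%m-%d")      # all other sources: YYYY-MM-DD


start = date(2024, 1, 1)
print(format_date_for_source(start, "pubmed"))  # 2024/01/01
print(format_date_for_source(start, "arxiv"))   # 2024-01-01
```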
🤝 Contributing
Contributions are welcome! Feel free to:
- Add new database adapters
- Improve existing functionality
- Fix bugs
- Enhance documentation
📄 License
This project builds upon the PubMed-MCP-Server and follows similar open-source principles.
🙏 Acknowledgments
- PubMed-MCP-Server for the original PubMed integration
- NCBI E-utilities
- bioRxiv/medRxiv API
- arXiv API
- Semantic Scholar API
- Sci-Hub MCP Server (JackKuo666/Sci-Hub-MCP-Server)
- FastMCP framework
⚠️ Disclaimer
The Sci-Hub integration is provided for research and educational purposes only. Users are responsible for complying with copyright laws and institutional policies in their jurisdiction. The authors do not endorse or encourage copyright infringement. Please support publishers and authors by obtaining papers through legitimate channels when possible.
📊 Project Statistics
- Supported Databases: 6 (PubMed, bioRxiv, medRxiv, arXiv, Semantic Scholar, Sci-Hub)
- MCP Servers: 2 (academic, academic-research)
- Basic MCP Tools: 6
- Advanced Research Tools: 15+
- Lines of Code: ~3,000
- Supported Formats: PDF, metadata, citations, full-text analysis
- PDF Support: Both local files and online URLs
🚀 Enhanced Features
Advanced Research Capabilities
- Citation Network Analysis: Understand paper relationships and impact
- Automated Summarization: Generate summaries in multiple styles
- Key Information Extraction: Extract methodology, findings, limitations
- Complete Research Workflows: One-click research from topic to summary
PDF Processing
- Local and Online Support: Process PDFs from local storage or URLs
- Figure Explanation: AI-powered figure analysis and explanation
- Section Recognition: Automatic identification of paper sections
- Batch Analysis: Process multiple papers simultaneously
Smart Search Features
- Concurrent Database Search: Search all databases simultaneously
- Intelligent Result Merging: Deduplicate and rank results
- Advanced Filtering: Multi-parameter search with date ranges
- Source-Specific Optimization: Tailored search for each database
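The concurrent search and result merging described above can be sketched with a thread pool. This is an illustration under stated assumptions, not the project's implementation: `search_all` and `_dedup_key` are hypothetical names, duplicates are detected by normalized title only, and no ranking is applied beyond keeping the first occurrence in adapter order.

```python
from concurrent.futures import ThreadPoolExecutor


def _dedup_key(paper: dict) -> str:
    # Assumption: two records describe the same paper if their titles match
    # after lowercasing and dropping non-alphanumeric characters.
    return "".join(ch for ch in paper.get("title", "").lower() if ch.isalnum())


def search_all(adapters: dict, keywords: str, num_results: int) -> list[dict]:
    """Query every adapter in parallel and merge results without duplicates."""
    with ThreadPoolExecutor(max_workers=max(1, len(adapters))) as pool:
        futures = {
            name: pool.submit(adapter.search_by_keywords, keywords, num_results)
            for name, adapter in adapters.items()
        }
    # Leaving the with-block waits for every submitted search to finish.
    merged, seen = [], set()
    for name, future in futures.items():
        try:
            papers = future.result()
        except Exception:
            continue  # one failing source should not sink the whole search
        for paper in papers:
            key = _dedup_key(paper)
            if key and key not in seen:
                seen.add(key)
                merged.append(paper)
    return merged
```

Threads suit this workload because each search spends most of its time waiting on the network. A title-only key will miss duplicates whose titles differ between a preprint and its published version, so matching on DOI where available would be a natural refinement.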
📞 Support
For issues or questions:
- Check the documentation above
- Review error messages in logs
- Ensure all dependencies are installed
- Verify your MCP configuration
Happy researching! 📚🔬