Research Paper Ingestion MCP Server
Enables searching, downloading, and analyzing academic papers from arXiv and Semantic Scholar to extract key insights and citation metrics. It facilitates autonomous knowledge acquisition by processing research findings and integrating them into persistent AI memory systems.
Autonomous knowledge acquisition from academic research papers for AGI self-improvement.
Part of the Agentic System - a 24/7 autonomous AI framework with persistent memory.
Features
Paper Discovery
- arXiv Integration: Search and download from arXiv.org
- Semantic Scholar: Citation analysis and academic impact metrics
- PDF Download: Automatic paper retrieval and storage
Knowledge Extraction
- Insight Extraction: Identify key findings and contributions
- Citation Analysis: Understand paper influence and relationships
- Technique Identification: Extract novel methods and approaches
Memory Integration
- Enhanced Memory: Store extracted knowledge for AGI learning
- Structured Entities: Create searchable memory representations
- Citation Graphs: Track knowledge lineage
Installation
cd ${AGENTIC_SYSTEM_PATH:-/opt/agentic}/agentic-system/mcp-servers/research-paper-mcp
pip install -r requirements.txt
Configuration
Add to ~/.claude.json:
{
  "mcpServers": {
    "research-paper-mcp": {
      "command": "python3",
      "args": [
        "${AGENTIC_SYSTEM_PATH:-/opt/agentic}/agentic-system/mcp-servers/research-paper-mcp/server.py"
      ],
      "env": {},
      "disabled": false
    }
  }
}
Available Tools
search_arxiv
Search arXiv for research papers by query.
Parameters:
- query (required): Search query (e.g., "recursive self-improvement AGI")
- max_results: Maximum results (default: 10)
- sort_by: Sort order, one of relevance, lastUpdatedDate, submittedDate
Example:
results = mcp__research-paper-mcp__search_arxiv({
    "query": "meta-learning neural networks",
    "max_results": 20,
    "sort_by": "relevance"
})
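The three sort_by values are the same strings the public arXiv API accepts for its sortBy parameter, so one plausible translation of the tool's parameters into an API request is sketched below. This is a standard-library illustration, not the server's code (the Dependencies section says the server uses the arxiv wrapper); the helper name build_arxiv_query_url, the all: field prefix, and the descending sort order are assumptions.

```python
from urllib.parse import urlencode

ARXIV_API = "http://export.arxiv.org/api/query"
VALID_SORTS = {"relevance", "lastUpdatedDate", "submittedDate"}


def build_arxiv_query_url(query: str, max_results: int = 10,
                          sort_by: str = "relevance") -> str:
    """Build an arXiv API query URL (hypothetical helper, not part of the server)."""
    if not query.strip():
        raise ValueError("query is required")
    if sort_by not in VALID_SORTS:
        raise ValueError(f"sort_by must be one of {sorted(VALID_SORTS)}")
    if max_results < 1:
        raise ValueError("max_results must be positive")
    params = {
        "search_query": f"all:{query}",  # assumption: search across all fields
        "start": 0,
        "max_results": max_results,
        "sortBy": sort_by,
        "sortOrder": "descending",  # assumption: best or newest first
    }
    return f"{ARXIV_API}?{urlencode(params)}"


url = build_arxiv_query_url("meta-learning neural networks", 20, "relevance")
```

Validating sort_by before the request is made turns a typo into an immediate ValueError instead of an API error.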
search_semantic_scholar
Search Semantic Scholar for papers with citation metrics.
Parameters:
- query (required): Search query
- fields: Metadata fields to retrieve
- limit: Maximum results (default: 10)
Example:
results = mcp__research-paper-mcp__search_semantic_scholar({
    "query": "transformer architecture attention",
    "fields": ["title", "authors", "citationCount", "year"],
    "limit": 15
})
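For orientation, the sketch below shows how these parameters could map onto the Semantic Scholar Graph API paper-search endpoint, which takes fields as a comma-separated string and returns matches under a data key. The helper names build_s2_search_url and most_cited are hypothetical, the sample response is hand-written, and the tool's own return shape may differ from the raw API.

```python
from urllib.parse import urlencode

S2_SEARCH = "https://api.semanticscholar.org/graph/v1/paper/search"


def build_s2_search_url(query, fields=None, limit=10):
    """Build a Semantic Scholar paper-search URL (hypothetical helper)."""
    params = {"query": query, "limit": limit}
    if fields:
        params["fields"] = ",".join(fields)  # the API expects a comma-separated list
    return f"{S2_SEARCH}?{urlencode(params)}"


def most_cited(response: dict) -> list:
    """Sort the papers in a raw search response by citationCount, highest first."""
    papers = response.get("data", [])
    # A missing or null citationCount is treated as zero.
    return sorted(papers, key=lambda p: p.get("citationCount") or 0, reverse=True)


url = build_s2_search_url(
    "transformer architecture attention",
    fields=["title", "authors", "citationCount", "year"],
    limit=15,
)

# Hand-written stand-in for a response body; not real API output.
sample = {"data": [
    {"title": "Paper A", "citationCount": 12},
    {"title": "Paper B", "citationCount": None},
    {"title": "Paper C", "citationCount": 340},
]}
ranked = [p["title"] for p in most_cited(sample)]
```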
download_paper
Download research paper PDF from URL.
Parameters:
- url (required): PDF URL
- paper_id (required): Unique identifier used as the filename
Example:
result = mcp__research-paper-mcp__download_paper({
    "url": "https://arxiv.org/pdf/1234.5678.pdf",
    "paper_id": "arxiv-1234.5678"
})
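Because paper_id becomes part of a filename, a download step usually wants two checks: that the ID cannot escape the papers directory, and that the bytes received are a PDF at all. The sketch below shows both, assuming the bytes have already been fetched; save_pdf is a hypothetical helper and the server's actual handling is not documented here.

```python
import re
import tempfile
from pathlib import Path


def save_pdf(data: bytes, paper_id: str, papers_dir: Path) -> Path:
    """Write PDF bytes to <papers_dir>/<paper_id>.pdf (hypothetical helper)."""
    # Keep only filename-safe characters, so an ID such as "../../etc/passwd"
    # cannot point outside the papers directory.
    safe_id = re.sub(r"[^A-Za-z0-9._-]", "_", paper_id).strip(".")
    if not safe_id:
        raise ValueError("paper_id has no usable characters")
    # PDF files start with the marker "%PDF-"; reject anything else,
    # for example an HTML error page served in place of the paper.
    if not data.startswith(b"%PDF-"):
        raise ValueError("downloaded content is not a PDF")
    papers_dir.mkdir(parents=True, exist_ok=True)
    target = papers_dir / f"{safe_id}.pdf"
    target.write_bytes(data)
    return target


with tempfile.TemporaryDirectory() as tmp:
    saved = save_pdf(b"%PDF-1.7\n...", "arxiv-1234.5678", Path(tmp))
    saved_name = saved.name
    saved_exists = saved.exists()
```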
extract_insights
Extract key insights and findings from paper text.
Parameters:
- paper_text (required): Full paper text or abstract
- focus_areas: Optional specific areas to focus on
Example:
insights = mcp__research-paper-mcp__extract_insights({
    "paper_text": paper_abstract,
    "focus_areas": ["methodology", "results"]
})
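The README does not say how insights are selected. Since LLM-powered extraction is listed under Future Enhancements, a simple heuristic is one plausible reading. The sketch below picks sentences that contain cue words for the requested focus areas; the cue lists and the function body are illustrative assumptions, not the server's rules.

```python
import re

# Hypothetical cue words per focus area; the server's real rules are not documented.
FOCUS_CUES = {
    "methodology": ("we propose", "we introduce", "method", "approach", "algorithm"),
    "results": ("achieve", "outperform", "improve", "accuracy", "state-of-the-art"),
}


def extract_insights(paper_text: str, focus_areas=None) -> dict:
    """Return the sentences that contain a cue word for any requested focus area."""
    areas = focus_areas or list(FOCUS_CUES)
    cues = [cue for area in areas for cue in FOCUS_CUES.get(area, ())]
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", paper_text.strip())
    insights = [s for s in sentences if any(cue in s.lower() for cue in cues)]
    return {"insights": insights}


abstract = (
    "Sequence models are widely used. "
    "We propose a new attention-based architecture. "
    "It outperforms prior models on two translation tasks."
)
found = extract_insights(abstract, focus_areas=["methodology", "results"])
```

The return value uses an insights key so that it lines up with how the workflow under Usage Patterns reads the result.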
analyze_citations
Analyze citation relationships and paper influence.
Parameters:
- paper_id (required): Semantic Scholar or arXiv paper ID
- depth: Citation graph depth, 1-3 (default: 1)
Example:
analysis = mcp__research-paper-mcp__analyze_citations({
    "paper_id": "arxiv:1706.03762",  # "Attention Is All You Need"
    "depth": 2
})
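The depth parameter can be read as a hop limit on a breadth-first walk of the citation graph. The sketch below illustrates that reading on an in-memory toy graph; citation_neighborhood and the graph layout (each paper mapped to the papers citing it) are assumptions, and the real tool presumably fetches these edges from Semantic Scholar.

```python
from collections import deque


def citation_neighborhood(graph: dict, paper_id: str, depth: int = 1) -> dict:
    """Breadth-first walk over citing papers, up to `depth` hops from the root.

    Returns a mapping of paper ID to its hop distance from `paper_id`.
    """
    if not 1 <= depth <= 3:
        raise ValueError("depth must be between 1 and 3")
    levels = {paper_id: 0}
    queue = deque([paper_id])
    while queue:
        current = queue.popleft()
        if levels[current] == depth:
            continue  # do not expand beyond the requested depth
        for citing in graph.get(current, []):
            if citing not in levels:  # first visit gives the shortest hop count
                levels[citing] = levels[current] + 1
                queue.append(citing)
    del levels[paper_id]  # report only the neighbors, not the root itself
    return levels


# Toy data: each key maps to the papers that cite it (IDs are made up).
toy_graph = {
    "root": ["a", "b"],
    "a": ["c"],
    "b": ["c", "d"],
    "c": ["e"],
}
one_hop = citation_neighborhood(toy_graph, "root", depth=1)
two_hops = citation_neighborhood(toy_graph, "root", depth=2)
```

Capping depth at 3 keeps the walk bounded, since the number of papers reached can grow quickly with each extra hop.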
store_paper_knowledge
Store extracted knowledge in enhanced-memory for AGI learning.
Parameters:
- paper_metadata (required): Paper metadata dict
- insights (required): List of key insights
- techniques: List of novel techniques
Example:
stored = mcp__research-paper-mcp__store_paper_knowledge({
    "paper_metadata": {
        "id": "arxiv-1234.5678",
        "title": "Novel AGI Approach",
        "authors": ["Smith", "Jones"],
        "year": 2024
    },
    "insights": [
        "Achieves 95% accuracy on benchmark",
        "10x faster than previous methods"
    ],
    "techniques": [
        "Recursive meta-optimization",
        "Self-modifying architectures"
    ]
})
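The Storage section says knowledge is written through enhanced-memory-mcp create_entities, but the entity schema is not given. The sketch below assumes the name / entityType / observations shape used by the reference MCP memory server; that shape, the "paper:" name prefix, and the helper build_paper_entity are all assumptions to check against enhanced-memory-mcp before relying on them.

```python
def build_paper_entity(paper_metadata: dict, insights: list, techniques=None) -> dict:
    """Flatten paper knowledge into one entity dict (assumed schema, see above)."""
    for key in ("id", "title"):
        if not paper_metadata.get(key):
            raise ValueError(f"paper_metadata is missing '{key}'")
    if not insights:
        raise ValueError("at least one insight is required")
    # Each fact becomes one prefixed observation string, so it stays searchable.
    observations = [f"Title: {paper_metadata['title']}"]
    if paper_metadata.get("authors"):
        observations.append("Authors: " + ", ".join(paper_metadata["authors"]))
    if paper_metadata.get("year"):
        observations.append(f"Year: {paper_metadata['year']}")
    observations += [f"Insight: {text}" for text in insights]
    observations += [f"Technique: {name}" for name in (techniques or [])]
    return {
        "name": f"paper:{paper_metadata['id']}",  # assumed naming convention
        "entityType": "research_paper",           # assumed type label
        "observations": observations,
    }


entity = build_paper_entity(
    {"id": "arxiv-1234.5678", "title": "Novel AGI Approach",
     "authors": ["Smith", "Jones"], "year": 2024},
    insights=["Achieves 95% accuracy on benchmark"],
    techniques=["Recursive meta-optimization"],
)
```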
Usage Patterns
Autonomous Research Workflow
# 1. Search for relevant papers
arxiv_results = mcp__research-paper-mcp__search_arxiv({
    "query": "recursive self-improvement",
    "max_results": 10
})
# 2. Get citation metrics
for paper in arxiv_results['papers']:
    scholar_data = mcp__research-paper-mcp__search_semantic_scholar({
        "query": paper['title'],
        "limit": 1
    })
    # 3. Download high-impact papers (skip papers with no Semantic Scholar match)
    if scholar_data['papers'] and scholar_data['papers'][0]['citationCount'] > 50:
        pdf = mcp__research-paper-mcp__download_paper({
            "url": paper['pdf_url'],
            "paper_id": paper['id']
        })
        # 4. Extract and store insights
        insights = mcp__research-paper-mcp__extract_insights({
            "paper_text": paper['abstract']
        })
        mcp__research-paper-mcp__store_paper_knowledge({
            "paper_metadata": paper,
            "insights": insights['insights']
        })
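The mcp__research-paper-mcp__... names above are tool-call notation rather than importable Python functions (the hyphens alone rule that out). To exercise the same workflow as ordinary Python, one option is to route every call through a single dispatcher. In the sketch below, call_tool(name, arguments) is a hypothetical stand-in for whatever your MCP client exposes, and fake_call_tool returns canned data so the example runs without a server.

```python
def run_research_workflow(call_tool, query: str, min_citations: int = 50) -> list:
    """Search, check citations, download, extract, store; return stored paper IDs."""
    stored = []
    found = call_tool("search_arxiv", {"query": query, "max_results": 10})
    for paper in found.get("papers", []):
        metrics = call_tool("search_semantic_scholar",
                            {"query": paper["title"], "limit": 1})
        matches = metrics.get("papers", [])
        # Skip papers with no Semantic Scholar match or too few citations.
        if not matches or (matches[0].get("citationCount") or 0) <= min_citations:
            continue
        call_tool("download_paper", {"url": paper["pdf_url"], "paper_id": paper["id"]})
        insights = call_tool("extract_insights", {"paper_text": paper["abstract"]})
        call_tool("store_paper_knowledge",
                  {"paper_metadata": paper, "insights": insights["insights"]})
        stored.append(paper["id"])
    return stored


def fake_call_tool(name, arguments):
    """Stand-in dispatcher with canned responses (made-up papers)."""
    if name == "search_arxiv":
        return {"papers": [
            {"id": "p1", "title": "Cited Paper",
             "pdf_url": "https://example.org/p1.pdf", "abstract": "We propose X."},
            {"id": "p2", "title": "Obscure Paper",
             "pdf_url": "https://example.org/p2.pdf", "abstract": "We study Y."},
        ]}
    if name == "search_semantic_scholar":
        count = 120 if arguments["query"] == "Cited Paper" else 3
        return {"papers": [{"citationCount": count}]}
    if name == "extract_insights":
        return {"insights": [arguments["paper_text"]]}
    return {"ok": True}


stored_ids = run_research_workflow(fake_call_tool, "recursive self-improvement")
```

Passing the dispatcher in as an argument keeps the workflow testable: the same function runs against a real client or a stub.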
Citation Network Analysis
# Analyze citation influence
analysis = mcp__research-paper-mcp__analyze_citations({
    "paper_id": "influential-paper-id",
    "depth": 2
})
# Identify most influential papers in field
if analysis['citation_graph']['influential_citations'] > 100:
    # Download and study this foundational paper
    pass
Storage
- Papers Directory: ${AGENTIC_SYSTEM_PATH:-/opt/agentic}/agentic-system/research-papers/
- PDFs: Saved as {paper_id}.pdf
- Memory Integration: Via enhanced-memory-mcp create_entities
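The ${AGENTIC_SYSTEM_PATH:-/opt/agentic} notation is shell parameter expansion: use the variable if it is set and non-empty, otherwise fall back to /opt/agentic. A Python equivalent is sketched below. The helper names papers_directory and pdf_path are hypothetical, and whether server.py resolves the path exactly this way is an assumption.

```python
import os
from pathlib import Path


def papers_directory(env=None) -> Path:
    """Resolve the papers directory like ${AGENTIC_SYSTEM_PATH:-/opt/agentic}."""
    env = os.environ if env is None else env
    # Shell ":-" semantics: fall back when the variable is unset OR empty.
    root = env.get("AGENTIC_SYSTEM_PATH") or "/opt/agentic"
    return Path(root) / "agentic-system" / "research-papers"


def pdf_path(paper_id: str, env=None) -> Path:
    """Location of a stored paper, following the {paper_id}.pdf convention."""
    return papers_directory(env) / f"{paper_id}.pdf"


default_path = pdf_path("arxiv-1234.5678", env={})
custom_path = pdf_path("arxiv-1234.5678", env={"AGENTIC_SYSTEM_PATH": "/srv/agents"})
```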
Dependencies
- arxiv: arXiv API Python wrapper
- aiohttp: Async HTTP client for Semantic Scholar API
- mcp: Model Context Protocol SDK
Future Enhancements
- PDF Text Extraction: Parse full paper text from PDFs
- Figure/Diagram Analysis: Extract visual insights
- Code Repository Links: Find implementation code
- Related Papers: Automatic discovery of connected research
- Trend Detection: Identify emerging research directions
- LLM-Powered Insight Extraction: Use GPT-4 for deeper analysis
Integration with AGI System
This MCP server delivers the first item of Gap #1 from AGI_GAP_ANALYSIS.md:
Knowledge Acquisition Infrastructure ✅
- ✓ Research Paper Ingestion (arXiv + Semantic Scholar)
- ⏳ Video Transcript Processing (separate MCP)
- ⏳ GitHub Repository Analysis (future)
- ⏳ Documentation Scraping (future)
- ⏳ Knowledge Graph Integration (future)
Impact: System can now autonomously learn from the latest AI research!