Research Paper Ingestion MCP Server

Enables searching, downloading, and analyzing academic papers from arXiv and Semantic Scholar to extract key insights and citation metrics. It facilitates autonomous knowledge acquisition by processing research findings and integrating them into persistent AI memory systems.

Autonomous knowledge acquisition from academic research papers for AGI self-improvement.

Part of the Agentic System - a 24/7 autonomous AI framework with persistent memory.

Features

Paper Discovery

  • arXiv Integration: Search and download from arXiv.org
  • Semantic Scholar: Citation analysis and academic impact metrics
  • PDF Download: Automatic paper retrieval and storage

Knowledge Extraction

  • Insight Extraction: Identify key findings and contributions
  • Citation Analysis: Understand paper influence and relationships
  • Technique Identification: Extract novel methods and approaches

Memory Integration

  • Enhanced Memory: Store extracted knowledge for AGI learning
  • Structured Entities: Create searchable memory representations
  • Citation Graphs: Track knowledge lineage

Installation

cd ${AGENTIC_SYSTEM_PATH:-/opt/agentic}/agentic-system/mcp-servers/research-paper-mcp
pip install -r requirements.txt

Configuration

Add to ~/.claude.json:

{
  "mcpServers": {
    "research-paper-mcp": {
      "command": "python3",
      "args": [
        "${AGENTIC_SYSTEM_PATH:-/opt/agentic}/agentic-system/mcp-servers/research-paper-mcp/server.py"
      ],
      "env": {},
      "disabled": false
    }
  }
}
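The args path above uses shell-style ${VAR:-default} syntax. Whether that placeholder is expanded depends on the MCP client. If yours passes it through literally, substitute the absolute path. The sketch below resolves the path the same way the expansion would, so you can confirm that server.py is where the configuration expects it. The helper is hypothetical and not part of the server.

```python
import os
from pathlib import Path

DEFAULT_ROOT = "/opt/agentic"


def resolve_server_path(env=None) -> Path:
    """Mimic ${AGENTIC_SYSTEM_PATH:-/opt/agentic}/agentic-system/.../server.py.

    The ':-' form falls back to the default when the variable is unset
    OR set to an empty string, hence the `or` below.
    """
    env = os.environ if env is None else env
    root = env.get("AGENTIC_SYSTEM_PATH") or DEFAULT_ROOT
    return (Path(root) / "agentic-system" / "mcp-servers"
            / "research-paper-mcp" / "server.py")


if __name__ == "__main__":
    path = resolve_server_path()
    print(path, "exists" if path.is_file() else "NOT FOUND")
```

Run it with python3 after exporting AGENTIC_SYSTEM_PATH, or without the variable to check the /opt/agentic default.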

Available Tools

search_arxiv

Search arXiv for research papers by query.

Parameters:

  • query (required): Search query (e.g., "recursive self-improvement AGI")
  • max_results: Maximum results (default: 10)
  • sort_by: Sort order, one of relevance, lastUpdatedDate, or submittedDate

Example:

results = mcp__research-paper-mcp__search_arxiv({
    "query": "meta-learning neural networks",
    "max_results": 20,
    "sort_by": "relevance"
})
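The sketch below shows how a search_arxiv request could map onto the public arXiv API query endpoint. That endpoint's sortBy parameter accepts the same three values listed for sort_by. It is an assumption that the server builds requests this way. It does list the arxiv Python wrapper as a dependency, and that wrapper queries the same endpoint. The function name is hypothetical.

```python
from urllib.parse import urlencode

ARXIV_API = "https://export.arxiv.org/api/query"
SORT_OPTIONS = ("relevance", "lastUpdatedDate", "submittedDate")


def build_arxiv_query_url(query: str, max_results: int = 10,
                          sort_by: str = "relevance") -> str:
    """Build an arXiv API query URL (hypothetical helper, not the server's code)."""
    if sort_by not in SORT_OPTIONS:
        raise ValueError(f"sort_by must be one of {SORT_OPTIONS}")
    if max_results < 1:
        raise ValueError("max_results must be at least 1")
    params = {
        # Passed verbatim, so fielded syntax such as "ti:attention" also works.
        "search_query": query,
        "start": 0,
        "max_results": max_results,
        "sortBy": sort_by,
        "sortOrder": "descending",
    }
    return f"{ARXIV_API}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_arxiv_query_url("meta-learning neural networks", 20, "relevance"))
```

Fetching the URL returns an Atom XML feed. The arXiv API terms ask clients to space out repeated calls (roughly one request every three seconds).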

search_semantic_scholar

Search Semantic Scholar for papers with citation metrics.

Parameters:

  • query (required): Search query
  • fields: List of metadata fields to retrieve (e.g., title, authors, citationCount, year)
  • limit: Maximum results (default: 10)

Example:

results = mcp__research-paper-mcp__search_semantic_scholar({
    "query": "transformer architecture attention",
    "fields": ["title", "authors", "citationCount", "year"],
    "limit": 15
})
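Semantic Scholar's Graph API has a paper search endpoint that takes query, fields, and limit parameters. The tool's arguments plausibly map onto it as sketched below. The mapping and the helper name are assumptions. Note three API details: fields must be a single comma-separated string, limit is capped at 100 per request, and omitting fields returns only paperId and title.

```python
from urllib.parse import urlencode

S2_SEARCH = "https://api.semanticscholar.org/graph/v1/paper/search"


def build_s2_search_url(query: str, fields=None, limit: int = 10) -> str:
    """Build a Semantic Scholar paper-search URL (hypothetical helper)."""
    if not 1 <= limit <= 100:
        raise ValueError("limit must be between 1 and 100")
    params = {"query": query, "limit": limit}
    if fields:
        # The API expects one comma-separated string, not a list;
        # without fields, only paperId and title come back.
        params["fields"] = ",".join(fields)
    return f"{S2_SEARCH}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_s2_search_url("transformer architecture attention",
                              ["title", "authors", "citationCount", "year"], 15))
```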

download_paper

Download a research paper PDF from a URL.

Parameters:

  • url (required): PDF URL
  • paper_id (required): Unique identifier, used as the filename ({paper_id}.pdf)

Example:

result = mcp__research-paper-mcp__download_paper({
    "url": "https://arxiv.org/pdf/1234.5678.pdf",
    "paper_id": "arxiv-1234.5678"
})
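According to the Storage section of this README, PDFs are saved as {paper_id}.pdf under the research-papers directory. The sketch below covers that save step. It assumes the response body has already been fetched as bytes, and it validates both the id and the PDF signature before writing. The helper names are hypothetical. Old-style arXiv identifiers such as hep-th/9901001 contain a slash, so they would need rewriting (for example to hep-th_9901001) before use as a filename.

```python
import re
from pathlib import Path

# Default storage root from the Storage section, with the /opt/agentic default.
PAPERS_DIR = Path("/opt/agentic/agentic-system/research-papers")
SAFE_ID = re.compile(r"[A-Za-z0-9._-]+")


def pdf_path_for(paper_id: str, base_dir=PAPERS_DIR) -> Path:
    """Map paper_id to <base_dir>/<paper_id>.pdf, rejecting ids with path separators."""
    if not SAFE_ID.fullmatch(paper_id):
        raise ValueError(f"unsafe paper_id: {paper_id!r}")
    return Path(base_dir) / f"{paper_id}.pdf"


def looks_like_pdf(data: bytes) -> bool:
    """Cheap sanity check: PDF files normally begin with the %PDF- signature."""
    return data.startswith(b"%PDF-")


def save_pdf(data: bytes, paper_id: str, base_dir=PAPERS_DIR) -> Path:
    """Write already-downloaded bytes to disk (hypothetical helper)."""
    if not looks_like_pdf(data):
        raise ValueError("downloaded content is not a PDF")
    path = pdf_path_for(paper_id, base_dir)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)
    return path
```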

extract_insights

Extract key insights and findings from paper text.

Parameters:

  • paper_text (required): Full paper text or abstract
  • focus_areas: Optional list of areas to focus on (e.g., methodology, results)

Example:

insights = mcp__research-paper-mcp__extract_insights({
    "paper_text": paper_abstract,
    "focus_areas": ["methodology", "results"]
})
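The README does not say how extract_insights identifies findings. The heuristic below is an illustration only: it keeps sentences that contain a cue phrase for one of the requested focus areas. The cue lists and the function are hypothetical. An LLM-based extractor, listed under Future Enhancements, would be a more capable replacement.

```python
import re

# Hypothetical cue phrases per focus area; not taken from the server.
CUES = {
    "contribution": ("we propose", "we introduce", "we present"),
    "results": ("outperform", "achieve", "improve", "state-of-the-art"),
    "methodology": ("we use", "we train", "our method", "our approach"),
}


def extract_insights(paper_text: str, focus_areas=None) -> dict:
    """Keep sentences that contain a cue phrase for one of the focus areas."""
    areas = focus_areas or list(CUES)
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", paper_text.strip())
    insights = []
    for sentence in sentences:
        lowered = sentence.lower()
        if any(cue in lowered for area in areas for cue in CUES.get(area, ())):
            insights.append(sentence)  # each sentence is recorded at most once
    return {"insights": insights}
```

The return shape, a dict with an insights list of strings, matches how the usage workflow reads insights['insights'].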

analyze_citations

Analyze citation relationships and paper influence.

Parameters:

  • paper_id (required): Semantic Scholar or arXiv paper ID
  • depth: Citation graph depth 1-3 (default: 1)

Example:

analysis = mcp__research-paper-mcp__analyze_citations({
    "paper_id": "arxiv:1706.03762",  # "Attention Is All You Need"
    "depth": 2
})
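Conceptually, depth bounds a breadth-first walk outward from the seed paper. The sketch below assumes a hypothetical fetch_citing callback that returns the ids of papers citing a given id. A real implementation might call Semantic Scholar's /graph/v1/paper/{paper_id}/citations endpoint there. Each extra hop multiplies the number of API calls, which is presumably why depth is capped at 3.

```python
from collections import deque


def citation_graph(paper_id, fetch_citing, depth=1):
    """Breadth-first walk of papers citing paper_id, up to `depth` hops.

    fetch_citing(pid) -> list of ids of papers that cite pid (hypothetical hook).
    """
    if depth not in (1, 2, 3):
        raise ValueError("depth must be 1, 2 or 3")
    levels = {paper_id: 0}   # paper id -> hop distance from the seed paper
    edges = []               # (citing, cited) pairs
    queue = deque([paper_id])
    while queue:
        current = queue.popleft()
        if levels[current] == depth:
            continue  # do not expand beyond the requested depth
        for citing in fetch_citing(current):
            edges.append((citing, current))
            if citing not in levels:
                levels[citing] = levels[current] + 1
                queue.append(citing)
    return {"nodes": levels, "edges": edges}
```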

store_paper_knowledge

Store extracted knowledge in the enhanced-memory MCP server for AGI learning.

Parameters:

  • paper_metadata (required): Paper metadata dict
  • insights (required): List of key insights
  • techniques: List of novel techniques

Example:

stored = mcp__research-paper-mcp__store_paper_knowledge({
    "paper_metadata": {
        "id": "arxiv-1234.5678",
        "title": "Novel AGI Approach",
        "authors": ["Smith", "Jones"],
        "year": 2024
    },
    "insights": [
        "Achieves 95% accuracy on benchmark",
        "10x faster than previous methods"
    ],
    "techniques": [
        "Recursive meta-optimization",
        "Self-modifying architectures"
    ]
})
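The sketch below shows one plausible way store_paper_knowledge could shape its arguments into an entity for create_entities. It assumes enhanced-memory-mcp accepts the name/entityType/observations shape used by the reference MCP memory server. That schema, the "paper:" naming scheme, and the function name are all assumptions.

```python
def paper_to_entity(paper_metadata: dict, insights: list, techniques=None) -> dict:
    """Shape paper knowledge as a memory entity (assumed schema)."""
    observations = [
        f"Title: {paper_metadata['title']}",
        f"Authors: {', '.join(paper_metadata.get('authors', []))}",
    ]
    if "year" in paper_metadata:
        observations.append(f"Year: {paper_metadata['year']}")
    # One observation per insight and per technique, prefixed for searchability.
    observations += [f"Insight: {text}" for text in insights]
    observations += [f"Technique: {name}" for name in techniques or []]
    return {
        "name": f"paper:{paper_metadata['id']}",
        "entityType": "research_paper",
        "observations": observations,
    }
```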

Usage Patterns

Autonomous Research Workflow

# 1. Search for relevant papers
arxiv_results = mcp__research-paper-mcp__search_arxiv({
    "query": "recursive self-improvement",
    "max_results": 10
})

# 2. Get citation metrics for each arXiv result
for paper in arxiv_results['papers']:
    scholar_data = mcp__research-paper-mcp__search_semantic_scholar({
        "query": paper['title'],
        "fields": ["title", "citationCount"],
        "limit": 1
    })
    hits = scholar_data.get('papers', [])
    if not hits:
        continue  # no Semantic Scholar match for this title

    # 3. Download high-impact papers (citationCount may be missing or null)
    if (hits[0].get('citationCount') or 0) > 50:
        mcp__research-paper-mcp__download_paper({
            "url": paper['pdf_url'],
            "paper_id": paper['id']
        })

        # 4. Extract and store insights
        insights = mcp__research-paper-mcp__extract_insights({
            "paper_text": paper['abstract']
        })

        mcp__research-paper-mcp__store_paper_knowledge({
            "paper_metadata": paper,
            "insights": insights['insights']
        })
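The workflow treats the first Semantic Scholar hit for a title query as the same paper, but a title search does not guarantee that. One hedged safeguard is to compare normalized titles before trusting citationCount. The helpers below are hypothetical. Looking the paper up by its arXiv identifier instead would avoid the ambiguity, if the server supports that.

```python
import re


def normalize_title(title: str) -> str:
    """Lowercase and collapse punctuation/whitespace for loose comparison."""
    return " ".join(re.sub(r"[^a-z0-9]+", " ", title.lower()).split())


def matching_hit(arxiv_title: str, scholar_papers: list):
    """Return the first Semantic Scholar hit whose title matches, else None."""
    target = normalize_title(arxiv_title)
    for hit in scholar_papers:
        if normalize_title(hit.get("title") or "") == target:
            return hit
    return None
```

With this check in place, raising limit above 1 gives the match a few candidates to choose from.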

Citation Network Analysis

# Analyze citation influence
analysis = mcp__research-paper-mcp__analyze_citations({
    "paper_id": "influential-paper-id",
    "depth": 2
})

# Identify most influential papers in field
if analysis['citation_graph']['influential_citations'] > 100:
    # Download and study this foundational paper
    pass
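The check above tests a single paper against a fixed threshold. To rank a batch of results instead, a sketch like the following could sort by Semantic Scholar's influentialCitationCount field, which must be requested via fields. Treating a missing or null count as zero is a choice made here, not documented server behavior. The function name is hypothetical.

```python
def most_influential(papers: list, top_k: int = 5, min_influential: int = 0) -> list:
    """Return up to top_k paper dicts, highest influentialCitationCount first."""
    def count(p):
        return p.get("influentialCitationCount") or 0  # None/missing -> 0

    eligible = [p for p in papers if count(p) >= min_influential]
    eligible.sort(key=count, reverse=True)
    return eligible[:top_k]
```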

Storage

  • Papers Directory: ${AGENTIC_SYSTEM_PATH:-/opt/agentic}/agentic-system/research-papers/
  • PDFs: Saved as {paper_id}.pdf
  • Memory Integration: Via the create_entities tool of enhanced-memory-mcp

Dependencies

  • arxiv: arXiv API Python wrapper
  • aiohttp: Async HTTP client for the Semantic Scholar API
  • mcp: Model Context Protocol SDK

Future Enhancements

  1. PDF Text Extraction: Parse full paper text from PDFs
  2. Figure/Diagram Analysis: Extract visual insights
  3. Code Repository Links: Find implementation code
  4. Related Papers: Automatic discovery of connected research
  5. Trend Detection: Identify emerging research directions
  6. LLM-Powered Insight Extraction: Use GPT-4 for deeper analysis

Integration with AGI System

This MCP server closes Gap #1 from AGI_GAP_ANALYSIS.md:

Knowledge Acquisition Infrastructure

  • ✓ Research Paper Ingestion (arXiv + Semantic Scholar)
  • ⏳ Video Transcript Processing (separate MCP)
  • ⏳ GitHub Repository Analysis (future)
  • ⏳ Documentation Scraping (future)
  • ⏳ Knowledge Graph Integration (future)

Impact: The system can now autonomously learn from the latest AI research.
