PDF Figures MCP Server

Extracts figures and tables from PDF documents via a FastAPI service, wrapping PDFFigures 2.0. Enables AI agents to programmatically retrieve structured figure and table data from scholarly PDFs.

Figure Extractor API and MCP Server

Extract figures and tables from PDF documents using this FastAPI-based service. The Figure Extractor API and MCP Server provides a straightforward HTTP interface for PDFFigures 2.0, a robust figure extraction system developed by the Allen Institute for AI.

This API wrapper makes PDFFigures 2.0 easy to integrate into applications and workflows, particularly Retrieval-Augmented Generation (RAG) applications.

The MCP Server, powered by FastMCP, exposes the PDF extraction functionality as an MCP tool, so AI agents and workflows can call the extraction service automatically.

  • The server hosts the extract_figures_from_pdf tool, which can be invoked via an HTTP request to the /mcp endpoint. This tool takes a PDF URL, processes the document, and returns the extracted figures and tables in a structured JSON format.

The default MCP Server URL is http://localhost:5001/mcp

<img src="./doc/mcp.png" alt="mcp_tool_use" style="zoom:67%;" />

About PDFFigures 2.0

This API service is built on top of PDFFigures 2.0, a Scala-based project by the Allen Institute for AI. PDFFigures 2.0 is designed to extract figures, captions, tables, and section titles from scholarly documents in the computer science domain. The original work is described in the paper "PDFFigures 2.0: Mining Figures from Research Papers" (Clark and Divvala, 2016). You can read the paper here and visit the PDFFigures 2.0 website.

┌─────────────────┐      ┌──────────────────┐      ┌────────────────┐
│   Your App      │ HTTP │ Figure Extractor │ exec │  PDFFigures    │
│  (Any Language) │──────►  API & MCP Server│──────►     2.0        │
│                 │      │  Python(FastAPI) │      │  (Scala/JVM)   │
└─────────────────┘      └──────────────────┘      └────────────────┘

Features

  • PDF figure and table extraction
  • Support for local and remote PDF files
  • Statistics of the extracted tables and figures
  • Docker support for easy deployment
  • Visualization options for PDF parsing

FastMCP Tool

This project now includes a FastMCP tool that allows calling the PDF extraction service programmatically. The extract_figures_from_pdf tool can be used to extract figures from a PDF given a URL.
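
As a rough illustration of what a tool invocation carries, the sketch below builds the JSON-RPC 2.0 `tools/call` message that MCP defines for calling a tool. The argument name `pdf_url` is an assumption borrowed from the HTTP API's form field, and the helper `build_tool_call` is hypothetical. A real MCP client would also perform the protocol's initialization handshake before sending this message, which an MCP client library normally handles for you.

```python
import json

MCP_URL = "http://localhost:5001/mcp"  # default MCP endpoint from this README


def build_tool_call(pdf_url: str, request_id: int = 1) -> dict:
    """Build a JSON-RPC 2.0 'tools/call' message for extract_figures_from_pdf."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "extract_figures_from_pdf",
            # Assumption: the tool's single argument is named 'pdf_url'.
            "arguments": {"pdf_url": pdf_url},
        },
    }


if __name__ == "__main__":
    message = build_tool_call("https://example.com/sample.pdf")
    print(json.dumps(message, indent=2))
```

The message is only constructed here, not sent, so the sketch runs without a server.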

Use Cases

  1. Machine Learning Dataset Creation: Extract visual data from clinical trial reports and research papers to build training datasets for medical image analysis and AI models, enabling researchers to efficiently aggregate figures for training machine learning algorithms in healthcare diagnostics.

  2. Clinical Research Data Mining: Automatically extract and catalog figures from medical research articles, capturing key visualizations like treatment effect graphs, patient outcome charts, and experimental result diagrams to support systematic reviews and meta-analysis.

  3. Academic Literature Review and Education: Quickly compile comprehensive visual libraries from academic publications, allowing researchers and educators to create teaching resources, compare research methodologies, and track visual trends across scientific disciplines.

Docker Deployment

Build and start the extraction server:

docker build -t pdffigures2 .
docker run -d -p 5001:5001 pdffigures2

The image is ~286MB — optimized via Alpine base, jlink minimal JRE, and pip cleanup:

Metric           Original  Optimized  Change
Base Image       slim      alpine     -70MB
JRE              apt       jlink      +42MB
pip Packages     Original  Cleaned    -24MB
System Packages  apt       apk        -12MB
Image Size       452MB     286MB      -166MB (-37%)

Test the API

  • Open http://localhost:5001/docs to view the API documentation.

  • Use curl to test extraction from a PDF URL:

    curl -X POST http://localhost:5001/api/extract \
      -F "pdf_url=https://example.com/sample.pdf"
    

Agent Skill (non-MCP)

An API client that calls the extraction server and downloads rendered figures locally. Designed for agent environments without MCP support.

# Copy config and point to the server
cp .env.example .env
# (edit .env if the server is remote)

# Run extraction against a local PDF
skills/pdffigures2/scripts/pdffigures2 paper.pdf -o ./extracted/

The script POSTs the PDF to PDFFIGURES2_API_URL, downloads each rendered figure, and prints a structured JSON summary to stdout. See skills/pdffigures2/SKILL.md for the full agent-facing documentation.
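
The download step described above could look roughly like the following sketch. It is not the skill's actual code: the helper names `api_url` and `plan_downloads` are hypothetical, and the fallback endpoint is an assumption that mirrors the default used by figure_extractor.py. The sketch only reads the configured endpoint and maps each `renderURL` to a local file path; it performs no network calls.

```python
import os
from pathlib import Path
from urllib.parse import urlparse

# Assumption: fall back to the same default endpoint as figure_extractor.py.
DEFAULT_API_URL = "http://localhost:5001/api/extract"


def api_url() -> str:
    """Return the endpoint from PDFFIGURES2_API_URL, or the assumed default."""
    return os.environ.get("PDFFIGURES2_API_URL", DEFAULT_API_URL)


def plan_downloads(figures: list, out_dir: str) -> list:
    """Pair each figure's renderURL with a target path inside out_dir."""
    plan = []
    for fig in figures:
        url = fig["renderURL"]
        # Keep only the file name from the URL path, e.g. 4-Figure3-1.png.
        filename = Path(urlparse(url).path).name
        plan.append((url, Path(out_dir) / filename))
    return plan


# One entry in the shape of the API's JSON response.
figures = [{"renderURL": "http://localhost:5001/resources/4-Figure3-1.png"}]
downloads = plan_downloads(figures, "./extracted/")
```

Each `(url, path)` pair can then be fetched with any HTTP client and written to `path`.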

Usage

Extract Figures and Tables from a PDF

The /api/extract endpoint now supports both PDF file uploads and PDF URLs.

Using the API

You can send a POST request to /api/extract with either:

  • A file (multipart/form-data) containing the PDF.
  • A pdf_url (form-data) containing the URL of the PDF.

The API will return a JSON response with extracted figures and tables, including full renderURL paths.
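
The two request forms can be sketched in Python with only the standard library. This is an illustration under assumptions rather than the project's client: the upload field is assumed to be named `file`, the helpers `encode_multipart` and `build_request` are hypothetical, and the PDF bytes are a placeholder. The requests are built but not sent, so the sketch runs without a server.

```python
import urllib.request
import uuid
from typing import Optional

API_URL = "http://localhost:5001/api/extract"  # endpoint named in this README


def encode_multipart(fields: dict, pdf: Optional[tuple] = None) -> tuple:
    """Encode form fields and an optional (filename, bytes) PDF as multipart/form-data.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        head = f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n'
        parts.append(f"{head}{value}\r\n".encode())
    if pdf is not None:
        filename, data = pdf
        # Assumption: the server expects the upload under the field name 'file'.
        head = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
            "Content-Type: application/pdf\r\n\r\n"
        )
        parts.append(head.encode() + data + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_request(fields: dict, pdf: Optional[tuple] = None) -> urllib.request.Request:
    """Build (but do not send) a POST request to /api/extract."""
    body, content_type = encode_multipart(fields, pdf)
    return urllib.request.Request(
        API_URL, data=body, headers={"Content-Type": content_type}, method="POST"
    )


# Variant 1: let the server fetch the PDF from a URL.
by_url = build_request({"pdf_url": "https://example.com/sample.pdf"})

# Variant 2: upload the PDF bytes directly (placeholder bytes, not a real PDF).
by_file = build_request({}, pdf=("paper.pdf", b"%PDF-1.4 placeholder"))
```

Passing either request to `urllib.request.urlopen` would send it to a running server.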

A JSON response example (imageText truncated for brevity):

[
  {
    "caption": "TABLE III CMAPSS DATASET ATTRIBUTES",
    "captionBoundary": {
      "x1": 113.12599182128906,
      "x2": 225.1184844970703,
      "y1": 116.80506896972656,
      "y2": 130.57305908203125
    },
    "figType": "Table",
    "imageText": [
      "Required", "fan", "conversion", "speed", "rpm", "High-pressure", "turbines", "cool", "air", "flow",
      "lbm/s", "Low-pressure", "turbines", "cool", "air", "flow", "lbm/s", "Bleed", "enthalpy", "-", "Required"
    ],
    "name": "III",
    "page": 5,
    "regionBoundary": {
      "x1": 46.8,
      "x2": 291.12,
      "y1": 140.88,
      "y2": 385.91999999999996
    },
    "renderDpi": 300,
    "renderURL": "http://localhost:5001/resources/4-TableIII-1.png"
  },
  {
    "caption": "Fig. 3. Agentic AI implementation with Google ADK.",
    "captionBoundary": {
      "x1": 335.4129943847656,
      "x2": 516.9002685546875,
      "y1": 228.6050567626953,
      "y2": 233.40704345703125
    },
    "figType": "Figure",
    "imageText": [],
    "name": "3",
    "page": 5,
    "regionBoundary": {
      "x1": 302.88,
      "x2": 549.12,
      "y1": 86.88,
      "y2": 216
    },
    "renderDpi": 300,
    "renderURL": "http://localhost:5001/resources/4-Figure3-1.png"
  },
  {
    "caption": "TABLE I COMPARISON BETWEEN AI AGENTS AND AGENTIC AI",
    "captionBoundary": {
      "x1": 204.9189910888672,
      "x2": 390.3569641113281,
      "y1": 54.10902404785156,
      "y2": 67.87701416015625
    },
    "figType": "Table",
    "imageText": [
      "pert", "systems", "LLM-based", "agents,", "multi-agent", "coordination,", "intent-based", "workflows", "Task",
      "Scope", "Focused", "on", "short-term,", "well-defined", "tasks", "Oriented", "toward", "long-term,", "dynamic,"
    ],
    "name": "I",
    "page": 2,
    "regionBoundary": {
      "x1": 51.839999999999996,
      "x2": 543.12,
      "y1": 77.75999999999999,
      "y2": 217.92
    },
    "renderDpi": 300,
    "renderURL": "http://localhost:5001/resources/4-TableI-1.png"
  },
  {
    "caption": "Fig. 1. Traditional AI Agent vs. Agentic AI",
    "captionBoundary": {
      "x1": 224.30499267578125,
      "x2": 370.9708251953125,
      "y1": 448.1660461425781,
      "y2": 452.968017578125
    },
    "figType": "Figure",
    "imageText": [],
    "name": "1",
    "page": 2,
    "regionBoundary": {
      "x1": 45.839999999999996,
      "x2": 549.12,
      "y1": 230.88,
      "y2": 436.08
    },
    "renderDpi": 300,
    "renderURL": "http://localhost:5001/resources/4-Figure1-1.png"
  },
  {
    "caption": "TABLE IV SUMMARY OF ENGINE MAINTENANCE ACTIONS",
    "captionBoundary": {
      "x1": 215.99301147460938,
      "x2": 379.2913513183594,
      "y1": 54.10902404785156,
      "y2": 67.87701416015625
    },
    "figType": "Table",
    "imageText": [
      "#", "Engines", "RUL", "Range", "Recommended", "Action", "Priority", "Cost", "(USD)", "Labor", "Hours", "Assigned",
      "Staff", "Scheduled", "Time", "15", "82–124", "MONITOR", "low", "0", "0", "[jr", "mechanic]", "Within", "7", "days"
    ],
    "name": "IV",
    "page": 7,
    "regionBoundary": {
      "x1": 48.96,
      "x2": 549.12,
      "y1": 77.75999999999999,
      "y2": 135.12
    },
    "renderDpi": 300,
    "renderURL": "http://localhost:5001/resources/4-TableIV-1.png"
  },
  {
    "caption": "TABLE II KEY COMPONENTS OF INTENTION PROCESSING",
    "captionBoundary": {
      "x1": 345.09295654296875,
      "x2": 507.22100830078125,
      "y1": 417.8482666015625,
      "y2": 431.6162414550781
    },
    "figType": "Table",
    "imageText": [
      "Targets", "Specify", "the", "resources", "or", "entities", "to", "which", "the", "intent", "applies.", "Can", "be",
      "defined", "statically", "(explicit", "list)", "or", "dynamically", "(using", "filters", "or", "criteria).", "Context"
    ],
    "name": "II",
    "page": 3,
    "regionBoundary": {
      "x1": 302.88,
      "x2": 549.12,
      "y1": 441.84,
      "y2": 677.04
    },
    "renderDpi": 300,
    "renderURL": "http://localhost:5001/resources/4-TableII-1.png"
  },
  {
    "caption": "Fig. 2. Proposed framework for Industry 5.0 applying intent-based and Agentic AI.",
    "captionBoundary": {
      "x1": 159.51600646972656,
      "x2": 435.7596435546875,
      "y1": 288.3310241699219,
      "y2": 293.13299560546875
    },
    "figType": "Figure",
    "imageText": [],
    "name": "2",
    "page": 4,
    "regionBoundary": {
      "x1": 45.839999999999996,
      "x2": 549.12,
      "y1": 49.68,
      "y2": 276
    },
    "renderDpi": 300,
    "renderURL": "http://localhost:5001/resources/4-Figure2-1.png"
  }
]
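
A response like the one above can be post-processed with a few lines of Python. The sketch below counts items by `figType`, groups them by page, and estimates each rendered image's pixel size. The helper names are hypothetical, and the size estimate assumes that boundary coordinates are in PDF points (72 per inch), which this README does not state. The sample data is trimmed from the response above.

```python
from collections import Counter

POINTS_PER_INCH = 72  # assumption: boundaries are expressed in PDF points


def summarize(figures: list) -> dict:
    """Count items by figType and list their labels per page."""
    by_type = Counter(fig["figType"] for fig in figures)
    by_page = {}
    for fig in figures:
        label = f'{fig["figType"]} {fig["name"]}'
        by_page.setdefault(fig["page"], []).append(label)
    return {"by_type": dict(by_type), "by_page": by_page}


def rendered_size(fig: dict) -> tuple:
    """Estimate the rendered image size in pixels from regionBoundary and renderDpi."""
    box = fig["regionBoundary"]
    scale = fig["renderDpi"] / POINTS_PER_INCH
    width = round((box["x2"] - box["x1"]) * scale)
    height = round((box["y2"] - box["y1"]) * scale)
    return width, height


# Three entries trimmed from the example response.
sample = [
    {"figType": "Table", "name": "III", "page": 5, "renderDpi": 300,
     "regionBoundary": {"x1": 46.8, "x2": 291.12, "y1": 140.88, "y2": 385.91999999999996}},
    {"figType": "Figure", "name": "3", "page": 5, "renderDpi": 300,
     "regionBoundary": {"x1": 302.88, "x2": 549.12, "y1": 86.88, "y2": 216}},
    {"figType": "Table", "name": "I", "page": 2, "renderDpi": 300,
     "regionBoundary": {"x1": 51.839999999999996, "x2": 543.12, "y1": 77.75999999999999, "y2": 217.92}},
]

summary = summarize(sample)
figure3_size = rendered_size(sample[1])
```

If the points assumption does not hold for your deployment, read the dimensions from the downloaded image instead.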

Developer Setup

Clone the repository if you want to modify the code or run locally without Docker:

git clone https://github.com/vlln/pdffigures-mcp-server.git
cd pdffigures-mcp-server

Testing with figure_extractor.py

figure_extractor.py is a CLI tool for testing extraction against a running API server. It sends a local PDF to the API and downloads the extracted figures:

# Start the server first (via Docker or uvicorn), then:
python figure_extractor.py <path-to-pdf> --output_dir ./output

Options:

  • --output_dir — Directory to save downloaded figures (default: ./output)
  • --url — Custom API endpoint (default: http://localhost:5001/api/extract)
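
For orientation, a command-line interface with these options could be declared with `argparse` as sketched below. This is not the actual source of figure_extractor.py. The option names and defaults come from the list above, while the positional name `pdf_path` and the function `build_parser` are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare a CLI with the two options listed above."""
    parser = argparse.ArgumentParser(
        prog="figure_extractor.py",
        description="Send a local PDF to the extraction API and download the figures.",
    )
    # Hypothetical positional name; the README shows it only as <path-to-pdf>.
    parser.add_argument("pdf_path", help="Path to the local PDF file")
    parser.add_argument(
        "--output_dir",
        default="./output",
        help="Directory to save downloaded figures",
    )
    parser.add_argument(
        "--url",
        default="http://localhost:5001/api/extract",
        help="Custom API endpoint",
    )
    return parser


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args(["paper.pdf", "--output_dir", "./figs"])
```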

App Structure

project/
├── Dockerfile                # Multi-stage Alpine image (server + jlink JRE + pdffigures2 JAR)
├── .dockerignore
├── .env.example              # Example config for the Agent Skill
├── app/                      # FastAPI web service code
│   ├── __init__.py
│   ├── app.py                # API endpoints and MCP Server
│   ├── service.py            # Runs pdffigures2 JAR via subprocess
│   └── utils.py              # File I/O helpers
├── skills/                   # Agent Skills (non-MCP alternative)
│   └── pdffigures2/          # API client wrapper for pdffigures2
├── figure_extractor.py       # CLI tool for testing extraction
└── README.md

Acknowledgements

This project is a fork of Huang-lab/figure-extractor. We are grateful to the original authors for their work.

License

This project is licensed under the Apache License 2.0.
