PDF MCP

Enables PDF processing and analysis including text extraction, metadata retrieval, search, page manipulation, splitting/merging, conversion to images, and form handling.

MCP server for PDF processing and analysis using PyPDFium2.

Features

extract_text: Extract text content from PDF files with page range support
extract_metadata: Extract PDF metadata including title, author, and page count
search_text: Search for specific text within PDF files with context
get_page_count: Get the total number of pages in a PDF file
extract_pages: Extract specific pages from a PDF and save as a new PDF
split_pdf: Split a PDF into single-page PDFs, returned as base64-encoded data
merge_pdfs: Merge multiple PDF files into a single PDF
pdf_to_images: Convert PDF pages to PNG images with configurable DPI
get_form_fields: Extract all form fields from a PDF including names, types, and values
fill_form: Fill form fields in a PDF with provided values and save to output path

Installation

From Git Repository

# Clone the repository
git clone https://github.com/gzigurella/pdf-mcp.git
cd pdf-mcp

# Create virtual environment
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install the package
pip install -e .

With uv (recommended)

# Clone and enter directory
git clone https://github.com/gzigurella/pdf-mcp.git
cd pdf-mcp

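# Create a virtual environment and install with uv
uv venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
uv pip install -e .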

Integration

OpenCode

Add to your ~/.config/opencode/opencode.json:

{
  "mcp": {
    "pdf-mcp": {
      "type": "local",
      "command": [
        "/path/to/pdf-mcp/venv/bin/python",
        "-m",
        "pdf_mcp"
      ],
      "enabled": true
    }
  }
}

Claude Desktop

Add to your Claude Desktop config:

macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Windows: %APPDATA%\Claude\claude_desktop_config.json
Linux: ~/.config/Claude/claude_desktop_config.json

{
  "mcpServers": {
    "pdf-mcp": {
      "command": "/path/to/pdf-mcp/venv/bin/python",
      "args": ["-m", "pdf_mcp"]
    }
  }
}

Generic MCP Client

For any MCP-compatible client:

# Start the server directly
/path/to/venv/bin/python -m pdf_mcp

The server communicates via stdio using the MCP protocol.
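
As a rough illustration of what travels over stdio, the sketch below builds a tools/call request for one of the tools in this README. MCP messages are JSON-RPC 2.0 objects; the helper name build_tool_call and the request id are illustrative only, and a real client would first complete the initialize handshake, which is omitted here.

```python
import json


def build_tool_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Serialize one MCP `tools/call` request as a single JSON line."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    # The stdio transport sends one JSON message per line, so the
    # serialized form must not contain embedded newlines.
    return json.dumps(message)


line = build_tool_call(1, "get_page_count", {"file_path": "/path/to/document.pdf"})
print(line)
```

An MCP client library normally handles this framing, so writing requests by hand is mainly useful for debugging.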

Tools

extract_text

Extract text content from a PDF file, either from the whole document or from specific pages and page ranges. Works on PDFs that contain searchable text.

Parameter	Type	Required	Default	Description
file_path	string	Yes	-	Path to the PDF file to extract text from
pages	string	No	"all"	Page range to extract (e.g., '1-5', '3,7,9', 'all')

{
  "file_path": "/path/to/document.pdf",
  "pages": "1-5"
}
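
To make the page range syntax concrete, here is a minimal sketch of how a spec such as '1-5', '3,7,9' or 'all' could be expanded into page numbers. It is not the server's own parser: the function name parse_page_spec, the 1-based numbering, and the error handling are assumptions made for illustration.

```python
def parse_page_spec(spec: str, page_count: int) -> list[int]:
    """Expand a spec such as '1-5', '3,7,9' or 'all' into 1-based page numbers."""
    spec = spec.strip().lower()
    if spec == "all":
        return list(range(1, page_count + 1))

    pages: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            raise ValueError(f"empty entry in page spec {spec!r}")
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        # Assumption: pages are numbered from 1 and ranges are inclusive.
        if start < 1 or end > page_count or start > end:
            raise ValueError(f"page range {part!r} is outside 1-{page_count}")
        pages.update(range(start, end + 1))
    return sorted(pages)


print(parse_page_spec("1-3,7,9", page_count=10))
```

Duplicates are collapsed and the result is sorted, which is one possible policy; the actual tool may preserve the order given.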

extract_metadata

Extract metadata from a PDF file including title, author, subject, keywords, creator, producer, creation date, modification date, and page count.

Parameter	Type	Required	Default	Description
file_path	string	Yes	-	Path to the PDF file to extract metadata from

{
  "file_path": "/path/to/document.pdf"
}

search_text

Search for specific text within a PDF file. Returns page numbers and context around the found text. Useful for finding specific content in large documents.

Parameter	Type	Required	Default	Description
file_path	string	Yes	-	Path to the PDF file to search within
query	string	Yes	-	Text to search for in the PDF
case_sensitive	boolean	No	false	Whether to perform case-sensitive search
context_words	integer	No	10	Number of words to include before and after each match

{
  "file_path": "/path/to/document.pdf",
  "query": "important term",
  "case_sensitive": false,
  "context_words": 5
}
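
The effect of case_sensitive and context_words can be pictured with the sketch below, which searches plain text that has already been extracted. It only illustrates the idea; the function name find_with_context and the way snippets are joined are assumptions, not the server's implementation.

```python
import re


def find_with_context(text: str, query: str, case_sensitive: bool = False,
                      context_words: int = 10) -> list[str]:
    """Return each match of `query` with up to `context_words` words on either side."""
    if not query:
        raise ValueError("query must not be empty")
    flags = 0 if case_sensitive else re.IGNORECASE
    snippets = []
    for match in re.finditer(re.escape(query), text, flags):
        words_before = text[: match.start()].split()
        words_after = text[match.end():].split()
        # Slice from an explicit start index so context_words=0 gives no context.
        before = words_before[max(0, len(words_before) - context_words):]
        after = words_after[:context_words]
        snippets.append(" ".join(before + [match.group(0)] + after))
    return snippets


sample = "The supplier accepts no liability for indirect loss arising from delay."
print(find_with_context(sample, "LIABILITY", context_words=2))
```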

get_page_count

Get the total number of pages in a PDF file. Returns a simple integer count.

Parameter	Type	Required	Default	Description
file_path	string	Yes	-	Path to the PDF file to count pages for

{
  "file_path": "/path/to/document.pdf"
}

extract_pages

Extract specific pages from a PDF file and save as a new PDF. Supports page ranges and individual page selection.

Parameter	Type	Required	Default	Description
file_path	string	Yes	-	Path to the source PDF file
pages	string	Yes	-	Pages to extract (e.g., '1-5', '3,7,9', '1,3-5')
output_path	string	Yes	-	Path where the extracted pages will be saved as a new PDF

{
  "file_path": "/path/to/source.pdf",
  "pages": "1,3,5-7",
  "output_path": "/path/to/output.pdf"
}

split_pdf

Split a PDF file into separate PDF files based on a page range. Returns a JSON object containing one base64-encoded PDF per selected page. Supports single pages, page ranges, and all pages.

Parameter	Type	Required	Default	Description
file_path	string	Yes	-	Path to the PDF file to split
page_range	string	Yes	-	Page range to split - 'all', single page (e.g., '1'), or range (e.g., '1-3', '2-5')

{
  "file_path": "/path/to/document.pdf",
  "page_range": "1-3"
}
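
Because the split pages come back as base64 strings, a client has to decode them before writing files. The sketch below decodes one such string and checks that the result looks like a PDF. The exact shape of the JSON response is not described here, so the sketch deliberately handles a single string; decode_pdf_payload is a hypothetical helper and the sample payload is a stand-in.

```python
import base64


def decode_pdf_payload(encoded: str) -> bytes:
    """Decode a base64 string and check that it looks like a PDF."""
    data = base64.b64decode(encoded, validate=True)
    # A well-formed PDF begins with the "%PDF-" header.
    if not data.startswith(b"%PDF-"):
        raise ValueError("decoded payload does not start with a PDF header")
    return data


# Stand-in payload; a real one would come from the split_pdf response.
sample = base64.b64encode(b"%PDF-1.7\n% placeholder body\n").decode("ascii")
pdf_bytes = decode_pdf_payload(sample)
print(len(pdf_bytes))
```

The decoded bytes can then be written to disk in binary mode, one file per page.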

merge_pdfs

Merge multiple PDF files into a single PDF. Files are merged in the order provided.

Parameter	Type	Required	Default	Description
file_paths	array	Yes	-	List of PDF file paths to merge
output_path	string	Yes	-	Path where the merged PDF will be saved

{
  "file_paths": ["/path/to/doc1.pdf", "/path/to/doc2.pdf", "/path/to/doc3.pdf"],
  "output_path": "/path/to/merged.pdf"
}

pdf_to_images

Convert PDF pages to PNG images. Returns a JSON object containing one base64-encoded PNG image per page. The dpi parameter controls the output resolution.

Parameter	Type	Required	Default	Description
file_path	string	Yes	-	Path to the PDF file to convert to images
dpi	integer	No	150	Image resolution in dots per inch
format	string	No	"png"	Image format (PNG only)

{
  "file_path": "/path/to/document.pdf",
  "dpi": 300,
  "format": "png"
}
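
To estimate how large the rendered images will be, recall that PDF page sizes are measured in points of 1/72 inch by default, so the pixel size grows linearly with dpi. The sketch below shows that arithmetic; pixel_size is a hypothetical helper, and the rounding the renderer applies may differ slightly.

```python
POINTS_PER_INCH = 72  # default PDF user-space unit: 1 point = 1/72 inch


def pixel_size(width_pt: float, height_pt: float, dpi: int = 150) -> tuple[int, int]:
    """Approximate pixel dimensions of a page rendered at the given DPI."""
    width_px = round(width_pt * dpi / POINTS_PER_INCH)
    height_px = round(height_pt * dpi / POINTS_PER_INCH)
    return width_px, height_px


# US Letter is 612 x 792 points.
print(pixel_size(612, 792))        # default 150 DPI
print(pixel_size(612, 792, 300))   # doubling the DPI quadruples the pixel count
```

Since the images are returned base64-encoded, a high dpi on a long document can produce a very large response.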

get_form_fields

Extract all form fields from a PDF document including field names, types, current values, and available choices for dropdown fields.

Parameter	Type	Required	Default	Description
file_path	string	Yes	-	Path to the PDF file to extract form fields from

{
  "file_path": "/path/to/form.pdf"
}

Returns a JSON object with field information:

{
  "fields": [
    {
      "name": "first_name",
      "type": "text",
      "value": "",
      "page": 1,
      "rect": {"x0": 50, "y0": 72, "x1": 150, "y1": 92}
    },
    {
      "name": "country",
      "type": "combobox",
      "value": "",
      "page": 1,
      "rect": {...},
      "choices": ["USA", "Canada", "UK"]
    },
    {
      "name": "accept_terms",
      "type": "checkbox",
      "value": "",
      "page": 1,
      "rect": {...},
      "on_state": "Yes"
    }
  ],
  "total_fields": 3
}

fill_form

Fill form fields in a PDF document with provided values and save to output path. Supports text fields, checkboxes, radio buttons, and dropdowns.

Parameter	Type	Required	Default	Description
file_path	string	Yes	-	Path to the source PDF file
fields	object	Yes	-	Dictionary of field names and their values to fill
output_path	string	Yes	-	Path where the filled PDF will be saved

{
  "file_path": "/path/to/form.pdf",
  "fields": {
    "first_name": "John",
    "last_name": "Doe",
    "country": "USA",
    "accept_terms": true
  },
  "output_path": "/path/to/filled_form.pdf"
}

Checkbox fields accept true/false, "yes"/"no", or "1"/"0". For radio buttons, pass the on_state value of the option to select; call get_form_fields first to look it up.
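
A client that collects form input from users may want to normalize checkbox values before calling fill_form. The sketch below maps the accepted inputs listed above onto a boolean. The helper name normalize_checkbox and the case-insensitive matching are assumptions; the server may be stricter or more lenient.

```python
def normalize_checkbox(value) -> bool:
    """Map the checkbox inputs listed in this README onto a boolean."""
    if isinstance(value, bool):
        return value
    text = str(value).strip().lower()
    if text in {"yes", "1"}:
        return True
    if text in {"no", "0"}:
        return False
    raise ValueError(f"unsupported checkbox value: {value!r}")


fields = {"accept_terms": normalize_checkbox("Yes")}
print(fields)
```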

Configuration

Environment Variables

Variable	Default	Description
PDF_MCP_DEBUG	false	Enable debug logging

# Example
export PDF_MCP_DEBUG=true
python -m pdf_mcp
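
One common way to read a boolean flag such as PDF_MCP_DEBUG is sketched below. The set of accepted truthy strings and the helper name debug_enabled are assumptions; the server's own config.py may parse the variable differently.

```python
import os


def debug_enabled(environ=os.environ) -> bool:
    """Return True when PDF_MCP_DEBUG is set to a truthy string."""
    value = environ.get("PDF_MCP_DEBUG", "false")
    # Assumption: these spellings count as "on"; anything else is "off".
    return value.strip().lower() in {"1", "true", "yes", "on"}


print(debug_enabled({"PDF_MCP_DEBUG": "true"}))
```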

Development

Running Tests

source venv/bin/activate
pytest

# With coverage
pytest --cov=src --cov-report=html

Project Structure

pdf-mcp/
├── src/pdf_mcp/
│   ├── __init__.py
│   ├── __main__.py
│   ├── server.py
│   ├── config.py
│   └── tools/
│       ├── __init__.py
│       ├── extract_text.py
│       ├── extract_metadata.py
│       ├── search_text.py
│       ├── get_page_count.py
│       ├── extract_pages.py
│       ├── split_pdf.py
│       ├── merge_pdfs.py
│       ├── pdf_to_images.py
│       ├── get_form_fields.py
│       └── fill_form.py
├── tests/
├── pyproject.toml
└── README.md

Troubleshooting

Installation Issues

If you encounter installation errors, ensure you have Python 3.10 or later:

python --version

File Not Found Errors

Make sure the PDF file paths are correct and the files exist:

ls -l /path/to/your/document.pdf

Encrypted PDFs

The tools raise a RuntimeError when asked to process an encrypted PDF. Remove password protection from the file before passing it to the server.

Memory Issues with Large PDFs

For very large PDF files, consider processing them in smaller chunks using the extract_pages or split_pdf tools.
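
One way to plan such chunks is to compute the page ranges up front and call the tool once per range. The sketch below does only the arithmetic; chunk_ranges is a hypothetical helper, and the chunk size of 50 pages is an arbitrary example rather than a recommendation from the project.

```python
def chunk_ranges(page_count: int, chunk_size: int) -> list[str]:
    """Split pages 1..page_count into consecutive range strings."""
    if page_count < 0 or chunk_size < 1:
        raise ValueError("page_count must be >= 0 and chunk_size >= 1")
    ranges = []
    for start in range(1, page_count + 1, chunk_size):
        end = min(start + chunk_size - 1, page_count)
        # A chunk holding a single page is written as just that page number.
        ranges.append(f"{start}-{end}" if end > start else str(start))
    return ranges


print(chunk_ranges(120, 50))
```

Each string can be passed as the pages argument of extract_pages or the page_range argument of split_pdf.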

Permission Errors (Linux)

If you encounter permission errors, ensure the PDF files are readable:

chmod +r /path/to/your/document.pdf

Security Considerations

File Access: The server only processes files that exist and are readable by the running process
Path Validation: All file paths are validated before processing
No Network Access: The server does not make any network requests
Temporary Files: Temporary files are properly cleaned up after processing
Error Handling: Sensitive information is not exposed in error messages
Encrypted PDFs: Password-protected PDFs are rejected with appropriate error messages
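
The kind of checks that file access and path validation imply can be sketched as follows. This is an illustration under assumptions, not the server's actual validation code: validate_pdf_path is a hypothetical helper, and the extension check is an extra assumption. Error messages omit the path, in keeping with the error handling point above.

```python
import os
import tempfile
from pathlib import Path


def validate_pdf_path(raw_path: str) -> Path:
    """Resolve a path and check that it is an existing, readable .pdf file."""
    path = Path(raw_path).expanduser().resolve()
    if not path.is_file():
        raise FileNotFoundError("file does not exist")
    if path.suffix.lower() != ".pdf":
        raise ValueError("file does not have a .pdf extension")
    if not os.access(path, os.R_OK):
        raise PermissionError("file is not readable")
    return path


# Demonstrate with a throwaway file in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.pdf"
    sample.write_bytes(b"%PDF-1.7\n")
    checked = validate_pdf_path(str(sample))
    print(checked.name)
```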

Example Usage Scenarios

Scenario 1: Extract Text from Specific Pages

{
  "name": "extract_text",
  "arguments": {
    "file_path": "/documents/report.pdf",
    "pages": "1-3,7,9"
  }
}

Scenario 2: Search and Extract Context

{
  "name": "search_text",
  "arguments": {
    "file_path": "/documents/contract.pdf",
    "query": "liability clause",
    "case_sensitive": true,
    "context_words": 15
  }
}

Scenario 3: Merge Multiple Reports

{
  "name": "merge_pdfs",
  "arguments": {
    "file_paths": [
      "/reports/q1.pdf",
      "/reports/q2.pdf", 
      "/reports/q3.pdf",
      "/reports/q4.pdf"
    ],
    "output_path": "/reports/annual.pdf"
  }
}

Scenario 4: Convert PDF to Images

{
  "name": "pdf_to_images",
  "arguments": {
    "file_path": "/documents/presentation.pdf",
    "dpi": 300
  }
}

License

MIT
