MCP PDF Extract

A Model Context Protocol (MCP) server that lets MCP clients such as Claude Desktop list the PDF documents in a configured directory and read their extracted text content.

Architecture

graph TB
    subgraph "MCP Client Layer"
        Client[MCP Client<br/>Claude Desktop / Inspector]
    end
    
    subgraph "MCP Server Layer"
        Server[MCP Documents Server<br/>mcp_documents_server.py]
        FastMCP[FastMCP Framework]
        
        Server --> FastMCP
    end
    
    subgraph "Business Logic Layer"
        PDFLoader[PDF Loader<br/>pdf.py]
        Exceptions[Exception Handler<br/>app/exceptions.py]
        
        PDFLoader --> Exceptions
    end
    
    subgraph "Data Layer"
        PDFFiles[PDF Files<br/>data/pdfs/]
        EnvConfig[Environment Config<br/>.env]
    end
    
    Client <-->|"JSON-RPC over stdio"| Server
    
    Server -->|"Tools & Resources"| PDFLoader
    PDFLoader -->|"Async Read"| PDFFiles
    PDFLoader -->|"Config"| EnvConfig
    
    style Client fill:#e1f5fe
    style Server fill:#fff3e0
    style PDFLoader fill:#f3e5f5
    style PDFFiles fill:#e8f5e9

Component Communication

sequenceDiagram
    participant C as MCP Client
    participant S as MCP Server
    participant P as PDF Loader
    participant F as File System
    
    C->>S: Initialize connection
    S-->>C: Server capabilities
    
    C->>S: List documents (resource)
    S->>P: list_available_pdfs()
    P->>F: Read directory
    F-->>P: PDF file list
    P-->>S: Document metadata
    S-->>C: Document list
    
    C->>S: Read document (tool)
    S->>P: load_pdf(doc_id)
    P->>F: Read PDF file
    P->>P: Extract text
    P-->>S: Document content
    S-->>C: PDF text content
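
The "List documents" step in the diagram can be sketched as follows. This is an illustrative sketch, not the project's code: the function name list_available_pdfs comes from the diagram, but its signature and the metadata fields (id, size_kb) are assumptions.

```python
from pathlib import Path


def list_available_pdfs(pdf_dir: Path) -> list[dict]:
    """Return basic metadata for every PDF directly inside pdf_dir.

    The "id" and "size_kb" keys are hypothetical; the real server may
    expose different metadata.
    """
    return [
        {"id": path.name, "size_kb": round(path.stat().st_size / 1024, 1)}
        for path in sorted(pdf_dir.iterdir())
        if path.is_file() and path.suffix.lower() == ".pdf"
    ]
```

Using the filename as the document id matches the doc_id parameter described under Tools, so a client can pass a listed id straight to the read step.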

Features

  • List available PDF documents
  • Read and extract text content from PDF files
  • File size validation
  • Path traversal protection
  • Configurable PDF directory and size limits

Prerequisites

  • Python 3.10 or higher
  • uv package manager (recommended)

Installation

1. Clone the repository

cd /path/to/your/projects
git clone <repository-url>
cd MCP_pdf_extract

2. Create virtual environment and install dependencies

Using uv (recommended):

# Install uv if you haven't already
curl -LsSf https://astral.sh/uv/install.sh | sh

# Create virtual environment
uv venv

# Sync dependencies
uv sync

3. Set up environment variables

Create a .env file in the project root:

MAX_PDF_SIZE_KB=350
PDF_DIR=./data/pdfs

4. Create PDF directory

mkdir -p data/pdfs

Place your PDF files in the data/pdfs directory.

Running the Server

Basic execution

uv run python mcp_documents_server.py

Testing with MCP Inspector

To test the server with the MCP Inspector:

npx @modelcontextprotocol/inspector

In the Inspector interface:

  • Command: uv
  • Arguments: run --with mcp mcp run mcp_documents_server.py

Verify the server is running

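To check that the server responds, pipe an initialize request to it over stdin. An MCP initialize request carries protocolVersion, capabilities, and clientInfo; the client name and version below are placeholders:

echo '{"jsonrpc": "2.0", "id": 1, "method": "initialize", "params": {"protocolVersion": "2024-11-05", "capabilities": {}, "clientInfo": {"name": "test-client", "version": "0.1.0"}}}' | uv run python mcp_documents_server.py

The server should print a JSON-RPC response describing its capabilities.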
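
The same check can be scripted. The sketch below is not part of the project: it only builds the request line and inspects a reply, and the helper names are hypothetical. Per the MCP specification, the initialize params include protocolVersion, capabilities, and clientInfo; the client name and version used here are placeholders.

```python
import json


def build_initialize_request(request_id: int = 1) -> str:
    """Build a newline-terminated JSON-RPC initialize request for stdio."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "initialize",
        "params": {
            "protocolVersion": "2024-11-05",
            "capabilities": {},
            # Placeholder client identity, chosen for this sketch.
            "clientInfo": {"name": "smoke-test", "version": "0.1.0"},
        },
    }
    return json.dumps(message) + "\n"


def is_success_response(line: str, request_id: int = 1) -> bool:
    """True if the line is a JSON-RPC result (not an error) for request_id."""
    reply = json.loads(line)
    return (
        reply.get("id") == request_id
        and "result" in reply
        and "error" not in reply
    )
```

To use it, write the request to the server's stdin (for example via subprocess with the uv run command above) and pass the first line of stdout to is_success_response.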

Available Resources and Tools

Resources

  • docs://documents - Lists all available PDF documents
  • docs://documents/{doc_id} - Fetches the content of a specific PDF document

Tools

  • read_doc_contents - Reads and returns the text content of a PDF document
    • Parameter: doc_id (string) - The filename of the PDF to read
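
On the wire, a client reaches these through the standard MCP methods tools/call and resources/read. The following sketch shows what those requests could look like; the helper names and the example filename report.pdf are hypothetical, and MCP clients such as Claude Desktop build these messages for you.

```python
import json


def read_tool_request(doc_id: str, request_id: int) -> str:
    """JSON-RPC request that calls the read_doc_contents tool."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "read_doc_contents",
            "arguments": {"doc_id": doc_id},
        },
    })


def read_resource_request(doc_id: str, request_id: int) -> str:
    """JSON-RPC request that reads the docs://documents/{doc_id} resource."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "resources/read",
        "params": {"uri": f"docs://documents/{doc_id}"},
    })


# Example with a hypothetical file name.
print(read_tool_request("report.pdf", request_id=2))
print(read_resource_request("report.pdf", request_id=3))
```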

Configuration

The server can be configured using environment variables:

  • PDF_DIR: Directory containing PDF files (default: ./data/pdfs)
  • MAX_PDF_SIZE_KB: Maximum allowed PDF file size in KB (default: 350)

Integration with MCP Clients

Claude Desktop

Add the following to your Claude Desktop configuration file:

macOS: ~/Library/Application Support/Claude/claude_desktop_config.json

{
  "mcpServers": {
    "pdf-extractor": {
      "command": "uv",
      "args": ["--directory", "/path/to/MCP_pdf_extract", "run", "python", "mcp_documents_server.py"],
      "env": {}
    }
  }
}
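
If the configuration file already lists other servers, the entry has to be merged in rather than pasted over them. A small sketch of that merge, assuming the config has already been parsed with json.load; add_server_entry is a hypothetical helper, not something shipped with this project.

```python
import json


def add_server_entry(config: dict, project_dir: str) -> dict:
    """Return a copy of config with the pdf-extractor entry added or replaced."""
    updated = dict(config)
    servers = dict(updated.get("mcpServers", {}))
    servers["pdf-extractor"] = {
        "command": "uv",
        "args": [
            "--directory", project_dir,
            "run", "python", "mcp_documents_server.py",
        ],
        "env": {},
    }
    updated["mcpServers"] = servers
    return updated


# Example: an existing config with one other (made-up) server.
existing = {"mcpServers": {"other-server": {"command": "node", "args": ["index.js"]}}}
print(json.dumps(add_server_entry(existing, "/path/to/MCP_pdf_extract"), indent=2))
```

Write the result back to claude_desktop_config.json and restart Claude Desktop so it picks up the new server.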

Troubleshooting

  • Ensure Python 3.10+ is installed: python --version
  • Verify uv is installed: uv --version
  • Check that PDF files are in the correct directory: ls data/pdfs/
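  • Run commands through uv run, which uses the project's virtual environment automatically; if you call python directly, activate the environment first: source .venv/bin/activate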

License

[Your license here]
