MokuPDF
Enables AI applications to read and process PDF files with intelligent file search, text extraction, image processing, and optional OCR support for scanned documents.
MokuPDF - Intelligent PDF Reading Server for AI
MokuPDF is a powerful, MCP (Model Context Protocol) compatible server that enables AI applications to read and process PDF files with advanced capabilities. It combines intelligent file search, comprehensive text extraction, image processing, and optional OCR support to handle any type of PDF document - from simple text files to complex scanned documents.
Perfect for Claude Desktop, ChatGPT plugins, and any AI application that needs PDF processing capabilities!
Table of Contents
- Key Features
- Installation
- Quick Start
- MCP Configuration
- Available Tools
- Usage Examples
- Image & Scanned PDF Support
- Smart File Search
- Configuration Options
- Development
- Contributing
- License
Key Features
Intelligent PDF Processing
- Full Text Extraction - Extract all text content from any PDF
- Advanced Image Handling - Extract embedded images as base64 PNG with proper format conversion
- Scanned PDF Support - Auto-detects and renders image-based/scanned PDFs at high resolution
- Optional OCR Integration - Extract text from scanned documents using Tesseract (optional)
- Page-by-Page Processing - Handle large PDFs efficiently without memory issues
Smart File Operations
- Intelligent File Search - Find PDFs using natural language: "find the report", "open invoice"
- Multi-Location Search - Automatically searches Desktop, Downloads, Documents, and OneDrive
- Fuzzy Matching - Handles typos and partial filenames intelligently
- Advanced Text Search - Search within PDFs with regex support and context
AI Integration
- MCP Protocol Compliant - Seamlessly integrates with Claude Desktop and other AI tools
- FastMCP Architecture - Built on the official MCP Python SDK for reliability
- JSON-RPC Interface - Clean, standardized API for easy integration
- Configurable & Lightweight - Minimal dependencies, fast startup, customizable options
Installation
From Source
# Clone the repository
git clone https://github.com/jameslovespancakes/mokupdf.git
cd mokupdf
# Install the package
pip install .
# Or install in development mode
pip install -e .
Using pip (when published)
# Basic installation
pip install mokupdf
# With OCR support for scanned PDFs
pip install mokupdf[ocr]
Note: For OCR functionality, you'll also need Tesseract installed on your system:
- Windows: Download from GitHub releases
- Mac: brew install tesseract
- Linux: sudo apt-get install tesseract-ocr
Quick Start
Running the Server
# Start with default settings (port 8000)
mokupdf
# Start with custom port
mokupdf --port 8080
# Enable verbose logging
mokupdf --verbose
# Set custom PDF directory
mokupdf --base-dir ./documents
Command Line Options
| Option | Description | Default |
|---|---|---|
| --port | Port to listen on | 8000 |
| --verbose | Enable verbose logging | False |
| --base-dir | Base directory for PDF files | Current directory |
| --max-file-size | Maximum PDF file size in MB | 100 |
| --version | Show version information | - |
| --help | Show help message | - |
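The options in the table could be declared with argparse roughly as follows. This is a hypothetical sketch: MokuPDF's actual parser is not shown in this README, and build_parser is a name invented for illustration. Only the flag names and defaults come from the table.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags and defaults."""
    parser = argparse.ArgumentParser(prog="mokupdf")
    parser.add_argument("--port", type=int, default=8000,
                        help="port to listen on")
    parser.add_argument("--verbose", action="store_true",
                        help="enable verbose logging")
    parser.add_argument("--base-dir", default=".",
                        help="base directory for PDF files")
    parser.add_argument("--max-file-size", type=int, default=100,
                        help="maximum PDF file size in MB")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch is repeatable.
args = build_parser().parse_args(["--base-dir", "./documents",
                                  "--max-file-size", "200"])
max_bytes = args.max_file_size * 1024 * 1024  # convert MB to bytes
print(args.base_dir, args.port, max_bytes)
```

argparse converts dashes in flag names to underscores, so --max-file-size is read back as args.max_file_size.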
MCP Configuration
Add MokuPDF to your MCP configuration file:
{
"mcpServers": {
"mokupdf": {
"command": "python",
"args": ["-m", "mokupdf"]
}
}
}
Available MCP Tools
1. open_pdf
Open a PDF file for processing.
{
"tool": "open_pdf",
"arguments": {
"file_path": "document.pdf"
}
}
2. read_pdf
Read PDF pages with text and images. Supports page ranges for efficient processing.
{
"tool": "read_pdf",
"arguments": {
"file_path": "document.pdf",
"start_page": 1,
"end_page": 5,
"max_pages": 10
}
}
Response includes:
- Text content with [IMAGE: ...] placeholders
- Base64-encoded images
- Page information
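The exact response schema is not documented in this README, so the sketch below assumes a hypothetical shape: a dict with a "text" string and an "images" list of base64 strings. Under that assumption, a client could pull out the placeholder labels and decode the images like this (split_response is an illustrative name, not part of MokuPDF).

```python
import base64
import re

# Matches placeholders such as "[IMAGE: Image 1 - 800x600px]".
PLACEHOLDER = re.compile(r"\[IMAGE: ([^\]]+)\]")


def split_response(response: dict) -> tuple[list[str], list[bytes]]:
    """Return placeholder labels and decoded image bytes.

    The "text" and "images" keys are assumptions made for this sketch,
    not a documented MokuPDF schema.
    """
    labels = PLACEHOLDER.findall(response.get("text", ""))
    images = [base64.b64decode(data) for data in response.get("images", [])]
    return labels, images


# Hand-built sample response, for illustration only.
sample = {
    "text": "Intro paragraph.\n[IMAGE: Image 1 - 800x600px]\nMore text.",
    "images": [base64.b64encode(b"\x89PNG fake bytes").decode("ascii")],
}
labels, images = split_response(sample)
print(labels)
print(len(images[0]), "bytes decoded")
```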
3. search_text
Search for text within the current PDF.
{
"tool": "search_text",
"arguments": {
"query": "introduction",
"case_sensitive": false
}
}
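To illustrate what a case-insensitive search with context involves, here is a minimal sketch over a list of page strings. It is not MokuPDF's implementation; search_pages and the context parameter are hypothetical names, and the query is matched literally rather than as a regex.

```python
import re


def search_pages(pages: list[str], query: str,
                 case_sensitive: bool = False, context: int = 20) -> list[dict]:
    """Find `query` on each page and return matches with surrounding text.

    `pages` holds one string per page; reported page numbers are 1-based.
    """
    flags = 0 if case_sensitive else re.IGNORECASE
    pattern = re.compile(re.escape(query), flags)  # literal match, not regex
    results = []
    for number, text in enumerate(pages, start=1):
        for match in pattern.finditer(text):
            start = max(match.start() - context, 0)
            end = min(match.end() + context, len(text))
            results.append({"page": number, "context": text[start:end]})
    return results


pages = ["Introduction to the report.", "No match here.",
         "See the INTRODUCTION again."]
for hit in search_pages(pages, "introduction"):
    print(hit["page"], repr(hit["context"]))
```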
4. get_page_text
Extract text from a specific page.
{
"tool": "get_page_text",
"arguments": {
"page_number": 1
}
}
5. get_metadata
Get metadata from the current PDF.
{
"tool": "get_metadata",
"arguments": {}
}
6. close_pdf
Close the current PDF and free memory.
{
"tool": "close_pdf",
"arguments": {}
}
7. find_pdf_files
Find PDF files using intelligent search across common directories.
{
"tool": "find_pdf_files",
"arguments": {
"query": "financial report",
"limit": 5
}
}
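The tool examples above use a {"tool": ..., "arguments": ...} shorthand. On the wire, the Python script later in this README sends a JSON-RPC request with method "tools/call" and the tool name under params. A small helper (to_jsonrpc is a hypothetical name) can convert one form to the other:

```python
import itertools
import json

# Monotonic request ids, starting at 1.
_ids = itertools.count(1)


def to_jsonrpc(call: dict) -> str:
    """Wrap a {"tool": ..., "arguments": ...} example in a JSON-RPC request line."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {
            "name": call["tool"],
            "arguments": call.get("arguments", {}),
        },
    }
    return json.dumps(request)


line = to_jsonrpc({"tool": "find_pdf_files",
                   "arguments": {"query": "financial report", "limit": 5}})
print(line)
```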
Usage Examples
Natural Language File Access
# Instead of exact paths, use natural language
User: "Can you read the financial report from last quarter?"
Claude: Uses find_pdf_files("financial report") → Opens Q3_Financial_Report.pdf
User: "Look at the user manual on my desktop"
Claude: Searches Desktop → Finds User_Manual_v2.pdf → Processes it
User: "Find all invoices"
Claude: Returns a list of all PDFs in the common locations whose names match "invoice"
Text-Based PDFs
# Regular PDF with embedded images
{
"tool": "read_pdf",
"arguments": {
"file_path": "annual_report.pdf",
"start_page": 1,
"max_pages": 10
}
}
# Response includes:
# - Extracted text content
# - Image placeholders: [IMAGE: Image 1 - 800x600px]
# - Base64-encoded images array
# - Page metadata
Scanned PDFs (Image-Based)
# Scanned document without OCR
{
"tool": "read_pdf",
"arguments": {
"file_path": "scanned_contract.pdf"
}
}
# Response:
# - "[SCANNED PAGE: This page appears to be a scanned image]"
# - "[IMAGE: Full Page Scan - 1654x2339px]"
# - High-resolution page image as base64
# With OCR enabled (pip install mokupdf[ocr])
# Response:
# - "[SCANNED PAGE - OCR EXTRACTED TEXT]:"
# - "Actual extracted text content..."
# - "[IMAGE: Full Page Scan - 1654x2339px]"
# - Original page image as base64
Smart Search & Discovery
# Find files by name (content-based search is listed on the roadmap)
{
"tool": "find_pdf_files",
"arguments": {
"query": "invoice 2024",
"limit": 5
}
}
# Response includes:
# - Ranked list of matching files
# - File metadata (size, modification date, location)
# - Relevance scores
Image & Scanned PDF Support
MokuPDF automatically handles different PDF types:
| PDF Type | Text Extraction | Image Handling | OCR Support |
|---|---|---|---|
| Text-based PDF | Direct extraction | Embedded images extracted | Not needed |
| Mixed PDF | Text + images | All images extracted | Not needed |
| Scanned PDF | Limited/None | Full page rendered | Optional OCR |
| Image-only PDF | None | Full page rendered | Optional OCR |
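How MokuPDF decides that a page is scanned is not described here. One plausible heuristic, shown purely as an assumption, is to look at how much extractable text a page has and whether it carries images. The function name, labels, and the 20-character threshold are all hypothetical.

```python
def classify_page(text: str, image_count: int, min_chars: int = 20) -> str:
    """Label a page from its extracted text and image count.

    Heuristic sketch only: the threshold and labels are assumptions,
    not MokuPDF's actual detection logic.
    """
    has_text = len(text.strip()) >= min_chars
    if has_text and image_count == 0:
        return "text"
    if has_text:
        return "mixed"
    if image_count > 0:
        return "scanned"  # little or no text, but the page holds an image
    return "empty"


print(classify_page("A long paragraph of extracted text.", 0))
print(classify_page("   ", 1))
```

A page labeled "scanned" under this heuristic would be the one to render as a full-page image and, if OCR is installed, pass to Tesseract.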
OCR Installation
# Install with OCR support
pip install mokupdf[ocr]
# Install Tesseract system dependency
# Windows: Download from GitHub releases
# Mac: brew install tesseract
# Linux: sudo apt-get install tesseract-ocr
Smart File Search
MokuPDF's intelligent file finder works with natural language:
Search Patterns
- Exact matches: "report" → Annual_Report.pdf
- Partial matches: "ann" → Annual_Report.pdf
- Multiple terms: "financial report 2024" → Financial_Report_2024.pdf
- Fuzzy matching: "finacial" → Financial_Report.pdf (handles typos)
Search Locations
- Current working directory
- ~/Desktop
- ~/Downloads
- ~/Documents
- ~/OneDrive/Desktop (if available)
- ~/OneDrive/Documents (if available)
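The search locations can be enumerated with pathlib. The sketch below is an assumption about how such a lookup might work, not MokuPDF's code: candidate_dirs and list_pdfs are hypothetical helpers, and the scan is non-recursive.

```python
from pathlib import Path


def candidate_dirs(home: Path, cwd: Path) -> list[Path]:
    """Return the listed search locations that exist on this machine."""
    names = ["Desktop", "Downloads", "Documents",
             "OneDrive/Desktop", "OneDrive/Documents"]
    dirs = [cwd] + [home / name for name in names]
    return [d for d in dirs if d.is_dir()]


def list_pdfs(dirs: list[Path]) -> list[Path]:
    """Collect PDF files directly inside each directory (non-recursive)."""
    found = []
    for d in dirs:
        found.extend(sorted(p for p in d.iterdir()
                            if p.is_file() and p.suffix.lower() == ".pdf"))
    return found


dirs = candidate_dirs(Path.home(), Path.cwd())
print([str(d) for d in dirs])
```

Filtering with is_dir() covers the "(if available)" cases: OneDrive folders that do not exist are simply skipped.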
Ranking System
Files are ranked by:
- Exact name matches (highest priority)
- Word boundary matches
- Partial string matches
- Recent modification time (boost for recent files)
- File location (Desktop files prioritized)
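The ranking criteria can be pictured as a scoring function. The sketch below is illustrative only: the weights, the 0.8 fuzzy threshold, and the 30-day recency window are assumptions, the location boost is omitted for brevity, and difflib stands in for whatever fuzzy matcher MokuPDF uses.

```python
import difflib
import re
import time


def score(query: str, filename: str, mtime: float, now: float) -> float:
    """Score a filename against a query; all weights are illustrative."""
    q = query.lower()
    stem = filename.lower().rsplit(".", 1)[0]
    words = [w for w in re.split(r"[\W_]+", stem) if w]
    if q == stem:
        points = 100.0  # exact name match
    elif q in words:
        points = 80.0   # word boundary match
    elif q in stem:
        points = 60.0   # partial string match
    else:
        # Fuzzy fallback: best similarity against any single word.
        best = max((difflib.SequenceMatcher(None, q, w).ratio() for w in words),
                   default=0.0)
        points = 50.0 * best if best >= 0.8 else 0.0
    age_days = (now - mtime) / 86400
    if points and age_days < 30:
        points += 5.0   # small boost for recently modified files
    return points


now = time.time()
files = {"Annual_Report.pdf": now,
         "Financial_Report.pdf": now - 90 * 86400,
         "notes.pdf": now}
ranked = sorted(files, key=lambda name: score("report", name, files[name], now),
                reverse=True)
print(ranked)
```

With these weights, the misspelled query "finacial" still scores against Financial_Report.pdf through the fuzzy fallback, while unrelated names score zero.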
Configuration Options
Command Line Arguments
mokupdf --help
Options:
--base-dir PATH Base directory for PDF files (default: current)
--max-file-size INT Maximum PDF size in MB (default: 100)
--port INT Port number (legacy, ignored by FastMCP)
--verbose Enable verbose logging (legacy, ignored)
--version Show version information
MCP Server Configuration
{
"mcpServers": {
"mokupdf": {
"command": "python",
"args": ["-m", "mokupdf", "--base-dir", "./documents", "--max-file-size", "200"]
}
}
}
Development
Project Structure
mokupdf/
├── mokupdf/
│   ├── __init__.py          # Package initialization
│   ├── server.py            # Main server implementation
│   └── __main__.py          # Module entry point
├── setup.py                 # Package setup script
├── pyproject.toml           # Modern Python packaging
├── requirements.txt         # Direct dependencies
├── LICENSE                  # MIT License
└── README.md                # This file
Running Tests
# Install development dependencies
pip install -e ".[dev]"
# Run tests
pytest
# Run with coverage
pytest --cov=mokupdf
Code Quality
# Format code
black mokupdf/
# Lint code
flake8 mokupdf/
Architecture
MokuPDF is built using:
- FastMCP: Official MCP Python SDK for reliable protocol handling
- PyMuPDF (fitz): High-performance PDF processing and rendering
- Pillow: Image format conversion and processing
- pytesseract: Optional OCR text extraction from scanned documents
Troubleshooting
Common Issues
"ModuleNotFoundError: No module named 'mokupdf'"
# Install the package
pip install mokupdf
"No PDF is currently open"
# Always open a PDF first, or provide file_path in read_pdf
{
"tool": "open_pdf",
"arguments": {"file_path": "document.pdf"}
}
"PDF file not found"
# Use smart search instead of exact paths
{
"tool": "find_pdf_files",
"arguments": {"query": "document"}
}
OCR not working
# Install OCR dependencies
pip install mokupdf[ocr]
# Windows: Download Tesseract from GitHub releases
# Mac: brew install tesseract
# Linux: sudo apt-get install tesseract-ocr
"File too large" errors
# Increase file size limit
mokupdf --max-file-size 500 # Allow 500MB files
Debug Mode
# Enable verbose logging for detailed information
mokupdf --verbose
# Check MCP connection in Claude Desktop developer tools
# Press Ctrl+Shift+I in Claude Desktop
Performance Tips
- Large PDFs: Use start_page and end_page parameters for chunked processing
- Memory usage: Close PDFs when done with the close_pdf tool
- OCR speed: OCR processing adds significant time - disable if not needed
- File search: Search is cached - repeated searches are faster
- Image quality: Scanned pages are rendered at 2x resolution for clarity
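Chunked processing amounts to issuing several read_pdf calls over consecutive page ranges. The sketch below generates those ranges; it assumes pages are 1-based and that end_page is inclusive, which the README's examples suggest but do not state. page_chunks is a hypothetical helper.

```python
def page_chunks(total_pages: int, chunk_size: int = 10):
    """Yield (start_page, end_page) pairs, 1-based and inclusive."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    for start in range(1, total_pages + 1, chunk_size):
        yield start, min(start + chunk_size - 1, total_pages)


# Build one read_pdf request per chunk for a 25-page document.
for start, end in page_chunks(25, 10):
    request = {"tool": "read_pdf",
               "arguments": {"file_path": "document.pdf",
                             "start_page": start, "end_page": end}}
    print(request["arguments"])
```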
Roadmap
- [ ] Advanced OCR: Multiple language support, confidence scores
- [ ] Enhanced Search: Content-based PDF search (search inside PDF text)
- [ ] Batch Processing: Process multiple PDFs simultaneously
- [ ] Format Support: Add support for other document formats (DOCX, PPTX)
- [ ] Cloud Integration: Support for cloud storage (Google Drive, OneDrive API)
- [ ] Performance: Async processing for better concurrent handling
Example Usage
Python Script Example
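import json
import subprocess
# Start the MokuPDF server; it exchanges JSON-RPC messages over stdin/stdout
process = subprocess.Popen(
    ["mokupdf"],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    text=True,
)
def send(message):
    # Write one JSON-RPC message per line
    process.stdin.write(json.dumps(message) + "\n")
    process.stdin.flush()
# MCP requires an initialize handshake before any tool call
send({
    "jsonrpc": "2.0",
    "id": 0,
    "method": "initialize",
    "params": {
        "protocolVersion": "2024-11-05",
        "capabilities": {},
        "clientInfo": {"name": "example-client", "version": "0.1.0"},
    },
})
process.stdout.readline()  # initialize response (ignored here)
send({"jsonrpc": "2.0", "method": "notifications/initialized"})
# Send a request to open a PDF
send({
    "jsonrpc": "2.0",
    "method": "tools/call",
    "params": {
        "name": "open_pdf",
        "arguments": {"file_path": "example.pdf"},
    },
    "id": 1,
})
# Read response
response = json.loads(process.stdout.readline())
print(f"PDF opened: {response['result']}")
# Closing stdin lets the server exit
process.stdin.close()
process.wait()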
Integration with LLMs
MokuPDF is designed to work seamlessly with LLM applications through MCP. The read_pdf tool returns content in a format optimized for LLM consumption:
- Text is extracted with page markers
- Images are embedded as base64 PNG with placeholders in text
- Large PDFs can be read page-by-page to avoid context limits
Troubleshooting
Common Issues
Issue: ModuleNotFoundError: No module named 'mokupdf'
- Solution: Install the package with pip install .
Issue: Port already in use
- Solution: Use a different port with --port 8081
Issue: PDF file not found
- Solution: Check the base directory and ensure paths are relative to it
Issue: Large PDF causes timeout
- Solution: Use page-by-page reading with start_page and end_page parameters
Contributing
We welcome contributions! MokuPDF is designed to be the best PDF processing tool for AI applications.
How to Contribute
- Fork the repository
- Create a feature branch: git checkout -b feature/amazing-feature
- Make your changes with clear, documented code
- Add tests for new functionality
- Run code formatting: black mokupdf/
- Submit a pull request
Development Setup
# Clone your fork
git clone https://github.com/yourusername/mokupdf.git
cd mokupdf
# Install in development mode with all dependencies
pip install -e ".[dev,ocr]"
# Run tests
pytest
# Format code
black mokupdf/
flake8 mokupdf/
Contribution Ideas
- Multi-language OCR support
- Performance optimizations
- Advanced search algorithms
- New document format support
- Bug fixes and improvements
- Documentation enhancements
Support & Community
Getting Help
- Issues: Open a GitHub issue for bugs or feature requests
- Discussions: Use GitHub Discussions for questions and community support
- Troubleshooting: Enable --verbose mode for detailed debugging information
Reporting Bugs
When reporting issues, please include:
- Operating system and Python version
- MokuPDF version (mokupdf --version)
- Sample PDF file (if possible)
- Complete error message and traceback
- Steps to reproduce the issue
Acknowledgments
MokuPDF is built on the shoulders of giants:
- PyMuPDF - Exceptional PDF processing and rendering capabilities
- FastMCP - Official MCP Python SDK for reliable protocol handling
- Tesseract OCR - Open-source OCR engine for text extraction
- Pillow - Python Imaging Library for image processing
- Model Context Protocol - Standardized protocol for AI tool integration
Special thanks to the AI and open-source communities for inspiration and feedback.
License
This project is licensed under the MIT License - see the LICENSE file for details.
MIT License Summary
- Commercial use - Use in commercial applications
- Modification - Modify and distribute changes
- Distribution - Distribute original or modified versions
- Private use - Use privately without restrictions
- No warranty - Software provided "as-is"
- License notice - Include original license in copies
<div align="center">
Made with ❤️ for the AI community
Star us on GitHub • Install from PyPI • Read the Docs
</div>