ocrmypdf-mcp
Exposes ocrmypdf as a single tool for OCR-ing scanned PDFs, with automatic PATH handling for Tesseract and Ghostscript on Windows to ensure compatibility with Claude Desktop.
A minimal MCP server that exposes ocrmypdf as a single tool, ocr_pdf, so Claude can OCR scanned PDFs and then hand them to markitdown (or any text tool) for downstream work.
Why this exists: the obvious "just call ocrmypdf" approach falls over on Windows with the Microsoft Store (MSIX) build of Claude Desktop, because MSIX launches MCP servers with a stripped-down PATH that doesn't include Tesseract or Ghostscript. This server auto-detects the standard Windows install locations and prepends them to PATH at startup, so OCR Just Works without futzing with system environment variables.
Works on Linux and macOS too — the PATH augmentation is a no-op outside Windows.
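The startup PATH fix described above could look roughly like the sketch below. The function name, the pattern list, and the Ghostscript bin subfolder are assumptions for illustration; the server's actual detection logic may differ.

```python
import glob
import os
import sys

# Assumed default install locations (hypothetical list; the real server may
# probe more places). Ghostscript keeps its executables in a "bin" subfolder.
CANDIDATE_PATTERNS = [
    r"C:\Program Files\Tesseract-OCR",
    r"C:\Program Files\gs\gs*\bin",
]


def augment_path(env, patterns, is_windows=None):
    """Prepend existing directories matching `patterns` to env["PATH"].

    Returns the list of directories that were added. Does nothing and
    returns [] when not running on Windows.
    """
    if is_windows is None:
        is_windows = sys.platform == "win32"
    if not is_windows:
        return []  # no-op outside Windows

    current = env.get("PATH", "")
    existing = current.split(os.pathsep) if current else []
    added = []
    for pattern in patterns:
        # Reverse sort so that, for string-sortable names, newer gs<version>
        # folders come first.
        for directory in sorted(glob.glob(pattern), reverse=True):
            if (
                os.path.isdir(directory)
                and directory not in existing
                and directory not in added
            ):
                added.append(directory)
    if added:
        env["PATH"] = os.pathsep.join(added + existing)
    return added


# Typical use at server startup:
#     augment_path(os.environ, CANDIDATE_PATTERNS)
```

Because the directories are prepended, a system-installed Tesseract or Ghostscript found this way takes priority over anything already on the stripped-down PATH.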
Prerequisites (Windows)
Two system installers, then pip install.
1. Tesseract OCR
UB-Mannheim build (the standard Windows distribution): https://github.com/UB-Mannheim/tesseract/wiki
Accept the default install location (C:\Program Files\Tesseract-OCR). Add language packs during install if you need anything beyond English.
2. Ghostscript
AGPL release for Windows (free): https://www.ghostscript.com/releases/gsdnld.html
Accept the default install location (C:\Program Files\gs\gs<version>\).
3. Verify (optional)
tesseract --version
gswin64c --version
If either says "not recognized," reopen PowerShell so it picks up the updated PATH, then retry.
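The same check can be done from Python, which is handy when you want to see what a particular interpreter's environment resolves. This is a minimal sketch; find_tools is a hypothetical helper, not part of the server.

```python
import shutil


def find_tools(names=("tesseract", "gswin64c")):
    """Map each executable name to its resolved path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


for name, path in find_tools().items():
    print(f"{name}: {path or 'NOT FOUND on PATH'}")
```

On Linux and macOS the Ghostscript executable is named gs, so pass names=("tesseract", "gs") there.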
Install the server
git clone https://github.com/jcm4TX/ocrmypdf-mcp
cd ocrmypdf-mcp
pip install --user .
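This installs ocrmypdf, the mcp SDK, and the ocrmypdf-mcp executable. On Windows it lands at the path below (Python313 corresponds to Python 3.13; the folder name changes with your Python version):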
C:\Users\<you>\AppData\Roaming\Python\Python313\Scripts\ocrmypdf-mcp.exe
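If you are unsure where the executable landed on your machine, the standard library can report the per-user scripts directory. This is a sketch that assumes a plain pip install --user with the same interpreter; user_scripts_dir is a hypothetical helper name.

```python
import sys
import sysconfig


def user_scripts_dir():
    """Return the directory where `pip install --user` places console scripts."""
    if hasattr(sysconfig, "get_preferred_scheme"):  # Python 3.10+
        scheme = sysconfig.get_preferred_scheme("user")
    else:
        # Fallback for older interpreters; may be wrong on macOS framework builds.
        scheme = "nt_user" if sys.platform == "win32" else "posix_user"
    return sysconfig.get_path("scripts", scheme)


print(user_scripts_dir())
```

Run it with the same Python you used for pip install; the ocrmypdf-mcp executable should be inside the printed folder.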
Wire it up in Claude Desktop
Edit claude_desktop_config.json. On the MSIX (Microsoft Store) build of Claude Desktop, the path is:
%LOCALAPPDATA%\Packages\Claude_pzs8sxrjxfjjc\LocalCache\Roaming\Claude\claude_desktop_config.json
On the regular non-MSIX installer it's:
%APPDATA%\Claude\claude_desktop_config.json
Add an ocrmypdf-mcp entry under mcpServers:
{
"mcpServers": {
"ocrmypdf-mcp": {
"command": "C:\\Users\\<you>\\AppData\\Roaming\\Python\\Python313\\Scripts\\ocrmypdf-mcp.exe",
"args": []
}
}
}
Then fully quit Claude Desktop (right-click the tray icon and pick Quit; closing the window is not enough) and relaunch.
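If the config file already lists other servers, the new entry has to be merged in rather than pasted over them. The sketch below shows one way to do that on an in-memory dict; add_server and the "other-server" entry are hypothetical, and reading or writing the actual file is left out.

```python
import json


def add_server(config, name, command, args=None):
    """Return a copy of `config` with an mcpServers entry added or replaced."""
    updated = dict(config)
    servers = dict(updated.get("mcpServers", {}))
    servers[name] = {"command": command, "args": list(args or [])}
    updated["mcpServers"] = servers
    return updated


# Example: a config that already contains some other (made-up) server.
existing = {"mcpServers": {"other-server": {"command": "other.exe", "args": []}}}
merged = add_server(
    existing,
    "ocrmypdf-mcp",
    r"C:\Users\<you>\AppData\Roaming\Python\Python313\Scripts\ocrmypdf-mcp.exe",
)
# json.dumps doubles each backslash, which is the form JSON requires.
print(json.dumps(merged, indent=2))
```

Letting a JSON serializer write the path avoids the most common hand-editing mistake, which is leaving single backslashes in the command string.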
Verify it loaded
In a new chat, ask "what MCP tools do you have for OCR?" — Claude should report ocr_pdf. If not, check the server log:
%LOCALAPPDATA%\Packages\Claude_pzs8sxrjxfjjc\LocalCache\Roaming\Claude\logs\mcp-server-ocrmypdf-mcp.log
Tool API
ocr_pdf(input_path, output_path?, language?, force_ocr?, deskew?)
| Arg | Type | Default | Meaning |
|---|---|---|---|
| input_path | str | required | Absolute path to input PDF |
| output_path | str | <stem>-ocr.pdf next to input | Where to write the OCR'd PDF |
| language | str | "eng" | Tesseract language code; join multiple with +, e.g. "eng+spa" |
| force_ocr | bool | false | Re-OCR pages that already have a text layer |
| deskew | bool | true | Straighten skewed pages before OCR |
Default behavior: pages without an existing text layer get OCR'd, pages that already have text pass through unchanged. Safe to run on mixed PDFs.
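One plausible way the tool arguments could translate into an ocrmypdf call is sketched below. This is an assumption, not the server's confirmed implementation: plan_ocr and run_ocr are hypothetical names, and using skip_text to get the "pass through pages that already have text" behavior is a guess based on how ocrmypdf's own options work.

```python
from pathlib import Path


def plan_ocr(input_path, output_path=None, language="eng",
             force_ocr=False, deskew=True):
    """Resolve defaults and build keyword options for ocrmypdf.ocr()."""
    src = Path(input_path)
    # Default output: "<stem>-ocr.pdf" next to the input file.
    dst = Path(output_path) if output_path else src.with_name(f"{src.stem}-ocr.pdf")
    options = {
        "language": language.split("+"),  # "eng+spa" -> ["eng", "spa"]
        "deskew": deskew,
        "force_ocr": force_ocr,
        # Assumed mapping: when not forcing, skip pages that already have text.
        # ocrmypdf treats force_ocr and skip_text as mutually exclusive.
        "skip_text": not force_ocr,
    }
    return {"input": str(src), "output": str(dst), "options": options}


def run_ocr(**tool_args):
    """Run OCR with the planned options (requires ocrmypdf to be installed)."""
    import ocrmypdf  # imported lazily so planning works without it

    plan = plan_ocr(**tool_args)
    return ocrmypdf.ocr(plan["input"], plan["output"], **plan["options"])
```

For example, plan_ocr("/data/scan.pdf") resolves the output to scan-ocr.pdf in the same folder and leaves force_ocr off, which matches the default behavior described above.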
Typical workflow
- You hand Claude a scanned PDF path.
- Claude calls ocr_pdf(input_path="...").
- Claude calls markitdown.convert_to_markdown on the resulting -ocr.pdf.
- Claude reads the markdown and answers your question.
Known limitations
- The MCP protocol enforces a per-request timeout (~4 minutes in current Claude Desktop). Large multi-page documents may exceed this and surface as a client-side timeout even though the underlying ocrmypdf process completes successfully; the output PDF will still be on disk. If you hit this regularly, split the input into smaller page ranges first.
- Complex multi-column scanned layouts (legal, probate, ledgers) can produce messy markdown when piped to markitdown afterward, because Tesseract interprets visual alignment as table structure. Post-processing the markdown to drop empty table-pipe rows recovers most of it.
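Both workarounds are easy to script. The sketch below shows one way to compute page ranges for splitting and one way to drop empty table-pipe rows from markdown; the helper names are hypothetical, and the tool used to actually split the PDF is left open.

```python
import re


def page_ranges(total_pages, chunk_size):
    """Yield 1-based inclusive (first, last) page ranges covering the document."""
    for first in range(1, total_pages + 1, chunk_size):
        yield first, min(first + chunk_size - 1, total_pages)


# A row made only of pipes and whitespace, e.g. "|   |  |".
# Header separators such as "|---|---|" contain dashes and are kept.
_EMPTY_ROW = re.compile(r"^\s*\|[\s|]*$")


def drop_empty_table_rows(markdown):
    """Remove table rows whose cells are all empty."""
    kept = [line for line in markdown.splitlines() if not _EMPTY_ROW.match(line)]
    return "\n".join(kept)
```

For a 10-page scan and a chunk size of 4, page_ranges yields (1, 4), (5, 8), and (9, 10); pick a chunk size small enough that each ocr_pdf call finishes inside the timeout.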
License