OwlOCR MCP
A macOS-based MCP server that enables high-accuracy text extraction from PDF and image files using the OwlOCR app or Apple's Vision Framework. It supports multi-language OCR and provides asynchronous tools for processing documents directly within MCP clients.
MCP (Model Context Protocol) server for PDF and image OCR on macOS. Supports two backends:
- OwlOCR CLI - Higher accuracy (recommended)
- Vision Framework - No external dependencies
Features
- PDF OCR - Extract text from PDF files page by page with separators
- Image OCR - Extract text from PNG, JPEG, and other image formats
- Multi-language - Korean + English by default (configurable)
- Dual Backend - Auto-selects OwlOCR if available, falls back to Vision Framework
- Async - Non-blocking execution for MCP clients
Benchmark Results
Tested on a 4-page Korean theological document with Hebrew text:
| Metric | Vision Framework | OwlOCR CLI |
|---|---|---|
| Time | 9.87s | 9.30s |
| Time/Page | 2.47s | 2.33s |
| Word Accuracy | 85.62% | 91.79% |
| Character Accuracy | 94.46% | 95.07% |
Winner: OwlOCR CLI. It was slightly faster (9.30s vs. 9.87s total) and more accurate, most visibly in word accuracy (91.79% vs. 85.62%).
Requirements
- macOS (uses Apple Vision Framework / OwlOCR.app)
- Python 3.11+
- OwlOCR.app (optional, for better accuracy)
Installation
Using uv (recommended)
git clone https://github.com/yourusername/owlocr-mcp.git
cd owlocr-mcp
uv sync
Using pip
git clone https://github.com/yourusername/owlocr-mcp.git
cd owlocr-mcp
pip install -e .
MCP Client Configuration
Claude Desktop
Add to ~/Library/Application Support/Claude/claude_desktop_config.json:
{
"mcpServers": {
"owlocr": {
"command": "uv",
"args": ["run", "--directory", "/path/to/owlocr-mcp", "owlocr-mcp"]
}
}
}
Generic MCP Client
{
"mcpServers": {
"owlocr": {
"command": "/path/to/owlocr-mcp/.venv/bin/python",
"args": ["-m", "owlocr_mcp.server"]
}
}
}
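If you would rather script this edit than change the JSON by hand, the following is a minimal sketch that merges an `owlocr` entry into an existing client config without discarding other servers. The helper name `add_owlocr_server` is hypothetical and not part of this project; the command and arguments mirror the Claude Desktop entry shown earlier.

```python
import json
from pathlib import Path


def add_owlocr_server(config_path: Path, project_dir: str) -> dict:
    """Merge an "owlocr" entry into an MCP client config (hypothetical helper)."""
    # Start from the existing file so other configured servers are preserved.
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    servers = config.setdefault("mcpServers", {})
    # Same command and args as the Claude Desktop example in this README.
    servers["owlocr"] = {
        "command": "uv",
        "args": ["run", "--directory", project_dir, "owlocr-mcp"],
    }
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

For Claude Desktop, `config_path` would be `Path.home() / "Library/Application Support/Claude/claude_desktop_config.json"` and `project_dir` the directory you cloned the repository into.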
Available Tools
ocr_pdf_to_text
Extract text from a PDF file.
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| pdf_path | string | required | Absolute path to the PDF file |
| pages | list[int] | null | Page numbers to process (1-based). If null, all pages |
| dpi | int | 200 | Resolution for rendering. Higher = better quality but slower |
| backend | string | "auto" | "auto", "owlocr", or "vision" |
| languages | list[string] | null | Language codes (Vision only). Default: ["ko-KR", "en-US"] |
Example:
Extract text from /Users/me/document.pdf using OwlOCR
Output:
첫 번째 페이지 내용...
===== Page 2 =====
두 번째 페이지 내용...
--- OCR Complete: 2 page(s) processed using OwlOCR CLI ---
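If a client needs the text page by page, the output can be split on the separators. This is a minimal sketch that assumes the format is exactly what the sample above shows: `===== Page N =====` header lines between pages and a trailing `--- OCR Complete: ... ---` line. The function name `split_pages` is illustrative and not part of the server.

```python
import re

# Patterns inferred from the sample output above; adjust if the server's format differs.
PAGE_HEADER = re.compile(r"^===== Page \d+ =====$", re.MULTILINE)
FOOTER = re.compile(r"^--- OCR Complete: .* ---\s*$", re.MULTILINE)


def split_pages(output: str) -> list[str]:
    """Return the text of each page, in order, without separators or the footer."""
    body = FOOTER.sub("", output)
    chunks = PAGE_HEADER.split(body)
    # Drop empty chunks, e.g. when the output starts with a page header.
    return [chunk.strip() for chunk in chunks if chunk.strip()]
```

Note that a page with no recognized text would be dropped by this sketch, so list positions are not guaranteed to match page numbers.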
ocr_image_to_text
Extract text from an image file.
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| image_path | string | required | Absolute path to the image file |
| backend | string | "auto" | "auto", "owlocr", or "vision" |
| languages | list[string] | null | Language codes (Vision only) |
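The tool can also be called programmatically. The sketch below uses the MCP Python SDK's stdio client and assumes the server is launched the same way as in the Claude Desktop configuration. The helper `image_ocr_arguments` and the wrapper `ocr_image` are hypothetical names introduced for illustration; only the tool name and its parameters come from the table above.

```python
import os

VALID_BACKENDS = {"auto", "owlocr", "vision"}


def image_ocr_arguments(image_path, backend="auto", languages=None):
    """Build the arguments dict for ocr_image_to_text (hypothetical helper)."""
    if not os.path.isabs(image_path):
        raise ValueError("image_path must be an absolute path")
    if backend not in VALID_BACKENDS:
        raise ValueError(f"backend must be one of {sorted(VALID_BACKENDS)}")
    args = {"image_path": image_path, "backend": backend}
    if languages is not None:
        args["languages"] = list(languages)  # only used by the Vision backend
    return args


async def ocr_image(project_dir: str, image_path: str) -> str:
    """Start the server over stdio and call ocr_image_to_text once."""
    # Imported lazily so the helper above works without the SDK installed.
    from mcp import ClientSession, StdioServerParameters
    from mcp.client.stdio import stdio_client

    params = StdioServerParameters(
        command="uv", args=["run", "--directory", project_dir, "owlocr-mcp"]
    )
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.call_tool(
                "ocr_image_to_text", image_ocr_arguments(image_path)
            )
            # Tool results arrive as content items; keep only the text ones.
            return "\n".join(c.text for c in result.content if c.type == "text")
```

From a script, this could be run with `asyncio.run(ocr_image("/path/to/owlocr-mcp", "/Users/me/scan.png"))`, where both paths are placeholders.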
check_ocr_backends
Check available OCR backends on the system.
Output:
OCR Backend Status:
✅ Vision Framework: Available (macOS built-in)
✅ OwlOCR CLI: Available (/Applications/OwlOCR.app)
Recommendation: Use backend='owlocr' for best accuracy
Backend Selection
| Backend | Accuracy | Speed | Requirements |
|---|---|---|---|
| owlocr | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | OwlOCR.app installed |
| vision | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | None (macOS built-in) |
| auto | Best available | - | Uses OwlOCR if available |
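The selection rule in the table can be sketched as follows. This is an illustration of the described behavior, not the code in server.py: the function name `resolve_backend` is hypothetical, and it assumes that OwlOCR counts as available when its CLI binary exists at the standard install path.

```python
from pathlib import Path

# Binary path taken from the "How It Works" section of this README.
OWLOCR_BINARY = Path("/Applications/OwlOCR.app/Contents/MacOS/OwlOCR")


def resolve_backend(requested: str = "auto", owlocr_binary: Path = OWLOCR_BINARY) -> str:
    """Map a requested backend to the one that will actually run (hypothetical helper)."""
    if requested not in ("auto", "owlocr", "vision"):
        raise ValueError(f"unknown backend: {requested!r}")
    owlocr_available = owlocr_binary.is_file()
    if requested == "auto":
        # Prefer OwlOCR for accuracy, fall back to the built-in Vision Framework.
        return "owlocr" if owlocr_available else "vision"
    if requested == "owlocr" and not owlocr_available:
        raise FileNotFoundError("OwlOCR.app not found")
    return requested
```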
Running the Benchmark
Compare backends on your own PDF:
# Both backends
uv run python benchmark.py /path/to/your.pdf
# With accuracy comparison (requires ground truth)
uv run python benchmark.py /path/to/your.pdf --show-text
# Specific backend only
uv run python benchmark.py /path/to/your.pdf --method owlocr
uv run python benchmark.py /path/to/your.pdf --method vision
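This README does not say how benchmark.py scores accuracy against ground truth. As a rough sketch only, one common approach is to align the OCR output with the reference text and report the share of reference items that were matched. The functions below are assumptions for illustration, built on Python's standard `difflib`, and may differ from the metric behind the numbers in the benchmark table.

```python
from difflib import SequenceMatcher


def _accuracy(reference, hypothesis) -> float:
    """Fraction of reference items that align with the hypothesis."""
    if not reference:
        return 1.0 if not hypothesis else 0.0
    matcher = SequenceMatcher(None, reference, hypothesis, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(reference)


def word_accuracy(reference: str, hypothesis: str) -> float:
    # Compare whitespace-separated tokens.
    return _accuracy(reference.split(), hypothesis.split())


def char_accuracy(reference: str, hypothesis: str) -> float:
    # Compare individual characters, including spaces.
    return _accuracy(reference, hypothesis)
```

Under this definition a single wrong character costs a whole word in `word_accuracy` but only one character in `char_accuracy`, which is one reason word scores tend to be lower than character scores.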
Project Structure
owlocr-mcp/
├── src/owlocr_mcp/
│   ├── __init__.py
│   ├── server.py          # MCP server with tools
│   ├── ocr.py             # Vision Framework backend
│   ├── ocr_owlocr.py      # OwlOCR CLI backend
│   └── pdf.py             # PDF processing utilities
├── benchmark.py           # Performance comparison script
├── pyproject.toml
└── README.md
How It Works
OwlOCR Backend
- Render PDF pages to PNG using pypdfium2
- Copy images to OwlOCR sandbox: ~/Library/Containers/JonLuca-DeCaro.OwlOCR/Data/tmp/
- Run CLI: /Applications/OwlOCR.app/Contents/MacOS/OwlOCR --cli --input <file>
- Combine results with page separators
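The steps above can be sketched roughly as follows. This is not the code in ocr_owlocr.py: all function names are hypothetical, the rendering step assumes pypdfium2 version 4 or later with Pillow installed, and it is an assumption that the CLI prints the recognized text to stdout. The separator format is inferred from the sample output of ocr_pdf_to_text.

```python
import shutil
import subprocess
from pathlib import Path

OWLOCR_BINARY = "/Applications/OwlOCR.app/Contents/MacOS/OwlOCR"
SANDBOX_TMP = Path.home() / "Library/Containers/JonLuca-DeCaro.OwlOCR/Data/tmp"


def render_pages(pdf_path: Path, out_dir: Path, dpi: int = 200) -> list[Path]:
    """Step 1: render each PDF page to a PNG file."""
    import pypdfium2 as pdfium  # lazy import; only needed for rendering

    pdf = pdfium.PdfDocument(str(pdf_path))
    paths = []
    for index in range(len(pdf)):
        bitmap = pdf[index].render(scale=dpi / 72)  # PDF units are 1/72 inch
        out = out_dir / f"page_{index + 1}.png"
        bitmap.to_pil().save(out)
        paths.append(out)
    return paths


def owlocr_command(image_path: Path) -> list[str]:
    """Step 3: the CLI invocation documented in this README."""
    return [OWLOCR_BINARY, "--cli", "--input", str(image_path)]


def ocr_image_with_owlocr(image_path: Path, sandbox: Path = SANDBOX_TMP) -> str:
    """Steps 2 and 3: stage the image inside the sandbox, then run the CLI."""
    sandbox.mkdir(parents=True, exist_ok=True)
    staged = sandbox / image_path.name
    shutil.copy2(image_path, staged)
    try:
        # Assumption: recognized text is written to stdout.
        done = subprocess.run(
            owlocr_command(staged), capture_output=True, text=True, check=True
        )
        return done.stdout.strip()
    finally:
        staged.unlink(missing_ok=True)  # do not leave copies in the sandbox


def combine_pages(texts: list[str]) -> str:
    """Step 4: join page texts, with a header line before every page after the first."""
    parts = []
    for number, text in enumerate(texts, start=1):
        parts.append(text if number == 1 else f"===== Page {number} =====\n{text}")
    return "\n".join(parts)
```

Staging the file inside the sandbox container is what avoids the file picker dialog described under Troubleshooting.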
Vision Framework Backend
- Render PDF pages to PNG using pypdfium2
- Load as CIImage via PyObjC
- Create VNRecognizeTextRequest with accurate recognition level
- Process with VNImageRequestHandler
- Sort results by position and combine
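The final sorting step deserves a short illustration, because Vision reports bounding boxes in normalized coordinates with the origin at the bottom-left corner, so a larger y value means higher on the page. The sketch below is one plausible way to turn such boxes into reading order. The `TextBox` type, the `reading_order` function, and the line-grouping tolerance are assumptions for illustration and may differ from what ocr.py does.

```python
from typing import NamedTuple


class TextBox(NamedTuple):
    text: str
    x: float       # left edge, normalized 0..1
    y: float       # bottom edge, normalized 0..1 (origin is bottom-left)
    height: float  # box height, normalized 0..1


def reading_order(boxes: list[TextBox], line_tolerance: float = 0.5) -> str:
    """Group boxes into lines from top to bottom, then read each line left to right."""
    lines: list[list[TextBox]] = []
    # Larger y is higher on the page, so sort by descending y.
    for box in sorted(boxes, key=lambda b: -b.y):
        # Same line if the vertical offset from the line's first box is small.
        if lines and abs(lines[-1][0].y - box.y) <= line_tolerance * box.height:
            lines[-1].append(box)
        else:
            lines.append([box])
    return "\n".join(
        " ".join(b.text for b in sorted(line, key=lambda b: b.x)) for line in lines
    )
```

Grouping by line before sorting by x keeps words on the same visual line together even when their baselines differ slightly.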
Troubleshooting
"OwlOCR.app not found"
Install OwlOCR from owlocr.com or use backend="vision".
File picker dialog appears
This happens when OwlOCR can't access files outside its sandbox. The MCP server handles this by copying files to the sandbox temp directory automatically.
Poor accuracy on specific languages
For Vision Framework, specify languages explicitly:
ocr_pdf_to_text(pdf_path, languages=["ja-JP", "en-US"])
Supported language codes include ko-KR, en-US, ja-JP, zh-Hans, and zh-Hant.
License
MIT License - see LICENSE file.
Acknowledgments
- OwlOCR by JonLuca DeCaro
- MCP Python SDK
- Apple Vision Framework