Document Intelligence MCP Server

Document Intelligence MCP Server

Enables Claude to read and analyze PDF documents with automatic OCR processing for scanned files. Features intelligent text extraction, caching for performance, and secure file access with search capabilities.

Category
Visit Server

README

Building an MCP Server with OCR: From Setup Struggles to Document Intelligence- Proof of Concept

A real-world journey of setting up Anthropic's Model Context Protocol server with advanced PDF processing capabilities

The Challenge: Making Claude Desktop Read Your Documents

Ever wished your AI assistant could read through your scanned PDFs, legal documents, or HOA covenants without you having to manually extract text? That's exactly what we set out to accomplish in this session - building a custom MCP (Model Context Protocol) server that not only reads PDFs but intelligently handles scanned documents using OCR.

Starting Point: Following the Official Tutorial

We began by following Anthropic's official MCP server tutorial to build a basic weather server. What seemed like a straightforward process quickly became a Windows-specific debugging adventure.

The Setup Struggles

Problem #1: Missing Configuration File The first hurdle was the infamous "cannot find claude_desktop_config.json" error. This configuration file doesn't exist by default - you need to create it manually in the right location:

  • Windows: %APPDATA%\Claude\claude_desktop_config.json
  • macOS: ~/Library/Application Support/Claude/claude_desktop_config.json

Problem #2: UV Package Manager Permissions The tutorial requires uv (a fast Python package manager), but our initial installation had permission issues. The solution was using PowerShell as Administrator with the bypass flag:

powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"

Problem #3: Project Scripts Configuration Even with uv working, the server couldn't run because the pyproject.toml was missing the crucial [project.scripts] section:

[project.scripts]
weather = "weather:main"

Building the Weather Server Foundation

Once we overcame the setup issues, we had a basic weather server running with these tools:

  • get_forecast - Simulated weather forecasts for any city
  • get_alerts - Weather alerts for US states

The server successfully connected to Claude Desktop, proving our MCP infrastructure was working.

The Real Goal: Document Intelligence

With the foundation in place, we tackled the main objective - adding PDF reading capabilities with OCR support for scanned documents. This required several components:

1. Basic PDF Reading

Using PyPDF2 for extractable text PDFs:

def extract_pdf_text(file_path: str, page_numbers: list[int] = None):
    # Extract text from regular PDFs

2. OCR Integration

For scanned documents, we integrated:

  • pytesseract - Python wrapper for Google's Tesseract OCR engine
  • pdf2image - Converts PDF pages to images for OCR processing
  • Pillow - Image processing library

3. Intelligent Detection System

The server automatically determines whether a PDF needs OCR:

def has_extractable_text(file_path: str) -> bool:
    # Checks if PDF has meaningful extractable text
    # Falls back to OCR for scanned documents

4. Caching System

Perhaps the most valuable feature - OCR results are cached to avoid reprocessing:

  • Cached files use naming pattern: document_ocr_[hash].txt
  • Hash ensures cache invalidation when source PDF changes
  • Dramatically improves performance for repeat access

Security Considerations

The server includes built-in security measures:

  • Path Validation: Only allows access to predefined directories
  • File Type Restrictions: Limited to PDF files
  • Permission Checks: Validates file access before processing
ALLOWED_PDF_DIRECTORIES = [
    "/path/to/your/documents",
    "/path/to/your/pdfs", 
    "/path/to/your/downloads"
]

Real-World Test: HOA Document Analysis

To validate our system, we processed actual HOA covenant documents:

  • Input: 5.1 MB scanned PDF with 40+ pages
  • Processing: Full OCR extraction and caching
  • Output: Complete one-page summary of key provisions
  • Result: Instant future access via cached text

The system successfully identified property restrictions, assessment procedures, architectural controls, and enforcement mechanisms from a complex legal document.

Final MCP Tools Arsenal

Our completed server provides these capabilities:

Document Tools:

  • read_pdf - Read entire documents or specific pages with automatic OCR
  • list_pdfs - Inventory available documents with scan/cache status
  • search_pdf_content - Full-text search within documents

Weather Tools:

  • get_forecast - Weather forecasts for any location
  • get_alerts - Weather alerts by state

Smart Features:

  • Automatic scanned PDF detection
  • Intelligent OCR fallback
  • Persistent caching system
  • Security-first file access

Lessons Learned

1. Windows Development Gotchas

  • PowerShell execution policies can block installations
  • Path separators matter in configuration files
  • Permission issues are common with package managers

2. OCR Implementation Insights

  • System dependencies (Tesseract, Poppler) are required
  • Caching is essential for practical OCR performance
  • Hybrid approach (text extraction + OCR fallback) works best

3. MCP Architecture Benefits

  • Modular tool design allows easy capability expansion
  • Security model provides controlled file system access
  • Integration with Claude Desktop creates seamless user experience

Performance Impact

The caching system provides dramatic performance improvements:

  • First Access: ~30-60 seconds for OCR processing
  • Subsequent Access: <1 second from cache
  • Storage Overhead: ~10-20% of original PDF size for text cache

What's Next?

This foundation opens up numerous possibilities:

  • Integration with cloud OCR services for better accuracy
  • Support for additional document formats (DOCX, images)
  • Semantic search using embeddings
  • Document comparison and analysis tools
  • Automated summarization and extraction pipelines

Code Availability

The complete MCP server code includes:

  • Comprehensive error handling
  • Type hints throughout
  • Detailed documentation
  • Production-ready security measures
  • Extensible architecture for additional tools

Conclusion

Building this MCP server transformed a basic tutorial into a powerful document intelligence system. What started as debugging configuration issues evolved into a practical tool for extracting insights from scanned legal documents.

The real value isn't just in the technical implementation - it's in democratizing access to document analysis. Now anyone can ask their AI assistant to "summarize my HOA covenants" or "what are the key restrictions in my lease?" and get instant, accurate responses from scanned PDFs.

The journey from setup struggles to document intelligence showcases both the power of the MCP architecture and the practical challenges of real-world AI development. Sometimes the best learning happens when things don't work as expected.


This MCP server demonstrates the potential of combining traditional document processing with modern AI capabilities. By handling the technical complexities behind the scenes, we enable natural language interaction with complex documents - transforming how people access and understand their important paperwork.

Recommended Servers

playwright-mcp

playwright-mcp

A Model Context Protocol server that enables LLMs to interact with web pages through structured accessibility snapshots without requiring vision models or screenshots.

Official
Featured
TypeScript
Magic Component Platform (MCP)

Magic Component Platform (MCP)

An AI-powered tool that generates modern UI components from natural language descriptions, integrating with popular IDEs to streamline UI development workflow.

Official
Featured
Local
TypeScript
Audiense Insights MCP Server

Audiense Insights MCP Server

Enables interaction with Audiense Insights accounts via the Model Context Protocol, facilitating the extraction and analysis of marketing insights and audience data including demographics, behavior, and influencer engagement.

Official
Featured
Local
TypeScript
VeyraX MCP

VeyraX MCP

Single MCP tool to connect all your favorite tools: Gmail, Calendar and 40 more.

Official
Featured
Local
graphlit-mcp-server

graphlit-mcp-server

The Model Context Protocol (MCP) Server enables integration between MCP clients and the Graphlit service. Ingest anything from Slack to Gmail to podcast feeds, in addition to web crawling, into a Graphlit project - and then retrieve relevant contents from the MCP client.

Official
Featured
TypeScript
Kagi MCP Server

Kagi MCP Server

An MCP server that integrates Kagi search capabilities with Claude AI, enabling Claude to perform real-time web searches when answering questions that require up-to-date information.

Official
Featured
Python
E2B

E2B

Using MCP to run code via e2b.

Official
Featured
Neon Database

Neon Database

MCP server for interacting with Neon Management API and databases

Official
Featured
Exa Search

Exa Search

A Model Context Protocol (MCP) server lets AI assistants like Claude use the Exa AI Search API for web searches. This setup allows AI models to get real-time web information in a safe and controlled way.

Official
Featured
Qdrant Server

Qdrant Server

This repository is an example of how to create a MCP server for Qdrant, a vector search engine.

Official
Featured