MCP Mistral OCR Optimized


An optimized Model Context Protocol server for document OCR processing using Mistral AI with support for high-performance batch operations and async connection pooling. It enables efficient extraction of text and tables from local files or URLs into structured markdown and HTML formats while minimizing token costs.


🚀 Key Optimizations

| Feature | Benefit |
|---|---|
| Batch Processing API | Up to 50% cost reduction for large file sets |
| Async Connection Pooling | 20-30% faster processing for multiple files |
| Token-Efficient Defaults | include_images=False and table_format=markdown save 30-40% tokens |
| Concurrent Processing | Process up to 5 files simultaneously |
| Cross-Platform Paths | Works on Windows, macOS, Linux, and Docker |
| Configurable Parameters | Fine-tune OCR output with table_format, headers, footers |

📦 Installation

Using UV (Recommended)

# Navigate to project directory
cd D:/dev/mcp_mistral_ocr_opt

# Create and activate virtual environment
uv venv
# Windows
.venv\Scripts\activate
# Unix
source .venv/bin/activate

# Install dependencies
uv pip install .

Using Docker

# Build image
docker build -t mcp-mistral-ocr-opt .

# Run container
docker run -e MISTRAL_API_KEY=your_api_key \
           -v /path/to/your/files:/data/ocr \
           mcp-mistral-ocr-opt:latest

⚙️ Configuration

Environment Variables

Create or edit .env file:

# Required
MISTRAL_API_KEY=your_api_key_here
OCR_DIR=D:/dev/mcp_mistral_ocr_opt/data/ocr

# Optional - Batch Processing
BATCH_MODE=auto                  # auto, always, never
BATCH_MIN_FILES=5                # Use batch processing for 5+ files in auto mode
INLINE_BATCH_THRESHOLD=10        # Use inline batch for <10 files
MAX_CONCURRENT_REQUESTS=5        # Max concurrent API requests

# Optional - OCR Defaults (token optimization)
DEFAULT_TABLE_FORMAT=markdown    # null, markdown, or html
INCLUDE_IMAGES=false             # Default false for token efficiency
EXTRACT_HEADER=false             # Extract document headers
EXTRACT_FOOTER=false             # Extract document footers

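
The environment variables above can be read into a single typed configuration object at startup. The following is a minimal sketch, assuming plain `os.environ` access; the names `OCRConfig` and `load_config` are illustrative, not the server's actual internals.

```python
import os
from dataclasses import dataclass

@dataclass
class OCRConfig:
    """Server configuration assembled from environment variables."""
    api_key: str
    ocr_dir: str
    batch_mode: str = "auto"
    batch_min_files: int = 5
    inline_batch_threshold: int = 10
    max_concurrent_requests: int = 5
    table_format: str = "markdown"
    include_images: bool = False

def load_config(env=os.environ) -> OCRConfig:
    # MISTRAL_API_KEY has no default; fail fast if it is missing
    api_key = env.get("MISTRAL_API_KEY")
    if not api_key:
        raise ValueError("Configuration error: MISTRAL_API_KEY is required")
    return OCRConfig(
        api_key=api_key,
        ocr_dir=env.get("OCR_DIR", "/data/ocr"),
        batch_mode=env.get("BATCH_MODE", "auto"),
        batch_min_files=int(env.get("BATCH_MIN_FILES", "5")),
        inline_batch_threshold=int(env.get("INLINE_BATCH_THRESHOLD", "10")),
        max_concurrent_requests=int(env.get("MAX_CONCURRENT_REQUESTS", "5")),
        table_format=env.get("DEFAULT_TABLE_FORMAT", "markdown"),
        include_images=env.get("INCLUDE_IMAGES", "false").lower() == "true",
    )
```

Only the API key is mandatory; every other variable falls back to the token-efficient defaults listed above.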
Claude Desktop Configuration

Add to claude_desktop_config.json:

{
  "mcpServers": {
    "mistral-ocr-opt": {
      "command": "uv",
      "args": [
        "run",
        "--directory",
        "D:/dev/mcp_mistral_ocr_opt",
        "-m",
        "src.mcp_mistral_ocr_opt.main"
      ],
      "env": {
        "MISTRAL_API_KEY": "your_api_key_here",
        "OCR_DIR": "D:/dev/mcp_mistral_ocr_opt/data/ocr",
        "BATCH_MODE": "auto"
      }
    }
  }
}

🛠️ Available Tools

1. process_local_file - Process a single file

Process a single local file from OCR_DIR.

{
  "name": "process_local_file",
  "arguments": {
    "filename": "document.pdf",
    "table_format": "markdown",
    "extract_header": false,
    "extract_footer": false,
    "include_images": false
  }
}

Parameters:

  • filename (required): Name of file relative to OCR_DIR
  • table_format (optional): null, markdown, or html - default: markdown
  • extract_header (optional): Extract document headers - default: false
  • extract_footer (optional): Extract document footers - default: false
  • include_images (optional): Include base64 images - default: false (token efficient)

Supported local file types:

  • PDFs: .pdf
  • Images: .jpg, .jpeg, .png, .gif, .webp, .bmp, .avif
  • Other formats (docx/xlsx/pptx) are not supported
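
A client can pre-filter files before calling the tool. This is a small sketch of such a check, assuming the extension list above is exhaustive; `is_supported` is a hypothetical helper, not part of the server's API.

```python
from pathlib import Path

# Extensions accepted by process_local_file, per the list above
SUPPORTED_EXTENSIONS = {
    ".pdf",
    ".jpg", ".jpeg", ".png", ".gif", ".webp", ".bmp", ".avif",
}

def is_supported(filename: str) -> bool:
    """Return True if the file extension is one the server accepts."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS
```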

2. process_batch_local_files - Process multiple files concurrently

Process multiple files with concurrent or batch processing (auto-selected).

{
  "name": "process_batch_local_files",
  "arguments": {
    "patterns": ["*.pdf", "scanned_*.jpg"],
    "max_files": 100,
    "table_format": "markdown",
    "include_images": false
  }
}

Parameters:

  • patterns (required): Array of glob patterns (e.g., ["*.pdf", "*.jpg"])
  • max_files (optional): Maximum files to process
  • Other parameters same as process_local_file

Auto-selection Logic:

  • < 5 files: Concurrent processing
  • 5-9 files: Inline batch (if BATCH_MODE=auto)
  • 10+ files: File batch (saves up to 50% cost)
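
The auto-selection rules above can be sketched as a small decision function. This is an illustration of the documented thresholds (`BATCH_MODE`, `BATCH_MIN_FILES`, `INLINE_BATCH_THRESHOLD`), not the server's actual implementation.

```python
def select_processing_mode(file_count: int,
                           batch_mode: str = "auto",
                           batch_min_files: int = 5,
                           inline_batch_threshold: int = 10) -> str:
    """Pick a strategy: 'concurrent', 'inline_batch', or 'file_batch'."""
    if batch_mode == "never":
        return "concurrent"
    if batch_mode == "always":
        return "file_batch"
    # auto mode: thresholds mirror BATCH_MIN_FILES / INLINE_BATCH_THRESHOLD
    if file_count < batch_min_files:
        return "concurrent"
    if file_count < inline_batch_threshold:
        return "inline_batch"
    return "file_batch"
```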

3. process_url_file - Process file from URL

Process a file from a public URL.

{
  "name": "process_url_file",
  "arguments": {
    "url": "https://example.com/document.pdf",
    "file_type": "pdf",
    "table_format": "html"
  }
}

4. create_batch_job - Create explicit batch job

Create a batch processing job (for large file sets, cost savings up to 50%).

{
  "name": "create_batch_job",
  "arguments": {
    "patterns": ["documents/*.pdf"],
    "use_inline": false,
    "table_format": "markdown"
  }
}

Returns:

{
  "batch_type": "file",
  "job_id": "job_abc123",
  "batch_file_id": "file_xyz789",
  "files_queued": 50,
  "message": "Batch job created with 50 files. Use check_batch_status to monitor progress."
}

5. check_batch_status - Monitor batch job

{
  "name": "check_batch_status",
  "arguments": {
    "job_id": "job_abc123"
  }
}

Returns:

{
  "id": "job_abc123",
  "status": "SUCCESS",
  "created_at": "2026-01-22T12:00:00",
  "completed_at": "2026-01-22T12:05:00"
}

6. download_batch_results - Download completed results

{
  "name": "download_batch_results",
  "arguments": {
    "job_id": "job_abc123"
  }
}

7. cancel_batch_job - Cancel running job

{
  "name": "cancel_batch_job",
  "arguments": {
    "job_id": "job_abc123"
  }
}

8. list_batch_jobs - List all batch jobs

{
  "name": "list_batch_jobs",
  "arguments": {
    "status": "RUNNING"
  }
}

📊 Output

OCR results are saved in JSON format in OCR_DIR/output/:

  • Single files: {filename}_{timestamp}.json
  • Batch results: batch_results_{job_id}_{timestamp}.jsonl

Result structure:

{
  "pages": [
    {
      "index": 0,
      "markdown": "Extracted text content...",
      "images": [],
      "tables": [],
      "hyperlinks": [],
      "dimensions": {"width": 0, "height": 0}
    }
  ],
  "model": "mistral-ocr-latest",
  "usage_info": {...},
  "_metadata": {
    "source_file": "/path/to/document.pdf",
    "output_file": "/path/to/output.json",
    "file_type": "pdf",
    "processed_at": "2026-01-22T12:00:00",
    "table_format": "markdown",
    "include_images": false
  }
}

🎯 Usage Examples

Example 1: Process a single PDF with tables

{
  "name": "process_local_file",
  "arguments": {
    "filename": "invoice.pdf",
    "table_format": "html",
    "include_images": false
  }
}

Example 2: Process all PDFs in directory with batch

{
  "name": "process_batch_local_files",
  "arguments": {
    "patterns": ["*.pdf"],
    "table_format": "markdown"
  }
}

Example 3: Create explicit batch job for 100+ documents

{
  "name": "create_batch_job",
  "arguments": {
    "patterns": ["documents/**/*.pdf"],
    "use_inline": false,
    "table_format": "html",
    "extract_header": true,
    "extract_footer": true
  }
}

Then monitor:

{
  "name": "check_batch_status",
  "arguments": {
    "job_id": "job_abc123"
  }
}

And download when complete:

{
  "name": "download_batch_results",
  "arguments": {
    "job_id": "job_abc123"
  }
}
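
The create/monitor/download workflow above can be automated with a simple polling loop. This is a sketch only: `call_tool` is a hypothetical helper that invokes an MCP tool by name and returns its JSON result as a dict; substitute your MCP client's equivalent, and note that the terminal status names other than SUCCESS are assumptions.

```python
import time

def wait_for_batch(call_tool, job_id, poll_seconds=10, timeout=3600):
    """Poll check_batch_status until the job finishes, then download results."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = call_tool("check_batch_status", {"job_id": job_id})["status"]
        if status == "SUCCESS":
            return call_tool("download_batch_results", {"job_id": job_id})
        if status in ("FAILED", "CANCELLED"):
            raise RuntimeError(f"Batch job {job_id} ended with status {status}")
        time.sleep(poll_seconds)  # wait before the next status check
    raise TimeoutError(f"Batch job {job_id} did not finish within {timeout}s")
```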

🔧 Performance Tips

Token Optimization

  1. Set include_images=false (default) - saves 30-40% tokens
  2. Use table_format="markdown" (default) - more efficient than HTML
  3. Skip extract_header/extract_footer unless needed

Cost Optimization

  1. Use batch processing for 10+ files (up to 50% cost savings)
  2. Set BATCH_MODE=always for large recurring batches
  3. Use max_files to limit processing if needed

Speed Optimization

  1. Increase MAX_CONCURRENT_REQUESTS (default: 5, max: 10)
  2. Use inline batch for 5-9 files (faster startup)
  3. Enable BATCH_MODE=auto (default) for auto-selection
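
The concurrency cap behind `MAX_CONCURRENT_REQUESTS` is typically an asyncio semaphore around the per-file call. A minimal sketch of that pattern, assuming an async `process_one` coroutine supplied by the caller:

```python
import asyncio

async def process_all(filenames, process_one, max_concurrent=5):
    """Run process_one over many files, at most max_concurrent at a time."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def bounded(name):
        # Acquire a slot before issuing the API request; release on exit
        async with semaphore:
            return await process_one(name)

    # gather preserves input order in its results
    return await asyncio.gather(*(bounded(n) for n in filenames))
```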

📈 Performance Benchmarks

| Scenario | Old Version | Optimized | Improvement |
|---|---|---|---|
| 10 files (concurrent) | 45s | 12s | 4x faster |
| 100 files (batch) | $5.00 | $2.50 | 50% cheaper |
| With images (tokens) | 100% | 60% | 40% fewer tokens |
| PDF processing (API calls) | 300 | 100 | 3x fewer calls |

▶️ Run via UV

# Run the test suite
uv run pytest

# Run tests with a coverage report
uv run pytest --cov=src --cov-report=term-missing

# Start the server
uv run python -m src.mcp_mistral_ocr_opt.main

🐳 Docker Support

Build Image

docker build -t mcp-mistral-ocr-opt .

Run Container

docker run -e MISTRAL_API_KEY=your_key \
           -e OCR_DIR=/data/ocr \
           -v $(pwd)/data/ocr:/data/ocr \
           mcp-mistral-ocr-opt:latest

Docker Compose

version: '3.8'
services:
  mistral-ocr:
    image: mcp-mistral-ocr-opt:latest
    environment:
      MISTRAL_API_KEY: ${MISTRAL_API_KEY}
      OCR_DIR: /data/ocr
      BATCH_MODE: auto
      MAX_CONCURRENT_REQUESTS: 5
    volumes:
      - ./data/ocr:/data/ocr
    restart: unless-stopped

🤝 Migration from Original

If migrating from the original mcp-mistral-ocr:

  1. API Key: Same key works
  2. Tools: All original tools still work
  3. New Tools: Batch tools added (optional to use)
  4. Defaults: More token-efficient by default

No code changes required for basic usage!

📝 Troubleshooting

Issue: "Configuration error: MISTRAL_API_KEY is required"

Solution: Add MISTRAL_API_KEY=your_key to .env file

Issue: "File not found"

Solution: Check OCR_DIR path in .env and ensure files are in that directory

Issue: "Batch job stuck in QUEUED"

Solution: Check Mistral dashboard or try cancel_batch_job and retry

Issue: Connection errors

Solution: Verify internet connection and API key is valid

📄 License

Based on the original mcp-mistral-ocr project.
