mcp-local-reader

mcp-local-reader

Transform any local file into AI-optimized markdown format for seamless integration with Claude Desktop, Claude Code, and other MCP clients.

Category
Visit Server

README

MCP-LOCAL-Reader

License: MIT Python 3.11+ FastMCP

中文版 | 日本語 | Français | Deutsch

AI-Ready Document Converter - Transform any local file into AI-optimized markdown format for seamless integration with Claude Desktop, Claude Code, and other MCP clients.

Intelligent Document Processing - High-performance local file content extraction with advanced parsing for PDF, Office documents, images, and more. Automatically converts complex documents into clean, structured markdown that AI models can easily understand and process.

Features

📄 AI-Optimized File Processing

  • PDF Documents: Advanced parsing with PyMuPDF4LLM → Clean markdown output
  • Office Suite: Word, Excel, PowerPoint → Structured tables and text
  • OpenDocument: ODT, ODS, ODP → Standardized markdown format
  • Text & Data: Markdown, JSON, CSV, EPUB → Enhanced AI readability
  • Images: OCR text recognition → Searchable markdown content
  • Archives: Smart extraction → Organized document collections

🚀 Intelligent Performance

  • Smart Caching: Remembers processed files for instant re-access
  • Lazy Loading: Only loads needed components - 80% faster startup
  • Concurrent Processing: Handles multiple files simultaneously
  • Resource Optimization: Prevents system overload with smart limits

🔒 Security & Control

  • Directory Permissions: Restrict access to specific directories
  • Path Validation: Secure file access with absolute path requirements
  • File Size Limits: Prevent DoS with configurable size restrictions
  • Local-First: No data leaves your machine - complete privacy

Quick Start

Prerequisites

Installation

Option 1: One-Command Setup (Recommended)

# Clone and auto-configure
git clone https://github.com/freefish1218/mcp-local-reader.git
cd mcp-local-reader
chmod +x install.sh && ./install.sh

The installer will guide you through three installation modes:

  1. Minimal: PDF and basic text files only (smallest footprint)
  2. Standard: Office documents support, no OCR (recommended)
  3. Complete: All features including OCR and archive processing

Option 2: Manual Installation

# Install uv package manager
curl -LsSf https://astral.sh/uv/install.sh | sh

# Setup project
git clone https://github.com/freefish1218/mcp-local-reader.git
cd mcp-local-reader
uv sync

# Configure environment
cp env.example .env
# Edit .env with your settings

# Start server
./start_mcp.sh

Configuration for Claude Desktop

Automatic Configuration

chmod +x configure_claude.sh && ./configure_claude.sh

Manual Configuration

Edit ~/Library/Application Support/Claude/claude_desktop_config.json (macOS) or equivalent:

{
  "mcpServers": {
    "local-reader": {
      "command": "/absolute/path/to/mcp-local-reader/start_mcp.sh",
      "args": [],
      "env": {
        "LOCAL_FILE_ALLOWED_DIRECTORIES": "/Users/username/Documents,/Users/username/Downloads"
      }
    }
  }
}

Configuration for Claude Code

Add to .claude/claude_config.json:

{
  "mcpServers": {
    "local-reader": {
      "command": "/absolute/path/to/mcp-local-reader/start_mcp.sh",
      "args": [],
      "env": {
        "LOCAL_FILE_ALLOWED_DIRECTORIES": "/Users/username/Documents,/Users/username/Downloads"
      }
    }
  }
}

Usage

After setup, use these features directly in conversations:

📄 Read & Convert to AI-Ready Markdown

Transform any file into AI-optimized markdown format:

Read the content from /Users/username/Documents/report.pdf
→ Converts to clean markdown with tables, headings, and structure

Parse /Users/username/data.xlsx and show me the data structure  
→ Extracts spreadsheet data as markdown tables

Extract text from /Users/username/presentation.pptx
→ Organizes slides into structured markdown sections

🔄 Save as Markdown Files

Convert and save documents as AI-ready markdown files:

Convert /Users/username/contract.pdf to markdown format
→ Creates contract.pdf.md with structured content

Save /Users/username/analysis.xlsx as markdown in /Users/username/output/
→ Saves formatted tables and data as markdown

Configuration

Essential Settings (.env)

# File access control (REQUIRED)
LOCAL_FILE_ALLOWED_DIRECTORIES=/Users/username/Documents,/Users/username/Downloads

# Performance optimization
TOTAL_CACHE_SIZE_MB=500          # Unified cache limit
CACHE_EXPIRE_DAYS=30             # Cache retention
FILE_READER_MAX_FILE_SIZE_MB=20  # File size limit

# Logging
LOG_LEVEL=INFO

Optional OCR Settings

For image text recognition:

# Vision model for OCR
LLM_VISION_BASE_URL=https://api.openai.com/v1
LLM_VISION_API_KEY=sk-your-api-key-here
LLM_VISION_MODEL=gpt-4o  # or qwen-vl-plus

Environment Variables

Variable Required Default Description
LOCAL_FILE_ALLOWED_DIRECTORIES current_dir Comma-separated allowed directories
TOTAL_CACHE_SIZE_MB 500 Unified cache size limit
FILE_READER_MAX_FILE_SIZE_MB 20 Maximum file size
LOG_LEVEL INFO Logging level
LLM_VISION_API_KEY - OCR vision model API key

MCP Tools

read_local_file

Extract content from local files and return as AI-optimized markdown.

Parameter Type Description
file_path string Absolute path to the file
max_size number File size limit in MB (optional)

convert_local_file

Convert files to AI-ready markdown and save to filesystem.

Parameter Type Description
file_path string Absolute path to input file
output_path string Output path (optional, defaults to input+.md)
max_size number File size limit in MB (optional)
overwrite boolean Overwrite existing files (default: false)

Supported File Types

Document Formats

  • PDF: .pdf
  • Microsoft Office: .doc, .docx, .ppt, .pptx, .xls, .xlsx
  • OpenDocument: .odt, .ods, .odp
  • Text: .txt, .md, .rtf, .csv, .json, .xml

Image Formats (with OCR)

  • Common: .png, .jpg, .jpeg, .gif, .bmp, .tiff
  • Advanced: .webp, .svg

Archive Formats

  • Compressed: .zip, .tar, .tar.gz, .7z
  • Office: .docx, .xlsx, .pptx (internally zip-based)

Special Formats

  • E-books: .epub
  • Data: .csv, .tsv, .json

Architecture

Core Components

  • FileReader (src/file_reader/core.py): Main orchestrator for file content extraction
  • MCP Server (src/mcp_server.py): FastMCP-based server providing MCP tools
  • Parser System (src/file_reader/parsers/): Specialized parsers for different file types
  • Cache Manager (src/file_reader/cache_manager.py): Unified caching system
  • Storage Layer (src/file_reader/storage/): Secure local file access

Performance Optimizations

  1. Unified Caching: Single cache instance instead of multiple (reduced from ~6GB to 500MB default)
  2. Lazy Loading: Parsers loaded on-demand, not at startup
  3. Dependency Optimization: Optional dependencies for advanced features
  4. Resource Limits: Configurable memory and file size limits

Development

Setup Development Environment

git clone https://github.com/freefish1218/mcp-local-reader.git
cd mcp-local-reader
uv sync
source .venv/bin/activate  # On Unix/macOS

Running Tests

# Run all tests
uv run python tests/run_tests.py

# Specific test categories
uv run python tests/run_tests.py --models     # Data models
uv run python tests/run_tests.py --parsers    # File parsers
uv run python tests/run_tests.py --core       # Core functionality
uv run python tests/run_tests.py --server     # MCP server

# With coverage
uv run python tests/run_tests.py -c

# Alternative pytest usage
PYTHONPATH=src uv run pytest tests/ -v

Adding New Parsers

  1. Create parser in src/file_reader/parsers/
  2. Inherit from BaseParser
  3. Register in parser_loader.py
  4. Add tests in tests/test_parsers.py

See CONTRIBUTING.md for detailed development guidelines.

Performance Characteristics

  • Smart Caching: Instantly access previously processed files without re-conversion
  • Efficient Memory Use: Optimized from 6GB+ to 500MB default cache size
  • Lightning Startup: 80% faster startup with on-demand component loading
  • Parallel Processing: Handle multiple document conversions simultaneously

System Requirements

  • Python: 3.11+
  • OS: macOS, Linux, Windows
  • Memory: 2GB+ recommended for large files
  • Optional: LibreOffice (legacy Office files), Pandoc (special conversions)

FAQ

Q: Files not reading correctly?
A: Ensure LOCAL_FILE_ALLOWED_DIRECTORIES includes your file's directory.

Q: OCR not working for images?
A: Configure LLM_VISION_API_KEY with a valid vision model API key (OpenAI GPT-4o or compatible).

Q: Want to improve processing speed?
A: The smart cache automatically remembers processed files. Clear cache directory if you want fresh processing of all files.

Q: Legacy Office files (.doc/.ppt) failing?
A: Install LibreOffice: brew install --cask libreoffice (macOS) or equivalent for your OS.

Q: What file formats are supported?
A: PDF, Word, Excel, PowerPoint, OpenDocument, images (with OCR), archives, text files, and more.

Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines on how to contribute to this project.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Links

Acknowledgments

Recommended Servers

playwright-mcp

playwright-mcp

A Model Context Protocol server that enables LLMs to interact with web pages through structured accessibility snapshots without requiring vision models or screenshots.

Official
Featured
TypeScript
Magic Component Platform (MCP)

Magic Component Platform (MCP)

An AI-powered tool that generates modern UI components from natural language descriptions, integrating with popular IDEs to streamline UI development workflow.

Official
Featured
Local
TypeScript
Audiense Insights MCP Server

Audiense Insights MCP Server

Enables interaction with Audiense Insights accounts via the Model Context Protocol, facilitating the extraction and analysis of marketing insights and audience data including demographics, behavior, and influencer engagement.

Official
Featured
Local
TypeScript
VeyraX MCP

VeyraX MCP

Single MCP tool to connect all your favorite tools: Gmail, Calendar and 40 more.

Official
Featured
Local
graphlit-mcp-server

graphlit-mcp-server

The Model Context Protocol (MCP) Server enables integration between MCP clients and the Graphlit service. Ingest anything from Slack to Gmail to podcast feeds, in addition to web crawling, into a Graphlit project - and then retrieve relevant contents from the MCP client.

Official
Featured
TypeScript
Kagi MCP Server

Kagi MCP Server

An MCP server that integrates Kagi search capabilities with Claude AI, enabling Claude to perform real-time web searches when answering questions that require up-to-date information.

Official
Featured
Python
E2B

E2B

Using MCP to run code via e2b.

Official
Featured
Neon Database

Neon Database

MCP server for interacting with Neon Management API and databases

Official
Featured
Exa Search

Exa Search

A Model Context Protocol (MCP) server lets AI assistants like Claude use the Exa AI Search API for web searches. This setup allows AI models to get real-time web information in a safe and controlled way.

Official
Featured
Qdrant Server

Qdrant Server

This repository is an example of how to create a MCP server for Qdrant, a vector search engine.

Official
Featured