mcp-documents-reader

mcp-documents-reader

Enables AI agents to read documents in Excel, DOCX, PDF, and TXT formats via MCP protocol.

Category
Visit Server

README

<h1 align="center">MCP Document Reader</h1>

<!-- mcp-name: io.github.xt765/mcp_documents_reader -->

<p align="center"><strong>MCP (Model Context Protocol) Document Reader - A powerful MCP tool for reading documents in multiple formats, enabling AI agents to truly "read" your documents.</strong></p>

<p align="center">🌐 <strong>Language</strong>: <a href="README.md">English</a> | <a href="README.zh-CN.md">中文</a></p>

<p align="center"> <a href="https://blog.csdn.net/Yunyi_Chi"><img src="https://img.shields.io/badge/CSDN-玄同765-orange.svg?style=flat&logo=csdn" alt="CSDN"></a> <a href="https://github.com/xt765/mcp_documents_reader"><img src="https://img.shields.io/badge/GitHub-mcp_documents_reader-black.svg?style=flat&logo=github" alt="GitHub"></a> <a href="https://gitee.com/xt765/mcp_documents_reader"><img src="https://img.shields.io/badge/Gitee-mcp_documents_reader-red.svg?style=flat&logo=gitee" alt="Gitee"></a> </p> <p align="center"> <a href="LICENSE"><img src="https://img.shields.io/badge/License-MIT-blue.svg?style=flat&logo=opensourceinitiative" alt="License"></a> <a href="https://www.python.org/downloads/"><img src="https://img.shields.io/badge/python-3.10+-blue.svg?style=flat&logo=python" alt="Python"></a> <a href="https://pypi.org/project/mcp-documents-reader/"><img src="https://img.shields.io/pypi/v/mcp-documents-reader.svg?logo=pypi" alt="PyPI Version"></a> <a href="https://pepy.tech/project/mcp-documents-reader"><img src="https://img.shields.io/pepy/dt/mcp-documents-reader.svg?logo=pypi&label=PyPI%20Downloads" alt="PyPI Downloads"></a> <a href="https://registry.modelcontextprotocol.io/v0.1/servers?search=io.github.xt765/mcp_documents_reader"><img src="https://img.shields.io/badge/MCP-Registry-blue?logo=modelcontextprotocol" alt="MCP Registry"></a> <a href="https://mcp-marketplace.io/server/io-github-xt765-mcp-documents-reader"><img src="https://img.shields.io/badge/MCP-Marketplace-22c55e.svg?style=flat&logo=shopify&logoColor=white" alt="MCP Marketplace"></a> </p>

Features

  • Multi-format Support: Supports 4 mainstream document formats: Excel (XLSX/XLS), DOCX, PDF, and TXT
  • MCP Protocol: Compliant with MCP standards, can be used as a tool for AI assistants like Trae IDE
  • Easy Integration: Simple configuration for immediate use
  • Reliable Performance: Successfully tested and running in Trae IDE
  • File System Support: Reads documents directly from the file system

📚 Documentation

User Guide · API Reference · Contributing · Changelog · License


Architecture

graph TB
    A[AI Assistant / User] -->|Call read_document| B[MCP Document Reader]
    B -->|Detect file type| C{File Type?}
    C -->|.docx| D[DOCX Reader]
    C -->|.pdf| E[PDF Reader]
    C -->|.xlsx/.xls| F[Excel Reader]
    C -->|.txt| G[Text Reader]
    D -->|Extract text| H[Return Content]
    E -->|Extract text| H
    F -->|Extract text| H
    G -->|Extract text| H
    H -->|Text content| A
    
    style A fill:#e1f5ff
    style B fill:#fff4e1
    style C fill:#f0f0f0
    style D fill:#e8f5e9
    style E fill:#e8f5e9
    style F fill:#e8f5e9
    style G fill:#e8f5e9
    style H fill:#fff9c4

Supported Formats

Format Extensions MIME Type Features
Excel .xlsx, .xls application/vnd.openxmlformats-officedocument.spreadsheetml.sheet Sheet and cell data extraction
DOCX .docx application/vnd.openxmlformats-officedocument.wordprocessingml.document Text and structure extraction
PDF .pdf application/pdf Text extraction
Text .txt text/plain Plain text reading

Installation

Using pip (Recommended)

pip install mcp-documents-reader

From Source

git clone https://github.com/xt765/mcp_documents_reader.git
cd mcp_documents_reader
pip install -e .

MCP Tools

This server provides the following tool:

read_document

Read any supported document type with a unified interface.

Arguments:

  • filename (string, required): Document file path, supports absolute or relative paths.

Configuration

Using in Trae IDE / Claude Desktop

Add the following to your MCP configuration file:

Option 1: Using PyPI (Recommended)

{
  "mcpServers": {
    "mcp-document-reader": {
      "command": "uvx",
      "args": [
        "mcp-documents-reader"
      ]
    }
  }
}

Option 2: Using GitHub repository

{
  "mcpServers": {
    "mcp-document-reader": {
      "command": "uvx",
      "args": [
        "--from",
        "git+https://github.com/xt765/mcp_documents_reader",
        "mcp_documents_reader"
      ]
    }
  }
}

Option 3: Using Gitee repository (Faster access in China)

{
  "mcpServers": {
    "mcp-document-reader": {
      "command": "uvx",
      "args": [
        "--from",
        "git+https://gitee.com/xt765/mcp_documents_reader",
        "mcp_documents_reader"
      ]
    }
  }
}

Usage

As an MCP Tool

After configuration, AI assistants can directly call the following tool:

# Read a DOCX file
read_document(filename="example.docx")

# Read a PDF file
read_document(filename="example.pdf")

# Read an Excel file
read_document(filename="example.xlsx")

# Read a text file
read_document(filename="example.txt")

As a Python Library

from mcp_documents_reader import DocumentReaderFactory

# Using factory (recommended)
reader = DocumentReaderFactory.get_reader("document.pdf")
content = reader.read("/path/to/document.pdf")

# Check if format is supported
if DocumentReaderFactory.is_supported("file.xlsx"):
    reader = DocumentReaderFactory.get_reader("file.xlsx")
    content = reader.read("/path/to/file.xlsx")

Tool Interface Details

read_document

Read any supported document type.

Parameters:

Parameter Type Required Description
filename string Document file path, supports absolute or relative paths

Dependencies

Core Dependencies

  • mcp >= 1.26.0 - MCP protocol implementation
  • python-docx >= 1.2.0 - DOCX file reading
  • pypdf >= 6.8.0 - PDF file reading (replaces PyPDF2)
  • openpyxl >= 3.1.5 - Excel file reading

Development Dependencies

  • pytest >= 8.0.0 - Testing framework
  • pytest-asyncio >= 0.24.0 - Async testing support
  • pytest-cov >= 6.0.0 - Coverage reporting
  • basedpyright >= 0.28.0 - Type checking
  • ruff >= 0.8.0 - Linting and formatting

License

MIT License

Contributing

Issues and Pull Requests are welcome!

Related Projects

Recommended Servers

playwright-mcp

playwright-mcp

A Model Context Protocol server that enables LLMs to interact with web pages through structured accessibility snapshots without requiring vision models or screenshots.

Official
Featured
TypeScript
Magic Component Platform (MCP)

Magic Component Platform (MCP)

An AI-powered tool that generates modern UI components from natural language descriptions, integrating with popular IDEs to streamline UI development workflow.

Official
Featured
Local
TypeScript
Audiense Insights MCP Server

Audiense Insights MCP Server

Enables interaction with Audiense Insights accounts via the Model Context Protocol, facilitating the extraction and analysis of marketing insights and audience data including demographics, behavior, and influencer engagement.

Official
Featured
Local
TypeScript
VeyraX MCP

VeyraX MCP

Single MCP tool to connect all your favorite tools: Gmail, Calendar and 40 more.

Official
Featured
Local
graphlit-mcp-server

graphlit-mcp-server

The Model Context Protocol (MCP) Server enables integration between MCP clients and the Graphlit service. Ingest anything from Slack to Gmail to podcast feeds, in addition to web crawling, into a Graphlit project - and then retrieve relevant contents from the MCP client.

Official
Featured
TypeScript
Kagi MCP Server

Kagi MCP Server

An MCP server that integrates Kagi search capabilities with Claude AI, enabling Claude to perform real-time web searches when answering questions that require up-to-date information.

Official
Featured
Python
E2B

E2B

Using MCP to run code via e2b.

Official
Featured
Neon Database

Neon Database

MCP server for interacting with Neon Management API and databases

Official
Featured
Exa Search

Exa Search

A Model Context Protocol (MCP) server lets AI assistants like Claude use the Exa AI Search API for web searches. This setup allows AI models to get real-time web information in a safe and controlled way.

Official
Featured
Qdrant Server

Qdrant Server

This repository is an example of how to create a MCP server for Qdrant, a vector search engine.

Official
Featured