MarkItDown MCP Server

A Model Context Protocol server that converts over 29 file formats, including PDFs, Office documents, and audio, into structured Markdown using Microsoft's MarkItDown library. It enables AI assistants to process diverse document types through single file conversion and batch directory processing.

📄 MarkItDown MCP Server

A powerful Model Context Protocol (MCP) server that converts 29+ file formats to clean, structured Markdown using Microsoft's MarkItDown library.

🔥 Perfect for Claude Desktop, MCP clients, and AI workflows!

✨ Features

  • 🔌 MCP Protocol: Seamless integration with Claude Desktop and MCP clients
  • 📁 29+ File Formats: PDFs, Office docs, images, audio, archives, and more
  • 🔍 Image Metadata: Extract EXIF metadata from images (JPG, PNG, GIF, etc.)
  • 🎵 Speech Recognition: Convert audio to text with speech transcription (MP3, WAV)*

*Requires markitdown[all] installation for full functionality

📦 Dependency Requirements by File Type

| File Type | Required Dependencies | Install Command |
|---|---|---|
| PDF | pypdf, pymupdf, pdfplumber | pipx inject markitdown-mcp 'markitdown[all]' |
| Excel (.xlsx, .xls) | openpyxl, xlrd, pandas | pipx inject markitdown-mcp openpyxl xlrd pandas |
| PowerPoint (.pptx) | python-pptx | Included in base install |
| Images | PIL, exiftool (optional) | Included in base install |
| Audio | pydub, speech_recognition | pipx inject markitdown-mcp 'markitdown[all]' |
| Basic formats | None | Base install only |

Note: For the best experience, we recommend installing all dependencies using the Complete Install method below.

  • 📊 Office Documents: Word, PowerPoint, Excel files
  • 🌐 Web Content: HTML, XML, JSON, CSV
  • 📚 E-books & Archives: EPUB, ZIP files
  • ⚡ Fast & Reliable: Built on Microsoft's MarkItDown library

🚀 Quick Start for Claude Desktop

  1. Install the server with ALL features:

    # One command to install everything
    pipx install git+https://github.com/trsdn/markitdown-mcp.git && \
    pipx inject markitdown-mcp 'markitdown[all]' openpyxl xlrd pandas pymupdf pdfplumber
    
  2. Add to your Claude Desktop config:

    {
      "mcpServers": {
        "markitdown": {
          "command": "markitdown-mcp",
          "args": []
        }
      }
    }
    
  3. Restart Claude Desktop and start converting files!

Features

  • Convert multiple file formats to Markdown
  • Batch processing of entire directories
  • Preserves directory structure in output
  • Environment variable support via .env file

📋 Available MCP Tools

🔧 convert_file

Convert a single file to Markdown.

{
  "name": "convert_file",
  "arguments": {
    "file_path": "/path/to/document.pdf"
  }
}
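
On the wire, an MCP client sends a tool invocation like the one above as a JSON-RPC 2.0 request with the method `tools/call`. The following is a minimal sketch of building that message, assuming the stdio transport with one JSON message per line. The helper name `build_tool_call` and the request id are illustrative and not part of this server.

```python
import json


def build_tool_call(name: str, arguments: dict, request_id: int = 1) -> str:
    """Serialize an MCP tools/call request as a single line of JSON."""
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    # json.dumps emits no raw newlines by default, which suits a
    # newline-delimited transport.
    return json.dumps(request)


message = build_tool_call("convert_file", {"file_path": "/path/to/document.pdf"})
print(message)
```

A real client also performs the `initialize` handshake before calling tools. MCP client libraries and Claude Desktop handle that step for you.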

📋 list_supported_formats

Get a complete list of supported file formats.

{
  "name": "list_supported_formats",
  "arguments": {}
}

📁 convert_directory

Convert all supported files in a directory.

{
  "name": "convert_directory", 
  "arguments": {
    "input_directory": "/path/to/files",
    "output_directory": "/path/to/markdown" 
  }
}
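
The feature list states that batch conversion preserves the directory structure in the output. The sketch below shows one way such a mapping could work. It is an illustration only: the function name `mirrored_output_path` is hypothetical, and the server's actual naming scheme may differ.

```python
from pathlib import Path


def mirrored_output_path(source: Path, input_dir: Path, output_dir: Path) -> Path:
    """Map input_dir/sub/report.pdf to output_dir/sub/report.md."""
    # Keep the path relative to the input root so subfolders are mirrored.
    relative = source.relative_to(input_dir)
    # Swap the last extension for .md (assumed output naming).
    return (output_dir / relative).with_suffix(".md")


target = mirrored_output_path(
    Path("/path/to/files/reports/2024/summary.docx"),
    Path("/path/to/files"),
    Path("/path/to/markdown"),
)
print(target)
```

With this scheme, two inputs that differ only by extension (for example `summary.docx` and `summary.pdf`) would map to the same output file, so a real implementation needs a rule for that case.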

📄 Supported File Formats (29+)

| Category | Extensions | Features |
|---|---|---|
| 📊 Office | .pdf, .docx, .pptx, .xlsx, .xls | Full document structure |
| 🖼️ Images | .jpg, .png, .gif, .bmp, .tiff, .webp | EXIF metadata extraction |
| 🎵 Audio | .mp3, .wav | Speech-to-text transcription |
| 🌐 Web | .html, .htm, .xml, .json, .csv | Clean formatting |
| 📚 Books | .epub | Chapter extraction |
| 📦 Archives | .zip | Auto-extract and process |
| 📝 Text | .txt, .md, .rst | Direct conversion |
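
To filter files before sending them to the server, the table above can be turned into a small lookup. This sketch covers only the extensions listed in the table. The `list_supported_formats` tool remains the authoritative source, and the names `SUPPORTED_FORMATS` and `format_category` are illustrative.

```python
from pathlib import Path
from typing import Optional

# Extensions copied from the table above, grouped by category.
SUPPORTED_FORMATS = {
    "Office": {".pdf", ".docx", ".pptx", ".xlsx", ".xls"},
    "Images": {".jpg", ".png", ".gif", ".bmp", ".tiff", ".webp"},
    "Audio": {".mp3", ".wav"},
    "Web": {".html", ".htm", ".xml", ".json", ".csv"},
    "Books": {".epub"},
    "Archives": {".zip"},
    "Text": {".txt", ".md", ".rst"},
}


def format_category(filename: str) -> Optional[str]:
    """Return the table category for a file name, or None if it is not listed."""
    suffix = Path(filename).suffix.lower()  # compare case-insensitively
    for category, extensions in SUPPORTED_FORMATS.items():
        if suffix in extensions:
            return category
    return None


print(format_category("Quarterly Report.PDF"))
```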

Installation

Option 1: Pip Install (Recommended)

# Install from a local clone (replace the path with the location of your checkout)
pip install -e /path/to/markitdown-mcp

# Or navigate to the directory first
cd /path/to/markitdown-mcp
pip install -e .

Option 2: Direct Usage

cd /path/to/markitdown-mcp
python -m venv venv  # Create the virtual environment if it does not exist yet
source venv/bin/activate
pip install -r requirements.txt

Quick Start

MCP Server Mode (Recommended)

After pip installation:

# Start the MCP server (for use with MCP clients)
markitdown-mcp

Or using the development script:

python run_server.py

🛠️ Installation Options

🚀 One-Command Install (Recommended)

Install with ALL dependencies in one command:

# Using pipx (recommended)
pipx install git+https://github.com/trsdn/markitdown-mcp.git && \
pipx inject markitdown-mcp 'markitdown[all]' openpyxl xlrd pandas pymupdf pdfplumber pytesseract pydub speechrecognition

# Or download and run the install script
curl -sSL https://raw.githubusercontent.com/trsdn/markitdown-mcp/main/scripts/install-all-deps.sh | bash

Quick Install (Basic Features Only)

pip install git+https://github.com/trsdn/markitdown-mcp.git

Complete Install with All Dependencies (Step by Step)

To ensure all file formats are supported, use one of these methods:

Method 1: Using pipx (Recommended)

# Install the MCP server
pipx install git+https://github.com/trsdn/markitdown-mcp.git

# Install all required dependencies for full functionality
pipx inject markitdown-mcp 'markitdown[all]'         # PDF, OCR, Speech
pipx inject markitdown-mcp openpyxl xlrd pandas      # Excel support
pipx inject markitdown-mcp pymupdf pdfplumber        # Advanced PDF

Method 2: Using pip with virtual environment

# Create and activate virtual environment
python -m venv markitdown-env
source markitdown-env/bin/activate  # On Windows: markitdown-env\Scripts\activate

# Install with all dependencies in one command
git clone https://github.com/trsdn/markitdown-mcp.git
cd markitdown-mcp
pip install -e ".[all]"  # This installs everything!

Method 3: For Claude Desktop with existing installation

If you already have the MCP server installed but some formats aren't working:

# Find your installation
which markitdown-mcp  # Shows path like /Users/you/.local/bin/markitdown-mcp

# Inject missing dependencies
pipx inject markitdown-mcp 'markitdown[all]' openpyxl xlrd pandas pymupdf pdfplumber

Verify Installation

After installation, verify all dependencies are properly installed:

# Test the MCP server
markitdown-mcp --help

# For pipx installations, check injected packages
pipx list --include-injected
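
As a complementary check, the optional libraries from the dependency table can be probed from Python. The sketch below assumes the usual import names, which can differ from the pip package names and may vary between versions. Run it with the interpreter of the environment where markitdown-mcp is installed (for pipx, the Python inside its virtual environment).

```python
import importlib.util

# Import names are assumptions; they can differ from the pip package names.
OPTIONAL_MODULES = {
    "PDF": ["pypdf", "pdfplumber", "fitz"],  # fitz: classic import name of pymupdf
    "Excel": ["openpyxl", "xlrd", "pandas"],
    "PowerPoint": ["pptx"],  # provided by python-pptx
    "Audio": ["pydub", "speech_recognition"],
}


def missing_modules(modules_by_type: dict) -> dict:
    """Return, per file type, the modules the import system cannot find."""
    missing = {}
    for file_type, modules in modules_by_type.items():
        absent = [name for name in modules if importlib.util.find_spec(name) is None]
        if absent:
            missing[file_type] = absent
    return missing


if __name__ == "__main__":
    for file_type, absent in missing_modules(OPTIONAL_MODULES).items():
        print(f"{file_type}: missing {', '.join(absent)}")
```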

🔧 Claude Desktop Configuration

Add this to your Claude Desktop claude_desktop_config.json:

{
  "mcpServers": {
    "markitdown": {
      "command": "markitdown-mcp",
      "args": []
    }
  }
}

Config file locations:

  • macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
  • Windows: %APPDATA%\Claude\claude_desktop_config.json
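
If the config file already lists other servers, the new entry has to be merged in rather than pasted over them. The sketch below is one way to do that. The helper name `add_markitdown_server` is hypothetical, and the function assumes the file, if present, contains valid JSON.

```python
import json
from pathlib import Path


def add_markitdown_server(config_path: Path) -> dict:
    """Add the markitdown entry to a Claude Desktop config, keeping other servers."""
    config = {}
    if config_path.exists():
        text = config_path.read_text(encoding="utf-8").strip()
        config = json.loads(text) if text else {}
    # Create the mcpServers section if it is missing, then add or overwrite our entry.
    servers = config.setdefault("mcpServers", {})
    servers["markitdown"] = {"command": "markitdown-mcp", "args": []}
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return config
```

On macOS, for example, this could be called as `add_markitdown_server(Path.home() / "Library/Application Support/Claude/claude_desktop_config.json")`. Back up the file first, and restart Claude Desktop afterwards.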

💡 Usage Examples

Convert a PDF

Convert the file ~/Documents/report.pdf to markdown

Batch Process Directory

Convert all files in ~/Downloads/documents/ to markdown

Check Supported Formats

What file formats can you convert to markdown?

🔍 Troubleshooting

Missing Dependencies Errors

If you see errors like:

  • PdfConverter threw MissingDependencyException
  • XlsxConverter threw MissingDependencyException
  • PptxConverter threw BadZipFile

These errors mean that some optional dependencies are missing. Follow the Complete Install instructions above.

Unicode Errors with .md Files

Some Markdown files with special characters may fail with UnicodeDecodeError. This is a known limitation in the MarkItDown library.
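
One possible workaround is to write a UTF-8 copy of the file and convert the copy instead. The sketch below assumes that the failure is caused by a file that is not valid UTF-8, which the README does not confirm. The function names and the `cp1252` fallback encoding are illustrative choices.

```python
from pathlib import Path


def read_text_lenient(path: Path, fallback_encoding: str = "cp1252") -> str:
    """Decode a file as UTF-8, falling back to another encoding if that fails."""
    raw = path.read_bytes()
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # errors="replace" guarantees success; undecodable bytes become U+FFFD.
        return raw.decode(fallback_encoding, errors="replace")


def write_utf8_copy(source: Path, target: Path) -> Path:
    """Write a UTF-8 encoded copy of source to target and return target."""
    # write_bytes avoids platform-specific newline translation.
    target.write_bytes(read_text_lenient(source).encode("utf-8"))
    return target
```

The copy can then be passed to `convert_file` in place of the original. If the original encoding is known, pass it as `fallback_encoding` to avoid wrongly decoded characters.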

Installation Issues

  • "externally-managed-environment" error: Use pipx instead of pip
  • Permission denied: Never use sudo with pip; use pipx or virtual environments
  • Command not found: Make sure ~/.local/bin is in your PATH

See KNOWN_ISSUES.md for more details.
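
For the "Command not found" case, a short script can confirm whether the executable is visible and whether `~/.local/bin` is on the PATH. This is a diagnostic sketch with hypothetical helper names. It uses only the standard library and does not depend on the server.

```python
import os
import shutil
from pathlib import Path
from typing import Optional


def locate_command(command: str = "markitdown-mcp") -> Optional[str]:
    """Return the resolved path of the command, or None if it is not on PATH."""
    return shutil.which(command)


def local_bin_on_path(path_value: str, home: Path) -> bool:
    """Check whether <home>/.local/bin appears among the PATH entries."""
    local_bin = home / ".local" / "bin"
    return any(Path(entry) == local_bin for entry in path_value.split(os.pathsep) if entry)


if __name__ == "__main__":
    print("markitdown-mcp found at:", locate_command())
    print("~/.local/bin on PATH:", local_bin_on_path(os.environ.get("PATH", ""), Path.home()))
```

If the second check prints `False` after a pipx installation, running `pipx ensurepath` and opening a new shell usually fixes the PATH.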

Configuration

No special configuration required. The tool uses the MarkItDown library for document conversion.

Usage

Basic Usage

# Convert all supported files from input/ to output/
python mdconvert.py

Custom Directories

Specify custom input and output directories:

python mdconvert.py --input /path/to/docs --output /path/to/markdown

Single File Conversion

Convert a single file:

python mdconvert.py --file document.pdf

Command Line Options

  • --input, -i: Input directory (default: input)
  • --output, -o: Output directory (default: output)
  • --file, -f: Convert a single file instead of a directory
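
The options above could be declared with `argparse` roughly as follows. This is a sketch reconstructed from the option list, not the actual contents of mdconvert.py. The function name `build_parser` and the help texts are illustrative.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the three documented options with their documented defaults."""
    parser = argparse.ArgumentParser(description="Convert files to Markdown")
    parser.add_argument("--input", "-i", default="input", help="Input directory")
    parser.add_argument("--output", "-o", default="output", help="Output directory")
    parser.add_argument("--file", "-f", default=None,
                        help="Convert a single file instead of a directory")
    return parser


# Parse an explicit argument list instead of sys.argv to keep the example self-contained.
args = build_parser().parse_args(["--file", "document.pdf"])
print(args.input, args.output, args.file)
```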

MCP Server Features

The MCP server provides three tools:

1. convert_file

Convert a single file to Markdown.

  • Input: File path or base64 encoded content with filename
  • Output: Converted Markdown content

2. list_supported_formats

List all supported file formats.

  • Output: Categorized list of supported file extensions

3. convert_directory

Convert all supported files in a directory.

  • Input: Input directory path, optional output directory
  • Output: Summary of conversion results

Directory Structure

markitdown-mcp/
├── mcp_server.py        # MCP protocol server
├── mdconvert.py         # CLI script
├── run_server.py        # Server runner script
├── mcp_config.json      # MCP configuration
├── requirements.txt     # Python dependencies
├── README.md            # This file
├── input/               # Default input directory
├── output/              # Default output directory
└── venv/                # Virtual environment

🔍 How It Works

This MCP server leverages Microsoft's MarkItDown library to provide intelligent document conversion:

  • 📄 PDFs: Extracts text, tables, and structure
  • 🖼️ Images: Uses OCR to extract text content + EXIF metadata
  • 🎵 Audio: Converts speech to text transcription (MP3, WAV)
  • 📊 Office: Preserves formatting from Word, Excel, PowerPoint
  • 🌐 HTML: Converts to clean, readable Markdown
  • 📦 Archives: Automatically extracts and processes contents
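
For orientation, the underlying library can also be called directly from Python. The sketch below uses MarkItDown's documented `convert` method and the `text_content` attribute of its result. How this server wraps those calls internally is not shown in this README, so the function `convert_to_markdown` is an illustration, not the server's code.

```python
from pathlib import Path


def convert_to_markdown(path, converter=None) -> str:
    """Convert one file to Markdown text using a MarkItDown-style converter."""
    if converter is None:
        # Imported lazily so the function can be exercised without the library installed.
        from markitdown import MarkItDown

        converter = MarkItDown()
    # Expand "~" so paths like ~/Documents/report.pdf work as expected.
    result = converter.convert(str(Path(path).expanduser()))
    return result.text_content
```

Calling `convert_to_markdown("~/Documents/report.pdf")` requires the `markitdown` package and the optional dependencies for the file type in question.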

🏷️ Tags

mcp model-context-protocol claude-desktop markdown document-conversion pdf ocr speech-to-text markitdown ai-tools

📋 Requirements

  • Python: 3.10+
  • MCP Client: Claude Desktop or compatible MCP client
  • Dependencies: Automatically installed via pip

🤝 Contributing

We welcome contributions! Here's how you can help:

🚀 Quick Start for Contributors

# Fork and clone the repository
git clone https://github.com/YOUR_USERNAME/markitdown-mcp.git
cd markitdown-mcp

# Set up development environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -e ".[dev]"

# Test your changes
markitdown-mcp  # Test the server works

📝 Ways to Contribute

  • 🐛 Bug Reports: Found an issue? Report it
  • 💡 Feature Requests: Have an idea? Suggest it
  • 📄 New File Formats: Add support for more file types
  • 📚 Documentation: Improve guides and examples
  • 🧪 Testing: Add tests and improve reliability
  • 🎨 Code Quality: Refactor and optimize

📋 Contribution Process

  1. Read our Contributing Guide
  2. Check existing issues
  3. Fork the repository
  4. Create a feature branch (feat/amazing-feature)
  5. Make your changes with tests
  6. Submit a pull request

Please read docs/development/CONTRIBUTING.md for detailed guidelines.

📚 Documentation

For Users

For AI Agents

For Developers

📄 License

MIT License - see LICENSE file for details.

🔗 Related
