distill-mcp

Connects MCP-compatible clients to Distill for converting local documents (PDF, DOCX, etc.) into clean, token-efficient Markdown, achieving 40-80% token reduction for LLMs.

An MCP server that connects any MCP-compatible client to Distill, converting local documents to clean, token-efficient Markdown before the LLM reads them. Typical token reduction is 40–80% compared to raw document text, letting the model fit more content into its context window and reason over it faster.

Works with Claude Desktop, Claude Code, Cursor, Windsurf, and any other MCP client that supports tool calling.

Two Modes

| Feature | Lightweight | Full (Docker) |
|---|---|---|
| Requires Docker | No | Yes |
| DOCX, XLSX, PPTX, native PDF, HTML | Yes | Yes |
| Scanned PDF (OCR) | No | Yes |
| Audio (MP3, WAV, etc.) | No | Yes |
| Quality score | No | Yes |
| Install time | ~2 min | ~10 min |

Lightweight needs only Python and pip. Full needs Docker with the Distill service running. See the setup guides for details.

Quick install

No global install is required; npx (bundled with npm) downloads and runs the server on demand:

npx -y distill-mcp-server

Then configure Claude Desktop — see Lightweight setup or Full setup for step-by-step instructions.

Configuration

Add the server to your claude_desktop_config.json.

Lightweight mode — macOS / Linux:

{
  "mcpServers": {
    "distill-mcp": {
      "command": "npx",
      "args": ["-y", "distill-mcp-server"],
      "env": {
        "DISTILL_MCP_CONFIG": "{\"mode\":\"lightweight\",\"python_path\":\"python3\"}"
      }
    }
  }
}

Lightweight mode — Windows:

{
  "mcpServers": {
    "distill-mcp": {
      "command": "npx",
      "args": ["-y", "distill-mcp-server"],
      "env": {
        "DISTILL_MCP_CONFIG": "{\"mode\":\"lightweight\",\"python_path\":\"py\"}"
      }
    }
  }
}

Full mode — all platforms:

{
  "mcpServers": {
    "distill-mcp": {
      "command": "npx",
      "args": ["-y", "distill-mcp-server"],
      "env": {
        "DISTILL_MCP_CONFIG": "{\"mode\":\"full\",\"distill_url\":\"http://localhost:7860\"}"
      }
    }
  }
}

The complete reference for config keys is in the setup guides: Lightweight | Full
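
The value of DISTILL_MCP_CONFIG is itself a JSON document stored as a string, which is why the quotes in the examples above are backslash-escaped. Hand-escaping is easy to get wrong, so here is a minimal Python sketch that generates the entry instead. The helper name `build_server_entry` is hypothetical and not part of distill-mcp; the keys and values are taken from the examples above.

```python
import json


def build_server_entry(distill_config: dict) -> dict:
    """Return an mcpServers entry whose DISTILL_MCP_CONFIG is a JSON string."""
    return {
        "command": "npx",
        "args": ["-y", "distill-mcp-server"],
        "env": {
            # Serialize the inner config first so it becomes a plain string value.
            "DISTILL_MCP_CONFIG": json.dumps(distill_config, separators=(",", ":")),
        },
    }


lightweight = {"mode": "lightweight", "python_path": "python3"}
config = {"mcpServers": {"distill-mcp": build_server_entry(lightweight)}}

# Serializing the outer document escapes the inner quotes automatically.
print(json.dumps(config, indent=2))
```

The printed output can be merged into claude_desktop_config.json. Swapping in `{"mode": "full", "distill_url": "http://localhost:7860"}` should produce the full-mode entry the same way.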

CLAUDE.md snippet

Paste the block below, as-is, into your project's CLAUDE.md so Claude uses the tool automatically:

## Document Conversion — distill-mcp

When the user references a local file path (e.g. a PDF, DOCX, PPTX, XLSX,
HTML, or audio file), ALWAYS call the `convert_and_save` tool before reading
or reasoning about the document. Do not read the original file directly via
filesystem tools — use only the Markdown returned by `convert_and_save` as
the document content.

### Rules

1. Call `convert_and_save` with the absolute file path before doing anything
   else with the document.
2. Use ONLY the Markdown output from `convert_and_save` as the document
   content. Never read the original file with filesystem tools.
3. If the response includes `"overwritten": true`, tell the user that a
   previous cached version was replaced before proceeding.
4. If the response includes any `warnings`, surface them to the user before
   proceeding with the document content.
5. If `convert_and_save` returns an unsupported format error, tell the user
   which formats are supported and suggest switching modes if applicable.

### Supported formats

| Category | Lightweight | Full (Docker) |
|---|---|---|
| Word | .docx, .doc, .odt | .docx, .doc, .odt |
| Excel | .xlsx, .xlsm, .csv | .xlsx, .xlsm, .csv |
| PowerPoint | .pptx, .ppt | .pptx, .ppt |
| PDF | .pdf (native text) | .pdf (native + scanned OCR) |
| HTML | .html, .htm | .html, .htm |
| Audio | — | .mp3, .wav, .m4a, .flac, .ogg |
| Other | — | .epub, .json, .sql, .wsdl, .wsd |

### Usage

Say "convert using distill" followed by the file path:
> Convert using distill C:\Users\me\Documents\report.pdf to markdown

The snippet is also available in docs/CLAUDE.md-snippet.md.
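
Rules 3 and 4 of the snippet describe what to do with two fields of the `convert_and_save` response. For a client that handles the response in code rather than through an LLM, that logic might look roughly like the sketch below. Only the field names `overwritten` and `warnings` come from the snippet; the function name, the assumption that `warnings` is a list of strings, and the example response are hypothetical.

```python
def notices_for_user(response: dict) -> list[str]:
    """Collect the messages rules 3 and 4 ask the assistant to surface."""
    notices = []
    # Rule 3: mention that a previous cached version was replaced.
    if response.get("overwritten") is True:
        notices.append("A previous cached version was replaced.")
    # Rule 4: surface every warning before using the document content.
    # Assumption: `warnings` is a list of strings; the source does not say.
    for warning in response.get("warnings") or []:
        notices.append(f"Warning: {warning}")
    return notices


# Hypothetical response, for illustration only.
example = {"overwritten": True, "warnings": ["example warning text"]}
for line in notices_for_user(example):
    print(line)
```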

Supported formats

| Category | Extensions | Lightweight | Full |
|---|---|---|---|
| Microsoft Word | .docx, .doc, .odt | Yes | Yes |
| Microsoft Excel | .xlsx, .xlsm, .csv | Yes | Yes |
| Microsoft PowerPoint | .pptx, .ppt | Yes | Yes |
| PDF (native text) | .pdf | Yes | Yes |
| PDF (scanned/OCR) | .pdf | No | Yes |
| HTML | .html, .htm | Yes | Yes |
| Audio | .mp3, .wav, .m4a, .flac, .ogg | No | Yes |
| EPUB | .epub | No | Yes |
| JSON | .json | No | Yes |
| SQL | .sql | No | Yes |
| WSDL | .wsdl, .wsd | No | Yes |
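
The table can also be read as a lookup from file extension to the minimum mode required. The sketch below encodes it that way; `minimum_mode` is a hypothetical helper, not something distill-mcp provides. One caveat: the extension alone cannot tell a native-text PDF from a scanned one, so a `.pdf` that is actually scanned still needs full mode.

```python
from pathlib import PurePath
from typing import Optional

# Extensions handled in both modes, per the table above.
LIGHTWEIGHT_EXTENSIONS = {
    ".docx", ".doc", ".odt", ".xlsx", ".xlsm", ".csv",
    ".pptx", ".ppt", ".pdf", ".html", ".htm",
}
# Extensions handled only in full (Docker) mode.
FULL_ONLY_EXTENSIONS = {
    ".mp3", ".wav", ".m4a", ".flac", ".ogg",
    ".epub", ".json", ".sql", ".wsdl", ".wsd",
}


def minimum_mode(path: str) -> Optional[str]:
    """Return "lightweight", "full", or None if the extension is unsupported."""
    suffix = PurePath(path).suffix.lower()
    if suffix in LIGHTWEIGHT_EXTENSIONS:
        return "lightweight"
    if suffix in FULL_ONLY_EXTENSIONS:
        return "full"
    return None


for name in ("report.pdf", "interview.mp3", "notes.txt"):
    print(name, minimum_mode(name))
```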

Privacy

All processing happens locally. In lightweight mode, documents are converted by the distill-core Python library on your machine — no data leaves your computer. In full mode, documents are sent to the Distill Docker service running locally on your machine — no data is sent to external services.

Built on

Distill — document-to-Markdown conversion engine.

License

MIT
