pdf2zh-next-mcp
MCP server for translating PDFs using pdf2zh-next, preserving document context by extracting all text segments for LLM translation at once.
MCP server for PDF translation using pdf2zh-next as the PDF processing backend. Designed for Claude Desktop.
Instead of translating each segment independently (which loses context), this server extracts all segments at once and lets the LLM translate them together — preserving terminology consistency and context across the entire document.
Using Claude Code? Check out pdf2zh-next-skill — a lightweight skill-based approach without MCP overhead. It handles large PDFs better by leveraging Claude Code's direct file I/O and auto-continuation.
How it works
┌─────────────────────────────────────────────────┐
│  Claude Desktop                                 │
│                                                 │
│  1. extract_segments ──→ segments + formulas    │
│  2. LLM translates all segments at once         │
│  3. assemble_translated ──→ final PDF           │
└─────────────────────────────────────────────────┘
The LLM sees every segment before translating — so terminology stays consistent, cross-page sentences flow naturally, and formula placeholders are preserved correctly.
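The idea above can be sketched in Python. This is a minimal illustration under stated assumptions, not the server's actual code: the segment shape (a dict with `id` and `text`), the `{v1}`-style placeholder syntax, and the helper names `build_prompt` and `placeholders_preserved` are all hypothetical.

```python
import re

# Hypothetical placeholder syntax such as {v1}; the real format used by
# pdf2zh-next may differ.
PLACEHOLDER = re.compile(r"\{v\d+\}")


def build_prompt(segments, target_lang):
    """Put every segment into one prompt so the model sees the whole document."""
    lines = [
        f"Translate each numbered segment to {target_lang}.",
        "Keep placeholders like {v1} unchanged.",
        "",
    ]
    for seg in segments:
        lines.append(f"[{seg['id']}] {seg['text']}")
    return "\n".join(lines)


def placeholders_preserved(source, translated):
    """True when both strings contain the same multiset of placeholders."""
    return sorted(PLACEHOLDER.findall(source)) == sorted(PLACEHOLDER.findall(translated))


# Illustrative segments only; real ones come from extract_segments.
segments = [
    {"id": 0, "text": "The loss {v1} is minimized."},
    {"id": 1, "text": "We then update {v2} using {v1}."},
]
prompt = build_prompt(segments, "Korean")
print(prompt)
```

A check like `placeholders_preserved` could run on each translated segment before assembly, so a dropped or duplicated formula placeholder is caught early rather than in the final PDF.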
Prerequisites
pdf2zh-next must be installed separately:
uv tool install pdf2zh-next
Verify installation:
pdf2zh_next --version
You need uv to install both pdf2zh-next and this server.
Installation
From PyPI (recommended)
uv tool install pdf2zh-next-mcp
From GitHub
uv tool install git+https://github.com/JaeHyeon-KAIST/pdf2zh-next-mcp
From source
git clone https://github.com/JaeHyeon-KAIST/pdf2zh-next-mcp
cd pdf2zh-next-mcp
uv sync
Setup
Add to your Claude Desktop MCP config:
- macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
- Windows: %APPDATA%\Claude\claude_desktop_config.json
If installed from PyPI or GitHub:
{
  "mcpServers": {
    "pdf-translate": {
      "command": "uvx",
      "args": ["pdf2zh-next-mcp"]
    }
  }
}
If running from source:
{
  "mcpServers": {
    "pdf-translate": {
      "command": "uv",
      "args": [
        "run",
        "--directory", "/path/to/pdf2zh-next-mcp",
        "python", "-m", "pdf2zh_next_mcp.main"
      ]
    }
  }
}
Tip: If Claude Desktop can't find uvx, use the absolute path (e.g., /opt/homebrew/bin/uvx on macOS, C:\Users\you\.local\bin\uvx.exe on Windows).
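If you prefer not to edit the config by hand, a small script can merge the entry for you. The following is a rough sketch, not part of this project: the function name `add_server` is hypothetical, and it assumes the PyPI-style entry shown above. It resolves `uvx` to an absolute path with `shutil.which` when possible and keeps any servers already configured. The demo writes to a temporary file, so point it at your real config path only after backing that file up.

```python
import json
import shutil
import tempfile
from pathlib import Path


def add_server(config_path, name="pdf-translate", package="pdf2zh-next-mcp"):
    """Merge one server entry into the config file, keeping existing entries."""
    path = Path(config_path)
    text = path.read_text(encoding="utf-8") if path.exists() else ""
    config = json.loads(text) if text.strip() else {}
    # Prefer the absolute path to uvx when it can be resolved on PATH.
    command = shutil.which("uvx") or "uvx"
    config.setdefault("mcpServers", {})[name] = {"command": command, "args": [package]}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


# Demo against a throwaway file that already lists another server.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "claude_desktop_config.json"
    demo.write_text(
        '{"mcpServers": {"other": {"command": "x", "args": []}}}',
        encoding="utf-8",
    )
    result = add_server(demo)
    print(json.dumps(result, indent=2))
```

Restart Claude Desktop after changing the config so the new server entry is picked up.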
Usage
Just ask:
"Translate this PDF to Korean: /path/to/paper.pdf"
Behind the scenes:
- extract_segments analyzes the PDF layout and returns all text segments
- The LLM translates everything at once (with full context)
- assemble_translated injects translations and generates the final PDF
Output files:
- *-mono.pdf: translated PDF
- *-dual.pdf: bilingual side-by-side
- *-glossary.json: terminology glossary
Limitations
- Large PDFs (~30+ pages): Claude Desktop has a per-turn output token limit. For documents with many segments, the translation may fail mid-process with "response could not be fully generated". For large PDFs, use pdf2zh-next-skill with Claude Code instead.
- MCP tool result size: Segments are paginated to stay within Claude Desktop's 25K token limit per tool response. This is handled automatically.
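To illustrate the pagination idea, here is a minimal sketch of grouping segments under a token budget. It is not the server's implementation, which is not described here: the function name `paginate` and the rough four-characters-per-token estimate are assumptions, and a real tokenizer would give different counts.

```python
def paginate(segments, max_tokens=25_000, chars_per_token=4):
    """Group segments into pages whose estimated token count fits the budget."""
    pages, current, used = [], [], 0
    for text in segments:
        # Crude size estimate (assumption); swap in a real tokenizer if available.
        cost = max(1, len(text) // chars_per_token)
        if current and used + cost > max_tokens:
            pages.append(current)
            current, used = [], 0
        # A single oversized segment still gets a page of its own.
        current.append(text)
        used += cost
    if current:
        pages.append(current)
    return pages


# Three segments of about 10 estimated tokens each, with a budget of 20.
pages = paginate(["a" * 40, "b" * 40, "c" * 40], max_tokens=20)
print([len(page) for page in pages])
```

Pagination of this kind only bounds the size of each tool response. It does not lift the per-turn output limit described in the first bullet, which is why large PDFs can still fail.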
Troubleshooting
BabeldocError: cannot unpack non-iterable NoneType object
BabelDOC needs CMap files for font character mapping. If its automatic download times out, install them manually:
cd ~/Downloads
curl -L https://github.com/funstory-ai/BabelDOC-Assets/archive/refs/heads/main.zip -o BabelDOC-Assets.zip
unzip BabelDOC-Assets.zip
mkdir -p ~/.cache/babeldoc/cmap
cp BabelDOC-Assets-main/cmap/*.json ~/.cache/babeldoc/cmap/
This is a one-time setup. The cache path is the same on all platforms.
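To confirm the copy worked, you can check that the cache directory now holds JSON files. This is a small sketch, assuming the ~/.cache/babeldoc/cmap path used in the commands above; the helper name `cmap_files` is hypothetical and not part of BabelDOC.

```python
from pathlib import Path


def cmap_files(cache_dir=None):
    """Return the sorted CMap JSON files found in the cache directory."""
    root = Path(cache_dir) if cache_dir else Path.home() / ".cache" / "babeldoc" / "cmap"
    return sorted(root.glob("*.json")) if root.is_dir() else []


found = cmap_files()
print(f"{len(found)} CMap file(s) found")
```

If the count is zero, repeat the manual steps above and check that the copy command targeted the right directory.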
License
MIT