pdf4vllm
Enables vision LLMs to read PDFs by automatically detecting corrupted text and switching between text extraction and page-image rendering, while preserving reading order and filtering out unnecessary images to prevent token overflow.
PDF reading MCP server optimized for vision LLMs.
<!-- mcp-name: io.github.PyJudge/pdf4vllm -->
Problem
| Approach | Issue |
|---|---|
| Text extraction | Broken encodings produce garbage output; text and images come out in the wrong order |
| Image conversion | Token usage explodes, especially for documents with many pages |
Solution
pdf4vllm assumes PDFs are messy.
- Detects corrupted text and automatically falls back to rendering the page as an image
- Preserves reading order, returning text, table, and image blocks as one ordered sequence
- Enforces a per-request page limit to prevent context overflow
- Filters out unnecessary images (logos, lines, headers/footers)
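The image filter is described above only by example (logos, lines, headers/footers). Below is a minimal sketch of one way such a filter could work, assuming each image is known by its bounding box in PDF points and using made-up thresholds. `ImageBox` and `keep_image` are hypothetical names, not pdf4vllm's code.

```python
from dataclasses import dataclass


@dataclass
class ImageBox:
    """Hypothetical image record: bounding box in PDF points, y measured from the top."""
    x0: float
    top: float
    x1: float
    bottom: float


def keep_image(img: ImageBox, page_height: float,
               min_side: float = 30.0, max_aspect: float = 15.0,
               margin_ratio: float = 0.08) -> bool:
    """Return True if the image looks like content worth sending (illustrative thresholds)."""
    width, height = img.x1 - img.x0, img.bottom - img.top
    # A very short side usually means an icon, bullet, or hairline rule.
    if width < min_side or height < min_side:
        return False
    # Long, thin strips are usually divider lines or decorative bars.
    if max(width, height) / min(width, height) > max_aspect:
        return False
    # Anything entirely inside the top or bottom margin is treated as header/footer art.
    margin = page_height * margin_ratio
    if img.bottom <= margin or img.top >= page_height - margin:
        return False
    return True


page_height = 842.0  # A4 height in points
images = [
    ImageBox(50, 20, 120, 50),     # small logo in the header area
    ImageBox(50, 400, 550, 402),   # 2 pt horizontal rule
    ImageBox(100, 200, 400, 500),  # figure in the body
]
kept = [img for img in images if keep_image(img, page_height)]
print(len(kept))  # 1
```

Checking size and shape before position keeps the aspect-ratio division safe, because both sides are already known to be at least `min_side`.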
```text
PDF Input
    ↓
Corruption Detection (pdfminer.six + pattern analysis)
    ↓
┌─────────────┬─────────────┐
│ Corrupted   │ Clean       │
│ → Image     │ → Text +    │
│   only      │   Tables +  │
│             │   Images    │
└─────────────┴─────────────┘
    ↓
Ordered Blocks (JSON)
```
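Here is a sketch of the branching step in the diagram, assuming page text has already been extracted. The scoring rule is an illustrative stand-in for the "pattern analysis" above, not pdf4vllm's actual detector. It counts pdfminer.six `(cid:NN)` placeholders (emitted for glyphs that cannot be mapped to Unicode), replacement characters, Private Use Area code points, and stray control characters. `corruption_score`, `route_page`, `extract_page_text`, and the 0.1 threshold are all hypothetical.

```python
import re

# pdfminer.six writes "(cid:NN)" for glyphs it cannot map to Unicode.
CID_PATTERN = re.compile(r"\(cid:\d+\)")


def corruption_score(text: str) -> float:
    """Fraction of characters that look like extraction garbage (illustrative heuristic)."""
    if not text.strip():
        return 1.0  # assumed: no extractable text (e.g. a scan) counts as corrupted
    cid_chars = sum(len(m) for m in CID_PATTERN.findall(text))
    rest = CID_PATTERN.sub("", text)
    bad = sum(
        1 for ch in rest
        if ch == "\ufffd"                         # Unicode replacement character
        or 0xE000 <= ord(ch) <= 0xF8FF            # Private Use Area code points
        or (ord(ch) < 32 and ch not in "\n\r\t")  # stray control characters
    )
    return (cid_chars + bad) / len(text)


def route_page(text: str, threshold: float = 0.1) -> str:
    """Pick a branch of the diagram: "image" if the text looks corrupted, else "text"."""
    return "image" if corruption_score(text) > threshold else "text"


def extract_page_text(path: str, page_index: int) -> str:
    """Extract one page's text with pdfminer.six (page_index is zero-based)."""
    from pdfminer.high_level import extract_text  # requires: pip install pdfminer.six
    return extract_text(path, page_numbers=[page_index])


print(route_page("Quarterly revenue grew 12%."))  # text
print(route_page("(cid:3)(cid:17)(cid:42) ok"))   # image
```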
Install
```bash
pip install pdf4vllm-mcp
# or run without installing
uvx pdf4vllm-mcp
```
Claude Desktop Setup
```bash
git clone https://github.com/PyJudge/pdf4vllm-mcp.git
cd pdf4vllm-mcp
python scripts/install_mcp.py
```
Or edit the config file manually (macOS path shown): ~/Library/Application Support/Claude/claude_desktop_config.json
```json
{
  "mcpServers": {
    "pdf4vllm": {
      "command": "/path/to/python",
      "args": ["/path/to/pdf4vllm-mcp/src/server.py"]
    }
  }
}
```
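If you would rather script the manual edit, the sketch below merges a "pdf4vllm" entry into an existing config file without touching other servers. It is a hypothetical helper: `add_pdf4vllm_server` is not part of the project and is not necessarily what scripts/install_mcp.py does. The paths in the commented example call are the same placeholders used above.

```python
import json
import sys
from pathlib import Path


def add_pdf4vllm_server(config_path: Path, server_script: Path,
                        python: str = sys.executable) -> dict:
    """Insert or replace the "pdf4vllm" entry, keeping any other configured servers."""
    config = {}
    if config_path.exists():
        # Treat an empty file as an empty config.
        config = json.loads(config_path.read_text(encoding="utf-8") or "{}")
    servers = config.setdefault("mcpServers", {})
    servers["pdf4vllm"] = {"command": python, "args": [str(server_script)]}
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return config


# Example (macOS path; server path is a placeholder):
# add_pdf4vllm_server(
#     Path.home() / "Library/Application Support/Claude/claude_desktop_config.json",
#     Path("/path/to/pdf4vllm-mcp/src/server.py"),
# )
```

Restart Claude Desktop after changing the file so it picks up the new server.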
Claude Code Setup
Create .mcp.json in your project root:
```json
{
  "mcpServers": {
    "pdf4vllm": {
      "command": "uvx",
      "args": ["pdf4vllm-mcp"]
    }
  }
}
```
Extraction Modes
| Mode | Description |
|---|---|
| auto (default) | Try text extraction → switch to image if corrupted |
| text_only | Text/tables only, no images |
| image_only | Render pages as images only |
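The mode table can be read as a small decision function. This is a hypothetical sketch: `Mode` and `plan_page` are illustrative names, and the behavior of text_only on a corrupted page (no fallback to an image) is an assumption, since the table does not say.

```python
from enum import Enum


class Mode(str, Enum):
    AUTO = "auto"
    TEXT_ONLY = "text_only"
    IMAGE_ONLY = "image_only"


def plan_page(mode: Mode, text_corrupted: bool) -> dict:
    """Decide what to emit for one page (hypothetical reading of the mode table)."""
    as_image = {"blocks": False, "embedded_images": False, "page_image": True}
    if mode is Mode.IMAGE_ONLY:
        return as_image
    if mode is Mode.TEXT_ONLY:
        # Assumed: text and tables only, with no image fallback even if corrupted.
        return {"blocks": True, "embedded_images": False, "page_image": False}
    # AUTO: corrupted text switches the page to an image; clean pages keep all blocks.
    if text_corrupted:
        return as_image
    return {"blocks": True, "embedded_images": True, "page_image": False}


print(plan_page(Mode("auto"), text_corrupted=True))
# {'blocks': False, 'embedded_images': False, 'page_image': True}
```

Subclassing `str` lets the mode strings from the table be passed straight to `Mode(...)`.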
Output Format
```json
{
  "pages": [
    {
      "page_number": 1,
      "content_blocks": [
        {"type": "text", "content": "..."},
        {"type": "table", "content": "| A | B |"},
        {"type": "image", "content": "[IMAGE_0]"}
      ]
    }
  ]
}
```
When text is corrupted:
```json
{
  "page_number": 2,
  "content_blocks": [],
  "text_corrupted": true,
  "page_image": "[IMAGE_1]"
}
```
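A client might walk this structure as in the sketch below, which flattens the blocks into one line each while keeping page order. It assumes corrupted pages appear in the same "pages" list as clean ones. The [IMAGE_n] strings are treated as opaque placeholders, and fetching the images they refer to is out of scope. `summarize` is a hypothetical name.

```python
import json


def summarize(result: dict) -> list[str]:
    """Flatten pdf4vllm-style output into one line per block, in page order."""
    lines = []
    for page in result["pages"]:
        n = page["page_number"]
        if page.get("text_corrupted"):
            # Corrupted pages carry no blocks; the whole page is one image placeholder.
            lines.append(f"p{n}: page image {page['page_image']}")
            continue
        for block in page["content_blocks"]:
            lines.append(f"p{n}: {block['type']}: {block['content']}")
    return lines


sample = json.loads("""
{"pages": [
  {"page_number": 1, "content_blocks": [
    {"type": "text", "content": "Intro"},
    {"type": "image", "content": "[IMAGE_0]"}]},
  {"page_number": 2, "content_blocks": [], "text_corrupted": true,
   "page_image": "[IMAGE_1]"}
]}
""")
for line in summarize(sample):
    print(line)
# p1: text: Intro
# p1: image: [IMAGE_0]
# p2: page image [IMAGE_1]
```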
Configuration
Settings are read from config.json or from environment variables:
```json
{
  "max_pages_per_request": 10,
  "max_image_dimension": 842,
  "page_image_dpi": 100
}
```
```bash
export PDF_MAX_PAGES=20
export PDF_PAGE_IMAGE_DPI=150
```
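The sketch below shows how these settings could be resolved and applied. It rests on several assumptions: environment variables override config.json, PDF_MAX_PAGES maps to max_pages_per_request and PDF_PAGE_IMAGE_DPI to page_image_dpi, and max_image_dimension caps the longer side of a rendered page. All function names are hypothetical.

```python
import json
import os
from pathlib import Path

# Values from the example config.json above (not necessarily the real defaults).
BASE_SETTINGS = {"max_pages_per_request": 10, "max_image_dimension": 842, "page_image_dpi": 100}
# Assumed mapping for the two environment variables shown above.
ENV_OVERRIDES = {"PDF_MAX_PAGES": "max_pages_per_request", "PDF_PAGE_IMAGE_DPI": "page_image_dpi"}


def load_settings(path: Path = Path("config.json"), env=os.environ) -> dict:
    """Merge base values, then config.json, then environment variables (assumed order)."""
    settings = dict(BASE_SETTINGS)
    if path.exists():
        settings.update(json.loads(path.read_text(encoding="utf-8")))
    for var, key in ENV_OVERRIDES.items():
        if var in env:
            settings[key] = int(env[var])
    return settings


def page_batches(first: int, last: int, max_pages: int) -> list[range]:
    """Split an inclusive, 1-based page range into chunks of at most max_pages."""
    return [range(start, min(start + max_pages, last + 1))
            for start in range(first, last + 1, max_pages)]


def rendered_size(width_pt: float, height_pt: float, dpi: int, max_dim: int) -> tuple[int, int]:
    """Pixel size at `dpi` (72 pt per inch), shrunk so the longer side fits max_dim."""
    w, h = width_pt / 72 * dpi, height_pt / 72 * dpi
    scale = min(1.0, max_dim / max(w, h))
    return round(w * scale), round(h * scale)
```

With the example values, an A4 page (595 × 842 pt) renders at 100 DPI to about 826 × 1169 px, so under the assumed cap `rendered_size(595, 842, 100, 842)` scales it back to 595 × 842 px. A 25-page document with max_pages_per_request = 10 splits into batches of 10, 10, and 5 pages.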
Test Server
```bash
pip install "pdf4vllm-mcp[test]"
python test_server.py
# serves at http://localhost:8000
```
License
MIT