PDF to Text MCP Server
Converts PDF files to text for use with MCP-compatible applications like Cursor IDE.
A Model Context Protocol (MCP) server for converting PDF files to text, designed for seamless integration with Cursor IDE and other MCP-compatible applications.
Quick Start
# Clone the repository
git clone https://github.com/xxx87/pdf-to-text-mcp.git pdf-to-text-mcp-server
cd pdf-to-text-mcp-server
# Install dependencies
yarn install
# Build the project
yarn build
# Test the server
yarn test
⨠Features
- š Multi-file Support - Convert one or multiple PDF files simultaneously
- š Text Extraction - Extract text while preserving document structure
- ā” Fast Processing - Efficient PDF parsing with
pdf-parselibrary - š§ MCP Protocol - Full Model Context Protocol compliance
- šÆ Cursor Integration - Designed specifically for Cursor IDE
- š”ļø TypeScript - Fully typed for better development experience
- ā Testing - Comprehensive test suite included
Installation
Prerequisites
- Node.js 18+
- Yarn package manager
- Cursor IDE (for MCP integration)
Local Installation
- Clone the repository
  git clone https://github.com/xxx87/pdf-to-text-mcp.git pdf-to-text-mcp-server
  cd pdf-to-text-mcp-server
- Install dependencies
  yarn install
- Build the project
  yarn build
- Verify installation
  yarn test
Usage
Running as Standalone Server
yarn start
Integration with Cursor IDE
- Add to Cursor Configuration
  Add the following to your Cursor MCP settings:
  {
    "mcpServers": {
      "pdf-to-text": {
        "command": "node",
        "args": ["/absolute/path/to/pdf-to-text-mcp-server/dist/index.js"],
        "cwd": "/absolute/path/to/pdf-to-text-mcp-server"
      }
    }
  }
  Important: Replace `/absolute/path/to/pdf-to-text-mcp-server` with your actual project path.
- Using in Cursor
  - Add PDFs: Drag and drop PDF files into Cursor
  - Convert: Use the `pdf_to_text` tool for automatic conversion
  - Analyze: The extracted text becomes available for AI analysis
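The `mcpServers` entry above can also be generated rather than edited by hand, which avoids a mismatch between `args` and `cwd`. The following is a minimal sketch; `buildCursorConfig` and `McpServerEntry` are hypothetical names used for illustration and are not part of this project.

```typescript
// Shape of one entry under "mcpServers", mirroring the JSON shown above.
interface McpServerEntry {
  command: string;
  args: string[];
  cwd: string;
}

// Hypothetical helper: builds the Cursor MCP config for a given project directory.
function buildCursorConfig(
  projectDir: string
): { mcpServers: Record<string, McpServerEntry> } {
  // Accept POSIX ("/...") and Windows drive ("C:\..." or "C:/...") absolute paths.
  const isAbsolute =
    projectDir.startsWith("/") || /^[A-Za-z]:[\\/]/.test(projectDir);
  if (!isAbsolute) {
    throw new Error(`Expected an absolute path, got: ${projectDir}`);
  }
  // Drop trailing separators so the joined path has no doubled slash.
  const root = projectDir.replace(/[\\/]+$/, "");
  return {
    mcpServers: {
      "pdf-to-text": {
        command: "node",
        args: [`${root}/dist/index.js`],
        cwd: root,
      },
    },
  };
}

// Print the JSON to paste into the Cursor MCP settings.
console.log(
  JSON.stringify(buildCursorConfig("/home/me/pdf-to-text-mcp-server"), null, 2)
);
```

Rejecting relative paths up front mirrors the note above: the settings need the actual absolute project path.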
Manual MCP Usage
Example MCP JSON-RPC request:
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "pdf_to_text",
    "arguments": {
      "file_paths": ["document1.pdf", "document2.pdf"]
    }
  }
}
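A client that builds this request in code might look like the sketch below. `buildPdfToTextRequest` and `toWireLine` are hypothetical helper names. The newline framing is an assumption based on the one-line `echo ... | node dist/index.js` command in the Testing section, which suggests the server reads one JSON message per line on stdin.

```typescript
// Typed shape of the tools/call request shown above.
interface PdfToTextRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: {
    name: "pdf_to_text";
    arguments: { file_paths: string[] };
  };
}

// Hypothetical helper: builds the request for a list of PDF paths.
function buildPdfToTextRequest(id: number, filePaths: string[]): PdfToTextRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: {
      name: "pdf_to_text",
      arguments: { file_paths: filePaths },
    },
  };
}

// Serialize to a single line. JSON.stringify without an indent argument
// never emits raw newlines, so one message occupies exactly one line.
function toWireLine(request: PdfToTextRequest): string {
  return JSON.stringify(request) + "\n";
}

const line = toWireLine(buildPdfToTextRequest(1, ["document1.pdf", "document2.pdf"]));
console.log(line);
```

Keeping the `id` as a parameter leaves request numbering to the caller, which has to match responses to requests anyway.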
Configuration
Environment Variables
| Variable | Description | Default |
|---|---|---|
| `NODE_ENV` | Environment mode | production |
| `LOG_LEVEL` | Logging level | info |
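As an illustration of how these defaults could be applied, the sketch below reads both variables from an environment map. `readServerEnv` and `ServerEnv` are hypothetical names, and the actual handling in `src/index.ts` may differ.

```typescript
// Resolved settings after defaults are applied.
interface ServerEnv {
  nodeEnv: string;
  logLevel: string;
}

// Hypothetical helper: takes an environment map (for example process.env)
// and falls back to the defaults listed in the table above.
function readServerEnv(env: Record<string, string | undefined>): ServerEnv {
  return {
    // "||" treats an empty string the same as an unset variable.
    nodeEnv: env["NODE_ENV"] || "production",
    logLevel: env["LOG_LEVEL"] || "info",
  };
}

console.log(readServerEnv({}));
console.log(readServerEnv({ NODE_ENV: "development", LOG_LEVEL: "debug" }));
```

Passing the map in as an argument, instead of reading a global, keeps the function easy to test.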
Custom Options
The server automatically handles PDF parsing with optimized settings. For custom configurations, modify the pdf-parse options in src/index.ts.
API Reference
Tools
pdf_to_text
Converts PDF files to readable text format.
Parameters:
- `file_paths` (string[]): Array of PDF file paths to convert
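A caller can check the argument shape before sending a request. The project lists zod for runtime validation, but its actual schema is not shown here, so the following is a dependency-free sketch. `isPdfToTextArgs` is a hypothetical name, and rejecting empty strings is an assumption.

```typescript
// Argument shape documented above.
interface PdfToTextArgs {
  file_paths: string[];
}

// Hypothetical type guard: true only when value is an object whose
// file_paths property is an array of non-empty strings.
function isPdfToTextArgs(value: unknown): value is PdfToTextArgs {
  if (typeof value !== "object" || value === null) {
    return false;
  }
  const paths = (value as { file_paths?: unknown }).file_paths;
  return (
    Array.isArray(paths) &&
    paths.every((p) => typeof p === "string" && p.length > 0)
  );
}

console.log(isPdfToTextArgs({ file_paths: ["document1.pdf"] }));
console.log(isPdfToTextArgs({ file_paths: "document1.pdf" }));
```

Note that an empty array passes this guard; whether the server accepts one is not documented.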
Returns:
{
  content: [
    {
      type: "text",
      text: string // Extracted text with file separators
    }
  ];
}
Example Response:
{
  "content": [
    {
      "type": "text",
      "text": "Successfully converted 2 PDF file(s) to text:\n\n=== document1.pdf ===\nExtracted content here...\n\n=== document2.pdf ===\nMore content here..."
    }
  ]
}
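Because all files come back in a single text field, a client may want to split the result per file. The sketch below assumes the `=== filename ===` separator format shown in the example response; `splitByFile` and `ExtractedFile` are hypothetical names, not part of the server.

```typescript
// One file's worth of extracted text.
interface ExtractedFile {
  name: string;
  text: string;
}

// Hypothetical helper: splits the combined text on "=== name ===" header lines.
function splitByFile(responseText: string): ExtractedFile[] {
  const header = /^=== (.+) ===$/gm;
  const marks: { name: string; start: number; end: number }[] = [];
  let match: RegExpExecArray | null;
  while ((match = header.exec(responseText)) !== null) {
    // With the "g" flag, lastIndex points just past the current match.
    marks.push({ name: match[1] ?? "", start: match.index, end: header.lastIndex });
  }
  return marks.map((mark, i) => {
    const next = marks[i + 1];
    // Each body runs from the end of its header to the start of the next header.
    const body = responseText.slice(mark.end, next ? next.start : undefined);
    return { name: mark.name, text: body.trim() };
  });
}

const sample =
  "Successfully converted 2 PDF file(s) to text:\n\n" +
  "=== document1.pdf ===\nExtracted content here...\n\n" +
  "=== document2.pdf ===\nMore content here...";
console.log(splitByFile(sample));
```

A PDF whose own text contains a line in the same `=== ... ===` form would be split incorrectly, so treat this as a convenience and not a robust parser.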
Development
Project Structure
pdf-to-text-mcp-server/
├── src/
│   ├── index.ts              # Main MCP server implementation
│   └── types/
│       └── pdf-parse.d.ts    # Type definitions
├── dist/                     # Compiled JavaScript output
├── test-server.js            # Test utilities
├── package.json              # Project configuration
├── tsconfig.json             # TypeScript configuration
├── cursor-config.json        # Example Cursor configuration
└── README.md                 # This file
Available Scripts
| Script | Description |
|---|---|
| `yarn build` | Compile TypeScript to JavaScript |
| `yarn start` | Run the compiled server |
| `yarn dev` | Run in development mode with hot reload |
| `yarn test` | Execute test suite |
| `yarn lint` | Run code linting |
Building from Source
# Development mode with file watching
yarn dev
# Production build
yarn build
# Run tests
yarn test
Dependencies
| Package | Purpose | Version |
|---|---|---|
| `@modelcontextprotocol/sdk` | MCP protocol implementation | ^0.5.0 |
| `pdf-parse` | PDF text extraction | ^1.1.1 |
| `zod` | Runtime type validation | ^3.22.4 |
| `typescript` | TypeScript compiler | ^5.0.0 |
Troubleshooting
Common Issues
| Issue | Cause | Solution |
|---|---|---|
| `ENOENT: no such file or directory` | Invalid file path | Verify PDF file exists and path is correct |
| `File is not a PDF` | Wrong file format | Ensure file has .pdf extension and is valid |
| Empty text output | Image-based PDF | This tool only extracts text-based content |
| Build errors | Missing dependencies | Run yarn install to install all dependencies |
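Some of these failures can be caught before calling the tool. The sketch below checks the extension and the `%PDF-` signature that PDF files begin with. Both function names are hypothetical, and the server's own validation may work differently.

```typescript
// True when the path ends in ".pdf", ignoring case.
function hasPdfExtension(filePath: string): boolean {
  return filePath.toLowerCase().endsWith(".pdf");
}

// True when the buffer starts with the ASCII bytes "%PDF-".
// This is a strict check at offset 0; it does not scan further into the file.
function hasPdfSignature(bytes: Uint8Array): boolean {
  const signature = [0x25, 0x50, 0x44, 0x46, 0x2d]; // "%PDF-"
  if (bytes.length < signature.length) {
    return false;
  }
  return signature.every((byte, i) => bytes[i] === byte);
}

// "%PDF-1.7" as raw bytes, standing in for the first bytes read from a file.
const pdfHeader = Uint8Array.from([0x25, 0x50, 0x44, 0x46, 0x2d, 0x31, 0x2e, 0x37]);
console.log(hasPdfExtension("Report.PDF"), hasPdfSignature(pdfHeader));
```

Neither check detects image-based PDFs: a scanned document passes both and can still produce empty text output.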
Debug Mode
Enable verbose logging:
NODE_ENV=development yarn start
Testing
Run the comprehensive test suite:
# Run all tests
yarn test
# Test with specific PDF
echo '{"jsonrpc": "2.0", "id": 1, "method": "tools/call", "params": {"name": "pdf_to_text", "arguments": {"file_paths": ["your-file.pdf"]}}}' | node dist/index.js
Contributing
We welcome contributions! Please see our Contributing Guidelines for details.
Development Setup
- Fork the repository
- Clone your fork
- Create a feature branch: git checkout -b feature/amazing-feature
- Make your changes
- Test thoroughly: yarn test
- Commit changes: git commit -m 'Add amazing feature'
- Push to branch: git push origin feature/amazing-feature
- Open a Pull Request
Code Style
- Follow existing TypeScript conventions
- Add tests for new features
- Update documentation as needed
- Ensure all tests pass
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- Model Context Protocol for the excellent MCP specification
- pdf-parse for reliable PDF text extraction
- Cursor IDE for MCP integration support
Support
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Documentation: Wiki