PDF to Text MCP Server

Converts PDF files to text for use with MCP-compatible applications like Cursor IDE.

License: MIT Node.js TypeScript MCP

A Model Context Protocol (MCP) server for converting PDF files to text, designed for seamless integration with Cursor IDE and other MCP-compatible applications.

🚀 Quick Start

# Clone the repository
git clone https://github.com/xxx87/pdf-to-text-mcp.git pdf-to-text-mcp-server
cd pdf-to-text-mcp-server

# Install dependencies
yarn install

# Build the project
yarn build

# Test the server
yarn test

✨ Features

  • 📑 Multi-file Support - Convert one or more PDF files in a single call
  • 🔍 Text Extraction - Extract text while preserving document structure
  • ⚡ Fast Processing - Efficient PDF parsing with the pdf-parse library
  • 🔧 MCP Protocol - Full Model Context Protocol compliance
  • 🎯 Cursor Integration - Designed specifically for Cursor IDE
  • 🛡️ TypeScript - Fully typed for a better development experience
  • ✅ Testing - Comprehensive test suite included

🛠️ Installation

Prerequisites

  • Node.js 18+
  • Yarn package manager
  • Cursor IDE (for MCP integration)

Local Installation

  1. Clone the repository

    git clone https://github.com/xxx87/pdf-to-text-mcp.git pdf-to-text-mcp-server
    cd pdf-to-text-mcp-server
    
  2. Install dependencies

    yarn install
    
  3. Build the project

    yarn build
    
  4. Verify installation

    yarn test
    

🎯 Usage

Running as Standalone Server

yarn start

Integration with Cursor IDE

  1. Add to Cursor Configuration

    Add the following to your Cursor MCP settings:

    {
      "mcpServers": {
        "pdf-to-text": {
          "command": "node",
          "args": ["/absolute/path/to/pdf-to-text-mcp-server/dist/index.js"],
          "cwd": "/absolute/path/to/pdf-to-text-mcp-server"
        }
      }
    }
    

    ⚠️ Important: Replace /absolute/path/to/pdf-to-text-mcp-server with your actual project path.

  2. Using in Cursor

    • Add PDFs: Drag and drop PDF files into Cursor
    • Convert: Use the pdf_to_text tool for automatic conversion
    • Analyze: The extracted text becomes available for AI analysis
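The configuration entry from step 1 can be generated instead of edited by hand. The following is a minimal sketch, assuming the project has been built so that `dist/index.js` exists; the helper name `buildCursorConfig` and its interfaces are hypothetical and not part of this project.

```typescript
// Hypothetical helper (not part of this project): builds the "mcpServers"
// entry shown in step 1 for a given absolute project directory.
interface McpServerEntry {
  command: string;
  args: string[];
  cwd: string;
}

interface CursorMcpConfig {
  mcpServers: Record<string, McpServerEntry>;
}

function buildCursorConfig(projectDir: string): CursorMcpConfig {
  // The README asks for an absolute path, so reject relative ones early.
  // Accepts POSIX ("/home/me/...") and Windows drive ("C:\...") forms.
  const isAbsolute =
    projectDir.startsWith("/") || /^[A-Za-z]:[\\/]/.test(projectDir);
  if (!isAbsolute) {
    throw new Error(`Expected an absolute path, got: ${projectDir}`);
  }
  // Strip trailing separators so the joined path has no doubled slash.
  const root = projectDir.replace(/[\\/]+$/, "");
  return {
    mcpServers: {
      "pdf-to-text": {
        command: "node",
        args: [`${root}/dist/index.js`],
        cwd: root,
      },
    },
  };
}
```

Passing the result through `JSON.stringify(config, null, 2)` yields text in the same shape as the snippet in step 1.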

Manual MCP Usage

Example MCP JSON-RPC request:

{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "pdf_to_text",
    "arguments": {
      "file_paths": ["document1.pdf", "document2.pdf"]
    }
  }
}
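A request like the one above can be built programmatically. This is a sketch only: the names `PdfToTextRequest`, `buildPdfToTextRequest`, and `toStdioLine` are illustrative and do not come from this project. It assumes the server speaks MCP over stdio, where each JSON-RPC message is written as a single line followed by a newline. An MCP client would normally complete the `initialize` handshake before sending `tools/call`.

```typescript
// Shape of the tools/call request shown above (illustrative type name).
interface PdfToTextRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: {
    name: "pdf_to_text";
    arguments: { file_paths: string[] };
  };
}

// Hypothetical helper: fills in the fixed fields around the file list.
function buildPdfToTextRequest(id: number, filePaths: string[]): PdfToTextRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: {
      name: "pdf_to_text",
      arguments: { file_paths: filePaths },
    },
  };
}

// Serialize as one line terminated by "\n". JSON.stringify without an
// indent argument never emits raw newlines, so the framing stays intact.
function toStdioLine(request: PdfToTextRequest): string {
  return JSON.stringify(request) + "\n";
}
```

The resulting line is what a client would write to the server's standard input once the connection is initialized.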

⚙️ Configuration

Environment Variables

| Variable | Description | Default |
| --- | --- | --- |
| NODE_ENV | Environment mode | production |
| LOG_LEVEL | Logging level | info |
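The README does not show how the server reads these variables. As a rough sketch of the defaults listed above, a reader function could look like this; `ServerEnv` and `readServerEnv` are hypothetical names, not taken from `src/index.ts`.

```typescript
// Hypothetical settings object; field names are illustrative.
interface ServerEnv {
  nodeEnv: string;
  logLevel: string;
}

// Takes the environment as a plain record so it can be tested without
// touching the real process environment.
function readServerEnv(env: Record<string, string | undefined>): ServerEnv {
  return {
    // "||" (not "??") so that an empty string also falls back to the default.
    nodeEnv: env["NODE_ENV"] || "production",
    logLevel: env["LOG_LEVEL"] || "info",
  };
}
```

In a Node.js process this would be called as `readServerEnv(process.env)`.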

Custom Options

The server automatically handles PDF parsing with optimized settings. For custom configurations, modify the pdf-parse options in src/index.ts.

📚 API Reference

Tools

pdf_to_text

Converts PDF files to readable text format.

Parameters:

  • file_paths (string[]): Array of PDF file paths to convert
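A caller may want to check the argument shape before sending a request. The project lists zod for runtime validation, but how the server validates input is not shown here, so this sketch uses a plain dependency-free type guard instead. `PdfToTextArgs` and `isPdfToTextArgs` are hypothetical names.

```typescript
// Argument shape from the parameter list above (illustrative type name).
interface PdfToTextArgs {
  file_paths: string[];
}

// Hypothetical guard: true only when file_paths is an array of strings.
function isPdfToTextArgs(value: unknown): value is PdfToTextArgs {
  if (typeof value !== "object" || value === null) {
    return false;
  }
  const paths = (value as { file_paths?: unknown }).file_paths;
  return Array.isArray(paths) && paths.every((p) => typeof p === "string");
}
```

Note that the guard only checks types; it does not confirm that the paths exist or point to PDF files.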

Returns:

{
  content: [
    {
      type: "text";
      text: string; // Extracted text with file separators
    }
  ];
}

Example Response:

{
  "content": [
    {
      "type": "text",
      "text": "Successfully converted 2 PDF file(s) to text:\n\n=== document1.pdf ===\nExtracted content here...\n\n=== document2.pdf ===\nMore content here..."
    }
  ]
}
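Because all files come back in one text field, a client may want to split the result per file. The sketch below assumes the separator format seen in the example response, a line of the form `=== name ===`; that format is inferred from the example and is not a documented guarantee. `splitByFile` is a hypothetical helper.

```typescript
// Hypothetical helper: splits the combined text into a map of
// file name -> extracted text, using "=== name ===" lines as boundaries.
// Caveat: a content line that happens to match the same pattern would be
// mistaken for a separator.
function splitByFile(text: string): Map<string, string> {
  const sections = new Map<string, string>();
  const header = /^=== (.+) ===$/;
  let current: string | null = null;
  let lines: string[] = [];

  // Store the section collected so far, if any.
  const flush = (): void => {
    if (current !== null) {
      sections.set(current, lines.join("\n").trim());
    }
  };

  for (const line of text.split("\n")) {
    const name = header.exec(line)?.[1];
    if (name !== undefined) {
      flush();
      current = name;
      lines = [];
    } else if (current !== null) {
      // Lines before the first separator (the summary line) are skipped.
      lines.push(line);
    }
  }
  flush();
  return sections;
}
```

Applied to the example response above, the map would hold one entry for `document1.pdf` and one for `document2.pdf`.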

🏗️ Development

Project Structure

pdf-to-text-mcp-server/
├── src/
│   ├── index.ts              # Main MCP server implementation
│   └── types/
│       └── pdf-parse.d.ts    # Type definitions
├── dist/                     # Compiled JavaScript output
├── test-server.js            # Test utilities
├── package.json              # Project configuration
├── tsconfig.json             # TypeScript configuration
├── cursor-config.json        # Example Cursor configuration
└── README.md                 # This file

Available Scripts

| Script | Description |
| --- | --- |
| yarn build | Compile TypeScript to JavaScript |
| yarn start | Run the compiled server |
| yarn dev | Run in development mode with hot reload |
| yarn test | Execute test suite |
| yarn lint | Run code linting |

Building from Source

# Development mode with file watching
yarn dev

# Production build
yarn build

# Run tests
yarn test

Dependencies

| Package | Purpose | Version |
| --- | --- | --- |
| @modelcontextprotocol/sdk | MCP protocol implementation | ^0.5.0 |
| pdf-parse | PDF text extraction | ^1.1.1 |
| zod | Runtime type validation | ^3.22.4 |
| typescript | TypeScript compiler | ^5.0.0 |

🐛 Troubleshooting

Common Issues

| Issue | Cause | Solution |
| --- | --- | --- |
| ENOENT: no such file or directory | Invalid file path | Verify PDF file exists and path is correct |
| File is not a PDF | Wrong file format | Ensure file has .pdf extension and is valid |
| Empty text output | Image-based PDF | This tool only extracts text-based content |
| Build errors | Missing dependencies | Run yarn install to install all dependencies |
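For the "File is not a PDF" case, a quick client-side check is to look at the first bytes of the file, since PDF files start with the signature `%PDF-`. This is a sketch of such a pre-flight check and not the server's own detection logic, which is not shown in this README. `looksLikePdf` is a hypothetical helper, and it checks strictly at offset 0.

```typescript
// Hypothetical pre-flight check: true when the buffer starts with "%PDF-".
function looksLikePdf(header: Uint8Array): boolean {
  // ASCII codes for the characters '%', 'P', 'D', 'F', '-'.
  const magic = [0x25, 0x50, 0x44, 0x46, 0x2d];
  if (header.length < magic.length) {
    return false;
  }
  return magic.every((byte, i) => header[i] === byte);
}
```

In Node.js the input could be the first few bytes read from the file. A passing check does not rule out the "Empty text output" case, because a scanned PDF has a valid signature but no text layer.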

Debug Mode

Enable verbose logging:

NODE_ENV=development yarn start

Testing

Run the comprehensive test suite:

# Run all tests
yarn test

# Test with specific PDF
echo '{"jsonrpc": "2.0", "id": 1, "method": "tools/call", "params": {"name": "pdf_to_text", "arguments": {"file_paths": ["your-file.pdf"]}}}' | node dist/index.js

🤝 Contributing

We welcome contributions! Please see our Contributing Guidelines for details.

Development Setup

  1. Fork the repository
  2. Clone your fork
  3. Create a feature branch: git checkout -b feature/amazing-feature
  4. Make your changes
  5. Test thoroughly: yarn test
  6. Commit changes: git commit -m 'Add amazing feature'
  7. Push to branch: git push origin feature/amazing-feature
  8. Open a Pull Request

Code Style

  • Follow existing TypeScript conventions
  • Add tests for new features
  • Update documentation as needed
  • Ensure all tests pass

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

