PDF Redaction MCP Server

Enables loading, reviewing, and redacting sensitive content in PDF documents through text-based or area-based redaction methods. Supports customizable redaction appearance and saves redacted PDFs with comprehensive error handling.

A Model Context Protocol (MCP) server for PDF redaction using PyMuPDF (fitz). This server provides tools for loading PDFs, identifying and redacting sensitive text, and saving redacted documents.

Features

  • 📄 Load and read PDF files - Extract text content from PDFs for review
  • 🔍 Batch text redaction - Search and redact multiple text strings at once for maximum efficiency
  • 📋 Redaction tracking - Keep track of what's been redacted to prevent duplicate work
  • 🔎 List applied redactions - Audit trail showing which texts have been marked for redaction
  • 📐 Area-based redaction - Redact specific rectangular regions by coordinates
  • 💾 Save redacted PDFs - Apply redactions and save with automatic naming
  • 🎨 Customizable redaction appearance - Choose redaction fill colors
  • 🔒 Error handling - Comprehensive error messages via MCP protocol

Installation

This project uses uv for package management. To install:

# Clone the repository
git clone <your-repo-url>
cd redact_mcp

# Install with uv
uv pip install -e .

Usage

Running the Server

You can run the server either as a Python module or through the FastMCP CLI:

Option 1: Direct Python execution (stdio transport)

python -m redact_mcp.server

Option 2: Using FastMCP CLI

# Stdio transport (default)
fastmcp run redact_mcp.server:mcp

# HTTP transport for remote access
fastmcp run redact_mcp.server:mcp --transport http --port 8000

Installing in MCP Clients

Claude Desktop

Add to your Claude Desktop configuration file:

  • macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
  • Windows: %APPDATA%\Claude\claude_desktop_config.json

{
  "mcpServers": {
    "pdf-redaction": {
      "command": "uv",
      "args": [
        "--directory",
        "/path/to/redact_mcp",
        "run",
        "fastmcp",
        "run",
        "redact_mcp.server:mcp"
      ]
    }
  }
}
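
If the configuration file already lists other servers, the entry above has to be merged into it rather than pasted over the whole file. The following is a minimal sketch of that merge. `add_server_entry` is a hypothetical helper, not part of this project, and `/path/to/redact_mcp` is still a placeholder for your checkout.

```python
import json
from pathlib import Path


def add_server_entry(config_path: Path, repo_dir: str, name: str = "pdf-redaction") -> dict:
    """Merge the server entry into an existing (or new) client config file.

    Hypothetical helper for illustration; the entry mirrors the JSON shown above.
    """
    # Start from the existing config so other registered servers are preserved.
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    servers = config.setdefault("mcpServers", {})
    servers[name] = {
        "command": "uv",
        "args": ["--directory", repo_dir, "run", "fastmcp", "run", "redact_mcp.server:mcp"],
    }
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

Restart the client after editing the file so it picks up the new entry.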

Other MCP Clients

Use the FastMCP CLI to generate configuration for other clients:

# For Cursor
fastmcp install cursor redact_mcp.server:mcp

# For Gemini CLI
fastmcp install gemini-cli redact_mcp.server:mcp

# Generate generic MCP JSON configuration
fastmcp install mcp-json redact_mcp.server:mcp

Available Tools

1. load_pdf

Load a PDF file and extract its text content.

Parameters:

  • pdf_path (string): Path to the PDF file to load

Returns: The full text content of the PDF, organized by pages

Example:

Load the PDF at /path/to/document.pdf
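
As a rough illustration of "organized by pages", the sketch below joins per-page text under numbered headers. The header format and the function name `extract_text_by_page` are assumptions, not the server's actual output. The code only relies on each page exposing a `get_text()` method, which PyMuPDF pages do.

```python
from typing import Iterable, Protocol


class PageLike(Protocol):
    """Anything with a get_text() method, such as a PyMuPDF page."""

    def get_text(self) -> str: ...


def extract_text_by_page(pages: Iterable[PageLike]) -> str:
    """Return the text of all pages, each under a 1-indexed page header."""
    sections = []
    for number, page in enumerate(pages, start=1):
        # The "--- Page N ---" header is an assumed format for illustration.
        sections.append(f"--- Page {number} ---\n{page.get_text().strip()}")
    return "\n\n".join(sections)
```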

2. redact_text

Redact every occurrence of one or more text strings in a loaded PDF. The tool accepts multiple texts per call for batch redaction, and it tracks which texts have already been marked so that repeated requests are skipped.

Parameters:

  • pdf_path (string): Path to the loaded PDF file
  • texts_to_redact (list of strings): List of text strings to search for and redact
  • fill_color (tuple, optional): RGB color (0-1 range) for redaction box. Default: (0, 0, 0) - black

Returns: Summary of redaction operations, including which texts were newly redacted and which were skipped (already redacted)

Examples:

# Single text
Redact ["confidential"] in /path/to/document.pdf

# Multiple texts at once (recommended for efficiency)
Redact ["John Doe", "123-45-6789", "john.doe@email.com"] in /path/to/document.pdf

Note: The tool tracks which texts have been redacted and will skip any texts that were already processed, preventing duplicate redactions.
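
The tracking behavior can be pictured as a set of already-seen texts per PDF. This is a minimal sketch under that assumption; `partition_texts` and the module-level `_redacted` store are hypothetical names, and exact, case-sensitive matching is assumed because the README does not say otherwise.

```python
from collections import defaultdict

# Maps a PDF path to the set of texts already marked for redaction.
_redacted: defaultdict[str, set[str]] = defaultdict(set)


def partition_texts(pdf_path: str, texts_to_redact: list[str]) -> tuple[list[str], list[str]]:
    """Split a batch into texts that are new and texts that can be skipped."""
    seen = _redacted[pdf_path]
    new: list[str] = []
    skipped: list[str] = []
    for text in texts_to_redact:
        if text in seen:
            skipped.append(text)  # already processed, including repeats within this batch
        else:
            seen.add(text)
            new.append(text)
    return new, skipped
```

Keying the store by path means the same text can still be redacted independently in a second PDF.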

3. redact_area

Redact a specific rectangular area on a PDF page.

Parameters:

  • pdf_path (string): Path to the loaded PDF file
  • page_number (int): Page number (1-indexed)
  • x0 (float): Left x coordinate
  • y0 (float): Top y coordinate
  • x1 (float): Right x coordinate
  • y1 (float): Bottom y coordinate
  • fill_color (tuple, optional): RGB color (0-1 range) for redaction box. Default: (0, 0, 0) - black

Returns: Confirmation message

Example:

Redact the area from (100, 100) to (300, 150) on page 1 of /path/to/document.pdf
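
Before a request like this reaches the PDF library, the inputs usually need a few checks. The helpers below are a sketch of such checks, not the server's code, and all three names are hypothetical. They assume y grows downward, as the Top and Bottom labels above suggest.

```python
def to_page_index(page_number: int, page_count: int) -> int:
    """Convert a 1-indexed page number to a 0-indexed page index."""
    if not 1 <= page_number <= page_count:
        raise ValueError(f"page_number must be between 1 and {page_count}, got {page_number}")
    return page_number - 1


def normalize_rect(x0: float, y0: float, x1: float, y1: float) -> tuple[float, float, float, float]:
    """Order the corners so (x0, y0) is top-left and (x1, y1) is bottom-right."""
    left, right = sorted((x0, x1))
    top, bottom = sorted((y0, y1))
    if left == right or top == bottom:
        raise ValueError("redaction area must have non-zero width and height")
    return (left, top, right, bottom)


def rgb255_to_unit(red: int, green: int, blue: int) -> tuple[float, float, float]:
    """Convert 0-255 RGB channels to the 0-1 range expected by fill_color."""
    for channel in (red, green, blue):
        if not 0 <= channel <= 255:
            raise ValueError(f"channel values must be between 0 and 255, got {channel}")
    return (red / 255, green / 255, blue / 255)
```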

4. save_redacted_pdf

Apply all pending redactions and save the PDF.

Parameters:

  • pdf_path (string): Path to the loaded PDF file
  • output_path (string, optional): Custom output path. If not provided, appends "_redacted" to original filename

Returns: Path to the saved redacted PDF

Example:

Save the redacted version of /path/to/document.pdf
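
The default naming rule can be sketched with `pathlib`. `resolve_output_path` is a hypothetical name, and placing the suffix directly before the file extension is an assumption based on the `sensitive_redacted.pdf` example in this README.

```python
from pathlib import PurePath
from typing import Optional


def resolve_output_path(pdf_path: str, output_path: Optional[str] = None) -> str:
    """Return the custom output path, or append "_redacted" to the original name."""
    if output_path:
        return output_path
    source = PurePath(pdf_path)
    # Keep the directory and extension; only the file stem changes.
    return str(source.with_name(f"{source.stem}_redacted{source.suffix}"))
```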

5. list_loaded_pdfs

List all currently loaded PDF files.

Parameters: None

Returns: List of loaded PDF paths with page counts

6. list_applied_redactions

List all redactions that have been marked on the loaded PDF(s). Use this tool to track redaction progress and avoid duplicate work.

Parameters:

  • pdf_path (string, optional): Path to a specific PDF. If not provided, lists redactions for all loaded PDFs

Returns: List of texts that have been marked for redaction in each PDF

Examples:

# List redactions for a specific PDF
List applied redactions for /path/to/document.pdf

# List redactions for all loaded PDFs
List all applied redactions

Use Cases:

  • Check what has already been redacted before adding more redactions
  • Verify redaction progress during a multi-step process
  • Avoid duplicate redaction attempts
  • Generate a report of what was redacted

7. close_pdf

Close a loaded PDF and free its resources. This also clears the redaction tracking for that PDF.

Parameters:

  • pdf_path (string): Path to the PDF file to close

Returns: Confirmation message

Workflow Example

Here's a typical workflow using this MCP server:

  1. Load a PDF

    Load the PDF at /Users/me/documents/sensitive.pdf
    
  2. Review the content

    The tool will return the full text content, which you can review to identify sensitive information.

  3. Redact sensitive text (batch mode - recommended)

    Redact ["Social Security Number", "123-45-6789", "John Doe", "jane.smith@email.com"] in /Users/me/documents/sensitive.pdf
    

    Pro tip: Redacting multiple texts in one call is faster than calling the tool once per text, because each call adds invocation overhead.

  4. Check what has been redacted (optional)

    List applied redactions for /Users/me/documents/sensitive.pdf
    

    This shows you which texts have already been marked for redaction.

  5. Add more redactions if needed

    Redact ["Additional Text", "Another Secret"] in /Users/me/documents/sensitive.pdf
    

    The tool will skip any texts that were already redacted in step 3.

  6. Redact specific areas (optional)

    Redact the area from (50, 100) to (200, 120) on page 2 of /Users/me/documents/sensitive.pdf
    
  7. Save the redacted PDF

    Save the redacted version of /Users/me/documents/sensitive.pdf
    

    This will create /Users/me/documents/sensitive_redacted.pdf

  8. Close the PDF (optional)

    Close /Users/me/documents/sensitive.pdf
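
The same workflow can also be scripted instead of typed as prompts. The sketch below assumes a client object with an async `call_tool(name, arguments)` method, which is the shape FastMCP's `Client` exposes. `redact_document` is a hypothetical helper; the tool and parameter names are the ones documented above.

```python
from typing import Any, Protocol


class ToolClient(Protocol):
    """Any MCP client exposing an async call_tool(name, arguments) method."""

    async def call_tool(self, name: str, arguments: dict[str, Any]) -> Any: ...


async def redact_document(client: ToolClient, pdf_path: str, texts: list[str]) -> Any:
    """Run load -> batch redact -> save -> close and return the save result."""
    await client.call_tool("load_pdf", {"pdf_path": pdf_path})
    try:
        # One batch call for all texts, as recommended in step 3.
        await client.call_tool("redact_text", {"pdf_path": pdf_path, "texts_to_redact": texts})
        return await client.call_tool("save_redacted_pdf", {"pdf_path": pdf_path})
    finally:
        # Close even if a step fails so the server frees the document.
        await client.call_tool("close_pdf", {"pdf_path": pdf_path})
```

Run it inside an event loop, for example with `asyncio.run(...)`, once a client is connected to the server.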
    

Technical Details

Performance Tips

Batch Redaction is Faster:

# ❌ Slower: Multiple individual calls
Redact ["John Doe"] in document.pdf
Redact ["123-45-6789"] in document.pdf  
Redact ["jane@email.com"] in document.pdf

# ✅ Faster: Single batch call
Redact ["John Doe", "123-45-6789", "jane@email.com"] in document.pdf

Why batch redaction is better:

  • Reduces tool invocation overhead
  • Scans the PDF only once
  • Applies all redactions in a single pass
  • Automatically prevents duplicate redactions
  • Provides a single summary of all operations

Best Practice: Collect all texts to redact first, then make one batch call.
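
One way to collect texts up front is to scan the extracted text for known patterns and pass the unique matches in a single call. The patterns below are illustrative assumptions (a US-style SSN and a simple email shape), not something this server ships, and `collect_candidates` is a hypothetical helper.

```python
import re

# Illustrative patterns only; real documents need patterns tuned to their content.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


def collect_candidates(text: str) -> list[str]:
    """Return unique matches in first-seen order, ready for one batch call."""
    found: dict[str, None] = {}  # a dict keeps insertion order and drops repeats
    for pattern in PATTERNS.values():
        for match in pattern.finditer(text):
            found.setdefault(match.group(), None)
    return list(found)
```

Review the resulting list before redacting: pattern matching can miss items or flag text that is not sensitive.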

Dependencies

  • FastMCP (>=2.12.0): Python framework for building MCP servers
  • PyMuPDF (>=1.24.0): PDF manipulation library (imported as fitz)

Architecture

  • In-memory storage: Loaded PDFs are kept in memory for fast access during redaction operations
  • Redaction tracking: The server tracks which texts have been redacted to prevent duplicate work
  • Batch processing: Multiple texts can be redacted in a single tool call for improved performance
  • Lazy application: Redaction annotations are added but not applied until save_redacted_pdf is called
  • Error handling: Uses FastMCP's ToolError for proper error propagation to MCP clients
  • Context logging: All operations log to the MCP context for transparency
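
The lazy-application point is the one most worth seeing in code: marking adds annotations, and content is only removed when they are applied. The sketch below is not the server's implementation. It is written against the three page methods it needs (`search_for`, `add_redact_annot`, `apply_redactions`), which PyMuPDF pages provide; `mark_texts` and `apply_pending` are hypothetical names.

```python
from typing import Any, Protocol, Sequence


class RedactablePage(Protocol):
    """The subset of a PyMuPDF page used here."""

    def search_for(self, needle: str) -> Sequence[Any]: ...
    def add_redact_annot(self, quad: Any, *, fill: tuple[float, float, float]) -> Any: ...
    def apply_redactions(self) -> Any: ...


def mark_texts(
    pages: Sequence[RedactablePage],
    texts: Sequence[str],
    fill: tuple[float, float, float] = (0.0, 0.0, 0.0),
) -> dict[str, int]:
    """Add a redaction annotation for every hit; nothing is removed yet."""
    counts = {text: 0 for text in texts}
    for page in pages:
        for text in texts:
            for rect in page.search_for(text):
                page.add_redact_annot(rect, fill=fill)
                counts[text] += 1
    return counts


def apply_pending(pages: Sequence[RedactablePage]) -> None:
    """Permanently remove the marked content; call this right before saving."""
    for page in pages:
        page.apply_redactions()
```

Deferring `apply_pending` until save time keeps the original text searchable, so later batches can still find their targets.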

Limitations (Current Version)

  • Text-only redaction: This version focuses on text redaction. Image redaction is not yet implemented.
  • Memory usage: PDFs are kept in memory while loaded. Very large PDFs may consume significant memory.
  • Single session: The in-memory store is not persistent across server restarts.

Development

Running Tests

# Install development dependencies
uv pip install -e ".[dev]"

# Run tests (when implemented)
pytest

Code Structure

redact_mcp/
├── src/
│   └── redact_mcp/
│       ├── __init__.py      # Package initialization
│       └── server.py         # Main MCP server implementation
├── pyproject.toml           # Package configuration
└── README.md               # This file

License

Apache-2.0

Contributing

Contributions are welcome! Please feel free to submit issues or pull requests.
