MCP Presidio
An MCP server that enables LLMs to detect and anonymize over 25 types of Personally Identifiable Information (PII) using Microsoft Presidio. It supports various redaction strategies and can process both plain text and structured data to help ensure data privacy.
⚠️ SECURITY & PRIVACY WARNING ⚠️
PLEASE READ CAREFULLY BEFORE USE
Using this MCP server to detect PII involves sending text data to the Presidio engine. The Presidio processing itself happens locally within the container or Python process, but calling this tool through an LLM agent (like Claude, ChatGPT, etc.) means the text to be analyzed is also shared with that LLM.
RISKS:
- PII Leakage: If you ask an LLM to "check this text for PII" or "anonymize this", the potentially sensitive text goes to the LLM provider first so that the model can construct the tool call.
- Context Retention: The PII may be retained in the LLM's chat history, training data, or logs.
- Transmitted Context: PII will be part of the prompt context transmitted over the network.
RECOMMENDED USE:
- Local LLMs: Use with locally hosted LLMs where data does not leave your infrastructure.
- Private/Enterprise Agents: Use in approved enterprise environments with strict data privacy agreements.
- Non-LLM Integration: Use the underlying libraries directly in your code without an LLM intermediary if strict privacy is required.
ALTERNATIVE ARCHITECTURES: Consider using Presidio as a filter before the LLM. Tools like LiteLLM can integrate Presidio to sanitize input before it reaches the LLM provider, preventing PII from ever leaving your control. This MCP server is designed for agentic workflows where the LLM decides to check for PII, which inherently carries the risks mentioned above.
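The filter-before-the-LLM pattern described above can be sketched roughly as follows. This is a minimal illustration, not how LiteLLM or this server implements it: the regex detector is a stand-in for a real detector such as Presidio's analyzer, and `detect_emails`, `sanitize`, `guarded_call`, and the `call_llm` callable are hypothetical names introduced only for this example.

```python
import re
from typing import Callable, List, Tuple

# A detected span: (start, end, entity_type). Hypothetical shape for this sketch.
Span = Tuple[int, int, str]

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def detect_emails(text: str) -> List[Span]:
    """Regex stand-in for a real detector such as Presidio's analyzer."""
    return [(m.start(), m.end(), "EMAIL_ADDRESS") for m in EMAIL_RE.finditer(text)]


def sanitize(text: str, detect: Callable[[str], List[Span]]) -> str:
    """Replace each detected span with an <ENTITY_TYPE> placeholder."""
    # Work right to left so earlier offsets stay valid after each replacement.
    for start, end, entity_type in sorted(detect(text), reverse=True):
        text = text[:start] + f"<{entity_type}>" + text[end:]
    return text


def guarded_call(prompt: str, call_llm: Callable[[str], str]) -> str:
    """Sanitize the prompt locally, then hand only the sanitized text to the LLM."""
    return call_llm(sanitize(prompt, detect_emails))
```

The point of the design is ordering: sanitization runs in your own process before any network call, so the provider only ever sees placeholders.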
MCP Presidio
A Model Context Protocol (MCP) server that provides comprehensive PII (Personally Identifiable Information) detection and anonymization capabilities using Microsoft Presidio. This server enables LLMs to safely handle sensitive data by detecting and anonymizing PII in text and structured data.
Features
Core Capabilities
- PII Detection: Identify 25+ types of PII including names, emails, phone numbers, credit cards, SSNs, addresses, and more
- Text Anonymization: Multiple anonymization strategies (replace, redact, hash, mask, encrypt)
- Structured Data Support: Analyze and anonymize JSON/dictionary data recursively
- Batch Processing: Process multiple texts efficiently in batch operations
- Custom Recognizers: Add domain-specific PII patterns with regex
- Multi-language Support: Detect PII in multiple languages
- Validation Tools: Test and validate detection accuracy with metrics
Available MCP Tools
- analyze_text - Detect PII entities in text with confidence scores
- anonymize_text - Anonymize PII using various operators
- get_supported_entities - List all supported PII entity types
- add_custom_recognizer - Add custom PII detection patterns
- batch_analyze - Analyze multiple texts for PII
- batch_anonymize - Anonymize multiple texts
- get_anonymization_operators - List available anonymization methods
- analyze_structured_data - Detect PII in JSON/structured data
- anonymize_structured_data - Anonymize PII in structured data
- validate_detection - Validate detection accuracy with metrics
Installation
Choose your preferred installation method:
- 🐳 Docker - Self-contained, reproducible environment (recommended for production)
- 🐍 Python - Direct installation with interactive setup
- 📦 Manual - Full control over the installation process
For detailed Docker deployment instructions, see DOCKER.md.
Prerequisites
For Python Installation:
- Python 3.10 or higher
- pip or uv package manager
For Docker Installation:
- Docker 20.10 or higher
- Docker Compose (optional, for easier management)
Docker Installation (Recommended for Production)
Docker provides a self-contained, reproducible environment with all dependencies pre-installed.
Quick Start with Docker
# Clone the repository
git clone https://github.com/cmalpass/mcp-presidio.git
cd mcp-presidio
# Build the Docker image
docker build -t mcp-presidio .
# Run the container with stdio (default)
docker run -i mcp-presidio
Using Docker Compose
# Clone the repository
git clone https://github.com/cmalpass/mcp-presidio.git
cd mcp-presidio
# Build and start the container
docker-compose up -d
# View logs
docker-compose logs -f
# Stop the container
docker-compose down
Configuring Claude Desktop with Docker
To use the Docker container with Claude Desktop, update your claude_desktop_config.json:
{
  "mcpServers": {
    "presidio": {
      "command": "docker",
      "args": [
        "run",
        "-i",
        "--rm",
        "mcp-presidio:latest"
      ],
      "env": {}
    }
  }
}
Or if using a pre-built image from a registry:
{
  "mcpServers": {
    "presidio": {
      "command": "docker",
      "args": [
        "run",
        "-i",
        "--rm",
        "ghcr.io/cmalpass/mcp-presidio:latest"
      ],
      "env": {}
    }
  }
}
Docker Image Details
The Docker image includes:
- Python 3.11 slim base
- All required dependencies (mcp, presidio-analyzer, presidio-anonymizer, spacy)
- Pre-installed English language model (en_core_web_lg)
- Security-hardened with non-root user
- Multi-stage build for minimal image size (~500MB)
Advanced Docker Usage
Interactive Shell for Debugging:
docker run -it mcp-presidio bash
Custom Language Models: To include additional language models, modify the Dockerfile:
# Add after the English model installation
RUN python -m spacy download es_core_news_lg # Spanish
RUN python -m spacy download fr_core_news_lg # French
RUN python -m spacy download de_core_news_lg # German
Then rebuild the image:
docker build -t mcp-presidio:multilang .
Volume Mounting for Custom Configurations:
docker run -i -v "$(pwd)/config:/app/config:ro" mcp-presidio
Python Installation (Quick Install)
Use the interactive installation script that handles dependencies and language models:
Unix/Linux/macOS:
# Clone the repository
git clone https://github.com/cmalpass/mcp-presidio.git
cd mcp-presidio
# Run the installation script
./install.sh
# or
python install.py
Windows:
# Clone the repository
git clone https://github.com/cmalpass/mcp-presidio.git
cd mcp-presidio
# Run the installation script
install.bat
# or
python install.py
The script will:
- Check Python version compatibility
- Install base dependencies (mcp, presidio-analyzer, presidio-anonymizer, spacy)
- Prompt for language model installation (English, Spanish, French, German, etc.)
- Optionally install development dependencies
- Verify the installation
- Test basic functionality
Python Installation (Manual)
If you prefer manual installation:
# Clone the repository
git clone https://github.com/cmalpass/mcp-presidio.git
cd mcp-presidio
# Install the package
pip install -e .
# Download required spaCy language model (for English)
python -m spacy download en_core_web_lg
For other languages, download the appropriate spaCy model:
# Spanish
python -m spacy download es_core_news_lg
# French
python -m spacy download fr_core_news_lg
# German
python -m spacy download de_core_news_lg
Usage
Running the Server
The server runs using stdio transport, suitable for MCP clients:
mcp-presidio
Or run directly with Python:
python -m mcp_presidio.server
Configuring with Claude Desktop
Add to your Claude Desktop configuration (claude_desktop_config.json):
{
  "mcpServers": {
    "presidio": {
      "command": "python",
      "args": ["-m", "mcp_presidio.server"],
      "env": {}
    }
  }
}
Or if installed as a script:
{
  "mcpServers": {
    "presidio": {
      "command": "mcp-presidio",
      "args": [],
      "env": {}
    }
  }
}
Example Usage in LLM Conversations
Detecting PII:
User: Can you check this text for PII? "My name is John Smith and my email is john@example.com"
LLM: I'll analyze that text for PII using the analyze_text tool.
[Tool calls analyze_text with the text]
Result: Found 2 PII entities:
- PERSON: "John Smith" (confidence: 0.85)
- EMAIL_ADDRESS: "john@example.com" (confidence: 1.0)
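Behind the "[Tool calls analyze_text ...]" step, an MCP client sends a JSON-RPC 2.0 `tools/call` request to the server. The sketch below only builds that message; the `build_tool_call` helper is hypothetical, and the argument name `text` is an assumption about this server's tool schema (it matches the `analyze_text` example later in this README).

```python
import json


def build_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    # json.dumps emits a single line, which suits newline-delimited stdio transport.
    return json.dumps(message)


request = build_tool_call(
    1,
    "analyze_text",
    # Argument name `text` is an assumption about this server's tool schema.
    {"text": "My name is John Smith and my email is john@example.com"},
)
```

Note that the full text, PII included, is inside the request the LLM client constructs, which is exactly the exposure the warning at the top describes.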
Anonymizing Text:
User: Can you anonymize this customer feedback? "I'm Jane Doe, call me at 555-123-4567"
LLM: I'll anonymize the PII in that text.
[Tool calls anonymize_text]
Result: "I'm <PERSON>, call me at <PHONE_NUMBER>"
Working with Structured Data:
User: Check this JSON for PII: {"user": "bob@email.com", "phone": "555-0100"}
LLM: I'll analyze the structured data.
[Tool calls analyze_structured_data]
Result: Found PII in 2 fields:
- .user: EMAIL_ADDRESS
- .phone: PHONE_NUMBER
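The recursive traversal behind a result like this can be sketched as below. It is an illustration under stated assumptions, not the server's code: the two regexes are simplistic stand-ins for Presidio's recognizers, and `find_pii` and the path format (`.key` and `[index]`) are hypothetical choices modeled on the output shown above.

```python
import re
from typing import Any, Dict, List

# Regex stand-ins for real recognizers; a real deployment would call Presidio.
PATTERNS = {
    "EMAIL_ADDRESS": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE_NUMBER": re.compile(r"\b\d{3}-\d{4}\b"),
}


def find_pii(data: Any, path: str = "") -> Dict[str, List[str]]:
    """Walk dicts and lists recursively, mapping each path to the entity types found."""
    findings: Dict[str, List[str]] = {}
    if isinstance(data, dict):
        for key, value in data.items():
            findings.update(find_pii(value, f"{path}.{key}"))
    elif isinstance(data, list):
        for index, value in enumerate(data):
            findings.update(find_pii(value, f"{path}[{index}]"))
    elif isinstance(data, str):
        hits = [name for name, rx in PATTERNS.items() if rx.search(data)]
        if hits:
            findings[path] = hits
    return findings


result = find_pii({"user": "bob@email.com", "phone": "555-0100"})
```

Non-string leaves (numbers, booleans, null) are skipped in this sketch; a stricter version might stringify them before scanning.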
Supported PII Entity Types
The server supports 25+ PII entity types including:
- Personal: PERSON, DATE_TIME
- Contact: EMAIL_ADDRESS, PHONE_NUMBER, URL
- Financial: CREDIT_CARD, IBAN_CODE, US_BANK_NUMBER, CRYPTO
- Government IDs: US_SSN, US_PASSPORT, US_DRIVER_LICENSE, UK_NHS
- International IDs: SG_NRIC_FIN, IN_PAN, IN_AADHAAR, AU_ABN, AU_TFN, AU_MEDICARE
- Location: LOCATION, IP_ADDRESS
- Medical: MEDICAL_LICENSE
- Other: Additional country-specific identifiers
Use the get_supported_entities tool to see all available types for your language.
Anonymization Operators
The server supports multiple anonymization strategies:
- replace - Replace PII with placeholder text (e.g., <EMAIL_ADDRESS>)
- redact - Remove PII entirely from text
- hash - Replace with cryptographic hash (SHA-256)
- mask - Mask characters (e.g., ***-**-1234)
- encrypt - Encrypt PII with AES encryption
- keep - Keep PII as-is (for selective anonymization)
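To make the differences between these strategies concrete, here is a small standard-library sketch of what each one does to a single detected span. It is not Presidio's implementation: `apply_operator` is a hypothetical helper, the mask variant hides separators as well as digits, and `encrypt` is left out because it needs a key and a cryptography library.

```python
import hashlib


def apply_operator(text: str, start: int, end: int, entity_type: str,
                   operator: str = "replace") -> str:
    """Apply one anonymization strategy to the span text[start:end]."""
    value = text[start:end]
    if operator == "replace":
        new_value = f"<{entity_type}>"
    elif operator == "redact":
        new_value = ""
    elif operator == "hash":
        new_value = hashlib.sha256(value.encode("utf-8")).hexdigest()
    elif operator == "mask":
        # Hide every character except the last four; separators are masked too.
        new_value = "*" * max(len(value) - 4, 0) + value[-4:]
    elif operator == "keep":
        new_value = value
    else:
        raise ValueError(f"unsupported operator: {operator}")
    return text[:start] + new_value + text[end:]


sample = "SSN: 123-45-1234"
masked = apply_operator(sample, 5, 16, "US_SSN", "mask")
```

Hashing keeps equal values linkable across records, while replace and redact do not, which is usually the deciding factor between them.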
Advanced Features
Custom Recognizers
Add domain-specific PII patterns:
# Example: Detect custom employee IDs
add_custom_recognizer(
    name="employee_id_recognizer",
    entity_type="EMPLOYEE_ID",
    patterns=[
        {"name": "emp_pattern", "regex": "EMP-\\d{6}", "score": 0.9}
    ],
    context=["employee", "staff", "worker"]
)
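Conceptually, a pattern recognizer like this one matches the regex and then adjusts the score when context words appear nearby. The sketch below shows that idea with the standard library only; the boost value and the whole-text context check are simplifications chosen for illustration, and `recognize_employee_ids` is a hypothetical function, not part of this server or of Presidio.

```python
import re
from typing import Dict, List

EMP_PATTERN = re.compile(r"EMP-\d{6}")
CONTEXT_WORDS = ("employee", "staff", "worker")


def recognize_employee_ids(text: str, base_score: float = 0.9,
                           context_boost: float = 0.05) -> List[Dict]:
    """Return EMP-###### matches, scoring higher when a context word is present."""
    # Simplification: context is checked against the whole text, not a window.
    has_context = any(word in text.lower() for word in CONTEXT_WORDS)
    score = min(base_score + context_boost, 1.0) if has_context else base_score
    return [
        {"entity_type": "EMPLOYEE_ID", "start": m.start(), "end": m.end(), "score": score}
        for m in EMP_PATTERN.finditer(text)
    ]


hits = recognize_employee_ids("Employee badge EMP-123456 issued")
```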
Batch Processing
Process multiple documents efficiently:
# Analyze multiple texts
batch_analyze(
    texts=["Text 1...", "Text 2...", "Text 3..."],
    entities=["PERSON", "EMAIL_ADDRESS"],
    score_threshold=0.5
)
Language Support
Specify different languages:
analyze_text(
    text="Me llamo María García",
    language="es"
)
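Each language needs a matching spaCy model to be installed and registered with the NLP engine. The sketch below builds a configuration dictionary in the shape that Presidio's `NlpEngineProvider` accepts as `nlp_configuration`; whether this server wires up its engine this way is an assumption, and `build_nlp_configuration` is a hypothetical helper. The model names are the ones listed in the installation section.

```python
from typing import Dict, List

# Model names taken from the installation section of this README.
SPACY_MODELS = {
    "en": "en_core_web_lg",
    "es": "es_core_news_lg",
    "fr": "fr_core_news_lg",
    "de": "de_core_news_lg",
}


def build_nlp_configuration(languages: List[str]) -> Dict:
    """Build a Presidio-style spaCy NLP engine configuration for the given languages."""
    missing = [lang for lang in languages if lang not in SPACY_MODELS]
    if missing:
        raise ValueError(f"no spaCy model mapped for: {', '.join(missing)}")
    return {
        "nlp_engine_name": "spacy",
        "models": [
            {"lang_code": lang, "model_name": SPACY_MODELS[lang]} for lang in languages
        ],
    }


config = build_nlp_configuration(["en", "es"])
```

Failing early on an unmapped language code is deliberate here: a missing model otherwise tends to surface later as a less obvious error at analysis time.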
Validation and Testing
Validate detection accuracy:
validate_detection(
    text="John lives at 123 Main St",
    expected_entities=[
        {"entity_type": "PERSON", "start": 0, "end": 4},
        # "123 Main St" occupies characters 14-24, so the exclusive end is 25
        {"entity_type": "LOCATION", "start": 14, "end": 25}
    ]
)
# Returns precision, recall, and F1 score
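For reference, precision, recall, and F1 over entity spans can be computed as in the sketch below. It assumes exact matching on (entity type, start, end), which may be stricter than what the tool actually does; `detection_metrics` and the sample `detected` list are hypothetical and exist only for this example.

```python
from typing import Dict, Iterable, Set, Tuple

Entity = Tuple[str, int, int]  # (entity_type, start, end)


def detection_metrics(expected: Iterable[Entity],
                      detected: Iterable[Entity]) -> Dict[str, float]:
    """Compute precision, recall, and F1 using exact (type, start, end) matches."""
    expected_set: Set[Entity] = set(expected)
    detected_set: Set[Entity] = set(detected)
    true_positives = len(expected_set & detected_set)
    precision = true_positives / len(detected_set) if detected_set else 0.0
    recall = true_positives / len(expected_set) if expected_set else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


expected = [("PERSON", 0, 4), ("LOCATION", 14, 25)]
detected = [("PERSON", 0, 4), ("DATE_TIME", 14, 17)]  # made-up detector output
metrics = detection_metrics(expected, detected)
```

With one of two expected entities found and one of two detections correct, all three metrics come out at 0.5.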
Architecture
This MCP server integrates:
- MCP FastMCP: Provides the MCP protocol implementation
- Presidio Analyzer: Detects PII using NLP and pattern matching
- Presidio Anonymizer: Anonymizes detected PII with various operators
- spaCy: Powers the NLP engine for accurate entity recognition
Security Considerations
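- PII detection and anonymization run locally, and the server itself sends no data to external services; text routed through a hosted LLM is still exposed to that provider, as described in the warning above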
- The server uses stdio transport for secure communication with MCP clients
- Multiple anonymization strategies available for different privacy requirements
- Can support compliance work (GDPR, HIPAA, CCPA) as one control among others
- Docker deployment provides additional isolation and security through containerization
- Container runs as non-root user for enhanced security
Development
Running Tests
# Install dev dependencies
pip install -e ".[dev]"
# Run tests
pytest tests/
Project Structure
mcp-presidio/
├── src/
│   └── mcp_presidio/
│       ├── __init__.py
│       └── server.py          # Main MCP server implementation
├── tests/                     # Test suite
├── Dockerfile                 # Docker container definition
├── docker-compose.yml         # Docker Compose configuration
├── docker-entrypoint.sh       # Container entrypoint script
├── .dockerignore              # Docker build exclusions
├── pyproject.toml             # Project configuration
├── README.md                  # This file
├── DOCKER.md                  # Detailed Docker deployment guide
└── .gitignore
License
MIT License - see LICENSE file for details
Contributing
Contributions are welcome! Please feel free to submit issues or pull requests.
Acknowledgments
- Microsoft Presidio - The underlying PII detection engine
- Model Context Protocol - The protocol specification
- spaCy - NLP library for entity recognition
Support
For issues, questions, or contributions, please visit the GitHub repository.