Data X-Ray MCP Server
Enables Claude to search, retrieve, and analyze indexed files with sensitivity classifications and automatic redaction of sensitive information.
README
Data X-Ray MCP Server
A Model Context Protocol (MCP) server that provides Claude with access to the Data X-Ray API. This enables Claude to search, retrieve, and analyze indexed files with sensitivity classifications and redaction capabilities.
Overview
Data X-Ray is an enterprise data discovery and classification platform that indexes documents and identifies sensitive information like PII, PHI, financial data, and more. This MCP server exposes Data X-Ray's capabilities to Claude, enabling:
- Intelligent file search with KQL (Kibana Query Language) queries
- Sensitivity-aware document retrieval with classification metadata
- Automatic redaction of sensitive information
- Classification catalog access to understand what types of sensitive data DXR can detect
- Context-aware responses based on document metadata and classifications
Why This Matters
Enterprise organizations have vast amounts of unstructured data scattered across multiple systems. Data X-Ray indexes this data and identifies sensitive content, but making this metadata actionable requires integration with AI systems. This MCP server bridges that gap, allowing Claude to:
- Find relevant documents based on content, classifications, or metadata
- Understand data sensitivity before processing or sharing information
- Safely access sensitive documents through automatic redaction
- Provide context-aware answers using document metadata and classifications
Installation
Prerequisites
- Access to a Data X-Ray instance
- Data X-Ray API credentials (Bearer token)
Claude Desktop (Recommended)
Install the pre-built extension bundle (.mcpb file):
- Download or build the
.mcpbfile (see Building the Extension) - Double-click the
.mcpbfile to open the Claude Desktop install dialog - Enter your Data X-Ray configuration when prompted:
- Data X-Ray URL - Base URL of your DXR instance (e.g.,
https://dxr.yourcompany.com) - API Token - Bearer token for API authentication (stored securely in your OS keychain)
- Skip SSL Verification - Enable only for development environments with self-signed certificates
- Data X-Ray URL - Base URL of your DXR instance (e.g.,
The extension will appear in your Claude Desktop connectors panel with the Data X-Ray icon.
Claude Code CLI
Add a .mcp.json file to your project directory:
{
"mcpServers": {
"dxr": {
"command": "node",
"args": ["/path/to/dxr-mcp-server/dist/index.js"],
"env": {
"DXR_API_URL": "https://dxr.yourcompany.com",
"DXR_API_TOKEN": "your-api-token-here"
}
}
}
}
Note:
.mcp.jsonis gitignored by default since it contains credentials.
Available Tools
The MCP server provides eight tools that Claude can use:
1. list_file_metadata
Search and list file metadata from Data X-Ray with KQL filtering. Returns lightweight summaries with aggregate statistics to minimize context usage.
Parameters:
q(optional): KQL query string to filter resultslimit(optional): Number of files to return (default 50, max 500)offset(optional): Number of files to skip for pagination
Example queries:
fileName:"*.pdf" AND size > 1000000- PDFs larger than 1MBannotators.name:"Credit card"- Files with credit card data detecteddatasource.name:"Finance*" AND lastModifiedAt > now-30d- Recent finance filesentitlements.whoCanAccess: { accountType:"GROUP" AND name:"Everyone" }- Publicly accessible files
Returns: Aggregate statistics (count, size, file types, sensitive data counts) plus lightweight file summaries with pagination info.
2. get_file_metadata_details
Get complete metadata for a specific file by ID.
Parameters:
id(required): File identifier from list_file_metadata
Returns: Full metadata including datasource info, entitlements, labels, DLP labels, annotators with matched phrases, owner/creator/modifier accounts, extracted metadata, and GPS coordinates if available.
3. get_file_content
Get the original content of a file in its native format.
Parameters:
id(required): File identifier from list_file_metadata
Returns: Text files are decoded and returned as readable text. Images are returned as viewable images. PDFs have text extracted automatically. Other binary files are saved to /tmp/dxr-files/ for further processing.
Use case: Default tool for viewing file contents. Prefer this over get_file_text when you need the original file, images, or higher-fidelity parsing.
4. get_file_text
Get the plain text extracted by Data X-Ray from a file. Simpler and faster than get_file_content when you only need text.
Parameters:
id(required): File identifier from list_file_metadata
Returns: Plain text content as extracted by Data X-Ray.
Note: This is a beta endpoint. Returns empty text if DXR has not extracted text for the file (e.g. discovery-only scan, unsupported format, scanned image PDF).
5. get_file_redacted_text
Get plain text content of a file with sensitive information replaced by [REDACTED].
Parameters:
id(required): File identifierredactor_id(required): Redactor ID from get_redactors
Returns: Plain text with [REDACTED] placeholders replacing sensitive information.
Use case: Use only when explicitly requested or when there is a specific privacy requirement. For normal file viewing, use get_file_content.
6. get_classifications
Get the catalog of all available classifications in Data X-Ray.
Returns: Array of annotators, labels, and extractors with IDs, names, types, subtypes, descriptions, and links to the DXR UI.
Use case: Call this first before searching for files with sensitive data — you need the exact annotator names to use in KQL queries.
7. get_redactors
Get the catalog of all available redactors.
Returns: Array of redactor objects with IDs, names, and timestamps.
Use case: Get a redactor_id to pass to get_file_redacted_text.
8. render_pdf_pages
Render specific pages of a PDF as high-resolution images for visual analysis.
Parameters:
id(required): File identifier (must be a PDF)pages(required): Array of 1-indexed page numbers to renderscale(optional): Render scale factor (default 2)
Returns: One image content block per rendered page.
Use case: When text extraction misses structure — tables, charts, scanned pages, forms, handwritten content.
Note: Requires @napi-rs/canvas. Works in Claude Code; may not be available in Claude Desktop MCPB sandbox.
Usage Examples
Here are some example conversations you can have with Claude once the MCP server is configured:
Example 1: Finding Sensitive Documents
User: "Show me all documents that contain credit card information"
Claude uses:
1. get_classifications() to get the catalog and find the exact annotator name
2. list_file_metadata(q: 'annotators.name:"Credit card"') to find matching files
3. Returns a summary of documents with credit card data
Example 2: Safely Viewing a Sensitive Document
User: "I need to review document ID abc123, but make sure any PII is redacted"
Claude uses:
1. get_redactors() to find available redactors
2. get_file_redacted_text(id: "abc123", redactor_id: 1) to get redacted content
3. Displays the document with [REDACTED] placeholders
Example 3: Data Discovery
User: "What types of sensitive information does our DXR instance detect?"
Claude uses:
1. get_classifications() to retrieve the full catalog
2. Summarizes classification types: annotators, labels, extractors
3. Explains what each classification detects
Example 4: Finding Large PDFs
User: "Find all PDF files larger than 10MB"
Claude uses:
1. list_file_metadata(q: 'mimeType:"application/pdf" AND size > 10485760')
2. Returns a list of large PDF files with metadata
Example 5: Reading a Document
User: "What's in the file with ID xyz789?"
Claude uses:
1. get_file_text(id: "xyz789") for a quick text read (beta, simpler)
OR get_file_content(id: "xyz789") for the original file (handles images, PDFs)
2. Returns the document content
API Endpoint Mapping
The MCP server wraps these Data X-Ray API v1 endpoints:
| MCP Tool | DXR API Endpoint | Method |
|---|---|---|
| list_file_metadata | /api/v1/files |
GET (JSONL stream) |
| get_file_metadata_details | /api/v1/files?q=fileId:"..." |
GET (JSONL stream) |
| get_file_content | /api/v1/files/{id}/content |
GET (binary) |
| get_file_text | /api/v1/files/{id}/text |
GET (JSON, beta) |
| get_file_redacted_text | /api/v1/files/{id}/redacted-text |
GET |
| get_classifications | /api/v1/classifications |
GET |
| get_redactors | /api/v1/redactors |
GET |
All API calls use Bearer token authentication. The /api/v1/files endpoint returns newline-delimited JSON (JSONL); all other endpoints return JSON or binary.
Development
Prerequisites
- Node.js 18.0.0 or higher
- npm
Setup
npm install
npm run build
Build and Watch
npm run watch # Rebuild on file changes
Testing Locally
You can test the server using the MCP Inspector:
export DXR_API_URL="https://dxr.yourcompany.com"
export DXR_API_TOKEN="your-api-token"
npx @modelcontextprotocol/inspector node dist/index.js
Building the Extension
To build the .mcpb extension bundle for Claude Desktop:
-
Build the project and prune dev dependencies:
npm run build npm prune --production -
Pack the extension:
npx @anthropic-ai/mcpb pack . -
Restore dev dependencies:
npm install
This produces a dxr-mcp-server-<version>.mcpb file that can be distributed and installed by double-clicking.
The .mcpbignore file controls which files are excluded from the bundle (similar to .gitignore).
Troubleshooting
Server Not Starting
- Verify
DXR_API_URLandDXR_API_TOKENare set correctly - Check that the Data X-Ray instance is accessible
- Ensure Node.js version is 18.0.0 or higher
Authentication Errors
- Verify your API token is valid and not expired
- Ensure the token has appropriate permissions for the API endpoints
- Check that the Bearer token format is correct
Connection Issues
- Verify the Data X-Ray instance URL is correct
- Check network connectivity and firewall rules
- Ensure the Data X-Ray API is accessible from your machine
Contributing
When making changes:
- Update TypeScript types in src/types.ts if API changes
- Add new tools following the existing pattern in src/index.ts
- Update this README and
manifest.jsontools array with new tools - Run
npm run buildto ensure TypeScript compiles - Test with MCP Inspector before deploying
- Rebuild the
.mcpbextension (see Building the Extension)
License
Proprietary - Ohalo
Support
For issues or questions:
- Data X-Ray API: support@ohalo.co
- MCP Server: File an issue in the repository
Related Resources
Recommended Servers
playwright-mcp
A Model Context Protocol server that enables LLMs to interact with web pages through structured accessibility snapshots without requiring vision models or screenshots.
Magic Component Platform (MCP)
An AI-powered tool that generates modern UI components from natural language descriptions, integrating with popular IDEs to streamline UI development workflow.
Audiense Insights MCP Server
Enables interaction with Audiense Insights accounts via the Model Context Protocol, facilitating the extraction and analysis of marketing insights and audience data including demographics, behavior, and influencer engagement.
VeyraX MCP
Single MCP tool to connect all your favorite tools: Gmail, Calendar and 40 more.
graphlit-mcp-server
The Model Context Protocol (MCP) Server enables integration between MCP clients and the Graphlit service. Ingest anything from Slack to Gmail to podcast feeds, in addition to web crawling, into a Graphlit project - and then retrieve relevant contents from the MCP client.
Kagi MCP Server
An MCP server that integrates Kagi search capabilities with Claude AI, enabling Claude to perform real-time web searches when answering questions that require up-to-date information.
E2B
Using MCP to run code via e2b.
Neon Database
MCP server for interacting with Neon Management API and databases
Exa Search
A Model Context Protocol (MCP) server lets AI assistants like Claude use the Exa AI Search API for web searches. This setup allows AI models to get real-time web information in a safe and controlled way.
Qdrant Server
This repository is an example of how to create a MCP server for Qdrant, a vector search engine.