eFax to JSON MCP Server

Converts eFax documents (PDF, TIFF, CCD XML) from OpenText Fax Server Software into structured JSON format with OCR support, metadata extraction, and batch processing capabilities.

A Model Context Protocol (MCP) server that converts eFax documents from OpenText Fax Server Software into structured JSON format. Supports PDF, TIFF, and CCD XML document formats with advanced OCR and metadata extraction capabilities.

Features

Supported Formats

- PDF Documents - Text extraction and OCR for scanned PDFs
- TIFF Images - Multi-page TIFF support with OCR processing
- CCD XML - Continuity of Care Document parsing (CCD is built on the HL7 Clinical Document Architecture)

Processing Capabilities

- Intelligent OCR - Tesseract-based text recognition with confidence scoring
- Metadata Extraction - Preserve document properties and fax information
- Batch Processing - Convert multiple documents in a single call
- Format Validation - Comprehensive document structure validation
- Error Recovery - Robust error handling with detailed reporting

Installation

Prerequisites

- Node.js 18+
- System-level Tesseract OCR installation:
  - Ubuntu/Debian: sudo apt-get install tesseract-ocr
  - macOS: brew install tesseract
  - Windows: Download from UB Mannheim releases

Setup Steps

1. Create project directory

```
mkdir efax-mcp-server
cd efax-mcp-server
```

2. Initialize and install dependencies

```
npm init -y
npm install @modelcontextprotocol/sdk pdf-parse sharp tesseract.js xml2js
npm install -D @types/node @types/pdf-parse @types/xml2js typescript ts-node
```

3. Create directory structure

```
mkdir -p src/{types,processors,utils}
mkdir -p tests/test-files
mkdir -p docs
```

4. Add source files (paste the provided code into respective files)

5. Build the project

```
npm run build
```

Usage

MCP Client Configuration

Add to your MCP client configuration (e.g., Claude Desktop):

```
{
  "mcpServers": {
    "efax-converter": {
      "command": "node",
      "args": ["/path/to/efax-mcp-server/dist/server.js"]
    }
  }
}
```

Available Tools

1. Convert Single Document

```
convert_efax_document --filePath "/path/to/document.pdf" --performOCR true
```

Parameters:

- filePath (required) - Path to eFax document
- outputPath (optional) - Custom output JSON path
- extractMetadata (default: true) - Extract document metadata
- performOCR (default: true) - Enable OCR processing
- ocrLanguage (default: "eng") - OCR language code
- includeRawData (default: false) - Include raw document data
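
The flag-style invocation above is shorthand: an MCP client sends a tool call as a JSON-RPC `tools/call` request carrying the tool name and an arguments object. As a minimal sketch, assuming the parameter names and defaults listed above, such a request could be assembled like this. The `ConvertArgs` interface and the `withDefaults` and `buildToolCall` helpers are hypothetical names used for illustration, not part of the server's code.

```typescript
// Arguments for convert_efax_document, mirroring the documented parameters.
interface ConvertArgs {
  filePath: string;          // required
  outputPath?: string;       // optional custom output JSON path
  extractMetadata?: boolean; // default: true
  performOCR?: boolean;      // default: true
  ocrLanguage?: string;      // default: "eng"
  includeRawData?: boolean;  // default: false
}

// Hypothetical helper: apply the documented defaults to omitted options.
function withDefaults(args: ConvertArgs) {
  return {
    filePath: args.filePath,
    outputPath: args.outputPath,
    extractMetadata: args.extractMetadata ?? true,
    performOCR: args.performOCR ?? true,
    ocrLanguage: args.ocrLanguage ?? "eng",
    includeRawData: args.includeRawData ?? false,
  };
}

// Hypothetical helper: wrap a tool name and its arguments in a JSON-RPC 2.0
// "tools/call" request, the message shape MCP uses for tool invocation.
function buildToolCall<A>(id: number, name: string, args: A) {
  return {
    jsonrpc: "2.0" as const,
    id,
    method: "tools/call" as const,
    params: { name, arguments: args },
  };
}

const exampleRequest = buildToolCall(
  1,
  "convert_efax_document",
  withDefaults({ filePath: "/path/to/document.pdf", performOCR: true }),
);
console.log(JSON.stringify(exampleRequest, null, 2));
```

In practice an MCP client library builds this message for you, and the server presumably applies its own defaults; the sketch only makes the wire format and the documented defaults explicit.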

2. Batch Convert Documents

```
batch_convert_efax --inputDirectory "/path/to/docs" --outputDirectory "/path/to/json"
```

Parameters:

- inputDirectory (required) - Source document directory
- outputDirectory (required) - JSON output directory
- filePattern (default: "*") - File matching pattern
- continueOnError (default: true) - Continue on individual failures
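
To illustrate how `filePattern` and `continueOnError` might interact, here is a minimal sketch of a batch loop. It assumes `filePattern` is a simple glob where `*` matches any run of characters and `?` matches one character, and that matching ignores case; the server's actual pattern syntax is not documented here. `globToRegExp`, `batchConvert`, and `BatchResult` are hypothetical names, and the `convert` callback stands in for the real single-document conversion.

```typescript
// Translate a simple glob ("*.pdf", "fax_??.tiff") into an anchored RegExp.
// Assumption: only "*" and "?" are special; everything else is literal.
function globToRegExp(pattern: string): RegExp {
  const escaped = pattern.replace(/[.+^${}()|[\]\\]/g, "\\$&");
  const body = escaped.replace(/\*/g, ".*").replace(/\?/g, ".");
  return new RegExp("^" + body + "$", "i");
}

interface BatchResult {
  converted: string[];
  failed: { file: string; error: string }[];
}

// Convert matching files one at a time, recording failures per file.
async function batchConvert(
  files: string[],
  convert: (file: string) => Promise<void>,
  filePattern = "*",
  continueOnError = true,
): Promise<BatchResult> {
  const matcher = globToRegExp(filePattern);
  const result: BatchResult = { converted: [], failed: [] };
  for (const file of files.filter((f) => matcher.test(f))) {
    try {
      await convert(file);
      result.converted.push(file);
    } catch (err) {
      const error = err instanceof Error ? err.message : String(err);
      result.failed.push({ file, error });
      // With continueOnError set to false, stop at the first failure.
      if (!continueOnError) break;
    }
  }
  return result;
}
```

With `continueOnError` left at its default, one corrupted fax does not abort the rest of the batch; the failure is reported alongside the successful conversions.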

3. Validate JSON Output

```
validate_efax_json --jsonPath "/path/to/output.json"
```

4. Get File Information

```
get_file_info --filePath "/path/to/document.pdf"
```

5. List Supported Formats

```
list_supported_formats
```

JSON Output Structure

```
{
  "id": "efax_document_1234567890_abc123",
  "source": "efax",
  "format": "pdf|tiff|ccd_xml",
  "timestamp": "2025-08-04T12:00:00.000Z",
  "metadata": {
    "originalFileName": "fax_document.pdf",
    "fileSize": 2048576,
    "pages": 3,
    "sender": "John Doe",
    "recipient": "Jane Smith",
    "faxNumber": "+1-555-123-4567",
    "resolution": "1200x1800",
    "ocrConfidence": 95.5,
    "processingTime": 3500
  },
  "content": {
    "text": "Full extracted text content...",
    "pages": [
      {
        "pageNumber": 1,
        "text": "Page 1 text content...",
        "confidence": 96.2,
        "metadata": {
          "width": 1200,
          "height": 1800,
          "resolution": "1200x1800"
        }
      }
    ],
    "sections": [
      {
        "title": "Patient Information",
        "content": "Patient details...",
        "type": "patient",
        "pageNumbers": [1]
      }
    ]
  },
  "rawData": {
    "pdfInfo": {},
    "imageMetadata": {}
  }
}
```
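
A consumer of this output may want to check the structure before trusting it. The following is a minimal sketch of such a check, based only on the fields shown in the sample above. Which fields are actually mandatory is an assumption (here: `id`, `source`, `format`, `timestamp`, `content.text`, and `content.pages`), and `validateEfaxJson` is a hypothetical function, not the logic behind the `validate_efax_json` tool.

```typescript
const SUPPORTED_FORMATS = ["pdf", "tiff", "ccd_xml"];

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Returns a list of problems; an empty list means the document passed.
function validateEfaxJson(doc: unknown): string[] {
  if (!isRecord(doc)) return ["document must be a JSON object"];
  const errors: string[] = [];

  const id = doc.id;
  if (typeof id !== "string" || id === "") errors.push("id must be a non-empty string");
  if (doc.source !== "efax") errors.push('source must be "efax"');

  const format = doc.format;
  if (typeof format !== "string" || SUPPORTED_FORMATS.indexOf(format) === -1) {
    errors.push("format must be one of: " + SUPPORTED_FORMATS.join(", "));
  }

  const timestamp = doc.timestamp;
  if (typeof timestamp !== "string" || isNaN(Date.parse(timestamp))) {
    errors.push("timestamp must be a parseable date string");
  }

  const content = doc.content;
  if (!isRecord(content)) {
    errors.push("content must be an object");
    return errors;
  }
  if (typeof content.text !== "string") errors.push("content.text must be a string");

  const pages = content.pages;
  if (!Array.isArray(pages)) {
    errors.push("content.pages must be an array");
  } else {
    pages.forEach((page: unknown, index: number) => {
      if (!isRecord(page) || typeof page.pageNumber !== "number" || typeof page.text !== "string") {
        errors.push("content.pages[" + index + "] needs a numeric pageNumber and a string text");
      }
    });
  }
  return errors;
}
```

Returning every problem at once, rather than throwing on the first, makes it easier to report all defects in a converted document in a single pass.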

Architecture

Modular Design

- Processors: Format-specific conversion logic
- Utilities: Shared validation and file handling
- Types: Comprehensive TypeScript definitions

Processing Pipeline

1. File Validation - Format and size checks
2. Format Detection - Automatic type identification
3. Content Extraction - Text and metadata processing
4. OCR Processing - Image-to-text conversion when needed
5. Structure Validation - Output quality assurance
6. JSON Serialization - Standardized output format
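
The format detection step above could be sketched as a check of the file's leading bytes. This is an illustration under stated assumptions, not the server's implementation: it relies on the well-known `%PDF` and TIFF byte-order signatures, and it treats any file whose first 2 KB contain an unprefixed `<ClinicalDocument` root element as CCD XML. `detectFormat` and `startsWithBytes` are hypothetical names.

```typescript
type DetectedFormat = "pdf" | "tiff" | "ccd_xml" | "unknown";

// True when data begins with the given byte signature.
function startsWithBytes(data: Uint8Array, signature: number[]): boolean {
  if (data.length < signature.length) return false;
  return signature.every((byte, i) => data[i] === byte);
}

function detectFormat(data: Uint8Array): DetectedFormat {
  // PDF files begin with the ASCII bytes "%PDF".
  if (startsWithBytes(data, [0x25, 0x50, 0x44, 0x46])) return "pdf";

  // TIFF files begin with "II*\0" (little-endian) or "MM\0*" (big-endian).
  if (
    startsWithBytes(data, [0x49, 0x49, 0x2a, 0x00]) ||
    startsWithBytes(data, [0x4d, 0x4d, 0x00, 0x2a])
  ) {
    return "tiff";
  }

  // CCD documents are XML with a ClinicalDocument root element.
  // Simplification: decode the first 2 KB byte-for-byte and look for the
  // unprefixed tag; namespace-prefixed roots are not handled here.
  let head = "";
  const limit = Math.min(data.length, 2048);
  for (let i = 0; i < limit; i++) head += String.fromCharCode(data[i]);
  if (head.indexOf("<ClinicalDocument") !== -1) return "ccd_xml";

  return "unknown";
}
```

Inspecting content rather than the file extension guards against misnamed files, which is useful when fax exports arrive with generic or missing extensions.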

Development

Build Commands

```
npm run build     # Compile TypeScript
npm run dev       # Development mode with hot reload
npm run test      # Run test suite
npm run clean     # Clean build directory
```

Testing

Place sample documents in tests/test-files/ and run:

```
npm test
```

Adding New Formats

1. Create processor in src/processors/
2. Add type definitions in src/types/
3. Register in main server
4. Update documentation
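
One way the steps above might look in code is a shared processor interface plus a registry keyed by file extension. The project's real interfaces are not shown in this README, so everything here is a hypothetical sketch: `DocumentProcessor`, `ProcessedDocument`, `ProcessorRegistry`, and the plain-text processor used as the "new format" are all illustrative names.

```typescript
interface ProcessedDocument {
  format: string;
  text: string;
}

// Contract each format-specific processor would implement.
interface DocumentProcessor {
  readonly format: string;
  readonly extensions: string[];
  process(data: Uint8Array): Promise<ProcessedDocument>;
}

// Registry the main server could consult to pick a processor for a file.
class ProcessorRegistry {
  private readonly byExtension = new Map<string, DocumentProcessor>();

  register(processor: DocumentProcessor): void {
    for (const ext of processor.extensions) {
      this.byExtension.set(ext.toLowerCase(), processor);
    }
  }

  // Look up a processor by the file name's extension (case-insensitive).
  forFile(fileName: string): DocumentProcessor | undefined {
    const dot = fileName.lastIndexOf(".");
    if (dot === -1) return undefined;
    return this.byExtension.get(fileName.slice(dot + 1).toLowerCase());
  }
}

// Example "new format": a plain-text processor (hypothetical).
const textProcessor: DocumentProcessor = {
  format: "text",
  extensions: ["txt"],
  async process(data) {
    let text = "";
    for (let i = 0; i < data.length; i++) text += String.fromCharCode(data[i]);
    return { format: "text", text };
  },
};

const registry = new ProcessorRegistry();
registry.register(textProcessor);
```

Keeping registration separate from the processors means adding a format touches one new file and one `register` call, which matches the modular layout described under Architecture.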

Performance Considerations

- OCR Processing: CPU-intensive, consider batch size limits
- Memory Usage: Large TIFF files may require significant RAM
- Processing Time: Varies by document complexity and OCR requirements
- Concurrent Processing: Single-threaded OCR worker per instance
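
Because OCR is CPU-bound and runs on a single worker, a caller may prefer to submit large jobs in limited-size chunks. The sketch below shows one way to do that on the client side; `chunk` and `processInChunks` are hypothetical helpers, and the `handle` callback stands in for whatever submits one batch to the server.

```typescript
// Split a list into consecutive chunks of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  if (size < 1 || Math.floor(size) !== size) {
    throw new RangeError("size must be a positive integer");
  }
  const chunks: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Handle chunks strictly one after another, so only one chunk's worth of
// documents is in flight at any time. Resolves to the number of items handled.
async function processInChunks<T>(
  items: T[],
  size: number,
  handle: (batch: T[]) => Promise<void>,
): Promise<number> {
  let processed = 0;
  for (const batch of chunk(items, size)) {
    await handle(batch);
    processed += batch.length;
  }
  return processed;
}
```

A suitable chunk size depends on page counts and available memory, so it is best chosen by measuring on representative documents.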

Error Handling

The server reports errors in four categories:

- File Validation Errors - Invalid paths, unsupported formats
- Processing Errors - OCR failures, corrupted documents
- System Errors - Memory issues, disk space problems
- Validation Errors - Output structure problems
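
As an illustration of how these categories might be represented, the sketch below maps a caught error to a category using Node.js system error codes. The mapping is an assumption made for this example (`ENOENT` and `EACCES` as file validation, `ENOSPC` and `ENOMEM` as system, everything else as processing); the server's real classification logic is not documented here, and `ErrorCategory`, `ConversionError`, and `classifyError` are hypothetical names.

```typescript
type ErrorCategory = "file_validation" | "processing" | "system" | "output_validation";

interface ConversionError {
  category: ErrorCategory;
  message: string;
  filePath?: string;
}

// Classify a caught value. "output_validation" is never returned here; in
// this sketch it would be assigned by the output validator instead.
function classifyError(err: unknown, filePath?: string): ConversionError {
  const message = err instanceof Error ? err.message : String(err);
  const code =
    typeof err === "object" && err !== null && "code" in err
      ? String((err as { code: unknown }).code)
      : "";

  let category: ErrorCategory = "processing";
  if (code === "ENOENT" || code === "EACCES") {
    category = "file_validation"; // missing or unreadable input path
  } else if (code === "ENOSPC" || code === "ENOMEM") {
    category = "system"; // out of disk space or memory
  }
  return { category, message, filePath };
}
```

Tagging each failure with a category lets a batch report separate problems the user can fix (a wrong path) from problems with the document itself (a corrupted scan).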

Troubleshooting

Common Issues

OCR Not Working

- Verify Tesseract installation: tesseract --version
- Check language pack availability
- Ensure sufficient system memory

Large File Processing

- Monitor memory usage during conversion
- Consider breaking large batches into smaller chunks
- Verify available disk space for output

Permission Errors

- Check read permissions on input files
- Verify write permissions on output directory
- Ensure MCP server has appropriate file system access

License

MIT License - see LICENSE file for details.

Support

For issues and feature requests, please use the project's issue tracker.
