Document Processing Server
Provides comprehensive document processing, including reading, converting, and manipulating various document formats with advanced text and HTML processing capabilities.
cablate
Tools
document_reader
Read content from non-image document files at the specified paths. Supported formats: .pdf, .docx, .txt, .html, .csv
pdf_merger
Merge multiple PDF files into one
pdf_splitter
Split a PDF file into multiple files
docx_to_pdf
Convert DOCX files to PDF format
docx_to_html
Convert DOCX to HTML while preserving formatting
html_cleaner
Clean HTML by removing unnecessary tags and attributes
html_to_text
Convert HTML to plain text while preserving structure
html_to_markdown
Convert HTML to Markdown format
html_extract_resources
Extract all resources (images, videos, links) from HTML
html_formatter
Format and beautify HTML code
text_diff
Compare two text files and show differences
text_splitter
Split text file by specified delimiter or line count
text_formatter
Format text with proper indentation and line spacing
text_encoding_converter
Convert text between different encodings
excel_read
Read Excel file and convert to JSON format while preserving structure
format_convert
Convert between different document formats (Markdown, HTML, XML, JSON)
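As a rough illustration of how an MCP client would invoke one of the tools above, the sketch below builds a JSON-RPC 2.0 `tools/call` request. The `tools/call` method and the `name`/`arguments` fields come from the MCP specification; the helper `buildToolCall` and the argument name `filePath` are hypothetical, since this listing does not document each tool's input schema.

```typescript
// Shape of an MCP "tools/call" request (JSON-RPC 2.0).
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper: wraps a tool name and its arguments in a request object.
function buildToolCall(
  id: number,
  name: string,
  args: Record<string, unknown>,
): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// "filePath" is an assumed argument name, not taken from the server's schema.
// Query "tools/list" on the running server to get the real parameter names.
const request = buildToolCall(1, "document_reader", {
  filePath: "/tmp/report.pdf",
});
console.log(JSON.stringify(request, null, 2));
```

In practice an MCP client library sends this message for you; the point here is only to show which fields identify the tool and carry its inputs.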
README
Simple Document Processing MCP Server
A Model Context Protocol (MCP) server for reading, converting, and manipulating documents.
Features
Document Reader
- Read DOCX, PDF, TXT, HTML, CSV
Document Conversion
- DOCX to HTML/PDF conversion
- HTML to TXT/Markdown conversion
- PDF manipulation (merge, split)
Text Processing
- Conversion between text encodings (UTF-8, Big5, GBK)
- Text formatting and cleaning
- Text comparison and diff generation
- Text splitting by lines or delimiter
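To make the two splitting modes concrete, here is a minimal sketch of splitting by delimiter and by line count. It is an illustration only, not the server's implementation; the function names `splitByDelimiter` and `splitByLineCount` are hypothetical, and dropping empty pieces is an assumption.

```typescript
// Split text on a literal delimiter, dropping empty pieces (an assumption).
function splitByDelimiter(text: string, delimiter: string): string[] {
  if (delimiter === "") {
    throw new Error("delimiter must not be empty");
  }
  return text.split(delimiter).filter((part) => part.length > 0);
}

// Split text into chunks of at most `linesPerChunk` lines each.
function splitByLineCount(text: string, linesPerChunk: number): string[] {
  if (!Number.isInteger(linesPerChunk) || linesPerChunk < 1) {
    throw new Error("linesPerChunk must be a positive integer");
  }
  const lines = text.split(/\r?\n/); // accept both LF and CRLF line endings
  const chunks: string[] = [];
  for (let i = 0; i < lines.length; i += linesPerChunk) {
    chunks.push(lines.slice(i, i + linesPerChunk).join("\n"));
  }
  return chunks;
}

console.log(splitByDelimiter("intro---body---outro", "---"));
console.log(splitByLineCount("1\n2\n3\n4\n5", 2));
```

The last chunk may be shorter than `linesPerChunk` when the line count does not divide evenly.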
HTML Processing
- HTML cleaning and formatting
- Resource extraction (images, links, videos)
- Structure-preserving conversion
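Resource extraction can be pictured as collecting the `src` and `href` attributes of the relevant tags. The sketch below uses regular expressions for brevity and is an assumption about the general approach, not the server's code; `extractResources` is a hypothetical name, and a real HTML parser would be more robust (this version only handles quoted attribute values and would also pick up attributes such as `data-src`).

```typescript
interface HtmlResources {
  images: string[];
  links: string[];
  videos: string[];
}

// Collect the first capture group of every match of a global pattern.
function collect(html: string, pattern: RegExp): string[] {
  const found: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(html)) !== null) {
    const value = match[1];
    if (value !== undefined) {
      found.push(value);
    }
  }
  return found;
}

// Hypothetical helper: regex-based extraction of quoted src/href values.
function extractResources(html: string): HtmlResources {
  return {
    images: collect(html, /<img\b[^>]*?\bsrc\s*=\s*["']([^"']+)["']/gi),
    links: collect(html, /<a\b[^>]*?\bhref\s*=\s*["']([^"']+)["']/gi),
    videos: collect(html, /<(?:video|source)\b[^>]*?\bsrc\s*=\s*["']([^"']+)["']/gi),
  };
}

const sample =
  '<a href="https://example.com">home</a>' +
  '<img alt="logo" src="logo.png">' +
  '<video src="clip.mp4"></video>';
console.log(extractResources(sample));
```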
Installation
Installing via Smithery
To install Document Processing Server for Claude Desktop automatically via Smithery:
npx -y @smithery/cli install @cablate/mcp-doc-forge --client claude
Manual Installation
npm install -g @cablate/mcp-doc-forge
Usage
CLI
mcp-doc-forge
With Dive Desktop
- Click "+ Add MCP Server" in Dive Desktop
- Copy and paste this configuration:
{
  "mcpServers": {
    "mcp-doc-forge": {
      "command": "npx",
      "args": [
        "-y",
        "@cablate/mcp-doc-forge"
      ],
      "enabled": true
    }
  }
}
- Click "Save" to install the MCP server
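If you maintain the configuration programmatically, the entry can be merged into an existing `mcpServers` object as in the sketch below. The `command`, `args`, and `enabled` fields mirror the configuration shown in the steps above; the type names, the helper `addDocForge`, and the server key `"doc-forge"` are hypothetical. The key is only a label for the entry, so any descriptive name should work.

```typescript
interface McpServerEntry {
  command: string;
  args: string[];
  enabled: boolean;
}

interface McpConfig {
  mcpServers: Record<string, McpServerEntry>;
}

// Hypothetical helper: returns a new config with the doc-forge entry added
// under `key`, leaving any existing server entries untouched.
function addDocForge(config: McpConfig, key: string): McpConfig {
  return {
    mcpServers: {
      ...config.mcpServers,
      [key]: {
        command: "npx",
        args: ["-y", "@cablate/mcp-doc-forge"],
        enabled: true,
      },
    },
  };
}

const merged = addDocForge({ mcpServers: {} }, "doc-forge");
console.log(JSON.stringify(merged, null, 2));
```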
License
MIT
Contributing
Community participation and contributions are welcome! Here are some ways to contribute:
- ⭐️ Star the project if you find it helpful
- 🐛 Submit Issues: Report problems or provide suggestions
- 🔧 Create Pull Requests: Submit code improvements
Contact
If you have any questions or suggestions, feel free to reach out:
- 📧 Email: reahtuoo310109@gmail.com
- 📧 GitHub: CabLate
- 🤝 Collaboration: Open to discussing project cooperation
- 📚 Technical Guidance: Suggestions and guidance are sincerely welcome