Tika MCP Server
Extracts text and metadata from various file formats (PDF, DOCX, images with OCR) using Apache Tika, enabling AI assistants to understand file contents.
This project provides a Model Context Protocol (MCP) server for extracting content and metadata from files using Apache Tika.
Overview
The Tika MCP server allows AI assistants to extract text and metadata from various file formats (PDF, DOCX, images with OCR, etc.) using Apache Tika. This enables AI assistants to understand and work with the content of files that users upload.
Features
- Extract text content from various file formats
- Extract metadata (author, creation date, etc.) from files
- Support for PDF, DOCX, images, and many other formats
- Simple JSON-RPC API following the Model Context Protocol
Requirements
- Python 3.6+
- Apache Tika server running (default: http://localhost:9998)
- MCP-compatible client
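Before starting the MCP server, it can help to confirm that the Tika server from the list above is reachable. The sketch below is not part of this repository. It assumes Tika Server's standard `GET /version` endpoint and uses only the Python standard library; the function names are hypothetical.

```python
import urllib.error
import urllib.request

DEFAULT_TIKA_URL = "http://localhost:9998"  # default from the Requirements list


def version_endpoint(tika_url: str = DEFAULT_TIKA_URL) -> str:
    """Build the URL of Tika Server's version endpoint."""
    return tika_url.rstrip("/") + "/version"


def tika_is_reachable(tika_url: str = DEFAULT_TIKA_URL, timeout: float = 3.0) -> bool:
    """Return True if the Tika server answers GET /version with HTTP 200."""
    try:
        with urllib.request.urlopen(version_endpoint(tika_url), timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, or an HTTP error status.
        return False
```

Calling `tika_is_reachable()` with no arguments checks the default URL; pass a different `tika_url` if Tika runs elsewhere.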
Installation
- Clone this repository
- Install dependencies:
  pip install -r requirements.txt
- Register the MCP server:
  python -m app.register_mcp_server
Usage
The Tika MCP server provides a single tool:
extract_file
Extracts content and metadata from a file using Apache Tika.
Parameters:
- file_path: Path to the file to extract content from
- tika_url: URL of the running Tika server (default: http://localhost:9998)
Returns:
- metadata: Dictionary of metadata extracted from the file
- content: Array of content blocks extracted from the file
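To illustrate the parameters above, the following sketch builds a JSON-RPC 2.0 request for `extract_file`. It assumes the tool is invoked through MCP's standard `tools/call` method. How this server expects to receive the request (transport, response envelope) is not documented here, so treat it as a starting point. The helper name and the example file path are hypothetical.

```python
import json


def build_extract_file_request(file_path, tika_url="http://localhost:9998", request_id=1):
    """Build a JSON-RPC 2.0 request that calls the extract_file tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",  # standard MCP method for invoking a tool
        "params": {
            "name": "extract_file",
            "arguments": {"file_path": file_path, "tika_url": tika_url},
        },
    }


# Hypothetical file path, used only to show the serialized request.
request = build_extract_file_request("examples/sample.pdf")
print(json.dumps(request, indent=2))
```

According to the description above, a successful result should carry a `metadata` dictionary and a `content` array.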
Testing
Several test scripts are provided to verify the functionality:
- app/test_tika_simple.py: Tests the Tika client directly
- app/test_simple_mcp.py: Tests the MCP server using the JSON-RPC protocol
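For a quick manual check alongside these scripts, Tika Server can also be called directly over HTTP. The sketch below is not the repository's `tika_client.py`. It assumes Tika Server's standard `PUT /tika` (plain text) and `PUT /meta` (JSON metadata) endpoints and uses only the standard library; the function names are hypothetical.

```python
import json
import urllib.request


def build_tika_request(tika_url, endpoint, data, accept):
    """Create a PUT request that sends raw file bytes to a Tika endpoint."""
    url = tika_url.rstrip("/") + "/" + endpoint.lstrip("/")
    return urllib.request.Request(
        url, data=data, method="PUT", headers={"Accept": accept}
    )


def extract_text(file_path, tika_url="http://localhost:9998"):
    """Return the plain-text content Tika extracts from the file."""
    with open(file_path, "rb") as f:
        req = build_tika_request(tika_url, "tika", f.read(), "text/plain")
    with urllib.request.urlopen(req, timeout=60) as resp:
        return resp.read().decode("utf-8", errors="replace")


def extract_metadata(file_path, tika_url="http://localhost:9998"):
    """Return the metadata Tika reports for the file as a dict."""
    with open(file_path, "rb") as f:
        req = build_tika_request(tika_url, "meta", f.read(), "application/json")
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Both functions need a running Tika server. Building the request separately keeps URL and header handling easy to check without one.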
Project Structure
- app/: Main application code
  - simple_mcp_server.py: MCP server implementation
  - tika_client.py: Client for Apache Tika
  - model.py: Data models and business logic
  - register_mcp_server.py: Script to register the MCP server
- examples/: Example files for testing
- requirements.txt: Python dependencies
Setup
Create a virtual environment using either:
uv venv
or
python3 -m venv .venv
Activate the virtual environment and install dependencies:
source .venv/bin/activate
pip install -r requirements.txt
Running the MCP Server
Start the Apache Tika server (if not already running):
docker run -d -p 9998:9998 apache/tika
Register and run the MCP server:
python -m app.register_mcp_server
License
MIT