MCP Image Recognition Server
Provides image recognition capabilities using Anthropic Claude Vision and OpenAI GPT-4 Vision APIs, supporting multiple image formats and offering optional text extraction via Tesseract OCR.
mario-andreschak
An MCP server that provides image recognition capabilities using Anthropic and OpenAI vision APIs. Version 0.1.2.
Features
- Image description using Anthropic Claude Vision or OpenAI GPT-4 Vision
- Support for multiple image formats (JPEG, PNG, GIF, WebP)
- Configurable primary and fallback providers
- Base64 and file-based image input support
- Optional text extraction using Tesseract OCR
Requirements
- Python 3.8 or higher
- Tesseract OCR (optional): required only for the text extraction feature
  - Windows: Download and install from UB-Mannheim/tesseract
  - Linux:
    sudo apt-get install tesseract-ocr
  - macOS:
    brew install tesseract
Installation
- Clone the repository:
git clone https://github.com/mario-andreschak/mcp-image-recognition.git
cd mcp-image-recognition
- Create and configure your environment file:
cp .env.example .env
# Edit .env with your API keys and preferences
- Build the project:
build.bat
Usage
Running the Server
Start the server using Python:
python -m image_recognition_server.server
Alternatively, start the server using the batch script:
run.bat server
Start the server in development mode with the MCP Inspector:
run.bat debug
Available Tools
- describe_image
  - Input: Base64-encoded image data and MIME type
  - Output: Detailed description of the image
- describe_image_from_file
  - Input: Path to an image file
  - Output: Detailed description of the image
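A client calling describe_image has to supply the image as a Base64 string together with its MIME type. The following is a minimal sketch of preparing that input with the Python standard library. The helper name build_describe_image_args and the argument keys "image" and "mime_type" are assumptions made for illustration, not names taken from the server; check the tool schema (for example in the MCP Inspector) for the exact argument names.

```python
import base64
from pathlib import Path

# Extensions for the formats listed under Features (JPEG, PNG, GIF, WebP).
EXTENSION_TO_MIME = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".webp": "image/webp",
}


def build_describe_image_args(path: str) -> dict:
    """Return Base64 image data and a MIME type for an image file.

    The keys "image" and "mime_type" are hypothetical placeholders;
    the real tool schema may use different names.
    """
    suffix = Path(path).suffix.lower()
    mime_type = EXTENSION_TO_MIME.get(suffix)
    if mime_type is None:
        raise ValueError(f"Unsupported image extension: {suffix!r}")
    # Base64 output is pure ASCII, so decoding to str is safe.
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return {"image": encoded, "mime_type": mime_type}
```

For images that already exist on the machine running the server, describe_image_from_file avoids this encoding step because it only needs a path.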
Environment Configuration
- ANTHROPIC_API_KEY: Your Anthropic API key.
- OPENAI_API_KEY: Your OpenAI API key.
- VISION_PROVIDER: Primary vision provider (anthropic or openai).
- FALLBACK_PROVIDER: Optional fallback provider.
- LOG_LEVEL: Logging level (DEBUG, INFO, WARNING, ERROR).
- ENABLE_OCR: Enable Tesseract OCR text extraction (true or false).
- TESSERACT_CMD: Optional custom path to the Tesseract executable.
- OPENAI_MODEL: OpenAI model (default: gpt-4o-mini). Can use OpenRouter format for other models (e.g., anthropic/claude-3.5-sonnet:beta).
- OPENAI_BASE_URL: Optional custom base URL for the OpenAI API. Set to https://openrouter.ai/api/v1 for OpenRouter.
- OPENAI_TIMEOUT: Optional custom timeout (in seconds) for the OpenAI API.
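To illustrate how the primary and fallback settings relate, here is a minimal sketch that reads these variables and derives the order in which providers would be tried. It is not the server's actual configuration code: the function names provider_order and parse_bool are hypothetical, and treating an unset or unknown VISION_PROVIDER as an error is an assumption.

```python
import os
from typing import List, Mapping, Optional

VALID_PROVIDERS = ("anthropic", "openai")


def parse_bool(value: Optional[str], default: bool = False) -> bool:
    """Interpret a "true"/"false" setting such as ENABLE_OCR."""
    if value is None:
        return default
    return value.strip().lower() == "true"


def provider_order(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the providers to try: primary first, then the optional fallback."""
    primary = env.get("VISION_PROVIDER", "").strip().lower()
    if primary not in VALID_PROVIDERS:
        # Assumption: this sketch requires an explicit, valid primary provider.
        raise ValueError("VISION_PROVIDER must be 'anthropic' or 'openai'")

    order = [primary]
    fallback = env.get("FALLBACK_PROVIDER", "").strip().lower()
    if fallback:
        if fallback not in VALID_PROVIDERS:
            raise ValueError("FALLBACK_PROVIDER must be 'anthropic' or 'openai'")
        # A fallback identical to the primary adds nothing, so skip it.
        if fallback != primary:
            order.append(fallback)
    return order
```

With VISION_PROVIDER=openai and FALLBACK_PROVIDER=anthropic, provider_order() returns the OpenAI provider first and Anthropic second; a caller would move to the second entry only if the first request fails.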
Using OpenRouter
OpenRouter allows you to access various models using the OpenAI API format. To use OpenRouter, follow these steps:
- Obtain an API key from OpenRouter.
- Set OPENAI_API_KEY in your .env file to your OpenRouter API key.
- Set OPENAI_BASE_URL to https://openrouter.ai/api/v1.
- Set OPENAI_MODEL to the desired model using the OpenRouter format (e.g., anthropic/claude-3.5-sonnet:beta).
- Set VISION_PROVIDER to openai.
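The four OpenRouter settings can be collected in one place, as in the small sketch below, which renders them as lines for a .env file. The helper names openrouter_env and to_dotenv are hypothetical, and the API key shown in the usage note is a placeholder rather than a real credential.

```python
from typing import Dict


def openrouter_env(
    api_key: str, model: str = "anthropic/claude-3.5-sonnet:beta"
) -> Dict[str, str]:
    """Return the settings that route the OpenAI provider through OpenRouter."""
    return {
        "OPENAI_API_KEY": api_key,
        "OPENAI_BASE_URL": "https://openrouter.ai/api/v1",
        "OPENAI_MODEL": model,
        "VISION_PROVIDER": "openai",
    }


def to_dotenv(settings: Dict[str, str]) -> str:
    """Render settings as KEY=value lines suitable for a .env file."""
    return "\n".join(f"{key}={value}" for key, value in settings.items())
```

Calling print(to_dotenv(openrouter_env("your-openrouter-api-key"))) produces four KEY=value lines that can be pasted into .env in place of the corresponding OpenAI entries.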
Default Models
- Anthropic: claude-3.5-sonnet-beta
- OpenAI: gpt-4o-mini
- OpenRouter: Use the anthropic/claude-3.5-sonnet:beta format in OPENAI_MODEL.
Development
Running Tests
Run all tests:
run.bat test
Run a specific test suite:
run.bat test server
run.bat test anthropic
run.bat test openai
Docker Support
Build the Docker image:
docker build -t mcp-image-recognition .
Run the container:
docker run -it --env-file .env mcp-image-recognition
License
MIT License - see LICENSE file for details.
Release History
- 0.1.2 (2025-02-20): Improved OCR error handling and added comprehensive test coverage for OCR functionality
- 0.1.1 (2025-02-19): Added Tesseract OCR support for text extraction from images (optional feature)
- 0.1.0 (2025-02-19): Initial release with Anthropic and OpenAI vision support