MCP Image Recognition Server

Provides image recognition capabilities using Anthropic Claude Vision and OpenAI GPT-4 Vision APIs, supporting multiple image formats and offering optional text extraction via Tesseract OCR.

mario-andreschak

Image & Video Processing

MCP Image Recognition Server

An MCP server that provides image recognition capabilities using Anthropic and OpenAI vision APIs. Version 0.1.2.

Features

Image description using Anthropic Claude Vision or OpenAI GPT-4 Vision
Support for multiple image formats (JPEG, PNG, GIF, WebP)
Configurable primary and fallback providers
Base64 and file-based image input support
Optional text extraction using Tesseract OCR
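The primary/fallback provider feature can be pictured with a small sketch. This is not the server's actual implementation: the `Provider` type and the `describe_with_fallback` function are hypothetical names used only for illustration, and a provider is modelled as a plain callable.

```python
from typing import Callable, Optional

# Hypothetical provider type: takes base64 image data and a MIME type and
# returns a description. This is an assumption, not the server's interface.
Provider = Callable[[str, str], str]


def describe_with_fallback(
    primary: Provider,
    fallback: Optional[Provider],
    image_b64: str,
    mime_type: str,
) -> str:
    """Try the primary provider; if it fails, try the fallback when one is set."""
    try:
        return primary(image_b64, mime_type)
    except Exception as primary_error:
        if fallback is None:
            # No fallback configured: surface the original error.
            raise
        try:
            return fallback(image_b64, mime_type)
        except Exception as fallback_error:
            # Chain the errors so both failures show up in the traceback.
            raise fallback_error from primary_error
```

With this shape, leaving the fallback unset simply means the primary provider's error is reported unchanged.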

Requirements

Python 3.8 or higher
Tesseract OCR (optional) - Required only for the text extraction feature
- Windows: Download and install from UB-Mannheim/tesseract
- Linux: sudo apt-get install tesseract-ocr
- macOS: brew install tesseract
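To check whether Tesseract is reachable before enabling OCR, a short Python sketch such as the following may help. It assumes the executable is named `tesseract` on PATH and honours the TESSERACT_CMD variable described under Environment Configuration; the function name `find_tesseract` is hypothetical and not part of the server.

```python
import os
import shutil
from typing import Mapping, Optional


def find_tesseract(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return a path to the Tesseract executable, or None if it cannot be found."""
    custom = env.get("TESSERACT_CMD", "").strip()
    if custom:
        # An explicit path wins, but only if it points at an existing file.
        return custom if os.path.isfile(custom) else None
    # Otherwise look for "tesseract" on PATH.
    return shutil.which("tesseract")
```

Calling `find_tesseract()` returns `None` when nothing is found, in which case the text extraction feature is unlikely to work.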

Installation

Clone the repository:

git clone https://github.com/mario-andreschak/mcp-image-recognition.git
cd mcp-image-recognition

Create and configure your environment file:

cp .env.example .env
# Edit .env with your API keys and preferences

Build the project:

build.bat

Usage

Running the Server

Start the server using Python:

python -m image_recognition_server.server

Alternatively, start the server using the batch script:

run.bat server

Start the server in development mode with the MCP Inspector:

run.bat debug

Available Tools

describe_image
- Input: Base64-encoded image data and MIME type
- Output: Detailed description of the image
describe_image_from_file
- Input: Path to an image file
- Output: Detailed description of the image
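As an illustration of the describe_image input, the sketch below reads an image file and produces base64 data together with a MIME type for the formats listed under Features. The argument names `image` and `mime_type` and the helper `build_describe_image_args` are assumptions made for this example; consult the tool schema reported by the server for the real parameter names.

```python
import base64
from pathlib import Path

# MIME types for the formats listed under Features.
MIME_BY_SUFFIX = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".webp": "image/webp",
}


def build_describe_image_args(path: str) -> dict:
    """Read an image file and return base64 data plus its MIME type."""
    suffix = Path(path).suffix.lower()
    if suffix not in MIME_BY_SUFFIX:
        raise ValueError(f"Unsupported image format: {suffix or '(none)'}")
    data = Path(path).read_bytes()
    return {
        # Hypothetical argument names; check the tool schema for the real ones.
        "image": base64.b64encode(data).decode("ascii"),
        "mime_type": MIME_BY_SUFFIX[suffix],
    }
```

When the image already exists on the machine running the server, describe_image_from_file avoids this encoding step by taking a path instead.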

Environment Configuration

ANTHROPIC_API_KEY: Your Anthropic API key.
OPENAI_API_KEY: Your OpenAI API key.
VISION_PROVIDER: Primary vision provider (anthropic or openai).
FALLBACK_PROVIDER: Optional fallback provider.
LOG_LEVEL: Logging level (DEBUG, INFO, WARNING, ERROR).
ENABLE_OCR: Enable Tesseract OCR text extraction (true or false).
TESSERACT_CMD: Optional custom path to Tesseract executable.
OPENAI_MODEL: OpenAI model to use (default: gpt-4o-mini). Accepts OpenRouter model names for other models (e.g., anthropic/claude-3.5-sonnet:beta).
OPENAI_BASE_URL: Optional custom base URL for the OpenAI API. Set to https://openrouter.ai/api/v1 for OpenRouter.
OPENAI_TIMEOUT: Optional custom timeout (in seconds) for the OpenAI API.
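The variables above could be read and validated along the following lines. This is a minimal sketch rather than the server's own configuration loader: the `Settings` dataclass and `load_settings` function are hypothetical, VISION_PROVIDER is treated as required, and ENABLE_OCR is assumed to default to false when unset.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

VALID_PROVIDERS = {"anthropic", "openai"}


@dataclass
class Settings:
    vision_provider: str
    fallback_provider: Optional[str]
    enable_ocr: bool
    openai_model: str
    openai_base_url: Optional[str]
    openai_timeout: Optional[float]


def _optional(env: Mapping[str, str], name: str) -> Optional[str]:
    """Return the stripped value, or None when unset or empty."""
    value = env.get(name, "").strip()
    return value or None


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    provider = env.get("VISION_PROVIDER", "").strip().lower()
    if provider not in VALID_PROVIDERS:
        raise ValueError("VISION_PROVIDER must be 'anthropic' or 'openai'")

    fallback = _optional(env, "FALLBACK_PROVIDER")
    if fallback is not None:
        fallback = fallback.lower()
        if fallback not in VALID_PROVIDERS:
            raise ValueError("FALLBACK_PROVIDER must be 'anthropic' or 'openai'")

    timeout = _optional(env, "OPENAI_TIMEOUT")
    return Settings(
        vision_provider=provider,
        fallback_provider=fallback,
        # Assumption: OCR is off unless explicitly set to "true".
        enable_ocr=env.get("ENABLE_OCR", "false").strip().lower() == "true",
        # Documented default model for the OpenAI provider.
        openai_model=_optional(env, "OPENAI_MODEL") or "gpt-4o-mini",
        openai_base_url=_optional(env, "OPENAI_BASE_URL"),
        openai_timeout=float(timeout) if timeout is not None else None,
    )
```

Passing a plain dictionary instead of `os.environ` makes it easy to try out a configuration before writing it to .env.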

Using OpenRouter

OpenRouter allows you to access various models using the OpenAI API format. To use OpenRouter, follow these steps:

Obtain an API key from OpenRouter.
Set OPENAI_API_KEY in your .env file to your OpenRouter API key.
Set OPENAI_BASE_URL to https://openrouter.ai/api/v1.
Set OPENAI_MODEL to the desired model using the OpenRouter format (e.g., anthropic/claude-3.5-sonnet:beta).
Set VISION_PROVIDER to openai.
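The steps above amount to four settings. The sketch below collects them in one place and renders them as .env lines; the helper names `openrouter_env` and `to_dotenv` are hypothetical, and the API key shown in the usage note is a placeholder.

```python
from typing import Dict

OPENROUTER_BASE_URL = "https://openrouter.ai/api/v1"


def openrouter_env(
    api_key: str, model: str = "anthropic/claude-3.5-sonnet:beta"
) -> Dict[str, str]:
    """Return the environment values described in the OpenRouter steps."""
    return {
        "OPENAI_API_KEY": api_key,
        "OPENAI_BASE_URL": OPENROUTER_BASE_URL,
        "OPENAI_MODEL": model,
        "VISION_PROVIDER": "openai",
    }


def to_dotenv(values: Dict[str, str]) -> str:
    """Render KEY=value lines suitable for pasting into a .env file."""
    return "\n".join(f"{key}={value}" for key, value in values.items()) + "\n"
```

For example, `print(to_dotenv(openrouter_env("your-openrouter-api-key")))` prints the four lines to add to .env, with the placeholder replaced by a real key.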

Default Models

Anthropic: claude-3.5-sonnet-beta
OpenAI: gpt-4o-mini
OpenRouter: Set OPENAI_MODEL to an OpenRouter model name (e.g., anthropic/claude-3.5-sonnet:beta).

Development

Running Tests

Run all tests:

run.bat test

Run a specific test suite:

run.bat test server
run.bat test anthropic
run.bat test openai

Docker Support

Build the Docker image:

docker build -t mcp-image-recognition .

Run the container:

docker run -it --env-file .env mcp-image-recognition

License

MIT License - see LICENSE file for details.

Release History

0.1.2 (2025-02-20): Improved OCR error handling and added comprehensive test coverage for OCR functionality
0.1.1 (2025-02-19): Added Tesseract OCR support for text extraction from images (optional feature)
0.1.0 (2025-02-19): Initial release with Anthropic and OpenAI vision support
