image_mcp
Enables text-only LLMs to analyze images by routing them to an OpenAI-compatible vision backend, supporting local files, URLs, and data URLs.
README
Image Summarization MCP Server
A Model Context Protocol (MCP) server that accepts image files and sends them to an OpenAI-compatible chat completion endpoint for analysis, description, and comparison tasks.
Use Case
Many LLMs used for agentic coding are text-only and lack support for image inputs. This tool allows you to use a secondary model dedicated to describing and analyzing images, without having to use a multi-modal LLM for your primary model. It supports both cloud and local LLMs via any server that supports the OpenAI chat completion endpoint (including llama.cpp / llama-swap, Ollama, open-webui, OpenRouter, etc).
For local models, gemma3:4b-it-qat works quite well with a relatively small footprint and fast performance (even on CPU-only).
Features
- Accepts images via unified
image_pathparameter — local paths, URLs, and data URLs - Supports
taskparameter to perform specific analysis beyond general description - Sends images to OpenAI-compatible chat completion endpoints
- Returns detailed image descriptions
- Configurable endpoint URL, API key, and model
- Optional persistent config file at
~/.config/image_mcp/config.json - Command-line interface for configuration
- Comprehensive error handling
Quick install from NPM
Add this to your global mcp_settings.json or project mcp.json:
{
"mcpServers": {
"image_mcp": {
"command": "npx",
"args": [
"-y",
"@karlcc/image_mcp"
],
"env": {
"OPENAI_API_KEY": "YOUR_API_KEY",
"OPENAI_BASE_URL": "https://api.openai.com/v1",
"OPENAI_MODEL": "gemini-3.1-flash-lite-preview"
}
}
}
}
If you prefer claude mcp add-json, use:
claude mcp add-json image_mcp --scope user '{
"type": "stdio",
"command": "npx",
"args": ["-y", "@karlcc/image_mcp"],
"env": {
"OPENAI_API_KEY": "YOUR_API_KEY",
"OPENAI_BASE_URL": "https://api.openai.com/v1",
"OPENAI_MODEL": "gemini-3.1-flash-lite-preview"
}
}'
At a minimum, configure base URL, API key, and model for your chosen backend.
For use with slow local models, you may need to also increase the timeout and max retries settings.
Configuration
The MCP server can be configured using a config file, environment variables, or command-line arguments.
Environment Variables
OPENAI_API_KEY: Your API key for the OpenAI-compatible serviceOPENAI_BASE_URL: The base URL of the OpenAI-compatible service (default:http://localhost:9292/v1)OPENAI_MODEL: The model to use for image analysisOPENAI_TIMEOUT: Request timeout in milliseconds (default: 60000). When running local models you may need to increase this.OPENAI_MAX_RETRIES: Maximum number of retry attempts (default: 3)OPENAI_STREAMING: Enable/disable streaming (true/false)MCP_USE_HTTP: Enable HTTP/SSE transport (true/false)MCP_PORT: HTTP port for MCP server (default:8080)IMAGE_MCP_CONFIG_PATH: Override config file path (default:~/.config/image_mcp/config.json)
Command Line Arguments
npx -y @karlcc/image_mcp \
--api-key your-api-key \
--base-url https://api.openai.com/v1 \
--model gpt-4-vision-preview \
--http \
--mcp-port 8080 \
--timeout 60000 \
--max-retries 5
Configuration Priority
- Command-line arguments
- Environment variables
- Config file (
~/.config/image_mcp/config.json) - Default values
Persistent Config
Save your resolved configuration once and reuse it across sessions:
node build/index.js \
--api-key your-api-key \
--base-url https://api.openai.com/v1 \
--model gpt-4.1-mini \
--http \
--mcp-port 8080 \
--save-config
This writes ~/.config/image_mcp/config.json (or a custom file via --config /path/to/config.json).
Verifying your model has vision
Before committing to a model, verify it can actually see images:
# Automatic: --save-config verifies vision by default before writing
node build/index.js --model your-model --save-config
# Quick one-shot check:
IMAGE_MCP_SMOKE=1 npm run test:smoke
# Opt-in startup probe (warns in stderr if model can't see):
IMAGE_MCP_PROBE_ON_START=true node build/index.js
If verification fails, the config file is not written and the exit code is non-zero. Use --no-verify to skip the check.
Usage
Host model vs vision backend
When the host LLM (e.g. GLM-5.1, Claude Haiku) is text-only, it cannot inspect pixels. Wire image_mcp to a vision-capable backend and the host will route image tasks there automatically.
Z.AI / GLM example
npx -y @karlcc/image_mcp \
--base-url https://open.bigmodel.cn/api/paas/v4 \
--api-key $ZAI_API_KEY \
--model glm-4.6v-flash
The app stays backend-agnostic — any OpenAI-compatible endpoint works. glm-4.6v-flash is shown because it is a capable, low-latency vision model available on Z.AI.
Client routing snippet
Add to your MCP client config (e.g. Claude Desktop, Cursor, or .claude/settings.json):
{
"mcpServers": {
"image_mcp": {
"command": "npx",
"args": [
"-y",
"@karlcc/image_mcp@latest"
],
"env": {
"OPENAI_API_KEY": "YOUR_ZAI_KEY",
"OPENAI_BASE_URL": "https://open.bigmodel.cn/api/paas/v4",
"OPENAI_MODEL": "glm-4.6v-flash"
}
}
}
}
MCP Tools
read_image_via_vision_backend
Reads and analyzes one image via the vision backend. Accepts local absolute paths, http(s) URLs, and data URLs.
Parameters
image_path(string): Image to analyze. Supports:- Absolute local paths (e.g.
/Users/me/screenshot.png) - HTTP/HTTPS URLs (e.g.
https://example.com/image.jpg) - Data URLs with base64 encoded images (e.g.
data:image/png;base64,...)
- Absolute local paths (e.g.
task(string, optional): What to do with the image (e.g."Read all text","Describe the UI layout","Extract data from chart"). Defaults to a general description.
Example Usage
Using file path:
{
"name": "read_image_via_vision_backend",
"arguments": {
"image_path": "/Users/me/screenshot.png",
"task": "Read all text in this screenshot"
}
}
Using HTTP URL:
{
"name": "read_image_via_vision_backend",
"arguments": {
"image_path": "https://example.com/image.jpg"
}
}
compare_images_via_vision_backend
Compares 2 or more images via the vision backend. Accepts local absolute paths, http(s) URLs, and data URLs.
Parameters
image_paths(array of strings, min 2): Images to compare. Each entry supports the same formats asimage_pathabove.task(string, optional): What to compare (e.g."Describe UI differences","Which chart shows higher values?"). Defaults to a general comparison.
Example Usage
{
"name": "compare_images_via_vision_backend",
"arguments": {
"image_paths": [
"/Users/me/before.png",
"/Users/me/after.png"
],
"task": "Describe the UI differences between these screenshots"
}
}
get_config_info
Returns the active server configuration for diagnostics with the API key redacted.
Dev Setup
- Clone the repository:
git clone https://github.com/karlcc/image_mcp.git
cd image_mcp
- Install dependencies:
npm install
- Build the project:
npm run build
- Starting the Server
node build/index.js
The server will start and listen on stdio for MCP protocol communications.
To run with HTTP/SSE transport:
node build/index.js --http --mcp-port 8080
MCP Tool Installation (local dev build)
Add this to your global mcp_settings.json or project mcp.json:
{
"mcpServers": {
"image_mcp": {
"command": "node",
"args": [
"/path/to/image_mcp/build/index.js"
],
"env": {
"OPENAI_API_KEY": "YOUR_API_KEY",
"OPENAI_BASE_URL": "http://localhost:9292/v1",
"OPENAI_MODEL": "gemma3:4b-it-qat"
}
}
}
}
Testing
Running Tests
Run the test suite:
npm test
The test suite includes:
- Unit tests for image processing functionality
- Integration tests that require a mock server
- Tests for both
read_image_via_vision_backendandcompare_images_via_vision_backendtools
Model Benchmark (Accuracy + Latency)
Run the built-in benchmark to compare candidate models with weighted accuracy and response latency:
npm run benchmark:models
By default this uses:
- Task file:
bench/tasks.default.json - Models:
~/.config/image_mcp/model_candidates.json(candidatesarray) - Ranking: weighted accuracy (desc), success rate (desc), median latency (asc)
Useful overrides:
node scripts/benchmark-models.mjs \
--models gemma-4-31b,kimi-k2.5-fw,qwen3.5-397b-fw \
--repeats 2 \
--tasks bench/tasks.default.json
Outputs:
- Raw call-level results at
/tmp/image_mcp_accuracy_benchmark_*.jsonl - Summary at
/tmp/image_mcp_accuracy_summary_*.json - Auto-updates active model in
~/.config/image_mcp/config.json(disable with--no-update-config)
Mock Server Testing
The project includes a mock OpenAI-compatible server for testing purposes.
- Start the mock server in a separate terminal:
node tests/mock-server.js
The mock server will start on http://localhost:9293 and provides endpoints for:
GET /v1/models- Lists available modelsPOST /v1/chat/completions- Mock chat completions with image supportPOST /v1/test/image-process- Test endpoint for image processing validation
- Set environment variables for the mock server:
export OPENAI_BASE_URL=http://localhost:9293/v1
export OPENAI_API_KEY=test-key
export OPENAI_MODEL=test-model-vision
- Run the integration tests:
npm test tests/integration.test.ts
Real OpenAI-Compatible Server Testing
To test with a real OpenAI-compatible endpoint:
- Set up your environment variables:
export OPENAI_API_KEY=your-actual-api-key
export OPENAI_BASE_URL=https://api.openai.com/v1
export OPENAI_MODEL=gpt-4-vision-preview
Or for other OpenAI-compatible services:
export OPENAI_API_KEY=your-service-api-key
export OPENAI_BASE_URL=https://your-service-endpoint/v1
export OPENAI_MODEL=your-vision-model
- Start the MCP server:
node build/index.js --http --mcp-port 8080
- Send test requests using an MCP client or test the tools directly.
Manual Testing
You can manually test the MCP server using tools like curl or MCP clients:
# Test with a local image file
curl -X POST http://localhost:8080/sse \
-H "Content-Type: application/json" \
-d '{
"jsonrpc": "2.0",
"id": 1,
"method": "tools/call",
"params": {
"name": "read_image_via_vision_backend",
"arguments": {
"image_path": "/path/to/your/test/image.jpg"
}
}
}'
API Reference
OpenAI-Compatible API Integration
The server sends requests to the OpenAI-compatible chat completion endpoint with the following structure:
{
"model": "your-model",
"messages": [
{
"role": "user",
"content": [
{
"type": "text",
"text": "Describe this image in detail, including all text."
},
{
"type": "image_url",
"image_url": {
"url": "data:image/png;base64,..."
}
}
]
}
],
"stream": false
}
Supported Image Formats
- JPEG (.jpg, .jpeg)
- PNG (.png)
- GIF (.gif)
- WebP (.webp)
- SVG (.svg)
- BMP (.bmp)
- TIFF (.tiff)
Error Handling
The server includes comprehensive error handling for:
- Invalid image files
- Unsupported image formats
- Missing API keys
- Network connectivity issues
- API response errors
Development
Project Structure
src/
├── config.ts # Configuration management
├── image-processor.ts # Image processing utilities
├── index.ts # Main MCP server
└── openai-client.ts # OpenAI-compatible API client
Building
npm run build
Testing
npm test
Vision smoke test (requires API credentials):
IMAGE_MCP_SMOKE=1 npm run test:smoke
Full preflight before release:
npm run preflight
Release: tag and publish to npm
Recommended flow: GitHub Actions trusted publishing (OIDC).
One-time setup (npm package owner):
# Requires npm v11.10+ and package 2FA enabled on npm.
# If local npm is older, run via npx as shown here.
npx -y npm@latest trust github @karlcc/image_mcp \
--repo karlcc/image_mcp \
--file publish.yml \
--yes
Then ship each release with:
# 1) Verify quality gates
npm run build
npm test
# 2) Commit pending changes
git add -A
git commit -m "chore(release): prepare next version"
# 3) Bump version + create git tag (patch/minor/major)
npm version patch
# 4) Push commit + tag (GitHub Actions publishes to npm)
git push origin main --follow-tags
Fallback manual publish (if trusted publishing is not configured):
npm publish --access public --otp <6-digit-otp>
Dev cycle: four layers of vision detection
The repo is designed so a non-vision model can't slip through silently:
| Layer | When | How |
|---|---|---|
| Config save | --save-config |
Probes model with a tiny fixture before writing config |
| Smoke test | npm run test:smoke |
Jest test against the configured model |
| Startup probe | IMAGE_MCP_PROBE_ON_START=true |
Warns on stderr if model fails |
| Benchmark | npm run benchmark:models |
--fail-if-any-nonvision exits non-zero for 0% scorers |
License
This project is licensed under the MIT License.
Contributing
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests
- Submit a pull request
Support
For issues and questions, please open an issue on the GitHub repository.
Tips
Tips / donations always appreciated to help fund future development.
Recommended Servers
playwright-mcp
A Model Context Protocol server that enables LLMs to interact with web pages through structured accessibility snapshots without requiring vision models or screenshots.
Magic Component Platform (MCP)
An AI-powered tool that generates modern UI components from natural language descriptions, integrating with popular IDEs to streamline UI development workflow.
Audiense Insights MCP Server
Enables interaction with Audiense Insights accounts via the Model Context Protocol, facilitating the extraction and analysis of marketing insights and audience data including demographics, behavior, and influencer engagement.
VeyraX MCP
Single MCP tool to connect all your favorite tools: Gmail, Calendar and 40 more.
graphlit-mcp-server
The Model Context Protocol (MCP) Server enables integration between MCP clients and the Graphlit service. Ingest anything from Slack to Gmail to podcast feeds, in addition to web crawling, into a Graphlit project - and then retrieve relevant contents from the MCP client.
Kagi MCP Server
An MCP server that integrates Kagi search capabilities with Claude AI, enabling Claude to perform real-time web searches when answering questions that require up-to-date information.
E2B
Using MCP to run code via e2b.
Neon Database
MCP server for interacting with Neon Management API and databases
Exa Search
A Model Context Protocol (MCP) server lets AI assistants like Claude use the Exa AI Search API for web searches. This setup allows AI models to get real-time web information in a safe and controlled way.
Qdrant Server
This repository is an example of how to create a MCP server for Qdrant, a vector search engine.