agent-vision-mcp
<!-- mcp-name: io.github.idealizing/agent-vision-mcp -->
Give MCP-compatible AI agents image analysis, metadata inspection, cropping, OCR, and image comparison through any OpenAI-compatible vision model.
Features
- Analyze screenshots, charts, documents, UI, objects, and general images.
- Inspect image dimensions and metadata without calling a model.
- Crop and zoom into regions using normalized coordinates.
- Extract visible text with a VLM or an optional dedicated OCR model.
- Compare two to four images.
- Accept public URLs, local files, data URLs, and Base64 images.
- Run locally over the standard MCP stdio transport.
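Normalized coordinates express a region as fractions of the image size, so the same box works at any resolution. The sketch below shows one way to map such a box to pixel bounds. The function name and the (x0, y0, x1, y1) parameter order are assumptions for illustration and may not match the arguments vision_crop_analyze actually accepts.

```python
import math
from typing import Tuple


def normalized_to_pixel_box(
    width: int,
    height: int,
    x0: float,
    y0: float,
    x1: float,
    y1: float,
) -> Tuple[int, int, int, int]:
    """Map a normalized box (fractions in [0, 1]) to integer pixel bounds.

    Hypothetical helper: returns (left, top, right, bottom), with right and
    bottom exclusive.
    """
    for name, value in (("x0", x0), ("y0", y0), ("x1", x1), ("y1", y1)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be within [0, 1], got {value}")
    if x0 >= x1 or y0 >= y1:
        raise ValueError("box must have positive width and height")

    # Floor the near edges and ceil the far edges so the pixel box covers
    # the requested region, and keep at least one pixel in each direction.
    left = int(x0 * width)
    top = int(y0 * height)
    right = min(width, max(left + 1, math.ceil(x1 * width)))
    bottom = min(height, max(top + 1, math.ceil(y1 * height)))
    return left, top, right, bottom


# Bottom-right quadrant of a 1920x1080 screenshot.
print(normalized_to_pixel_box(1920, 1080, 0.5, 0.5, 1.0, 1.0))
# prints (960, 540, 1920, 1080)
```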
Claude Code
Requirements
- Python 3.10 or newer
- uv
- An OpenAI-compatible vision API endpoint and API key
uvx downloads the published package from PyPI into an isolated environment
and runs it. It does not use the source code in your current directory and
does not permanently install the package into your system Python.
Add To Claude Code
The command below configures Claude Code to start agent-vision-mcp from PyPI:
claude mcp add --scope user agent-vision \
--env UV_DEFAULT_INDEX=https://pypi.org/simple \
VISION_API_KEY="your-api-key" \
VISION_BASE_URL="https://your-provider.example/v1" \
VISION_MODEL_ID="your-vision-model" \
-- uvx agent-vision-mcp
Use UV_DEFAULT_INDEX=https://pypi.org/simple when your local PyPI mirror has
not synchronized the latest release.
Verify the connection:
claude mcp get agent-vision
claude mcp list
Then start Claude Code and ask:
Use vision_capabilities to show the available vision tools.
Analyze a local image:
Use vision_inspect on /data/example.png, then use vision_analyze to describe it.
By default, local image access is limited to /data and /tmp. Add another
directory with:
claude mcp remove --scope user agent-vision
claude mcp add --scope user agent-vision \
--env UV_DEFAULT_INDEX=https://pypi.org/simple \
VISION_API_KEY="your-api-key" \
VISION_BASE_URL="https://your-provider.example/v1" \
VISION_MODEL_ID="your-vision-model" \
VISION_ALLOWED_PATHS="/data,/tmp,/home/your-user/Pictures" \
-- uvx agent-vision-mcp
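VISION_ALLOWED_PATHS is a comma-separated list of directories. As a rough illustration of how an allow-list of this kind can be enforced, the sketch below resolves a requested path and checks that it falls under one of the listed roots. Both helper names are hypothetical, and the server's own check may differ.

```python
from pathlib import Path
from typing import Iterable, List


def parse_allowed_paths(raw: str) -> List[Path]:
    """Split a comma-separated value such as "/data,/tmp" into resolved roots."""
    return [Path(part.strip()).resolve() for part in raw.split(",") if part.strip()]


def is_path_allowed(candidate: str, allowed_roots: Iterable[Path]) -> bool:
    """Return True when the candidate lies under one of the allowed roots.

    Resolving first collapses ".." segments and symlinks, so a path like
    /data/../etc/passwd is judged by where it really points.
    """
    resolved = Path(candidate).resolve()
    return any(resolved == root or root in resolved.parents for root in allowed_roots)


roots = parse_allowed_paths("/data,/tmp,/home/your-user/Pictures")
print(is_path_allowed("/data/example.png", roots))
print(is_path_allowed("/data/../etc/passwd", roots))
```

Resolving before comparing matters: a plain string-prefix test would accept /data/../etc/passwd because it starts with /data.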
Dedicated OCR Model
Without dedicated OCR configuration, vision_extract_text uses the configured
vision model. To use a separate OCR model:
claude mcp add --scope user agent-vision \
--env UV_DEFAULT_INDEX=https://pypi.org/simple \
VISION_API_KEY="your-vision-api-key" \
VISION_BASE_URL="https://your-provider.example/v1" \
VISION_MODEL_ID="your-vision-model" \
OCR_ENABLED=true \
OCR_API_KEY="your-ocr-api-key" \
OCR_BASE_URL="https://your-provider.example/v1" \
OCR_MODEL_ID="your-ocr-model" \
-- uvx agent-vision-mcp
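The fallback described above (use the OCR_* settings when dedicated OCR is enabled, otherwise the VISION_* settings) can be sketched as follows. This is an illustration only: the function name is hypothetical, and treating only the literal value "true" as enabled is an assumption about how OCR_ENABLED is parsed.

```python
import os
from typing import Dict, Mapping


def select_text_extraction_backend(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Pick the OCR_* settings when OCR is enabled, else fall back to VISION_*."""
    # Assumption: only "true" (case-insensitive) enables the dedicated OCR model.
    ocr_enabled = env.get("OCR_ENABLED", "").strip().lower() == "true"
    prefix = "OCR" if ocr_enabled else "VISION"
    return {
        "backend": prefix.lower(),
        "api_key": env.get(f"{prefix}_API_KEY", ""),
        "base_url": env.get(f"{prefix}_BASE_URL", ""),
        "model_id": env.get(f"{prefix}_MODEL_ID", ""),
    }


settings = select_text_extraction_backend(
    {
        "VISION_MODEL_ID": "your-vision-model",
        "OCR_ENABLED": "true",
        "OCR_MODEL_ID": "your-ocr-model",
    }
)
print(settings["backend"], settings["model_id"])  # prints: ocr your-ocr-model
```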
Never commit real API keys to Git.
Other MCP Clients
Use this stdio configuration with MCP clients that accept JSON configuration:
{
"mcpServers": {
"agent-vision": {
"command": "uvx",
"args": ["agent-vision-mcp"],
"env": {
"UV_DEFAULT_INDEX": "https://pypi.org/simple",
"VISION_API_KEY": "your-api-key",
"VISION_BASE_URL": "https://your-provider.example/v1",
"VISION_MODEL_ID": "your-vision-model"
}
}
}
}
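If you prefer not to type the API key into a file by hand, the same JSON can be generated from environment variables. The script below is a minimal sketch that only reproduces the structure shown above; build_mcp_config is a hypothetical helper, not part of agent-vision-mcp.

```python
import json
import os
from typing import Mapping

REQUIRED = ("VISION_API_KEY", "VISION_BASE_URL", "VISION_MODEL_ID")


def build_mcp_config(env: Mapping[str, str] = os.environ) -> str:
    """Render the stdio configuration as JSON, reading secrets from env."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise KeyError(f"missing environment variables: {', '.join(missing)}")

    server_env = {"UV_DEFAULT_INDEX": "https://pypi.org/simple"}
    server_env.update({name: env[name] for name in REQUIRED})
    config = {
        "mcpServers": {
            "agent-vision": {
                "command": "uvx",
                "args": ["agent-vision-mcp"],
                "env": server_env,
            }
        }
    }
    return json.dumps(config, indent=2)


if __name__ == "__main__":
    try:
        print(build_mcp_config())
    except KeyError as error:
        print(error)
```

The output still contains the key, so write it to a client configuration file that lives outside any Git repository.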
Tools
| Tool | Purpose |
|---|---|
| vision_analyze | Analyze an image with task-specific prompts |
| vision_inspect | Read image dimensions, format, size, and mode |
| vision_crop_analyze | Crop and analyze a normalized image region |
| vision_extract_text | Extract visible text using OCR or the VLM |
| vision_compare | Compare two to four images |
| vision_capabilities | Show server configuration and limits |
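To illustrate why a tool like vision_inspect needs no model call, the sketch below reads dimensions and color mode straight from a PNG header using only the standard library. It is not the server's implementation, which may rely on an imaging library and supports more formats; inspect_png and the mode mapping are illustrative assumptions.

```python
import struct
from typing import Dict, Union

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
# PNG color type -> a short mode name (illustrative mapping, ignores bit depth).
COLOR_TYPES = {0: "L", 2: "RGB", 3: "P", 4: "LA", 6: "RGBA"}


def inspect_png(data: bytes) -> Dict[str, Union[str, int]]:
    """Read width, height, bit depth, and mode from a PNG's IHDR chunk."""
    # Layout: 8-byte signature, 4-byte chunk length, 4-byte chunk type "IHDR",
    # then width (4), height (4), bit depth (1), color type (1), ...
    if len(data) < 26 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG image")
    width, height, bit_depth, color_type = struct.unpack(">IIBB", data[16:26])
    return {
        "format": "PNG",
        "width": width,
        "height": height,
        "bit_depth": bit_depth,
        "mode": COLOR_TYPES.get(color_type, "unknown"),
        "size_bytes": len(data),
    }
```

Only the first 26 bytes are examined, so the check is cheap even for large files.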
URL Handling
VISION_URL_MODE controls remote-image handling:
- auto passes URLs through for analysis and comparison, but downloads them when inspection, cropping, or OCR requires image bytes.
- passthrough prefers URL passthrough, except for tools that require bytes.
- download always downloads and verifies remote images before model calls.
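The three modes can be summarized as a small decision function. This is a sketch of the rules as described here, not the server's code: should_download is a hypothetical name, and the set of tools that need image bytes is inferred from "inspection, cropping, or OCR".

```python
# Assumption: these are the tools that need raw image bytes.
BYTE_TOOLS = frozenset({"vision_inspect", "vision_crop_analyze", "vision_extract_text"})
URL_MODES = frozenset({"auto", "passthrough", "download"})


def should_download(mode: str, tool: str) -> bool:
    """Return True when a remote image should be fetched before the tool runs."""
    if mode not in URL_MODES:
        raise ValueError(f"unknown VISION_URL_MODE: {mode!r}")
    if mode == "download":
        return True
    # As described, "auto" and "passthrough" both hand the URL to the model
    # unless the tool needs bytes; any finer difference is not modeled here.
    return tool in BYTE_TOOLS


print(should_download("auto", "vision_analyze"))      # prints False
print(should_download("auto", "vision_inspect"))      # prints True
print(should_download("download", "vision_compare"))  # prints True
```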
Downloads are streamed with byte limits, redirects are security-checked, and downloaded or encoded inputs are verified as supported images.
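As a rough picture of what a byte-limited read and an image check involve, the sketch below reads a stream in chunks, aborts once a limit is exceeded, and then identifies the format from its leading bytes. The helper names, the chunk size, and the list of recognized formats are assumptions; the server's actual limits and supported formats are not specified here.

```python
import io
from typing import BinaryIO, Optional


def read_limited(stream: BinaryIO, max_bytes: int, chunk_size: int = 64 * 1024) -> bytes:
    """Read a stream chunk by chunk, failing as soon as max_bytes is exceeded."""
    buffer = bytearray()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return bytes(buffer)
        buffer.extend(chunk)
        if len(buffer) > max_bytes:
            raise ValueError(f"input exceeds the {max_bytes}-byte limit")


def sniff_image_format(data: bytes) -> Optional[str]:
    """Identify common image formats by their magic bytes, or return None."""
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if data[:6] in (b"GIF87a", b"GIF89a"):
        return "gif"
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    return None


payload = read_limited(io.BytesIO(b"\x89PNG\r\n\x1a\n" + b"\x00" * 16), max_bytes=1024)
print(sniff_image_format(payload))  # prints "png"
```

Checking the limit inside the loop means an oversized response is rejected before it is fully buffered.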
Troubleshooting
If Claude Code cannot find the PyPI package:
UV_DEFAULT_INDEX=https://pypi.org/simple uvx --refresh agent-vision-mcp
If the MCP server does not connect:
claude mcp get agent-vision
uvx agent-vision-mcp
If you change the Claude Code configuration:
claude mcp remove --scope user agent-vision
Then add it again with the updated values.
Development
git clone https://github.com/idealizing/agent-vision-mcp.git
cd agent-vision-mcp
python -m venv .venv
.venv/bin/pip install -e ".[dev]"
cp .env.example .env
.venv/bin/python -m unittest discover -s tests -v
License
MIT