agent-vision-mcp

Provides image analysis, inspection, cropping, OCR, and comparison capabilities via the Model Context Protocol, allowing AI agents to process and manipulate images using vision models.

Give MCP-compatible AI agents image analysis, metadata inspection, cropping, OCR, and image comparison through any OpenAI-compatible vision model.

Features

Analyze screenshots, charts, documents, UI, objects, and general images.
Inspect image dimensions and metadata without calling a model.
Crop and zoom into regions using normalized coordinates.
Extract visible text with a VLM or an optional dedicated OCR model.
Compare two to four images.
Accept public URLs, local files, data URLs, and Base64 images.
Run locally over the standard MCP stdio transport.
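
The four accepted input forms can be told apart before any bytes are fetched. The sketch below is one possible way to do that and is not taken from the server's source: the function name `classify_image_source`, the returned labels, and the signature-based Base64 check are all assumptions made for illustration.

```python
import base64
import binascii
from urllib.parse import urlparse

# Leading bytes of a few common image formats (PNG, JPEG, GIF).
_IMAGE_PREFIXES = (b"\x89PNG\r\n\x1a\n", b"\xff\xd8\xff", b"GIF87a", b"GIF89a")


def classify_image_source(value: str) -> str:
    """Return 'data_url', 'url', 'base64', or 'file' for an image reference."""
    text = value.strip()
    if text.startswith("data:"):
        return "data_url"
    if urlparse(text).scheme in ("http", "https"):
        return "url"
    # A short path such as "/tmp/abc" is also valid Base64, so only treat the
    # value as Base64 when the decoded bytes start with an image signature.
    try:
        decoded = base64.b64decode(text, validate=True)
    except (binascii.Error, ValueError):
        return "file"
    if decoded.startswith(_IMAGE_PREFIXES):
        return "base64"
    return "file"
```

Checking the decoded signature matters because Base64-encoded JPEG data begins with `/9j/`, which would otherwise look like an absolute path.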

Claude Code

Requirements

Python 3.10 or newer
uv
An OpenAI-compatible vision API endpoint and API key

uvx downloads the published package from PyPI into an isolated environment and runs it. It does not use the source code in your current directory and does not permanently install the package into your system Python.

Add To Claude Code

The command below configures Claude Code to start agent-vision-mcp from PyPI:

claude mcp add --scope user agent-vision \
  --env UV_DEFAULT_INDEX=https://pypi.org/simple \
  VISION_API_KEY="your-api-key" \
  VISION_BASE_URL="https://your-provider.example/v1" \
  VISION_MODEL_ID="your-vision-model" \
  -- uvx agent-vision-mcp

Setting UV_DEFAULT_INDEX=https://pypi.org/simple makes uv resolve the package from the official PyPI index. You only need it when your local PyPI mirror has not yet synchronized the latest release.

Verify the connection:

claude mcp get agent-vision
claude mcp list

Then start Claude Code and ask:

Use vision_capabilities to show the available vision tools.

Analyze a local image:

Use vision_inspect on /data/example.png, then use vision_analyze to describe it.

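By default, local image access is limited to /data and /tmp. To allow another directory, remove the existing entry and add it again with VISION_ALLOWED_PATHS set to a comma-separated list of directories: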

claude mcp remove --scope user agent-vision

claude mcp add --scope user agent-vision \
  --env UV_DEFAULT_INDEX=https://pypi.org/simple \
  VISION_API_KEY="your-api-key" \
  VISION_BASE_URL="https://your-provider.example/v1" \
  VISION_MODEL_ID="your-vision-model" \
  VISION_ALLOWED_PATHS="/data,/tmp,/home/your-user/Pictures" \
  -- uvx agent-vision-mcp
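
A directory allowlist like this is usually enforced by resolving the requested path and checking that it falls under one of the permitted roots. The following is a minimal sketch of that idea, assuming a comma-separated VISION_ALLOWED_PATHS value as shown above. The helper names `parse_allowed_paths` and `is_path_allowed` are hypothetical, and the server's real check may differ.

```python
from pathlib import Path


def parse_allowed_paths(raw: str) -> list[Path]:
    """Split a comma-separated allowlist into resolved directory roots."""
    return [Path(p.strip()).expanduser().resolve() for p in raw.split(",") if p.strip()]


def is_path_allowed(candidate: str, roots: list[Path]) -> bool:
    """True when the resolved candidate is one of the roots or lies inside one."""
    # resolve() collapses ".." segments and follows symlinks, so a request
    # such as "/data/../etc/passwd" is judged by where it really points.
    resolved = Path(candidate).expanduser().resolve()
    return any(resolved.is_relative_to(root) for root in roots)


roots = parse_allowed_paths("/data,/tmp")
print(is_path_allowed("/data/example.png", roots))
print(is_path_allowed("/data/../etc/passwd", roots))
```

Resolving before comparing is the important design choice: a plain string prefix test would accept traversal paths and sibling directories such as /data-private.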

Dedicated OCR Model

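Without dedicated OCR configuration, vision_extract_text uses the configured vision model. To use a separate OCR model, set OCR_ENABLED=true together with the OCR_* variables (remove any existing agent-vision entry first):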

claude mcp add --scope user agent-vision \
  --env UV_DEFAULT_INDEX=https://pypi.org/simple \
  VISION_API_KEY="your-vision-api-key" \
  VISION_BASE_URL="https://your-provider.example/v1" \
  VISION_MODEL_ID="your-vision-model" \
  OCR_ENABLED=true \
  OCR_API_KEY="your-ocr-api-key" \
  OCR_BASE_URL="https://your-provider.example/v1" \
  OCR_MODEL_ID="your-ocr-model" \
  -- uvx agent-vision-mcp
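
The fallback described above (OCR endpoint when enabled, vision endpoint otherwise) can be sketched as a small selection function. This is an illustration only: `ModelEndpoint`, `resolve_text_extraction_endpoint`, and the set of values treated as "enabled" are assumptions, not the server's actual code.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class ModelEndpoint:
    api_key: str
    base_url: str
    model_id: str


def resolve_text_extraction_endpoint(env: Mapping[str, str]) -> ModelEndpoint:
    """Pick the OCR_* settings when OCR_ENABLED is truthy, else the VISION_* ones."""
    # Assumption: "1", "true", and "yes" count as enabled; the README only shows "true".
    ocr_on = env.get("OCR_ENABLED", "").strip().lower() in {"1", "true", "yes"}
    prefix = "OCR" if ocr_on else "VISION"
    return ModelEndpoint(
        api_key=env.get(f"{prefix}_API_KEY", ""),
        base_url=env.get(f"{prefix}_BASE_URL", ""),
        model_id=env.get(f"{prefix}_MODEL_ID", ""),
    )
```

Passing `os.environ` as `env` would apply the same selection to the variables set by the command above.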

Never commit real API keys to Git.

Other MCP Clients

Use this stdio configuration with MCP clients that accept JSON configuration:

{
  "mcpServers": {
    "agent-vision": {
      "command": "uvx",
      "args": ["agent-vision-mcp"],
      "env": {
        "UV_DEFAULT_INDEX": "https://pypi.org/simple",
        "VISION_API_KEY": "your-api-key",
        "VISION_BASE_URL": "https://your-provider.example/v1",
        "VISION_MODEL_ID": "your-vision-model"
      }
    }
  }
}
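
One way to keep real keys out of a committed configuration file is to generate the JSON from environment variables at setup time. The sketch below builds the same structure as the example above. `build_client_config` and `render_client_config` are hypothetical helper names, and where the output is written depends on your MCP client.

```python
import json
from typing import Mapping

REQUIRED_KEYS = ("VISION_API_KEY", "VISION_BASE_URL", "VISION_MODEL_ID")


def build_client_config(env: Mapping[str, str]) -> dict:
    """Build the stdio configuration shown above from environment values."""
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    server_env = {"UV_DEFAULT_INDEX": "https://pypi.org/simple"}
    server_env.update({key: env[key] for key in REQUIRED_KEYS})
    return {
        "mcpServers": {
            "agent-vision": {
                "command": "uvx",
                "args": ["agent-vision-mcp"],
                "env": server_env,
            }
        }
    }


def render_client_config(env: Mapping[str, str]) -> str:
    """Serialize the configuration as indented JSON."""
    return json.dumps(build_client_config(env), indent=2)
```

Calling `render_client_config(os.environ)` after exporting the three VISION_* variables yields JSON you can paste into a client's configuration without storing the key in the repository.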

Tools

| Tool | Purpose |
| --- | --- |
| `vision_analyze` | Analyze an image with task-specific prompts |
| `vision_inspect` | Read image dimensions, format, size, and mode |
| `vision_crop_analyze` | Crop and analyze a normalized image region |
| `vision_extract_text` | Extract visible text using OCR or the VLM |
| `vision_compare` | Compare two to four images |
| `vision_capabilities` | Show server configuration and limits |
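
Normalized regions, as used by `vision_crop_analyze`, express a crop as fractions of the image size so the same request works at any resolution. The sketch below shows how such a region could map to pixel edges. The `(x0, y0, x1, y1)` parameterization and the function name are assumptions for illustration, so check `vision_capabilities` or the tool schema for the real parameters.

```python
import math


def normalized_to_pixel_box(
    width: int, height: int, x0: float, y0: float, x1: float, y1: float
) -> tuple[int, int, int, int]:
    """Map a normalized box with corners in [0, 1] to (left, top, right, bottom) pixels."""
    for value in (x0, y0, x1, y1):
        if not 0.0 <= value <= 1.0:
            raise ValueError("normalized coordinates must be within [0, 1]")
    if x1 <= x0 or y1 <= y0:
        raise ValueError("box must have positive width and height")
    # Round outward so the crop never drops a partially covered pixel.
    left = math.floor(x0 * width)
    top = math.floor(y0 * height)
    right = math.ceil(x1 * width)
    bottom = math.ceil(y1 * height)
    return left, top, right, bottom


print(normalized_to_pixel_box(1920, 1080, 0.5, 0.25, 0.75, 0.5))
# (960, 270, 1440, 540)
```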

URL Handling

VISION_URL_MODE controls remote-image handling:

auto: passes URLs through for analysis and comparison, but downloads them when inspection, cropping, or OCR requires image bytes.
passthrough: prefers URL passthrough, except for tools that require bytes.
download: always downloads and verifies remote images before model calls.
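
The three modes can be read as a per-tool decision about whether to fetch the image first. The sketch below encodes that reading under stated assumptions: the set of byte-requiring tools is inferred from the descriptions above, and `auto` and `passthrough` are treated identically because the README does not spell out how they differ. `should_download` is a hypothetical name.

```python
# Tools assumed to need raw image bytes (inspection, cropping, OCR).
BYTE_TOOLS = frozenset({"vision_inspect", "vision_crop_analyze", "vision_extract_text"})
URL_MODES = frozenset({"auto", "passthrough", "download"})


def should_download(mode: str, tool: str) -> bool:
    """Decide whether a remote image is downloaded before the tool runs."""
    if mode not in URL_MODES:
        raise ValueError(f"unknown VISION_URL_MODE: {mode!r}")
    if mode == "download":
        return True
    # In auto and passthrough, only tools that need bytes force a download.
    return tool in BYTE_TOOLS


print(should_download("auto", "vision_analyze"))       # False
print(should_download("auto", "vision_inspect"))       # True
print(should_download("download", "vision_compare"))   # True
```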

Downloads are streamed with byte limits, redirects are security-checked, and downloaded or encoded inputs are verified as supported images.
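
To make the byte limit and image verification concrete, here is a minimal sketch that reads a stream in chunks, aborts once a limit is exceeded, and then checks the leading bytes against known image signatures. It uses an in-memory stream instead of a network response. The names `read_limited` and `sniff_image_format`, the chunk size, and the format list are assumptions rather than the server's implementation.

```python
import io
from typing import BinaryIO

# Leading bytes for a few formats; a real server may support more.
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
}


def read_limited(stream: BinaryIO, max_bytes: int, chunk_size: int = 64 * 1024) -> bytes:
    """Read the stream chunk by chunk, failing as soon as max_bytes is exceeded."""
    buffer = bytearray()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return bytes(buffer)
        buffer.extend(chunk)
        if len(buffer) > max_bytes:
            raise ValueError("image exceeds the configured byte limit")


def sniff_image_format(data: bytes) -> str:
    """Return the format name for recognized image bytes, else raise ValueError."""
    for signature, name in SIGNATURES.items():
        if data.startswith(signature):
            return name
    raise ValueError("payload is not a supported image")


payload = read_limited(io.BytesIO(b"\x89PNG\r\n\x1a\n" + b"\x00" * 32), max_bytes=1024)
print(sniff_image_format(payload))  # png
```

Checking the limit while reading, instead of trusting a Content-Length header, keeps memory use bounded even when the remote server misreports the size.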

Troubleshooting

If Claude Code cannot find the PyPI package, run uvx directly against the official index with its cache refreshed:

UV_DEFAULT_INDEX=https://pypi.org/simple uvx --refresh agent-vision-mcp

If the MCP server does not connect, inspect the registered configuration and run the server directly to see any startup errors:

claude mcp get agent-vision
uvx agent-vision-mcp

If you need to change the Claude Code configuration, remove the existing entry first:

claude mcp remove --scope user agent-vision

Then add it again with the updated values.

Development

git clone https://github.com/idealizing/agent-vision-mcp.git
cd agent-vision-mcp
python -m venv .venv
.venv/bin/pip install -e ".[dev]"
cp .env.example .env
.venv/bin/python -m unittest discover -s tests -v

License

MIT
