vision-mcp

vision-mcp

A Python-based MCP server that adds image analysis capabilities to text-only LLMs via a single analyze_image tool, supporting local files, URLs, auto-scaling, and multiple OpenAI-compatible APIs.

Category
Visit Server

README

Vision MCP Server

Python 实现的 MCP (Model Context Protocol) 图片分析服务器 —— 为纯文本大模型提供视觉能力。

参考自 Markusbetter/vision-mcp-server(Node.js / TypeScript),在其基础上增加了大图自动缩放、多 API 提供商支持等功能。

功能特点

  • 单一工具:analyze_image — 分析图片内容并提供详细描述
  • 支持本地图片文件远程 HTTP(S) URL
  • 自动缩放:超过 2048×2048 像素的图片会自动等比缩放到长边 2047px 再发送,避免超大图直接报错
  • 多提供商支持:兼容任意 OpenAI 兼容 API,通过环境变量切换——OpenAI、Azure、vLLM、Ollama、ModelScope 等均可
  • 全部通过环境变量配置,无需配置文件

安装

uv(推荐)

# 克隆仓库
git clone https://github.com/Jian-1197/vision-mcp.git
cd vision-mcp

# 安装(editable 模式,修改源码即时生效)
uv tool install -e .

安装完成后 vision-mcp 命令即全局可用。

需要 Python ≥ 3.10。

Conda

conda create -n mcp python=3.10 -y
conda activate mcp
pip install -e /path/to/vision-mcp

venv / pip

python -m venv .venv
source .venv/bin/activate       # Linux/macOS
.venv\Scripts\activate          # Windows
pip install -e /path/to/vision-mcp

环境变量配置

变量 必填 说明
VISION_BASE_URL API 基础地址
VISION_API_KEY API 密钥
VISION_MODEL 模型名

MCP 客户端配置

本服务通过 stdio 传输,运行命令为 vision-mcp

以 Claude Desktop 为例:

{
  "mcpServers": {
    "vision": {
      "command": "vision-mcp",
      "env": {
        "VISION_BASE_URL": "https://api.openai.com/v1",
        "VISION_API_KEY": "sk-your-key-here",
        "VISION_MODEL": "gpt-4o"
      }
    }
  }
}

若使用 Reasonix(1.x 配置文件路径为 ~\AppData\Roaming\reasonix\config.toml,可在软件中查看,或直接在 MCP 界面添加运行命令及环境变量):

[[plugins]]
name    = "vision"
type    = "stdio"
command = "vision-mcp"
env     = { VISION_BASE_URL = "https://api.openai.com/v1", VISION_API_KEY = "sk-your-key-here", VISION_MODEL = "gpt-4o" }

免费服务示例

智谱 GLM-4.6V-Flash

智谱提供的免费视觉模型,128K 上下文,支持图片、视频、文件理解。文档

API Key 获取:访问 智谱开放平台 → 注册 → API Keys

{
  "mcpServers": {
    "vision": {
      "command": "vision-mcp",
      "env": {
        "VISION_BASE_URL": "https://open.bigmodel.cn/api/paas/v4",
        "VISION_API_KEY": "你的智谱APIKey",
        "VISION_MODEL": "glm-4.6v-flash"
      }
    }
  }
}

魔搭社区 ModelScope

ModelScope 提供免费视觉模型调用额度,每日 2k 次。文档

API Token 获取:访问 ModelScope → 个人中心 → API 令牌

{
  "mcpServers": {
    "vision": {
      "command": "vision-mcp",
      "env": {
        "VISION_BASE_URL": "https://api-inference.modelscope.cn/v1",
        "VISION_API_KEY": "你的ModelScopeToken",
        "VISION_MODEL": "Qwen/Qwen3-VL-30B-A3B-Instruct"
      }
    }
  }
}

工作原理

  1. 接收图片(本地路径 → 读取文件;URL → HTTP 下载)
  2. 若图片超过 2048px,用 Pillow 等比缩放
  3. 编码为 base64 data URI
  4. 发送到 {VISION_BASE_URL}/chat/completions(OpenAI 兼容格式)
  5. 返回模型的文字描述

许可证

MIT

Recommended Servers

playwright-mcp

playwright-mcp

A Model Context Protocol server that enables LLMs to interact with web pages through structured accessibility snapshots without requiring vision models or screenshots.

Official
Featured
TypeScript
Magic Component Platform (MCP)

Magic Component Platform (MCP)

An AI-powered tool that generates modern UI components from natural language descriptions, integrating with popular IDEs to streamline UI development workflow.

Official
Featured
Local
TypeScript
Audiense Insights MCP Server

Audiense Insights MCP Server

Enables interaction with Audiense Insights accounts via the Model Context Protocol, facilitating the extraction and analysis of marketing insights and audience data including demographics, behavior, and influencer engagement.

Official
Featured
Local
TypeScript
VeyraX MCP

VeyraX MCP

Single MCP tool to connect all your favorite tools: Gmail, Calendar and 40 more.

Official
Featured
Local
graphlit-mcp-server

graphlit-mcp-server

The Model Context Protocol (MCP) Server enables integration between MCP clients and the Graphlit service. Ingest anything from Slack to Gmail to podcast feeds, in addition to web crawling, into a Graphlit project - and then retrieve relevant contents from the MCP client.

Official
Featured
TypeScript
Kagi MCP Server

Kagi MCP Server

An MCP server that integrates Kagi search capabilities with Claude AI, enabling Claude to perform real-time web searches when answering questions that require up-to-date information.

Official
Featured
Python
E2B

E2B

Using MCP to run code via e2b.

Official
Featured
Neon Database

Neon Database

MCP server for interacting with Neon Management API and databases

Official
Featured
Exa Search

Exa Search

A Model Context Protocol (MCP) server lets AI assistants like Claude use the Exa AI Search API for web searches. This setup allows AI models to get real-time web information in a safe and controlled way.

Official
Featured
Qdrant Server

Qdrant Server

This repository is an example of how to create a MCP server for Qdrant, a vector search engine.

Official
Featured