vision-mcp
A Python-based MCP server that adds image analysis capabilities to text-only LLMs via a single analyze_image tool, supporting local files, URLs, auto-scaling, and multiple OpenAI-compatible APIs.
README
Vision MCP Server
Python 实现的 MCP (Model Context Protocol) 图片分析服务器 —— 为纯文本大模型提供视觉能力。
参考自 Markusbetter/vision-mcp-server(Node.js / TypeScript),在其基础上增加了大图自动缩放、多 API 提供商支持等功能。
功能特点
- 单一工具:
analyze_image— 分析图片内容并提供详细描述 - 支持本地图片文件和远程 HTTP(S) URL
- 自动缩放:超过 2048×2048 像素的图片会自动等比缩放到长边 2047px 再发送,避免超大图直接报错
- 多提供商支持:兼容任意 OpenAI 兼容 API,通过环境变量切换——OpenAI、Azure、vLLM、Ollama、ModelScope 等均可
- 全部通过环境变量配置,无需配置文件
安装
uv(推荐)
# 克隆仓库
git clone https://github.com/Jian-1197/vision-mcp.git
cd vision-mcp
# 安装(editable 模式,修改源码即时生效)
uv tool install -e .
安装完成后 vision-mcp 命令即全局可用。
需要 Python ≥ 3.10。
Conda
conda create -n mcp python=3.10 -y
conda activate mcp
pip install -e /path/to/vision-mcp
venv / pip
python -m venv .venv
source .venv/bin/activate # Linux/macOS
.venv\Scripts\activate # Windows
pip install -e /path/to/vision-mcp
环境变量配置
| 变量 | 必填 | 说明 |
|---|---|---|
VISION_BASE_URL |
✅ | API 基础地址 |
VISION_API_KEY |
✅ | API 密钥 |
VISION_MODEL |
✅ | 模型名 |
MCP 客户端配置
本服务通过 stdio 传输,运行命令为 vision-mcp。
以 Claude Desktop 为例:
{
"mcpServers": {
"vision": {
"command": "vision-mcp",
"env": {
"VISION_BASE_URL": "https://api.openai.com/v1",
"VISION_API_KEY": "sk-your-key-here",
"VISION_MODEL": "gpt-4o"
}
}
}
}
若使用 Reasonix(1.x 配置文件路径为 ~\AppData\Roaming\reasonix\config.toml,可在软件中查看,或直接在 MCP 界面添加运行命令及环境变量):
[[plugins]]
name = "vision"
type = "stdio"
command = "vision-mcp"
env = { VISION_BASE_URL = "https://api.openai.com/v1", VISION_API_KEY = "sk-your-key-here", VISION_MODEL = "gpt-4o" }
免费服务示例
智谱 GLM-4.6V-Flash
智谱提供的免费视觉模型,128K 上下文,支持图片、视频、文件理解。文档
API Key 获取:访问 智谱开放平台 → 注册 → API Keys
{
"mcpServers": {
"vision": {
"command": "vision-mcp",
"env": {
"VISION_BASE_URL": "https://open.bigmodel.cn/api/paas/v4",
"VISION_API_KEY": "你的智谱APIKey",
"VISION_MODEL": "glm-4.6v-flash"
}
}
}
}
魔搭社区 ModelScope
ModelScope 提供免费视觉模型调用额度,每日 2k 次。文档
API Token 获取:访问 ModelScope → 个人中心 → API 令牌
{
"mcpServers": {
"vision": {
"command": "vision-mcp",
"env": {
"VISION_BASE_URL": "https://api-inference.modelscope.cn/v1",
"VISION_API_KEY": "你的ModelScopeToken",
"VISION_MODEL": "Qwen/Qwen3-VL-30B-A3B-Instruct"
}
}
}
}
工作原理
- 接收图片(本地路径 → 读取文件;URL → HTTP 下载)
- 若图片超过 2048px,用 Pillow 等比缩放
- 编码为 base64 data URI
- 发送到
{VISION_BASE_URL}/chat/completions(OpenAI 兼容格式) - 返回模型的文字描述
许可证
MIT
Recommended Servers
playwright-mcp
A Model Context Protocol server that enables LLMs to interact with web pages through structured accessibility snapshots without requiring vision models or screenshots.
Magic Component Platform (MCP)
An AI-powered tool that generates modern UI components from natural language descriptions, integrating with popular IDEs to streamline UI development workflow.
Audiense Insights MCP Server
Enables interaction with Audiense Insights accounts via the Model Context Protocol, facilitating the extraction and analysis of marketing insights and audience data including demographics, behavior, and influencer engagement.
VeyraX MCP
Single MCP tool to connect all your favorite tools: Gmail, Calendar and 40 more.
graphlit-mcp-server
The Model Context Protocol (MCP) Server enables integration between MCP clients and the Graphlit service. Ingest anything from Slack to Gmail to podcast feeds, in addition to web crawling, into a Graphlit project - and then retrieve relevant contents from the MCP client.
Kagi MCP Server
An MCP server that integrates Kagi search capabilities with Claude AI, enabling Claude to perform real-time web searches when answering questions that require up-to-date information.
E2B
Using MCP to run code via e2b.
Neon Database
MCP server for interacting with Neon Management API and databases
Exa Search
A Model Context Protocol (MCP) server lets AI assistants like Claude use the Exa AI Search API for web searches. This setup allows AI models to get real-time web information in a safe and controlled way.
Qdrant Server
This repository is an example of how to create a MCP server for Qdrant, a vector search engine.