vision_mcp
Enables LLMs like DeepSeek to understand images by calling external vision models via OpenAI-compatible API. Provides tools to describe images or diagnose connectivity.
README
Vision MCP Server
为缺少多模态能力的 LLM(如 DeepSeek)提供图片理解能力。通过 OpenAI-compatible API 将图片转发至视觉模型,在 Claude Code 中以 MCP 工具形式暴露 describe_image 工具。
工作原理
User: "看看这张截图"
→ Claude (DeepSeek, 无视觉)
→ 调用 describe_image 工具
→ MCP Server: 读取图片 → Base64 编码 → 请求视觉模型 API
← 返回文字描述
→ Claude 基于描述回答
环境要求
- Python 3.10+
- 任一提供
/chat/completions的视觉模型 API(SiliconFlow、vLLM、Ollama 等)
安装
cd vision-mcp
pip install -e . # 可编辑模式,推荐
vision-mcp --help # 验证安装
配置
1. 准备环境变量
项目提供 .env.example 作为模板,复制后填入实际值:
cp .env.example .env
| 变量 | 必填 | 默认值 | 说明 |
|---|---|---|---|
VISION_API_BASE |
是 | http://localhost:8000/v1 |
API 地址,不含 /chat/completions 后缀 |
VISION_API_KEY |
按需 | not-needed |
API 密钥(本地部署可留空) |
VISION_MODEL |
是 | qwen-vl-plus |
模型名称 |
VISION_MAX_TOKENS |
否 | 2000 |
单次响应最大 token |
2. Provider 配置参考
SiliconFlow
VISION_API_BASE=https://api.siliconflow.cn/v1
VISION_API_KEY=sk-your-key-here
VISION_MODEL=Qwen/Qwen3.6-35B-A3B
本地 vLLM
VISION_API_BASE=http://10.0.0.5:8000/v1
VISION_API_KEY=not-needed
VISION_MODEL=Qwen3-VL-32B-Instruct
本地 Ollama
VISION_API_BASE=http://localhost:11434/v1
VISION_API_KEY=not-needed
VISION_MODEL=llava:13b
One-API 网关(代理 GPT-4o)
VISION_API_BASE=https://your-gateway.com/v1
VISION_API_KEY=sk-your-gateway-key
VISION_MODEL=gpt-4o
3. 注册到 Claude Code
claude mcp add-json -s user vision '{
"command": "vision-mcp",
"args": [],
"env": {
"VISION_API_BASE": "https://api.siliconflow.cn/v1",
"VISION_API_KEY": "sk-your-key-here",
"VISION_MODEL": "Qwen/Qwen3.6-35B-A3B"
}
}'
-s user注册为全局可用。-s local仅当前项目可用,或使用.mcp.json在团队内共享(注意 API Key 会暴露)。
4. 验证
claude mcp list
# 应显示: vision: vision-mcp - ✓ Connected
重启 Claude Code 后生效。
更新配置
claude mcp remove vision -s user
claude mcp add-json -s user vision '{ ... }'
工具
describe_image
| 参数 | 必填 | 说明 |
|---|---|---|
image_path |
是 | 图片本地绝对路径 |
prompt |
否 | 指定描述侧重点,如 "提取所有文字"、"描述图表数据趋势" |
支持的格式:PNG / JPG / JPEG / GIF / WebP / BMP,单文件不超过 20 MB。
vision_ping
诊断工具,返回服务状态,用于排查 MCP 通信是否正常。
示例
User: 看看 @error_screenshot.png 里的报错信息
User: 分析 @architecture.png 的系统设计有什么问题
User: 把 @data_table.png 转成 markdown 表格
常见问题
Failed to connect?
- 确认使用
pip install -e .安装,否则可能报ModuleNotFoundError - 确认
vision-mcp --help可正常执行 - 检查 API 连通性:
curl $VISION_API_BASE/models - 直接运行
vision-mcp查看 stderr 错误信息
返回乱码或空内容?
- 将图片转为 PNG 格式
- 缩小图片尺寸(Base64 编码后体积增大约 33%,可能超 API 限制)
- 更换视觉模型
如何切换模型?
修改配置中的 VISION_MODEL,然后重新注册:
claude mcp remove vision -s user
claude mcp add-json -s user vision '{ ... }'
图片数据会留存吗?
图片经 Base64 编码后通过 HTTPS 发送至配置的 API 服务端,MCP Server 不做本地存储或缓存。使用公网 API 时注意不要传入敏感图片。
Recommended Servers
playwright-mcp
A Model Context Protocol server that enables LLMs to interact with web pages through structured accessibility snapshots without requiring vision models or screenshots.
Magic Component Platform (MCP)
An AI-powered tool that generates modern UI components from natural language descriptions, integrating with popular IDEs to streamline UI development workflow.
Audiense Insights MCP Server
Enables interaction with Audiense Insights accounts via the Model Context Protocol, facilitating the extraction and analysis of marketing insights and audience data including demographics, behavior, and influencer engagement.
VeyraX MCP
Single MCP tool to connect all your favorite tools: Gmail, Calendar and 40 more.
graphlit-mcp-server
The Model Context Protocol (MCP) Server enables integration between MCP clients and the Graphlit service. Ingest anything from Slack to Gmail to podcast feeds, in addition to web crawling, into a Graphlit project - and then retrieve relevant contents from the MCP client.
Kagi MCP Server
An MCP server that integrates Kagi search capabilities with Claude AI, enabling Claude to perform real-time web searches when answering questions that require up-to-date information.
E2B
Using MCP to run code via e2b.
Neon Database
MCP server for interacting with Neon Management API and databases
Exa Search
A Model Context Protocol (MCP) server lets AI assistants like Claude use the Exa AI Search API for web searches. This setup allows AI models to get real-time web information in a safe and controlled way.
Qdrant Server
This repository is an example of how to create a MCP server for Qdrant, a vector search engine.