Vision MCP Server
Provides image understanding capabilities for MCP clients (e.g., Claude Code) by analyzing images using vision models from providers like Alibaba Cloud Bailian, OpenAI, or OpenRouter, returning detailed descriptions in Markdown format.
README
Vision MCP Server
为 MCP 客户端(Claude Code 等)提供图片理解能力,通过阿里云百炼/OpenAI/OpenRouter 等视觉模型分析图片内容,返回面向软件开发的描述。
快速开始
pip install -e .
python -m vision_mcp_server
环境变量
必选
| 变量 | 说明 |
|---|---|
DASHSCOPE_API_KEY |
百炼 API Key(默认 Provider) |
Provider 切换
| 变量 | 默认值 | 说明 |
|---|---|---|
VISION_PROVIDER |
bailian |
Provider 名称:bailian / openai / openrouter |
VISION_BASE_URL |
按 Provider | 覆盖 API 端点地址 |
VISION_MODEL |
按 Provider | 覆盖模型名称 |
VISION_API_KEY |
按 Provider | 覆盖 API Key |
VISION_MAX_TOKENS |
600 (quick) / 1500 (detailed) |
最大输出 token 数 |
各 Provider 默认值
| Provider | 模型 | 地址 |
|---|---|---|
bailian |
qwen-vl-max |
https://dashscope.aliyuncs.com/compatible-mode/v1 |
openai |
gpt-4o-mini |
https://api.openai.com/v1 |
openrouter |
openai/gpt-4o |
https://openrouter.ai/api/v1 |
Tool: image_understand
image_understand(image_path: str, prompt: str | None = None, mode: str = "quick") -> dict
参数
| 参数 | 类型 | 默认 | 说明 |
|---|---|---|---|
image_path |
string | 必填 | 本地图片路径(PNG/JPG/GIF/WebP)或 HTTP URL |
prompt |
string | None |
自定义提问,不传则自动选择提示词 |
mode |
string | "quick" |
"quick" 精简快速(5-10s)/ "detailed" 七维度详细分析 |
返回
{
"description": "图片内容描述(Markdown 格式)",
"model": "qwen-vl-max",
"status": "success"
}
两种模式
| 模式 | 耗时 | 输出 | 适用场景 |
|---|---|---|---|
quick |
5-10s | 3-4 要点 | 日常识图、快速了解 |
detailed |
15-30s | 七维度分析 | UI 还原、设计评审、图表提取 |
detailed 模式的七个分析维度
- UI 布局 — 整体结构、区块位置比例
- 组件结构 — 按钮/表单/表格的层次嵌套
- 页面层级 — 信息层级关系
- 配色风格 — 主色调、设计风格、明暗模式
- OCR 文字 — 所有可见文字及位置
- 图表信息 — 图表类型、数据维度、关键数值
- 前端实现特征 — CSS 框架、响应式、动画、图标库
Claude Code 配置
项目根目录创建 .mcp.json:
{
"mcpServers": {
"vision": {
"command": "python",
"args": ["-m", "vision_mcp_server"],
"cwd": "E:/MCP",
"env": {
"DASHSCOPE_API_KEY": "sk-xxx"
}
}
}
}
安装后 /mcp → Reconnect 生效。
项目结构
src/vision_mcp_server/
├── __init__.py
├── __main__.py # 入口
├── server.py # FastMCP + image_understand tool
├── vision.py # 多 Provider 视觉客户端
└── image_utils.py # 图片路径检测 + Base64 编码
Recommended Servers
playwright-mcp
A Model Context Protocol server that enables LLMs to interact with web pages through structured accessibility snapshots without requiring vision models or screenshots.
Magic Component Platform (MCP)
An AI-powered tool that generates modern UI components from natural language descriptions, integrating with popular IDEs to streamline UI development workflow.
Audiense Insights MCP Server
Enables interaction with Audiense Insights accounts via the Model Context Protocol, facilitating the extraction and analysis of marketing insights and audience data including demographics, behavior, and influencer engagement.
VeyraX MCP
Single MCP tool to connect all your favorite tools: Gmail, Calendar and 40 more.
graphlit-mcp-server
The Model Context Protocol (MCP) Server enables integration between MCP clients and the Graphlit service. Ingest anything from Slack to Gmail to podcast feeds, in addition to web crawling, into a Graphlit project - and then retrieve relevant contents from the MCP client.
Kagi MCP Server
An MCP server that integrates Kagi search capabilities with Claude AI, enabling Claude to perform real-time web searches when answering questions that require up-to-date information.
E2B
Using MCP to run code via e2b.
Neon Database
MCP server for interacting with Neon Management API and databases
Exa Search
A Model Context Protocol (MCP) server lets AI assistants like Claude use the Exa AI Search API for web searches. This setup allows AI models to get real-time web information in a safe and controlled way.
Qdrant Server
This repository is an example of how to create a MCP server for Qdrant, a vector search engine.