Vision MCP Server

Vision MCP Server

Provides image understanding capabilities for MCP clients (e.g., Claude Code) by analyzing images using vision models from providers like Alibaba Cloud Bailian, OpenAI, or OpenRouter, returning detailed descriptions in Markdown format.

Category
Visit Server

README

Vision MCP Server

为 MCP 客户端(Claude Code 等)提供图片理解能力,通过阿里云百炼/OpenAI/OpenRouter 等视觉模型分析图片内容,返回面向软件开发的描述。

快速开始

pip install -e .
python -m vision_mcp_server

环境变量

必选

变量 说明
DASHSCOPE_API_KEY 百炼 API Key(默认 Provider)

Provider 切换

变量 默认值 说明
VISION_PROVIDER bailian Provider 名称:bailian / openai / openrouter
VISION_BASE_URL 按 Provider 覆盖 API 端点地址
VISION_MODEL 按 Provider 覆盖模型名称
VISION_API_KEY 按 Provider 覆盖 API Key
VISION_MAX_TOKENS 600 (quick) / 1500 (detailed) 最大输出 token 数

各 Provider 默认值

Provider 模型 地址
bailian qwen-vl-max https://dashscope.aliyuncs.com/compatible-mode/v1
openai gpt-4o-mini https://api.openai.com/v1
openrouter openai/gpt-4o https://openrouter.ai/api/v1

Tool: image_understand

image_understand(image_path: str, prompt: str | None = None, mode: str = "quick") -> dict

参数

参数 类型 默认 说明
image_path string 必填 本地图片路径(PNG/JPG/GIF/WebP)或 HTTP URL
prompt string None 自定义提问,不传则自动选择提示词
mode string "quick" "quick" 精简快速(5-10s)/ "detailed" 七维度详细分析

返回

{
  "description": "图片内容描述(Markdown 格式)",
  "model": "qwen-vl-max",
  "status": "success"
}

两种模式

模式 耗时 输出 适用场景
quick 5-10s 3-4 要点 日常识图、快速了解
detailed 15-30s 七维度分析 UI 还原、设计评审、图表提取

detailed 模式的七个分析维度

  1. UI 布局 — 整体结构、区块位置比例
  2. 组件结构 — 按钮/表单/表格的层次嵌套
  3. 页面层级 — 信息层级关系
  4. 配色风格 — 主色调、设计风格、明暗模式
  5. OCR 文字 — 所有可见文字及位置
  6. 图表信息 — 图表类型、数据维度、关键数值
  7. 前端实现特征 — CSS 框架、响应式、动画、图标库

Claude Code 配置

项目根目录创建 .mcp.json

{
  "mcpServers": {
    "vision": {
      "command": "python",
      "args": ["-m", "vision_mcp_server"],
      "cwd": "E:/MCP",
      "env": {
        "DASHSCOPE_API_KEY": "sk-xxx"
      }
    }
  }
}

安装后 /mcp → Reconnect 生效。

项目结构

src/vision_mcp_server/
├── __init__.py
├── __main__.py       # 入口
├── server.py         # FastMCP + image_understand tool
├── vision.py         # 多 Provider 视觉客户端
└── image_utils.py    # 图片路径检测 + Base64 编码

Recommended Servers

playwright-mcp

playwright-mcp

A Model Context Protocol server that enables LLMs to interact with web pages through structured accessibility snapshots without requiring vision models or screenshots.

Official
Featured
TypeScript
Magic Component Platform (MCP)

Magic Component Platform (MCP)

An AI-powered tool that generates modern UI components from natural language descriptions, integrating with popular IDEs to streamline UI development workflow.

Official
Featured
Local
TypeScript
Audiense Insights MCP Server

Audiense Insights MCP Server

Enables interaction with Audiense Insights accounts via the Model Context Protocol, facilitating the extraction and analysis of marketing insights and audience data including demographics, behavior, and influencer engagement.

Official
Featured
Local
TypeScript
VeyraX MCP

VeyraX MCP

Single MCP tool to connect all your favorite tools: Gmail, Calendar and 40 more.

Official
Featured
Local
graphlit-mcp-server

graphlit-mcp-server

The Model Context Protocol (MCP) Server enables integration between MCP clients and the Graphlit service. Ingest anything from Slack to Gmail to podcast feeds, in addition to web crawling, into a Graphlit project - and then retrieve relevant contents from the MCP client.

Official
Featured
TypeScript
Kagi MCP Server

Kagi MCP Server

An MCP server that integrates Kagi search capabilities with Claude AI, enabling Claude to perform real-time web searches when answering questions that require up-to-date information.

Official
Featured
Python
E2B

E2B

Using MCP to run code via e2b.

Official
Featured
Neon Database

Neon Database

MCP server for interacting with Neon Management API and databases

Official
Featured
Exa Search

Exa Search

A Model Context Protocol (MCP) server lets AI assistants like Claude use the Exa AI Search API for web searches. This setup allows AI models to get real-time web information in a safe and controlled way.

Official
Featured
Qdrant Server

Qdrant Server

This repository is an example of how to create a MCP server for Qdrant, a vector search engine.

Official
Featured