MCP Servers

Vision MCP Server

Provides image understanding capabilities for MCP clients (e.g., Claude Code) by analyzing images using vision models from providers like Alibaba Cloud Bailian, OpenAI, or OpenRouter, returning detailed descriptions in Markdown format.

README

Vision MCP Server

为 MCP 客户端（Claude Code 等）提供图片理解能力，通过阿里云百炼/OpenAI/OpenRouter 等视觉模型分析图片内容，返回面向软件开发的描述。

快速开始

pip install -e .
python -m vision_mcp_server

环境变量

必选

变量	说明
`DASHSCOPE_API_KEY`	百炼 API Key（默认 Provider）

Provider 切换

变量	默认值	说明
`VISION_PROVIDER`	`bailian`	Provider 名称：`bailian` / `openai` / `openrouter`
`VISION_BASE_URL`	按 Provider	覆盖 API 端点地址
`VISION_MODEL`	按 Provider	覆盖模型名称
`VISION_API_KEY`	按 Provider	覆盖 API Key
`VISION_MAX_TOKENS`	`600` (quick) / `1500` (detailed)	最大输出 token 数

各 Provider 默认值

Provider	模型	地址
`bailian`	`qwen-vl-max`	`https://dashscope.aliyuncs.com/compatible-mode/v1`
`openai`	`gpt-4o-mini`	`https://api.openai.com/v1`
`openrouter`	`openai/gpt-4o`	`https://openrouter.ai/api/v1`

Tool: `image_understand`

image_understand(image_path: str, prompt: str | None = None, mode: str = "quick") -> dict

参数

参数	类型	默认	说明
`image_path`	string	必填	本地图片路径（PNG/JPG/GIF/WebP）或 HTTP URL
`prompt`	string	`None`	自定义提问，不传则自动选择提示词
`mode`	string	`"quick"`	`"quick"` 精简快速（5-10s）/ `"detailed"` 七维度详细分析

{
  "description": "图片内容描述（Markdown 格式）",
  "model": "qwen-vl-max",
  "status": "success"
}

两种模式

模式	耗时	输出	适用场景
`quick`	5-10s	3-4 要点	日常识图、快速了解
`detailed`	15-30s	七维度分析	UI 还原、设计评审、图表提取

detailed 模式的七个分析维度

UI 布局 — 整体结构、区块位置比例
组件结构 — 按钮/表单/表格的层次嵌套
页面层级 — 信息层级关系
配色风格 — 主色调、设计风格、明暗模式
OCR 文字 — 所有可见文字及位置
图表信息 — 图表类型、数据维度、关键数值
前端实现特征 — CSS 框架、响应式、动画、图标库

Claude Code 配置

项目根目录创建 .mcp.json：

{
  "mcpServers": {
    "vision": {
      "command": "python",
      "args": ["-m", "vision_mcp_server"],
      "cwd": "E:/MCP",
      "env": {
        "DASHSCOPE_API_KEY": "sk-xxx"
      }
    }
  }
}

安装后 /mcp → Reconnect 生效。

项目结构

src/vision_mcp_server/
├── __init__.py
├── __main__.py       # 入口
├── server.py         # FastMCP + image_understand tool
├── vision.py         # 多 Provider 视觉客户端
└── image_utils.py    # 图片路径检测 + Base64 编码

Recommended Servers

playwright-mcp

A Model Context Protocol server that enables LLMs to interact with web pages through structured accessibility snapshots without requiring vision models or screenshots.

Official

Featured

TypeScript

Magic Component Platform (MCP)

An AI-powered tool that generates modern UI components from natural language descriptions, integrating with popular IDEs to streamline UI development workflow.

Audiense Insights MCP Server

Enables interaction with Audiense Insights accounts via the Model Context Protocol, facilitating the extraction and analysis of marketing insights and audience data including demographics, behavior, and influencer engagement.

VeyraX MCP

Single MCP tool to connect all your favorite tools: Gmail, Calendar and 40 more.

Official

Featured

Local

graphlit-mcp-server

The Model Context Protocol (MCP) Server enables integration between MCP clients and the Graphlit service. Ingest anything from Slack to Gmail to podcast feeds, in addition to web crawling, into a Graphlit project - and then retrieve relevant contents from the MCP client.

Official

Featured

TypeScript

Kagi MCP Server

An MCP server that integrates Kagi search capabilities with Claude AI, enabling Claude to perform real-time web searches when answering questions that require up-to-date information.

Official

Featured

Python

E2B

Using MCP to run code via e2b.

Official

Featured

Neon Database

MCP server for interacting with Neon Management API and databases

Official

Featured

Exa Search

A Model Context Protocol (MCP) server lets AI assistants like Claude use the Exa AI Search API for web searches. This setup allows AI models to get real-time web information in a safe and controlled way.

Official

Featured