Vision Bridge MCP
A universal vision MCP server that enables Claude Code and Claude Desktop to describe images, extract text, and answer questions about images by converting visual content to text via multiple AI providers.
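A universal vision MCP server that supports multiple multimodal API formats. When the model behind Claude Code / Claude Desktop cannot process images directly, this MCP tool converts the image content into a text description, so the model can "see" the image indirectly.
Features
Provides 3 MCP tools:
- describe_image: describes the image in detail (objects, scene, text, people and their actions, colors, layout, and so on).
- extract_image_text: extracts all text in the image (OCR).
- ask_about_image: answers a specific question about the image.
Four image input methods are supported:
- image_url: image URL
- image_path: local image path
- image_base64: Base64-encoded image data
- image: a Messages API image content block, for example: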
{
"type": "image",
"source": {
"type": "base64",
"media_type": "image/png",
"data": "iVBORw0KGgo..."
}
}
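One way to handle the four input fields is to normalize them into a single internal shape before calling a provider. The sketch below is illustrative only: the field names come from the list above, while `pickImageInput`, the `ImageInput` type, and the rule that exactly one field must be set are assumptions, not the server's documented behavior.

```typescript
// Messages API image content block, as shown in the example above.
interface ImageBlock {
  type: "image";
  source: { type: "base64"; media_type: string; data: string };
}

// The four documented input fields; all optional on the wire.
interface ImageArgs {
  image_url?: string;
  image_path?: string;
  image_base64?: string;
  image?: ImageBlock;
}

// Hypothetical normalized form used only in this sketch.
type ImageInput =
  | { kind: "url"; url: string }
  | { kind: "path"; path: string }
  | { kind: "base64"; data: string; mediaType?: string };

function pickImageInput(args: ImageArgs): ImageInput {
  const candidates: ImageInput[] = [];
  if (args.image_url) candidates.push({ kind: "url", url: args.image_url });
  if (args.image_path) candidates.push({ kind: "path", path: args.image_path });
  if (args.image_base64) candidates.push({ kind: "base64", data: args.image_base64 });
  if (args.image) {
    candidates.push({
      kind: "base64",
      data: args.image.source.data,
      mediaType: args.image.source.media_type,
    });
  }
  // Assumption: exactly one input must be supplied per call.
  const [only] = candidates;
  if (candidates.length !== 1 || only === undefined) {
    throw new Error(`expected exactly one image input, got ${candidates.length}`);
  }
  return only;
}
```

Rejecting ambiguous calls up front keeps the provider-specific code from having to guess which input should win.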
Supported API formats (Provider)
| Provider | Description | Default endpoint |
|---|---|---|
| anthropic | Anthropic Messages API | /v1/messages |
| openai | OpenAI Chat Completions API | /chat/completions |
| gemini | Gemini Native generateContent API | /v1beta/models/{model}:generateContent |
Set the PROVIDER environment variable to switch between API formats.
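As a rough illustration of how PROVIDER, BASE_URL, MODEL, and the default endpoints in the table could combine into a request URL, here is a minimal sketch. The function name `resolveUrl`, the trailing-slash handling, and the `{model}` substitution are assumptions; the actual server may join these values differently.

```typescript
type Provider = "anthropic" | "openai" | "gemini";

// Default endpoint paths, copied from the table above.
const DEFAULT_ENDPOINTS: Record<Provider, string> = {
  anthropic: "/v1/messages",
  openai: "/chat/completions",
  gemini: "/v1beta/models/{model}:generateContent",
};

// Hypothetical helper: joins the base URL with an explicit or default endpoint.
function resolveUrl(
  provider: Provider,
  baseUrl: string,
  model: string,
  endpoint?: string
): string {
  const template = endpoint ?? DEFAULT_ENDPOINTS[provider];
  // Assumption: the {model} placeholder is filled with the MODEL value.
  const path = template.replace("{model}", encodeURIComponent(model));
  // Strip trailing slashes so the result never contains "//" at the join.
  return baseUrl.replace(/\/+$/, "") + path;
}
```

With the OpenAI example configuration shown later in this README (BASE_URL ending in /v1), this sketch would produce https://api.openai.com/v1/chat/completions.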
Quick start
Global install
npm install -g @shen866/vision-bridge-mcp
Local development
git clone https://github.com/shen866/vision-bridge-mcp.git
cd vision-bridge-mcp
npm install
npm run build
Configuring Claude Code
API_KEY, BASE_URL, and MODEL are required.
Anthropic Messages
{
"mcpServers": {
"vision_bridge": {
"command": "npx",
"args": ["-y", "@shen866/vision-bridge-mcp"],
"env": {
"API_KEY": "your-api-key",
"BASE_URL": "https://api.anthropic.com",
"MODEL": "claude-3-5-sonnet-20241022"
}
}
}
}
To use a local build instead, change command to node and set args to the absolute path of the built entry file:
{
"mcpServers": {
"vision_bridge": {
"command": "node",
"args": ["/Users/shen/workspace/kimi-vision-mcp/dist/index.js"],
"env": {
"API_KEY": "your-api-key",
"BASE_URL": "https://api.anthropic.com",
"MODEL": "claude-3-5-sonnet-20241022"
}
}
}
}
OpenAI Chat Completions
{
"mcpServers": {
"vision_bridge": {
"command": "npx",
"args": ["-y", "@shen866/vision-bridge-mcp"],
"env": {
"PROVIDER": "openai",
"API_KEY": "your-api-key",
"BASE_URL": "https://api.openai.com/v1",
"MODEL": "gpt-4o"
}
}
}
}
Gemini Native generateContent
{
"mcpServers": {
"vision_bridge": {
"command": "npx",
"args": ["-y", "@shen866/vision-bridge-mcp"],
"env": {
"PROVIDER": "gemini",
"API_KEY": "your-api-key",
"BASE_URL": "https://generativelanguage.googleapis.com",
"MODEL": "gemini-1.5-pro-latest"
}
}
}
}
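Usage
After you send an image or reference an image path in Claude Code / Claude Desktop, Claude automatically calls describe_image or one of the other tools to obtain a text description from the multimodal model, then hands that description to the text-only model for further processing.
If Claude Code often passes images straight to the model and triggers errors, create .claude/CLAUDE.md in the project root with the following content:
The current model does not support image input. When the user sends an image, you must process it with the vision_bridge MCP tools; do not pass the image directly to the model.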
Environment variables
| Variable | Description | Default |
|---|---|---|
| PROVIDER | API format: anthropic, openai, or gemini | anthropic |
| API_KEY | API key | required |
| BASE_URL | API base URL | required |
| ENDPOINT | API endpoint path | depends on provider |
| MODEL | Model name | required |
| MAX_TOKENS | Maximum number of output tokens | 4096 |
| API_VERSION | Messages API version | 2023-06-01 |
| AUTH_HEADER | Messages API authentication method: x-api-key or bearer | x-api-key |
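The table above can be read as a small configuration schema. The following sketch shows one way to load it with the listed defaults and required fields. `loadConfig` and `Config` are hypothetical names, and the validation beyond "required" and the listed defaults (for example rejecting a non-positive MAX_TOKENS) is an assumption rather than the server's documented behavior.

```typescript
interface Config {
  provider: "anthropic" | "openai" | "gemini";
  apiKey: string;
  baseUrl: string;
  model: string;
  endpoint?: string; // falls back to the provider default when unset
  maxTokens: number;
  apiVersion: string;
  authHeader: "x-api-key" | "bearer";
}

// Takes the environment as a plain record so it can be tested without a real process.
function loadConfig(env: Record<string, string | undefined>): Config {
  const required = (name: string): string => {
    const value = env[name];
    if (!value) throw new Error(`${name} is required`);
    return value;
  };

  const provider = env["PROVIDER"] ?? "anthropic";
  if (provider !== "anthropic" && provider !== "openai" && provider !== "gemini") {
    throw new Error(`unsupported PROVIDER: ${provider}`);
  }

  const authHeader = env["AUTH_HEADER"] ?? "x-api-key";
  if (authHeader !== "x-api-key" && authHeader !== "bearer") {
    throw new Error(`unsupported AUTH_HEADER: ${authHeader}`);
  }

  const maxTokens = Number(env["MAX_TOKENS"] ?? "4096");
  if (!isFinite(maxTokens) || Math.floor(maxTokens) !== maxTokens || maxTokens <= 0) {
    throw new Error("MAX_TOKENS must be a positive integer");
  }

  return {
    provider,
    apiKey: required("API_KEY"),
    baseUrl: required("BASE_URL"),
    model: required("MODEL"),
    endpoint: env["ENDPOINT"],
    maxTokens,
    apiVersion: env["API_VERSION"] ?? "2023-06-01",
    authHeader,
  };
}
```

Failing fast on a missing API_KEY, BASE_URL, or MODEL surfaces configuration mistakes at startup instead of on the first tool call.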
Protocol
Built on MCP (Model Context Protocol) with JSON-RPC 2.0, communicating over stdio.
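For readers unfamiliar with the wire format, the sketch below builds the kind of JSON-RPC 2.0 request an MCP client sends to invoke a tool. The tools/call method with name and arguments is standard MCP; `encodeToolCall` is a hypothetical helper, and the image URL in the usage note is a placeholder.

```typescript
// Shape of an MCP tool invocation request (JSON-RPC 2.0).
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper: serializes one request as a single line for stdio.
function encodeToolCall(
  id: number,
  name: string,
  args: Record<string, unknown>
): string {
  const request: ToolCallRequest = {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name, arguments: args },
  };
  // The stdio transport is newline-delimited: one JSON message per line.
  // JSON.stringify without indentation never emits raw newlines.
  return JSON.stringify(request) + "\n";
}
```

For example, `encodeToolCall(1, "describe_image", { image_url: "https://example.com/cat.png" })` yields one line that a client could write to the server's stdin. In practice Claude Code and Claude Desktop send these messages for you.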
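Troubleshooting 401 errors
A 401 response usually means the authentication header is wrong. The default authentication method for each provider is:
- anthropic: x-api-key + anthropic-version
- openai: Authorization: Bearer
- gemini: URL query parameter ?key=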
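The per-provider defaults listed above, together with the AUTH_HEADER variable, can be summarized in code. This is a minimal sketch: `buildAuth` and `AuthParts` are hypothetical names, and sending anthropic-version alongside a Bearer token is an assumption about how a gateway would expect the request.

```typescript
type ProviderName = "anthropic" | "openai" | "gemini";

// Credentials end up either in headers or in the URL query string.
interface AuthParts {
  headers: Record<string, string>;
  query: Record<string, string>;
}

function buildAuth(
  provider: ProviderName,
  apiKey: string,
  authHeader: "x-api-key" | "bearer" = "x-api-key",
  apiVersion: string = "2023-06-01"
): AuthParts {
  if (provider === "anthropic") {
    // Assumption: anthropic-version is sent in both authentication modes.
    const headers: Record<string, string> = { "anthropic-version": apiVersion };
    if (authHeader === "bearer") {
      headers["Authorization"] = `Bearer ${apiKey}`;
    } else {
      headers["x-api-key"] = apiKey;
    }
    return { headers, query: {} };
  }
  if (provider === "openai") {
    return { headers: { Authorization: `Bearer ${apiKey}` }, query: {} };
  }
  // gemini: the key travels as the ?key= query parameter, not as a header.
  return { headers: {}, query: { key: apiKey } };
}
```

When debugging a 401, comparing the headers your gateway expects against this mapping is usually the quickest check.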
If your Messages API gateway requires a Bearer token, set:
"AUTH_HEADER": "bearer"