Vision Bridge MCP

A universal vision MCP server that enables Claude Code and Claude Desktop to describe images, extract text, and answer questions about images by converting visual content to text via multiple AI providers.

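A universal vision MCP server that supports multiple multimodal API formats. When the model behind Claude Code / Claude Desktop cannot process images directly, this MCP tool converts image content into a text description so that the model can "see" the image indirectly.

Features

Provides 3 MCP tools:

  • describe_image: describes the image in detail (objects, scene, text, people and their actions, colors, layout, and so on).
  • extract_image_text: extracts all text in the image (OCR).
  • ask_about_image: answers a specific question about the image.

Four image input methods are supported:

  • image_url: an image URL
  • image_path: a local image path
  • image_base64: Base64-encoded image data
  • image: an image content block in Messages API format, for example: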
{
  "type": "image",
  "source": {
    "type": "base64",
    "media_type": "image/png",
    "data": "iVBORw0KGgo..."
  }
}
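
The four input forms above could be normalized into one internal shape before a provider is called. The following is a minimal sketch, not the server's actual code: the type names, the function names, and the priority order among the fields are assumptions; only the field names image_url, image_path, image_base64, and image come from this README. The media type guess relies on the well-known Base64 prefixes of common image formats and is only a heuristic.

```typescript
// Hypothetical types; only the four field names come from the README.
type ImageBlock = {
  type: "image";
  source: { type: "base64"; media_type: string; data: string };
};

type ImageInput = {
  image_url?: string;
  image_path?: string;
  image_base64?: string;
  image?: ImageBlock;
};

type ResolvedImage =
  | { kind: "url"; url: string }
  | { kind: "path"; path: string }
  | { kind: "base64"; mediaType: string; data: string };

// Heuristic: file signatures map to fixed Base64 prefixes.
function guessMediaType(base64: string): string {
  if (base64.startsWith("iVBORw0KGgo")) return "image/png"; // PNG signature
  if (base64.startsWith("/9j/")) return "image/jpeg"; // FF D8 FF
  if (base64.startsWith("R0lGOD")) return "image/gif"; // "GIF8"
  if (base64.startsWith("UklGR")) return "image/webp"; // "RIFF" container
  return "application/octet-stream";
}

// Assumed priority: content block, then raw Base64, then URL, then path.
function resolveImageInput(input: ImageInput): ResolvedImage {
  if (input.image) {
    const { media_type, data } = input.image.source;
    return { kind: "base64", mediaType: media_type, data };
  }
  if (input.image_base64) {
    return {
      kind: "base64",
      mediaType: guessMediaType(input.image_base64),
      data: input.image_base64,
    };
  }
  if (input.image_url) return { kind: "url", url: input.image_url };
  if (input.image_path) return { kind: "path", path: input.image_path };
  throw new Error(
    "Provide one of image_url, image_path, image_base64, or image"
  );
}
```

A "path" result would still need the file to be read and Base64-encoded before it is sent to a provider; that step is left out here to keep the sketch free of I/O.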

Supported API formats (Provider)

Provider    Description                          Default Endpoint
anthropic   Anthropic Messages API               /v1/messages
openai      OpenAI Chat Completions API          /chat/completions
gemini      Gemini Native generateContent API    /v1beta/models/{model}:generateContent

Use the PROVIDER environment variable to switch the API format.
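
To illustrate how the default endpoints in the table combine with BASE_URL and MODEL, here is a small sketch. The function name buildRequestUrl and the trailing-slash handling are assumptions; the endpoint strings are taken from the table, and the optional endpoint argument stands in for the ENDPOINT environment variable.

```typescript
type Provider = "anthropic" | "openai" | "gemini";

// Default endpoint per provider, as listed in the table above.
const DEFAULT_ENDPOINTS: Record<Provider, string> = {
  anthropic: "/v1/messages",
  openai: "/chat/completions",
  gemini: "/v1beta/models/{model}:generateContent",
};

// Hypothetical helper: joins the base URL with the (possibly overridden)
// endpoint and fills in the {model} placeholder used by the Gemini path.
function buildRequestUrl(
  provider: Provider,
  baseUrl: string,
  model: string,
  endpoint?: string
): string {
  const path = (endpoint ?? DEFAULT_ENDPOINTS[provider]).replace(
    "{model}",
    encodeURIComponent(model)
  );
  // Drop trailing slashes so the result never contains "//" at the join.
  return baseUrl.replace(/\/+$/, "") + path;
}
```

With the sample configurations in this README, the OpenAI base URL already ends in /v1, which is why its default endpoint is only /chat/completions.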

Quick Start

Global installation

npm install -g @shen866/vision-bridge-mcp

Local development

git clone https://github.com/shen866/vision-bridge-mcp.git
cd vision-bridge-mcp
npm install
npm run build

Configuring Claude Code

API_KEY, BASE_URL, and MODEL are required.

Anthropic Messages

{
  "mcpServers": {
    "vision_bridge": {
      "command": "npx",
      "args": ["-y", "@shen866/vision-bridge-mcp"],
      "env": {
        "API_KEY": "your-api-key",
        "BASE_URL": "https://api.anthropic.com",
        "MODEL": "claude-3-5-sonnet-20241022"
      }
    }
  }
}

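If you use a locally built version, change command to node and set args to the absolute path of the built dist/index.js on your machine (the path below is only an example):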

{
  "mcpServers": {
    "vision_bridge": {
      "command": "node",
      "args": ["/Users/shen/workspace/kimi-vision-mcp/dist/index.js"],
      "env": {
        "API_KEY": "your-api-key",
        "BASE_URL": "https://api.anthropic.com",
        "MODEL": "claude-3-5-sonnet-20241022"
      }
    }
  }
}

OpenAI Chat Completions

{
  "mcpServers": {
    "vision_bridge": {
      "command": "npx",
      "args": ["-y", "@shen866/vision-bridge-mcp"],
      "env": {
        "PROVIDER": "openai",
        "API_KEY": "your-api-key",
        "BASE_URL": "https://api.openai.com/v1",
        "MODEL": "gpt-4o"
      }
    }
  }
}

Gemini Native generateContent

{
  "mcpServers": {
    "vision_bridge": {
      "command": "npx",
      "args": ["-y", "@shen866/vision-bridge-mcp"],
      "env": {
        "PROVIDER": "gemini",
        "API_KEY": "your-api-key",
        "BASE_URL": "https://generativelanguage.googleapis.com",
        "MODEL": "gemini-1.5-pro-latest"
      }
    }
  }
}

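Usage

After you send an image or reference an image path in Claude Code / Claude Desktop, Claude automatically calls describe_image or one of the other tools to obtain a text description of the image from the multimodal model, and then hands that description to the text-only model for further processing.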
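
For orientation, the sketch below shows how one image plus a prompt might be packed into a request body for each provider. The body shapes follow the public Anthropic Messages, OpenAI Chat Completions, and Gemini generateContent formats; the function name buildBody, the VisionRequest type, and the idea that the server builds requests this way are assumptions.

```typescript
type Provider = "anthropic" | "openai" | "gemini";

// Hypothetical request description used only by this sketch.
interface VisionRequest {
  model: string;
  maxTokens: number;
  prompt: string;
  mediaType: string; // e.g. "image/png"
  data: string; // Base64-encoded image bytes
}

function buildBody(
  provider: Provider,
  req: VisionRequest
): Record<string, unknown> {
  switch (provider) {
    case "anthropic":
      // Messages API: image content block followed by a text block.
      return {
        model: req.model,
        max_tokens: req.maxTokens,
        messages: [
          {
            role: "user",
            content: [
              {
                type: "image",
                source: {
                  type: "base64",
                  media_type: req.mediaType,
                  data: req.data,
                },
              },
              { type: "text", text: req.prompt },
            ],
          },
        ],
      };
    case "openai":
      // Chat Completions: the image travels as a data URL.
      return {
        model: req.model,
        max_tokens: req.maxTokens,
        messages: [
          {
            role: "user",
            content: [
              { type: "text", text: req.prompt },
              {
                type: "image_url",
                image_url: { url: `data:${req.mediaType};base64,${req.data}` },
              },
            ],
          },
        ],
      };
    case "gemini":
      // generateContent: the model name goes in the URL, not the body.
      return {
        contents: [
          {
            role: "user",
            parts: [
              { text: req.prompt },
              { inline_data: { mime_type: req.mediaType, data: req.data } },
            ],
          },
        ],
        generationConfig: { maxOutputTokens: req.maxTokens },
      };
  }
}
```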

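If Claude Code often sends images straight to the model and fails with an error, create .claude/CLAUDE.md in the project root with the following content:

The current model does not support image input. When the user sends an image, you must process it with the vision_bridge MCP tools; do not pass the image directly to the model.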

Environment variables

Variable      Description                                      Default
PROVIDER      API format: anthropic, openai, or gemini         anthropic
API_KEY       API key                                          required
BASE_URL      API base URL                                     required
ENDPOINT      API endpoint path                                depends on provider
MODEL         Model name                                       required
MAX_TOKENS    Maximum number of output tokens                  4096
API_VERSION   Messages API version                             2023-06-01
AUTH_HEADER   Messages API auth method: x-api-key or bearer    x-api-key
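
The table above can be read as a small validation routine: three variables are required, and the rest fall back to defaults. The following sketch shows one way to express that. The names loadConfig, Config, and oneOf are hypothetical, and the error messages are illustrative; the variable names and default values come from the table.

```typescript
type Provider = "anthropic" | "openai" | "gemini";
type AuthHeader = "x-api-key" | "bearer";

interface Config {
  provider: Provider;
  apiKey: string;
  baseUrl: string;
  model: string;
  endpoint: string | undefined; // undefined means "use the provider default"
  maxTokens: number;
  apiVersion: string;
  authHeader: AuthHeader;
}

type Env = Record<string, string | undefined>;

// Returns the matching allowed value, or throws if the value is not allowed.
function oneOf<T extends string>(
  name: string,
  value: string,
  allowed: readonly T[]
): T {
  const match = allowed.find((candidate) => candidate === value);
  if (match === undefined) {
    throw new Error(`${name} must be one of: ${allowed.join(", ")}`);
  }
  return match;
}

function loadConfig(env: Env): Config {
  const required = (name: string): string => {
    const value = env[name];
    if (!value) throw new Error(`Missing required variable: ${name}`);
    return value;
  };

  const maxTokens = Number(env["MAX_TOKENS"] ?? "4096");
  if (!Number.isInteger(maxTokens) || maxTokens <= 0) {
    throw new Error("MAX_TOKENS must be a positive integer");
  }

  return {
    provider: oneOf("PROVIDER", env["PROVIDER"] ?? "anthropic", [
      "anthropic",
      "openai",
      "gemini",
    ] as const),
    apiKey: required("API_KEY"),
    baseUrl: required("BASE_URL"),
    model: required("MODEL"),
    endpoint: env["ENDPOINT"],
    maxTokens,
    apiVersion: env["API_VERSION"] ?? "2023-06-01",
    authHeader: oneOf("AUTH_HEADER", env["AUTH_HEADER"] ?? "x-api-key", [
      "x-api-key",
      "bearer",
    ] as const),
  };
}
```

In a Node process this would typically be called as loadConfig(process.env); taking the environment as a parameter keeps the function easy to test.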

Protocol

Built on MCP (Model Context Protocol), which uses JSON-RPC 2.0, and communicates over stdio.
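
As an illustration of what travels over stdio, the sketch below builds a JSON-RPC 2.0 request for the MCP tools/call method and frames it as a single line, since the MCP stdio transport delimits messages with newlines. The helper names are hypothetical, and the example argument assumes that describe_image accepts image_path as a tool argument; the README lists the input forms but not the exact tool schema.

```typescript
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: Record<string, unknown>;
}

// Build an MCP "tools/call" request for the named tool.
function toolCall(
  id: number,
  name: string,
  args: Record<string, unknown>
): JsonRpcRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// One message per line: JSON.stringify without indentation never emits
// a raw newline, so appending "\n" is enough to frame the message.
function frame(message: JsonRpcRequest): string {
  return JSON.stringify(message) + "\n";
}

// Example path is a placeholder, not a file this README refers to.
const line = frame(
  toolCall(1, "describe_image", { image_path: "/tmp/example.png" })
);
```

A client would write such a line to the server's stdin and read the response with the same id from its stdout.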

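Troubleshooting 401 errors

A 401 response usually means the authentication header is wrong. The default authentication method for each provider is:

  • anthropic: x-api-key + anthropic-version
  • openai: Authorization: Bearer
  • gemini: URL query parameter ?key=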
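
The defaults listed above can be sketched as a function that returns the headers and query parameters to attach to a request. This is an illustration under assumptions: the names buildAuth, AuthOptions, and Auth are hypothetical, and sending anthropic-version together with a Bearer token is a guess about how AUTH_HEADER=bearer behaves, not something this README states.

```typescript
type Provider = "anthropic" | "openai" | "gemini";

interface AuthOptions {
  provider: Provider;
  apiKey: string;
  authHeader?: "x-api-key" | "bearer"; // mirrors AUTH_HEADER
  apiVersion?: string; // mirrors API_VERSION
}

interface Auth {
  headers: Record<string, string>;
  query: Record<string, string>;
}

function buildAuth(options: AuthOptions): Auth {
  const { provider, apiKey } = options;
  switch (provider) {
    case "anthropic": {
      const version = options.apiVersion ?? "2023-06-01";
      if (options.authHeader === "bearer") {
        // Assumption: gateways that want a Bearer token still accept
        // the anthropic-version header.
        return {
          headers: {
            Authorization: `Bearer ${apiKey}`,
            "anthropic-version": version,
          },
          query: {},
        };
      }
      return {
        headers: { "x-api-key": apiKey, "anthropic-version": version },
        query: {},
      };
    }
    case "openai":
      return { headers: { Authorization: `Bearer ${apiKey}` }, query: {} };
    case "gemini":
      // The key is passed as the ?key= URL query parameter.
      return { headers: {}, query: { key: apiKey } };
  }
}
```

When a 401 appears, comparing the headers actually sent with the output of a function like this is a quick way to spot a mismatch between the provider and the gateway's expectations.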

If your Messages API gateway requires a Bearer token, set:

"AUTH_HEADER": "bearer"
