MCP Vision Server

MCP Vision Server

Provides advanced image analysis capabilities including object recognition, OCR text extraction, and multi-turn visual dialogues using OpenAI-compatible APIs. It supports both local files and Base64 inputs with additional features for session persistence and web-based configuration management.

Category
Visit Server

README

MCP Vision Server - 图像识别 MCP 服务器

提供图像分析能力的 MCP 服务器,支持图像识别、文字提取、多轮对话等功能。

特性

  • 图像分析 - 支持各种图像内容识别与描述
  • 多轮对话 - 基于图像的连续问答
  • 灵活输入 - 支持本地文件路径和 Base64 编码
  • OpenAI 兼容 - 使用 OpenAI 兼容 API,支持多种视觉模型
  • 会话持久化 - 对话历史可持久化存储

安装

# 克隆仓库
git clone https://github.com/YOUR_USERNAME/mcp-vision-server.git
cd mcp-vision-server

# 创建虚拟环境
python -m venv venv
source venv/Scripts/activate  # Windows Git Bash

# 安装依赖
pip install -e .

配置

  1. 复制环境变量模板:
cp .env.example .env
  1. 编辑 .env 文件,填入您的 API 配置:
# 必填配置
VISION_API_KEY=your-api-key-here
VISION_BASE_URL=https://open.bigmodel.cn/api/paas/v4/
VISION_MODEL=glm-4v

使用方法

启动服务器

mcp-vision-server

或直接运行:

python -m mcp_vision.server

Web 配置工具

启动 Web 配置界面,支持热加载配置:

mcp-vision-config

或指定端口:

mcp-vision-config --host 127.0.0.1 --port 8080

访问 http://127.0.0.1:7860 即可打开配置界面。

功能特性

  • 📝 可视化编辑所有配置项
  • 🔄 保存后自动热加载,无需重启服务
  • 🔒 API Key 密码隐藏显示
  • 📋 实时查看当前运行配置

MCP 工具

1. analyze_image - 图像分析

分析图像内容并返回详细描述。

# 基础用法
analyze_image(
    image="C:/path/to/image.png",
    prompt="详细描述这张图片"
)

# OCR 文字提取
analyze_image(
    image="C:/docs/scan.png",
    prompt="提取图片中的所有文字"
)

# 代码识别
analyze_image(
    image="C:/code/snippet.png",
    prompt="识别并转录图片中的代码,保持格式"
)

2. chat_vision - 两轮对话

基于图像进行两轮问答。

# 第一轮对话
result1 = chat_vision(
    image="C:/chart.png",
    question="这个图表显示什么数据?"
)
session_id = result1["session_id"]
# remaining_turns = 1, can_continue = True

# 第二轮对话(追问细节,对话结束后无法继续)
if result1["remaining_turns"] > 0:
    result2 = chat_vision(
        image="C:/chart.png",
        question="数据有什么趋势?",
        session_id=session_id
    )
    # remaining_turns = 0, can_continue = False

# 开始新对话
result3 = chat_vision(
    image="C:/another.png",
    question="描述这张图",
    is_new_conversation=True
)

3. get_status - 状态查询

获取服务器运行状态。

status = get_status()
# 返回: 服务器名称、模型信息、会话状态等

输入格式

支持两种图像输入格式:

1. 本地文件路径

image="C:/Users/name/Pictures/screenshot.png"
image="/home/user/images/photo.jpg"

2. Base64 编码

# 纯 Base64
image="iVBORw0KGgoAAAANSUhEUgAA..."

# Data URL 格式
image="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAA..."

环境变量

变量名 说明 默认值
VISION_API_KEY API 密钥 -
VISION_BASE_URL API 基础 URL -
VISION_MODEL 模型名称 glm-4v
VISION_MAX_IMAGE_SIZE 最大图像大小(字节) 20971520 (20MB)
VISION_TIMEOUT 请求超时(秒) 120
VISION_TEMPERATURE 温度参数 0.7
VISION_MAX_TOKENS 最大输出 tokens 4096
VISION_LOG_LEVEL 日志级别 INFO
VISION_MAX_HISTORY 对话历史最大保存数 50
VISION_ENABLE_PERSISTENCE 启用持久化 true
VISION_HISTORY_PATH 历史文件路径 ~/.mcp-vision/history.json

支持的图像格式

  • PNG
  • JPEG / JPG
  • GIF
  • WebP
  • BMP
  • TIFF

项目结构

mcp-vision-server/
├── src/mcp_vision/
│   ├── __init__.py           # 包初始化
│   ├── server.py             # MCP 服务器主文件
│   ├── config.py             # 配置管理
│   ├── vision_client.py      # 视觉 API 客户端
│   ├── image_processor.py    # 图像处理
│   ├── chat_manager.py       # 对话管理器
│   ├── web_config.py         # Web 配置工具
│   └── utils.py              # 工具函数
├── tests/
├── .env.example
├── pyproject.toml
└── README.md

在 Claude Code 中配置

编辑 Claude Code 配置文件,添加 MCP 服务器:

{
  "mcpServers": {
    "vision": {
      "command": "mcp-vision-server",
      "env": {
        "VISION_API_KEY": "your-api-key",
        "VISION_BASE_URL": "https://open.bigmodel.cn/api/paas/v4/",
        "VISION_MODEL": "glm-4v"
      }
    }
  }
}

许可证

MIT License

Recommended Servers

playwright-mcp

playwright-mcp

A Model Context Protocol server that enables LLMs to interact with web pages through structured accessibility snapshots without requiring vision models or screenshots.

Official
Featured
TypeScript
Magic Component Platform (MCP)

Magic Component Platform (MCP)

An AI-powered tool that generates modern UI components from natural language descriptions, integrating with popular IDEs to streamline UI development workflow.

Official
Featured
Local
TypeScript
Audiense Insights MCP Server

Audiense Insights MCP Server

Enables interaction with Audiense Insights accounts via the Model Context Protocol, facilitating the extraction and analysis of marketing insights and audience data including demographics, behavior, and influencer engagement.

Official
Featured
Local
TypeScript
VeyraX MCP

VeyraX MCP

Single MCP tool to connect all your favorite tools: Gmail, Calendar and 40 more.

Official
Featured
Local
Kagi MCP Server

Kagi MCP Server

An MCP server that integrates Kagi search capabilities with Claude AI, enabling Claude to perform real-time web searches when answering questions that require up-to-date information.

Official
Featured
Python
graphlit-mcp-server

graphlit-mcp-server

The Model Context Protocol (MCP) Server enables integration between MCP clients and the Graphlit service. Ingest anything from Slack to Gmail to podcast feeds, in addition to web crawling, into a Graphlit project - and then retrieve relevant contents from the MCP client.

Official
Featured
TypeScript
E2B

E2B

Using MCP to run code via e2b.

Official
Featured
Neon Database

Neon Database

MCP server for interacting with Neon Management API and databases

Official
Featured
Exa Search

Exa Search

A Model Context Protocol (MCP) server lets AI assistants like Claude use the Exa AI Search API for web searches. This setup allows AI models to get real-time web information in a safe and controlled way.

Official
Featured
Qdrant Server

Qdrant Server

This repository is an example of how to create a MCP server for Qdrant, a vector search engine.

Official
Featured