Mimo Vision MCP

Mimo Vision MCP

An image recognition MCP server based on mimo-v2.5, enabling AI assistants to describe images, extract text via OCR, and output structured information in JSON or table format.

Category
Visit Server

README

Mimo Vision MCP

基于 mimo-v2.5 的图像识别 MCP 服务器,为 AI 助手提供"看图"能力。

Python 3.10+ MCP License: MIT


功能特性

单一工具 recognize_image,通过 mode 参数切换三种识别能力:

模式 用途 输出格式 典型场景
describe 通用图像描述 纯文本 场景理解、内容审核、图片摘要
ocr 文字识别 纯文本 截图提取、文档数字化、车牌/招牌识别
extract 结构化信息提取 JSON 发票解析、名片录入、表格数据化

extract 模式输出格式

通过 format 参数控制输出结构:

行为
auto(默认) 由模型自动判断最合适的结构
json 强制输出 JSON 对象
table 强制输出 Markdown 表格

快速开始

1. 安装

# 克隆仓库
git clone https://github.com/k0tori/mimo-vision-mcp.git
cd mimo-vision-mcp

# 安装(开发模式)
pip install -e .

2. 配置 API

cp .env.example .env

编辑 .env,填入你的 API 凭据:

MIMO_API_BASE_URL=https://your-api-endpoint/v1
MIMO_API_KEY=your-api-key

3. 注册 MCP 服务器

在你的 MCP 客户端配置中添加:

{
  "mcpServers": {
    "mimo-vision": {
      "command": "python",
      "args": ["-m", "mimo_vision_mcp"],
      "env": {
        "MIMO_API_BASE_URL": "https://your-api-endpoint/v1",
        "MIMO_API_KEY": "your-api-key"
      }
    }
  }
}

提示: env 字段可选。如果省略,服务器会从项目目录下的 .env 文件读取配置。


使用示例

describe — 描述图片内容

{
  "image": "/path/to/photo.jpg",
  "mode": "describe"
}

也支持 URL 输入:

{ "image": "https://example.com/photo.jpg", "mode": "describe" }

ocr — 提取文字

{
  "image": "/path/to/screenshot.png",
  "mode": "ocr"
}

extract — 结构化提取

{
  "image": "/path/to/invoice.jpg",
  "mode": "extract",
  "prompt": "提取发票号码、开票日期和总金额"
}

强制 JSON 输出:

{
  "image": "/path/to/business-card.png",
  "mode": "extract",
  "format": "json"
}

工具接口

recognize_image

参数 类型 必填 默认值 说明
image string 本地文件路径或图片 URL
mode string describe / ocr / extract
prompt string "" 补充提示词,引导模型关注特定内容
format string "auto" 仅 extract 模式:auto / json / table

支持的图片格式

格式 扩展名 最大大小
PNG .png 20 MB
JPEG .jpg / .jpeg 20 MB
WebP .webp 20 MB
GIF .gif(静态) 20 MB

格式通过文件头(magic bytes)自动检测,不依赖扩展名。


错误处理

所有错误以纯文本返回,不会抛出异常。常见错误类型:

错误 示例
文件不存在 错误:文件不存在 - /path/to/missing.png
格式不支持 错误:不支持的图片格式,支持 PNG/JPEG/WebP/GIF
文件过大 错误:文件过大(25.3MB),限制 20MB
API 认证失败 错误:API 认证失败,请检查 API Key
模型返回异常 错误:模型调用失败 - ...

开发

# 安装开发依赖
pip install -e ".[dev]"

# 运行测试
pytest

# 运行测试(详细输出)
pytest -v

项目结构

src/mimo_vision_mcp/
├── __init__.py          # 包初始化
├── __main__.py          # python -m 入口
├── server.py            # MCP 服务,注册工具
├── api/
│   └── client.py        # OpenAI 兼容 API 客户端
├── tools/
│   └── recognize.py     # recognize_image 实现
└── utils/
    ├── image.py         # 图片处理(magic bytes、base64)
    └── prompts.py       # System prompt 定义

License

MIT

Recommended Servers

playwright-mcp

playwright-mcp

A Model Context Protocol server that enables LLMs to interact with web pages through structured accessibility snapshots without requiring vision models or screenshots.

Official
Featured
TypeScript
Magic Component Platform (MCP)

Magic Component Platform (MCP)

An AI-powered tool that generates modern UI components from natural language descriptions, integrating with popular IDEs to streamline UI development workflow.

Official
Featured
Local
TypeScript
Audiense Insights MCP Server

Audiense Insights MCP Server

Enables interaction with Audiense Insights accounts via the Model Context Protocol, facilitating the extraction and analysis of marketing insights and audience data including demographics, behavior, and influencer engagement.

Official
Featured
Local
TypeScript
VeyraX MCP

VeyraX MCP

Single MCP tool to connect all your favorite tools: Gmail, Calendar and 40 more.

Official
Featured
Local
graphlit-mcp-server

graphlit-mcp-server

The Model Context Protocol (MCP) Server enables integration between MCP clients and the Graphlit service. Ingest anything from Slack to Gmail to podcast feeds, in addition to web crawling, into a Graphlit project - and then retrieve relevant contents from the MCP client.

Official
Featured
TypeScript
Kagi MCP Server

Kagi MCP Server

An MCP server that integrates Kagi search capabilities with Claude AI, enabling Claude to perform real-time web searches when answering questions that require up-to-date information.

Official
Featured
Python
E2B

E2B

Using MCP to run code via e2b.

Official
Featured
Neon Database

Neon Database

MCP server for interacting with Neon Management API and databases

Official
Featured
Exa Search

Exa Search

A Model Context Protocol (MCP) server lets AI assistants like Claude use the Exa AI Search API for web searches. This setup allows AI models to get real-time web information in a safe and controlled way.

Official
Featured
Qdrant Server

Qdrant Server

This repository is an example of how to create a MCP server for Qdrant, a vector search engine.

Official
Featured