mcp-eyes
A drop-in MCP server that pairs long-context reasoning LLMs with vision models in description-only mode, enabling any reasoning model to 'see' images without the vision model giving advice or solutions.
README
vision-extension
Drop-in vision capability pack for text-only reasoning LLMs. One repo containing an MCP server and a Claude Code skill, both engineered around a single contract: the vision model only describes — your reasoning model does the thinking.
English
What's in this repo
| Directory | What it is | Who installs it |
|---|---|---|
mcp-vision-extension/ |
The MCP server (Python package vision_extension). Pairs any text-only reasoning model with any vision model over OpenAI or Anthropic protocol. |
Required. Install once per machine. |
skills-vision-extension/ |
A Claude Code skill that knows the install playbook AND the day-to-day collaboration patterns between the reasoning model and the vision model. | Optional but strongly recommended for Claude Code users. Copy into ~/.claude/skills/. |
These two pieces are designed to work together. The MCP server gives your text model vision; the skill teaches your text model how to use that vision well.
Install everything in 2 commands
# 1. The MCP server
pip install "git+https://github.com/loudMore/vision-extension.git#subdirectory=mcp-vision-extension"
# 2. The Claude Code skill (optional)
git clone https://github.com/loudMore/vision-extension.git /tmp/vx
cp -r /tmp/vx/skills-vision-extension/vision-extension ~/.claude/skills/
Then point your MCP client at the new server. Detailed steps + provider presets in mcp-vision-extension/README.md.
Or just tell your agent
If you have Claude Code (or any MCP-aware agent), copy the skill once:
git clone https://github.com/loudMore/vision-extension.git
cp -r vision-extension/skills-vision-extension/vision-extension ~/.claude/skills/
Then say:
"Install vision-extension. Use the
<doubao | openai | qwen | gemini | ollama | …>provider. Here's my key:<KEY>."
The skill handles the rest. You don't write any JSON.
Why this exists
Long-context reasoning models (DeepSeek V4 Pro, GLM 5.2, Kimi K2, Qwen 3 Max, …) are extraordinary at code and analysis but cannot see images. Naively bolting on a vision API has two problems:
- No standard pipe — every IDE wires it differently.
- Vision models love to "help" — GPT-4o, Gemini, Doubao all reflexively produce advice, debugging hypotheses, and design opinions when you only wanted a description. The reasoning work gets fragmented.
vision-extension solves both:
- One MCP server, works with Claude Code, Cursor, Continue, Cline, Roo, or anything else that speaks MCP.
- Describe-only contract — the vision model is system-prompted into a pure visual scanner. No advice. No fixes. No opinions. Just verbatim transcription and structured description.
- One Claude Code skill that turns the install + daily-use rules into a single trigger phrase.
- Provider-agnostic — Anthropic protocol, OpenAI-compatible protocol. Switch with one env var.
License
MIT.
中文
仓库里有什么
| 目录 | 是什么 | 谁要装 |
|---|---|---|
mcp-vision-extension/ |
MCP server(Python 包 vision_extension),把任意纯文本推理模型和任意视觉模型用 OpenAI/Anthropic 协议接到一起 |
必装,每台机器装一次 |
skills-vision-extension/ |
Claude Code skill,把安装流程 + 主模型与视觉模型的日常协作规则打包好 | 强烈推荐,复制到 ~/.claude/skills/ 即可 |
两块组件协同设计。MCP server 给文本模型装上视觉;skill 教文本模型怎么用好这套视觉。
两条命令搞定
# 1. 装 MCP server
pip install "git+https://github.com/loudMore/vision-extension.git#subdirectory=mcp-vision-extension"
# 2. 装 Claude Code skill(可选)
git clone https://github.com/loudMore/vision-extension.git /tmp/vx
cp -r /tmp/vx/skills-vision-extension/vision-extension ~/.claude/skills/
然后让你的 MCP 客户端配新 server。详细步骤和 12 个 provider 预设见 mcp-vision-extension/README.md。
或者直接让你的 agent 装
装完 skill 之后,对你的 Claude Code(或任何支持 MCP 的 agent)说:
"装个 vision-extension。视觉模型用
<豆包 | openai | 通义 | 智谱 | ollama | …>,key 是<KEY>。"
skill 会按 7 步确定流程把剩下的全做完。你不用写任何 JSON。
为什么做这个
DeepSeek V4 Pro / GLM 5.2 / Kimi K2 / Qwen 3 Max 这类长上下文推理模型推理超强,但看不见图。直接接个视觉 API 拼起来有两个老问题:
- 没有统一通道 —— 每个 IDE 接法都不一样
- 视觉模型爱"帮忙" —— GPT-4o / Gemini / 豆包都会条件反射地给方案、提假设、写评价,把推理工作抢走一半,你只想要个描述
vision-extension 一并解决:
- 一个 MCP server,Claude Code / Cursor / Continue / Cline / Roo 通用
- describe-only 契约 —— 视觉模型被系统提示锁成纯扫描器,不给建议、不给方案、不给评价,只做逐字转录和结构化描述
- 一个 Claude Code skill 把安装流程和日常使用规则压成一句话触发
- 协议解耦 —— Anthropic 协议、OpenAI 协议都支持,一个环境变量切换
License
MIT。
Recommended Servers
playwright-mcp
A Model Context Protocol server that enables LLMs to interact with web pages through structured accessibility snapshots without requiring vision models or screenshots.
Magic Component Platform (MCP)
An AI-powered tool that generates modern UI components from natural language descriptions, integrating with popular IDEs to streamline UI development workflow.
Audiense Insights MCP Server
Enables interaction with Audiense Insights accounts via the Model Context Protocol, facilitating the extraction and analysis of marketing insights and audience data including demographics, behavior, and influencer engagement.
VeyraX MCP
Single MCP tool to connect all your favorite tools: Gmail, Calendar and 40 more.
graphlit-mcp-server
The Model Context Protocol (MCP) Server enables integration between MCP clients and the Graphlit service. Ingest anything from Slack to Gmail to podcast feeds, in addition to web crawling, into a Graphlit project - and then retrieve relevant contents from the MCP client.
Kagi MCP Server
An MCP server that integrates Kagi search capabilities with Claude AI, enabling Claude to perform real-time web searches when answering questions that require up-to-date information.
E2B
Using MCP to run code via e2b.
Neon Database
MCP server for interacting with Neon Management API and databases
Exa Search
A Model Context Protocol (MCP) server lets AI assistants like Claude use the Exa AI Search API for web searches. This setup allows AI models to get real-time web information in a safe and controlled way.
Qdrant Server
This repository is an example of how to create a MCP server for Qdrant, a vector search engine.