opencode-openai-vision-mcp
Provides image/vision support for OpenCode by routing images to any OpenAI-compatible endpoint, enabling paste-and-ask screenshot workflows.
A tiny MCP server that gives OpenCode (and any other MCP client) working image/vision support through any OpenAI-compatible endpoint — for example an OmniRoute or LiteLLM gateway, or OpenAI itself.
It reads an image file from disk, sends it to a vision-capable model as an
image_url content block, and returns the model's text description.
Why this exists
OpenCode currently cannot send image attachments to vision models served through a
custom OpenAI-compatible provider (@ai-sdk/openai-compatible). The attachment is
dropped before it reaches the model, and the assistant replies with something like:
this model does not support image input
This is an OpenCode adapter bug, tracked upstream in anomalyco/opencode#20802 (fix PRs #26826 / #21627 were not merged at the time of writing). The underlying model and gateway are usually fine: a direct /chat/completions call with image_url works; only OpenCode's conversion is broken.
This project is a workaround: instead of relying on OpenCode's native image path, it routes images through an MCP tool that you fully control.
How it works
OpenCode (you paste a screenshot)
  └─ opencode-vision plugin (separate project, see below)
       intercepts the image inside OpenCode, saves it to a temp file, and
       rewrites the message to "call local_vision with this path"
       └─ local_vision (THIS server)
            reads the file, base64-encodes it, and POSTs it as image_url
            to an OpenAI-compatible endpoint
            └─ your gateway/model (e.g. OmniRoute -> any vision model)
                 returns a text description back to OpenCode
This is a describe-then-reason approach: a vision model looks at the image and returns text; your main model works from that text. It is not native multimodality, but it restores the "paste a screenshot and ask" workflow.
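The local_vision step can be sketched roughly as follows. This is a minimal illustration, not the code in server.js: the helper names buildDataUrl and buildVisionRequest, the extension-to-MIME table, and the example question text are assumptions made for this sketch. The message shape (a user message whose content holds a text part and an image_url part carrying a base64 data URL) follows the OpenAI-compatible chat completions format.

```javascript
// Hypothetical helpers; names and defaults are illustrative, not taken from server.js.
const MIME_BY_EXTENSION = {
  ".png": "image/png",
  ".jpg": "image/jpeg",
  ".jpeg": "image/jpeg",
  ".webp": "image/webp",
  ".gif": "image/gif",
};

// Turn raw image bytes into a base64 data URL for an image_url content block.
function buildDataUrl(imageBytes, filePath) {
  const dot = filePath.lastIndexOf(".");
  const extension = dot === -1 ? "" : filePath.slice(dot).toLowerCase();
  const mimeType = MIME_BY_EXTENSION[extension];
  if (!mimeType) {
    throw new Error(`Unsupported image type: ${filePath}`);
  }
  return `data:${mimeType};base64,${Buffer.from(imageBytes).toString("base64")}`;
}

// Build the JSON body for POST {base URL}/chat/completions.
function buildVisionRequest({ model, maxTokens, question, dataUrl }) {
  return {
    model,
    max_tokens: maxTokens,
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: question },
          { type: "image_url", image_url: { url: dataUrl } },
        ],
      },
    ],
  };
}

// Usage with in-memory bytes; a real server would read them from the given path.
const dataUrl = buildDataUrl(Buffer.from([0x89, 0x50, 0x4e, 0x47]), "/tmp/shot.PNG");
const body = buildVisionRequest({
  model: "your-vision-model",
  maxTokens: 1024,
  question: "Describe this image and transcribe any visible text.",
  dataUrl,
});
console.log(JSON.stringify(body, null, 2));
```

Keeping the payload builder separate from the file read and the HTTP call makes it easy to check the request shape against your gateway without touching OpenCode.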
Prerequisites
- Node.js >= 18
- A vision-capable model reachable via an OpenAI-compatible /chat/completions endpoint (OmniRoute, LiteLLM, OpenAI, etc.).
- The opencode-vision plugin (AGPL-3.0). This is what intercepts the pasted image inside OpenCode and calls the local_vision tool. This MCP server is the tool it calls. You need both.
Install
git clone https://github.com/WormAlien/opencode-openai-vision-mcp.git
cd opencode-openai-vision-mcp
npm install
Quick self-test (optional) — point it at your gateway and it should print a color word:
VISION_BASE_URL=http://localhost:20128/v1 \
VISION_API_KEY=your-key \
VISION_MODEL=your-vision-model \
node server.js
# then drive it with any MCP client, or just confirm it starts without error
Configure OpenCode
Add the MCP server to your opencode.json (see opencode.example.json):
{
  "mcp": {
    "local": {
      "type": "local",
      "command": ["node", "/absolute/path/to/opencode-openai-vision-mcp/server.js"],
      "enabled": true,
      "environment": {
        "VISION_BASE_URL": "http://localhost:20128/v1",
        "VISION_API_KEY": "YOUR_GATEWAY_API_KEY",
        "VISION_MODEL": "your-vision-model-or-alias"
      }
    }
  }
}
Naming the server local makes its tool resolve to local_vision, which matches the default imageAnalysisTool of the opencode-vision plugin, so no extra wiring is needed.
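The naming rule can be illustrated with a short sketch. It assumes the client exposes each MCP tool as the server name and the tool name joined by an underscore, which is consistent with local plus vision giving local_vision. The function name qualifiedToolName is hypothetical, and the exact rule belongs to OpenCode rather than to this server.

```javascript
// Assumption: the client prefixes each tool with its MCP server name and an underscore.
function qualifiedToolName(serverName, toolName) {
  return `${serverName}_${toolName}`;
}

console.log(qualifiedToolName("local", "vision")); // local_vision
// A differently named server would need imageAnalysisTool updated to match.
console.log(qualifiedToolName("omniroute", "vision")); // omniroute_vision
```

If you pick a different key under "mcp", set imageAnalysisTool in opencode-vision.json to the resulting name.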
Then enable the plugin for your models via opencode-vision.json
(see opencode-vision.example.json):
{
  "models": ["your-provider/*"],
  "imageAnalysisTool": "local_vision"
}
Restart OpenCode, select a model under that provider, paste an image, and ask away.
Environment variables
| Variable | Default | Description |
|---|---|---|
| VISION_BASE_URL | http://localhost:20128/v1 | OpenAI-compatible base URL (must end in /v1). |
| VISION_API_KEY | (empty) | Bearer token for the endpoint. Omitted if empty. |
| VISION_MODEL | gpt-4o | Vision model name, or a gateway alias. |
| VISION_MAX_TOKENS | 1024 | Max tokens for the description. |
Tip: point VISION_MODEL at a gateway alias (e.g. a vision alias in OmniRoute).
Then you can swap the real model in your gateway dashboard without editing any config.
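As a rough sketch of how these variables might be consumed, the snippet below resolves each one against its documented default and builds request headers that leave out Authorization when no key is set. The function names loadVisionConfig and buildHeaders are hypothetical, and treating an empty string like an unset variable is an assumption of this example; server.js may differ in detail.

```javascript
// Illustrative config loader; the defaults mirror the documented ones.
function loadVisionConfig(env) {
  const maxTokens = Number.parseInt(env.VISION_MAX_TOKENS || "1024", 10);
  return {
    // Strip trailing slashes so "/chat/completions" can be appended safely.
    baseUrl: (env.VISION_BASE_URL || "http://localhost:20128/v1").replace(/\/+$/, ""),
    apiKey: env.VISION_API_KEY || "",
    model: env.VISION_MODEL || "gpt-4o",
    maxTokens: Number.isNaN(maxTokens) ? 1024 : maxTokens,
  };
}

// Build request headers, leaving out Authorization when no key is configured.
function buildHeaders(config) {
  const headers = { "Content-Type": "application/json" };
  if (config.apiKey) {
    headers.Authorization = `Bearer ${config.apiKey}`;
  }
  return headers;
}

// Pass process.env in real use; a literal object keeps the example deterministic.
const config = loadVisionConfig({ VISION_MODEL: "vision-alias" });
console.log(`${config.baseUrl}/chat/completions`, buildHeaders(config));
```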
The vision tool
- Input: path (absolute path to a PNG/JPEG/WebP/GIF), optional question.
- Output: a text description (transcribes visible text by default).
- Handles both plain JSON and SSE/streamed responses from the endpoint.
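Handling both response styles could look something like the sketch below. It assumes the usual OpenAI-compatible shapes: choices[0].message.content for a plain JSON body, and "data:" events carrying choices[0].delta.content for a streamed body, ending with "data: [DONE]". The function name extractDescription is hypothetical and this is not the parser used in server.js.

```javascript
// Hypothetical parser: accepts either a plain JSON body or an SSE stream body as text.
function extractDescription(rawBody) {
  const trimmed = rawBody.trim();
  if (trimmed.startsWith("{")) {
    // Non-streamed response: the text sits in choices[0].message.content.
    const parsed = JSON.parse(trimmed);
    return parsed.choices?.[0]?.message?.content ?? "";
  }
  // Streamed response: concatenate delta.content from each "data:" event.
  let text = "";
  for (const line of trimmed.split(/\r?\n/)) {
    if (!line.startsWith("data:")) continue;
    const payload = line.slice("data:".length).trim();
    if (payload === "" || payload === "[DONE]") continue;
    const chunk = JSON.parse(payload);
    text += chunk.choices?.[0]?.delta?.content ?? "";
  }
  return text;
}

const plain = JSON.stringify({ choices: [{ message: { content: "A red square." } }] });
const streamed = [
  'data: {"choices":[{"delta":{"content":"A red "}}]}',
  "",
  'data: {"choices":[{"delta":{"content":"square."}}]}',
  "",
  "data: [DONE]",
].join("\n");

console.log(extractDescription(plain)); // A red square.
console.log(extractDescription(streamed)); // A red square.
```

Sniffing the body rather than trusting the Content-Type header is a deliberate choice in this sketch, since some gateways stream even when streaming was not requested.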
Notes / limitations
- The image must be readable on the same machine the server runs on (it reads from the local filesystem path the plugin saved).
- Quality depends entirely on the vision model you point it at.
- Once OpenCode merges native image support for openai-compatible providers (#20802), you may not need this.
Credits
- opencode-vision (AGPL-3.0) — the OpenCode plugin that intercepts images and calls the MCP tool. A separate project; not bundled here.
- Model Context Protocol and its TypeScript SDK.
License
MIT — see LICENSE. (This applies to this MCP server only; the opencode-vision plugin has its own AGPL-3.0 license.)