opencode-openai-vision-mcp

Provides image/vision support for OpenCode by routing images to any OpenAI-compatible endpoint, enabling paste-and-ask screenshot workflows.

A tiny MCP server that gives OpenCode (and any other MCP client) working image/vision support through any OpenAI-compatible endpoint — for example an OmniRoute or LiteLLM gateway, or OpenAI itself.

It reads an image file from disk, sends it to a vision-capable model as an image_url content block, and returns the model's text description.

Why this exists

OpenCode currently cannot send image attachments to vision models served through a custom OpenAI-compatible provider (@ai-sdk/openai-compatible). The attachment is dropped before it reaches the model, and the assistant replies with something like:

this model does not support image input

This is an OpenCode adapter bug, tracked upstream in anomalyco/opencode#20802 (fix PRs #26826 / #21627 were not merged at the time of writing). The underlying model and gateway are usually fine — a direct /chat/completions call with image_url works; only OpenCode's conversion is broken.

This project is a workaround: instead of relying on OpenCode's native image path, it routes images through an MCP tool that you fully control.

How it works

OpenCode (you paste a screenshot)
  └─ opencode-vision plugin            intercepts the image inside OpenCode,
     (separate project, see below)     saves it to a temp file, and rewrites the
                                        message to "call local_vision with this path"
        └─ local_vision (THIS server)  reads the file, base64-encodes it, and POSTs
                                        it as image_url to an OpenAI-compatible endpoint
              └─ your gateway/model     e.g. OmniRoute -> any vision model
                                        returns a text description back to OpenCode

This is a describe-then-reason approach: a vision model looks at the image and returns text; your main model works from that text. It is not native multimodality, but it restores the "paste a screenshot and ask" workflow.
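The "base64-encode and POST as image_url" step can be sketched as follows. This is a minimal illustration, not the project's actual server.js: the helper names mimeFromPath and buildVisionRequest and the default prompt text are assumptions. The payload shape follows the OpenAI chat-completions convention of a text part plus an image_url part carrying a data URL.

```javascript
// Hypothetical helpers for illustration; the real server.js may differ.
const MIME_BY_EXT = new Map([
  ["png", "image/png"],
  ["jpg", "image/jpeg"],
  ["jpeg", "image/jpeg"],
  ["webp", "image/webp"],
  ["gif", "image/gif"],
]);

// Derive the MIME type from the file extension (case-insensitive).
function mimeFromPath(path) {
  const ext = path.split(".").pop().toLowerCase();
  const mime = MIME_BY_EXT.get(ext);
  if (!mime) throw new Error(`unsupported image type: ${path}`);
  return mime;
}

// Build the JSON body for POST {VISION_BASE_URL}/chat/completions.
// `bytes` is the raw file content, e.g. from fs.promises.readFile(path).
function buildVisionRequest({ model, maxTokens, path, bytes, question }) {
  const base64 = Buffer.from(bytes).toString("base64");
  const dataUrl = `data:${mimeFromPath(path)};base64,${base64}`;
  return {
    model,
    max_tokens: maxTokens,
    messages: [
      {
        role: "user",
        content: [
          // Assumed default prompt; the server's real wording is not documented here.
          { type: "text", text: question || "Describe this image and transcribe any visible text." },
          { type: "image_url", image_url: { url: dataUrl } },
        ],
      },
    ],
  };
}
```

The returned object would be passed through JSON.stringify and sent to the endpoint; the text in the model's reply is what OpenCode's main model then reasons over.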

Prerequisites

  1. Node.js >= 18
  2. A vision-capable model reachable via an OpenAI-compatible /chat/completions endpoint (OmniRoute, LiteLLM, OpenAI, etc.).
  3. The opencode-vision plugin (AGPL-3.0) — this is what intercepts the pasted image inside OpenCode and calls the local_vision tool. This MCP server is the tool it calls. You need both.

Install

git clone https://github.com/WormAlien/opencode-openai-vision-mcp.git
cd opencode-openai-vision-mcp
npm install

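Quick self-test (optional): start the server against your gateway. Starting it alone only confirms that it launches; to see a model reply (for example, a color word for a solid-color test image), drive it with an MCP client.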

VISION_BASE_URL=http://localhost:20128/v1 \
VISION_API_KEY=your-key \
VISION_MODEL=your-vision-model \
node server.js
# then drive it with any MCP client, or just confirm it starts without error

Configure OpenCode

Add the MCP server to your opencode.json (see opencode.example.json):

{
  "mcp": {
    "local": {
      "type": "local",
      "command": ["node", "/absolute/path/to/opencode-openai-vision-mcp/server.js"],
      "enabled": true,
      "environment": {
        "VISION_BASE_URL": "http://localhost:20128/v1",
        "VISION_API_KEY": "YOUR_GATEWAY_API_KEY",
        "VISION_MODEL": "your-vision-model-or-alias"
      }
    }
  }
}

Naming the server local makes its vision tool resolve to local_vision, which matches the default imageAnalysisTool of the opencode-vision plugin, so no extra wiring is needed.

Then enable the plugin for your models via opencode-vision.json (see opencode-vision.example.json):

{
  "models": ["your-provider/*"],
  "imageAnalysisTool": "local_vision"
}

Restart OpenCode, select a model under that provider, paste an image, and ask away.

Environment variables

Variable            Default                      Description
VISION_BASE_URL     http://localhost:20128/v1    OpenAI-compatible base URL (must end in /v1).
VISION_API_KEY      (empty)                      Bearer token for the endpoint. Omitted if empty.
VISION_MODEL        gpt-4o                       Vision model name, or a gateway alias.
VISION_MAX_TOKENS   1024                         Max tokens for the description.
Tip: point VISION_MODEL at a gateway alias (e.g. a vision alias in OmniRoute). Then you can swap the real model in your gateway dashboard without editing any config.
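As a rough sketch of how these variables and defaults might be resolved at startup: the function name loadVisionConfig and the returned field names are assumptions for illustration, and only the variable names and default values come from the table above.

```javascript
// Hypothetical config loader; the real server.js may resolve these differently.
function loadVisionConfig(env) {
  // Strip trailing slashes so the joined URL has exactly one separator.
  const baseUrl = (env.VISION_BASE_URL || "http://localhost:20128/v1").replace(/\/+$/, "");
  const apiKey = env.VISION_API_KEY || "";

  const maxTokens = Number.parseInt(env.VISION_MAX_TOKENS || "1024", 10);
  if (!Number.isInteger(maxTokens) || maxTokens <= 0) {
    throw new Error("VISION_MAX_TOKENS must be a positive integer");
  }

  const headers = { "Content-Type": "application/json" };
  // The bearer token is omitted entirely when no key is configured.
  if (apiKey) headers.Authorization = `Bearer ${apiKey}`;

  return {
    url: `${baseUrl}/chat/completions`,
    model: env.VISION_MODEL || "gpt-4o",
    maxTokens,
    headers,
  };
}
```

Calling loadVisionConfig(process.env) would yield the endpoint URL, model name, token limit, and request headers in one place.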

The vision tool

  • Input: path (absolute path to a PNG/JPEG/WebP/GIF), optional question.
  • Output: a text description (transcribes visible text by default).
  • Handles both plain JSON and SSE/streamed responses from the endpoint.
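Handling both response styles could look roughly like the sketch below. It assumes the usual OpenAI-compatible shapes: a single JSON object with choices[0].message.content as a string, or SSE lines of the form "data: {...}" carrying choices[0].delta.content and ending with "data: [DONE]". The function name extractText is hypothetical and the project's own parsing may differ.

```javascript
// Hypothetical parser: accepts the full response body as a string and
// returns the model's text, whether the endpoint streamed or not.
function extractText(body) {
  const trimmed = body.trim();

  // Plain JSON response: one object with the complete message.
  if (trimmed.startsWith("{")) {
    const json = JSON.parse(trimmed);
    return json.choices?.[0]?.message?.content ?? "";
  }

  // SSE response: concatenate the delta fragments from each data line.
  let text = "";
  for (const line of trimmed.split(/\r?\n/)) {
    if (!line.startsWith("data:")) continue; // skip blank lines and comments
    const payload = line.slice(5).trim();
    if (payload === "" || payload === "[DONE]") continue;
    const chunk = JSON.parse(payload);
    text += chunk.choices?.[0]?.delta?.content ?? "";
  }
  return text;
}
```

Buffering the whole body before parsing keeps the sketch short; a server that needs to handle very long streams could parse incrementally instead.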

Notes / limitations

  • The image must be readable on the same machine the server runs on (it reads from the local filesystem path the plugin saved).
  • Quality depends entirely on the vision model you point it at.
  • Once OpenCode merges native image support for openai-compatible providers (#20802), you may not need this.

Credits

License

MIT — see LICENSE. (This applies to this MCP server only; the opencode-vision plugin has its own AGPL-3.0 license.)
