clipboard-vision-mcp

clipboard-vision-mcp

Enables text-only AI models to understand clipboard images by describing them through a vision model, eliminating manual file saving.

Category
Visit Server

README

clipboard-vision-mcp

πŸ‡«πŸ‡· Version franΓ§aise disponible β†’ README.fr.md

Add vision to text-only models in Opencode (DeepSeek V4, GLM 5.1) β€” see the image in your clipboard directly, no manual file saving.

Tested on Windows 11 + Opencode + DeepSeek V4 Pro. Multi-OS clipboard support (Windows / macOS / Linux X11 / Linux Wayland).

Forked from itcomgroup/vision-mcp-server β€” rewritten around clipboard-first tools, security hardening, cross-platform clipboard extraction, and one-prompt AI install.


The problem

Cheap/fast text-only models like DeepSeek V4 and GLM 5.1 are great for code, but they cannot read images. Every time you paste a screenshot, the model asks you to save it to disk and provide a path.

The fix

This MCP server exposes *_from_clipboard tools. When the LLM needs to see your screenshot, it calls analyze_clipboard β€” the server reads the clipboard image, sends it to a real vision model (Groq + Llama-4 Scout, free tier), and returns a text description the text model can reason about.

Result: paste β†’ ask β†’ done. No file shuffling.


πŸ€– One-prompt install via AI (recommended)

Instead of running the steps below manually, paste one of these prompts into any coding assistant (DeepSeek, GLM, Claude, GPT, ...) and it will set everything up for you end-to-end β€” clone, venv, deps, MCP config, keybindings:

Prefer doing it yourself? Keep reading.


Features

  • πŸ–ΌοΈ Clipboard-first β€” analyze_clipboard, extract_text_from_clipboard, diagnose_error_from_clipboard, describe_ui_from_clipboard, code_from_clipboard.
  • πŸ“ File path fallback β€” same tools available for images already on disk.
  • πŸ†“ Free vision backend β€” Groq's free tier with Llama-4 Scout (17B, multimodal).
  • πŸ–₯️ Multi-OS β€” Windows, macOS, Linux (X11 + Wayland).
  • πŸ”’ Security hardened β€” extension/size/magic-byte validation, auto-delete of clipboard temp files after analysis.
  • πŸ”Œ MCP standard β€” works with Opencode, Claude Code, Cursor, Cline, Continue, or any MCP-capable client.

Requirements

  • Python 3.10+
  • Groq API key (free, 30 seconds sign-up): https://console.groq.com/keys
  • An MCP-capable client (Opencode, Claude Code, Cursor, Cline, Continue, ...)

Python dependencies (installed automatically via pip install -e .)

Package Purpose
mcp>=1.0.0 MCP protocol server
groq>=0.11.0 Groq API client (Llama-4 Scout vision)
aiofiles>=23.0.0 Async file I/O
Pillow>=10.0.0 Clipboard image extraction (Windows/macOS), PNG encoding

OS-specific clipboard dependencies

OS Command Why
Windows nothing extra Pillow + pywin32 handle the clipboard natively.
macOS brew install pngpaste (optional fallback) Pillow works in most cases; pngpaste as backup.
Linux β€” Wayland sudo apt install wl-clipboard Provides wl-paste.
Linux β€” X11 sudo apt install xclip Or your distro equivalent.

Quick start

1. Get a free Groq API key

https://console.groq.com/keys

2. Install

git clone https://github.com/Capetlevrai/clipboard-vision-mcp.git
cd clipboard-vision-mcp
python -m venv .venv
# Windows:
.venv\Scripts\activate
# macOS/Linux:
source .venv/bin/activate
pip install -e .

3. Smoke test (no Groq needed)

Copy any screenshot, then:

python examples/smoke_test.py

Expected: OK: clipboard image saved to <path>.

4. Wire to your MCP client

See docs/OPENCODE.md for Opencode (Windows-tested) or docs/CLIENTS.md for Claude Code / Cursor / Cline / Continue.

Opencode (%APPDATA%\opencode\opencode.json on Windows, ~/.config/opencode/opencode.json on Linux/macOS):

{
  "mcp": {
    "clipboard-vision": {
      "type": "local",
      "command": [
        "C:\\path\\to\\clipboard-vision-mcp\\.venv\\Scripts\\python.exe",
        "-m",
        "clipboard_vision_mcp"
      ],
      "enabled": true,
      "environment": {
        "GROQ_API_KEY": "gsk_your_key_here"
      }
    }
  }
}

πŸ’‘ Use the absolute path to the venv's Python. This guarantees the MCP starts with the right dependencies regardless of shell, cwd, or active venv.

5. ⚠️ Opencode keybindings for pasting images

Opencode does not bind image-paste to Ctrl+V / Alt+V by default. Without this step, copying a screenshot and hitting paste will insert nothing (or plain text).

Edit your Opencode keybinds.json or the keybinds section of opencode.json:

{
  "keybinds": {
    "input_paste": "ctrl+v",
    "input_paste_image": "alt+v"
  }
}

Restart Opencode after editing.

6. Does it auto-start after a Windows reboot?

Yes. Opencode re-reads opencode.json at every launch and auto-spawns any MCP server with "type": "local" and "enabled": true. Because the command uses the absolute path to the venv's Python, it doesn't matter which shell or working directory Opencode is launched from.

Reboot β†’ open Opencode β†’ clipboard-vision tools are listed. No manual step.


Usage

You: (copy a screenshot to clipboard, then type)
     "Look at what I just copied and tell me what's wrong with this error."

LLM (DeepSeek, GLM, Claude, ...): [calls diagnose_error_from_clipboard]
     β†’ "The error says `ECONNREFUSED 127.0.0.1:5432`. Postgres isn't
        running on port 5432. Start it with: ..."

The text-only model never sees pixels β€” it reads the description returned by Llama-4 Scout and reasons over it.

Tool reference

Tool Input Use when
analyze_clipboard optional prompt Generic description, Q&A on the clipboard image.
extract_text_from_clipboard β€” Pure OCR.
describe_ui_from_clipboard β€” UI/UX review, component inventory.
diagnose_error_from_clipboard β€” Error screenshot β†’ cause + fix.
code_from_clipboard β€” Extract code from a screenshot.
analyze_image image_path, optional prompt Image already on disk.
extract_text, describe_ui, diagnose_error, understand_diagram, analyze_chart, code_from_screenshot image_path Same as above for files.

Security

This server runs as a local stdio process β€” it does not open any network port and only talks to the MCP client over stdin/stdout and to the Groq API over HTTPS.

Hardening in place:

  • File type allow-list. analyze_image and the other file-path tools only accept .png .jpg .jpeg .gif .webp .bmp. This prevents a prompt-injected LLM from asking the server to read arbitrary local files (~/.ssh/id_rsa, .env, ...) and exfiltrate them as base64 to Groq.
  • Magic-byte check. File content is validated against known image headers before upload.
  • Size cap. 20 MB max per image.
  • Auto-delete clipboard temp files after each analysis. Screenshots may contain secrets (tokens, chats, credentials) β€” the server writes them to $TMPDIR/clipboard_vision_mcp/ and unlinks them on completion.
  • No telemetry. No analytics, no phone-home.

What this project cannot protect you from

  • Your API key lives in your MCP client config in plain text. That is how MCP clients work today. Keep that config file non-world-readable and never commit it. If you accidentally expose a key (chat, screenshot, git push), rotate it at https://console.groq.com/keys.
  • Groq receives the images you analyze. Check their privacy policy before sending anything sensitive.
  • Any MCP tool is executed at the direction of the LLM. If you connect a prompt-injected model to this server and feed it untrusted input, the model can choose what to analyze. The allow-list above reduces blast radius but cannot eliminate it.

Found an issue?

Please open a private security advisory rather than a public issue.


How it works

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   MCP   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   HTTPS   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Opencode    β”‚ ──────▢ β”‚  clipboard-     β”‚ ────────▢ β”‚  Groq API       β”‚
β”‚  (DeepSeek)  β”‚         β”‚  vision-mcp     β”‚           β”‚  Llama-4 Scout  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜         β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜           β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                              β”‚
                              β–Ό
                     reads system clipboard
                     (PIL / wl-paste / xclip)
                     β†’ validate β†’ base64 β†’ send β†’ delete

Troubleshooting

  • "Clipboard does not contain an image." β€” Copy an actual image, not a file icon or text. On Linux, verify wl-paste --type image/png or xclip -selection clipboard -t image/png -o | file - works outside the MCP.
  • "GROQ_API_KEY is not set." β€” Check the environment block in your client config, then fully restart the client.
  • Tools don't appear in Opencode. β€” Check Opencode's MCP logs. Run python -m clipboard_vision_mcp manually β€” it should start and wait silently on stdin.
  • "Refusing to read '<ext>' β€” only image files are allowed." β€” You (or the LLM) passed a non-image path. This is the security guard doing its job.

Credits

License

MIT β€” see LICENSE.

Recommended Servers

playwright-mcp

playwright-mcp

A Model Context Protocol server that enables LLMs to interact with web pages through structured accessibility snapshots without requiring vision models or screenshots.

Official
Featured
TypeScript
Magic Component Platform (MCP)

Magic Component Platform (MCP)

An AI-powered tool that generates modern UI components from natural language descriptions, integrating with popular IDEs to streamline UI development workflow.

Official
Featured
Local
TypeScript
Audiense Insights MCP Server

Audiense Insights MCP Server

Enables interaction with Audiense Insights accounts via the Model Context Protocol, facilitating the extraction and analysis of marketing insights and audience data including demographics, behavior, and influencer engagement.

Official
Featured
Local
TypeScript
VeyraX MCP

VeyraX MCP

Single MCP tool to connect all your favorite tools: Gmail, Calendar and 40 more.

Official
Featured
Local
graphlit-mcp-server

graphlit-mcp-server

The Model Context Protocol (MCP) Server enables integration between MCP clients and the Graphlit service. Ingest anything from Slack to Gmail to podcast feeds, in addition to web crawling, into a Graphlit project - and then retrieve relevant contents from the MCP client.

Official
Featured
TypeScript
Kagi MCP Server

Kagi MCP Server

An MCP server that integrates Kagi search capabilities with Claude AI, enabling Claude to perform real-time web searches when answering questions that require up-to-date information.

Official
Featured
Python
E2B

E2B

Using MCP to run code via e2b.

Official
Featured
Neon Database

Neon Database

MCP server for interacting with Neon Management API and databases

Official
Featured
Exa Search

Exa Search

A Model Context Protocol (MCP) server lets AI assistants like Claude use the Exa AI Search API for web searches. This setup allows AI models to get real-time web information in a safe and controlled way.

Official
Featured
Qdrant Server

Qdrant Server

This repository is an example of how to create a MCP server for Qdrant, a vector search engine.

Official
Featured