low-hallucination-vision
A low-hallucination vision MCP server that uses OpenAI-compatible multimodal models with structured prompts, confidence gating, and forced JSON to reduce false claims in image analysis.
README
Low-Hallucination Vision Toolkit
A drop-in replacement for high-hallucination vision MCPs (like the default
analyze_image), built on top of your own OpenAI-compatible multimodal model
(mimo v2.5). Two pieces that work together:
vision-mcp/ ← MCP server (Python). The engine. Plug into any agent.
vision-skill/ ← Skill (SKILL.md). The cross-verification workflow. Plug into ZCode.
Why this is lower-hallucination than a generic VLM call
It's not magic — it's five boring disciplines, all in the MCP layer:
- Mode-routed prompts — UI / general / OCR / detect each get a tightly scoped system prompt instead of one "describe everything" prompt.
- Forced structured JSON — every claim is an object with
confidence. - Low temperature (0.2 default) — less creative completion.
- "Allowed to be ignorant" — prompts explicitly forbid common-sense completion of details not actually visible.
- Confidence gating — the MCP reflags any claim below threshold as
"_flag": "存疑", so the agent can't accidentally report it as fact.
The Skill adds a sixth layer on top: cross-verification (run two independent modes and only trust claims both agree on).
Setup (uv-managed environment)
This project uses uv for environment management.
uv creates an isolated .venv per project and pins the Python version, so
nothing pollutes your global Python. The .venv is what VSCode auto-detects.
1. Install dependencies & create the venv
cd C:\Users\zrzring\ZCodeProject\vision-mcp
uv sync
That single command:
- reads
.python-version(3.12) and auto-downloads that Python if missing, - creates
vision-mcp\.venv, - installs everything in
pyproject.toml(currentlymcp[cli]).
To add a package later:
uv add <pkg>. To rebuild after pulling the repo: justuv syncagain. Never use rawpiphere — it would install into the wrong place.
Verify it works:
uv run python -c "import main; print('OK', main.mcp.name)"
# → OK low-hallucination-vision
2. Configure API credentials
copy .env.example .env # then edit .env
VISION_API_BASE=https://api.mimo.example.com/v1 # your OpenAI-compatible endpoint
VISION_API_KEY=sk-...
VISION_MODEL=mimo-vl-2.5
VISION_TEMPERATURE=0.2
3. Make VSCode detect the venv
VSCode's Python extension auto-detects .venv in the workspace. To be safe:
- Open the folder
C:\Users\zrzring\ZCodeProject(not the single file) in VSCode. - Install the Python extension (ms-python.python) if not already.
Ctrl+Shift+P→ Python: Select Interpreter → pick the one shown asPython 3.12.13 ('.venv')undervision-mcp\.venv\Scripts\python.exe.
If it doesn't show up, force it with a workspace setting — create
.vscode/settings.json in the project root:
{
"python.defaultInterpreterForWorkspace": "vision-mcp\\.venv\\Scripts\\python.exe",
"python.terminal.activateEnvironment": true
}
Now any terminal you open in VSCode auto-activates .venv, and you get
autocomplete / type-checking for mcp and your code.
4. Register the MCP server with your agents
The server speaks stdio MCP. Use uv run to launch it — this guarantees
the project's .venv is used regardless of the agent's working directory:
Claude Code — ~/.claude.json (or project .mcp.json):
{
"mcpServers": {
"low-hallucination-vision": {
"command": "uv",
"args": ["run", "--directory",
"C:\\Users\\zrzring\\ZCodeProject\\vision-mcp",
"python", "main.py"],
"env": {
"VISION_API_BASE": "https://api.mimo.example.com/v1",
"VISION_API_KEY": "sk-...",
"VISION_MODEL": "mimo-vl-2.5"
}
}
}
}
OpenCode — opencode.json:
{
"mcp": {
"low-hallucination-vision": {
"type": "local",
"command": ["uv", "run", "--directory",
"C:\\Users\\zrzring\\ZCodeProject\\vision-mcp",
"python", "main.py"],
"environment": {
"VISION_API_BASE": "https://api.mimo.example.com/v1",
"VISION_API_KEY": "sk-...",
"VISION_MODEL": "mimo-vl-2.5"
}
}
}
}
ZCode — same mcpServers shape as Claude Code.
Why
uv run --directoryinstead of a barepython? Because the agent may launch the server from any working directory;uv run --directoryalways activates the right.venv. Environment variables can live in the config (as above) OR invision-mcp/.env— either works.
Alternative: build a standalone vision-mcp.exe
If you'd rather not depend on uv/Python at runtime, package the server into
a single executable with PyInstaller. The exe is self-contained (~24 MB),
needs no Python installed, and works on any machine when shipped with its
.env. It runs in two modes: a stdio MCP server (default) and a
command-line image tool.
Build it
pyinstaller is already in pyproject.toml, so after uv sync:
cd C:\Users\zrzring\ZCodeProject\vision-mcp
uv run pyinstaller --onefile --name vision-mcp --collect-all mcp --clean --noconfirm main.py
Output lands in dist\vision-mcp.exe. The vision-mcp.spec file is
auto-generated; you can re-run pyinstaller vision-mcp.spec --noconfirm
after that for identical builds.
Put it on PATH and configure
-
Copy the exe and your
.envto a directory already on PATH (e.g.C:\Users\<you>\.local\bin):copy dist\vision-mcp.exe C:\Users\<you>\.local\bin\ copy .env C:\Users\<you>\.local\bin\ -
The exe reads
.envfrom its own directory first, then the source dir, then the working dir. So keep.envnext to the exe — change key/endpoint there, no rebuild needed. -
Verify from anywhere:
vision-mcp --help vision-mcp analyze C:\path\to\pic.png --mode general --prompt "describe it"
Register the exe with agents
Because the exe defaults to MCP-server mode, agent config is minimal — no
uv run, no args, no env block (creds come from the exe's .env):
Claude Code — ~/.claude.json (or project .mcp.json):
{
"mcpServers": {
"low-hallucination-vision": {
"command": "vision-mcp"
}
}
}
OpenCode — opencode.json:
{
"mcp": {
"low-hallucination-vision": {
"type": "local",
"command": ["vision-mcp"]
}
}
}
If vision-mcp isn't on PATH for the agent, use the full path instead:
"command": "C:\\Users\\<you>\\.local\\bin\\vision-mcp.exe".
CLI mode (use it directly, no agent)
The same exe doubles as a terminal image tool:
vision-mcp # = MCP server (default)
vision-mcp mcp # " (explicit)
vision-mcp analyze <image> [--mode general|ui_screenshot|ocr|detect] [--prompt "..."]
vision-mcp ocr <image> [--prompt "..."]
vision-mcp detect <image> [--prompt "..."]
<image> is a local path or an http(s) URL. Output is the same JSON the MCP
tools return (with bbox normalization + confidence flagging applied).
Source vs exe — which to use? Source (
uv run) is best while developing (editmain.py, reload instantly). The exe is best for daily use and sharing to other machines — no Python toolchain needed.
3. (Optional) Register the Skill with ZCode
Copy or symlink vision-skill/ into your skills directory so the
cross-verification workflow is auto-loaded:
<skills-dir>/low-hallucination-vision/SKILL.md
The Skill is agent-agnostic in content but only ZCode auto-discovers Skills. For Claude Code / OpenCode, the MCP tools alone still work — just keep the Skill's workflow in mind (or paste the relevant section into your own prompt).
Tools provided
| Tool | What it does | When to use |
|---|---|---|
analyze_image(image_source, mode, prompt, temperature) |
Structured analysis; mode = general / ui_screenshot / ocr / detect |
Default entry point |
ocr_extract(image_source, prompt, temperature) |
Text-only extraction | When you only need words |
detect_elements(image_source, prompt, temperature) |
Object detection with mandatory bbox | When you need locations |
All three return JSON. Claims below VISION_CONFIDENCE_THRESHOLD (default 0.6)
are tagged "_flag": "存疑".
image_source accepts either a local file path or an http(s) URL.
File map
ZCodeProject/
├── vision-mcp/ ← uv project (this README lives here)
│ ├── main.py ← the MCP server + CLI (engine + anti-hallucination)
│ ├── pyproject.toml ← deps: mcp[cli], pyinstaller
│ ├── uv.lock ← pinned versions (auto-generated)
│ ├── .python-version ← 3.12 (uv auto-downloads it)
│ ├── .env.example ← copy to .env and fill in
│ ├── vision-mcp.spec ← auto-generated by PyInstaller (for rebuilds)
│ ├── .venv/ ← created by `uv sync` (gitignored)
│ ├── build/ ← PyInstaller intermediates (gitignored)
│ └── dist/
│ └── vision-mcp.exe ← the standalone exe (built, gitignored)
└── .agents/ ← skill(s) discovered by ZCode
└── skills/vision-skill/
└── SKILL.md ← cross-verification workflow for the agent
Tuning
- Still too much hallucination? Lower
VISION_TEMPERATUREto 0.1 and raiseVISION_CONFIDENCE_THRESHOLDto 0.7. - Missing real things (over-conservative)? Lower the threshold to 0.5 and raise temperature slightly to 0.3.
- Model keeps breaking JSON? Some VLMs ignore schema instructions; in
that case the tool returns
"_parse_error": truewith the raw text so you can post-process. Consider switching to a model with stronger JSON support.
Recommended Servers
playwright-mcp
A Model Context Protocol server that enables LLMs to interact with web pages through structured accessibility snapshots without requiring vision models or screenshots.
Magic Component Platform (MCP)
An AI-powered tool that generates modern UI components from natural language descriptions, integrating with popular IDEs to streamline UI development workflow.
Audiense Insights MCP Server
Enables interaction with Audiense Insights accounts via the Model Context Protocol, facilitating the extraction and analysis of marketing insights and audience data including demographics, behavior, and influencer engagement.
VeyraX MCP
Single MCP tool to connect all your favorite tools: Gmail, Calendar and 40 more.
graphlit-mcp-server
The Model Context Protocol (MCP) Server enables integration between MCP clients and the Graphlit service. Ingest anything from Slack to Gmail to podcast feeds, in addition to web crawling, into a Graphlit project - and then retrieve relevant contents from the MCP client.
Kagi MCP Server
An MCP server that integrates Kagi search capabilities with Claude AI, enabling Claude to perform real-time web searches when answering questions that require up-to-date information.
E2B
Using MCP to run code via e2b.
Neon Database
MCP server for interacting with Neon Management API and databases
Exa Search
A Model Context Protocol (MCP) server lets AI assistants like Claude use the Exa AI Search API for web searches. This setup allows AI models to get real-time web information in a safe and controlled way.
Qdrant Server
This repository is an example of how to create a MCP server for Qdrant, a vector search engine.