low-hallucination-vision

low-hallucination-vision

A low-hallucination vision MCP server that uses OpenAI-compatible multimodal models with structured prompts, confidence gating, and forced JSON to reduce false claims in image analysis.

Category
Visit Server

README

Low-Hallucination Vision Toolkit

A drop-in replacement for high-hallucination vision MCPs (like the default analyze_image), built on top of your own OpenAI-compatible multimodal model (mimo v2.5). Two pieces that work together:

vision-mcp/      ← MCP server (Python). The engine. Plug into any agent.
vision-skill/    ← Skill (SKILL.md). The cross-verification workflow. Plug into ZCode.

Why this is lower-hallucination than a generic VLM call

It's not magic — it's five boring disciplines, all in the MCP layer:

  1. Mode-routed prompts — UI / general / OCR / detect each get a tightly scoped system prompt instead of one "describe everything" prompt.
  2. Forced structured JSON — every claim is an object with confidence.
  3. Low temperature (0.2 default) — less creative completion.
  4. "Allowed to be ignorant" — prompts explicitly forbid common-sense completion of details not actually visible.
  5. Confidence gating — the MCP reflags any claim below threshold as "_flag": "存疑", so the agent can't accidentally report it as fact.

The Skill adds a sixth layer on top: cross-verification (run two independent modes and only trust claims both agree on).


Setup (uv-managed environment)

This project uses uv for environment management. uv creates an isolated .venv per project and pins the Python version, so nothing pollutes your global Python. The .venv is what VSCode auto-detects.

1. Install dependencies & create the venv

cd C:\Users\zrzring\ZCodeProject\vision-mcp
uv sync

That single command:

  • reads .python-version (3.12) and auto-downloads that Python if missing,
  • creates vision-mcp\.venv,
  • installs everything in pyproject.toml (currently mcp[cli]).

To add a package later: uv add <pkg>. To rebuild after pulling the repo: just uv sync again. Never use raw pip here — it would install into the wrong place.

Verify it works:

uv run python -c "import main; print('OK', main.mcp.name)"
# → OK low-hallucination-vision

2. Configure API credentials

copy .env.example .env          # then edit .env
VISION_API_BASE=https://api.mimo.example.com/v1   # your OpenAI-compatible endpoint
VISION_API_KEY=sk-...
VISION_MODEL=mimo-vl-2.5
VISION_TEMPERATURE=0.2

3. Make VSCode detect the venv

VSCode's Python extension auto-detects .venv in the workspace. To be safe:

  1. Open the folder C:\Users\zrzring\ZCodeProject (not the single file) in VSCode.
  2. Install the Python extension (ms-python.python) if not already.
  3. Ctrl+Shift+PPython: Select Interpreter → pick the one shown as Python 3.12.13 ('.venv') under vision-mcp\.venv\Scripts\python.exe.

If it doesn't show up, force it with a workspace setting — create .vscode/settings.json in the project root:

{
  "python.defaultInterpreterForWorkspace": "vision-mcp\\.venv\\Scripts\\python.exe",
  "python.terminal.activateEnvironment": true
}

Now any terminal you open in VSCode auto-activates .venv, and you get autocomplete / type-checking for mcp and your code.

4. Register the MCP server with your agents

The server speaks stdio MCP. Use uv run to launch it — this guarantees the project's .venv is used regardless of the agent's working directory:

Claude Code~/.claude.json (or project .mcp.json):

{
  "mcpServers": {
    "low-hallucination-vision": {
      "command": "uv",
      "args": ["run", "--directory",
                "C:\\Users\\zrzring\\ZCodeProject\\vision-mcp",
                "python", "main.py"],
      "env": {
        "VISION_API_BASE": "https://api.mimo.example.com/v1",
        "VISION_API_KEY": "sk-...",
        "VISION_MODEL": "mimo-vl-2.5"
      }
    }
  }
}

OpenCodeopencode.json:

{
  "mcp": {
    "low-hallucination-vision": {
      "type": "local",
      "command": ["uv", "run", "--directory",
                   "C:\\Users\\zrzring\\ZCodeProject\\vision-mcp",
                   "python", "main.py"],
      "environment": {
        "VISION_API_BASE": "https://api.mimo.example.com/v1",
        "VISION_API_KEY": "sk-...",
        "VISION_MODEL": "mimo-vl-2.5"
      }
    }
  }
}

ZCode — same mcpServers shape as Claude Code.

Why uv run --directory instead of a bare python? Because the agent may launch the server from any working directory; uv run --directory always activates the right .venv. Environment variables can live in the config (as above) OR in vision-mcp/.env — either works.


Alternative: build a standalone vision-mcp.exe

If you'd rather not depend on uv/Python at runtime, package the server into a single executable with PyInstaller. The exe is self-contained (~24 MB), needs no Python installed, and works on any machine when shipped with its .env. It runs in two modes: a stdio MCP server (default) and a command-line image tool.

Build it

pyinstaller is already in pyproject.toml, so after uv sync:

cd C:\Users\zrzring\ZCodeProject\vision-mcp
uv run pyinstaller --onefile --name vision-mcp --collect-all mcp --clean --noconfirm main.py

Output lands in dist\vision-mcp.exe. The vision-mcp.spec file is auto-generated; you can re-run pyinstaller vision-mcp.spec --noconfirm after that for identical builds.

Put it on PATH and configure

  1. Copy the exe and your .env to a directory already on PATH (e.g. C:\Users\<you>\.local\bin):

    copy dist\vision-mcp.exe  C:\Users\<you>\.local\bin\
    copy .env                 C:\Users\<you>\.local\bin\
    
  2. The exe reads .env from its own directory first, then the source dir, then the working dir. So keep .env next to the exe — change key/endpoint there, no rebuild needed.

  3. Verify from anywhere:

    vision-mcp --help
    vision-mcp analyze C:\path\to\pic.png --mode general --prompt "describe it"
    

Register the exe with agents

Because the exe defaults to MCP-server mode, agent config is minimal — no uv run, no args, no env block (creds come from the exe's .env):

Claude Code~/.claude.json (or project .mcp.json):

{
  "mcpServers": {
    "low-hallucination-vision": {
      "command": "vision-mcp"
    }
  }
}

OpenCodeopencode.json:

{
  "mcp": {
    "low-hallucination-vision": {
      "type": "local",
      "command": ["vision-mcp"]
    }
  }
}

If vision-mcp isn't on PATH for the agent, use the full path instead: "command": "C:\\Users\\<you>\\.local\\bin\\vision-mcp.exe".

CLI mode (use it directly, no agent)

The same exe doubles as a terminal image tool:

vision-mcp                                    # = MCP server (default)
vision-mcp mcp                                #   "    (explicit)
vision-mcp analyze <image> [--mode general|ui_screenshot|ocr|detect] [--prompt "..."]
vision-mcp ocr      <image> [--prompt "..."]
vision-mcp detect   <image> [--prompt "..."]

<image> is a local path or an http(s) URL. Output is the same JSON the MCP tools return (with bbox normalization + confidence flagging applied).

Source vs exe — which to use? Source (uv run) is best while developing (edit main.py, reload instantly). The exe is best for daily use and sharing to other machines — no Python toolchain needed.

3. (Optional) Register the Skill with ZCode

Copy or symlink vision-skill/ into your skills directory so the cross-verification workflow is auto-loaded:

<skills-dir>/low-hallucination-vision/SKILL.md

The Skill is agent-agnostic in content but only ZCode auto-discovers Skills. For Claude Code / OpenCode, the MCP tools alone still work — just keep the Skill's workflow in mind (or paste the relevant section into your own prompt).


Tools provided

Tool What it does When to use
analyze_image(image_source, mode, prompt, temperature) Structured analysis; mode = general / ui_screenshot / ocr / detect Default entry point
ocr_extract(image_source, prompt, temperature) Text-only extraction When you only need words
detect_elements(image_source, prompt, temperature) Object detection with mandatory bbox When you need locations

All three return JSON. Claims below VISION_CONFIDENCE_THRESHOLD (default 0.6) are tagged "_flag": "存疑".

image_source accepts either a local file path or an http(s) URL.


File map

ZCodeProject/
├── vision-mcp/                ← uv project (this README lives here)
│   ├── main.py                ← the MCP server + CLI (engine + anti-hallucination)
│   ├── pyproject.toml         ← deps: mcp[cli], pyinstaller
│   ├── uv.lock                ← pinned versions (auto-generated)
│   ├── .python-version        ← 3.12 (uv auto-downloads it)
│   ├── .env.example           ← copy to .env and fill in
│   ├── vision-mcp.spec        ← auto-generated by PyInstaller (for rebuilds)
│   ├── .venv/                 ← created by `uv sync` (gitignored)
│   ├── build/                 ← PyInstaller intermediates (gitignored)
│   └── dist/
│       └── vision-mcp.exe     ← the standalone exe (built, gitignored)
└── .agents/                   ← skill(s) discovered by ZCode
    └── skills/vision-skill/
        └── SKILL.md           ← cross-verification workflow for the agent

Tuning

  • Still too much hallucination? Lower VISION_TEMPERATURE to 0.1 and raise VISION_CONFIDENCE_THRESHOLD to 0.7.
  • Missing real things (over-conservative)? Lower the threshold to 0.5 and raise temperature slightly to 0.3.
  • Model keeps breaking JSON? Some VLMs ignore schema instructions; in that case the tool returns "_parse_error": true with the raw text so you can post-process. Consider switching to a model with stronger JSON support.

Recommended Servers

playwright-mcp

playwright-mcp

A Model Context Protocol server that enables LLMs to interact with web pages through structured accessibility snapshots without requiring vision models or screenshots.

Official
Featured
TypeScript
Magic Component Platform (MCP)

Magic Component Platform (MCP)

An AI-powered tool that generates modern UI components from natural language descriptions, integrating with popular IDEs to streamline UI development workflow.

Official
Featured
Local
TypeScript
Audiense Insights MCP Server

Audiense Insights MCP Server

Enables interaction with Audiense Insights accounts via the Model Context Protocol, facilitating the extraction and analysis of marketing insights and audience data including demographics, behavior, and influencer engagement.

Official
Featured
Local
TypeScript
VeyraX MCP

VeyraX MCP

Single MCP tool to connect all your favorite tools: Gmail, Calendar and 40 more.

Official
Featured
Local
graphlit-mcp-server

graphlit-mcp-server

The Model Context Protocol (MCP) Server enables integration between MCP clients and the Graphlit service. Ingest anything from Slack to Gmail to podcast feeds, in addition to web crawling, into a Graphlit project - and then retrieve relevant contents from the MCP client.

Official
Featured
TypeScript
Kagi MCP Server

Kagi MCP Server

An MCP server that integrates Kagi search capabilities with Claude AI, enabling Claude to perform real-time web searches when answering questions that require up-to-date information.

Official
Featured
Python
E2B

E2B

Using MCP to run code via e2b.

Official
Featured
Neon Database

Neon Database

MCP server for interacting with Neon Management API and databases

Official
Featured
Exa Search

Exa Search

A Model Context Protocol (MCP) server lets AI assistants like Claude use the Exa AI Search API for web searches. This setup allows AI models to get real-time web information in a safe and controlled way.

Official
Featured
Qdrant Server

Qdrant Server

This repository is an example of how to create a MCP server for Qdrant, a vector search engine.

Official
Featured