loop-verify
An MCP server that performs independent verification of artifacts against criteria using a different AI model lineage (codex, OpenAI, or Gemini) to catch defects that same-family checks might miss.
Have a different AI grade the work your AI just did.
loop-verify is an independent checker for the self-verification loop — the part the loop-kit loop honestly admits it cannot do.
When an AI checks its own work, it shares its own blind spots. Free loop-kit checks
Claude's work with Claude (same family → the same misses slip through). loop-verify
hands the grading to a different model lineage (codex / GPT / Gemini), so defects the
self-check waves through get caught. The verdict format is identical to loop-kit's
validator, so it's a drop-in replacement for the same-family check.
Open source (MIT). Just a tool — no accounts, no metering, no billing.
What it does
- In: your frozen YES/NO acceptance criteria + the artifact to inspect (a diff or file contents).
- Out: a verdict with PASS/FAIL, each criterion OK/NG, any defects outside the criteria, and concrete fix_instructions. (Same contract as loop-kit's validator.)
- The point: the grader is a different model family from whoever wrote the work, so it doesn't share their blind spots. That independence is the whole value, and it's measurable (see the edge bench).
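To make the contract concrete, here is a minimal sketch of a verdict as plain Python data. The top-level keys follow the result shape shown under "Use it from Python"; the per-criterion keys (`criterion`, `status`), the sample values, and the helper `verdict_is_consistent` are assumptions for illustration, not loop-verify's actual schema.

```python
from typing import Any

# Hypothetical example verdict. Top-level keys mirror the documented result;
# the shape of each entry in "criteria" is an assumption.
example_verdict: dict[str, Any] = {
    "verdict": "FAIL",
    "passed": False,
    "criteria": [
        {"criterion": "Does the function reject empty input?", "status": "OK"},
        {"criterion": "Is every new branch covered by a test?", "status": "NG"},
    ],
    "defects_outside": ["Off-by-one in the retry loop"],
    "fix_instructions": "Add a test for the empty-input branch; fix the retry bound.",
}


def verdict_is_consistent(v: dict[str, Any]) -> bool:
    """Check that the flag matches the label and that a pass has no NG criterion."""
    flag_matches_label = v["passed"] == (v["verdict"] == "PASS")
    all_ok = all(c["status"] == "OK" for c in v["criteria"])
    # Assumed rule: a passing verdict implies every criterion is OK.
    return flag_matches_label and (all_ok or not v["passed"])
```

A check like this is only a sanity guard on the caller's side; the grading itself is done by the backend.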
Three ways to use it: as an MCP server, as a Python function, or via the edge bench.
Install
python3 -m venv ~/.venvs/loop-verify
~/.venvs/loop-verify/bin/pip install -r requirements.txt
Demo (one command, runs anywhere)
python demo/run_demo.py # deterministic, offline (mock backend)
python demo/run_demo.py --backend codex # the REAL edge (costs codex quota)
Exit code 0 iff the demo's invariants held, so it doubles as a smoke test. With
--backend codex it shows the independent checker catching planted defects a naive
same-family check misses.
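Because the demo signals success through its exit code, a wrapper script can treat it as a smoke test. The sketch below is one possible way to do that; `smoke_ok` is a hypothetical helper name, not part of loop-verify.

```python
import subprocess
import sys


def smoke_ok(cmd: list[str]) -> bool:
    """Run a command and report whether it exited with code 0."""
    return subprocess.run(cmd).returncode == 0
```

From the repository root, `smoke_ok([sys.executable, "demo/run_demo.py"])` would run the offline mock-backend demo and return True only if its invariants held.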
Run as an MCP server
# local (stdio), codex backend:
LOOP_VERIFY_BACKEND=codex ~/.venvs/loop-verify/bin/python -m loop_verify.server
# HTTP transport (binds 127.0.0.1:8000 by default; localhost-only Host check):
LOOP_VERIFY_BACKEND=codex ~/.venvs/loop-verify/bin/python -m loop_verify.server --transport http
# ...to serve other hosts, bind all interfaces and allow their Host header:
LOOP_VERIFY_HOST=0.0.0.0 LOOP_VERIFY_PORT=8000 LOOP_VERIFY_ALLOWED_HOSTS=myhost:8000 \
LOOP_VERIFY_BACKEND=codex ~/.venvs/loop-verify/bin/python -m loop_verify.server --transport http
# (LOOP_VERIFY_ALLOWED_HOSTS="*" disables the Host check; binding 0.0.0.0 alone also
# opens it. Host/port are read at startup — set them via env, not flags.)
# OpenAI backend (needs OPENAI_API_KEY + `pip install openai`):
OPENAI_API_KEY=... LOOP_VERIFY_BACKEND=openai \
~/.venvs/loop-verify/bin/python -m loop_verify.server
# Gemini backend (needs GEMINI_API_KEY + `pip install google-genai`):
GEMINI_API_KEY=... LOOP_VERIFY_BACKEND=gemini \
~/.venvs/loop-verify/bin/python -m loop_verify.server
Tools: independent_verify(criteria, artifact) and info(). The backend is selected by
LOOP_VERIFY_BACKEND: codex (default), openai, gemini, or mock. For the http transport,
set the bind address with LOOP_VERIFY_HOST / LOOP_VERIFY_PORT (read at startup).
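The backend selection described above can be sketched as a small lookup with a default. This is an illustration of the documented behavior, not the server's actual code: `resolve_backend`, the case/whitespace normalization, and the error on unknown names are assumptions.

```python
import os

# Backend names as listed in this README; "codex" is the documented default.
KNOWN_BACKENDS = ("codex", "openai", "gemini", "mock")


def resolve_backend(env=None) -> str:
    """Return the backend named by LOOP_VERIFY_BACKEND, defaulting to codex."""
    env = os.environ if env is None else env
    # Normalizing case and whitespace is an assumption, not documented behavior.
    name = env.get("LOOP_VERIFY_BACKEND", "codex").strip().lower()
    if name not in KNOWN_BACKENDS:
        raise ValueError(f"unknown backend {name!r}; expected one of {KNOWN_BACKENDS}")
    return name
```

Passing a dict instead of reading `os.environ` directly keeps the function easy to test.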
Verify the http transport is reachable end to end (boots a server, runs a real MCP client round-trip, no key needed):
python demo/http_smoke.py
Deploy (Docker)
The codex backend needs the codex CLI (not in the image), so a container uses a
key-based backend:
docker build -t loop-verify .
docker run --rm -p 8000:8000 \
-e LOOP_VERIFY_BACKEND=openai -e OPENAI_API_KEY=sk-... \
loop-verify
# MCP endpoint: http://localhost:8000/mcp
The image binds 0.0.0.0, so FastMCP's DNS-rebinding Host check is off by default
(the container accepts any Host header). To restrict it, add
-e LOOP_VERIFY_ALLOWED_HOSTS=myhost:8000.
Use it from Python
from loop_verify.service import run_independent_verify
result = run_independent_verify(criteria, artifact, backend="codex")
# -> {"verdict": "PASS"|"FAIL", "passed": bool, "criteria": [...],
# "defects_outside": [...], "fix_instructions": str, "checker": ..., "lineage": ...}
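A caller will usually want to turn that result dict into something readable. The helper below is a minimal sketch that relies only on the keys listed above; `format_report` is a hypothetical name, and it assumes `defects_outside` is a list of strings.

```python
def format_report(result: dict) -> str:
    """Render a verdict dict as a short plain-text report."""
    lines = [f"verdict: {result['verdict']}"]
    # Assumption: each outside defect is a plain string.
    for defect in result.get("defects_outside", []):
        lines.append(f"outside defect: {defect}")
    # Only show fix instructions when the check did not pass.
    if not result["passed"] and result.get("fix_instructions"):
        lines.append(f"fix: {result['fix_instructions']}")
    return "\n".join(lines)
```

Combined with the call above, `print(format_report(result))` followed by exiting non-zero when `result["passed"]` is false would let a script block on a FAIL.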
Does independence actually help? (the edge bench)
python bench/edge_bench.py --backend codex # independent checker -> GO/NO-GO
python bench/edge_bench.py --backend mock # naive/blind baseline -> typically NO-GO
The gap between an independent checker (catches planted defects) and a naive one (misses them) is the whole reason to use this. Exit code = the edge verdict, so it can gate CI.
Measured on the 9 bundled fixtures (4 clean, 5 buggy, covering diverse bug classes): the codex backend scored recall 1.0 and false-positive rate 0.0 → GO (every real bug flagged, every clean artifact passed), while the naive same-family baseline misses the planted bugs → NO-GO.
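For reference, recall and false-positive rate can be computed from per-fixture outcomes as in the sketch below. It uses the standard definitions (recall = flagged buggy / all buggy, false-positive rate = flagged clean / all clean); `edge_metrics` is a hypothetical helper, and the thresholds the bench uses to decide GO vs NO-GO are not stated here, so none are assumed.

```python
def edge_metrics(is_buggy: list[bool], flagged: list[bool]) -> tuple[float, float]:
    """Return (recall, false_positive_rate) for a checker over labeled fixtures.

    is_buggy[i] is True if fixture i contains a planted defect;
    flagged[i] is True if the checker returned FAIL for fixture i.
    """
    tp = sum(b and f for b, f in zip(is_buggy, flagged))
    fn = sum(b and not f for b, f in zip(is_buggy, flagged))
    fp = sum((not b) and f for b, f in zip(is_buggy, flagged))
    tn = sum((not b) and (not f) for b, f in zip(is_buggy, flagged))
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return recall, fpr


# Fixture mix from this README: 5 buggy, 4 clean.
labels = [True] * 5 + [False] * 4
```

A checker that flags exactly the buggy fixtures gives `edge_metrics(labels, labels) == (1.0, 0.0)`, while one that passes everything gives a recall of 0.0.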
Honest limits
- codex backend cost: the codex backend runs on the operator's personal ChatGPT Plus quota — fine for personal/local use, not for serving many users. Use the OpenAI backend with your own key for that.
- Independent ≠ ground truth: a different lineage reduces shared blind spots; it does not eliminate error.
- The edge is the point: if the bench ever shows the independent checker ≈ a naive one, there is no reason to use it — that is a NO-GO, reported honestly, not buried.