loop-verify

An MCP server that performs independent verification of artifacts against criteria using a different AI model lineage (codex, OpenAI, or Gemini) to catch defects that same-family checks might miss.

Have a different AI grade the work your AI just did.

loop-verify is an independent checker for the self-verification loop: the one step that loop-kit itself admits it cannot do.

When an AI checks its own work, the checker shares the author's blind spots. The free loop-kit checks Claude's work with Claude (same family → the same misses slip through). loop-verify hands the grading to a different model lineage (codex / GPT / Gemini), so defects that the self-check waves through get caught. The verdict format is identical to loop-kit's validator, so it is a drop-in replacement for the same-family check.

Open source (MIT). Just a tool — no accounts, no metering, no billing.

What it does

  • In: your frozen YES/NO acceptance criteria + the artifact to inspect (a diff or file contents).
  • Out: a verdict — PASS/FAIL, each criterion OK/NG, any defects outside the criteria, and concrete fix_instructions. (Same contract as loop-kit's validator.)
  • The point: the grader is a different model family from whoever wrote the work, so it doesn't share their blind spots. That independence is the whole value — and it's measurable (see the edge bench).
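
A minimal sketch of preparing the two inputs listed above. It assumes the criteria are passed as plain text with one YES/NO question per line and the artifact as a unified diff; the README does not spell out the exact criteria format, so build_inputs and the sample questions are hypothetical. The diff itself comes from the standard library's difflib.

```python
import difflib


def build_inputs(questions, old_text, new_text, filename="example.py"):
    """Return (criteria, artifact) strings for an independent check.

    Assumption: criteria are plain text, one numbered YES/NO question per line.
    """
    criteria = "\n".join(f"{i}. {q}" for i, q in enumerate(questions, 1))
    diff_lines = difflib.unified_diff(
        old_text.splitlines(keepends=True),
        new_text.splitlines(keepends=True),
        fromfile=f"a/{filename}",
        tofile=f"b/{filename}",
    )
    return criteria, "".join(diff_lines)


criteria, artifact = build_inputs(
    [
        "Does divide() raise ValueError when b is 0?",
        "Is the public signature of divide() unchanged?",
    ],
    "def divide(a, b):\n    return a / b\n",
    "def divide(a, b):\n"
    "    if b == 0:\n"
    "        raise ValueError('b must be non-zero')\n"
    "    return a / b\n",
)
print(criteria)
print(artifact)
```

Writing the criteria before the work is graded, and not editing them afterwards, is what keeps them "frozen".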

Three ways to use it: as an MCP server, as a Python function, or via the edge bench.

Install

python3 -m venv ~/.venvs/loop-verify
~/.venvs/loop-verify/bin/pip install -r requirements.txt

Demo (one command, runs anywhere)

python demo/run_demo.py                  # deterministic, offline (mock backend)
python demo/run_demo.py --backend codex  # the REAL edge (costs codex quota)

The exit code is 0 if and only if the demo's invariants held, so the demo doubles as a smoke test. With --backend codex, it shows the independent checker catching planted defects that a naive same-family check misses.
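
Because the result is carried by the exit code, a wrapper script only needs to compare it with 0. The sketch below shows one way to do that with subprocess; invariants_held is a hypothetical helper name, and stand-in commands are used so the snippet runs outside a checkout of the repository.

```python
import subprocess
import sys


def invariants_held(command):
    """Run a command and report whether it exited with code 0."""
    completed = subprocess.run(command, check=False)
    return completed.returncode == 0


# In a checkout of the repository the call would look like:
#   invariants_held([sys.executable, "demo/run_demo.py"])
# Stand-in commands keep this snippet runnable anywhere.
ok = invariants_held([sys.executable, "-c", "raise SystemExit(0)"])
failed = invariants_held([sys.executable, "-c", "raise SystemExit(1)"])
print(ok, failed)
```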

Run as an MCP server

# local (stdio), codex backend:
LOOP_VERIFY_BACKEND=codex ~/.venvs/loop-verify/bin/python -m loop_verify.server

# HTTP transport (binds 127.0.0.1:8000 by default; localhost-only Host check):
LOOP_VERIFY_BACKEND=codex ~/.venvs/loop-verify/bin/python -m loop_verify.server --transport http

# ...to serve other hosts, bind all interfaces and allow their Host header:
LOOP_VERIFY_HOST=0.0.0.0 LOOP_VERIFY_PORT=8000 LOOP_VERIFY_ALLOWED_HOSTS=myhost:8000 \
LOOP_VERIFY_BACKEND=codex ~/.venvs/loop-verify/bin/python -m loop_verify.server --transport http
# (LOOP_VERIFY_ALLOWED_HOSTS="*" disables the Host check; binding 0.0.0.0 alone also
#  opens it. Host/port are read at startup — set them via env, not flags.)

# OpenAI backend (needs OPENAI_API_KEY + `pip install openai`):
OPENAI_API_KEY=... LOOP_VERIFY_BACKEND=openai \
  ~/.venvs/loop-verify/bin/python -m loop_verify.server

# Gemini backend (needs GEMINI_API_KEY + `pip install google-genai`):
GEMINI_API_KEY=... LOOP_VERIFY_BACKEND=gemini \
  ~/.venvs/loop-verify/bin/python -m loop_verify.server

Tools: independent_verify(criteria, artifact) and info(). The backend is selected by LOOP_VERIFY_BACKEND (codex, the default; openai; gemini; or mock). For the http transport, set the bind address with LOOP_VERIFY_HOST / LOOP_VERIFY_PORT (both read at startup).
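
Many MCP clients register a stdio server through a JSON entry with command, args, and env fields. The sketch below generates such an entry for the stdio command shown above. The "mcpServers" key and the "loop-verify" entry name follow a common client convention and are assumptions about your client, not something this README specifies; it also assumes the loop_verify package is importable from wherever the client launches the process.

```python
import json
import os


def client_config(backend="codex"):
    """Build a client-side entry that launches the server over stdio."""
    # Expand "~" here because clients usually do not run the command in a shell.
    python_bin = os.path.expanduser("~/.venvs/loop-verify/bin/python")
    return {
        "mcpServers": {  # assumed key; check your client's documentation
            "loop-verify": {
                "command": python_bin,
                "args": ["-m", "loop_verify.server"],
                "env": {"LOOP_VERIFY_BACKEND": backend},
            }
        }
    }


config = client_config()
print(json.dumps(config, indent=2))
```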

Verify the http transport is reachable end to end (boots a server, runs a real MCP client round-trip, no key needed):

python demo/http_smoke.py

Deploy (Docker)

The codex backend needs the codex CLI (not in the image), so a container uses a key-based backend:

docker build -t loop-verify .
docker run --rm -p 8000:8000 \
  -e LOOP_VERIFY_BACKEND=openai -e OPENAI_API_KEY=sk-... \
  loop-verify
# MCP endpoint: http://localhost:8000/mcp

The image binds 0.0.0.0, so FastMCP's DNS-rebinding Host check is off by default (the container accepts any Host header). To restrict it, add -e LOOP_VERIFY_ALLOWED_HOSTS=myhost:8000.
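
The Host-check rules described in this README can be restated as a small decision function. This is only a summary of the documented behavior, not the server's or FastMCP's code; host_check is a hypothetical name, and treating LOOP_VERIFY_ALLOWED_HOSTS as a comma-separated list is an assumption (the README only shows a single value).

```python
def host_check(bind_host, allowed_hosts=None):
    """Describe which Host headers are accepted, per the README's rules."""
    if allowed_hosts == "*":
        return "any"  # explicit wildcard disables the check
    if allowed_hosts:
        # Assumption: several hosts may be given, separated by commas.
        return [h.strip() for h in allowed_hosts.split(",") if h.strip()]
    if bind_host == "0.0.0.0":
        return "any"  # binding all interfaces alone also opens it
    return "localhost-only"  # default when bound to 127.0.0.1


print(host_check("127.0.0.1"))
print(host_check("0.0.0.0"))
print(host_check("0.0.0.0", "myhost:8000"))
```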

Use it from Python

from loop_verify.service import run_independent_verify

result = run_independent_verify(criteria, artifact, backend="codex")
# -> {"verdict": "PASS"|"FAIL", "passed": bool, "criteria": [...],
#     "defects_outside": [...], "fix_instructions": str, "checker": ..., "lineage": ...}
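
A sketch of consuming the returned dictionary. The top-level keys are the ones listed above; the shape of each entry in "criteria" is not documented here, so the "criterion" and "result" field names below are assumptions, and the sample verdict is made up for illustration instead of coming from a real run.

```python
def summarize(result):
    """Render a verdict dictionary as a short human-readable report."""
    lines = [f"verdict: {result['verdict']} (checker: {result.get('checker', '?')})"]
    for item in result.get("criteria", []):
        # Assumed per-criterion shape: {"criterion": str, "result": "OK" | "NG"}
        if item.get("result") == "NG":
            lines.append(f"  NG: {item.get('criterion')}")
    for defect in result.get("defects_outside", []):
        lines.append(f"  outside criteria: {defect}")
    if not result["passed"] and result.get("fix_instructions"):
        lines.append(f"  fix: {result['fix_instructions']}")
    return "\n".join(lines)


# Illustrative verdict only; not the output of an actual check.
sample = {
    "verdict": "FAIL",
    "passed": False,
    "criteria": [
        {"criterion": "divide() raises ValueError when b is 0", "result": "NG"},
        {"criterion": "public signature unchanged", "result": "OK"},
    ],
    "defects_outside": ["unused import left in module"],
    "fix_instructions": "Add a zero check before dividing.",
    "checker": "codex",
    "lineage": "openai",
}
report = summarize(sample)
print(report)
```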

Does independence actually help? (the edge bench)

python bench/edge_bench.py --backend codex   # independent checker -> GO/NO-GO
python bench/edge_bench.py --backend mock     # naive/blind baseline -> typically NO-GO

The gap between an independent checker (which catches planted defects) and a naive one (which misses them) is the whole reason to use this. The exit code reflects the edge verdict, so the bench can gate CI.

Measured on the bundled 9 fixtures (4 clean / 5 buggy, diverse bug classes): the codex backend scored recall 1.0 and false-positive rate 0.0 → GO (every real bug flagged, every clean artifact passed), while the naive same-family baseline misses the planted bugs → NO-GO.
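
The two metrics follow the usual definitions: recall = flagged buggy fixtures / all buggy fixtures, and false-positive rate = flagged clean fixtures / all clean fixtures. The sketch below works through the arithmetic for a 5 buggy / 4 clean split. edge_metrics is a hypothetical helper, not the bench's code, and the "blind" outcome list is an illustrative checker that passes everything, not measured data.

```python
def edge_metrics(outcomes):
    """Compute (recall, false_positive_rate).

    outcomes: list of (is_buggy, flagged) pairs, one per fixture.
    """
    tp = sum(1 for buggy, flagged in outcomes if buggy and flagged)
    fn = sum(1 for buggy, flagged in outcomes if buggy and not flagged)
    fp = sum(1 for buggy, flagged in outcomes if not buggy and flagged)
    tn = sum(1 for buggy, flagged in outcomes if not buggy and not flagged)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return recall, fpr


# Every buggy fixture flagged, every clean fixture passed: 5/5 and 0/4.
perfect = [(True, True)] * 5 + [(False, False)] * 4
# Hypothetical checker that passes everything: 0/5 and 0/4.
blind = [(True, False)] * 5 + [(False, False)] * 4

print(edge_metrics(perfect))
print(edge_metrics(blind))
```

A checker that passes everything also has a false-positive rate of 0.0, which is why recall has to be read alongside it.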

Honest limits

  • codex backend cost: the codex backend runs on the operator's personal ChatGPT Plus quota — fine for personal/local use, not for serving many users. Use the OpenAI backend with your own key for that.
  • Independent ≠ ground truth: a different lineage reduces shared blind spots; it does not eliminate error.
  • The edge is the point: if the bench ever shows the independent checker ≈ a naive one, there is no reason to use it — that is a NO-GO, reported honestly, not buried.
