
UNITARES

Runtime governance for AI-agent fleets that continuously monitors agent health, confidence, and behavior through check-ins, and returns verdicts to enable self-correction before failures occur.


Catch an AI agent going off the rails — before anything breaks.

Runtime governance & self-telemetry for fleets of autonomous AI agents.<br/> UNITARES watches each agent while it works and tells you — and the agent itself — the moment one starts to drift, while it's still just numbers moving and not yet broken output.

Status: live. First public commit 2025-12-04 · 3.7M+ governance events in production · dogfooded.

One layer of the CIRWEL stack — runtime safety infrastructure for autonomous agents, after deployment. UNITARES is the governed fleet; Anima is its self-sensing edge counterpart. Full index ↗


Drift surfaces while the output still looks fine. Each agent is graded against its own baseline, so slow degradation shows up as integrity slipping and entropy rising before the work visibly breaks.
Confidence is checked against results. Self-reported confidence is scored against real evidence — tests, exit codes, tool output. An agent can inflate the number; it can't inflate its success rate.
Agents get a state signal they can act on. Each check-in returns one plain verdict — proceed, guide, pause, or reject — plus EISV state for finer policies. Humans can watch the same fleet through the optional dashboard.
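To make "confidence is checked against results" concrete, here is a minimal sketch that compares what an agent claims with what its evidence shows. It is an illustration only, not the UNITARES formula (How EISV is computed documents the deployed one); `CheckIn` and `calibration_gap` are hypothetical names, and each outcome is assumed to reduce to pass/fail (tests passed, exit code 0).

```python
from dataclasses import dataclass


@dataclass
class CheckIn:
    """One hypothetical check-in: claimed confidence plus a verifiable outcome."""
    confidence: float  # self-reported, in [0, 1]
    succeeded: bool    # from evidence, e.g. tests passed / exit code 0


def calibration_gap(history: list[CheckIn]) -> float:
    """Mean claimed confidence minus observed success rate.

    Positive values mean the agent claims more than its evidence supports.
    """
    if not history:
        return 0.0
    mean_conf = sum(c.confidence for c in history) / len(history)
    success_rate = sum(c.succeeded for c in history) / len(history)
    return mean_conf - success_rate


# Confidence stays high while the evidence says one success in four.
history = [CheckIn(0.9, True), CheckIn(0.9, False), CheckIn(0.95, False), CheckIn(0.9, False)]
gap = calibration_gap(history)
print(gap)
```

The agent controls `confidence`, but `succeeded` comes from evidence, so inflating the claim only widens the gap.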

Use UNITARES if

you run autonomous or semi-autonomous coding, research, operations, or resident agents;
you want mid-run health signals, not only pre-deploy evals or post-hoc logs;
you need agents to check their own state before continuing; and
you want an audit trail of confidence, evidence, drift, and recovery.

UNITARES is not an output validator, sandbox, or hosted agent platform. It is the runtime state layer between evals and guardrails.

Try it in 60 seconds

git clone https://github.com/cirwel/unitares.git && cd unitares
docker compose up -d --wait && make demo

make demo drives a synthetic agent through seven check-ins — clean work, then confidence drifting from results, then confusion — and prints the verdict at each step. Then point any MCP client at http://localhost:8767/mcp/.

For a human operator view, open the optional dashboard at http://localhost:8767/dashboard. Dashboard implementation details live in dashboard/README.md; public deployment screenshots live in docs/PRODUCTION_SNAPSHOT.md.

Running continuously since November 2025 · 3.7M+ governance events under sustained load · dogfooded — the agents building UNITARES run under it. Every number is verifiable on a fresh clone. (Production snapshot →)

Where it fits

UNITARES runs alongside your evals and guardrails — it doesn't replace either. They answer different questions at different times:

Layer	Question it answers	When it acts
Evals	Is this model good enough to ship?	before deploy
Guardrails	Is this action allowed right now?	per action
UNITARES	Is this agent still healthy as it works?	continuously, mid-run
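A rough sketch of where the two runtime layers sit in an agent loop. The callables `guardrail_allows`, `run_action`, and `check_in` are hypothetical placeholders, and evals are left out because they run before deploy, outside the loop. The point is timing: the guardrail is consulted per action, the governance check-in once per unit of work.

```python
from typing import Callable


def run_unit_of_work(
    actions: list[str],
    guardrail_allows: Callable[[str], bool],
    run_action: Callable[[str], bool],
    check_in: Callable[[float], str],
) -> str:
    """Run one unit of work and return the governance verdict for it."""
    successes = 0
    for action in actions:
        if not guardrail_allows(action):  # guardrail: gate every single action
            continue                      # blocked actions count as not completed
        if run_action(action):
            successes += 1
    success_rate = successes / len(actions) if actions else 0.0
    return check_in(success_rate)         # governance: one check-in per unit


verdict = run_unit_of_work(
    ["read_file", "delete_repo", "write_file"],
    guardrail_allows=lambda action: action != "delete_repo",
    run_action=lambda action: True,
    check_in=lambda rate: "proceed" if rate >= 0.5 else "guide",
)
print(verdict)
```

A real check-in carries confidence and evidence rather than a single success rate; the toy `check_in` only shows where in the loop the call happens.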

How it works

After each unit of work, the agent checks in with sync_state() — passing its self-reported confidence plus verifiable evidence (test results, exit codes, tool output). It gets back one plain verdict:

proceed · guide · pause · reject


That's the whole contract: the agent reads the verdict and course-corrects before an external guardrail has to fire. No new vocabulary required to use it.
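One way to wire the four verdicts into an agent, as a sketch. The `Verdict` enum and `next_step` helper are hypothetical, and the behavior attached to `guide` is our reading of the verdict name rather than documented semantics. `pause` and `reject` both stop autonomous work, matching the integration example in this README, and anything unrecognized fails safe.

```python
from enum import Enum


class Verdict(str, Enum):
    PROCEED = "proceed"
    GUIDE = "guide"
    PAUSE = "pause"
    REJECT = "reject"


def next_step(result: dict) -> str:
    """Map a check-in result to what the agent does next (illustrative policy)."""
    raw = (result.get("verdict") or {}).get("value")
    try:
        verdict = Verdict(raw)
    except ValueError:
        # Missing or unrecognized verdict: stop rather than keep going blind.
        return "human_review"
    if verdict is Verdict.PROCEED:
        return "continue"
    if verdict is Verdict.GUIDE:
        return "continue_with_adjustments"
    # PAUSE and REJECT both end autonomous work for now.
    return "human_review"


print(next_step({"verdict": {"value": "guide"}}))
```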

<details> <summary><strong>The four numbers behind the verdict (EISV)</strong></summary>

<br/>

Want to act on why, not just the verdict? Each check-in also returns four scores per agent, each graded against that agent's own ~30-check-in baseline — so slow drift surfaces even while output still looks fine:

Score	What it asks	Goes wrong when…
E · Energy	is the work advancing?	thrashing, retries, no progress
I · Integrity	do claims match results?	high confidence, low actual success
S · Entropy	drifting from its own normal?	erratic, divergent behavior
V · Valence	derived: energy vs integrity	motion without coherence (or vice-versa)

</details>

How EISV is computed · Architecture · Who it's for & threat model


Integrate in two calls

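# Inside the agent's loop. `sync_state` is the check-in call (via your MCP client
# or the SDK); `output` and `agent` stand in for your own agent's objects.
result = sync_state(response_text=output, complexity=0.6, confidence=0.8)
verdict = result.get("verdict", {}).get("value")   # proceed / guide / pause / reject

if verdict in ("pause", "reject"):
    # Hand the verdict's next_action (if any) to whoever reviews the agent
    agent.require_human_review(result["verdict"].get("next_action"))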

The agent reads the verdict and acts — that's the whole loop. UNITARES isn't an output validator or a sandbox; it's a state layer the agent itself can read, before external controls fire.

<details> <summary><strong>Finer control: branch on the EISV components</strong></summary>

<br/>

For per-dimension policies, read the four scores instead of the single verdict:

eisv = result.get("raw_governance", result).get("primary_eisv", {})

if eisv.get("I", 1) < 0.4:
    agent.require_human_review("integrity low — pausing autonomous actions")
elif eisv.get("S", 0) > 0.7:
    agent.narrow_scope()        # fewer tools, tighter search
elif eisv.get("E", 1) < 0.2:
    agent.stop_and_summarize()  # avoid thrashing
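
The same branching as a pure function that returns an action name, instead of calling `agent` methods directly, is easy to unit-test. A minimal sketch: `eisv_policy` and the returned strings are hypothetical, while the lookup, thresholds, and `.get()` defaults mirror the snippet above.

```python
def eisv_policy(result: dict) -> str:
    """Pick an action from the EISV scores in a sync_state result (illustrative)."""
    # Same lookup as above: try raw_governance first, else the top-level result.
    eisv = result.get("raw_governance", result).get("primary_eisv", {})
    if eisv.get("I", 1) < 0.4:
        return "require_human_review"  # claims no longer match results
    if eisv.get("S", 0) > 0.7:
        return "narrow_scope"          # drifting from its own normal
    if eisv.get("E", 1) < 0.2:
        return "stop_and_summarize"    # thrashing without progress
    return "continue"


print(eisv_policy({"raw_governance": {"primary_eisv": {"I": 0.9, "S": 0.8, "E": 0.5}}}))
```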

</details>

For long-running or scheduled agents, the SDK handles connection, identity, check-ins, and heartbeats. (Getting started · MCP client config)

Don't trust this README — verify it

Evaluating with an agent? Don't take the prose. On a fresh clone, the falsifiability harness scores EISV against a deliberately dumb baseline (AUC, Brier) and self-labels each slice INCONCLUSIVE / SKEPTICAL / WEAK SIGNAL / KEEP TESTING rather than asserting. The harness is the part you run yourself.
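For readers new to the two metrics the harness reports, here is a self-contained sketch of both on made-up toy data (not harness output; the function names are ours). AUC is the probability that a randomly chosen positive case scores above a randomly chosen negative one, with ties counting half; the Brier score is the mean squared error between predicted probabilities and 0/1 outcomes, where lower is better.

```python
def auc(scores: list[float], labels: list[int]) -> float:
    """ROC AUC by pairwise comparison: P(score_pos > score_neg), ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def brier(probs: list[float], labels: list[int]) -> float:
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(labels)


# Toy data: did a unit of work fail (1) or not (0)?
labels = [1, 1, 0, 0, 0]
signal = [0.9, 0.6, 0.7, 0.2, 0.1]  # the signal under test
dumb = [0.4] * 5                    # constant base-rate baseline
print(auc(signal, labels), auc(dumb, labels))
print(brier(signal, labels), brier(dumb, labels))
```

A constant predictor always lands at AUC 0.5, so a signal only earns credit to the extent it beats that floor and the baseline's Brier score.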

Honest about what fires. Verdicts come from an auditable behavioral model (behavioral_assessment.py), not a black box — the information-theoretic / free-energy formulation is the research target, not the live verdict path (Paper v6 · how EISV is computed).

Human evaluators start with the Reviewer Guide.

Stack & setup

Python 3.12+ · PostgreSQL + AGE + pgvector · Redis (optional). Transports: MCP on /mcp/ (Streamable HTTP) · REST on /v1/tools/call · Dashboard on /dashboard.

<details> <summary><strong>Alternate ports, bare-metal, and thin clients</strong></summary>

If 5432, 6379, or 8767 is already allocated, pick alternate host ports:

POSTGRES_HOST_PORT=15432 REDIS_HOST_PORT=16379 GOVERNANCE_HOST_PORT=18767 docker compose up -d --wait
UNITARES_DEMO_PORT=18767 make demo

Bare-metal (lower overhead, what the maintainer runs in production): PostgreSQL 16+ with Apache AGE + pgvector compiled and installed (examples use PG 17), Redis optional.

pip install -r requirements-full.txt
export DB_BACKEND=postgres
export DB_POSTGRES_URL=postgresql://postgres:postgres@localhost:5432/governance
export DB_AGE_GRAPH=governance_graph
export UNITARES_KNOWLEDGE_BACKEND=age
python src/mcp_server.py --port 8767

requirements-full.txt is the default (server, tests, handler dev); requirements-core.txt is a 2-package subset (mcp + numpy) for thin stdio/proxy clients. DB bring-up: db/postgres/README.md. Run signal-only without the math model: export UNITARES_DISABLE_ODE=1. Full port map: docs/operations/DEFINITIVE_PORTS.md.

</details>

Documentation

Guide	Purpose
Getting Started	Setup, workflows, tool modes
How EISV is computed	Deployed formulas vs. target semantics
Reviewer Guide	Cold-evaluator path + falsifiability harness
Scope & threat model	Who it's for, why agents can't game it, what's unproven
Architecture	Pipeline, verdicts, recovery, storage
Glossary	Terms keyed by the question they answer — published at cirwel.github.io/unitares
Production snapshot	Live metrics + dashboard views
MCP Clients	Cursor / Claude Code / Claude Desktop config
Troubleshooting	Common issues
Changelog	Releases

Three files at the repo root — CLAUDE.md, AGENTS.md, CODEX_START.md — orient AI CLIs (Claude Code, Codex). Human readers can skip them.

The CIRWEL stack

UNITARES is the governance runtime at the center of a larger body of work. The full index — papers, systems, datasets, and decks — lives at cirwel.github.io.

Repository	What it is
unitares-governance-plugin	Mount any agent into governance — Claude Code / Codex plugin that wires check-ins, dialectic review, and verdicts into the loop via hooks
unitares-host-adapter	Thin client bindings — Hermes, Claude Code, Goose, and arbitrary OpenAI-compatible clients
anima-mcp	The self-sensing edge counterpart — the same EISV model on physical hardware; the longitudinal source cited in the papers
fermata	Governed-effect runtime seed — agents propose effects; only governed effects commit
unitares-discord-bridge	Governance events, agent presence, and system health as a live Discord server
eisv-lumen	Governance benchmark dataset — 32,181 labeled EISV trajectories (20,655 real)
unitares-paper-v6	Companion paper — Information-Theoretic Governance of Heterogeneous Agent Fleets (Wang, 2026); concept DOI 10.5281/zenodo.19647159

Citation

Kenny Wang (ORCID 0009-0006-7544-2374), CIRWEL Systems. If you build on this work, please cite — see CITATION.cff.

@misc{wang2026unitares,
  author       = {Wang, Kenny},
  title        = {{UNITARES}: Information-Theoretic Governance of Heterogeneous Agent Fleets},
  year         = {2026},
  doi          = {10.5281/zenodo.19647159},
  url          = {https://doi.org/10.5281/zenodo.19647159},
  note         = {Concept DOI; resolves to latest version. ORCID: 0009-0006-7544-2374}
}

Apache License 2.0 — see LICENSE and NOTICE.<br/> Built by @cirwel · CIRWEL Systems
