goal-engine
Provides persistent goal-tracking with external evaluation for agentic CLIs, enabling run-until-done loops where an agent works across turns until a condition is met.
Run-until-done /goal loops for every major agentic CLI — Claude Code, Codex CLI, OpenCode, Cursor, and any MCP-compatible agent.
You state a completion condition ("all tests pass", "the PR is ready"); the agent keeps working across turns until an external evaluator confirms the condition is verifiably met — or until a loop guard or turn budget stops a runaway session.
Architecture: three composable layers
┌─────────────────────────────────────────────────────────┐
│ Layer 3 — npm package + installer CLI │
│ npx -y goal-engine · goal-engine install --all │
├─────────────────────────────────────────────────────────┤
│ Layer 2 — MCP server (goal-engine) │
│ set_goal / check_goal / get_status / clear_goal │
│ Evaluator via MCP sampling (no external API key) │
│ SQLite state · loop guard · turn budget │
├─────────────────────────────────────────────────────────┤
│ Layer 1 — portable SKILL.md │
│ Works standalone on any Agent Skills runtime │
└─────────────────────────────────────────────────────────┘
Each layer works on its own. The skill alone gives you self-checked goal loops anywhere; adding the MCP server upgrades the self-check to an independent evaluator with persistent state.
Why an external evaluator?
A skill-only goal loop asks the model to grade its own work inside the same context window — a confused agent can convince itself the goal is met. The MCP spec's sampling primitive lets this server request a completion from the connected client's own model in a fresh context, with a strict evaluation prompt. No API key, no extra provider, CLI-agnostic.
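Here is a minimal sketch of the sampling round-trip: the JSON-RPC request a server can send to its client. The method name sampling/createMessage and the messages, systemPrompt, includeContext, and maxTokens fields come from the MCP specification. The buildEvaluationRequest helper, the prompt wording, and the JSON verdict format are hypothetical and not taken from goal-engine's source. Setting includeContext to "none" is one assumed way to get a fresh context. The client still decides what actually reaches its model.

```typescript
// Hypothetical helper: builds the MCP sampling request a server could send
// so the client's own model judges whether a goal is met.
interface SamplingMessage {
  role: "user" | "assistant";
  content: { type: "text"; text: string };
}

interface CreateMessageRequest {
  jsonrpc: "2.0";
  id: number;
  method: "sampling/createMessage";
  params: {
    messages: SamplingMessage[];
    systemPrompt: string;
    includeContext: "none" | "thisServer" | "allServers";
    maxTokens: number;
  };
}

// Assumed evaluator instructions; goal-engine's real prompt is not shown in the README.
const EVAL_SYSTEM_PROMPT =
  "You are a strict evaluator. Reply with JSON only: " +
  '{"done": boolean, "reason": string}. ' +
  "Answer done=true only if the summary shows the condition is verifiably met.";

function buildEvaluationRequest(
  id: number,
  goal: string,
  summary: string,
): CreateMessageRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "sampling/createMessage",
    params: {
      messages: [
        {
          role: "user",
          content: { type: "text", text: `Goal: ${goal}\n\nWork summary:\n${summary}` },
        },
      ],
      systemPrompt: EVAL_SYSTEM_PROMPT,
      // Ask the client not to attach the agent's own conversation, so the
      // judge sees only the goal and the summary.
      includeContext: "none",
      maxTokens: 300,
    },
  };
}
```

Keeping the verdict format to a single JSON object makes the reply easy to reject outright when the model answers in any other shape.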
Evaluator fallback chain (strongest available wins):
- MCP sampling: the client's model judges the transcript (zero config)
- Anthropic API: set ANTHROPIC_API_KEY (model: claude-opus-4-8, override with GOAL_ENGINE_EVAL_MODEL)
- OpenAI API: set OPENAI_API_KEY (model: gpt-4o, override with GOAL_ENGINE_OPENAI_MODEL)
- Self-check: the tool returns strict self-verification instructions and never auto-completes
A flaky evaluator can never end a goal early: every evaluator failure resolves to done: false.
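The fallback order and the fail-closed rule can be sketched as follows. This illustrates the idea under stated assumptions and is not goal-engine's evaluator.ts. The Evaluator interface, the JSON verdict format {"done": boolean, "reason": string}, and the reason messages are all hypothetical. The sketch picks the first available evaluator and turns any thrown error or unparseable reply into done: false. The README does not say whether a failing evaluator falls through to the next one, so this version does not fall through.

```typescript
// Hypothetical sketch of an evaluator fallback chain with fail-closed semantics.
interface Verdict {
  done: boolean;
  reason?: string;
  evaluator: string;
}

interface Evaluator {
  name: string;
  available(): boolean; // e.g. client supports sampling, or an API key is set
  judge(goal: string, summary: string): Promise<string>; // raw model reply
}

// Strict parsing: anything other than a JSON object with a boolean "done" is rejected.
function parseVerdict(raw: string): { done: boolean; reason?: string } | null {
  try {
    const parsed: unknown = JSON.parse(raw);
    if (typeof parsed !== "object" || parsed === null) return null;
    const obj = parsed as Record<string, unknown>;
    if (typeof obj.done !== "boolean") return null;
    const reason = typeof obj.reason === "string" ? obj.reason : undefined;
    return { done: obj.done, reason };
  } catch {
    return null; // not JSON at all
  }
}

async function evaluate(chain: Evaluator[], goal: string, summary: string): Promise<Verdict> {
  // "Strongest available wins": the chain is ordered strongest first.
  const evaluator = chain.find((e) => e.available());
  if (!evaluator) {
    // Self-check mode: return instructions, never auto-complete.
    return { done: false, reason: "Self-verify against the goal, then report.", evaluator: "self-check" };
  }
  try {
    const verdict = parseVerdict(await evaluator.judge(goal, summary));
    if (verdict === null) {
      return { done: false, reason: "Evaluator verdict was unparseable.", evaluator: evaluator.name };
    }
    return { ...verdict, evaluator: evaluator.name };
  } catch {
    // Network errors, timeouts, refusals: fail closed.
    return { done: false, reason: "Evaluator failed; keep working.", evaluator: evaluator.name };
  }
}
```

The asymmetry is deliberate. A wrong done: false costs one more turn, while a wrong done: true silently ends the task.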
Install
# Install the /goal skill into every detected agent CLI
npx -y goal-engine install --all
# Or a specific one
npx -y goal-engine install --to claude-code # also: codex, opencode, cursor
Then connect the MCP server:
Claude Code
claude mcp add goal-engine -- npx -y goal-engine
Codex CLI (~/.codex/config.toml)
[mcp_servers.goal-engine]
command = "npx"
args = ["-y", "goal-engine"]
OpenCode (~/.config/opencode/config.json)
{ "mcp": { "goal-engine": { "type": "local", "command": ["npx", "-y", "goal-engine"] } } }
Bun users can substitute bunx goal-engine for npx -y goal-engine everywhere; the server auto-selects bun:sqlite, node:sqlite, or a JSON file for state.
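The storage choice reduces to a small priority check. In this hypothetical sketch the runtime probe is passed in. On a real system the probe might check whether a global Bun object exists and whether importing node:sqlite succeeds on Node. The priority order is assumed from the order in which the README lists the backends.

```typescript
// Hypothetical sketch of runtime-based state backend selection.
type Backend = "bun:sqlite" | "node:sqlite" | "json";

interface RuntimeProbe {
  isBun: boolean;         // e.g. a global Bun object is present
  hasNodeSqlite: boolean; // e.g. a dynamic import of node:sqlite succeeded
}

function selectBackend(probe: RuntimeProbe): Backend {
  if (probe.isBun) return "bun:sqlite";
  if (probe.hasNodeSqlite) return "node:sqlite";
  return "json"; // portable fallback: a plain JSON file
}
```

Keeping the probe separate from the decision makes the rule testable on any runtime, including ones where neither SQLite module loads.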
Usage
/goal all unit tests pass and lint is clean
The agent then:
- calls set_goal with the condition verbatim,
- works toward it with all available tools,
- calls check_goal at the end of every turn with a concrete work summary,
- treats each done: false reason as its next instruction,
- stops only on done: true (or escalates on loop_detected / budget_exhausted).
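The agent-side loop described by these steps looks roughly like this. GoalTools, doTurn, and runGoalLoop are hypothetical names. The sketch also assumes that the loop_detected and budget_exhausted outcomes arrive as text inside reason. The README names these outcomes but not the exact field that carries them.

```typescript
// Hypothetical client-side view of the /goal loop.
interface CheckResult {
  done: boolean;
  reason?: string;
  evaluator: string;
  turns_used: number;
  max_turns: number;
}

interface GoalTools {
  set_goal(args: { goal: string; max_turns?: number }): Promise<{ session_id: string }>;
  check_goal(args: { session_id: string; summary: string }): Promise<CheckResult>;
}

async function runGoalLoop(
  tools: GoalTools,
  goal: string,
  doTurn: (instruction: string) => Promise<string>, // does work, returns a concrete summary
): Promise<"done" | "escalate"> {
  const { session_id } = await tools.set_goal({ goal }); // condition passed verbatim
  let instruction = goal;
  for (;;) {
    const summary = await doTurn(instruction);
    const result = await tools.check_goal({ session_id, summary });
    if (result.done) return "done";
    const reason = result.reason ?? "";
    // Assumed encoding of the safety-rail outcomes (see lead-in).
    if (reason.includes("loop_detected") || reason.includes("budget_exhausted")) {
      return "escalate"; // hand control back to the user
    }
    instruction = reason; // each done: false reason becomes the next instruction
  }
}
```

The loop never counts turns itself. It relies on check_goal to enforce the budget, which is why the server-side guards matter.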
MCP tools
| Tool | Input | Output |
|---|---|---|
| set_goal | goal, session_id?, max_turns? (default 40) | session_id, goal, max_turns, created_at |
| check_goal | session_id, summary | done, reason?, evaluator, turns_used, max_turns |
| get_status | session_id? (defaults to latest active) | goal, status, turns_used, elapsed_ms, last_check |
| clear_goal | session_id, completed? | cleared, final_status |
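The table maps onto TypeScript types roughly as shown below. A client invokes any of these tools with a standard MCP tools/call request, whose params carry name and arguments. The field names come from the table. The field types and the toolCall helper are assumptions, because the README lists names only.

```typescript
// Shapes transcribed from the MCP tools table; the types are assumptions.
interface SetGoalInput { goal: string; session_id?: string; max_turns?: number }
interface SetGoalOutput { session_id: string; goal: string; max_turns: number; created_at: string } // created_at assumed ISO-8601
interface CheckGoalInput { session_id: string; summary: string }
interface CheckGoalOutput { done: boolean; reason?: string; evaluator: string; turns_used: number; max_turns: number }
interface GetStatusInput { session_id?: string } // omitted: latest active goal
interface GetStatusOutput { goal: string; status: string; turns_used: number; elapsed_ms: number; last_check: unknown } // last_check shape not documented
interface ClearGoalInput { session_id: string; completed?: boolean }
interface ClearGoalOutput { cleared: boolean; final_status: string }

interface ToolInputs {
  set_goal: SetGoalInput;
  check_goal: CheckGoalInput;
  get_status: GetStatusInput;
  clear_goal: ClearGoalInput;
}

// Hypothetical helper: builds a standard MCP tools/call request for one goal tool.
function toolCall<K extends keyof ToolInputs>(id: number, name: K, args: ToolInputs[K]) {
  return {
    jsonrpc: "2.0" as const,
    id,
    method: "tools/call" as const,
    params: { name, arguments: args },
  };
}
```

Usage: toolCall(3, "check_goal", { session_id: "abc", summary: "ran bun test: 0 failures" }) yields a request that a type checker rejects if a required field such as summary is missing.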
Safety rails built into check_goal:
- Loop guard: 3 identical consecutive summaries return loop_detected and tell the agent to change approach or ask the user.
- Turn budget: once max_turns (default 40, max 500) is used up, check_goal returns budget_exhausted with a partial-progress instruction.
- Strict parsing: unparseable evaluator verdicts resolve to done: false.
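Here is a minimal sketch of the loop guard and the turn budget. It assumes the server keeps each session's previous summaries and a count of checks already made. The names and the exact-match comparison are illustrative choices, not confirmed behavior. So is clamping max_turns to 500 rather than rejecting larger values.

```typescript
// Hypothetical sketch of the two check_goal guards.
const LOOP_THRESHOLD = 3; // identical consecutive summaries that trip the guard
const DEFAULT_MAX_TURNS = 40;
const MAX_TURNS_CAP = 500;

type Guard =
  | { kind: "ok" }
  | { kind: "loop_detected"; instruction: string }
  | { kind: "budget_exhausted"; instruction: string };

function clampMaxTurns(requested?: number): number {
  const n = requested ?? DEFAULT_MAX_TURNS;
  return Math.min(Math.max(1, Math.floor(n)), MAX_TURNS_CAP);
}

// history: this session's earlier summaries, oldest first.
// turnsUsed: checks already made before this one.
function checkGuards(history: string[], summary: string, turnsUsed: number, maxTurns: number): Guard {
  if (turnsUsed >= maxTurns) {
    return { kind: "budget_exhausted", instruction: "Stop and report partial progress to the user." };
  }
  // The current summary plus the previous LOOP_THRESHOLD - 1 summaries.
  const recent = [...history.slice(-(LOOP_THRESHOLD - 1)), summary];
  if (recent.length === LOOP_THRESHOLD && recent.every((s) => s === recent[0])) {
    return { kind: "loop_detected", instruction: "Change approach or ask the user for help." };
  }
  return { kind: "ok" };
}
```

Running both guards before the evaluator keeps a stuck agent from spending evaluator calls on identical reports.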
Optional: Claude Code Stop hook
The MCP-tool flow relies on the agent calling check_goal on its own. The Stop hook closes that gap: it fires whenever Claude tries to stop and blocks the stop while a goal is active and unmet.
mkdir -p ~/.goal-engine
cp hooks/stop-goal-evaluator.sh ~/.goal-engine/
chmod +x ~/.goal-engine/stop-goal-evaluator.sh
~/.claude/settings.json:
{
"hooks": {
"Stop": [{ "hooks": [{ "type": "command", "command": "~/.goal-engine/stop-goal-evaluator.sh" }] }]
}
}
Notes:
- The hook honors stop_hook_active (no infinite recursion) and lets the agent stop once the turn budget is exhausted.
- Claude Code caps consecutive Stop-hook blocks at 8 by default; set CLAUDE_CODE_STOP_HOOK_BLOCK_CAP=40 to match the default turn budget.
- With ANTHROPIC_API_KEY set, the hook evaluates the transcript's last assistant message; without it, the block reason instructs the agent to self-verify and finish via clear_goal completed=true.
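The hook's decision logic can be sketched as a pure function. Claude Code Stop hooks receive JSON on stdin, including session_id, transcript_path, and stop_hook_active. A hook keeps Claude working by printing {"decision": "block", "reason": "..."} to stdout, and printing nothing lets the stop proceed. The ActiveGoal shape and decideStop are hypothetical, and the evaluator result is passed in as an argument. The stop_hook_active recursion guard is left out because the README does not spell out how the hook applies it.

```typescript
// JSON a Stop hook prints to stdout to keep Claude working.
interface StopHookOutput {
  decision: "block";
  reason: string;
}

// Hypothetical view of the stored goal for this session.
interface ActiveGoal {
  goal: string;
  turns_used: number;
  max_turns: number;
}

// verdict === null models "no ANTHROPIC_API_KEY, no evaluator available".
function decideStop(
  goal: ActiveGoal | null,
  verdict: { done: boolean; reason?: string } | null,
): StopHookOutput | null {
  if (goal === null) return null;                     // no active goal: allow the stop
  if (goal.turns_used >= goal.max_turns) return null; // budget exhausted: allow the stop
  if (verdict === null) {
    return {
      decision: "block",
      reason: `Goal still active: "${goal.goal}". Verify it yourself; if it is met, call clear_goal with completed=true.`,
    };
  }
  if (verdict.done) return null;                      // evaluator confirmed: allow the stop
  return { decision: "block", reason: verdict.reason ?? `Goal not met yet: "${goal.goal}".` };
}
```

A wrapper script would print JSON.stringify of the result when decideStop returns an object, and print nothing otherwise.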
Environment
| Variable | Default | Purpose |
|---|---|---|
| GOAL_ENGINE_DB | ~/.goal-engine/goal-engine.sqlite | State file path |
| GOAL_ENGINE_HOME | ~/.goal-engine | Data directory |
| ANTHROPIC_API_KEY | (none) | Evaluator fallback when MCP sampling is unavailable |
| GOAL_ENGINE_EVAL_MODEL | claude-opus-4-8 | Anthropic evaluator model |
| OPENAI_API_KEY | (none) | Second evaluator fallback |
| GOAL_ENGINE_OPENAI_MODEL | gpt-4o | OpenAI evaluator model |
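Here is one way to resolve these variables. The sketch assumes the default database path sits inside GOAL_ENGINE_HOME, because the table lists the defaults but not how the two interact. resolveConfig is a hypothetical helper. The user's home directory is passed in so the function stays free of runtime-specific imports.

```typescript
// Hypothetical sketch of resolving goal-engine settings from the environment.
interface GoalEngineConfig {
  home: string;
  dbPath: string;
  anthropicKey?: string;
  anthropicModel: string;
  openaiKey?: string;
  openaiModel: string;
}

type Env = Record<string, string | undefined>;

function resolveConfig(env: Env, userHome: string): GoalEngineConfig {
  // Empty values count as unset (|| rather than ??).
  const home = env.GOAL_ENGINE_HOME || `${userHome}/.goal-engine`;
  return {
    home,
    dbPath: env.GOAL_ENGINE_DB || `${home}/goal-engine.sqlite`, // assumed to follow home
    anthropicKey: env.ANTHROPIC_API_KEY || undefined,
    anthropicModel: env.GOAL_ENGINE_EVAL_MODEL || "claude-opus-4-8",
    openaiKey: env.OPENAI_API_KEY || undefined,
    openaiModel: env.GOAL_ENGINE_OPENAI_MODEL || "gpt-4o",
  };
}
```

Usage: resolveConfig(process.env, os.homedir()) on Node or Bun. The keys it returns decide which evaluator fallbacks are available.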
Development
bun install
bun run typecheck # tsc --noEmit
bun test # unit + MCP integration tests (in-memory transport)
bun run build # tsc → dist/
bun run smoke # drives dist/index.js over stdio with raw JSON-RPC
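The smoke test drives the built server with raw JSON-RPC over stdio. Below is a sketch of the opening messages such a script could send; the actual script is not shown in this README. Under the MCP stdio transport, each message is one line of JSON. The initialize, notifications/initialized, and tools/list methods come from the MCP specification. The protocol version string and the client name are illustrative values.

```typescript
// Builds the newline-delimited JSON-RPC messages that open an MCP stdio session:
// initialize, the initialized notification, then tools/list.
function handshakeLines(protocolVersion = "2024-11-05"): string[] {
  const messages = [
    {
      jsonrpc: "2.0",
      id: 1,
      method: "initialize",
      params: {
        protocolVersion,
        capabilities: {},
        clientInfo: { name: "smoke-sketch", version: "0.0.0" },
      },
    },
    // Notifications carry no id and get no response.
    { jsonrpc: "2.0", method: "notifications/initialized" },
    { jsonrpc: "2.0", id: 2, method: "tools/list" },
  ];
  // One message per line; JSON.stringify without indentation emits no raw newlines.
  return messages.map((m) => JSON.stringify(m) + "\n");
}
```

Writing these lines to the stdin of node dist/index.js should produce one JSON response per request on stdout. The tools/list result should name set_goal, check_goal, get_status, and clear_goal.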
Project layout:
SKILL.md Layer 1 — portable Agent Skill
src/index.ts CLI entry: serve (default) | install | check-hook
src/server.ts MCP server: the four goal tools + sampling wiring
src/evaluator.ts Evaluator chain: sampling → Anthropic → OpenAI → self-check
src/db.ts State: bun:sqlite | node:sqlite | JSON fallback
src/loop-guard.ts Identical-turn loop detection
src/hook.ts Claude Code Stop hook logic
src/installer.ts Cross-CLI skill installer
hooks/stop-goal-evaluator.sh Stop hook wrapper script
agents/openai.yaml Codex plugin metadata
License
MIT