ForLoop MCP
Enables AI harnesses to run iterative loops on code repositories with built-in quality evaluations, security gates, and approval workflows for tasks like fixing tests or implementing features.
“I don't prompt Claude anymore. I have loops running that prompt Claude and figuring out what to do. My job is to write loops. And this is transition we're going to see for the rest of the year.”
Boris Cherny
ForLoop MCP is an implementation of that shift: a local MCP server and loop runtime that lets an AI harness move from one-shot prompting to controlled execution.
Point your harness at a repository, give it a test command, optionally add a typecheck command, and ForLoop exposes repo tools, traceable state, approval gates, loop evals, quality evals, governance decisions, and a deterministic runtime that can drive a task until verifier checks pass, escalation is required, or the mission is no longer worth pursuing.
It separates the system into seven explicit layers:
Skill.md = reusable task knowledge
MCP server = tools and external capabilities
Quality = verifier checks and evidence gates
Criteria = explicit standards the final answer must satisfy
Security = sanctioned tools, scoped paths, and configured commands
Governance = stop, escalate, recover, and abandon decisions
Orchestrator = control flow, state, retries, approvals, and traceability
This release ships a stdio MCP repo server plus a CLI orchestrator. The MCP server plugs into AI harnesses. The CLI runs the full model-agnostic loop with skills, model adapters, approvals, per-step evals, explicit criteria, security gates, quality gates, governance gates, final evals, traces, and a demo repo.
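The control flow described above can be sketched as a small driver that asks for the next action, gates it, and stops on an explicit outcome. This is a minimal illustration under stated assumptions, not ForLoop's source: `runLoop`, `ProposedAction`, `LoopDeps`, and `LoopOutcome` are hypothetical names, and escalating on a denied tool is a simplification chosen for the sketch.

```typescript
// Hypothetical sketch of a gated loop; none of these names come from ForLoop's source.
type ProposedAction =
  | { kind: "tool"; tool: string }
  | { kind: "final"; answer: string };

type LoopOutcome = "completed" | "escalated" | "abandoned" | "budget_exhausted";

interface LoopDeps {
  // Asks the model for the next structured action.
  propose: (iteration: number) => ProposedAction;
  // Returns true when policy sanctions the tool.
  isAllowed: (tool: string) => boolean;
  // Executes a sanctioned tool.
  execute: (tool: string) => void;
  // Deterministic verifier: true when the configured tests last passed.
  testsPassed: () => boolean;
}

function runLoop(deps: LoopDeps, maxIterations: number, maxFinalRejections: number): LoopOutcome {
  let finalRejections = 0;
  for (let i = 0; i < maxIterations; i++) {
    const action = deps.propose(i);
    if (action.kind === "final") {
      // The model's claim is not enough; the verifier decides.
      if (deps.testsPassed()) return "completed";
      finalRejections += 1;
      if (finalRejections > maxFinalRejections) return "abandoned";
      continue;
    }
    // Assumption: this sketch escalates on an unsanctioned tool instead of running it.
    if (!deps.isAllowed(action.tool)) return "escalated";
    deps.execute(action.tool);
  }
  return "budget_exhausted";
}
```

The point of the shape is that every exit is a named outcome decided by the runtime, never by the model's own "done" claim.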
Quick Start
Install from npm:
npm install -g forloop-mcp
Run the MCP server with npx. This is the standard local stdio pattern: the harness launches a command, passes args, and talks to the server over stdin/stdout.
There is no single config file shape for every harness. Use the snippet that matches your client.
Claude Desktop, Claude Code project .mcp.json, Cursor, Windsurf, Devin Desktop, and other mcpServers clients:
{
  "mcpServers": {
    "forloopRepo": {
      "type": "stdio",
      "command": "npx",
      "args": [
        "-y",
        "forloop-mcp@latest",
        "--workspace",
        "/absolute/path/to/repo",
        "--test-command",
        "npm test"
      ]
    }
  }
}
Claude Code CLI:
claude mcp add --transport stdio forloopRepo -- npx -y forloop-mcp@latest --workspace /absolute/path/to/repo --test-command "npm test"
VS Code .vscode/mcp.json:
{
  "servers": {
    "forloopRepo": {
      "type": "stdio",
      "command": "npx",
      "args": [
        "-y",
        "forloop-mcp@latest",
        "--workspace",
        "${workspaceFolder}",
        "--test-command",
        "npm test"
      ]
    }
  }
}
Codex CLI:
codex mcp add forloopRepo -- npx -y forloop-mcp@latest --workspace /absolute/path/to/repo --test-command "npm test"
Codex TOML:
[mcp_servers.forloopRepo]
command = "npx"
args = ["-y", "forloop-mcp@latest", "--workspace", "/absolute/path/to/repo", "--test-command", "npm test"]
Windows fallback, for harnesses that do not resolve npx directly:
{
  "mcpServers": {
    "forloopRepo": {
      "type": "stdio",
      "command": "cmd",
      "args": [
        "/c",
        "npx",
        "-y",
        "forloop-mcp@latest",
        "--workspace",
        "C:\\absolute\\path\\to\\repo",
        "--test-command",
        "npm test"
      ]
    }
  }
}
Direct MCP file edits are disabled by default. For trusted harnesses that already show tool approvals, add --allow-mutations to the args array.
For a second deterministic verifier, add a configured typecheck command:
"args": ["-y", "forloop-mcp@latest", "--workspace", "/absolute/path/to/repo", "--test-command", "npm test", "--typecheck-command", "npm run typecheck"]
For wider loops, restrict the action surface with repeated allowed-tool flags:
"args": ["-y", "forloop-mcp@latest", "--workspace", "/absolute/path/to/repo", "--test-command", "npm test", "--allowed-tool", "repo.list_files", "--allowed-tool", "repo.read_file", "--allowed-tool", "repo.run_tests"]
This package is built for local stdio MCP hosts. Remote ChatGPT/OpenAI connector surfaces require remote HTTP MCP servers, so use an HTTP bridge or deploy a remote wrapper if you need that environment.
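For readers wondering how a repeated flag such as `--allowed-tool` turns into an allow-list, here is a minimal sketch of collecting every value that follows the flag. `collectRepeatedFlag` is a hypothetical helper written for illustration; the README does not show how ForLoop actually parses its arguments.

```typescript
// Hypothetical helper: gathers each value that follows a repeated flag.
// This is an assumption about the parsing technique, not ForLoop's parser.
function collectRepeatedFlag(args: string[], flag: string): string[] {
  const values: string[] = [];
  for (let i = 0; i < args.length; i++) {
    const next = args[i + 1];
    if (args[i] === flag && next !== undefined) {
      values.push(next);
      i += 1; // skip the value that was just consumed
    }
  }
  return values;
}

// Example: the args array from the allowed-tool snippet, minus the npx parts.
const exampleAllowedTools = collectRepeatedFlag(
  ["--workspace", "/absolute/path/to/repo", "--allowed-tool", "repo.list_files", "--allowed-tool", "repo.read_file"],
  "--allowed-tool",
);
```

Here `exampleAllowedTools` holds the two tool names in the order they were given; a trailing flag with no value is ignored.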
If npm is unavailable or you want the latest main branch, use GitHub as the package source:
{
  "mcpServers": {
    "forloopRepo": {
      "type": "stdio",
      "command": "npx",
      "args": [
        "-y",
        "github:Master0fFate/forloop-mcp",
        "--workspace",
        "/absolute/path/to/repo",
        "--test-command",
        "npm test"
      ]
    }
  }
}
Why this works: forloop-mcp publishes a binary named forloop-mcp. Modern npx runs the binary that matches the package name and passes every argument after the package spec to that binary.
To work from a source checkout instead, install dependencies, build, and run the smoke test:
npm install
npm run build
npm run smoke
Run the demo loop directly:
npm run dev -- run --workspace examples/buggy-auth-service --goal "Fix failing tests" --auto-approve
Start the MCP repo tool server over stdio:
npm run mcp -- --workspace examples/buggy-auth-service --test-command "npm test"
Direct MCP mutations are disabled by default. Enable them only for trusted clients:
npm run mcp -- --workspace examples/buggy-auth-service --test-command "npm test" --allow-mutations
CLI
npx -y forloop-mcp@latest --workspace /absolute/path/to/repo --test-command "npm test"
forloop init --workspace ./my-repo
forloop run --workspace ./my-repo --goal "Fix failing tests" --test-command "npm test" --typecheck-command "npm run typecheck"
forloop inspect --trace-db ./my-repo/.forloop/state.sqlite
forloop export-trace --trace-db ./my-repo/.forloop/state.sqlite --out trace.json
forloop mcp-repo --workspace ./my-repo --test-command "npm test" --typecheck-command "npm run typecheck"
Safety Defaults
- The model proposes structured actions; the runtime validates and executes.
- repo.apply_patch requires approval.
- Direct MCP repo.apply_patch calls are denied unless the server is started with --allow-mutations.
- Standalone MCP servers can restrict calls with repeated --allowed-tool <name> flags.
- repo.run_tests can only run the configured test command.
- repo.run_typecheck can only run the configured typecheck command, when one is configured.
- File paths are sandboxed to the selected workspace.
- The orchestrator emits security_eval before tool execution and denies unsanctioned tools before they run.
- Every tool result is scored by a loop eval gate before the next iteration.
- Final answers are rejected by default unless the loop gathered tool evidence and recorded a passing configured test run.
- If quality.requireTypecheckPassed is enabled, final answers also require a passing configured typecheck run.
- High-risk decisions are escalated before execution by default.
- Repeated failed steps, repeated rejected finals, and exhausted recovery attempts can abandon the mission instead of burning the whole budget.
- Every model response, tool call, tool result, approval, evaluator result, and governance decision is persisted.
- Missing workspaces, missing skills, model failures, repeated actions, invalid model output, denied approvals, and budget exhaustion resolve to explicit task states instead of silent crashes.
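The first and last defaults above (the runtime validates what the model proposes, and invalid model output becomes an explicit state) can be illustrated with a small parser that never throws. This is a sketch under assumptions: the `ToolAction` shape, the `parseModelAction` function, and the reason strings are hypothetical, since the README does not show ForLoop's real action schema.

```typescript
// Hypothetical action shape; ForLoop's real schema is not shown in this README.
interface ToolAction {
  tool: string;
  args: Record<string, unknown>;
}

type ParseResult =
  | { ok: true; action: ToolAction }
  | { ok: false; reason: string };

// Turns raw model output into either a validated action or an explicit failure reason.
function parseModelAction(raw: string): ParseResult {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return { ok: false, reason: "invalid_json" };
  }
  if (typeof value !== "object" || value === null || Array.isArray(value)) {
    return { ok: false, reason: "not_an_object" };
  }
  const candidate = value as Record<string, unknown>;
  const tool = candidate["tool"];
  if (typeof tool !== "string" || tool.length === 0) {
    return { ok: false, reason: "missing_tool" };
  }
  const args = candidate["args"];
  if (typeof args !== "object" || args === null || Array.isArray(args)) {
    return { ok: false, reason: "invalid_args" };
  }
  return { ok: true, action: { tool, args: args as Record<string, unknown> } };
}
```

Returning a tagged result instead of throwing is what lets a caller map bad output to a task state rather than a crash.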
Quality Loop
Execution loops answer “what action should run next?” Quality loops answer “is this good enough to ship?”
ForLoop makes that second loop explicit through the task quality block:
quality:
  minStepScore: 0.2
  minFinalConfidence: 0
  requireEvidenceBeforeFinal: true
  requireTestsPassed: true
  requireTypecheckPassed: false
Each tool result emits quality_eval feedback for the next iteration. Final answers that do not clear the quality gate are rejected and fed back into the loop instead of being shipped as weak completion claims.
By default, the verifier is deterministic: registered tool schemas, workspace policy, configured tests, and optional configured typecheck. minFinalConfidence exists only as an extra policy knob; it is not treated as proof because it comes from the agent that produced the answer. Model-based quality review should use a separate verifier model or subagent with a different system prompt.
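To make the final-answer gate concrete, the following sketch checks a final answer against the final-related fields of the quality block. The policy field names mirror the block above; `LoopEvidence`, `checkFinalAnswer`, and the rejection strings are hypothetical, and the step-level `minStepScore` is left out for brevity.

```typescript
// Policy field names mirror the quality block; everything else here is hypothetical.
interface QualityPolicy {
  minFinalConfidence: number;
  requireEvidenceBeforeFinal: boolean;
  requireTestsPassed: boolean;
  requireTypecheckPassed: boolean;
}

interface LoopEvidence {
  toolResultCount: number;      // tool results gathered so far
  lastTestRunPassed: boolean;   // outcome of the latest configured test run
  lastTypecheckPassed: boolean; // outcome of the latest configured typecheck run
  finalConfidence: number;      // self-reported by the agent, so weak evidence
}

// Returns the reasons a final answer is rejected; an empty list means it clears the gate.
function checkFinalAnswer(policy: QualityPolicy, evidence: LoopEvidence): string[] {
  const rejections: string[] = [];
  if (policy.requireEvidenceBeforeFinal && evidence.toolResultCount === 0) {
    rejections.push("no_tool_evidence");
  }
  if (policy.requireTestsPassed && !evidence.lastTestRunPassed) {
    rejections.push("tests_not_passed");
  }
  if (policy.requireTypecheckPassed && !evidence.lastTypecheckPassed) {
    rejections.push("typecheck_not_passed");
  }
  if (evidence.finalConfidence < policy.minFinalConfidence) {
    rejections.push("confidence_below_minimum");
  }
  return rejections;
}
```

With `minFinalConfidence: 0`, the confidence check never rejects a non-negative confidence, which matches the idea that self-reported confidence is a policy knob, not proof.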
Evaluation Criteria
Loops only work as well as their evaluation criteria. ForLoop makes those criteria explicit and traceable:
evaluationCriteria:
  - id: tool_evidence
    kind: tool_evidence
    description: The loop gathered tool evidence before final completion.
    required: true
  - id: tests_passed
    kind: tests_passed
    description: The latest configured test run passed.
    required: true
  - id: diff_present
    kind: diff_present
    description: A patch or non-empty diff was recorded.
    required: false
Supported deterministic criteria are tool_evidence, tests_passed, typecheck_passed, and diff_present. Every final eval includes a criterion-by-criterion report with pass/fail, evidence, and feedback. If a required criterion fails, the final answer is rejected and the criteria report is fed into the next loop turn.
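A criterion-by-criterion report of this kind might be computed as in the sketch below. The four `kind` values come from the text above; the `CriterionFacts` shape, the `evaluateCriteria` function, and the report fields are assumptions made for illustration, and the real report also carries evidence and feedback.

```typescript
// The four kinds are from the README; the remaining names are hypothetical.
type CriterionKind = "tool_evidence" | "tests_passed" | "typecheck_passed" | "diff_present";

interface Criterion {
  id: string;
  kind: CriterionKind;
  required: boolean;
}

interface CriterionFacts {
  toolResults: number;
  testsPassed: boolean;
  typecheckPassed: boolean;
  diffRecorded: boolean;
}

interface CriterionResult {
  id: string;
  passed: boolean;
  required: boolean;
}

function evaluateCriteria(
  criteria: Criterion[],
  facts: CriterionFacts,
): { accepted: boolean; report: CriterionResult[] } {
  // One deterministic check per supported kind.
  const checks: Record<CriterionKind, boolean> = {
    tool_evidence: facts.toolResults > 0,
    tests_passed: facts.testsPassed,
    typecheck_passed: facts.typecheckPassed,
    diff_present: facts.diffRecorded,
  };
  const report = criteria.map((c) => ({ id: c.id, passed: checks[c.kind], required: c.required }));
  // Only a failing required criterion rejects the final answer.
  const accepted = report.every((r) => r.passed || !r.required);
  return { accepted, report };
}
```

In this sketch a failing optional criterion, such as `diff_present` with `required: false`, still appears in the report but does not block acceptance.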
Security Gate
Closed loops mostly use gates for quality: did the work meet the standard? Open loops also use gates for security: is this action sanctioned at all?
ForLoop makes that boundary explicit through the task security block:
security:
  allowedTools:
    - repo.list_files
    - repo.search_code
    - repo.read_file
    - repo.apply_patch
    - repo.run_tests
    - repo.run_typecheck
    - repo.git_diff
  requireApprovalForMutations: true
Before any tool runs, the deterministic security gate emits security_eval. Unknown tools, disallowed tools, workspace escapes, and unconfigured command attempts are denied as policy violations. The wider the loop, the smaller this allowed-tool set should be.
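A deny-before-execute gate of this kind can be sketched as a pure function over the allow-list and the requested path. Everything named here (`securityEval` as a function, `staysInsideWorkspace`, the violation strings) is hypothetical. The path check is purely lexical and does not consider symlinks; a real implementation would more likely resolve against the workspace root with Node's `path.resolve`.

```typescript
// Hypothetical verdict shape for the sketch.
type SecurityVerdict = { allowed: true } | { allowed: false; violation: string };

// Lexical check only: rejects absolute paths and any ".." that climbs above the root.
function staysInsideWorkspace(relativePath: string): boolean {
  if (/^([\\/]|[A-Za-z]:)/.test(relativePath)) return false; // absolute or drive-prefixed
  let depth = 0;
  for (const segment of relativePath.split(/[\\/]+/)) {
    if (segment === "" || segment === ".") continue;
    if (segment === "..") {
      depth -= 1;
      if (depth < 0) return false; // climbed above the workspace root
    } else {
      depth += 1;
    }
  }
  return true;
}

// Runs before any tool executes; unknown or disallowed tools never reach execution.
function securityEval(
  allowedTools: ReadonlySet<string>,
  tool: string,
  targetPath?: string,
): SecurityVerdict {
  if (!allowedTools.has(tool)) {
    return { allowed: false, violation: "tool_not_allowed" };
  }
  if (targetPath !== undefined && !staysInsideWorkspace(targetPath)) {
    return { allowed: false, violation: "workspace_escape" };
  }
  return { allowed: true };
}
```

Because the gate is a pure function of policy and request, the same inputs always produce the same verdict, which is what makes it traceable.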
Governance
A loop answers “how does the work continue?” Governance answers “should it continue at all?”
ForLoop makes that decision explicit through the task governance block:
governance:
  escalateHighRisk: true
  recoverOnFailedStep: true
  maxRecoveryAttempts: 3
  maxFinalRejections: 2
  maxConsecutiveFailedSteps: 3
budget:
  maxIterations: 8
  maxEmptyRounds: 2
  maxApproxTokens: 12000
Governance emits governance_decision events with one of five actions: continue, recover, escalate, stop, or abandon. abandon is a first-class task outcome for missions that repeatedly fail quality gates, produce too many empty rounds, exhaust recovery attempts, or hit the approximate token ceiling.
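One way to picture how the governance fields could map onto the five actions is the decision function below. The policy field names and the five actions come from the text above; the `GovernanceState` shape, the `decideGovernance` function, the check ordering, and the exact threshold comparisons are assumptions for the sketch. Budget limits are left out here.

```typescript
// The five actions are from the README; state shape and ordering are assumptions.
type GovernanceAction = "continue" | "recover" | "escalate" | "stop" | "abandon";

interface GovernancePolicy {
  escalateHighRisk: boolean;
  recoverOnFailedStep: boolean;
  maxRecoveryAttempts: number;
  maxFinalRejections: number;
  maxConsecutiveFailedSteps: number;
}

interface GovernanceState {
  humanStopRequested: boolean;
  highRiskPending: boolean;
  lastStepFailed: boolean;
  consecutiveFailedSteps: number;
  recoveryAttempts: number;
  finalRejections: number;
}

function decideGovernance(policy: GovernancePolicy, state: GovernanceState): GovernanceAction {
  if (state.humanStopRequested) return "stop";
  if (policy.escalateHighRisk && state.highRiskPending) return "escalate";
  // Assumption: exceeding either ceiling ends the mission.
  if (state.finalRejections > policy.maxFinalRejections) return "abandon";
  if (state.consecutiveFailedSteps >= policy.maxConsecutiveFailedSteps) return "abandon";
  if (state.lastStepFailed && policy.recoverOnFailedStep) {
    // Recovery is bounded so a failing mission cannot retry forever.
    if (state.recoveryAttempts >= policy.maxRecoveryAttempts) return "abandon";
    return "recover";
  }
  return "continue";
}
```

Checking stop and escalate first reflects the idea that human and risk signals outrank the loop's own progress; other orderings are equally plausible.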
Exit Conditions
The loop never ships because the agent says “looks done.” Completion is anchored to external criteria: verifier evidence, required evaluation criteria, budget limits, and governance policy.
Concrete exits include passing all required criteria, a human stop, a denied security action, repeated rejected finals, and the maxEmptyRounds, maxIterations, and optional maxApproxTokens budget limits. Closed loops pay off because both the path and the spend ceiling are explicit.
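The budget-based exits can be sketched as a check that names which ceiling was reached. The three limit names match the budget settings shown earlier; `LoopBudget`, `LoopUsage`, `exhaustedLimit`, and the use of "reached or exceeded" as the trigger are assumptions for illustration.

```typescript
// Limit names match the budget settings; the rest of this sketch is hypothetical.
interface LoopBudget {
  maxIterations: number;
  maxEmptyRounds: number;
  maxApproxTokens?: number; // optional ceiling
}

interface LoopUsage {
  iterations: number;
  emptyRounds: number;
  approxTokens: number;
}

// Returns the name of the first limit that has been reached, or null if none has.
function exhaustedLimit(budget: LoopBudget, usage: LoopUsage): keyof LoopBudget | null {
  if (usage.iterations >= budget.maxIterations) return "maxIterations";
  if (usage.emptyRounds >= budget.maxEmptyRounds) return "maxEmptyRounds";
  if (budget.maxApproxTokens !== undefined && usage.approxTokens >= budget.maxApproxTokens) {
    return "maxApproxTokens";
  }
  return null;
}
```

Returning the limit's name rather than a boolean keeps the exit reason available for whatever trace or task state records it.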
Current Scope
Implemented now:
- TypeScript CLI
- Repo debugging skill
- Mock model adapter
- OpenAI adapter boundary stub
- SQLite trace store
- Repo tool registry
- MCP stdio server exposing repo tools
- Deterministic loop, criteria, security, quality, governance, and final evaluator
- Demo fixture
- Unit, integration, and smoke tests
Not included in this MVP:
- Web UI
- Arbitrary shell tools
- Long-term memory
- Cloud deployment
- Live provider calls