ForgeSwarm
An MCP server that turns independent AI agents into a coordinated engineering team.
Most MCP servers give agents data (GitHub, databases, web). ForgeSwarm gives them coordination: a shared task board with atomic claiming, a shared context blackboard, a decision log, and an enforced plan → implement → review → iterate loop. That is the same workflow shape that powers orchestration harnesses like CyOps, distilled into an open protocol primitive that any MCP client can plug into.
Connect Claude Code, Codex, OpenCode, or a MiniMax M3-powered script to the same ForgeSwarm server, and they instantly become citizens of one swarm: claiming tasks without collisions, briefing each other through shared memory, and reviewing each other's work before anything counts as done.
Built for the CyOps Arena Hackathon: MCP Server Sprint (co-hosted with MiniMax).
Why this exists
Multi-agent coding fails in predictable ways: two agents grab the same task, an agent starts work with no idea what the others decided, "done" means "the model said done", and a crashed agent silently stalls the project. ForgeSwarm fixes each one server-side, so correctness doesn't depend on prompt discipline:
| Failure mode | ForgeSwarm mechanism |
|---|---|
| Two agents do the same work | claim_task is a single atomic conditional UPDATE: one winner, always |
| Agent starts cold, repeats settled debates | get_briefing bundles goal, constraints, decisions, dependency summaries, and prior review feedback into one onboarding packet |
| "Done" is just an assertion | submit_for_review → a different agent must post_review; self-review is rejected; request_changes auto-returns the task to its author with feedback attached and bumps the iteration counter |
| "Tests pass, trust me" | run_checks runs allowlisted test/lint commands with a hard timeout and records exit code + output on the task as review evidence |
| Crashed agent stalls the swarm | Claims carry leases; expired leases put tasks back on the board automatically |
| Disagreements evaporate into chat | open_discussion → positions from ≥2 distinct agents (server-enforced) → resolve_discussion auto-records the consensus as a binding decision in every future briefing |
| The swarm never learns | get_retrospective compiles hard evidence (review bounce rates, check pass rates, per-agent stats, hotspot tasks) for the swarm to analyze and act on |
| State lost between sessions | Everything persists in SQLite (WAL), so swarms survive restarts and work across both transports |
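The first and fifth rows of the table (atomic claims and leases) both hinge on one conditional UPDATE. Below is a minimal sketch of that idea with Python's built-in sqlite3, assuming a hypothetical tasks table with status, owner, and lease_expires columns and a hypothetical 300-second default lease; ForgeSwarm's actual schema, names, and lease policy may differ. Treating an expired lease as claimable is one simple way to put a crashed agent's task back on the board.

```python
import sqlite3
import time
from typing import Optional

# Hypothetical minimal schema; ForgeSwarm's real tables and columns may differ.
SCHEMA = """
CREATE TABLE tasks (
    id            INTEGER PRIMARY KEY,
    status        TEXT NOT NULL DEFAULT 'open',  -- 'open' or 'claimed' in this sketch
    owner         TEXT,
    lease_expires REAL                           -- epoch seconds; NULL when unclaimed
)
"""


def claim_task(
    conn: sqlite3.Connection,
    task_id: int,
    agent: str,
    lease_seconds: float = 300.0,  # assumed default lease length
    now: Optional[float] = None,
) -> bool:
    """Claim a task with one conditional UPDATE; True only for the single winner."""
    now = time.time() if now is None else now
    with conn:  # commit on success, roll back on error
        cur = conn.execute(
            """
            UPDATE tasks
               SET status = 'claimed', owner = ?, lease_expires = ?
             WHERE id = ?
               AND (status = 'open'
                    OR (status = 'claimed' AND lease_expires < ?))
            """,
            (agent, now + lease_seconds, task_id, now),
        )
    # rowcount is 1 for the winner and 0 for every agent that lost the race.
    return cur.rowcount == 1


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.execute("INSERT INTO tasks (id) VALUES (1)")
    print(claim_task(conn, 1, "agent-a", now=1000.0))  # True
    print(claim_task(conn, 1, "agent-b", now=1001.0))  # False: lease still held
    print(claim_task(conn, 1, "agent-b", now=2000.0))  # True: agent-a's lease expired
```

Because the check and the write happen in a single statement, and SQLite serializes writers, two agents racing for the same task cannot both get a rowcount of 1.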
Install
From source (not yet on PyPI):
```bash
git clone https://github.com/H2SO4620/forgeswarm && cd forgeswarm
pip install -e ".[dev]"
pytest # 20 tests, including end-to-end MCP client sessions
forgeswarm
```
Transports
```bash
forgeswarm # stdio (local clients spawn it)
forgeswarm --transport http --port 8765 # one shared endpoint for a whole swarm
forgeswarm --db ./myproject.db # or set FORGESWARM_DB
```
State is SQLite either way (default ~/.forgeswarm/forgeswarm.db), so stdio clients,
which each spawn their own server process, still share one swarm.
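A minimal sketch of why separate stdio server processes still see one swarm: each process opens the same SQLite file, here with WAL journaling and a busy timeout. Two connections in one script stand in for two processes; the agents table and the 5-second timeout are hypothetical, not ForgeSwarm's actual store.

```python
import sqlite3
import tempfile
from pathlib import Path


def open_store(path: Path) -> sqlite3.Connection:
    """Open the shared swarm database the way each server process might."""
    # timeout: wait up to 5 s for another connection's lock instead of failing.
    conn = sqlite3.connect(str(path), timeout=5.0)
    # WAL lets readers keep reading while one writer commits.
    conn.execute("PRAGMA journal_mode=WAL")
    return conn


def demo() -> list:
    with tempfile.TemporaryDirectory() as tmp:
        db = Path(tmp) / "forgeswarm.db"
        server_a = open_store(db)  # e.g. the process spawned by Claude Code
        server_b = open_store(db)  # e.g. the process spawned by Codex
        try:
            with server_a:  # commit on exit
                server_a.execute("CREATE TABLE agents (name TEXT PRIMARY KEY)")
                server_a.execute("INSERT INTO agents VALUES ('claude-code')")
            # server_b sees server_a's committed write through the shared file.
            return server_b.execute("SELECT name FROM agents").fetchall()
        finally:
            server_a.close()
            server_b.close()


if __name__ == "__main__":
    print(demo())  # [('claude-code',)]
```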
Claude Code
```bash
claude mcp add forgeswarm -- uvx forgeswarm
```
Or in any MCP client config:
```json
{
  "mcpServers": {
    "forgeswarm": { "command": "uvx", "args": ["forgeswarm"] }
  }
}
```
The loop
```mermaid
flowchart LR
    G[Goal] --> P[create_project<br/>submit_plan]
    P --> B[Task board]
    B -->|claim_task<br/>atomic| W[Agent works<br/>get_briefing · save_context · run_checks]
    W --> S[submit_for_review]
    S --> R{post_review<br/>by a different agent}
    R -->|approve| D[done]
    R -->|request_changes<br/>iteration++| W
    D --> B
```
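The loop above can be read as a small state machine. The sketch below models it in plain Python; the status names, the Task fields, and these two functions are hypothetical stand-ins that mirror the tool names, not ForgeSwarm's store.py. It shows the two rules the server enforces: the reviewer must differ from the author, and request_changes sends the task back to its author with feedback attached and the iteration counter bumped.

```python
from dataclasses import dataclass, field
from typing import List, Optional


class ReviewError(Exception):
    """Raised when a transition the server would refuse is attempted."""


@dataclass
class Task:
    id: int
    owner: Optional[str] = None
    status: str = "open"  # hypothetical states: open -> claimed -> in_review -> done
    iteration: int = 0
    feedback: List[str] = field(default_factory=list)


def submit_for_review(task: Task, agent: str) -> None:
    if task.status != "claimed" or task.owner != agent:
        raise ReviewError("only the claiming agent can submit this task")
    task.status = "in_review"


def post_review(task: Task, reviewer: str, verdict: str, comment: str = "") -> None:
    if task.status != "in_review":
        raise ReviewError("task is not awaiting review")
    if reviewer == task.owner:
        raise ReviewError("self-review is rejected")
    if verdict == "approve":
        task.status = "done"
    elif verdict == "request_changes":
        # Back to the same author, with feedback attached and iteration bumped.
        task.status = "claimed"
        task.iteration += 1
        task.feedback.append(comment)
    else:
        raise ReviewError(f"unknown verdict: {verdict!r}")


if __name__ == "__main__":
    t = Task(id=1, owner="impl-1", status="claimed")
    submit_for_review(t, "impl-1")
    post_review(t, "reviewer-1", "request_changes", "add tests")
    print(t.status, t.iteration)  # claimed 1
```

ForgeSwarm applies these transitions in the database, in the same transaction as the verdict, so no client can skip a step; the in-memory version only illustrates the rules.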
Tools (24)
- Planning: create_project, submit_plan (whole dependency graph in one call), list_projects, register_agent
- Task board: list_tasks (with ready_only), claim_task (atomic, leased), update_task (progress + lease renewal), complete_task, get_task_graph
- Shared context: save_context, search_context, record_decision, get_briefing ⭐
- Review loop: submit_for_review, get_review_queue, post_review
- Discussion & consensus: open_discussion, post_to_discussion, resolve_discussion (consensus becomes a recorded decision automatically), list_discussions
- Workflow templates: list_workflow_templates, get_workflow_template (ship-feature, refactor-module, debug-issue: dependency-wired task graphs ready for submit_plan)
- Verification & reflection: run_checks (allowlisted: pytest, ruff, mypy, npm, cargo, go, …; no shell, hard timeout, evidence recorded), get_retrospective (swarm performance evidence: bounce rates, iterations, per-agent stats)
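The discussion tools carry one server-enforced rule: resolve_discussion succeeds only after positions from at least two distinct agents, and the consensus is then recorded as a decision. A minimal sketch of that rule follows; the Discussion record and the plain-list decision log are hypothetical, not ForgeSwarm's models.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Discussion:
    topic: str
    positions: Dict[str, str] = field(default_factory=dict)  # agent -> latest position
    resolved: bool = False


def post_to_discussion(d: Discussion, agent: str, position: str) -> None:
    if d.resolved:
        raise ValueError("discussion is already resolved")
    d.positions[agent] = position  # re-posting replaces that agent's position


def resolve_discussion(d: Discussion, consensus: str, decision_log: List[str]) -> None:
    # Positions are keyed by agent, so one agent posting twice still counts once.
    if len(d.positions) < 2:
        raise ValueError("need positions from at least 2 distinct agents")
    d.resolved = True
    # Recorded as a decision; in ForgeSwarm, decisions appear in future briefings.
    decision_log.append(f"{d.topic}: {consensus}")


if __name__ == "__main__":
    log: List[str] = []
    d = Discussion("config format")
    post_to_discussion(d, "m3-impl-1", "JSON")
    post_to_discussion(d, "m3-reviewer-1", "TOML")
    resolve_discussion(d, "TOML", log)
    print(log)  # ['config format: TOML']
```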
Resources & Prompts
Live swarm state, readable without tool calls:
swarm://projects · swarm://agents · swarm://project/{id}/status ·
swarm://project/{id}/tasks · swarm://project/{id}/decisions ·
swarm://project/{id}/discussions · swarm://project/{id}/retrospective ·
swarm://project/{id}/context
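Resources such as swarm://project/{id}/status are URI templates: {id} is filled in with a project id. An MCP server framework normally performs this matching itself; the segment-by-segment matcher below is only a hypothetical illustration of how a concrete URI maps back to one of the templates listed above and its parameters.

```python
from typing import Dict, List, Optional, Tuple

RESOURCE_TEMPLATES: List[str] = [
    "swarm://projects",
    "swarm://agents",
    "swarm://project/{id}/status",
    "swarm://project/{id}/tasks",
    "swarm://project/{id}/decisions",
    "swarm://project/{id}/discussions",
    "swarm://project/{id}/retrospective",
    "swarm://project/{id}/context",
]


def match_template(template: str, uri: str) -> Optional[Dict[str, str]]:
    """Return the template's parameters if uri fits it, else None."""
    t_parts, u_parts = template.split("/"), uri.split("/")
    if len(t_parts) != len(u_parts):
        return None
    params: Dict[str, str] = {}
    for t, u in zip(t_parts, u_parts):
        if t.startswith("{") and t.endswith("}"):
            if not u:  # a placeholder must match a non-empty segment
                return None
            params[t[1:-1]] = u
        elif t != u:
            return None
    return params


def route(uri: str) -> Optional[Tuple[str, Dict[str, str]]]:
    """Find the first template that matches uri."""
    for template in RESOURCE_TEMPLATES:
        params = match_template(template, uri)
        if params is not None:
            return template, params
    return None


if __name__ == "__main__":
    print(route("swarm://project/42/tasks"))
    # ('swarm://project/{id}/tasks', {'id': '42'})
```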
Role prompts that make any MCP client swarm-ready in one message:
planner · implementer · reviewer · standup_summary (rendered from live board state)
Demo: a MiniMax M3 swarm builds software through ForgeSwarm
examples/minimax_swarm_demo.py runs three
MiniMax M3 agents (planner, implementer, reviewer) that coordinate entirely
through ForgeSwarm tools over a real MCP stdio session: the planner decomposes a
goal into a task graph, the implementer claims tasks and submits work, the reviewer
approves or bounces it, and the loop runs until the board is green.
```bat
pip install -e ".[demo]"
set MINIMAX_API_KEY=sk-...
python examples/minimax_swarm_demo.py "Build a CLI pomodoro timer in Python"
```
On macOS/Linux, use export instead of set (export MINIMAX_API_KEY=sk-...).
M3 is also available through OpenRouter (same model, smaller minimum top-up):
```bat
set MINIMAX_API_KEY=sk-or-...
set MINIMAX_BASE_URL=https://openrouter.ai/api/v1
set MINIMAX_MODEL=minimax/minimax-m3
```
No API key handy? examples/quickstart_client.py
walks through the identical workflow with a scripted client, no LLM required:
```bash
python examples/quickstart_client.py
```
Verified run
A real run of the M3 swarm against "Build a CLI pomodoro timer in Python"
went from a bare goal to a finished, reviewed project with zero human
intervention, with three M3 agents talking only through ForgeSwarm tools:
- m3-planner registered itself, created the project, decomposed the goal into 8 dependency-ordered tasks (scaffold → timer state machine → config → notifier → CLI → tests → docs), and recorded 4 architectural decisions (stdlib-only, foreground blocking timer, XDG config path, stderr UI honoring NO_COLOR).
- m3-impl-1 claimed each ready task in dependency order, wrote the source via save_context, and called submit_for_review on every deliverable.
- m3-reviewer-1 pulled the review queue, cross-checked each submission against get_briefing (goal, constraints, decisions, prior feedback), and posted a verdict on each with post_review.
The board went 8/8 done, and the closing standup_summary prompt (also
answered by M3, purely from live board state) correctly reported:
> All planned work is complete: 8/8 tasks closed... The project is feature-complete: scaffold, timer FSM, config, notifications, CLI, tests, and docs are all landed.
>
> Single Most Important Next Action: Run a full end-to-end smoke test of the shipped CLI... and, if green, tag v0.1.0 and cut a release. Until we exercise the integrated binary, the "done" labels reflect unit-level completion only.
No agent ever had to be told what another agent decided, claimed, or reviewed; every coordination fact came from ForgeSwarm's shared state.
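That shared state reaches each agent through get_briefing, which (per the table above) bundles goal, constraints, decisions, dependency summaries, and prior review feedback into one onboarding packet. A minimal sketch of that bundling, assuming hypothetical dict-shaped project and task records; field names such as depends_on, summary, and feedback are illustrative, not ForgeSwarm's Pydantic models, and the sample data is made up apart from the goal and two decisions from the run above.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Briefing:
    goal: str
    constraints: List[str]
    decisions: List[str]
    dependency_summaries: Dict[str, str]
    prior_feedback: List[str]

    def render(self) -> str:
        """Flatten the packet into text an agent can read in one message."""
        lines = [f"Goal: {self.goal}"]
        lines += [f"Constraint: {c}" for c in self.constraints]
        lines += [f"Decision: {d}" for d in self.decisions]
        lines += [f"Depends on {t}: {s}" for t, s in self.dependency_summaries.items()]
        lines += [f"Prior feedback: {f}" for f in self.prior_feedback]
        return "\n".join(lines)


def get_briefing(project: dict, tasks: Dict[str, dict], task_id: str) -> Briefing:
    """Assemble one onboarding packet for the agent working on task_id."""
    task = tasks[task_id]
    deps = {
        dep: tasks[dep].get("summary", "(no summary yet)")
        for dep in task.get("depends_on", [])
    }
    return Briefing(
        goal=project["goal"],
        constraints=list(project.get("constraints", [])),
        decisions=list(project.get("decisions", [])),
        dependency_summaries=deps,
        prior_feedback=list(task.get("feedback", [])),
    )


if __name__ == "__main__":
    project = {
        "goal": "Build a CLI pomodoro timer in Python",
        "decisions": ["stdlib-only", "XDG config path"],
    }
    tasks = {  # illustrative records, not output from a real run
        "scaffold": {"summary": "package layout and entry point"},
        "timer": {"depends_on": ["scaffold"], "feedback": ["handle Ctrl-C cleanly"]},
    }
    print(get_briefing(project, tasks, "timer").render())
```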
Architecture
```text
src/forgeswarm/
├── server.py # FastMCP app + stdio/streamable-HTTP entrypoint
├── store.py # SQLite (WAL): atomic claims, leases, review state machine
├── models.py # Pydantic contracts returned by every tool
├── tools/ # planning · tasks · context · review · checks
├── resources.py # swarm:// live state
└── prompts.py # planner / implementer / reviewer / standup
```
Design choices worth knowing:
- SQLite over in-memory: over stdio, every client spawns its own server process, so shared swarm state must live on disk. WAL mode plus a busy timeout keep concurrent agents safe, and one conditional UPDATE makes claims race-free.
- The loop is server-enforced: review outcomes mutate task state in the same transaction as the verdict. An agent cannot skip review through prompt injection or forgetfulness; the state machine simply won't move.
- run_checks is verification, not execution: clients already execute code, so the server's job is evidence. That means allowlisted executables, no shell, a hard timeout, and output recorded where reviewers can see it.
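The run_checks design can be sketched with the standard subprocess module. This is a hypothetical run_check helper, not ForgeSwarm's implementation: the allowlist (taken from the tool list above), the 120-second default timeout, and the CheckResult record are assumptions. It shows the three properties named above: an allowlisted executable, an argument list with no shell, and a hard timeout, with exit code and output kept as evidence.

```python
import os
import subprocess
from dataclasses import dataclass
from typing import Collection, List, Optional, Sequence

# Illustrative allowlist based on the tool list; ForgeSwarm's real list may differ.
DEFAULT_ALLOWLIST = frozenset({"pytest", "ruff", "mypy", "npm", "cargo", "go"})


@dataclass
class CheckResult:
    argv: List[str]
    exit_code: Optional[int]  # None when the command did not run to completion
    output: str
    timed_out: bool


def run_check(
    argv: Sequence[str],
    timeout: float = 120.0,
    allowlist: Collection[str] = DEFAULT_ALLOWLIST,
) -> CheckResult:
    """Run one allowlisted check and return evidence for reviewers."""
    if not argv:
        raise ValueError("empty command")
    # Simplification: only the executable's base name is checked.
    exe = os.path.basename(argv[0])
    if exe not in allowlist:
        raise PermissionError(f"{exe!r} is not an allowlisted check command")
    try:
        proc = subprocess.run(
            list(argv),  # argument list with shell=False: no shell parsing
            shell=False,
            capture_output=True,
            text=True,
            timeout=timeout,  # subprocess.run kills the child when this expires
        )
    except subprocess.TimeoutExpired:
        return CheckResult(list(argv), None, f"timed out after {timeout}s", True)
    except FileNotFoundError:
        return CheckResult(list(argv), None, f"{exe!r} not found", False)
    return CheckResult(list(argv), proc.returncode, proc.stdout + proc.stderr, False)
```

A tool handler built this way would store the CheckResult on the task, so that post_review can cite real exit codes instead of "tests pass, trust me".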
License
MIT