ForgeSwarm

An MCP server that turns independent AI agents into a coordinated engineering team with a shared task board, shared context, a review loop, and an enforced plan-implement-review-iterate workflow.

ForgeSwarm 🛠️🐝

An MCP server that turns independent AI agents into a coordinated engineering team.

Most MCP servers give agents data (GitHub, databases, web). ForgeSwarm gives them coordination: a shared task board with atomic claiming, a shared context blackboard, a decision log, and an enforced plan → implement → review → iterate loop. It is the same workflow shape that powers orchestration harnesses like CyOps, distilled into an open protocol primitive any MCP client can plug into.

Connect Claude Code, Codex, OpenCode, or a MiniMax M3-powered script to the same ForgeSwarm server, and they instantly become citizens of one swarm: claiming tasks without collisions, briefing each other through shared memory, and reviewing each other's work before anything counts as done.

Built for the CyOps Arena Hackathon: MCP Server Sprint (co-hosted with MiniMax).

Why this exists

Multi-agent coding fails in predictable ways: two agents grab the same task, an agent starts work with no idea what the others decided, "done" means "the model said done", and a crashed agent silently stalls the project. ForgeSwarm fixes each one server-side, so correctness doesn't depend on prompt discipline:

| Failure mode | ForgeSwarm mechanism |
| --- | --- |
| Two agents do the same work | claim_task is a single atomic conditional UPDATE: one winner, always |
| Agent starts cold, repeats settled debates | get_briefing bundles goal, constraints, decisions, dependency summaries, and prior review feedback into one onboarding packet |
| "Done" is just an assertion | submit_for_review → a different agent must post_review; self-review is rejected; request_changes auto-returns the task to its author with feedback attached and bumps the iteration counter |
| "Tests pass, trust me" | run_checks runs allowlisted test/lint commands with a hard timeout and records exit code + output on the task as review evidence |
| Crashed agent stalls the swarm | Claims carry leases; expired leases put tasks back on the board automatically |
| Disagreements evaporate into chat | open_discussion → positions from ≥2 distinct agents (server-enforced) → resolve_discussion auto-records the consensus as a binding decision in every future briefing |
| The swarm never learns | get_retrospective compiles hard evidence (review bounce rates, check pass rates, per-agent stats, hotspot tasks) for the swarm to analyze and act on |
| State lost between sessions | Everything persists in SQLite (WAL), so swarms survive restarts and work across both transports |
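The first and fifth rows can be illustrated together. The following is a minimal sketch of an atomic, leased claim built on one conditional UPDATE; it is not ForgeSwarm's actual store code. The table layout, the status names, the `LEASE_SECONDS` value, and the `open_board` helper are all assumptions made for illustration.

```python
import sqlite3
import time
from typing import Optional

LEASE_SECONDS = 300  # hypothetical lease length, not taken from ForgeSwarm


def open_board(path: str = ":memory:") -> sqlite3.Connection:
    """Open a connection and create an illustrative tasks table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS tasks (
               id            TEXT PRIMARY KEY,
               status        TEXT NOT NULL DEFAULT 'ready',
               owner         TEXT,
               lease_expires REAL
           )"""
    )
    conn.commit()
    return conn


def claim_task(conn: sqlite3.Connection, task_id: str, agent_id: str,
               now: Optional[float] = None) -> bool:
    """Try to claim a task; return True only for the single winner.

    The WHERE clause is the whole trick: the row changes only if the task
    is still ready, or if a previous claim's lease has already expired.
    """
    now = time.time() if now is None else now
    with conn:  # commit on success, roll back on error
        cur = conn.execute(
            """UPDATE tasks
                  SET status = 'claimed', owner = ?, lease_expires = ?
                WHERE id = ?
                  AND (status = 'ready'
                       OR (status = 'claimed' AND lease_expires < ?))""",
            (agent_id, now + LEASE_SECONDS, task_id, now),
        )
    # rowcount is 1 for the winner and 0 for everyone else.
    return cur.rowcount == 1
```

Because the check and the write happen in one statement, a second agent calling `claim_task` on the same task gets `False` until the first agent's lease runs out, at which point the same statement lets it take over.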

Install

From source (not yet on PyPI):

git clone https://github.com/H2SO4620/forgeswarm && cd forgeswarm
pip install -e ".[dev]"
pytest   # 20 tests, including end-to-end MCP client sessions
forgeswarm

Transports

forgeswarm                            # stdio (local clients spawn it)
forgeswarm --transport http --port 8765   # one shared endpoint for a whole swarm
forgeswarm --db ./myproject.db        # or set FORGESWARM_DB

State is SQLite either way (default ~/.forgeswarm/forgeswarm.db), so stdio clients, which each spawn their own server process, still share one swarm.
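A small sketch of how the database location might be resolved from the options above. The precedence shown here (the `--db` flag first, then `FORGESWARM_DB`, then the default path) is an assumption, and `resolve_db_path` is a hypothetical helper, not a documented ForgeSwarm function.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Default location stated in this README.
DEFAULT_DB = Path.home() / ".forgeswarm" / "forgeswarm.db"


def resolve_db_path(cli_db: Optional[str] = None,
                    env: Optional[Mapping[str, str]] = None) -> Path:
    """Pick the SQLite file to use (assumed order: flag, env var, default)."""
    env = os.environ if env is None else env
    if cli_db:
        return Path(cli_db).expanduser()
    env_db = env.get("FORGESWARM_DB")
    if env_db:
        return Path(env_db).expanduser()
    return DEFAULT_DB
```

Whatever the real precedence is, the point is that every client resolves to the same file, which is what lets separately spawned stdio servers share one swarm.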

Claude Code

claude mcp add forgeswarm -- uvx forgeswarm

Or in any MCP client config:

{
  "mcpServers": {
    "forgeswarm": { "command": "uvx", "args": ["forgeswarm"] }
  }
}

The loop

flowchart LR
    G[Goal] --> P[create_project<br/>submit_plan]
    P --> B[Task board]
    B -->|claim_task<br/>atomic| W[Agent works<br/>get_briefing · save_context · run_checks]
    W --> S[submit_for_review]
    S --> R{post_review<br/>by a different agent}
    R -->|approve| D[done ✓]
    R -->|request_changes<br/>iteration++| W
    D --> B
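The flowchart above can be read as a small state machine. The sketch below models it in memory to show which transitions are legal; the status names (`ready`, `claimed`, `in_review`, `done`), the `Task` fields, and the `WorkflowError` type are assumptions for illustration, not ForgeSwarm's real models.

```python
from dataclasses import dataclass, field
from typing import List, Optional


class WorkflowError(Exception):
    """Raised when an agent attempts a transition the loop does not allow."""


@dataclass
class Task:
    id: str
    status: str = "ready"  # ready -> claimed -> in_review -> done
    author: Optional[str] = None
    iteration: int = 0
    feedback: List[str] = field(default_factory=list)


def claim(task: Task, agent: str) -> None:
    if task.status != "ready":
        raise WorkflowError("task is not on the board")
    task.status, task.author = "claimed", agent


def submit_for_review(task: Task, agent: str) -> None:
    if task.status != "claimed" or task.author != agent:
        raise WorkflowError("only the claiming agent can submit")
    task.status = "in_review"


def post_review(task: Task, reviewer: str, verdict: str, note: str = "") -> None:
    if task.status != "in_review":
        raise WorkflowError("task is not awaiting review")
    if reviewer == task.author:
        raise WorkflowError("self-review is rejected")
    if verdict == "approve":
        task.status = "done"
    elif verdict == "request_changes":
        # Back to the author, with feedback attached and the counter bumped.
        task.status = "claimed"
        task.iteration += 1
        task.feedback.append(note)
    else:
        raise WorkflowError("unknown verdict: " + verdict)
```

There is no function that moves a task from `claimed` straight to `done`, which is the property the server-side loop relies on.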

Tools (24)

Planning: create_project, submit_plan (whole dependency graph in one call), list_projects, register_agent

Task board: list_tasks (with ready_only), claim_task (atomic, leased), update_task (progress + lease renewal), complete_task, get_task_graph

Shared context: save_context, search_context, record_decision, get_briefing ⭐

Review loop: submit_for_review, get_review_queue, post_review

Discussion & consensus: open_discussion, post_to_discussion, resolve_discussion (consensus becomes a recorded decision automatically), list_discussions

Workflow templates: list_workflow_templates, get_workflow_template (ship-feature, refactor-module, debug-issue; dependency-wired task graphs ready for submit_plan)

Verification & reflection: run_checks (allowlisted: pytest, ruff, mypy, npm, cargo, go, …; no shell, hard timeout, evidence recorded), get_retrospective (swarm performance evidence: bounce rates, iterations, per-agent stats)
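To make the run_checks description concrete, here is a rough sketch of an allowlisted, shell-free check runner with a hard timeout. It is an illustration only: the `run_check` function, the `CheckResult` type, the default timeout, and the rule that the first argument must match an allowlist entry exactly are assumptions, and the allowlist holds only the names this README mentions.

```python
import subprocess
from dataclasses import dataclass
from typing import AbstractSet, List, Optional, Sequence

# Executables named in this README; the real allowlist may be longer.
ALLOWED = frozenset({"pytest", "ruff", "mypy", "npm", "cargo", "go"})


@dataclass
class CheckResult:
    argv: List[str]
    exit_code: Optional[int]  # None when the command timed out
    output: str
    timed_out: bool


def run_check(argv: Sequence[str],
              allowed: AbstractSet[str] = ALLOWED,
              timeout: float = 60.0) -> CheckResult:
    """Run one allowlisted command without a shell and capture the evidence."""
    if not argv:
        raise ValueError("empty command")
    if argv[0] not in allowed:
        raise PermissionError(f"executable not allowlisted: {argv[0]!r}")
    try:
        proc = subprocess.run(
            list(argv),
            shell=False,          # arguments are never interpreted by a shell
            capture_output=True,
            text=True,
            timeout=timeout,      # child is killed when this expires
        )
    except subprocess.TimeoutExpired:
        return CheckResult(list(argv), None, f"timed out after {timeout}s", True)
    return CheckResult(list(argv), proc.returncode,
                       proc.stdout + proc.stderr, False)
```

A server would then store the returned exit code and output on the task so reviewers see evidence rather than an assertion.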

Resources & Prompts

Live swarm state, readable without tool calls: swarm://projects · swarm://agents · swarm://project/{id}/status · swarm://project/{id}/tasks · swarm://project/{id}/decisions · swarm://project/{id}/discussions · swarm://project/{id}/retrospective · swarm://project/{id}/context

Role prompts that make any MCP client swarm-ready in one message: planner · implementer · reviewer · standup_summary (rendered from live board state)

Demo: a MiniMax M3 swarm builds software through ForgeSwarm

examples/minimax_swarm_demo.py runs three MiniMax M3 agents (planner, implementer, reviewer) that coordinate entirely through ForgeSwarm tools over a real MCP stdio session: the planner decomposes a goal into a task graph, the implementer claims tasks and submits work, the reviewer approves or bounces it, and the loop runs until the board is green.

pip install -e ".[demo]"
set MINIMAX_API_KEY=sk-...        # export on macOS/Linux
python examples/minimax_swarm_demo.py "Build a CLI pomodoro timer in Python"

M3 is also available through OpenRouter (same model, smaller minimum top-up):

set MINIMAX_API_KEY=sk-or-...
set MINIMAX_BASE_URL=https://openrouter.ai/api/v1
set MINIMAX_MODEL=minimax/minimax-m3

No API key handy? examples/quickstart_client.py walks the identical workflow with a scripted client, no LLM required:

python examples/quickstart_client.py

Verified run

A real run of the M3 swarm against "Build a CLI pomodoro timer in Python" went from a bare goal to a finished, reviewed project with zero human intervention. Three M3 agents talked only through ForgeSwarm tools:

  1. m3-planner registered itself, created the project, decomposed the goal into 8 dependency-ordered tasks (scaffold → timer state machine → config → notifier → CLI → tests → docs), and recorded 4 architectural decisions (stdlib-only, foreground blocking timer, XDG config path, stderr UI honoring NO_COLOR).
  2. m3-impl-1 claimed each ready task in dependency order, wrote the source via save_context, and submitted every deliverable through submit_for_review.
  3. m3-reviewer-1 pulled the review queue, cross-checked each submission against get_briefing (goal, constraints, decisions, prior feedback), and posted a verdict for each through post_review.

The board went 8/8 done, and the closing standup_summary prompt (also answered by M3, purely from live board state) correctly reported:

All planned work is complete – 8/8 tasks closed... The project is feature-complete: scaffold, timer FSM, config, notifications, CLI, tests, and docs are all landed.

Single Most Important Next Action: Run a full end-to-end smoke test of the shipped CLI... and, if green, tag v0.1.0 and cut a release. Until we exercise the integrated binary, the "done" labels reflect unit-level completion only.

No agent ever had to be told what another agent decided, claimed, or reviewed: every coordination fact came from ForgeSwarm's shared state.

Architecture

src/forgeswarm/
├── server.py        # FastMCP app + stdio/streamable-HTTP entrypoint
├── store.py         # SQLite (WAL): atomic claims, leases, review state machine
├── models.py        # Pydantic contracts returned by every tool
├── tools/           # planning · tasks · context · review · checks
├── resources.py     # swarm:// live state
└── prompts.py       # planner / implementer / reviewer / standup

Design choices worth knowing:

  • SQLite over in-memory: over stdio every client spawns its own server process, so shared swarm state must live on disk. WAL mode + a busy timeout keeps concurrent agents safe, and one conditional UPDATE makes claims race-free.
  • The loop is server-enforced: review outcomes mutate task state in the same transaction as the verdict. An agent cannot skip review by prompt injection or forgetfulness; the state machine simply won't move.
  • run_checks is verification, not execution: clients already execute code. The server's job is evidence: allowlisted executables, no shell, hard timeout, output recorded where reviewers can see it.
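The first two design choices can be sketched with the standard sqlite3 module. This is a simplified illustration under assumed names: the schema, the status values, the 5000 ms busy timeout, and the `connect` and `post_review` helpers are hypothetical and do not reproduce the project's store.py.

```python
import sqlite3

# Illustrative schema; column and status names are assumptions.
SCHEMA = """
CREATE TABLE IF NOT EXISTS tasks (
    id        TEXT PRIMARY KEY,
    status    TEXT NOT NULL,
    owner     TEXT,
    iteration INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE IF NOT EXISTS reviews (
    task_id  TEXT NOT NULL,
    reviewer TEXT NOT NULL,
    verdict  TEXT NOT NULL
);
"""

NEXT_STATUS = {"approve": "done", "request_changes": "claimed"}


def connect(path: str, busy_timeout_ms: int = 5000) -> sqlite3.Connection:
    """Open a connection the way several server processes could share a file."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")  # readers do not block the writer
    conn.execute(f"PRAGMA busy_timeout={int(busy_timeout_ms)}")  # wait on locks
    conn.executescript(SCHEMA)
    return conn


def post_review(conn: sqlite3.Connection, task_id: str,
                reviewer: str, verdict: str) -> None:
    """Record a verdict and move the task in one transaction, or do neither."""
    if verdict not in NEXT_STATUS:
        raise ValueError("unknown verdict: " + verdict)
    bump = 1 if verdict == "request_changes" else 0
    with conn:  # any exception below rolls back the verdict row as well
        conn.execute(
            "INSERT INTO reviews (task_id, reviewer, verdict) VALUES (?, ?, ?)",
            (task_id, reviewer, verdict),
        )
        cur = conn.execute(
            """UPDATE tasks
                  SET status = ?, iteration = iteration + ?
                WHERE id = ? AND status = 'in_review' AND owner != ?""",
            (NEXT_STATUS[verdict], bump, task_id, reviewer),
        )
        if cur.rowcount != 1:
            raise ValueError("task is not in review, or reviewer is its author")
```

Since the verdict row and the status change commit together, a rejected self-review leaves no trace in either table, and a second process reading the same file never observes a verdict without its matching state change.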

License

MIT
