Spine
Exposes a governed, provenance-grounded autonomous delivery pipeline as an MCP server, enabling AI coding assistants like Claude Code or Codex to initiate requirements-to-PR workflows with human approval gates and full audit.
Governed, provenance-grounded autonomous delivery — turn requirements into reviewed, tested pull requests, with a human in control.
Naming. Spine is the product. It's distributed as the `agent-orchestrator` package and its command is `orchestrator`; those names stay in install lines and commands throughout the docs.
Spine reads a requirement (from Confluence, Notion, or a Markdown file), understands your target repo, generates code grounded in that repo's own conventions, writes and runs tests, and opens a pull request for you to review. It pauses for your approval before it starts and before anything merges. Nothing is pushed, merged, or written to your tracker unless you say so.
It's built for teams who want agents that are inspectable, reproducible, and safe to run on real code — not demos.

    pip install --extra-index-url https://pypi.org/simple/ agent-orchestrator
    orchestrator init && orchestrator doctor                            # scaffold .env, check readiness
    orchestrator sdlc feature --source file://./spec.md --safe          # build locally: no pushes, no PRs

The published name collides with an unrelated PyPI project — see the Setup & Install guide for the exact install one-liner.
Documentation
| Guide | Read it for |
|---|---|
| Setup & Install | Installing the CLI, configuring the .env, and standing up the full stack (Temporal + Postgres) for the autonomous pipeline. |
| User Guide | A step-by-step walkthrough: from your first local build to a real PR, local models, the web dashboard, and connecting tools (MCP). |
| Features & Capabilities | The capability catalog — everything Spine can do today, its status, the command/flag to use it, and a link to each deep dive. |
| Operations & Developer Guide | How to operate it: deployment modes, the full environment-variable reference, and standing up each advanced capability — including the semantic spine (ontomesh × infodrift). |
| Community brief | A one-page overview to share — what it does, lifecycle coverage, how to try it, and the feedback we're looking for. |
New here? Install → User Guide Steps 1–4. That's the whole everyday workflow in about ten minutes.
Features & capabilities
Requirements → reviewed PR. Point it at a requirements source and a code repo. It extracts a backlog of intents, writes a spec, generates the implementation and tests, gets them green, and opens a PR — with two human gates (before building, before merging). A safe mode builds entirely locally (branch + diff, no external writes) so you can inspect everything first.
Code-grounded understanding. Before generating, it builds a Product Knowledge
Graph of your repo — modules, types, functions, call sites, blast radius — and
grounds new code in what already exists, so output reads like your team wrote it.
Works across Python, Java, and TypeScript. orchestrator understand writes a
committed, code-true memory-bank/ your whole team (and any AI tool) can read.
Governed autonomy. The workflow itself is a typed, validated artifact. A planner decomposes the objective, a runtime executes it, and per-edge verifiers check every step against schemas, evidence, and policy. Failures trigger replan, a human approval, or a clean stop. Every tool call, approval, and decision lands in an append-only audit log, and each run is capped by a spend budget.
Learns across runs. Cross-run semantic memory lets the agent recall conventions, pitfalls, and decisions from past runs — each memory cites the run it came from.
You can see inside it. Live OpenTelemetry tracing covers every LLM call, loop step, and tool call, joined to the audit log — so you can debug a run, not just read its result.
Use it your way. A CLI for scripting and CI, a web dashboard (delegate runs, watch them live, approve gates inline), a terminal UI, and MCP in both directions — consume external MCP tools, or expose the whole pipeline as an MCP server to Claude Code, Codex, or your IDE.
Bring your own model. Multi-provider via LiteLLM (Anthropic, OpenAI, Bedrock), or run fully offline on a local model (Ollama). Mix models per stage.
Durable. Long-running pipelines are checkpointed (Temporal + Postgres) — they survive restarts and resume across human approval pauses.
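The governed-autonomy loop described above (per-edge verifiers whose failures lead to a replan, a human approval, or a clean stop) can be pictured roughly as follows. This is a minimal sketch, not Spine's implementation: `Handoff`, `Outcome`, the three check functions, the 0.7 threshold, and the mapping from each failed check to an outcome are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Sequence


class Outcome(Enum):
    PASS = "pass"
    REPLAN = "replan"
    NEEDS_APPROVAL = "needs_approval"
    STOP = "stop"


@dataclass(frozen=True)
class Handoff:
    """Hypothetical payload passed along one edge of the workflow."""
    payload: dict
    confidence: float
    evidence: tuple


Verifier = Callable[[Handoff], Outcome]


def schema_check(h: Handoff) -> Outcome:
    # Assumed rule: a malformed payload is worth a replan.
    return Outcome.PASS if isinstance(h.payload.get("diff"), str) else Outcome.REPLAN


def evidence_check(h: Handoff) -> Outcome:
    # Assumed rule: output with no supporting evidence stops the run.
    return Outcome.PASS if h.evidence else Outcome.STOP


def confidence_check(h: Handoff) -> Outcome:
    # Assumed threshold: low confidence escalates to a human.
    return Outcome.PASS if h.confidence >= 0.7 else Outcome.NEEDS_APPROVAL


def verify_edge(h: Handoff, chain: Sequence[Verifier]) -> Outcome:
    """Run verifiers in order; the first non-passing outcome wins."""
    for verifier in chain:
        outcome = verifier(h)
        if outcome is not Outcome.PASS:
            return outcome
    return Outcome.PASS


CHAIN = (schema_check, evidence_check, confidence_check)
```

Ordering the chain from cheapest to most judgment-heavy check means a structurally broken handoff never reaches the point of asking a human for approval.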
How it works

    requirement (Confluence / Notion / Markdown)
     │
     ▼
    plan ──► validate ──► generate code ──► run tests ──► review ──► open PR
     │            (grounded in your repo's knowledge graph)             │
     └─────────────────── per-edge verifiers + audit ───────────────────┘
    human gate 1 ▲                                            ▲ human gate 2
    (before build)                                            (before merge)

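To make the flow above concrete, here is a small sketch of a staged run with two human gates and a safe mode. The stage names come from the diagram; everything else (`run_pipeline`, the `approve` callback, and the exact placement of each gate) is an assumption for illustration and may not match how Spine wires its gates internally.

```python
from typing import Callable, List

STAGES = ["plan", "validate", "generate code", "run tests", "review", "open PR"]

# Assumed gate placement for this sketch: gate 1 before code generation,
# gate 2 before the first external write (opening the PR).
GATES = {
    "generate code": "gate 1 (before build)",
    "open PR": "gate 2 (before merge)",
}


def run_pipeline(approve: Callable[[str], bool], safe: bool = True) -> List[str]:
    """Return the ordered list of events for one simulated run."""
    events: List[str] = []
    for stage in STAGES:
        if stage == "open PR" and safe:
            # Safe mode never reaches an external write.
            events.append("stopped: safe mode, no external writes")
            return events
        gate = GATES.get(stage)
        if gate is not None:
            if not approve(gate):
                events.append(f"stopped: {gate} rejected")
                return events
            events.append(f"approved: {gate}")
        events.append(f"ran: {stage}")
    return events


if __name__ == "__main__":
    for event in run_pipeline(lambda gate: True, safe=True):
        print(event)
```

Passing the approval decision in as a callback keeps the same loop usable from a CLI prompt, a web UI button, or a test.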
| Concept | What it is |
|---|---|
| Planner → GraphIR | Turns an objective into a typed, validated execution graph (nodes, edges, budgets, approval points). |
| Registry | Versioned agent templates + tool contracts the planner assembles from. |
| Runtime | LangGraph-based executor with Postgres checkpointing and typed state. |
| Verifier chain | Per-edge schema / confidence / evidence / policy checks that gate every handoff. |
| Approval gates | First-class nodes that pause for human review and resume on your decision. |
| Audit log | Append-only record of every tool call, approval, and policy decision. |
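As a rough picture of what "typed, validated execution graph" in the table above could mean, the sketch below defines a tiny graph type and a validator. `Node`, `PlanGraph`, `validate`, and the specific rules checked are hypothetical; the real GraphIR schema is not described in this README.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Node:
    name: str
    kind: str  # "agent" or "approval" in this sketch


@dataclass(frozen=True)
class PlanGraph:
    nodes: Tuple[Node, ...]
    edges: Tuple[Tuple[str, str], ...]  # (source name, destination name)
    budget_usd: float


def validate(graph: PlanGraph) -> List[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    errors: List[str] = []
    names = [n.name for n in graph.nodes]
    if len(names) != len(set(names)):
        errors.append("duplicate node name")
    known = set(names)
    for src, dst in graph.edges:
        for end in (src, dst):
            if end not in known:
                errors.append(f"edge references unknown node: {end}")
    if graph.budget_usd <= 0:
        errors.append("budget must be positive")
    if not any(n.kind == "approval" for n in graph.nodes):
        errors.append("no approval node")
    return errors
```

Returning all problems at once, rather than raising on the first, gives a planner enough feedback to repair the graph in a single replan.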
FAQ
Does it merge code on its own? No. It opens a PR; a human reviews and merges. There are two approval gates — before building and before merging — and safe mode makes no external writes at all.
Where does my code/data go?
To whichever LLM provider you configure — or nowhere external, if you run a local
model (Ollama). Generated code stays in a local branch until you choose --live.
Do I need Docker or a database?
Not for the everyday path (sdlc feature --safe builds one requirement locally).
The autonomous multi-feature pipeline + web dashboard needs Temporal + Postgres —
see the Setup guide.
Which languages and models? Code generation and comprehension cover Python, Java, and TypeScript. For models, use any LiteLLM-supported provider (Anthropic, OpenAI, Bedrock) or a local Ollama model; you can set a different model per stage.
How is it safe to run on real repos? Write guards on generated files, allow-listed + write-gated external tools, a per-run spend budget, an append-only audit trail, and human approval before any push or merge.
CLI or web UI? Either — they drive the same engine and the same API. Use the CLI for scripting/CI, the web UI (or terminal UI) for watching runs and approving gates by hand.
Can other tools call it? Yes. It speaks MCP both ways: it can use external MCP servers, and it can run as an MCP server so Claude Code / Codex / your IDE can call the pipeline (with the same gates).
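Several of the safety controls named in the FAQ (allow-listed and write-gated tools, a per-run spend budget, an append-only audit trail) fit together naturally in one guard object. The sketch below is an illustration under stated assumptions, not Spine's code: `RunGuard`, `authorize`, both exception types, and the tool names used in the usage example are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


class ToolBlocked(PermissionError):
    """Raised when a tool is not allow-listed or a write lacks approval."""


class BudgetExceeded(RuntimeError):
    """Raised when a call would push the run past its spend budget."""


@dataclass
class RunGuard:
    allowed_tools: frozenset
    write_tools: frozenset
    budget_usd: float
    writes_approved: bool = False
    spent_usd: float = 0.0
    # Entries are only ever appended, never edited or removed.
    audit: List[Tuple[str, str]] = field(default_factory=list)

    def authorize(self, tool: str, cost_usd: float) -> None:
        """Record one tool call and raise if any guard rejects it."""
        if tool not in self.allowed_tools:
            self.audit.append((tool, "denied: not allow-listed"))
            raise ToolBlocked(tool)
        if tool in self.write_tools and not self.writes_approved:
            self.audit.append((tool, "denied: write needs approval"))
            raise ToolBlocked(tool)
        if self.spent_usd + cost_usd > self.budget_usd:
            self.audit.append((tool, "denied: budget exceeded"))
            raise BudgetExceeded(tool)
        self.spent_usd += cost_usd
        self.audit.append((tool, "allowed"))


if __name__ == "__main__":
    guard = RunGuard(frozenset({"read_file", "open_pr"}), frozenset({"open_pr"}), budget_usd=1.0)
    guard.authorize("read_file", 0.25)
    try:
        guard.authorize("open_pr", 0.25)
    except ToolBlocked:
        print("write blocked until a human approves")
    print(guard.audit)
```

Denied calls are logged before the exception is raised, so the audit trail shows what was attempted as well as what was allowed.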
Contributing
Issues and PRs are welcome. See CONTRIBUTING.md, CODE_OF_CONDUCT.md, and SECURITY.md.
License
Apache License 2.0. See LICENSE.