AllBrain MCP

A multi-agent workflow orchestration server with a global SQLite event store, supporting DAG-based task execution, conflict resolution, and intent extraction.

One Brain. Multiple Agents.

AllBrain MCP captures raw agent events into a global SQLite-backed brain so a new agent can resume project context later.

Implemented core:

  • FastMCP stdio server
  • Global SQLite store at ~/.allbrain/allbrain.db
  • Canonical project identity
  • Mandatory session-bound append-only events
  • Stable event ordering with UUIDv7 and timestamps
  • save_event() and list_events() MCP tools
  • Event type registry with unknown event rejection
  • Git context tools with safe non-repo behavior
  • resume_project() built from raw events plus optional Git context
  • Snapshot-backed incremental resume
  • Manual create_snapshot() checkpointing
  • Weighted auto snapshots
  • Snapshot/reducer/compression version checks
  • Explicit snapshot delta merge strategy
  • Multi-agent event attribution with agent_id, impact_score, caused_by, and branch
  • Conflict detection and resolution tools
  • Layered multi-agent resume output
  • Rule-based semantic intent extraction
  • Intent graph and contradiction detection
  • Intent-aware resume output

To start the server for the current project:

uv run allbrain start --project . --agent codex

Semantic event types:

  • goal_set
  • task_started
  • task_completed
  • file_modified
  • failure
  • task_blocked

Audit events use tool_call. They do not mutate task state, but they are exposed as a secondary tool_usage signal in resume output.
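
A minimal sketch of how unknown event rejection and the audit-only role of tool_call could fit together. The names SEMANTIC_TYPES, AUDIT_TYPES, validate_event_type, and mutates_task_state are illustrative assumptions, not AllBrain's actual API.

```python
# Hypothetical sketch: these names are illustrative, not AllBrain's real API.
SEMANTIC_TYPES = frozenset({
    "goal_set", "task_started", "task_completed",
    "file_modified", "failure", "task_blocked",
})
AUDIT_TYPES = frozenset({"tool_call"})


def validate_event_type(event_type: str) -> str:
    """Return the event type unchanged, or raise if it is not registered."""
    if event_type not in SEMANTIC_TYPES | AUDIT_TYPES:
        raise ValueError(f"unknown event type: {event_type!r}")
    return event_type


def mutates_task_state(event_type: str) -> bool:
    """Audit events are recorded but never change task state."""
    return validate_event_type(event_type) in SEMANTIC_TYPES
```

Under these assumptions, mutates_task_state("tool_call") is False, and an unregistered type raises ValueError instead of being stored.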

Snapshots are derived checkpoints. Raw events remain the only source of truth, and snapshots can be rebuilt from the event stream.

Snapshot metadata stores snapshot_schema_version, reducer_version, and compression_version. Incompatible snapshots are ignored and rebuilt from raw events instead of being trusted silently.
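
The snapshot rules above can be sketched as follows. This is a toy reducer under stated assumptions: CURRENT_VERSIONS, Snapshot, reduce_events, and resume are hypothetical names, and the real reducer tracks far more than open and done tasks.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical version constants; AllBrain's real values may differ.
CURRENT_VERSIONS = {
    "snapshot_schema_version": 1,
    "reducer_version": 1,
    "compression_version": 1,
}


@dataclass(frozen=True)
class Snapshot:
    state: dict      # reduced state at the checkpoint
    last_index: int  # number of events already folded into `state`
    versions: dict   # version metadata stored with the snapshot


def reduce_events(state: dict, events: list) -> dict:
    """Toy reducer: track which tasks are open and which are done."""
    open_tasks = set(state.get("open", ()))
    done_tasks = set(state.get("done", ()))
    for event in events:
        if event["type"] == "task_started":
            open_tasks.add(event["task"])
        elif event["type"] == "task_completed":
            open_tasks.discard(event["task"])
            done_tasks.add(event["task"])
    return {"open": sorted(open_tasks), "done": sorted(done_tasks)}


def resume(events: list, snapshot: Optional[Snapshot] = None) -> dict:
    """Apply the delta on top of a compatible snapshot, else replay everything."""
    if snapshot is not None and snapshot.versions == CURRENT_VERSIONS:
        return reduce_events(snapshot.state, events[snapshot.last_index:])
    return reduce_events({}, events)
```

With a compatible snapshot only the events after last_index are reduced. A snapshot whose versions differ is skipped, so the result always matches a full replay of the raw events.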

Sprint 4 adds conflict-aware multi-agent context. resume_project() includes global_view, agent_view, conflict_view, decision_view, merged_state, and resolved_conflicts while preserving the legacy top-level fields for compatibility.

Conflict decisions are conservative: low-margin conflicts are marked needs_review, and conflict-aware decision_view.next_step overrides the global resume suggestion.

Sprint 5 adds deterministic semantic intent tooling. extract_intents(), detect_contradictions(), and resume_with_intent() derive intent context from raw events without LLMs, embeddings, or a vector database.

Intent confidence evolves from supporting evidence, intent lifecycle status tracks active/completed/blocked state, graph edges include an edge_type, and contradictions include a numeric severity_score.

Intent extraction collapses file churn inside an active task into supporting evidence for the main intent, avoids supportive refactor/test false positives, and keeps snapshot+delta intent replay equivalent to full replay.

Sprint 9 introduces the Workflow Engine — the Orchestrator core. This is a foundational change: instead of scheduling tasks atomically, the engine now schedules subtasks within a DAG, handles dependency-aware execution, aggregates multi-agent outputs, and recovers from failures at the node level.

Components:

  • TaskGraph with TaskNode and TaskEdge abstractions
  • DependencyEngine: DAG validation, cycle detection, topological sort, ready-set calculation, critical path, and blocking reason analysis
  • WorkflowStateMachine: PENDING → READY → RUNNING → COMPLETED / FAILED / BLOCKED with validated transitions
  • SubtaskScheduler: SchedulerV1 evolution that schedules subtasks, not just tasks, respecting dependency readiness and max-parallel limits
  • ResultAggregator: Combines Architect/Build/Reviewer outputs with CONCAT, MERGE, VOTE, and SUMMARY strategies
  • RecoveryManager: Node-level retry with exponential backoff, cascading block for exhausted retries, and workflow resume with completed result replay
  • WorkflowEngine: Orchestrates the full lifecycle — create workflow from subtasks, step through the DAG (process completions, failures, scheduling), and run to completion
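
A minimal sketch of validated transitions in the spirit of WorkflowStateMachine. The transition table is an assumption: in particular, the FAILED to READY edge (modelling a retry) and the terminal treatment of BLOCKED are guesses, not documented behaviour.

```python
from enum import Enum


class NodeState(Enum):
    PENDING = "pending"
    READY = "ready"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"
    BLOCKED = "blocked"


# Assumed transition table; the real engine may allow a different set of edges.
ALLOWED = {
    NodeState.PENDING: {NodeState.READY, NodeState.BLOCKED},
    NodeState.READY: {NodeState.RUNNING, NodeState.BLOCKED},
    NodeState.RUNNING: {NodeState.COMPLETED, NodeState.FAILED, NodeState.BLOCKED},
    NodeState.FAILED: {NodeState.READY, NodeState.BLOCKED},  # retry or give up
    NodeState.COMPLETED: set(),
    NodeState.BLOCKED: set(),
}


def transition(current: NodeState, target: NodeState) -> NodeState:
    """Return the new state, or raise if the move is not in the table."""
    if target not in ALLOWED[current]:
        raise ValueError(f"invalid transition: {current.name} -> {target.name}")
    return target
```

Keeping the table as data makes an illegal move such as PENDING to COMPLETED fail loudly instead of silently corrupting workflow state.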

Example: "Implement OAuth Login" decomposes into a DAG: Design API → Implement Backend → Security Review → Write Tests ───────┘
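
To make the dependency handling concrete, here is a small sketch of topological ordering, ready-set calculation, and cycle detection using Python's standard graphlib module. The edges are one assumed reading of the OAuth example, and topological_order, ready_set, and has_cycle are hypothetical helper names rather than DependencyEngine methods.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical dependency map: node -> set of nodes it depends on.
# The edges are an assumed reading of the OAuth example above.
deps = {
    "design_api": set(),
    "implement_backend": {"design_api"},
    "security_review": {"implement_backend"},
    "write_tests": {"implement_backend"},
}


def topological_order(graph: dict) -> list:
    """One valid execution order that respects every dependency."""
    return list(TopologicalSorter(graph).static_order())


def ready_set(graph: dict, completed: set) -> list:
    """Nodes not yet completed whose dependencies have all completed."""
    return sorted(
        node for node, needs in graph.items()
        if node not in completed and needs <= completed
    )


def has_cycle(graph: dict) -> bool:
    """DAG validation: prepare() raises CycleError when a cycle exists."""
    try:
        TopologicalSorter(graph).prepare()
    except CycleError:
        return True
    return False
```

With nothing completed, only design_api is ready. Once design_api and implement_backend are done, both security_review and write_tests become ready and could run in parallel.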

The engine runs this DAG step by step. If node 3 fails, only node 3 retries — the rest of the workflow does not restart.
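
Node-level recovery could look roughly like this. It is a sketch only: backoff_delays and cascade_block are hypothetical helpers, and the base delay, factor, and cap are invented defaults, not RecoveryManager's actual settings.

```python
def backoff_delays(max_retries: int, base: float = 1.0,
                   factor: float = 2.0, cap: float = 60.0) -> list:
    """Delay before each retry attempt, growing exponentially up to a cap."""
    return [min(cap, base * factor ** attempt) for attempt in range(max_retries)]


def cascade_block(dependents: dict, failed: str) -> set:
    """Every node downstream of `failed`, found by walking dependent edges."""
    blocked, stack = set(), [failed]
    while stack:
        node = stack.pop()
        for child in dependents.get(node, ()):
            if child not in blocked:
                blocked.add(child)
                stack.append(child)
    return blocked
```

Only the failed node is retried on the backoff schedule. If its retries run out, just the nodes that depend on it are blocked, and unrelated branches keep running.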

Key design decisions:

  • Event-sourced: new semantic event types added (subtask_created, subtask_started, subtask_completed, subtask_failed, workflow_state_changed, retry_scheduled, workflow_created, workflow_started, workflow_completed, workflow_failed, result_aggregated)
  • Idempotent recovery: completed nodes are replayed into a resumed workflow via engine.resume()
  • Isolated module: allbrain/workflow/ does not mutate existing orchestrator code; integration via orchestrator/workflow_bridge.py is planned for future sprints
  • Full test coverage: 30 unit/integration tests covering DAG ops, state machine, scheduling, aggregation, recovery, serialization, and end-to-end workflow execution

The existing task-level orchestrator (allbrain/orchestrator/) remains fully operational. No regressions introduced (111 of 112 existing tests pass; the one failure is pre-existing in test_agent_profile_scheduler.py).

Sprint 10 introduces the Agent Runtime Layer + Async Executor — moving AllBrain from "plans workflows" to "actually runs agents." This is the first sprint where the system can execute real LLM calls (Claude, OpenAI, Gemini, Qwen, OpenCode CLI, Codex CLI) through a unified adapter contract.

Components:

  • AgentDefinition schema: id, name, version, provider, capabilities, cost, latency profile, max context, adapter class, config, safety limits
  • AgentRegistry: central registry with auto-discovery from environment variables (ANTHROPIC_API_KEY, OPENAI_API_KEY, GOOGLE_API_KEY, DASHSCOPE_API_KEY, OPENCODE_AVAILABLE, CODEX_AVAILABLE)
  • AgentAdapter ABC: provider-agnostic execution contract with execute(), health_check(), estimate_cost()
  • SafetyWrapper: input sanitization (prompt injection defense), cost ceiling (per-call + per-workflow), rate limiting, output validation
  • ExecutionMetrics: duration, token counts, cost, success/failure, collected per execution
  • CapabilityLearner: EMA-based auto-learning from execution metrics — capability scores evolve from observed success rates
  • TaskQueue ABC + InMemoryTaskQueue: async FIFO queue, Redis/RabbitMQ-swap-ready
  • WorkerPool: N-worker async dispatch with graceful shutdown and in-flight tracking
  • AgentRuntime: bridges WorkflowEngine → TaskQueue → WorkerPool → AgentAdapter → SafetyWrapper → MetricsCollector → CapabilityLearner
  • MockAdapter: zero-cost adapter for testing without real LLM calls
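
As an illustration of the EMA-based learning described for CapabilityLearner, the update rule can be sketched in a few lines. The function name ema_update, the smoothing factor 0.2, and the cold-start score 0.5 are assumptions for this example.

```python
def ema_update(score: float, success: bool, alpha: float = 0.2) -> float:
    """Exponential moving average: blend the old score with the new outcome."""
    observation = 1.0 if success else 0.0
    return (1 - alpha) * score + alpha * observation


# Assumed cold start: a neutral 0.5 until real executions are observed.
score = 0.5
for _ in range(50):
    score = ema_update(score, success=True)
```

A higher alpha reacts faster to recent outcomes, while a lower alpha smooths out one-off failures. After a long run of successes the score converges towards 1.0.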

Execution model (distributed-first, async event-driven):

WorkflowEngine
   |
   v
AgentRuntime.execute_subtask(assignment)
   |
   v
SafetyWrapper (sanitize, cost check, rate limit)
   |
   v
Adapter.execute(task, context) -- runs in thread executor with timeout
   |
   v
ExecutionMetrics -- recorded + fed to CapabilityLearner
   |
   v
SubtaskResult -- back to Workflow Engine
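
The "thread executor with timeout" step can be sketched with asyncio alone. The adapter here is a stand-in that just sleeps, and execute_with_timeout and the shape of the returned dict are assumptions, not the actual AgentRuntime contract.

```python
import asyncio
import time


def slow_adapter(task: str, delay: float) -> str:
    """Stand-in for a blocking adapter call (hypothetical, no real LLM)."""
    time.sleep(delay)
    return f"done: {task}"


async def execute_with_timeout(task: str, delay: float, timeout: float) -> dict:
    """Run the blocking call in the default thread pool and bound the wait."""
    loop = asyncio.get_running_loop()
    started = time.perf_counter()
    future = loop.run_in_executor(None, slow_adapter, task, delay)
    try:
        output = await asyncio.wait_for(future, timeout=timeout)
        ok, error = True, None
    except asyncio.TimeoutError:
        # The caller stops waiting; the worker thread itself still finishes.
        output, ok, error = None, False, "timeout"
    return {
        "ok": ok,
        "output": output,
        "error": error,
        "duration_s": time.perf_counter() - started,
    }
```

Calling asyncio.run(execute_with_timeout("review", 0.0, 1.0)) yields a successful result, while a delay longer than the timeout produces error "timeout". Python cannot forcibly stop a thread, so a real adapter would also need its own request-level timeout.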

Key design decisions applied from the Sprint 9 review:

  1. Event-sourced single source of truth: Workflow state remains a derived view; agent execution events are written to the same event store
  2. Engine/Scheduler/Runtime boundary clarified: Scheduler decides "who", Engine decides "how + when", Runtime performs the actual execution
  3. Safety first: every adapter call goes through SafetyWrapper with hard cost ceilings
  4. Capability auto-learning: metrics from real executions feed back into the scheduling layer
  5. Distributed-ready queue: TaskQueue ABC allows swapping InMemoryTaskQueue for Redis/RabbitMQ without changing the runtime
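
A rough sketch of the queue boundary from point 5. The method names put and get, the None sentinel used for shutdown, and the drain helper are assumptions for illustration. The real TaskQueue contract may differ.

```python
import asyncio
from abc import ABC, abstractmethod


class TaskQueue(ABC):
    """Minimal queue contract; a Redis or RabbitMQ class could implement it."""

    @abstractmethod
    async def put(self, item) -> None: ...

    @abstractmethod
    async def get(self): ...


class InMemoryTaskQueue(TaskQueue):
    def __init__(self) -> None:
        self._queue = asyncio.Queue()  # FIFO

    async def put(self, item) -> None:
        await self._queue.put(item)

    async def get(self):
        return await self._queue.get()


async def drain(queue: TaskQueue, n_workers: int = 2) -> list:
    """Run n_workers consumers until each one sees a None sentinel."""
    handled = []

    async def worker() -> None:
        while (item := await queue.get()) is not None:
            handled.append(item)

    for _ in range(n_workers):
        await queue.put(None)  # one shutdown sentinel per worker
    await asyncio.gather(*(worker() for _ in range(n_workers)))
    return handled


async def main() -> list:
    queue = InMemoryTaskQueue()  # created inside the running event loop
    for name in ("design", "build", "review"):
        await queue.put(name)
    return await drain(queue)
```

Because drain depends only on the abstract TaskQueue, swapping in another backend would not require changes to the worker code.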

Adapter slots for future sprints: Claude, OpenAI, Gemini, Qwen, OpenCode CLI, Codex CLI. All share the same AgentAdapter contract.

Test coverage: 41 new tests covering definition serialization, registry, safety (cost ceiling, rate limit, input sanitization, domain allowlist), metrics collection, capability learning (EMA convergence, cold start, latency tracking), queue operations, worker pool lifecycle, runtime execution (success, failure, timeout, unknown agent, batch), and end-to-end workflow + runtime integration.

Full test suite: 182 tests, 181 passing (one pre-existing failure in test_unhealthy_reviewer_is_skipped unrelated to this sprint).

Sprint 33 introduces the World Model Layer — the cognitive shift from "decide then act" to "predict then decide". The system can now ask "what happens if I do this?" before committing, and feeds the answer into the closed-loop learning engine.

Components:

  • WorldState, Prediction, SimulationResult: pydantic models with extra="forbid" and bounded numeric fields; Prediction adds a confidence score (0-1) for downstream calibration
  • EnvironmentTracker: deterministic WorldState capture
  • StateTransitionBridge: immutable model_copy(update=...) transitions; input never mutated
  • PredictionBridge: deterministic risk/success/cost/confidence rules (deploy without tests is high risk)
  • SimulationBridge: combines transition + prediction, mints a uuid7 simulation_id
  • WorldModel facade: pure observe() and simulate(action, state); no event writing at this layer
  • WorldStateBuilder: projection from event list to world state (derived view, not in-memory)
  • WorldHistory: event-derived query helper for latest_state() and latest_simulation()

Pipeline integration:

  • SystemDecisionPipeline.run(...) gains simulate_before_execute: bool = False and risk_threshold: float = 0.7
  • When enabled, the pipeline emits world_state_observed and world_simulation_run between final_decision_recorded and the scheduler
  • If prediction.risk >= risk_threshold, the runtime state machine transitions to BLOCKED with reason world_simulation_high_risk
  • Otherwise the world success_probability overrides execution_plan["predicted_success"] so the closed-loop learning engine compares world model output against the actual outcome
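
The gating rule above can be expressed as a small pure function. This is a sketch under assumptions: apply_world_gate and the returned dict shape are hypothetical, while the threshold default, the BLOCKED reason, and the predicted_success override follow the bullets above.

```python
def apply_world_gate(execution_plan: dict, prediction: dict,
                     risk_threshold: float = 0.7) -> dict:
    """Block on high predicted risk, else pass the world prediction along."""
    if prediction["risk"] >= risk_threshold:
        return {
            "state": "BLOCKED",
            "reason": "world_simulation_high_risk",
            "execution_plan": execution_plan,
        }
    # Copy rather than mutate, so the caller's plan stays untouched.
    updated = {**execution_plan, "predicted_success": prediction["success_probability"]}
    return {"state": "EXECUTION", "reason": None, "execution_plan": updated}
```

Note that the comparison is inclusive: a risk exactly equal to risk_threshold blocks execution.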

New event types:

  • world_state_observed — emitted on every observe() call
  • world_simulation_run — emitted on every simulate() call with impact_score = prediction.risk

New MCP tools:

  • observe_world(project_path, limit) — captures a fresh WorldState and emits the event
  • simulate_action(action, project_path, limit) — captures state, simulates the action, emits both events

Replay equivalence: EventReplayEngine routes world events into a new state["world"] key. WorldStateBuilder is the projection; the world state is fully reconstructable from the event log alone. The replay equivalence test asserts replay(events)["final_state"]["world"] matches WorldStateBuilder().build(events) exactly.

Deferred to future sprints (raised during planning, not in this scope): action.metadata for richer action descriptors, payload_version on world events for migration safety, and a tighter test_replay_simulation_prediction_equivalence beyond the builder-level check.

Test coverage: 11 new tests in tests/test_world.py covering event emission, prediction rules, transition immutability, history round-trip, replay equivalence, MCP impl stability, pipeline simulation gating, and world-to-learning prediction feedback.

Full test suite: 225 tests, 225 passing, no regressions.

The next layer is Sprint 34 — Counterfactual Reasoning: "what if I had not done this?" and "which alternative is best?".

Sprint 34 adds the counterfactual reasoning layer on top of the world model. The system can now ask "what would have happened if I had chosen differently?", compute decision regret, and produce advisory recommendations with severity bands.

Components:

  • CounterfactualResult, RankedAlternative: pydantic models with extra="forbid"; improvement = alt.success − actual.success, regret = max(0, improvement)
  • recommendation_severity(improvement) returns Literal["low", "medium", "high"] with bands [0.20, 0.40) / [0.40, 0.70) / >= 0.70
  • AlternativeGenerator: deterministic ACTION_MAP (deploy → [run_tests, delay_deploy, rollback], delete → [backup, archive])
  • CounterfactualEvaluator: stateless compare using SimulationBridge for both actual and alternative
  • AlternativeRanker: stateless rank by success_probability − risk
  • CounterfactualEngine: facade, analyze(state, action, limit=N) and rank(state, actions)
  • CounterfactualProjection: replay projection with analyses, generated, recommendations, unknown_actions, count, unknown_action_count, recommendation_count
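
The improvement, regret, severity, and ranking rules listed above can be sketched directly. The function signatures are assumptions. Returning None for an improvement below 0.20 is also an assumption, since the bands above do not cover that range.

```python
from typing import Optional


def improvement(actual_success: float, alt_success: float) -> float:
    """How much better the alternative is predicted to be."""
    return alt_success - actual_success


def regret(actual_success: float, alt_success: float) -> float:
    """Regret is never negative: a worse alternative costs nothing."""
    return max(0.0, improvement(actual_success, alt_success))


def recommendation_severity(value: float) -> Optional[str]:
    """Bands: [0.20, 0.40) low, [0.40, 0.70) medium, >= 0.70 high."""
    if value >= 0.70:
        return "high"
    if value >= 0.40:
        return "medium"
    if value >= 0.20:
        return "low"
    return None  # assumed behaviour below the lowest band


def rank_alternatives(alternatives: list) -> list:
    """Order by success_probability minus risk, best first."""
    return sorted(
        alternatives,
        key=lambda alt: alt["success_probability"] - alt["risk"],
        reverse=True,
    )
```

For example, if the chosen action has success 0.5 and an alternative has 0.95, the improvement is about 0.45, which falls in the medium band.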

Pipeline integration:

  • SystemDecisionPipeline.run(...) gains enable_counterfactual: bool = False, counterfactual_limit: int = Field(ge=1, le=100), regret_threshold: float = Field(ge=0.0, le=1.0)
  • Pipeline raises ValueError when counterfactual_limit < 1 (defense in depth alongside the schema validation)
  • Runs after the world simulation step, before EXECUTION; the pipeline observes a fresh WorldState on its own (independent of simulate_before_execute)
  • R1 advisory only: never overrides final_decision. Continues to EXECUTION regardless. counterfactual_recommendation is emitted only when best.improvement >= regret_threshold
  • Learning integration (S1 plain): the prediction dict is enriched with best_alternative and regret before ClosedLoopLearningEngine.evaluate(). error_delta formula is unchanged

New event types:

  • counterfactual_generated — at the start of an analysis. If the action is unknown, payload includes reason: "unknown_action" and an empty alternatives list
  • counterfactual_evaluated — once per alternative
  • counterfactual_recommendation — only when threshold met, with severity and impact_score = improvement

New MCP tools:

  • generate_counterfactual(action, project_path, limit, counterfactual_limit) — runs engine.analyze() and writes events
  • rank_alternatives(actions, project_path, limit) — runs AlternativeRanker.rank() (read-only, no events)

Replay equivalence: EventReplayEngine routes counterfactual_* events into a new state["counterfactual"] key. CounterfactualProjection is the projection. The replay equivalence test asserts replay(events)["final_state"]["counterfactual"] == CounterfactualProjection().build(events) exactly.

Future metrics (Sprint 35+, not implemented in this sprint): average_regret, rolling_regret, high_regret_count (with severity breakdown), unknown_action_rate, regret_by_objective_kind. The data is already in the event log; the evolution/organizational learning layer would consume the projection.

Test coverage: 13 new tests in tests/test_counterfactual.py covering alternative generation, improvement/regret math, ranking, event emission, the unknown_action metric, projection build, replay equivalence, severity bands, pipeline integration (gating, learning integration, validation), and MCP tools.

Full test suite: 238 tests, 238 passing, no regressions.

The next layer is decision quality analytics: aggregating regret history into dashboards and tying the unknown-action metric to action knowledge base expansion.

Sprint 35 adds the scenario planning layer on top of counterfactual reasoning. The system can now ask "what are all the futures that could unfold from this action, and how spread out are they?" by running the same action against four different state overlays.

Components:

  • ScenarioResult, ScenarioAnalysis: pydantic models with extra="forbid"; analysis_id: UUID (uuid7) for replay debugging and observability timeline; confidence: float 0-1 from template
  • ScenarioTemplate (frozen dataclass): name + environment_state_overlay (additive merge) + environment_state_remove (explicit key removal) + resources_overlay + resources_remove + confidence + description + template_version
  • apply_overlay(state, template): immutable state modifier using model_copy(update=...)
  • ScenarioGenerator: defaults() returns 4 named templates; from_specs(specs) builds custom ones
  • ScenarioEvaluator: stateless, takes a simulator, returns ScenarioResult
  • ScenarioRanker: select(results) picks best/worst/safest/expected; metrics(results) computes prediction_spread, risk_volatility, uncertainty, confidence_total
  • ScenarioEngine: facade, analyze(state, action, limit=N) and evaluate_custom(state, action, scenarios)
  • ScenarioProjection: replay projection that deduplicates analysis_ids via a seen_ids set

Metrics exposed:

  • prediction_spread = best.success - worst.success
  • risk_volatility = max(risk) - min(risk)
  • uncertainty = 1 - sum(confidence * prediction.confidence)
  • confidence_total = sum(scenario confidences) (sanity, ~1.0)
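
A worked sketch of these four metrics. The dict keys and sample numbers are invented for illustration. It also assumes that best and worst are the scenarios with the highest and lowest success probability, which may not match how ScenarioRanker actually selects them.

```python
def scenario_metrics(results: list) -> dict:
    """Compute spread, volatility, uncertainty, and total confidence."""
    successes = [r["success_probability"] for r in results]
    risks = [r["risk"] for r in results]
    weighted = sum(r["confidence"] * r["prediction_confidence"] for r in results)
    return {
        "prediction_spread": max(successes) - min(successes),
        "risk_volatility": max(risks) - min(risks),
        "uncertainty": 1 - weighted,
        "confidence_total": sum(r["confidence"] for r in results),
    }


# Invented sample values; only the template confidences come from the README.
results = [
    {"success_probability": 0.95, "risk": 0.10, "confidence": 0.25, "prediction_confidence": 0.9},
    {"success_probability": 0.80, "risk": 0.30, "confidence": 0.50, "prediction_confidence": 0.8},
    {"success_probability": 0.40, "risk": 0.70, "confidence": 0.15, "prediction_confidence": 0.6},
    {"success_probability": 0.90, "risk": 0.05, "confidence": 0.10, "prediction_confidence": 0.9},
]
metrics = scenario_metrics(results)
```

For this sample the spread is about 0.55, the risk volatility about 0.65, and the uncertainty about 0.195, with confidence_total close to 1.0.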

Default templates:

  • best_case (confidence 0.25): environment = {tests: passed, deployment: ready}, all resources true
  • expected_case (confidence 0.50): no overlay, baseline trajectory
  • worst_case (confidence 0.15): environment tests removed, resources = {internet: false, disk: false}
  • safest_case (confidence 0.10): environment = {tests: passed, deployment: verified}, all resources true

State overlay semantics (O2): overlay fields merge additively. Removing keys requires an explicit environment_state_remove / resources_remove list. apply_overlay is immutable and never mutates the input state.
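
The additive-merge and explicit-remove semantics can be sketched with frozen dataclasses standing in for the pydantic models. The WorldState field names environment_state and resources are inferred from the template field names and are assumptions, as is using dataclasses.replace in place of model_copy(update=...).

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class WorldState:
    environment_state: dict
    resources: dict


@dataclass(frozen=True)
class ScenarioTemplate:
    name: str
    environment_state_overlay: dict = field(default_factory=dict)
    environment_state_remove: tuple = ()
    resources_overlay: dict = field(default_factory=dict)
    resources_remove: tuple = ()


def _merge(base: dict, overlay: dict, remove: tuple) -> dict:
    """Additive merge into a new dict, then drop explicitly removed keys."""
    merged = {**base, **overlay}
    for key in remove:
        merged.pop(key, None)
    return merged


def apply_overlay(state: WorldState, template: ScenarioTemplate) -> WorldState:
    """Return a new state; the input state is never mutated."""
    return replace(
        state,
        environment_state=_merge(
            state.environment_state,
            template.environment_state_overlay,
            template.environment_state_remove,
        ),
        resources=_merge(
            state.resources, template.resources_overlay, template.resources_remove
        ),
    )
```

An overlay that simply omits a key leaves the base value in place. Only a name listed in a remove tuple deletes it, which is how a worst_case template can drop tests without touching other environment keys.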

Pipeline integration:

  • SystemDecisionPipeline.run(...) gains enable_scenarios: bool = False, scenarios_limit: int = Field(ge=1, le=20), scenario_recommendation_threshold: float = Field(ge=0.0, le=1.0)
  • Pipeline raises ValueError when scenarios_limit < 1 (defense in depth alongside the schema validation)
  • Runs after the counterfactual step, before EXECUTION; the scenario step observes a fresh WorldState on its own (D1 independent)
  • R1 advisory: never overrides final_decision. Continues to EXECUTION regardless. scenario_recommended is emitted with rationale every time
  • Learning integration: the prediction dict is enriched with prediction_spread, risk_volatility, and uncertainty before ClosedLoopLearningEngine.evaluate()

New event types:

  • scenario_generated — payload includes template_version: 1, analysis_id, and the list of actual scenario names evaluated
  • scenario_evaluated — one per scenario result, with impact_score = confidence
  • scenario_recommended — always emitted (R1) with best_case, expected_case, rationale, and template_version

New MCP tools:

  • generate_scenarios(action, project_path, limit, scenarios_limit) — runs engine.analyze() and writes events
  • evaluate_scenarios(action, scenarios, project_path, limit) — runs engine.evaluate_custom() with user-provided scenario dicts; per-scenario events are emitted (not the 4 defaults)

Replay equivalence: EventReplayEngine routes scenario_* events into a new state["scenarios"] key. ScenarioProjection is the projection. The replay equivalence test asserts replay(events)["final_state"]["scenarios"] == ScenarioProjection().build(events) exactly.

Future metrics (Sprint 36+, not implemented in this sprint):

  • normalized_spread = prediction_spread / expected_case.success_probability — same 0.20 spread at expected=0.80 vs expected=0.30 is not the same forecast disagreement
  • scenario_accuracy — post-hoc comparison of each scenario's success_probability against the actual_success recorded in RUNTIME_FEEDBACK_RECORDED; belongs to the evolution layer
  • analysis_id timeline — across runs, surface how often the same analysis_id correlates with downstream decision_regret to learn whether scenario spread is a leading indicator of regret
  • template_version migration tooling when template semantics change

Test coverage: 13 new tests in tests/test_scenarios.py covering default templates, best/worst/safest selection, metrics, overlay remove semantics, event emission, projection dedup, replay equivalence, pipeline integration (output, learning integration, validation), and custom-scenario MCP tool.

Full test suite: 251 tests, 251 passing, no regressions.

The next step is decision quality analytics: aggregating regret history and tying the unknown-action metric to action knowledge base expansion.

Sprint 36 adds the strategic foresight layer on top of multi-future scenarios. The system now asks "which sequence of actions produces the best long-term outcome?" by simulating plans step by step with state chaining.

Components:

  • FuturePlan: pydantic model with actions, predicted_success, cumulative_risk, cumulative_cost, horizon, confidence, step_states (debug hook)
  • ForesightAnalysis: pydantic model with analysis_id: UUID (uuid7), action, best_plan, safest_plan, fastest_plan, expected_plan, plan_spread, strategy_uncertainty, horizon_risk, template_version=1, plans
  • DEPLOY_PLANS: static list of 4 default plans for the deploy action (P1 single list)
  • ActionPlanner: generate(action) returns plans for deploy or [] otherwise
  • MultiStepSimulator: chains SimulationBridge through each step, returns (final_state, predictions, step_states) (MS1)
  • PlanEvaluator: enforces max_horizon (T1 reject) and computes the plan metrics
  • PlanRanker: select(plans) picks best/safest/fastest/expected by score predicted_success - cumulative_risk (S1 plain)
  • ForesightEngine: facade, analyze(state, action, limit) and evaluate_custom(state, actions)
  • ForesightProjection: replay projection with analyses, generated, recommendations, analysis_ids, count, recommendation_count (deduplicated)

Step states debug hook: MultiStepSimulator.simulate(state, actions) returns step_states (initial + N step states), which are captured in FuturePlan.step_states and serialized to the event payload. This makes "which action broke the state" and "which step created drift" obvious.
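
State chaining with a step_states trail might look like the sketch below. simulate_step is a toy stand-in for SimulationBridge with invented rules and numbers, and simulate_plan and cumulative_risk are hypothetical names. Averaging per-step risk follows the description of cumulative_risk elsewhere in this README.

```python
def simulate_step(state: dict, action: str):
    """Toy stand-in for SimulationBridge; rules and numbers are invented."""
    next_state = {**state}  # copy so earlier step states stay intact
    if action == "run_tests":
        next_state["tests"] = "passed"
        prediction = {"success_probability": 0.9, "risk": 0.1}
    elif action == "deploy":
        tested = state.get("tests") == "passed"
        next_state["deployed"] = True
        prediction = {
            "success_probability": 0.95 if tested else 0.5,
            "risk": 0.15 if tested else 0.8,
        }
    else:
        prediction = {"success_probability": 0.8, "risk": 0.2}
    return next_state, prediction


def simulate_plan(state: dict, actions: list, max_horizon: int = 20):
    """Chain each step's output state into the next step's input."""
    if len(actions) > max_horizon:
        raise ValueError(f"plan length {len(actions)} exceeds max_horizon {max_horizon}")
    step_states, predictions, current = [state], [], state
    for action in actions:
        current, prediction = simulate_step(current, action)
        step_states.append(current)
        predictions.append(prediction)
    return current, predictions, step_states


def cumulative_risk(predictions: list) -> float:
    """Average per-step risk."""
    return sum(p["risk"] for p in predictions) / len(predictions)
```

Because deploy sees the state produced by run_tests, the plan ["run_tests", "deploy"] gets a lower deploy risk than ["deploy"] alone, and step_states shows exactly where the tests key appeared.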

Pipeline integration:

  • SystemDecisionPipeline.run(...) gains enable_foresight: bool = False, foresight_limit: int = Field(ge=1, le=20), max_horizon: int = Field(ge=1, le=20)
  • Pipeline raises ValueError when foresight_limit < 1 or max_horizon < 1 (defense in depth alongside the schema validation)
  • Runs after the scenarios step, before EXECUTION; the foresight step observes a fresh WorldState on its own (D1 independent)
  • Plans longer than max_horizon raise ValueError (T1 reject)
  • R1 advisory: never overrides final_decision. Continues to EXECUTION regardless. foresight_recommended is emitted with rationale every time
  • Learning integration: the prediction dict is enriched with future_horizon, strategy_uncertainty, and horizon_risk before ClosedLoopLearningEngine.evaluate()

New event types:

  • foresight_generated — payload includes template_version: 1, analysis_id, plans_count, plan_ids
  • foresight_evaluated — one per plan, with impact_score = predicted_success
  • foresight_recommended — always emitted (R1) with best_plan, expected_plan, rationale, template_version

New MCP tools:

  • generate_future_plans(action, project_path, limit, foresight_limit, max_horizon) — runs engine.analyze() and writes events
  • evaluate_plan(actions, project_path, limit, max_horizon) — runs engine.evaluate_custom() on a user-provided plan; max_horizon enforces T1 reject

Replay equivalence: EventReplayEngine routes foresight_* events into a new state["foresight"] key. ForesightProjection is the projection. The replay equivalence test asserts replay(events)["final_state"]["foresight"] == ForesightProjection().build(events) exactly.

Boundary clarity (per the user's mental model):

  • counterfactual (Sprint 34): one-step alternative analysis
  • scenario (Sprint 35): one-state multi-world analysis
  • foresight (Sprint 36): multi-step trajectory analysis

Future metrics (Sprint 37+, not implemented in this sprint):

  • horizon_cost — distinct from cumulative_cost; weighted by step position for discounting distant costs
  • worst_step_risk = max(p.risk for p in predictions). The current cumulative_risk = average is "soft"; the worst-step view makes catastrophic steps visible
  • plan_depth — explicit split between horizon (model capacity) and plan_length (actual plan length)
  • plan_regret — best_plan success minus the chosen plan success; belongs to the evolution layer
  • Extensible planning templates (P2 dict) — currently only deploy is supported
  • payload_version migration on world, counterfactual, and scenario events (deferred from Sprint 33 onwards)

Test coverage: 16 new tests in tests/test_foresight.py covering plan generation, best/safest/fastest selection, step states debug hook, horizon metrics, projection build, event emission, replay equivalence, pipeline integration (output, learning integration, validation), max_horizon T1 reject, unknown action sentinel, and custom-plan MCP tool.

Full test suite: 267 tests, 267 passing, no regressions.

The system can now say: "I can deploy now. Running tests first increases success. The best long-term strategy is run_tests → fix_failures → deploy → monitor with predicted success 95%, risk 15%, horizon 4 steps." This is the first time AllBrain thinks in sequences, not just single actions.
