AllBrain MCP

A multi-agent workflow orchestration server with a global SQLite event store, supporting DAG-based task execution, conflict resolution, and intent extraction.

One Brain. Multiple Agents.

AllBrain MCP captures raw agent events into a global SQLite-backed brain so a new agent can resume project context later.

Implemented core:

FastMCP stdio server
Global SQLite store at ~/.allbrain/allbrain.db
Canonical project identity
Mandatory session-bound append-only events
Stable event ordering with UUIDv7 and timestamps
save_event() and list_events() MCP tools
Event type registry with unknown event rejection
Git context tools with safe non-repo behavior
resume_project() built from raw events plus optional Git context
Snapshot-backed incremental resume
Manual create_snapshot() checkpointing
Weighted auto snapshots
Snapshot/reducer/compression version checks
Explicit snapshot delta merge strategy
Multi-agent event attribution with agent_id, impact_score, caused_by, and branch
Conflict detection and resolution tools
Layered multi-agent resume output
Rule-based semantic intent extraction
Intent graph and contradiction detection
Intent-aware resume output

uv run allbrain start --project . --agent codex

Semantic event types:

goal_set
task_started
task_completed
file_modified
failure
task_blocked

Audit events use tool_call. They do not mutate task state, but they are exposed as a secondary tool_usage signal in resume output.
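As a rough illustration of the registry and the audit/semantic split described above, here is a minimal sketch. The function and class names (validate_event_type, reduce_events, UnknownEventTypeError) and the event dict shape are assumptions made for this example; only the event type names come from this README.

```python
# Hypothetical sketch: only the event type names are taken from the README.
SEMANTIC_EVENT_TYPES = frozenset({
    "goal_set", "task_started", "task_completed",
    "file_modified", "failure", "task_blocked",
})
AUDIT_EVENT_TYPES = frozenset({"tool_call"})


class UnknownEventTypeError(ValueError):
    """Raised when an event type is not in the registry."""


def validate_event_type(event_type: str) -> str:
    """Return 'semantic' or 'audit'; reject anything unregistered."""
    if event_type in SEMANTIC_EVENT_TYPES:
        return "semantic"
    if event_type in AUDIT_EVENT_TYPES:
        return "audit"
    raise UnknownEventTypeError(f"unknown event type: {event_type!r}")


def reduce_events(events: list[dict]) -> dict:
    """Fold events into task state; audit events only bump tool_usage."""
    state = {"tasks": {}, "tool_usage": 0}
    for event in events:
        kind = validate_event_type(event["type"])
        if kind == "audit":
            state["tool_usage"] += 1  # secondary signal, no task mutation
        elif event["type"] == "task_started":
            state["tasks"][event["task"]] = "started"
        elif event["type"] == "task_completed":
            state["tasks"][event["task"]] = "completed"
        elif event["type"] == "task_blocked":
            state["tasks"][event["task"]] = "blocked"
    return state
```

Rejecting unknown types at write time keeps the reducer total over everything that can appear in the log, which is what makes replay predictable.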

Snapshots are derived checkpoints. Raw events remain the only source of truth, and snapshots can be rebuilt from the event stream.

Snapshot metadata stores snapshot_schema_version, reducer_version, and compression_version. Incompatible snapshots are ignored and rebuilt from raw events instead of being trusted silently.

Sprint 4 adds conflict-aware multi-agent context. resume_project() includes global_view, agent_view, conflict_view, decision_view, merged_state, and resolved_conflicts while preserving the legacy top-level fields for compatibility.

Conflict decisions are conservative: low-margin conflicts are marked needs_review, and conflict-aware decision_view.next_step overrides the global resume suggestion.

Sprint 5 adds deterministic semantic intent tooling. extract_intents(), detect_contradictions(), and resume_with_intent() derive intent context from raw events without LLMs, embeddings, or a vector database.

Intent confidence evolves from supporting evidence, intent lifecycle status tracks active/completed/blocked state, graph edges include an edge_type, and contradictions include a numeric severity_score.

Intent extraction collapses file churn inside an active task into supporting evidence for the main intent, avoids supportive refactor/test false positives, and keeps snapshot+delta intent replay equivalent to full replay.

Sprint 9 introduces the Workflow Engine — the Orchestrator core. This is a foundational change: instead of scheduling tasks atomically, the engine now schedules subtasks within a DAG, handles dependency-aware execution, aggregates multi-agent outputs, and recovers from failures at the node level.

Components:

TaskGraph with TaskNode and TaskEdge abstractions
DependencyEngine: DAG validation, cycle detection, topological sort, ready-set calculation, critical path, and blocking reason analysis
WorkflowStateMachine: PENDING → READY → RUNNING → COMPLETED / FAILED / BLOCKED with validated transitions
SubtaskScheduler: SchedulerV1 evolution that schedules subtasks, not just tasks, respecting dependency readiness and max-parallel limits
ResultAggregator: Combines Architect/Build/Reviewer outputs with CONCAT, MERGE, VOTE, and SUMMARY strategies
RecoveryManager: Node-level retry with exponential backoff, cascading block for exhausted retries, and workflow resume with completed result replay
WorkflowEngine: Orchestrates the full lifecycle — create workflow from subtasks, step through the DAG (process completions, failures, scheduling), and run to completion
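To make "validated transitions" concrete, here is a small sketch of a transition table for the states listed above. The exact set of allowed edges is an assumption (in particular the FAILED → READY retry edge and PENDING → BLOCKED); the README only names the states and their general order.

```python
from enum import Enum


class NodeState(Enum):
    PENDING = "pending"
    READY = "ready"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"
    BLOCKED = "blocked"


# Assumed transition table; the real WorkflowStateMachine may differ.
ALLOWED = {
    NodeState.PENDING: {NodeState.READY, NodeState.BLOCKED},
    NodeState.READY: {NodeState.RUNNING},
    NodeState.RUNNING: {NodeState.COMPLETED, NodeState.FAILED, NodeState.BLOCKED},
    NodeState.FAILED: {NodeState.READY},  # assumed edge used for retries
    NodeState.COMPLETED: set(),
    NodeState.BLOCKED: set(),
}


def transition(current: NodeState, target: NodeState) -> NodeState:
    """Return the new state, or raise if the edge is not in the table."""
    if target not in ALLOWED[current]:
        raise ValueError(f"invalid transition: {current.name} -> {target.name}")
    return target
```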

Example: "Implement OAuth Login" decomposes into a DAG of four nodes: Design API, Implement Backend, Security Review, and Write Tests.

The engine runs this DAG step by step. If node 3 fails, only node 3 retries — the rest of the workflow does not restart.
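Node-level retry over a DAG can be sketched with the standard library's graphlib. The dependency edges below are an assumption chosen for illustration, and run_workflow is a hypothetical helper, not the engine's API. It also simplifies recovery: it retries immediately and raises on exhaustion, whereas the RecoveryManager described above uses exponential backoff and cascading blocks.

```python
from graphlib import CycleError, TopologicalSorter

# Assumed edges (node -> set of prerequisites), for illustration only.
DEPENDENCIES = {
    "design_api": set(),
    "implement_backend": {"design_api"},
    "security_review": {"implement_backend"},
    "write_tests": {"implement_backend"},
}


def run_workflow(dependencies, execute, max_retries=2):
    """Run nodes in dependency order, retrying only the node that failed.

    `execute(node, attempt)` returns True on success.
    Returns a dict mapping each node to the number of attempts it needed.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises CycleError if the graph is not a DAG
    attempts = {}
    while sorter.is_active():
        for node in sorter.get_ready():  # the current ready set
            for attempt in range(1, max_retries + 2):
                attempts[node] = attempt
                if execute(node, attempt):
                    break
            else:
                raise RuntimeError(f"{node} exhausted its retries")
            sorter.done(node)  # unblocks dependents of this node
    return attempts
```

Completed nodes are never re-executed: a failure only loops inside the failing node's own retry block.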

Key design decisions:

Event-sourced: new semantic event types added (subtask_created, subtask_started, subtask_completed, subtask_failed, workflow_state_changed, retry_scheduled, workflow_created, workflow_started, workflow_completed, workflow_failed, result_aggregated)
Idempotent recovery: completed nodes are replayed into a resumed workflow via engine.resume()
Isolated module: allbrain/workflow/ does not mutate existing orchestrator code; integration via orchestrator/workflow_bridge.py is planned for future sprints
Full test coverage: 30 unit/integration tests covering DAG ops, state machine, scheduling, aggregation, recovery, serialization, and end-to-end workflow execution

The existing task-level orchestrator (allbrain/orchestrator/) remains fully operational. No regressions were introduced: 111 of 112 existing tests pass, and the one failure is pre-existing in test_agent_profile_scheduler.py.

Sprint 10 introduces the Agent Runtime Layer + Async Executor — moving AllBrain from "plans workflows" to "actually runs agents." This is the first sprint where the system can execute real LLM calls (Claude, OpenAI, Gemini, Qwen, OpenCode CLI, Codex CLI) through a unified adapter contract.

Components:

AgentDefinition schema: id, name, version, provider, capabilities, cost, latency profile, max context, adapter class, config, safety limits
AgentRegistry: central registry with auto-discovery from environment variables (ANTHROPIC_API_KEY, OPENAI_API_KEY, GOOGLE_API_KEY, DASHSCOPE_API_KEY, OPENCODE_AVAILABLE, CODEX_AVAILABLE)
AgentAdapter ABC: provider-agnostic execution contract with execute(), health_check(), estimate_cost()
SafetyWrapper: input sanitization (prompt injection defense), cost ceiling (per-call + per-workflow), rate limiting, output validation
ExecutionMetrics: duration, token counts, cost, success/failure, collected per execution
CapabilityLearner: EMA-based auto-learning from execution metrics — capability scores evolve from observed success rates
TaskQueue ABC + InMemoryTaskQueue: async FIFO queue, Redis/RabbitMQ-swap-ready
WorkerPool: N-worker async dispatch with graceful shutdown and in-flight tracking
AgentRuntime: bridges WorkflowEngine → TaskQueue → WorkerPool → AgentAdapter → SafetyWrapper → MetricsCollector → CapabilityLearner
MockAdapter: zero-cost adapter for testing without real LLM calls
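The adapter contract might look roughly like this. The three method names come from the list above; their signatures, the return types, and the guarded_execute helper (a stand-in for one SafetyWrapper check) are assumptions for the sake of the example.

```python
from abc import ABC, abstractmethod


class AgentAdapter(ABC):
    """Provider-agnostic contract; signatures here are assumed."""

    @abstractmethod
    def execute(self, task: str, context: dict) -> str: ...

    @abstractmethod
    def health_check(self) -> bool: ...

    @abstractmethod
    def estimate_cost(self, task: str) -> float: ...


class MockAdapter(AgentAdapter):
    """Zero-cost adapter for tests: no network, no LLM call."""

    def execute(self, task: str, context: dict) -> str:
        return f"mock result for: {task}"

    def health_check(self) -> bool:
        return True

    def estimate_cost(self, task: str) -> float:
        return 0.0


class CostCeilingExceeded(RuntimeError):
    """Raised when the estimated cost is above the per-call ceiling."""


def guarded_execute(adapter: AgentAdapter, task: str, context: dict,
                    per_call_ceiling: float) -> str:
    """Hypothetical stand-in for a cost-ceiling check before execution."""
    cost = adapter.estimate_cost(task)
    if cost > per_call_ceiling:
        raise CostCeilingExceeded(f"estimated {cost} > ceiling {per_call_ceiling}")
    return adapter.execute(task, context)
```

Checking the estimate before calling execute() means an over-budget call fails without spending anything.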

Execution model (distributed-first, async event-driven):

WorkflowEngine
   |
   v
AgentRuntime.execute_subtask(assignment)
   |
   v
SafetyWrapper (sanitize, cost check, rate limit)
   |
   v
Adapter.execute(task, context) -- runs in thread executor with timeout
   |
   v
ExecutionMetrics -- recorded + fed to CapabilityLearner
   |
   v
SubtaskResult -- back to Workflow Engine
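The "runs in thread executor with timeout" step can be sketched with asyncio. This is an illustration only: execute_with_timeout and the shape of the returned metrics dict are assumptions, not AgentRuntime's actual interface.

```python
import asyncio
import time


async def execute_with_timeout(fn, task, context, timeout_s):
    """Run a blocking adapter call in a worker thread, bounded by a timeout.

    Note: on timeout the awaiting coroutine gives up, but the worker thread
    itself is not killed and keeps running until `fn` returns.
    """
    started = time.perf_counter()
    try:
        output = await asyncio.wait_for(
            asyncio.to_thread(fn, task, context), timeout_s
        )
        success, error = True, None
    except asyncio.TimeoutError:
        output, success, error = None, False, "timeout"
    except Exception as exc:  # adapter failure becomes a failed result
        output, success, error = None, False, repr(exc)
    return {
        "output": output,
        "success": success,
        "error": error,
        "duration_s": time.perf_counter() - started,
    }
```

Usage: `asyncio.run(execute_with_timeout(adapter.execute, "task", {}, 30.0))`, where `adapter` is any object with a blocking execute(task, context) method.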

Key design decisions applied from the Sprint 9 review:

Event-sourced single source of truth: Workflow state remains a derived view; agent execution events are written to the same event store
Engine/Scheduler/Runtime boundary clarified: Scheduler decides "who", Engine decides "how + when", Runtime performs the actual execution
Safety first: every adapter call goes through SafetyWrapper with hard cost ceilings
Capability auto-learning: metrics from real executions feed back into the scheduling layer
Distributed-ready queue: TaskQueue ABC allows swapping InMemoryTaskQueue for Redis/RabbitMQ without changing the runtime
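The EMA-based capability learning mentioned above could work along these lines. The smoothing factor, the cold-start score of 0.5, and the CapabilityScores class are assumptions picked for illustration; the README states only that scores are an exponential moving average of observed success.

```python
def ema_update(score: float, succeeded: bool, alpha: float = 0.2) -> float:
    """One exponential-moving-average step toward 1.0 (success) or 0.0."""
    observation = 1.0 if succeeded else 0.0
    return (1 - alpha) * score + alpha * observation


class CapabilityScores:
    """Hypothetical per-(agent, capability) score table."""

    def __init__(self, alpha: float = 0.2, cold_start: float = 0.5):
        self.alpha = alpha
        self.cold_start = cold_start  # assumed prior for unseen pairs
        self._scores: dict[tuple[str, str], float] = {}

    def score(self, agent_id: str, capability: str) -> float:
        return self._scores.get((agent_id, capability), self.cold_start)

    def record(self, agent_id: str, capability: str, succeeded: bool) -> float:
        updated = ema_update(self.score(agent_id, capability), succeeded, self.alpha)
        self._scores[(agent_id, capability)] = updated
        return updated
```

A larger alpha reacts faster to recent executions; a smaller one smooths out noisy outcomes.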

Adapter slots for future sprints: Claude, OpenAI, Gemini, Qwen, OpenCode CLI, Codex CLI. All share the same AgentAdapter contract.

Test coverage: 41 new tests covering definition serialization, registry, safety (cost ceiling, rate limit, input sanitization, domain allowlist), metrics collection, capability learning (EMA convergence, cold start, latency tracking), queue operations, worker pool lifecycle, runtime execution (success, failure, timeout, unknown agent, batch), and end-to-end workflow + runtime integration.

Full test suite: 182 tests, 181 passing (one pre-existing failure in test_unhealthy_reviewer_is_skipped unrelated to this sprint).

Sprint 33 introduces the World Model Layer — the cognitive shift from "decide then act" to "predict then decide". The system can now ask "what happens if I do this?" before committing, and feeds the answer into the closed-loop learning engine.

Components:

WorldState, Prediction, SimulationResult: pydantic models with extra="forbid" and bounded numeric fields; Prediction adds a confidence score (0-1) for downstream calibration
EnvironmentTracker: deterministic WorldState capture
StateTransitionBridge: immutable model_copy(update=...) transitions; input never mutated
PredictionBridge: deterministic risk/success/cost/confidence rules (deploy without tests is high risk)
SimulationBridge: combines transition + prediction, mints a uuid7 simulation_id
WorldModel facade: pure observe() and simulate(action, state); no event writing at this layer
WorldStateBuilder: projection from event list to world state (derived view, not in-memory)
WorldHistory: event-derived query helper for latest_state() and latest_simulation()

Pipeline integration:

SystemDecisionPipeline.run(...) gains simulate_before_execute: bool = False and risk_threshold: float = 0.7
When enabled, the pipeline emits world_state_observed and world_simulation_run between final_decision_recorded and the scheduler
If prediction.risk >= risk_threshold, the runtime state machine transitions to BLOCKED with reason world_simulation_high_risk
Otherwise the world success_probability overrides execution_plan["predicted_success"] so the closed-loop learning engine compares world model output against the actual outcome
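The gating rule in the list above reduces to a small decision function. In this sketch, apply_simulation_gate and the dict shapes are assumptions; the threshold default, the >= comparison, the block reason, and the predicted_success override are taken from the text.

```python
def apply_simulation_gate(execution_plan: dict, prediction: dict,
                          risk_threshold: float = 0.7) -> dict:
    """Block on high predicted risk, otherwise pass the world model's
    success estimate into the execution plan. Inputs are not mutated."""
    if prediction["risk"] >= risk_threshold:
        return {
            "blocked": True,
            "reason": "world_simulation_high_risk",
            "execution_plan": execution_plan,
        }
    updated_plan = {
        **execution_plan,
        "predicted_success": prediction["success_probability"],
    }
    return {"blocked": False, "reason": None, "execution_plan": updated_plan}
```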

New event types:

world_state_observed — emitted on every observe() call
world_simulation_run — emitted on every simulate() call with impact_score = prediction.risk

New MCP tools:

observe_world(project_path, limit) — captures a fresh WorldState and emits the event
simulate_action(action, project_path, limit) — captures state, simulates the action, emits both events

Replay equivalence: EventReplayEngine routes world events into a new state["world"] key. WorldStateBuilder is the projection; the world state is fully reconstructable from the event log alone. The replay equivalence test asserts replay(events)["final_state"]["world"] matches WorldStateBuilder().build(events) exactly.

Deferred to future sprints (raised during planning, not in this scope): action.metadata for richer action descriptors, payload_version on world events for migration safety, and a tighter test_replay_simulation_prediction_equivalence beyond the builder-level check.

Test coverage: 11 new tests in tests/test_world.py covering event emission, prediction rules, transition immutability, history round-trip, replay equivalence, MCP impl stability, pipeline simulation gating, and world-to-learning prediction feedback.

Full test suite: 225 tests, 225 passing, no regressions.

The next layer is Sprint 34 — Counterfactual Reasoning: "what if I had not done this?" and "which alternative is best?".

Sprint 34 adds the counterfactual reasoning layer on top of the world model. The system can now ask "what would have happened if I had chosen differently?", compute decision regret, and produce advisory recommendations with severity bands.

Components:

CounterfactualResult, RankedAlternative: pydantic models with extra="forbid"; improvement = alt.success − actual.success, regret = max(0, improvement)
recommendation_severity(improvement) returns Literal["low", "medium", "high"] with bands [0.20, 0.40) / [0.40, 0.70) / >= 0.70
AlternativeGenerator: deterministic ACTION_MAP (deploy → [run_tests, delay_deploy, rollback], delete → [backup, archive])
CounterfactualEvaluator: stateless compare using SimulationBridge for both actual and alternative
AlternativeRanker: stateless rank by success_probability − risk
CounterfactualEngine: facade, analyze(state, action, limit=N) and rank(state, actions)
CounterfactualProjection: replay projection with analyses, generated, recommendations, unknown_actions, count, unknown_action_count, recommendation_count
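The improvement, regret, severity, and ranking rules listed above can be written out as a short sketch. The formulas and band edges come from this README; the dict shape used for alternatives and the None return for improvements below 0.20 are assumptions, since the text does not say what happens under the lowest band.

```python
def improvement(actual_success: float, alt_success: float) -> float:
    """improvement = alt.success - actual.success"""
    return alt_success - actual_success


def regret(improvement_value: float) -> float:
    """regret = max(0, improvement): no regret if the actual choice was best."""
    return max(0.0, improvement_value)


def recommendation_severity(improvement_value: float):
    """Bands: [0.20, 0.40) low, [0.40, 0.70) medium, >= 0.70 high."""
    if improvement_value >= 0.70:
        return "high"
    if improvement_value >= 0.40:
        return "medium"
    if improvement_value >= 0.20:
        return "low"
    return None  # assumption: below 0.20 yields no severity


def rank(alternatives: list[dict]) -> list[dict]:
    """Order alternatives by success_probability - risk, best first."""
    return sorted(
        alternatives,
        key=lambda alt: alt["success_probability"] - alt["risk"],
        reverse=True,
    )
```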

Pipeline integration:

SystemDecisionPipeline.run(...) gains enable_counterfactual: bool = False, counterfactual_limit: int = Field(ge=1, le=100), regret_threshold: float = Field(ge=0.0, le=1.0)
Pipeline raises ValueError when counterfactual_limit < 1 (defense in depth alongside the schema validation)
Runs after the world simulation step, before EXECUTION; the pipeline observes a fresh WorldState on its own (independent of simulate_before_execute)
R1 advisory only: never overrides final_decision. Continues to EXECUTION regardless. counterfactual_recommendation is emitted only when best.improvement >= regret_threshold
Learning integration (S1 plain): the prediction dict is enriched with best_alternative and regret before ClosedLoopLearningEngine.evaluate(). error_delta formula is unchanged

New event types:

counterfactual_generated — at the start of an analysis. If the action is unknown, payload includes reason: "unknown_action" and an empty alternatives list
counterfactual_evaluated — once per alternative
counterfactual_recommendation — only when threshold met, with severity and impact_score = improvement

New MCP tools:

generate_counterfactual(action, project_path, limit, counterfactual_limit) — runs engine.analyze() and writes events
rank_alternatives(actions, project_path, limit) — runs AlternativeRanker.rank() (read-only, no events)

Replay equivalence: EventReplayEngine routes counterfactual_* events into a new state["counterfactual"] key. CounterfactualProjection is the projection. The replay equivalence test asserts replay(events)["final_state"]["counterfactual"] == CounterfactualProjection().build(events) exactly.

Future metrics (Sprint 35+, not implemented in this sprint): average_regret, rolling_regret, high_regret_count (with severity breakdown), unknown_action_rate, regret_by_objective_kind. The data is already in the event log; the evolution/organizational learning layer would consume the projection.

Test coverage: 13 new tests in tests/test_counterfactual.py covering alternative generation, improvement/regret math, ranking, event emission, the unknown_action metric, projection build, replay equivalence, severity bands, pipeline integration (gating, learning integration, validation), and MCP tools.

Full test suite: 238 tests, 238 passing, no regressions.

The next layer is decision quality analytics: aggregating regret history into dashboards and tying the unknown-action metric to action knowledge base expansion.

Sprint 35 adds the scenario planning layer on top of counterfactual reasoning. The system can now ask "what are all the futures that could unfold from this action, and how spread out are they?" by running the same action against four different state overlays.

Components:

ScenarioResult, ScenarioAnalysis: pydantic models with extra="forbid"; analysis_id: UUID (uuid7) for replay debugging and observability timeline; confidence: float 0-1 from template
ScenarioTemplate (frozen dataclass): name + environment_state_overlay (additive merge) + environment_state_remove (explicit key removal) + resources_overlay + resources_remove + confidence + description + template_version
apply_overlay(state, template): immutable state modifier using model_copy(update=...)
ScenarioGenerator: defaults() returns 4 named templates; from_specs(specs) builds custom ones
ScenarioEvaluator: stateless, takes a simulator, returns ScenarioResult
ScenarioRanker: select(results) picks best/worst/safest/expected; metrics(results) computes prediction_spread, risk_volatility, uncertainty, confidence_total
ScenarioEngine: facade, analyze(state, action, limit=N) and evaluate_custom(state, action, scenarios)
ScenarioProjection: replay projection that deduplicates analysis_ids via a seen_ids set

Metrics exposed:

prediction_spread = best.success - worst.success
risk_volatility = max(risk) - min(risk)
uncertainty = 1 - sum(confidence * prediction.confidence)
confidence_total = sum(scenario confidences) (sanity check; should be ~1.0)
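A worked sketch of the four metrics follows. The result dict keys (success_probability, risk, confidence, prediction_confidence) are assumptions, as is reading "best" and "worst" as the highest and lowest success probability; the formulas themselves are the ones listed above.

```python
def scenario_metrics(results: list[dict]) -> dict:
    """Compute spread, volatility, uncertainty, and the confidence sum.

    Each result is assumed to carry: success_probability, risk,
    confidence (template weight), prediction_confidence (from Prediction).
    """
    successes = [r["success_probability"] for r in results]
    risks = [r["risk"] for r in results]
    weighted_confidence = sum(
        r["confidence"] * r["prediction_confidence"] for r in results
    )
    return {
        "prediction_spread": max(successes) - min(successes),
        "risk_volatility": max(risks) - min(risks),
        "uncertainty": 1 - weighted_confidence,
        "confidence_total": sum(r["confidence"] for r in results),
    }
```

With template weights that sum to 1.0, uncertainty is zero only when every scenario's prediction is fully confident, and grows as the predictions themselves become less certain.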

Default templates:

best_case (confidence 0.25): environment = {tests: passed, deployment: ready}, all resources true
expected_case (confidence 0.50): no overlay, baseline trajectory
worst_case (confidence 0.15): environment tests removed, resources = {internet: false, disk: false}
safest_case (confidence 0.10): environment = {tests: passed, deployment: verified}, all resources true

State overlay semantics (O2): overlay fields merge additively. Removing keys requires an explicit environment_state_remove / resources_remove list. apply_overlay is immutable and never mutates the input state.
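The overlay semantics can be illustrated with frozen dataclasses standing in for the pydantic models, and dataclasses.replace standing in for model_copy(update=...). The WorldState field names and the order of operations (merge first, then remove) are assumptions; the template field names follow the ScenarioTemplate description above.

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class WorldState:
    # Assumed field names, inferred from the overlay names in the README.
    environment_state: dict = field(default_factory=dict)
    resources: dict = field(default_factory=dict)


@dataclass(frozen=True)
class ScenarioTemplate:
    name: str
    environment_state_overlay: dict = field(default_factory=dict)
    environment_state_remove: tuple = ()
    resources_overlay: dict = field(default_factory=dict)
    resources_remove: tuple = ()


def _merge(base: dict, overlay: dict, remove: tuple) -> dict:
    """Additive merge into a new dict, then explicit key removal."""
    merged = {**base, **overlay}
    for key in remove:
        merged.pop(key, None)
    return merged


def apply_overlay(state: WorldState, template: ScenarioTemplate) -> WorldState:
    """Return a new WorldState; the input state is left untouched."""
    return replace(
        state,
        environment_state=_merge(
            state.environment_state,
            template.environment_state_overlay,
            template.environment_state_remove,
        ),
        resources=_merge(
            state.resources, template.resources_overlay, template.resources_remove
        ),
    )
```

Keeping removal explicit means an empty overlay can never silently erase baseline keys, which is what lets expected_case be "no overlay".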

Pipeline integration:

SystemDecisionPipeline.run(...) gains enable_scenarios: bool = False, scenarios_limit: int = Field(ge=1, le=20), scenario_recommendation_threshold: float = Field(ge=0.0, le=1.0)
Pipeline raises ValueError when scenarios_limit < 1 (defense in depth alongside the schema validation)
Runs after the counterfactual step, before EXECUTION; the scenario step observes a fresh WorldState on its own (D1 independent)
R1 advisory: never overrides final_decision. Continues to EXECUTION regardless. scenario_recommended is emitted with rationale every time
Learning integration: the prediction dict is enriched with prediction_spread, risk_volatility, and uncertainty before ClosedLoopLearningEngine.evaluate()

New event types:

scenario_generated — payload includes template_version: 1, analysis_id, and the list of actual scenario names evaluated
scenario_evaluated — one per scenario result, with impact_score = confidence
scenario_recommended — always emitted (R1) with best_case, expected_case, rationale, and template_version

New MCP tools:

generate_scenarios(action, project_path, limit, scenarios_limit) — runs engine.analyze() and writes events
evaluate_scenarios(action, scenarios, project_path, limit) — runs engine.evaluate_custom() with user-provided scenario dicts; per-scenario events are emitted (not the 4 defaults)

Replay equivalence: EventReplayEngine routes scenario_* events into a new state["scenarios"] key. ScenarioProjection is the projection. The replay equivalence test asserts replay(events)["final_state"]["scenarios"] == ScenarioProjection().build(events) exactly.

Future metrics (Sprint 36+, not implemented in this sprint):

normalized_spread = prediction_spread / expected_case.success_probability — same 0.20 spread at expected=0.80 vs expected=0.30 is not the same forecast disagreement
scenario_accuracy — post-hoc comparison of each scenario's success_probability against the actual actual_success recorded in RUNTIME_FEEDBACK_RECORDED; belongs to the evolution layer
analysis_id timeline — across runs, surface how often the same analysis_id correlates with downstream decision_regret to learn whether scenario spread is a leading indicator of regret
template_version migration tooling when template semantics change

Test coverage: 13 new tests in tests/test_scenarios.py covering default templates, best/worst/safest selection, metrics, overlay remove semantics, event emission, projection dedup, replay equivalence, pipeline integration (output, learning integration, validation), and custom-scenario MCP tool.

Full test suite: 251 tests, 251 passing, no regressions.

The next step is decision quality analytics: aggregating regret history and tying the unknown-action metric to action knowledge base expansion.

Sprint 36 adds the strategic foresight layer on top of multi-future scenarios. The system now asks "which sequence of actions produces the best long-term outcome?" by simulating plans step by step with state chaining.

Components:

FuturePlan: pydantic model with actions, predicted_success, cumulative_risk, cumulative_cost, horizon, confidence, step_states (debug hook)
ForesightAnalysis: pydantic model with analysis_id: UUID (uuid7), action, best_plan, safest_plan, fastest_plan, expected_plan, plan_spread, strategy_uncertainty, horizon_risk, template_version=1, plans
DEPLOY_PLANS: static list of 4 default plans for the deploy action (P1 single list)
ActionPlanner: generate(action) returns plans for deploy or [] otherwise
MultiStepSimulator: chains SimulationBridge through each step, returns (final_state, predictions, step_states) (MS1)
PlanEvaluator: enforces max_horizon (T1 reject) and computes the plan metrics
PlanRanker: select(plans) picks best/safest/fastest/expected by score predicted_success - cumulative_risk (S1 plain)
ForesightEngine: facade, analyze(state, action, limit) and evaluate_custom(state, actions)
ForesightProjection: replay projection with analyses, generated, recommendations, analysis_ids, count, recommendation_count (deduplicated)

Step states debug hook: MultiStepSimulator.simulate(state, actions) returns step_states (initial + N step states), captured in FuturePlan.step_states and serialized to event payload. Makes "which action broke the state" and "which step created drift" obvious.
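State chaining with the step_states hook might look like the sketch below. The toy simulate_step rules and their risk numbers are invented for illustration (loosely following the "deploy without tests is high risk" rule from Sprint 33), and the function names are hypothetical. The returned triple, the max_horizon rejection, and cumulative_risk as an average follow the description in this section.

```python
def simulate_step(state: dict, action: str) -> tuple[dict, dict]:
    """Toy single-step simulator with assumed rules and risk values."""
    new_state = dict(state)  # never mutate the incoming state
    if action == "run_tests":
        new_state["tests"] = "passed"
        risk = 0.1
    elif action == "deploy":
        risk = 0.2 if state.get("tests") == "passed" else 0.8
        new_state["deployed"] = True
    else:
        risk = 0.3
    return new_state, {"action": action, "risk": risk}


def simulate_plan(state: dict, actions: list[str]):
    """Chain each step's output state into the next step.

    Returns (final_state, predictions, step_states), where step_states
    holds the initial state plus one state per action.
    """
    step_states = [state]
    predictions = []
    current = state
    for action in actions:
        current, prediction = simulate_step(current, action)
        predictions.append(prediction)
        step_states.append(current)
    return current, predictions, step_states


def evaluate_plan(state: dict, actions: list[str], max_horizon: int) -> dict:
    """Reject over-long plans, then compute averaged cumulative risk."""
    if len(actions) > max_horizon:
        raise ValueError(f"plan length {len(actions)} exceeds max_horizon {max_horizon}")
    final_state, predictions, step_states = simulate_plan(state, actions)
    risks = [p["risk"] for p in predictions]
    return {
        "actions": list(actions),
        "horizon": len(actions),
        "cumulative_risk": sum(risks) / len(risks) if risks else 0.0,
        "step_states": step_states,
        "final_state": final_state,
    }
```

Comparing adjacent entries of step_states shows exactly which action changed which key, which is the debugging use the hook is meant for.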

Pipeline integration:

SystemDecisionPipeline.run(...) gains enable_foresight: bool = False, foresight_limit: int = Field(ge=1, le=20), max_horizon: int = Field(ge=1, le=20)
Pipeline raises ValueError when foresight_limit < 1 or max_horizon < 1 (defense in depth alongside the schema validation)
Runs after the scenarios step, before EXECUTION; the foresight step observes a fresh WorldState on its own (D1 independent)
Plans longer than max_horizon raise ValueError (T1 reject)
R1 advisory: never overrides final_decision. Continues to EXECUTION regardless. foresight_recommended is emitted with rationale every time
Learning integration: the prediction dict is enriched with future_horizon, strategy_uncertainty, and horizon_risk before ClosedLoopLearningEngine.evaluate()

New event types:

foresight_generated — payload includes template_version: 1, analysis_id, plans_count, plan_ids
foresight_evaluated — one per plan, with impact_score = predicted_success
foresight_recommended — always emitted (R1) with best_plan, expected_plan, rationale, template_version

New MCP tools:

generate_future_plans(action, project_path, limit, foresight_limit, max_horizon) — runs engine.analyze() and writes events
evaluate_plan(actions, project_path, limit, max_horizon) — runs engine.evaluate_custom() on a user-provided plan; max_horizon enforces T1 reject

Replay equivalence: EventReplayEngine routes foresight_* events into a new state["foresight"] key. ForesightProjection is the projection. The replay equivalence test asserts replay(events)["final_state"]["foresight"] == ForesightProjection().build(events) exactly.

Boundary clarity (per the user's mental model):

counterfactual (Sprint 34): one-step alternative analysis
scenario (Sprint 35): one-state multi-world analysis
foresight (Sprint 36): multi-step trajectory analysis

Future metrics (Sprint 37+, not implemented in this sprint):

horizon_cost — distinct from cumulative_cost; weighted by step position for discounting distant costs
worst_step_risk: max(p.risk for p in predictions). cumulative_risk is currently an average, which is "soft"; the worst-step view makes catastrophic steps visible
plan_depth — explicit split between horizon (model capacity) and plan_length (actual plan length)
plan_regret — best_plan success minus the chosen plan success; belongs to the evolution layer
Extensible planning templates (P2 dict) — currently only deploy is supported
payload_version migration on world, counterfactual, and scenario events (deferred from Sprint 33 onwards)

Test coverage: 16 new tests in tests/test_foresight.py covering plan generation, best/safest/fastest selection, step states debug hook, horizon metrics, projection build, event emission, replay equivalence, pipeline integration (output, learning integration, validation), max_horizon T1 reject, unknown action sentinel, and custom-plan MCP tool.

Full test suite: 267 tests, 267 passing, no regressions.

The system can now say: "I can deploy now. Running tests first increases success. The best long-term strategy is run_tests → fix_failures → deploy → monitor with predicted success 95%, risk 15%, horizon 4 steps." This is the first time AllBrain thinks in sequences, not just single actions.
