bot-relay-mcp

MCP coordination primitive for multi-agent workflows. Lets multiple AI coding agents (Claude Code, Cursor, n8n, custom clients) discover each other and exchange messages and tasks via a shared SQLite-backed message bus.

A local-first message relay for AI coding agents. Two interfaces, one shared SQLite database, zero infrastructure.

v2.1 — architecturally complete. 25 tools. Everything v2.0 delivered (smart routing, task leases, session-aware reads, lazy health monitor, busy/DND status, webhook retries, channels) + the v2.1 sweep: explicit auth_state machine with revoke/recovery flow, managed-agent rotation grace, keyring-based encryption with online rotation, unified relay CLI with recover + re-encrypt + doctor + init + test + generate-hooks + backup + restore. 14 of 14 Codex architectural findings closed. See CHANGELOG for the full arc.

What is this?

bot-relay-mcp gives AI coding agents and external systems a way to coordinate.

Two audiences, two transports:

  • AI coding agents (Claude Code, Cursor, Cline, Zed) connect via stdio MCP. Drop one entry into ~/.claude.json and the relay's tools appear inside Claude. No daemon required.
  • External systems (n8n, Slack, Telegram, custom scripts) connect via HTTP+SSE with optional Bearer auth. Trigger agent actions or receive webhook events.

Everything reads and writes the same SQLite file at ~/.bot-relay/relay.db. There is no cloud, no daemon you have to install, no service mesh.

Quick Start (30 seconds)

Once published to npm, setup is a single config entry — no cloning, no compiling, no absolute paths.

Add to your ~/.claude.json:

{
  "mcpServers": {
    "bot-relay": {
      "command": "npx",
      "args": ["-y", "bot-relay-mcp"],
      "type": "stdio"
    }
  }
}

The first invocation fetches the package and starts the server. Subsequent launches are instant.

Quick Start (from source)

git clone https://github.com/Maxlumiere/bot-relay-mcp.git
cd bot-relay-mcp
npm install
npm run build

Add to ~/.claude.json:

{
  "mcpServers": {
    "bot-relay": {
      "command": "node",
      "args": ["/absolute/path/to/bot-relay-mcp/dist/index.js"],
      "type": "stdio"
    }
  }
}

Open two Claude Code terminals and try it:

Terminal A:

> Register on the relay as "planner" with role "orchestrator"
> Discover other agents
> Send a message to "builder": "Can you handle the API layer?"

Terminal B:

> Register on the relay as "builder" with role "builder"
> Check my relay messages
> Reply to planner: "On it."

The database is created automatically at ~/.bot-relay/relay.db on first use. That is the full setup.

File permissions (v2.1). The relay creates ~/.bot-relay/ at 0700 and relay.db + backup tarballs at 0600 — owner-only. config.json is operator-managed; the relay never chmods it but logs a warning at startup if it's more permissive than 0600. POSIX only — native Windows NTFS uses ACLs, not POSIX modes, so the chmod calls are no-ops there (documented).

Tools

Identity

  • register_agent(name, role, capabilities[]): Register this terminal as a named agent. Uses upsert, so it is safe to call multiple times.
  • unregister_agent(name): Remove an agent from the relay. Idempotent. Fires the agent.unregistered webhook on success.
  • discover_agents(role?): List all registered agents with status (online/stale/offline).
  • spawn_agent(name, role, capabilities, cwd?, initial_message?): Spawn a new Claude Code terminal pre-configured as a relay agent. Cross-platform (v1.9): macOS (iTerm2/Terminal.app), Linux (gnome-terminal/konsole/xterm/tmux fallback chain; tmux covers headless servers), Windows (wt.exe/powershell.exe/cmd.exe). See docs/cross-platform-spawn.md.

Messaging

  • send_message(from, to, content, priority): Send a direct message to another agent by name.
  • get_messages(agent_name, status, limit): Check your mailbox. Pending messages are auto-marked as read.
  • broadcast(from, content, role?): Send a message to all registered agents, or filter by role.

Tasks

  • post_task(from, to, title, description, priority): Assign a task to another agent.
  • post_task_auto(from, title, description, required_capabilities[], priority) (v2.0): Auto-route to the least-loaded agent whose capabilities match ALL required. Queues if no match; assigns on the next capable registration.
  • update_task(task_id, agent_name, action, result?): Actions: accept / complete / reject / cancel (v2.0, requester-only) / heartbeat (v2.0, renews lease). State machine + CAS enforced.
  • get_tasks(agent_name, role, status, limit): Query your task queue (assigned to you or posted by you).
  • get_task(task_id): Get a single task by ID with full details.
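
The post_task_auto routing rule described above can be sketched like this. This is illustrative logic only, not the relay's source; the agent shape (capabilities, openTasks) is an assumption for the example.

```javascript
// Sketch of post_task_auto selection (assumed semantics): an agent
// qualifies only if its capabilities cover ALL required ones; ties
// break toward the least-loaded agent (fewest open tasks).
function pickAgent(agents, requiredCapabilities) {
  const capable = agents.filter((a) =>
    requiredCapabilities.every((cap) => a.capabilities.includes(cap))
  );
  if (capable.length === 0) return null; // queues until a capable agent registers
  return capable.reduce((best, a) => (a.openTasks < best.openTasks ? a : best));
}

const agents = [
  { name: "builder", capabilities: ["tasks", "testing"], openTasks: 3 },
  { name: "tester", capabilities: ["tasks", "testing", "backend"], openTasks: 1 },
];
console.log(pickAgent(agents, ["tasks", "testing"]).name); // tester (least loaded)
console.log(pickAgent(agents, ["spawn"])); // null, so the task queues
```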

Channels (v2.0)

  • create_channel(name, description?, creator): Create a named channel for multi-agent coordination. Requires the channels capability.
  • join_channel(channel_name, agent_name): Join any public channel.
  • leave_channel(channel_name, agent_name): Leave a channel.
  • post_to_channel(channel_name, from, content, priority): Post to a channel you are a member of.
  • get_channel_messages(channel_name, agent_name, limit, since?): Read messages posted to a channel since your join time.

Status + Health (v2.0)

  • set_status(agent_name, status): Signal online / busy / away / offline. busy/away exempt you from health-monitor task reassignment.
  • health_check(): Report relay version, uptime, and live counts (agents, messages, tasks, channels). No auth required.

Webhooks (v1.2+)

  • register_webhook(url, event, filter, secret): Subscribe to relay events via HTTP POST.
  • list_webhooks(): List all registered webhook subscriptions.
  • delete_webhook(webhook_id): Remove a webhook subscription.

Supported events: message.sent, message.broadcast, task.posted, task.accepted, task.completed, task.rejected, task.cancelled (v2.0), task.auto_routed (v2.0), task.health_reassigned (v2.0), channel.message_posted (v2.0), agent.unregistered, agent.spawned, webhook.delivery_failed, * (all).

v2.0 — retry with backoff. Failed webhook deliveries retry at 60s / 300s / 900s (3 attempts). CAS-claimed per row — no double delivery. Piggybacks on webhook-firing tool calls, no background thread.

When secret is provided, each delivery includes an X-Relay-Signature: sha256=... HMAC header. Filter optionally restricts firing to events where from_agent or to_agent matches.

Example: Task Delegation

Terminal A — Orchestrator:

1. register_agent("orchestrator", "planner", ["delegation", "review"])
2. discover_agents() → sees "worker" is online
3. post_task(from: "orchestrator", to: "worker",
     title: "Write auth tests",
     description: "Cover login, logout, token refresh. Use vitest.",
     priority: "high")
4. send_message(from: "orchestrator", to: "worker",
     content: "Task posted — check your queue.")

Terminal B — Worker:

1. register_agent("worker", "builder", ["testing", "backend"])
2. get_messages("worker") → message from orchestrator
3. get_tasks("worker", role: "assigned", status: "posted") → auth test task
4. update_task(task_id, "worker", "accept")
5. ... does the work ...
6. update_task(task_id, "worker", "complete", result: "12 tests passing")
7. send_message(from: "worker", to: "orchestrator",
     content: "Auth tests done. All passing.")

Terminal A checks results:

1. get_messages("orchestrator") → "Auth tests done."
2. get_task(task_id) → status: completed, result: "12 tests passing"

How It Works

Every Claude Code terminal spawns its own MCP server process via stdio. All processes read and write the same SQLite file at ~/.bot-relay/relay.db. SQLite WAL mode handles concurrent access safely. Messages older than 7 days and completed tasks older than 30 days are purged automatically on startup.

Unified relay CLI (v2.1)

One entry point with subcommands: doctor / init / test / generate-hooks / backup / restore, plus recover, re-encrypt, and pair, covered in their own sections below. First-run setup:

relay init          # interactive
relay init --yes    # defaults + random HTTP secret

relay doctor runs a diagnostic sweep; relay test runs a minimal self-check against a throwaway relay; relay generate-hooks emits Claude Code hook JSON for ~/.claude/settings.json. Full reference in docs/cli.md. The standalone bin/relay-backup + bin/relay-restore from Phase 2c have been absorbed into relay backup + relay restore.

Token lifecycle (v2.1)

Two new tools for credential hygiene: rotate_token lets an agent swap its own token with history preserved; revoke_token lets an admin-capable agent nullify another agent's token_hash (target re-bootstraps via the Phase 2b migration path). New admin capability is never auto-granted — register admin agents explicitly. Full operator runbook in docs/token-lifecycle.md.

Error codes (v2.1)

Every tool error response carries a stable error_code token alongside the free-form error string. Branch on the code; never string-match the message. Full catalog + stability guarantee in docs/error-codes.md. Source of truth: src/error-codes.ts.
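
Branching on the code looks like this in a client. AUTH_FAILED appears elsewhere in this README; the other codes here are hypothetical placeholders, so check docs/error-codes.md for the real catalog.

```javascript
// Dispatch on the stable error_code token, never on the free-form
// error string (which may change between releases).
function handleToolError(response) {
  switch (response.error_code) {
    case "AUTH_FAILED":
      return "re-authenticate"; // e.g. re-export RELAY_AGENT_TOKEN
    case "RATE_LIMITED": // hypothetical code, for illustration
      return "back off and retry";
    default:
      return "report: " + response.error; // message is display-only
  }
}

console.log(handleToolError({ error_code: "AUTH_FAILED", error: "token mismatch" }));
```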

Protocol version (v2.1)

Beyond the package version string, the relay surfaces a protocol_version via register_agent + health_check responses. Clients should key compatibility on protocol_version (bumps only on tool-surface changes) rather than the package version (bumps on every ship). See docs/protocol-version.md for SemVer rules + a client-side compatibility snippet.
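
A client-side check can be as small as comparing majors. This assumes conventional SemVer semantics (same major = compatible); defer to docs/protocol-version.md for the relay's actual rules.

```javascript
// Gate client behavior on protocol_version, not the package version.
// Assumed rule: compatible iff the major components match.
function isCompatible(clientProtocol, serverProtocol) {
  const major = (v) => parseInt(v.split(".")[0], 10);
  return major(clientProtocol) === major(serverProtocol);
}

console.log(isCompatible("2.1.0", "2.3.4")); // true, same major
console.log(isCompatible("2.1.0", "3.0.0")); // false, major bump is breaking
```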

HTTP Mode — for n8n, Slack, Telegram, custom scripts

Run the relay as an HTTP daemon and any HTTP client can drive it:

RELAY_TRANSPORT=http RELAY_HTTP_SECRET=your-shared-secret node dist/index.js
# Listens on http://127.0.0.1:3777

Production deployment — set RELAY_HTTP_SECRET. v2.1 refuses to start on a non-loopback host (0.0.0.0, a public IP, Docker -p 3777:3777 without loopback pinning) unless RELAY_HTTP_SECRET is set. Loopback binds (127.0.0.1, ::1, localhost) stay zero-config for local development.

Dev-only escape hatch: RELAY_ALLOW_OPEN_PUBLIC=1 lets the relay start anyway on a public host without a secret — useful for throwaway local Docker nets, but logs a loud warning every startup. Never use in production.
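
The startup policy above reduces to a small decision rule. This is a sketch of the described behavior, not the relay's source; the loopback test here is deliberately simplified to IPv4 literals plus localhost.

```javascript
// Assumed startup gate: loopback binds run without a secret; any other
// host requires RELAY_HTTP_SECRET unless the dev escape hatch is set.
function canStart(host, { secret, allowOpenPublic } = {}) {
  const loopback =
    ["::1", "localhost"].includes(host) || host.startsWith("127.");
  if (loopback) return true;
  if (secret) return true;
  return Boolean(allowOpenPublic); // RELAY_ALLOW_OPEN_PUBLIC=1, dev only
}

console.log(canStart("127.0.0.1", {})); // true
console.log(canStart("0.0.0.0", {})); // false, refuses to start
console.log(canStart("0.0.0.0", { secret: "s3cret" })); // true
```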

Endpoints:

  • POST /mcp — JSON-RPC (the MCP protocol over HTTP+SSE). Requires Authorization: Bearer <secret> if http_secret is configured.
  • GET /health — server status (always open, no auth)
  • GET / — built-in dashboard (live view of agents, messages, tasks, webhooks). v2.1 Phase 4d: protected by Host-header allowlist (DNS-rebinding defense) + auth gate (RELAY_DASHBOARD_SECRET or RELAY_HTTP_SECRET fallback). Loopback binds allow no-secret access for dev; non-loopback binds require a secret. Full policy in docs/dashboard-security.md.
  • GET /api/snapshot — JSON snapshot of relay state (same gates as /)

Three transport modes:

  • stdio (default) — per-terminal, for AI coding agents
  • http — daemon, for external systems
  • both — HTTP daemon plus a stdio connection (useful for bridge scripts)

All transports share the same SQLite database. Stdio agents and HTTP clients see the same world.

Process-boundary reminder (v2.1.3): stdio MCP clients and the HTTP daemon are separate processes. Each Claude Code terminal with "type":"stdio" in ~/.claude.json spawns its own node dist/index.js child. Restarting the :3777 HTTP daemon never affects stdio clients — their own child processes are untouched. Operator /mcp reconnect is only needed after restart for "type":"http" MCP clients pointed at the daemon URL. See docs/transport-architecture.md for the full topology + post-restart operator checklist.

n8n integration example

Trigger a Claude Code agent from an n8n workflow:

POST http://127.0.0.1:3777/mcp
Authorization: Bearer your-shared-secret
Content-Type: application/json

{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "post_task",
    "arguments": {
      "from": "n8n-workflow-42",
      "to": "builder",
      "title": "Process new lead",
      "description": "Lead data: ...",
      "priority": "high"
    }
  }
}

Then register a webhook so n8n hears about completion:

{
  "jsonrpc": "2.0", "id": 2, "method": "tools/call",
  "params": {
    "name": "register_webhook",
    "arguments": {
      "url": "https://your-n8n.example.com/webhook/abc",
      "event": "task.completed",
      "secret": "shared-with-n8n"
    }
  }
}

When a task completes, n8n receives a POST with the result and an HMAC signature in X-Relay-Signature.

Config File (v1.2+)

Optional ~/.bot-relay/config.json:

{
  "transport": "http",
  "http_port": 3777,
  "http_host": "127.0.0.1",
  "webhook_timeout_ms": 5000,
  "http_secret": null,
  "trusted_proxies": []
}

Env vars override file config: RELAY_TRANSPORT, RELAY_HTTP_PORT, RELAY_HTTP_HOST, RELAY_HTTP_SECRET, RELAY_TRUSTED_PROXIES (comma-separated CIDRs).

Trusted Proxies and X-Forwarded-For (v1.6.2)

By default, the relay IGNORES the X-Forwarded-For header completely. Rate limits are keyed on the direct socket peer IP only. This prevents a caller from sending a spoofed header to get their own rate-limit bucket.

If you front the relay with Cloudflare, nginx, or any other reverse proxy, configure trusted_proxies with CIDRs of those proxies:

{
  "trusted_proxies": ["127.0.0.0/8", "::1/128", "10.0.0.0/8"]
}

Or via env var:

RELAY_TRUSTED_PROXIES="127.0.0.0/8,::1/128,10.0.0.0/8"

When the direct peer IP falls in the trusted list, the relay walks the X-Forwarded-For chain right-to-left, skipping trusted hops, and uses the leftmost-untrusted hop as the real client IP. This matches RFC 7239 §7.4 and how nginx/Express normally handle this.
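
The walk can be sketched as follows. IPv4 only for brevity (a real implementation also handles IPv6); the CIDR matcher and the structure are assumptions mirroring the described behavior, not the relay's source.

```javascript
// Resolve the client IP from X-Forwarded-For: if the direct peer is not
// trusted, ignore the header; otherwise walk right-to-left, skipping
// trusted hops, and take the first untrusted hop.
function ipToInt(ip) {
  return ip.split(".").reduce((n, o) => (n << 8) + parseInt(o, 10), 0) >>> 0;
}
function inCidr(ip, cidr) {
  const [base, bits] = cidr.split("/");
  const mask = bits === "0" ? 0 : (~0 << (32 - parseInt(bits, 10))) >>> 0;
  return (ipToInt(ip) & mask) === (ipToInt(base) & mask);
}
function clientIp(peerIp, xff, trusted) {
  const isTrusted = (ip) => trusted.some((c) => inCidr(ip, c));
  if (!isTrusted(peerIp)) return peerIp; // untrusted peer: XFF is ignored
  const hops = xff.split(",").map((s) => s.trim());
  for (let i = hops.length - 1; i >= 0; i--) {
    if (!isTrusted(hops[i])) return hops[i]; // leftmost-untrusted hop
  }
  return hops[0]; // every hop trusted: fall back to the leftmost entry
}

const trusted = ["127.0.0.0/8", "10.0.0.0/8"];
console.log(clientIp("10.0.0.5", "203.0.113.7, 10.0.0.9", trusted)); // 203.0.113.7
console.log(clientIp("198.51.100.2", "203.0.113.7", trusted)); // 198.51.100.2
```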

Per-Agent Tokens (v1.7)

Every tool call (other than first-time register_agent and /health) requires an agent token. The token identifies WHO is calling — separate from and stronger than the shared HTTP secret, which only identifies a trusted network.

Issuing a token — first registration:

# The response returns `agent_token` ONCE. Save it.
{
  "jsonrpc": "2.0", "id": 1, "method": "tools/call",
  "params": {
    "name": "register_agent",
    "arguments": { "name": "builder", "role": "builder", "capabilities": ["tasks"] }
  }
}

The server stores only a bcrypt hash. The raw token is surfaced once in the response agent_token field and echoed to stderr as [auth] New agent_token issued for "builder". Save it: RELAY_AGENT_TOKEN=.... If you lose it, clear the registration with relay recover <agent-name> (see Lost-Token Recovery below) and re-register to receive a fresh token.

Presenting the token on every subsequent call — three ways:

  1. Arg field (works for stdio + HTTP):
    { "name": "send_message", "arguments": { "from": "builder", "to": "ops", "content": "hi", "agent_token": "..." } }
    
  2. HTTP header:
    X-Agent-Token: <token>
    
  3. Env var (stdio flow, also picked up by HTTP client wrappers):
    export RELAY_AGENT_TOKEN=<token>
    

Capabilities are set at first registration and are immutable (v1.7.1). To change an agent's capability set, call unregister_agent (with its token) then register_agent with the new capability list. Re-register attempts that change caps are ignored with a capabilities_note in the response.

Capability catalog:

  • spawn — required for spawn_agent
  • tasks — required for post_task, update_task
  • webhooks — required for register_webhook, list_webhooks, delete_webhook
  • broadcast — required for broadcast
  • All other tools are always allowed for any authenticated agent (no capability check).

Migration for pre-v1.7 agents (v2.1+): agents registered before v1.7 have no token hash. A register_agent call against such a row self-migrates — the relay detects the null hash, issues a fresh token, and the agent is first-class from that point on. No RELAY_ALLOW_LEGACY=1 required for the migration call itself. RELAY_ALLOW_LEGACY is still available as a coarser escape hatch for non-register tool calls against unmigrated legacy rows (e.g., if you want send_message to work before an agent has migrated); turn it OFF once all your agents have migrated.

Encryption at Rest (v1.7 opt-in; keyring + rotation in v2.1 Phase 4b.3)

Set the keyring to encrypt message/task/audit/webhook content fields in the SQLite database with AES-256-GCM. Three configuration sources (pick exactly one — multi-set is rejected at startup):

# 1. Inline JSON (for CI / secrets managers)
export RELAY_ENCRYPTION_KEYRING='{"current":"k1","keys":{"k1":"<base64-32>"}}'

# 2. File path (operator-friendly; chmod 600)
export RELAY_ENCRYPTION_KEYRING_PATH=~/.bot-relay/keyring.json

# 3. Legacy single-key (auto-wraps to { current: "k1", keys: { k1: <value> } }; deprecation warning at startup)
export RELAY_ENCRYPTION_KEY="<base64-32>"

# Generate a key:
openssl rand -base64 32
# or:
node -e 'console.log(require("crypto").randomBytes(32).toString("base64"))'

When the keyring is set, the relay transparently encrypts on write (with current key) and decrypts on read (with any key in the keyring). Every ciphertext carries an enc:<key_id>:... prefix so rows are self-describing. Legacy enc1:... rows (pre-Phase-4b.3 deployments) decrypt via RELAY_ENCRYPTION_LEGACY_KEY_ID (default k1).

Rotating keys (online)

Full runbook at docs/key-rotation.md. In summary:

  1. Add the new key to the keyring while keeping the old one (both decrypt; current still points to old).
  2. Flip current to the new key; restart. New writes use the new key.
  3. relay re-encrypt --from old_key_id --to new_key_id --yes — scans + migrates all existing rows across 5 encrypted columns. Resumable.
  4. relay re-encrypt --verify-clean old_key_id — exit 0 = safe to retire.
  5. Remove the old key from the keyring; restart.

Without the keyring set, content is stored plaintext (default, convenient for local dev).

Rotation Guide — HTTP Shared Secret (v1.7)

The RELAY_HTTP_SECRET shared secret can be rotated without downtime using a grace window:

Step 1 — promote the new secret as primary, keep the old as previous:

RELAY_HTTP_SECRET="new-secret-v2" \
RELAY_HTTP_SECRET_PREVIOUS="old-secret-v1" \
RELAY_TRANSPORT=http node dist/index.js

During this window, BOTH secrets are accepted. Requests using the old secret receive an X-Relay-Secret-Deprecated: true response header as a signal to upgrade.

Step 2 — update every client to present new-secret-v2 in their Authorization: Bearer … or X-Relay-Secret header.

Step 3 — watch for the deprecation header on your dashboard/logs until no more requests use the old secret.

Step 4 — drop the old secret:

RELAY_HTTP_SECRET="new-secret-v2" \
RELAY_TRANSPORT=http node dist/index.js    # RELAY_HTTP_SECRET_PREVIOUS unset

Multiple previous secrets are supported as a comma-separated list:

RELAY_HTTP_SECRET_PREVIOUS="v1-secret,v0-secret"

Secret comparisons are timing-safe (v1.7.1 — crypto.timingSafeEqual), so an attacker cannot recover the secret via byte-by-byte response-timing measurement.

Multi-machine: centralized deployment (v2.1)

bot-relay-mcp is LLM-agnostic, CLI-agnostic, and deployment-flexible. Pick the path that fits your setup:

Path A — Single-machine (default). Stdio transport, per-terminal process, zero infrastructure. Best for solo development on one laptop. No secrets, no reverse proxies, no ops — run npm install + add the stdio entry to your MCP client config and you're done. Covered throughout this README.

Path B — Multi-machine (centralized, v2.1 Phase 7r). One bot-relay-mcp hub on a VPS, multiple thin MCP clients connecting via HTTP. Agents on different machines can send_message, post tasks, subscribe to webhooks, and join channels through shared state. No new architecture — just the HTTP transport we've had since v1.2, packaged with a convenience CLI in v2.1.

When to pick centralized

  • Two or more machines in play (dev laptop + CI, work + personal, family devices)
  • AI agents running on different hosts that need to coordinate
  • Team environments where multiple people connect their MCP clients to a shared relay

Quick pair flow

On the hub (VPS, reachable at e.g. https://relay.example.com): install bot-relay-mcp, run under systemd with RELAY_TRANSPORT=http + RELAY_HTTP_SECRET, terminate TLS with Caddy/nginx. See docs/multi-machine-deployment.md for the worked VPS runbook.

On each client machine:

relay pair https://relay.example.com \
  --name "$(whoami)-$(hostname -s)" \
  --role operator \
  --capabilities spawn,tasks,webhooks,broadcast,channels \
  --secret "$RELAY_HTTP_SECRET"

relay pair probes the hub, registers this machine as an agent, captures the returned one-time agent_token, and emits an MCP client config snippet ready to paste into ~/.claude.json / ~/.cursor/mcp.json / etc. Persist the token (export RELAY_AGENT_TOKEN=… in your shell rc) so hooks can authenticate on every terminal open.

Verify after pairing:

relay doctor --remote https://relay.example.com

Expected: PASS on reachability + protocol compatibility + token auth + hub auth config.

Trust-model tradeoffs

  • Hub operator can read plaintext messages in RAM (even with RELAY_ENCRYPTION_KEY set, decryption happens server-side for routing)
  • Hub is a single point of failure for cross-machine coordination
  • Recommended for trusted groups (families, small teams, personal multi-machine setups)
  • NOT recommended for mutually distrustful parties sharing a single hub, or compliance-bound workloads where in-RAM access by the operator is a policy violation

See SECURITY.md §Centralized deployment trust model for the full posture + incident response playbook.

Bridge to other tools via MCP

bot-relay-mcp is MCP-compatible, so your MCP client can connect to both bot-relay-mcp AND other MCP servers (Slack, Discord, Matrix, email, etc.) simultaneously. That's an operator deployment choice — we don't integrate those into bot-relay-mcp. See docs/multi-machine-deployment.md §3 for the pattern.

Zero-Friction Setup

To skip approval prompts for relay tools, add this to your project's .claude/settings.json:

{
  "permissions": {
    "allow": [
      "mcp__bot-relay__register_agent",
      "mcp__bot-relay__discover_agents",
      "mcp__bot-relay__send_message",
      "mcp__bot-relay__get_messages",
      "mcp__bot-relay__broadcast",
      "mcp__bot-relay__post_task",
      "mcp__bot-relay__update_task",
      "mcp__bot-relay__get_tasks",
      "mcp__bot-relay__get_task",
      "mcp__bot-relay__register_webhook",
      "mcp__bot-relay__list_webhooks",
      "mcp__bot-relay__delete_webhook"
    ]
  }
}

Auto-Check on Session Start

Add a SessionStart hook so every terminal automatically checks the relay for pending messages when it opens. See docs/hooks.md for the full configuration.

Near-Real-Time Mail Delivery (v1.8)

The SessionStart hook only fires when a terminal opens. If an agent is actively working and mail arrives mid-session, it does not see the message until next startup (or a human pastes it in).

v1.8 adds a PostToolUse hook — hooks/post-tool-use-check.sh — that fires after every tool call, checks the mailbox, and injects pending messages as additionalContext so the running session picks them up immediately.

Install per-project (NOT global), in <project>/.claude/settings.json:

{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "*",
        "hooks": [
          {
            "type": "command",
            "command": "/path/to/bot-relay-mcp/hooks/post-tool-use-check.sh",
            "timeout": 5
          }
        ]
      }
    ]
  }
}

Important — if your path contains spaces, single-quote it inside the JSON string. Claude Code passes the command to the shell, which splits on whitespace. Without the single quotes the hook silently fails with errors like /bin/sh: ... is a directory. Example for a real installation at /Users/maxime/Documents/Ai stuff/Claude AI/bot-relay-mcp/:

"command": "'/Users/maxime/Documents/Ai stuff/Claude AI/bot-relay-mcp/hooks/post-tool-use-check.sh'"

The outer double-quotes are JSON; the inner single-quotes are shell. Paths with no spaces do not need this treatment.

The hook prefers the HTTP path when RELAY_AGENT_TOKEN is set and the daemon is running (full auth + audit), falling back to direct sqlite on RELAY_DB_PATH otherwise. It does NOT re-register (SessionStart handles that), does NOT check tasks (simpler focus, less context pressure), and silent-exits when there is no mail. Full docs + troubleshooting in docs/post-tool-use-hook.md.

Honest limitation: idle terminals get no delivery. The hook only fires when the agent is actively running tool calls. For long-idle windows, still rely on SessionStart + human attention.

Turn-End Mail Delivery (v2.1)

PostToolUse only fires on turns that include at least one tool call. A text-only turn (Claude responds with no tool invocation) does not trigger it. The Stop hook — hooks/stop-check.sh — closes that gap by firing on every turn-end, whether or not the turn invoked tools. Install both together in your project's .claude/settings.json:

{
  "hooks": {
    "Stop": [
      {
        "matcher": "*",
        "hooks": [
          {
            "type": "command",
            "command": "/path/to/bot-relay-mcp/hooks/stop-check.sh",
            "timeout": 5
          }
        ]
      }
    ]
  }
}

Same single-quote-the-path-if-it-contains-spaces rule, same env vars, same HTTP/sqlite fallback, same silent-fail contract as PostToolUse. Full docs + troubleshooting in docs/stop-hook.md.

Honest limitation: Stop does NOT wake truly idle terminals. If no turn is in progress, neither hook fires. For long-idle windows, use the Layer 2 Managed Agent reference (examples/managed-agent-reference/).

Backup & Restore (v2.1)

Two subcommands for disaster recovery:

relay backup                              # snapshot to ~/.bot-relay/backups/
relay backup --output /srv/backup.tgz     # custom path
relay restore ~/.bot-relay/backups/relay-backup-<iso>.tar.gz

relay backup produces a tar.gz of the live DB (via a consistent VACUUM INTO snapshot, safe while the daemon is running), the optional config.json, and a manifest.json with schema version and row counts. It works identically on the native better-sqlite3 driver and the optional sql.js wasm driver.

relay restore always safety-backs-up the current DB first (to ~/.bot-relay/backups/pre-restore-<iso>.tar.gz). If that safety backup fails, the restore aborts with the existing DB untouched. It then refuses if the daemon appears to be running (best-effort /health probe), refuses schema-version mismatches (higher = hard refuse; lower = --force overrides), runs PRAGMA integrity_check on the extracted DB, and finally atomic-swaps the new DB into place.

Full docs + troubleshooting in docs/backup-restore.md.

Lost-Token Recovery (v2.1)

Close a terminal, lose RELAY_AGENT_TOKEN, and the relay rejects your register_agent with AUTH_FAILED because the row is intact. Clear the registration so the agent can re-bootstrap:

relay recover <agent-name>                 # interactive confirm
relay recover <agent-name> --yes           # skip confirm (for scripts)
relay recover <agent-name> --dry-run       # show what would change, commit nothing
relay recover <agent-name> --db-path PATH  # non-default DB location

Messages and tasks addressed to the agent are preserved — only the agents + agent_capabilities rows are cleared. After recovery, the operator calls register_agent with the same name/role/capabilities and captures a fresh agent_token.

Trust model: filesystem access to ~/.bot-relay/relay.db IS the authority (same boundary the daemon relies on). Not an MCP tool — the caller by definition cannot authenticate. The CLI emits an audit_log entry with tool='recovery.cli' + the operator's OS username for incident traceability.

Cross-Platform Spawn (v1.9)

spawn_agent opens a new Claude Code terminal on macOS, Linux, and Windows via a driver abstraction:

  • macOS: bin/spawn-agent.sh (iTerm2 → Terminal.app). Unchanged from v1.6.4; preserves the 3-layer hardening + 19-payload adversarial test suite.
  • Linux: gnome-terminal → konsole → xterm → tmux fallback chain. The tmux fallback creates a detached session (attach later with tmux attach -t <agent-name>), which covers headless servers with no GUI.
  • Windows: wt.exe (Windows Terminal) → powershell.exe → cmd.exe.

Driver selection: RELAY_TERMINAL_APP override (allowlist-gated) > process.platform auto-detect > in-driver fallback chain.

Full install requirements per platform + manual smoke-test checklists + troubleshooting: docs/cross-platform-spawn.md.

Env-var propagation is minimal by default (principle of least authority): system essentials + anything prefixed RELAY_*. Secrets like AWS_SECRET_ACCESS_KEY are NOT passed to spawned agents unless explicitly prefixed.
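
The filter can be sketched in a few lines. The RELAY_* pass-through is per the docs; the exact list of system essentials is an assumption for this example.

```javascript
// Build the child env for a spawned agent: system essentials (assumed
// list) plus anything prefixed RELAY_*. Everything else is dropped.
const ESSENTIALS = ["PATH", "HOME", "SHELL", "TERM", "LANG", "USER"];
function spawnEnv(parentEnv) {
  const child = {};
  for (const [key, value] of Object.entries(parentEnv)) {
    if (ESSENTIALS.includes(key) || key.startsWith("RELAY_")) child[key] = value;
  }
  return child;
}

const env = spawnEnv({
  PATH: "/usr/bin",
  RELAY_AGENT_TOKEN: "tok",
  AWS_SECRET_ACCESS_KEY: "leak-me-not", // never propagated
});
console.log(Object.keys(env)); // [ 'PATH', 'RELAY_AGENT_TOKEN' ]
```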

Plug-and-play defaults (v2.1.2): spawned terminals are configured for autonomous work out of the box — they auto-pull mail from their inbox on first turn instead of idling, run with --permission-mode bypassPermissions so they don't ask the operator to approve every tool call, get an iTerm2 / session-picker title set to the agent name, and run at --effort high to cap token spend (parent terminals doing strategic work may use xhigh; spawned children doing mechanical work shouldn't inherit it). Every default has an env override for the rare case the legacy behavior is wanted:

  • Kickstart prompt sent as positional arg. Override: RELAY_SPAWN_KICKSTART="custom" or RELAY_SPAWN_NO_KICKSTART=1. The default tells the spawned agent to pull get_messages and act on its inbox.
  • --permission-mode bypassPermissions. Override: RELAY_SPAWN_PERMISSION_MODE=<mode>. Allowlist: acceptEdits, auto, bypassPermissions, default, dontAsk, plan.
  • --name <agent>. Override: RELAY_SPAWN_DISPLAY_NAME="custom". Shows up as the iTerm2 tab + Claude Code session title.
  • --effort high. Override: RELAY_SPAWN_EFFORT=<level>. Allowlist: low, medium, high, xhigh, max.

Layer 2: Managed Agents (v1.10)

Agents that are NOT Claude Code terminals — Python daemons, Node workers, Hermes/Ollama integrations, custom scripts. They connect to the relay via HTTP (recommended) or direct SQLite, use the same 25 MCP tools (v2.1), and authenticate with per-agent tokens. If registered with managed:true, they also receive token-rotation push-messages over the normal get_messages channel — see docs/managed-agent-protocol.md.

Full integration guide with mental model, auth flow, lifecycle, error patterns, and security notes: docs/managed-agent-integration.md.

Runnable reference implementations (stdlib-only, ~200 LOC each) ship in examples/managed-agent-reference/: a Python worker and a Node worker.

Both demonstrate: register, send/receive messages, accept + complete tasks, discover peers, SIGINT cleanup. Each has a SMOKE.md with a 5-step manual verification checklist.

SQLite Driver Options (v1.11)

The relay uses SQLite for persistent state. Two drivers are available:

  • native (default)better-sqlite3, a compiled C addon. Fast, supports WAL mode, multi-process safe. Requires a C++ compiler at npm install time.
  • wasmsql.js, SQLite compiled to WebAssembly. Zero native compilation. Slightly slower writes (in-memory + write-back-to-file). Single-process only (not safe for multi-terminal stdio).

Switch with one env var:

npm install sql.js                    # one-time install of the optional dep
RELAY_SQLITE_DRIVER=wasm node dist/index.js

Both drivers read the same relay.db file format. Full details, performance notes, and limitations: docs/sqlite-wasm-driver.md.

Roadmap

  • v1.1: Local relay, 9 tools, SQLite, auto-purge
  • v1.2: HTTP transport, webhook system, config file — 12 tools
  • v1.3: Presence integrity, unregister_agent, hook delivers mail — 13 tools
  • v1.4: spawn_agent + role templates + dashboard — 14 tools
  • v1.5: Built-in security — Bearer auth, audit log, rate limiting
  • v1.6: Hardening pass — SSRF, input validation, path traversal, stdout discipline
  • v1.7: Per-agent tokens, secret rotation, at-rest encryption, capability scoping
  • v1.8: Near-real-time mail via PostToolUse hook
  • v1.9: Cross-platform spawn (macOS / Linux / Windows / tmux) — Node/TS driver abstraction
  • v1.10: Layer 2 Managed Agents — reference Python + Node workers
  • v1.11: SQLite WASM driver (sql.js opt-in) — zero native compilation on Windows/Alpine/Docker/CI
  • v2.0: Plug-and-play — channels, smart routing (post_task_auto), task leases + heartbeat, lazy health monitor, session-aware reads, busy/DND, health_check, webhook retry with CAS, payload size limits, config validation, auto-unregister, dead-agent purge, debug mode. 22 tools.
  • v2.1 (current): Architectural completion — explicit auth_state machine, managed-agent rotation grace, versioned ciphertext + keyring with online rotation (relay re-encrypt), lost-token recovery CLI (relay recover), admin-initiated cross-agent rotation (rotate_token_admin), structured error_code catalog, protocol_version surface, Phase 4p webhook-secret encryption, Phase 4b.1 v2 revoke/recovery redesign. 25 tools. 14 of 14 Codex architectural findings closed.
  • v2.2: Polish — batch operations, fan-out/fan-in, scheduled messages, metrics endpoint, message ACK, private channels, token scoping, idle-terminal wake.
  • v2.5: Federation — cross-machine peering, E2E encryption.

Dashboard

When running in http or both mode, open http://127.0.0.1:3777/ in a browser. You'll see:

  • Live agent presence (online / stale / offline)
  • Active tasks with priority and assignment
  • Recent messages
  • Registered webhooks
  • Recently completed tasks

Auto-refreshes every 3 seconds. Useful for "what's happening across all my terminals right now?" at a glance.
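The same endpoint works for scripted status checks. A hedged sketch (assumes the default port from above and an unauthenticated dashboard; add a Bearer header if you have auth enabled):

```python
# Probe the relay dashboard endpoint; degrade gracefully if the relay
# is not running in http/both mode.
import urllib.request

def dashboard_status(url: str = "http://127.0.0.1:3777/", timeout: float = 2.0) -> str:
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return str(resp.status)  # e.g. "200" when the dashboard is up
    except OSError:
        # Covers connection refused, timeouts, and HTTP errors (URLError/HTTPError)
        return "relay not running in http/both mode"

print(dashboard_status())
```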

Role Templates

See roles/ for drop-in role specs. Examples:

  • planner.md — orchestrator that delegates and synthesizes
  • builder.md — worker that accepts and completes tasks
  • reviewer.md — skeptical reviewer with structured output
  • researcher.md — investigates questions, returns findings

Three ways to apply a role: paste into project CLAUDE.md, pass as initial_message when spawning, or wire via shell alias.

Requirements

  • Node.js 18+
  • Claude Code (or any MCP-compatible client)

License

MIT
