bot-relay-mcp

MCP coordination primitive for multi-agent workflows. Lets multiple AI coding agents (Claude Code, Cursor, n8n, custom clients) discover each other and exchange messages and tasks via a shared SQLite-backed message bus.

A local-first message relay for AI coding agents. Two interfaces, one shared SQLite database, zero infrastructure.

v2.1 — architecturally complete. 25 tools. Everything v2.0 delivered (smart routing, task leases, session-aware reads, lazy health monitor, busy/DND status, webhook retries, channels) + the v2.1 sweep: explicit auth_state machine with revoke/recovery flow, managed-agent rotation grace, keyring-based encryption with online rotation, unified relay CLI with recover + re-encrypt + doctor + init + test + generate-hooks + backup + restore. 14 of 14 Codex architectural findings closed. See CHANGELOG for the full arc.

What is this?

bot-relay-mcp gives AI coding agents and external systems a way to coordinate.

Two audiences, two transports:

  • AI coding agents (Claude Code, Cursor, Cline, Zed) connect via stdio MCP. Drop one entry into ~/.claude.json and the relay's tools appear inside Claude. No daemon required.
  • External systems (n8n, Slack, Telegram, custom scripts) connect via HTTP+SSE with optional Bearer auth. Trigger agent actions or receive webhook events.

Everything reads and writes the same SQLite file at ~/.bot-relay/relay.db. There is no cloud, no daemon you have to install, no service mesh.

Quick Start (30 seconds)

Once published to npm, setup is a single config entry — no cloning, no compiling, no absolute paths.

Add to your ~/.claude.json:

{
  "mcpServers": {
    "bot-relay": {
      "command": "npx",
      "args": ["-y", "bot-relay-mcp"],
      "type": "stdio"
    }
  }
}

The first invocation fetches the package and starts the server. Subsequent launches are instant.

Quick Start (from source)

git clone https://github.com/Maxlumiere/bot-relay-mcp.git
cd bot-relay-mcp
npm install
npm run build

Add to ~/.claude.json:

{
  "mcpServers": {
    "bot-relay": {
      "command": "node",
      "args": ["/absolute/path/to/bot-relay-mcp/dist/index.js"],
      "type": "stdio"
    }
  }
}

Open two Claude Code terminals and try it:

Terminal A:

> Register on the relay as "planner" with role "orchestrator"
> Discover other agents
> Send a message to "builder": "Can you handle the API layer?"

Terminal B:

> Register on the relay as "builder" with role "builder"
> Check my relay messages
> Reply to planner: "On it."

The database is created automatically at ~/.bot-relay/relay.db on first use. That is the full setup.

File permissions (v2.1). The relay creates ~/.bot-relay/ at 0700 and relay.db + backup tarballs at 0600 — owner-only. config.json is operator-managed; the relay never chmods it but logs a warning at startup if it's more permissive than 0600. POSIX only — native Windows NTFS uses ACLs, not POSIX modes, so the chmod calls are no-ops there (documented).

Tools

Identity

  • register_agent(name, role, capabilities[]): Register this terminal as a named agent. Uses upsert, so it is safe to call multiple times.
  • unregister_agent(name): Remove an agent from the relay. Idempotent. Fires the agent.unregistered webhook on success.
  • discover_agents(role?): List all registered agents with status (online/stale/offline).
  • spawn_agent(name, role, capabilities, cwd?, initial_message?): Spawn a new Claude Code terminal pre-configured as a relay agent. Cross-platform (v1.9): macOS (iTerm2/Terminal.app), Linux (gnome-terminal/konsole/xterm/tmux fallback chain; tmux covers headless servers), Windows (wt.exe/powershell.exe/cmd.exe). See docs/cross-platform-spawn.md.

Messaging

  • send_message(from, to, content, priority): Send a direct message to another agent by name.
  • get_messages(agent_name, status, limit): Check your mailbox. Pending messages are auto-marked as read.
  • broadcast(from, content, role?): Send a message to all registered agents, or filter by role.

Tasks

  • post_task(from, to, title, description, priority): Assign a task to another agent.
  • post_task_auto(from, title, description, required_capabilities[], priority) (v2.0): Auto-route to the least-loaded agent whose capabilities match ALL required. Queues if no match; assigns on the next capable registration.
  • update_task(task_id, agent_name, action, result?): Actions: accept / complete / reject / cancel (v2.0, requester-only) / heartbeat (v2.0, renews lease). State machine + CAS enforced.
  • get_tasks(agent_name, role, status, limit): Query your task queue (assigned to you or posted by you).
  • get_task(task_id): Get a single task by ID with full details.
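
The post_task_auto routing rule described above can be sketched like this. This is illustrative logic only, not the relay's source; the agent shape (capabilities, openTasks) is an assumption for the example.

```javascript
// Sketch of post_task_auto selection (assumed semantics): an agent
// qualifies only if its capabilities cover ALL required ones; ties
// break toward the least-loaded agent (fewest open tasks).
function pickAgent(agents, requiredCapabilities) {
  const capable = agents.filter((a) =>
    requiredCapabilities.every((cap) => a.capabilities.includes(cap))
  );
  if (capable.length === 0) return null; // queues until a capable agent registers
  return capable.reduce((best, a) => (a.openTasks < best.openTasks ? a : best));
}

const agents = [
  { name: "builder", capabilities: ["tasks", "testing"], openTasks: 3 },
  { name: "tester", capabilities: ["tasks", "testing", "backend"], openTasks: 1 },
];
console.log(pickAgent(agents, ["tasks", "testing"]).name); // tester (least loaded)
console.log(pickAgent(agents, ["spawn"])); // null, so the task queues
```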

Channels (v2.0)

  • create_channel(name, description?, creator): Create a named channel for multi-agent coordination. Requires the channels capability.
  • join_channel(channel_name, agent_name): Join any public channel.
  • leave_channel(channel_name, agent_name): Leave a channel.
  • post_to_channel(channel_name, from, content, priority): Post to a channel you are a member of.
  • get_channel_messages(channel_name, agent_name, limit, since?): Read messages posted to a channel since your join time.

Status + Health (v2.0)

  • set_status(agent_name, status): Signal online / busy / away / offline. busy/away exempt you from health-monitor task reassignment.
  • health_check(): Report relay version, uptime, and live counts (agents, messages, tasks, channels). No auth required.

Webhooks (v1.2+)

  • register_webhook(url, event, filter, secret): Subscribe to relay events via HTTP POST.
  • list_webhooks(): List all registered webhook subscriptions.
  • delete_webhook(webhook_id): Remove a webhook subscription.

Supported events: message.sent, message.broadcast, task.posted, task.accepted, task.completed, task.rejected, task.cancelled (v2.0), task.auto_routed (v2.0), task.health_reassigned (v2.0), channel.message_posted (v2.0), agent.unregistered, agent.spawned, webhook.delivery_failed, * (all).

v2.0 — retry with backoff. Failed webhook deliveries retry at 60s / 300s / 900s (3 attempts). CAS-claimed per row — no double delivery. Piggybacks on webhook-firing tool calls, no background thread.

When secret is provided, each delivery includes an X-Relay-Signature: sha256=... HMAC header. Filter optionally restricts firing to events where from_agent or to_agent matches.

Example: Task Delegation

Terminal A — Orchestrator:

1. register_agent("orchestrator", "planner", ["delegation", "review"])
2. discover_agents() → sees "worker" is online
3. post_task(from: "orchestrator", to: "worker",
     title: "Write auth tests",
     description: "Cover login, logout, token refresh. Use vitest.",
     priority: "high")
4. send_message(from: "orchestrator", to: "worker",
     content: "Task posted — check your queue.")

Terminal B — Worker:

1. register_agent("worker", "builder", ["testing", "backend"])
2. get_messages("worker") → message from orchestrator
3. get_tasks("worker", role: "assigned", status: "posted") → auth test task
4. update_task(task_id, "worker", "accept")
5. ... does the work ...
6. update_task(task_id, "worker", "complete", result: "12 tests passing")
7. send_message(from: "worker", to: "orchestrator",
     content: "Auth tests done. All passing.")

Terminal A checks results:

1. get_messages("orchestrator") → "Auth tests done."
2. get_task(task_id) → status: completed, result: "12 tests passing"

How It Works

Every Claude Code terminal spawns its own MCP server process via stdio. All processes read and write the same SQLite file at ~/.bot-relay/relay.db. SQLite WAL mode handles concurrent access safely. Messages older than 7 days and completed tasks older than 30 days are purged automatically on startup.

Unified relay CLI (v2.1)

One entry point with subcommands: doctor / init / test / generate-hooks / backup / restore, plus recover, re-encrypt, and pair, covered in their own sections below. First-run setup:

relay init          # interactive
relay init --yes    # defaults + random HTTP secret

relay doctor runs a diagnostic sweep; relay test runs a minimal self-check against a throwaway relay; relay generate-hooks emits Claude Code hook JSON for ~/.claude/settings.json. Full reference in docs/cli.md. The standalone bin/relay-backup + bin/relay-restore from Phase 2c have been absorbed into relay backup + relay restore.

Token lifecycle (v2.1)

Two new tools for credential hygiene: rotate_token lets an agent swap its own token with history preserved; revoke_token lets an admin-capable agent nullify another agent's token_hash (target re-bootstraps via the Phase 2b migration path). New admin capability is never auto-granted — register admin agents explicitly. Full operator runbook in docs/token-lifecycle.md.

Error codes (v2.1)

Every tool error response carries a stable error_code token alongside the free-form error string. Branch on the code; never string-match the message. Full catalog + stability guarantee in docs/error-codes.md. Source of truth: src/error-codes.ts.
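
Branching on the code looks like this in a client. AUTH_FAILED appears elsewhere in this README; the other codes here are hypothetical placeholders, so check docs/error-codes.md for the real catalog.

```javascript
// Dispatch on the stable error_code token, never on the free-form
// error string (which may change between releases).
function handleToolError(response) {
  switch (response.error_code) {
    case "AUTH_FAILED":
      return "re-authenticate"; // e.g. re-export RELAY_AGENT_TOKEN
    case "RATE_LIMITED": // hypothetical code, for illustration
      return "back off and retry";
    default:
      return "report: " + response.error; // message is display-only
  }
}

console.log(handleToolError({ error_code: "AUTH_FAILED", error: "token mismatch" }));
```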

Protocol version (v2.1)

Beyond the package version string, the relay surfaces a protocol_version via register_agent + health_check responses. Clients should key compatibility on protocol_version (bumps only on tool-surface changes) rather than the package version (bumps on every ship). See docs/protocol-version.md for SemVer rules + a client-side compatibility snippet.
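
A client-side check can be as small as comparing majors. This assumes conventional SemVer semantics (same major = compatible); defer to docs/protocol-version.md for the relay's actual rules.

```javascript
// Gate client behavior on protocol_version, not the package version.
// Assumed rule: compatible iff the major components match.
function isCompatible(clientProtocol, serverProtocol) {
  const major = (v) => parseInt(v.split(".")[0], 10);
  return major(clientProtocol) === major(serverProtocol);
}

console.log(isCompatible("2.1.0", "2.3.4")); // true, same major
console.log(isCompatible("2.1.0", "3.0.0")); // false, major bump is breaking
```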

HTTP Mode — for n8n, Slack, Telegram, custom scripts

Run the relay as an HTTP daemon and any HTTP client can drive it:

RELAY_TRANSPORT=http RELAY_HTTP_SECRET=your-shared-secret node dist/index.js
# Listens on http://127.0.0.1:3777

Production deployment — set RELAY_HTTP_SECRET. v2.1 refuses to start on a non-loopback host (0.0.0.0, a public IP, Docker -p 3777:3777 without loopback pinning) unless RELAY_HTTP_SECRET is set. Loopback binds (127.0.0.1, ::1, localhost) stay zero-config for local development.

Dev-only escape hatch: RELAY_ALLOW_OPEN_PUBLIC=1 lets the relay start anyway on a public host without a secret — useful for throwaway local Docker nets, but logs a loud warning every startup. Never use in production.
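
The startup policy above reduces to a small decision rule. This is a sketch of the described behavior, not the relay's source; the loopback test here is deliberately simplified to IPv4 literals plus localhost.

```javascript
// Assumed startup gate: loopback binds run without a secret; any other
// host requires RELAY_HTTP_SECRET unless the dev escape hatch is set.
function canStart(host, { secret, allowOpenPublic } = {}) {
  const loopback =
    ["::1", "localhost"].includes(host) || host.startsWith("127.");
  if (loopback) return true;
  if (secret) return true;
  return Boolean(allowOpenPublic); // RELAY_ALLOW_OPEN_PUBLIC=1, dev only
}

console.log(canStart("127.0.0.1", {})); // true
console.log(canStart("0.0.0.0", {})); // false, refuses to start
console.log(canStart("0.0.0.0", { secret: "s3cret" })); // true
```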

Endpoints:

  • POST /mcp — JSON-RPC (the MCP protocol over HTTP+SSE). Requires Authorization: Bearer <secret> if http_secret is configured.
  • GET /health — server status (always open, no auth)
  • GET / — built-in dashboard (live view of agents, messages, tasks, webhooks). v2.1 Phase 4d: protected by Host-header allowlist (DNS-rebinding defense) + auth gate (RELAY_DASHBOARD_SECRET or RELAY_HTTP_SECRET fallback). Loopback binds allow no-secret access for dev; non-loopback binds require a secret. Full policy in docs/dashboard-security.md.
  • GET /api/snapshot — JSON snapshot of relay state (same gates as /)

Three transport modes:

  • stdio (default) — per-terminal, for AI coding agents
  • http — daemon, for external systems
  • both — HTTP daemon plus a stdio connection (useful for bridge scripts)

All transports share the same SQLite database. Stdio agents and HTTP clients see the same world.

Process-boundary reminder (v2.1.3): stdio MCP clients and the HTTP daemon are separate processes. Each Claude Code terminal with "type":"stdio" in ~/.claude.json spawns its own node dist/index.js child. Restarting the :3777 HTTP daemon never affects stdio clients — their own child processes are untouched. Operator /mcp reconnect is only needed after restart for "type":"http" MCP clients pointed at the daemon URL. See docs/transport-architecture.md for the full topology + post-restart operator checklist.

n8n integration example

Trigger a Claude Code agent from an n8n workflow:

POST http://127.0.0.1:3777/mcp
Authorization: Bearer your-shared-secret
Content-Type: application/json

{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "post_task",
    "arguments": {
      "from": "n8n-workflow-42",
      "to": "builder",
      "title": "Process new lead",
      "description": "Lead data: ...",
      "priority": "high"
    }
  }
}

Then register a webhook so n8n hears about completion:

{
  "jsonrpc": "2.0", "id": 2, "method": "tools/call",
  "params": {
    "name": "register_webhook",
    "arguments": {
      "url": "https://your-n8n.example.com/webhook/abc",
      "event": "task.completed",
      "secret": "shared-with-n8n"
    }
  }
}

When a task completes, n8n receives a POST with the result and an HMAC signature in X-Relay-Signature.

Config File (v1.2+)

Optional ~/.bot-relay/config.json:

{
  "transport": "http",
  "http_port": 3777,
  "http_host": "127.0.0.1",
  "webhook_timeout_ms": 5000,
  "http_secret": null,
  "trusted_proxies": []
}

Env vars override file config: RELAY_TRANSPORT, RELAY_HTTP_PORT, RELAY_HTTP_HOST, RELAY_HTTP_SECRET, RELAY_TRUSTED_PROXIES (comma-separated CIDRs).

Trusted Proxies and X-Forwarded-For (v1.6.2)

By default, the relay IGNORES the X-Forwarded-For header completely. Rate limits are keyed on the direct socket peer IP only. This prevents a caller from sending a spoofed header to get their own rate-limit bucket.

If you front the relay with Cloudflare, nginx, or any other reverse proxy, configure trusted_proxies with CIDRs of those proxies:

{
  "trusted_proxies": ["127.0.0.0/8", "::1/128", "10.0.0.0/8"]
}

Or via env var:

RELAY_TRUSTED_PROXIES="127.0.0.0/8,::1/128,10.0.0.0/8"

When the direct peer IP falls in the trusted list, the relay walks the X-Forwarded-For chain right-to-left, skipping trusted hops, and uses the leftmost-untrusted hop as the real client IP. This matches RFC 7239 §7.4 and how nginx/Express normally handle this.
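
The walk can be sketched as follows. IPv4 only for brevity (a real implementation also handles IPv6); the CIDR matcher and the structure are assumptions mirroring the described behavior, not the relay's source.

```javascript
// Resolve the client IP from X-Forwarded-For: if the direct peer is not
// trusted, ignore the header; otherwise walk right-to-left, skipping
// trusted hops, and take the first untrusted hop.
function ipToInt(ip) {
  return ip.split(".").reduce((n, o) => (n << 8) + parseInt(o, 10), 0) >>> 0;
}
function inCidr(ip, cidr) {
  const [base, bits] = cidr.split("/");
  const mask = bits === "0" ? 0 : (~0 << (32 - parseInt(bits, 10))) >>> 0;
  return (ipToInt(ip) & mask) === (ipToInt(base) & mask);
}
function clientIp(peerIp, xff, trusted) {
  const isTrusted = (ip) => trusted.some((c) => inCidr(ip, c));
  if (!isTrusted(peerIp)) return peerIp; // untrusted peer: XFF is ignored
  const hops = xff.split(",").map((s) => s.trim());
  for (let i = hops.length - 1; i >= 0; i--) {
    if (!isTrusted(hops[i])) return hops[i]; // leftmost-untrusted hop
  }
  return hops[0]; // every hop trusted: fall back to the leftmost entry
}

const trusted = ["127.0.0.0/8", "10.0.0.0/8"];
console.log(clientIp("10.0.0.5", "203.0.113.7, 10.0.0.9", trusted)); // 203.0.113.7
console.log(clientIp("198.51.100.2", "203.0.113.7", trusted)); // 198.51.100.2
```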

Per-Agent Tokens (v1.7)

Every tool call (other than first-time register_agent and /health) requires an agent token. The token identifies WHO is calling — separate from and stronger than the shared HTTP secret, which only identifies a trusted network.

Issuing a token — first registration:

# The response returns `agent_token` ONCE. Save it.
{
  "jsonrpc": "2.0", "id": 1, "method": "tools/call",
  "params": {
    "name": "register_agent",
    "arguments": { "name": "builder", "role": "builder", "capabilities": ["tasks"] }
  }
}

The server stores only a bcrypt hash. The raw token is surfaced once in the response agent_token field and echoed to stderr as [auth] New agent_token issued for "builder". Save it: RELAY_AGENT_TOKEN=.... If you lose it, clear the registration with relay recover <agent-name> (see Lost-Token Recovery below) and re-register to receive a fresh token.

Presenting the token on every subsequent call — three ways:

  1. Arg field (works for stdio + HTTP):
    { "name": "send_message", "arguments": { "from": "builder", "to": "ops", "content": "hi", "agent_token": "..." } }
    
  2. HTTP header:
    X-Agent-Token: <token>
    
  3. Env var (stdio flow, also picked up by HTTP client wrappers):
    export RELAY_AGENT_TOKEN=<token>
    

Capabilities are set at first registration and are immutable (v1.7.1). To change an agent's capability set, call unregister_agent (with its token) then register_agent with the new capability list. Re-register attempts that change caps are ignored with a capabilities_note in the response.

Capability catalog:

  • spawn — required for spawn_agent
  • tasks — required for post_task, update_task
  • webhooks — required for register_webhook, list_webhooks, delete_webhook
  • broadcast — required for broadcast
  • All other tools are always allowed for any authenticated agent (no capability check).

Migration for pre-v1.7 agents (v2.1+): agents registered before v1.7 have no token hash. A register_agent call against such a row self-migrates — the relay detects the null hash, issues a fresh token, and the agent is first-class from that point on. No RELAY_ALLOW_LEGACY=1 required for the migration call itself. RELAY_ALLOW_LEGACY is still available as a coarser escape hatch for non-register tool calls against unmigrated legacy rows (e.g., if you want send_message to work before an agent has migrated); turn it OFF once all your agents have migrated.

Encryption at Rest (v1.7 opt-in; keyring + rotation in v2.1 Phase 4b.3)

Set the keyring to encrypt message/task/audit/webhook content fields in the SQLite database with AES-256-GCM. Three configuration sources (pick exactly one — multi-set is rejected at startup):

# 1. Inline JSON (for CI / secrets managers)
export RELAY_ENCRYPTION_KEYRING='{"current":"k1","keys":{"k1":"<base64-32>"}}'

# 2. File path (operator-friendly; chmod 600)
export RELAY_ENCRYPTION_KEYRING_PATH=~/.bot-relay/keyring.json

# 3. Legacy single-key (auto-wraps to { current: "k1", keys: { k1: <value> } }; deprecation warning at startup)
export RELAY_ENCRYPTION_KEY="<base64-32>"

# Generate a key:
openssl rand -base64 32
# or:
node -e 'console.log(require("crypto").randomBytes(32).toString("base64"))'

When the keyring is set, the relay transparently encrypts on write (with current key) and decrypts on read (with any key in the keyring). Every ciphertext carries an enc:<key_id>:... prefix so rows are self-describing. Legacy enc1:... rows (pre-Phase-4b.3 deployments) decrypt via RELAY_ENCRYPTION_LEGACY_KEY_ID (default k1).

Rotating keys (online)

Full runbook at docs/key-rotation.md. In summary:

  1. Add the new key to the keyring while keeping the old one (both decrypt; current still points to old).
  2. Flip current to the new key; restart. New writes use the new key.
  3. relay re-encrypt --from old_key_id --to new_key_id --yes — scans + migrates all existing rows across 5 encrypted columns. Resumable.
  4. relay re-encrypt --verify-clean old_key_id — exit 0 = safe to retire.
  5. Remove the old key from the keyring; restart.

Without the keyring set, content is stored plaintext (default, convenient for local dev).

Rotation Guide — HTTP Shared Secret (v1.7)

The RELAY_HTTP_SECRET shared secret can be rotated without downtime using a grace window:

Step 1 — promote the new secret as primary, keep the old as previous:

RELAY_HTTP_SECRET="new-secret-v2" \
RELAY_HTTP_SECRET_PREVIOUS="old-secret-v1" \
RELAY_TRANSPORT=http node dist/index.js

During this window, BOTH secrets are accepted. Requests using the old secret receive an X-Relay-Secret-Deprecated: true response header as a signal to upgrade.

Step 2 — update every client to present new-secret-v2 in their Authorization: Bearer … or X-Relay-Secret header.

Step 3 — watch for the deprecation header on your dashboard/logs until no more requests use the old secret.

Step 4 — drop the old secret:

RELAY_HTTP_SECRET="new-secret-v2" \
RELAY_TRANSPORT=http node dist/index.js    # RELAY_HTTP_SECRET_PREVIOUS unset

Multiple previous secrets are supported as a comma-separated list:

RELAY_HTTP_SECRET_PREVIOUS="v1-secret,v0-secret"

Secret comparisons are timing-safe (v1.7.1 — crypto.timingSafeEqual), so an attacker cannot recover the secret via byte-by-byte response-timing measurement.

Multi-machine: centralized deployment (v2.1)

bot-relay-mcp is LLM-agnostic, CLI-agnostic, and deployment-flexible. Pick the path that fits your setup:

Path A — Single-machine (default). Stdio transport, per-terminal process, zero infrastructure. Best for solo development on one laptop. No secrets, no reverse proxies, no ops — run npm install + add the stdio entry to your MCP client config and you're done. Covered throughout this README.

Path B — Multi-machine (centralized, v2.1 Phase 7r). One bot-relay-mcp hub on a VPS, multiple thin MCP clients connecting via HTTP. Agents on different machines can send_message, post tasks, subscribe to webhooks, and join channels through shared state. No new architecture — just the HTTP transport we've had since v1.2, packaged with a convenience CLI in v2.1.

When to pick centralized

  • Two or more machines in play (dev laptop + CI, work + personal, family devices)
  • AI agents running on different hosts that need to coordinate
  • Team environments where multiple people connect their MCP clients to a shared relay

Quick pair flow

On the hub (VPS, reachable at e.g. https://relay.example.com): install bot-relay-mcp, run under systemd with RELAY_TRANSPORT=http + RELAY_HTTP_SECRET, terminate TLS with Caddy/nginx. See docs/multi-machine-deployment.md for the worked VPS runbook.

On each client machine:

relay pair https://relay.example.com \
  --name "$(whoami)-$(hostname -s)" \
  --role operator \
  --capabilities spawn,tasks,webhooks,broadcast,channels \
  --secret "$RELAY_HTTP_SECRET"

relay pair probes the hub, registers this machine as an agent, captures the returned one-time agent_token, and emits an MCP client config snippet ready to paste into ~/.claude.json / ~/.cursor/mcp.json / etc. Persist the token (export RELAY_AGENT_TOKEN=… in your shell rc) so hooks can authenticate on every terminal open.

Verify after pairing:

relay doctor --remote https://relay.example.com

Expected: PASS on reachability + protocol compatibility + token auth + hub auth config.

Trust-model tradeoffs

  • Hub operator can read plaintext messages in RAM (even with RELAY_ENCRYPTION_KEY set, decryption happens server-side for routing)
  • Hub is a single point of failure for cross-machine coordination
  • Recommended for trusted groups (families, small teams, personal multi-machine setups)
  • NOT recommended for mutually distrustful parties sharing a single hub, or compliance-bound workloads where in-RAM access by the operator is a policy violation

See SECURITY.md §Centralized deployment trust model for the full posture + incident response playbook.

Bridge to other tools via MCP

bot-relay-mcp is MCP-compatible, so your MCP client can connect to both bot-relay-mcp AND other MCP servers (Slack, Discord, Matrix, email, etc.) simultaneously. That's an operator deployment choice — we don't integrate those into bot-relay-mcp. See docs/multi-machine-deployment.md §3 for the pattern.

Zero-Friction Setup

To skip approval prompts for relay tools, add this to your project's .claude/settings.json:

{
  "permissions": {
    "allow": [
      "mcp__bot-relay__register_agent",
      "mcp__bot-relay__discover_agents",
      "mcp__bot-relay__send_message",
      "mcp__bot-relay__get_messages",
      "mcp__bot-relay__broadcast",
      "mcp__bot-relay__post_task",
      "mcp__bot-relay__update_task",
      "mcp__bot-relay__get_tasks",
      "mcp__bot-relay__get_task",
      "mcp__bot-relay__register_webhook",
      "mcp__bot-relay__list_webhooks",
      "mcp__bot-relay__delete_webhook"
    ]
  }
}

Auto-Check on Session Start

Add a SessionStart hook so every terminal automatically checks the relay for pending messages when it opens. See docs/hooks.md for the full configuration.

Near-Real-Time Mail Delivery (v1.8)

The SessionStart hook only fires when a terminal opens. If an agent is actively working and mail arrives mid-session, it does not see the message until next startup (or a human pastes it in).

v1.8 adds a PostToolUse hook — hooks/post-tool-use-check.sh — that fires after every tool call, checks the mailbox, and injects pending messages as additionalContext so the running session picks them up immediately.

Install per-project (NOT global), in <project>/.claude/settings.json:

{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "*",
        "hooks": [
          {
            "type": "command",
            "command": "/path/to/bot-relay-mcp/hooks/post-tool-use-check.sh",
            "timeout": 5
          }
        ]
      }
    ]
  }
}

Important — if your path contains spaces, single-quote it inside the JSON string. Claude Code passes the command to the shell, which splits on whitespace. Without the single quotes the hook silently fails with errors like /bin/sh: ... is a directory. Example for a real installation at /Users/maxime/Documents/Ai stuff/Claude AI/bot-relay-mcp/:

"command": "'/Users/maxime/Documents/Ai stuff/Claude AI/bot-relay-mcp/hooks/post-tool-use-check.sh'"

The outer double-quotes are JSON; the inner single-quotes are shell. Paths with no spaces do not need this treatment.

The hook prefers the HTTP path when RELAY_AGENT_TOKEN is set and the daemon is running (full auth + audit), falling back to direct sqlite on RELAY_DB_PATH otherwise. It does NOT re-register (SessionStart handles that), does NOT check tasks (simpler focus, less context pressure), and silent-exits when there is no mail. Full docs + troubleshooting in docs/post-tool-use-hook.md.

Honest limitation: idle terminals get no delivery. The hook only fires when the agent is actively running tool calls. For long-idle windows, still rely on SessionStart + human attention.

Turn-End Mail Delivery (v2.1)

PostToolUse only fires on turns that include at least one tool call. A text-only turn (Claude responds with no tool invocation) does not trigger it. The Stop hook — hooks/stop-check.sh — closes that gap by firing on every turn-end, whether or not the turn invoked tools. Install both together in your project's .claude/settings.json:

{
  "hooks": {
    "Stop": [
      {
        "matcher": "*",
        "hooks": [
          {
            "type": "command",
            "command": "/path/to/bot-relay-mcp/hooks/stop-check.sh",
            "timeout": 5
          }
        ]
      }
    ]
  }
}

Same single-quote-the-path-if-it-contains-spaces rule, same env vars, same HTTP/sqlite fallback, same silent-fail contract as PostToolUse. Full docs + troubleshooting in docs/stop-hook.md.

Honest limitation: Stop does NOT wake truly idle terminals. If no turn is in progress, neither hook fires. For long-idle windows, use the Layer 2 Managed Agent reference (examples/managed-agent-reference/).

Backup & Restore (v2.1)

Two subcommands for disaster recovery:

relay backup                              # snapshot to ~/.bot-relay/backups/
relay backup --output /srv/backup.tgz     # custom path
relay restore ~/.bot-relay/backups/relay-backup-<iso>.tar.gz

relay backup produces a tar.gz of the live DB (via a consistent VACUUM INTO snapshot, safe while the daemon is running), the optional config.json, and a manifest.json with schema version and row counts. It works identically on the native better-sqlite3 driver and the optional sql.js wasm driver.

relay restore always safety-backs-up the current DB first (to ~/.bot-relay/backups/pre-restore-<iso>.tar.gz). If that safety backup fails, the restore aborts with the existing DB untouched. It then refuses if the daemon appears to be running (best-effort /health probe), refuses schema-version mismatches (higher = hard refuse; lower = --force overrides), runs PRAGMA integrity_check on the extracted DB, and finally atomic-swaps the new DB into place.

Full docs + troubleshooting in docs/backup-restore.md.

Lost-Token Recovery (v2.1)

Close a terminal, lose RELAY_AGENT_TOKEN, and the relay rejects your register_agent with AUTH_FAILED because the row is intact. Clear the registration so the agent can re-bootstrap:

relay recover <agent-name>                 # interactive confirm
relay recover <agent-name> --yes           # skip confirm (for scripts)
relay recover <agent-name> --dry-run       # show what would change, commit nothing
relay recover <agent-name> --db-path PATH  # non-default DB location

Messages and tasks addressed to the agent are preserved — only the agents + agent_capabilities rows are cleared. After recovery, the operator calls register_agent with the same name/role/capabilities and captures a fresh agent_token.

Trust model: filesystem access to ~/.bot-relay/relay.db IS the authority (same boundary the daemon relies on). Not an MCP tool — the caller by definition cannot authenticate. The CLI emits an audit_log entry with tool='recovery.cli' + the operator's OS username for incident traceability.

Cross-Platform Spawn (v1.9)

spawn_agent opens a new Claude Code terminal on macOS, Linux, and Windows via a driver abstraction:

  • macOS: bin/spawn-agent.sh (iTerm2 → Terminal.app). Unchanged from v1.6.4; preserves the 3-layer hardening + 19-payload adversarial test suite.
  • Linux: gnome-terminal → konsole → xterm → tmux fallback chain. The tmux fallback creates a detached session (attach later with tmux attach -t <agent-name>), which covers headless servers with no GUI.
  • Windows: wt.exe (Windows Terminal) → powershell.exe → cmd.exe.

Driver selection: RELAY_TERMINAL_APP override (allowlist-gated) > process.platform auto-detect > in-driver fallback chain.

Full install requirements per platform + manual smoke-test checklists + troubleshooting: docs/cross-platform-spawn.md.

Env-var propagation is minimal by default (principle of least authority): system essentials + anything prefixed RELAY_*. Secrets like AWS_SECRET_ACCESS_KEY are NOT passed to spawned agents unless explicitly prefixed.
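
The filter can be sketched in a few lines. The RELAY_* pass-through is per the docs; the exact list of system essentials is an assumption for this example.

```javascript
// Build the child env for a spawned agent: system essentials (assumed
// list) plus anything prefixed RELAY_*. Everything else is dropped.
const ESSENTIALS = ["PATH", "HOME", "SHELL", "TERM", "LANG", "USER"];
function spawnEnv(parentEnv) {
  const child = {};
  for (const [key, value] of Object.entries(parentEnv)) {
    if (ESSENTIALS.includes(key) || key.startsWith("RELAY_")) child[key] = value;
  }
  return child;
}

const env = spawnEnv({
  PATH: "/usr/bin",
  RELAY_AGENT_TOKEN: "tok",
  AWS_SECRET_ACCESS_KEY: "leak-me-not", // never propagated
});
console.log(Object.keys(env)); // [ 'PATH', 'RELAY_AGENT_TOKEN' ]
```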

Plug-and-play defaults (v2.1.2): spawned terminals are configured for autonomous work out of the box — they auto-pull mail from their inbox on first turn instead of idling, run with --permission-mode bypassPermissions so they don't ask the operator to approve every tool call, get an iTerm2 / session-picker title set to the agent name, and run at --effort high to cap token spend (parent terminals doing strategic work may use xhigh; spawned children doing mechanical work shouldn't inherit it). Every default has an env override for the rare case the legacy behavior is wanted:

  • Kickstart prompt sent as positional arg. Override: RELAY_SPAWN_KICKSTART="custom" or RELAY_SPAWN_NO_KICKSTART=1. The default tells the spawned agent to pull get_messages and act on its inbox.
  • --permission-mode bypassPermissions. Override: RELAY_SPAWN_PERMISSION_MODE=<mode>. Allowlist: acceptEdits, auto, bypassPermissions, default, dontAsk, plan.
  • --name <agent>. Override: RELAY_SPAWN_DISPLAY_NAME="custom". Shows up as the iTerm2 tab + Claude Code session title.
  • --effort high. Override: RELAY_SPAWN_EFFORT=<level>. Allowlist: low, medium, high, xhigh, max.

Layer 2: Managed Agents (v1.10)

Agents that are NOT Claude Code terminals — Python daemons, Node workers, Hermes/Ollama integrations, custom scripts. They connect to the relay via HTTP (recommended) or direct SQLite, use the same 25 MCP tools (v2.1), and authenticate with per-agent tokens. If registered with managed:true, they also receive token-rotation push-messages over the normal get_messages channel — see docs/managed-agent-protocol.md.

Full integration guide with mental model, auth flow, lifecycle, error patterns, and security notes: docs/managed-agent-integration.md.

Runnable reference implementations (stdlib-only, ~200 LOC each) ship in examples/managed-agent-reference/: a Python worker and a Node worker.

Both demonstrate: register, send/receive messages, accept + complete tasks, discover peers, SIGINT cleanup. Each has a SMOKE.md with a 5-step manual verification checklist.

SQLite Driver Options (v1.11)

The relay uses SQLite for persistent state. Two drivers are available:

  • native (default)better-sqlite3, a compiled C addon. Fast, supports WAL mode, multi-process safe. Requires a C++ compiler at npm install time.
  • wasmsql.js, SQLite compiled to WebAssembly. Zero native compilation. Slightly slower writes (in-memory + write-back-to-file). Single-process only (not safe for multi-terminal stdio).

Switch with one env var:

npm install sql.js                    # one-time install of the optional dep
RELAY_SQLITE_DRIVER=wasm node dist/index.js

Both drivers read the same relay.db file format. Full details, performance notes, and limitations: docs/sqlite-wasm-driver.md.

Roadmap

  • v1.1: Local relay, 9 tools, SQLite, auto-purge
  • v1.2: HTTP transport, webhook system, config file — 12 tools
  • v1.3: Presence integrity, unregister_agent, hook delivers mail — 13 tools
  • v1.4: spawn_agent + role templates + dashboard — 14 tools
  • v1.5: Built-in security — Bearer auth, audit log, rate limiting
  • v1.6: Hardening pass — SSRF, input validation, path traversal, stdout discipline
  • v1.7: Per-agent tokens, secret rotation, at-rest encryption, capability scoping
  • v1.8: Near-real-time mail via PostToolUse hook
  • v1.9: Cross-platform spawn (macOS / Linux / Windows / tmux) — Node/TS driver abstraction
  • v1.10: Layer 2 Managed Agents — reference Python + Node workers
  • v1.11: SQLite WASM driver (sql.js opt-in) — zero native compilation on Windows/Alpine/Docker/CI
  • v2.0: Plug-and-play — channels, smart routing (post_task_auto), task leases + heartbeat, lazy health monitor, session-aware reads, busy/DND, health_check, webhook retry with CAS, payload size limits, config validation, auto-unregister, dead-agent purge, debug mode. 22 tools.
  • v2.1 (current): Architectural completion — explicit auth_state machine, managed-agent rotation grace, versioned ciphertext + keyring with online rotation (relay re-encrypt), lost-token recovery CLI (relay recover), admin-initiated cross-agent rotation (rotate_token_admin), structured error_code catalog, protocol_version surface, Phase 4p webhook-secret encryption, Phase 4b.1 v2 revoke/recovery redesign. 25 tools. 14 of 14 Codex architectural findings closed.
  • v2.2: Polish — batch operations, fan-out/fan-in, scheduled messages, metrics endpoint, message ACK, private channels, token scoping, idle-terminal wake.
  • v2.5: Federation — cross-machine peering, E2E encryption.

Dashboard

When running in http or both mode, open http://127.0.0.1:3777/ in a browser. You'll see:

  • Live agent presence (online / stale / offline)
  • Active tasks with priority and assignment
  • Recent messages
  • Registered webhooks
  • Recently completed tasks

Auto-refreshes every 3 seconds. Useful for "what's happening across all my terminals right now?" at a glance.
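The same endpoint works for scripted status checks. A hedged sketch (assumes the default port from above and an unauthenticated dashboard; add a Bearer header if you have auth enabled):

```python
# Probe the relay dashboard endpoint; degrade gracefully if the relay
# is not running in http/both mode.
import urllib.request

def dashboard_status(url: str = "http://127.0.0.1:3777/", timeout: float = 2.0) -> str:
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return str(resp.status)  # e.g. "200" when the dashboard is up
    except OSError:
        # Covers connection refused, timeouts, and HTTP errors (URLError/HTTPError)
        return "relay not running in http/both mode"

print(dashboard_status())
```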

Role Templates

See roles/ for drop-in role specs. Examples:

  • planner.md — orchestrator that delegates and synthesizes
  • builder.md — worker that accepts and completes tasks
  • reviewer.md — skeptical reviewer with structured output
  • researcher.md — investigates questions, returns findings

Three ways to apply a role: paste into project CLAUDE.md, pass as initial_message when spawning, or wire via shell alias.

Requirements

  • Node.js 18+
  • Claude Code (or any MCP-compatible client)

License

MIT
