Triage MCP Server

Enables AI agents to autonomously check, diagnose, and recover Dockerized services through safe, tool-based ops without direct host shell access.

🩺 Triage: a self-healing ops MCP for any Dockerized service

Let an AI agent (or a human) check, diagnose, and recover a service without a host shell.

Most "give the agent ops powers" setups are bad: you either hand the model a raw shell (now it can roam the whole box and conflate unrelated subsystems), or you wire up dashboards a model can't read. Triage is the third option:

A small MCP server that exposes a handful of health/diagnose/recover tools. Each returns raw evidence AND a plain-English translation, a suggested action, and whether the fix is safe to auto-apply. The agent acts through tools; it never touches the host directly.

The policy that makes it safe

| Class | Tools | Behaviour |
|---|---|---|
| Auto-fix safe | triage_restart_process, triage_recover | An agent may run these on its own and report after. Infra only; no data touched. |
| Ask before risky | triage_apply(confirm=true) | Anything that could lose data or change external state. Dry-run unless confirm=true. |
| Can't self-fix | (reported) | Diagnosed and handed to the human with exact steps; never faked. |

The dual raw + layman output is the differentiator: the agent gets structured data to act on, and the human gets a sentence they can actually understand ("Postiz's API engine isn't running. It's the known cold-boot hiccup. I'll restart it.").
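To make the dual-report idea concrete, here is a minimal sketch of what one diagnosed issue could look like. The fields `raw`, `action`, and `can_auto_fix` follow the wording of this README; the `layman` field name, the `Issue` class, and the `describe_dead_process` helper are assumptions made for illustration and may not match what triage.py actually returns.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Issue:
    """One diagnosed problem, reported twice: once for the agent, once for the human."""
    raw: dict           # structured evidence the agent can branch on
    layman: str         # one-sentence explanation for the human (field name is an assumption)
    action: str         # suggested next step
    can_auto_fix: bool  # True only when the fix touches infra and no data


def describe_dead_process(label: str, proc: str, status: str) -> Issue:
    """Hypothetical helper: build the report for a stopped in-container process."""
    return Issue(
        raw={"process": proc, "status": status},
        layman=f"{label}'s {proc} process isn't running. I'll restart it.",
        action=f"triage_restart_process({proc!r})",
        can_auto_fix=True,
    )


issue = describe_dead_process("My App", "backend", "stopped")
print(json.dumps(asdict(issue), indent=2))
```

Keeping both views in one record means the agent never has to parse the human sentence, and the human never has to read the raw payload.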

Tools

| Tool | Kind | What it does |
|---|---|---|
| triage_health() | read | Containers + configured in-container processes + optional dependency ping. |
| triage_diagnose() | read | Health check matched to a runbook → issues with raw + plain-English + action + can_auto_fix. |
| triage_logs(lines) | read | Raw service log tail. |
| triage_restart_process(name) | safe | Restart one in-container process (pm2). |
| triage_recover() | safe | Recreate the service container from compose. No volumes/data touched. |
| triage_apply(confirm) | risky | Dry-run by default; runs the configured risky command only on confirm=true. |
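The dry-run-by-default behaviour of triage_apply can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's implementation: the function name `apply_risky`, its return shape, and the injectable `runner` parameter are all hypothetical.

```python
import shlex
import subprocess


def apply_risky(command: str, description: str, confirm: bool = False,
                runner=subprocess.run) -> dict:
    """Run the configured risky command only when confirm is True.

    Hypothetical sketch: `runner` is injectable so the gate can be tested
    without executing anything on the host.
    """
    if not command:
        return {"ran": False, "dry_run": True,
                "message": "No risky command is configured."}
    if not confirm:
        # Default path: describe what would happen, touch nothing.
        return {"ran": False, "dry_run": True,
                "message": f"Dry run: would {description}. "
                           "Call again with confirm=true to run it."}
    result = runner(shlex.split(command), capture_output=True, text=True)
    return {"ran": True, "dry_run": False, "returncode": result.returncode,
            "stdout": result.stdout, "stderr": result.stderr}


print(apply_risky("echo cleared", "clear the stuck job queue")["message"])
```

Because the gate lives in the function body, no prompt wording can cause the command to run without an explicit confirm=true.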

Configure (zero code changes)

Everything is env-driven. Point it at any compose-managed service:

TRIAGE_COMPOSE=/path/to/docker-compose.yaml   # compose file
TRIAGE_SERVICE=app                            # the main container/service name
TRIAGE_LABEL="My App"                         # friendly name used in messages
TRIAGE_PROCS=backend,worker                   # optional: in-container processes to watch
TRIAGE_PROC_MGR=pm2                           # "pm2" | "none"
TRIAGE_DB_PING="docker exec app-db pg_isready"  # optional: rc 0 = dependency healthy
TRIAGE_RISKY_CMD=""                           # optional: a guarded recovery (clear a queue, etc.)
TRIAGE_RISKY_DESC="clear the stuck job queue"
TRIAGE_PORT=9500

See .env.example.
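As a rough sketch of how these variables might be read at startup, the loader below turns them into one typed object. The variable names come from the list above; everything else is an assumption for illustration: the `TriageConfig` class, the `load_config` function, treating TRIAGE_COMPOSE and TRIAGE_SERVICE as required, and the fallback defaults (including 9500 for the port).

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional, Tuple


@dataclass(frozen=True)
class TriageConfig:
    compose: str
    service: str
    label: str
    procs: Tuple[str, ...]
    proc_mgr: str
    db_ping: str
    risky_cmd: str
    risky_desc: str
    port: int


def load_config(env: Optional[Mapping[str, str]] = None) -> TriageConfig:
    """Build a config from environment variables (defaults are assumptions)."""
    env = os.environ if env is None else env
    missing = [k for k in ("TRIAGE_COMPOSE", "TRIAGE_SERVICE") if not env.get(k)]
    if missing:
        raise ValueError("missing required settings: " + ", ".join(missing))
    proc_mgr = env.get("TRIAGE_PROC_MGR", "none")
    if proc_mgr not in ("pm2", "none"):
        raise ValueError(f"TRIAGE_PROC_MGR must be 'pm2' or 'none', got {proc_mgr!r}")
    # "backend, worker" -> ("backend", "worker"); empty entries are dropped.
    procs = tuple(p.strip() for p in env.get("TRIAGE_PROCS", "").split(",") if p.strip())
    return TriageConfig(
        compose=env["TRIAGE_COMPOSE"],
        service=env["TRIAGE_SERVICE"],
        label=env.get("TRIAGE_LABEL") or env["TRIAGE_SERVICE"],
        procs=procs,
        proc_mgr=proc_mgr,
        db_ping=env.get("TRIAGE_DB_PING", ""),
        risky_cmd=env.get("TRIAGE_RISKY_CMD", ""),
        risky_desc=env.get("TRIAGE_RISKY_DESC", ""),
        port=int(env.get("TRIAGE_PORT", "9500")),
    )


cfg = load_config({
    "TRIAGE_COMPOSE": "/path/to/docker-compose.yaml",
    "TRIAGE_SERVICE": "app",
    "TRIAGE_PROCS": "backend, worker",
    "TRIAGE_PROC_MGR": "pm2",
})
print(cfg.procs, cfg.port)
```

Validating once at startup keeps a typo in TRIAGE_PROC_MGR from surfacing later as a confusing health-check failure.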

Run

pip install -r requirements.txt
python3 triage.py            # serves the MCP server over streamable-http on TRIAGE_PORT

Register it with your agent runtime (any MCP client). For an always-on host service, use the included launchd template com.triage.ops.plist (macOS); on Linux, adapt it to a systemd unit.

Hard-won lessons baked in

  • Agents in a container can't see host processes. Give them status tools, not a shell. With shell access a model conflates unrelated subsystems and reports false negatives. Tools keep it honest.
  • Two reports, always. Structured raw for the agent to branch on; a one-sentence layman for the human. A health check the human can't read is half a tool.
  • Encode the safe/risky boundary in the tool, not the prompt. "Don't clear the queue without asking" in a system prompt is a suggestion; a confirm=true-gated dry-run is a guarantee.
  • docker compose ps --format json varies by version (NDJSON vs single array), so handle both.
  • Recover ≠ restart. A dead process needs a restart; an unhealthy container needs a recreate. Separate tools so the agent escalates correctly.
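The docker compose ps point in the list above can be handled with a small tolerant parser. This is one possible approach, not necessarily the one triage.py uses; the function name `parse_compose_ps` is hypothetical and the sample records are illustrative.

```python
import json


def parse_compose_ps(output: str) -> list:
    """Parse `docker compose ps --format json` output in either shape.

    Some Compose versions print a single JSON array, others print one JSON
    object per line (NDJSON). Both are normalised to a list of dicts.
    """
    text = output.strip()
    if not text:
        return []
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        # Several objects back to back is not valid JSON as a whole,
        # so fall back to decoding line by line.
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    # A single array, or a single object (NDJSON with exactly one container).
    return parsed if isinstance(parsed, list) else [parsed]


as_array = '[{"Name": "app", "State": "running"}, {"Name": "app-db", "State": "running"}]'
as_ndjson = '{"Name": "app", "State": "running"}\n{"Name": "app-db", "State": "exited"}\n'

print([c["Name"] for c in parse_compose_ps(as_array)])
print([c["State"] for c in parse_compose_ps(as_ndjson)])
```

Trying a whole-document parse first and falling back to per-line decoding avoids having to detect the Compose version at all.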

Built by

Built by KodeKing · author Fazal Shah. We build local, private, multi-agent AI systems for teams who can't send their data to the cloud. Issues and PRs welcome.

License

MIT. See LICENSE.
