
Triage MCP Server

Enables AI agents to autonomously check, diagnose, and recover Dockerized services through safe, tool-based ops without direct host shell access.


🩺 Triage — a self-healing ops MCP for any Dockerized service

Let an AI agent (or a human) check, diagnose, and recover a service — without a host shell.

Most "give the agent ops powers" setups are bad: you either hand the model a raw shell (now it can roam the whole box and conflate unrelated subsystems), or you wire up dashboards a model can't read. Triage is the third option:

A small MCP server that exposes a handful of health/diagnose/recover tools. Each returns raw evidence AND a plain-English translation, a suggested action, and whether the fix is safe to auto-apply. The agent acts through tools — it never touches the host directly.

The policy that makes it safe

Class	Tools	Behaviour
Auto-fix safe	`triage_restart_process`, `triage_recover`	An agent may run these on its own and report after. Infra only — no data touched.
Ask before risky	`triage_apply(confirm=true)`	Anything that could lose data / change external state. Dry-run unless `confirm=true`.
Can't self-fix	(reported)	Diagnosed and handed to the human with exact steps — never faked.
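The "dry-run unless `confirm=true`" rule in the policy table can live in the tool body itself, not in a prompt. Below is a minimal sketch, assuming the risky command comes from `TRIAGE_RISKY_CMD` and runs through `subprocess`. The function name `apply_risky` and the returned dict keys are hypothetical and are not Triage's actual code:

```python
from __future__ import annotations

import shlex
import subprocess


def apply_risky(risky_cmd: str, risky_desc: str, confirm: bool = False) -> dict:
    """Hypothetical sketch: run the configured risky command only when confirm=True."""
    if not risky_cmd:
        # Nothing configured: report that, never fake a fix.
        return {"ran": False, "layman": "No risky recovery is configured for this service."}
    if not confirm:
        # Default path: describe what would happen and touch nothing.
        return {
            "ran": False,
            "dry_run": True,
            "command": risky_cmd,
            "layman": f"Dry run: this would {risky_desc}. Call again with confirm=true to proceed.",
        }
    # Confirmed: execute without a shell and keep stdout/stderr as raw evidence.
    proc = subprocess.run(shlex.split(risky_cmd), capture_output=True, text=True)
    return {
        "ran": True,
        "returncode": proc.returncode,
        "raw": proc.stdout + proc.stderr,
        "layman": f"Tried to {risky_desc}: {'done' if proc.returncode == 0 else 'it failed'}.",
    }
```

This sketch splits the command with `shlex.split` and runs it without a shell, so pipes or `&&` in `TRIAGE_RISKY_CMD` would not work here. The README does not say how Triage itself executes the command.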

The dual raw + layman output is the differentiator: the agent gets structured data to act on, and the human gets a sentence they can actually understand ("Postiz's API engine isn't running — the known cold-boot hiccup. I'll restart it.").
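The following is a minimal sketch of what one diagnosed issue might carry, assuming a small runbook keyed by symptom. Only `can_auto_fix` comes from this README. `Issue`, `RUNBOOK`, `explain`, the other field names, the symptom keys and the `report_to_human` action are hypothetical:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Issue:
    raw: str            # structured evidence the agent branches on
    layman: str         # one sentence the human can read
    action: str         # suggested next tool (hypothetical names)
    can_auto_fix: bool  # True only for infra-only fixes that touch no data


# Hypothetical runbook: symptom -> (plain-English template, action, safe to auto-apply).
RUNBOOK = {
    "process_down": ("{label}'s {proc} process isn't running. I'll restart it.",
                     "triage_restart_process", True),
    "container_unhealthy": ("{label}'s container is unhealthy. I'll recreate it.",
                            "triage_recover", True),
}


def explain(symptom: str, raw: str, **ctx: str) -> Issue:
    """Match a symptom to the runbook; anything unknown is handed to the human."""
    if symptom not in RUNBOOK:
        return Issue(raw, f"Found a problem I can't fix safely: {raw}", "report_to_human", False)
    template, action, safe = RUNBOOK[symptom]
    # str.format ignores context keys a template doesn't use.
    return Issue(raw, template.format(**ctx), action, safe)
```

Unknown symptoms fall into the "Can't self-fix" class by construction, so the agent never invents a remedy.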

Tools

Tool	Kind	What it does
`triage_health()`	read	Containers + configured in-container processes + optional dependency ping.
`triage_diagnose()`	read	Health check matched to a runbook → issues with raw + plain-English + action + `can_auto_fix`.
`triage_logs(lines)`	read	Raw service log tail.
`triage_restart_process(name)`	safe	Restart one in-container process (pm2).
`triage_recover()`	safe	Recreate the service container from compose. No volumes/data touched.
`triage_apply(confirm)`	risky	Dry-run by default; runs the configured risky command only on `confirm=true`.
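The split between the two safe tools can be sketched as a small decision over health data. This assumes the in-container process list is read with `pm2 jlist` (pm2's JSON process listing, for example via `docker exec app pm2 jlist`), whose entries carry `name` and `pm2_env.status`. `down_processes` and `pick_safe_action` are hypothetical helpers, not Triage's code:

```python
from __future__ import annotations

import json
from typing import Iterable


def down_processes(pm2_jlist: str, watched: Iterable[str]) -> list[str]:
    """Return watched process names that pm2 does not report as 'online'."""
    status = {p.get("name"): p.get("pm2_env", {}).get("status") for p in json.loads(pm2_jlist)}
    # A watched process missing from pm2 entirely also counts as down.
    return [name for name in watched if status.get(name) != "online"]


def pick_safe_action(container_healthy: bool, down: list[str]) -> tuple[str, str] | None:
    """Pick the least disruptive safe tool, escalating to a recreate only when needed."""
    if not container_healthy:
        # An unhealthy container needs a recreate; restarting processes inside it won't help.
        return ("triage_recover", "")
    if down:
        # Container is fine but a process died: restart just that process.
        return ("triage_restart_process", down[0])
    return None  # healthy: nothing to do
```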

Configure (zero code changes)

Everything is env-driven — point it at any compose-managed service:

TRIAGE_COMPOSE=/path/to/docker-compose.yaml   # compose file
TRIAGE_SERVICE=app                            # the main container/service name
TRIAGE_LABEL="My App"                         # friendly name used in messages
TRIAGE_PROCS=backend,worker                   # optional: in-container processes to watch
TRIAGE_PROC_MGR=pm2                           # "pm2" | "none"
TRIAGE_DB_PING="docker exec app-db pg_isready"  # optional: rc 0 = dependency healthy
TRIAGE_RISKY_CMD=""                           # optional: a guarded recovery (clear a queue, etc.)
TRIAGE_RISKY_DESC="clear the stuck job queue" # plain-English description of the risky command
TRIAGE_PORT=9500                              # port the MCP server listens on

See .env.example.
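Because everything is env-driven, loading the settings reduces to reading `TRIAGE_*` variables. Below is a minimal sketch of that step plus the "rc 0 = dependency healthy" rule for `TRIAGE_DB_PING`. `TriageConfig`, `load_config`, `dependency_ok`, every default value and the 10-second ping timeout are assumptions, not Triage's documented behaviour:

```python
from __future__ import annotations

import os
import shlex
import subprocess
from dataclasses import dataclass, field
from typing import Mapping


@dataclass
class TriageConfig:
    compose: str
    service: str
    label: str
    procs: list[str] = field(default_factory=list)
    proc_mgr: str = "none"
    db_ping: str = ""
    risky_cmd: str = ""
    risky_desc: str = ""
    port: int = 9500


def load_config(env: Mapping[str, str] = os.environ) -> TriageConfig:
    """Build config from TRIAGE_* variables; the defaults here are illustrative."""
    service = env.get("TRIAGE_SERVICE", "app")
    return TriageConfig(
        compose=env.get("TRIAGE_COMPOSE", "docker-compose.yaml"),
        service=service,
        label=env.get("TRIAGE_LABEL", service),  # assumed: fall back to the service name
        # "backend, worker," -> ["backend", "worker"]; blank entries dropped
        procs=[p.strip() for p in env.get("TRIAGE_PROCS", "").split(",") if p.strip()],
        proc_mgr=env.get("TRIAGE_PROC_MGR", "none"),
        db_ping=env.get("TRIAGE_DB_PING", ""),
        risky_cmd=env.get("TRIAGE_RISKY_CMD", ""),
        risky_desc=env.get("TRIAGE_RISKY_DESC", ""),
        port=int(env.get("TRIAGE_PORT", "9500")),
    )


def dependency_ok(db_ping: str) -> bool | None:
    """Exit code 0 means healthy; None means no ping is configured."""
    if not db_ping:
        return None  # TRIAGE_DB_PING is optional
    try:
        proc = subprocess.run(shlex.split(db_ping), capture_output=True, timeout=10)
    except (OSError, subprocess.TimeoutExpired):
        return False  # a ping that can't start, or hangs, counts as unhealthy
    return proc.returncode == 0
```

Passing the environment in as a mapping keeps `load_config` testable without touching the real process environment.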

Run

pip install -r requirements.txt
python3 triage.py            # serves MCP over streamable-http on TRIAGE_PORT

Register it with your agent runtime (any MCP client). For an always-on host service, use the included launchd template com.triage.ops.plist (macOS) — adapt to systemd on Linux.

Hard-won lessons baked in

Agents in a container can't see host processes. Give them status tools, not a shell. With shell access a model conflates unrelated subsystems and reports false negatives. Tools keep it honest.
Two reports, always. Structured raw data for the agent to branch on; a one-sentence layman summary for the human. A health check the human can't read is half a tool.
Encode the safe/risky boundary in the tool, not the prompt. "Don't clear the queue without asking" in a system prompt is a suggestion; a confirm=true-gated dry-run is a guarantee.
`docker compose ps --format json` output varies by version (NDJSON vs. a single JSON array), so handle both.
Recover ≠ restart. A dead process needs a restart; an unhealthy container needs a recreate. Separate tools so the agent escalates correctly.
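Handling both `docker compose ps --format json` output styles from the lessons above takes only a few lines. This is a minimal sketch; `parse_compose_ps` and `service_state` are hypothetical names, and the `Service`, `State` and `Health` keys are assumed from Compose's JSON output, so check them against your Compose version:

```python
from __future__ import annotations

import json


def parse_compose_ps(output: str) -> list[dict]:
    """Parse `docker compose ps --format json` in either output style."""
    text = output.strip()
    if not text:
        return []  # no containers
    if text.startswith("["):
        # Array style: one JSON array holding every container.
        return json.loads(text)
    # NDJSON style: one JSON object per line.
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def service_state(containers: list[dict], service: str) -> tuple[str, str]:
    """Return (State, Health) for a service, or ("missing", "") if it isn't listed."""
    for c in containers:
        if c.get("Service") == service:
            return c.get("State", ""), c.get("Health", "")
    return "missing", ""
```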

Built by

Built by KodeKing · author Fazal Shah. We build local, private, multi-agent AI systems for teams who can't send their data to the cloud. Issues and PRs welcome.

License

MIT — see LICENSE.
