thruk-mcp
Enables natural language interaction with Thruk monitoring systems, allowing users to query hosts/services, schedule downtimes, acknowledge problems, and more via MCP-compatible clients.
Model Context Protocol (MCP) server for Thruk — the unified web frontend for Naemon, Nagios, Icinga and Shinken.
Expose Thruk's REST API to MCP-compatible clients (Claude Desktop, Dust, LibreChat, OpenWebUI...) so that an LLM can query hosts/services, schedule downtimes, acknowledge problems, force rechecks and more in natural language.
Features
- Read: hosts, services, hostgroups, servicegroups, downtimes, comments, sites, aggregated stats, current problems
- Write: schedule/delete downtimes, acknowledge & remove acks, force rechecks
- Escape hatch: `thruk_query` tool to call any Thruk REST endpoint
- Multi-backend support (Thruk federated sites): pass `backends="prod,dr"` to any tool
- Two transports: stdio (default) or Streamable-HTTP (`--listen <port>`)
- Async httpx client with proper error handling and TLS verification
- Tested with `pytest` + `respx`, linted with `ruff`, packaged with `hatchling`
Quick start
1. Configure
cp .env.example .env
$EDITOR .env # set THRUK_BASE_URL and THRUK_API_KEY
An API key can be created from the Thruk user profile page (requires api_keys_enabled in thruk_local.conf) or via the REST API itself.
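To sanity-check the two settings before wiring up an MCP client, a request can be built by hand. The sketch below is a minimal illustration, not thruk-mcp's own client: `build_thruk_request` is a hypothetical helper, and it assumes the REST API is reachable under `<THRUK_BASE_URL>/r/` with the key sent in the `X-Thruk-Auth-Key` header.

```python
import os
import urllib.request


def build_thruk_request(path, env=None):
    """Build (but do not send) an authenticated GET request for a Thruk REST path.

    Hypothetical helper for illustration; thruk-mcp itself uses an async httpx client.
    """
    env = os.environ if env is None else env
    # Default mirrors the documented THRUK_BASE_URL default; strip a trailing slash.
    base_url = env.get("THRUK_BASE_URL", "http://localhost/thruk").rstrip("/")
    api_key = env["THRUK_API_KEY"]  # required: raises KeyError when unset
    # Assumption: the REST API lives under <base>/r/.
    url = f"{base_url}/r/{path.lstrip('/')}"
    return urllib.request.Request(url, headers={"X-Thruk-Auth-Key": api_key})
```

Passing the result to `urllib.request.urlopen(build_thruk_request("/sites"), timeout=30)` should return JSON if the key is valid.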
2a. Run with Docker
docker compose up -d
# MCP Streamable-HTTP endpoint: http://localhost:8001/mcp
2b. Run locally
pip install thruk-mcp # or: pipx install thruk-mcp
# stdio mode (for Claude Desktop, LibreChat, etc.)
thruk-mcp
# HTTP mode
thruk-mcp --listen 8001
For local development of the project itself, see CONTRIBUTING.md.
3. Wire it to an MCP client
Claude Desktop (~/.config/Claude/claude_desktop_config.json or macOS equivalent):
{
"mcpServers": {
"thruk": {
"command": "thruk-mcp",
"env": {
"THRUK_BASE_URL": "https://monitor.example.com/thruk",
"THRUK_API_KEY": "xxxxxxxx"
}
}
}
}
4. Use with the Docker MCP Gateway
The image at ghcr.io/k9fr4n/thruk-mcp:latest defaults to stdio transport, so it can be spawned natively by the gateway.
Option A — Private local catalog
# 1. Create your private catalog
docker mcp catalog create thruk-private
# 2. Register this server (catalog/server.yaml ships with the repo)
docker mcp catalog add thruk-private thruk-mcp ./catalog/server.yaml
# 3. Configure credentials & enable
docker mcp secret set thruk-mcp.api_key=YOUR_KEY
docker mcp config write thruk-mcp.base_url=https://monitor.example.com/thruk
docker mcp server enable thruk-mcp
# 4. Run the gateway with your catalog
docker mcp gateway run --catalog thruk-private
Then point any MCP client (Claude Desktop, VS Code, Cursor, ...) at the gateway, following the Docker MCP Gateway documentation.
Option B — Submit upstream
catalog/server.yaml, catalog/tools.json and catalog/readme.md follow the docker/mcp-registry schema and can be submitted to the official Docker MCP Catalog via PR.
What's exposed
57 MCP Tools
Read — state
thruk_list_hosts, thruk_get_host, thruk_list_services, thruk_get_service,
thruk_list_hostgroups, thruk_list_servicegroups, thruk_list_contacts, thruk_get_contact,
thruk_problems, thruk_stats, thruk_totals (compact 16-field host+service totals, faster
than thruk_stats), thruk_sites.
Read — history & comments
thruk_list_logs, thruk_list_alerts, thruk_list_notifications,
thruk_notification_summary (notifications grouped by contact/host/service/state/command),
thruk_recent_events, thruk_list_comments, thruk_list_downtimes, thruk_get_downtime.
Read — noise & flap analysis
thruk_top_noisy_hosts (hosts ranked by alert count over a window),
thruk_top_noisy_services (services ranked by alert count),
thruk_flap_summary (hosts/services ranked by state transition count).
Read — problem intelligence
thruk_oldest_problems (unhandled problems sorted by age, oldest first),
thruk_unacked_critical (CRITICAL/DOWN not acknowledged for > N minutes),
thruk_stale_acks (acknowledgements older than N days — forgotten problems),
thruk_problem_counts (flat aggregate of unhealthy-state counts, filterable by hostgroup,
custom vars or any structured filter — replaces the former thruk_problems_by_hostgroup),
thruk_stale_checks (surface checks that stopped running — the dangerous "false green").
Read — analytics
thruk_alert_heatmap (alert counts bucketed by time, useful for spotting recurring
patterns), thruk_notification_heatmap (notification counts bucketed by time — spot
mail/paging storms), thruk_concurrent_failures (windows where multiple hosts failed
simultaneously),
thruk_recurring_problems (hosts/services generating repeated alerts over a window).
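Bucketing alerts by time can be sketched in a few lines. The example below is an illustration of the idea only, assuming alerts arrive as epoch timestamps and are bucketed in UTC; the function name and the two bucket modes are hypothetical and may differ from what `thruk_alert_heatmap` actually does.

```python
from collections import Counter
from datetime import datetime, timezone


def alert_heatmap(timestamps, bucket="hour"):
    """Count alerts per hour of day (0-23) or per weekday (0 = Monday), in UTC."""
    if bucket not in ("hour", "weekday"):
        raise ValueError(f"unknown bucket: {bucket!r}")
    counts = Counter()
    for ts in timestamps:
        moment = datetime.fromtimestamp(ts, tz=timezone.utc)
        key = moment.hour if bucket == "hour" else moment.weekday()
        counts[key] += 1
    # Sort by bucket so recurring patterns are easy to read off.
    return dict(sorted(counts.items()))
```

A bucket whose count is consistently high, such as the same hour every night, usually points at a scheduled job rather than a random failure.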
Read — availability / SLA
thruk_host_availability (uptime % for a single host — time_up_percent, time_down_percent,
time_unreachable_percent and scheduled equivalents),
thruk_service_availability (ok/warning/critical/unknown % for a single service),
thruk_hostgroup_availability (availability for all hosts or services in a hostgroup,
sorted worst-first; type = hosts | services | both).
All three accept since/until (Thruk relative or ISO) or a timeperiod shortcut
(lastmonth, thismonth, last24hours, lastweek, …).
thruk_reliability_report (per host/service reliability metrics — MTTR / MTBF /
incident counts — derived from the log over a window).
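As a worked example of these metrics, the sketch below uses one common pair of definitions: MTTR as total downtime divided by the number of incidents, and MTBF as total uptime divided by the number of incidents. The `Incident` type and the exact formulas are assumptions for illustration; the tool's own derivation from the Thruk log may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Incident:
    start: float  # epoch seconds when the problem began
    end: float    # epoch seconds when it recovered


def reliability(incidents, window_start, window_end):
    """Return incident count, MTTR and MTBF (in seconds) over a window."""
    window = window_end - window_start
    if window <= 0:
        raise ValueError("window_end must be after window_start")
    # Clip each incident to the window so partial overlaps count only once.
    downtime = sum(
        max(0.0, min(i.end, window_end) - max(i.start, window_start))
        for i in incidents
    )
    count = len(incidents)
    if count == 0:
        return {"incidents": 0, "mttr_seconds": None, "mtbf_seconds": None}
    return {
        "incidents": count,
        "mttr_seconds": downtime / count,
        "mtbf_seconds": (window - downtime) / count,
    }
```

For two incidents of 30 and 60 minutes in a 24-hour window, this gives an MTTR of 45 minutes and an MTBF of 11 hours 15 minutes.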
Read — performance data
thruk_get_perfdata (fetch and parse performance data for a single host or service),
thruk_perfdata_snapshot (parsed perfdata for every service matching a filter, in one call),
thruk_perfdata_near_threshold (metrics within within_percent % of breaching their
warn/crit range — early-warning signal before an alert fires).
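The near-threshold check can be illustrated on a raw perfdata string in the standard plugin format `label=value[UOM];warn;crit;min;max`. This is a simplified sketch under stated assumptions: it handles only plain numeric upper thresholds (range syntax such as `10:20` or `@10:20` is skipped), and the function and field names are hypothetical rather than the tool's actual output.

```python
import re

# label=value[UOM];warn;crit (min and max are ignored in this sketch)
_PERF_RE = re.compile(
    r"(?P<label>'[^']+'|[^\s=]+)=(?P<value>-?\d+(?:\.\d+)?)(?P<uom>[^;\s]*)"
    r"(?:;(?P<warn>[^;\s]*))?(?:;(?P<crit>[^;\s]*))?"
)


def _plain_number(text):
    """Return a float for a plain numeric threshold, None for empty or range syntax."""
    try:
        return float(text)
    except (TypeError, ValueError):
        return None


def near_threshold(perfdata, within_percent=10.0):
    """List metrics that are below a threshold but within within_percent % of it."""
    findings = []
    for match in _PERF_RE.finditer(perfdata):
        value = float(match["value"])
        for level in ("warn", "crit"):
            limit = _plain_number(match[level])
            if limit is None or limit <= 0:
                continue
            floor = limit * (1 - within_percent / 100)
            if floor <= value < limit:
                findings.append({
                    "label": match["label"].strip("'"),
                    "value": value,
                    "level": level,
                    "threshold": limit,
                })
                break  # report the first threshold the metric is approaching
    return findings
```

Metrics that have already crossed a threshold are left out on purpose, since those are alerting already.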
Write — downtime management
thruk_schedule_downtime (host/service), thruk_schedule_host_services_downtime
(all services of a host), thruk_schedule_propagated_host_downtime (parent+children),
thruk_schedule_hostgroup_downtime, thruk_schedule_servicegroup_downtime,
thruk_delete_downtime, thruk_delete_active_downtimes,
thruk_delete_downtimes_by_filter.
Write — problem handling
thruk_acknowledge, thruk_bulk_acknowledge (acknowledge multiple hosts/services in one call),
thruk_remove_acknowledgement, thruk_recheck,
thruk_add_comment, thruk_delete_comment,
thruk_checks (enable/disable active checks for a host or service),
thruk_notifications (enable/disable host or service notifications, with optional
cascade to all services of a host).
Escape hatches
thruk_query (raw call to any REST endpoint), thruk_run_background_query
(long-running endpoint via Thruk's ?background=1 mechanism with automatic
job polling).
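The polling half of a background query can be sketched independently of HTTP. In the example below, `fetch_status` is a hypothetical coroutine that returns a dict with a `done` flag; the real job payload returned by Thruk and the polling interval used by `thruk_run_background_query` are not specified here and are assumptions.

```python
import asyncio
import time


async def poll_job(fetch_status, job_id, timeout=300.0, interval=2.0):
    """Call fetch_status(job_id) until it reports done, or raise TimeoutError."""
    deadline = time.monotonic() + timeout
    while True:
        status = await fetch_status(job_id)
        if status.get("done"):
            return status
        # Give up if the next poll would land past the deadline.
        if time.monotonic() + interval > deadline:
            raise TimeoutError(f"job {job_id} did not finish within {timeout} s")
        await asyncio.sleep(interval)
```

Injecting `fetch_status` keeps the loop testable without a Thruk instance: a fake coroutine can report completion after a fixed number of calls.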
All list-style tools share a consistent `limit`/`offset`/`sort`/`columns` contract. By default they return a tight subset of columns (~10 fields per row) to keep LLM token consumption low. Pass `columns=""` to opt out and receive every column the Thruk row contains.
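One way to picture this contract is as a small function that turns tool arguments into REST query parameters. The sketch below is an assumption about the mapping, not the project's code: `build_list_params` and the `DEFAULT_HOST_COLUMNS` subset are hypothetical, and the defaults for `limit` and `offset` are illustrative.

```python
# Hypothetical default subset; the real per-tool column lists are not shown in this README.
DEFAULT_HOST_COLUMNS = ("name", "state", "plugin_output", "last_check", "acknowledged")


def build_list_params(limit=100, offset=0, sort=None, columns=None,
                      default_columns=DEFAULT_HOST_COLUMNS):
    """Translate list-tool arguments into query parameters for a REST call."""
    params = {"limit": limit, "offset": offset}
    if sort:
        params["sort"] = sort
    if columns is None:
        # Caller did not choose: fall back to the compact default subset.
        params["columns"] = ",".join(default_columns)
    elif columns != "":
        params["columns"] = columns
    # columns == "" means opt out: send no columns parameter at all,
    # so every column of the row is returned.
    return params
```

Distinguishing `None` (use the default subset) from `""` (opt out) is what lets a single argument cover both behaviours.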
5 MCP Resources
URI templates that MCP clients with a resource browser (Claude Desktop, VS Code, ...) can "open" like files:
| URI | Content |
|---|---|
| `thruk://hosts/{name}` | Full host JSON |
| `thruk://services/{host}/{service}` | Full service JSON |
| `thruk://hostgroups/{name}` | Host group config + members |
| `thruk://problems` | Current unhandled problems (hosts + services) |
| `thruk://stats` | Aggregated host/service stats (cached) |
3 MCP Prompts
Pre-canned workflows the user can invoke as a slash-command in the MCP client UI:
| Prompt | Arguments | Purpose |
|---|---|---|
| `investigate_alert` | `host`, optional `service` | 7-step incident triage |
| `schedule_maintenance` | `target`, `duration_minutes`, `kind` | Safe downtime workflow with confirmation |
| `diagnose_flapping` | `host`, `service` | Root-cause a flapping service (uses `thruk_flap_summary`) |
Robustness
- Connection retries: `httpx.AsyncHTTPTransport(retries=3)` handles DNS failures, connection refusals, TLS handshakes.
- HTTP retries with backoff: 5xx and 429 responses are retried up to 3 times with exponential backoff + jitter (cap 5 s).
- Opt-in TTL cache: slow-moving endpoints (`/sites`, `/processinfo`, `/hosts/stats`, `/services/stats`, `/contacts`, `/timeperiods`, ...) are cached in-process for 15 s. Any tool can request caching via `cache_ttl=` on the underlying client. This absorbs the burst of identical calls an LLM agent typically issues across a multi-tool turn.
- Pagination helper: `ThrukClient.get_all()` is an async generator that iterates pages of 500 rows up to a configurable hard limit (default 50 000), so internal callers can scan entire backends without manual offset math.
- Long-running queries: the `thruk_run_background_query` tool wraps Thruk's `?background=1` flow and polls `/thruk/jobs/<id>/output` until the job completes (5 min default timeout).
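The HTTP retry policy described above (retry 5xx and 429, exponential backoff with jitter, 5 s cap) can be sketched as a delay generator. The base delay of 0.5 s and the "full jitter" strategy are assumptions chosen for illustration, and both function names are hypothetical; the project's actual parameters are not stated in this README.

```python
import random


def is_retryable(status_code):
    """5xx and 429 responses are worth retrying; other statuses are not."""
    return status_code == 429 or 500 <= status_code <= 599


def backoff_delays(retries=3, base=0.5, cap=5.0, rng=random.random):
    """Yield one sleep duration per retry: exponential growth, jittered, capped."""
    for attempt in range(retries):
        ceiling = min(cap, base * 2 ** attempt)
        # Full jitter: pick a random point between 0 and the current ceiling.
        yield rng() * ceiling
```

Jitter matters when several tool calls fail at once: without it, all retries would hit Thruk again at the same instant.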
Environment variables
Connection
| Variable | Default | Description |
|---|---|---|
| `THRUK_BASE_URL` | `http://localhost/thruk` | Thruk URL (no trailing slash) |
| `THRUK_API_KEY` | (required) | `X-Thruk-Auth-Key` header |
| `THRUK_AUTH_USER` | | Impersonation user (superuser key only) |
| `THRUK_VERIFY_SSL` | `true` | Set `false` for self-signed certs |
| `THRUK_TIMEOUT` | `30` | HTTP timeout in seconds |
| `THRUK_DEFAULT_BACKENDS` | | CSV of default backend names (federated Thruk) |
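For readers who want to see how these variables fit together, here is a minimal parsing sketch. `ThrukSettings` and `load_settings` are hypothetical names, and the set of strings treated as false is an assumption; the project's real settings loader may behave differently.

```python
from dataclasses import dataclass
from typing import Mapping, Optional, Tuple

# Assumption: which spellings count as "false" for THRUK_VERIFY_SSL.
_FALSE_VALUES = {"0", "false", "no", "off"}


@dataclass(frozen=True)
class ThrukSettings:
    base_url: str
    api_key: str
    auth_user: Optional[str]
    verify_ssl: bool
    timeout: float
    default_backends: Tuple[str, ...]


def load_settings(env: Mapping[str, str]) -> ThrukSettings:
    """Parse the connection variables from a mapping such as os.environ."""
    api_key = env.get("THRUK_API_KEY", "").strip()
    if not api_key:
        raise ValueError("THRUK_API_KEY is required")
    backends = tuple(
        name.strip()
        for name in env.get("THRUK_DEFAULT_BACKENDS", "").split(",")
        if name.strip()
    )
    return ThrukSettings(
        base_url=env.get("THRUK_BASE_URL", "http://localhost/thruk").rstrip("/"),
        api_key=api_key,
        auth_user=env.get("THRUK_AUTH_USER") or None,
        verify_ssl=env.get("THRUK_VERIFY_SSL", "true").strip().lower() not in _FALSE_VALUES,
        timeout=float(env.get("THRUK_TIMEOUT", "30")),
        default_backends=backends,
    )
```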
Security / multi-tenant (v0.6)
| Variable | Default | Description |
|---|---|---|
| `THRUK_READ_ONLY` | `false` | Strip every write tool (ack, downtime, recheck, ...) |
| `THRUK_ENABLED_TOOLS` | | Allowlist of tool names. CSV with fnmatch wildcards. Empty = all |
| `THRUK_AUDIT_LOG` | `true` | Emit one JSON audit line on stderr per write tool invocation |
| `THRUK_MAX_CONCURRENT` | `0` | Cap of concurrent in-flight HTTP requests. `0` = unlimited |
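How `THRUK_READ_ONLY` and `THRUK_ENABLED_TOOLS` might combine can be shown with the standard `fnmatch` module. The function below is a hypothetical sketch of the filtering logic, assuming both settings are applied together and that matching is case-sensitive; it is not the server's actual registration code.

```python
from fnmatch import fnmatchcase


def select_tools(all_tools, write_tools, enabled_csv="", read_only=False):
    """Return the tool names that survive read-only mode and the allowlist."""
    patterns = [p.strip() for p in enabled_csv.split(",") if p.strip()]
    selected = []
    for name in all_tools:
        if read_only and name in write_tools:
            continue  # read-only mode strips every write tool
        if patterns and not any(fnmatchcase(name, p) for p in patterns):
            continue  # an allowlist is set and this tool matches none of it
        selected.append(name)
    return selected
```

An empty allowlist leaves every tool enabled, which matches the "Empty = all" rule in the table.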
Security
- Read-only mode: set `THRUK_READ_ONLY=true` to remove every write tool (`thruk_acknowledge`, `thruk_schedule_*_downtime`, `thruk_recheck`, `thruk_delete_*`, `thruk_run_background_query`) from the MCP server. The LLM literally cannot mutate monitoring state. Use this for general-purpose agents that should only observe.
- Tool allowlist: `THRUK_ENABLED_TOOLS=thruk_list_*,thruk_problems,thruk_stats` restricts the exposed surface to the listed tools (fnmatch wildcards supported). Useful when fronting multiple LLM clients with the same gateway but different scopes.
- Audit log: every write tool invocation emits one JSON line on `thruk_mcp.audit` (stderr by default): `{"ts":"2026-05-17T22:00:00+00:00","tool":"thruk_acknowledge","user":"alice", "args":{"host":"srv01","comment":"investigating"},"target":"srv01","status":"ok"}`. Disable with `THRUK_AUDIT_LOG=false`. Sensitive keys (`api_key`, `password`, `token`) are redacted as `***` before logging.
- Rate limit: `THRUK_MAX_CONCURRENT=8` caps in-flight HTTP requests with an `asyncio.Semaphore`. Combined with the v0.3 TTL cache, this protects the Thruk core from an LLM that loops on tools or chains them aggressively.
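The audit record and its redaction rule can be reproduced with the standard library. The sketch below mirrors the field names of the example line above, but `redact` and `audit_line` are hypothetical helpers and the top-level-only redaction is an assumption; the real logger may also redact nested values.

```python
import json
from datetime import datetime, timezone

SENSITIVE_KEYS = {"api_key", "password", "token"}


def redact(args):
    """Replace the values of sensitive top-level keys with ***."""
    return {
        key: "***" if key.lower() in SENSITIVE_KEYS else value
        for key, value in args.items()
    }


def audit_line(tool, user, args, target, status, now=None):
    """Serialize one write-tool invocation as a compact JSON line."""
    now = now or datetime.now(timezone.utc)
    record = {
        "ts": now.isoformat(timespec="seconds"),
        "tool": tool,
        "user": user,
        "args": redact(args),
        "target": target,
        "status": status,
    }
    return json.dumps(record, separators=(",", ":"))
```

Redacting before serialization means a secret never reaches the log handler, whichever destination it is configured to write to.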
Development
pip install -e ".[dev]"
pre-commit install # one-time setup of git hooks
ruff check src tests && ruff format src tests # lint + format
mypy src # type-check
pytest -v --cov=thruk_mcp --cov-fail-under=80 # tests with coverage gate
Conventions:
- Conventional Commits (`feat:`, `fix:`, `chore:`, `docs:`, `refactor:`, `test:`).
- No direct push to `main`: branch → PR → squash merge.
- Any new tool must come with a `respx`-mocked unit test in `tests/test_tools.py` and an entry in `catalog/tools.json` (Docker MCP Registry contract).
- CI gate: `ruff`, `ruff format --check`, `mypy`, `pytest` with 80 % coverage minimum.
References
- Thruk REST API: https://www.thruk.org/documentation/rest.html
- Thruk REST commands: https://www.thruk.org/documentation/rest_commands.html
- MCP spec: https://spec.modelcontextprotocol.io/
- Inspired by: https://github.com/lausser/omd-mcp (initial proof-of-concept)
Project docs
- CHANGELOG.md — what changed in each release.
- UPGRADING.md — per-version migration notes.
- SUPPORT.md — supported Python / Thruk / MCP-client versions, security policy, release cadence.
- CONTRIBUTING.md — dev setup, PR conventions, tool / env-var contribution checklists.
License
MIT — see LICENSE.