thruk-mcp

Enables natural language interaction with Thruk monitoring systems, allowing users to query hosts/services, schedule downtimes, acknowledge problems, and more via MCP-compatible clients.

Model Context Protocol (MCP) server for Thruk — the unified web frontend for Naemon, Nagios, Icinga and Shinken.

Expose Thruk's REST API to MCP-compatible clients (Claude Desktop, Dust, LibreChat, OpenWebUI...) so that an LLM can query hosts/services, schedule downtimes, acknowledge problems, force rechecks and more in natural language.

Features

Read: hosts, services, hostgroups, servicegroups, downtimes, comments, sites, aggregated stats, current problems
Write: schedule/delete downtimes, acknowledge & remove acks, force rechecks
Escape hatch: thruk_query tool to call any Thruk REST endpoint
Multi-backend support (Thruk federated sites): pass backends="prod,dr" to any tool
Two transports: stdio (default) or Streamable-HTTP (--listen <port>)
Async httpx client with proper error handling and TLS verification
Tested with pytest + respx, linted with ruff, packaged with hatchling

Quick start

1. Configure

```
cp .env.example .env
$EDITOR .env   # set THRUK_BASE_URL and THRUK_API_KEY
```

An API key can be created from the Thruk user profile page (requires api_keys_enabled in thruk_local.conf) or via the REST API itself.

2a. Run with Docker

```
docker compose up -d
# MCP Streamable-HTTP endpoint: http://localhost:8001/mcp
```

2b. Run locally

```
pip install thruk-mcp        # or: pipx install thruk-mcp

# stdio mode (for Claude Desktop, LibreChat, etc.)
thruk-mcp

# HTTP mode
thruk-mcp --listen 8001
```

For local development of the project itself, see CONTRIBUTING.md.

3. Wire it to an MCP client

Claude Desktop (~/.config/Claude/claude_desktop_config.json on Linux, ~/Library/Application Support/Claude/claude_desktop_config.json on macOS):

```
{
  "mcpServers": {
    "thruk": {
      "command": "thruk-mcp",
      "env": {
        "THRUK_BASE_URL": "https://monitor.example.com/thruk",
        "THRUK_API_KEY": "xxxxxxxx"
      }
    }
  }
}
```

4. Use with the Docker MCP Gateway

The image at ghcr.io/k9fr4n/thruk-mcp:latest defaults to stdio transport, so it can be spawned natively by the gateway.

Option A — Private local catalog

```
# 1. Create your private catalog
docker mcp catalog create thruk-private

# 2. Register this server (catalog/server.yaml ships with the repo)
docker mcp catalog add thruk-private thruk-mcp ./catalog/server.yaml

# 3. Configure credentials & enable
docker mcp secret set thruk-mcp.api_key=YOUR_KEY
docker mcp config write thruk-mcp.base_url=https://monitor.example.com/thruk
docker mcp server enable thruk-mcp

# 4. Run the gateway with your catalog
docker mcp gateway run --catalog thruk-private
```

Then point any MCP client (Claude Desktop, VS Code, Cursor, ...) at the gateway, following the Docker MCP Gateway documentation.

Option B — Submit upstream

catalog/server.yaml, catalog/tools.json and catalog/readme.md follow the docker/mcp-registry schema and can be submitted to the official Docker MCP Catalog via PR.

What's exposed

57 MCP Tools

Read (state): thruk_list_hosts, thruk_get_host, thruk_list_services, thruk_get_service, thruk_list_hostgroups, thruk_list_servicegroups, thruk_list_contacts, thruk_get_contact, thruk_problems, thruk_stats, thruk_totals (compact 16-field host+service totals, faster than thruk_stats), thruk_sites.

Read (history & comments): thruk_list_logs, thruk_list_alerts, thruk_list_notifications, thruk_notification_summary (notifications grouped by contact/host/service/state/command), thruk_recent_events, thruk_list_comments, thruk_list_downtimes, thruk_get_downtime.

Read (noise & flap analysis): thruk_top_noisy_hosts (hosts ranked by alert count over a window), thruk_top_noisy_services (services ranked by alert count), thruk_flap_summary (hosts/services ranked by state transition count).

Read (problem intelligence): thruk_oldest_problems (unhandled problems sorted by age, oldest first), thruk_unacked_critical (CRITICAL/DOWN not acknowledged for > N minutes), thruk_stale_acks (acknowledgements older than N days, i.e. forgotten problems), thruk_problem_counts (flat aggregate of unhealthy-state counts, filterable by hostgroup, custom vars or any structured filter; replaces the former thruk_problems_by_hostgroup), thruk_stale_checks (surfaces checks that stopped running, the dangerous "false green").

Read (analytics): thruk_alert_heatmap (alert counts bucketed by time, useful for spotting recurring patterns), thruk_notification_heatmap (notification counts bucketed by time, useful for spotting mail/paging storms), thruk_concurrent_failures (windows where multiple hosts failed simultaneously), thruk_recurring_problems (hosts/services generating repeated alerts over a window).

Read (availability / SLA): thruk_host_availability (uptime % for a single host: time_up_percent, time_down_percent, time_unreachable_percent and their scheduled equivalents), thruk_service_availability (ok/warning/critical/unknown % for a single service), thruk_hostgroup_availability (availability for all hosts or services in a hostgroup, sorted worst-first; type = hosts | services | both), thruk_reliability_report (per host/service reliability metrics such as MTTR, MTBF and incident counts, derived from the log over a window). The three availability tools accept since/until (Thruk relative or ISO) or a timeperiod shortcut (lastmonth, thismonth, last24hours, lastweek, …).

Read (performance data): thruk_get_perfdata (fetch and parse performance data for a single host or service), thruk_perfdata_snapshot (parsed perfdata for every service matching a filter, in one call), thruk_perfdata_near_threshold (metrics whose value is within within_percent % of breaching their warn/crit range, an early-warning signal before an alert fires).
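The near-threshold idea can be illustrated with a small sketch. It assumes the standard plugin perfdata layout (label=value[UOM];warn;crit;min;max) and only handles plain numeric upper thresholds; range expressions are skipped. The function names parse_perfdata and near_threshold are hypothetical and are not taken from the thruk-mcp source.

```python
import re
from typing import Any, Dict, List, Optional, Tuple

# label=value[UOM];warn;crit;min;max, with optional single quotes around the label.
_PERF_RE = re.compile(
    r"(?:'([^']+)'|([^\s=']+))=([-+]?\d*\.?\d+)([^\s;]*)((?:;[^\s;]*)*)"
)


def _to_float(text: str) -> Optional[float]:
    """Return a float, or None for empty or non-numeric thresholds (e.g. ranges)."""
    try:
        return float(text)
    except ValueError:
        return None


def parse_perfdata(raw: str) -> List[Dict[str, Any]]:
    """Split a perfdata string into one dict per metric."""
    metrics = []
    for match in _PERF_RE.finditer(raw):
        fields = match.group(5).split(";")[1:]  # drop the piece before the first ';'
        fields += [""] * (4 - len(fields))      # pad missing warn/crit/min/max
        metrics.append({
            "label": match.group(1) or match.group(2),
            "value": float(match.group(3)),
            "unit": match.group(4),
            "warn": _to_float(fields[0]),
            "crit": _to_float(fields[1]),
        })
    return metrics


def near_threshold(
    metrics: List[Dict[str, Any]], within_percent: float = 10.0
) -> List[Tuple[str, str]]:
    """List (label, level) pairs whose value is close to, but below, a threshold."""
    hits = []
    for metric in metrics:
        for level in ("warn", "crit"):
            limit = metric[level]
            if limit is None or limit <= 0:
                continue
            floor = limit * (1 - within_percent / 100)
            if floor <= metric["value"] < limit:
                hits.append((metric["label"], level))
                break  # report only the first threshold being approached
    return hits
```

For example, near_threshold(parse_perfdata("'disk /'=85%;90;95;0;100"), 10) flags the disk metric as approaching its warning threshold, because 85 lies between 81 (90 minus 10 %) and 90.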

Write (downtime management): thruk_schedule_downtime (host/service), thruk_schedule_host_services_downtime (all services of a host), thruk_schedule_propagated_host_downtime (parent+children), thruk_schedule_hostgroup_downtime, thruk_schedule_servicegroup_downtime, thruk_delete_downtime, thruk_delete_active_downtimes, thruk_delete_downtimes_by_filter.

Write (problem handling): thruk_acknowledge, thruk_bulk_acknowledge (acknowledge multiple hosts/services in one call), thruk_remove_acknowledgement, thruk_recheck, thruk_add_comment, thruk_delete_comment, thruk_checks (enable/disable active checks for a host or service), thruk_notifications (enable/disable host or service notifications, with optional cascade to all services of a host).

Escape hatches: thruk_query (raw call to any REST endpoint), thruk_run_background_query (long-running endpoint via Thruk's ?background=1 mechanism with automatic job polling).

All list-style tools share a consistent limit / offset / sort / columns contract. By default they return a tight subset of columns (~10 fields per row) to keep LLM token consumption low. Pass columns="" to opt out and receive every column the Thruk row contains.
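As a rough sketch of this contract, a list tool could translate its arguments into REST query parameters as follows. The helper name build_list_params and the default column subset are hypothetical; the real per-tool defaults are not listed in this README.

```python
from typing import Dict, Optional

# Hypothetical compact subset; the actual defaults differ per tool.
DEFAULT_HOST_COLUMNS = ["name", "state", "plugin_output", "last_check"]


def build_list_params(
    limit: int = 50,
    offset: int = 0,
    sort: Optional[str] = None,
    columns: Optional[str] = None,
) -> Dict[str, str]:
    """Translate the shared limit/offset/sort/columns contract into query parameters."""
    params = {"limit": str(limit), "offset": str(offset)}
    if sort:
        params["sort"] = sort
    if columns is None:
        # Caller did not choose: fall back to the compact default subset.
        params["columns"] = ",".join(DEFAULT_HOST_COLUMNS)
    elif columns != "":
        # Explicit selection: pass it through unchanged.
        params["columns"] = columns
    # columns == "" means every column, so no columns parameter is sent.
    return params
```

The three-way handling of columns (None, a CSV string, or the empty string) is the part worth noting: it keeps the token-saving default while leaving the opt-out explicit.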

5 MCP Resources

URI templates that MCP clients with a resource browser (Claude Desktop, VS Code, ...) can "open" like files:

| URI | Content |
| --- | --- |
| `thruk://hosts/{name}` | Full host JSON |
| `thruk://services/{host}/{service}` | Full service JSON |
| `thruk://hostgroups/{name}` | Host group config + members |
| `thruk://problems` | Current unhandled problems (hosts + services) |
| `thruk://stats` | Aggregated host/service stats (cached) |

3 MCP Prompts

Pre-canned workflows the user can invoke as a slash-command in the MCP client UI:

| Prompt | Arguments | Purpose |
| --- | --- | --- |
| `investigate_alert` | `host`, optional `service` | 7-step incident triage |
| `schedule_maintenance` | `target`, `duration_minutes`, `kind` | Safe downtime workflow with confirmation |
| `diagnose_flapping` | `host`, `service` | Root-cause a flapping service (uses `thruk_flap_summary`) |

Robustness

Connection retries: httpx.AsyncHTTPTransport(retries=3) retries failed connection attempts (DNS failures, connection refusals, TLS handshake errors).
HTTP retries with backoff: 5xx and 429 responses are retried up to 3 times with exponential backoff + jitter (cap 5 s).
Opt-in TTL cache: slow-moving endpoints (/sites, /processinfo, /hosts/stats, /services/stats, /contacts, /timeperiods, ...) are cached in-process for 15 s. Any tool can request caching via cache_ttl= on the underlying client. This absorbs the burst of identical calls an LLM agent typically issues across a multi-tool turn.
Pagination helper: ThrukClient.get_all() is an async generator that iterates pages of 500 rows up to a configurable hard limit (default 50 000), so internal callers can scan entire backends without manual offset math.
Long-running queries: the thruk_run_background_query tool wraps Thruk's ?background=1 flow and polls /thruk/jobs/<id>/output until the job completes (5 min default timeout).
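The HTTP retry policy above (5xx and 429, up to 3 attempts, exponential backoff with jitter capped at 5 s) might look roughly like the sketch below. The base delay of 0.5 s and the "full jitter" strategy are assumptions made for illustration; the README does not state which variant the server uses.

```python
import random
from typing import List, Optional


def should_retry(status_code: int) -> bool:
    """Retry on rate limiting (429) and on any 5xx server error."""
    return status_code == 429 or 500 <= status_code < 600


def backoff_delays(
    retries: int = 3,
    base: float = 0.5,   # assumed base delay in seconds
    cap: float = 5.0,    # documented cap
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Return one sleep duration per retry attempt, using full jitter."""
    rng = rng or random.Random()
    delays = []
    for attempt in range(retries):
        ceiling = min(cap, base * (2 ** attempt))  # exponential growth, capped
        delays.append(rng.uniform(0.0, ceiling))   # jitter spreads out retries
    return delays
```

Jitter matters here because several tool calls from one agent turn tend to fail at the same moment; randomising the delay keeps them from retrying in lockstep.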

Environment variables

Connection

| Variable | Default | Description |
| --- | --- | --- |
| `THRUK_BASE_URL` | `http://localhost/thruk` | Thruk URL (no trailing slash) |
| `THRUK_API_KEY` | (required) | `X-Thruk-Auth-Key` header |
| `THRUK_AUTH_USER` | | Impersonation user (superuser key only) |
| `THRUK_VERIFY_SSL` | `true` | Set `false` for self-signed certs |
| `THRUK_TIMEOUT` | `30` | HTTP timeout in seconds |
| `THRUK_DEFAULT_BACKENDS` | | CSV of default backend names (federated Thruk) |
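A minimal sketch of how these connection variables could be read and validated, using the defaults from the table. ThrukSettings and load_settings are hypothetical names, not the project's actual configuration code, and the X-Thruk-Auth-User header name is an assumption based on Thruk's REST documentation rather than on this README.

```python
import os
from dataclasses import dataclass
from typing import Dict, List, Mapping, Optional


@dataclass(frozen=True)
class ThrukSettings:
    base_url: str
    api_key: str
    auth_user: Optional[str]
    verify_ssl: bool
    timeout: float
    default_backends: List[str]

    def headers(self) -> Dict[str, str]:
        """Build the authentication headers sent with every request."""
        headers = {"X-Thruk-Auth-Key": self.api_key}
        if self.auth_user:
            # Assumed header name for impersonation with a superuser key.
            headers["X-Thruk-Auth-User"] = self.auth_user
        return headers


def load_settings(env: Mapping[str, str] = os.environ) -> ThrukSettings:
    """Read the connection variables, applying the documented defaults."""
    api_key = env.get("THRUK_API_KEY", "")
    if not api_key:
        raise ValueError("THRUK_API_KEY is required")
    raw_backends = env.get("THRUK_DEFAULT_BACKENDS", "")
    return ThrukSettings(
        # Strip any trailing slash, since the base URL must not end with one.
        base_url=env.get("THRUK_BASE_URL", "http://localhost/thruk").rstrip("/"),
        api_key=api_key,
        auth_user=env.get("THRUK_AUTH_USER") or None,
        verify_ssl=env.get("THRUK_VERIFY_SSL", "true").strip().lower() != "false",
        timeout=float(env.get("THRUK_TIMEOUT", "30")),
        default_backends=[b.strip() for b in raw_backends.split(",") if b.strip()],
    )
```

Passing the environment as a mapping keeps the loader easy to test without touching the real process environment.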

Security / multi-tenant (v0.6)

| Variable | Default | Description |
| --- | --- | --- |
| `THRUK_READ_ONLY` | `false` | Strip every write tool (ack, downtime, recheck, ...) |
| `THRUK_ENABLED_TOOLS` | | Allowlist of tool names. CSV with fnmatch wildcards. Empty = all |
| `THRUK_AUDIT_LOG` | `true` | Emit one JSON audit line on stderr per write tool invocation |
| `THRUK_MAX_CONCURRENT` | `0` | Cap of concurrent in-flight HTTP requests. 0 = unlimited |
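To make the interaction between THRUK_READ_ONLY and THRUK_ENABLED_TOOLS concrete, here is a small sketch of how a tool list could be filtered. The select_tools helper and the abbreviated WRITE_TOOLS set are hypothetical; the server's real registration logic may differ.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List

# Abbreviated, hypothetical set of write tools used only for this illustration.
WRITE_TOOLS = {"thruk_acknowledge", "thruk_recheck", "thruk_delete_downtime"}


def select_tools(
    all_tools: Iterable[str],
    enabled_csv: str = "",
    read_only: bool = False,
) -> List[str]:
    """Return the tool names that survive read-only mode and the allowlist."""
    patterns = [p.strip() for p in enabled_csv.split(",") if p.strip()]
    selected = []
    for name in all_tools:
        if read_only and name in WRITE_TOOLS:
            continue  # read-only mode strips every write tool
        if patterns and not any(fnmatchcase(name, p) for p in patterns):
            continue  # an empty allowlist means "all tools"
        selected.append(name)
    return selected
```

With enabled_csv set to "thruk_list_*,thruk_problems,thruk_stats", only the list tools plus the two named tools remain, regardless of the read-only flag.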

Security

Read-only mode: set THRUK_READ_ONLY=true to remove every write tool (thruk_acknowledge, thruk_schedule_*_downtime, thruk_recheck, thruk_delete_*, thruk_run_background_query) from the MCP server. With those tools unregistered, the LLM has no way to mutate monitoring state. Use this for general-purpose agents that should only observe.
Tool allowlist: THRUK_ENABLED_TOOLS=thruk_list_*,thruk_problems,thruk_stats restricts the exposed surface to the listed tools (fnmatch wildcards supported). Useful when fronting multiple LLM clients with the same gateway but different scopes.
Audit log: every write tool invocation emits one JSON line on thruk_mcp.audit (stderr by default):
```
{"ts":"2026-05-17T22:00:00+00:00","tool":"thruk_acknowledge","user":"alice",
 "args":{"host":"srv01","comment":"investigating"},"target":"srv01","status":"ok"}
```
Disable with THRUK_AUDIT_LOG=false. Sensitive keys (api_key, password, token) are redacted as *** before logging.
Concurrency limit: THRUK_MAX_CONCURRENT=8 caps in-flight HTTP requests with an asyncio.Semaphore. Combined with the v0.3 TTL cache, this protects the Thruk core from an LLM that loops on tools or chains them aggressively.

Development

```
pip install -e ".[dev]"
pre-commit install                              # one-time setup of git hooks

ruff check src tests && ruff format src tests   # lint + format
mypy src                                        # type-check
pytest -v --cov=thruk_mcp --cov-fail-under=80   # tests with coverage gate
```

Conventions:

Conventional Commits (feat:, fix:, chore:, docs:, refactor:, test:).
No direct push to main: branch → PR → squash merge.
Any new tool must come with a respx-mocked unit test in tests/test_tools.py and an entry in catalog/tools.json (Docker MCP Registry contract).
CI gate: ruff, ruff format --check, mypy, pytest with 80 % coverage minimum.

References

Thruk REST API: https://www.thruk.org/documentation/rest.html
Thruk REST commands: https://www.thruk.org/documentation/rest_commands.html
MCP spec: https://spec.modelcontextprotocol.io/
Inspired by: https://github.com/lausser/omd-mcp (initial proof-of-concept)

Project docs

CHANGELOG.md — what changed in each release.
UPGRADING.md — per-version migration notes.
SUPPORT.md — supported Python / Thruk / MCP-client versions, security policy, release cadence.
CONTRIBUTING.md — dev setup, PR conventions, tool / env-var contribution checklists.

License

MIT — see LICENSE.
