Red Hat OpenShift AI (RHOAI) Observability MCP
An MCP (Model Context Protocol) server that gives AI assistants direct access to Red Hat OpenShift AI observability data. Query Prometheus metrics, Alertmanager alerts, Loki logs, Grafana dashboards, and Kubernetes cluster state to troubleshoot vLLM inference workloads.
Features
- 21 tools across 7 categories for comprehensive observability
- vLLM-aware metrics (TTFT, TPOT, E2E latency, KV cache, queue depth)
- Composite investigation tools that correlate metrics, logs, and alerts automatically
- Auto-detection of in-cluster vs external access to OpenShift services
- Built on FastMCP with async backends via httpx
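The auto-detection of in-cluster versus external access could work along the following lines. This is a minimal sketch, not the project's actual logic: the function name `detect_access_mode` is hypothetical, and the check relies only on two standard Kubernetes conventions (the `KUBERNETES_SERVICE_HOST` environment variable and the mounted service account token).

```python
import os
from pathlib import Path

# Standard path where Kubernetes mounts a pod's service account token.
SA_TOKEN_PATH = Path("/var/run/secrets/kubernetes.io/serviceaccount/token")


def detect_access_mode(env=None, token_path=SA_TOKEN_PATH):
    """Return "in-cluster" when running inside a pod, otherwise "external".

    Hypothetical helper: the real server may use different signals.
    """
    env = os.environ if env is None else env
    # Kubernetes injects KUBERNETES_SERVICE_HOST into every container.
    has_host = bool(env.get("KUBERNETES_SERVICE_HOST"))
    has_token = Path(token_path).is_file()
    return "in-cluster" if has_host and has_token else "external"


print(detect_access_mode())
```

In external mode the server would then fall back to explicitly configured URLs and a user token, as in the Quick Start.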
Architecture
graph TD
A[Claude / AI Assistant] -->|MCP Protocol| B[rhoai-observability-mcp]
B --> C[Thanos / Prometheus]
B --> D[Alertmanager]
B --> E[Loki]
B --> F[Tempo]
B --> G[Grafana]
B --> H[Kubernetes / OpenShift]
Backends:
| Backend | Purpose | Source |
|---|---|---|
| Prometheus (Thanos) | Metrics queries (PromQL) | backends/prometheus.py |
| Alertmanager | Active alerts and alert groups | backends/alertmanager.py |
| Loki | Log queries (LogQL) | backends/loki.py |
| Tempo | Distributed trace queries (TraceQL) | backends/tempo.py |
| Grafana | Dashboard discovery and panel queries | backends/grafana.py |
| Kubernetes (OpenShift) | Pods, events, nodes, InferenceServices | backends/openshift.py |
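To illustrate what a metrics backend call involves, the sketch below builds (without sending) a request against the Prometheus HTTP API instant-query endpoint, `/api/v1/query`, which Thanos Querier also serves. The helper name `build_instant_query`, the `job="vllm"` selector, and the token value are assumptions for illustration. The project itself uses httpx, whereas this sketch sticks to the standard library.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_instant_query(base_url, promql, token):
    """Build a GET request for the Prometheus instant-query API (not sent)."""
    query_string = urlencode({"query": promql})
    url = f"{base_url.rstrip('/')}/api/v1/query?{query_string}"
    # OpenShift monitoring endpoints expect a bearer token.
    return Request(url, headers={"Authorization": f"Bearer {token}"})


req = build_instant_query(
    "https://thanos-querier.openshift-monitoring.svc:9091",
    'up{job="vllm"}',  # hypothetical selector
    "example-token",   # placeholder, not a real credential
)
print(req.full_url)
```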
Quick Start
# Clone and install
git clone https://github.com/opendatahub-io/rhoai-observability-mcp.git
cd rhoai-observability-mcp
uv pip install -e ".[dev]"
# Configure (see INSTALL.md for all options)
export THANOS_URL=https://thanos-querier.openshift-monitoring.svc:9091
export ALERTMANAGER_URL=https://alertmanager-main.openshift-monitoring.svc:9093
export OPENSHIFT_TOKEN=$(oc whoami -t)
# Run
python -m rhoai_obs_mcp.server
See INSTALL.md for detailed setup, configuration, and Claude Desktop integration.
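The environment variables exported above might be read at startup roughly as follows. This is a sketch under assumptions: the `Settings` class and `load_settings` function are hypothetical names, and treating all three variables as required is a guess (INSTALL.md is the authority on which options are mandatory).

```python
import os
from dataclasses import dataclass

REQUIRED = ("THANOS_URL", "ALERTMANAGER_URL", "OPENSHIFT_TOKEN")


@dataclass(frozen=True)
class Settings:
    """Hypothetical settings container for the Quick Start variables."""
    thanos_url: str
    alertmanager_url: str
    token: str


def load_settings(env=None):
    """Read settings from the environment, failing fast on missing values."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return Settings(
        thanos_url=env["THANOS_URL"].rstrip("/"),
        alertmanager_url=env["ALERTMANAGER_URL"].rstrip("/"),
        token=env["OPENSHIFT_TOKEN"],
    )


example = load_settings({
    "THANOS_URL": "https://thanos-querier.openshift-monitoring.svc:9091/",
    "ALERTMANAGER_URL": "https://alertmanager-main.openshift-monitoring.svc:9093",
    "OPENSHIFT_TOKEN": "example-token",  # placeholder value
})
print(example.thanos_url)
```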
Build & Deploy
Build the container image
make build
Override the image name or tag:
make build IMAGE_NAME=quay.io/myorg/rhoai-observability-mcp IMAGE_TAG=v1.0.0
Push to registry
make push
Deploy to OpenShift
Prerequisites: log in to your cluster with oc login, install kustomize, and create the target project:
oc new-project rhoai-obs-mcp
Then deploy:
make deploy
This uses Kustomize to build the OpenShift overlay (deploy/overlays/openshift/) on top of the base manifests (deploy/base/) and applies them to the rhoai-obs-mcp namespace. To deploy to a different namespace:
make deploy NAMESPACE=my-namespace
Undeploy
make undeploy
If you deployed to a custom namespace, pass the same value:
make undeploy NAMESPACE=my-namespace
CI-built images
Container images are automatically built from main and published to GHCR:
ghcr.io/opendatahub-io/rhoai-observability-mcp:latest
Local Development with Kind
Set up a local Kubernetes cluster with mock observability backends for development and testing:
# Prerequisites: kind, kubectl, helm, kustomize
make kind-up
This creates a Kind cluster, installs Prometheus + Alertmanager + Grafana via Helm, deploys a fake vLLM metrics exporter, and deploys the MCP server. Access the MCP server at http://localhost:30080.
To point at real external backends instead of the mocks:
make kind-deploy THANOS_URL=https://real-cluster:9091 ALERTMANAGER_URL=https://real-cluster:9093 GRAFANA_URL=https://real-cluster:3000 TEMPO_URL=https://real-cluster:8080
Tear down:
make kind-down
Tool Reference
Metrics
| Tool | Description |
|---|---|
| query_prometheus | Execute a raw PromQL query against ThanosQuerier |
| query_prometheus_range | Execute a PromQL range query to get time-series data (trends, spikes, correlations) |
| get_vllm_metrics | Get a summary of key vLLM metrics (TTFT, TPOT, E2E, cache, queue) for a model |
| list_metrics | List available Prometheus metric names, optionally filtered by regex |
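As a rough idea of the PromQL behind a vLLM summary, the sketch below generates p95 latency queries and gauge selectors for one model. The function name `vllm_queries` is hypothetical, and the metric names are those exposed by common vLLM releases; they vary between vLLM versions, so treat them as assumptions and confirm them with list_metrics.

```python
def vllm_queries(model, window="5m"):
    """Return PromQL strings for key vLLM signals (metric names are assumptions)."""
    selector = f'{{model_name="{model}"}}'

    def p95(metric):
        # Histogram metrics expose *_bucket series; aggregate by "le" before the quantile.
        return (
            f"histogram_quantile(0.95, sum by (le) "
            f"(rate({metric}_bucket{selector}[{window}])))"
        )

    return {
        "ttft_p95": p95("vllm:time_to_first_token_seconds"),
        "tpot_p95": p95("vllm:time_per_output_token_seconds"),
        "e2e_p95": p95("vllm:e2e_request_latency_seconds"),
        "kv_cache_usage": f"vllm:gpu_cache_usage_perc{selector}",
        "queue_depth": f"vllm:num_requests_waiting{selector}",
    }


for name, promql in vllm_queries("granite").items():
    print(f"{name}: {promql}")
```

Each generated string can be passed to query_prometheus, or to query_prometheus_range to see the trend over time.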
Alerts
| Tool | Description |
|---|---|
| get_alerts | Get active alerts from Alertmanager, filterable by severity and labels |
| get_alert_groups | Get alerts grouped by their routing labels |
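Filtering by severity and labels maps naturally onto the Alertmanager v2 API, where `GET /api/v2/alerts` accepts repeated `filter` matchers. The sketch below only builds the URL. The helper name `alerts_url` is hypothetical, and whether get_alerts filters server-side in this way is an assumption.

```python
from urllib.parse import urlencode


def alerts_url(base_url, severity=None, labels=None):
    """Build an Alertmanager v2 alerts URL with label matchers as filters."""
    matchers = []
    if severity:
        matchers.append(f'severity="{severity}"')
    # Sort for a stable, reproducible URL.
    for name, value in sorted((labels or {}).items()):
        matchers.append(f'{name}="{value}"')
    params = [("active", "true")] + [("filter", m) for m in matchers]
    return f"{base_url.rstrip('/')}/api/v2/alerts?{urlencode(params)}"


print(alerts_url(
    "https://alertmanager-main.openshift-monitoring.svc:9093",
    severity="critical",
    labels={"namespace": "demo"},  # hypothetical namespace
))
```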
Logs
| Tool | Description |
|---|---|
| query_logs | Execute a LogQL query against OpenShift LokiStack |
| get_pod_logs | Get logs for a specific pod by namespace and name |
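A pod log lookup can be expressed as a LogQL stream selector plus an optional line filter. The sketch below assumes the stream labels that OpenShift Logging typically attaches (`kubernetes_namespace_name`, `kubernetes_pod_name`); the helper name `pod_logql` is hypothetical, and the labels on your LokiStack may differ.

```python
def _quote(value):
    """Quote a value for use inside a LogQL string literal."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def pod_logql(namespace, pod, contains=None):
    """Build a LogQL query for one pod, optionally keeping only matching lines."""
    query = (
        f"{{kubernetes_namespace_name={_quote(namespace)}, "
        f"kubernetes_pod_name={_quote(pod)}}}"
    )
    if contains:
        # "|=" keeps only log lines that contain the given substring.
        query += f" |= {_quote(contains)}"
    return query


print(pod_logql("demo", "vllm-0", contains="error"))
```

The resulting string is the kind of expression query_logs accepts directly.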
Traces
| Tool | Description |
|---|---|
| get_trace | Fetch a distributed trace by its trace ID |
| search_traces | Search for traces using TraceQL expressions |
| list_trace_tags | List available trace tag names for building TraceQL queries |
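For a sense of what a TraceQL expression looks like, the sketch below assembles a query for slow (and optionally failed) spans of one service. The function name `slow_traces_traceql` and the service name are hypothetical; `resource.service.name`, `duration`, and `status` are standard TraceQL attributes and intrinsics, but list_trace_tags shows which tags your traces actually carry.

```python
def slow_traces_traceql(service, min_duration="2s", errors_only=False):
    """Build a TraceQL span selector for slow spans of a given service."""
    conditions = [
        f'resource.service.name = "{service}"',
        f"duration > {min_duration}",
    ]
    if errors_only:
        conditions.append("status = error")
    return "{ " + " && ".join(conditions) + " }"


print(slow_traces_traceql("vllm-gateway", errors_only=True))  # hypothetical service
```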
Cluster
| Tool | Description |
|---|---|
| get_pods | List pods in a namespace with status, restarts, and creation time |
| get_events | List Kubernetes events, filterable by resource and reason |
| get_node_status | Get node status, capacity, and GPU allocation info |
| describe_resource | Get detailed description of a Kubernetes resource |
| get_inference_services | List KServe InferenceService resources |
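The status, restart, and creation-time fields that get_pods reports all come from standard fields of the Kubernetes Pod object. The sketch below shows one way to condense a Pod document into such a summary. The function name `summarize_pod` and the sample pod are illustrative, not output from a real cluster or the tool's actual response format.

```python
def summarize_pod(pod):
    """Reduce a Kubernetes Pod object (as a dict) to a compact status summary."""
    metadata = pod.get("metadata", {})
    status = pod.get("status", {})
    # Restart counts are tracked per container; sum them for a pod-level figure.
    restarts = sum(
        container.get("restartCount", 0)
        for container in status.get("containerStatuses", [])
    )
    return {
        "name": metadata.get("name"),
        "phase": status.get("phase", "Unknown"),
        "restarts": restarts,
        "created": metadata.get("creationTimestamp"),
    }


# Illustrative Pod fragment, made up for this example.
sample_pod = {
    "metadata": {"name": "vllm-0", "creationTimestamp": "2025-01-01T00:00:00Z"},
    "status": {
        "phase": "Running",
        "containerStatuses": [{"restartCount": 2}, {"restartCount": 1}],
    },
}
print(summarize_pod(sample_pod))
```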
Dashboards
| Tool | Description |
|---|---|
| list_dashboards | List available Grafana dashboards, filterable by tag or title |
| get_dashboard_panels | Get panels and their queries from a Grafana dashboard |
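Extracting panels and their queries amounts to walking the dashboard JSON model, where each panel carries a `targets` list and row panels may nest further panels. The sketch below assumes Prometheus-style targets with an `expr` field. The function name `panel_queries` and the sample dashboard are illustrative only.

```python
def panel_queries(dashboard):
    """Collect panel titles and their query expressions from a dashboard dict."""
    found = []

    def walk(panels):
        for panel in panels:
            if panel.get("type") == "row":
                # Collapsed rows hold their child panels in a nested list.
                walk(panel.get("panels", []))
                continue
            exprs = [t["expr"] for t in panel.get("targets", []) if t.get("expr")]
            found.append({"title": panel.get("title", ""), "queries": exprs})

    walk(dashboard.get("panels", []))
    return found


# Illustrative dashboard fragment, made up for this example.
sample_dashboard = {
    "panels": [
        {"type": "timeseries", "title": "Queue depth",
         "targets": [{"expr": "vllm:num_requests_waiting"}]},
        {"type": "row", "title": "Latency", "panels": [
            {"type": "timeseries", "title": "TTFT", "targets": []},
        ]},
    ]
}
print(panel_queries(sample_dashboard))
```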
Investigation
| Tool | Description |
|---|---|
| investigate_latency | Correlate latency metrics, error logs, and alerts for a vLLM model |
| investigate_gpu | Correlate GPU utilization, KV cache, queue depth, and pod status |
| investigate_errors | Correlate error logs, alerts, and Kubernetes events in a namespace |
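A composite investigation tool can be thought of as several backend queries issued concurrently and merged into one report, with a failing backend recorded rather than aborting the whole run. The sketch below shows that pattern with `asyncio.gather`. The `investigate` function and the stub coroutines are hypothetical stand-ins, not the project's implementation, and the stub return values are placeholders.

```python
import asyncio


async def investigate(sources):
    """Run all source coroutines concurrently and merge results into a report."""
    names = list(sources)
    results = await asyncio.gather(
        *(sources[name]() for name in names),
        return_exceptions=True,  # keep going if one backend fails
    )
    report = {}
    for name, result in zip(names, results):
        if isinstance(result, Exception):
            report[name] = {"ok": False, "error": str(result)}
        else:
            report[name] = {"ok": True, "data": result}
    return report


# Stub backends standing in for real metric, log, and alert queries.
async def fetch_metrics():
    return ["placeholder metric sample"]


async def fetch_logs():
    raise ConnectionError("loki unreachable")


async def fetch_alerts():
    return []


report = asyncio.run(investigate({
    "metrics": fetch_metrics,
    "logs": fetch_logs,
    "alerts": fetch_alerts,
}))
print(report)
```

Returning partial results this way lets an assistant still reason about metrics and alerts when, for example, the log backend is unavailable.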
Documentation
- INSTALL.md -- Installation, configuration, and integration
- TESTING.md -- Running tests and writing new ones
- CONTRIBUTING.md -- Development setup and contribution guidelines
License