OCP Performance Analyzer MCP

A comprehensive, AI-powered performance analysis and monitoring platform for OpenShift/Kubernetes clusters. This project provides Model Context Protocol (MCP) servers for analyzing etcd, network, OVN-Kubernetes, and node components with deep performance insights, automated root cause analysis, and actionable recommendations.

Overview
Architecture
Features
Project Structure
Installation
Quick Start
Components
Configuration
Usage Examples
API Reference
Troubleshooting
Contributing

Overview

The OCP Performance Analyzer MCP is a multi-component platform designed to monitor and analyze OpenShift/Kubernetes cluster performance across four main areas:

ETCD Analyzer - Comprehensive etcd cluster performance monitoring
Network Analyzer - Network stack performance analysis (L1, sockets, netstat, I/O)
OVN-Kubernetes Analyzer - OVN-Kubernetes networking component analysis
Node Analyzer - Node health and performance monitoring (PLEG, runtime operations, resource usage)

Each component includes:

MCP servers exposing performance analysis tools
AI-powered agents for intelligent analysis and reporting
Data collection tools for Prometheus metrics
ELT (Extract-Load-Transform) pipelines for data processing
Persistent storage using DuckDB
Web interfaces for interactive analysis

Architecture

High-Level Architecture

┌──────────────────────────────────────────────────────────┐
│                    Client Layer                          │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐    │
│  │   Web UI     │  │   CLI Tools  │  │   REST API   │    │
│  └──────┬───────┘  └──────┬───────┘  └──────┬───────┘    │
└─────────┼─────────────────┼─────────────────┼────────────┘
          │                 │                 │
          └─────────────────┼─────────────────┘
                            │
┌───────────────────────────┼───────────────────────────────┐
│                    AI Agent Layer (Port 8080)             │
│  ┌─────────────────────────────────────────────────────┐  │
│  │  LangGraph Agents: Chat, Report, Storage            │  │
│  │  • Streaming responses                              │  │
│  │  • Tool orchestration                               │  │
│  │  • Conversation memory                              │  │
│  └─────────────────────────────────────────────────────┘  │
└───────────────────────────┬───────────────────────────────┘
                            │ MCP Protocol
┌───────────────────────────┼─────────────────────────────────────────────────┐
│                    MCP Server Layer (Port 8000)                             │
│  ┌──────────────┐  ┌───────────────┐  ┌───────────────┐  ┌───────────────┐  │
│  │ ETCD Server  │  │ Network Server│  │ OVNK Server   │  │ Node Server   │  │
│  │ 15+ tools    │  │ 10+ tools     │  │ 8+ tools      │  │ 5+ tools      │  │
│  └───────┬──────┘  └────────┬──────┘  └────────┬──────┘  └────────┬──────┘  │
└──────────┼──────────────────┼──────────────────┼──────────────────┼─────────┘
           │                  │                  │                  │
┌──────────┼──────────────────┼──────────────────┼──────────────────┼──────────┐
│          │                  │                  │                  │          │
│  ┌───────▼───────┐  ┌───────▼───────┐  ┌───────▼───────┐  ┌───────▼───────┐  │
│  │   Tools/      │  │   Tools/      │  │   Tools/      │  │   Tools/      │  │
│  │   Collectors  │  │   Collectors  │  │   Collectors  │  │   Collectors  │  │
│  └───────┬───────┘  └───────┬───────┘  └───────┬───────┘  └───────┬───────┘  │
│          │                  │                  │                  │          │
│  ┌───────▼───────┐  ┌───────▼───────┐  ┌───────▼───────┐  ┌───────▼───────┐  │
│  │   Analysis    │  │   Analysis    │  │   Analysis    │  │   Analysis    │  │
│  │   Modules     │  │   Modules     │  │   Modules     │  │   Modules     │  │
│  └───────┬───────┘  └───────┬───────┘  └───────┬───────┘  └───────┬───────┘  │
│          │                  │                  │                  │          │
│  ┌───────▼───────┐  ┌───────▼───────┐  ┌───────▼───────┐  ┌───────▼───────┐  │
│  │   ELT         │  │   ELT         │  │   ELT         │  │   ELT         │  │
│  │   Pipeline    │  │   Pipeline    │  │   Pipeline    │  │   Pipeline    │  │
│  └───────┬───────┘  └───────┬───────┘  └───────┬───────┘  └───────┬───────┘  │
│          │                  │                  │                  │          │
│  ┌───────▼──────────────────▼──────────────────▼──────────────────▼───────┐  │
│  │              Storage Layer (DuckDB)                                    │  │
│  └────────────────────────────────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────────────────────────────────┘
                                │
                                ▼
┌─────────────────────────────────────────────────────────────┐
│         OpenShift/Kubernetes Cluster Infrastructure         │
│  • ETCD Cluster    • Prometheus    • Kubernetes API         │
│  • Master Nodes    • OVN-Kubernetes • Network Components    │
└─────────────────────────────────────────────────────────────┘

Component Architecture

Each analyzer (etcd, network, ovnk, node) follows a consistent architecture:

MCP Server - FastMCP-based server exposing analysis tools
Tools/Collectors - Specialized metric collectors for Prometheus queries
Analysis Modules - Performance analysis and bottleneck detection
ELT Pipeline - Data transformation and HTML table generation
Storage Modules - DuckDB persistence for historical data
AI Agents - LangGraph-based agents for intelligent analysis

Features

Core Capabilities

Multi-Component Analysis: ETCD, Network, OVN-Kubernetes, and Node analyzers
MCP Protocol: Model Context Protocol servers for tool exposure
AI-Powered: LangGraph agents with OpenAI integration
Real-time Monitoring: Live metrics collection and analysis
Historical Analysis: DuckDB-based time-series storage
Automated Reporting: Executive-ready performance reports
Web Interfaces: Interactive chat and analysis UIs
Streaming Responses: Real-time result streaming via SSE

ETCD Analyzer Features

15+ Analysis Tools: Cluster status, WAL fsync, backend commit, disk I/O, network I/O, node usage
Deep Drive Analysis: Multi-subsystem comprehensive review
Bottleneck Detection: Automated performance issue identification
Performance Reports: Executive summaries with recommendations
Critical Metrics: WAL fsync P99 (target < 10 ms), backend commit P99 (target < 25 ms)

Network Analyzer Features

10+ Analysis Tools: L1 stats, socket statistics (TCP/UDP/IP/mem/softnet), netstat, network I/O
Multi-Layer Analysis: Physical layer to application layer metrics
Performance Metrics: Throughput, latency, packet statistics, connection tracking
Comprehensive Coverage: 95+ network metrics across 9 categories

OVN-Kubernetes Analyzer Features

8+ Analysis Tools: OVN database, kubelet CNI, latency, OVS usage, pod metrics, API stats
OVN-Specific Metrics: Northbound/Southbound database sizes, sync performance
CNI Analysis: Kubelet and CNI performance metrics
OVS Monitoring: Open vSwitch daemon and flow table statistics

Node Analyzer Features

5+ Analysis Tools: Node resource usage, PLEG latency, kubelet runtime operations errors, cluster info, health status
PLEG Monitoring: Pod Lifecycle Event Generator relist latency metrics with configurable thresholds
Runtime Error Tracking: Kubelet runtime operations error rates by operation type
Resource Metrics: CPU, memory, and cgroup usage across node groups (controlplane, worker, infra, workload)
Node Group Support: Metrics grouped by node role for targeted analysis
Web UI: Markdown-rendered chat interface with color-coded insights and recommendations
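
The node-group rollup described above can be sketched as follows. This is a minimal illustration rather than the project's collector: the `NodeSample` type, the `group_by_role` helper, and the sample values are hypothetical, and only the four group names come from this README.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean

# Group names taken from this README; everything else is illustrative.
NODE_GROUPS = ("controlplane", "worker", "infra", "workload")


@dataclass(frozen=True)
class NodeSample:
    """Hypothetical per-node sample; not the project's actual data model."""
    node: str
    role: str
    cpu_percent: float
    memory_percent: float


def group_by_role(samples):
    """Average CPU and memory usage per node group, skipping unknown roles."""
    buckets = defaultdict(list)
    for sample in samples:
        if sample.role in NODE_GROUPS:
            buckets[sample.role].append(sample)
    return {
        role: {
            "nodes": len(items),
            "avg_cpu_percent": round(mean(s.cpu_percent for s in items), 2),
            "avg_memory_percent": round(mean(s.memory_percent for s in items), 2),
        }
        for role, items in buckets.items()
    }


samples = [
    NodeSample("master-0", "controlplane", 40.0, 60.0),
    NodeSample("master-1", "controlplane", 50.0, 70.0),
    NodeSample("worker-0", "worker", 20.0, 30.0),
]
summary = group_by_role(samples)
print(summary["controlplane"])
```

Groups with no samples are simply absent from the result, so a caller can tell "no infra nodes" apart from "infra nodes at 0% usage".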

Shared Features

Configuration Management: YAML-based metrics configuration (11 metric files)
Authentication: OpenShift/Kubernetes cluster authentication
Prometheus Integration: Direct PromQL query execution
Data Visualization: HTML table generation with highlighting
Export Capabilities: Reports, data exports, historical queries
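
To illustrate what direct PromQL query execution might involve, the sketch below builds a P99 histogram query and a URL for Prometheus' `/api/v1/query_range` HTTP endpoint. The helper names, the `instance` grouping label, the 5-minute rate window, and the host are assumptions made for the example; the metric name is the one used in the troubleshooting section of this README.

```python
from urllib.parse import urlencode

# Metric name appears in this README's troubleshooting section.
WAL_FSYNC_BUCKET = "etcd_disk_wal_fsync_duration_seconds_bucket"


def p99_query(bucket_metric, window="5m"):
    """Build a PromQL expression for the P99 of a histogram, per instance."""
    return (
        "histogram_quantile(0.99, "
        f"sum by (le, instance) (rate({bucket_metric}[{window}])))"
    )


def query_range_url(base_url, query, start, end, step="30s"):
    """Build a URL for Prometheus' /api/v1/query_range HTTP endpoint."""
    params = urlencode({"query": query, "start": start, "end": end, "step": step})
    return f"{base_url.rstrip('/')}/api/v1/query_range?{params}"


query = p99_query(WAL_FSYNC_BUCKET)
url = query_range_url(
    "https://prometheus.example.com",  # placeholder host, not a real endpoint
    query,
    start="2024-01-01T00:00:00Z",
    end="2024-01-01T01:00:00Z",
)
print(query)
print(url)
```

On OpenShift the request would also need a bearer token for the monitoring route; that part is left out here.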

Project Structure

ocp-performance-analyzer-mcp/
│
├── analysis/                    # Performance analysis modules
│   ├── etcd/                    # ETCD-specific analysis
│   │   ├── etcd_performance_deepdrive.py
│   │   └── etcd_performance_report.py
│   ├── net/                     # Network analysis (future)
│   ├── node/                    # Node analysis (future)
│   ├── ovnk/                    # OVN-Kubernetes analysis (future)
│   └── utils/                   # Shared analysis utilities
│       └── analysis_utility.py
│
├── config/                      # Configuration management
│   ├── metrics_config_reader.py # Unified metrics loader
│   ├── metrics-alert.yml        # Alert metrics
│   ├── metrics-api.yml          # API server metrics
│   ├── metrics-cni.yml          # CNI metrics
│   ├── metrics-disk.yml         # Disk I/O metrics
│   ├── metrics-etcd.yml         # ETCD metrics (51 metrics)
│   ├── metrics-latency.yml      # Latency metrics
│   ├── metrics-net.yml           # Network metrics (95 metrics)
│   ├── metrics-node.yml          # Node metrics
│   ├── metrics-ovn.yml           # OVN metrics
│   ├── metrics-ovs.yml           # OVS metrics
│   ├── metrics-pods.yml          # Pod metrics
│   ├── README.md                 # Config documentation
│   └── test_metrics_loading.py   # Configuration tests
│
├── elt/                         # Extract-Load-Transform pipeline
│   ├── etcd/                    # ETCD ELT modules
│   │   ├── analyzer_elt_backend_commit.py
│   │   ├── analyzer_elt_bottleneck_analysis.py
│   │   ├── analyzer_elt_cluster_status.py
│   │   ├── analyzer_elt_compact_defrag.py
│   │   ├── analyzer_elt_general_info.py
│   │   ├── analyzer_elt_performance_deep_drive.py
│   │   ├── analyzer_elt_wal_fsync.py
│   │   └── etcd_analyzer_elt_*.py
│   ├── net/                     # Network ELT modules
│   │   ├── analyzer_elt_network_io.py
│   │   ├── analyzer_elt_network_l1.py
│   │   ├── analyzer_elt_network_netstat4*.py
│   │   └── analyzer_elt_network_socket4*.py
│   ├── node/                    # Node ELT modules
│   │   ├── analyzer_elt_node_usage.py
│   │   ├── analyzer_elt_node_pleg_relist.py
│   │   └── analyzer_elt_node_kubelet_runtime_operations_errors.py
│   ├── ocp/                     # OCP cluster ELT modules
│   │   ├── analyzer_elt_cluster_alert.py
│   │   ├── analyzer_elt_cluster_apistats.py
│   │   └── analyzer_elt_cluster_info.py
│   ├── ovnk/                    # OVN-Kubernetes ELT modules
│   │   ├── analyzer_elt_deepdrive.py
│   │   ├── analyzer_elt_kubelet_cni.py
│   │   ├── analyzer_elt_latency.py
│   │   └── analyzer_elt_ovs.py
│   ├── pods/                    # Pod ELT modules
│   │   └── analyzer_elt_pods_usage.py
│   ├── disk/                    # Disk ELT modules
│   │   └── analyzer_elt_disk_io.py
│   └── utils/                   # ELT utilities
│       ├── analyzer_elt_json2table.py  # Generic orchestrator
│       ├── analyzer_elt_utility.py      # Pure utilities
│       └── README.md                    # ELT documentation
│
├── mcp/                         # MCP servers and agents
│   ├── etcd/                    # ETCD analyzer MCP server
│   │   ├── etcd_analyzer_mcp_server.py      # Main MCP server
│   │   ├── etcd_analyzer_client_chat.py     # Chat client (FastAPI)
│   │   ├── etcd_analyzer_mcp_agent_report.py    # Report agent
│   │   ├── etcd_analyzer_mcp_agent_stor2db.py   # Storage agent
│   │   ├── etcd_analyzer_command.sh             # Management script
│   │   ├── etcd_analyzer_cluster.duckdb         # DuckDB database
│   │   ├── exports/                             # Report exports
│   │   ├── logs/                                # Application logs
│   │   ├── storage/                             # Storage modules
│   │   ├── pyproject.toml                       # Package config
│   │   └── README.md                            # ETCD docs
│   ├── net/                     # Network analyzer MCP server
│   │   ├── network_analyzer_mcp_server.py
│   │   ├── network_analyzer_client_chat.py
│   │   ├── network_analyzer_mcp_command.sh
│   │   ├── exports/
│   │   ├── logs/
│   │   └── storage/
│   ├── node/                    # Node analyzer MCP server
│   │   ├── node_analyzer_mcp_server.py      # Main MCP server
│   │   ├── node_analyzer_client_chat.py     # Chat client (FastAPI)
│   │   ├── mcp_tools/                       # Modular MCP tool definitions
│   │   │   ├── __init__.py
│   │   │   ├── models.py                    # Pydantic models
│   │   │   ├── health_check.py              # Health status tool
│   │   │   ├── cluster_info.py              # Cluster info tool
│   │   │   ├── node_usage.py                # Node usage tool
│   │   │   ├── node_pleg_relist.py          # PLEG latency tool
│   │   │   └── node_kubelet_runtime_operations_errors.py  # Runtime errors tool
│   │   ├── exports/
│   │   └── logs/
│   └── ovnk/                    # OVN-Kubernetes analyzer MCP server
│       ├── ovnk_analyzer_mcp_server.py
│       ├── ovnk_analyzer_mcp_client_chat.py
│       ├── ovnk_analyzer_mcp_command.sh
│       ├── exports/
│       ├── logs/
│       ├── storage/
│       └── README.md
│
├── ocauth/                      # OpenShift authentication
│   └── openshift_auth.py        # K8s/OCP auth, token management
│
├── storage/                     # DuckDB storage modules
│   ├── etcd/                    # ETCD storage modules
│   │   ├── analyzer_stor_backend_commit.py
│   │   ├── analyzer_stor_cluster_info.py
│   │   ├── analyzer_stor_compact_defrag.py
│   │   ├── analyzer_stor_disk_io.py
│   │   ├── analyzer_stor_disk_wal_fsync.py
│   │   ├── analyzer_stor_general_info.py
│   │   ├── analyzer_stor_network_io.py
│   │   └── analyzer_stor_utility.py
│   ├── net/                     # Network storage (future)
│   └── ovnk/                    # OVN-Kubernetes storage (future)
│
├── tools/                       # Metric collection tools
│   ├── etcd/                    # ETCD collectors
│   │   ├── etcd_cluster_status.py
│   │   ├── etcd_general_info.py
│   │   ├── etcd_disk_wal_fsync.py
│   │   ├── etcd_disk_backend_commit.py
│   │   └── etcd_disk_compact_defrag.py
│   ├── net/                     # Network collectors
│   │   ├── network_io.py
│   │   ├── network_l1.py
│   │   ├── network_netstat4tcp.py
│   │   ├── network_netstat4udp.py
│   │   ├── network_socket4tcp.py
│   │   ├── network_socket4udp.py
│   │   ├── network_socket4ip.py
│   │   ├── network_socket4mem.py
│   │   └── network_socket4softnet.py
│   ├── node/                    # Node collectors
│   │   ├── node_usage.py
│   │   ├── node_pleg_relist.py
│   │   └── node_kubelet_runtime_operations_errors.py
│   ├── ocp/                     # OCP collectors
│   │   ├── cluster_info.py
│   │   ├── cluster_apistats.py
│   │   └── cluster_alert.py
│   ├── ovnk/                    # OVN-Kubernetes collectors
│   │   ├── ovnk_baseinfo.py
│   │   ├── ovnk_kubelet_cni.py
│   │   ├── ovnk_latency.py
│   │   └── ovnk_ovs_usage.py
│   ├── pods/                    # Pod collectors
│   │   └── pods_usage.py
│   ├── disk/                    # Disk collectors
│   │   └── disk_io.py
│   └── utils/                   # Shared utilities
│       ├── promql_basequery.py  # Base Prometheus queries
│       └── promql_utility.py     # PromQL helpers
│
├── webroot/                     # Web interfaces
│   ├── etcd/                    # ETCD web UI
│   │   └── etcd_analyzer_mcp_llm.html
│   ├── net/                     # Network web UI
│   │   └── network_analyzer_mcp_llm.html
│   ├── node/                    # Node web UI
│   │   └── node_analyzer_mcp_llm.html
│   └── ovnk/                    # OVN-Kubernetes web UI
│       └── ovnk_analyzer_mcp_llm.html
│
├── exports/                     # Generated reports and exports
├── logs/                        # Application logs
├── pyproject.toml               # Main project configuration
├── LICENSE                      # License file
└── README.md                    # This file

Installation

Prerequisites

Python 3.8 or higher
Access to OpenShift/Kubernetes cluster
KUBECONFIG configured
Prometheus/Thanos accessible
OpenAI API key (for AI features)

Step 1: Clone Repository

git clone https://github.com/liqcui/ocp-performance-analyzer-mcp.git
cd ocp-performance-analyzer-mcp

Step 2: Create Virtual Environment

python3 -m venv venv
source venv/bin/activate  # Linux/Mac
# or
venv\Scripts\activate  # Windows

Step 3: Install Dependencies

pip install -e .

Or, if the repository provides a requirements.txt:

pip install -r requirements.txt  # If available

Key Dependencies:

fastmcp>=1.12.4 - MCP server framework
fastapi>=0.115.7 - Web framework
langchain>=0.3.0 - LLM integration
langgraph>=0.3.0 - Agent orchestration
duckdb>=1.0.0 - Time-series database
kubernetes>=30.0.0 - Kubernetes client
prometheus-api-client>=0.5.3 - Prometheus queries
pydantic>=2.0.0 - Data validation
pandas>=2.2.0 - Data processing
pyyaml>=6.0.1 - Configuration parsing

Step 4: Configure Environment

Create a .env file (optional):

# OpenAI-compatible API configuration
OPENAI_API_KEY=your-api-key-here
BASE_URL=https://your-llm-api-endpoint

# OpenShift configuration
KUBECONFIG=/path/to/your/kubeconfig

# Optional: MCP Inspector
ENABLE_MCP_INSPECTOR=0
MCP_INSPECTOR_URL=http://127.0.0.1:8000/sse

Step 5: Verify KUBECONFIG

export KUBECONFIG=/path/to/kubeconfig
kubectl get nodes
oc get clusterversion  # For OpenShift

Quick Start

ETCD Analyzer

cd mcp/etcd

# Start MCP server
./etcd_analyzer_command.sh start

# Or manually
python etcd_analyzer_mcp_server.py

# Start chat client (in another terminal)
python etcd_analyzer_client_chat.py

# Access web UI
open http://localhost:8080/ui

Network Analyzer

cd mcp/net

# Start MCP server
./network_analyzer_mcp_command.sh start

# Or manually
python network_analyzer_mcp_server.py

# Start chat client
python network_analyzer_client_chat.py

OVN-Kubernetes Analyzer

cd mcp/ovnk

# Start MCP server
./ovnk_analyzer_mcp_command.sh start

# Or manually
python ovnk_analyzer_mcp_server.py

# Start chat client
python ovnk_analyzer_mcp_client_chat.py

Node Analyzer

cd mcp/node

# Start MCP server (Port 8004)
python node_analyzer_mcp_server.py

# Start chat client (Port 8084) in another terminal
python node_analyzer_client_chat.py

# Access web UI
open http://localhost:8084/ui

Components

1. MCP Servers

Each analyzer exposes an MCP server with specialized tools:

ETCD MCP Server (`mcp/etcd/etcd_analyzer_mcp_server.py`)

Tools:

get_server_health - Server health check
get_etcd_cluster_status - Cluster health via etcdctl
get_ocp_cluster_info - Cluster information
get_etcd_general_info - General etcd metrics
get_etcd_node_usage - Master node metrics
get_etcd_disk_wal_fsync - WAL fsync performance
get_etcd_disk_backend_commit - Backend commit performance
get_node_disk_io - Disk I/O metrics
get_etcd_disk_compact_defrag - Compaction/defrag metrics
get_etcd_network_io - Network I/O metrics
get_etcd_performance_deep_drive - Comprehensive analysis
get_etcd_bottleneck_analysis - Bottleneck detection
generate_etcd_performance_report - Executive report

Network MCP Server (`mcp/net/network_analyzer_mcp_server.py`)

Tools:

get_ocp_cluster_info - Cluster information
query_network_l1_metrics - Layer 1 network statistics
query_network_io_metrics - Network I/O performance
query_network_socket_tcp_metrics - TCP socket statistics
query_network_socket_udp_metrics - UDP socket statistics
query_network_socket_ip_metrics - IP socket statistics
query_network_socket_mem_metrics - Socket memory statistics
query_network_socket_softnet_metrics - Softnet statistics
query_network_netstat_tcp_metrics - TCP netstat metrics
query_network_netstat_udp_metrics - UDP netstat metrics

OVN-Kubernetes MCP Server (`mcp/ovnk/ovnk_analyzer_mcp_server.py`)

Tools:

get_ocp_cluster_info - Cluster information
query_ovnk_pod_metrics - OVN-Kubernetes pod metrics
query_multus_pod_metrics - Multus CNI metrics
query_ovnk_container_metrics - OVN container metrics
query_ovnk_sync_metrics - OVN synchronization metrics
query_ovnk_ovs_metrics - OVS daemon metrics
query_ovnk_latency_metrics - Network latency metrics
query_kube_api_metrics - Kubernetes API metrics

Node MCP Server (`mcp/node/node_analyzer_mcp_server.py`)

Tools:

get_server_health - Server health check and collector initialization status
get_ocp_cluster_info - Cluster information and node inventory
get_ocp_node_usage - Node resource usage (CPU, memory, cgroup) by node group
get_ocp_node_pleg_latency - PLEG relist latency metrics with thresholds
- Healthy: < 1s
- Warning: 1-10s
- Critical: > 10s (default), configurable to 3 minutes
get_ocp_node_runtime_errors - Kubelet runtime operations error rates
- Healthy: < 0.01 errors/sec
- Warning: 0.01-0.1 errors/sec
- Critical: 0.1-1 errors/sec
- Severe: > 1 error/sec
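
The PLEG and runtime-error bands listed above can be expressed as two small classifiers. This is a sketch under stated assumptions: the function names are hypothetical, and the README does not say which band a value exactly on a boundary (for example 0.1 errors/sec) belongs to, so the choices below are marked in comments.

```python
def classify_pleg_latency(seconds, critical_after_seconds=10.0):
    """Classify PLEG relist latency using the bands listed above.

    critical_after_seconds mirrors the configurable critical threshold
    (10 s by default; the README mentions values up to 3 minutes).
    """
    if seconds < 1.0:
        return "healthy"
    if seconds <= critical_after_seconds:
        return "warning"
    return "critical"


def classify_runtime_error_rate(errors_per_sec):
    """Classify a kubelet runtime operations error rate (errors/sec)."""
    if errors_per_sec < 0.01:
        return "healthy"
    if errors_per_sec < 0.1:  # assumption: exactly 0.1 counts as critical
        return "warning"
    if errors_per_sec <= 1.0:  # assumption: exactly 1.0 is still critical
        return "critical"
    return "severe"


print(classify_pleg_latency(0.4))           # below 1 s
print(classify_pleg_latency(60.0, 180.0))   # relaxed 3-minute critical limit
print(classify_runtime_error_rate(0.05))
```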

Features:

Modular tool architecture in mcp_tools/ directory
Node group support (controlplane, worker, infra, workload)
Comprehensive health summary with node-level metrics
Markdown-based chat UI with syntax highlighting
Real-time streaming responses

2. Tools/Collectors

Specialized collectors organized by category:

ETCD: Cluster status, general info, WAL fsync, backend commit, compact/defrag
Network: I/O, L1, sockets (TCP/UDP/IP/mem/softnet), netstat (TCP/UDP)
Node: CPU, memory, cgroup usage, PLEG relist latency, kubelet runtime operations errors
- nodeUsageCollector - Node resource metrics (CPU, memory, cgroup)
- plegRelistCollector - Pod Lifecycle Event Generator latency metrics
- kubeletRuntimeOperationsErrorsCollector - Runtime operation error rates by type
OCP: Cluster info, API stats, alerts
OVNK: OVN database, kubelet CNI, latency, OVS usage
Pods: Pod and container metrics
Disk: Disk I/O performance

3. Analysis Modules

Performance analysis and reporting:

Deep Drive Analysis: Multi-subsystem comprehensive review
Bottleneck Detection: Automated issue identification
Performance Reports: Executive summaries with recommendations
Baseline Comparison: Current vs. target performance
Root Cause Analysis: Script-based + AI-powered RCA

4. ELT Pipeline

Extract-Load-Transform for data processing:

Generic Orchestrator: Routes data to metric-specific handlers
Metric Handlers: Specialized ELT modules per metric type
HTML Generation: Formatted tables with highlighting
Data Transformation: JSON to structured DataFrames
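
The routing-plus-rendering idea can be sketched with a handler registry and a small HTML renderer. Everything named here (`HANDLERS`, `handler`, `wal_fsync_rows`, `transform`, the payload shape, and the `critical` CSS class) is hypothetical and only illustrates the pattern; the real modules live under elt/ and work on DataFrames.

```python
from html import escape

# Registry mapping a metric type to its handler. The names are hypothetical.
HANDLERS = {}


def handler(metric_type):
    """Decorator that registers a handler for one metric type."""
    def register(func):
        HANDLERS[metric_type] = func
        return func
    return register


@handler("wal_fsync")
def wal_fsync_rows(payload):
    """Flatten a JSON-like payload into (pod, p99_ms, highlight) rows."""
    return [
        (item["pod"], item["p99_ms"], item["p99_ms"] > 10.0)  # 10 ms target
        for item in payload["pods"]
    ]


def to_html_table(rows, headers):
    """Render rows as an HTML table, marking highlighted rows with a class."""
    head = "".join(f"<th>{escape(h)}</th>" for h in headers)
    body = []
    for *cells, highlight in rows:
        css = ' class="critical"' if highlight else ""
        tds = "".join(f"<td>{escape(str(c))}</td>" for c in cells)
        body.append(f"<tr{css}>{tds}</tr>")
    return f"<table><tr>{head}</tr>{''.join(body)}</table>"


def transform(metric_type, payload, headers):
    """Route a payload to its registered handler and render the result."""
    func = HANDLERS.get(metric_type)
    if func is None:
        raise ValueError(f"no handler registered for {metric_type!r}")
    return to_html_table(func(payload), headers)


payload = {"pods": [{"pod": "etcd-0", "p99_ms": 4.2}, {"pod": "etcd-1", "p99_ms": 12.5}]}
html_table = transform("wal_fsync", payload, ["Pod", "WAL fsync P99 (ms)"])
print(html_table)
```

With a registry like this, adding a metric type means adding one decorated function, and the orchestrator itself stays unchanged.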

5. Storage Layer

DuckDB-based persistent storage:

Time-Series Data: Efficient temporal data storage
Schema Management: Automatic table creation and migration
Query Interface: SQL-based data access
Historical Analysis: Long-term performance tracking
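
The create-if-missing and SQL query pattern can be sketched as below. Note the substitution: the example uses Python's built-in `sqlite3` as a stand-in so it runs without extra packages, whereas the project stores data in DuckDB, whose Python API and SQL dialect differ in details. The table name comes from the usage examples in this README; the column names and rows are hypothetical.

```python
import sqlite3

# sqlite3 is a stand-in here; the project itself stores data in DuckDB.
conn = sqlite3.connect(":memory:")

# "Automatic table creation": create the table only if it is missing.
conn.execute(
    """
    CREATE TABLE IF NOT EXISTS wal_fsync_p99_latency (
        ts TEXT NOT NULL,      -- ISO 8601 timestamp (hypothetical column)
        pod TEXT NOT NULL,     -- etcd pod name (hypothetical column)
        p99_ms REAL NOT NULL   -- P99 latency in milliseconds
    )
    """
)

rows = [
    ("2024-01-01T00:00:00Z", "etcd-0", 4.0),
    ("2024-01-01T00:05:00Z", "etcd-0", 6.0),
    ("2024-01-01T00:00:00Z", "etcd-1", 12.0),
]
conn.executemany("INSERT INTO wal_fsync_p99_latency VALUES (?, ?, ?)", rows)

# Historical query: average and worst P99 per pod over the stored window.
history = conn.execute(
    """
    SELECT pod, AVG(p99_ms), MAX(p99_ms)
    FROM wal_fsync_p99_latency
    GROUP BY pod
    ORDER BY pod
    """
).fetchall()
print(history)
```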

6. AI Agents

LangGraph-based intelligent agents:

Chat Agent: Conversational interface with tool execution
Report Agent: Automated performance report generation
Storage Agent: Data collection and persistence

Configuration

Metrics Configuration

Metrics are defined in YAML files under config/:

metrics-etcd.yml - 51 ETCD metrics across 5 categories
metrics-net.yml - 95 network metrics across 9 categories
metrics-api.yml - 15 API server metrics
metrics-disk.yml - 8 disk I/O metrics
metrics-node.yml - 5 node metrics
metrics-ovn.yml - 2 OVN metrics
metrics-ovs.yml - 18 OVS metrics
metrics-pods.yml - 6 pod metrics
metrics-cni.yml - 18 CNI metrics
metrics-latency.yml - 18 latency metrics
metrics-alert.yml - Alert metrics

See config/README.md for detailed configuration documentation.

Environment Variables

# Required
export KUBECONFIG=/path/to/kubeconfig

# Optional - automatically set to UTC
export TZ=UTC

# LLM Configuration
export OPENAI_API_KEY=your-api-key
export BASE_URL=https://api.openai.com/v1

# MCP Inspector (optional)
export ENABLE_MCP_INSPECTOR=1
export MCP_INSPECTOR_URL=http://127.0.0.1:8000/sse

# Logging
export LOG_LEVEL=INFO
export OVNK_LOG_LEVEL=INFO
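
One way a server could read these variables at startup is sketched below. The `Settings` class and `load_settings` function are hypothetical, and the fallback values (for example treating `https://api.openai.com/v1` and `INFO` as defaults) are assumptions taken from the examples above rather than confirmed behavior.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical settings holder; not a class from this project."""
    kubeconfig: str
    openai_api_key: Optional[str]
    base_url: str
    log_level: str
    inspector_enabled: bool


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the variables listed above, failing fast if KUBECONFIG is unset."""
    kubeconfig = env.get("KUBECONFIG", "").strip()
    if not kubeconfig:
        raise RuntimeError("KUBECONFIG is required but not set")
    return Settings(
        kubeconfig=kubeconfig,
        openai_api_key=env.get("OPENAI_API_KEY") or None,
        base_url=env.get("BASE_URL", "https://api.openai.com/v1"),  # assumed default
        log_level=env.get("LOG_LEVEL", "INFO").upper(),             # assumed default
        inspector_enabled=env.get("ENABLE_MCP_INSPECTOR", "0") == "1",
    )


settings = load_settings({"KUBECONFIG": "/path/to/kubeconfig", "LOG_LEVEL": "debug"})
print(settings)
```

Passing the environment in as a mapping keeps the function easy to test without touching the real process environment.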

Performance Thresholds

Default thresholds (configurable in analysis modules):

thresholds = {
    'wal_fsync_p99_ms': 10.0,              # Critical for write performance
    'backend_commit_p99_ms': 25.0,         # Critical for persistence
    'cpu_usage_warning': 70.0,             # Pod CPU warning
    'cpu_usage_critical': 85.0,            # Pod CPU critical
    'memory_usage_warning': 70.0,           # Pod memory warning
    'memory_usage_critical': 85.0,         # Pod memory critical
    'peer_latency_warning_ms': 50.0,       # Network warning
    'peer_latency_critical_ms': 100.0,     # Network critical
    'network_utilization_warning': 70.0,   # Network utilization warning
    'network_utilization_critical': 85.0,  # Network utilization critical
}
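
A sketch of how such a thresholds dictionary might be applied to observed values follows. The `severity` and `evaluate` helpers and the observed numbers are hypothetical; treating a value equal to a threshold as a breach, and treating the single WAL fsync limit as critical, are assumptions of this example.

```python
# Subset of the default thresholds shown above.
thresholds = {
    "cpu_usage_warning": 70.0,
    "cpu_usage_critical": 85.0,
    "peer_latency_warning_ms": 50.0,
    "peer_latency_critical_ms": 100.0,
    "wal_fsync_p99_ms": 10.0,
}


def severity(value, warning=None, critical=None):
    """Return 'critical', 'warning', or 'ok' for a higher-is-worse metric."""
    if critical is not None and value >= critical:
        return "critical"
    if warning is not None and value >= warning:
        return "warning"
    return "ok"


def evaluate(observed, limits):
    """Classify each observed metric against the limits that apply to it."""
    return {
        "cpu_usage": severity(
            observed["cpu_usage"],
            limits["cpu_usage_warning"],
            limits["cpu_usage_critical"],
        ),
        "peer_latency_ms": severity(
            observed["peer_latency_ms"],
            limits["peer_latency_warning_ms"],
            limits["peer_latency_critical_ms"],
        ),
        # Single-limit metric: exceeding the target is reported as critical.
        "wal_fsync_p99_ms": severity(
            observed["wal_fsync_p99_ms"], critical=limits["wal_fsync_p99_ms"]
        ),
    }


observed = {"cpu_usage": 72.5, "peer_latency_ms": 35.0, "wal_fsync_p99_ms": 14.2}
result = evaluate(observed, thresholds)
print(result)
```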

Usage Examples

Example 1: ETCD Performance Analysis

# Start ETCD analyzer
cd mcp/etcd
./etcd_analyzer_command.sh start

# In web UI, ask:
"Analyze etcd performance for the last hour"
"Show me WAL fsync performance"
"Generate a performance report for the last 24 hours"

Example 2: Network Analysis

# Start network analyzer
cd mcp/net
python network_analyzer_mcp_server.py

# Query network metrics
curl -X POST http://localhost:8000/tools/query_network_io_metrics \
  -H "Content-Type: application/json" \
  -d '{"duration": "1h"}'

Example 3: OVN-Kubernetes Analysis

# Start OVN-Kubernetes analyzer
cd mcp/ovnk
python ovnk_analyzer_mcp_server.py

# Query OVN metrics
curl -X POST http://localhost:8000/tools/query_ovnk_pod_metrics \
  -H "Content-Type: application/json" \
  -d '{"duration": "1h"}'

Example 4: Performance Report Generation

# Using ETCD report agent
cd mcp/etcd
python etcd_analyzer_mcp_agent_report.py

# Follow prompts:
# 1. Select duration mode or time range mode
# 2. Enter duration (e.g., "1h") or time range
# 3. View streaming analysis and report

Example 5: Data Storage

# Using ETCD storage agent
cd mcp/etcd
python etcd_analyzer_mcp_agent_stor2db.py

# Data stored in etcd_analyzer_cluster.duckdb
# Query stored data:
python -c "
import duckdb
conn = duckdb.connect('etcd_analyzer_cluster.duckdb')
result = conn.execute('SELECT * FROM wal_fsync_p99_latency LIMIT 10').fetchall()
print(result)
"

API Reference

MCP Server Endpoints

All MCP servers expose tools via HTTP/SSE:

Base URL: http://localhost:8000 (the Node analyzer server listens on port 8004)
Health Check: GET /health
Tools: POST /tools/{tool_name}
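
For callers who prefer Python over curl, the request for a tool endpoint can be assembled with the standard library. This sketch only builds the request, assuming the `POST /tools/{tool_name}` route and JSON body shown in this README; `build_tool_request` is a hypothetical helper, and the response format is not specified here.

```python
import json
from urllib.request import Request


def build_tool_request(tool_name, arguments, base_url="http://localhost:8000"):
    """Build (but do not send) a POST request for /tools/{tool_name}."""
    body = json.dumps(arguments).encode("utf-8")
    return Request(
        url=f"{base_url.rstrip('/')}/tools/{tool_name}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_tool_request("query_network_io_metrics", {"duration": "1h"})
print(request.get_method(), request.full_url)

# To send it against a running server (assuming it replies with JSON):
#   from urllib.request import urlopen
#   with urlopen(request, timeout=30) as response:
#       print(json.load(response))
```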

Chat Client Endpoints

AI chat clients expose REST APIs:

Base URL: http://localhost:8080 (the Node analyzer chat client listens on port 8084)
Web UI: GET /ui or GET /
Streaming Chat: POST /chat/stream
Non-streaming Chat: POST /chat
Health: GET /api/mcp/health
Tools List: GET /api/tools
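
Since streaming responses are delivered as Server-Sent Events, a client has to split the stream into events. The parser below is a minimal sketch of the generic SSE framing (`data:` lines terminated by a blank line); `iter_sse_data` is a hypothetical helper, and the payload text is invented, because this README does not describe what `/chat/stream` puts inside each event.

```python
def iter_sse_data(lines):
    """Yield the data payload of each event in a Server-Sent Events stream.

    Consecutive 'data:' lines are joined with newlines and an empty line ends
    the event. Other fields and ':' comment lines are ignored, and an event
    left unterminated at the end of the stream is dropped.
    """
    buffer = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("data:"):
            value = line[5:]
            # A single leading space after the colon is not part of the data.
            buffer.append(value[1:] if value.startswith(" ") else value)
        elif line == "" and buffer:
            yield "\n".join(buffer)
            buffer = []


stream = [
    "data: Analyzing etcd\n",
    "\n",
    ": keep-alive\n",
    "data: line one\n",
    "data: line two\n",
    "\n",
]
chunks = list(iter_sse_data(stream))
print(chunks)
```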

Tool Parameters

Common parameters across tools:

duration (str): Time duration (e.g., "5m", "1h", "24h")
start_time (str, optional): Start time in ISO format
end_time (str, optional): End time in ISO format
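
How these parameters could be turned into a concrete time window is sketched below. The helper names are hypothetical, support for `s` and `d` units goes beyond the examples given above, and the rule that an explicit start/end pair takes precedence over `duration` is an assumption rather than documented behavior.

```python
import re
from datetime import datetime, timedelta, timezone

_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days"}
_DURATION = re.compile(r"^(\d+)([smhd])$")


def parse_duration(text):
    """Convert strings such as '5m', '1h', or '24h' into a timedelta."""
    match = _DURATION.match(text.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    amount, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(amount)})


def resolve_range(duration="1h", start_time=None, end_time=None, now=None):
    """Return (start, end) datetimes for a query.

    An explicit start_time/end_time pair wins; otherwise the range is the
    last `duration` up to `now` (current UTC time by default).
    """
    if start_time and end_time:
        return datetime.fromisoformat(start_time), datetime.fromisoformat(end_time)
    end = now or datetime.now(timezone.utc)
    return end - parse_duration(duration), end


fixed_now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
start, end = resolve_range("24h", now=fixed_now)
print(start.isoformat(), end.isoformat())
```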

See individual component READMEs for detailed API documentation:

mcp/etcd/README.md - ETCD analyzer API
mcp/ovnk/README.md - OVN-Kubernetes analyzer API
config/README.md - Configuration API
elt/utils/README.md - ELT pipeline API

Troubleshooting

Common Issues

1. MCP Server Won't Start

Solutions:

# Check KUBECONFIG
echo $KUBECONFIG
kubectl get nodes

# Check if port 8000 is in use
lsof -i :8000

# Check logs
tail -f logs/mcp_server_*.log

2. Authentication Failures

Solutions:

# Verify KUBECONFIG
export KUBECONFIG=/path/to/kubeconfig
kubectl auth can-i get pods -n openshift-etcd

# Check Prometheus access
kubectl get route -n openshift-monitoring

3. Missing Metrics

Solutions:

# Verify Prometheus is accessible
oc get pods -n openshift-monitoring | grep prometheus

# Check metric availability
oc exec -n openshift-monitoring prometheus-k8s-0 -- \
  promtool query instant http://localhost:9090 \
  'etcd_disk_wal_fsync_duration_seconds_bucket'

4. LLM API Errors

Solutions:

# Check .env file
grep OPENAI_API_KEY .env

# Test API connection
curl -H "Authorization: Bearer $OPENAI_API_KEY" \
  $BASE_URL/models

Debug Mode

Enable verbose logging:

export LOG_LEVEL=DEBUG
export OVNK_LOG_LEVEL=DEBUG
python mcp/etcd/etcd_analyzer_mcp_server.py

Contributing

Development Setup

# Clone repository
git clone https://github.com/liqcui/ocp-performance-analyzer-mcp.git
cd ocp-performance-analyzer-mcp

# Create development environment
python3 -m venv venv
source venv/bin/activate

# Install in development mode
pip install -e .

# Install development dependencies
pip install pytest pytest-asyncio black flake8 mypy

Code Style

# Format code
black .

# Lint code
flake8 .

# Type checking
mypy .

Adding New Metrics

1. Define the metric in the appropriate config/metrics-*.yml file
2. Add a collector in the tools/{category}/ directory
3. Add an ELT handler in the elt/{category}/ directory
4. Add a storage module in the storage/{category}/ directory
5. Register the tool in the MCP server
6. Update the documentation

Testing

# Run tests
pytest

# Run with coverage
pytest --cov=. --cov-report=html

License

MIT License - see LICENSE file for details.

Support

For issues and questions:

Check the troubleshooting section
Review component-specific READMEs
Check logs in logs/ directories
Open an issue with detailed logs and configuration

Acknowledgments

Model Context Protocol (MCP)
LangChain
LangGraph
FastMCP
DuckDB
Red Hat OpenShift

Roadmap

Planned Features

[ ] Multi-cluster support
[ ] Historical trend analysis
[ ] Anomaly detection with ML
[ ] Custom alert rules
[ ] Grafana integration
[ ] Slack/Teams notifications
[ ] Performance prediction
[ ] Automated remediation suggestions
[ ] Kubernetes native deployment (Helm charts)
[ ] Real-time streaming metrics

Built with ❤️ for the OpenShift and Kubernetes community
