OCP Performance Analyzer MCP
A comprehensive, AI-powered performance analysis and monitoring platform for OpenShift/Kubernetes clusters. This project provides Model Context Protocol (MCP) servers for analyzing etcd, network, OVN-Kubernetes, and node components with deep performance insights, automated root cause analysis, and actionable recommendations.
Table of Contents
- Overview
- Architecture
- Features
- Project Structure
- Installation
- Quick Start
- Components
- Configuration
- Usage Examples
- API Reference
- Troubleshooting
- Contributing
Overview
The OCP Performance Analyzer MCP is a multi-component platform designed to monitor and analyze OpenShift/Kubernetes cluster performance across four main areas:
- ETCD Analyzer - Comprehensive etcd cluster performance monitoring
- Network Analyzer - Network stack performance analysis (L1, sockets, netstat, I/O)
- OVN-Kubernetes Analyzer - OVN-Kubernetes networking component analysis
- Node Analyzer - Node health and performance monitoring (PLEG, runtime operations, resource usage)
Each component includes:
- MCP servers exposing performance analysis tools
- AI-powered agents for intelligent analysis and reporting
- Data collection tools for Prometheus metrics
- ELT (Extract-Load-Transform) pipelines for data processing
- Persistent storage using DuckDB
- Web interfaces for interactive analysis
Architecture
High-Level Architecture
┌──────────────────────────────────────────────────────────┐
│ Client Layer │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Web UI │ │ CLI Tools │ │ REST API │ │
│ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ │
└─────────┼─────────────────┼─────────────────┼────────────┘
│ │ │
└─────────────────┼─────────────────┘
│
┌───────────────────────────┼───────────────────────────────┐
│ AI Agent Layer (Port 8080) │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ LangGraph Agents: Chat, Report, Storage │ │
│ │ • Streaming responses │ │
│ │ • Tool orchestration │ │
│ │ • Conversation memory │ │
│ └─────────────────────────────────────────────────────┘ │
└───────────────────────────┬───────────────────────────────┘
│ MCP Protocol
┌───────────────────────────┼─────────────────────────────────────────────────┐
│ MCP Server Layer (Port 8000) │
│ ┌──────────────┐ ┌───────────────┐ ┌───────────────┐ ┌───────────────┐ │
│ │ ETCD Server │ │ Network Server│ │ OVNK Server │ │ Node Server │ │
│ │ 15+ tools │ │ 10+ tools │ │ 8+ tools │ │ 5+ tools │ │
│ └───────┬──────┘ └────────┬──────┘ └────────┬──────┘ └────────┬──────┘ │
└──────────┼──────────────────┼──────────────────┼──────────────────┼─────────┘
│ │ │ │
┌──────────┼──────────────────┼──────────────────┼──────────────────┼──────────┐
│ │ │ │ │ │
│ ┌───────▼───────┐ ┌───────▼───────┐ ┌───────▼───────┐ ┌───────▼───────┐ │
│ │ Tools/ │ │ Tools/ │ │ Tools/ │ │ Tools/ │ │
│ │ Collectors │ │ Collectors │ │ Collectors │ │ Collectors │ │
│ └───────┬───────┘ └───────┬───────┘ └───────┬───────┘ └───────┬───────┘ │
│ │ │ │ │ │
│ ┌───────▼───────┐ ┌───────▼───────┐ ┌───────▼───────┐ ┌───────▼───────┐ │
│ │ Analysis │ │ Analysis │ │ Analysis │ │ Analysis │ │
│ │ Modules │ │ Modules │ │ Modules │ │ Modules │ │
│ └───────┬───────┘ └───────┬───────┘ └───────┬───────┘ └───────┬───────┘ │
│ │ │ │ │ │
│ ┌───────▼───────┐ ┌───────▼───────┐ ┌───────▼───────┐ ┌───────▼───────┐ │
│ │ ELT │ │ ELT │ │ ELT │ │ ELT │ │
│ │ Pipeline │ │ Pipeline │ │ Pipeline │ │ Pipeline │ │
│ └───────┬───────┘ └───────┬───────┘ └───────┬───────┘ └───────┬───────┘ │
│ │ │ │ │ │
│ ┌───────▼──────────────────▼──────────────────▼──────────────────▼───────┐ │
│ │ Storage Layer (DuckDB) │ │
│ └────────────────────────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ OpenShift/Kubernetes Cluster Infrastructure │
│ • ETCD Cluster • Prometheus • Kubernetes API │
│ • Master Nodes • OVN-Kubernetes • Network Components │
└─────────────────────────────────────────────────────────────┘
Component Architecture
Each analyzer (etcd, network, ovnk) follows a consistent architecture:
- MCP Server - FastMCP-based server exposing analysis tools
- Tools/Collectors - Specialized metric collectors for Prometheus queries
- Analysis Modules - Performance analysis and bottleneck detection
- ELT Pipeline - Data transformation and HTML table generation
- Storage Modules - DuckDB persistence for historical data
- AI Agents - LangGraph-based agents for intelligent analysis
Features
Core Capabilities
- Multi-Component Analysis: ETCD, Network, OVN-Kubernetes, and Node analyzers
- MCP Protocol: Model Context Protocol servers for tool exposure
- AI-Powered: LangGraph agents with OpenAI integration
- Real-time Monitoring: Live metrics collection and analysis
- Historical Analysis: DuckDB-based time-series storage
- Automated Reporting: Executive-ready performance reports
- Web Interfaces: Interactive chat and analysis UIs
- Streaming Responses: Real-time result streaming via SSE
ETCD Analyzer Features
- 15+ Analysis Tools: Cluster status, WAL fsync, backend commit, disk I/O, network I/O, node usage
- Deep Drive Analysis: Multi-subsystem comprehensive review
- Bottleneck Detection: Automated performance issue identification
- Performance Reports: Executive summaries with recommendations
- Critical Metrics: WAL fsync P99 (<10ms), backend commit P99 (<25ms)
Network Analyzer Features
- 10+ Analysis Tools: L1 stats, socket statistics (TCP/UDP/IP/mem/softnet), netstat, network I/O
- Multi-Layer Analysis: Physical layer to application layer metrics
- Performance Metrics: Throughput, latency, packet statistics, connection tracking
- Comprehensive Coverage: 95+ network metrics across 9 categories
OVN-Kubernetes Analyzer Features
- 8+ Analysis Tools: OVN database, kubelet CNI, latency, OVS usage, pod metrics, API stats
- OVN-Specific Metrics: Northbound/Southbound database sizes, sync performance
- CNI Analysis: Kubelet and CNI performance metrics
- OVS Monitoring: Open vSwitch daemon and flow table statistics
Node Analyzer Features
- 5+ Analysis Tools: Node resource usage, PLEG latency, kubelet runtime operations errors, cluster info, health status
- PLEG Monitoring: Pod Lifecycle Event Generator relist latency metrics with configurable thresholds
- Runtime Error Tracking: Kubelet runtime operations error rates by operation type
- Resource Metrics: CPU, memory, and cgroup usage across node groups (controlplane, worker, infra, workload)
- Node Group Support: Metrics grouped by node role for targeted analysis
- Web UI: Markdown-rendered chat interface with color-coded insights and recommendations
Shared Features
- Configuration Management: YAML-based metrics configuration (11 metric files)
- Authentication: OpenShift/Kubernetes cluster authentication
- Prometheus Integration: Direct PromQL query execution
- Data Visualization: HTML table generation with highlighting
- Export Capabilities: Reports, data exports, historical queries
Project Structure
ocp-performance-analyzer-mcp/
│
├── analysis/  # Performance analysis modules
│   ├── etcd/  # ETCD-specific analysis
│   │   ├── etcd_performance_deepdrive.py
│   │   └── etcd_performance_report.py
│   ├── net/  # Network analysis (future)
│   ├── node/  # Node analysis (future)
│   ├── ovnk/  # OVN-Kubernetes analysis (future)
│   └── utils/  # Shared analysis utilities
│       └── analysis_utility.py
│
├── config/  # Configuration management
│   ├── metrics_config_reader.py  # Unified metrics loader
│   ├── metrics-alert.yml  # Alert metrics
│   ├── metrics-api.yml  # API server metrics
│   ├── metrics-cni.yml  # CNI metrics
│   ├── metrics-disk.yml  # Disk I/O metrics
│   ├── metrics-etcd.yml  # ETCD metrics (51 metrics)
│   ├── metrics-latency.yml  # Latency metrics
│   ├── metrics-net.yml  # Network metrics (95 metrics)
│   ├── metrics-node.yml  # Node metrics
│   ├── metrics-ovn.yml  # OVN metrics
│   ├── metrics-ovs.yml  # OVS metrics
│   ├── metrics-pods.yml  # Pod metrics
│   ├── README.md  # Config documentation
│   └── test_metrics_loading.py  # Configuration tests
│
├── elt/  # Extract-Load-Transform pipeline
│   ├── etcd/  # ETCD ELT modules
│   │   ├── analyzer_elt_backend_commit.py
│   │   ├── analyzer_elt_bottleneck_analysis.py
│   │   ├── analyzer_elt_cluster_status.py
│   │   ├── analyzer_elt_compact_defrag.py
│   │   ├── analyzer_elt_general_info.py
│   │   ├── analyzer_elt_performance_deep_drive.py
│   │   ├── analyzer_elt_wal_fsync.py
│   │   └── etcd_analyzer_elt_*.py
│   ├── net/  # Network ELT modules
│   │   ├── analyzer_elt_network_io.py
│   │   ├── analyzer_elt_network_l1.py
│   │   ├── analyzer_elt_network_netstat4*.py
│   │   └── analyzer_elt_network_socket4*.py
│   ├── node/  # Node ELT modules
│   │   ├── analyzer_elt_node_usage.py
│   │   ├── analyzer_elt_node_pleg_relist.py
│   │   └── analyzer_elt_node_kubelet_runtime_operations_errors.py
│   ├── ocp/  # OCP cluster ELT modules
│   │   ├── analyzer_elt_cluster_alert.py
│   │   ├── analyzer_elt_cluster_apistats.py
│   │   └── analyzer_elt_cluster_info.py
│   ├── ovnk/  # OVN-Kubernetes ELT modules
│   │   ├── analyzer_elt_deepdrive.py
│   │   ├── analyzer_elt_kubelet_cni.py
│   │   ├── analyzer_elt_latency.py
│   │   └── analyzer_elt_ovs.py
│   ├── pods/  # Pod ELT modules
│   │   └── analyzer_elt_pods_usage.py
│   ├── disk/  # Disk ELT modules
│   │   └── analyzer_elt_disk_io.py
│   └── utils/  # ELT utilities
│       ├── analyzer_elt_json2table.py  # Generic orchestrator
│       ├── analyzer_elt_utility.py  # Pure utilities
│       └── README.md  # ELT documentation
│
├── mcp/  # MCP servers and agents
│   ├── etcd/  # ETCD analyzer MCP server
│   │   ├── etcd_analyzer_mcp_server.py  # Main MCP server
│   │   ├── etcd_analyzer_client_chat.py  # Chat client (FastAPI)
│   │   ├── etcd_analyzer_mcp_agent_report.py  # Report agent
│   │   ├── etcd_analyzer_mcp_agent_stor2db.py  # Storage agent
│   │   ├── etcd_analyzer_command.sh  # Management script
│   │   ├── etcd_analyzer_cluster.duckdb  # DuckDB database
│   │   ├── exports/  # Report exports
│   │   ├── logs/  # Application logs
│   │   ├── storage/  # Storage modules
│   │   ├── pyproject.toml  # Package config
│   │   └── README.md  # ETCD docs
│   ├── net/  # Network analyzer MCP server
│   │   ├── network_analyzer_mcp_server.py
│   │   ├── network_analyzer_client_chat.py
│   │   ├── network_analyzer_mcp_command.sh
│   │   ├── exports/
│   │   ├── logs/
│   │   └── storage/
│   ├── node/  # Node analyzer MCP server
│   │   ├── node_analyzer_mcp_server.py  # Main MCP server
│   │   ├── node_analyzer_client_chat.py  # Chat client (FastAPI)
│   │   ├── mcp_tools/  # Modular MCP tool definitions
│   │   │   ├── __init__.py
│   │   │   ├── models.py  # Pydantic models
│   │   │   ├── health_check.py  # Health status tool
│   │   │   ├── cluster_info.py  # Cluster info tool
│   │   │   ├── node_usage.py  # Node usage tool
│   │   │   ├── node_pleg_relist.py  # PLEG latency tool
│   │   │   └── node_kubelet_runtime_operations_errors.py  # Runtime errors tool
│   │   ├── exports/
│   │   └── logs/
│   └── ovnk/  # OVN-Kubernetes analyzer MCP server
│       ├── ovnk_analyzer_mcp_server.py
│       ├── ovnk_analyzer_mcp_client_chat.py
│       ├── ovnk_analyzer_mcp_command.sh
│       ├── exports/
│       ├── logs/
│       ├── storage/
│       └── README.md
│
├── ocauth/  # OpenShift authentication
│   └── openshift_auth.py  # K8s/OCP auth, token management
│
├── storage/  # DuckDB storage modules
│   ├── etcd/  # ETCD storage modules
│   │   ├── analyzer_stor_backend_commit.py
│   │   ├── analyzer_stor_cluster_info.py
│   │   ├── analyzer_stor_compact_defrag.py
│   │   ├── analyzer_stor_disk_io.py
│   │   ├── analyzer_stor_disk_wal_fsync.py
│   │   ├── analyzer_stor_general_info.py
│   │   ├── analyzer_stor_network_io.py
│   │   └── analyzer_stor_utility.py
│   ├── net/  # Network storage (future)
│   └── ovnk/  # OVN-Kubernetes storage (future)
│
├── tools/  # Metric collection tools
│   ├── etcd/  # ETCD collectors
│   │   ├── etcd_cluster_status.py
│   │   ├── etcd_general_info.py
│   │   ├── etcd_disk_wal_fsync.py
│   │   ├── etcd_disk_backend_commit.py
│   │   └── etcd_disk_compact_defrag.py
│   ├── net/  # Network collectors
│   │   ├── network_io.py
│   │   ├── network_l1.py
│   │   ├── network_netstat4tcp.py
│   │   ├── network_netstat4udp.py
│   │   ├── network_socket4tcp.py
│   │   ├── network_socket4udp.py
│   │   ├── network_socket4ip.py
│   │   ├── network_socket4mem.py
│   │   └── network_socket4softnet.py
│   ├── node/  # Node collectors
│   │   ├── node_usage.py
│   │   ├── node_pleg_relist.py
│   │   └── node_kubelet_runtime_operations_errors.py
│   ├── ocp/  # OCP collectors
│   │   ├── cluster_info.py
│   │   ├── cluster_apistats.py
│   │   └── cluster_alert.py
│   ├── ovnk/  # OVN-Kubernetes collectors
│   │   ├── ovnk_baseinfo.py
│   │   ├── ovnk_kubelet_cni.py
│   │   ├── ovnk_latency.py
│   │   └── ovnk_ovs_usage.py
│   ├── pods/  # Pod collectors
│   │   └── pods_usage.py
│   ├── disk/  # Disk collectors
│   │   └── disk_io.py
│   └── utils/  # Shared utilities
│       ├── promql_basequery.py  # Base Prometheus queries
│       └── promql_utility.py  # PromQL helpers
│
├── webroot/  # Web interfaces
│   ├── etcd/  # ETCD web UI
│   │   └── etcd_analyzer_mcp_llm.html
│   ├── net/  # Network web UI
│   │   └── network_analyzer_mcp_llm.html
│   ├── node/  # Node web UI
│   │   └── node_analyzer_mcp_llm.html
│   └── ovnk/  # OVN-Kubernetes web UI
│       └── ovnk_analyzer_mcp_llm.html
│
├── exports/  # Generated reports and exports
├── logs/  # Application logs
├── pyproject.toml  # Main project configuration
├── LICENSE  # License file
└── README.md  # This file
Installation
Prerequisites
- Python 3.8 or higher
- Access to OpenShift/Kubernetes cluster
- KUBECONFIG configured
- Prometheus/Thanos accessible
- OpenAI API key (for AI features)
Step 1: Clone Repository
git clone https://github.com/liqcui/ocp-performance-analyzer-mcp.git
cd ocp-performance-analyzer-mcp
Step 2: Create Virtual Environment
python3 -m venv venv
source venv/bin/activate # Linux/Mac
# or
venv\Scripts\activate # Windows
Step 3: Install Dependencies
pip install -e .
Or, if the repository provides a requirements.txt:
pip install -r requirements.txt
Key Dependencies:
- fastmcp>=1.12.4 - MCP server framework
- fastapi>=0.115.7 - Web framework
- langchain>=0.3.0 - LLM integration
- langgraph>=0.3.0 - Agent orchestration
- duckdb>=1.0.0 - Time-series database
- kubernetes>=30.0.0 - Kubernetes client
- prometheus-api-client>=0.5.3 - Prometheus queries
- pydantic>=2.0.0 - Data validation
- pandas>=2.2.0 - Data processing
- pyyaml>=6.0.1 - Configuration parsing
Step 4: Configure Environment
Create .env file (optional):
# OpenAI-compatible API configuration
OPENAI_API_KEY=your-api-key-here
BASE_URL=https://your-llm-api-endpoint
# OpenShift configuration
KUBECONFIG=/path/to/your/kubeconfig
# Optional: MCP Inspector
ENABLE_MCP_INSPECTOR=0
MCP_INSPECTOR_URL=http://127.0.0.1:8000/sse
Step 5: Verify KUBECONFIG
export KUBECONFIG=/path/to/kubeconfig
kubectl get nodes
oc get clusterversion # For OpenShift
Quick Start
ETCD Analyzer
cd mcp/etcd
# Start MCP server
./etcd_analyzer_command.sh start
# Or manually
python etcd_analyzer_mcp_server.py
# Start chat client (in another terminal)
python etcd_analyzer_client_chat.py
# Access web UI
open http://localhost:8080/ui
Network Analyzer
cd mcp/net
# Start MCP server
./network_analyzer_mcp_command.sh start
# Or manually
python network_analyzer_mcp_server.py
# Start chat client
python network_analyzer_client_chat.py
OVN-Kubernetes Analyzer
cd mcp/ovnk
# Start MCP server
./ovnk_analyzer_mcp_command.sh start
# Or manually
python ovnk_analyzer_mcp_server.py
# Start chat client
python ovnk_analyzer_mcp_client_chat.py
Node Analyzer
cd mcp/node
# Start MCP server (Port 8004)
python node_analyzer_mcp_server.py
# Start chat client (Port 8084) in another terminal
python node_analyzer_client_chat.py
# Access web UI
open http://localhost:8084/ui
Components
1. MCP Servers
Each analyzer exposes an MCP server with specialized tools:
ETCD MCP Server (mcp/etcd/etcd_analyzer_mcp_server.py)
Tools:
- get_server_health - Server health check
- get_etcd_cluster_status - Cluster health via etcdctl
- get_ocp_cluster_info - Cluster information
- get_etcd_general_info - General etcd metrics
- get_etcd_node_usage - Master node metrics
- get_etcd_disk_wal_fsync - WAL fsync performance
- get_etcd_disk_backend_commit - Backend commit performance
- get_node_disk_io - Disk I/O metrics
- get_etcd_disk_compact_defrag - Compaction/defrag metrics
- get_etcd_network_io - Network I/O metrics
- get_etcd_performance_deep_drive - Comprehensive analysis
- get_etcd_bottleneck_analysis - Bottleneck detection
- generate_etcd_performance_report - Executive report
Network MCP Server (mcp/net/network_analyzer_mcp_server.py)
Tools:
- get_ocp_cluster_info - Cluster information
- query_network_l1_metrics - Layer 1 network statistics
- query_network_io_metrics - Network I/O performance
- query_network_socket_tcp_metrics - TCP socket statistics
- query_network_socket_udp_metrics - UDP socket statistics
- query_network_socket_ip_metrics - IP socket statistics
- query_network_socket_mem_metrics - Socket memory statistics
- query_network_socket_softnet_metrics - Softnet statistics
- query_network_netstat_tcp_metrics - TCP netstat metrics
- query_network_netstat_udp_metrics - UDP netstat metrics
OVN-Kubernetes MCP Server (mcp/ovnk/ovnk_analyzer_mcp_server.py)
Tools:
- get_ocp_cluster_info - Cluster information
- query_ovnk_pod_metrics - OVN-Kubernetes pod metrics
- query_multus_pod_metrics - Multus CNI metrics
- query_ovnk_container_metrics - OVN container metrics
- query_ovnk_sync_metrics - OVN synchronization metrics
- query_ovnk_ovs_metrics - OVS daemon metrics
- query_ovnk_latency_metrics - Network latency metrics
- query_kube_api_metrics - Kubernetes API metrics
Node MCP Server (mcp/node/node_analyzer_mcp_server.py)
Tools:
- get_server_health - Server health check and collector initialization status
- get_ocp_cluster_info - Cluster information and node inventory
- get_ocp_node_usage - Node resource usage (CPU, memory, cgroup) by node group
- get_ocp_node_pleg_latency - PLEG relist latency metrics with thresholds
  - Healthy: < 1s
  - Warning: 1-10s
  - Critical: > 10s (default), configurable to 3 minutes
- get_ocp_node_runtime_errors - Kubelet runtime operations error rates
  - Healthy: < 0.01 errors/sec
  - Warning: 0.01-0.1 errors/sec
  - Critical: 0.1-1 errors/sec
  - Severe: > 1 error/sec
Features:
- Modular tool architecture in mcp_tools/ directory
- Node group support (controlplane, worker, infra, workload)
- Comprehensive health summary with node-level metrics
- Markdown-based chat UI with syntax highlighting
- Real-time streaming responses
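The threshold bands listed for get_ocp_node_pleg_latency and get_ocp_node_runtime_errors can be read as simple classification rules. The sketch below is one way to express them; the function names are hypothetical and not taken from the project, and the handling of values that fall exactly on a boundary (for example 0.1 errors/sec) is an assumption, since the listed ranges overlap at their edges.

```python
# Hypothetical classifiers mirroring the documented bands; not the project's code.

def classify_pleg_latency(seconds: float, critical_after: float = 10.0) -> str:
    """Classify a PLEG relist latency given in seconds.

    critical_after defaults to 10s; pass 180.0 to use the 3-minute setting.
    """
    if seconds < 1.0:
        return "healthy"
    if seconds <= critical_after:
        return "warning"
    return "critical"


def classify_runtime_error_rate(errors_per_sec: float) -> str:
    """Classify a kubelet runtime operations error rate (errors/sec)."""
    if errors_per_sec < 0.01:
        return "healthy"
    if errors_per_sec < 0.1:  # assumption: exactly 0.1 counts as critical
        return "warning"
    if errors_per_sec <= 1.0:
        return "critical"
    return "severe"


if __name__ == "__main__":
    print(classify_pleg_latency(5.0))
    print(classify_runtime_error_rate(0.5))
```

Raising critical_after to 180.0 moves a 12-second relist from critical back to warning, which is the effect of the configurable 3-minute threshold.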
2. Tools/Collectors
Specialized collectors organized by category:
- ETCD: Cluster status, general info, WAL fsync, backend commit, compact/defrag
- Network: I/O, L1, sockets (TCP/UDP/IP/mem/softnet), netstat (TCP/UDP)
- Node: CPU, memory, cgroup usage, PLEG relist latency, kubelet runtime operations errors
  - nodeUsageCollector - Node resource metrics (CPU, memory, cgroup)
  - plegRelistCollector - Pod Lifecycle Event Generator latency metrics
  - kubeletRuntimeOperationsErrorsCollector - Runtime operation error rates by type
- OCP: Cluster info, API stats, alerts
- OVNK: OVN database, kubelet CNI, latency, OVS usage
- Pods: Pod and container metrics
- Disk: Disk I/O performance
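To make the collector idea concrete, here is a minimal sketch of how a P99 latency query for a Prometheus histogram could be assembled and turned into an instant-query URL. The helper names, the 5m rate window, and the base URL are assumptions for illustration; the project's collectors may build their queries differently. The `/api/v1/query` path is the standard Prometheus HTTP API endpoint for instant queries.

```python
from urllib.parse import urlencode


def p99_query(bucket_metric: str, window: str = "5m") -> str:
    """Build a PromQL expression for the 99th percentile of a histogram."""
    return f"histogram_quantile(0.99, rate({bucket_metric}[{window}]))"


def instant_query_url(base_url: str, promql: str) -> str:
    """Build a Prometheus instant-query URL (nothing is sent here)."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


if __name__ == "__main__":
    query = p99_query("etcd_disk_wal_fsync_duration_seconds_bucket")
    print(query)
    # The base URL below is a placeholder; use your cluster's Prometheus/Thanos route.
    print(instant_query_url("http://localhost:9090", query))
```

The histogram reports seconds, so a result would need to be multiplied by 1000 before comparing it with millisecond thresholds such as the 10 ms WAL fsync target.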
3. Analysis Modules
Performance analysis and reporting:
- Deep Drive Analysis: Multi-subsystem comprehensive review
- Bottleneck Detection: Automated issue identification
- Performance Reports: Executive summaries with recommendations
- Baseline Comparison: Current vs. target performance
- Root Cause Analysis: Script-based + AI-powered RCA
4. ELT Pipeline
Extract-Load-Transform for data processing:
- Generic Orchestrator: Routes data to metric-specific handlers
- Metric Handlers: Specialized ELT modules per metric type
- HTML Generation: Formatted tables with highlighting
- Data Transformation: JSON to structured DataFrames
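The routing idea behind the generic orchestrator can be sketched with a small handler registry. This is an illustration only: the payload shape, handler name, and highlight rule are hypothetical, and plain dictionaries stand in for the DataFrames the project uses so that the example needs nothing beyond the standard library.

```python
from html import escape
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]
Handler = Callable[[dict], List[Row]]

HANDLERS: Dict[str, Handler] = {}


def register(metric_type: str) -> Callable[[Handler], Handler]:
    """Decorator that registers a handler for one metric type."""
    def decorator(func: Handler) -> Handler:
        HANDLERS[metric_type] = func
        return func
    return decorator


@register("wal_fsync")
def wal_fsync_rows(payload: dict) -> List[Row]:
    # Hypothetical payload shape: {"pods": {"<pod>": {"p99_ms": <float>}}}
    return [{"pod": pod, "p99_ms": stats["p99_ms"]}
            for pod, stats in payload["pods"].items()]


def to_html_table(rows: List[Row], column: str, limit: float) -> str:
    """Render rows as an HTML table, marking rows whose column exceeds limit."""
    if not rows:
        return "<table></table>"
    columns = list(rows[0])
    head = "".join(f"<th>{escape(name)}</th>" for name in columns)
    body = []
    for row in rows:
        css = ' class="highlight"' if float(row[column]) > limit else ""
        cells = "".join(f"<td>{escape(str(row[name]))}</td>" for name in columns)
        body.append(f"<tr{css}>{cells}</tr>")
    return f"<table><tr>{head}</tr>{''.join(body)}</table>"


def run_elt(metric_type: str, payload: dict) -> str:
    """Route a JSON payload to its handler and render the result."""
    handler = HANDLERS.get(metric_type)
    if handler is None:
        raise KeyError(f"no ELT handler registered for {metric_type!r}")
    return to_html_table(handler(payload), column="p99_ms", limit=10.0)


if __name__ == "__main__":
    sample = {"pods": {"etcd-0": {"p99_ms": 4.2}, "etcd-1": {"p99_ms": 12.0}}}
    print(run_elt("wal_fsync", sample))
```

A registry keeps the orchestrator unaware of individual metrics: adding a metric type means registering one more handler rather than editing the routing code.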
5. Storage Layer
DuckDB-based persistent storage:
- Time-Series Data: Efficient temporal data storage
- Schema Management: Automatic table creation and migration
- Query Interface: SQL-based data access
- Historical Analysis: Long-term performance tracking
6. AI Agents
LangGraph-based intelligent agents:
- Chat Agent: Conversational interface with tool execution
- Report Agent: Automated performance report generation
- Storage Agent: Data collection and persistence
Configuration
Metrics Configuration
Metrics are defined in YAML files under config/:
- metrics-etcd.yml - 51 ETCD metrics across 5 categories
- metrics-net.yml - 95 network metrics across 9 categories
- metrics-api.yml - 15 API server metrics
- metrics-disk.yml - 8 disk I/O metrics
- metrics-node.yml - 5 node metrics
- metrics-ovn.yml - 2 OVN metrics
- metrics-ovs.yml - 18 OVS metrics
- metrics-pods.yml - 6 pod metrics
- metrics-cni.yml - 18 CNI metrics
- metrics-latency.yml - 18 latency metrics
- metrics-alert.yml - Alert metrics
See config/README.md for detailed configuration documentation.
Environment Variables
# Required
export KUBECONFIG=/path/to/kubeconfig
# Optional - automatically set to UTC
export TZ=UTC
# LLM Configuration
export OPENAI_API_KEY=your-api-key
export BASE_URL=https://api.openai.com/v1
# MCP Inspector (optional)
export ENABLE_MCP_INSPECTOR=1
export MCP_INSPECTOR_URL=http://127.0.0.1:8000/sse
# Logging
export LOG_LEVEL=INFO
export OVNK_LOG_LEVEL=INFO
Performance Thresholds
Default thresholds (configurable in analysis modules):
thresholds = {
    'wal_fsync_p99_ms': 10.0,              # Critical for write performance
    'backend_commit_p99_ms': 25.0,         # Critical for persistence
    'cpu_usage_warning': 70.0,             # Pod CPU warning
    'cpu_usage_critical': 85.0,            # Pod CPU critical
    'memory_usage_warning': 70.0,          # Pod memory warning
    'memory_usage_critical': 85.0,         # Pod memory critical
    'peer_latency_warning_ms': 50.0,       # Network warning
    'peer_latency_critical_ms': 100.0,     # Network critical
    'network_utilization_warning': 70.0,   # Network utilization warning
    'network_utilization_critical': 85.0,  # Network utilization critical
}
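As a rough illustration of how such thresholds might be applied, the sketch below checks a measured value against a warning/critical pair and against a single latency target. The helper names and returned labels are assumptions made for this example, and treating a value equal to a threshold as having reached it is also an assumption; the analysis modules may behave differently.

```python
# Hypothetical helpers; names and labels are illustrative, not the project's API.
THRESHOLDS = {
    "wal_fsync_p99_ms": 10.0,
    "backend_commit_p99_ms": 25.0,
    "cpu_usage_warning": 70.0,
    "cpu_usage_critical": 85.0,
}


def classify_usage(value: float, warning: float, critical: float) -> str:
    """Return 'ok', 'warning', or 'critical' for a percentage-style metric."""
    if value >= critical:
        return "critical"
    if value >= warning:
        return "warning"
    return "ok"


def exceeds_latency_target(metric: str, p99_ms: float) -> bool:
    """True when a P99 latency (in ms) is above its single-valued target."""
    return p99_ms > THRESHOLDS[metric]


if __name__ == "__main__":
    print(classify_usage(78.0,
                         THRESHOLDS["cpu_usage_warning"],
                         THRESHOLDS["cpu_usage_critical"]))
    print(exceeds_latency_target("wal_fsync_p99_ms", 12.5))
```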
Usage Examples
Example 1: ETCD Performance Analysis
# Start ETCD analyzer
cd mcp/etcd
./etcd_analyzer_command.sh start
# In web UI, ask:
"Analyze etcd performance for the last hour"
"Show me WAL fsync performance"
"Generate a performance report for the last 24 hours"
Example 2: Network Analysis
# Start network analyzer
cd mcp/net
python network_analyzer_mcp_server.py
# Query network metrics
curl -X POST http://localhost:8000/tools/query_network_io_metrics \
-H "Content-Type: application/json" \
-d '{"duration": "1h"}'
Example 3: OVN-Kubernetes Analysis
# Start OVN-Kubernetes analyzer
cd mcp/ovnk
python ovnk_analyzer_mcp_server.py
# Query OVN metrics
curl -X POST http://localhost:8000/tools/query_ovnk_pod_metrics \
-H "Content-Type: application/json" \
-d '{"duration": "1h"}'
Example 4: Performance Report Generation
# Using ETCD report agent
cd mcp/etcd
python etcd_analyzer_mcp_agent_report.py
# Follow prompts:
# 1. Select duration mode or time range mode
# 2. Enter duration (e.g., "1h") or time range
# 3. View streaming analysis and report
Example 5: Data Storage
# Using ETCD storage agent
cd mcp/etcd
python etcd_analyzer_mcp_agent_stor2db.py
# Data stored in etcd_analyzer_cluster.duckdb
# Query stored data:
python -c "
import duckdb
conn = duckdb.connect('etcd_analyzer_cluster.duckdb')
result = conn.execute('SELECT * FROM wal_fsync_p99_latency LIMIT 10').fetchall()
print(result)
"
API Reference
MCP Server Endpoints
All MCP servers expose tools via HTTP/SSE:
- Base URL: http://localhost:8000
- Health Check: GET /health
- Tools: POST /tools/{tool_name}
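For callers who prefer Python to curl, a tool invocation can be sketched with the standard library alone. The helper name `build_tool_request` is hypothetical; the URL pattern and the `{"duration": "1h"}` payload follow the curl examples in this README. The request is only constructed here, because sending it requires a running MCP server.

```python
import json
import urllib.request


def build_tool_request(tool_name: str, arguments: dict,
                       base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build (but do not send) a POST request for /tools/{tool_name}."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/tools/{tool_name}",
        data=json.dumps(arguments).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    request = build_tool_request("query_network_io_metrics", {"duration": "1h"})
    print(request.get_method(), request.full_url)
    # With a server running, the request could be sent like this:
    # with urllib.request.urlopen(request, timeout=30) as response:
    #     print(json.load(response))
```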
Chat Client Endpoints
AI chat clients expose REST APIs:
- Base URL: http://localhost:8080
- Web UI: GET /ui or GET /
- Streaming Chat: POST /chat/stream
- Non-streaming Chat: POST /chat
- Health: GET /api/mcp/health
- Tools List: GET /api/tools
Tool Parameters
Common parameters across tools:
- duration (str): Time duration (e.g., "5m", "1h", "24h")
- start_time (str, optional): Start time in ISO format
- end_time (str, optional): End time in ISO format
See individual component READMEs for detailed API documentation:
- mcp/etcd/README.md - ETCD analyzer API
- mcp/ovnk/README.md - OVN-Kubernetes analyzer API
- config/README.md - Configuration API
- elt/utils/README.md - ELT pipeline API
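The common duration, start_time, and end_time parameters can be understood as two ways of naming a time window. The following sketch shows one possible interpretation; the function names are hypothetical, support for the "s" and "d" units goes beyond the documented examples, and giving an explicit start/end pair precedence over duration is an assumption rather than documented behavior.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple

_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days"}


def parse_duration(text: str) -> timedelta:
    """Parse strings such as '5m', '1h', or '24h' into a timedelta."""
    match = re.fullmatch(r"(\d+)([smhd])", text.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    amount, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(amount)})


def resolve_range(duration: str = "1h",
                  start_time: Optional[str] = None,
                  end_time: Optional[str] = None,
                  now: Optional[datetime] = None) -> Tuple[datetime, datetime]:
    """Return (start, end); an explicit ISO range wins over duration."""
    if start_time and end_time:
        return datetime.fromisoformat(start_time), datetime.fromisoformat(end_time)
    end = now or datetime.now(timezone.utc)
    return end - parse_duration(duration), end


if __name__ == "__main__":
    print(resolve_range("24h"))
    print(resolve_range(start_time="2024-01-01T00:00:00+00:00",
                        end_time="2024-01-01T06:00:00+00:00"))
```

Note that `datetime.fromisoformat` only accepts a trailing "Z" from Python 3.11 onward, so the example uses an explicit "+00:00" offset.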
Troubleshooting
Common Issues
1. MCP Server Won't Start
Solutions:
# Check KUBECONFIG
echo $KUBECONFIG
kubectl get nodes
# Check if port 8000 is in use
lsof -i :8000
# Check logs
tail -f logs/mcp_server_*.log
2. Authentication Failures
Solutions:
# Verify KUBECONFIG
export KUBECONFIG=/path/to/kubeconfig
kubectl auth can-i get pods -n openshift-etcd
# Check Prometheus access
kubectl get route -n openshift-monitoring
3. Missing Metrics
Solutions:
# Verify Prometheus is accessible
oc get pods -n openshift-monitoring | grep prometheus
# Check metric availability
oc exec -n openshift-monitoring prometheus-k8s-0 -- \
promtool query instant http://localhost:9090 \
'etcd_disk_wal_fsync_duration_seconds_bucket'
4. LLM API Errors
Solutions:
# Check .env file
cat .env | grep OPENAI_API_KEY
# Test API connection
curl -H "Authorization: Bearer $OPENAI_API_KEY" \
$BASE_URL/models
Debug Mode
Enable verbose logging:
export LOG_LEVEL=DEBUG
export OVNK_LOG_LEVEL=DEBUG
python mcp/etcd/etcd_analyzer_mcp_server.py
Contributing
Development Setup
# Clone repository
git clone https://github.com/liqcui/ocp-performance-analyzer-mcp.git
cd ocp-performance-analyzer-mcp
# Create development environment
python3 -m venv venv
source venv/bin/activate
# Install in development mode
pip install -e .
# Install development dependencies
pip install pytest pytest-asyncio black flake8 mypy
Code Style
# Format code
black .
# Lint code
flake8 .
# Type checking
mypy .
Adding New Metrics
- Define metric in appropriate config/metrics-*.yml file
- Add collector in tools/{category}/ directory
- Add ELT handler in elt/{category}/ directory
- Add storage module in storage/{category}/ directory
- Register tool in MCP server
- Update documentation
Testing
# Run tests
pytest
# Run with coverage
pytest --cov=. --cov-report=html
License
MIT License - see LICENSE file for details.
Support
For issues and questions:
- Check the troubleshooting section
- Review component-specific READMEs
- Check logs in logs/ directories
- Open an issue with detailed logs and configuration
Acknowledgments
- MCP Protocol: Model Context Protocol
- LangChain: LangChain Framework
- LangGraph: LangGraph
- FastMCP: FastMCP Library
- DuckDB: DuckDB
- OpenShift: Red Hat OpenShift
Roadmap
Planned Features
- [ ] Multi-cluster support
- [ ] Historical trend analysis
- [ ] Anomaly detection with ML
- [ ] Custom alert rules
- [ ] Grafana integration
- [ ] Slack/Teams notifications
- [ ] Performance prediction
- [ ] Automated remediation suggestions
- [ ] Kubernetes native deployment (Helm charts)
- [ ] Real-time streaming metrics
Built with ❤️ for the OpenShift and Kubernetes community