K8s Doctor MCP

An MCP server for AI-powered Kubernetes diagnostics. It analyzes pod crashes, logs, and cluster health to provide root cause analysis and actionable solutions for common issues such as CrashLoopBackOff, OOM kills, and connection errors.

🏥 K8s Doctor MCP

AI-powered Kubernetes cluster diagnostics and intelligent debugging recommendations

English | 한국어

Why K8s Doctor?

When a Kubernetes issue strikes, developers typically run through an endless loop of:

  • kubectl get pods
  • kubectl logs
  • kubectl describe
  • Frantically searching Stack Overflow...

K8s Doctor changes the game. It's not just a kubectl wrapper - it's an AI-powered diagnostic tool that:

  • 🔍 Analyzes root causes - Goes beyond simple status checks
  • 🧠 Detects error patterns - Recognizes common issues (Connection Refused, OOM, DNS failures)
  • 💡 Provides actionable solutions - Gives you exact kubectl commands to fix problems
  • 📊 Decodes exit codes - Explains what exit codes 137, 143, and 1 actually mean
  • 🎯 Matches log patterns - Finds the signal in thousands of log lines
  • 🏥 Scores health - Rates your pod/cluster health from 0 to 100
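
To make the exit code point concrete, here is a minimal sketch of how codes such as 137, 143, and 1 can be decoded. It relies on the common convention that a code above 128 means "terminated by signal (code - 128)". The `explainExitCode` helper, the `ExitCodeInfo` type, and the wording of the messages are hypothetical and are not taken from the K8s Doctor source.

```typescript
// Hypothetical helper for illustration; not the tool's actual implementation.
interface ExitCodeInfo {
  code: number;
  signal?: string; // set only when the code follows the 128 + signal convention
  meaning: string;
}

// Linux signal numbers for the two signals seen most often in pod crashes.
const SIGNALS: Record<number, string> = {
  9: "SIGKILL",
  15: "SIGTERM",
};

function explainExitCode(code: number): ExitCodeInfo {
  if (code === 0) {
    return { code, meaning: "Container exited successfully" };
  }
  if (code > 128) {
    // 137 = 128 + 9 (SIGKILL), 143 = 128 + 15 (SIGTERM)
    const signalNumber = code - 128;
    const signal = SIGNALS[signalNumber] ?? `signal ${signalNumber}`;
    let meaning = `Terminated by ${signal}`;
    if (signal === "SIGKILL") {
      meaning = "Killed with SIGKILL; often the OOM killer, so check for an OOMKilled reason";
    } else if (signal === "SIGTERM") {
      meaning = "Terminated with SIGTERM; usually a graceful shutdown request";
    }
    return { code, signal, meaning };
  }
  // Codes 1-128 are chosen by the application itself.
  return { code, meaning: "Application error; inspect the container logs" };
}
```

Calling `explainExitCode(137)` yields an object whose `signal` is `"SIGKILL"`. Exit code 137 alone does not prove an OOM kill, which is why the message suggests confirming against the container's termination reason.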

Features

| Tool | Description |
|------|-------------|
| `diagnose-pod` | Comprehensive pod diagnostics - analyzes status, events, resources, and provides health score |
| `debug-crashloop` | CrashLoopBackOff specialist - decodes exit codes, analyzes logs, finds root cause |
| `analyze-logs` | Smart log analysis - detects error patterns, suggests fixes for common issues |
| `check-resources` | Resource usage - validates CPU/Memory limits, warns about OOM risks |
| `full-diagnosis` | Cluster health check - scans all nodes and pods for issues |
| `check-events` | Event analysis - filters and analyzes Warning events |
| `list-namespaces` | Namespace listing - quick overview of all namespaces |
| `list-pods` | Pod listing - shows problematic pods with status indicators |

Installation

Via npm (recommended)

npm install -g @zerry_jin/k8s-doctor-mcp

From source

git clone https://github.com/ongjin/k8s-doctor-mcp.git
cd k8s-doctor-mcp
npm install && npm run build

Setup with Claude Code

# After npm global install
claude mcp add --scope project k8s-doctor -- k8s-doctor-mcp

# Or from source build
claude mcp add --scope project k8s-doctor -- node /path/to/k8s-doctor-mcp/dist/index.js

Quick Setup (Auto-approve Tools)

Tired of manually approving tool execution every time? Follow these steps to enable auto-approval.

🖥️ For Claude Desktop App Users

  1. Restart the Claude Desktop App.
  2. Ask your first question using k8s-doctor.
  3. When the permission dialog appears, check the box "Always allow requests from this server" and click Allow. (Future requests will execute automatically without prompts.)

⌨️ For Claude Code (CLI) Users

If you are using the claude terminal command, manage permissions via the interactive menu:

  1. Run claude in your terminal.
  2. Type /permissions in the prompt and press Enter.
  3. Select Global Permissions (or Project Permissions) > Allowed Tools.
  4. Enter mcp__k8s-doctor__* to allow all tools, or add specific tools individually.

💡 Tip: For most use cases, allowing diagnose-pod, debug-crashloop, and analyze-logs is sufficient. These three cover 90% of debugging scenarios.

Recommended configuration:

# Balanced approach - allow main diagnostic tools
claude config add allowedTools \
  "mcp__k8s-doctor__diagnose-pod" \
  "mcp__k8s-doctor__debug-crashloop" \
  "mcp__k8s-doctor__analyze-logs" \
  "mcp__k8s-doctor__full-diagnosis"

Prerequisites

  • kubectl configured and working (kubectl cluster-info should succeed)
  • kubeconfig file in default location (~/.kube/config) or KUBECONFIG env var set
  • Node.js 18 or higher
  • Access to a Kubernetes cluster (local like minikube/kind, or remote)
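
The kubeconfig prerequisite follows the same lookup order kubectl uses: the `KUBECONFIG` environment variable first, then `~/.kube/config`. The sketch below shows that order in code. The `kubeconfigCandidates` function is a hypothetical helper for illustration and is not part of K8s Doctor's documented API.

```typescript
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical helper: returns the kubeconfig paths to try, in lookup order.
function kubeconfigCandidates(
  env: Record<string, string | undefined> = process.env,
  homeDir: string = os.homedir(),
): string[] {
  const fromEnv = env["KUBECONFIG"];
  if (fromEnv !== undefined && fromEnv.trim() !== "") {
    // KUBECONFIG may list several files joined by the platform path delimiter
    // (":" on Linux/macOS, ";" on Windows).
    return fromEnv.split(path.delimiter).filter((entry) => entry.length > 0);
  }
  // Default location when KUBECONFIG is unset or empty.
  return [path.join(homeDir, ".kube", "config")];
}
```

If none of the returned paths exists, that is the situation the "kubeconfig not found" troubleshooting entry addresses.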

Usage Examples

Example 1: Diagnose a CrashLooping Pod

You: "My pod 'api-server' in namespace 'production' is CrashLooping. What's wrong?"

Claude (using k8s-doctor):
🔍 CrashLoopBackOff Diagnosis

Exit Code: 137 (OOM Killed)
Root Cause: Container was killed due to Out Of Memory

Solution:
Increase memory limit:
```yaml
resources:
  limits:
    memory: "512Mi"  # Increase from current value
```

Relevant logs:

  • Line 1234: Error: JavaScript heap out of memory
  • Line 1256: FATAL ERROR: Reached heap limit
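
The memory limit in this example is a Kubernetes quantity string such as `512Mi`. As a rough sketch of what comparing such values involves, the code below converts a quantity to bytes and computes a usage-to-limit ratio. `parseMemoryQuantity` and `memoryPressure` are hypothetical names, only the most common suffixes are handled, and where the usage figure comes from is left open.

```typescript
// Multipliers for the most common Kubernetes quantity suffixes.
// Binary suffixes (Ki, Mi, ...) are powers of 1024; decimal ones (k, M, ...) are powers of 1000.
const MULTIPLIERS: Record<string, number> = {
  Ki: 2 ** 10, Mi: 2 ** 20, Gi: 2 ** 30, Ti: 2 ** 40,
  k: 1e3, M: 1e6, G: 1e9, T: 1e12,
};

// Hypothetical helper: converts a quantity such as "512Mi" to bytes.
function parseMemoryQuantity(quantity: string): number {
  const match = /^(\d+(?:\.\d+)?)(Ki|Mi|Gi|Ti|k|M|G|T)?$/.exec(quantity.trim());
  if (match === null) {
    throw new Error(`Unsupported memory quantity: ${quantity}`);
  }
  const value = Number(match[1]);
  const suffix: string | undefined = match[2];
  // A bare number is already in bytes.
  const multiplier = suffix === undefined ? 1 : MULTIPLIERS[suffix] ?? 1;
  return value * multiplier;
}

// Hypothetical helper: fraction of the limit in use (1.0 means at the limit).
function memoryPressure(usage: string, limit: string): number {
  return parseMemoryQuantity(usage) / parseMemoryQuantity(limit);
}
```

A ratio close to 1.0 suggests the container is at risk of being OOM killed, which is the case the suggested limit increase addresses.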

### Example 2: Analyze Application Logs

You: "Analyze logs for pod 'backend-worker' and tell me what's failing"

Claude (using analyze-logs):
📝 Log Analysis

Detected Error Patterns:

🔴 Database Connection Error (15 occurrences)
Possible Causes:

  • DB service not ready
  • Wrong connection string
  • Authentication failed

Solutions:

  • Check DB pod status
  • Verify environment variables (ConfigMap/Secret)
  • Check service endpoints: kubectl get endpoints

🟡 Timeout (8 occurrences)
Likely cause: Response time too slow or network delay
Solution: Increase timeout values or optimize service performance


### Example 3: Cluster Health Check

You: "Check overall cluster health"

Claude (using full-diagnosis):
🏥 Cluster Health Diagnosis

Overall Score: 72/100 💛

Nodes: 3/3 Ready ✅
Pods: 45/52 Running

  • CrashLoop: 2 🔥
  • Pending: 5 ⏳

Critical Issues:
🔴 Pod "payment-service" CrashLooping (exit 1)
🔴 Pod "worker-3" OOM Killed

Recommendations:

  • Fix 2 CrashLoop pods immediately
  • Check if pending pods lack resources

## How It Works

1. **Connects to your cluster** via kubeconfig (same as kubectl)
2. **Gathers comprehensive data** - pod status, events, logs, resource usage
3. **Applies pattern matching** - recognizes common error patterns from production experience
4. **Analyzes root causes** - doesn't just show status, explains WHY it's failing
5. **Provides solutions** - gives exact commands and YAML to fix issues

## Error Patterns Detected

K8s Doctor recognizes these common patterns:

- 🔴 **Connection Refused** - Service not ready, wrong port, network policy
- 🔴 **Database Connection Errors** - DB auth, wrong connection strings
- 🔴 **Out of Memory** - OOM kills, memory leaks, undersized limits
- 🟠 **File Not Found** - ConfigMap not mounted, wrong paths
- 🟠 **Permission Denied** - SecurityContext issues, fsGroup problems
- 🟠 **DNS Resolution Failed** - CoreDNS issues, wrong service names
- 🟡 **Port Already in Use** - Multiple processes on same port
- 🟡 **Timeout** - Slow responses, network delays
- 🟡 **SSL/TLS Errors** - Expired certs, missing CA bundles
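
As an illustration of how this kind of detection can work, here is a minimal sketch that scans log text line by line against a small table of regular expressions and counts hits per pattern. The regexes, the severity names, and the `analyzeLogs` function are assumptions made for this example. The actual patterns in `log-analyzer.ts` are not shown in this README and may differ.

```typescript
// Severity names are an assumption mapping to the red/orange/yellow markers above.
type Severity = "critical" | "high" | "medium";

interface ErrorPattern {
  name: string;
  severity: Severity;
  regex: RegExp;
}

interface PatternHit {
  name: string;
  severity: Severity;
  count: number;
  firstLine: number; // 1-based line number of the first match
}

// Illustrative regexes only; a real pattern table would be larger.
const PATTERNS: ErrorPattern[] = [
  { name: "Connection Refused", severity: "critical", regex: /connection refused|ECONNREFUSED/i },
  { name: "Out of Memory", severity: "critical", regex: /out of memory|OOMKilled/i },
  { name: "DNS Resolution Failed", severity: "high", regex: /ENOTFOUND|no such host/i },
  { name: "Timeout", severity: "medium", regex: /timed out|ETIMEDOUT/i },
];

function analyzeLogs(logText: string): PatternHit[] {
  const hits = new Map<string, PatternHit>();
  logText.split(/\r?\n/).forEach((line, index) => {
    for (const pattern of PATTERNS) {
      if (!pattern.regex.test(line)) continue;
      const existing = hits.get(pattern.name);
      if (existing) {
        existing.count += 1;
      } else {
        hits.set(pattern.name, {
          name: pattern.name,
          severity: pattern.severity,
          count: 1,
          firstLine: index + 1,
        });
      }
    }
  });
  // Most frequent patterns first.
  return Array.from(hits.values()).sort((a, b) => b.count - a.count);
}
```

The regexes deliberately omit the `g` flag so that `test` carries no state between lines. Recording the first matching line number makes it possible to point back at the relevant log lines, as the example outputs earlier in this README do.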

## Architecture

k8s-doctor-mcp/
├── src/
│   ├── index.ts                 # MCP server with all tools
│   ├── types.ts                 # TypeScript type definitions
│   ├── diagnostics/
│   │   ├── pod-diagnostics.ts   # Pod health analysis
│   │   └── cluster-health.ts    # Cluster-wide diagnostics
│   ├── analyzers/
│   │   └── log-analyzer.ts      # Smart log pattern matching
│   └── utils/
│       ├── k8s-client.ts        # Kubernetes API client
│       └── formatters.ts        # Output formatting utilities
└── package.json


## Security Considerations

- K8s Doctor uses **read-only** Kubernetes API calls (`get` and `list`)
- Requires same permissions as `kubectl get/describe/logs`
- Never modifies cluster state
- kubeconfig credentials stay local
- No data sent to external servers

## Troubleshooting

### "kubeconfig not found"
```bash
# Verify kubectl works
kubectl cluster-info

# Check kubeconfig location
echo $KUBECONFIG

# Test with explicit path
export KUBECONFIG=~/.kube/config
```

"Permission denied"

# Check your cluster permissions
kubectl auth can-i get pods --all-namespaces

# You need at least read access to:
# - pods, events, namespaces, nodes

"Connection refused to cluster"

# Verify cluster connectivity
kubectl get nodes

# For local clusters (minikube/kind)
minikube status
kind get clusters

Development

# Clone and install
git clone https://github.com/ongjin/k8s-doctor-mcp.git
cd k8s-doctor-mcp
npm install

# Development mode
npm run dev

# Build
npm run build

# Test with Claude Code
npm run build
claude mcp add --scope project k8s-doctor-dev -- node "$(pwd)/dist/index.js"

Contributing

Contributions welcome! Especially:

  • 🆕 New error pattern detections
  • 🌍 Internationalization (more languages)
  • 📊 Metrics integration (Prometheus, etc.)
  • 🧪 Test coverage
  • 📖 Documentation improvements

Roadmap

  • [ ] Metrics Server integration (real-time CPU/Memory usage)
  • [ ] Network policy diagnostics
  • [ ] Storage/PVC troubleshooting
  • [ ] Helm chart analysis
  • [ ] Multi-cluster support
  • [ ] Interactive debugging mode
  • [ ] Export reports (PDF, HTML)

License

MIT © zerry

Acknowledgments

Built with:

Star History

If this tool saves you debugging time, please ⭐ star the repo!

Author

zerry

  • GitHub: @zerry
  • Created for the DevOps community, who are tired of kubectl hell 😅

Made with ❤️ for Kubernetes users drowning in logs
