# K8s Doctor MCP

AI-powered Kubernetes diagnostics that analyze pod crashes, logs, and cluster health to provide root cause analysis and actionable solutions for common issues like CrashLoopBackOff, OOM kills, and connection errors.

AI-powered Kubernetes cluster diagnostics and intelligent debugging recommendations

## Why K8s Doctor?

When a Kubernetes issue strikes, developers typically run through an endless loop of:

- `kubectl get pods`
- `kubectl logs`
- `kubectl describe`
- Frantically searching Stack Overflow...

K8s Doctor changes the game. It's not just a kubectl wrapper; it's an AI-powered diagnostic tool that:
- **Analyzes root causes** - Goes beyond simple status checks
- **Detects error patterns** - Recognizes common issues (Connection Refused, OOM, DNS failures)
- **Provides actionable solutions** - Gives you the exact kubectl commands to fix problems
- **Exit code analysis** - Explains what exit codes 137, 143, and 1 actually mean
- **Log pattern matching** - Finds the signal in thousands of log lines
- **Health scoring** - Rates your pod/cluster health on a 0-100 scale
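
Exit-code analysis rests on a widely used convention that shells and container runtimes follow: when a process is killed by a signal, its exit code is reported as 128 plus the signal number. So 137 is 128 + 9 (SIGKILL, which the kernel OOM killer sends) and 143 is 128 + 15 (SIGTERM). The TypeScript below is a minimal sketch of that decoding, not K8s Doctor's actual implementation; `decodeExitCode`, `ExitCodeInfo`, and the message wording are hypothetical.

```typescript
// Hypothetical sketch of exit-code decoding; not K8s Doctor's actual implementation.
interface ExitCodeInfo {
  code: number;
  signal?: string; // present only when the code encodes a terminating signal
  meaning: string;
}

// A few common Linux signal numbers.
const SIGNALS: Record<number, string | undefined> = {
  6: "SIGABRT",
  9: "SIGKILL",
  11: "SIGSEGV",
  15: "SIGTERM",
};

function decodeExitCode(code: number): ExitCodeInfo {
  if (code === 0) {
    return { code, meaning: "Exited successfully" };
  }
  if (code > 128 && code <= 128 + 64) {
    // By convention, death by signal N is reported as exit code 128 + N.
    const signalNumber = code - 128;
    const signal = SIGNALS[signalNumber] ?? `signal ${signalNumber}`;
    let meaning = `Terminated by ${signal}`;
    if (signalNumber === 9) {
      meaning += " (often the OOM killer, or a kill after the termination grace period)";
    } else if (signalNumber === 15) {
      meaning += " (graceful shutdown request, e.g. pod deletion or rollout)";
    }
    return { code, signal, meaning };
  }
  if (code === 1) {
    return { code, meaning: "Generic application error; check the container logs" };
  }
  return { code, meaning: "Application-specific exit code; check the container logs" };
}
```

For example, `decodeExitCode(137).signal` is `"SIGKILL"`, while `decodeExitCode(1)` carries no signal because exit code 1 is set by the application itself.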

## Features

| Tool | Description |
|---|---|
| `diagnose-pod` | Comprehensive pod diagnostics - analyzes status, events, resources, and provides health score |
| `debug-crashloop` | CrashLoopBackOff specialist - decodes exit codes, analyzes logs, finds root cause |
| `analyze-logs` | Smart log analysis - detects error patterns, suggests fixes for common issues |
| `check-resources` | Resource usage - validates CPU/Memory limits, warns about OOM risks |
| `full-diagnosis` | Cluster health check - scans all nodes and pods for issues |
| `check-events` | Event analysis - filters and analyzes Warning events |
| `list-namespaces` | Namespace listing - quick overview of all namespaces |
| `list-pods` | Pod listing - shows problematic pods with status indicators |
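
`check-resources` reasons about memory limits, which Kubernetes writes as quantities with binary suffixes (`Ki`, `Mi`, `Gi`, powers of 1024) or decimal suffixes (`k`, `M`, `G`, powers of 1000). Here is a minimal sketch of parsing such a quantity and classifying OOM risk, assuming you already have a usage figure in bytes from some metrics source. `parseMemory`, `memoryRisk`, and the 80%/95% thresholds are hypothetical, not the tool's actual API, and only the common suffix forms are handled (no exponent notation such as `129e6`).

```typescript
// Hypothetical sketch: parse Kubernetes memory quantities and classify OOM risk.
// Supports plain bytes plus the common binary (Ki..Ti) and decimal (k..T) suffixes.
const MULTIPLIERS: Record<string, number | undefined> = {
  Ki: 2 ** 10,
  Mi: 2 ** 20,
  Gi: 2 ** 30,
  Ti: 2 ** 40,
  k: 1e3,
  M: 1e6,
  G: 1e9,
  T: 1e12,
};

function parseMemory(quantity: string): number {
  const match = /^(\d+(?:\.\d+)?)([KMGT]i|[kMGT])?$/.exec(quantity.trim());
  if (match === null) {
    throw new Error(`Unsupported memory quantity: ${quantity}`);
  }
  const value = Number(match[1]);
  const suffix = match[2];
  // No suffix means the value is already in bytes.
  return suffix === undefined ? value : value * (MULTIPLIERS[suffix] ?? 1);
}

type MemoryRisk = "ok" | "warning" | "critical" | "no-limit";

// usageBytes must come from a metrics source (an assumption, e.g. Metrics Server).
function memoryRisk(
  usageBytes: number,
  limit: string | undefined,
  warnAt = 0.8,
  criticalAt = 0.95,
): MemoryRisk {
  if (limit === undefined) return "no-limit"; // unbounded containers are flagged separately
  const ratio = usageBytes / parseMemory(limit);
  if (ratio >= criticalAt) return "critical";
  if (ratio >= warnAt) return "warning";
  return "ok";
}
```

For instance, `parseMemory("512Mi")` is 536870912 bytes, so a container using 500Mi against that limit is classified as `"critical"`.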

## Installation

### Via npm (recommended)

```bash
npm install -g @zerry_jin/k8s-doctor-mcp
```

### From source

```bash
git clone https://github.com/ongjin/k8s-doctor-mcp.git
cd k8s-doctor-mcp
npm install && npm run build
```

## Setup with Claude Code

```bash
# After npm global install
claude mcp add --scope project k8s-doctor -- k8s-doctor-mcp

# Or from source build
claude mcp add --scope project k8s-doctor -- node /path/to/k8s-doctor-mcp/dist/index.js
```

## Quick Setup (Auto-approve Tools)

Tired of manually approving every tool execution? Follow these steps to enable auto-approval.

### 🖥️ For Claude Desktop App Users

1. Restart the Claude Desktop App.
2. Ask your first question using `k8s-doctor`.
3. When the permission dialog appears, check **Always allow requests from this server** and click **Allow**. Future requests will then run automatically without prompts.

### ⌨️ For Claude Code (CLI) Users

If you use the `claude` terminal command, manage permissions through the interactive menu:

1. Run `claude` in your terminal.
2. Type `/permissions` at the prompt and press Enter.
3. Select **Global Permissions** (or **Project Permissions**) > **Allowed Tools**.
4. Enter `mcp__k8s-doctor__*` to allow all tools, or add specific tools individually.

💡 **Tip:** For most use cases, allowing `diagnose-pod`, `debug-crashloop`, and `analyze-logs` is sufficient. These three cover 90% of debugging scenarios.

Recommended configuration:

```bash
# Balanced approach - allow main diagnostic tools
claude config add allowedTools \
  "mcp__k8s-doctor__diagnose-pod" \
  "mcp__k8s-doctor__debug-crashloop" \
  "mcp__k8s-doctor__analyze-logs" \
  "mcp__k8s-doctor__full-diagnosis"
```

## Prerequisites

- kubectl configured and working (`kubectl cluster-info` should succeed)
- A kubeconfig file in the default location (`~/.kube/config`) or the `KUBECONFIG` environment variable set
- Node.js 18 or higher
- Access to a Kubernetes cluster (local, such as minikube or kind, or remote)
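
The kubeconfig prerequisite mirrors kubectl's lookup order: the `KUBECONFIG` environment variable is checked first and may list several files separated by `:` (`;` on Windows), otherwise `~/.kube/config` is used. A simplified sketch of that order follows; `kubeconfigPaths` is a hypothetical helper that ignores the `--kubeconfig` flag and treats an empty `KUBECONFIG` as unset. In a Node.js program, `KubeConfig.loadFromDefault()` from `@kubernetes/client-node` performs a similar lookup, so you rarely need to write this yourself.

```typescript
// Simplified sketch of kubectl's kubeconfig lookup (ignores the --kubeconfig flag).
function kubeconfigPaths(
  env: Record<string, string | undefined>,
  homeDir: string,
  platform = "linux",
): string[] {
  const isWindows = platform === "win32";
  const fromEnv = env["KUBECONFIG"];
  // Simplification: an empty KUBECONFIG is treated as unset.
  if (fromEnv !== undefined && fromEnv.trim() !== "") {
    // KUBECONFIG can list several files; kubectl merges them in order.
    return fromEnv.split(isWindows ? ";" : ":").filter((p) => p.length > 0);
  }
  // Otherwise fall back to the default location under the home directory.
  return [[homeDir, ".kube", "config"].join(isWindows ? "\\" : "/")];
}
```

In Node.js you might call it as `kubeconfigPaths(process.env, os.homedir(), process.platform)` to see which files would be read.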

## Usage Examples

### Example 1: Diagnose a CrashLooping Pod

You: "My pod 'api-server' in namespace 'production' is CrashLooping. What's wrong?"

Claude (using k8s-doctor):

**CrashLoopBackOff Diagnosis**

- **Exit Code:** 137 (OOM Killed)
- **Root Cause:** Container was killed due to Out Of Memory

**Solution:** Increase the memory limit:

```yaml
resources:
  limits:
    memory: "512Mi"  # Increase from current value
```

**Relevant logs:**

- Line 1234: Error: JavaScript heap out of memory
- Line 1256: FATAL ERROR: Reached heap limit

### Example 2: Analyze Application Logs

You: "Analyze logs for pod 'backend-worker' and tell me what's failing"

Claude (using analyze-logs):

**Log Analysis**

**Detected Error Patterns:**

🔴 **Database Connection Error** (15 occurrences)

Possible Causes:

- DB service not ready
- Wrong connection string
- Authentication failed

Solutions:

- Check DB pod status
- Verify environment variables (ConfigMap/Secret)
- Check service endpoints: `kubectl get endpoints`

🟡 **Timeout** (8 occurrences)

- Likely cause: Response time too slow or network delay
- Solution: Increase timeout values or optimize service performance

### Example 3: Cluster Health Check

You: "Check overall cluster health"

Claude (using full-diagnosis):

**Cluster Health Diagnosis**

**Overall Score:** 72/100

- **Nodes:** 3/3 Ready
- **Pods:** 45/52 Running
  - CrashLoop: 2
  - Pending: 5 ⏳

**Critical Issues:**

- 🔴 Pod "payment-service" CrashLooping (exit 1)
- 🔴 Pod "worker-3" OOM Killed

**Recommendations:**

- Fix 2 CrashLoop pods immediately
- Check if pending pods lack resources
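
The 0-100 score in the output above condenses node and pod state into a single number. K8s Doctor's actual formula is not documented here, so the following is a hypothetical sketch: `healthScore`, `ClusterSnapshot`, the 40/60 weighting, and the 5-point penalty per CrashLooping pod are all assumptions chosen for illustration.

```typescript
// Hypothetical health score; weights and penalties are illustrative assumptions.
interface ClusterSnapshot {
  nodesReady: number;
  nodesTotal: number;
  podsRunning: number;
  podsTotal: number;
  crashLooping: number;
}

function healthScore(s: ClusterSnapshot): number {
  const nodeRatio = s.nodesTotal === 0 ? 0 : s.nodesReady / s.nodesTotal;
  const podRatio = s.podsTotal === 0 ? 1 : s.podsRunning / s.podsTotal;
  // Weighted blend of node readiness (40 points) and pod health (60 points)...
  const base = 40 * nodeRatio + 60 * podRatio;
  // ...minus a flat penalty for each CrashLooping pod, clamped to 0-100.
  const penalized = base - 5 * s.crashLooping;
  return Math.max(0, Math.min(100, Math.round(penalized)));
}
```

With the Example 3 snapshot (3/3 nodes, 45/52 pods, 2 CrashLooping) this sketch yields 82 rather than 72, which underlines that the weights are placeholders to tune, not the tool's real scoring.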
## How It Works
1. **Connects to your cluster** via kubeconfig (same as kubectl)
2. **Gathers comprehensive data** - pod status, events, logs, resource usage
3. **Applies pattern matching** - recognizes common error patterns from production experience
4. **Analyzes root causes** - doesn't just show status, explains WHY it's failing
5. **Provides solutions** - gives exact commands and YAML to fix issues
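
Steps 2 and 4 can be illustrated with fields Kubernetes already reports in each pod's `status.containerStatuses`: a crash-looping container has `state.waiting.reason` set to `CrashLoopBackOff`, and `lastState.terminated` records why the previous run ended (for example `reason: OOMKilled` together with its `exitCode`). The sketch below hand-writes a minimal subset of that shape and maps it to a root-cause message; `rootCause` and its rules are hypothetical, not K8s Doctor's code.

```typescript
// Minimal hand-written subset of the Kubernetes v1 ContainerStatus shape.
interface TerminatedState {
  exitCode: number;
  reason?: string; // e.g. "OOMKilled", "Error", "Completed"
}

interface ContainerStatus {
  name: string;
  state?: { waiting?: { reason?: string }; terminated?: TerminatedState };
  lastState?: { terminated?: TerminatedState };
}

// Hypothetical root-cause rules; K8s Doctor's real logic may differ.
function rootCause(status: ContainerStatus): string | undefined {
  const waiting = status.state?.waiting?.reason;
  const previous = status.lastState?.terminated;

  if (waiting === "ImagePullBackOff" || waiting === "ErrImagePull") {
    return `${status.name}: image cannot be pulled; check image name, tag, and pull secrets`;
  }
  if (waiting !== "CrashLoopBackOff" || previous === undefined) {
    return undefined; // nothing this sketch knows how to explain
  }
  // The previous run's termination record explains why the container keeps restarting.
  if (previous.reason === "OOMKilled") {
    return `${status.name}: OOM killed (exit ${previous.exitCode}); raise the memory limit or reduce usage`;
  }
  if (previous.exitCode === 1) {
    return `${status.name}: application error (exit 1); read logs with kubectl logs --previous`;
  }
  return `${status.name}: crashed with exit code ${previous.exitCode} (${previous.reason ?? "no reason"})`;
}
```

Reading `lastState` rather than `state` matters here: while the pod sits in back-off, the current state only says it is waiting, and the useful evidence lives in the previous termination.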
## Error Patterns Detected
K8s Doctor recognizes these common patterns:
- 🔴 **Connection Refused** - Service not ready, wrong port, network policy
- 🔴 **Database Connection Errors** - DB auth, wrong connection strings
- 🔴 **Out of Memory** - OOM kills, memory leaks, undersized limits
- 🟠 **File Not Found** - ConfigMap not mounted, wrong paths
- 🟠 **Permission Denied** - SecurityContext issues, fsGroup problems
- 🟠 **DNS Resolution Failed** - CoreDNS issues, wrong service names
- 🟡 **Port Already in Use** - Multiple processes on same port
- 🟡 **Timeout** - Slow responses, network delays
- 🟡 **SSL/TLS Errors** - Expired certs, missing CA bundles
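
Pattern matching like this typically comes down to a table of regular expressions run over each log line. Below is a minimal sketch covering a subset of the patterns above, with severity tiers following the list's three groups. The regexes, severities, and the `analyzeLogs` name are hypothetical; the error strings are common Node.js error codes and Linux/Go error messages, and a real analyzer would need many more variants per pattern.

```typescript
// Hypothetical pattern table modeled on the list above; regexes are illustrative only.
type Severity = "high" | "medium" | "low";

interface LogPattern {
  name: string;
  severity: Severity;
  regex: RegExp; // no /g flag, so test() is stateless across lines
}

const PATTERNS: LogPattern[] = [
  { name: "Connection Refused", severity: "high", regex: /ECONNREFUSED|connection refused/i },
  { name: "Out of Memory", severity: "high", regex: /out of memory|OOMKilled|heap limit/i },
  { name: "File Not Found", severity: "medium", regex: /ENOENT|no such file or directory/i },
  { name: "Permission Denied", severity: "medium", regex: /EACCES|permission denied/i },
  { name: "DNS Resolution Failed", severity: "medium", regex: /ENOTFOUND|EAI_AGAIN|no such host/i },
  { name: "Port Already in Use", severity: "low", regex: /EADDRINUSE|address already in use/i },
  { name: "Timeout", severity: "low", regex: /ETIMEDOUT|timed? ?out/i },
];

interface PatternHit {
  name: string;
  severity: Severity;
  count: number;
  firstLine: number; // 1-based line number of the first match
}

function analyzeLogs(logs: string): PatternHit[] {
  const hits = new Map<string, PatternHit>();
  logs.split("\n").forEach((line, index) => {
    for (const pattern of PATTERNS) {
      if (!pattern.regex.test(line)) continue;
      const hit = hits.get(pattern.name);
      if (hit !== undefined) {
        hit.count += 1;
      } else {
        hits.set(pattern.name, { name: pattern.name, severity: pattern.severity, count: 1, firstLine: index + 1 });
      }
    }
  });
  const rank: Record<Severity, number> = { high: 0, medium: 1, low: 2 };
  // Most severe first; ties broken by frequency.
  return Array.from(hits.values()).sort(
    (a, b) => rank[a.severity] - rank[b.severity] || b.count - a.count,
  );
}
```

Keeping the first matching line number alongside the count is what lets a report point at specific lines, as in the "Relevant logs" section of Example 1.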
## Architecture
```
k8s-doctor-mcp/
├── src/
│   ├── index.ts                  # MCP server with all tools
│   ├── types.ts                  # TypeScript type definitions
│   ├── diagnostics/
│   │   ├── pod-diagnostics.ts    # Pod health analysis
│   │   └── cluster-health.ts     # Cluster-wide diagnostics
│   ├── analyzers/
│   │   └── log-analyzer.ts       # Smart log pattern matching
│   └── utils/
│       ├── k8s-client.ts         # Kubernetes API client
│       └── formatters.ts         # Output formatting utilities
└── package.json
```
## Security Considerations
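- K8s Doctor makes only **read-only** Kubernetes API calls (get and list, the operations behind `kubectl get` and `kubectl describe`)
- Requires the same permissions as `kubectl get/describe/logs`
- Never modifies cluster state
- kubeconfig credentials stay local
- The server itself sends no data to external servers; its output is returned only to your MCP client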
## Troubleshooting
### "kubeconfig not found"
```bash
# Verify kubectl works
kubectl cluster-info

# Check kubeconfig location
echo $KUBECONFIG

# Test with explicit path
export KUBECONFIG=~/.kube/config
```

### "Permission denied"

```bash
# Check your cluster permissions
kubectl auth can-i get pods --all-namespaces

# You need at least read access to:
# - pods (including the pods/log subresource), events, namespaces, nodes
```

### "Connection refused to cluster"

```bash
# Verify cluster connectivity
kubectl get nodes

# For local clusters (minikube/kind)
minikube status
kind get clusters
```

## Development

```bash
# Clone and install
git clone https://github.com/ongjin/k8s-doctor-mcp.git
cd k8s-doctor-mcp
npm install

# Development mode
npm run dev

# Build
npm run build

# Test with Claude Code
npm run build
claude mcp add --scope project k8s-doctor-dev -- node $(pwd)/dist/index.js
```

## Contributing

Contributions welcome! Especially:

- New error pattern detections
- Internationalization (more languages)
- Metrics integration (Prometheus, etc.)
- Test coverage
- Documentation improvements

## Roadmap

- [ ] Metrics Server integration (real-time CPU/Memory usage)
- [ ] Network policy diagnostics
- [ ] Storage/PVC troubleshooting
- [ ] Helm chart analysis
- [ ] Multi-cluster support
- [ ] Interactive debugging mode
- [ ] Export reports (PDF, HTML)

## License

MIT © zerry

## Acknowledgments

Built with:

- `@modelcontextprotocol/sdk` - Model Context Protocol
- `@kubernetes/client-node` - Kubernetes JavaScript Client
- Claude Code - AI-powered development

## Star History

If this tool saves you debugging time, please ⭐ star the repo!

## Author

zerry

- GitHub: @zerry
- Created for DevOps engineers who are tired of kubectl hell

Made with ❤️ for Kubernetes users drowning in logs