MCP Server
Enables natural language DevOps automation with Docker, Kubernetes, and AWS integrations, plus resume tailoring, all secured with API key authentication and monitored via Prometheus/Grafana.
MCP Server - DevOps Automation Platform
A comprehensive Model Context Protocol (MCP) server for DevOps automation with real Docker, Kubernetes, and AWS integrations, API authentication, Prometheus/Grafana monitoring, and LLM-powered natural language automation.
Features
- Real DevOps Integrations
  - Docker container and image management
  - Kubernetes cluster operations (deployments, pods, services)
  - AWS EC2, S3, ECS, Lambda, RDS management
- Security
  - API key authentication
  - JWT token support
  - Rate limiting
- Monitoring
  - Prometheus metrics
  - Grafana dashboards
  - Health checks
  - Alertmanager integration
- LLM Integration
  - Ollama (local, free) - default
  - OpenAI API support
  - Natural language task automation
Quick Start
Option 1: Local Development
# Install dependencies
pip install -r requirements.txt
# Run the server
make dev
# or: uvicorn src.main_v2:app --reload --host 0.0.0.0 --port 8000
# Access the API docs
open http://localhost:8000/docs
Option 2: Docker Compose (with Monitoring)
# Copy environment template
cp .env.example .env
# Edit .env with your credentials
# Start all services
make up
# or: docker-compose up -d
# Access services:
# - MCP Server: http://localhost:8000
# - Prometheus: http://localhost:9090
# - Grafana: http://localhost:3000 (admin/admin)
# - Alertmanager: http://localhost:9093
API Endpoints
Public Endpoints
| Endpoint | Method | Description |
|---|---|---|
| / | GET | Server info |
| /health | GET | Health check |
| /metrics | GET | Prometheus metrics |
| /docs | GET | API documentation |
Authentication
| Endpoint | Method | Description |
|---|---|---|
| /auth/api-key | POST | Generate API key |
| /auth/token | POST | Generate JWT token |
Docker Operations
| Endpoint | Method | Description |
|---|---|---|
| /docker/containers | GET | List containers |
| /docker/containers/{action} | POST | Start/stop/restart container |
| /docker/containers/{id}/logs | GET | Get container logs |
| /docker/containers/{id}/stats | GET | Get container stats |
| /docker/images | GET | List images |
| /docker/images/pull | POST | Pull an image |
| /docker/system/info | GET | Docker system info |
| /docker/system/prune | POST | Clean up unused resources |
Kubernetes Operations
| Endpoint | Method | Description |
|---|---|---|
| /k8s/cluster/info | GET | Cluster info |
| /k8s/nodes | GET | List nodes |
| /k8s/namespaces | GET | List namespaces |
| /k8s/pods | GET | List pods |
| /k8s/pods/{pod}/logs | GET | Get pod logs |
| /k8s/deployments | GET | List deployments |
| /k8s/deployments/scale | POST | Scale deployment |
| /k8s/deployments/restart | POST | Restart deployment |
| /k8s/deployments/rollback | POST | Roll back deployment |
| /k8s/deployments/image | POST | Update image |
| /k8s/services | GET | List services |
AWS Operations
| Endpoint | Method | Description |
|---|---|---|
| /aws/ec2/instances | GET | List EC2 instances |
| /aws/ec2/{action} | POST | Start/stop/reboot EC2 |
| /aws/s3/buckets | GET | List S3 buckets |
| /aws/s3/buckets/{bucket}/objects | GET | List objects |
| /aws/ecs/clusters | GET | List ECS clusters |
| /aws/lambda/functions | GET | List Lambda functions |
| /aws/lambda/invoke | POST | Invoke Lambda |
| /aws/rds/instances | GET | List RDS instances |
Task Automation
| Endpoint | Method | Description |
|---|---|---|
| /tasks/run | POST | Run automation task (orchestrated/queue-friendly path, including tailor_resume) |
| /resume/tailor | POST | Tailor a base resume to a specific mission/job opening (direct synchronous path) |
| /resume/tailor/upload | POST | Tailor from uploaded PDF/CSV/TXT/MD and export as JSON/PDF/CSV/Markdown/DOCX |
| /resume/exports/cleanup | POST | Delete a generated resume export file from temporary storage |
| /deploy | POST | Deploy service |
| /cicd/trigger | POST | Trigger CI/CD pipeline |
| /logs | GET | Get logs |
| /alerts | POST | Send alert |
| /alerts/history | GET | Alert history |
Resume Tailoring Example
Use this endpoint to adapt a base resume to a specific mission or job opening.
Request Body
{
"base_resume": "# John Doe\n## Experience\n- Built CI/CD pipelines with GitHub Actions and Jenkins\n- Managed Kubernetes workloads on AWS EKS\n- Improved API latency by 35% in Python services",
"job_description": "We need a DevOps engineer with Kubernetes, AWS, CI/CD, Python, observability, and production reliability experience.",
"target_role": "Senior DevOps Engineer"
}
cURL (ready to run)
curl -X POST "http://localhost:8000/resume/tailor" \
-H "Content-Type: application/json" \
-H "X-API-Key: your-api-key" \
-d '{
"base_resume": "# John Doe\n## Experience\n- Built CI/CD pipelines with GitHub Actions and Jenkins\n- Managed Kubernetes workloads on AWS EKS\n- Improved API latency by 35% in Python services",
"job_description": "We need a DevOps engineer with Kubernetes, AWS, CI/CD, Python, observability, and production reliability experience.",
"target_role": "Senior DevOps Engineer"
}'
cURL via Task Runner (ready to run)
curl -X POST "http://localhost:8000/tasks/run" \
-H "Content-Type: application/json" \
-H "X-API-Key: your-api-key" \
-d '{
"task_name": "tailor_resume",
"parameters": {
"base_resume": "# John Doe\n## Experience\n- Built CI/CD pipelines with GitHub Actions and Jenkins\n- Managed Kubernetes workloads on AWS EKS\n- Improved API latency by 35% in Python services",
"job_description": "We need a DevOps engineer with Kubernetes, AWS, CI/CD, Python, observability, and production reliability experience.",
"target_role": "Senior DevOps Engineer"
}
}'
Task runner with file inputs (PDF/CSV/TXT/MD paths on server):
curl -X POST "http://localhost:8000/tasks/run" \
-H "Content-Type: application/json" \
-H "X-API-Key: your-api-key" \
-d '{
"task_name": "tailor_resume",
"parameters": {
"base_resume_path": "/data/incoming/my-resume.pdf",
"job_description_path": "/data/incoming/job-offer.csv",
"target_role": "Senior Platform Engineer",
"output_format": "docx",
"docx_style": "resumeio_inspired",
"use_v6_background": true
}
}'
When output_format is pdf, csv, markdown, or docx, /tasks/run returns an output_file path in the task result.
Supported docx_style values: minimal_executive, modern_creative, online_clean, visualcv_inspired, resumeio_inspired, resumebuilder_inspired.
Optional use_v6_background toggle (default false) applies the V6 soft-blue background and left accent band to any DOCX style.
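The option values listed above can be checked on the client before a request is sent. The following is a minimal sketch, not the server's own validation: the function name `check_export_options` is hypothetical, and the set of output formats is taken from the values this README lists for the resume endpoints.

```python
from typing import Any, Dict, List

# Values copied from this README; the server remains the source of truth.
OUTPUT_FORMATS = {"json", "pdf", "csv", "markdown", "docx"}
DOCX_STYLES = {
    "minimal_executive", "modern_creative", "online_clean",
    "visualcv_inspired", "resumeio_inspired", "resumebuilder_inspired",
}


def check_export_options(params: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in the export options (hypothetical helper)."""
    problems: List[str] = []
    fmt = params.get("output_format")
    if fmt is not None and fmt not in OUTPUT_FORMATS:
        problems.append(f"unsupported output_format: {fmt}")
    style = params.get("docx_style")
    if style is not None and style not in DOCX_STYLES:
        problems.append(f"unsupported docx_style: {style}")
    background = params.get("use_v6_background", False)
    if not isinstance(background, bool):
        problems.append("use_v6_background must be true or false")
    return problems


print(check_export_options({"output_format": "docx", "docx_style": "resumeio_inspired"}))
```

An empty list means the options match the documented values; it does not guarantee the server will accept the request.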
Example Response
{
"success": true,
"tailored_resume_markdown": "## Tailored Summary\nEngineer aligned to Senior DevOps Engineer, with experience evidenced in kubernetes, aws, ci, cd, python.\n\n## Highlighted Experience\n- Managed Kubernetes workloads on AWS EKS\n- Built CI/CD pipelines with GitHub Actions and Jenkins\n- Improved API latency by 35% in Python services",
"keyword_alignment": {
"matched": ["kubernetes", "aws", "python", "ci", "cd"],
"partial": ["observability"],
"missing": ["reliability"]
},
"gap_report": [
"Missing or weak evidence for: reliability"
]
}
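A client will often want to reduce this response to a quick pass/fail view. The sketch below assumes the response shape shown above; the helper name `summarize_alignment` and the coverage formula (fully matched keywords divided by all keywords) are illustrative choices, not part of the API.

```python
from typing import Any, Dict, List


def summarize_alignment(response: Dict[str, Any]) -> Dict[str, Any]:
    """Condense a /resume/tailor response into a small coverage summary."""
    alignment = response.get("keyword_alignment", {})
    matched: List[str] = alignment.get("matched", [])
    partial: List[str] = alignment.get("partial", [])
    missing: List[str] = alignment.get("missing", [])
    total = len(matched) + len(partial) + len(missing)
    # Only fully matched keywords count toward coverage (an assumption of this sketch).
    coverage = len(matched) / total if total else 0.0
    return {
        "success": bool(response.get("success")),
        "coverage": round(coverage, 2),
        "needs_attention": partial + missing,
    }


example = {
    "success": True,
    "keyword_alignment": {
        "matched": ["kubernetes", "aws", "python", "ci", "cd"],
        "partial": ["observability"],
        "missing": ["reliability"],
    },
    "gap_report": ["Missing or weak evidence for: reliability"],
}
print(summarize_alignment(example))
```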
Which path should I use?
- Use /resume/tailor for a direct, synchronous API call when you only need resume tailoring.
- Use /tasks/run with task_name: tailor_resume when you want to route through the task orchestration flow (queueing, unified task handling, and automation pipelines).
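The two paths differ only in how the parameters are wrapped. As a rough illustration, the hypothetical helper `build_tailor_request` below builds the URL and JSON body for either path from the same parameters, following the cURL examples in this README; it does not send anything.

```python
import json
from typing import Any, Dict, Tuple

BASE_URL = "http://localhost:8000"  # same host and port as the cURL examples


def build_tailor_request(params: Dict[str, Any], via_task_runner: bool = False) -> Tuple[str, str]:
    """Return (url, json_body) for the direct path or the task-runner path."""
    if via_task_runner:
        # /tasks/run wraps the same parameters in a task envelope.
        body = {"task_name": "tailor_resume", "parameters": params}
        return f"{BASE_URL}/tasks/run", json.dumps(body)
    return f"{BASE_URL}/resume/tailor", json.dumps(params)


params = {
    "base_resume": "# John Doe\n## Experience\n- Managed Kubernetes workloads on AWS EKS",
    "job_description": "We need a DevOps engineer with Kubernetes and AWS experience.",
    "target_role": "Senior DevOps Engineer",
}
for use_runner in (False, True):
    url, body = build_tailor_request(params, via_task_runner=use_runner)
    print(url, len(body))
```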
File Upload + File Export
This endpoint accepts PDF, CSV, TXT, or MD files and can return output as json, pdf, csv, markdown, or docx.
curl -X POST "http://localhost:8000/resume/tailor/upload" \
-H "X-API-Key: your-api-key" \
-F "base_resume_file=@./my-resume.pdf" \
-F "job_description_file=@./job-offer.csv" \
-F "target_role=Senior Platform Engineer" \
-F "output_format=docx" \
-F "docx_style=resumebuilder_inspired" \
-F "use_v6_background=true" \
--output tailored_resume.docx
CSV export example:
curl -X POST "http://localhost:8000/resume/tailor/upload" \
-H "X-API-Key: your-api-key" \
-F "base_resume_file=@./my-resume.pdf" \
-F "job_description_file=@./job-offer.txt" \
-F "output_format=csv" \
--output tailored_resume.csv
Cleanup generated export file (using output_file returned by task/upload flows):
curl -X POST "http://localhost:8000/resume/exports/cleanup" \
-H "Content-Type: application/json" \
-H "X-API-Key: your-api-key" \
-d '{
"output_file": "/tmp/tmpabcd1234.pdf"
}'
Automatic TTL cleanup is enabled for generated exports:
- Job name: resume_export_ttl_cleanup
- Default schedule: every hour (0 * * * *)
- Default TTL: 24 hours
- Only generated files with prefix resume_tailor_ and extensions .pdf, .csv, .md are deleted
Configuration via environment variables:
MCP_RESUME_EXPORT_TTL_HOURS=24
MCP_RESUME_EXPORT_CLEANUP_CRON="0 * * * *"
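To make the selection rule concrete, here is a sketch of how a TTL sweep of this kind could pick files. It is not the server's implementation: the function name `find_expired_exports`, the use of file modification time as the age, and the export directory are all assumptions; only the prefix, the extensions, and the TTL variable come from this README.

```python
import os
import tempfile
import time
from pathlib import Path
from typing import List, Optional

PREFIX = "resume_tailor_"
EXTENSIONS = {".pdf", ".csv", ".md"}


def find_expired_exports(directory: Path, ttl_hours: Optional[float] = None) -> List[Path]:
    """List generated export files older than the TTL (age by mtime is an assumption)."""
    if ttl_hours is None:
        ttl_hours = float(os.environ.get("MCP_RESUME_EXPORT_TTL_HOURS", "24"))
    cutoff = time.time() - ttl_hours * 3600
    expired = []
    for path in sorted(directory.iterdir()):
        if not path.is_file():
            continue
        if not path.name.startswith(PREFIX) or path.suffix not in EXTENSIONS:
            continue
        if path.stat().st_mtime < cutoff:
            expired.append(path)
    return expired


# Demo in a throwaway directory: one stale export, one fresh export, one unrelated file.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ("resume_tailor_old.pdf", "resume_tailor_new.csv", "notes.pdf"):
        (root / name).write_text("x")
    two_days_ago = time.time() - 48 * 3600
    os.utime(root / "resume_tailor_old.pdf", (two_days_ago, two_days_ago))
    os.utime(root / "notes.pdf", (two_days_ago, two_days_ago))
    expired_names = [p.name for p in find_expired_exports(root, ttl_hours=24)]
print(expired_names)
```

The unrelated file is skipped even though it is old, because it lacks the resume_tailor_ prefix.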
Authentication
Using API Key
# Generate an API key
curl -X POST "http://localhost:8000/auth/api-key?name=mykey"
# Use the key in requests
curl -H "X-API-Key: your-api-key" http://localhost:8000/docker/containers
Using JWT Token
# Get a token
curl -X POST "http://localhost:8000/auth/token?user_id=admin"
# Use the token
curl -H "Authorization: Bearer your-token" http://localhost:8000/docker/containers
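The same two schemes can be expressed in a Python client. This is a small sketch with a hypothetical helper name (`auth_headers`); the header names are the ones used in the cURL examples above, and preferring the API key when both credentials are given is an arbitrary choice of this sketch.

```python
from typing import Dict, Optional


def auth_headers(api_key: Optional[str] = None, jwt_token: Optional[str] = None) -> Dict[str, str]:
    """Build request headers for API key or JWT authentication."""
    if api_key:
        return {"X-API-Key": api_key}
    if jwt_token:
        return {"Authorization": f"Bearer {jwt_token}"}
    raise ValueError("Provide either api_key or jwt_token")


print(auth_headers(api_key="your-api-key"))
print(auth_headers(jwt_token="your-token"))
```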
LLM Integration (Natural Language)
Ollama (Default - Local & Free)
# Start Ollama
ollama serve
ollama pull mistral
# Run LLM integration
python src/llm_integration.py
OpenAI
export OPENAI_API_KEY="sk-..."
python src/llm_integration.py --openai
Example Commands
You: Check the health of our infrastructure
Assistant: The infrastructure is healthy. All systems operational.
You: List all running Docker containers
Assistant: Found 3 running containers: nginx, redis, postgres
You: Scale the web deployment to 5 replicas
Assistant: Scaled deployment "web" to 5 replicas successfully.
You: Show me the last 50 lines of logs from the api pod
Assistant: Here are the recent logs...
Environment Variables
# Authentication
MCP_API_KEY=your-api-key
MCP_JWT_SECRET=your-jwt-secret
# AWS
AWS_REGION=us-east-1
AWS_ACCESS_KEY_ID=your-key
AWS_SECRET_ACCESS_KEY=your-secret
# OpenAI (optional)
OPENAI_API_KEY=sk-...
# Grafana
GRAFANA_USER=admin
GRAFANA_PASSWORD=admin
Project Structure
mcp/
├── src/
│   ├── main_v2.py              # Main server with all features
│   ├── main.py                 # Basic server (legacy)
│   ├── auth.py                 # Authentication module
│   ├── monitoring.py           # Prometheus metrics & health checks
│   ├── llm_integration.py      # Ollama/OpenAI integration
│   └── devops/
│       ├── docker_ops.py       # Docker operations
│       ├── kubernetes_ops.py   # Kubernetes operations
│       └── aws_ops.py          # AWS operations
├── monitoring/
│   ├── prometheus.yml          # Prometheus config
│   ├── alertmanager.yml        # Alertmanager config
│   └── grafana/
│       ├── provisioning/       # Grafana provisioning
│       └── dashboards/         # Grafana dashboards
├── docs/
│   ├── llm_integration.md      # LLM guide
│   └── devops_endpoints.md     # Endpoint reference
├── .github/workflows/
│   └── ci.yml                  # CI/CD pipeline
├── docker-compose.yml          # Full stack deployment
├── Dockerfile                  # Container build
├── Makefile                    # Convenience commands
├── requirements.txt            # Python dependencies
└── .env.example                # Environment template
Make Commands
make help # Show all commands
make install # Install dependencies
make dev # Run development server
make run # Run production server
make up # Start all services (docker-compose)
make down # Stop all services
make logs # View logs
make test # Run health check
make api-key # Generate API key
make llm # Run LLM integration
Once running, the services are available at:
- API Docs: http://localhost:8000/docs
- Health: http://localhost:8000/health
- Metrics: http://localhost:8000/metrics
- Prometheus: http://localhost:9090 (when using docker-compose)
- Grafana: http://localhost:3000 (admin/admin)
With the server running at http://localhost:8000, generate an API key and start the LLM integration in one step:
export MCP_API_KEY=$(curl -s -X POST "http://localhost:8000/auth/api-key?name=llm" | python -c "import sys,json; print(json.load(sys.stdin)['api_key'])") && python src/llm_integration.py
Stopping the Server
# If running in foreground (started without &)
# Press Ctrl+C
# If running in background (started with &)
pkill -f "uvicorn.*8000"
# Or find the PID and kill it
lsof -i :8000
kill <PID>
# If using Docker Compose
make down
# or: docker-compose down
Troubleshooting
Common Issues and Solutions
1. Server Not Responding / Hanging
Symptoms:
- curl http://localhost:8000/health hangs indefinitely
- API requests don't return
- Server appears to be running but unresponsive
Solution:
# Kill any stuck uvicorn processes
pkill -f "uvicorn.*8000"
# Restart the server
source /path/to/venv/bin/activate
uvicorn src.main_v2:app --host 0.0.0.0 --port 8000
2. AttributeError: 'ScheduledJobRepository' object has no attribute 'list_all'
Symptoms:
- Server fails to start
- Error in startup event when loading scheduled jobs
Solution:
Ensure src/database.py has the list_all() method in ScheduledJobRepository:
from typing import Dict, List

class ScheduledJobRepository:
    # ... other methods ...

    def list_all(self) -> List[Dict]:
        """List all scheduled jobs (alias for list method)."""
        return self.list()
3. RuntimeWarning: coroutine 'Scheduler.start' was never awaited
Symptoms:
- Warning on server startup
- Scheduler not running properly
- Scheduled jobs not executing
Solution:
In src/main_v2.py, ensure async functions are awaited in startup/shutdown:
@app.on_event("startup")
async def startup_event():
    await global_scheduler.start()  # Must use await
    # ... rest of startup

@app.on_event("shutdown")
async def shutdown_event():
    await global_scheduler.stop()  # Must use await
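The reason the missing await matters is that calling an async method only creates a coroutine object; its body does not run until it is awaited. The standalone sketch below illustrates this with a stand-in class (`DemoScheduler` is hypothetical and unrelated to the server's real scheduler).

```python
import asyncio


class DemoScheduler:
    """Stand-in for a scheduler with an async start(); for illustration only."""

    def __init__(self) -> None:
        self.running = False

    async def start(self) -> None:
        self.running = True


async def start_without_await() -> bool:
    scheduler = DemoScheduler()
    coro = scheduler.start()  # creates a coroutine object; the body has not run
    coro.close()  # closed explicitly so this demo does not emit the RuntimeWarning
    return scheduler.running


async def start_with_await() -> bool:
    scheduler = DemoScheduler()
    await scheduler.start()  # the body runs here
    return scheduler.running


print(asyncio.run(start_without_await()))  # False
print(asyncio.run(start_with_await()))     # True
```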
4. LLM Integration Timeout / Long Response Times
Symptoms:
- LLM chat hangs or times out
- Very slow responses from Ollama
- httpx.ReadTimeout errors
Solution:
- Ollama Performance: Ollama on CPU can be slow. Consider:
  # Use a smaller model
  ollama pull phi
  # Or use GPU if available
- Timeout Configuration: The timeout is set to 300s in llm_integration.py. Adjust if needed:
  timeout=httpx.Timeout(300.0)  # Increase for slower systems
- Log Truncation: Large log outputs are automatically truncated to prevent token overflow.
5. Port Already in Use
Symptoms:
- Address already in use error on startup
- Can't bind to port 8000
Solution:
# Find process using port 8000
lsof -i :8000
# Kill it
kill -9 <PID>
# or
pkill -f "uvicorn.*8000"
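On systems without lsof, a quick check from Python can confirm whether something is already listening before you start the server. This is a minimal sketch; `port_in_use` is a hypothetical helper and only tests whether a TCP connection to the port succeeds.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0


if port_in_use(8000):
    print("Port 8000 is busy; stop the other process or choose another port")
else:
    print("Port 8000 looks free")
```

It identifies that the port is taken, not which process holds it; use lsof or an equivalent tool for that.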
6. Database Errors
Symptoms:
- sqlite3.OperationalError: database is locked
- Data not persisting
Solution:
# Check if multiple processes are accessing the DB
lsof mcp_data.db
# If locked, stop all server instances first
pkill -f "uvicorn"
# Then restart
uvicorn src.main_v2:app --host 0.0.0.0 --port 8000
7. Missing MCP_API_KEY Environment Variable
Symptoms:
- LLM integration can't connect to MCP server
- 401 Unauthorized errors
Solution:
# Generate and export API key in one command
export MCP_API_KEY=$(curl -s -X POST "http://localhost:8000/auth/api-key?name=llm" | python -c "import sys,json; print(json.load(sys.stdin)['api_key'])")
# Verify it's set
echo $MCP_API_KEY
# Then run LLM integration
python src/llm_integration.py
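The inline python -c step in the command above reads the api_key field from the JSON response. If you script this in Python instead, a slightly more defensive version might look like the sketch below; `extract_api_key` is a hypothetical helper, and the only field it relies on is api_key, as used in the one-liner.

```python
import json


def extract_api_key(response_body: str) -> str:
    """Pull the api_key field out of the /auth/api-key JSON response."""
    data = json.loads(response_body)
    if not isinstance(data, dict) or not data.get("api_key"):
        raise ValueError("Response did not contain an 'api_key' field")
    return data["api_key"]


# Placeholder body for illustration; a real response comes from POST /auth/api-key.
print(extract_api_key('{"api_key": "your-api-key"}'))
```

Raising a clear error here avoids exporting an empty MCP_API_KEY, which would otherwise surface later as a 401.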
Health Check Commands
# Check if server is running
curl http://localhost:8000/health
# Check server info
curl http://localhost:8000/
# Check metrics
curl http://localhost:8000/metrics
# Test authentication
API_KEY=$(curl -s -X POST "http://localhost:8000/auth/api-key?name=test" | python -c "import sys,json; print(json.load(sys.stdin)['api_key'])")
curl -H "X-API-Key: $API_KEY" http://localhost:8000/docker/containers
Logs Location
- Server logs: Console output from uvicorn
- Audit logs: SQLite database (mcp_data.db → audit_logs table)
- Docker logs: docker-compose logs (when using compose)
License
MIT License