MCP Server ELK

A production-oriented, read-only MCP server for secure ELK stack analysis, compatible with OpenClaw.

MCP Server ELK (Read-Only, Production-Oriented)

An MCP server built on Python 3.12 + FastAPI for safe, read-only analysis of the ELK Stack, compatible with OpenClaw.

1) Architecture (Mermaid)

flowchart TB
    OC[OpenClaw] -->|X-API-Key| API[FastAPI MCP Endpoint]
    API --> SEC[Security Layer\nAPI Key Auth + RBAC + Rate Limit]
    SEC --> REG[Tool Registry\nDiscovery + Execute + Schema Validation]
    REG --> TOOLS[MCP Tools\nELK Cluster/Logs/Kibana/Logstash/Filebeat/APM/Recommendation]
    TOOLS --> CTRL[Controllers]
    CTRL --> SRV[Services\nBusiness Logic]
    SRV --> REPO[Repositories\nRead-Only Data Access]
    REPO --> ES[(Elasticsearch)]
    REPO --> KB[(Kibana API)]
    REPO --> LS[(Logstash Monitoring API)]

    API --> AUDIT[Structured JSON Audit Log]
    API --> METRICS[Prometheus Metrics /metrics]

    classDef safe fill:#e7f7ef,stroke:#1f8f5f,stroke-width:1px;
    class SEC,REG,TOOLS,AUDIT,METRICS safe;
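
The Tool Registry node in the diagram (discovery, execute, schema validation) could be sketched roughly as follows. This is a minimal illustration, not the project's actual registry.py: the Tool and ToolRegistry classes, their method names, the error codes, and the "required fields" stand-in for schema validation are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Tool:
    name: str
    required_fields: set[str]  # hypothetical, heavily simplified "schema"
    handler: Callable[[dict[str, Any]], dict[str, Any]]


@dataclass
class ToolRegistry:
    _tools: dict[str, Tool] = field(default_factory=dict)

    def register(self, tool: Tool) -> None:
        self._tools[tool.name] = tool

    def list_tools(self) -> list[str]:
        # Discovery: what GET /mcp/tools might expose.
        return sorted(self._tools)

    def execute(self, tool_name: str, payload: dict[str, Any]) -> dict[str, Any]:
        tool = self._tools.get(tool_name)
        if tool is None:
            return {"ok": False, "error": "unknown_tool"}
        # Validation: reject the call before the handler touches any backend.
        missing = tool.required_fields - set(payload)
        if missing:
            return {"ok": False, "error": "invalid_input", "missing": sorted(missing)}
        return {"ok": True, "tool_name": tool_name, "data": tool.handler(payload)}


# Stub handlers stand in for the real controller/service/repository chain.
registry = ToolRegistry()
registry.register(Tool("elk_cluster_health", set(), lambda p: {"status": "green"}))
registry.register(Tool("elk_search_logs", {"index_pattern"}, lambda p: {"hits": []}))
```

Validating before dispatch keeps malformed requests away from Elasticsearch entirely, which fits the read-only, defensive posture the diagram describes.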

2) Folder Structure

mcpserver-elk/
├── app/
│   ├── main.py
│   ├── api/routes/
│   │   ├── health_controller.py
│   │   ├── mcp_controller.py
│   │   └── metrics_controller.py
│   ├── core/
│   │   ├── config.py
│   │   ├── exceptions.py
│   │   ├── logging.py
│   │   ├── masking.py
│   │   ├── metrics.py
│   │   ├── rate_limit.py
│   │   └── security.py
│   ├── mcp/
│   │   ├── registry.py
│   │   ├── schemas.py
│   │   ├── server.py
│   │   └── tool_base.py
│   ├── models/
│   │   ├── common_models.py
│   │   ├── elk_models.py
│   │   ├── mcp_models.py
│   │   └── schemas.py
│   ├── controllers/
│   │   ├── elk_controller.py
│   │   └── recommendation_controller.py
│   ├── services/
│   │   ├── apm_service.py
│   │   ├── elk_cluster_service.py
│   │   ├── elk_logs_service.py
│   │   ├── filebeat_service.py
│   │   ├── kibana_service.py
│   │   ├── logstash_service.py
│   │   └── recommendation_service.py
│   ├── repositories/
│   │   ├── apm_repository.py
│   │   ├── elasticsearch_repository.py
│   │   ├── kibana_repository.py
│   │   └── logstash_repository.py
│   ├── clients/
│   │   ├── elasticsearch_client.py
│   │   └── http_client.py
│   ├── tools/
│   │   ├── elk_apm.py
│   │   ├── elk_cluster.py
│   │   ├── elk_cluster_tools.py
│   │   ├── elk_filebeat.py
│   │   ├── elk_kibana.py
│   │   ├── elk_logs.py
│   │   ├── elk_logs_tools.py
│   │   ├── elk_logstash.py
│   │   ├── filebeat_tools.py
│   │   ├── kibana_tools.py
│   │   ├── logstash_tools.py
│   │   ├── recommendation.py
│   │   └── recommendation_tools.py
│   └── utils/
│       ├── query_builder.py
│       ├── response_limiter.py
│       └── time_range.py
├── tests/
│   ├── conftest.py
│   ├── integration/
│   └── unit/
├── k8s/
│   ├── configmap.yaml
│   ├── deployment.yaml
│   ├── hpa.yaml
│   ├── ingress.yaml
│   ├── namespace.yaml
│   ├── networkpolicy.yaml
│   ├── pdb.yaml
│   ├── secret.yaml
│   ├── service.yaml
│   └── serviceaccount.yaml
├── .env.example
├── Dockerfile
├── docker-compose.yml
├── pyproject.toml
├── requirements.txt
└── README.md

3) Security & Safety Features

  • Read-only by default; there are no write/delete/restart endpoints.
  • API key authentication via the X-API-Key header.
  • Per-tool RBAC (elk_viewer, elk_operator, elk_admin_readonly).
  • Index pattern allowlist (ALLOWED_INDEX_PATTERNS).
  • Denylist for dangerous queries (script, painless, delete_by_query, etc.).
  • Timeouts and bounded retries for Elasticsearch/HTTP API calls.
  • Rate limiting per API key.
  • Audit log entries before and after each tool execution.
  • Structured JSON logging.
  • Secret masking (password/token/api_key/authorization/cookie).
  • Response size limiter (MAX_RESPONSE_BYTES).
  • TLS verification enabled by default.
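
To make the allowlist and denylist items above concrete, here is a minimal sketch of how such checks might work. It is not taken from the project's code: the function names, the example pattern values, and the choice of glob matching via fnmatch are assumptions. Only the ALLOWED_INDEX_PATTERNS name and the denied terms come from the list above.

```python
import fnmatch
from typing import Any

# Assumed example values; the real ones would come from configuration.
ALLOWED_INDEX_PATTERNS = ["logs-*", "filebeat-*"]
DENIED_QUERY_KEYS = {"script", "painless", "delete_by_query"}


def is_index_allowed(index_pattern: str) -> bool:
    """Return True only if every comma-separated part matches the allowlist."""
    parts = [part.strip() for part in index_pattern.split(",")]
    if not all(parts):
        return False  # reject empty segments such as "logs-*,"
    return all(
        any(fnmatch.fnmatchcase(part, allowed) for allowed in ALLOWED_INDEX_PATTERNS)
        for part in parts
    )


def find_denied_keys(query: Any) -> set[str]:
    """Recursively collect denylisted keys used anywhere in a query body."""
    found: set[str] = set()
    if isinstance(query, dict):
        for key, value in query.items():
            if key.lower() in DENIED_QUERY_KEYS:
                found.add(key.lower())
            found |= find_denied_keys(value)
    elif isinstance(query, list):
        for item in query:
            found |= find_denied_keys(item)
    return found
```

Splitting on commas matters because Elasticsearch accepts multi-target expressions; without it, a request such as "logs-*,.security" would slip past a single glob comparison.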

4) Endpoint

  • GET /healthz
  • GET /readyz
  • GET /metrics
  • GET /metrics/json
  • GET /mcp/tools
  • POST /mcp/execute
  • POST /mcp (JSON-RPC)

5) Required MCP Tools

  • elk_cluster_health
  • elk_nodes_stats
  • elk_indices_summary
  • elk_search_logs
  • elk_detect_errors
  • elk_logstash_health
  • elk_filebeat_status
  • elk_kibana_status
  • elk_apm_summary
  • elk_recommend_fix

6) Environment Configuration

Start from the .env.example file:

cp .env.example .env

7) Run Locally

python3.12 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
uvicorn app.main:app --host 0.0.0.0 --port 8080 --reload

8) Docker

Build & run:

docker build -t mcpserver-elk:1.0.0 .
docker run --rm -p 8080:8080 --env-file .env mcpserver-elk:1.0.0

Docker Compose lab (with a sample ELK stack):

docker compose --profile lab up -d --build

9) Kubernetes Deploy

kubectl apply -f k8s/namespace.yaml
kubectl apply -f k8s/secret.yaml
kubectl apply -f k8s/configmap.yaml
kubectl apply -f k8s/serviceaccount.yaml
kubectl apply -f k8s/deployment.yaml
kubectl apply -f k8s/service.yaml
kubectl apply -f k8s/ingress.yaml
kubectl apply -f k8s/networkpolicy.yaml
kubectl apply -f k8s/hpa.yaml
kubectl apply -f k8s/pdb.yaml

10) OpenClaw Integration

Example OpenClaw configuration (JSON):

{
  "mcpServers": [
    {
      "name": "elk-prod-readonly",
      "url": "https://mcp-elk.example.com/mcp",
      "headers": {
        "X-API-Key": "ops-key"
      },
      "timeoutSeconds": 30,
      "tools": [
        "elk_cluster_health",
        "elk_nodes_stats",
        "elk_indices_summary",
        "elk_search_logs",
        "elk_detect_errors",
        "elk_logstash_health",
        "elk_filebeat_status",
        "elk_kibana_status",
        "elk_apm_summary",
        "elk_recommend_fix"
      ]
    }
  ]
}

Example OpenClaw prompt:

Use MCP Server ELK to check Elasticsearch cluster health, search the error logs of payment-service over the last hour, group the most frequent errors, analyze the root cause, and give safe remediation recommendations.

11) Example MCP Requests/Responses

List tools:

curl -sS -H "X-API-Key: dev-key" http://localhost:8080/mcp/tools | jq

Execute elk_cluster_health:

curl -sS -X POST http://localhost:8080/mcp/execute \
  -H "Content-Type: application/json" \
  -H "X-API-Key: dev-key" \
  -d '{"tool_name":"elk_cluster_health","input":{"include_shards":true}}' | jq

Execute elk_nodes_stats:

curl -sS -X POST http://localhost:8080/mcp/execute \
  -H "Content-Type: application/json" \
  -H "X-API-Key: dev-key" \
  -d '{"tool_name":"elk_nodes_stats","input":{"include_thread_pool":false}}' | jq

Execute elk_indices_summary:

curl -sS -X POST http://localhost:8080/mcp/execute \
  -H "Content-Type: application/json" \
  -H "X-API-Key: dev-key" \
  -d '{"tool_name":"elk_indices_summary","input":{"index_pattern":"logs-*","sort_by":"size","limit":20}}' | jq

Execute elk_search_logs:

curl -sS -X POST http://localhost:8080/mcp/execute \
  -H "Content-Type: application/json" \
  -H "X-API-Key: dev-key" \
  -d '{"tool_name":"elk_search_logs","input":{"index_pattern":"logs-*","start_time":"now-1h","end_time":"now","service_name":"payment-service","log_level":"error","limit":20}}' | jq

Execute elk_detect_errors:

curl -sS -X POST http://localhost:8080/mcp/execute \
  -H "Content-Type: application/json" \
  -H "X-API-Key: dev-key" \
  -d '{"tool_name":"elk_detect_errors","input":{"index_pattern":"logs-*","start_time":"now-1h","end_time":"now","service_name":"payment-service","top_n":10}}' | jq

Execute elk_logstash_health:

curl -sS -X POST http://localhost:8080/mcp/execute \
  -H "Content-Type: application/json" \
  -H "X-API-Key: ops-key" \
  -d '{"tool_name":"elk_logstash_health","input":{"pipeline_id":"main"}}' | jq

Execute elk_filebeat_status:

curl -sS -X POST http://localhost:8080/mcp/execute \
  -H "Content-Type: application/json" \
  -H "X-API-Key: dev-key" \
  -d '{"tool_name":"elk_filebeat_status","input":{"index_pattern":"filebeat-*","max_delay_minutes":5}}' | jq

Execute elk_kibana_status:

curl -sS -X POST http://localhost:8080/mcp/execute \
  -H "Content-Type: application/json" \
  -H "X-API-Key: dev-key" \
  -d '{"tool_name":"elk_kibana_status","input":{"include_plugins":true}}' | jq

Execute elk_apm_summary:

curl -sS -X POST http://localhost:8080/mcp/execute \
  -H "Content-Type: application/json" \
  -H "X-API-Key: dev-key" \
  -d '{"tool_name":"elk_apm_summary","input":{"service_name":"payment-service","start_time":"now-1h","end_time":"now"}}' | jq

Execute elk_recommend_fix:

curl -sS -X POST http://localhost:8080/mcp/execute \
  -H "Content-Type: application/json" \
  -H "X-API-Key: dev-key" \
  -d '{"tool_name":"elk_recommend_fix","input":{"findings":{"cluster_health":{"status":"yellow","metrics":{"unassigned_shards":2}}}}}' | jq

JSON-RPC example:

curl -sS -X POST http://localhost:8080/mcp \
  -H "Content-Type: application/json" \
  -H "X-API-Key: dev-key" \
  -d '{"jsonrpc":"2.0","id":"1","method":"mcp.list_tools","params":{}}' | jq
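
For callers that prefer Python over curl, the same JSON-RPC call can be assembled with the standard library. This is a sketch under stated assumptions: the helper name build_jsonrpc_request is hypothetical, and the endpoint, header, and method name are copied from the curl example above. Nothing is sent unless the commented lines are enabled against a running server.

```python
import json
import urllib.request
from typing import Any, Optional


def build_jsonrpc_request(
    base_url: str,
    api_key: str,
    method: str,
    params: Optional[dict[str, Any]] = None,
    request_id: str = "1",
) -> urllib.request.Request:
    """Build (but do not send) a JSON-RPC 2.0 POST to the /mcp endpoint."""
    body = {"jsonrpc": "2.0", "id": request_id, "method": method, "params": params or {}}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/mcp",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )


request = build_jsonrpc_request("http://localhost:8080", "dev-key", "mcp.list_tools")

# Sending the request requires a running server:
# with urllib.request.urlopen(request, timeout=30) as resp:
#     print(json.load(resp))
```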

Example elk_cluster_health response:

{
  "ok": true,
  "tool_name": "elk_cluster_health",
  "data": {
    "status": "yellow",
    "summary": "Cluster prod-elk status=yellow, nodes=6, unassigned_shards=2",
    "metrics": {
      "cluster_name": "prod-elk",
      "number_of_nodes": 6,
      "active_shards": 1240,
      "relocating_shards": 0,
      "initializing_shards": 0,
      "unassigned_shards": 2
    },
    "recommendation": [
      "Check the replica shards that have not been assigned.",
      "Run an allocation explain analysis for the unassigned shards (read-only)."
    ]
  }
}
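
A response like this can be fed into elk_recommend_fix. The sketch below mirrors the "findings" shape used in the elk_recommend_fix curl example earlier in this section; the helper name is hypothetical, and whether the tool accepts additional metrics beyond unassigned_shards is not stated here, so only that field is forwarded.

```python
from typing import Any


def build_recommend_fix_request(health_response: dict[str, Any]) -> dict[str, Any]:
    """Turn an elk_cluster_health response into an elk_recommend_fix request body."""
    if not health_response.get("ok"):
        raise ValueError("cluster health call did not succeed")
    data = health_response["data"]
    return {
        "tool_name": "elk_recommend_fix",
        "input": {
            "findings": {
                "cluster_health": {
                    "status": data["status"],
                    "metrics": {"unassigned_shards": data["metrics"]["unassigned_shards"]},
                }
            }
        },
    }


# Trimmed copy of the example response above.
sample_health_response = {
    "ok": True,
    "tool_name": "elk_cluster_health",
    "data": {
        "status": "yellow",
        "metrics": {"cluster_name": "prod-elk", "number_of_nodes": 6, "unassigned_shards": 2},
    },
}
recommend_request = build_recommend_fix_request(sample_health_response)
```

The resulting dictionary can be serialized with json.dumps and posted to /mcp/execute in the same way as the curl examples.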

Example elk_detect_errors response:

{
  "ok": true,
  "tool_name": "elk_detect_errors",
  "data": {
    "total_errors": 182,
    "errors_by_service": [
      {"service": "payment-service", "count": 145}
    ],
    "top_error_messages": [
      {"message": "timeout to fraud-service", "count": 72}
    ],
    "samples": [
      {"timestamp": "2026-04-26T01:25:00Z", "service": "payment-service", "log_level": "error", "message": "timeout to fraud-service", "trace_id": "abc"}
    ],
    "recommendation": [
      "Validate the most frequent errors against trace_id to correlate them across services."
    ]
  }
}

12) Testing

Run all tests:

pytest -q

Quick Trial Scripts

Seed simulated data:

chmod +x scripts/*.sh
./scripts/seed_data.sh

Smoke test end-to-end:

./scripts/smoke_test.sh

Example with a custom endpoint/key:

MCP_BASE_URL=http://localhost:8080 \
MCP_VIEWER_KEY=dev-key \
MCP_OPERATOR_KEY=ops-key \
./scripts/smoke_test.sh

Run lint/type:

ruff check .
mypy app

Provided Tests

  • Unit test tool registry.
  • Unit test RBAC.
  • Unit test secret masking.
  • Unit test Elasticsearch query builder.
  • Integration test MCP + mock Elasticsearch.
  • Integration test MCP + mock Kibana.
  • Integration test MCP + mock Logstash.
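
As an illustration of what the secret-masking unit test might exercise, here is a minimal masking sketch. It is not the project's masking.py: the function name, the "***" placeholder, and the substring-based key matching are assumptions. The sensitive key names come from the security feature list.

```python
from typing import Any

SENSITIVE_KEYS = ("password", "token", "api_key", "authorization", "cookie")
MASK = "***"  # assumed placeholder


def _is_sensitive(key: Any) -> bool:
    # Normalize header-style names such as "X-API-Key" to "x_api_key".
    normalized = str(key).lower().replace("-", "_")
    return any(marker in normalized for marker in SENSITIVE_KEYS)


def mask_secrets(value: Any) -> Any:
    """Return a copy of value with sensitive dictionary values replaced by MASK."""
    if isinstance(value, dict):
        return {k: MASK if _is_sensitive(k) else mask_secrets(v) for k, v in value.items()}
    if isinstance(value, list):
        return [mask_secrets(item) for item in value]
    return value


masked = mask_secrets({
    "user": "elastic",
    "password": "s3cret",
    "headers": {"X-API-Key": "dev-key", "Authorization": "Bearer abc"},
    "items": [{"token": "t-123"}],
})
```

Masking on a copy rather than in place means the original request object can still be used for the actual call while only the masked copy reaches the logs.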

Smoke Test Checklist

  • GET /healthz returns 200.
  • GET /readyz reports ready when Elasticsearch is up.
  • GET /mcp/tools returns the tool list.
  • POST /mcp/execute with a valid key succeeds.
  • POST /mcp/execute with an invalid key returns 401.
  • The elk_logstash_health tool is rejected (403) for the viewer role.
  • Dangerous queries are rejected.
  • Oversized responses are rejected (413) when they exceed the limit.
  • /metrics can be scraped by Prometheus.
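
The 401 and 403 items in the checklist can be pictured with a small authorization sketch. The key-to-role mapping follows the example keys used in this README (dev-key as viewer, ops-key as operator); the rule that only elk_logstash_health is restricted is an assumption inferred from the checklist, and the authorize function itself is hypothetical.

```python
from typing import Optional

# Example keys from this README; real keys would come from a Secret.
API_KEY_ROLES = {"dev-key": "elk_viewer", "ops-key": "elk_operator"}

# Assumption: tools that the viewer role may not call.
OPERATOR_ONLY_TOOLS = {"elk_logstash_health"}


def authorize(api_key: Optional[str], tool_name: str) -> int:
    """Return the HTTP status an auth layer might produce for this call."""
    role = API_KEY_ROLES.get(api_key or "")
    if role is None:
        return 401  # missing or unknown API key
    if tool_name in OPERATOR_ONLY_TOOLS and role == "elk_viewer":
        return 403  # authenticated, but the role lacks access to this tool
    return 200
```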

13) Troubleshooting Guide

Problem | Symptom | Likely Cause | Check Command | Safe Fix
--- | --- | --- | --- | ---
OpenClaw cannot connect to the MCP server | Timeout / connection refused | Wrong DNS/Ingress/Service | kubectl get ingress -n mcpserver-elk | Fix the Ingress host/path and the Service port
401 API key invalid | Response authentication_failed | X-API-Key is wrong or not sent | curl -i http://host/mcp/tools | Update the key in OpenClaw and sync the Secret
403 RBAC denied | Response permission_denied | Role has no access to the tool | curl ... /mcp/execute | Use an API key with the right role or adjust the policy
Elasticsearch TLS error | CERTIFICATE_VERIFY_FAILED | CA cert is wrong or expired | openssl s_client -connect es:9200 -showcerts | Mount a valid CA and keep TLS verification enabled
Elasticsearch authentication failed | 401 from ES | Wrong user/password | curl -u user:pass https://es:9200/_cluster/health | Rotate the read-only credential secret
Index pattern denied | 403 pattern not allowed | Pattern is outside the allowlist | Check ALLOWED_INDEX_PATTERNS | Add a safe pattern to the allowlist
Query timeout | 504 timeout | Heavy query / busy cluster | GET /_tasks?detailed=true&actions=*search | Narrow the time range, lower the limit, optimize the index
Response too large | 413 response_too_large | Result set is too large | Check MAX_RESPONSE_BYTES | Reduce the limit/filter, or raise the limit in measured steps
Kibana 401 | Kibana tool fails authentication | Wrong Kibana user | curl -u user:pass https://kibana/api/status -H 'kbn-xsrf:true' | Use a valid read-only Kibana account
Kibana status unavailable | Status degraded/down | Kibana/ES backend issue | curl https://kibana/api/status | Check the Kibana -> Elasticsearch connection
Logstash monitoring API down | Logstash tool returns error 502 | Port 9600 down or blocked by firewall | curl http://logstash:9600/_node/stats | Enable the monitoring API / fix the network
High Filebeat ingestion delay | delayed_hosts increasing | Agent disconnected / backpressure | GET filebeat-*/_search | Check the Beat output, the network, and the Logstash queue
Cluster yellow | Status yellow | Replicas not yet allocated | GET /_cluster/health + GET /_cat/shards?v | Add nodes/disk, check allocation rules
Cluster red | Status red | Primary shard unassigned | GET /_cluster/allocation/explain | Prioritize recovery of the primary shards
Shard unassigned | unassigned_shards > 0 | Disk watermark / node down / allocation filtering | GET /_cluster/allocation/explain | Free up disk, repair the node, check awareness settings
Disk watermark exceeded | Shards cannot be allocated | Disk usage above the watermark | GET /_cat/allocation?v | Add capacity, run ILM cleanup, rebalance
High JVM heap | Heap > 80% | Heavy queries/aggregations, too many shards | GET /_nodes/stats/jvm | Optimize queries, reduce shard count, tune the heap
Logstash pipeline stuck | Events in rising, events out stagnant | Output blocked / queue full | GET http://logstash:9600/_node/stats | Check the output plugin, increase workers/queue size safely
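
For the "response too large" row, the following sketch shows one way a size limit could be enforced before a payload is returned. It is illustrative only: the function name, the exception class, and the 1 MiB value are assumptions; only the MAX_RESPONSE_BYTES name and the 413 status come from this README.

```python
import json
from typing import Any

MAX_RESPONSE_BYTES = 1_048_576  # assumed example value (1 MiB)


class ResponseTooLarge(Exception):
    """Would be mapped to HTTP 413 response_too_large by the API layer."""


def enforce_response_limit(payload: dict[str, Any], max_bytes: int = MAX_RESPONSE_BYTES) -> bytes:
    """Serialize payload compactly and fail if it exceeds max_bytes."""
    encoded = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    if len(encoded) > max_bytes:
        raise ResponseTooLarge(f"response is {len(encoded)} bytes, limit is {max_bytes}")
    return encoded
```

Measuring the encoded bytes rather than the number of hits catches the case where a handful of very long log messages still produce an oversized response.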

14) Production Checklist

  • [ ] The Elasticsearch user is read-only.
  • [ ] TLS certificates are valid and verification is enabled.
  • [ ] API keys are rotated regularly.
  • [ ] Per-tool RBAC is enabled.
  • [ ] Audit logging (JSON) is enabled.
  • [ ] Prometheus scrapes /metrics.
  • [ ] A Grafana dashboard is available.
  • [ ] NetworkPolicy is enabled.
  • [ ] Resource requests/limits are set.
  • [ ] HPA is enabled.
  • [ ] Secrets do not appear in logs.
  • [ ] Dangerous queries are rejected.
  • [ ] Config/manifest backups are available.
  • [ ] CI/CD security scanning is enabled.

15) Example Bamboo Pipeline (CI/CD)

---
version: 2
plan:
  project-key: MCP
  key: ELK
  name: mcpserver-elk

stages:
  - Build & Test:
      jobs:
        - lint-test-build

jobs:
  - lint-test-build:
      docker:
        image: python:3.12-slim
      tasks:
        - script: |
            python -m pip install --upgrade pip
            pip install -r requirements.txt
            ruff check .
            mypy app
            pytest -q
        - script: |
            docker build -t new-nexus.bri.co.id/mcp/dev/mcpserver-elk:1.0.0 .
        - script: |
            trivy image --exit-code 1 new-nexus.bri.co.id/mcp/dev/mcpserver-elk:1.0.0
        - script: |
            echo "$NEXUS_PASS" | docker login new-nexus.bri.co.id -u "$NEXUS_USER" --password-stdin
            docker push new-nexus.bri.co.id/mcp/dev/mcpserver-elk:1.0.0
        - script: |
            kubectl apply -f k8s/
            kubectl rollout status deploy/mcpserver-elk -n mcpserver-elk
        - script: |
            curl -fsS https://mcp-elk.example.com/healthz
            curl -fsS https://mcp-elk.example.com/readyz

16) Enterprise Notes

  • Use HTTPS end to end (Ingress TLS + upstream TLS).
  • Store secrets in a secret manager (Vault/KMS/ExternalSecret), not as plaintext in the repo.
  • Use image signing and an SBOM for compliance.
  • Make sure the Elasticsearch user has a read-only role (monitor, read, no write/manage).
