feat(kubescape): route runtime threat alerts off stdout → Coroot-native PromQL alert → Slack #2449

Description

@devantler

🤖 Generated by the Daily AI Assistant

Part of #2447.

Problem

Kubescape runtime threat detection is enabled and working (node-agent healthy, 288 ApplicationProfiles learned) but its alerts go nowhere durable. The node-agent config.json shows stdoutExporter: true with alertManagerExporterUrls: [], prometheusExporterEnabled: false, syslogExporterURL: "", httpExporterConfig: null, and malwareDetectionEnabled: false. So every runtime alert is written to stdout and vanishes — no route to Coroot, Slack, or the daily engineer. There are 0 RuntimeRuleAlertBindings and 0 alerts routed anywhere in 24h.
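
The "nowhere durable" finding can be checked mechanically. The following is a minimal Python sketch, assuming the exporter settings sit in one flat JSON object as listed above; the real config.json may nest them differently, and the helper name `durable_exporters` is hypothetical.

```python
import json

# Exporter settings named in this issue. Treating them as top-level keys is an
# assumption; adjust the lookup if the live config.json nests them.
DURABLE_EXPORTER_KEYS = (
    "alertManagerExporterUrls",
    "prometheusExporterEnabled",
    "syslogExporterURL",
    "httpExporterConfig",
)


def durable_exporters(config: dict) -> list[str]:
    """Return the exporter settings that are switched on (truthy)."""
    return [key for key in DURABLE_EXPORTER_KEYS if config.get(key)]


# Values copied from the Problem statement above.
current = json.loads("""
{
  "stdoutExporter": true,
  "alertManagerExporterUrls": [],
  "prometheusExporterEnabled": false,
  "syslogExporterURL": "",
  "httpExporterConfig": null,
  "malwareDetectionEnabled": false
}
""")

print(durable_exporters(current))  # prints []
```

An empty list means stdout is the only sink. After the proposed change, the same check on the rendered config should list prometheusExporterEnabled.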

Proposed direction (Coroot-native, minimal-custom — settled with the maintainer)

All declarative, no new infrastructure, both ends native to the existing stack:

  1. kubescape HelmRelease (k8s/bases/infrastructure/controllers/kubescape/helm-release.yaml): enable the node-agent Prometheus exporter (nodeAgent.config.prometheusExporterEnabled: true), exposing node_agent_alert_counter{rule_id,…} on :8080/metrics.
  2. Expose it to Coroot via the pod annotations Coroot's cluster-agent scrapes (coroot.com/scrape-metrics: "true", coroot.com/metrics-port: "8080") — Coroot uses annotation-based service discovery, not ServiceMonitor.
  3. Coroot CR (.../coroot/patches/coroot-patch.yaml): add a custom PromQL alertingRules[] entry on increase(node_agent_alert_counter[…]) > 0, routed to Slack via the existing notificationIntegrations webhook (scoped so it doesn't reopen the muted-alerts fatigue).
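
To make the alert condition in step 3 concrete, here is a simplified Python model of what an `increase(...) > 0` rule reacts to. It is an illustration only, not Prometheus' or Coroot's implementation: the real increase() also extrapolates to the window edges, and the function names below are hypothetical.

```python
def simple_increase(samples: list[float]) -> float:
    """Approximate a counter's growth over a window of scraped samples.

    Sums the deltas between consecutive samples. A drop is treated as a
    counter reset (for example a node-agent pod restart), so the new value
    is counted in full.
    """
    total = 0.0
    for prev, curr in zip(samples, samples[1:]):
        total += curr - prev if curr >= prev else curr
    return total


def should_alert(samples: list[float]) -> bool:
    """Mirror the proposed rule: fire when the counter grew within the window."""
    return simple_increase(samples) > 0


print(should_alert([3, 3, 3]))  # flat counter, no new detections: False
print(should_alert([3, 3, 5]))  # two new detections in the window: True
print(should_alert([5, 1]))     # reset, then one detection: True
```

One consequence worth checking during live validation: with fewer than two samples in the window there is nothing to compare, so the first detection on a freshly created series may not trigger the rule.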

Trade-off (accepted): the Prometheus exporter emits counters (rule_id + pod + namespace), so the Slack message is "kubescape rule X fired on pod Y — click through"; the full incident payload stays in the node-agent logs Coroot already ingests. The non-native alternative (a dedicated Alertmanager for the richer AlertManager-exporter payload) is deliberately rejected.

Validate live before committing, since the docs do not settle either point: (a) that custom series scraped by the cluster-agent are queryable in Coroot PromQL alert rules; (b) the exact Coroot CR notificationIntegrations / alertingRules field names against the live CRD (kubectl explain coroot.spec…). As a follow-up, consider whether to also enable malwareDetectionEnabled (resource cost vs. coverage).

Rough size

M.

Acceptance criteria

  • Runtime detections surface as a Coroot alert and reach Slack (counter-level, with click-through), fully declarative in Git.
  • No stdout-only dead end; no new standalone component.
  • The rule is scoped so it does not reintroduce Slack alert fatigue.

Metadata

    Projects

    Status
    ✅ Done

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions