feat(kubescape): route runtime-detection alerts to Headlamp, Slack, and Coroot#2445
devantler wants to merge 2 commits into
…nd Coroot
The Headlamp Kubescape plugin's "Runtime Detection > Alerts" tab warned
"Alertmanager URL is not configured" because that tab reads ONLY from a
Prometheus Alertmanager (GET /api/v2/alerts), and the Coroot migration removed
Alertmanager from the cluster — so there was no source and the node-agent
exported its runtime alerts nowhere.
Reintroduce a single minimal Alertmanager (prometheus-community chart 1.40.1,
~10m/32Mi, emptyDir, hardened securityContext) scoped to the kubescape namespace,
prod-only — NOT a re-adoption of the Prometheus stack. Wire the node-agent to fan
each alert out to all three destinations:
* Headlamp — nodeAgent.config.alertManagerExporterUrls -> the Alertmanager,
which the plugin queries. (One manual per-user step remains: set
"kubescape/alertmanager:9093" in the plugin settings; the address is
browser-local, not declaratively seedable — headlamp#3979.)
* Slack — the Alertmanager slack_configs receiver -> the shared
${alertmanager_webhook_url} incoming-webhook (same channel as Coroot/Flux).
* Coroot — nodeAgent.config.stdoutExporter (default) -> Coroot's eBPF log
capture surfaces the alert in its Logs view (Coroot CE has no alert receiver).
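The node-agent wiring in the list above might look roughly like the following values sketch. The two keys under `nodeAgent.config` are the ones named in this commit message; the HelmRelease name, the `apiVersion`, and the Service DNS name are assumptions for illustration and may not match the actual patch.

```yaml
# Hypothetical sketch of a Flux HelmRelease patch for the Kubescape chart.
# metadata.name and the Alertmanager address are assumed, not taken from the PR.
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: kubescape            # assumed release name
  namespace: kubescape
spec:
  values:
    nodeAgent:
      config:
        # Headlamp and Slack path: push each runtime alert to the Alertmanager.
        alertManagerExporterUrls:
          - alertmanager.kubescape.svc.cluster.local:9093   # assumed Service DNS name
        # Coroot path: keep writing alerts to stdout (described above as the default)
        # so that log capture can pick them up.
        stdoutExporter: true
```

Keeping both exporters enabled is what gives the fan-out: the Alertmanager URL feeds Headlamp and Slack, while stdout feeds Coroot's Logs view.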
Adds a CiliumNetworkPolicy allowing the Headlamp API-server Service-proxy to
reach :9093 and the Alertmanager to reach hooks.slack.com; documents the design
and the manual Headlamp step in docs/dr/alerting.md.
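As a rough illustration of the network policy described above, a CiliumNetworkPolicy along these lines would admit the API-server Service-proxy on :9093 and let the Alertmanager reach hooks.slack.com. The policy name, the pod label, and the choice of the `kube-apiserver` entity are assumptions; the policy in the PR may select traffic differently.

```yaml
# Hypothetical sketch; name, labels, and entity selection are assumed.
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-alertmanager          # assumed policy name
  namespace: kubescape
spec:
  endpointSelector:
    matchLabels:
      app.kubernetes.io/name: alertmanager   # assumed chart label
  ingress:
    # Headlamp reads alerts through the API-server Service-proxy.
    - fromEntities:
        - kube-apiserver
      toPorts:
        - ports:
            - port: "9093"
              protocol: TCP
  egress:
    # Slack incoming-webhook delivery over HTTPS.
    - toFQDNs:
        - matchName: hooks.slack.com
      toPorts:
        - ports:
            - port: "443"
              protocol: TCP
```

An FQDN rule like this only works if DNS egress for the pod is allowed and visible to Cilium's DNS proxy, which this sketch assumes is handled by another policy in the namespace.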
Validated: ksail --config ksail.prod.yaml workload validate (485 files),
kustomize build of the hetzner controllers overlay, and the naming CI check.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
No actionable comments were generated in the recent review. 🎉
📝 Walkthrough
This PR adds a Kubescape-scoped Alertmanager deployment, network policy, node-agent exporter wiring, and documentation for runtime-detection alert fan-out to Slack, Coroot, and Headlamp.
Changes
Kubescape Alertmanager deployment and wiring
Estimated code review effort: 3 (Moderate) | ~25 minutes
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
@coderabbitai review
✅ Action performed
Review finished.
🧹 Nitpick comments (2)
k8s/providers/hetzner/infrastructure/controllers/alertmanager/secret.yaml (1)
17-18: 🧹 Nitpick | 🔵 Trivial
Verify prod substitution and consider surfacing delivery failures.
Syntax and key-to-path coupling verified correct. One operational note: if `alertmanager_webhook_url` is ever missing/renamed in the prod variables Secret, this silently falls back to the `.invalid` placeholder rather than failing reconciliation, so Slack delivery would quietly break. Alertmanager exposes `alertmanager_notifications_failed_total`; consider ensuring it's scraped/alerted on (e.g., via Coroot) so a bad substitution doesn't go unnoticed.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@k8s/providers/hetzner/infrastructure/controllers/alertmanager/secret.yaml` around lines 17 - 18, The Alertmanager secret’s `${alertmanager_webhook_url:=...}` fallback can hide a missing or renamed prod variable by silently using the placeholder URL, so check the `slack-webhook-url` substitution path in the secret generation flow and make it fail or surface an obvious configuration error when the variable is absent. Also ensure `alertmanager_notifications_failed_total` is being scraped and alerted on (for example through Coroot) so broken Slack delivery is detected quickly.
k8s/providers/hetzner/infrastructure/controllers/alertmanager/helm-release.yaml (1)
37-86: 🔒 Security & Privacy | 🔵 Trivial | ⚡ Quick win
Consider disabling the default-mounted service account token.
The pod's securityContext is hardened extensively (drop ALL, readOnlyRootFilesystem, non-root, seccomp), but `automountServiceAccountToken` is left at the chart's default (true), even though this Alertmanager instance has no need to call the Kubernetes API.
🔒 Suggested addition
  fullnameOverride: alertmanager
  replicaCount: 1
+ automountServiceAccountToken: false
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@k8s/providers/hetzner/infrastructure/controllers/alertmanager/helm-release.yaml` around lines 37 - 86, Disable the default-mounted service account token in the Alertmanager Helm values by setting automountServiceAccountToken to false alongside the existing podSecurityContext and securityContext hardening in the alertmanager Helm release values. This Alertmanager instance does not need Kubernetes API access, so add the setting in the same values block that defines fullnameOverride, persistence, and extraSecretMounts to keep the pod least-privileged.
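To make the two nitpicks above concrete, here is a minimal sketch of Alertmanager chart values that combines the hardening, the token opt-out, and a Slack receiver reading its webhook from a mounted Secret. The value keys `fullnameOverride`, `replicaCount`, `automountServiceAccountToken`, `persistence`, `extraSecretMounts`, `podSecurityContext`, and `securityContext` are the ones the review mentions; the Secret name, mount path, user IDs, and receiver name are assumptions and not taken from the PR.

```yaml
# Hypothetical values sketch for the prometheus-community/alertmanager chart.
fullnameOverride: alertmanager
replicaCount: 1
automountServiceAccountToken: false   # the pod never calls the Kubernetes API
persistence:
  enabled: false                      # falls back to an emptyDir, per the review
resources:
  requests:
    cpu: 10m
    memory: 32Mi
podSecurityContext:
  runAsNonRoot: true
  runAsUser: 65534                    # assumed UID
  fsGroup: 65534                      # assumed GID
  seccompProfile:
    type: RuntimeDefault
securityContext:
  allowPrivilegeEscalation: false
  readOnlyRootFilesystem: true
  capabilities:
    drop:
      - ALL
extraSecretMounts:
  - name: slack-webhook               # assumed mount name
    mountPath: /etc/alertmanager/secrets   # assumed path
    secretName: alertmanager-slack    # assumed Secret name
    readOnly: true
config:
  route:
    receiver: slack
  receivers:
    - name: slack
      slack_configs:
        # The file name must match the Secret key (slack-webhook-url),
        # which is the key-to-path coupling the review refers to.
        - api_url_file: /etc/alertmanager/secrets/slack-webhook-url
          send_resolved: true
```

Reading the webhook through `api_url_file` keeps the URL out of the rendered Alertmanager config, at the cost of the silent-fallback risk noted in the first nitpick if the substituted value is a placeholder.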
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 5261491e-8f47-45cc-badd-4a6d7130a7f0
📒 Files selected for processing (8)
docs/dr/alerting.md
k8s/providers/hetzner/infrastructure/controllers/alertmanager/cilium-network-policy.yaml
k8s/providers/hetzner/infrastructure/controllers/alertmanager/helm-release.yaml
k8s/providers/hetzner/infrastructure/controllers/alertmanager/helm-repository.yaml
k8s/providers/hetzner/infrastructure/controllers/alertmanager/kustomization.yaml
k8s/providers/hetzner/infrastructure/controllers/alertmanager/secret.yaml
k8s/providers/hetzner/infrastructure/controllers/kubescape/patches/helm-release-patch.yaml
k8s/providers/hetzner/infrastructure/controllers/kustomization.yaml
📜 Review details
🧰 Additional context used
📓 Path-based instructions (2)
**/*.{yaml,yml}
📄 CodeRabbit inference engine (AGENTS.md)
**/*.{yaml,yml}: Use Kustomize overlays rather than editing base resources directly; `k8s/bases/` is immutable from overlays and changes should be made with `patches:` in provider or cluster overlays.
Keep manifest changes small and use YAML/schema validation before submitting a manifest PR; for files with cluster context, prefer `ksail workload validate` / `kubectl kustomize` / `kubectl apply --dry-run=client` as appropriate.
Files:
k8s/providers/hetzner/infrastructure/controllers/alertmanager/kustomization.yaml
k8s/providers/hetzner/infrastructure/controllers/alertmanager/helm-repository.yaml
k8s/providers/hetzner/infrastructure/controllers/alertmanager/secret.yaml
k8s/providers/hetzner/infrastructure/controllers/kubescape/patches/helm-release-patch.yaml
k8s/providers/hetzner/infrastructure/controllers/kustomization.yaml
k8s/providers/hetzner/infrastructure/controllers/alertmanager/cilium-network-policy.yaml
k8s/providers/hetzner/infrastructure/controllers/alertmanager/helm-release.yaml
k8s/**
📄 CodeRabbit inference engine (AGENTS.md)
k8s/**: Respect Flux dependency order: `bootstrap` → `infrastructure-controllers` → `infrastructure` → `apps`, with the prod-only `infrastructure-overprovisioning` layer hanging off `infrastructure` without gating `apps`.
Follow the hierarchical Kustomization flow: base configurations in `k8s/bases/` feed provider overlays in `k8s/providers/`, which feed cluster overlays in `k8s/clusters/`.
Files:
k8s/providers/hetzner/infrastructure/controllers/alertmanager/kustomization.yaml
k8s/providers/hetzner/infrastructure/controllers/alertmanager/helm-repository.yaml
k8s/providers/hetzner/infrastructure/controllers/alertmanager/secret.yaml
k8s/providers/hetzner/infrastructure/controllers/kubescape/patches/helm-release-patch.yaml
k8s/providers/hetzner/infrastructure/controllers/kustomization.yaml
k8s/providers/hetzner/infrastructure/controllers/alertmanager/cilium-network-policy.yaml
k8s/providers/hetzner/infrastructure/controllers/alertmanager/helm-release.yaml
🧠 Learnings (1)
📚 Learning: 2026-07-01T21:13:36.950Z
Learnt from: devantler
Repo: devantler-tech/platform PR: 2359
File: k8s/bases/apps/actual-budget/helm-release.yaml:62-111
Timestamp: 2026-07-01T21:13:36.950Z
Learning: When reviewing Kustomize/Helm YAML in this repo, keep the base vs provider overlay split: `k8s/bases/apps/**` and `k8s/bases/infrastructure/**` should contain each app’s full, environment-agnostic configuration (including base-level postRenderer Kustomize patches such as deployment strategy, topology spread, probes, and env injection). `k8s/providers/{docker,hetzner}/**` should only add small provider-specific deltas (e.g., `interval`, `persistence.size`) via patch files (like `k8s/providers/<provider>/apps/<app>/patches/helm-release-patch.yaml`). If configuration is identical across providers (e.g., OIDC/OAuth env vars where `${domain}` is resolved per cluster via envsubst), it belongs in the base and must not be duplicated into provider overlays.
Applied to files:
k8s/providers/hetzner/infrastructure/controllers/alertmanager/kustomization.yaml
k8s/providers/hetzner/infrastructure/controllers/alertmanager/helm-repository.yaml
k8s/providers/hetzner/infrastructure/controllers/alertmanager/secret.yaml
k8s/providers/hetzner/infrastructure/controllers/kubescape/patches/helm-release-patch.yaml
k8s/providers/hetzner/infrastructure/controllers/kustomization.yaml
k8s/providers/hetzner/infrastructure/controllers/alertmanager/cilium-network-policy.yaml
k8s/providers/hetzner/infrastructure/controllers/alertmanager/helm-release.yaml
🪛 markdownlint-cli2 (0.22.1)
docs/dr/alerting.md
[warning] 122-122: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🔇 Additional comments (8)
docs/dr/alerting.md (2)
122-124: 📐 Maintainability & Code Quality | ⚡ Quick win
Add a language tag to the example fence.
Line 122 trips MD040. Mark the block as `text` (or `console`) so docs lint cleanly.
Proposed fix
-```
+```text
 kubescape/alertmanager:9093
Source: Linters/SAST tools
126-128: 🎯 Functional Correctness | ⚡ Quick win
Verify the proxy-RBAC note.
This section says the plugin reads via the API-server service proxy, but the `get/create` permission claim is specific enough that it should be confirmed against the actual RBAC rule before publishing. If the binding only grants `get` on `services/proxy`, this will mislead operators.
k8s/providers/hetzner/infrastructure/controllers/alertmanager/helm-repository.yaml (1)
1-10: LGTM!
k8s/providers/hetzner/infrastructure/controllers/alertmanager/helm-release.yaml (1)
1-118: LGTM! Chart version, `extraSecretMounts` field, and emptyDir-on-disabled-persistence behavior all verified against the upstream `prometheus-community/alertmanager` chart.
k8s/providers/hetzner/infrastructure/controllers/alertmanager/kustomization.yaml (1)
1-9: LGTM!
k8s/providers/hetzner/infrastructure/controllers/alertmanager/cilium-network-policy.yaml (1)
18-31: 🩺 Stability & Availability
Cross-file dependency is already covered
`allow-kubescape` already allows intra-namespace traffic and DNS egress for every `kubescape` pod, so Alertmanager doesn’t need additional rules for the node-agent path or `hooks.slack.com` resolution.
> Likely an incorrect or invalid review comment.
k8s/providers/hetzner/infrastructure/controllers/kubescape/patches/helm-release-patch.yaml (1)
1-34: LGTM!
k8s/providers/hetzner/infrastructure/controllers/kustomization.yaml (1)
11-17: LGTM!
Also applies to: 80-83
…anager
It never calls the Kubernetes API; chart default is true.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Review-body nitpick resolution record (no threads exist for these):
Resolution record for the CodeRabbit review-body nitpicks (2026-07-04 12:27Z review — no inline threads exist for these):
Why
The Headlamp Kubescape plugin's Runtime Detection → Alerts page shows "Alertmanager URL is not configured", and Kubescape's runtime-detection alerts (rule violations, malware) were flowing nowhere. That tab reads only from a Prometheus Alertmanager, which the Coroot migration removed from the cluster — so there was no source to point it at.
What
Reintroduces a single, tiny, Kubescape-scoped Alertmanager (prod-only; not a return of the old Prometheus stack) and points the node-agent at it, so runtime alerts now reach all three intended places: the Headlamp plugin, Slack (the existing shared webhook), and Coroot (via the node-agent's stdout, which Coroot's log capture surfaces).
Operational notes
One manual per-user step remains: set `kubescape/alertmanager:9093` in the Headlamp plugin settings. Until then the data source exists but the tab stays empty. Documented in `docs/dr/alerting.md`.