
Replace status apps with Prometheus monitoring #266

Merged: edgard merged 5 commits into master from codex-monitoring-stack on Jun 30, 2026

edgard (Owner) commented Jun 29, 2026

Summary

  • Add kube-prometheus-stack with Prometheus, Grafana, Alertmanager, default dashboards, persistent storage, and actionable alert rules.
  • Add prometheus-blackbox-exporter with repo-managed Probe resources for routed HTTP services, DNS resolvers, and basic external connectivity.
  • Remove Gatus/Homepage apps, annotations, credentials, and policy requirements while keeping hostname/Gateway validation.
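For readers unfamiliar with the Probe custom resource, here is a minimal sketch of what a repo-managed HTTP Probe for routed services can look like. It is written as a Python generator so it stays self-contained; it is not the manifest from this PR. The prober address, namespace, interval, and the http_2xx module name are assumptions (http_2xx is the module name commonly shipped in the blackbox exporter chart's default config). The apiVersion, kind, and spec fields (jobName, interval, module, prober.url, targets.staticConfig.static) follow the Prometheus Operator Probe CRD.

```python
import json

# Hypothetical in-cluster address of the blackbox exporter Service; the real
# name and namespace depend on how the chart is installed.
PROBER_URL = "prometheus-blackbox-exporter.monitoring.svc:9115"


def http_probe(name: str, hostnames: list[str], module: str = "http_2xx") -> dict:
    """Build a Prometheus Operator Probe that checks each routed hostname over HTTPS."""
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "Probe",
        "metadata": {"name": name, "namespace": "monitoring"},  # namespace is an assumption
        "spec": {
            "jobName": name,
            "interval": "60s",  # assumed interval
            "module": module,
            "prober": {"url": PROBER_URL},
            "targets": {
                "staticConfig": {
                    # Sorted so the generated manifest is stable across runs.
                    "static": [f"https://{host}" for host in sorted(hostnames)],
                }
            },
        },
    }


if __name__ == "__main__":
    # grafana.edgard.org is the one routed hostname named in this PR.
    print(json.dumps(http_probe("routed-http", ["grafana.edgard.org"]), indent=2))
```

Because kubectl accepts JSON manifests, the output can be piped to `kubectl apply -f -` for a quick experiment, although in this repo the Probe resources are managed through GitOps.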

Test Plan

  • task fmt:check
  • task lint:kubernetes
  • task lint:static
  • task lint
  • helm template kube-prometheus-stack 87.3.0 with repo values
  • helm template prometheus-blackbox-exporter 11.15.0 with repo values
  • stale reference scan for gatus/homepage/status/dash annotations
  • routed target coverage check for blackbox Probe resources
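The PR does not show how the routed target coverage check was implemented. One way to express it is as a set difference between the hostnames declared on Gateway API HTTPRoute objects and the hosts listed in Probe static targets. The sketch below assumes the manifests have already been parsed into dicts (for example with a YAML loader) and that every probed host appears under spec.targets.staticConfig.static; the function names are hypothetical.

```python
from urllib.parse import urlparse


def routed_hostnames(routes: list[dict]) -> set[str]:
    """Collect spec.hostnames from Gateway API HTTPRoute objects."""
    hosts: set[str] = set()
    for route in routes:
        if route.get("kind") == "HTTPRoute":
            hosts.update(route.get("spec", {}).get("hostnames", []))
    return hosts


def probed_hostnames(probes: list[dict]) -> set[str]:
    """Collect hostnames from Probe static targets, which may be URLs or host:port."""
    hosts: set[str] = set()
    for probe in probes:
        static = (
            probe.get("spec", {})
            .get("targets", {})
            .get("staticConfig", {})
            .get("static", [])
        )
        for target in static:
            # urlparse only fills in .hostname when a netloc is present,
            # so bare "host:port" targets get a leading "//".
            parsed = urlparse(target if "://" in target else f"//{target}")
            if parsed.hostname:
                hosts.add(parsed.hostname)
    return hosts


def unprobed_routes(routes: list[dict], probes: list[dict]) -> set[str]:
    """Return routed hostnames that no Probe checks."""
    return routed_hostnames(routes) - probed_hostnames(probes)
```

A non-empty result from `unprobed_routes` would fail the check and name the routes that still need a Probe target.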

Manual Verification After Merge

  • Sync kube-prometheus-stack and prometheus-blackbox-exporter in Argo CD.
  • Confirm Grafana loads at https://grafana.edgard.org.
  • Confirm Prometheus targets include kube-state-metrics, node-exporter, blackbox exporter, cert-manager, and Argo CD metrics.
  • Confirm Alertmanager sends a Telegram test alert.
  • Confirm old dash.edgard.org and status.edgard.org routes are gone or unused.
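The Prometheus targets check can also be scripted against the Prometheus HTTP API, whose /api/v1/targets endpoint returns active targets with their labels and health. This is a minimal sketch, not part of the PR: the job-name substrings are assumptions, since the actual job labels depend on each chart's ServiceMonitor and Probe resources.

```python
import json
from urllib.request import urlopen

# Substrings expected in the job labels of healthy targets. These are
# assumptions; adjust them to the job names the charts actually produce.
EXPECTED_JOBS = ("kube-state-metrics", "node-exporter", "blackbox", "cert-manager", "argocd")


def missing_jobs(targets_response: dict, expected=EXPECTED_JOBS) -> list[str]:
    """Return expected job substrings that match no active target with health 'up'."""
    healthy = [
        target.get("labels", {}).get("job", "")
        for target in targets_response.get("data", {}).get("activeTargets", [])
        if target.get("health") == "up"
    ]
    return [want for want in expected if not any(want in job for job in healthy)]


def fetch_targets(base_url: str) -> dict:
    """Query the Prometheus HTTP API for active scrape targets."""
    with urlopen(f"{base_url}/api/v1/targets?state=active") as resp:
        return json.load(resp)
```

With a `kubectl port-forward` to the Prometheus service on port 9090, `missing_jobs(fetch_targets("http://localhost:9090"))` should return an empty list once all expected targets are up.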

edgard added 2 commits June 30, 2026 10:55
Use kube-prometheus-stack, Grafana, Alertmanager, and blackbox exporter probes as the compact monitoring surface for the homelab. This removes Gatus/Homepage-specific route metadata and policy requirements while keeping actionable Telegram alerting and routed-service checks in GitOps.
Keep the monitoring replacement diff focused by restoring the existing blank-line grouping in app values files after removing the Gatus and Homepage annotations.
edgard force-pushed the codex-monitoring-stack branch from d2346cf to 3c22d10 June 30, 2026 08:57
edgard added 3 commits June 30, 2026 11:27
Let cert-manager and Argo CD own their native metrics ServiceMonitor resources instead of defining those scrapes from kube-prometheus-stack. This keeps kube-prometheus-stack focused on selecting monitoring CRs while app charts expose their own metrics integration.
Add a Grafana sidecar dashboard for blackbox probe health, failures, latency, and HTTP status codes so the new routed, DNS, and connectivity checks have a first-class visual surface.
Use a boolean comparison for the failing-probe stat so each failed probe counts as one, instead of the stat summing zero-valued series. Also remove a table column for a probe label that Prometheus Operator Probe targets do not reliably produce.
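The boolean-comparison fix comes down to PromQL comparison semantics. The commit does not show the dashboard query, so this assumes a stat built on the blackbox exporter's probe_success metric (1 for pass, 0 for fail). A plain comparison such as `probe_success == 0` filters series and keeps their original value of 0, so summing the result is 0 no matter how many probes fail. Adding the `bool` modifier (`probe_success == bool 0`) keeps every series and replaces each value with 1 or 0, so the sum counts failures. The Python sketch below models both behaviors on a hypothetical set of probe results.

```python
def filter_eq(series: dict[str, float], value: float) -> dict[str, float]:
    """Model PromQL `metric == value`: keep matching series with their original values."""
    return {name: v for name, v in series.items() if v == value}


def bool_eq(series: dict[str, float], value: float) -> dict[str, float]:
    """Model PromQL `metric == bool value`: keep every series, valued 1 or 0."""
    return {name: 1.0 if v == value else 0.0 for name, v in series.items()}


def promql_sum(series: dict[str, float]) -> float:
    # Simplification: real PromQL yields an empty result, not 0, for an empty vector.
    return sum(series.values())


# Hypothetical probe results: probe_success is 1 when a probe passes and 0 when it fails.
probe_success = {"https://a.example": 1.0, "https://b.example": 0.0, "https://c.example": 0.0}

print(promql_sum(filter_eq(probe_success, 0)))  # 0.0: failures survive the filter but each is worth 0
print(promql_sum(bool_eq(probe_success, 0)))    # 2.0: each failure counts as 1
```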
edgard merged commit 13ca215 into master Jun 30, 2026
6 checks passed
edgard deleted the codex-monitoring-stack branch June 30, 2026 09:55