fargocd: drop dead webhook port; fargocd-manager: add metrics + probes #485

Merged
tamalsaha merged 3 commits into master from charts-port-surface on Jun 3, 2026
@tamalsaha tamalsaha commented Jun 3, 2026

Summary

Two related chart changes that align the deployed port surface with what the binaries actually serve.

charts/fargocd — drop dead webhook port

Removes containerPort: 9443 from the deployment and the https 443 → 9443 entry from the service. The operator constructs a controller-runtime webhook.NewServer(...) but never registers any handlers (SetupWebhookWithManager is not called), so the TLS endpoint had nothing listening behind it. Metrics (8443) and the in-pod probes port (8081) are unchanged.
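
As a rough illustration of the resulting port surface, the operator container's `ports` list might look like the sketch below. The port names `metrics` and `probes` are assumptions made for this example; only the numbers 8443, 8081, and 9443 come from the description above.

```yaml
# Hypothetical excerpt of the fargocd operator container spec after this change.
# Port names are assumed; the port numbers are taken from the PR description.
ports:
  - name: metrics        # unchanged
    containerPort: 8443
    protocol: TCP
  - name: probes         # unchanged, in-pod health probes
    containerPort: 8081
    protocol: TCP
  # Removed: the "https" containerPort 9443. No webhook handlers were ever
  # registered, so nothing served requests behind it.
```

The matching `https` 443 → 9443 entry disappears from the Service in the same way, so the Service no longer advertises a port that has no backend.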

charts/fargocd-manager — add metrics surface and real probes

The hub-side OCM AddOn manager pod previously declared no ports, ran no probes, and had no Service. The addon-framework's genericapiserver already listens on :8443 HTTPS — this PR wires it up:

  • containerPort: 8443 named metrics
  • Liveness + readiness httpGet /healthz against port: metrics with scheme: HTTPS (kubelet probes skip TLS verification, so the addon-framework's runtime self-signed cert is fine)
  • ClusterIP Service exposing metrics 8443 → metrics, plus the prometheus.io/builtin annotations
  • Optional ServiceMonitor + SA-token Secret, gated on monitoring.agent == prometheus.io/operator. Uses insecureSkipVerify: true because the addon-framework's serving cert SAN is localhost.
  • monitoring.{agent, serviceMonitor.labels} values + helper templates mirroring the fargocd chart
  • README regen reflects the new values keys
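
The first three bullets could render to something like the following sketch. Resource names, selector labels, and the exact annotation keys are assumptions for illustration (the annotations shown are the conventional `prometheus.io/*` scrape hints, not necessarily what the chart emits); the port name, port number, probe path, and scheme come from the list above.

```yaml
# Hypothetical excerpt of the fargocd-manager container spec.
ports:
  - name: metrics
    containerPort: 8443
    protocol: TCP
livenessProbe:
  httpGet:
    path: /healthz
    port: metrics      # refers to the named containerPort above
    scheme: HTTPS      # kubelet does not verify the serving certificate
readinessProbe:
  httpGet:
    path: /healthz
    port: metrics
    scheme: HTTPS
---
# Hypothetical ClusterIP Service in front of the metrics port.
apiVersion: v1
kind: Service
metadata:
  name: fargocd-manager                 # assumed name
  annotations:                          # assumed keys for the builtin agent
    prometheus.io/scrape: "true"
    prometheus.io/path: /metrics
    prometheus.io/port: "8443"
    prometheus.io/scheme: https
spec:
  type: ClusterIP
  selector:
    app.kubernetes.io/name: fargocd-manager   # assumed label
  ports:
    - name: metrics
      port: 8443
      targetPort: metrics
      protocol: TCP
```

Pointing both probes at the named port rather than a literal number keeps the probes and the Service in step if the port number ever changes.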

Pairs with kubeops/fargocd#27 which wires real workqueue + reflector metrics into the manager's /metrics. The ServiceMonitor will scrape successfully without that PR too — it'll just see Go runtime + process metrics until a fargocd image carrying #27 lands.
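
For orientation, a ServiceMonitor of the kind described here might look roughly like this. It is a sketch under assumptions: the resource names, selector label, and the use of the `authorization` block to reference the SA-token Secret are illustrative choices, and the chart may wire the token differently. Only the `metrics` port, the HTTPS scheme, and `insecureSkipVerify: true` come from the description.

```yaml
# Hypothetical ServiceMonitor rendered when monitoring.agent == prometheus.io/operator.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: fargocd-manager                  # assumed name
  labels: {}                             # filled from monitoring.serviceMonitor.labels
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: fargocd-manager   # assumed label
  endpoints:
    - port: metrics
      path: /metrics
      scheme: https
      authorization:                     # assumed way of passing the SA token
        type: Bearer
        credentials:
          name: fargocd-manager-token    # assumed Secret name
          key: token
      tlsConfig:
        insecureSkipVerify: true         # serving cert SAN is localhost
---
# Hypothetical SA-token Secret; the control plane populates the "token" key.
apiVersion: v1
kind: Secret
metadata:
  name: fargocd-manager-token
  annotations:
    kubernetes.io/service-account.name: fargocd-manager   # assumed ServiceAccount
type: kubernetes.io/service-account-token
```

Because the endpoint serves Go runtime and process collectors regardless, a scrape against this target should succeed on any image; the workqueue and reflector series simply appear once the newer image is deployed.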

Caveat

charts/fargocd-manager/values.openapiv3_schema.yaml is auto-generated from apis/installer/v1alpha1/fargocd_manager_types.go via make manifests. The new monitoring block needs a Monitoring field added to FargocdManagerSpec (mirroring FargocdSpec) and a regen run before strict schema validation will accept it. Kept out of scope here to keep this PR chart-only.
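
The values block that the regenerated schema would need to accept has roughly this shape. The keys come from this PR; the values shown are examples only, and the chart's actual defaults are not stated here.

```yaml
# Sketch of the new fargocd-manager values; example values, not chart defaults.
monitoring:
  # prometheus.io/operator renders the ServiceMonitor and SA-token Secret;
  # other agents render only the Service with its annotations.
  agent: prometheus.io/operator
  serviceMonitor:
    # Extra labels so a Prometheus instance's serviceMonitorSelector can match.
    labels:
      release: kube-prometheus-stack   # hypothetical label
```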

Test plan

  • helm template charts/fargocd renders without the https / 9443 entries
  • helm template charts/fargocd-manager renders Service + Deployment with probes against port: metrics scheme: HTTPS (no ServiceMonitor)
  • helm template charts/fargocd-manager --set monitoring.agent=prometheus.io/operator renders the ServiceMonitor + SA-token Secret with insecureSkipVerify: true
  • Deploy to a hub cluster: confirm kubelet liveness + readiness probes against /healthz pass
  • With a kubeops/fargocd#27 image: confirm a Prometheus Operator scrapes workqueue_* and reflector_* series via the ServiceMonitor

fargocd chart -- drop the dead `https` (443 -> 9443) port from both the
deployment and the service. The operator constructs a controller-runtime
webhook server but never registers any handlers (`SetupWebhookWithManager`
is not called anywhere), so the TLS endpoint had nothing listening
behind it. Metrics (8443) and the in-pod probes port (8081) stay as is.

fargocd-manager chart -- declare matching port surface and a metrics
service:

  - containerPort 8081 (probes) and 8443 (metrics)
  - HTTP /healthz readiness + liveness probes against the probes port
    (paired with the new --health-probe-bind-address plumbing in the
    fargocd manager binary)
  - ClusterIP Service exposing metrics 8443 -> metrics, with the
    prometheus.io/builtin annotations matching the fargocd chart
  - Optional ServiceMonitor + SA-token Secret gated on
    monitoring.agent == prometheus.io/operator
  - monitoring.{agent,serviceMonitor.labels} values + helper templates
    mirroring the fargocd chart

The ServiceMonitor uses insecureSkipVerify because the addon-framework
controller's serving cert is generated at runtime with SAN=localhost.

Note: values.openapiv3_schema.yaml is auto-generated; the new
monitoring block will need a corresponding `Monitoring` field on
FargocdManagerSpec (in kubeops.dev/installer/apis) and a `make manifests`
run before strict schema validation will accept it.

Signed-off-by: Tamal Saha <tamal@appscode.com>
kodiakhq Bot previously approved these changes Jun 3, 2026
chart-doc-gen output picks up the new monitoring.agent and
monitoring.serviceMonitor.labels keys added to values.yaml.

Signed-off-by: Tamal Saha <tamal@appscode.com>
kodiakhq Bot previously approved these changes Jun 3, 2026
tamalsaha added a commit to kubeops/fargocd that referenced this pull request Jun 3, 2026
The addon-framework controller exposes Prometheus /metrics on its
:8443 HTTPS endpoint via genericapiserver, but the framework itself
registers no collectors. Without a workqueue metrics provider, the
endpoint only returns Go runtime + process collectors, which is
enough to confirm the pod is alive but says nothing about reconcile
backlog or throughput.

Blank-import k8s.io/component-base/metrics/prometheus/workqueue in
the manager package so its init() registers the prometheus provider
against client-go's workqueue (via workqueue.SetProvider) and adds
the standard workqueue_{depth,adds_total,queue_duration_seconds,
work_duration_seconds,retries_total,longest_running_processor_seconds,
unfinished_work_seconds} collectors to legacyregistry.

The ServiceMonitor shipped by the fargocd-manager chart in
kubeops/installer#485 picks these up automatically — no chart
change needed.

Signed-off-by: Tamal Saha <tamal@appscode.com>
tamalsaha added a commit to kubeops/fargocd that referenced this pull request Jun 3, 2026
The chart change in kubeops/installer#485 points liveness/readiness
probes at the addon-framework's existing HTTPS /healthz on :8443
(kubelet's httpGet skips TLS verification, so the runtime self-signed
cert is fine). With that, the dedicated plain-HTTP server on :8081
has no consumer -- drop the listener, the --health-probe-bind-address
flag, and the ProbeAddr option.

The workqueue and reflector metrics wired up in the previous commits
stay (they live on the same :8443 endpoint as /healthz).

Signed-off-by: Tamal Saha <tamal@appscode.com>
Drop the dedicated probes (8081) container port and point the
readiness + liveness httpGet at the metrics port (8443) with
scheme: HTTPS. The OCM addon-framework's genericapiserver already
serves /healthz there, and kubelet's httpGet probes skip TLS
verification so the runtime self-signed cert (SAN=localhost) is
not an issue.

Drops the dependency on the now-removed --health-probe-bind-address
plumbing in kubeops/fargocd#27.

Signed-off-by: Tamal Saha <tamal@appscode.com>
@tamalsaha tamalsaha changed the title from "fargocd,fargocd-manager: rationalize chart port surface" to "fargocd: drop dead webhook port; fargocd-manager: add metrics + probes" on Jun 3, 2026
tamalsaha added a commit to kubeops/fargocd that referenced this pull request Jun 3, 2026
…hook port (#27)

* manager: serve health probes on :8081; drop dead webhook port

The fargocd chart's deployment declared a containerPort 9443 (`https`)
and the service forwarded port 443 to it, but the operator only
constructs a controller-runtime webhook server -- no admission handlers
are ever registered, so nothing was listening behind that TLS endpoint.
Drop both from the embedded chart copy under pkg/manager/agent-manifests
(the installer-side copy is updated in the kubeops/installer repo).

For the OCM AddOn `fargocd manager` subcommand, the addon-framework
controller binds HTTPS on :8443 with a runtime self-signed cert
(SAN=localhost), which makes kubelet probes awkward. Stand up a plain
HTTP probe server (default :8081) that serves /healthz and /readyz,
gated by --health-probe-bind-address (set to empty to disable). The
embedded fargocd chart still uses controller-runtime's own probe
plumbing on the same port name, so naming stays consistent.

Signed-off-by: Tamal Saha <tamal@appscode.com>

* manager: wire client-go workqueue metrics into /metrics

The addon-framework controller exposes Prometheus /metrics on its
:8443 HTTPS endpoint via genericapiserver, but the framework itself
registers no collectors. Without a workqueue metrics provider, the
endpoint only returns Go runtime + process collectors, which is
enough to confirm the pod is alive but says nothing about reconcile
backlog or throughput.

Blank-import k8s.io/component-base/metrics/prometheus/workqueue in
the manager package so its init() registers the prometheus provider
against client-go's workqueue (via workqueue.SetProvider) and adds
the standard workqueue_{depth,adds_total,queue_duration_seconds,
work_duration_seconds,retries_total,longest_running_processor_seconds,
unfinished_work_seconds} collectors to legacyregistry.

The ServiceMonitor shipped by the fargocd-manager chart in
kubeops/installer#485 picks these up automatically — no chart
change needed.

Signed-off-by: Tamal Saha <tamal@appscode.com>

* manager: wire client-go reflector metrics into /metrics

Pairs with the workqueue blank import: workqueue metrics cover
reconcile backlog/throughput on the addon controllers, reflector
metrics cover the list/watch behaviour of the informers feeding
those reconcilers.

client-go/tools/cache exposes a MetricsProvider interface but ships
no off-the-shelf prometheus wrapper, so register a labelled set of
collectors (reflector_{lists_total, list_duration_seconds,
items_per_list, watches_total, short_watches_total,
watch_duration_seconds, items_per_watch, last_resource_version})
against legacyregistry and bind them via SetReflectorMetricsProvider.
Each metric is labelled by reflector name (the watched resource
type) so per-informer behaviour is distinguishable.

go.mod: promote github.com/prometheus/client_golang from indirect
to direct -- it was already vendored transitively, so no vendor/
change.

Signed-off-by: Tamal Saha <tamal@appscode.com>

* manager: drop dedicated probe server; use addon-framework /healthz

The chart change in kubeops/installer#485 points liveness/readiness
probes at the addon-framework's existing HTTPS /healthz on :8443
(kubelet's httpGet skips TLS verification, so the runtime self-signed
cert is fine). With that, the dedicated plain-HTTP server on :8081
has no consumer -- drop the listener, the --health-probe-bind-address
flag, and the ProbeAddr option.

The workqueue and reflector metrics wired up in the previous commits
stay (they live on the same :8443 endpoint as /healthz).

Signed-off-by: Tamal Saha <tamal@appscode.com>

---------

Signed-off-by: Tamal Saha <tamal@appscode.com>
@tamalsaha tamalsaha merged commit 5322394 into master Jun 3, 2026
4 of 8 checks passed
@tamalsaha tamalsaha deleted the charts-port-surface branch June 3, 2026 09:46