manager: expose Prometheus workqueue/reflector metrics; drop dead webhook port #27
Merged
The fargocd chart's deployment declared a containerPort 9443 (`https`) and the service forwarded port 443 to it, but the operator only constructs a controller-runtime webhook server -- no admission handlers are ever registered, so nothing was listening behind that TLS endpoint. Drop both from the embedded chart copy under pkg/manager/agent-manifests (the installer-side copy is updated in the kubeops/installer repo).

For the OCM AddOn `fargocd manager` subcommand, the addon-framework controller binds HTTPS on :8443 with a runtime self-signed cert (SAN=localhost), which makes kubelet probes awkward. Stand up a plain HTTP probe server (default :8081) that serves /healthz and /readyz, gated by --health-probe-bind-address (set to empty to disable). The embedded fargocd chart still uses controller-runtime's own probe plumbing on the same port name, so naming stays consistent.

Signed-off-by: Tamal Saha <tamal@appscode.com>
The addon-framework controller exposes Prometheus /metrics on its
:8443 HTTPS endpoint via genericapiserver, but the framework itself
registers no collectors. Without a workqueue metrics provider, the
endpoint only returns Go runtime + process collectors, which is
enough to confirm the pod is alive but says nothing about reconcile
backlog or throughput.
Blank-import k8s.io/component-base/metrics/prometheus/workqueue in
the manager package so its init() registers the prometheus provider
against client-go's workqueue (via workqueue.SetProvider) and adds
the standard workqueue_{depth,adds_total,queue_duration_seconds,
work_duration_seconds,retries_total,longest_running_processor_seconds,
unfinished_work_seconds} collectors to legacyregistry.
The ServiceMonitor shipped by the fargocd-manager chart in
kubeops/installer#485 picks these up automatically — no chart
change needed.
Signed-off-by: Tamal Saha <tamal@appscode.com>
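The blank import described in this commit works purely through package initialization: importing a package for its side effects runs its `init()`, which registers something against a process-wide registry. The following is a minimal stdlib-only analogue of that mechanism, not the PR's code: it uses `expvar`, whose `init()` registers a `/debug/vars` handler on `http.DefaultServeMux`, in place of the component-base workqueue package. The helper `statusFor` is a hypothetical name introduced for illustration.

```go
package main

import (
	_ "expvar" // imported only for its init(), which registers /debug/vars
	"fmt"
	"net/http"
	"net/http/httptest"
)

// statusFor serves one GET through http.DefaultServeMux and returns the
// resulting HTTP status code.
func statusFor(path string) int {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, path, nil)
	http.DefaultServeMux.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	// The handler exists only because of the blank import above.
	fmt.Println(statusFor("/debug/vars"))
	// Nothing registered this path, so the default mux answers 404.
	fmt.Println(statusFor("/not-registered"))
}
```

Removing the blank import makes the first request return 404 as well, which mirrors how `/metrics` would lack the `workqueue_*` collectors without the import named in the commit message.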
Pairs with the workqueue blank import: workqueue metrics cover
reconcile backlog/throughput on the addon controllers, reflector
metrics cover the list/watch behaviour of the informers feeding
those reconcilers.
client-go/tools/cache exposes a MetricsProvider interface but ships
no off-the-shelf prometheus wrapper, so register a labelled set of
collectors (reflector_{lists_total, list_duration_seconds,
items_per_list, watches_total, short_watches_total,
watch_duration_seconds, items_per_watch, last_resource_version})
against legacyregistry and bind them via SetReflectorMetricsProvider.
Each metric is labelled by reflector name (the watched resource
type) so per-informer behaviour is distinguishable.
go.mod: promote github.com/prometheus/client_golang from indirect
to direct -- it was already vendored transitively, so no vendor/
change.
Signed-off-by: Tamal Saha <tamal@appscode.com>
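The shape of such a provider can be sketched without any Kubernetes or Prometheus dependencies. The sketch below is an assumption-laden stand-in, not the code from this PR: `CounterMetric` and `MetricsProvider` are local interfaces that only imitate the upstream shape (one constructor shown instead of eight), `memProvider` keeps counts in a map keyed by reflector name in place of a labelled Prometheus counter, and `factory` models a setter that is gated by `sync.Once` so that only the first registered provider takes effect.

```go
package main

import (
	"fmt"
	"sync"
)

// CounterMetric and MetricsProvider are local stand-ins that imitate the shape
// of the client-go interfaces; only the "lists" counter is modelled.
type CounterMetric interface{ Inc() }

type MetricsProvider interface {
	NewListsMetric(name string) CounterMetric
}

// labelledCounter is one child series of a counter family, keyed by name.
type labelledCounter struct {
	mu    *sync.Mutex
	store map[string]float64
	name  string
}

func (c labelledCounter) Inc() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.store[c.name]++
}

// memProvider hands out one counter per reflector name, much like a counter
// family with a single "name" label.
type memProvider struct {
	mu    sync.Mutex
	lists map[string]float64
}

func newMemProvider() *memProvider {
	return &memProvider{lists: map[string]float64{}}
}

func (p *memProvider) NewListsMetric(name string) CounterMetric {
	return labelledCounter{mu: &p.mu, store: p.lists, name: name}
}

// Lists returns the current count recorded for the given reflector name.
func (p *memProvider) Lists(name string) float64 {
	p.mu.Lock()
	defer p.mu.Unlock()
	return p.lists[name]
}

// factory models a once-gated global: only the first SetProvider call wins.
type factory struct {
	once     sync.Once
	provider MetricsProvider
}

func (f *factory) SetProvider(p MetricsProvider) {
	f.once.Do(func() { f.provider = p })
}

func main() {
	ours, other := newMemProvider(), newMemProvider()
	f := &factory{}
	f.SetProvider(other) // some other package registers first
	f.SetProvider(ours)  // silently ignored by the once gate

	f.provider.NewListsMetric("pods").Inc()
	fmt.Println(ours.Lists("pods"), other.Lists("pods")) // prints "0 1"
}
```

With a once-gated setter, registration order matters: a provider bound second records nothing, so its series would stay at zero.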
The chart change in kubeops/installer#485 points liveness/readiness probes at the addon-framework's existing HTTPS /healthz on :8443 (kubelet's httpGet skips TLS verification, so the runtime self-signed cert is fine). With that, the dedicated plain-HTTP server on :8081 has no consumer -- drop the listener, the --health-probe-bind-address flag, and the ProbeAddr option. The workqueue and reflector metrics wired up in the previous commits stay (they live on the same :8443 endpoint as /healthz).

Signed-off-by: Tamal Saha <tamal@appscode.com>
tamalsaha added a commit to kubeops/installer that referenced this pull request Jun 3, 2026
Drop the dedicated probes (8081) container port and point the readiness + liveness httpGet at the metrics port (8443) with scheme: HTTPS. The OCM addon-framework's genericapiserver already serves /healthz there, and kubelet's httpGet probes skip TLS verification so the runtime self-signed cert (SAN=localhost) is not an issue. Drops the dependency on the now-removed --health-probe-bind-address plumbing in kubeops/fargocd#27.

Signed-off-by: Tamal Saha <tamal@appscode.com>
Summary
Make the OCM AddOn `fargocd manager` subcommand observable. Before this PR, `/metrics` returned only `go_*` / `process_*` collectors: enough to confirm the pod was alive but blind to reconcile backlog or list/watch behaviour. Companion to kubeops/installer#485, which wires the chart-side surface (ClusterIP Service, ServiceMonitor, and probes against the addon-framework's existing `/healthz` on `:8443`).

1. Workqueue metrics
Blank-import `k8s.io/component-base/metrics/prometheus/workqueue` in the manager package. Its `init()` registers the standard `workqueue_{depth, adds_total, queue_duration_seconds, work_duration_seconds, retries_total, longest_running_processor_seconds, unfinished_work_seconds}` collectors against `legacyregistry` (which backs the addon-framework's `/metrics`) and calls `workqueue.SetProvider`.

2. Reflector (informer) metrics
`k8s.io/client-go/tools/cache` exposes a `MetricsProvider` interface but ships no off-the-shelf Prometheus wrapper, so register a labelled set of collectors and bind them via `SetReflectorMetricsProvider`:

- `reflector_lists_total`, `reflector_list_duration_seconds`, `reflector_items_per_list`
- `reflector_watches_total`, `reflector_short_watches_total`, `reflector_watch_duration_seconds`, `reflector_items_per_watch`
- `reflector_last_resource_version`

Each is labelled `name=<resource>` so per-informer behaviour is distinguishable.

`go.mod` promotes `github.com/prometheus/client_golang` from indirect to direct; no vendor change (already vendored transitively).

Caveat: `cache.SetReflectorMetricsProvider` is `sync.Once`-gated upstream. If a future vendored package wins the race, our provider becomes a no-op and the `reflector_*` series stay flat at zero. Nothing in the current import graph does this; flagging for the future.

3. Drop dead webhook port from the embedded fargocd chart
`pkg/manager/agent-manifests/fargocd` is the spoke chart shipped via OCM (kept byte-identical with `installer/charts/fargocd`). The deployment declared `containerPort: 9443` (`https`) and the service forwarded port 443 to it, but the operator constructs a controller-runtime `webhook.NewServer(...)` and never calls `SetupWebhookWithManager`, so no admission handlers are registered and nothing was listening behind the TLS endpoint. Drop both. The installer-side copy is dropped in kubeops/installer#485.

Resulting `/metrics` surface

- `workqueue_*` (7 metrics, labelled by queue)
- `reflector_*` (8 metrics, labelled by resource)
- `go_*`, `process_*`: `legacyregistry` defaults (already there)
- `apiserver_*`, `etcd3_*`, …: `genericapiserver` defaults; definitions exist but values stay at 0 (no real apiserver/etcd)

Probes are not introduced here; the chart in kubeops/installer#485 points liveness/readiness at the addon-framework's existing HTTPS `/healthz` on the same `:8443` endpoint.

Test plan
- `go build ./...` clean
- `go vet ./...` clean
- `helm template` of the embedded fargocd chart renders without the 9443 / 443 entries
- Fetch `https://<pod>:8443/metrics` (with `-k` for the self-signed cert) and confirm `workqueue_depth{name=…}` and `reflector_lists_total{name=…}` appear with non-empty label values once informers start