Provisional — found by code inspection while investigating a separate flake (datum-cloud/infra#2950), not yet reproduced live. Needs owner confirmation before treating as definite.
Summary
A derived downstream/edge Gateway listener can be silently dropped when its Hostname is still nil at reconcile time, while the upstream Gateway continues to report that listener Programmed=True. Result: the edge serves only the listener(s) that had a hostname, the other is missing, and status hides the gap.
Mechanism
The default HTTP (default-http, :80) and HTTPS (default-https, :443) listeners are created with Hostname: nil (internal/util/gateway/listeners.go:44-70). Their hostnames are then stamped in two independent nil-guarded blocks:
internal/controller/gateway_controller.go:190-200
defaultHTTPListener := gatewayutil.GetListenerByName(gateway.Spec.Listeners, gatewayutil.DefaultHTTPListenerName)
if defaultHTTPListener != nil && defaultHTTPListener.Hostname == nil {
needsUpdate = true
defaultHTTPListener.Hostname = gatewayDefaultHostname
}
defaultHTTPSListener := gatewayutil.GetListenerByName(gateway.Spec.Listeners, gatewayutil.DefaultHTTPSListenerName)
if defaultHTTPSListener != nil && defaultHTTPSListener.Hostname == nil {
needsUpdate = true
defaultHTTPSListener.Hostname = gatewayDefaultHostname
}
When building the downstream gateway, the entire per-listener append is wrapped in if l.Hostname != nil — internal/controller/gateway_controller.go:709:
if l.Hostname != nil {
listenerCopy := l.DeepCopy()
...
listeners = append(listeners, *listenerCopy) // :743
}
// no else, no log, no error
...
downstreamGateway.Spec.Listeners = listeners // :749
A listener whose Hostname is still nil is never added to downstreamGateway.Spec.Listeners — it just vanishes from the edge with no diagnostic.
If a reconcile observes one default listener's hostname already populated while the other is still nil (cache staleness, a needsUpdate short-circuit, or the per-listener re-pin at internal/controller/httpproxy_controller.go:199-209 carrying one hostname forward but not the other), only the populated listener reaches the edge.
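The drop can be modelled in isolation with a minimal sketch. The `Listener` type and `buildDownstream` function here are hypothetical stand-ins, not the controller's real types. Unlike the code at :709, the sketch also returns the names it skipped, to show what a diagnostic could report.

```go
package main

import "fmt"

// Listener is a simplified, hypothetical stand-in for the Gateway API
// listener type; only the fields relevant to the drop are modelled.
type Listener struct {
	Name     string
	Port     int
	Hostname *string
}

// buildDownstream applies the same guard as the code under discussion:
// a listener with a nil Hostname is not copied downstream. As an
// illustration only, it also returns the names it skipped.
func buildDownstream(upstream []Listener) (kept []Listener, dropped []string) {
	for _, l := range upstream {
		if l.Hostname == nil {
			dropped = append(dropped, l.Name)
			continue
		}
		kept = append(kept, l)
	}
	return kept, dropped
}

func main() {
	host := "example.test" // placeholder hostname
	upstream := []Listener{
		{Name: "default-http", Port: 80, Hostname: nil}, // not yet stamped
		{Name: "default-https", Port: 443, Hostname: &host},
	}

	kept, dropped := buildDownstream(upstream)
	for _, l := range kept {
		fmt.Printf("downstream: %s :%d %s\n", l.Name, l.Port, *l.Hostname)
	}
	fmt.Println("dropped:", dropped)
}
```

With one hostname stamped and the other still nil, only default-https is kept. In the real code path the second return value does not exist, so nothing records that default-http was skipped.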
Why status hides it
gateway_controller.go (listener-status block, ~:1742-1830) sets ListenerConditionProgrammed=True for every spec listener, flipping False only on a hostname problem or a cert problem (cert gating is HTTPS-only). So a default-http listener that was never replicated downstream still reports Programmed=True upstream.
The aggregate Gateway Programmed condition (:1000-1016) is mirrored verbatim from the single downstream-gateway condition and does not break down per-listener, so a missing edge listener does not flip it False. The requeue keyed on the aggregate (:1031-1036) therefore won't self-heal the drop; only an unrelated reconcile event that re-runs getDesiredDownstreamGateway with the hostname now populated will add the listener back. The predicted behavior is therefore intermittent: the listener comes up eventually, or sometimes not at all within a given window.
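The gap between reported status and downstream presence can be shown with a small sketch. It assumes listeners can be matched by name between the upstream spec and the downstream gateway. `missingDownstream` is a hypothetical helper, not something that exists in the controller.

```go
package main

import "fmt"

// missingDownstream returns the upstream listener names that have no
// counterpart in the downstream listener set, preserving upstream order.
// Hypothetical helper for illustration only.
func missingDownstream(upstream, downstream []string) []string {
	present := make(map[string]struct{}, len(downstream))
	for _, name := range downstream {
		present[name] = struct{}{}
	}
	var missing []string
	for _, name := range upstream {
		if _, ok := present[name]; !ok {
			missing = append(missing, name)
		}
	}
	return missing
}

func main() {
	upstream := []string{"default-http", "default-https"}
	downstream := []string{"default-https"} // default-http was dropped

	// Status as described above: every spec listener reports
	// Programmed=True, whether or not it exists downstream.
	for _, name := range upstream {
		fmt.Printf("upstream status: %s Programmed=True\n", name)
	}
	fmt.Println("missing downstream:", missingDownstream(upstream, downstream))
}
```

A comparison of this kind is one way a per-listener check could surface the drop. Whether it belongs in the status block or in the requeue condition is a question for the owners.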
Symptom (predicted, unverified)
A proxy whose edge listener was dropped would refuse or reset TCP on that port (e.g. :80) while the other port keeps serving. That is distinct from a DNS failure: the client would see connection refused or a timeout, not could not resolve host. This is NOT what infra#2950 turned out to be (that was DNS); it is filed separately as the latent code path.
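When someone attempts a live reproduction, a small probe can help tell the two failure modes apart. The sketch below assumes a Unix-like client, where a refused connection surfaces as ECONNREFUSED. The host name is a placeholder, and `classify` and `probe` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"strconv"
	"syscall"
	"time"
)

// classify maps a dial error to a coarse category. DNS errors are checked
// first because *net.DNSError also implements Timeout().
func classify(err error) string {
	if err == nil {
		return "open"
	}
	var dnsErr *net.DNSError
	if errors.As(err, &dnsErr) {
		return "dns"
	}
	if errors.Is(err, syscall.ECONNREFUSED) {
		return "refused"
	}
	var netErr net.Error
	if errors.As(err, &netErr) && netErr.Timeout() {
		return "timeout"
	}
	return "other"
}

// probe attempts a TCP connection to host:port and classifies the result.
func probe(host string, port int, timeout time.Duration) string {
	addr := net.JoinHostPort(host, strconv.Itoa(port))
	conn, err := net.DialTimeout("tcp", addr, timeout)
	if err == nil {
		conn.Close()
	}
	return classify(err)
}

func main() {
	host := "proxy.example.test" // placeholder; substitute the proxy hostname
	for _, port := range []int{80, 443} {
		fmt.Printf(":%d -> %s\n", port, probe(host, port, 2*time.Second))
	}
}
```

A result of "refused" or "timeout" on :80 alongside "open" on :443 would be consistent with the dropped-listener path. "dns" on both ports points back at the infra#2950 class of failure.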
File:line references
internal/util/gateway/listeners.go:44-70 — both default listeners created with Hostname: nil; only HTTPS gets TLS.
internal/controller/gateway_controller.go:190-200 — independent nil-guarded hostname assignment, default-http vs default-https.
internal/controller/gateway_controller.go:709-749 — the drop: nil-hostname listeners excluded from the downstream gateway; no else/log/error.
internal/controller/gateway_controller.go:~1742-1830 — per-listener Programmed=True set regardless of downstream presence.
internal/controller/gateway_controller.go:~1000-1016, ~1031-1036 — aggregate Programmed mirrored from downstream; requeue keyed on aggregate, not per-listener.
internal/controller/httpproxy_controller.go:199-209 — independent per-listener re-pin on update; can carry one listener's hostname forward but not the other.
Asks
getDesiredDownstreamGateway with Hostname == nil (timing/cache).