bug (provisional): nil-hostname default listener silently dropped from downstream gateway while reported Programmed=True #235

Description

@ecv

Provisional — found by code inspection while investigating a separate flake (datum-cloud/infra#2950), not yet reproduced live. Needs owner confirmation before treating as definite.

Summary

A derived downstream/edge Gateway listener can be silently dropped when its Hostname is still nil at reconcile time, while the upstream Gateway continues to report that listener Programmed=True. Result: the edge serves only the listener(s) that had a hostname, the other is missing, and status hides the gap.

Mechanism

The default HTTP (default-http, :80) and HTTPS (default-https, :443) listeners are created with Hostname: nil (internal/util/gateway/listeners.go:44-70). Their hostnames are then stamped in two independent nil-guarded blocks:

internal/controller/gateway_controller.go:190-200

defaultHTTPListener := gatewayutil.GetListenerByName(gateway.Spec.Listeners, gatewayutil.DefaultHTTPListenerName)
if defaultHTTPListener != nil && defaultHTTPListener.Hostname == nil {
    needsUpdate = true
    defaultHTTPListener.Hostname = gatewayDefaultHostname
}

defaultHTTPSListener := gatewayutil.GetListenerByName(gateway.Spec.Listeners, gatewayutil.DefaultHTTPSListenerName)
if defaultHTTPSListener != nil && defaultHTTPSListener.Hostname == nil {
    needsUpdate = true
    defaultHTTPSListener.Hostname = gatewayDefaultHostname
}

When building the downstream gateway, the entire per-listener append is wrapped in if l.Hostname != nil (internal/controller/gateway_controller.go:709):

if l.Hostname != nil {
    listenerCopy := l.DeepCopy()
    ...
    listeners = append(listeners, *listenerCopy)   // :743
}
// no else, no log, no error
...
downstreamGateway.Spec.Listeners = listeners       // :749

A listener whose Hostname is still nil is never added to downstreamGateway.Spec.Listeners — it just vanishes from the edge with no diagnostic.
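
To make the drop concrete, here is a minimal, self-contained sketch of the same guard shape. It is not the controller code: the Listener type and buildDownstreamListeners are simplified, hypothetical stand-ins, and example.invalid is a placeholder hostname.

```go
package main

import "fmt"

// Listener is a hypothetical, simplified stand-in for the Gateway API
// listener type; only the fields relevant to the drop are modeled.
type Listener struct {
	Name     string
	Port     int
	Hostname *string
}

// buildDownstreamListeners mimics the guarded append described above:
// a listener with a nil Hostname is skipped, with no else branch,
// no log line, and no error.
func buildDownstreamListeners(spec []Listener) []Listener {
	var out []Listener
	for _, l := range spec {
		if l.Hostname != nil {
			out = append(out, l)
		}
	}
	return out
}

func main() {
	host := "example.invalid" // placeholder hostname
	spec := []Listener{
		{Name: "default-http", Port: 80, Hostname: nil},
		{Name: "default-https", Port: 443, Hostname: &host},
	}
	// Only default-https survives; default-http is silently omitted.
	for _, l := range buildDownstreamListeners(spec) {
		fmt.Println(l.Name, l.Port)
	}
}
```

With this input, only default-https reaches the output slice, which matches the "one port serves, the other is missing" outcome described in this issue.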

If a reconcile observes one default listener's hostname already populated while the other is still nil (cache staleness, a needsUpdate short-circuit, or the per-listener re-pin at internal/controller/httpproxy_controller.go:199-209 carrying one hostname forward but not the other), only the populated listener reaches the edge.

Why status hides it

gateway_controller.go (listener-status block, ~:1742-1830) sets ListenerConditionProgrammed=True for every spec listener, flipping False only on a hostname problem or a cert problem (cert gating is HTTPS-only). So a default-http listener that was never replicated downstream still reports Programmed=True upstream.

The aggregate Gateway Programmed condition (~:1000-1016) is mirrored verbatim from the single downstream-gateway condition and does not break down per-listener, so a missing edge listener does not flip it False. The requeue keyed on the aggregate (~:1031-1036) therefore won't self-heal the drop; only an unrelated reconcile event that re-runs getDesiredDownstreamGateway with the hostname now populated will add it back. The expected behavior is therefore intermittent: the listener comes up eventually, or sometimes never within a given observation window.
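
One way to picture the gap is a per-listener check that compares spec listeners against what actually landed downstream. The sketch below is an assumption about what such a cross-check could look like, not what the listener-status block does today; Listener and programmedByName are hypothetical names.

```go
package main

import "fmt"

// Listener is a hypothetical, simplified listener type for illustration.
type Listener struct {
	Name     string
	Hostname *string
}

// programmedByName reports, for each spec listener, whether a listener
// with the same name exists in the downstream gateway. A listener that
// was dropped downstream maps to false instead of defaulting to true.
func programmedByName(spec, downstream []Listener) map[string]bool {
	present := make(map[string]struct{}, len(downstream))
	for _, l := range downstream {
		present[l.Name] = struct{}{}
	}
	status := make(map[string]bool, len(spec))
	for _, l := range spec {
		_, ok := present[l.Name]
		status[l.Name] = ok
	}
	return status
}

func main() {
	host := "example.invalid" // placeholder hostname
	spec := []Listener{
		{Name: "default-http"},
		{Name: "default-https", Hostname: &host},
	}
	// Downstream as it would look after the nil-hostname drop.
	downstream := []Listener{{Name: "default-https", Hostname: &host}}

	status := programmedByName(spec, downstream)
	for _, l := range spec {
		fmt.Printf("%s programmed=%t\n", l.Name, status[l.Name])
	}
}
```

Keying the comparison on listener name is a simplification; a real check might also need to compare port and protocol.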

Symptom (predicted, unverified)

A proxy whose edge listener was dropped would refuse or reset TCP connections on that port (e.g. :80) while the other port keeps serving. This is distinct from a DNS failure: the client would see connection refused or a timeout, not "could not resolve host". This is not what infra#2950 turned out to be (that was DNS); it is filed separately as a latent code path.

File:line references

  • internal/util/gateway/listeners.go:44-70 — both default listeners created with Hostname: nil; only HTTPS gets TLS.
  • internal/controller/gateway_controller.go:190-200 — independent nil-guarded hostname assignment, default-http vs default-https.
  • internal/controller/gateway_controller.go:709-749 — the drop: nil-hostname listeners excluded from the downstream gateway; no else/log/error.
  • internal/controller/gateway_controller.go:~1742-1830 — per-listener Programmed=True set regardless of downstream presence.
  • internal/controller/gateway_controller.go:~1000-1016, ~1031-1036 — aggregate Programmed mirrored from downstream; requeue keyed on aggregate, not per-listener.
  • internal/controller/httpproxy_controller.go:199-209 — independent per-listener re-pin on update; can carry one listener's hostname forward but not the other.

Asks

  • Confirm whether a default listener can legitimately reach getDesiredDownstreamGateway with Hostname == nil (timing/cache).
  • Decide handling: either always stamp both default hostnames before any downstream build, or surface a non-Programmed condition or a log line when a spec listener is dropped from downstream instead of silently omitting it.
  • Repro: force a reconcile where one default hostname is unset, assert the downstream gateway is missing that listener while upstream reports it Programmed.
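
The two handling options and the repro shape from the asks above can be sketched together. This is a rough illustration under stated assumptions, not a proposed patch: Listener, stampDefaultHostnames, and partitionByHostname are hypothetical names, and only the listener names default-http and default-https come from the code under discussion.

```go
package main

import "fmt"

// Listener is a hypothetical, simplified listener type for illustration.
type Listener struct {
	Name     string
	Hostname *string
}

const (
	defaultHTTPName  = "default-http"
	defaultHTTPSName = "default-https"
)

// stampDefaultHostnames sets the hostname on both default listeners in a
// single pass and reports whether anything changed (the needsUpdate role).
func stampDefaultHostnames(listeners []Listener, defaultHostname string) bool {
	changed := false
	for i := range listeners {
		l := &listeners[i]
		isDefault := l.Name == defaultHTTPName || l.Name == defaultHTTPSName
		if isDefault && l.Hostname == nil {
			h := defaultHostname
			l.Hostname = &h
			changed = true
		}
	}
	return changed
}

// partitionByHostname returns the listeners that can go downstream and the
// names of those that cannot, so the caller can log or set a condition
// instead of omitting them silently.
func partitionByHostname(spec []Listener) (kept []Listener, dropped []string) {
	for _, l := range spec {
		if l.Hostname == nil {
			dropped = append(dropped, l.Name)
			continue
		}
		kept = append(kept, l)
	}
	return kept, dropped
}

func main() {
	host := "example.invalid" // placeholder hostname
	spec := []Listener{
		{Name: defaultHTTPName},
		{Name: defaultHTTPSName, Hostname: &host},
	}

	// Repro shape: one default hostname unset, so one listener is dropped.
	_, dropped := partitionByHostname(spec)
	fmt.Println("dropped before stamping:", dropped)

	// Stamping both defaults first leaves nothing to drop.
	stampDefaultHostnames(spec, host)
	kept, dropped := partitionByHostname(spec)
	fmt.Println("kept:", len(kept), "dropped:", len(dropped))
}
```

Returning the dropped names keeps the decision about what to do (log, event, or a False per-listener condition) with the caller, which seems the least invasive change if stamping alone turns out not to cover every path.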
