Skip to content

feat: add orphaned downstream HTTPRoute GC controller#234

Closed
ecv wants to merge 2 commits into
mainfrom
feat/issue-233-orphaned-httproute-gc
Closed

feat: add orphaned downstream HTTPRoute GC controller#234
ecv wants to merge 2 commits into
mainfrom
feat/issue-233-orphaned-httproute-gc

Conversation

@ecv

@ecv ecv commented Jun 29, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Adds OrphanedDownstreamHTTPRouteGCReconciler that scans Karmada (downstream cluster) for HTTPRoutes that reference a Gateway which no longer exists on the upstream project cluster, and deletes them
  • Watches both upstream HTTPRoutes and downstream Karmada HTTPRoutes so orphans from before NSO v0.23.6 are caught on startup via the initial informer list
  • 5-minute minimum age threshold prevents deletion during normal Karmada propagation lag
  • A route with multiple parentRefs is only deleted when all referenced upstream Gateways are gone

Motivation

The v0.23.5 bug fixed in #231 (detachHTTPRoutes using Status.Parents) could leave downstream HTTPRoutes alive on Karmada even after the upstream Gateway was deleted. Since the finalizer already ran, those orphans survive the v0.23.6 rollout and need a separate scan-and-delete pass. They also cause HTTPRouteStatusWarning/BackendNotFound alerts that fire indefinitely with no auto-heal.

Incident: datum-cloud/infra#2901, datum-cloud/engineering#330

Test plan

  • TestOrphanedDownstreamHTTPRouteGC/orphaned_route_deleted_when_upstream_gateway_missing
  • TestOrphanedDownstreamHTTPRouteGC/healthy_route_not_deleted_when_upstream_gateway_exists
  • TestOrphanedDownstreamHTTPRouteGC/young_route_not_deleted;_requeue_returned
  • TestOrphanedDownstreamHTTPRouteGC/route_with_multiple_parentRefs_kept_if_any_gateway_still_exists
  • TestOrphanedDownstreamHTTPRouteGC/no_downstream_routes_is_a_no-op
  • Full controller package test suite passes (go test ./internal/controller/...)
  • Verify existing orphan tunnel-858j6 is cleaned up after this ships (datum-cloud/infra#2901)

Closes #233

Add OrphanedDownstreamHTTPRouteGCReconciler, which scans Karmada for
downstream HTTPRoutes that reference a Gateway that no longer exists on
the upstream project cluster, and deletes them.

This is a safety net for the pre-v0.23.6 bug where detachHTTPRoutes
used Status.Parents instead of Spec.ParentRefs during gateway
finalization (#231). Existing orphans left by that bug survive the
v0.23.6 rollout because the finalizer already ran; this controller
catches them on startup and on any subsequent event.

Key design decisions:
- Reconcile key is the upstream HTTPRoute identity (cluster+ns+name)
  derived from labels NSO stamps on every downstream HTTPRoute via
  SetControllerReference. No new indexes required.
- Watches both upstream HTTPRoutes (deletions) and downstream HTTPRoutes
  (Karmada) so existing orphans are caught at startup via the initial
  informer list.
- 5-minute minimum age threshold prevents deletion during Karmada
  propagation lag on healthy new routes.
- A route with multiple parentRefs is only deleted when all referenced
  upstream Gateways are gone.

Closes #233
@ecv ecv requested a review from scotwells June 29, 2026 15:44
@ecv ecv assigned drewr and mattdjenkinson and unassigned drewr and mattdjenkinson Jun 29, 2026
@ecv ecv requested review from drewr and mattdjenkinson June 29, 2026 15:45
@scotwells

scotwells commented Jun 29, 2026

Copy link
Copy Markdown
Contributor

Seems like this is a bandaid for a garage collection issue? I wouldn't want to operationalize a bandaid.

@ecv

ecv commented Jun 29, 2026

Copy link
Copy Markdown
Contributor Author

Mmm, we could delete the orphans from Karmada, but I'm preferring not to issue one-off administrative actions.

I suppose we could develop a scripted admin process for deleting stuff from Karmada in the event we have to do so. But, shouldn't the component take care of them?

The underlying orphan-creator has (probably) been fixed.

@ecv ecv marked this pull request as draft June 29, 2026 16:56
Adding OrphanedDownstreamHTTPRouteGCReconciler pushed NewCommand's
cyclomatic complexity to 61, exceeding the 60 limit.

Extract the RunE body into a named runManager function, moving the
complexity out of NewCommand. Introduce managerConfig struct to pass
flag values cleanly. Rename inner cfg (rest.Config) to restCfg to
avoid shadowing the parameter.
@ecv

ecv commented Jun 29, 2026

Copy link
Copy Markdown
Contributor Author

Doesn't seem like the most sane implementation.

@ecv ecv removed the request for review from mattdjenkinson June 29, 2026 17:21
@ecv

ecv commented Jun 29, 2026

Copy link
Copy Markdown
Contributor Author

Closing as won't-do, sense of group is we do this as a one-off.

@ecv ecv closed this Jun 29, 2026
@ecv ecv deleted the feat/issue-233-orphaned-httproute-gc branch June 29, 2026 17:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat: scan for and clean up orphaned downstream HTTPRoutes after gateway finalization

4 participants