Skip to content

fix: null-guard responseObject.metadata.name in ActivityPolicy creates#672

Merged
ecv merged 2 commits into
mainfrom
fix/activitypolicy-create-summary-dlq-guard
Jun 30, 2026
Merged

fix: null-guard responseObject.metadata.name in ActivityPolicy creates#672
ecv merged 2 commits into
mainfrom
fix/activitypolicy-create-summary-dlq-guard

Conversation

@ecv

@ecv ecv commented Jun 27, 2026

Copy link
Copy Markdown
Contributor

What

Make the create rule summaries null-safe in six ActivityPolicies that
dereference audit.responseObject.metadata.name. Guard the leaf and fall back
to audit.objectRef.name.

Why

Same DLQ-leak class as the prod DLQSlowLeak on the NSO gateway policy
(milo-os/activity#212). A create rule matches on verb/requestObject, so a
rejected create (409/422/admission-deny) still matches — but the apiserver
returns a Status object as responseObject with no metadata.name. CEL
raises no such key: name, the event goes to the DLQ, and retries fail
identically. Latent until the first rejected create of each kind.

Fixed (create rules)

  • config/services/activity/policies/resourcemanager/ — project, organization
  • config/services/activity/policies/iam/ — role, group, serviceaccount
  • config/services/identity/policies/ — serviceaccount

Remediation

Merge + milo release → the milo-activity-policies Flux Kustomization
re-applies the corrected CRs; processor retries any DLQ backlog. No kubectl.

Related

The create rules dereference audit.responseObject.metadata.name in their
summary, but a rejected create (409/422/admission-deny) returns a Status
object with no metadata.name. The rules still match (keyed on verb/request),
so CEL raises "no such key: name", the event goes to the DLQ, and retries fail
identically -- a slow DLQ leak (same class as the gateway policy leak that
fired DLQSlowLeak in prod).

Guard the leaf and fall back to audit.objectRef.name across the resourcemanager
(project, organization), iam (role, group, serviceaccount), and identity
(serviceaccount) create rules.
…creates

The create summaries guarded responseObject.metadata.name and fell back to
audit.objectRef.name, but a generateName create rejected before a name is
assigned has an EMPTY objectRef.name. The else branch then derefs
audit.objectRef.name on a Status responseObject and raises "no such key: name"
again, so the event still dead-letters. The assigned name is carried on the
Status at responseObject.details.name.

Replace each create summary with a per-level-guarded fallback chain:
  responseObject.metadata.name -> objectRef.name -> responseObject.details.name -> literal
The details branch is guarded with has(responseObject.details) because some
Status responses carry no details.

Confirmed live in the sibling NSO repo: activity-processor logs show
generateName creates (Connector) re-failing the DLQ retry on the objectRef.name
branch. Same latent gap exists for every metadata.name create rule here.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@ecv

ecv commented Jun 27, 2026

Copy link
Copy Markdown
Contributor Author

Blast radius + suggested follow-up (not in this PR)

While extending this fix (added responseObject.details.name fallback for generateName creates rejected before naming — see latest commit), I swept the rest of the activity policies for the same responseObject deref bug class. Summary of what's left:

Already safe — these non-create rules deref responseObject but their match clause guards it down to the ref object, so a rejected/Status response never reaches the summary:

  • resourcemanager/organizationmembership (delete) — has(responseObject.spec) && has(responseObject.spec.userRef)
  • notification/contactgroupmembership (delete) — guards contactRef + contactGroupRef
  • iam/groupmembership (delete) — guards userRef + groupRef
  • iam/userdeactivation (delete) — guards userRef
  • quota/resourcegrant (delete) — guards consumerRef
  • iam/userinvitation (accept/decline/cancel) — guards responseObject.status.organization.displayName fully → safe

Residual (low-probability) gap: each of the above derefs one level deeper than its match guards — e.g. match checks ...spec.userRef, summary uses ...spec.userRef.name. If the ref object is ever present without name, CEL raises no such key: name → DLQ. In practice refs always carry a name, so this is latent, not active. No DLQ evidence for these.

Recommendation: a separate follow-up to push each guard to the leaf (&& has(...userRef.name)) or fall the summary back to objectRef.name. Out of scope for this PR — flagging so it's tracked, not pursuing it here.

@ecv ecv marked this pull request as ready for review June 28, 2026 04:56
@ecv ecv requested review from JoseSzycho and scotwells June 29, 2026 22:37
@ecv ecv enabled auto-merge June 30, 2026 01:44
@ecv ecv merged commit f865d85 into main Jun 30, 2026
5 checks passed
@ecv ecv deleted the fix/activitypolicy-create-summary-dlq-guard branch June 30, 2026 13:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants