fix: null-guard responseObject.metadata.name in ActivityPolicy creates #672
The create rules dereference audit.responseObject.metadata.name in their summary, but a rejected create (409/422/admission-deny) returns a Status object with no metadata.name. The rules still match (keyed on verb/request), so CEL raises "no such key: name", the event goes to the DLQ, and retries fail identically -- a slow DLQ leak (same class as the gateway policy leak that fired DLQSlowLeak in prod). Guard the leaf and fall back to audit.objectRef.name across the resourcemanager (project, organization), iam (role, group, serviceaccount), and identity (serviceaccount) create rules.
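The failure mode in this commit message can be modelled in a few lines of Python. This is only a sketch: the dict layout, the helper names (`rule_matches`, `unguarded_summary`, `guarded_summary`, `process`) and the `"created "` summary text are assumptions made for illustration, not the policy's actual CEL or the processor's code. It assumes the rejected create carries a Status whose `metadata` is present but empty.

```python
# Hypothetical audit event for a rejected create (409). The Status response
# has an empty metadata object, so there is no metadata.name to read.
rejected_create = {
    "verb": "create",
    "objectRef": {"resource": "projects", "name": "demo"},
    "responseObject": {"kind": "Status", "code": 409, "metadata": {}},
}


def rule_matches(audit):
    # The match is keyed on the verb/request only, so a rejection still matches.
    return audit["verb"] == "create"


def unguarded_summary(audit):
    # Mirrors dereferencing responseObject.metadata.name with no guard.
    return "created " + audit["responseObject"]["metadata"]["name"]


def guarded_summary(audit):
    # Guard down to the leaf, then fall back to objectRef.name.
    metadata = audit["responseObject"].get("metadata")
    if metadata is not None and "name" in metadata:
        return "created " + metadata["name"]
    return "created " + audit["objectRef"]["name"]


def process(audit, summarize):
    # Stand-in for the processor: an evaluation error sends the event to the DLQ.
    if not rule_matches(audit):
        return ("skipped", "")
    try:
        return ("ok", summarize(audit))
    except KeyError as err:
        return ("dlq", "no such key: " + str(err.args[0]))


first_try = process(rejected_create, unguarded_summary)
retry = process(rejected_create, unguarded_summary)
fixed = process(rejected_create, guarded_summary)
```

Because evaluation is deterministic on the same event, `retry` equals `first_try`, which is why the backlog never drains until the rule itself is corrected.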
…creates

The create summaries guarded responseObject.metadata.name and fell back to audit.objectRef.name, but a generateName create rejected before a name is assigned has an EMPTY objectRef.name. The else branch then derefs audit.objectRef.name on a Status responseObject and raises "no such key: name" again, so the event still dead-letters. The assigned name is carried on the Status at responseObject.details.name.

Replace each create summary with a per-level-guarded fallback chain:

responseObject.metadata.name -> objectRef.name -> responseObject.details.name -> literal

The details branch is guarded with has(responseObject.details) because some Status responses carry no details.

Confirmed live in the sibling NSO repo: activity-processor logs show generateName creates (Connector) re-failing the DLQ retry on the objectRef.name branch. Same latent gap exists for every metadata.name create rule here.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
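The fallback chain described in this commit can be sketched in Python as follows. This is an illustrative model, not the CEL in the policies: the function name `create_name`, the dict layout, the sample names, and the `"(unknown)"` placeholder are all hypothetical (the commit message does not show the actual literal). An empty or absent name is treated as missing at every level.

```python
def create_name(audit, literal="(unknown)"):
    """Resolve a display name for a create event, guarding every level.

    Order: responseObject.metadata.name -> objectRef.name
           -> responseObject.details.name -> literal (hypothetical placeholder).
    """
    response = audit.get("responseObject") or {}

    # 1. Successful create: the created object carries metadata.name.
    metadata = response.get("metadata") or {}
    if metadata.get("name"):
        return metadata["name"]

    # 2. Rejected create of a named object: the request path carries the name.
    object_ref = audit.get("objectRef") or {}
    if object_ref.get("name"):
        return object_ref["name"]

    # 3. Rejected generateName create: the Status may carry details.name.
    #    Some Status responses have no details at all, so guard that level too.
    details = response.get("details") or {}
    if details.get("name"):
        return details["name"]

    # 4. Nothing usable: fall back to a fixed literal instead of raising.
    return literal


success = {"objectRef": {}, "responseObject": {"metadata": {"name": "proj-abc"}}}
rejected_named = {"objectRef": {"name": "demo"},
                  "responseObject": {"kind": "Status", "metadata": {}}}
rejected_generated = {"objectRef": {},
                      "responseObject": {"kind": "Status", "metadata": {},
                                         "details": {"name": "connector-x7k"}}}
no_details = {"objectRef": {}, "responseObject": {"kind": "Status", "metadata": {}}}
```

The point of the design is that no branch dereferences a field it has not first checked, so a summary is always produced and the event never dead-letters on a missing key.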
Blast radius + suggested follow-up (not in this PR)
While extending this fix (added
Already safe: these non-create rules deref
Residual (low-probability) gap: each of the above derefs one level deeper than its
Recommendation: a separate follow-up to push each guard to the leaf (
What
Make the create rule summaries null-safe in six ActivityPolicies that dereference audit.responseObject.metadata.name. Guard the leaf and fall back to audit.objectRef.name.
Why
Same DLQ-leak class as the prod DLQSlowLeak on the NSO gateway policy (milo-os/activity#212). A create rule matches on verb/requestObject, so a rejected create (409/422/admission-deny) still matches, but the apiserver returns a Status object as responseObject with no metadata.name. CEL raises "no such key: name", the event goes to the DLQ, and retries fail identically. Latent until the first rejected create of each kind.
Fixed (create rules)
config/services/activity/policies/resourcemanager/: project, organization
config/services/activity/policies/iam/: role, group, serviceaccount
config/services/identity/policies/: serviceaccount
Remediation
Merge + milo release → the milo-activity-policies Flux Kustomization re-applies the corrected CRs; processor retries any DLQ backlog. No kubectl.
Related