OCPBUGS-84534: fix concurrent map race in project authorization cache#642
OCPBUGS-84534: fix concurrent map race in project authorization cache#642sanchezl wants to merge 3 commits into
Conversation
|
@sanchezl: This pull request references Jira Issue OCPBUGS-84534, which is invalid:
Comment The bug has been updated to refer to the pull request using the external bug tracker. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository: openshift/coderabbit/.coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (2)
WalkthroughAuthorizationCache now stores per-type caches in a single atomically-swapped ChangesCache Atomicity & Copy-on-Write Sync
sequenceDiagram
participant Sync as Synchronizer
participant IDX as Namespace/RBAC Indexers
participant Stores as authorizationCacheStores (snapshot)
participant ac as AuthorizationCache (stores pointer)
participant Reader as Reader (List)
rect rgba(100,149,237,0.5)
Sync->>Stores: Load snapshot via ac.stores.Load()
end
rect rgba(34,139,34,0.5)
Sync->>IDX: Read namespaces/policies/rolebindings
IDX-->>Sync: Items
Sync->>Stores: Clone subjectRecords (copy-on-write) and update copies
Sync->>Stores: Or build fresh stores for full invalidation
Sync->>ac: Atomic swap of new authorizationCacheStores pointer
end
rect rgba(255,165,0,0.5)
Reader->>ac: Load snapshot once via ac.stores.Load()
Reader->>Stores: Read user/group subject records from snapshot
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes 🚥 Pre-merge checks | ✅ 11 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (11 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
|
/verified by "TestAuthorizationCacheRace" |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
@sanchezl: This PR has been marked as verified by DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
@sanchezl: This pull request references Jira Issue OCPBUGS-84534, which is invalid:
Comment DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
pkg/project/auth/cache.go (1)
451-471:⚠️ Potential issue | 🟠 Major | ⚡ Quick winMove
lastCacheInvalidationto after the atomic store swap.On the full-rebuild path, Line 453 resets the expiry timer before the rebuilt stores are visible.
List()keeps serving the old snapshot until Line 467, so a slow rebuild can leave stale data live longer thanmaxCacheLifespan, and the next expiry window starts too early.Suggested fix
invalidateCache := ac.invalidateCache(expired) if invalidateCache { - ac.lastCacheInvalidation = ac.clock.Now() userSubjectRecordStore = cache.NewStore(subjectRecordKeyFn) groupSubjectRecordStore = cache.NewStore(subjectRecordKeyFn) reviewRecordStore = cache.NewStore(reviewRecordKeyFn) } @@ if invalidateCache { ac.stores.Store(&authorizationCacheStores{ userSubjectRecordStore: userSubjectRecordStore, groupSubjectRecordStore: groupSubjectRecordStore, reviewRecordStore: reviewRecordStore, }) + ac.lastCacheInvalidation = ac.clock.Now() }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/project/auth/cache.go` around lines 451 - 471, The cache expiry timestamp ac.lastCacheInvalidation is being set when invalidateCache is true before swapping in the rebuilt stores, which can extend stale-serving time; move the assignment of ac.lastCacheInvalidation to after the atomic swap (the ac.stores.Store call that installs the new authorizationCacheStores with userSubjectRecordStore, groupSubjectRecordStore, reviewRecordStore) so the expiry timer starts only once the new stores are visible to List()/readers.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@pkg/project/auth/cache.go`:
- Around line 451-471: The cache expiry timestamp ac.lastCacheInvalidation is
being set when invalidateCache is true before swapping in the rebuilt stores,
which can extend stale-serving time; move the assignment of
ac.lastCacheInvalidation to after the atomic swap (the ac.stores.Store call that
installs the new authorizationCacheStores with userSubjectRecordStore,
groupSubjectRecordStore, reviewRecordStore) so the expiry timer starts only once
the new stores are visible to List()/readers.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: f932d9a3-ce8c-40cc-a062-02fafaee0b7a
📒 Files selected for processing (2)
pkg/project/auth/cache.gopkg/project/auth/cache_test.go
|
/retest-required |
|
/jira refresh |
|
@sanchezl: This pull request references Jira Issue OCPBUGS-84534, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
| if len(subjectRecord.namespaces) == 0 { | ||
| subjectRecordStore.Delete(subjectRecord) | ||
| old := obj.(*subjectRecord) | ||
| newNamespaces := sets.NewString(old.namespaces.UnsortedList()...) |
There was a problem hiding this comment.
I'm concerned about this from a performance perspective -- especially allocations -- because this will run something like O(N*U) times on each invalidation (N=namespaces and U=users).
10ef6dd to
84c0d39
Compare
|
@sanchezl: This pull request references Jira Issue OCPBUGS-84534, which is valid. 3 validation(s) were run on this bug
DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
84c0d39 to
1cf6ada
Compare
Alternative Approaches EvaluatedThree approaches were prototyped independently and benchmarked using 1. Dual-path COW (this PR)Branch: During full cache invalidation, new stores are private to the writer — mutations happen in place with zero copy overhead. During incremental updates, stores are shared with readers — copy-on-write creates new 2. sync.MapBranch: Replace 3. Fine-grained RWMutexBranch: Add a per-record Benchmark Results
All three optimized approaches are within ~30% of each other and are orders of magnitude better than the original unconditional COW approach that @benluddy identified as O(N×U) catastrophic. Why Dual-path COW?
The sync.Map and fine-grained RWMutex approaches are both viable alternatives if the team prefers a different tradeoff. The branches are available for review. |
|
/retest-required |
|
/retest ci/prow/e2e-aws-ovn ci/prow/e2e-aws-ovn-serial-1of2 ci/prow/e2e-aws-ovn-serial-2of2 |
|
/test e2e-aws-ovn |
|
/test e2e-aws-ovn-serial-1of2 |
|
/test e2e-aws-ovn-serial-2of2 |
|
/retest-required |
1cf6ada to
044c077
Compare
|
/retest ci/prow/e2e-aws-ovn |
|
/retest ci/prow/e2e-aws-ovn-upgrade |
|
/test e2e-aws-ovn |
|
/test e2e-aws-ovn-upgrade |
e61a4e8 to
f17e018
Compare
|
/test e2e-aws-ovn |
f17e018 to
7d02e79
Compare
|
@coderabbitai resume |
✅ Actions performedReviews resumed. |
|
/verified |
|
@sanchezl: The DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/verified by "TestAuthorizationCacheRace" |
|
@sanchezl: This PR has been marked as verified by DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
addSubjectsToNamespace and deleteNamespaceFromSubjects mutate subjectRecord.namespaces (a sets.String / map) in place while List() iterates the same map from HTTP request goroutines. This causes a fatal "concurrent map iteration and map write" panic that crashes openshift-apiserver pods intermittently. Use a dual-path strategy: during full cache invalidation, where new stores are private to the writer goroutine, mutate in place for zero-copy performance. During incremental updates, where stores are shared with concurrent readers, use copy-on-write to create new subjectRecord objects with copied namespaces sets. Group all three cache stores behind an atomic.Pointer so they swap as a single unit during full invalidation, ensuring readers see a consistent view. This avoids the lock contention that caused the previous mutex fix (PR openshift#267) to be reverted (PR openshift#326), while also avoiding the O(n²) allocation overhead of unconditional copy-on-write.
Benchmark synchronize() at varying namespace × user scales (10/10 through 1000/1000) with the cache always expired, forcing full invalidation on every iteration.
Benchmark the incremental COW path with varying levels of subject duplication (D=1 for no duplicates, D=10 for 10x duplicates per user). This exercises the scenario where broken upstream dedup in AllowedSubjects causes redundant COW copies in addSubjectsToNamespace during incremental cache updates.
7d02e79 to
2de9d35
Compare
|
@sanchezl: This pull request references Jira Issue OCPBUGS-84534, which is valid. 3 validation(s) were run on this bug
DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
No-op Guard Optimization + Upstream Kube BugFollowing @benluddy's review, this push adds two improvements: 1. No-op guards in COW path
2. Upstream kube bug identified
Benchmark:
|
| Scale | Time | Memory | Allocs |
|---|---|---|---|
| N=1000, U=100, D=1 | 2.05s | 7.1 GB | 824K |
| N=1000, U=100, D=10 | 20.5s | 71.1 GB | 8.0M |
After (with no-op guards):
| Scale | Time | Memory | Allocs |
|---|---|---|---|
| N=1000, U=100, D=1 | 130ms | 209 KB | 1K |
| N=1000, U=100, D=10 | 135ms | 215 KB | 1K |
The no-op guards eliminate ~99.99% of allocations in the incremental path. Duplicates no longer matter because the guard catches the no-op before any copying occurs.
|
/verified by "TestAuthorizationCacheRace" |
|
@sanchezl: This PR has been marked as verified by DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
@sanchezl: This pull request references Jira Issue OCPBUGS-84534, which is valid. 3 validation(s) were run on this bug
DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
@sanchezl: The following test failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Summary
Fix a fatal
concurrent map iteration and map writepanic inAuthorizationCachethat intermittently crashes openshift-apiserver pods. The race has existed for years and was never successfully fixed.Dual-path copy-on-write with atomic store swap:
During full cache invalidation (every 15s or on RBAC changes), new stores are built privately by the writer goroutine — mutations happen in place with zero copy overhead. During incremental updates between invalidations, stores are shared with concurrent
List()readers —addSubjectsToNamespaceanddeleteNamespaceFromSubjectsuse copy-on-write to create newsubjectRecordobjects with copied namespaces sets, so readers iterate immutable snapshots. All three stores are grouped behindatomic.Pointer[authorizationCacheStores]so the full-rebuild swap is a single atomic operation — readers always see a consistent view.No-op guards skip the COW copy entirely when the namespace is already present/absent from the subject's set — this eliminates unnecessary allocations when subjects are re-processed with unchanged access (common in incremental updates, especially with duplicate subjects from upstream).
This avoids the lock contention that caused the previous mutex fix (PR #267) to be reverted (PR #326), while also avoiding the O(n²) allocation overhead of unconditional copy-on-write that was identified during code review.
Root Cause
List()(called from HTTP request goroutines viaproxy.(*REST).List) readssubjectRecord.namespaces(sets.String=map[string]Empty). Meanwhile,synchronize()(background goroutine) mutates the same maps in place:addSubjectsToNamespace():item.namespaces.Insert(namespace)deleteNamespaceFromSubjects():delete(subjectRecord.namespaces, namespace)Go's runtime detects the concurrent map read+write and kills the process with a fatal panic. Additionally, three store pointers (
userSubjectRecordStore,groupSubjectRecordStore,reviewRecordStore) were swapped non-atomically during full cache rebuilds — a concurrentList()could read stores from different generations, returning silently wrong results.Fix Strategy
Two races and one performance issue are fixed:
Map race → dual-path COW: When stores are shared with readers (incremental updates), create new
subjectRecordwith a copiedsets.Stringinstead of mutating in place. When stores are private to the writer (full invalidation rebuild), mutate in place for performance. ThecopyOnWrite boolparameter threads through the call chain to select the appropriate path.Non-atomic store swap →
atomic.Pointer: Group all three stores inauthorizationCacheStoresbehindatomic.Pointer. Full rebuilds populate private stores, then swap atomically.List()snapshots the pointer once at entry.No-op guards: In the COW path, check
Has(namespace)/!Has(namespace)before copying. When a subject already has (or already lacks) access to a namespace, skip the copy entirely. This is critical because an upstream kube bug inAllowedSubjects()returns duplicate subjects (line 124 returnssubjectsinstead ofdedupedSubjects), causing each duplicate to trigger a redundant COW copy.Alternatives Considered
Three approaches were prototyped and benchmarked (see comment below for full comparison):
bugfix/project-auth-cache-racebugfix/project-auth-cache-race-syncmapbugfix/project-auth-cache-race-rwmutexAll three are viable. Dual-path COW was chosen because it is the fastest, fully lock-free on the read path, and carries no risk of reintroducing the lock contention that caused the PR #326 revert.
Fix History
sync.RWMutexto synchronize accessList()requests (goroutine dumps showed 3-4 minute waits onRLock)List()never blocks, no O(n²) regressionRelated Issues
subject_locator.go:124returnssubjectsinstead ofdedupedSubjects, causing duplicate subjects to flow intoaddSubjectsToNamespace. Mitigated by no-op guards in this PR; upstream fix tracked separately.QA Validation
Test 1: Race condition is fixed
Test 2: No regression on large clusters (the PR #267 revert scenario)
oc get projectslatency before and after — should not regressList()should never block waiting on the sync goroutineTest 3: Cache freshness
oc projectsoutput within ~15 secondsVerification
/verified by "TestAuthorizationCacheRace"Summary by CodeRabbit
Refactor
Tests