xds/internal: fix xds security-config race flaky test #9183
/gemini review
Code Review
This pull request updates the clusterImplBalancer to avoid closing active certificate providers when a configuration update is received with an unchanged security configuration, and adds a corresponding test case. The review feedback highlights a potential data race on the global clientConnUpdateHook variable, suggests a defensive nil-check for b.currentSecCfg before calling Equal, and recommends copying the global blockingProvidersChan to a local variable under a lock in the test to prevent concurrent access issues.
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@ Coverage Diff @@
## master #9183 +/- ##
==========================================
- Coverage 83.28% 83.20% -0.09%
==========================================
Files 418 419 +1
Lines 33741 33877 +136
==========================================
+ Hits 28102 28188 +86
- Misses 4232 4267 +35
- Partials 1407 1422 +15
From the maintainers chat
@easwars do you think we should be avoiding RPC failures for any security config updates in general and not only for duplicates?
Fixes #9015
Summary
Avoid recreating and closing xDS certificate provider wrappers when the incoming security configuration has not changed. This fixes a race condition where asynchronous connection handshakes fail because their active certificate provider wrappers are prematurely closed.
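The guard described above can be illustrated with a small, self-contained sketch. This is not the grpc-go code: `secConfig`, `wrapper`, and `balancer` are hypothetical, simplified stand-ins, and only the idea (compare the incoming configuration with the stored one and leave the active wrapper alone when they match) is taken from this PR.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// secConfig is a hypothetical, simplified stand-in for the xDS security config.
type secConfig struct {
	rootProvider     string
	identityProvider string
}

// equal is nil-safe: two nil configs are equal, a nil and a non-nil are not.
func (c *secConfig) equal(o *secConfig) bool {
	if c == nil || o == nil {
		return c == o
	}
	return *c == *o
}

var errClosed = errors.New("provider instance is closed")

// wrapper is a stand-in for a provider wrapper that fails once closed.
type wrapper struct {
	mu     sync.Mutex
	closed bool
}

func (w *wrapper) keyMaterial() (string, error) {
	w.mu.Lock()
	defer w.mu.Unlock()
	if w.closed {
		return "", errClosed
	}
	return "certs", nil
}

func (w *wrapper) close() {
	w.mu.Lock()
	defer w.mu.Unlock()
	w.closed = true
}

// balancer remembers the last applied config next to the active wrapper.
type balancer struct {
	mu      sync.Mutex
	current *secConfig
	active  *wrapper
}

// handleSecurityConfig returns the wrapper that handshakes should use.
// An unchanged config keeps the existing wrapper, so in-flight handshakes
// that already hold it are not disturbed.
func (b *balancer) handleSecurityConfig(cfg *secConfig) *wrapper {
	b.mu.Lock()
	defer b.mu.Unlock()
	if b.active != nil && b.current.equal(cfg) {
		return b.active
	}
	if b.active != nil {
		b.active.close() // config really changed: retire the old wrapper
	}
	b.current, b.active = cfg, nil
	if cfg != nil {
		b.active = &wrapper{}
	}
	return b.active
}

func main() {
	b := &balancer{}
	inFlight := b.handleSecurityConfig(&secConfig{rootProvider: "root", identityProvider: "id"})

	// Identical update: the wrapper held by the handshake stays usable.
	b.handleSecurityConfig(&secConfig{rootProvider: "root", identityProvider: "id"})
	_, err := inFlight.keyMaterial()
	fmt.Println("after identical update:", err)

	// Changed update: the old wrapper is closed, as before the fix.
	b.handleSecurityConfig(&secConfig{rootProvider: "other", identityProvider: "id"})
	_, err = inFlight.keyMaterial()
	fmt.Println("after changed update:", err)
}
```

In this sketch a genuinely changed configuration still closes the old wrapper, so a handshake that straddles such an update can still fail; the guard only covers duplicate updates.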
Root Cause
During a client connection handshake, the transport credentials layer retrieves trusted roots and identity certificates using wrappers (singleCloseWrappedProvider) around the active certificate providers.
If the balancer received an xDS cluster configuration update while a handshake was in progress, even if the security configuration remained unchanged, it unconditionally recreated the provider wrappers and called Close() on the old ones. This swapped the active provider reference in the old wrapper to a closedProvider, causing the handshake to fail with "provider instance is closed".
Fix
- Track the SecurityConfig in clusterImplBalancer as currentSecCfg.
- In handleSecurityConfig, compare the incoming security configuration with currentSecCfg using config.Equal. Skip recreating and closing provider wrappers if the configuration has not changed.
- Add a SetClientConnUpdateHookForTesting hook setter in clusterimpl to allow tests to synchronize on connection state update completion.
Verification
- TestSecurityConfigUpdate_NoRaceOnSameConfig in clusterimpl_security_test.go blocks a handshake mid-flight, updates the configuration, and verifies that the handshake succeeds. The test uses channel-based synchronization and the new update hook rather than arbitrary sleeps.
RELEASE NOTES: none
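For reviewers unfamiliar with the pattern, the channel-based synchronization mentioned under Verification can be sketched roughly as follows. This is an illustration only, not the test in this PR: `provider`, `runScenario`, and the `hook` closure are hypothetical names, and the real test drives an actual handshake through the balancer.

```go
package main

import (
	"errors"
	"fmt"
)

var errProviderClosed = errors.New("provider instance is closed")

// provider is a hypothetical stand-in for a certificate provider wrapper.
type provider struct{ closed chan struct{} }

// keyMaterial fails once the provider has been closed.
func (p *provider) keyMaterial() error {
	select {
	case <-p.closed:
		return errProviderClosed
	default:
		return nil
	}
}

// runScenario blocks a simulated handshake mid-flight, applies a config
// update, waits for the update hook, then lets the handshake finish.
// updateClosesProvider models the pre-fix behavior (true) or the
// post-fix behavior for an unchanged config (false).
func runScenario(updateClosesProvider bool) error {
	p := &provider{closed: make(chan struct{})}
	entered := make(chan struct{})    // handshake reached its blocking point
	release := make(chan struct{})    // test lets the handshake continue
	updateDone := make(chan struct{}) // closed by the update hook
	result := make(chan error, 1)

	// Hypothetical hook invoked once the update has been fully processed.
	hook := func() { close(updateDone) }

	// Simulated handshake: pause before reading key material.
	go func() {
		close(entered)
		<-release
		result <- p.keyMaterial()
	}()
	<-entered // no sleep: wait until the handshake is really mid-flight

	// Simulated asynchronous config update.
	go func() {
		if updateClosesProvider {
			close(p.closed)
		}
		hook()
	}()
	<-updateDone // the update is complete before the handshake resumes

	close(release)
	return <-result
}

func main() {
	fmt.Println("unchanged config keeps provider:", runScenario(false))
	fmt.Println("provider closed by update:", runScenario(true))
}
```

Every step is ordered by a channel operation, so the outcome does not depend on timing, which is the property the sleep-free test relies on.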