xds/internal : Fix Connected metric test #9181
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #9181      +/-   ##
==========================================
- Coverage   83.28%   83.15%   -0.14%
==========================================
  Files         418      419       +1
  Lines       33741    33858     +117
==========================================
+ Hits        28102    28155      +53
- Misses       4232     4276      +44
- Partials     1407     1427      +20
/gemini review
Code Review
This pull request updates the TestConnectedMetric_Reconnection test in internal/xds/clients/xdsclient/test/metrics_test.go to ensure that the client's first request is received by the management server before checking connection metrics. This is implemented by introducing a requestReceived channel and utilizing the OnStreamRequest callback. There are no review comments, and the changes look correct with no further feedback to provide.
arjan-bal left a comment
This test already uses too many channels for synchronization (six); adding another makes it even harder to understand the test and prove that all events are correctly synchronized. Without this confidence, we'll keep introducing race conditions and fixing them after the fact. For example, here is another racy block:
grpc-go/internal/xds/clients/xdsclient/test/metrics_test.go, lines 818 to 823 in 5c7f936
It waits for a channel to be cleared and then immediately reads from that same channel.
I suggest simplifying the test by reducing the number of channels and waiting for a single event that guarantees the metrics have been emitted before checking them. A straightforward approach would be to have the metrics recorder pass all received metrics to the test's closure. The closure can then use a channel or grpcsync.Event to signal when the target metrics are received.
…nc.Event and local helper closures
4c76588 to 1f8472c
Fixes #9141
Fix flaky TestConnectedMetric_Reconnection in xDS Client metrics test
Description / Root Cause
A race condition existed in TestConnectedMetric_Reconnection between the server-side stream open event and the client-side stream establishment registration:
- The test called waitForStreamSuccess(), which unblocks when the control plane executes the OnStreamOpen callback.
- The test then expected XDSClientConnected to be 1.
- At that point the client could still be inside the NewStream call stack and had not yet set the internal streamEstablished flag to true.
- The metric was therefore still 0, causing the test to block indefinitely and time out.
Solution
Introduced event-based synchronization to guarantee the client has completed stream establishment before scraping the metric value:
- Added an OnStreamRequest callback on the test's management server that writes to a new requestReceived channel.
- The test blocks on <-requestReceived after the stream succeeds.
- Because the client sends its first request only after setting the streamEstablished flag, receiving the request on the server guarantees the client-side state is updated.
This completely removes the race condition without relying on periodic polling loops or arbitrary sleeps.
Verification
- Reproduced the failure by injecting a 100ms delay on the client runner immediately before registering stream establishment, which caused a 100% failure rate.
RELEASE NOTES: none