providers/base: fix race condition in turn_down_nm_connections (BugFix)#2449
providers/base: fix race condition in turn_down_nm_connections (BugFix)#2449eugene-yujinwu wants to merge 7 commits into
Conversation
nmcli returns exit code 10 when a connection is not active. There is a race condition where _get_nm_wireless_connections() reports a connection as 'activated', but by the time 'nmcli c down <uuid>' is called the connection has already gone inactive. This caused sp.check_call() to raise CalledProcessError, failing the test. Replace sp.check_call() with sp.call() and treat exit code 10 as success, since the goal (connection is down) has already been achieved. Any other non-zero exit code still raises CalledProcessError. Add a unit test to cover the race condition (exit code 10 must not raise). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #2449 +/- ##
==========================================
+ Coverage 58.74% 58.78% +0.04%
==========================================
Files 475 476 +1
Lines 47936 48005 +69
Branches 8559 8569 +10
==========================================
+ Hits 28159 28222 +63
- Misses 18883 18887 +4
- Partials 894 896 +2
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
738c5db to
a8e6158
Compare
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Fixes a race condition in turn_down_nm_connections() where nmcli c down can legitimately return exit code 10 (“not an active connection”) if the connection deactivates between query and teardown, preventing spurious test failures.
Changes:
- Switch from
subprocess.check_call()tosubprocess.call()and treat return code10as a successful “already inactive” outcome. - Add a unit test for the “already inactive” case and update existing tests to mock
sp.call().
Reviewed changes
Copilot reviewed 1 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| providers/base/bin/wifi_nmcli_test.py | Makes connection teardown resilient to nmcli’s rc=10 by handling return codes explicitly. |
| providers/base/tests/test_wifi_nmcli_test.py | Updates mocks for sp.call() and adds coverage for the rc=10 “already inactive” scenario. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
In this doc the exit code |
tomli380576
left a comment
There was a problem hiding this comment.
See Hanhsuan's comment
| ret = sp.call(shlex.split(cmd)) | ||
| # exit code 10: connection is not active — already down, that's fine | ||
| if ret not in (0, 10): | ||
| raise sp.CalledProcessError(ret, cmd) |
There was a problem hiding this comment.
optional: I think you can use try-catch here to avoid manually constructing the error
try:
sp.check_call(["nmcli", "connection", "down", uuid])
except sp.CalledProcessError as e:
if e.returncode == 10:
print("WARN: connection {} doesn't exist, ignoring".format(uuid))
else:
raise eThere was a problem hiding this comment.
Thanks for the suggestion! We've actually taken it a step further in the latest commit. Rather than catching the error after the fact, we now pre-check the specific connection's state via UUID before issuing the down command:
nmcli -t -f GENERAL.STATE c show <uuid>
If the connection is no longer activated, we skip it. This removes the need to interpret exit code 10 entirely, and check_call reverts to its default behavior — any real error still raises as expected.
Replace sp.call() + manual return code check with sp.check_call() wrapped in a try/except block, as suggested in code review. This is more Pythonic: check_call() expresses the expectation of success, while the except clause clearly documents the one tolerated exception (exit code 10: connection already inactive). Also use bare 'raise' instead of 'raise e' to preserve the original traceback. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…turn down Instead of catching exit code 10 from 'nmcli c down', re-check the specific connection's state via UUID right before issuing the down command: nmcli -t -f GENERAL.STATE c show <uuid> If the connection is no longer activated (e.g. it was deactivated by NM automatically after another connection was brought down), skip it with a warning instead of calling 'nmcli c down'. This avoids relying on exit code interpretation and makes the intent explicit. Updated tests mock sp.check_output for the per-UUID state check and add a dedicated test case for the already-inactive scenario. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 1 out of 2 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| def test_turn_down_connection_already_inactive( | ||
| self, get_connections_mock, sp_check_call_mock, sp_check_output_mock | ||
| ): | ||
| # UUID state re-check shows connection already inactive — skip down | ||
| turn_down_nm_connections() | ||
| sp_check_output_mock.assert_called_once_with( | ||
| "nmcli -t -f GENERAL.STATE c show uuid1".split(), | ||
| universal_newlines=True, | ||
| ) | ||
| sp_check_call_mock.assert_not_called() | ||
|
|
There was a problem hiding this comment.
The new test test_turn_down_connection_already_inactive covers the pre-check path (state becomes deactivated before calling down), but it doesn’t cover the failure reported in the PR description where nmcli c down returns exit code 10. Add a unit test that simulates the down command returning/raising with rc=10 (e.g., check_call raising CalledProcessError(returncode=10, ...) or sp.call returning 10, depending on the chosen implementation) and assert turn_down_nm_connections() does not raise.
|
Thanks for raising this — you're right that exit code 10 can mean multiple things ("connection/device/AP does not exist"), making it risky to ignore unconditionally. Instead of relying on exit code interpretation, the latest commit takes a different approach: before calling If the state is no longer |
The pre-check (nmcli -t -f GENERAL.STATE c show <uuid>) reduces the race window, but a connection can still become inactive in the narrow gap between the pre-check and the actual 'nmcli c down' call. Add a try/except around check_call to treat exit code 10 as success in that case, and add a dedicated test to cover this path. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
@eugene-yujinwu two things:
|
If nmcli c down returns exit code 10 in the narrow window after the pre-check, let it fail rather than silently ignoring it. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sorry, I forgot to put it in draft PR as I still modify it. |
Problem
There is a race condition in
turn_down_nm_connections()inwifi_nmcli_test.py.The function calls
_get_nm_wireless_connections()to list active connections, then runsnmcli c down <uuid>for each activated one. However, between the query and thenmcli c downcall, the connection may have already become inactive (e.g., due to a network event or prior disconnect).In this case
nmclireturns exit code 10 ("not an active connection"), which causedsp.check_call()to raiseCalledProcessErrorand crash the test — even though the connection being down is exactly the desired outcome.Observed failure:
https://warthogs.atlassian.net/browse/SUTTON-4809
https://certification.canonical.com/hardware/202603-38441/submission/482952/test/84491/result/57704127/
https://certification.canonical.com/hardware/202602-38419/submission/481505/test/221519/result/57390085/
Fix
Replace
sp.check_call()withsp.call()and explicitly handle the return code:0: success10: connection already inactive — goal achieved, treat as successCalledProcessErrorTests
sp.callinstead ofsp.check_calltest_turn_down_connection_already_inactiveto verify exit code 10 is handled gracefullyRe-run the test with the new way which use statce check other than ignore exit 10:
[CE-QA-PC]: Automatic (BugFix) #2449_use_state_check test for Paris-P-SIT-S15
https://certification.canonical.com/hardware/202603-38440/submission/483385/