FROMLIST: Bluetooth: hci_qca: Fix missing wakeup during SSR memdump handling #577
When a Bluetooth controller encounters a coredump, it triggers the
Subsystem Restart (SSR) mechanism. The controller first reports the
coredump data and, once the upload is complete, sends a hw_error
event. The host relies on this event to proceed with subsequent
recovery actions.
If the host has not finished processing the coredump data when the
hw_error event is received, it waits until either the processing is
complete or the 8-second timeout expires before handling the event.
The current implementation clears QCA_MEMDUMP_COLLECTION using
clear_bit(), which does not wake up waiters sleeping in
wait_on_bit_timeout(). As a result, the waiting thread may remain
blocked until the timeout expires even if the coredump collection
has already completed.
Fix this by clearing QCA_MEMDUMP_COLLECTION with
clear_and_wake_up_bit(), which also wakes up the waiting thread and
allows the hw_error handling to proceed immediately.
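
The difference between the two clearing strategies can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not the driver code: a pthread mutex and condition variable stand in for the kernel bit-wait queue, the `collecting` flag stands in for QCA_MEMDUMP_COLLECTION, and every name below (`dump_state`, `wait_for_collection`, `run_scenario`, and so on) is hypothetical. Clearing without a wakeup models clear_bit(), and clearing with a broadcast models clear_and_wake_up_bit().

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for the driver state; not the real struct. */
struct dump_state {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool collecting;            /* models QCA_MEMDUMP_COLLECTION */
};

struct clearer_args {
    struct dump_state *state;
    long delay_ms;              /* simulated time to finish the dump */
    bool wake;                  /* false ~ clear_bit(), true ~ clear_and_wake_up_bit() */
};

/* Finishes "collection" after delay_ms and clears the flag. */
void *clearer(void *p)
{
    struct clearer_args *a = p;
    struct timespec d = { a->delay_ms / 1000, (a->delay_ms % 1000) * 1000000L };

    nanosleep(&d, NULL);
    pthread_mutex_lock(&a->state->lock);
    a->state->collecting = false;
    if (a->wake)
        pthread_cond_broadcast(&a->state->cond);   /* notify sleepers */
    pthread_mutex_unlock(&a->state->lock);
    return NULL;
}

/* Models the timed wait: sleep until woken or until the timeout expires. */
void wait_for_collection(struct dump_state *s, long timeout_ms)
{
    struct timespec deadline;

    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec++;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&s->lock);
    while (s->collecting) {
        if (pthread_cond_timedwait(&s->cond, &s->lock, &deadline) == ETIMEDOUT)
            break;
    }
    pthread_mutex_unlock(&s->lock);
}

/* Returns how long the waiter was blocked, in milliseconds. */
long run_scenario(bool wake, long clear_after_ms, long timeout_ms)
{
    struct dump_state s;
    struct clearer_args a = { &s, clear_after_ms, wake };
    struct timespec t0, t1;
    pthread_t tid;

    pthread_mutex_init(&s.lock, NULL);
    pthread_cond_init(&s.cond, NULL);
    s.collecting = true;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    pthread_create(&tid, NULL, clearer, &a);
    wait_for_collection(&s, timeout_ms);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    pthread_join(tid, NULL);

    pthread_cond_destroy(&s.cond);
    pthread_mutex_destroy(&s.lock);
    return (t1.tv_sec - t0.tv_sec) * 1000L + (t1.tv_nsec - t0.tv_nsec) / 1000000L;
}
```

Calling `run_scenario(true, 50, 400)` should return shortly after the simulated 50 ms collection time, while `run_scenario(false, 50, 400)` should normally block for close to the full 400 ms timeout even though the flag was cleared much earlier. The timeout values here are shortened for illustration; the commit message above gives the real timeout as 8 seconds.
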
Test case:
- Trigger a controller coredump using:
    hcitool cmd 0x3f 0c 26
- Tested on QCA6390.
- Capture HCI logs using btmon.
- Verify that the delay between receiving the hw_error event and
  initiating the power-off sequence is reduced compared to the
  timeout-based behavior.
Link: https://lore.kernel.org/all/20260410095443.4167332-1-shuai.zhang@oss.qualcomm.com/
Reviewed-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Link: https://lore.kernel.org/stable/20251107033924.3707495-2-quic_shuaz%40quicinc.com
Signed-off-by: Shuai Zhang <shuai.zhang@oss.qualcomm.com>
a2c8448 to 2c211b4
validate-patch (PR #577)
checker-log-analyzer (PR #577)
Detailed report: Full report
Test Matrix
LAVA Failed Case Triage Summary (PR #577)
Failed test cases were reported in the following LAVA jobs:
- Job 101906, SoC qcs8300-ride: https://lava-oss.qualcomm.com/scheduler/job/101906
- Job 101907, SoC x1e80100: https://lava-oss.qualcomm.com/scheduler/job/101907
- Job 101908, SoC qcs615-ride: https://lava-oss.qualcomm.com/scheduler/job/101908
- Job 101909, SoC monaco-evk: https://lava-oss.qualcomm.com/scheduler/job/101909
- Job 101912, SoC lemans-evk: https://lava-oss.qualcomm.com/scheduler/job/101912
- Job 101915, SoC qcs6490-rb3gen2: https://lava-oss.qualcomm.com/scheduler/job/101915
- Job 101916, SoC qcs9100-ride: https://lava-oss.qualcomm.com/scheduler/job/101916
CRs-Fixed: 4498534