
FROMLIST: Bluetooth: hci_qca: Fix missing wakeup during SSR memdump handling #577

Merged
sgaud-quic merged 1 commit into qualcomm-linux:qcom-6.18.y from shuaz-shuai:wake_ssr_timer
May 19, 2026

@shuaz-shuai

When a Bluetooth controller encounters a coredump, it triggers the Subsystem Restart (SSR) mechanism. The controller first reports the coredump data and, once the upload is complete, sends a hw_error event. The host relies on this event to proceed with subsequent recovery actions.

If the host has not finished processing the coredump data when the hw_error event is received, it waits until either the processing is complete or the 8-second timeout expires before handling the event.

The current implementation clears QCA_MEMDUMP_COLLECTION using clear_bit(), which does not wake up waiters sleeping in wait_on_bit_timeout(). As a result, the waiting thread may remain blocked until the timeout expires even if the coredump collection has already completed.

Fix this by clearing QCA_MEMDUMP_COLLECTION with clear_and_wake_up_bit(), which also wakes up the waiting thread and allows the hw_error handling to proceed immediately.

Test case:

  • Trigger a controller coredump using: hcitool cmd 0x3f 0c 26
  • Tested on QCA6390.
  • Capture HCI logs using btmon.
  • Verify that the delay between receiving the hw_error event and initiating the power-off sequence is reduced compared to the timeout-based behavior.
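To make the failure mode concrete, here is a small userspace model in C. It is a sketch under stated assumptions, not the driver code: a pthread condition variable stands in for the kernel's bit wait queue, a bool stands in for the QCA_MEMDUMP_COLLECTION bit, and the 8-second timeout is scaled down to milliseconds. The names memdump_model, collector() and simulate_hw_error_wait() are hypothetical. Clearing the flag without a broadcast models clear_bit(); clearing it and broadcasting models clear_and_wake_up_bit(). Like wait_on_bit_timeout(), the waiter only re-checks the flag after a wakeup or when its timeout expires.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical userspace stand-in for the driver state; not kernel code. */
struct memdump_model {
    pthread_mutex_t lock;
    pthread_cond_t cond;   /* stands in for the bit wait queue */
    bool collecting;       /* stands in for QCA_MEMDUMP_COLLECTION */
    bool wake_on_clear;    /* true models clear_and_wake_up_bit(), false clear_bit() */
    long collect_ms;       /* simulated coredump collection time */
};

static int64_t now_ms(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Memdump side: finish collecting, then clear the flag. */
static void *collector(void *arg)
{
    struct memdump_model *m = arg;
    struct timespec d = { m->collect_ms / 1000, (m->collect_ms % 1000) * 1000000L };

    nanosleep(&d, NULL);
    pthread_mutex_lock(&m->lock);
    m->collecting = false;                 /* both variants clear the flag */
    if (m->wake_on_clear)
        pthread_cond_broadcast(&m->cond);  /* only this variant wakes the waiter */
    pthread_mutex_unlock(&m->lock);
    return NULL;
}

/* hw_error side: returns how long it waited (ms), or -1 on setup failure. */
static long simulate_hw_error_wait(bool wake_on_clear, long collect_ms, long timeout_ms)
{
    assert(collect_ms >= 0 && timeout_ms > 0);
    struct memdump_model m = { .collecting = true, .wake_on_clear = wake_on_clear,
                               .collect_ms = collect_ms };
    pthread_condattr_t attr;
    pthread_t tid;
    struct timespec deadline;
    int64_t start;
    long waited;

    pthread_mutex_init(&m.lock, NULL);
    pthread_condattr_init(&attr);
    pthread_condattr_setclock(&attr, CLOCK_MONOTONIC);  /* deadline uses the same clock */
    pthread_cond_init(&m.cond, &attr);
    pthread_condattr_destroy(&attr);

    clock_gettime(CLOCK_MONOTONIC, &deadline);
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    start = now_ms();
    if (pthread_create(&tid, NULL, collector, &m) != 0) {
        pthread_cond_destroy(&m.cond);
        pthread_mutex_destroy(&m.lock);
        return -1;
    }

    pthread_mutex_lock(&m.lock);
    /* Like wait_on_bit_timeout(): the flag is re-checked only after a wakeup or the timeout. */
    while (m.collecting) {
        if (pthread_cond_timedwait(&m.cond, &m.lock, &deadline) == ETIMEDOUT)
            break;
    }
    pthread_mutex_unlock(&m.lock);
    waited = (long)(now_ms() - start);

    pthread_join(tid, NULL);
    pthread_cond_destroy(&m.cond);
    pthread_mutex_destroy(&m.lock);
    return waited;
}
```

With a 20 ms collection and a 300 ms timeout, simulate_hw_error_wait(false, 20, 300) should return close to the full 300 ms even though collection finished long before, while simulate_hw_error_wait(true, 20, 300) should return shortly after 20 ms. Exact figures depend on scheduling.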

Reviewed-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Link: https://lore.kernel.org/all/20260410095443.4167332-1-shuai.zhang@oss.qualcomm.com/

CRs-Fixed: 4498534

@shuaz-shuai requested review from a team, jingyiwang42, ndechesne and yijiyang on May 13, 2026 02:56

@shashim-quic left a comment


Prefix FROMLIST in subject.

@shuaz-shuai changed the title from "Bluetooth: hci_qca: Fix missing wakeup during SSR memdump handling" to "FROMLIST: Bluetooth: hci_qca: Fix missing wakeup during SSR memdump handling" on May 19, 2026
FROMLIST: Bluetooth: hci_qca: Fix missing wakeup during SSR memdump handling

When a Bluetooth controller encounters a coredump, it triggers the
Subsystem Restart (SSR) mechanism. The controller first reports the
coredump data and, once the upload is complete, sends a hw_error
event. The host relies on this event to proceed with subsequent
recovery actions.

If the host has not finished processing the coredump data when the
hw_error event is received, it waits until either the processing is
complete or the 8-second timeout expires before handling the event.

The current implementation clears QCA_MEMDUMP_COLLECTION using
clear_bit(), which does not wake up waiters sleeping in
wait_on_bit_timeout(). As a result, the waiting thread may remain
blocked until the timeout expires even if the coredump collection
has already completed.

Fix this by clearing QCA_MEMDUMP_COLLECTION with
clear_and_wake_up_bit(), which also wakes up the waiting thread and
allows the hw_error handling to proceed immediately.

Test case:
- Trigger a controller coredump using:
    hcitool cmd 0x3f 0c 26
- Tested on QCA6390.
- Capture HCI logs using btmon.
- Verify that the delay between receiving the hw_error event and
  initiating the power-off sequence is reduced compared to the
  timeout-based behavior.

Link: https://lore.kernel.org/all/20260410095443.4167332-1-shuai.zhang@oss.qualcomm.com/
Reviewed-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Link: https://lore.kernel.org/stable/20251107033924.3707495-2-quic_shuaz%40quicinc.com
Signed-off-by: Shuai Zhang <shuai.zhang@oss.qualcomm.com>
@qcomlnxci requested a review from a team on May 19, 2026 02:07
@shuaz-shuai requested a review from shashim-quic on May 19, 2026 02:18
@knaveen-qc

PR #577 — validate-patch

PR: #577

| Verdict | Issues | Detailed report |
|---|---|---|
| ⚠️ | 0 | Full report |

Patch Validation Report

PR: #577 FROMLIST: Bluetooth: hci_qca: Fix missing wakeup during SSR memdump handling
Upstream: https://lore.kernel.org/all/20260410095443.4167332-1-shuai.zhang@oss.qualcomm.com/
Verdict: ⚠️ PARTIAL


Commit Message

| Check | Note |
|---|---|
| Subject matches upstream | FROMLIST: Bluetooth: hci_qca: Fix missing wakeup during SSR memdump handling; well-formed prefix + subsystem path |
| Body preserves rationale | Clearly explains that clear_bit() does not wake waiters in wait_on_bit_timeout(), describes the fix via clear_and_wake_up_bit(), and includes a test case |
| Fixes: tag present/correct | ⚠️ Missing. This is a bug fix; a Fixes: <sha> ("Bluetooth: hci_qca: ...") tag referencing the commit that introduced clear_bit(QCA_MEMDUMP_COLLECTION, ...) should be present |
| Authorship preserved | FROMLIST: rule: submitter and lore author are the same person (Shuai Zhang <shuai.zhang@oss.qualcomm.com>); Signed-off-by: is present |
| Backport note | N/A: not a BACKPORT: commit |
| Co-developed-by misuse | Not present; no issue |
| Second Link: tag | ⚠️ Points to an older stable-tree series (20251107033924.3707495-2-quic_shuaz@quicinc.com, Nov 2025) from a different sender email (quic_shuaz@quicinc.com). This is patch 2/N of a different series, and its relationship to the current patch is not documented in the commit message. If this is a prior stable submission by the same author, it should be noted (e.g., Link: <url> # earlier stable submission); if it is unrelated, it should be removed |

Diff

| File | Notes |
|---|---|
| drivers/bluetooth/hci_qca.c | Two identical, surgical substitutions: clear_bit(QCA_MEMDUMP_COLLECTION, &qca->flags) → clear_and_wake_up_bit(QCA_MEMDUMP_COLLECTION, &qca->flags) at lines 1105 and 1183. The change is minimal and consistent with the stated fix |

Upstream Patch Status

| Commit | Community verdict |
|---|---|
| Bluetooth: hci_qca: Fix missing wakeup during SSR memdump handling | ⏳ Decision pending: network unavailable, so the lore thread could not be fetched to verify ACK/NAK/merge status. The primary lore link (20260410095443.4167332-1-shuai.zhang@oss.qualcomm.com) is dated 2026-04-10 and the Reviewed-by: tags from Bartosz Golaszewski and Paul Menzel are positive signals, but no merge confirmation is available |

Dependency Check

  • ✅ Single-patch series (message-id suffix -1-); no Depends-on: or prerequisite series mentioned
  • ✅ Only drivers/bluetooth/hci_qca.c is touched; no header changes required for this substitution

qcom-next Presence

| Commit | Status |
|---|---|
| FROMLIST: Bluetooth: hci_qca: Fix missing wakeup during SSR memdump handling | ⏭️ Skipped: no git tooling or network access available; verify manually with git log origin/qcom-next --oneline --grep="Fix missing wakeup during SSR memdump handling" |

Issues Found

  1. Missing Fixes: tag — The commit fixes a regression introduced when clear_bit(QCA_MEMDUMP_COLLECTION, ...) was added. A Fixes: <sha> ("Bluetooth: hci_qca: ...") tag is expected for upstream acceptance and stable-tree backport tracking. Add it by identifying the commit that introduced the clear_bit(QCA_MEMDUMP_COLLECTION, &qca->flags) call.

  2. Ambiguous second Link: tag: Link: https://lore.kernel.org/stable/20251107033924.3707495-2-quic_shuaz%40quicinc.com references a Nov 2025 stable-tree patch (patch 2 of a series) sent from a different email address (quic_shuaz@quicinc.com). Its relationship to the current FROMLIST: submission is unexplained. Either:

    • Document it explicitly (e.g., add a comment like # prior stable submission) if it is a related earlier attempt, or
    • Remove it if it is not directly related to this patch.

Recommendation

The diff is clean and the fix is technically sound — two minimal, correct substitutions of clear_bit() with clear_and_wake_up_bit() in qca_controller_memdump(). Request two changes before merging: (1) add a Fixes: tag identifying the commit that introduced the clear_bit(QCA_MEMDUMP_COLLECTION, ...) calls, and (2) clarify or remove the second Link: tag pointing to the older stable-tree series from a different sender email.


Final Summary

  1. Lore link present: Yes — https://lore.kernel.org/all/20260410095443.4167332-1-shuai.zhang@oss.qualcomm.com/
  2. Lore link matches PR commits: Likely yes — the primary lore message-id date (2026-04-10) and author match the PR commit exactly; diff content is internally consistent with the stated fix; upstream patch could not be fetched to confirm byte-for-byte identity (network unavailable)
  3. Upstream patch status: ⏳ Decision Pending — two Reviewed-by: tags present (positive signal) but merge into mainline/stable could not be confirmed; patch is dated 2026-04-10 and may still be under review
  4. PR present in qcom-next: ⏭️ Skipped — no git or network access; verify manually

@knaveen-qc

PR #577 — checker-log-analyzer

PR: #577
Checker run: https://github.com/qualcomm-linux/kernel-config/actions/runs/26071785371

Checker Result Summary
| Checker | Result |
|---|---|
| checkpatch | 0 errors, 0 warnings, 0 checks |
| dt-binding-check | ⏭️ No changes in Documentation/devicetree/bindings |
| dtb-check | ⏭️ No changes in arch/arm64/boot/dts/ |
| sparse-check | Passed (DTB warnings in the log are pre-existing, not from this PR) |
| check-uapi-headers | Passed |
| check-patch-compliance | ❌ b4 fetch failed for the primary Link: URL |
| tag-check | Subject starts with FROMLIST:, a valid prefix for qcom-6.18.y |
| qcom-next-check | N/A: target branch is qcom-6.18.y, not qcom-next |

Detailed report: Full report


🤖 CI Checker Analysis (checker-log-analyzer)

PR: FROMLIST: Bluetooth: hci_qca: Fix missing wakeup during SSR memdump handling (#577)
Source: https://github.com/qualcomm-linux/kernel-config/actions/runs/26071785371

❌ check-patch-compliance

Root cause: The checker's b4 am fetch of the primary Link: URL (https://lore.kernel.org/all/20260410095443.4167332-1-shuai.zhang@oss.qualcomm.com/) failed or returned a result that did not match the committed diff, triggering "Something seems wrong with the provided link."

Failure details:

Checking commit: FROMLIST: Bluetooth: hci_qca: Fix missing wakeup during SSR memdump handling
Something seems wrong with the provided link. Please verify it
Try below command to run locally-
b4 am --single-message -C -l -3 https://lore.kernel.org/all/20260410095443.4167332-1-shuai.zhang@oss.qualcomm.com/
https://lore.kernel.org/stable/20251107033924.3707495-2-quic_shuaz%40quicinc.com

The commit has two Link: tags:

  • Link: https://lore.kernel.org/all/20260410095443.4167332-1-shuai.zhang@oss.qualcomm.com/ ← primary (checked by compliance script)
  • Link: https://lore.kernel.org/stable/20251107033924.3707495-2-quic_shuaz%40quicinc.com ← secondary (stable backport reference)

The compliance checker uses the first Link: to fetch the upstream patch via b4 and compare it to the committed diff. The failure means either:

  1. The lore URL is not yet indexed / the message-ID is wrong, or
  2. The fetched upstream patch content differs from what was committed (e.g. local adaptations were made without being documented).

Fix:

  1. Verify the lore URL is reachable and correct:

    b4 am --single-message -C -l -3 \
      https://lore.kernel.org/all/20260410095443.4167332-1-shuai.zhang@oss.qualcomm.com/ \
      -o /tmp/out
    • If b4 fails to fetch → the message-ID may be wrong or not yet indexed. Find the correct lore URL for this patch and update the Link: tag.
    • If b4 succeeds → compare the fetched diff to the committed diff:
      diff <(git format-patch -1 2c211b42815700b641ccba3f32d2eeec9d4ac360 --stdout \
               | awk '/^diff/,/^-- ?$/' | grep -E '^[+-][^+-]') \
           <(awk '/^diff/,/^-- ?$/' /tmp/out/*.mbx | grep -E '^[+-][^+-]')
  2. If the diff differs due to local adaptations (e.g. context changes for qcom-6.18.y), document the delta in the commit message body and ensure the Link: points to the exact upstream message that is the closest ancestor.

  3. If the URL is simply wrong, update it:

    git rebase -i <base_sha>   # mark commit as 'edit'
    git commit --amend          # fix the Link: line
    git rebase --continue

Reproduce locally:

b4 am --single-message -C -l -3 \
  https://lore.kernel.org/all/20260410095443.4167332-1-shuai.zhang@oss.qualcomm.com/ \
  -o /tmp/out

Verdict

1 blocker to fix before merge: resolve the check-patch-compliance failure by verifying the Link: URL is correct and that the committed diff matches the upstream patch fetched via b4. All other checkers pass or were legitimately skipped.

@qcomlnxci

Test Matrix

| Test case | lemans-evk | monaco-evk | qcs615-ride | qcs6490-rb3gen2 | qcs8300-ride | qcs9100-ride-r3 | x1e80100-crd |
|---|---|---|---|---|---|---|---|
| BT_FW_KMD_Service | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ◻️ |
| BT_ON_OFF | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ◻️ |
| BT_SCAN | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ◻️ |
| CPUFreq_Validation | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ◻️ |
| CPU_affinity | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ◻️ |
| DSP_AudioPD | ✅ Pass | ✅ Pass | ◻️ | ✅ Pass | ✅ Pass | ⚠️ skip | ◻️ |
| Ethernet | ⚠️ skip | ✅ Pass | ⚠️ skip | ⚠️ skip | ⚠️ skip | ⚠️ skip | ◻️ |
| Freq_Scaling | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ◻️ |
| GIC | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ◻️ |
| IPA | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ◻️ |
| Interrupts | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ◻️ |
| OpenCV | ✅ Pass | ⚠️ skip | ◻️ | ✅ Pass | ✅ Pass | ✅ Pass | ◻️ |
| PCIe | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ◻️ |
| Probe_Failure_Check | ❌ Fail | ❌ Fail | ✅ Pass | ❌ Fail | ❌ Fail | ❌ Fail | ◻️ |
| RMNET | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ◻️ |
| UFS_Validation | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ◻️ |
| USBHost | ❌ Fail | ✅ Pass | ❌ Fail | ❌ Fail | ❌ Fail | ❌ Fail | ◻️ |
| WiFi_Firmware_Driver | ❌ Fail | ⚠️ skip | ◻️ | ✅ Pass | ✅ Pass | ✅ Pass | ◻️ |
| WiFi_OnOff | ✅ Pass | ❌ Fail | ◻️ | ✅ Pass | ✅ Pass | ✅ Pass | ◻️ |
| adsp_remoteproc | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ❌ Fail | ◻️ |
| cdsp_remoteproc | ✅ Pass | ✅ Pass | ◻️ | ✅ Pass | ✅ Pass | ❌ Fail | ◻️ |
| gpdsp_remoteproc | ✅ Pass | ✅ Pass | ⚠️ skip | ⚠️ skip | ✅ Pass | ❌ Fail | ◻️ |
| hotplug | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ◻️ |
| irq | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ◻️ |
| kaslr | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ◻️ |
| pinctrl | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ◻️ |
| qcom_hwrng | ✅ Pass | ✅ Pass | ◻️ | ✅ Pass | ✅ Pass | ✅ Pass | ◻️ |
| remoteproc | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ❌ Fail | ◻️ |
| rngtest | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ◻️ |
| shmbridge | ❌ Fail | ✅ Pass | ❌ Fail | ❌ Fail | ❌ Fail | ❌ Fail | ◻️ |
| smmu | ❌ Fail | ✅ Pass | ❌ Fail | ✅ Pass | ✅ Pass | ❌ Fail | ◻️ |
| watchdog | ✅ Pass | ✅ Pass | ◻️ | ✅ Pass | ✅ Pass | ✅ Pass | ◻️ |
| wpss_remoteproc | ✅ Pass | ✅ Pass | ⚠️ skip | ✅ Pass | ✅ Pass | ✅ Pass | ◻️ |

@sgaud-quic merged commit f01ea96 into qualcomm-linux:qcom-6.18.y on May 19, 2026
4 of 8 checks passed
@knaveen-qc

LAVA Failed Case Triage Summary

PR: #577

Job 101906 | SoC qcs8300-ride

LAVA job: https://lava-oss.qualcomm.com/scheduler/job/101906

Failed test cases in LAVA job 101906 (SoC: qcs8300-ride).

  Case 1: Probe_Failure_Check
  1. Failed case: Probe_Failure_Check
  2. Root cause: Three pre-existing driver probe failures on qcs8300-ride triggered the Probe_Failure_Check test: (1) cpufreq-dt fails with -EEXIST (-17) because qcom-cpufreq-hw has already registered as the CPUfreq driver; (2) the Aquantia AQR115C PHY fails with -EINVAL (-22) because the DT node at stmmac-0:08 is missing the required firmware-name property; (3) the faux_driver regulatory device fails with -ENOENT (-2) because regulatory.db is absent from the test rootfs. None of these is introduced by the PR, which only modifies the BT SSR wakeup logic in hci_qca.c.
  3. Possible fix: Suppress the three known probe errors in the Probe_Failure_Check test allowlist. Only cpufreq-dt -EEXIST is harmless as is, since qcom-cpufreq-hw owns CPUfreq; properly fixing the AQR115C -EINVAL requires adding firmware-name to the AQR115C PHY node in the qcs8300 MDIO DT, and fixing the regulatory.db -ENOENT requires including the wireless regulatory database in the test rootfs image.
  4. Detail analysis attachment: failed_case_job101906_1_detailed.md
  Case 2: USBHost
  1. Failed case: USBHost
  2. Root cause: The xHCI host controller (xhci-hcd.1.auto) and USB2 root hub probed successfully on qcs8300-ride, but lsusb at test time enumerated only the Linux Foundation 2.0 root hub (ID 1d6b:0002). No external USB peripheral is physically connected to the board's USB host port in the LAVA lab, so the test failed with "Only USB hubs detected, no functional USB devices."
  3. Possible fix: Attach a functional USB peripheral device (e.g. USB mass storage or HID dongle) to the qcs8300-ride board's USB host port in the LAVA lab rack and re-trigger the CI job; if no permanent USB device can be provisioned, mark USBHost as SKIP for this board/lab configuration.
  4. Detail analysis attachment: failed_case_job101906_2_detailed.md
  Case 3: shmbridge (suppressed: known-benign shmbridge CI noise)
  1. Failed case: shmbridge (suppressed: known-benign shmbridge CI noise)
  2. Root cause: The shmbridge test unconditionally fails in this CI environment because of known infrastructure noise (qcom_scm-related kernel log errors are always present on qcs8300-ride in this test harness). Per Rule 1 of lava-known-benign-failures.md, this failure is always suppressed and does not indicate a kernel regression introduced by PR #577 (FROMLIST: Bluetooth: hci_qca: Fix missing wakeup during SSR memdump handling).
  3. Possible fix: No fix required — suppress this result per Rule 1 in lava-known-benign-failures.md; the PR patch (Bluetooth HCI SSR memdump fix) is entirely unrelated to shmbridge/qcom_scm and has not caused this failure.
  4. Detail analysis attachment: failed_case_job101906_3_detailed.md
  Case 4: 0_qcom-next-ci-premerge-tests
  1. Failed case: 0_qcom-next-ci-premerge-tests
  2. Root cause: The LAVA dispatcher marked the overall test run as failed ("Marking unfinished test run as failed") because three sub-tests in the qcom-next-ci-premerge suite reported FAIL: Probe_Failure_Check (triggered by the pre-existing cpufreq-dt probe error -17 and the Aquantia AQR115C probe error -22 in the kernel log), USBHost (no functional USB devices detected beyond hubs), and shmbridge (the test's grep for qcom_scm-related errors matched the kernel command-line string qcom_scm.download_mode=1, a false positive that caused a spurious FAIL). None of these failures is related to the PR patch (the hci_qca.c Bluetooth SSR fix).
  3. Possible fix: No fix is required in the PR. Re-trigger the CI job to confirm reproducibility; the three sub-test failures (Probe_Failure_Check, USBHost, shmbridge) are pre-existing platform or test-script issues on qcs8300-ride, unrelated to the hci_qca.c change. Separately, the shmbridge test script's grep pattern should be tightened to exclude the kernel command line from its qcom_scm error scan, and Probe_Failure_Check should allowlist the known-benign cpufreq-dt -17 and Aquantia AQR115C -22 probe failures for this board.
  4. Detail analysis attachment: failed_case_job101906_4_detailed.md
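The shmbridge grep tightening suggested above can be sketched as a line filter. This is a hypothetical C helper, not the actual test script, whose pattern is not shown in this report; it assumes the scan is a plain substring match on "qcom_scm" and relies on Linux printing the boot command line to the kernel log on a line containing "Kernel command line:".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical filter: does this kernel log line count as a qcom_scm error? */
static bool is_scm_error_line(const char *line)
{
    assert(line != NULL);
    /* The boot command line can legitimately contain "qcom_scm.download_mode=1";
     * it is configuration, not an error report, so skip it. */
    if (strstr(line, "Kernel command line:") != NULL)
        return false;
    /* Assumed scan pattern: any other line mentioning qcom_scm. */
    return strstr(line, "qcom_scm") != NULL;
}

/* Count the lines that would still fail the scan. */
static size_t count_scm_errors(const char *const *lines, size_t n)
{
    size_t hits = 0;

    assert(lines != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (is_scm_error_line(lines[i]))
            hits++;
    return hits;
}
```

In a shell-based test the same rule amounts to an extra grep -v "Kernel command line:" stage before the qcom_scm match; the C form just makes the rule explicit and testable.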
Job 101907 | SoC x1e80100

LAVA job: https://lava-oss.qualcomm.com/scheduler/job/101907

Failed test cases in LAVA job 101907 (SoC: x1e80100).

  Case 1: Build Load Failure (HTTP download timeout)
  1. Failed case: Build Load Failure (HTTP download timeout)
  2. Root cause: Result: Build Load Failure. The download of the 43 MB kernel Image artifact from S3 (qualcomm-linux/kernel-config/26083313137-1/Image) stalled at ~0.076 MB/s and hit the hard 300-second http-download timeout at 50% completion (21 MB transferred), with error_type: Infrastructure and no retry configured. The DTB downloaded successfully and the Image transfer had started, so the artifacts exist; the network path was severely throttled.
  3. Possible fix: Re-trigger the CI job; if the slow-download recurs, increase the http-download timeout from 300 s to 600 s and the enclosing download-retry block timeout from ~10 min to 15 min in the LAVA job definition, and add at least one retry attempt (max_retries: 2) to the download-retry block.
  4. Detail analysis attachment: failed_case_job101907_1_detailed.md
  Case 2: Build Load Failure (HTTP download timeout)
  1. Failed case: Build Load Failure (HTTP download timeout)
  2. Root cause: Result: Build Load Failure. The LAVA dispatcher's http-download action (level 1.3.1) timed out after exactly 300 seconds while downloading the 43 MB kernel Image artifact from S3 (qualcomm-linux/kernel-config/26083313137-1/Image). Only ~21 MB (50%) was transferred in 300 s (~0.07 MB/s) before the hard timeout fired, and no retry was configured (1 of 1 attempts), resulting in error_type: Infrastructure.
  3. Possible fix: Re-trigger the CI job; if the timeout recurs, increase the http-download timeout from 300 s to 600 s and the download-retry block timeout from ~10 min to 15 min in the LAVA job definition to accommodate the 43 MB Image artifact over a slow or congested S3 egress path.
  4. Detail analysis attachment: failed_case_job101907_2_detailed.md
  Case 3: Build Load Failure (HTTP download timeout)
  1. Failed case: Build Load Failure (HTTP download timeout)
  2. Root cause: Result: Build Load Failure. The LAVA dispatcher timed out downloading the 43 MB kernel Image artifact from S3 (qualcomm-linux/kernel-config/26083313137-1/Image) at step 1.3.1: the transfer stalled at ~71 KB/s, reaching only 50% (21 MB) before the hard 300-second http-download timeout expired. The exact error is "http-download timed out after 300 seconds".
  3. Possible fix: Re-trigger the CI job; if the timeout recurs, increase the http-download timeout from 300 s to 600 s and the enclosing download-retry block timeout from ~10 min to 15 min in the LAVA job definition to accommodate the degraded S3 throughput observed on this worker.
  4. Detail analysis attachment: failed_case_job101907_3_detailed.md
  Case 4: Build Load Failure (HTTP download timeout)
  1. Failed case: Build Load Failure (HTTP download timeout)
  2. Root cause: Result: Build Load Failure. The LAVA dispatcher failed at step 1.3.1 (http-download) while fetching the 43 MB kernel Image artifact from S3 (qli-prd-kernel-gh-artifacts.s3.us-west-2.amazonaws.com); the transfer stalled at ~50% (21 MB) with severely degraded throughput (~0.07 MB/s) and hit the hard 300 s timeout. Exact error: "http-download timed out after 300 seconds". The x1e80100 DUT was never reached.
  3. Possible fix: Re-trigger the CI job; if the timeout recurs, increase the http-download timeout from 300 s to 600 s and the download-retry block timeout from ~10 min to 15 min in the LAVA job definition, and consider increasing download-retry attempts from 1 to 3 to absorb transient S3 slowness.
  4. Detail analysis attachment: failed_case_job101907_4_detailed.md
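The proposed bump from 300 s to 600 s can be sanity-checked against the throughput reported above. Taking those figures at face value (a 43 MB Image, ~21 MB transferred in 300 s), the effective rate is about 0.07 MB/s, so the full transfer would need roughly 43 / 0.07 ≈ 614 s, slightly more than 600 s; at the ~0.076 MB/s quoted in Case 1 it would need about 566 s. A minimal C sketch of that arithmetic (the function names are hypothetical):

```c
#include <assert.h>
#include <stdbool.h>

/* Effective throughput (MB/s) observed before the timeout fired. */
static double observed_rate(double transferred_mb, double elapsed_s)
{
    assert(elapsed_s > 0.0);
    return transferred_mb / elapsed_s;
}

/* Seconds needed to move size_mb at a sustained rate_mb_s. */
static double transfer_seconds(double size_mb, double rate_mb_s)
{
    assert(rate_mb_s > 0.0);
    return size_mb / rate_mb_s;
}

/* True if timeout_s covers the estimate with the given headroom factor
 * (1.0 = no headroom, 1.5 = 50% extra). */
static bool timeout_is_sufficient(double timeout_s, double size_mb,
                                  double rate_mb_s, double headroom)
{
    return timeout_s >= transfer_seconds(size_mb, rate_mb_s) * headroom;
}
```

Under these assumptions a 600 s timeout alone is marginal at the observed rates, which suggests the retry recommended in Cases 1 and 4 (or a faster artifact path) matters at least as much as the longer timeout.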
Job 101908 | SoC qcs615-ride

LAVA job: https://lava-oss.qualcomm.com/scheduler/job/101908

Failed test cases in LAVA job 101908 (SoC: qcs615-ride).

  Case 1: smmu
  1. Failed case: smmu
  2. Root cause: Could not be determined confidently from available logs.
  3. Possible fix: Collect additional focused logs and rerun the failed test case.
  4. Detail analysis attachment: failed_case_job101908_1_detailed.md
  Case 2: USBHost
  1. Failed case: USBHost
  2. Root cause: No physical USB device was connected to the qcs615-ride board's USB host port at test time. lsusb enumerated only the Linux Foundation 2.0 root hub (1d6b:0002), so the test script correctly reported "Only USB hubs detected, no functional USB devices" and failed; the kernel USB host stack (xhci-hcd at 0x0a800000) is fully functional.
  3. Possible fix: Ensure a functional USB peripheral device (e.g. USB storage or HID device) is physically connected to the qcs615-ride board's USB host port before the LAVA job runs; if the board's USB host port is not wired to a device in this lab setup, update the LAVA job definition or test plan to skip USBHost for this board configuration.
  4. Detail analysis attachment: failed_case_job101908_2_detailed.md
  Case 3: shmbridge
  1. Failed case: shmbridge
  2. Root cause: The shmbridge test is a known CI infrastructure false positive. It fails because the test script detects qcom_scm-related messages in the kernel log (specifically the recurring "qcom,fastrpc … Error: dsp information is incorrect err: -1" noise from the 62400000.remoteproc fastrpc node on qcs615-ride), which it incorrectly classifies as a qcom_scm error. This is pre-existing infrastructure noise unrelated to the PR under test.
  3. Possible fix: No fix required — suppress per Rule 1 of lava-known-benign-failures.md; this failure is a known CI false positive and must not block PR merge.
  4. Detail analysis attachment: failed_case_job101908_3_detailed.md
  Case 4: 0_qcom-next-ci-premerge-tests
  1. Failed case: 0_qcom-next-ci-premerge-tests
  2. Root cause: The rootfs image flashed to the qcs615-ride board contains kernel modules only under /lib/modules/6.19.0-00717-ge3aded47f3e5/, but the PR kernel running on the board is 6.18.25-gddbdd8bb84fc. modprobe therefore cannot load ath11k/ath11k_pci/ath11k_ahb/cfg80211/mac80211 for the running kernel, so the WiFi_Firmware_Driver test fails and the lava-test-shell times out after 419 seconds.
  3. Possible fix: Rebuild and re-flash the qcs615-ride rootfs image with modules matching kernel 6.18.25-gddbdd8bb84fc (i.e., ensure the CI artifact bundle packages the kernel and its modules together from the same build); then re-trigger the LAVA job.
  4. Detail analysis attachment: failed_case_job101908_4_detailed.md
  Case 5: lava-test-shell
  1. Failed case: lava-test-shell
  2. Root cause: The running kernel is 6.18.25-gddbdd8bb84fc, but the rootfs contains ath11k .ko modules only under /lib/modules/6.19.0-00717-ge3aded47f3e5/. modprobe cannot load them against the running kernel, so the WiFi_Firmware_Driver test script's modprobe calls for ath11k_pci/ath11k_ahb/ath11k_snoc/ath11k all fail; the script then hangs in the "=== WiFi Firmware Load Evidence ===" collection phase (likely a blocking journalctl or dmesg call) until the 419 s lava-test-shell timeout fires.
  3. Possible fix: Rebuild and re-flash the rootfs image so that the kernel modules directory matches the running kernel version (6.18.25-gddbdd8bb84fc), or ensure the kernel image and rootfs are built from the same commit/version; then re-trigger the CI job.
  4. Detail analysis attachment: failed_case_job101908_5_detailed.md
  Case 6: lava-test-retry
  1. Failed case: lava-test-retry
  2. Root cause: The PR kernel (6.18.25-gddbdd8bb84fc) was flashed, but the rootfs still contains modules only for a stale kernel version (6.19.0-00717-ge3aded47f3e5). Because /lib/modules/6.18.25-gddbdd8bb84fc/ is empty or absent, modprobe cannot load ath11k, rmnet, ipa, or any other loadable module, and the WiFi_Firmware_Driver test hangs waiting for a LAVA result signal that never arrives, exhausting the 419-second lava-test-shell timeout.
  3. Possible fix: Rebuild the rootfs image so that the kernel modules directory matches the PR kernel version (6.18.25-gddbdd8bb84fc), or ensure the copy-modules mechanism correctly populates /lib/modules/<running-kernel>/ from the PR build artifacts before the test suite runs.
  4. Detail analysis attachment: failed_case_job101908_6_detailed.md
  Case 7: job
  1. Failed case: job
  2. Root cause: The WiFi_Firmware_Driver test script (the last test in the 0_qcom-next-ci-premerge-tests suite) failed to load any ath11k module (ath11k_pci/ath11k_ahb/ath11k_snoc/ath11k) on the qcs615-ride board, then stalled while collecting WiFi firmware load evidence. The lava-test-shell exhausted its 419-second timeout without emitting LAVA_SIGNAL_ENDRUN, so the top-level job case failed with error_type: Test.
  3. Possible fix: This is a pre-existing test infrastructure / rootfs issue (ath11k modules in the rootfs are indexed under kernel 6.19.0-00717-ge3aded47f3e5 but the running kernel is 6.18.25-gddbdd8bb84fc, so modprobe cannot find them); re-trigger the CI job with a rootfs whose /lib/modules/ tree matches the PR kernel version, or ensure the test definition pins the correct rootfs artifact for this kernel build.
  4. Detail analysis attachment: failed_case_job101908_7_detailed.md
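Cases 4 to 7 all trace back to one mismatch: the running kernel's release (6.18.25-gddbdd8bb84fc) has no matching /lib/modules tree in the rootfs. A pre-test guard could detect this before the suite starts instead of letting it surface as a timeout. The sketch below is hypothetical (the helper names are illustrative); it uses only uname(2) and stat(2), and it treats the presence of /lib/modules/<release>/modules.dep, the dependency index that depmod generates and modprobe relies on, as the signal that modules were installed for that release.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/utsname.h>

/* Build "/lib/modules/<release>/modules.dep"; false if buf is too small. */
static bool modules_dep_path(const char *release, char *buf, size_t len)
{
    int n;

    assert(release != NULL && buf != NULL);
    n = snprintf(buf, len, "/lib/modules/%s/modules.dep", release);
    return n >= 0 && (size_t)n < len;
}

/* True if the rootfs carries a depmod index for this release. */
static bool modules_installed_for(const char *release)
{
    char path[512];
    struct stat st;

    if (!modules_dep_path(release, path, sizeof(path)))
        return false;
    return stat(path, &st) == 0 && S_ISREG(st.st_mode);
}

/* Compare the running kernel release with the release the rootfs was built for. */
static bool release_matches(const char *running, const char *packaged)
{
    return strcmp(running, packaged) == 0;
}

/* Guard for the running kernel, equivalent to checking `uname -r`. */
static bool running_kernel_has_modules(void)
{
    struct utsname u;

    if (uname(&u) != 0)
        return false;
    return modules_installed_for(u.release);
}
```

A harness might call running_kernel_has_modules() first and report a setup error, rather than letting WiFi_Firmware_Driver run into the 419-second lava-test-shell timeout.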
Job 101909 | SoC monaco-evk

LAVA job: https://lava-oss.qualcomm.com/scheduler/job/101909

Failed test cases in LAVA job 101909 (SoC: monaco-evk).

  Case 1: Probe Failure Check: driver probe hard-fail (cpufreq-dt probe with error -17 / -EEXIST)
  1. Failed case: Probe Failure Check: driver probe hard-fail (cpufreq-dt probe with error -17 / -EEXIST)
  2. Root cause: On monaco-evk, the Qualcomm platform CPUfreq driver (qcom-cpufreq-nvmem) registers successfully first during boot. When the generic cpufreq-dt driver probes afterwards, its call to cpufreq_register_driver() returns -EEXIST (-17) because only one CPUfreq driver can be active at a time. This is expected platform behavior, not a regression introduced by the PR.
  3. Possible fix: Add cpufreq-dt probe failure with error -17 on Qualcomm platforms to the Probe_Failure_Check test's suppression/allowlist, since this is a known and benign driver-conflict artifact on SoCs that use a dedicated Qualcomm CPUfreq driver; no kernel change is required.
  4. Detail analysis attachment: failed_case_job101909_1_detailed.md
  Case 2: WiFi_OnOff
  1. Failed case: WiFi_OnOff
  2. Root cause: Both qcom-pcie controllers on monaco-evk (1c00000.pci and 1c10000.pci) report "Phy link never came up" at kernel boot because the PCIe PHY power supplies (vdda, vddpe-3v3) are absent from the DT and fall back to dummy regulators. This prevents PCIe link training and leaves the WCN6855 WiFi chip unenumerated; ath11k_pci loads but finds no PCIe endpoint, so no wlan interface is ever created.
  3. Possible fix: Add vdda-supply and vddpe-3v3-supply phandle entries pointing to the correct PMIC regulators in the monaco-evk DTS for both PCIe nodes (pci@1c00000 and pci@1c10000), and include ath11k WCN6855 firmware under /lib/firmware/ath11k/WCN6855/hw2.0/ in the LAVA test rootfs image.
  4. Detail analysis attachment: failed_case_job101909_2_detailed.md
  Case 3: 0_qcom-next-ci-premerge-tests: test suite failure (Probe_Failure_Check FAIL + WiFi_OnOff FAIL caused LAVA to mark the overall test run as failed)
  1. Failed case: 0_qcom-next-ci-premerge-tests: test suite failure (Probe_Failure_Check FAIL + WiFi_OnOff FAIL caused LAVA to mark the overall test run as failed)
  2. Root cause: The Probe_Failure_Check sub-test failed because the kernel log contains "cpufreq-dt cpufreq-dt: probe with driver cpufreq-dt failed with error -17" (-EEXIST: driver already registered) and two Bluetooth firmware-load errors (qca/wcnhpbtfw21.tlv and qca/hpbtfw21.tlv failed with error -2, -ENOENT). In addition, WiFi_OnOff failed because no WiFi interface appeared (ath11k_pci loaded but no ieee80211 interface was enumerated, consistent with missing WiFi firmware on the monaco-evk initramfs). Neither failure is introduced by PR #577 (FROMLIST: Bluetooth: hci_qca: Fix missing wakeup during SSR memdump handling): the PR only touches drivers/bluetooth/hci_qca.c (the SSR memdump wakeup path), which is unrelated to cpufreq-dt probe ordering or WiFi firmware availability.
  3. Possible fix: The cpufreq-dt EEXIST probe failure and missing WiFi/BT firmware are pre-existing platform issues on monaco-evk unrelated to this PR; add cpufreq-dt: probe with driver cpufreq-dt failed with error -17, wcnhpbtfw21.tlv failed with error -2, and hpbtfw21.tlv failed with error -2 to the Probe_Failure_Check allowlist (or ensure the monaco-evk firmware initramfs includes the required ath11k and QCA BT firmware blobs) so that known-benign platform noise does not block PR validation.
  4. Detail analysis attachment: failed_case_job101909_3_detailed.md
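  The allowlist approach suggested above can be sketched as a filter over the collected probe-failure lines. This is a minimal illustration only. It assumes the test script has access to the failure lines as a list of strings. The pattern list is hypothetical and is built from the messages quoted in this report; the real Probe_Failure_Check filter format is not shown here.

```python
import re

# Hypothetical known-benign allowlist, built from the messages quoted in this
# report. "\b" after the errno keeps "-2" from also matching "-22".
KNOWN_BENIGN_PATTERNS = [
    re.compile(r"cpufreq-dt: probe with driver cpufreq-dt failed with error -17\b"),
    re.compile(r"qca/wcnhpbtfw21\.tlv failed with error -2\b"),
    re.compile(r"qca/hpbtfw21\.tlv failed with error -2\b"),
    re.compile(r"regulatory\.db failed with error -2\b"),
]


def unexpected_failures(log_lines):
    """Return the probe-failure lines that match none of the benign patterns."""
    return [
        line
        for line in log_lines
        if not any(pattern.search(line) for pattern in KNOWN_BENIGN_PATTERNS)
    ]
```

  Under this sketch, the sub-test fails only if unexpected_failures() returns a non-empty list. A new, unrelated probe error therefore still surfaces even though the known platform noise is ignored.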
Job 101912 | SoC lemans-evk

LAVA job: https://lava-oss.qualcomm.com/scheduler/job/101912

Failed test cases in LAVA job 101912 (SoC: lemans-evk).

  Case 1: Probe_Failure_Check
  1. Failed case: Probe_Failure_Check
  2. Root cause: All 3 entries in probe_failures.log are known-benign firmware-load errors. Two Bluetooth QCA firmware files (qca/wcnhpbtfw21.tlv, qca/hpbtfw21.tlv) and the WiFi regulatory database (regulatory.db) failed to load with -ENOENT (error -2) at early boot. BT_ON_OFF and WiFi_OnOff both passed, confirming that the firmware loaded correctly at runtime. However, the Probe_Failure_Check test script does not apply the known-benign suppression rules and flagged these lines as failures.
  3. Possible fix: Suppress these three firmware-load patterns in the Probe_Failure_Check test script (or its filter list) by adding the qca/wcnhpbtfw21.tlv, qca/hpbtfw21.tlv and regulatory.db firmware-load-failure lines to the test's known-benign exclusion list. This mirrors the suppression already applied by Rules 2 and 3 in lava-known-benign-failures.md.
  4. Detail analysis attachment: failed_case_job101912_1_detailed.md
  Case 2: smmu (Missing IOMMU Group Attachment for Critical Masters, Video + Audio)
  1. Failed case: smmu (Missing IOMMU Group Attachment for Critical Masters, Video + Audio)
  2. Root cause: The SMMU test script detected that two critical DMA masters, aa00000.video-codec (Video) and interconnect-lpass-ag-noc (Audio), are not attached to any IOMMU group at runtime on the lemans-evk (IQ-9075-EVK) board. Neither device was enumerated or probed by the kernel, so the IOMMU framework never assigned them a group, and the test's critical-master protection check failed.
  3. Possible fix: Add or correct the iommus / iommu-map DT property for aa00000.video-codec and interconnect-lpass-ag-noc in the lemans-evk device tree so that both devices are mapped to the ARM SMMU at 15000000.iommu or 15200000.iommu. Alternatively, if these devices are intentionally not IOMMU-protected on this platform, update the SMMU test script's critical-master list to exclude them for the IQ-9075-EVK target.
  4. Detail analysis attachment: failed_case_job101912_2_detailed.md
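  The critical-master check described above can be approximated by listing the devices under each group in /sys/kernel/iommu_groups/<N>/devices/. This is a minimal sketch, not the actual SMMU test script. The CRITICAL_MASTERS list is an assumption taken from the two device names in this report, and the sysfs root is a parameter so that the logic can be exercised off-target.

```python
import os

# Assumed critical-master list, taken from the smmu failures in this report;
# the real test script's list may differ.
CRITICAL_MASTERS = ["aa00000.video-codec", "interconnect-lpass-ag-noc"]


def grouped_devices(root="/sys/kernel/iommu_groups"):
    """Collect every device name listed under <root>/<group>/devices/."""
    names = set()
    if not os.path.isdir(root):
        return names
    for group in os.listdir(root):
        dev_dir = os.path.join(root, group, "devices")
        if os.path.isdir(dev_dir):
            names.update(os.listdir(dev_dir))
    return names


def missing_critical_masters(root="/sys/kernel/iommu_groups", masters=CRITICAL_MASTERS):
    """Return the critical masters that are not attached to any IOMMU group."""
    present = grouped_devices(root)
    return [m for m in masters if m not in present]
```

  A device that never probes (as reported for both masters here) simply never appears under any group. This sketch therefore cannot distinguish "not probed" from "probed but lacking an iommus binding"; the dmesg probe log is still needed for that.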
  Case 3: USBHost
  1. Failed case: USBHost
  2. Root cause: The USBHost test failed because lsusb enumerated only two Genesys Logic USB hubs (05e3:0610 HS, 05e3:0625 SS+) on the lemans-evk board's xHCI bus, with no functional (non-hub) USB device physically attached to any downstream port; the test script treats a hub-only result as a failure.
  3. Possible fix: Attach a functional USB peripheral (e.g. a USB mass-storage device or USB-to-serial dongle) to a downstream port of the Genesys Logic hub on the lemans-evk LAVA board, or update the USBHost test script to accept a hub-only enumeration as PASS when no peripheral is expected in the CI lab setup.
  4. Detail analysis attachment: failed_case_job101912_3_detailed.md
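  The second option, accepting a hub-only enumeration, could look roughly like the following. This is a sketch under stated assumptions. HUB_IDS is a hypothetical set of vendor:product IDs taken from the hubs reported in this analysis (Linux Foundation root hubs and the Genesys Logic hubs). A production script would more robustly check the USB device class (0x09 for hubs) than maintain an ID list. An empty lsusb result is still treated as a failure, because no USB bus was enumerated at all.

```python
import re

# Hypothetical hub IDs, taken from the USBHost failures in this report.
HUB_IDS = {"1d6b:0002", "1d6b:0003", "05e3:0610", "05e3:0625"}

# Matches the "ID vvvv:pppp" field of a standard lsusb line.
_ID_RE = re.compile(r"\bID ([0-9a-fA-F]{4}:[0-9a-fA-F]{4})\b")


def device_ids(lsusb_output):
    """Return all vendor:product IDs in lsusb output, lower-cased."""
    return [dev_id.lower() for dev_id in _ID_RE.findall(lsusb_output)]


def functional_devices(lsusb_output):
    """Return IDs of enumerated devices that are not known hubs."""
    return [dev_id for dev_id in device_ids(lsusb_output) if dev_id not in HUB_IDS]


def usb_host_verdict(lsusb_output, accept_hub_only=False):
    """FAIL on no devices; PASS on a functional device or an accepted hub-only bus."""
    if not device_ids(lsusb_output):
        return "FAIL"  # no bus enumerated at all
    if functional_devices(lsusb_output):
        return "PASS"
    return "PASS" if accept_hub_only else "FAIL"
```

  With accept_hub_only=False, this reproduces the behaviour reported for lemans-evk (hubs only, so FAIL). Setting it to True for boards without a cabled peripheral would turn that result into PASS while still catching a host controller that enumerates nothing.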
  Case 4: shmbridge (Suppressed, Known Benign: CI Infrastructure Noise)
  1. Failed case: shmbridge (Suppressed, Known Benign: CI Infrastructure Noise)
  2. Root cause: The shmbridge test script flagged "qcom_scm-related errors detected in current-boot kernel log", but the only matching kernel log line is the informational message "qcom_scm firmware:scm: qseecom: untested machine, skipping". This is a benign, expected message on lemans-evk (IQ-9075-EVK) that the test script incorrectly classifies as an error. Per Rule 1 of lava-known-benign-failures.md, shmbridge failures are always suppressed as known CI infrastructure noise.
  3. Possible fix: No fix required; suppress this result per Rule 1. The shmbridge test script's error-pattern matching is a known false positive on this platform and does not indicate any kernel regression introduced by PR FROMLIST: Bluetooth: hci_qca: Fix missing wakeup during SSR memdump handling #577.
  4. Detail analysis attachment: failed_case_job101912_4_detailed.md
  Case 5: WiFi_Firmware_Driver (Suppressed, Known Benign: WiFi Firmware False Positive, WiFi ON/OFF Passed)
  1. Failed case: WiFi_Firmware_Driver (Suppressed, Known Benign: WiFi Firmware False Positive, WiFi ON/OFF Passed)
  2. Root cause: The WiFi_Firmware_Driver test checked the ath12k module load state before the ath12k driver had fully initialised on the lemans-evk (IQ-9075-EVK) board. The functional WiFi ON/OFF test (WiFi_OnOff) passed immediately afterwards, confirming that the WiFi firmware loaded and the interface (wlp1s0 via ath11k_pci) operated correctly. This is a known false positive per suppression Rule 2.
  3. Possible fix: No fix required; suppress this result per Rule 2 in lava-known-benign-failures.md. WiFi_OnOff PASS confirms that the WiFi stack is functional on this board.
  4. Detail analysis attachment: failed_case_job101912_5_detailed.md
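  The two suppression rules cited in Cases 4 and 5 can be expressed as a small post-processing step over the sub-test results. This is only a sketch. It models just the behaviour this report attributes to Rule 1 (shmbridge always suppressed) and Rule 2 (WiFi_Firmware_Driver suppressed when WiFi_OnOff passed). The actual contents and format of lava-known-benign-failures.md are not shown here and may include more rules. The results dictionary format is an assumption.

```python
def apply_suppression(results):
    """Return the set of sub-tests that still count as failures.

    results maps a sub-test name to "pass" or "fail" (assumed format).
    Only the two rules described in this report are modelled:
      Rule 1: shmbridge failures are always suppressed.
      Rule 2: WiFi_Firmware_Driver is suppressed when WiFi_OnOff passed.
    """
    failing = {name for name, status in results.items() if status == "fail"}
    failing.discard("shmbridge")  # Rule 1
    if results.get("WiFi_OnOff") == "pass":
        failing.discard("WiFi_Firmware_Driver")  # Rule 2
    return failing
```

  Rule 2 is conditional by design. If WiFi_OnOff also fails, both WiFi results remain visible, which matches the report's reasoning that the functional test is what makes the firmware check a false positive.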
  Case 6: 0_qcom-next-ci-premerge-tests
  1. Failed case: 0_qcom-next-ci-premerge-tests
  2. Root cause: The LAVA test shell ran to completion, but LAVA marked the overall test definition as failed because two sub-tests reported explicit FAIL results. The first is smmu: two critical IOMMU masters, aa00000.video-codec and interconnect-lpass-ag-noc, had no IOMMU group attachment on lemans-evk. The second is Probe_Failure_Check: the Bluetooth firmware files qca/wcnhpbtfw21.tlv and qca/hpbtfw21.tlv were missing from the rootfs, and regulatory.db was absent. Both failures are pre-existing infrastructure/rootfs issues unrelated to the PR.
  3. Possible fix: No action is required on the PR; the drivers/bluetooth/hci_qca.c SSR wakeup fix is unrelated. The smmu failure requires the DT/driver to register aa00000.video-codec and interconnect-lpass-ag-noc with the IOMMU on lemans-evk (a pre-existing platform gap). The Probe_Failure_Check failure requires the missing Bluetooth firmware blobs (qca/wcnhpbtfw21.tlv, qca/hpbtfw21.tlv) and regulatory.db to be added to the rootfs image for this board.
  4. Detail analysis attachment: failed_case_job101912_6_detailed.md
Job 101915 | SoC qcs6490-rb3gen2

LAVA job: https://lava-oss.qualcomm.com/scheduler/job/101915

Failed test cases in LAVA job 101915 (SoC: qcs6490-rb3gen2).

  Case 1: Probe_Failure_Check
  1. Failed case: Probe_Failure_Check
  2. Root cause: The Probe_Failure_Check test failed because the kernel firmware loader could not find regulatory.db at boot ("faux_driver regulatory: Direct firmware load for regulatory.db failed with error -2", i.e. -ENOENT). cfg80211 requests this file for wireless regulatory domain initialization, but it is absent from the test rootfs /lib/firmware/ tree, and the test script flags this dmesg line as a firmware-related probe error.
  3. Possible fix: Install regulatory.db (from the wireless-regdb package) into the test rootfs as /lib/firmware/regulatory.db so that the cfg80211 firmware request succeeds at boot. Alternatively, add a suppression rule to the Probe_Failure_Check test script for the known-benign "faux_driver regulatory: Direct firmware load for regulatory.db" message when no WiFi functional failure is present.
  4. Detail analysis attachment: failed_case_job101915_1_detailed.md
  Case 2: USBHost
  1. Failed case: USBHost
  2. Root cause: The pmic-glink connector (/pmic-glink/connector@0) failed to create device links with the EUD role-switch (eud-path0-role-switch) and Type-C mux (1-001c) suppliers on QCS6490-RB3Gen2. This prevented the DWC3 USB controller from switching to host mode: both USB ports remained in peripheral/gadget mode and no host-mode root hub was created, so lsusb found zero devices.
  3. Possible fix: This is a pre-existing platform issue unrelated to PR FROMLIST: Bluetooth: hci_qca: Fix missing wakeup during SSR memdump handling #577, which only modifies hci_qca.c. Investigate the pmic-glink role-switch supplier registration order on QCS6490. Verify that the EUD driver (88e0000.eud) and the I2C Type-C mux (1-001c) probe successfully before pmic-glink attempts to create device links. Also check whether a DT status = "disabled" or a missing dr_mode = "host" property on the USB nodes is preventing host-mode activation.
  4. Detail analysis attachment: failed_case_job101915_2_detailed.md
  Case 3: shmbridge (Suppressed, Known Benign: shmbridge CI noise)
  1. Failed case: shmbridge (Suppressed, Known Benign: shmbridge CI noise)
  2. Root cause: The shmbridge test script incorrectly matched the string qcom_scm.download_mode=1 in the kernel command line as a "qcom_scm-related error", triggering a false FAIL. This is a known test-script false positive unrelated to any kernel regression, and it is always suppressed per CI suppression Rule 1.
  3. Possible fix: No kernel fix required; suppress this result as known benign CI noise. The shmbridge test script's error-detection regex should be tightened so that its "qcom_scm error" scan pattern excludes kernel command-line parameter strings (e.g. qcom_scm.download_mode=1).
  4. Detail analysis attachment: failed_case_job101915_3_detailed.md
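  A tightened scan along the lines suggested above might skip the "Kernel command line:" record entirely. It would then require an explicit error keyword on a qcom_scm log line, so that the informational "untested machine, skipping" message also stops matching. The original script's pattern is not shown in this report, so both regexes below are hypothetical illustrations rather than a drop-in replacement.

```python
import re

# Hypothetical tightened patterns; the real shmbridge script's regex is not
# shown in this report.
_CMDLINE_RE = re.compile(r"Kernel command line:")
# A qcom_scm log line that also contains an explicit error keyword.
_SCM_ERROR_RE = re.compile(r"\bqcom_scm\b.*\b(error|fail(ed|ure)?)\b", re.IGNORECASE)


def scm_errors(dmesg_lines):
    """Return dmesg lines that look like genuine qcom_scm errors."""
    return [
        line
        for line in dmesg_lines
        if not _CMDLINE_RE.search(line) and _SCM_ERROR_RE.search(line)
    ]
```

  Excluding the command-line record and requiring an error keyword are independent guards. Either one alone removes the qcs6490-rb3gen2 false positive, while the keyword requirement is what removes the lemans-evk and qcs9100-ride "untested machine, skipping" match.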
  Case 4: 0_qcom-next-ci-premerge-tests
  1. Failed case: 0_qcom-next-ci-premerge-tests
  2. Root cause: The LAVA test shell completed normally, but the overall test definition was marked failed because three sub-cases reported FAIL:
     - Probe_Failure_Check: a benign regulatory.db firmware-load error -2 from faux_driver, a known false positive on qcs6490-rb3gen2.
     - USBHost: no USB peripheral is attached to the board's host port in the lab.
     - shmbridge: a test-script false positive, where the grep for qcom_scm-related errors matches the kernel command-line string qcom_scm.download_mode=1 rather than a real error.
     None of these failures is introduced by the PR patch (hci_qca.c Bluetooth memdump fix).
  3. Possible fix: Suppress the three known false-positive sub-cases in the qcom-next-ci-premerge test plan for qcs6490-rb3gen2: (1) exclude regulatory.db load errors from Probe_Failure_Check's denylist, (2) mark USBHost as skipped when no USB device is present in the lab, and (3) fix the shmbridge grep pattern to exclude kernel command-line matches. Then re-trigger the CI job to confirm that the PR itself is clean.
  4. Detail analysis attachment: failed_case_job101915_4_detailed.md
Job 101916 | SoC qcs9100-ride

LAVA job: https://lava-oss.qualcomm.com/scheduler/job/101916

Failed test cases in LAVA job 101916 (SoC: qcs9100-ride).

  Case 1: Remoteproc Boot Failure (PAS/SCM firmware init error, -EINVAL)
  1. Failed case: Remoteproc Boot Failure (PAS/SCM firmware init error, -EINVAL)
  2. Root cause: On qcs9100-ride-sx (sa8775p), qcom_scm logs "qseecom: untested machine, skipping" at boot, leaving the PAS interface uninitialized. All subsequent qcom_q6v5_pas firmware init calls return error -22 (EINVAL), so both CDSP instances (remoteproc2/3) remain offline.
  3. Possible fix: Add qcs9100/sa8775p to the tested-machine list in the qcom_scm QSEECOM initialization path, or ensure the correct SCM/TZ firmware version is paired with this kernel, so that qcom_scm_pas_init_image() succeeds for all remoteproc subsystems. This failure is pre-existing and unrelated to the PR under test (BT HCI memdump wakeup fix).
  4. Detail analysis attachment: failed_case_job101916_1_detailed.md
  Case 2: Remoteproc Boot Failure (PAS firmware init error)
  1. Failed case: Remoteproc Boot Failure (PAS firmware init error)
  2. Root cause: On qcs9100-ride (sa8775p), qcom_scm logs "qseecom: untested machine, skipping" at boot, leaving the SCM/PAS interface uninitialized. Every subsequent qcom_q6v5_pas call to qcom_scm_pas_init_image() for qcom/sa8775p/adsp.mbn returns -EINVAL (-22), which prevents ADSP (and all other DSPs) from booting.
  3. Possible fix: Add the qcs9100/sa8775p SoC to the qcom_scm qseecom tested-machine allowlist (or resolve why it is being skipped) so that qcom_scm_pas_init_image() succeeds. This failure is pre-existing and unrelated to the PR under test (Bluetooth hci_qca.c change).
  4. Detail analysis attachment: failed_case_job101916_2_detailed.md
  Case 3: Remoteproc Boot Failure (PAS/SCM EINVAL on firmware init)
  1. Failed case: Remoteproc Boot Failure (PAS/SCM EINVAL on firmware init)
  2. Root cause: On qcs9100-ride (sa8775p), the qcom_scm qseecom driver logs "untested machine, skipping" at boot (line 1399), leaving the SCM PAS authentication interface non-functional. Every subsequent qcom_q6v5_pas firmware init call returns error -22 (EINVAL), so both gpdsp0 (20c00000.remoteproc) and gpdsp1 (21c00000.remoteproc) remain offline. This is a pre-existing platform integration issue unrelated to PR FROMLIST: Bluetooth: hci_qca: Fix missing wakeup during SSR memdump handling #577.
  3. Possible fix: Add the sa8775p machine ID to the qseecom tested-machine list in drivers/firmware/qcom/qcom_scm.c (or the relevant qseecom init path) so that SCM PAS authentication is enabled on qcs9100-ride, allowing qcom_q6v5_pas to authenticate and boot the gpdsp0/gpdsp1 firmware.
  4. Detail analysis attachment: failed_case_job101916_3_detailed.md
  Case 4: Remoteproc Boot Failure (PAS firmware initialization error -22, EINVAL, on all DSP subsystems)
  1. Failed case: Remoteproc Boot Failure (PAS firmware initialization error -22, EINVAL, on all DSP subsystems)
  2. Root cause: All five remoteproc subsystems (gpdsp0, gpdsp1, cdsp0, cdsp1, adsp) fail at PAS firmware initialization with error -22 (EINVAL). qcom_scm reports "qseecom: untested machine, skipping" for the qcs9100-ride SoC, so the SCM/PAS init_image call returns -EINVAL and no subsystem reaches the running state.
  3. Possible fix: Add the qcs9100 (sa8775p) SoC to the qseecom allowlist in drivers/firmware/qcom/qcom_scm.c so that PAS firmware authentication proceeds. This is a pre-existing platform integration gap unrelated to the PR under test, and the PR (hci_qca.c Bluetooth fix) should not be blocked by this failure.
  4. Detail analysis attachment: failed_case_job101916_4_detailed.md
  Case 5: Probe_Failure_Check
  1. Failed case: Probe_Failure_Check
  2. Root cause: The Probe_Failure_Check test scans dmesg for probe/firmware errors and found two pre-existing failures: (1) the Aquantia AQR115C stmmac-0:08 PHY probe hard-fails with -EINVAL because the DT node for the PHY at MDIO address 0x08 is missing the mandatory firmware-name property; (2) faux_driver regulatory fails to load regulatory.db with -ENOENT because the file is absent from the rootfs firmware paths on this qcs9100-ride-sx image. Neither failure is introduced by this PR.
  3. Possible fix: Add the missing firmware-name property to the Aquantia AQR115C PHY DT node at MDIO address 0x08 in the qcs9100-ride DTS, and include regulatory.db in the rootfs firmware package for qcs9100-ride-sx. Both are pre-existing platform issues unrelated to the Bluetooth hci_qca SSR fix in this PR.
  4. Detail analysis attachment: failed_case_job101916_5_detailed.md
  Case 6: smmu (Critical Master IOMMU Coverage Failure)
  1. Failed case: smmu (Critical Master IOMMU Coverage Failure)
  2. Root cause: Two critical DMA masters on qcs9100-ride, aa00000.video-codec (Video/Iris) and interconnect-lpass-ag-noc (Audio LPASS), are absent from all IOMMU groups at test time. The video-codec device never completes probe because its VPU firmware (vpu30_p4_s6_16mb.mbn) fails to load with -EINVAL. The LPASS AG-NOC interconnect device has no IOMMU group attachment (missing iommus DT binding). Together, these cause the SMMU test's mandatory critical-master coverage check to fail.
  3. Possible fix: (1) Verify and add qcom/vpu/vpu30_p4_s6_16mb.mbn to the LAVA rootfs firmware package for qcs9100-ride so that the iris video-codec driver can complete probe and attach to an IOMMU group; (2) add or verify the iommus DT property, with the correct SMMU SID, for the interconnect-lpass-ag-noc node in arch/arm64/boot/dts/qcom/qcs9100.dtsi.
  4. Detail analysis attachment: failed_case_job101916_6_detailed.md
  Case 7: USBHost
  1. Failed case: USBHost
  2. Root cause: The xHCI host controllers on qcs9100-ride probed and registered successfully (buses 1/2/3 with root hubs only), but no USB peripheral was physically connected to any host-capable port at test execution time. lsusb returned only three root hub entries (1d6b:0002 × 2, 1d6b:0003 × 1), so the test script failed with "Only USB hubs detected, no functional USB devices."
  3. Possible fix: Ensure that a USB peripheral (e.g. a USB storage stick, or a USB hub with a downstream device) is physically connected to the qcs9100-ride board's host-capable USB port in the LAVA lab before the test runs. If the board is correctly cabled, re-trigger the job to rule out a transient enumeration miss.
  4. Detail analysis attachment: failed_case_job101916_7_detailed.md
  Case 8: shmbridge
  1. Failed case: shmbridge
  2. Root cause: The shmbridge test script incorrectly flags the benign kernel log message "qcom_scm firmware:scm: qseecom: untested machine, skipping" as a qcom_scm-related error, triggering a false FAIL. This is a known CI infrastructure false positive on qcs9100-ride and is unrelated to the PR under test.
  3. Possible fix: No action required; suppress this result per Rule 1 of lava-known-benign-failures.md. The shmbridge test is known CI noise and does not indicate a kernel regression.
  4. Detail analysis attachment: failed_case_job101916_8_detailed.md
