switch correctness checks to SNR-based assertion for cuda quant int4_matmul by Gasoonjia · Pull Request #19300 · pytorch/executorch

Gasoonjia · 2026-05-05T17:20:25Z

Replace torch.allclose(atol/rtol) with an SNR (signal-to-noise ratio) assertion across all int4_matmul / int4_matvec / dequant-vs-fused tests.

Why:

test_prefill_short was flaking on CI (A10G) with max_abs_err=1.0000. Root cause: bf16 GEMM with K=2048 reduction produces output magnitudes up to ~200; at that scale, the bf16 ULP gap is 0.5-1.0. Triton fused kernel and cuBLAS reduce in different orders (and Triton autotune picks different tile configs on different hardware), so 1-ULP element-wise differences are unavoidable. atol/rtol false-fails on these outliers; SNR averages them out.
atol/rtol thresholds also depend on size: a value tuned for K=2048 is too loose for K=64 and too tight for K=4096. SNR is size-invariant (||signal|| and ||noise|| both scale with sqrt(N) and sqrt(K), canceling in the ratio).
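
The ULP figure quoted above can be checked by hand. The following is a minimal sketch, not code from this PR: `bf16_ulp` is a hypothetical helper that assumes bfloat16's 7 explicit mantissa bits and ignores subnormals and overflow.

```python
import math


def bf16_ulp(x: float) -> float:
    """Spacing between adjacent bfloat16 values near x (normal range only)."""
    if x == 0 or not math.isfinite(x):
        raise ValueError("expected a finite, non-zero value")
    # frexp returns (m, e) with abs(x) == m * 2**e and 0.5 <= m < 1,
    # so the binade containing x starts at 2**(e - 1).
    _, e = math.frexp(abs(x))
    # bfloat16 keeps 7 explicit mantissa bits: each binade has 2**7 steps.
    return 2.0 ** (e - 1 - 7)


for magnitude in (1.0, 100.0, 200.0):
    print(magnitude, bf16_ulp(magnitude))
```

Outputs in [64, 128) are spaced 0.5 apart and outputs in [128, 256) are spaced 1.0 apart, so two correct kernels that round a value near 200 differently can disagree by exactly 1.0.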

What:

Add _assert_snr(test_case, actual, expected, label) helper that asserts 20*log10(||expected|| / ||actual-expected||) >= 50 dB.
Replace 4 call sites: TestInt4Matmul, TestInt4Matvec (x2), TestDequantThenMatmul.
50 dB ~ 0.3% RMS error: well below the SNR observed for correct kernels (80-90 dB) and well above the SNR of any real functional bug (<20 dB for wrong stride / flipped nibble / off-by-one group_idx / missing mask).
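
The metric described above can be sketched in plain Python. This is an illustration under assumptions, not the `_assert_snr` helper from the PR (which presumably operates on torch tensors): `snr_db`, `relative_rms_error`, and `noisy_copy` are hypothetical names, and the inputs are flat lists of floats.

```python
import math
import random


def snr_db(actual, expected):
    """20*log10(||expected|| / ||actual - expected||) for flat float sequences."""
    signal = math.sqrt(sum(e * e for e in expected))
    noise = math.sqrt(sum((a - e) ** 2 for a, e in zip(actual, expected)))
    if noise == 0.0:
        return math.inf  # bit-identical outputs
    return 20.0 * math.log10(signal / noise)


def relative_rms_error(threshold_db):
    """Relative RMS error that corresponds to an SNR threshold in dB."""
    return 10.0 ** (-threshold_db / 20.0)


def noisy_copy(values, rel_noise, rng):
    """Perturb each value by a relative error drawn from [-rel_noise, rel_noise]."""
    return [v * (1.0 + rng.uniform(-rel_noise, rel_noise)) for v in values]


rng = random.Random(0)
for n in (64, 2048, 4096):  # number of output elements (illustrative sizes)
    expected = [rng.uniform(-200.0, 200.0) for _ in range(n)]
    actual = noisy_copy(expected, 1e-3, rng)
    # The same relative noise level gives a similar SNR at every size.
    print(n, round(snr_db(actual, expected), 1))

# A gross functional bug (here: every sign flipped) lands far below the threshold.
print(round(snr_db([-v for v in expected], expected), 1))
```

`relative_rms_error` makes the threshold arithmetic explicit: 50 dB maps to roughly 0.3% relative RMS error and 60 dB to 0.1%. Because every perturbation in the loop is bounded by 0.1% of its element, the SNR cannot drop below 60 dB regardless of the vector length, which is the size-invariance argument in miniature.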

Test plan:
python -m pytest backends/cuda/tests/test_int4_matmul.py -v
-> 35/35 passed

Replace torch.allclose(atol/rtol) with an SNR (signal-to-noise ratio) assertion across all int4_matmul / int4_matvec / dequant-vs-fused tests.

Why:

- test_prefill_short was flaking on CI (A10G) with max_abs_err=1.0000. Root cause: bf16 GEMM with K=2048 reduction produces output magnitudes up to ~200; at that scale, the bf16 ULP gap is 0.5-1.0. Triton fused kernel and cuBLAS reduce in different orders (and Triton autotune picks different tile configs on different hardware), so 1-ULP element-wise differences are unavoidable. atol/rtol false-fails on these outliers; SNR averages them out.
- atol/rtol thresholds also depend on size: a value tuned for K=2048 is too loose for K=64 and too tight for K=4096. SNR is size-invariant (||signal|| and ||noise|| both scale with sqrt(N) and sqrt(K), canceling in the ratio).

What:

- Add _assert_snr(test_case, actual, expected, label) helper that asserts 20*log10(||expected|| / ||actual-expected||) >= 60 dB.
- Replace 4 call sites: TestInt4Matmul, TestInt4Matvec (x2), TestDequantThenMatmul.
- 60 dB ~ 0.1% RMS error: well below the SNR observed for correct kernels (80-90 dB) and well above the SNR of any real functional bug (<20 dB for wrong stride / flipped nibble / off-by-one group_idx / missing mask).

Test plan:
python -m pytest backends/cuda/tests/test_int4_matmul.py -v
-> 35/35 passed

pytorch-bot · 2026-05-05T17:20:29Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19300

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There is 1 currently active SEV. If your PR is affected, please view it below:

Ubuntu services are down

❌ 4 New Failures, 7 Pending, 3 Unrelated Failures

As of commit 85d06de with merge base acffcb0:

NEW FAILURES - The following jobs have failed:

pull / test-lora-linux / linux-job (gh)
RuntimeError: Command docker exec -t 61c4fc5c8edb61877a880bbdcdb1e0a79486c19fb695323e105bc59eb4243a40 /exec failed with exit code 1
pull / unittest / linux / linux-job (gh)
examples/models/llama/tests/test_static_attention.py::StaticAttentionTest::test_within_transformer
Test CUDA Builds / export-model-cuda-artifact (mistralai, Voxtral-Mini-3B-2507, quantized-int4-weight-only) / linux-job (gh)
RuntimeError: Command docker exec -t 8de4cae8354e4d3e2f980102e752ffbf7404016086f01e64f4f76a5118634e9f /exec failed with exit code 1
Test CUDA Windows Export and E2E / export-model-cuda-windows-artifact (mistralai, Voxtral-Mini-3B-2507, quantized-int4-weight-only) / linux-job (gh)
RuntimeError: Command docker exec -t 1c0bf9f14c849f761700ee7c98af0ec4f506f42e0fc608c3a7ed8c4ed0a780e2 /exec failed with exit code 1

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

pull / test-lora-multimethod-linux / linux-job (gh) (detected as infra flaky with no log or failing log classifier)
pull / unittest-editable / linux / linux-job (gh) (detected as infra flaky with no log or failing log classifier)
Test CUDA Builds / export-model-cuda-artifact (openai, whisper-large-v3-turbo, quantized-int4-weight-only) / linux-job (gh) (detected as infra flaky with no log or failing log classifier)

This comment was automatically generated by Dr. CI and updates every 15 minutes.

github-actions · 2026-05-05T17:21:08Z

This PR needs a `release notes:` label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

meta-cla Bot added the CLA Signed label May 5, 2026

Gasoonjia added the ciflow/cuda label May 5, 2026

Gasoonjia marked this pull request as ready for review May 5, 2026 17:20

Gasoonjia added 2 commits May 5, 2026 11:10

make the SNR threshould larger

a4adfd3

Merge branch 'main' into int4-matmul-snr-test

85d06de

JacobSzwejbka approved these changes May 5, 2026

Gasoonjia requested review from digantdesai and mergennachin May 5, 2026 18:12

digantdesai approved these changes May 5, 2026

Gasoonjia merged commit a0d6e9b into main May 5, 2026
209 of 220 checks passed

Gasoonjia deleted the int4-matmul-snr-test branch May 5, 2026 19:33

Gasoonjia merged 3 commits into main from int4-matmul-snr-test
