
A19 (iPhone 17 Pro) GPU returns numerically wrong results in float32 workloads — deterministic HF corruption in a Demucs separation pipeline; M2/M3 clean #3702

@IARFLOW

Summary

On A19 (iPhone 17 Pro Max, iOS 26.5.1), MLX produces deterministically wrong numerical results on the GPU in a real Hybrid Transformer Demucs source-separation pipeline. The separated audio stems contain roughly +9 to +21 dB of spurious high-frequency energy (mostly 14–22 kHz, extending down to 10 kHz on vocals) that is absent when the identical code, weights and parameters run on an Apple Silicon Mac (M2). The corruption is:

  • deterministic (no NaNs; reproducible from run to run),
  • present in the output tensors themselves (we exported the raw WAVs from the device, so it is not a playback or codec artifact),
  • present in every output stem (drums/bass/other/vocals), increasing toward Nyquist, and broadband on vocals.

This is consistent with the A19 GPU dot-product issue independently documented in Taras Zakharko's microbenchmark write-up "Investigating the GPU Neural Accelerators on Apple A19/M5", which reports that "choosing certain matrix dimensions produces invalid results on A19" and attributes it to "a bug with masking out unused lanes in the dot product hardware."

We suspect the same root cause is surfacing through MLX's Neural-Accelerator matmul path, which #3083 enables for gen >= 18 phone architectures (i.e. A19).

Environment

  • Device: iPhone 17 Pro Max (A19 Pro, 5 GPU cores), iOS 26.5.1, 12 GB
  • mlx-swift 0.30.6 (also reproduced on 0.31.4)
  • Model: Hybrid Transformer Demucs (htdemucs and htdemucs_ft), FP32 weights
  • Reference (clean): identical code/weights on M2 (both the app and the demucs-mlx-swift CLI)

Symptom (measured)

Per-band energy (dB relative to that stem's total) of each separated stem, A19 device vs. M2 CLI, identical model/params (htdemucs_ft, shifts=1, overlap=0.75, segment=5):

stem     band (kHz)   A19 device (dB)   M2 reference (dB)   delta (dB)
drums    18–22        −59.4             −71.8               +12.4
bass     18–22        −63.5             −76.1               +12.5
other    14–18        −51.8             −60.8               +8.9
other    18–22        −57.5             −72.4               +14.9
vocals   10–14        −12.2             −25.2               +13.0
vocals   14–18        −14.7             −29.3               +14.6
vocals   18–22        −12.4             −33.0               +20.6
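The per-band figures above can be recomputed from exported WAVs with something like the following sketch. It assumes mono float samples already decoded from the WAV; the function names (`band_energy_db`, `band_delta_db`) and band edges are illustrative, not the exact script used for the table. A naive DFT keeps it dependency-free; real stems would go through an FFT such as `numpy.fft.rfft` instead.

```python
import cmath
import math


def band_energy_db(samples, sample_rate, bands):
    """Energy in each (lo_hz, hi_hz) band, in dB relative to the signal's total energy.

    Hypothetical helper. Uses a naive one-sided DFT (O(N^2)) so the sketch has
    no dependencies; it is only practical for short excerpts.
    """
    n = len(samples)
    power = []
    for k in range(n // 2 + 1):
        # DFT bin k of the real-valued input
        x = sum(s * cmath.exp(-2j * math.pi * k * t / n) for t, s in enumerate(samples))
        # Interior bins stand in for their negative-frequency twins, so count them twice
        weight = 1 if k == 0 or 2 * k == n else 2
        power.append(weight * abs(x) ** 2)
    total = sum(power)
    result = {}
    for lo, hi in bands:
        # Sum the power of bins whose centre frequency falls in [lo, hi)
        band = sum(p for k, p in enumerate(power) if lo <= k * sample_rate / n < hi)
        result[(lo, hi)] = 10 * math.log10(band / total) if band > 0 else float("-inf")
    return result


def band_delta_db(device_db, reference_db):
    """Per-band excess of the device output over the reference (positive = extra energy)."""
    return {band: device_db[band] - reference_db[band] for band in device_db}
```

Because each band is expressed relative to its own stem's total, a positive delta means the device output puts a larger share of its energy in that band, independent of overall loudness.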

Output peaks are not clipped (≈0.90 on both), ruling out clipping or saturation at the output. A third-party reference (a cloud separation of the same track) agrees closely with the M2 output, supporting the M2 result as the correct one.
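The clipping argument can be checked directly on the exported stems. A minimal sketch, assuming float samples nominally in [-1, 1]; `peak_and_clip_fraction` is a hypothetical helper and the 0.999 clip threshold is an arbitrary choice:

```python
def peak_and_clip_fraction(samples, clip_level=0.999):
    """Return (peak |sample|, fraction of samples whose magnitude is >= clip_level)."""
    if not samples:
        raise ValueError("no samples")
    peak = max(abs(s) for s in samples)
    clipped = sum(1 for s in samples if abs(s) >= clip_level)
    return peak, clipped / len(samples)
```

A peak near 0.90 with a clip fraction of zero on both devices is what the paragraph above reports.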

What we tried (did not resolve)

  • MLX_ENABLE_TF32=0, verified as actually applied (we logged getenv("MLX_ENABLE_TF32") at separation time and it reads "0"): no change. The spurious HF energy persists and the output is essentially identical to the TF32-on output. This is the documented control for the M5/A19 Neural-Accelerator reduced-precision path (cf. #3534, "[BUG] M5 float32 precision issue since 0.30.0", closed as "expected behavior — Neural Accelerators trade precision for performance"), so what we are seeing is not the expected TF32 precision tradeoff but a separate correctness issue.
  • Forcing the high-level NAX flag off in the Metal backend (can_use_nax = false): the output changed but remained corrupted.
  • Forcing rfft/irfft to the CPU stream (stream: .cpu): no change.
  • Upgrading mlx-swift 0.30.6 → 0.31.4: no change.
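To put numbers on "no change", "essentially identical" and "changed but remained corrupted", two runs (for example TF32 on vs. off, or NAX forced off vs. default) can be compared sample by sample. A minimal sketch, assuming both outputs are equal-length float sequences; `compare_outputs` is a hypothetical helper, not part of MLX:

```python
import math


def compare_outputs(reference, test):
    """Max absolute difference and SNR (dB) of `test` measured against `reference`."""
    if len(reference) != len(test):
        raise ValueError("outputs must have the same length")
    diffs = [t - r for r, t in zip(reference, test)]
    max_abs = max(abs(d) for d in diffs)
    signal = sum(r * r for r in reference)
    noise = sum(d * d for d in diffs)
    # Bit-identical outputs have zero noise, i.e. infinite SNR
    snr = math.inf if noise == 0 else 10 * math.log10(signal / noise)
    return max_abs, snr
```

Comparing each A19 variant against the M2 output, rather than against each other, shows whether a change moves the result toward the correct one.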

So the corruption is not the TF32 precision mode and not fully gated by the high-level NAX flag alone. We have not yet isolated the exact op (matmul / SDPA / conv / a custom Metal kernel).
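One way to isolate the op is to dump intermediate activations from the A19 run and the M2 run and find the first layer where they diverge; corruption propagates downstream, so later layers are less informative. A sketch under the assumption that both runs were captured as ordered `(layer_name, flat list of floats)` pairs; `first_divergent_layer` and the default `rtol` are hypothetical, and the tolerance must sit above ordinary float32 rounding differences between devices:

```python
def first_divergent_layer(device_acts, reference_acts, rtol=1e-3):
    """Return (layer_name, relative_error) for the first layer whose device
    output differs from the reference by more than rtol, or None if all match.

    Both arguments are lists of (layer_name, flat list of floats) in execution order.
    """
    if len(device_acts) != len(reference_acts):
        raise ValueError("runs captured a different number of layers")
    for (name, dev), (ref_name, ref) in zip(device_acts, reference_acts):
        if name != ref_name or len(dev) != len(ref):
            raise ValueError(f"mismatched capture at {name!r} / {ref_name!r}")
        # Normalise by the reference's largest magnitude so rtol is scale-free
        scale = max((abs(r) for r in ref), default=0.0) or 1.0
        err = max((abs(d - r) for d, r in zip(dev, ref)), default=0.0) / scale
        if err > rtol:
            return name, err
    return None
```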

Questions

  1. Is A19 (gen >= 18 phone) intended to use the Neural-Accelerator matmul path after #3083 ("Fix nax condition for iphone")? If so, is the team aware of the dot-product correctness issue Zakharko documents for certain matrix shapes on A19?
  2. Is there a recommended workaround today to get numerically correct GPU results on A19 (e.g. a supported way to disable the NAX/tensor path on A19, or to avoid the affected matmul shapes)?
  3. Would a minimal matmul/conv GPU-vs-CPU reproducer on A19 be useful? We can build and share one, plus the full audio measurements and raw stems.
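On question 3: since Zakharko's report ties the failures to specific matrix dimensions, a minimal reproducer could sweep matmul shapes and compare each result against a trusted reference. The sketch below is a backend-agnostic harness in plain Python under that assumption: `sweep_shapes` accepts any `candidate(a, b)` callable on nested lists and reports the (M, K, N) shapes whose max-norm relative error exceeds `rtol`. `unmasked_tail_matmul` is a purely hypothetical faulty kernel that leaves the unused lanes of a partial K tile unmasked; it only illustrates the class of bug Zakharko describes and does not model the actual A19 hardware. The tolerance has to sit above expected float32 (or TF32) rounding error, so `rtol=1e-4` is an assumption to tune.

```python
import itertools
import random


def reference_matmul(a, b):
    """Plain-Python matmul in double precision, used as ground truth."""
    k = len(b)
    return [[sum(row[p] * b[p][j] for p in range(k)) for j in range(len(b[0]))] for row in a]


def sweep_shapes(candidate, sizes, rtol=1e-4, seed=0):
    """Run candidate(a, b) for every (M, K, N) in sizes^3 and return the shapes
    whose result deviates from reference_matmul by more than rtol (max-norm)."""
    rng = random.Random(seed)
    bad = []
    for m, k, n in itertools.product(sizes, repeat=3):
        a = [[rng.uniform(-1, 1) for _ in range(k)] for _ in range(m)]
        b = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(k)]
        want = reference_matmul(a, b)
        got = candidate(a, b)
        scale = max(abs(x) for row in want for x in row) or 1.0
        err = max(abs(g - w) for gr, wr in zip(got, want) for g, w in zip(gr, wr))
        if err / scale > rtol:
            bad.append((m, k, n))
    return bad


def unmasked_tail_matmul(a, b, tile=8):
    """HYPOTHETICAL faulty kernel: K is processed in tiles of `tile` lanes and the
    unused lanes of a partial final tile add a stale 0.5 instead of being masked to zero."""
    k = len(b)
    padded = -(-k // tile) * tile  # K rounded up to a multiple of tile
    out = []
    for row in a:
        out_row = []
        for j in range(len(b[0])):
            acc = 0.0
            for p in range(padded):
                acc += (row[p] * b[p][j]) if p < k else 0.5
            out_row.append(acc)
        out.append(out_row)
    return out
```

With this fault, only shapes whose K is not a multiple of the tile width are flagged, which is the kind of shape-dependent pattern a sweep would expose. Against real hardware, `candidate` would wrap an MLX matmul pinned to the GPU stream and convert the result back to nested lists (in the MLX Python API, something like `mx.matmul(mx.array(a), mx.array(b), stream=mx.gpu).tolist()`; in mlx-swift, the same `stream:` parameter used above for rfft/irfft). That wiring is untested here, and a useful sweep would cover the sizes Demucs actually uses, not just small ones.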
