Metal backend: Enable Voxtral Realtime by manuelcandales · Pull Request #17536 · pytorch/executorch

manuelcandales · 2026-02-18T19:59:11Z

This pull request adds Metal backend support for the Voxtral Realtime model. Only offline mode is covered here; streaming support is coming next.

Voxtral Realtime Metal backend support:

Added Metal backend support to export_voxtral_rt.py, including custom decompositions and dynamic shape handling for Metal/AOTI compatibility. Also added validation for Metal-specific quantization (fpa4w).
Updated export_model_artifact.sh and test_model_e2e.sh scripts to support a new mode parameter. This is used in the Voxtral Realtime model for selecting between streaming/offline export modes.
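
The fpa4w validation mentioned above could look roughly like the following. This is a minimal sketch, not the code in export_voxtral_rt.py: the function name `validate_quantization`, the `METAL_ONLY_QUANT` set, and the backend strings are hypothetical, and the only rule taken from the description is that fpa4w is Metal-specific.

```python
from typing import Optional

# Hypothetical sketch: names and backend strings are illustrative, not the PR's code.
METAL_ONLY_QUANT = {"fpa4w"}  # described in the PR as Metal-specific


def validate_quantization(backend: str, qmode: Optional[str]) -> None:
    """Raise ValueError if a Metal-only quantization scheme is paired with another backend."""
    if qmode in METAL_ONLY_QUANT and backend != "metal":
        raise ValueError(
            f"quantization '{qmode}' is Metal-specific; got backend '{backend}'"
        )


validate_quantization("metal", "fpa4w")   # accepted
validate_quantization("xnnpack", None)    # accepted: no quantization requested
try:
    validate_quantization("xnnpack", "fpa4w")
except ValueError as err:
    print(err)  # reports the mismatch instead of failing later in export
```

Failing early with a clear message is the point of such a check: a mismatched quantization flag would otherwise surface much later, during lowering.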

CI/CD and workflow updates:

Added quantized Voxtral-Mini-4B-Realtime-2602 to Metal workflow matrix.
Added new CMake presets for building and testing the Voxtral Realtime runner with Metal backend.

pytorch-bot · 2026-02-18T19:59:15Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/17536

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 5 New Failures, 14 Pending

As of commit ab8aaa2 with merge base 119a099:

NEW FAILURES - The following jobs have failed:

pull / android / run-emulator (gh)
The process '/usr/bin/sh' failed with exit code 1
pull / test-openvino-linux / linux-job (gh)
RuntimeError: Command docker exec -t bad5a62a029ec9aba3f621ff8da168d809e6a29a071ac11c958748e07860bac6 /exec failed with exit code 1
pull / unittest-nxp-neutron / linux-job (gh)
RuntimeError: Command docker exec -t c06130f5d6379bfc3aff56e5fedb77b2310ff1c4d92121735b6c49ae9c614313 /exec failed with exit code 1
trunk / test-torchao-huggingface-checkpoints (lfm2_5_1_2b, linux.2xlarge, executorch-ubuntu-22.04-clang12... / linux-job (gh)
RuntimeError: Command docker exec -t 994382b9479a62ba26aa93bf6277af1f17866d4d2eb770c7cadedac20839077f /exec failed with exit code 1
trunk / test-torchao-huggingface-checkpoints (lfm2_5_1_2b, linux.arm64.2xlarge, executorch-ubuntu-22.04-g... / linux-job (gh)
RuntimeError: Command docker exec -t b6d0d5b666ba98b88e83ac6ae103be6ac2623ea33991c17f49c4fdda6196e8c6 /exec failed with exit code 1

This comment was automatically generated by Dr. CI and updates every 15 minutes.

mergennachin · 2026-02-20T13:49:16Z

+        max_t_mel = 24000  # 3000 * 8
+        sample_mel = torch.randn(
+            1, model.config.num_mel_bins, max_t_mel, dtype=param_dtype
+        )
+        dynamic_shapes = {"mel": {2: Dim.AUTO}}


I wonder if we can just use this for xnnpack?

mergennachin

Also update the README.md and model.md files (you can just use your coding agent to update model.md)

Copilot

Pull request overview

This pull request adds Metal backend support for the Voxtral Realtime model, enabling it to run on Apple Silicon GPUs. The implementation automatically switches between custom SDPA operations (for XNNPACK) and standard PyTorch operations (for Metal/AOTI) based on the target backend, while maintaining support for both streaming and offline modes.

Changes:

Implemented Metal-compatible attention mechanism using standard PyTorch SDPA and StaticKVCache with index_copy_ operations
Added backend auto-detection and configuration in export and test scripts with support for vr-streaming and vr-offline modes
Extended CI/CD workflows to test Voxtral Realtime on Metal backend with quantized-int4-metal configuration
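
To make the static-cache idea in the list above concrete, here is a framework-free sketch of a preallocated KV cache whose rows are overwritten by position, which is what an `index_copy_`-style update along the sequence axis amounts to. The class `StaticKVCacheSketch` and its method names are hypothetical; the PR's `StaticKVCache` operates on tensors and is not reproduced here.

```python
from typing import List, Sequence


class StaticKVCacheSketch:
    """Hypothetical, framework-free stand-in for a preallocated KV cache."""

    def __init__(self, max_seq_len: int, head_dim: int) -> None:
        # Storage is allocated once; its shape never changes afterwards.
        self.k = [[0.0] * head_dim for _ in range(max_seq_len)]
        self.v = [[0.0] * head_dim for _ in range(max_seq_len)]

    def update(
        self,
        positions: List[int],
        k_new: Sequence[Sequence[float]],
        v_new: Sequence[Sequence[float]],
    ) -> None:
        # Same semantics as an index-copy along the sequence axis:
        # row i of the new data overwrites the cache row at positions[i].
        for i, pos in enumerate(positions):
            self.k[pos] = list(k_new[i])
            self.v[pos] = list(v_new[i])


cache = StaticKVCacheSketch(max_seq_len=4, head_dim=2)
cache.update([0, 1], [[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]])
cache.update([2], [[9.0, 9.0]], [[0.5, 0.5]])
print(len(cache.k), cache.k[2])
```

The cache length stays at `max_seq_len` no matter how many tokens have been written, so every call sees buffers of the same shape.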

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.

Show a summary per file

| File | Description |
| --- | --- |
| examples/models/voxtral_realtime/model.py | Added `use_standard_attention` config flag, implemented `StaticKVCache` and `StandardSDPA` classes for Metal backend compatibility, updated `LMAttention` to switch between custom and standard attention based on backend |
| examples/models/voxtral_realtime/export_voxtral_rt.py | Added Metal backend support in export functions with Dim.AUTO for dynamic shapes, implemented linear bias decomposition for Metal, added Metal partitioner configuration, and updated documentation |
| examples/models/voxtral_realtime/CMakePresets.json | Added Metal-specific CMake presets for building the Voxtral Realtime runner with Metal backend support on Darwin platforms |
| .github/workflows/metal.yml | Added Voxtral-Mini-4B-Realtime-2602 to Metal CI test matrix, excluded non-quantized variant due to size constraints |
| .ci/scripts/test_model_e2e.sh | Added mode parameter support for vr-streaming/vr-offline modes with auto-detection (XNNPACK defaults to streaming, others to offline) and validation logic |
| .ci/scripts/export_model_artifact.sh | Added mode parameter support with auto-detection and validation, configured preprocessor arguments based on streaming mode, added fpa4w quantization support for Metal |
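
The mode auto-detection described for the two CI scripts can be sketched as follows. The real logic lives in shell scripts; this Python version is only an illustration, and the function name `resolve_mode` and the backend strings are hypothetical. The default rule (XNNPACK to streaming, everything else to offline) and the mode names come from the summary above.

```python
from typing import Optional

VALID_MODES = ("vr-streaming", "vr-offline")


def resolve_mode(backend: str, mode: Optional[str] = None) -> str:
    """Return the requested mode, or a backend-dependent default when none is given."""
    if mode is None:
        # Default from the review summary: XNNPACK -> streaming, others -> offline.
        return "vr-streaming" if backend == "xnnpack" else "vr-offline"
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode '{mode}'; expected one of {VALID_MODES}")
    return mode


print(resolve_mode("xnnpack"))
print(resolve_mode("metal"))
print(resolve_mode("metal", "vr-offline"))
```

An explicit mode always wins over the default, and an unrecognized mode is rejected up front instead of being passed through to the export step.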

manuelcandales requested a review from mergennachin February 18, 2026 19:59

meta-cla Bot added the CLA Signed label Feb 18, 2026

manuelcandales added 5 commits February 18, 2026 18:52

metal-vr-basic

184272f

metal-vr-decomp

505f721

metal-vr-static-cache-standard-sdpa

f41590d

metal-vr-preset

a479777

metal-vr-audio-sample-input

5f3ae0b

manuelcandales force-pushed the manuel/metal-vr-streaming-decomp branch from 6ef541f to 5f3ae0b Compare February 18, 2026 23:56

manuelcandales added the release notes: none label Feb 18, 2026

manuelcandales added 4 commits February 19, 2026 14:00

metal-vr-fix-standard-sdpa

12792ca

metal-vr-int4

34565ca

metal-vr-ci

d866728

fix lint

a282589

manuelcandales marked this pull request as ready for review February 20, 2026 13:44

manuelcandales requested a review from larryliu0820 as a code owner February 20, 2026 13:44

Copilot AI review requested due to automatic review settings February 20, 2026 13:44

manuelcandales requested review from kirklandsign and lucylq as code owners February 20, 2026 13:44

manuelcandales removed request for kirklandsign and lucylq February 20, 2026 13:44

Copilot started reviewing on behalf of manuelcandales February 20, 2026 13:44

mergennachin approved these changes Feb 20, 2026

mergennachin reviewed Feb 20, 2026

Copilot AI reviewed Feb 20, 2026

manuelcandales temporarily deployed to upload-benchmark-results February 20, 2026 14:45 — with GitHub Actions Inactive

manuelcandales added 3 commits February 20, 2026 13:55

update README.md and model.md

bf81550

update makefile

50a5f5e

add todo

ab8aaa2

manuelcandales merged commit 4a3bc34 into main Feb 20, 2026
349 of 355 checks passed

manuelcandales deleted the manuel/metal-vr-streaming-decomp branch February 20, 2026 19:43

manuelcandales temporarily deployed to upload-benchmark-results February 20, 2026 19:50 — with GitHub Actions Inactive

manuelcandales merged 12 commits into main from manuel/metal-vr-streaming-decomp

3 participants
