Metal backend: Enable Voxtral Realtime #17536

Merged
manuelcandales merged 12 commits into main from manuel/metal-vr-streaming-decomp
Feb 20, 2026

Conversation

@manuelcandales
Contributor

@manuelcandales manuelcandales commented Feb 18, 2026

This pull request adds Metal backend support for the Voxtral Realtime model (offline mode only; streaming support is coming next).

Voxtral Realtime Metal backend support:

  • Added Metal backend support to export_voxtral_rt.py, including custom decompositions and dynamic shape handling for Metal/AOTI compatibility. Also added validation for Metal-specific quantization (fpa4w).
  • Updated export_model_artifact.sh and test_model_e2e.sh scripts to support a new mode parameter. This is used in the Voxtral Realtime model for selecting between streaming/offline export modes.
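
The mode selection and quantization check described above can be sketched roughly as follows. This is a minimal illustration in Python, not the code from the scripts: the function names, the backend strings `"xnnpack"` and `"metal"`, and the exact rule that fpa4w is rejected on non-Metal backends are assumptions. The mode names and the backend-dependent default follow the review summary of test_model_e2e.sh.

```python
from typing import Optional

# Hypothetical constants for illustration; the real scripts may differ.
KNOWN_MODES = ("vr-streaming", "vr-offline")
METAL_ONLY_QUANT = ("fpa4w",)


def resolve_mode(backend: str, mode: Optional[str] = None) -> str:
    """Return the export mode, picking a backend-specific default if unset."""
    if mode is None:
        # Assumed default: XNNPACK exports streaming, other backends offline.
        return "vr-streaming" if backend == "xnnpack" else "vr-offline"
    if mode not in KNOWN_MODES:
        raise ValueError(f"unknown mode {mode!r}; expected one of {KNOWN_MODES}")
    return mode


def validate_quantization(backend: str, qmode: Optional[str]) -> None:
    """Reject Metal-specific quantization on any other backend (assumed rule)."""
    if qmode in METAL_ONLY_QUANT and backend != "metal":
        raise ValueError(f"{qmode} quantization requires the metal backend")
```

Calling `resolve_mode("metal")` without an explicit mode would select the offline export, which matches the scope of this PR.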

CI/CD and workflow updates:

  • Added quantized Voxtral-Mini-4B-Realtime-2602 to Metal workflow matrix.
  • Added new CMake presets for building and testing the Voxtral Realtime runner with Metal backend.

@pytorch-bot

pytorch-bot Bot commented Feb 18, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/17536

Note: Links to docs will display an error until the docs builds have been completed.

❌ 5 New Failures, 14 Pending

As of commit ab8aaa2 with merge base 119a099 (image):

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla Bot added the CLA Signed label Feb 18, 2026
@manuelcandales manuelcandales force-pushed the manuel/metal-vr-streaming-decomp branch from 6ef541f to 5f3ae0b Compare February 18, 2026 23:56
@manuelcandales manuelcandales added the release notes: none label Feb 18, 2026
@manuelcandales manuelcandales marked this pull request as ready for review February 20, 2026 13:44
Copilot AI review requested due to automatic review settings February 20, 2026 13:44
Comment thread examples/models/voxtral_realtime/export_voxtral_rt.py
Comment on lines +190 to +194
max_t_mel = 24000  # 3000 * 8
sample_mel = torch.randn(
    1, model.config.num_mel_bins, max_t_mel, dtype=param_dtype
)
dynamic_shapes = {"mel": {2: Dim.AUTO}}
Contributor

I wonder if we can just use this for xnnpack?

Contributor

@mergennachin mergennachin left a comment

Also update the README.md and model.md files (you can just use your coding agent to update the model.md file).

Contributor

Copilot AI left a comment

Pull request overview

This pull request adds Metal backend support for the Voxtral Realtime model, enabling it to run on Apple Silicon GPUs. The implementation automatically switches between custom SDPA operations (for XNNPACK) and standard PyTorch operations (for Metal/AOTI) based on the target backend, while maintaining support for both streaming and offline modes.

Changes:

  • Implemented Metal-compatible attention mechanism using standard PyTorch SDPA and StaticKVCache with index_copy_ operations
  • Added backend auto-detection and configuration in export and test scripts with support for vr-streaming and vr-offline modes
  • Extended CI/CD workflows to test Voxtral Realtime on Metal backend with quantized-int4-metal configuration
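
As a rough illustration of the first item, a static KV cache updated with `index_copy_` and read through standard PyTorch SDPA might look like the sketch below. The class name, function name, tensor layout `(batch, heads, seq, head_dim)`, and mask construction are assumptions made for this example; the PR's `StaticKVCache` and `StandardSDPA` classes may be organized differently.

```python
import torch
import torch.nn.functional as F


class StaticKVCacheSketch(torch.nn.Module):
    """Fixed-size KV cache; layout (1, heads, max_seq_len, head_dim) is assumed."""

    def __init__(self, max_seq_len: int, n_heads: int, head_dim: int):
        super().__init__()
        shape = (1, n_heads, max_seq_len, head_dim)
        self.register_buffer("k_cache", torch.zeros(shape))
        self.register_buffer("v_cache", torch.zeros(shape))

    def update(self, input_pos: torch.Tensor, k: torch.Tensor, v: torch.Tensor):
        # Write the new keys/values into the slots named by input_pos (dim 2).
        self.k_cache.index_copy_(2, input_pos, k)
        self.v_cache.index_copy_(2, input_pos, v)
        return self.k_cache, self.v_cache


def attend(q, k_cache, v_cache, input_pos):
    """Standard SDPA over the full cache, masking slots past each query position."""
    max_len = k_cache.shape[2]
    slots = torch.arange(max_len)
    # Boolean mask of shape (T, max_len): True where a slot may be attended to.
    mask = slots[None, :] <= input_pos[:, None]
    return F.scaled_dot_product_attention(q, k_cache, v_cache, attn_mask=mask)
```

Because the cache tensors keep a fixed shape and the update is a plain indexed write, the whole step uses only stock PyTorch operators, which is the property the review summary attributes to the Metal/AOTI path.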

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| examples/models/voxtral_realtime/model.py | Added use_standard_attention config flag, implemented StaticKVCache and StandardSDPA classes for Metal backend compatibility, updated LMAttention to switch between custom and standard attention based on backend |
| examples/models/voxtral_realtime/export_voxtral_rt.py | Added Metal backend support in export functions with Dim.AUTO for dynamic shapes, implemented linear bias decomposition for Metal, added Metal partitioner configuration, and updated documentation |
| examples/models/voxtral_realtime/CMakePresets.json | Added Metal-specific CMake presets for building the Voxtral Realtime runner with Metal backend support on Darwin platforms |
| .github/workflows/metal.yml | Added Voxtral-Mini-4B-Realtime-2602 to Metal CI test matrix, excluded non-quantized variant due to size constraints |
| .ci/scripts/test_model_e2e.sh | Added mode parameter support for vr-streaming/vr-offline modes with auto-detection (XNNPACK defaults to streaming, others to offline) and validation logic |
| .ci/scripts/export_model_artifact.sh | Added mode parameter support with auto-detection and validation, configured preprocessor arguments based on streaming mode, added fpa4w quantization support for Metal |
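
The linear bias decomposition listed for export_voxtral_rt.py rests on the identity that a biased linear layer equals a bias-free matrix multiply followed by an add. The sketch below only demonstrates that equivalence; the function name is hypothetical, and how the PR registers its decomposition with the exporter is not shown here.

```python
import torch
import torch.nn.functional as F


def linear_with_separate_bias(x, weight, bias=None):
    """Compute linear(x, weight, bias) as a matmul plus an explicit bias add."""
    out = torch.matmul(x, weight.t())
    if bias is not None:
        out = out + bias
    return out


# Compare against the fused operator on random inputs.
x = torch.randn(2, 5, 16)
weight = torch.randn(32, 16)
bias = torch.randn(32)
decomposed = linear_with_separate_bias(x, weight, bias)
fused = F.linear(x, weight, bias)
```

Splitting the bias out this way leaves the backend with two simpler operators to lower instead of one fused call.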

@manuelcandales manuelcandales temporarily deployed to upload-benchmark-results February 20, 2026 14:45 — with GitHub Actions Inactive
@manuelcandales manuelcandales merged commit 4a3bc34 into main Feb 20, 2026
349 of 355 checks passed
@manuelcandales manuelcandales deleted the manuel/metal-vr-streaming-decomp branch February 20, 2026 19:43
@manuelcandales manuelcandales temporarily deployed to upload-benchmark-results February 20, 2026 19:50 — with GitHub Actions Inactive
Labels

CLA Signed: This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
release notes: none: Do not include this in the release notes.

3 participants