[CUDA] Make qmv support global scale by zcbenz · Pull Request #3723 · ml-explore/mlx

zcbenz · 2026-06-19T11:22:11Z

qqmm reroutes to qmv when M=1, but qmv did not support global scales. This PR fixes that for the CUDA backend; it should be trivial to implement in Metal and CPU later if this is the right approach.
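
For readers unfamiliar with the term, here is a minimal reference sketch of what a global scale means in a quantized matrix-vector product. It is not the kernel in this PR. It assumes an nvfp4-style convention in which a dequantized weight is `code * block_scale * global_scale`, with one global scale per tensor. The function names, the plain-list layout, and the block size of 4 are hypothetical and chosen only for illustration.

```python
# Hypothetical pure-Python reference, not MLX code.
# Assumed layout: each weight row is split into blocks of BLOCK codes,
# each block has one scale, and the whole tensor shares one global scale.

BLOCK = 4  # illustrative block size only


def dequantize_row(codes, block_scales, global_scale):
    """Expand one row: code * block scale * global scale."""
    return [
        c * block_scales[i // BLOCK] * global_scale
        for i, c in enumerate(codes)
    ]


def qmv_reference(w_codes, w_scales, global_scale, x):
    """y = dequantize(W) @ x, computed the slow, obvious way."""
    return [
        sum(w * xi for w, xi in zip(dequantize_row(row, scales, global_scale), x))
        for row, scales in zip(w_codes, w_scales)
    ]


def qmv_fused(w_codes, w_scales, global_scale, x):
    """Same product, with the global scale applied once per output row."""
    out = []
    for row, scales in zip(w_codes, w_scales):
        acc = 0.0
        for b, s in enumerate(scales):
            lo = b * BLOCK
            # Dot product of the raw codes with the matching slice of x.
            partial = sum(
                c * xi for c, xi in zip(row[lo:lo + BLOCK], x[lo:lo + BLOCK])
            )
            acc += s * partial  # per-block scale
        out.append(global_scale * acc)  # per-tensor scale, factored out
    return out
```

Because the global scale is a single factor shared by every element, it can be pulled out of the sum and applied to the accumulated result. That is one reason supporting it in a matrix-vector kernel can be cheap, under the assumptions above.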

Another option is to add global scale support to qmm generally, not necessarily as a public API but as a fallback for qqmm when there are no native nvfp4 kernels, for example on the Metal backend and on sm80/sm90 in the CUDA backend.

[CUDA] Make qmv support global scale

74f8ab1

zcbenz requested a review from nastya236 June 19, 2026 11:22

zcbenz wants to merge 1 commit into ml-explore:main from zcbenz:cuda-qmv-global-scale
