[Vulkan] Fix sum_rows block size for n_cols < 32#55
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1abbd5463a
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| push_constants.n_cols < 32u ? 1u | ||
| : push_constants.n_cols <= 32u ? 32u | ||
| : push_constants.n_cols <= 64u ? 64u | ||
| : push_constants.n_cols >= 4096u ? 1024u | ||
| : 128u, |
There was a problem hiding this comment.
Avoid leaving idle lanes in small reductions
When running on drivers affected by the idle-lane barrier issue this patch is meant to work around, this conditional still selects a BLOCK_SIZE larger than n_cols for widths 33–63 (64) and 65–127 (128). Those rows still leave some invocations with no accumulation before entering the same barrier/reduction pattern in the row-reduction shaders, so non-power-of-two reductions in those ranges can continue returning incorrect results; choose a block size that does not exceed n_cols (or use the single-thread workaround) for all such ranges.
Useful? React with 👍 / 👎.
This pull request makes a small adjustment to the logic for determining the
block_sizein thedispatch_sum_rows_opfunction. The change ensures that when the number of columns is less than 32, the block size is set to 1 instead of 32, which can improve efficiency for small row sizes.block_sizecalculation indispatch_sum_rows_opto use a block size of 1 whenpush_constants.n_colsis less than 32 (mlx/backend/vulkan/kernels.cpp).fixes goniz/mlx-vulkan#54