3 changes: 2 additions & 1 deletion mlx/backend/vulkan/kernels.cpp
@@ -2599,7 +2599,8 @@ void dispatch_sum_rows_op(
   const auto row_count = checked_u32(out.size(), "sum_rows output rows");
   const uint32_t max_invocations = max_compute_work_group_invocations();
   const uint32_t block_size = std::min(
-      push_constants.n_cols <= 32u ? 32u
+      push_constants.n_cols < 32u ? 1u
+      : push_constants.n_cols <= 32u ? 32u
       : push_constants.n_cols <= 64u ? 64u
       : push_constants.n_cols >= 4096u ? 1024u
       : 128u,
Comment on lines +2602 to 2606

P2: Avoid leaving idle lanes in small reductions

On drivers affected by the idle-lane barrier issue that this patch is meant to work around, this conditional still selects a BLOCK_SIZE larger than n_cols for widths 33–63 (BLOCK_SIZE 64) and 65–127 (BLOCK_SIZE 128). In those rows some invocations accumulate nothing before entering the same barrier/reduction pattern in the row-reduction shaders, so non-power-of-two reductions in those ranges can keep returning incorrect results. Choose a block size that does not exceed n_cols, or use the single-thread workaround, for all such ranges.
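
To make the affected ranges concrete, here is a minimal sketch of the selection logic from the hunk next to one possible capped variant. The function names `patched_block_size`, `capped_block_size`, and `check_examples` are hypothetical. The excerpt cuts off before the second argument of `std::min`, so the sketch assumes it is the device invocation limit. The halving loop is only one way to keep the block size at or below n_cols, and it assumes the row-reduction shaders accept any power-of-two block size, which the diff does not confirm.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Selection as written in the patched hunk.
// ASSUMPTION: the truncated second std::min argument is max_invocations.
inline uint32_t patched_block_size(uint32_t n_cols, uint32_t max_invocations) {
  const uint32_t preferred = n_cols < 32u ? 1u
      : n_cols <= 32u                     ? 32u
      : n_cols <= 64u                     ? 64u
      : n_cols >= 4096u                   ? 1024u
                                          : 128u;
  return std::min(preferred, max_invocations);
}

// HYPOTHETICAL variant: start from the patched choice and halve it until
// no invocation is left without a column to accumulate.
inline uint32_t capped_block_size(uint32_t n_cols, uint32_t max_invocations) {
  uint32_t block = patched_block_size(n_cols, max_invocations);
  while (block > 1u && block > n_cols) {
    block /= 2u;
  }
  return block;
}

// Spot checks for the widths called out in the review comment.
inline void check_examples() {
  assert(patched_block_size(48u, 1024u) == 64u);   // 16 lanes idle
  assert(patched_block_size(100u, 1024u) == 128u); // 28 lanes idle
  assert(capped_block_size(48u, 1024u) == 32u);    // no idle lanes
  assert(capped_block_size(100u, 1024u) == 64u);   // no idle lanes
}
```

With a limit of 1024 invocations, width 100 gets a block size of 128 from the patched logic and 64 from the capped variant. Widths 32, 64, and 128 and above are unchanged by the cap.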
