Skip to content

[Vulkan] Fix sum_rows block size for n_cols < 32#55

Open
goniz wants to merge 1 commit into
feat/vulkanfrom
fix/issue-54
Open

[Vulkan] Fix sum_rows block size for n_cols < 32#55
goniz wants to merge 1 commit into
feat/vulkanfrom
fix/issue-54

Conversation

@goniz

@goniz goniz commented Jun 16, 2026

Copy link
Copy Markdown
Owner

This pull request makes a small adjustment to the logic for determining the block_size in the dispatch_sum_rows_op function. The change ensures that when the number of columns is less than 32, the block size is set to 1 instead of 32, which can improve efficiency for small row sizes.

  • Updated block_size calculation in dispatch_sum_rows_op to use a block size of 1 when push_constants.n_cols is less than 32 (mlx/backend/vulkan/kernels.cpp).

fixes goniz/mlx-vulkan#54

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1abbd5463a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +2602 to 2606
push_constants.n_cols < 32u ? 1u
: push_constants.n_cols <= 32u ? 32u
: push_constants.n_cols <= 64u ? 64u
: push_constants.n_cols >= 4096u ? 1024u
: 128u,

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Avoid leaving idle lanes in small reductions

When running on drivers affected by the idle-lane barrier issue this patch is meant to work around, this conditional still selects a BLOCK_SIZE larger than n_cols for widths 33–63 (64) and 65–127 (128). Those rows still leave some invocations with no accumulation before entering the same barrier/reduction pattern in the row-reduction shaders, so non-power-of-two reductions in those ranges can continue returning incorrect results; choose a block size that does not exceed n_cols (or use the single-thread workaround) for all such ranges.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Reduction ops fail when n_cols < BLOCK_SIZE (Mali-G610 panvk)

1 participant