Fix compiled kernel correctness for negative-strided inputs#3720
Open
lyonsno wants to merge 1 commit into
zcbenz
reviewed
Jun 20, 2026
The compiled (fused) kernel path produced wrong results when inputs had negative strides (e.g. x[::-1]). Two issues:

1. compiled_check_contiguity used the broad contiguous flag for single inputs, which is true for negative-strided arrays (no data gaps). Changed to require row_contiguous or col_contiguous, matching the multi-input path.
2. Metal/CUDA strided compiled kernels used unsigned index arithmetic (elem_to_loc_1<uint>), wrapping negative strides. Force int64_t indices when any input has negative strides. Also generate the _large (int64_t) strided kernel variant for ndim=1.

The CPU compiled path uses signed pointer arithmetic and only needed the contiguity check fix.

Fixes ml-explore#3716.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
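The first issue in the commit message can be illustrated with a small model: a reversed view touches one dense block of memory with no gaps, yet its strides do not match row-major order. The sketch below is plain Python, not MLX code; the helper names (`offsets`, `has_no_gaps`, `is_row_contiguous`) are hypothetical, and the checks are a simplified stand-in for the real contiguity flags, assuming small non-empty shapes and strides measured in elements.

```python
from itertools import product


def offsets(shape, strides):
    """Element offsets visited by a view with the given shape and strides."""
    index_space = product(*(range(dim) for dim in shape))
    return [sum(i * s for i, s in zip(idx, strides)) for idx in index_space]


def has_no_gaps(shape, strides):
    """Simplified model of a broad 'contiguous' flag: the view covers one
    dense block, each element exactly once, in any order."""
    locs = offsets(shape, strides)
    return len(set(locs)) == len(locs) and max(locs) - min(locs) + 1 == len(locs)


def is_row_contiguous(shape, strides):
    """Simplified model of a row-contiguous check: strides equal the
    row-major strides (dimensions of size 1 are ignored)."""
    expected = []
    step = 1
    for dim in reversed(shape):
        expected.append(step)
        step *= dim
    expected.reverse()
    return all(d == 1 or s == e for d, s, e in zip(shape, strides, expected))


# A 1D array of 4 elements and its reversed view (stride -1).
forward = ((4,), (1,))
backward = ((4,), (-1,))

print(has_no_gaps(*forward), is_row_contiguous(*forward))
print(has_no_gaps(*backward), is_row_contiguous(*backward))
```

Under this model the reversed view passes the gap-free check but fails the row-contiguous one, which is the distinction the stricter check relies on.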
db1bdb0 to
ac40f52
zcbenz
reviewed
Jun 21, 2026
if (max_size > UINT32_MAX) {
  return true;
}
// Check for negative strides in inputs (strides[0] is the output).
Collaborator
I think it makes sense skipping strides[0], but can you elaborate what "strides[0] is the output" means?
Fixes #3716.
This is an alternative to #3717 that fixes the compiled-kernel path in place instead of materializing negative-strided inputs before the generated kernel runs.
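mx.compile produces incorrect results for negative-strided inputs such as x[::-1], while eager execution is correct.

This patch fixes two compiled-kernel path issues:

1. compiled_check_contiguity accepted the broad contiguous flag for single inputs. It now requires row_contiguous or col_contiguous, matching the multi-input path.
2. The Metal/CUDA strided compiled kernels used unsigned index arithmetic, which wraps negative strides. int64_t indices are now forced when any input has negative strides. The _large Metal variant is now always generated (matching ndim > 1 behavior) since negative strides force int64_t indexing even for 1D arrays.

Added regression coverage for 1D reverse, slice update, 2D reverse, mixed positive/negative strides, and a 4D negative-stride case.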
Tests:
python -m pytest python/tests/test_compile.py -q # 59 passed
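
As a rough illustration of why unsigned index arithmetic breaks on negative strides, the sketch below emulates a 32-bit unsigned multiply in plain Python and compares it with signed arithmetic. It is a model of the arithmetic only, not the Metal or CUDA kernel source; `loc_uint32` and `loc_int64` are hypothetical names chosen for this example.

```python
UINT32_MASK = 0xFFFFFFFF  # 2**32 - 1


def loc_uint32(elem, stride):
    """Offset computed as if both operands were 32-bit unsigned integers.

    A negative stride is first reinterpreted modulo 2**32, and the product
    wraps modulo 2**32 as well, so the result is never negative.
    """
    return (elem * (stride & UINT32_MASK)) & UINT32_MASK


def loc_int64(elem, stride):
    """Offset computed with signed arithmetic (Python ints do not overflow,
    which is enough to model int64_t for small values)."""
    return elem * stride


# Walk a reversed 1D view (stride -1) and compare the two computations.
for elem in range(4):
    print(elem, loc_uint32(elem, -1), loc_int64(elem, -1))
```

For every element past the first, the unsigned result is a huge positive offset rather than a small negative one, so the kernel would read far outside the intended buffer region. With positive strides both functions agree as long as the product stays below 2**32.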