Fix shapeless compile eliding reductions over size-1 dimensions#3672
Open
discobot wants to merge 1 commit into
Open
Fix shapeless compile eliding reductions over size-1 dimensions#3672discobot wants to merge 1 commit into
discobot wants to merge 1 commit into
Conversation
During shapeless tracing, reductions over dimensions that are size 1 at trace time were elided from the graph, so replays with larger dynamic shapes returned stale values. Keep the reduction in the graph when tracing shapeless and the axes are non-empty. Handle the resulting runtime-identity reductions in the Metal and CUDA backends with a cast-copy in place of the debug asserts that rejected them. Adds a regression test covering all reductions on the output of a gather.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Proposed changes
Fixes #3201.
This picks up where #3202 left off: reductions over dimensions that are size 1
at trace time are elided from the graph, so shapeless replays with larger
dynamic shapes return stale values.
The reason #3202 failed CI was not a numerical regression. Once the reduction
is kept in the graph it can be a runtime identity (
out.size() == in.size(),e.g. a replay with size 1), and
Reduce::eval_gpurejects that case withdebug-only asserts in both the Metal and CUDA backends — which is why the
CPU-only jobs passed while the macOS and CUDA jobs aborted on the PR's own new
test. This PR replaces those asserts with an explicit identity path that
cast-copies the input to the output (the CPU backend already computes the
identity case correctly).
The frontend change is also restricted to non-empty reduction axes. The
unconditional version in #3202 could create a
Reducewith no axes (0-diminput, or an empty axes list for
all/any/max), trippingassert(!axes_.empty())in the same functions. No-axis reductions are shapeindependent so eliding them stays correct.
The regression test covers all eight reductions plus
meanon the output ofmx.take, in both directions: trace at size 1 and replay larger (the originalbug), and trace at size 2 and replay at size 1 (the runtime-identity path that
broke #3202). Verified against a Debug (assertions enabled) CPU build;
test_compile,test_reduce, andtest_opspass.Checklist
Put an
xin the boxes that apply.pre-commit run --all-filesto format my code / installed pre-commit prior to committing changesNotes on the pre-commit box:
clang-format(the three C++ files) andblack/isort --profile=black(the test file) were run and report clean, butwith
clang-formatv22.1.5 rather than the v21.1.8 the repo pins, and the fullpre-commit run --all-fileswas not run, so I have left that box unchecked forthe maintainer to confirm.
This change was developed with the help of Claude Code.