Fix shapeless compile eliding reductions over size-1 dimensions #3672

Open
discobot wants to merge 1 commit into ml-explore:main from discobot:fix/3201-shapeless-reduce-elision

@discobot

Contributor

Proposed changes

Fixes #3201.

This picks up where #3202 left off: reductions over dimensions that are size 1
at trace time are elided from the graph, so shapeless replays with larger
dynamic shapes return stale values.
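
As a rough illustration of the failure mode described above, here is a toy model in plain Python. It is not MLX code: the "graph" is just a list of op names, and `trace_sum`, `replay`, and the `keep_size1_reduce` flag are hypothetical names invented for this sketch. It only shows why a decision made from the trace-time shape becomes wrong once the graph is replayed with a different shape.

```python
# Toy model only, not MLX code. A "graph" is a list of recorded op names.

def trace_sum(example_rows, keep_size1_reduce):
    """Record a graph for "sum over axis 0" using an example input."""
    if len(example_rows) == 1 and not keep_size1_reduce:
        # Eliding behaviour: over a size-1 axis the sum is an identity at
        # trace time, so only "drop the axis" is recorded.
        return ["squeeze"]
    # Keeping behaviour: always record the real reduction.
    return ["sum"]


def replay(graph, rows):
    """Run a recorded graph on new input; the shape may differ from the trace."""
    out = rows
    for op in graph:
        if op == "squeeze":
            out = out[0]  # assumes the axis still has size 1
        elif op == "sum":
            out = [sum(col) for col in zip(*out)]
    return out


traced_input = [[1, 2]]            # axis 0 has size 1 at trace time
larger_input = [[1, 2], [3, 4]]    # axis 0 has size 2 on replay

elided = trace_sum(traced_input, keep_size1_reduce=False)
kept = trace_sum(traced_input, keep_size1_reduce=True)

wrong = replay(elided, larger_input)  # ignores the second row
right = replay(kept, larger_input)    # sums both rows
```

In this sketch the kept graph is also correct when replayed at size 1, which corresponds to the runtime-identity case discussed next.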

The reason #3202 failed CI was not a numerical regression. Once the reduction
is kept in the graph it can be a runtime identity (out.size() == in.size(),
e.g. a replay with size 1), and Reduce::eval_gpu rejects that case with
debug-only asserts in both the Metal and CUDA backends. That is why the
CPU-only jobs passed while the macOS and CUDA jobs aborted on the PR's own new
test. This PR replaces those asserts with an explicit identity path that
cast-copies the input to the output (the CPU backend already computes the
identity case correctly).
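
The shape of that identity path can be sketched in plain Python. This is a toy stand-in, not the Metal or CUDA code: `reduce_sum_axis0` and its `out_type` parameter are hypothetical, and the cast here only stands in for the output element type differing from the input element type.

```python
# Toy sketch only, not the backend implementation.

def reduce_sum_axis0(rows, out_type=int):
    """Sum a non-empty list of equal-length rows over axis 0."""
    in_size = sum(len(row) for row in rows)
    out_size = len(rows[0])
    if out_size == in_size:
        # Runtime identity: there is nothing to combine, so copy the
        # input to the output, converting each element to the output type.
        return [out_type(value) for value in rows[0]]
    # General case: combine the rows column by column.
    return [out_type(sum(col)) for col in zip(*rows)]


identity_result = reduce_sum_axis0([[True, False, True]])
general_result = reduce_sum_axis0([[True, False], [True, True]])
```

The point of the explicit branch is that the identity case returns a valid result instead of being rejected.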

The frontend change is also restricted to non-empty reduction axes. The
unconditional version in #3202 could create a Reduce with no axes (0-dim
input, or an empty axes list for all/any/max), tripping
assert(!axes_.empty()) in the same functions. No-axis reductions are
shape-independent, so eliding them stays correct.
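
One way to read the resulting rule is as a small predicate. The sketch below is an assumption about the decision logic, written in plain Python for illustration; `should_elide_reduce` is a hypothetical name, and the non-shapeless branch is inferred from the description of the original behaviour rather than taken from the MLX source.

```python
# Hypothetical predicate, not MLX's frontend code.

def should_elide_reduce(in_shape, axes, shapeless):
    """Return True if the reduction can be dropped from the traced graph."""
    if not axes:
        # No axes to reduce over: the result does not depend on the shape,
        # so eliding is safe in any mode.
        return True
    if shapeless:
        # Axis sizes may change on replay, so keep the reduction.
        return False
    # Assumed shaped-trace rule: elide only when every reduced axis is size 1.
    return all(in_shape[axis] == 1 for axis in axes)
```

Under this reading, `should_elide_reduce((1, 4), [0], shapeless=True)` is `False`, while the same call with `shapeless=False` is `True`.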

The regression test covers all eight reductions plus mean on the output of
mx.take, in both directions: trace at size 1 and replay larger (the original
bug), and trace at size 2 and replay at size 1 (the runtime-identity path that
broke #3202). Verified against a Debug (assertions enabled) CPU build;
test_compile, test_reduce, and test_ops pass.

Checklist

Put an x in the boxes that apply.

  • I have read the CONTRIBUTING document
  • I have run pre-commit run --all-files to format my code / installed pre-commit prior to committing changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have updated the necessary documentation (if needed)

Notes on the pre-commit box: clang-format (on the three C++ files) and
black/isort --profile=black (on the test file) were run and report clean.
However, I used clang-format v22.1.5 rather than the v21.1.8 the repo pins,
and I did not run the full pre-commit run --all-files, so I have left that box
unchecked for the maintainer to confirm.

This change was developed with the help of Claude Code.

During shapeless tracing, reductions over dimensions that are size 1 at trace time were elided from the graph, so replays with larger dynamic shapes returned stale values. Keep the reduction in the graph when tracing shapeless and the axes are non-empty. Handle the resulting runtime-identity reductions in the Metal and CUDA backends with a cast-copy in place of the debug asserts that rejected them. Adds a regression test covering all reductions on the output of a gather.

Development

Successfully merging this pull request may close these issues.

[BUG] compile(shapeless=True): Reduce returns stale values on dynamically-shaped inputs from take/gather