Disable AOTAutograd donated buffers for the adjoint (1.1.3)#27

Merged
hugary1995 merged 1 commit into main from donated-buffer-fix on Jun 25, 2026


@hugary1995

Collaborator

Symptom

A torch.compile'd model differentiated through pyzag's adjoint (on torch ≥ 2.12) raises:

RuntimeError: This backward function was compiled with non-empty donated buffers
which requires create_graph=False and retain_graph=False ...

Root cause

  • pyzag's adjoint reuses the autograd graph across the reverse sweep via torch.autograd.grad(..., retain_graph=True) (RecursiveNonlinearEquationSolver.accumulate). It cannot avoid retain_graph=True.
  • torch ≥ 2.12 AOTAutograd collects donated buffers for a torch.compile'd graph; a backward compiled with non-empty donated buffers requires retain_graph=False. The two are fundamentally incompatible.
  • torch._functorch.config.donated_buffer is a ContextVar-backed config. AOTAutograd compiles and runs the backward under its own contextvars contexts, where a normal config.donated_buffer = False (or config.patch(...)) override is not visible — those contexts read the config default. (Verified with an in-torch probe: at the compile gate, user_override=<UNSET>, so it reads default.) So the usual override never reaches the code that decides donation.
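The visibility problem in the last bullet can be modelled with the standard library alone. The sketch below is a hypothetical stand-in, not torch's implementation: `ConfigEntry`, `_UNSET`, and the use of an empty `contextvars.Context()` are assumptions chosen to mirror the described behaviour (a per-context override that falls back to a mutable default when unset).

```python
import contextvars

_UNSET = object()  # sentinel meaning "no override in this context"


class ConfigEntry:
    """Hypothetical model of a ContextVar-backed config entry."""

    def __init__(self, name, default):
        self.default = default  # plain attribute: shared by every context
        self._override = contextvars.ContextVar(name, default=_UNSET)

    def set(self, value):
        # The override is stored in the *current* context only.
        self._override.set(value)

    def get(self):
        value = self._override.get()
        return self.default if value is _UNSET else value


donated_buffer = ConfigEntry("donated_buffer", default=True)

# The usual override: visible where it was set ...
donated_buffer.set(False)
in_user_context = donated_buffer.get()

# ... but a fresh context sees no override and reads the default.
in_fresh_context = contextvars.Context().run(donated_buffer.get)

# Changing the default itself is visible from any context.
donated_buffer.default = False
in_fresh_context_after = contextvars.Context().run(donated_buffer.get)

print(in_user_context, in_fresh_context, in_fresh_context_after)
```

In this model the override never leaves the context that set it, while the default is an ordinary attribute read by all contexts, which is why only a change to the default crosses the boundary.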

Fix

Lower the config default, the only setting that reaches AOTAutograd's contexts, from RecursiveNonlinearEquationSolver.__init__. The constructor runs before any solve compiles a backward, and merely importing pyzag (or neml2) does not touch the global, so non-adjoint code and unrelated torch.compile users that never construct a solver are unaffected. The change is process-global: once a solver is constructed, the lowered default stays in effect for the rest of the process. A one-time UserWarning explains the change and how to revert it (config._config['donated_buffer'].default = True), with the consequence spelled out. On torch builds without the flag this is a no-op.
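A minimal sketch of what such a guard could look like, written defensively so that it does nothing when torch is absent or lacks the flag. The helper name `lower_donated_buffer_default` and the `_warned` latch are hypothetical, and the layout of `torch._functorch.config._config` is assumed from the revert recipe quoted above; this is not pyzag's actual code.

```python
import warnings

_warned = False  # latch so the UserWarning is emitted at most once per process


def lower_donated_buffer_default():
    """Set the donated_buffer *default* to False; return True if it changed.

    Hypothetical helper for illustration only.
    """
    global _warned
    try:
        import torch._functorch.config as functorch_config
    except ImportError:
        return False  # torch is not installed: nothing to do

    # Assumption: `_config` maps names to entries exposing a `.default`
    # attribute, as implied by config._config['donated_buffer'].default.
    entries = getattr(functorch_config, "_config", None)
    entry = entries.get("donated_buffer") if isinstance(entries, dict) else None
    if entry is None or not hasattr(entry, "default"):
        return False  # torch build without the flag, or a different layout
    if entry.default is False:
        return False  # already lowered

    entry.default = False
    if not _warned:
        warnings.warn(
            "Lowered the process-global default of "
            "torch._functorch.config.donated_buffer to False because the "
            "adjoint needs retain_graph=True. To revert: "
            "torch._functorch.config._config['donated_buffer'].default = True",
            UserWarning,
            stacklevel=2,
        )
        _warned = True
    return True
```

Calling a helper like this from a constructor rather than at import time is what keeps the side effect tied to actual use of the solver; a second call returns `False` because the default is already lowered.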

Verification

  • pyzag test suite: 28 passed, 649 subtests (1 warning — the intentional one).
  • Scoping confirmed: import neml2 leaves the default True; constructing a solver sets it False + warns once.
  • End-to-end: neml2's two pyzag calibration notebooks (deterministic + statistical), which neml2.compile the residual and run the adjoint, fail without this and pass with it on torch 2.12.1 (and 2.12.0).
  • black + copyright clean.

After merge: tag/release v1.1.3 to publish to PyPI; neml2 will then pin pyzag==1.1.3.

🤖 Generated with Claude Code

The adjoint reuses the autograd graph via torch.autograd.grad(..., retain_graph=True)
(RecursiveNonlinearEquationSolver.accumulate). On torch>=2.12 AOTAutograd collects
"donated buffers" for a torch.compile'd model, and a backward compiled with non-empty
donated buffers requires retain_graph=False -- so a compiled model differentiated
through the adjoint raises "compiled with non-empty donated buffers".

donated_buffer is a ContextVar-backed torch config: AOTAutograd compiles/runs the
backward under contextvars contexts where a normal `config.donated_buffer = False`
(or config.patch) override is not visible -- those contexts read the config *default*.
So this lowers the *default* (the only cross-context lever), scoped to
RecursiveNonlinearEquationSolver.__init__ (runs before any solve compiles a backward;
merely importing pyzag/neml2 does not touch the global). It is process-global and
emits a one-time UserWarning explaining the change and how to revert it.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@github-actions

github-actions Bot commented Jun 25, 2026

PR Preview Action v1.8.1
Preview removed because the pull request was closed.
2026-06-25 23:33 UTC

@hugary1995 hugary1995 merged commit 98a40ff into main Jun 25, 2026
7 checks passed
@hugary1995 hugary1995 deleted the donated-buffer-fix branch June 25, 2026 23:30
hugary1995 added a commit to applied-material-modeling/neml2 that referenced this pull request Jun 25, 2026
pyzag 1.1.3 disables torch's AOTAutograd "donated buffers" for its adjoint
(retain_graph=True), which otherwise raises "compiled with non-empty donated
buffers" when the residual is neml2.compile'd on torch>=2.12. See
applied-material-modeling/pyzag#27 for the root-cause analysis.

The two expensive calibration notebooks (optimization/deterministic and
optimization/statistical) neml2.compile the residual and run the adjoint, so
they hit this on torch 2.12.x. Re-executed against pyzag 1.1.3: both pass, and
each now shows pyzag's one-time UserWarning (emitted when the solver is
constructed) explaining the global donated_buffer default change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>