Disable AOTAutograd donated buffers for the adjoint (1.1.3) by hugary1995 · Pull Request #27 · applied-material-modeling/pyzag

hugary1995 · 2026-06-25T23:26:33Z

Symptom

A torch.compile'd model differentiated through pyzag's adjoint (on torch ≥ 2.12) raises:

RuntimeError: This backward function was compiled with non-empty donated buffers
which requires create_graph=False and retain_graph=False ...

Root cause

pyzag's adjoint reuses the autograd graph across the reverse sweep via torch.autograd.grad(..., retain_graph=True) (RecursiveNonlinearEquationSolver.accumulate). It cannot avoid retain_graph=True.
torch ≥ 2.12 AOTAutograd collects donated buffers for a torch.compile'd graph; a backward compiled with non-empty donated buffers requires retain_graph=False. The two are fundamentally incompatible.
torch._functorch.config.donated_buffer is a ContextVar-backed config. AOTAutograd compiles and runs the backward under its own contextvars contexts, where a normal config.donated_buffer = False (or config.patch(...)) override is not visible — those contexts read the config default. (Verified with an in-torch probe: at the compile gate, user_override=<UNSET>, so it reads default.) So the usual override never reaches the code that decides donation.
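The visibility problem in the last point can be reproduced with the standard-library contextvars module alone. The sketch below is an analogy under stated assumptions, not torch's actual config machinery: `_user_override`, `_defaults`, and `read_donated_buffer` are hypothetical stand-ins for a ContextVar-backed flag with a separate default. It shows that an override set in the caller's context is invisible in a fresh context, while a change to the default is seen everywhere.

```python
import contextvars

# Hypothetical stand-ins (not torch's real objects): a per-context override
# plus a process-wide default, mimicking a ContextVar-backed config entry.
_UNSET = object()
_user_override = contextvars.ContextVar("donated_buffer_override", default=_UNSET)
_defaults = {"donated_buffer": True}


def read_donated_buffer() -> bool:
    """Return the override if this context has one, else the default."""
    override = _user_override.get()
    if override is _UNSET:
        return _defaults["donated_buffer"]
    return override


# A "normal" override: it only exists in the caller's context.
_user_override.set(False)
seen_by_caller = read_donated_buffer()

# A fresh, empty context (standing in for the context the backward is
# compiled under) has no override, so it falls back to the default.
seen_in_fresh_context = contextvars.Context().run(read_donated_buffer)

# Lowering the default is visible from every context.
_defaults["donated_buffer"] = False
seen_after_lowering_default = contextvars.Context().run(read_donated_buffer)
```

Here `seen_by_caller` is False, `seen_in_fresh_context` is still True, and only after the default itself is lowered does the fresh context read False.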

Fix

Lower the config default, which is the only setting that reaches AOTAutograd's contexts, from within RecursiveNonlinearEquationSolver.__init__. The constructor runs before any solve compiles a backward, and merely importing pyzag (or neml2) does not touch the global, so non-adjoint code and unrelated torch.compile users are unaffected. Once a solver is constructed, the change is process-global. It emits a one-time UserWarning explaining the change, how to revert it (config._config['donated_buffer'].default = True), and the consequence of reverting. No-op on torch builds without the flag.
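A minimal sketch of what such a guard could look like follows. It is not pyzag's actual implementation: the function name `disable_donated_buffers` and the `_warned` flag are hypothetical, and the only torch detail used is the `config._config['donated_buffer'].default` attribute quoted above. The sketch treats a missing torch install or a missing flag as a no-op.

```python
import warnings

_warned = False  # hypothetical module-level guard: warn at most once per process


def disable_donated_buffers() -> bool:
    """Lower the donated_buffer default; return True if this call changed it."""
    global _warned
    try:
        from torch._functorch import config
    except ImportError:
        return False  # torch not installed: nothing to do

    # Assumption: the entry layout matches the revert recipe in the PR text.
    entry = getattr(config, "_config", {}).get("donated_buffer")
    if entry is None or not getattr(entry, "default", False):
        return False  # flag absent on this build, or default already lowered

    entry.default = False
    if not _warned:
        warnings.warn(
            "Lowered the global torch._functorch.config donated_buffer default "
            "to False because the adjoint needs retain_graph=True. To revert: "
            "config._config['donated_buffer'].default = True (compiled models "
            "differentiated through the adjoint will then fail again).",
            UserWarning,
            stacklevel=2,
        )
        _warned = True
    return True
```

Calling a helper like this from the solver constructor, rather than at import time, is what keeps the change out of processes that never build a solver.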

Verification

pyzag test suite: 28 passed, 649 subtests (1 warning — the intentional one).
Scoping confirmed: import neml2 leaves the default True; constructing a solver sets it False + warns once.
End-to-end: neml2's two pyzag calibration notebooks (deterministic + statistical), which neml2.compile the residual and run the adjoint, fail without this and pass with it on torch 2.12.1 (and 2.12.0).
black + copyright clean.

After merge: tag/release v1.1.3 to publish to PyPI; neml2 will then pin pyzag==1.1.3.

🤖 Generated with Claude Code

The adjoint reuses the autograd graph via torch.autograd.grad(..., retain_graph=True) (RecursiveNonlinearEquationSolver.accumulate). On torch>=2.12 AOTAutograd collects "donated buffers" for a torch.compile'd model, and a backward compiled with non-empty donated buffers requires retain_graph=False -- so a compiled model differentiated through the adjoint raises "compiled with non-empty donated buffers".

donated_buffer is a ContextVar-backed torch config: AOTAutograd compiles/runs the backward under contextvars contexts where a normal `config.donated_buffer = False` (or config.patch) override is not visible -- those contexts read the config *default*. So this lowers the *default* (the only cross-context lever), scoped to RecursiveNonlinearEquationSolver.__init__ (runs before any solve compiles a backward; merely importing pyzag/neml2 does not touch the global). It is process-global and emits a one-time UserWarning explaining the change and how to revert it.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

github-actions · 2026-06-25T23:29:46Z

PR Preview Action v1.8.1
Preview removed because the pull request was closed.
2026-06-25 23:33 UTC

pyzag 1.1.3 disables torch's AOTAutograd "donated buffers" for its adjoint (retain_graph=True), which otherwise raises "compiled with non-empty donated buffers" when the residual is neml2.compile'd on torch>=2.12. See applied-material-modeling/pyzag#27 for the root-cause analysis.

The two expensive calibration notebooks (optimization/deterministic and optimization/statistical) neml2.compile the residual and run the adjoint, so they hit this on torch 2.12.x. Re-executed against pyzag 1.1.3: both pass, and each now shows pyzag's one-time UserWarning (emitted when the solver is constructed) explaining the global donated_buffer default change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

hugary1995 merged commit 98a40ff into main Jun 25, 2026
7 checks passed

hugary1995 deleted the donated-buffer-fix branch June 25, 2026 23:30
