Disable AOTAutograd donated buffers for the adjoint (1.1.3) #27
Merged
The adjoint reuses the autograd graph via torch.autograd.grad(..., retain_graph=True) (RecursiveNonlinearEquationSolver.accumulate). On torch>=2.12 AOTAutograd collects "donated buffers" for a torch.compile'd model, and a backward compiled with non-empty donated buffers requires retain_graph=False -- so a compiled model differentiated through the adjoint raises "compiled with non-empty donated buffers".

donated_buffer is a ContextVar-backed torch config: AOTAutograd compiles/runs the backward under contextvars contexts where a normal `config.donated_buffer = False` (or config.patch) override is not visible -- those contexts read the config *default*. So this lowers the *default* (the only cross-context lever), scoped to RecursiveNonlinearEquationSolver.__init__ (runs before any solve compiles a backward; merely importing pyzag/neml2 does not touch the global). It is process-global and emits a one-time UserWarning explaining the change and how to revert it.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
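The visibility behavior described in this commit message can be illustrated with a toy model built only on Python's standard `contextvars` module. This is a sketch under stated assumptions, not torch's implementation: the `ConfigEntry` class and its `set`/`get` methods are hypothetical, chosen to mirror the description of a mutable default plus a per-context override.

```python
import contextvars

_UNSET = object()  # sentinel meaning "no override in this context"


class ConfigEntry:
    """Hypothetical config value: a mutable default plus a per-context override."""

    def __init__(self, name, default):
        self.default = default
        self._override = contextvars.ContextVar(name, default=_UNSET)

    def set(self, value):
        # Records the override in the *current* context only.
        self._override.set(value)

    def get(self):
        value = self._override.get()
        return self.default if value is _UNSET else value


donated_buffer = ConfigEntry("donated_buffer", default=True)

# A normal override is visible in the context that set it.
donated_buffer.set(False)
seen_here = donated_buffer.get()

# A fresh, empty context has no override, so it falls back to the default.
seen_in_fresh = contextvars.Context().run(donated_buffer.get)

# Lowering the default itself is visible from every context.
donated_buffer.default = False
seen_in_fresh_after = contextvars.Context().run(donated_buffer.get)

print(seen_here, seen_in_fresh, seen_in_fresh_after)
```

In this toy model the override only reaches code running in the context that set it, while a change to the default reaches every context, which is the distinction the commit message relies on.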
hugary1995 added a commit to applied-material-modeling/neml2 that referenced this pull request Jun 25, 2026
pyzag 1.1.3 disables torch's AOTAutograd "donated buffers" for its adjoint (retain_graph=True), which otherwise raises "compiled with non-empty donated buffers" when the residual is neml2.compile'd on torch>=2.12. See applied-material-modeling/pyzag#27 for the root-cause analysis.

The two expensive calibration notebooks (optimization/deterministic and optimization/statistical) neml2.compile the residual and run the adjoint, so they hit this on torch 2.12.x. Re-executed against pyzag 1.1.3: both pass, and each now shows pyzag's one-time UserWarning (emitted when the solver is constructed) explaining the global donated_buffer default change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Symptom
A `torch.compile`'d model differentiated through pyzag's adjoint (on torch ≥ 2.12) raises "compiled with non-empty donated buffers".

Root cause
1. The adjoint reuses the autograd graph via `torch.autograd.grad(..., retain_graph=True)` (`RecursiveNonlinearEquationSolver.accumulate`). It cannot avoid `retain_graph=True`.
2. On torch ≥ 2.12, AOTAutograd collects "donated buffers" for a `torch.compile`'d graph; a backward compiled with non-empty donated buffers requires `retain_graph=False`. The two are fundamentally incompatible.
3. `torch._functorch.config.donated_buffer` is a `ContextVar`-backed config. AOTAutograd compiles and runs the backward under its own contextvars contexts, where a normal `config.donated_buffer = False` (or `config.patch(...)`) override is not visible: those contexts read the config default. (Verified with an in-torch probe: at the compile gate, `user_override=<UNSET>`, so it reads `default`.) So the usual override never reaches the code that decides donation.

Fix
Lower the config default (the only setting that reaches AOTAutograd's contexts), scoped to `RecursiveNonlinearEquationSolver.__init__`. It runs before any solve compiles a backward, and importantly, merely importing pyzag (or neml2) does not touch the global, so non-adjoint code and unrelated `torch.compile` users are unaffected. The change is process-global and emits a one-time `UserWarning` explaining it and how to revert it (`config._config['donated_buffer'].default = True`), with the consequence spelled out. No-op on torch builds without the flag.

Verification
- `import neml2` leaves the default `True`; constructing a solver sets it `False` and warns once.
- Cases that `neml2.compile` the residual and run the adjoint fail without this and pass with it on torch 2.12.1 (and 2.12.0).

After merge: tag/release v1.1.3 to publish to PyPI; neml2 will then pin `pyzag==1.1.3`.

🤖 Generated with Claude Code
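The fix described above (lower the default once, warn once, do nothing when the flag is absent) might be sketched roughly as follows. This is not pyzag's actual code: the function name `lower_donated_buffer_default`, the `_warned` guard, and the warning text are hypothetical. The only detail taken from the PR description is the `config._config['donated_buffer'].default` access path. A stand-in object replaces the real config module so the sketch runs without torch.

```python
import warnings
from types import SimpleNamespace

_warned = False  # module-level guard so the warning is emitted only once


def lower_donated_buffer_default(config):
    """Set config._config['donated_buffer'].default to False.

    Returns True if the default was changed, False if nothing was done.
    """
    global _warned
    try:
        entry = config._config["donated_buffer"]
    except (AttributeError, KeyError, TypeError):
        return False  # flag not present in this build: no-op
    if not hasattr(entry, "default") or entry.default is False:
        return False  # nothing to lower
    entry.default = False
    if not _warned:
        _warned = True
        warnings.warn(
            "Lowered the donated_buffer default to False for the adjoint; "
            "to revert, set config._config['donated_buffer'].default = True "
            "(compiled models differentiated with retain_graph=True may then fail).",
            UserWarning,
            stacklevel=2,
        )
    return True


# Stand-in for a config module (an assumption, used only for illustration).
fake_entry = SimpleNamespace(default=True)
fake_config = SimpleNamespace(_config={"donated_buffer": fake_entry})

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    first = lower_donated_buffer_default(fake_config)   # changes the default, warns
    second = lower_donated_buffer_default(fake_config)  # already False: no-op

no_flag = lower_donated_buffer_default(SimpleNamespace())  # no _config: no-op
```

With torch installed, the equivalent call would presumably pass `torch._functorch.config` as the argument, following the revert instruction quoted in the Fix section; that has not been verified here.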