Add: GradAccumulator wrapper struct for optimizer #28
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 00eba4a267
    let Some(mut store) = self.accum.take() else {
        return Ok(());
    };
Preserve accumulated gradients across failed step calls
If `self.opt.step(&store)` returns an error (for example due to a backend/device failure), the gradients are already removed from `self.accum` via `take()`, so they are dropped and cannot be retried, while `self.count` remains non-zero because it is reset only after a successful step. This leaves the accumulator in an inconsistent state (`pending() > 0` but no stored grads) and can silently lose an update after transient failures.
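One possible shape for the fix, as a minimal sketch rather than the PR's actual code: borrow the accumulated gradients instead of taking them, and clear `accum` and `count` only after the inner optimizer reports success. The `Optimizer` trait signature, the `Vec<f32>` gradient store, the `inner()` accessor, and the `FlakyOpt` test double are all hypothetical stand-ins, since the real types are not shown in this excerpt.

```rust
/// Hypothetical optimizer interface (assumption: the real trait differs).
pub trait Optimizer {
    fn step(&mut self, grads: &[f32]) -> Result<(), String>;
}

pub struct GradAccumulator<O: Optimizer> {
    opt: O,
    accum: Option<Vec<f32>>, // running sum of micro-batch gradients
    count: usize,            // number of accumulate() calls since the last step
}

impl<O: Optimizer> GradAccumulator<O> {
    pub fn new(opt: O) -> Self {
        Self { opt, accum: None, count: 0 }
    }

    /// Adds one micro-batch gradient to the running sum.
    /// Assumes every call passes a slice of the same length.
    pub fn accumulate(&mut self, grads: &[f32]) {
        match self.accum.as_mut() {
            Some(sum) => {
                for (s, g) in sum.iter_mut().zip(grads) {
                    *s += *g;
                }
            }
            None => self.accum = Some(grads.to_vec()),
        }
        self.count += 1;
    }

    pub fn pending(&self) -> usize {
        self.count
    }

    pub fn inner(&self) -> &O {
        &self.opt
    }

    /// Applies one averaged update. State is cleared only on success,
    /// so a failed call can simply be retried.
    pub fn step(&mut self) -> Result<(), String> {
        let Some(sum) = self.accum.as_ref() else {
            return Ok(());
        };
        let k = self.count as f32;
        let avg: Vec<f32> = sum.iter().map(|g| g / k).collect();
        self.opt.step(&avg)?; // on Err, accum and count are left untouched
        self.accum = None;
        self.count = 0;
        Ok(())
    }
}

/// Test double that fails its first step() call, then succeeds.
pub struct FlakyOpt {
    pub fail_next: bool,
    pub applied: Vec<Vec<f32>>,
}

impl Optimizer for FlakyOpt {
    fn step(&mut self, grads: &[f32]) -> Result<(), String> {
        if self.fail_next {
            self.fail_next = false;
            return Err("transient device failure".to_string());
        }
        self.applied.push(grads.to_vec());
        Ok(())
    }
}
```

This variant pays for a temporary averaged buffer on each step. An alternative with the same guarantee would keep `take()` and put the store back into `self.accum` on the error path.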
Adds `GradAccumulator<O: Optimizer>`, a wrapper for optimizers that enables gradient accumulation during training. Call `accumulate(&loss)` once per micro-batch, then `step()` once to apply a single averaged update. This gives an effective batch size of K × micro_batch without holding K computation graphs in memory simultaneously.
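To illustrate why a single averaged update matches a larger batch, here is a small standalone sketch that does not use the PR's types. It assumes a one-parameter model `y ≈ w * x` with a mean-squared-error loss and equal-sized micro-batches. The function names `mse_grad` and `accumulated_grad` are made up for this example.

```rust
/// Gradient of the mean squared error with respect to `w` for y ≈ w * x.
/// Assumes `xs` and `ys` are non-empty and have the same length.
fn mse_grad(w: f64, xs: &[f64], ys: &[f64]) -> f64 {
    let n = xs.len() as f64;
    xs.iter()
        .zip(ys)
        .map(|(x, y)| 2.0 * (w * x - y) * x)
        .sum::<f64>()
        / n
}

/// Computes the gradient one micro-batch at a time and averages the results,
/// mirroring "accumulate K times, then step once".
/// Assumes the data is non-empty and splits evenly into micro-batches.
fn accumulated_grad(w: f64, xs: &[f64], ys: &[f64], micro_batch: usize) -> f64 {
    let mut sum = 0.0;
    let mut count = 0usize;
    for (xb, yb) in xs.chunks(micro_batch).zip(ys.chunks(micro_batch)) {
        sum += mse_grad(w, xb, yb); // only one micro-batch is processed at a time
        count += 1;
    }
    sum / count as f64
}
```

With equal-sized micro-batches, the average of the per-micro-batch gradients equals the gradient over the full batch. If the last micro-batch is smaller, a plain average weights its samples more heavily, so the two values can differ.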