PPO tuning: LR anneal, value clipping, per-minibatch adv norm #128

Open
dnddnjs wants to merge 1 commit into master from atari-ppo-tuning

dnddnjs (Contributor) commented May 17, 2026

Summary

Adds three of CleanRL's 'PPO 37 details' that were missing, raises the frame budget, and logs the learning rate:

  • Linear LR anneal 2.5e-4 → 0 across the run.
  • Value-loss clipping: the new value prediction is clipped to within ±CLIP_COEF of the old prediction.
  • Per-minibatch advantage normalization (previously done once per batch).
  • TOTAL_FRAMES 5M → 10M to align with CleanRL's published Atari budget.
  • Log lr to W&B so the anneal is visible.
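For reference, the linear anneal in the first bullet can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `linear_lr`, `update`, and `num_updates` are hypothetical names, and the schedule assumes the per-update form used in CleanRL's ppo_atari.py. With that form the final update still runs at `base_lr / num_updates` rather than exactly 0.

```python
def linear_lr(update: int, num_updates: int, base_lr: float = 2.5e-4) -> float:
    """Learning rate for a 1-indexed PPO update under a linear anneal.

    Hypothetical helper: the first update uses base_lr, and each later
    update lowers the rate by base_lr / num_updates.
    """
    if not 1 <= update <= num_updates:
        raise ValueError("update must be in [1, num_updates]")
    frac = 1.0 - (update - 1.0) / num_updates
    return frac * base_lr


if __name__ == "__main__":
    # Illustrative run length only; the real number of updates depends on
    # TOTAL_FRAMES and the rollout batch size.
    num_updates = 10
    for update in range(1, num_updates + 1):
        print(update, linear_lr(update, num_updates))
```

In a PyTorch training loop the returned value would typically be written to `optimizer.param_groups[0]["lr"]` at the start of each update and passed to `wandb.log` with the other metrics; the metric key is the caller's choice.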

Why

The 5M and 10M Breakout runs both plateaued at a per-game return of ~75, with entropy stuck around 0.8 and the PPO clip rarely activating. These are classic symptoms of missing implementation details. The three changes above are the standard fixes, and the audit against CleanRL's ppo_atari.py flagged exactly these.

Test plan

  • Smoke run (TOTAL_FRAMES=5120) succeeds
  • Full 10M Breakout run shows return climbing past the prior ~75 plateau, and entropy/lr curves both visibly anneal

…ames

Adds three of CleanRL's 'PPO 37 details' that were missing. They were
flagged when the 5M and 10M Breakout runs both plateaued at a per-game
return of ~75 with entropy stuck around 0.8 (the policy wasn't
sharpening and the clip was rarely activating):

- Linear LR anneal from 2.5e-4 -> 0 across the run; lets late updates
  fine-tune instead of bouncing.
- Value-function loss clipping around the old prediction (CLIP_COEF),
  matching the policy clipping range; stabilizes value updates.
- Advantage normalization moved inside the minibatch loop instead of
  once per batch.
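
The value-clipping and advantage-normalization items above can be sketched as below. This is a rough plain-Python illustration under stated assumptions, not the PR's implementation: the function names are hypothetical, the default `clip_coef=0.1` is an assumed value (this commit does not state CLIP_COEF), and the population standard deviation is used here, whereas `torch.Tensor.std` defaults to the sample standard deviation.

```python
import random
from statistics import fmean, pstdev


def normalize_advantages(adv, eps=1e-8):
    """Zero-mean, unit-variance advantages for ONE minibatch (hypothetical helper)."""
    mean = fmean(adv)
    std = pstdev(adv, mean)
    return [(a - mean) / (std + eps) for a in adv]


def clipped_value_loss(new_values, old_values, returns, clip_coef=0.1):
    """Value loss with the new prediction clipped to old +/- clip_coef.

    Takes the larger of the clipped and unclipped squared errors per
    sample, then averages and halves.
    """
    losses = []
    for v_new, v_old, ret in zip(new_values, old_values, returns):
        delta = max(-clip_coef, min(clip_coef, v_new - v_old))
        v_clipped = v_old + delta
        losses.append(max((v_new - ret) ** 2, (v_clipped - ret) ** 2))
    return 0.5 * fmean(losses)


def iter_minibatches(batch_size, minibatch_size, rng):
    """Yield shuffled index lists that together cover the batch once."""
    idx = list(range(batch_size))
    rng.shuffle(idx)
    for start in range(0, batch_size, minibatch_size):
        yield idx[start:start + minibatch_size]


if __name__ == "__main__":
    rng = random.Random(0)
    batch_adv = [rng.gauss(3.0, 2.0) for _ in range(64)]  # synthetic data
    for mb_idx in iter_minibatches(64, 16, rng):
        # Normalization happens here, inside the minibatch loop, using
        # statistics of this minibatch only (not of the whole batch).
        mb_adv = normalize_advantages([batch_adv[i] for i in mb_idx])
        print(round(fmean(mb_adv), 6), round(pstdev(mb_adv), 6))
```

Taking the maximum of the two squared errors means that moving the prediction beyond the clip range cannot reduce the loss below its clipped value, which mirrors how the policy objective treats the probability ratio.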

Also bumps TOTAL_FRAMES 5M -> 10M to match the CleanRL Atari budget so
runs are directly comparable to their published curves. lr now logged
to wandb so the anneal is visible.