PPO tuning: LR anneal, value clipping, per-minibatch adv norm by dnddnjs · Pull Request #128 · rlcode/reinforcement-learning

dnddnjs · 2026-05-17T22:00:08Z

Summary

Three CleanRL 'PPO 37 details' that were missing, plus a frame budget bump:

Linear LR anneal 2.5e-4 → 0 across the run.
Value loss clipping around the old value prediction, using the same CLIP_COEF range as the policy clip.
Per-minibatch advantage normalization (previously normalized once per batch).
TOTAL_FRAMES 5M → 10M to align with CleanRL's published Atari budget.
Log lr to W&B so the anneal is visible.
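The linear anneal can be sketched as a small helper. This is a minimal sketch rather than the code in this PR: the function name `annealed_lr` and the batch size of 1024 are assumptions made for illustration, and the 1-indexed update counter follows the convention used in CleanRL's ppo_atari.py.

```python
def annealed_lr(update: int, num_updates: int, base_lr: float = 2.5e-4) -> float:
    """Linearly decay the learning rate from base_lr toward 0.

    `update` is 1-indexed, so the first update uses the full base_lr and the
    last update uses base_lr / num_updates (close to, but not exactly, 0).
    """
    if not 1 <= update <= num_updates:
        raise ValueError("update must be in [1, num_updates]")
    frac = 1.0 - (update - 1.0) / num_updates
    return frac * base_lr


# Hypothetical numbers: TOTAL_FRAMES from this PR, batch size assumed.
TOTAL_FRAMES = 10_000_000
BATCH_SIZE = 1024  # assumption: num_envs * num_steps
NUM_UPDATES = TOTAL_FRAMES // BATCH_SIZE

first_lr = annealed_lr(1, NUM_UPDATES)
last_lr = annealed_lr(NUM_UPDATES, NUM_UPDATES)

# In a PyTorch training loop the value would typically be applied and logged
# once per update, for example:
#   optimizer.param_groups[0]["lr"] = lr
#   wandb.log({"lr": lr})
```

Computing the rate from the update index, rather than mutating it incrementally, keeps the schedule correct if a run is resumed partway through.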

Why

Both the 5M and 10M Breakout runs plateaued at per-game ~75, with entropy stuck around 0.8 and the PPO clip rarely activating, which are classic 'missing details' symptoms. An audit against CleanRL's ppo_atari.py flagged exactly these three changes, and they are the standard fixes.
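One way to quantify "the PPO clip rarely activating" is the clip fraction: the share of samples whose probability ratio falls outside the clipping range. The sketch below is framework-free and illustrative only; the helper name `clip_fraction` and the default `clip_coef=0.1` are assumptions, since the PR does not state the value of CLIP_COEF.

```python
import math


def clip_fraction(new_logprobs, old_logprobs, clip_coef=0.1):
    """Fraction of samples whose ratio pi_new / pi_old left [1 - c, 1 + c]."""
    ratios = [math.exp(n - o) for n, o in zip(new_logprobs, old_logprobs)]
    clipped = [abs(r - 1.0) > clip_coef for r in ratios]
    return sum(clipped) / len(clipped)


# Toy minibatch: one action became 1.5x more likely, the other is unchanged.
frac = clip_fraction([math.log(1.5), 0.0], [0.0, 0.0])
```

A clip fraction that stays near zero across updates means the policy barely moves between rollouts, which is consistent with the stalled entropy described above.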

Test plan

Smoke run (TOTAL_FRAMES=5120) succeeds
Full 10M Breakout run shows return climbing past the prior ~75 plateau, and entropy/lr curves both visibly anneal

…ames

Three of CleanRL's 'PPO 37 details' that were missing, flagged when the 5M and 10M Breakout runs both plateaued at per-game ~75 with entropy stuck around 0.8 (policy wasn't sharpening, clip rarely activating):

- Linear LR anneal from 2.5e-4 -> 0 across the run; lets late updates fine-tune instead of bouncing.
- Value-function loss clipping around the old prediction (CLIP_COEF), matching the policy clipping range; stabilizes value targets.
- Advantage normalization moved inside the minibatch loop instead of once per batch.

Also bumps TOTAL_FRAMES 5M -> 10M to match the CleanRL Atari budget so runs are directly comparable to their published curves. lr now logged to wandb so the anneal is visible.
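The value clipping and per-minibatch advantage normalization can be sketched in plain Python so the arithmetic is visible. This is an illustration under stated assumptions, not the code from this commit: the function names are hypothetical, `clip_coef=0.1` is an assumed value for CLIP_COEF, and a real training script would perform the same steps as tensor operations.

```python
import statistics


def normalize_advantages(advantages, eps=1e-8):
    """Normalize one minibatch of advantages to zero mean and unit std.

    Uses the sample standard deviation (n - 1 denominator), which is also
    the default behaviour of torch.Tensor.std().
    """
    mean = statistics.fmean(advantages)
    std = statistics.stdev(advantages)
    return [(a - mean) / (std + eps) for a in advantages]


def clipped_value_loss(new_values, old_values, returns, clip_coef=0.1):
    """Value loss where the new prediction may move at most clip_coef
    away from the old prediction; the larger of the two errors is used."""
    losses = []
    for v_new, v_old, ret in zip(new_values, old_values, returns):
        unclipped = (v_new - ret) ** 2
        delta = max(-clip_coef, min(clip_coef, v_new - v_old))
        clipped = (v_old + delta - ret) ** 2
        losses.append(max(unclipped, clipped))
    return 0.5 * sum(losses) / len(losses)


# Toy minibatch (made-up numbers).
mb_advantages = normalize_advantages([1.0, 2.0, 3.0])
v_loss = clipped_value_loss(new_values=[1.0], old_values=[0.0], returns=[1.0])
```

Calling `normalize_advantages` on each minibatch slice inside the epoch loop, rather than once on the full batch, is what "moved inside the minibatch loop" refers to. Taking the maximum of the clipped and unclipped errors means a large jump away from the old value prediction cannot lower the loss, which mirrors the pessimistic bound used in the policy clip.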

dnddnjs wants to merge 1 commit into master from atari-ppo-tuning
