### System Info

Is it normal to get `ppo_kl != 0`? The same run with `Qwen/Qwen3-4B-Instruct-2507` gets `ppo_kl = 0`.

<img width="581" height="321" alt="Image" src="https://github.com/user-attachments/assets/26a3312d-abf0-43c5-bf0b-a19f2761a792" />

### Information

- [ ] The official example scripts
- [x] My own modified scripts

### Tasks

- [x] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)

### Reproduction

GRPO base script

### Expected behavior

It should always be 0 for stable training.
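
To make the question concrete, here is a minimal sketch of what this metric is assumed to measure. It assumes `ppo_kl` is the masked mean of `old_log_prob - log_prob` over response tokens; that definition is an assumption and is not confirmed from the training code. The function name `masked_mean_kl` and the sample values are hypothetical.

```python
def masked_mean_kl(old_log_probs, new_log_probs, mask):
    """Masked mean of (old - new) per-token log-probs.

    Assumed definition of the logged metric, not taken from the trainer.
    """
    total = 0.0
    count = 0
    for old, new, m in zip(old_log_probs, new_log_probs, mask):
        if m:  # only count response tokens
            total += old - new
            count += 1
    return total / count if count else 0.0


# Hypothetical per-token log-probs; the last token is masked out.
old = [-1.2, -0.7, -2.3, -0.1]
mask = [1, 1, 1, 0]

# Identical log-probs: the estimate is exactly zero.
same = masked_mean_kl(old, old, mask)

# Log-probs that differ from the stored ones: the estimate is nonzero.
drifted = masked_mean_kl(old, [-1.0, -0.9, -2.0, -5.0], mask)
```

Under this assumption, the value is exactly 0 only when the log-probs used in the update are identical to the stored ones.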