Qwen3.5-4b ppo_kl != 0 #6829

### System Info

Is it normal to get `ppo_kl != 0` with Qwen3.5-4B?
The same run with `Qwen/Qwen3-4B-Instruct-2507` gets `ppo_kl = 0`.

<img width="581" height="321" alt="Image" src="https://github.com/user-attachments/assets/26a3312d-abf0-43c5-bf0b-a19f2761a792" />

### Information

- [ ] The official example scripts
- [x] My own modified scripts

### Tasks

- [x] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)

### Reproduction

Run the base GRPO script.

### Expected behavior

`ppo_kl` should always be 0 for stable training, as it is with `Qwen/Qwen3-4B-Instruct-2507`.
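For context, here is a minimal sketch of what a `ppo_kl`-style metric typically measures. It assumes the metric is the masked mean of the difference between the old (rollout-time) and current token log-probs, as in common PPO/GRPO implementations. The function `approx_ppo_kl` and the sample values are hypothetical, not taken from the trainer. Under this assumption the metric is exactly 0 only when the current forward pass reproduces the old log-probs exactly (e.g. a single on-policy update per batch), so any numerical difference between the two passes would show up as a nonzero value.

```python
from typing import Sequence


def approx_ppo_kl(
    old_log_probs: Sequence[float],
    log_probs: Sequence[float],
    response_mask: Sequence[int],
) -> float:
    """Masked mean of (old_log_prob - log_prob) over response tokens.

    Hypothetical helper: a simple estimate of KL(pi_old || pi_theta),
    assumed here to match what the ppo_kl metric logs.
    """
    total = 0.0
    count = 0
    for old, new, keep in zip(old_log_probs, log_probs, response_mask):
        if keep:  # only response tokens contribute; padding is skipped
            total += old - new
            count += 1
    return total / count if count else 0.0


# Identical log-probs (current pass matches the rollout pass): metric is exactly 0.
old = [-1.20, -0.35, -2.10, -0.05]
mask = [1, 1, 1, 0]  # last token is padding
print(approx_ppo_kl(old, old, mask))  # 0.0

# Small numerical drift between the two forward passes makes it nonzero.
drifted = [-1.21, -0.34, -2.12, -0.05]
print(approx_ppo_kl(old, drifted, mask))
```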
