
Qwen3.5-4b ppo_kl != 0 #6829

Description

@dipta007

System Info

Is it normal to get ppo_kl != 0 with Qwen3.5-4b?
The same run with Qwen/Qwen3-4B-Instruct-2507 gets ppo_kl = 0.

Image

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

grpo base script

Expected behavior

ppo_kl should always be 0 for stable training.
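
For context, here is a minimal sketch of why a value of exactly 0 would be expected, assuming ppo_kl is the masked mean of (old log-prob minus current log-prob) over response tokens, as in many PPO/GRPO implementations. That definition is an assumption and is not confirmed by this issue; the function name `approx_ppo_kl` and all the numbers are hypothetical and chosen only for illustration.

```python
def approx_ppo_kl(old_log_probs, new_log_probs, mask):
    """Masked mean of (old - new) per-token log-probs.

    Assumed definition of ppo_kl for illustration only; the actual
    training framework may compute it differently.
    """
    total = sum((o - n) * m for o, n, m in zip(old_log_probs, new_log_probs, mask))
    count = sum(mask)
    return total / count if count else 0.0


# Hypothetical per-token log-probs; the last token is padding (mask = 0).
old = [-1.20, -0.35, -2.10, -0.80]
mask = [1, 1, 1, 0]

# Identical log-probs on both sides: the estimate is exactly zero.
same = approx_ppo_kl(old, old, mask)

# Any per-token difference on unmasked tokens can make it nonzero.
drift = approx_ppo_kl(old, [-1.25, -0.30, -2.00, -0.80], mask)

print(same, drift)
```

Under this assumed definition, a nonzero ppo_kl only says that the log-probs used as "old" and the ones recomputed during the update differ on some response tokens. It does not say where the difference comes from.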

Labels: bug
