Decide whether GRPO/GSPO need real old-logprob ratios #68

Description

@adohe

Spec candidate: #19.

Scope

  • Write a short decision note comparing current surrogate semantics with real old-logprob ratio semantics.
  • Include PPO as the existing reference for real old-logprob ratio behavior.
  • Defer implementation unless maintainers approve a follow-up.
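
As a starting point for the comparison, here is a minimal sketch of the two semantics. It assumes that "surrogate semantics" means the old log-probs are just the current log-probs with gradients detached, which this issue does not confirm. The function names, the example log-probs, and the `eps=0.2` clip range are all hypothetical and are not taken from the codebase.

```python
import math

def token_ratios(new_logps, old_logps):
    """Per-token importance ratios exp(new - old), as used by PPO and GRPO."""
    return [math.exp(n - o) for n, o in zip(new_logps, old_logps)]

def sequence_ratio(new_logps, old_logps):
    """Length-normalized sequence-level ratio, in the style of GSPO."""
    mean_diff = sum(n - o for n, o in zip(new_logps, old_logps)) / len(new_logps)
    return math.exp(mean_diff)

def clipped_objective(ratio, advantage, eps=0.2):
    """PPO-style clipped term: min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    clipped = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped * advantage)

# Hypothetical log-probs for one 3-token completion.
new_logps = [-0.9, -1.5, -0.2]      # current policy
sampled_logps = [-1.4, -1.6, -0.6]  # policy that generated the sample

# Assumed surrogate semantics: "old" log-probs are the current ones, detached.
surrogate = token_ratios(new_logps, new_logps)
# Real semantics: old log-probs were stored when the sample was generated.
real = token_ratios(new_logps, sampled_logps)

print("surrogate token ratios:", surrogate)
print("real token ratios:     ", [round(r, 4) for r in real])
print("real sequence ratio:   ", round(sequence_ratio(new_logps, sampled_logps), 4))
print("clipped term, A=1:     ", clipped_objective(real[0], 1.0))
```

Under the assumed surrogate semantics every ratio evaluates to exactly 1, so the clip range never engages and only the gradient through the ratio matters. With stored old log-probs the ratio can leave the clip range, as it does for the first token here, which is the behavior PPO already relies on.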

Acceptance

  • Decision note has an explicit recommendation.
  • Any behavior-changing follow-up is split into separate issues.

Metadata

Labels

  • area/algorithms: Issues or PRs related to training algorithms (SFT, DPO, GSPO, GRPO, PPO)
  • area/api: Issues or PRs related to the SDK/Trainer public API
  • kind/design: Categorizes issue or PR as related to design discussion
