Finding
The vectorized packed_next_token_logprobs (keep[cu_seqlens[1:] - 1] = False) is faster than the old per-sequence loop, but it is correct only while every sequence length is >= 1. That invariant is established in a different module, by _pack_train_data's lengths.clamp(min=1, ...). For a zero-length segment i, cu_seqlens[i+1] == cu_seqlens[i], so cu_seqlens[i+1]-1 points into the previous sequence (or becomes -1, i.e. the last packed position, when i == 0) and would silently mask a valid action site. The old loop guarded against this with if length <= 1: continue.
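The indexing hazard can be seen in a small pure-Python model that uses lists instead of tensors, so it runs without PyTorch. keep_mask_vectorized mirrors the one-liner above; keep_mask_loop is a hypothetical reconstruction of the old loop based only on its if length <= 1: continue guard; cu_seqlens_from_lengths and boundary_indices are illustrative helpers, not names from the codebase. It assumes the usual cu_seqlens layout (leading 0, then cumulative lengths).

```python
from itertools import accumulate


def cu_seqlens_from_lengths(lengths: list[int]) -> list[int]:
    # Assumed layout: leading 0 followed by the running sum of lengths.
    return [0, *accumulate(lengths)]


def keep_mask_vectorized(lengths: list[int]) -> list[bool]:
    # Pure-Python model of keep[cu_seqlens[1:] - 1] = False.
    cu = cu_seqlens_from_lengths(lengths)
    keep = [True] * cu[-1]
    for end in cu[1:]:
        # For an empty segment, end equals its start, so end - 1 lands in the
        # previous segment (or is -1, which wraps, when the first is empty).
        keep[end - 1] = False
    return keep


def keep_mask_loop(lengths: list[int]) -> list[bool]:
    # Hypothetical model of the old per-sequence loop with its length guard.
    cu = cu_seqlens_from_lengths(lengths)
    keep = [False] * cu[-1]
    for start, length in zip(cu[:-1], lengths):
        if length <= 1:
            continue  # no in-segment next token to score
        for pos in range(start, start + length - 1):
            keep[pos] = True
    return keep


def boundary_indices(lengths: list[int]) -> list[int]:
    # Positions the vectorized version writes False to.
    cu = cu_seqlens_from_lengths(lengths)
    return [end - 1 for end in cu[1:]]


# With every length >= 1 the two masks agree.
assert keep_mask_vectorized([3, 1, 2]) == keep_mask_loop([3, 1, 2])
# An empty middle segment reuses the previous segment's last position.
print(boundary_indices([2, 0, 3]))  # [1, 1, 4]
# An empty first segment yields -1, which Python (and tensor) indexing wraps.
print(boundary_indices([0, 3]))  # [-1, 2]
```

No error is raised in either empty-segment case, which is why the dependency on the clamp is easy to break unnoticed.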
Acceptance
- Add a cheap shape/invariant assertion, or document the cross-module coupling at both ends (packed_next_token_logprobs and _pack_train_data).
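One way to satisfy the first option is a guard at the top of packed_next_token_logprobs. The sketch below works on a plain list of cu_seqlens values; assert_nonempty_segments is a hypothetical name, and the docstring doubles as the cross-module note.

```python
def assert_nonempty_segments(cu_seqlens: list[int]) -> list[int]:
    """Check the invariant packed_next_token_logprobs relies on.

    Every packed segment must have length >= 1. Today this is guaranteed
    upstream by _pack_train_data's lengths.clamp(min=1, ...); if that clamp
    is removed, cu_seqlens[i + 1] - 1 would index into a neighbouring segment.
    Returns the per-segment lengths so callers can reuse them.
    """
    lengths = [end - start for start, end in zip(cu_seqlens[:-1], cu_seqlens[1:])]
    bad = [i for i, n in enumerate(lengths) if n < 1]
    if bad:
        # Raise instead of using `assert`, which `python -O` strips.
        raise ValueError(f"empty or negative packed segments at indices {bad}")
    return lengths


print(assert_nonempty_segments([0, 3, 4, 6]))  # [3, 1, 2]
```

With tensors, the equivalent check is roughly bool((torch.diff(cu_seqlens) >= 1).all()); on a CUDA tensor, converting the result to a Python bool forces a host-device sync, which is worth weighing against "cheap".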