1. Find the input prompt length in `RL_SCHEMA` after the tokenizer fix.
2. Generate an attention mask for `generate_k_completions` so that later functions can use the mask.
3. Fix the famous indexing issue:
   ```
   student_output_scores, collected_exit_logits = student(
       completions['tokens'][:, :-1],
       prescribed_exit_layer_idxs=prescribed_exit_layers[:, 1:],
   )
   ```
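Items 1 and 2 are closely related: once padded rows have a mask, the prompt length of each row is the number of unmasked positions inside the prompt segment. The sketch below makes several assumptions that are not in the notes. It assumes the prompt occupies the first `padded_prompt_len` columns of each completion row (left-padded prompts, right-padded completions) and that padding is marked by a single `pad_token_id`. `mask_and_prompt_lengths`, `padded_prompt_len`, and `pad_token_id` are hypothetical names, and the actual layout of `RL_SCHEMA` and `generate_k_completions` may differ. The sketch uses plain Python lists so that it runs anywhere.

```python
from typing import List, Tuple


def mask_and_prompt_lengths(
    completion_tokens: List[List[int]],
    padded_prompt_len: int,
    pad_token_id: int,
) -> Tuple[List[List[int]], List[int]]:
    """Hypothetical helper: build a 0/1 attention mask and per-row prompt lengths.

    Assumes every row is prompt (first `padded_prompt_len` columns) followed by
    generated tokens, and that padding anywhere in the row uses `pad_token_id`.
    """
    # 1 marks a real token, 0 marks padding.
    mask = [[int(tok != pad_token_id) for tok in row] for row in completion_tokens]
    # Prompt length = number of real tokens inside the prompt segment of each row.
    prompt_lens = [sum(row[:padded_prompt_len]) for row in mask]
    return mask, prompt_lens


# Two rows, pad id 0, prompt segment width 4 (row 0 has a left-padded prompt).
batch = [
    [0, 0, 5, 6, 7, 8, 0],
    [3, 4, 5, 6, 9, 0, 0],
]
mask, prompt_lens = mask_and_prompt_lengths(batch, padded_prompt_len=4, pad_token_id=0)
print(mask)         # [[0, 0, 1, 1, 1, 1, 0], [1, 1, 1, 1, 1, 0, 0]]
print(prompt_lens)  # [2, 4]
```

On PyTorch tensors, the same logic is `(tokens != pad_token_id).long()` for the mask and `mask[:, :padded_prompt_len].sum(dim=1)` for the prompt lengths. If the pad token is the same as the EOS token, real EOS tokens produced during generation would also be masked out, so this shortcut may need adjusting after the tokenizer fix.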
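For item 3, it can help to see what the two slices actually pair on a single row. `tokens[:, :-1]` drops the last position, and `prescribed_exit_layers[:, 1:]` drops the first, so input position `t` is matched with the exit-layer index at `t + 1`, and both sliced sequences end up the same length. The pure-Python sketch below shows only this pairing. `shifted_pairs` is a hypothetical helper, and the sketch does not claim which alignment `student` actually expects.

```python
from typing import List, Tuple


def shifted_pairs(tokens: List[int], exit_layers: List[int]) -> List[Tuple[int, int]]:
    """Hypothetical helper: mirror tokens[:, :-1] with exit_layers[:, 1:] on one row.

    Each input token at position t is paired with the exit-layer index at t + 1.
    """
    if len(tokens) != len(exit_layers):
        raise ValueError("tokens and exit_layers must have the same length")
    inputs = tokens[:-1]       # all positions except the last
    targets = exit_layers[1:]  # all positions except the first
    return list(zip(inputs, targets))


print(shifted_pairs([10, 11, 12, 13], [0, 2, 1, 3]))  # [(10, 2), (11, 1), (12, 3)]
```

If an attention mask is passed to `student` along with the tokens, it presumably needs the same `[:, :-1]` slice so that its width matches the sliced tokens.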