
Add passive distillation losses #849

Open
klei22 wants to merge 6 commits into ReaLLMASIC:master from klei22:add-passive-distillation-losses




klei22 (Collaborator) commented Jun 17, 2026

No description provided.

Copilot AI left a comment


Pull request overview

This PR adds “passive” knowledge distillation support by enabling distillation losses to be computed/logged (including validation-time distillation loss), without necessarily contributing to the training objective. It also updates exploration tooling to track the new metrics and adds example exploration configs for distillation vs. regular training.
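To make the difference between regular and passive distillation concrete, here is a minimal Python sketch. The helper name `combine_objective` and its signature are hypothetical; only the flag name `passive_distillation_loss_log` and the `distillation_weight * distill_component` term come from this PR. In passive mode the distillation term is still computed and reported, but the returned objective is just the next-token-prediction loss.

```python
from __future__ import annotations


def combine_objective(
    ntp_loss: float,
    distill_loss: float,
    distillation_weight: float,
    passive_distillation_loss_log: bool,
) -> tuple[float, dict[str, float]]:
    """Hypothetical helper returning (training objective, metrics to log)."""
    # The distillation term is always reported, so it can be logged in both modes.
    metrics = {"ntp_loss": ntp_loss, "distillation_loss": distill_loss}
    objective = ntp_loss
    if not passive_distillation_loss_log:
        # Regular distillation: the weighted term is added to the objective.
        objective = ntp_loss + distillation_weight * distill_loss
    return objective, metrics


active, _ = combine_objective(2.0, 0.5, 0.5, passive_distillation_loss_log=False)
passive, logged = combine_objective(2.0, 0.5, 0.5, passive_distillation_loss_log=True)
print(active, passive, logged["distillation_loss"])  # 2.25 2.0 0.5
```

Passive mode therefore lets a run observe how far the student is from the teacher without the teacher influencing the gradients.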

Changes:

  • Add distillation validation metrics (distillation_val_loss) and optional baseline metric logging (ntp_val_loss) to the validation and TensorBoard logging paths in train.py.
  • Introduce CLI flags to support passive distillation loss logging and optional baseline logging.
  • Extend exploration runner/monitor metric schemas and add distillation-focused exploration YAMLs (including ${DISTILLATION_SOURCE} substitution support).
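The two new CLI flags are `--passive_distillation_loss_log` and `--log_ntp_val_loss_during_distillation`. A minimal argparse sketch follows; declaring them as `store_true` booleans that default to `False`, and the help strings, are assumptions rather than the actual contents of train_args.py.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Distillation logging flags (sketch)")
    # Assumed boolean: compute and log the distillation loss without adding it to the objective.
    parser.add_argument(
        "--passive_distillation_loss_log",
        action="store_true",
        help="Log the distillation loss but exclude it from the training loss.",
    )
    # Assumed boolean: also log the plain next-token-prediction validation loss as a baseline.
    parser.add_argument(
        "--log_ntp_val_loss_during_distillation",
        action="store_true",
        help="Also log ntp_val_loss while distilling.",
    )
    return parser


args = build_parser().parse_args(["--passive_distillation_loss_log"])
print(args.passive_distillation_loss_log, args.log_ntp_val_loss_during_distillation)  # True False
```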

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| train.py | Computes/logs distillation validation loss (and optional baseline metric), tracks latest values, and supports “passive” distillation (log without adding to objective). |
| train_args.py | Adds --passive_distillation_loss_log and --log_ntp_val_loss_during_distillation CLI flags. |
| run_exploration_monitor.py | Adds new metric columns so the TUI can display distillation/baseline validation losses. |
| optimization_and_search/run_experiments.py | Adds new metric keys and ${DISTILLATION_SOURCE} substitution + reserved config keys for launcher-only values. |
| explorations/distillation_vs_regular_default_inf.yaml | New exploration template comparing regular vs. distillation vs. passive distillation logging. |
| explorations/distillation_parent_architecture_sweep.yaml | New exploration template that trains parent checkpoints and uses them as distillation sources in later runs. |
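The `${DISTILLATION_SOURCE}` placeholder uses the same `${NAME}` syntax as Python's `string.Template`. The sketch below assumes the runner substitutes that placeholder in string-valued config entries (recursing into lists and dicts) and strips launcher-only keys before building the training command. The function `resolve_config`, the reserved key `distillation_source`, and the `teacher_ckpt` key are hypothetical, not taken from run_experiments.py.

```python
from __future__ import annotations

from string import Template
from typing import Any

# Hypothetical launcher-only keys: consumed by the runner, never forwarded to train.py.
RESERVED_KEYS = {"distillation_source"}


def resolve_config(config: dict[str, Any], distillation_source: str) -> dict[str, Any]:
    """Replace ${DISTILLATION_SOURCE} in string values and drop reserved keys."""

    def substitute(value: Any) -> Any:
        if isinstance(value, str):
            # safe_substitute leaves any other ${...} placeholders untouched.
            return Template(value).safe_substitute(DISTILLATION_SOURCE=distillation_source)
        if isinstance(value, list):
            return [substitute(v) for v in value]
        if isinstance(value, dict):
            return {k: substitute(v) for k, v in value.items()}
        return value

    return {k: substitute(v) for k, v in config.items() if k not in RESERVED_KEYS}


cfg = {
    "teacher_ckpt": "${DISTILLATION_SOURCE}/ckpt.pt",  # hypothetical key name
    "max_iters": 1000,
    "distillation_source": "out/parent_a",
}
print(resolve_config(cfg, cfg["distillation_source"]))
# {'teacher_ckpt': 'out/parent_a/ckpt.pt', 'max_iters': 1000}
```

Using `safe_substitute` rather than `substitute` is a deliberate choice here: unrelated `$` text in a config value does not raise.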


Comment thread train.py
Comment on lines +1011 to +1017
```python
teacher_logits, _ = self.teacher_model(
    X,
    targets=Y,
    iter_num=self.iter_num,
    dataset_idx=dataset_idx if self.args.multidataset_wte else None,
    loss_fn=None,
)
```
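The teacher call above is passed `loss_fn=None` and only its logits are kept, so the distillation term has to be computed separately from teacher and student logits. The excerpt does not show which distillation loss the PR uses; the pure-Python sketch below assumes the common temperature-scaled KL divergence, KL(p_teacher || p_student) multiplied by T², for a single token position. The function names are hypothetical.

```python
from __future__ import annotations

import math


def softmax(logits: list[float], temperature: float) -> list[float]:
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_distillation_loss(
    student_logits: list[float], teacher_logits: list[float], temperature: float = 2.0
) -> float:
    """Assumed loss: T^2 * KL(p_teacher || p_student) for a single token position."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(
        pt * (math.log(pt) - math.log(ps))
        for pt, ps in zip(p_teacher, p_student)
        if pt > 0.0  # terms with zero teacher probability contribute nothing
    )
    # The T^2 factor keeps gradient scale roughly independent of the temperature.
    return kl * temperature**2


print(kl_distillation_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # 0.0
print(kl_distillation_loss([3.0, 2.0, 1.0], [1.0, 2.0, 3.0]) > 0.0)  # True
```

In a language model this per-position value would typically be averaged over every token in the batch.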
Comment thread train.py
Comment on lines 2283 to +2287
```diff
     iter_num=self.iter_num,
 )
 distill_component = distill_component.to(loss.dtype)
-loss = loss + self.distillation_weight * distill_component
+if not self.args.passive_distillation_loss_log:
+    loss = loss + self.distillation_weight * distill_component
```
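Because the distillation term is computed whether or not it is added to `loss`, it can also be reported during evaluation. Here is a minimal sketch of averaging per-batch losses into the two validation metrics named in this PR, `distillation_val_loss` and the optional `ntp_val_loss` baseline. The function name and the `(ntp_loss, distillation_loss)` per-batch layout are hypothetical.

```python
from __future__ import annotations

from statistics import fmean


def summarize_val_metrics(
    batch_losses: list[tuple[float, float]], log_ntp_val_loss: bool
) -> dict[str, float]:
    """Average per-batch (ntp_loss, distillation_loss) pairs into named validation metrics."""
    ntp_losses, distill_losses = zip(*batch_losses)  # requires at least one batch
    metrics = {"distillation_val_loss": fmean(distill_losses)}
    if log_ntp_val_loss:
        # Optional baseline, e.g. when --log_ntp_val_loss_during_distillation is set.
        metrics["ntp_val_loss"] = fmean(ntp_losses)
    return metrics


print(summarize_val_metrics([(2.0, 0.5), (3.0, 1.5)], log_ntp_val_loss=True))
# {'distillation_val_loss': 1.0, 'ntp_val_loss': 2.5}
```

With a `torch.utils.tensorboard.SummaryWriter`, each entry could then be written with `writer.add_scalar(name, value, iter_num)`.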
Comment on lines 624 to 628
```python
    float,
    float,
    float,
    float,
]
```
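The excerpt above appears to be the tail of a tuple type annotation with one `float` per metric column, so new metrics such as `distillation_val_loss` and `ntp_val_loss` each need another positional entry. As an alternative illustration (not the code in this PR), a `typing.NamedTuple` gives each position a name, which makes such schema changes easier to keep in sync. The first two field names are hypothetical, and defaulting unlogged metrics to `None` is an assumption.

```python
from typing import NamedTuple, Optional


class MetricRow(NamedTuple):
    best_val_loss: float  # hypothetical pre-existing column
    train_loss: float  # hypothetical pre-existing column
    distillation_val_loss: Optional[float] = None  # metric named in this PR
    ntp_val_loss: Optional[float] = None  # optional baseline metric named in this PR


row = MetricRow(best_val_loss=1.8, train_loss=1.9, distillation_val_loss=0.42)
print(row.ntp_val_loss)  # None
print(row._fields)
# ('best_val_loss', 'train_loss', 'distillation_val_loss', 'ntp_val_loss')
```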