When the integration suite is run with TUNEOS_INTEGRATION_TESTS=1 against a freshly resolved dependency set, two long-standing integration tests fail because of upstream API drift (independent of #58):
test_train_step_runs_and_writes_adapter: SFTTrainer.__init__() got an unexpected keyword argument 'tokenizer'. Newer trl renamed tokenizer → processing_class and moved dataset_text_field/max_seq_length into SFTConfig. trainer/finetune.py passes the same removed kwargs, so this is a latent forward-compat issue in the trainer itself, not just a test problem.
test_perplexity_is_finite: compute_perplexity calls batch["input_ids"].to(model.device), but batch["input_ids"] is a Python list because the dataset is not set to torch format, so the call raises AttributeError.
CI does not catch these because the integration suite is gated off by default.
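The perplexity failure comes down to a missing list-to-tensor step before the .to(model.device) call. The sketch below illustrates one way to add that step; it is not the current code in trainer/metrics.py. The helper names pad_batch and to_device_tensors and the pad id of 0 are assumptions made for the example. If the eval set is a Hugging Face datasets.Dataset, calling with_format("torch") on it is likely the shorter route.

```python
from typing import Dict, List


def pad_batch(rows: List[Dict[str, List[int]]], pad_id: int = 0) -> Dict[str, List[List[int]]]:
    """Right-pad variable-length input_ids and build a matching attention mask."""
    width = max(len(r["input_ids"]) for r in rows)
    input_ids, attention_mask = [], []
    for r in rows:
        ids = list(r["input_ids"])
        gap = width - len(ids)
        input_ids.append(ids + [pad_id] * gap)
        attention_mask.append([1] * len(ids) + [0] * gap)
    return {"input_ids": input_ids, "attention_mask": attention_mask}


def to_device_tensors(batch: Dict[str, List[List[int]]], device) -> dict:
    """Convert the padded Python lists to LongTensors on `device`."""
    import torch  # imported lazily so pad_batch stays usable without torch

    return {k: torch.tensor(v, dtype=torch.long).to(device) for k, v in batch.items()}
```

pad_batch can be passed as collate_fn to a torch DataLoader, with to_device_tensors(batch, model.device) applied inside the evaluation loop, so that model inputs are tensors by the time they reach the forward pass.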
Task: pin/upgrade trl deliberately and migrate SFTTrainer construction to SFTConfig (or the supported kwargs), and set the eval dataset to torch format (or batch via a DataLoader) in trainer/metrics.py:compute_perplexity.
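While the trl pin is being decided, one possible stopgap for the "supported kwargs" route is to pick the keyword name from the installed signature. This is a sketch under assumptions, not the code in trainer/finetune.py: pick_kwarg is a hypothetical helper, and OldTrainer/NewTrainer are stand-ins for the two SFTTrainer signatures so the example runs without trl installed. It does not work for callables that only accept **kwargs.

```python
import inspect
from typing import Any, Callable, Dict, Sequence


def pick_kwarg(target: Callable, candidates: Sequence[str], value: Any) -> Dict[str, Any]:
    """Return {name: value} for the first candidate name that `target` accepts."""
    params = inspect.signature(target).parameters
    for name in candidates:
        if name in params:
            return {name: value}
    raise TypeError(f"{target!r} accepts none of {list(candidates)}")


# Stand-ins for the two API generations (hypothetical, for illustration only).
class OldTrainer:
    def __init__(self, model, tokenizer=None):
        self.tok = tokenizer


class NewTrainer:
    def __init__(self, model, processing_class=None):
        self.tok = processing_class


# Prefer the new name, fall back to the old one.
names = ["processing_class", "tokenizer"]
old = OldTrainer("model", **pick_kwarg(OldTrainer, names, "tok"))
new = NewTrainer("model", **pick_kwarg(NewTrainer, names, "tok"))
```

Against the real library the call would look like SFTTrainer(model=model, **pick_kwarg(SFTTrainer, names, tokenizer)). A deliberate pin plus a full move to SFTConfig remains the cleaner fix; the shim only keeps construction working across versions in the meantime.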