Skip to content

[codex] Update Torch and Lightning for Python 3.9#180

Open
jbohnslav wants to merge 2 commits into
masterfrom
codex/update-torch-lightning-py39
Open

[codex] Update Torch and Lightning for Python 3.9#180
jbohnslav wants to merge 2 commits into
masterfrom
codex/update-torch-lightning-py39

Conversation

@jbohnslav

Copy link
Copy Markdown
Owner

Summary

  • Bump Torch to the highest release that still supports Python 3.9: torch==2.8.0.
  • Bump PyTorch Lightning to the highest release that still supports Python 3.9: pytorch_lightning==2.6.0.
  • Move Linux Torch wheels from cu124 to cu126 and update the Modal GPU test image to CUDA 12.6.3.
  • Port training setup to Lightning 2 APIs for Trainer devices, Tuner usage, progress bars, and callback hook signatures.
  • Make TensorBoard an explicit dependency and update checkpoint loading for Torch 2.6+ weights_only behavior.
  • Remove the obsolete ReduceLROnPlateau(verbose=...) kwarg rejected by Torch 2.8.

Validation

  • uv lock --check --python 3.9 --no-python-downloads
  • uv run ruff check deepethogram/base.py deepethogram/callbacks.py deepethogram/losses.py deepethogram/schedulers.py deepethogram/utils.py modal_gpu_test.py
  • uv run pytest -> 28 passed, 5 deselected
  • uvx --from modal modal run --detach modal_gpu_test.py --write-junit pytest-gpu.xml -> 5 passed on CUDA 12.6.3 / Torch 2.8.0+cu126

Modal run: https://modal.com/apps/jbohnslav/main/ap-uM8UCZmZzvAZCKx2w6Dsgt

@claude

claude Bot commented Jul 3, 2026

Copy link
Copy Markdown

Claude finished @jbohnslav's task in 3m 14s —— View job


Review of PR #180 — Update Torch/Lightning for Python 3.9

  • Gather context (PR diff against correct merge-base 42a21c5^)
  • Read CLAUDE.md for repo guidelines (none present at root)
  • Review base.py (Lightning 2 Trainer/Tuner APIs)
  • Review callbacks.py (callback hook signatures)
  • Review losses.py / schedulers.py / utils.py
  • Review pyproject.toml / uv.lock dependency bumps
  • Review tests/test_callbacks.py
  • Post final review feedback

Summary

This is a clean, well-scoped dependency bump with a correct port to the Lightning 2 API. Verified each hunk against the actual PR diff (git diff 895154c HEAD, which matches the stated +414/-326 file stats).

Correctness checks — all look right:

  • deepethogram/base.py:40-43 — new trainer_device_kwargs() correctly replaces the removed gpus=[...] kwarg with accelerator/devices, and gracefully falls back to CPU when CUDA isn't available (an improvement over the old code, which would've hard-failed on CPU-only environments).
  • Tuner import from pytorch_lightning.tuner (replacing pl.tuner.tuning.Tuner) and use of trainer.tuner → a persisted local tuner variable — correct for Lightning 2, which removed Trainer.tuner.
  • TQDMProgressBar(refresh_rate=...) + enable_progress_bar=refresh_rate > 0 correctly replaces the removed progress_bar_refresh_rate kwarg (base.py:392-406).
  • callbacks.py hook signatures correctly match Lightning 2: dataloader_idx dropped from on_train_batch_start/end (single train loader assumption), kept with =0 default on validation/test hooks, on_keyboard_interrupton_exception with an isinstance(exception, KeyboardInterrupt) guard, and removal of the deprecated on_init_end/on_epoch_start/on_epoch_end hooks.
  • tests/test_callbacks.py adds solid coverage for the new on_exception behavior on MetricsCallback and CheckpointCallback (both the keyboard-interrupt path and the "ignore other exceptions" path).
  • torch.load(..., weights_only=False) in losses.py/utils.py correctly preserves pre-2.6 behavior for checkpoints that bundle non-tensor training metadata, now that Torch 2.6+ defaults weights_only=True.
  • Removed ReduceLROnPlateau(verbose=True) — correctly matches Torch 2.8 removing that kwarg. Only side effect is losing the automatic per-step LR-change log line, which is a minor, acceptable trade-off.
  • I confirmed there are no other stale torch.load call sites, gpus=[...] usages, or removed-hook usages left elsewhere in the codebase.

Minor nit (non-blocking):

  • deepethogram/base.py:285-286 — the comment above the tuning branch still says reload_dataloaders_every_epoch = True: a bit slower..., but the code (both before and after this PR) actually uses the newer reload_dataloaders_every_n_epochs kwarg. Stale terminology, purely cosmetic. Fix this →

Note on weights_only=False: this restores the pre-Torch-2.6 default rather than introducing new risk, but it's worth keeping in mind that it disables Torch's newer guard against unsafe unpickling for any pretrained-weight files sourced from outside the project (e.g. user-shared checkpoints). Not a regression from this PR — just flagging for awareness since Torch's changed default was a deliberate security hardening step.

No functional bugs found. Given the validation already performed (unit tests, ruff, and a real GPU run on Modal with CUDA 12.6.3 / Torch 2.8.0+cu126), this looks ready to merge.

@jbohnslav jbohnslav marked this pull request as ready for review July 4, 2026 01:26
@cursor

cursor Bot commented Jul 4, 2026

Copy link
Copy Markdown

Bugbot is not enabled for your account, so this pull request was not reviewed.

Enable Bugbot in the Cursor dashboard to get automatic reviews on future PRs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant