
Port to PyTorch 2.x / CUDA 12.x / Python 3.12 #53

Open
ilessiorobotflowlabs wants to merge 1 commit into NVlabs:main from ilessiorobotflowlabs:main

Conversation

@ilessiorobotflowlabs

Summary

Complete compatibility port of ODISE to the modern PyTorch ecosystem. All 42 files updated and validated end-to-end on an 8x L4 GPU server with zero errors.

Stack: PyTorch 2.10 · CUDA 12.8 · Python 3.12 · Pillow 12 · NumPy 2.x · Gradio 4.x · pytorch-lightning 2.x

Changes

PyTorch 2.x API migrations:

  • torch.cuda.amp.autocast → torch.amp.autocast('cuda') (10 files)
  • torch.cuda.amp.GradScaler → torch.amp.GradScaler('cuda')
  • torch._six.inf → math.inf
  • pkg_resources → importlib.resources
  • torch.load(..., weights_only=False) for legacy LDM checkpoints
  • torch.meshgrid(..., indexing='ij') to silence deprecation
  • use_reentrant=False in gradient checkpointing

CUDA C++ (Mask2Former deformable attention):

  • Tensor.data<T>() → Tensor.data_ptr<T>() (14 call sites; removed in PyTorch 2.x)
  • AT_ERROR → TORCH_CHECK(false, ...)
  • Removed deleted ATen/cuda/CUDAApplyUtils.cuh include
  • Added gpuAtomicAdd wrapper with BFloat16/Half specializations
  • Removed -D__CUDA_NO_HALF* build flags (no longer needed in PyTorch 2.x)

NumPy 2.x:

  • np.int → np.int64, np.bool → np.bool_
  • int() wrapping for np.linspace count args
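A before/after sketch of the NumPy 2.x items (the `np.int` and `np.bool` aliases were removed in NumPy 2.0, and `np.linspace` rejects a non-integer count):

```python
import numpy as np

# Old: np.zeros(5, dtype=np.int) -> AttributeError under NumPy 2.x
labels = np.zeros(5, dtype=np.int64)
mask = np.array([True, False], dtype=np.bool_)  # was np.bool

# A computed count like n_steps / 2 is a float, which np.linspace rejects,
# hence the int() wrapping:
n_steps = 10
xs = np.linspace(0.0, 1.0, int(n_steps / 2))
```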

Third-party compatibility:

  • pytorch_lightning.utilities.distributed → pytorch_lightning.utilities.rank_zero
  • PIL.Image.LINEAR → Image.BILINEAR (LINEAR removed in Pillow 10+)
  • Gradio 3.x → 4.x API migration in demo/app.py
  • Removed hard detectron2==0.6 pin in Mask2Former/setup.py
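The Pillow change can be handled with a small compatibility lookup (a sketch; `Image.LINEAR` was an alias of `BILINEAR` before its removal in Pillow 10):

```python
from PIL import Image

# Image.LINEAR is gone in Pillow 10+; Image.BILINEAR is the surviving name.
# getattr keeps the code working on both old and new Pillow versions.
resample = getattr(Image, "LINEAR", Image.BILINEAR)
thumb = Image.new("RGB", (64, 64)).resize((32, 32), resample=resample)
```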

Bug fixes found during port:

  • Fixed NameError on non-main DDP workers (writers variable unbound)
  • Fixed undeclared enable_visualizer key crashing OmegaConf struct mode
  • Fixed inverted autocast logic in msdeformattn.py (re-enabled AMP during inference instead of disabling)
  • Fixed = instead of += for demo_stuff_colors (mutated module-level constant)
  • Fixed bare except: catching KeyboardInterrupt in CUDA op fallback
  • Fixed file handle leak in default_setup
  • Fixed shell injection via $CXX in collect_env.py (shell=True → list args)
  • Fixed operator precedence bug in extract_features.py
  • Added bootstrap scripts for third-party submodule setup
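The bare-except fix in the CUDA op fallback follows the usual pattern: catch only `ImportError` so that `KeyboardInterrupt` and `SystemExit` propagate. A hypothetical sketch (the extension module name here is illustrative):

```python
def load_deform_attn_ext():
    """Try the compiled CUDA extension, falling back to the pure-PyTorch path."""
    try:
        import MultiScaleDeformableAttention as ext  # compiled extension (illustrative name)
        return ext
    except ImportError:  # was a bare `except:`, which also swallowed
        return None      # KeyboardInterrupt and SystemExit

using_fallback = load_deform_attn_ext() is None
```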

Validation

Tested on 8x NVIDIA L4 (CUDA 12.8, PyTorch 2.10, Python 3.12):

| Test | Status |
| --- | --- |
| All core imports | ✅ Pass |
| CUDA deformable attention (fp32) | ✅ Pass |
| CUDA deformable attention (fp16) | ✅ Pass |
| Config loading (LazyConfig) | ✅ Pass |
| LDM backbone (LatentDiffusion) | ✅ Pass |
| Demo app | ✅ Pass |
| train_net.py CLI | ✅ Pass |
| End-to-end panoptic segmentation inference | ✅ Pass |

[Image: ODISE inference output on PyTorch 2.10 + CUDA 12.8]

Motivation

ODISE is an excellent open-vocabulary panoptic segmentation model, but the original codebase targets PyTorch 1.x / CUDA 11.x, which makes it unusable on modern GPU infrastructure. This PR brings full compatibility with current-generation hardware and software, making ODISE accessible to researchers and practitioners on modern setups.

Ported by RobotFlow Labs 🤖

🤖 Generated with Claude Code

Complete compatibility port for modern stack:
- PyTorch 2.10+, CUDA 12.8, Python 3.12, Pillow 12, NumPy 2.x

Core changes:
- torch.cuda.amp.autocast → torch.amp.autocast('cuda') across all files
- torch.cuda.amp.GradScaler → torch.amp.GradScaler('cuda')
- torch._six.inf → math.inf
- pkg_resources → importlib.resources
- weights_only=False for legacy LDM checkpoints
- Deferred imports for optional deps (gradio, nltk)

CUDA C++ (Mask2Former deformable attention):
- Tensor.data<T>() → data_ptr<T>() (removed in PyTorch 2.x)
- AT_ERROR → TORCH_CHECK(false, ...)
- Removed deleted ATen/cuda/CUDAApplyUtils.cuh include
- Added gpuAtomicAdd wrapper with BFloat16/Half specializations
- Removed -D__CUDA_NO_HALF* flags for fp16 support
- use_reentrant=False in gradient checkpointing

Bug fixes found via code review:
- Fixed NameError on non-main DDP workers (writers variable)
- Fixed OmegaConf crash with undeclared enable_visualizer key
- Fixed inverted autocast logic in msdeformattn.py
- Fixed = instead of += for demo_stuff_colors (module global mutation)
- Fixed bare except: catching KeyboardInterrupt
- Fixed file handle leak in default_setup
- Fixed shell injection via $CXX in collect_env
- Fixed operator precedence bug in extract_features.py
- Added torch.meshgrid indexing='ij' to silence deprecation
- NumPy 2.x int casts for np.linspace throughout

Third-party:
- pytorch_lightning.utilities.distributed → .rank_zero
- PIL.Image.LINEAR → Image.BILINEAR
- Gradio 3.x → 4.x API migration in demo/app.py
- Removed detectron2 v0.6 hard pin in Mask2Former/setup.py

Validated: all imports, CUDA ops (fp32+fp16), config loading, LDM,
demo inference on 4 images — zero errors on 8xL4 GPU server.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>