
Port to PyTorch 2.x / CUDA 12.x / Python 3.12 #53

Open
ilessiorobotflowlabs wants to merge 1 commit into NVlabs:main from ilessiorobotflowlabs:main

Conversation

@ilessiorobotflowlabs

Summary

Complete compatibility port of ODISE to the modern PyTorch ecosystem. All 42 files updated and validated end-to-end on an 8x L4 GPU server with zero errors.

Stack: PyTorch 2.10 · CUDA 12.8 · Python 3.12 · Pillow 12 · NumPy 2.x · Gradio 4.x · pytorch-lightning 2.x

Changes

PyTorch 2.x API migrations:

  • torch.cuda.amp.autocast → torch.amp.autocast('cuda') (10 files)
  • torch.cuda.amp.GradScaler → torch.amp.GradScaler('cuda')
  • torch._six.inf → math.inf
  • pkg_resources → importlib.resources
  • torch.load(..., weights_only=False) for legacy LDM checkpoints
  • torch.meshgrid(..., indexing='ij') to silence deprecation
  • use_reentrant=False in gradient checkpointing

CUDA C++ (Mask2Former deformable attention):

  • Tensor.data<T>() → Tensor.data_ptr<T>() (14 call sites; removed in PyTorch 2.x)
  • AT_ERROR → TORCH_CHECK(false, ...)
  • Removed deleted ATen/cuda/CUDAApplyUtils.cuh include
  • Added gpuAtomicAdd wrapper with BFloat16/Half specializations
  • Removed -D__CUDA_NO_HALF* build flags (no longer needed in PyTorch 2.x)

NumPy 2.x:

  • np.int → np.int64, np.bool → np.bool_
  • int() wrapping for np.linspace count args
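A before/after sketch of the NumPy 2.x items (the `np.int` and `np.bool` aliases were removed in NumPy 2.0, and `np.linspace` rejects a non-integer count):

```python
import numpy as np

# Old: np.zeros(5, dtype=np.int) -> AttributeError under NumPy 2.x
labels = np.zeros(5, dtype=np.int64)
mask = np.array([True, False], dtype=np.bool_)  # was np.bool

# A computed count like n_steps / 2 is a float, which np.linspace rejects,
# hence the int() wrapping:
n_steps = 10
xs = np.linspace(0.0, 1.0, int(n_steps / 2))
```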

Third-party compatibility:

  • pytorch_lightning.utilities.distributed → pytorch_lightning.utilities.rank_zero
  • PIL.Image.LINEAR → Image.BILINEAR (LINEAR removed in Pillow 10+)
  • Gradio 3.x → 4.x API migration in demo/app.py
  • Removed hard detectron2==0.6 pin in Mask2Former/setup.py
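The Pillow change can be handled with a small compatibility lookup (a sketch; `Image.LINEAR` was an alias of `BILINEAR` before its removal in Pillow 10):

```python
from PIL import Image

# Image.LINEAR is gone in Pillow 10+; Image.BILINEAR is the surviving name.
# getattr keeps the code working on both old and new Pillow versions.
resample = getattr(Image, "LINEAR", Image.BILINEAR)
thumb = Image.new("RGB", (64, 64)).resize((32, 32), resample=resample)
```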

Bug fixes found during port:

  • Fixed NameError on non-main DDP workers (writers variable unbound)
  • Fixed undeclared enable_visualizer key crashing OmegaConf struct mode
  • Fixed inverted autocast logic in msdeformattn.py (re-enabled AMP during inference instead of disabling)
  • Fixed = instead of += for demo_stuff_colors (mutated module-level constant)
  • Fixed bare except: catching KeyboardInterrupt in CUDA op fallback
  • Fixed file handle leak in default_setup
  • Fixed shell injection via $CXX in collect_env.py (shell=True → list args)
  • Fixed operator precedence bug in extract_features.py
  • Added bootstrap scripts for third-party submodule setup
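The bare-except fix in the CUDA op fallback follows the usual pattern: catch only `ImportError` so that `KeyboardInterrupt` and `SystemExit` propagate. A hypothetical sketch (the extension module name here is illustrative):

```python
def load_deform_attn_ext():
    """Try the compiled CUDA extension, falling back to the pure-PyTorch path."""
    try:
        import MultiScaleDeformableAttention as ext  # compiled extension (illustrative name)
        return ext
    except ImportError:  # was a bare `except:`, which also swallowed
        return None      # KeyboardInterrupt and SystemExit

using_fallback = load_deform_attn_ext() is None
```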

Validation

Tested on 8x NVIDIA L4 (CUDA 12.8, PyTorch 2.10, Python 3.12):

| Test | Status |
| --- | --- |
| All core imports | ✅ Pass |
| CUDA deformable attention (fp32) | ✅ Pass |
| CUDA deformable attention (fp16) | ✅ Pass |
| Config loading (LazyConfig) | ✅ Pass |
| LDM backbone (LatentDiffusion) | ✅ Pass |
| Demo app | ✅ Pass |
| train_net.py CLI | ✅ Pass |
| End-to-end panoptic segmentation inference | ✅ Pass |

[Image: ODISE inference output on PyTorch 2.10 + CUDA 12.8]

Motivation

ODISE is an excellent open-vocabulary panoptic segmentation model, but the original codebase targets PyTorch 1.x / CUDA 11.x, which makes it unusable on modern GPU infrastructure. This PR brings full compatibility with current-generation hardware and software, making ODISE accessible to researchers and practitioners on modern setups.

Ported by RobotFlow Labs 🤖

🤖 Generated with Claude Code

Complete compatibility port for modern stack:
- PyTorch 2.10+, CUDA 12.8, Python 3.12, Pillow 12, NumPy 2.x

Core changes:
- torch.cuda.amp.autocast → torch.amp.autocast('cuda') across all files
- torch.cuda.amp.GradScaler → torch.amp.GradScaler('cuda')
- torch._six.inf → math.inf
- pkg_resources → importlib.resources
- weights_only=False for legacy LDM checkpoints
- Deferred imports for optional deps (gradio, nltk)

CUDA C++ (Mask2Former deformable attention):
- Tensor.data<T>() → data_ptr<T>() (removed in PyTorch 2.x)
- AT_ERROR → TORCH_CHECK(false, ...)
- Removed deleted ATen/cuda/CUDAApplyUtils.cuh include
- Added gpuAtomicAdd wrapper with BFloat16/Half specializations
- Removed -D__CUDA_NO_HALF* flags for fp16 support
- use_reentrant=False in gradient checkpointing

Bug fixes found via code review:
- Fixed NameError on non-main DDP workers (writers variable)
- Fixed OmegaConf crash with undeclared enable_visualizer key
- Fixed inverted autocast logic in msdeformattn.py
- Fixed = instead of += for demo_stuff_colors (module global mutation)
- Fixed bare except: catching KeyboardInterrupt
- Fixed file handle leak in default_setup
- Fixed shell injection via $CXX in collect_env
- Fixed operator precedence bug in extract_features.py
- Added torch.meshgrid indexing='ij' to silence deprecation
- NumPy 2.x int casts for np.linspace throughout

Third-party:
- pytorch_lightning.utilities.distributed → .rank_zero
- PIL.Image.LINEAR → Image.BILINEAR
- Gradio 3.x → 4.x API migration in demo/app.py
- Removed detectron2 v0.6 hard pin in Mask2Former/setup.py

Validated: all imports, CUDA ops (fp32+fp16), config loading, LDM,
demo inference on 4 images — zero errors on 8xL4 GPU server.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>