Build wheels for AMD GPU with ROCm #1356
Co-Authored-By: Claude <noreply@anthropic.com>
- Cast cudaDeviceSynchronize()/hipDeviceSynchronize() to (void) in K2_CUDA_SAFE_CALL to suppress -Wunused-result warnings on HIP builds
- Add ROCm version, HIP version, and kWithHip to version.h.in
- Detect ROCM_VERSION from hip_VERSION cmake package or env var
- Detect TORCH_HIP_VERSION from torch.version.hip

Co-Authored-By: Claude <noreply@anthropic.com>
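The detection order in this commit (CMake's hip_VERSION first, then an environment variable; torch.version.hip for the PyTorch side) can be mirrored in Python to check what a given environment would report. This is a hypothetical sketch, not the PR's CMake logic: the function names are invented, and the environment variable name `ROCM_VERSION` is an assumption since the commit only says "env var". `torch.version.hip` is a real attribute that holds a version string on ROCm builds of PyTorch and `None` otherwise.

```python
import os
from typing import Mapping, Optional


def detect_rocm_version(cmake_hip_version: Optional[str] = None,
                        env: Mapping[str, str] = os.environ,
                        env_var: str = "ROCM_VERSION") -> Optional[str]:
    """Hypothetical mirror of the described lookup order."""
    # 1. Prefer the version reported by CMake's hip package (hip_VERSION).
    if cmake_hip_version:
        return cmake_hip_version
    # 2. Fall back to an environment variable (name is an assumption).
    value = env.get(env_var, "").strip()
    return value or None


def detect_torch_hip_version() -> Optional[str]:
    """Return torch.version.hip, or None if torch is absent or not a ROCm build."""
    try:
        import torch
    except ImportError:
        return None
    # torch.version.hip is None on CUDA and CPU-only builds of PyTorch.
    return getattr(torch.version, "hip", None)
```

Returning `None` rather than an empty string keeps "not found" distinct from a real version, which matters for callers such as the get_version.py code discussed in the review below.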
- version.cu: expose rocm_version, torch_hip_version, with_hip
- version.py: print ROCm version, HIP version, and with_hip flag

Co-Authored-By: Claude <noreply@anthropic.com>
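Because these three fields are new, a script that inspects them may also run against older or CUDA-only k2 builds that lack them. A minimal sketch of a tolerant reader, assuming only the attribute names listed in this commit; how k2 actually exposes its compiled version module is not shown here, so the function takes any object, and both helper names are hypothetical.

```python
from types import SimpleNamespace
from typing import Any, Dict


def hip_build_info(ext: Any) -> Dict[str, Any]:
    """Collect HIP-related fields, defaulting when a build predates them."""
    return {
        # getattr with a default avoids AttributeError on older builds.
        "rocm_version": getattr(ext, "rocm_version", None),
        "torch_hip_version": getattr(ext, "torch_hip_version", None),
        "with_hip": bool(getattr(ext, "with_hip", False)),
    }


def format_hip_build_info(info: Dict[str, Any]) -> str:
    """Render the fields one per line, similar in spirit to version.py."""
    return "\n".join([
        f"ROCm version: {info['rocm_version']}",
        f"HIP version: {info['torch_hip_version']}",
        f"with_hip: {info['with_hip']}",
    ])


if __name__ == "__main__":
    # Stand-in object; a real build would supply its own module here.
    fake = SimpleNamespace(rocm_version="7.2", torch_hip_version="7.2",
                           with_hip=True)
    print(format_hip_build_info(hip_build_info(fake)))
```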
The hipcub functions (e.g. DeviceScan::ExclusiveScan) are [[nodiscard]]. Cast the whole expression to (void) to suppress the warning.

Co-Authored-By: Claude <noreply@anthropic.com>
#pragma unroll(N) with parentheses triggers a -Wcuda-compat warning. Use #pragma unroll N instead.

Co-Authored-By: Claude <noreply@anthropic.com>
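If many kernels need this change, the rewrite is mechanical enough to script. A hypothetical sketch (not necessarily how this commit was produced) that turns `#pragma unroll(N)` into `#pragma unroll N` for a simple token `N`; parenthesized expressions containing spaces, such as `unroll(N + 1)`, are deliberately left alone for manual review.

```python
import re

# "#pragma unroll(N)" with optional spaces; N must be a single token
# (an integer literal or an identifier such as a constexpr name).
_PAREN_UNROLL = re.compile(r"#pragma\s+unroll\s*\(\s*([^)\s]+)\s*\)")


def fix_unroll_pragmas(source: str) -> str:
    """Rewrite parenthesized unroll counts to the space-separated form."""
    return _PAREN_UNROLL.sub(r"#pragma unroll \1", source)


if __name__ == "__main__":
    kernel = "#pragma unroll(4)\nfor (int i = 0; i < 4; ++i) acc += x[i];\n"
    print(fix_unroll_pragmas(kernel))
```

Bare `#pragma unroll` and the already-correct `#pragma unroll 8` do not match the pattern, so running the script twice is harmless.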
rocprim/hipcub requires iterator operators (operator+, operator[], operator()) to be callable from __host__ context during template instantiation. Change __device__ __forceinline__ to K2_CUDA_HOSTDEV on RowSplitsDiff, HashInputIterator, PairInputIterator, and HashCombineOp.

Co-Authored-By: Claude <noreply@anthropic.com>
Code Review
This pull request introduces ROCm/HIP support to the k2 library, updating build configurations, documentation, versioning scripts, and CI workflows to support AMD GPUs. It also refactors CUDA kernels and macros for compatibility with HIP. The review feedback highlights a potential AttributeError in get_version.py if the ROCm version cannot be resolved, and recommends pinning the libhipcxx repository to a specific tag in the build script to ensure reproducible builds.
    rocm_version = get_rocm_version()
    # Keep only major.minor (e.g., 7.1.52802 -> 7.1)
    rocm_version = '.'.join(rocm_version.split('.')[:2])
If get_rocm_version() returns None (which can happen if is_rocm() is True due to K2_WITH_HIP=ON in K2_CMAKE_ARGS but the ROCm version cannot be detected), calling rocm_version.split('.') will raise an AttributeError: 'NoneType' object has no attribute 'split'. We should handle the case where rocm_version is None to prevent build crashes.
Suggested change:

    rocm_version = get_rocm_version()
    if rocm_version:
        # Keep only major.minor (e.g., 7.1.52802 -> 7.1)
        rocm_version = '.'.join(rocm_version.split('.')[:2])
    else:
        rocm_version = 'unknown'
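The reviewer's fix can be pulled out into a small helper so the None case is testable on its own. A minimal sketch; the name `rocm_major_minor` is hypothetical, and the `'unknown'` fallback is taken from the suggestion above.

```python
from typing import Optional


def rocm_major_minor(version: Optional[str], fallback: str = "unknown") -> str:
    """Reduce e.g. '7.1.52802' to '7.1'; map a missing version to `fallback`.

    Calling .split() directly on None would raise AttributeError, which is
    the crash the review points out.
    """
    if not version:
        return fallback
    # Keep at most the first two dot-separated components.
    return ".".join(version.split(".")[:2])
```

Whether a placeholder such as `'unknown'` or an explicit error is the better outcome depends on how get_version.py uses the value downstream; a hard failure may be preferable if the string ends up in a published wheel version.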
    # Install libhipcxx (provides <cuda/std/*> headers needed by k2's HIP build)
    echo "Installing libhipcxx..."
    git clone --depth 1 https://github.com/ROCm/libhipcxx.git /tmp/libhipcxx
Cloning the libhipcxx repository without specifying a tag or commit hash can lead to non-reproducible builds if upstream changes break compatibility. It is highly recommended to pin the repository to a specific stable release tag or commit hash.
Suggested change:

    git clone --depth 1 --branch rocm-6.3.0 https://github.com/ROCm/libhipcxx.git /tmp/libhipcxx
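The review allows pinning to either a tag or a commit hash, and the two need different commands: `git clone --branch` accepts branch and tag names but not a raw commit SHA. A hypothetical sketch that builds the right command sequence for each case. The helper name is invented, fetching a single commit by SHA relies on the server permitting it, and which tag matches the ROCm release the wheels target (the test report below used ROCm 7.2.1) is not settled here.

```python
import re
import shlex
from typing import List

_FULL_SHA = re.compile(r"[0-9a-f]{40}")


def pinned_clone_cmds(url: str, ref: str, dest: str) -> List[List[str]]:
    """Return git commands that check out `ref` (tag, branch, or full SHA)."""
    if _FULL_SHA.fullmatch(ref):
        # A commit hash cannot go to `clone --branch`: fetch just that
        # commit into an empty repository, then check it out.
        return [
            ["git", "init", dest],
            ["git", "-C", dest, "remote", "add", "origin", url],
            ["git", "-C", dest, "fetch", "--depth", "1", "origin", ref],
            ["git", "-C", dest, "checkout", "FETCH_HEAD"],
        ]
    # Tags and branches can be cloned shallowly in one step.
    return [["git", "clone", "--depth", "1", "--branch", ref, url, dest]]


if __name__ == "__main__":
    for cmd in pinned_clone_cmds("https://github.com/ROCm/libhipcxx.git",
                                 "rocm-6.3.0", "/tmp/libhipcxx"):
        print(shlex.join(cmd))
```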
Tested on an AMD Instinct MI250X (gfx90a), ROCm 7.2.1, Python 3.10, torch 2.12.1+rocm7.2. The wheel installs and runs correctly:

Two notes:

1. The documented pip command fails. This, from the installation page:

   errors with

2. A small FYI on the arch list (not a problem, just an option). The arch selection in

   If it's ever useful to have the list stay in lockstep with the torch build the wheel is paired against:

       HIP_ARCH=$(python -c "import torch; print(';'.join(a for a in torch.cuda.get_arch_list() if a.startswith('gfx')))")
       cmake -DCMAKE_HIP_ARCHITECTURES="$HIP_ARCH" ...

   For reference, on this build the two lists differ only by
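The torch-derived arch list from the one-liner above can be written as a small script, together with a helper for comparing it against the hard-coded list. A sketch under assumptions: the function names are hypothetical, and `torch.cuda.get_arch_list()` (a real PyTorch call) returns gfx targets on ROCm builds and an empty list when no GPU backend is available.

```python
from typing import Iterable, List, Tuple


def hip_arch_string(arch_list: Iterable[str]) -> str:
    """Keep only AMD gfx targets, joined for CMAKE_HIP_ARCHITECTURES."""
    return ";".join(a for a in arch_list if a.startswith("gfx"))


def arch_list_diff(a: Iterable[str],
                   b: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (only in a, only in b), each sorted, to compare two arch lists."""
    sa, sb = set(a), set(b)
    return sorted(sa - sb), sorted(sb - sa)


if __name__ == "__main__":
    try:
        import torch
        archs = torch.cuda.get_arch_list()
    except ImportError:
        archs = []
    print(hip_arch_string(archs))
```

Splitting the filter out of the shell one-liner keeps the behavior the same while making it easy to test, and `arch_list_diff` gives the kind of "differ only by" comparison mentioned above.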
See also #1353
Installation doc: https://k2-fsa.github.io/k2/installation/from_wheels.html#linux-rocm-example
See https://k2-fsa.github.io/k2/rocm.html
cc @jeffdaily