[WIP] Enable hlo lit tests for gfx1250#857
Conversation
| "a100_pcie_80", | ||
| "a6000", | ||
| "b200", | ||
| "gfx1250", |
9a45087 to
d425382
Compare
| // RUN: %if IS_ROCM %{ hlo-opt %s --platform=gpu --stage=llvm-before-optimizations --xla_gpu_target_config_filename=%S/../target_config/specs/gfx1250.txtpb | FileCheck %s --check-prefixes=CHECK-TDM %} | ||
| // RUN: %if IS_ROCM %{ hlo-opt %s --platform=gpu --stage=llvm-before-optimizations --xla_gpu_target_config_filename=%S/../target_config/specs/mi200.txtpb | FileCheck %s --check-prefixes=CHECK-NOTDM %} |
There was a problem hiding this comment.
nit: Both RUN lines are guarded by %if IS_ROCM, which means this test is a no-op on CUDA builds — it will be scheduled (and vacuously pass) on all seven CUDA targets (a100, a6000, b200, h100, p100, v100) since it's not in their disabled_on_gpus lists.
Consider either:
- Adding the CUDA GPUs to
disabled_on_gpusfor this test, or - Adding
%if !IS_ROCMRUN lines that validate something on CUDA (similar to howdot_bf16.hlohandles both paths).
There was a problem hiding this comment.
Thanks, addressed this
| // CHECK-NOTDM-NOT: @llvm.amdgcn.tensor.load.to.lds | ||
| // CHECK-NOTDM-NOT: @llvm.amdgcn.s.wait.tensorcnt | ||
| // CHECK-NOTDM-NOT: @llvm.amdgcn.tensor.store.from.lds | ||
| // CHECK-NOTDM: @llvm.amdgcn.raw.ptr.buffer.load |
There was a problem hiding this comment.
nit: The CHECK-NOTDM-NOT directives only check for the absence of TDM intrinsics up to the final positive CHECK-NOTDM: @llvm.amdgcn.raw.ptr.buffer.load match. If TDM intrinsics appeared after the buffer load in the output, they wouldn't be caught.
Adding a CHECK-NOTDM-NOT: @llvm.amdgcn.raw.ptr.buffer directive after line 15 would make the negative check cover the rest of the output too.
| shared_memory_per_block: 327680 | ||
| shared_memory_per_block_optin: 327680 | ||
| shared_memory_per_core: 327680 |
There was a problem hiding this comment.
The TODO is noted — just flagging the downstream risk: shared_memory_per_block, shared_memory_per_block_optin, and shared_memory_per_core are all set to 327680 (320KB). On NVIDIA GPUs these form a tiered model (per_block < per_block_optin <= per_core). If gfx1250 has a similar tiered model, setting all three equal could cause the Triton codegen to request block-level shared memory beyond what the hardware supports without opt-in. Worth verifying once real hardware specs are available.
Claude Review SummaryThis PR adds gfx1250 support to the HLO lit test infrastructure — new GPU target config spec, a TDM elementwise lit test, and updates to Bazel build rules and Key observations:
3 inline comments posted with details. |
d621bc8 to
e975b2f
Compare
PiperOrigin-RevId: 917794602
Imported from GitHub PR openxla#42500 📝 Summary of Changes Clean-up rocm_configre 🎯 Justification Removing dead code not relevant to current implementation 🚀 Kind of Contribution Please remove what does not apply: ♻️ Cleanup, 📊 Benchmark (for Performance Improvements) Not relevant 🧪 Unit Tests: Not relevant 🧪 Execution Tests: Not relevant Copybara import of the project: -- 122d8d3 by Alexandros Theodoridis <atheodor@amd.com>: Clean-up rocm_configure Merging this change closes openxla#42500 COPYBARA_INTEGRATE_REVIEW=openxla#42500 from ROCm:clean_up_rocm_configure 122d8d3 PiperOrigin-RevId: 917803372
…t mode. Imported from GitHub PR openxla#42185 📝 Summary of Changes The JAX ROCm Bazel presubmit workflow was originally using 64-bit mode. For now, it is being set to 32-bit mode for broader testing. Eventually, both will be enabled for full testing. The only change is to the ROCm CI workflow file (`.github/workflows/rocm_ci.yml`). The flag `JAX_ENABLE_X64` is now set to 0 instead of 1. Please see the associated PR on the JAX side for reference: jax-ml/jax#37475 🎯 Justification This will help the ROCm JAX tests in XLA align with what is being tested in JAX itself on the ROCm side. 🚀 Kind of Contribution 🧪 Tests Copybara import of the project: -- 8b599a9 by tsrw2048 <239799652+tsrw2048@users.noreply.github.com>: [ROCm] Move JAX ROCm Bazel presubmit check to 32-bit mode. The JAX ROCm Bazel presubmit workflow was originally using 64-bit mode. For now, it is being set to 32-bit mode for broader testing. Eventually, both will be enabled for full testing. Merging this change closes openxla#42185 COPYBARA_INTEGRATE_REVIEW=openxla#42185 from ROCm:rocm-move-jax-ci-32bit 8b599a9 PiperOrigin-RevId: 917813152
Minor additions and clarifications. PiperOrigin-RevId: 917816750
Adds a gfx1250 target spec to xla/backends/gpu/target_config/specs and wires it into the hlo_lit_tests suite in xla/backends/gpu/tests/BUILD. Generalizes the existing mi200 special-case in xla/lit.bzl into a small rocm_gpus list so rocm-only tag apply to any ROCm target, not just mi200. Add hlo lit test for Triton TDM intrinsic lowering xla/backends/gpu/tests/triton_tdm_elemwise.hlo
e975b2f to
6ec42ef
Compare
Motivation
[Do not merge] To get review
Submission Checklist