Skip to content

[WIP] Enable hlo lit tests for gfx1250#857

Open
nurmukhametov wants to merge 5 commits into
mainfrom
anurmukh/hlo-tests-gfx1250
Open

[WIP] Enable hlo lit tests for gfx1250#857
nurmukhametov wants to merge 5 commits into
mainfrom
anurmukh/hlo-tests-gfx1250

Conversation

@nurmukhametov
Copy link
Copy Markdown
Member

Motivation

[Do not merge] To get review

Submission Checklist

"a100_pcie_80",
"a6000",
"b200",
"gfx1250",
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 :)

@nurmukhametov nurmukhametov force-pushed the anurmukh/hlo-tests-gfx1250 branch 3 times, most recently from 9a45087 to d425382 Compare May 15, 2026 20:32
@nurmukhametov nurmukhametov added the claude-review Request a Claude AI code review for this PR label May 15, 2026
Comment on lines +1 to +2
// RUN: %if IS_ROCM %{ hlo-opt %s --platform=gpu --stage=llvm-before-optimizations --xla_gpu_target_config_filename=%S/../target_config/specs/gfx1250.txtpb | FileCheck %s --check-prefixes=CHECK-TDM %}
// RUN: %if IS_ROCM %{ hlo-opt %s --platform=gpu --stage=llvm-before-optimizations --xla_gpu_target_config_filename=%S/../target_config/specs/mi200.txtpb | FileCheck %s --check-prefixes=CHECK-NOTDM %}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Both RUN lines are guarded by %if IS_ROCM, which means this test is a no-op on CUDA builds — it will be scheduled (and vacuously pass) on all seven CUDA targets (a100, a6000, b200, h100, p100, v100) since it's not in their disabled_on_gpus lists.

Consider either:

  1. Adding the CUDA GPUs to disabled_on_gpus for this test, or
  2. Adding %if !IS_ROCM RUN lines that validate something on CUDA (similar to how dot_bf16.hlo handles both paths).

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, addressed this

Comment on lines +12 to +15
// CHECK-NOTDM-NOT: @llvm.amdgcn.tensor.load.to.lds
// CHECK-NOTDM-NOT: @llvm.amdgcn.s.wait.tensorcnt
// CHECK-NOTDM-NOT: @llvm.amdgcn.tensor.store.from.lds
// CHECK-NOTDM: @llvm.amdgcn.raw.ptr.buffer.load
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: The CHECK-NOTDM-NOT directives only check for the absence of TDM intrinsics up to the final positive CHECK-NOTDM: @llvm.amdgcn.raw.ptr.buffer.load match. If TDM intrinsics appeared after the buffer load in the output, they wouldn't be caught.

Adding a CHECK-NOTDM-NOT: @llvm.amdgcn.raw.ptr.buffer directive after line 15 would make the negative check cover the rest of the output too.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

Comment on lines +21 to +23
shared_memory_per_block: 327680
shared_memory_per_block_optin: 327680
shared_memory_per_core: 327680
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The TODO is noted — just flagging the downstream risk: shared_memory_per_block, shared_memory_per_block_optin, and shared_memory_per_core are all set to 327680 (320KB). On NVIDIA GPUs these form a tiered model (per_block < per_block_optin <= per_core). If gfx1250 has a similar tiered model, setting all three equal could cause the Triton codegen to request block-level shared memory beyond what the hardware supports without opt-in. Worth verifying once real hardware specs are available.

@claude
Copy link
Copy Markdown

claude Bot commented May 15, 2026

Claude Review Summary

This PR adds gfx1250 support to the HLO lit test infrastructure — new GPU target config spec, a TDM elementwise lit test, and updates to Bazel build rules and lit.bzl.

Key observations:

  • The new triton_tdm_elemwise.hlo test is ROCm-only but will vacuously pass on CUDA targets — consider disabling it for CUDA GPUs or adding CUDA-specific checks.
  • The CHECK-NOTDM-NOT pattern has a minor coverage gap after its last positive check.
  • The shared memory placeholder values (all equal at 320KB) are acknowledged by a TODO — worth verifying against real hardware specs when available.
  • The disabled_on_gpus list for gfx1250 is clean (only references existing files), unlike some stale entries in mi200's list.

3 inline comments posted with details.

@github-actions github-actions Bot removed the claude-review Request a Claude AI code review for this PR label May 15, 2026
@nurmukhametov nurmukhametov force-pushed the anurmukh/hlo-tests-gfx1250 branch 2 times, most recently from d621bc8 to e975b2f Compare May 15, 2026 21:14
dimitar-asenov and others added 5 commits May 19, 2026 06:21
Imported from GitHub PR openxla#42500

📝 Summary of Changes
Clean-up rocm_configre

🎯 Justification
Removing dead code not relevant to current implementation

🚀 Kind of Contribution
Please remove what does not apply:  ♻️ Cleanup,

📊 Benchmark (for Performance Improvements)
Not relevant

🧪 Unit Tests:
Not relevant

🧪 Execution Tests:
Not relevant

Copybara import of the project:

--
122d8d3 by Alexandros Theodoridis <atheodor@amd.com>:

Clean-up rocm_configure

Merging this change closes openxla#42500

COPYBARA_INTEGRATE_REVIEW=openxla#42500 from ROCm:clean_up_rocm_configure 122d8d3
PiperOrigin-RevId: 917803372
…t mode.

Imported from GitHub PR openxla#42185

📝 Summary of Changes
The JAX ROCm Bazel presubmit workflow was originally using 64-bit mode. For now, it is being set to 32-bit mode for broader testing. Eventually, both will be enabled for full testing.

The only change is to the ROCm CI workflow file (`.github/workflows/rocm_ci.yml`). The flag `JAX_ENABLE_X64` is now set to 0 instead of 1.

Please see the associated PR on the JAX side for reference: jax-ml/jax#37475

🎯 Justification
This will help the ROCm JAX tests in XLA align with what is being tested in JAX itself on the ROCm side.

🚀 Kind of Contribution
🧪 Tests

Copybara import of the project:

--
8b599a9 by tsrw2048 <239799652+tsrw2048@users.noreply.github.com>:

[ROCm] Move JAX ROCm Bazel presubmit check to 32-bit mode.

The JAX ROCm Bazel presubmit workflow was originally using 64-bit mode.
For now, it is being set to 32-bit mode for broader testing.
Eventually, both will be enabled for full testing.

Merging this change closes openxla#42185

COPYBARA_INTEGRATE_REVIEW=openxla#42185 from ROCm:rocm-move-jax-ci-32bit 8b599a9
PiperOrigin-RevId: 917813152
Minor additions and clarifications.

PiperOrigin-RevId: 917816750
Adds a gfx1250 target spec to xla/backends/gpu/target_config/specs and
wires it into the hlo_lit_tests suite in xla/backends/gpu/tests/BUILD.
Generalizes the existing mi200 special-case in xla/lit.bzl into a small
rocm_gpus list so rocm-only tag apply to any ROCm target, not just
mi200.

Add hlo lit test for Triton TDM intrinsic lowering
xla/backends/gpu/tests/triton_tdm_elemwise.hlo
@nurmukhametov nurmukhametov force-pushed the anurmukh/hlo-tests-gfx1250 branch from e975b2f to 6ec42ef Compare May 19, 2026 14:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants