Skip to content

update gkm-extract with unified (NVIDIA/AMD) support#145

Open
Billy99 wants to merge 1 commit into
redhat-et:mainfrom
Billy99:billy99-gkm-extract-unify-image
Open

update gkm-extract with unified (NVIDIA/AMD) support#145
Billy99 wants to merge 1 commit into
redhat-et:mainfrom
Billy99:billy99-gkm-extract-unify-image

Conversation

@Billy99

@Billy99 Billy99 commented Jun 29, 2026

Copy link
Copy Markdown
Collaborator

Follow #138 by adding NVIDIA support to the gkm-extract image, AMD support already there.

Summary by CodeRabbit

  • New Features
    • Expanded the container runtime to support both NVIDIA and AMD GPU setups with unified GPU readiness.
    • Improved AMD/ROCm tooling availability and GPU/SMI helpers for smoother detection.
    • Updated container storage configuration (overlay via FUSE) to improve startup compatibility.
  • Maintenance
    • Refreshed the build and runtime base images and dependency set for more consistent, modern container builds.
  • Chores
    • Updated image metadata labels to reflect the unified NVIDIA+AMD GPU support and ROCm version.

@coderabbitai

coderabbitai Bot commented Jun 29, 2026

Copy link
Copy Markdown

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Enterprise

Run ID: c66c64d4-5e13-44d9-8b0c-c9ae23165067

📥 Commits

Reviewing files that changed from the base of the PR and between 039aac0 and 6271b63.

📒 Files selected for processing (1)
  • Containerfile.gkm-extract
🚧 Files skipped from review as they are similar to previous changes (1)
  • Containerfile.gkm-extract

📝 Walkthrough

Walkthrough

Containerfile.gkm-extract switches the runtime image to a unified NVIDIA and AMD GPU base, installs ROCm and container runtime dependencies, configures overlay storage, and updates the final image metadata.

Changes

Unified GPU Runtime Image

Layer / File(s) Summary
Base image and runtime args
Containerfile.gkm-extract
The deps stage base image changes, and the runtime stage switches to the unified GPU base with ROCm and AMDGPU build arguments.
Runtime packages and ROCm install
Containerfile.gkm-extract
The runtime installs base packages, downloads and installs amdgpu-install, adds ROCm SMI tools and libdrm2, and creates the amd-smi and rocm-smi symlinks.
Storage config, binary, and labels
Containerfile.gkm-extract
The image writes container storage configuration, copies in gkm-extract, and updates labels for unified GPU support and ROCm version.

Estimated code review effort: 3 (Moderate) | ~20 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly reflects the main change: making gkm-extract support both NVIDIA and AMD GPUs in a unified image.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
Containerfile.gkm-extract (2)

95-95: 🚀 Performance & Scalability | 🔵 Trivial | ⚡ Quick win

Remove the duplicate binary copy layer.

Line 103 copies the same builder output to /gkm-extract again, so line 95 adds a redundant image layer with the same binary bytes.

Proposed fix
-COPY --from=builder /workspace/bin/gkm-extract /gkm-extract
-
 LABEL description="MCV Unified - includes both NVIDIA (CUDA/NVML) and AMD (ROCm) GPU support for auto-detection"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Containerfile.gkm-extract` at line 95, The Containerfile currently copies the
gkm-extract binary from the builder stage twice, creating a redundant image
layer with the same bytes. Remove the duplicate COPY for /gkm-extract so the
final image only includes a single copy of the binary, and keep the remaining
build steps in the Containerfile.gkm-extract stage consistent with the intended
final artifact location.

48-50: 🩺 Stability & Availability | 🔵 Trivial | ⚡ Quick win

Single-source the ROCm version used by install paths and metadata.

ROCM_VERSION, OPT_ROCM_VERSION, and the hard-coded rocm-version label can drift when build args are overridden, leaving broken SMI symlinks or incorrect image metadata. Use one version arg and fail fast if the expected binaries are missing.

Proposed fix
 ARG ROCM_VERSION=7.0.1
 ARG AMDGPU_VERSION=7.0.1.70001
-ARG OPT_ROCM_VERSION=7.0.1
@@
-    ln -s /opt/rocm-${OPT_ROCM_VERSION}/bin/amd-smi /usr/bin/amd-smi && \
-    ln -s /opt/rocm-${OPT_ROCM_VERSION}/bin/rocm-smi /usr/bin/rocm-smi && \
+    test -x /opt/rocm-${ROCM_VERSION}/bin/amd-smi && \
+    test -x /opt/rocm-${ROCM_VERSION}/bin/rocm-smi && \
+    ln -sf /opt/rocm-${ROCM_VERSION}/bin/amd-smi /usr/bin/amd-smi && \
+    ln -sf /opt/rocm-${ROCM_VERSION}/bin/rocm-smi /usr/bin/rocm-smi && \
@@
-LABEL rocm-version="7.0.1"
+LABEL rocm-version="${ROCM_VERSION}"

Also applies to: 85-86, 101-101

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Containerfile.gkm-extract` around lines 48 - 50, The ROCm version is
currently duplicated across ROCM_VERSION, OPT_ROCM_VERSION, and the hard-coded
rocm-version label, which can drift when build args change. Update the
Containerfile to single-source the version from one build arg and reuse it
everywhere the install paths, symlinks, and metadata are defined. In the ROCm
install/setup steps and the label assignment, reference the same version symbol
consistently, and add a fail-fast check in the relevant setup stage to ensure
the expected ROCm binaries are present before continuing.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@Containerfile.gkm-extract`:
- Around line 75-77: The ROCm installer download in the Containerfile still
points to the ubuntu/jammy channel, which can mismatch the ubuntu24.04 base
image. Update the download URL used in the AMDGPU install RUN step to the Noble
(ubuntu/noble) path so the installer matches this image and ROCm/GPU detection
can work correctly.

---

Nitpick comments:
In `@Containerfile.gkm-extract`:
- Line 95: The Containerfile currently copies the gkm-extract binary from the
builder stage twice, creating a redundant image layer with the same bytes.
Remove the duplicate COPY for /gkm-extract so the final image only includes a
single copy of the binary, and keep the remaining build steps in the
Containerfile.gkm-extract stage consistent with the intended final artifact
location.
- Around line 48-50: The ROCm version is currently duplicated across
ROCM_VERSION, OPT_ROCM_VERSION, and the hard-coded rocm-version label, which can
drift when build args change. Update the Containerfile to single-source the
version from one build arg and reuse it everywhere the install paths, symlinks,
and metadata are defined. In the ROCm install/setup steps and the label
assignment, reference the same version symbol consistently, and add a fail-fast
check in the relevant setup stage to ensure the expected ROCm binaries are
present before continuing.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Enterprise

Run ID: 20571795-f39b-4ac0-acd9-c5f3ec6d5907

📥 Commits

Reviewing files that changed from the base of the PR and between c13060b and 039aac0.

📒 Files selected for processing (1)
  • Containerfile.gkm-extract

Comment thread Containerfile.gkm-extract
Comment on lines +75 to +77
# Install ROCm tools for AMD GPU detection
# Note: Using Ubuntu 22.04 (jammy) packages as ROCm doesn't officially support Ubuntu 24.04 yet
RUN wget https://repo.radeon.com/amdgpu-install/${ROCM_VERSION}/ubuntu/jammy/amdgpu-install_${AMDGPU_VERSION}-1_all.deb && \

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Locate the file and inspect the relevant section with line numbers.
git ls-files 'Containerfile.gkm-extract'
echo '---'
cat -n Containerfile.gkm-extract | sed -n '65,85p'

# Show the surrounding ROCm-related build args and copy steps.
echo '---'
cat -n Containerfile.gkm-extract | sed -n '40,110p'

Repository: redhat-et/GKM

Length of output: 4047


🌐 Web query:

ROCm 7.0.1 Ubuntu 24.04 quick start amdgpu-install ubuntu noble path

💡 Result:

To install ROCm 7.0.1 on Ubuntu 24.04 (Noble Numbat) using the amdgpu-install package, follow these official steps derived from the ROCm 7.0.1 documentation [1]. 1. Register the AMDGPU repository Download and install the amdgpu-install package for Ubuntu 24.04: wget https://repo.radeon.com/amdgpu-install/7.0.1/ubuntu/noble/amdgpu-install_7.0.1.70001-1_all.deb sudo apt install./amdgpu-install_7.0.1.70001-1_all.deb sudo apt update 2. Configure Repository and Install After registering the repository, you can proceed with the installation. Note that for ROCm 7.0.1, the repository is located at the path associated with 'noble' (the codename for Ubuntu 24.04) [1][2]. For a complete installation including the necessary kernel drivers and ROCm stack, refer to the official ROCm 7.0.1 Quick Start Guide [1] or the Ubuntu-specific native installation guide [3]. Important installation notes: - System Requirements: Ensure your system meets the documented ROCm hardware and kernel requirements before beginning [1]. - GPG Keys: If performing a manual repository setup (native method), ensure you correctly add the ROCm GPG key to /etc/apt/keyrings/rocm.gpg to verify package authenticity [3]. - Conflicts: If you are upgrading or switching versions, remove previous ROCm installations to avoid dependency conflicts [4]. - Permissions: After installation, remember to add your user to the render and video groups to access GPU resources: sudo usermod -a -G render,video $LOGNAME [5].

Citations:


Use the Noble AMDGPU installer path for this Ubuntu 24.04 image. The current ubuntu/jammy URL can pull the wrong package channel for the ubuntu24.04 base and break the ROCm install or GPU detection.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Containerfile.gkm-extract` around lines 75 - 77, The ROCm installer download
in the Containerfile still points to the ubuntu/jammy channel, which can
mismatch the ubuntu24.04 base image. Update the download URL used in the AMDGPU
install RUN step to the Noble (ubuntu/noble) path so the installer matches this
image and ROCm/GPU detection can work correctly.

@Billy99 Billy99 requested a review from maryamtahhan June 30, 2026 12:28
Comment thread Containerfile.gkm-extract
libbtrfs0 \
libffi8 \
libc6 \
buildah \

@Billy99 Billy99 Jun 30, 2026

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@maryamtahhan I took this list straight from #138. Do we need buildah in the running stage? Are there others that maybe were needed in #138 but aren't really needed in the extraction container?

@cmagina cmagina left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than my 2 small comments, lgtm

Comment thread Containerfile.gkm-extract Outdated
amd-smi-lib \
rocm-smi-lib \
libdrm2 && \
ln -s /opt/rocm-${OPT_ROCM_VERSION}/bin/amd-smi /usr/bin/amd-smi && \

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should use /usr/local for this to avoid package clashes.

Comment thread Containerfile.gkm-extract Outdated
rocm-smi-lib \
libdrm2 && \
ln -s /opt/rocm-${OPT_ROCM_VERSION}/bin/amd-smi /usr/bin/amd-smi && \
ln -s /opt/rocm-${OPT_ROCM_VERSION}/bin/rocm-smi /usr/bin/rocm-smi && \

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same: /usr/local

@Billy99 Billy99 force-pushed the billy99-gkm-extract-unify-image branch from 039aac0 to 6271b63 Compare July 2, 2026 14:32

@cmagina cmagina left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Follow PR#138 by adding NVIDIA support to the gkm-extract image, AMD
support already there.

Signed-off-by: Billy McFall <22157057+Billy99@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants