Problem Description
Deploy amd-gpu-operator >= v1.5.z on any OCP 4.22 cluster with a DeviceConfig that builds the RPM from the amdgpu package repository under the default https://repo.radeon.com/amdgpu/.
The kernel driver build fails while attempting to pull from the amdgpu Radeon repo URL, because the new DockerfileTemplate.rpm.coreos falls back to the non-existent OS path `*/el/9/main/x86_64/` in the package repo:
[1/2] STEP 8/9: RUN dnf clean all && cat /etc/yum.repos.d/amdgpu.repo && dnf install amdgpu-dkms -y && depmod ${KERNEL_VERSION} && find /lib/modules/${KERNEL_VERSION} -name "*.ko.xz" -exec xz -d {} \; && depmod ${KERNEL_VERSION}
(microdnf:8): librhsm-WARNING **: 18:52:44.002: Found 0 entitlement certificates
(microdnf:8): librhsm-WARNING **: 18:52:44.004: Found 0 entitlement certificates
Complete.
[amdgpu]
name=amdgpu
baseurl=https://repo.radeon.com/amdgpu/30.20.1/el/9/main/x86_64/
enabled=1
priority=50
gpgcheck=1
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
...
error: cannot update repo 'amdgpu': Cannot download repomd.xml: Cannot download repodata/repomd.xml: All mirrors were tried; Last error: Status code: 404 for https://repo.radeon.com/amdgpu/30.20.1/el/9/main/x86_64/repodata/repomd.xml (IP: 23.212.251.199)
error: build error: building at STEP "RUN dnf clean all && cat /etc/yum.repos.d/amdgpu.repo && dnf install amdgpu-dkms -y && depmod ${KERNEL_VERSION} && find /lib/modules/${KERNEL_VERSION} -name "*.ko.xz" -exec xz -d {} \; && depmod ${KERNEL_VERSION}": while running runtime: exit status 1
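To check the failing lookup outside of a driver build, the sketch below rebuilds the repomd.xml URL that dnf requested in the log above and can probe it over HTTP. The helper names `repomd_url` and `probe`, and the split into base URL, driver version, and OS path, are assumptions made for illustration; they are not taken from the operator's code.

```python
import urllib.error
import urllib.request


def repomd_url(base: str, driver_version: str, os_path: str) -> str:
    """Join repo base, driver version, and OS path into the repomd.xml URL dnf fetches."""
    # Strip stray slashes so the pieces can be joined with exactly one "/" each.
    parts = [base.strip("/"), driver_version.strip("/"), os_path.strip("/")]
    return "/".join(parts) + "/repodata/repomd.xml"


def probe(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for a HEAD request to url (needs network access)."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as HTTPError; report the code instead.
        return err.code


# Values copied from the amdgpu.repo baseurl printed in the build log.
url = repomd_url("https://repo.radeon.com/amdgpu/", "30.20.1", "el/9/main/x86_64")
# Call probe(url) from a machine with network access to see the current status;
# the build log above reports a 404 for this URL.
```

Swapping the hypothetical `os_path` argument for another directory that exists under the same driver version would show which path the Dockerfile template could fall back to instead.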
Operating System
RHEL CoreOS 9.8 (Plow) / OCP v4.22
CPU
N/A
GPU
N/A
ROCm Version
All amdgpu versions
ROCm Component
No response
Steps to Reproduce
- Deploy release candidate for OCP v4.22
- Create a DeviceConfig that attempts to build the kernel driver from the amdgpu-dkms RPM in the amdgpu package repo
- Build fails
(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support
No response
Additional Information
The workaround of setting useSourceImage: true in the DeviceConfig fails for a similar reason, since https://hub.docker.com/r/rocm/amdgpu-driver/tags has no tags for coreos-9.8-*.