Add MIG validations including RHEL setup #21
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,57 @@ | ||
| --- | ||
| - name: Add NVIDIA CUDA repo | ||
| become: true | ||
| ansible.builtin.yum_repository: | ||
| name: nvidia-cuda | ||
| description: nvidia cuda repo | ||
| baseurl: "{{ edpm_accel_drivers_nvidia_repo_url }}/$basearch/" | ||
| gpgcheck: true | ||
| gpgkey: "{{ edpm_accel_drivers_nvidia_repo_gpgkey }}" | ||
| | ||
| - name: Install nvidia-container-toolkit | ||
| become: true | ||
| ansible.builtin.dnf: | ||
| use_backend: dnf4 | ||
| name: nvidia-container-toolkit | ||
| state: present | ||
| | ||
| - name: Reboot the VM to find the installed drivers | ||
| become: true | ||
| ansible.builtin.reboot: | ||
| reboot_timeout: 600 | ||
| | ||
| - name: Check if CDI configfile exists | ||
| become: true | ||
| ansible.builtin.stat: | ||
| path: /etc/cdi/nvidia.yaml | ||
| register: nvidia_driver_cdi_config_file | ||
| | ||
| - name: Configure NVIDIA container runtime | ||
| when: not nvidia_driver_cdi_config_file.stat.exists | ||
| become: true | ||
| block: | ||
| - name: Ensure CDI directory exists | ||
| ansible.builtin.file: | ||
| path: /etc/cdi | ||
| state: directory | ||
| mode: "0755" | ||
| owner: root | ||
| | ||
| - name: Configure NVIDIA container runtime | ||
| ansible.builtin.command: nvidia-ctk runtime configure --runtime=containerd | ||
|
Comment on lines +40 to +41
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Search for containerd installation tasks
echo "=== Checking for containerd installation ==="
rg -n "name:.*containerd" --type yaml -C 2
# Check if containerd is a dependency or installed via dnf/yum
rg -n "containerd" --type yaml -C 3
# Check nvidia-ctk documentation or usage for runtime requirements
rg -n "nvidia-ctk.*runtime.*configure" --type yaml -C 3
Repository: rhos-vaf/gpu-validation
Length of output: 1212
Add a containerd dependency/presence check before configuring NVIDIA container runtime
Ensure containerd is installed (and the expected config exists) before this task, or add a guard (e.g., …). |
||
| changed_when: true | ||
| | ||
| - name: Generate NVIDIA CDI configuration | ||
| ansible.builtin.command: nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml | ||
| changed_when: true | ||
| | ||
| - name: Install NVIDIA Management Library | ||
| become: true | ||
| ansible.builtin.dnf: | ||
| use_backend: dnf4 | ||
| name: "{{ gpu_validation_libnvidia_ml_version }}" | ||
| state: present | ||
| | ||
| - name: Refresh package facts after driver installation | ||
| ansible.builtin.package_facts: | ||
| manager: rpm | ||
These (`Check if CDI configfile exists` and `Configure NVIDIA container runtime`) can be part of a configuration task after the `Install NVIDIA CUDA repos` and `Reboot if system updates require it` tasks.
Starting by leaving the reboot where it is - let's see if the tasks can get by with it there.
Reboot was required: the conditional reboot task did not work here, so the reboot was added back.
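For reference, the guard requested in the automated review on lines +40 to +41 could look roughly like the sketch below. It is not part of this PR. It assumes the runtime is installed from an RPM named `containerd` (a hypothetical package name that may differ in the target RHEL repos) and reuses the `package_facts` approach already present at the end of the diff.

```yaml
# Sketch only: "containerd" as the RPM name is an assumption, adjust as needed.
- name: Gather installed package facts
  ansible.builtin.package_facts:
    manager: rpm

# Same command as in the diff, skipped when the containerd package is absent.
- name: Configure NVIDIA container runtime
  become: true
  ansible.builtin.command: nvidia-ctk runtime configure --runtime=containerd
  changed_when: true
  when: "'containerd' in ansible_facts.packages"
```

Keying the guard on package facts avoids a failed `nvidia-ctk` run on hosts without containerd. A `stat` check on the containerd config file would be an alternative if the runtime is installed some other way.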