Use JIT CDI spec generation by default#45
Conversation
This change switches to using JIT CDI spec generation for the injection of devices into kind nodes. Signed-off-by: Evan Lezar <elezar@nvidia.com>
There was a problem hiding this comment.
Pull Request Overview
This PR switches the NVIDIA Container Runtime from legacy device injection to JIT (Just-In-Time) CDI (Container Device Interface) spec generation by default. The change modifies mount paths to use the CDI format and updates the parsing logic to handle both legacy and CDI device specifications.
Key changes:
- Updates container mount paths to use CDI format (
/var/run/nvidia-container-devices/cdi/runtime.nvidia.com/gpu/) - Modifies device parsing logic to support both legacy and CDI device specifications
- Applies changes consistently across default configuration template and examples
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| pkg/nvkind/node.go | Updates device parsing logic to handle both legacy and CDI mount paths |
| pkg/nvkind/default-config-template.yaml | Changes default container path to use CDI format |
| examples/one-worker-per-gpu.yaml | Updates example to use CDI format |
| examples/equally-distributed-gpus.yaml | Updates example to use CDI format |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
You can also share your feedback on Copilot code review for a chance to win a $100 gift card. Take the survey.
| deviceID := filepath.Base(mount.ContainerPath) | ||
| switch { | ||
| case strings.HasPrefix(dir, "/var/run/nvidia-container-devices/cdi"): | ||
| cdiKind := strings.TrimPrefix(dir, "/var/run/nvidia-container-devices/cdi/") |
There was a problem hiding this comment.
The variable cdiKind extracts the CDI kind from the directory path, but this appears to be incorrect. Based on the container path format /var/run/nvidia-container-devices/cdi/runtime.nvidia.com/gpu/, the cdiKind should be runtime.nvidia.com/gpu, not just the remainder after trimming the prefix. This will result in incorrect device identifiers being generated.
| cdiKind := strings.TrimPrefix(dir, "/var/run/nvidia-container-devices/cdi/") | |
| // Robustly extract the CDI kind as the two path components after the CDI base path | |
| cdiBase := "/var/run/nvidia-container-devices/cdi/" | |
| relPath := strings.TrimPrefix(dir, cdiBase) | |
| relPath = strings.TrimPrefix(relPath, "/") // handle possible missing trailing slash | |
| parts := strings.Split(relPath, string(filepath.Separator)) | |
| cdiKind := "" | |
| if len(parts) >= 2 { | |
| cdiKind = parts[0] + "/" + parts[1] | |
| } else if len(parts) == 1 { | |
| cdiKind = parts[0] | |
| } |
|
@elezar do we still need to do something like this? |
This change switches to using JIT CDI spec generation in the NVIDIA Contianer Runtime instead of the legacy injection mechanism.