docs(operations): add containerized GPU workloads guide #555
Open
Aleksei Sviridkin (lexfrei) wants to merge 1 commit into main from feat/gpu-container-workloads-docs
129 changes: 129 additions & 0 deletions
content/en/docs/next/operations/gpu-container-workloads.md
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,129 @@ | ||
| --- | ||
| title: "Running Containerized GPU Workloads" | ||
| linkTitle: "GPU Containers" | ||
| description: "Run CUDA pods and other containerized GPU workloads on Cozystack management nodes that ship the NVIDIA driver and container toolkit via the distro package manager." | ||
| weight: 160 | ||
| --- | ||
|
|
||
| This page covers running GPU workloads in regular Kubernetes pods (CUDA, ML training, inference) on Cozystack management cluster nodes. It targets the typical Linux GPU node shape — `apt`-installed (or `dnf`-installed) NVIDIA driver plus `nvidia-container-toolkit` — and uses the `container` variant of the `cozystack.gpu-operator` package. | ||
|
|
||
| If instead you want to pass whole GPUs to KubeVirt VMs, see [GPU Passthrough](/docs/next/virtualization/gpu/) and [GPU Sharing with HAMi](/docs/next/kubernetes/gpu-sharing/) (HAMi is for fractional sharing in tenant Kubernetes clusters and is orthogonal to this variant; you can stack HAMi on top once the container variant is up). | ||
|
|
||
| ## When to pick this variant | ||
|
|
||
| The `cozystack.gpu-operator` package exposes three architectural variants. Pick `container` when **all** of the following are true: | ||
|
|
||
| - The host already runs the NVIDIA driver, installed via the distro package manager (`apt install nvidia-driver-*` on Ubuntu/Debian or the equivalent on RHEL/Fedora/openSUSE). The operator must not load its own kernel module. | ||
| - The host already has `nvidia-container-toolkit` installed (`apt install nvidia-container-toolkit`). The operator must not deploy its own toolkit DaemonSet — that would overwrite `/etc/containerd/config.toml` and the CDI hooks the host package already wired up. | ||
| - You want GPUs exposed to containers as `nvidia.com/gpu`, not passed through to KubeVirt VMs. | ||
|
|
||
| The other two variants exist for the opposite host shape: `default` (passthrough) unbinds the host driver and binds `vfio-pci` for VM passthrough, and `vgpu` requires the proprietary NVIDIA vGPU host driver plus a license server. Neither path produces a working setup on a host that already ships the driver and container toolkit through apt — the operator and the host install fight each other. | ||
|
|
||
| ## Prerequisites | ||
|
|
||
| - A Cozystack management cluster with at least one GPU-enabled node. | ||
| - The GPU node runs a supported Linux distribution (Ubuntu, Debian, RHEL, Fedora, openSUSE) with the NVIDIA driver installed via the distro package manager. Verify with `nvidia-smi` over SSH or `kubectl debug node/<node-name>` — it must enumerate the physical GPUs and report a working driver version. | ||
| - `nvidia-container-toolkit` installed on the same node and registered with containerd (`grep nvidia /etc/containerd/config.toml` shows the runtime entry). | ||
| - `kubectl` configured against the management cluster. | ||
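The two host-side checks in the list above can be scripted. The following is a minimal sketch, assuming it runs on the GPU node itself. The function name `has_nvidia_runtime` is hypothetical and not part of Cozystack; the default path is the one the guide already names.

```bash
# Hypothetical helper (not part of Cozystack): succeeds when a containerd
# config file mentions the nvidia runtime. Defaults to the path from the guide.
has_nvidia_runtime() {
  config="${1:-/etc/containerd/config.toml}"
  [ -r "$config" ] && grep -q nvidia "$config"
}

# Usage on the GPU node:
#   nvidia-smi && has_nvidia_runtime && echo "host prerequisites look OK"
```

The check is deliberately as loose as the `grep` in the prerequisite: it confirms only that an `nvidia` entry exists, not that the runtime is configured correctly.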
|
|
||
| The operator-validator auto-detects pre-installed host drivers by probing `/host/usr/bin/nvidia-smi`, so on standard Ubuntu/Debian/RHEL/Fedora layouts no `hostPaths.driverInstallDir` override is needed. On Talos this probe misses (the extension installs `nvidia-smi` at `/usr/local/bin/`), so Talos requires a different starting point — see `packages/system/gpu-operator/examples/values-native-talos.yaml` in the [cozystack repo](https://github.com/cozystack/cozystack) for a working reference with the compat DaemonSet and the matching `driverInstallDir` override. | ||
|
|
||
| ## 1. Install the GPU Operator (container variant) | ||
|
|
||
| **Do not** add `cozystack.gpu-operator` to `bundles.enabledPackages` for this variant. The platform Helm chart's optional-package template hardcodes `spec.variant: default` for every name in `enabledPackages` and reconciles the resulting `Package` CR under Helm ownership — any user `Package` CR with `variant: container` is overwritten on the next reconcile. Apply the `Package` CR directly instead; the cozystack platform controller installs it without the bundle entry. | ||
|
|
||
| Apply a `Package` CR with `variant: container`: | ||
|
|
||
| ```yaml | ||
| apiVersion: cozystack.io/v1alpha1 | ||
| kind: Package | ||
| metadata: | ||
| name: cozystack.gpu-operator | ||
| spec: | ||
| variant: container | ||
| ``` | ||
|
|
||
| ```bash | ||
| kubectl apply -f gpu-operator-container.yaml | ||
| ``` | ||
|
|
||
| The platform controller resolves the variant against the `PackageSource` (`packages/core/platform/sources/gpu-operator.yaml`), pulls `values.yaml` + `values-container.yaml` from the OCI repository, and installs the chart into `cozy-gpu-operator`. | ||
|
|
||
| ## 2. Verify the operator is healthy | ||
|
|
||
| All pods in the `cozy-gpu-operator` namespace should reach `Running`: | ||
|
|
||
| ```bash | ||
| kubectl get pods --namespace cozy-gpu-operator | ||
| ``` | ||
|
|
||
| Example output (pod names will vary): | ||
|
|
||
| ```console | ||
| NAME READY STATUS RESTARTS AGE | ||
| gpu-feature-discovery-7jpzv 1/1 Running 0 2m | ||
| gpu-operator-7976b5b8fb-xqg2z 1/1 Running 0 3m | ||
| nvidia-cuda-validator-tjkfh 0/1 Completed 0 2m | ||
| nvidia-dcgm-exporter-rmpfg 1/1 Running 0 2m | ||
| nvidia-device-plugin-daemonset-cqj9w 1/1 Running 0 2m | ||
| nvidia-operator-validator-q5n4k 1/1 Running 0 3m | ||
| ``` | ||
|
|
||
| The `container` variant does **not** spawn `nvidia-driver-daemonset`, `nvidia-container-toolkit-daemonset`, or `nvidia-vfio-manager` — all three are pinned off by design. | ||
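One way to confirm that none of the three disabled components came up is to filter the pod list by name prefix. This is a sketch only: the helper name `no_host_conflicting_pods` is hypothetical, and it assumes pod names begin with the DaemonSet names quoted above.

```bash
# Hypothetical helper: reads `kubectl get pods` output on stdin and fails
# if any pod belongs to a component the container variant keeps disabled.
no_host_conflicting_pods() {
  if grep -E '^(nvidia-driver-daemonset|nvidia-container-toolkit-daemonset|nvidia-vfio-manager)' >&2; then
    return 1   # offending pod lines were printed to stderr
  fi
  return 0
}

# Usage:
#   kubectl get pods --namespace cozy-gpu-operator --no-headers | no_host_conflicting_pods
```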
|
|
||
| The node should advertise `nvidia.com/gpu` as an allocatable resource: | ||
|
|
||
| ```bash | ||
| kubectl describe node <node-name> | ||
| ``` | ||
|
|
||
| ```console | ||
| ... | ||
| Capacity: | ||
| ... | ||
| nvidia.com/gpu: 2 | ||
| ... | ||
| Allocatable: | ||
| ... | ||
| nvidia.com/gpu: 2 | ||
| ... | ||
| ``` | ||
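For scripted checks, the allocatable count can be pulled out of the `describe` output instead of read by eye. A minimal sketch follows, assuming the indented section layout shown above; `allocatable_gpus` is a hypothetical helper name.

```bash
# Hypothetical helper: reads `kubectl describe node` output on stdin and
# prints the nvidia.com/gpu value from the Allocatable section only.
allocatable_gpus() {
  awk '
    /^Allocatable:/ { in_alloc = 1; next }
    in_alloc && /^[^ ]/ { in_alloc = 0 }          # next top-level section ends it
    in_alloc && $1 == "nvidia.com/gpu:" { print $2 }
  '
}

# Usage:
#   kubectl describe node <node-name> | allocatable_gpus
```

If the helper prints nothing, the node is not advertising the resource. An alternative that avoids text parsing is `kubectl get node <node-name> -o jsonpath='{.status.allocatable.nvidia\.com/gpu}'`.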
|
|
||
| ## 3. Run a sample CUDA pod | ||
|
|
||
| Create a pod that requests one GPU and runs `nvidia-smi`: | ||
|
|
||
| ```yaml | ||
| apiVersion: v1 | ||
| kind: Pod | ||
| metadata: | ||
| name: cuda-smoke | ||
| spec: | ||
| restartPolicy: OnFailure | ||
| containers: | ||
| - name: cuda | ||
| image: nvcr.io/nvidia/cuda:12.4.1-base-ubuntu22.04 | ||
| command: ["nvidia-smi"] | ||
| resources: | ||
| limits: | ||
| nvidia.com/gpu: 1 | ||
| ``` | ||
|
|
||
| ```bash | ||
| kubectl apply -f cuda-smoke.yaml | ||
| kubectl logs cuda-smoke | ||
| ``` | ||
|
|
||
| The output should enumerate the GPU(s) visible to the pod and report the driver version that the host runs. | ||
|
|
||
| ## Fractional GPU sharing | ||
|
|
||
| The `container` variant exposes whole GPUs through the upstream NVIDIA device plugin. To slice one GPU across multiple pods (memory and compute quotas per pod), enable HAMi on top — HAMi reuses the same device plugin layer and is wired in via the `cozystack.hami` package, which already depends on `cozystack.gpu-operator`. See [GPU Sharing with HAMi](/docs/next/kubernetes/gpu-sharing/) for the tenant Kubernetes flow; for management-cluster workloads the wiring is the same package set with HAMi enabled. | ||
|
|
||
| ## Variant comparison | ||
|
|
||
| | Workload shape | Variant | Host driver | Host container toolkit | Notes | | ||
| | --- | --- | --- | --- | --- | | ||
| | Containers (CUDA pods, ML) | `container` | required | required | This page | | ||
| | Whole GPU to one VM | `default` | must NOT be loaded — operator binds `vfio-pci` | not used | [GPU Passthrough](/docs/next/virtualization/gpu/) | | ||
| | Sliced GPU to multiple VMs | `vgpu` | proprietary NVIDIA vGPU host driver | not used | Requires NVIDIA vGPU license + a Delegated License Service endpoint | | ||
The `Package` resource needs to be created in the `cozy-system` namespace for the Cozystack operator to detect and reconcile it. Adding `namespace: cozy-system` to the metadata ensures it is applied to the correct namespace.
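As a sketch of what this suggestion might look like in practice, the helper below writes the guide's manifest with the extra `namespace` field. The helper name `write_package_manifest` is hypothetical, and the namespace value comes from this review comment, not from the guide, so treat it as unconfirmed.

```bash
# Hypothetical helper: writes the Package manifest from step 1 of the guide,
# plus the namespace proposed in the review comment (unconfirmed assumption).
write_package_manifest() {
  cat > "${1:-gpu-operator-container.yaml}" <<'EOF'
apiVersion: cozystack.io/v1alpha1
kind: Package
metadata:
  name: cozystack.gpu-operator
  namespace: cozy-system
spec:
  variant: container
EOF
}

# Usage:
#   write_package_manifest && kubectl apply -f gpu-operator-container.yaml
```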