diff --git a/content/en/docs/next/operations/gpu-container-workloads.md b/content/en/docs/next/operations/gpu-container-workloads.md
new file mode 100644
index 00000000..8e324e72
--- /dev/null
+++ b/content/en/docs/next/operations/gpu-container-workloads.md
@@ -0,0 +1,129 @@
+---
+title: "Running Containerized GPU Workloads"
+linkTitle: "GPU Containers"
+description: "Run CUDA pods and other containerized GPU workloads on Cozystack management nodes that ship the NVIDIA driver and container toolkit via the distro package manager."
+weight: 160
+---
+
+This page covers running GPU workloads in regular Kubernetes pods (CUDA, ML training, inference) on Cozystack management cluster nodes. It targets the typical Linux GPU node, where the NVIDIA driver and `nvidia-container-toolkit` are installed with `apt` (or `dnf`), and uses the `container` variant of the `cozystack.gpu-operator` package.
+
+If you instead want to pass whole GPUs to KubeVirt VMs, see [GPU Passthrough](/docs/next/virtualization/gpu/). For fractional GPU sharing in tenant Kubernetes clusters, see [GPU Sharing with HAMi](/docs/next/kubernetes/gpu-sharing/); HAMi is orthogonal to this variant, and you can stack it on top once the container variant is up.
+
+## When to pick this variant
+
+The `cozystack.gpu-operator` package exposes three architectural variants. Pick `container` when **all** of the following are true:
+
+- The host already runs the NVIDIA driver, installed via the distro package manager (`apt install nvidia-driver-*` on Ubuntu/Debian or the equivalent on RHEL/Fedora/openSUSE). The operator must not load its own kernel module.
+- The host already has `nvidia-container-toolkit` installed (`apt install nvidia-container-toolkit`). The operator must not deploy its own toolkit DaemonSet: that would overwrite `/etc/containerd/config.toml` and the CDI hooks the host package has already set up.
+- You want GPUs exposed to containers as the `nvidia.com/gpu` resource, not passed through to KubeVirt VMs.
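+
+The first two conditions above, plus the containerd registration that the toolkit performs, can be checked directly on the GPU node. The following is a minimal sketch, not part of Cozystack. `check_container_variant_host` is a hypothetical helper. It assumes the `nvidia-ctk` CLI (installed with `nvidia-container-toolkit`) is on `PATH`, and that containerd reads `/etc/containerd/config.toml` unless you pass another path. Run it over SSH, or from a node debug pod after `chroot /host`.
+
+```bash
+# Hypothetical helper (not shipped with Cozystack): checks the host-side
+# preconditions for the `container` variant of cozystack.gpu-operator.
+# Usage: check_container_variant_host [path-to-containerd-config]
+check_container_variant_host() {
+  local containerd_config="${1:-/etc/containerd/config.toml}"
+  local rc=0
+
+  # Host driver: nvidia-smi must exist and be able to reach the kernel module.
+  if command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi >/dev/null 2>&1; then
+    echo "driver: ok"
+  else
+    echo "driver: missing"
+    rc=1
+  fi
+
+  # Host container toolkit: nvidia-ctk is installed with nvidia-container-toolkit.
+  if command -v nvidia-ctk >/dev/null 2>&1; then
+    echo "toolkit: ok"
+  else
+    echo "toolkit: missing"
+    rc=1
+  fi
+
+  # containerd wiring: the config should contain an nvidia runtime entry.
+  if grep -q nvidia "$containerd_config" 2>/dev/null; then
+    echo "containerd: ok"
+  else
+    echo "containerd: missing"
+    rc=1
+  fi
+
+  return "$rc"
+}
+```
+
+A non-zero exit status means the node does not yet match the host shape this variant expects.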
+
+The other two variants exist for the opposite host shape. `default` (passthrough) unbinds the host driver and binds `vfio-pci` for VM passthrough. `vgpu` requires the proprietary NVIDIA vGPU host driver plus a license server. Neither produces a working setup on a host that already ships the driver and container toolkit through the distro package manager, because the operator and the host installation conflict with each other.
+
+## Prerequisites
+
+- A Cozystack management cluster with at least one GPU-enabled node.
+- The GPU node runs a supported Linux distribution (Ubuntu, Debian, RHEL, Fedora, openSUSE) with the NVIDIA driver installed via the distro package manager. To verify, run `nvidia-smi` on the node, either over SSH or from a node debug pod (`kubectl debug node/<node-name> -it --image=ubuntu`, then `chroot /host nvidia-smi`). It must list the physical GPUs and report the driver version.
+- `nvidia-container-toolkit` installed on the same node and registered with containerd (`grep nvidia /etc/containerd/config.toml` shows the runtime entry).
+- `kubectl` configured against the management cluster.
+
+The operator-validator auto-detects pre-installed host drivers by probing `/host/usr/bin/nvidia-smi`. On standard Ubuntu, Debian, RHEL, and Fedora layouts, no `hostPaths.driverInstallDir` override is needed. On Talos the probe misses, because the Talos NVIDIA extension installs `nvidia-smi` under `/usr/local/bin/`, so Talos needs a different starting point. See `packages/system/gpu-operator/examples/values-native-talos.yaml` in the [cozystack repo](https://github.com/cozystack/cozystack) for a working reference with the compat DaemonSet and the matching `driverInstallDir` override.
+
+## 1. Install the GPU Operator (container variant)
+
+**Do not** add `cozystack.gpu-operator` to `bundles.enabledPackages` for this variant. The platform Helm chart's optional-package template hardcodes `spec.variant: default` for every name in `enabledPackages` and reconciles the resulting `Package` CR under Helm ownership, so any user `Package` CR with `variant: container` is overwritten on the next reconcile. Instead, apply the `Package` CR directly; the Cozystack platform controller installs it without a bundle entry.
+
+Save the following `Package` CR, which selects `variant: container`, as `gpu-operator-container.yaml`:
+
+```yaml
+apiVersion: cozystack.io/v1alpha1
+kind: Package
+metadata:
+  name: cozystack.gpu-operator
+spec:
+  variant: container
+```
+
+```bash
+kubectl apply -f gpu-operator-container.yaml
+```
+
+The platform controller resolves the variant against the `PackageSource` (`packages/core/platform/sources/gpu-operator.yaml`), pulls `values.yaml` and `values-container.yaml` from the OCI repository, and installs the chart into the `cozy-gpu-operator` namespace.
+
+## 2. Verify the operator is healthy
+
+All pods in the `cozy-gpu-operator` namespace should reach `Running`, except one-shot validation pods such as `nvidia-cuda-validator`, which end in `Completed`:
+
+```bash
+kubectl get pods --namespace cozy-gpu-operator
+```
+
+Example output (pod names will vary):
+
+```console
+NAME                                   READY   STATUS      RESTARTS   AGE
+gpu-feature-discovery-7jpzv            1/1     Running     0          2m
+gpu-operator-7976b5b8fb-xqg2z          1/1     Running     0          3m
+nvidia-cuda-validator-tjkfh            0/1     Completed   0          2m
+nvidia-dcgm-exporter-rmpfg             1/1     Running     0          2m
+nvidia-device-plugin-daemonset-cqj9w   1/1     Running     0          2m
+nvidia-operator-validator-q5n4k        1/1     Running     0          3m
+```
+
+The `container` variant does **not** create `nvidia-driver-daemonset`, `nvidia-container-toolkit-daemonset`, or `nvidia-vfio-manager`; all three are disabled by design.
+
+The node should advertise `nvidia.com/gpu` as an allocatable resource:
+
+```bash
+kubectl describe node <node-name>
+```
+
+```console
+...
+Capacity:
+  ...
+  nvidia.com/gpu:     2
+  ...
+Allocatable:
+  ...
+  nvidia.com/gpu:     2
+...
+```
+
+## 3. Run a sample CUDA pod
+
+Create a pod that requests one GPU and runs `nvidia-smi`:
+
+```yaml
+apiVersion: v1
+kind: Pod
+metadata:
+  name: cuda-smoke
+spec:
+  restartPolicy: OnFailure
+  containers:
+    - name: cuda
+      image: nvcr.io/nvidia/cuda:12.4.1-base-ubuntu22.04
+      command: ["nvidia-smi"]
+      resources:
+        limits:
+          nvidia.com/gpu: 1
+```
+
+```bash
+kubectl apply -f cuda-smoke.yaml
+kubectl logs cuda-smoke
+```
+
+`kubectl logs` only succeeds once the container has started, so rerun it if the image is still being pulled. The output should list the GPU(s) visible to the pod and report the driver version that the host runs.
+
+## Fractional GPU sharing
+
+The `container` variant exposes whole GPUs through the upstream NVIDIA device plugin. To slice one GPU across multiple pods (per-pod memory and compute quotas), enable HAMi on top. HAMi reuses the same device plugin layer and is wired in via the `cozystack.hami` package, which already depends on `cozystack.gpu-operator`. See [GPU Sharing with HAMi](/docs/next/kubernetes/gpu-sharing/) for the tenant Kubernetes flow; for management-cluster workloads, the wiring is the same package set with HAMi enabled.
+
+## Variant comparison
+
+| Workload shape | Variant | Host driver | Host container toolkit | Notes |
+| --- | --- | --- | --- | --- |
+| Containers (CUDA pods, ML) | `container` | required | required | This page |
+| Whole GPU to one VM | `default` | must NOT be loaded (operator binds `vfio-pci`) | not used | [GPU Passthrough](/docs/next/virtualization/gpu/) |
+| Sliced GPU to multiple VMs | `vgpu` | proprietary NVIDIA vGPU host driver | not used | Requires an NVIDIA vGPU license and a Delegated License Service endpoint |
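+
+To check which row of the table a running cluster is on, compare the variant recorded in the `Package` CR with the DaemonSets the operator actually created. The helper below is a minimal, hypothetical sketch. It assumes the `Package` CR is cluster-scoped and served under the resource name `packages.cozystack.io`. It also assumes the DaemonSets carry the names listed in step 2.
+
+```bash
+# Hypothetical helper (not shipped with Cozystack): prints the configured
+# gpu-operator variant and, for the `container` variant, flags any DaemonSet
+# that this variant is expected never to create.
+check_gpu_operator_variant() {
+  local ns="cozy-gpu-operator"
+  local variant ds rc=0
+
+  # Assumption: Package is cluster-scoped and addressable as packages.cozystack.io.
+  variant=$(kubectl get packages.cozystack.io cozystack.gpu-operator \
+    -o jsonpath='{.spec.variant}') || return 1
+  echo "variant: $variant"
+
+  # Only the container variant has a fixed set of DaemonSets to rule out.
+  [ "$variant" = "container" ] || return 0
+
+  for ds in nvidia-driver-daemonset nvidia-container-toolkit-daemonset nvidia-vfio-manager; do
+    if kubectl get daemonset "$ds" --namespace "$ns" >/dev/null 2>&1; then
+      echo "unexpected daemonset: $ds"
+      rc=1
+    fi
+  done
+  return "$rc"
+}
+```
+
+On a healthy `container` install, the helper prints only `variant: container` and exits with status 0.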