[Bug]: bug(recipes): nvidia-dra-driver-gpu 0.4.0 fails to install via Flux #1289

### Prerequisites

- [x] I searched existing issues and found no duplicates
- [x] I can reproduce this issue consistently
- [x] This is not a security vulnerability (use [Security Advisories](https://github.com/NVIDIA/aicr/security/advisories/new) instead)

### Bug Description

Since the registry.k8s.io migration in #1285, the `nvidia-dra-driver-gpu`
component fails `helm install` on **every Flux deployment** (`aicr bundle
--deployer flux`, OCI or Git source). The chart renders a duplicate YAML
mapping key in the pod-template labels; plain `helm install` parses leniently
(last key wins) so the helm deployer is unaffected, but Flux's
helm-controller always runs a post-renderer whose strict parser rejects the
manifest:

```
Helm install failed for release nvidia-dra-driver/nvidia-dra-driver-nvidia-dra-driver-gpu
with chart dra-driver-nvidia-gpu@0.4.0: error while running post render on manifests:
map[string]interface {}(nil): yaml: unmarshal errors:
  line 29: mapping key "nvidia-dra-driver-gpu-component" already defined at line 28
```

### Impact

Blocking (cannot proceed)

### Component

Other / Unknown

### Regression?

Yes, this worked before (please specify version below)

### Steps to Reproduce

Upstream chart defect triggered by an AICR value:

- The `dra-driver-nvidia-gpu@0.4.0` chart writes TWO component labels into
  each workload's pod template, back to back
  (`templates/kubeletplugin.yaml:40-41`, `templates/controller.yaml:37-38`):
  the `selectorLabels` helper emits `<nameOverride || .Chart.Name>-component:
  <name>`, and the next line hardcodes `nvidia-dra-driver-gpu-component:
  <name>`. With upstream defaults these are two *different* keys — redundant
  but valid.
- `recipes/components/nvidia-dra-driver-gpu/values.yaml` sets
  `nameOverride: nvidia-dra-driver-gpu` (added in #1285 to keep the rendered
  workload names `nvidia-dra-driver-gpu-*` for the health check, the
  conformance validator, and the ai-conformance chainsaw assert). That makes
  the helper-derived key identical to the hardcoded key → duplicate mapping
  key in the same map → invalid YAML per spec.
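
The collision can be modelled in a few lines of shell. This is only a sketch of the key derivation described above, not the chart's template code: the `helper_key` and `collides` functions are hypothetical names, and the chart name `dra-driver-nvidia-gpu` is assumed to be what `.Chart.Name` resolves to.

```bash
#!/usr/bin/env bash
# Hypothetical model of the two label keys; NOT the chart's actual helper.
chart_name="dra-driver-nvidia-gpu"               # assumed value of .Chart.Name
hardcoded_key="nvidia-dra-driver-gpu-component"  # key hardcoded on the next template line

# helper_key <nameOverride>: key emitted by selectorLabels
# (<nameOverride || .Chart.Name>-component).
helper_key() {
  name="${1:-$chart_name}"
  printf '%s-component\n' "$name"
}

# collides <nameOverride>: succeeds when both keys are identical.
collides() {
  [ "$(helper_key "${1:-}")" = "$hardcoded_key" ]
}

for override in "" "nvidia-dra-driver-gpu"; do
  if collides "$override"; then
    echo "nameOverride='$override': duplicate key $(helper_key "$override")"
  else
    echo "nameOverride='$override': distinct keys"
  fi
done
```

With an empty override the two keys differ; with the AICR value they are the same string, which is the duplicate the strict parser rejects.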

Reproduce without a cluster:

```bash
helm template x oci://registry.k8s.io/dra-driver-nvidia/charts/dra-driver-nvidia-gpu \
  --version 0.4.0 --set gpuResourcesEnabledOverride=true \
  --set nameOverride=nvidia-dra-driver-gpu \
  | grep -n "nvidia-dra-driver-gpu-component"
# the key appears twice inside the same labels block (both workloads)
```
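
The `grep` above relies on reading the output by eye. As a rough alternative, the sketch below flags back-to-back lines that repeat the same key at the same indentation, which is the shape of this defect. `dup_adjacent_keys` is a hypothetical helper, it only looks at adjacent lines, and it is not a YAML parser, so treat it as a quick gate rather than a validator.

```bash
#!/usr/bin/env bash
# dup_adjacent_keys: read a rendered manifest on stdin and report any key that
# is repeated on the immediately following line at the same indentation.
# Exits 1 when at least one duplicate is found, 0 otherwise.
dup_adjacent_keys() {
  awk '
    {
      # Match "<indent><key>:" on simple mapping lines; list items ("- key:")
      # and other lines reset the comparison.
      if (match($0, /^ *[^ :]+:/)) {
        entry = substr($0, RSTART, RLENGTH)
        if (entry == prev) {
          key = entry
          sub(/^ */, "", key)
          print "line " NR ": duplicate key " key
          found = 1
        }
        prev = entry
      } else {
        prev = ""
      }
    }
    END { exit found }
  '
}
```

It could be fed from the `helm template` command above in place of `grep`, so that a non-zero exit status marks the render as invalid.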

### Why CI didn't catch it

Per #1288, the flux-oci KWOK lane's chainsaw HelmRelease gate has
exists-semantics (it passes when ANY HelmRelease is Ready), so the
`InstallFailed` dra-driver release never failed the lane. A failed
install also creates no pods, so `verify_pods` saw nothing Pending either.
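
The difference between exists-semantics and all-semantics can be sketched as two small predicates. This is illustrative only and is not the chainsaw gate: the input format (one `<release> <Ready status>` pair per line), the function names, and the sample release names are all assumptions.

```bash
#!/usr/bin/env bash
# Input on stdin: one "<release> <Ready status>" pair per line (hypothetical format).

# any_ready: exists-semantics, succeeds if at least one release is Ready.
any_ready() {
  awk '$2 == "True" { ok = 1 } END { exit (ok ? 0 : 1) }'
}

# all_ready: succeeds only if there is at least one release and every one is Ready.
all_ready() {
  awk '$2 != "True" { bad = 1 } END { exit ((NR > 0 && !bad) ? 0 : 1) }'
}

# Sample state: one healthy release next to the failed dra-driver install.
sample='some-other-release True
nvidia-dra-driver-gpu False'

printf '%s\n' "$sample" | any_ready && echo "exists-gate: pass"
printf '%s\n' "$sample" | all_ready || echo "all-gate: fail"
```

Under these assumptions the exists-style gate passes while the all-style gate fails, which matches how the failed release went unnoticed.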

### Expected Behavior

No values-level workaround exists: any `nameOverride` that yields the
`nvidia-dra-driver-gpu-*` workload names necessarily collides with the
hardcoded label. Proposed fix:

1. Drop `nameOverride` from
   `recipes/components/nvidia-dra-driver-gpu/values.yaml` (keep
   `fullnameOverride`, which feeds the ServiceAccount/RoleBinding names and
   is not involved in the collision). Workload names become
   `dra-driver-nvidia-gpu-controller` / `dra-driver-nvidia-gpu-kubelet-plugin`.
2. Update the name references in lockstep:
   - `recipes/checks/nvidia-dra-driver-gpu/health-check.yaml`
   - `validators/conformance/dra_support_check.go` (extract to named
     constants while touching it)
   - `tests/chainsaw/ai-conformance/common/assert-dra-driver.yaml`
   - prose mentions in `docs/contributor/validator.md` and
     `docs/contributor/inference-perf-fluctuation.md`
   (`validators/deployment/expected_resources.go` discovers the DaemonSet by
   role suffix and needs no change; historical conformance evidence under
   `docs/conformance/**` is captured output and must NOT be rewritten.)
3. File the upstream bug against kubernetes-sigs/dra-driver-nvidia-gpu
   (duplicate label when `nameOverride` matches the hardcoded prefix) and
   restore the override + `nvidia-dra-driver-gpu-*` names once a fixed chart
   ships.
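
To help keep step 2 in lockstep, a search like the following could list files that still mention the old workload names while skipping the captured conformance evidence. It is a sketch under assumptions: `stale_refs` is a hypothetical helper, and the old names are assumed to be `nvidia-dra-driver-gpu-controller` and `nvidia-dra-driver-gpu-kubelet-plugin`, inferred from the new names in step 1.

```bash
#!/usr/bin/env bash
# stale_refs [repo-root]: print files (relative to the root) that still contain
# the assumed old workload names, excluding anything under docs/conformance/.
# Exits non-zero when nothing is left to update.
stale_refs() {
  (
    cd "${1:-.}" &&
      grep -rlE 'nvidia-dra-driver-gpu-(controller|kubelet-plugin)' . 2>/dev/null |
      sed 's|^\./||' |
      grep -v '^docs/conformance/'
  )
}
```

Run from the repository root as `stale_refs .`; an empty result suggests the rename is complete for these two names, though other spellings would need their own check.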

### Acceptance criteria

- [ ] An `aicr bundle --deployer flux` bundle of any recipe containing
      `nvidia-dra-driver-gpu` reconciles to `Ready=True` under helm-controller.
- [ ] Health check, conformance validator, and chainsaw asserts pass against
      the renamed workloads.
- [ ] Upstream issue filed and linked; tracking note in values.yaml for the
      revert.


### Actual Behavior

-

### Environment

- AICR version (CLI `aicr version`, API image tag, or commit SHA):
- Install method (release binary / build from source / container image):
- Platform (eks/gke/aks/oke/kind/lke/bcm/other):
- Kubernetes version:
- OS (ubuntu/cos/other) + version:
- Kernel version:
- GPU type (h100/h200/gb200/b200/a100/l40/rtx-pro-6000/other):
- Workload intent (training/inference):


### Command / Request Used

_No response_

### Logs / Error Output

```shell

```

### Additional Context

- #1285 (registry.k8s.io v0.4.0 migration that introduced the trigger)
- #1288 (sync-gate exists-semantics bug that masked this in CI)
- https://github.com/kubernetes-sigs/dra-driver-nvidia-gpu/issues/1184
