CNTRLPLANE-3553: Wire usesRunc detection into RHEL stream resolution #8832

Draft
sdminonne wants to merge 2 commits into openshift:main from sdminonne:CNTRLPLANE-3553-usesRunc-detection

Conversation

@sdminonne

@sdminonne sdminonne commented Jun 25, 2026

Contributor

Summary

  • Add usesRuncRuntime() to scan NodePool spec.config ConfigMaps for ContainerRuntimeConfig with defaultRuntime: runc
  • Wire actual runc detection into getRHELStream(), validateOSImageStream(), and the config hash normalization in NewConfigGenerator()
  • When runc is detected: implicit stream on OCP >= 5.0 falls back to rhel-9; explicit rhel-10 + runc returns a validation error via ValidMachineConfig condition
  • Removes all TODO(CNTRLPLANE-3553): pass actual usesRunc placeholders
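
A minimal sketch of the detection step described above, not the PR's actual usesRuncRuntime(). It assumes each manifest has already been read out of a ConfigMap and is JSON (the real controller would more likely decode YAML), and it assumes the runtime is found at spec.containerRuntimeConfig.defaultRuntime. The names runtimeConfig, manifestUsesRunc, and anyUsesRunc are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// runtimeConfig mirrors only the fields this sketch needs. The field path
// spec.containerRuntimeConfig.defaultRuntime is an assumption.
type runtimeConfig struct {
	Kind string `json:"kind"`
	Spec struct {
		ContainerRuntimeConfig struct {
			DefaultRuntime string `json:"defaultRuntime"`
		} `json:"containerRuntimeConfig"`
	} `json:"spec"`
}

// manifestUsesRunc reports whether one JSON manifest is a
// ContainerRuntimeConfig whose defaultRuntime is "runc".
func manifestUsesRunc(manifest string) (bool, error) {
	var rc runtimeConfig
	if err := json.Unmarshal([]byte(manifest), &rc); err != nil {
		return false, fmt.Errorf("failed to decode manifest: %w", err)
	}
	return rc.Kind == "ContainerRuntimeConfig" &&
		rc.Spec.ContainerRuntimeConfig.DefaultRuntime == "runc", nil
}

// anyUsesRunc scans every manifest and stops at the first runc match.
func anyUsesRunc(manifests []string) (bool, error) {
	for _, m := range manifests {
		uses, err := manifestUsesRunc(m)
		if err != nil {
			return false, err
		}
		if uses {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	runc := `{"kind":"ContainerRuntimeConfig","spec":{"containerRuntimeConfig":{"defaultRuntime":"runc"}}}`
	crun := `{"kind":"ContainerRuntimeConfig","spec":{"containerRuntimeConfig":{"defaultRuntime":"crun"}}}`
	mc := `{"kind":"MachineConfig","spec":{}}`

	for _, manifests := range [][]string{{mc, crun}, {mc, runc}} {
		uses, err := anyUsesRunc(manifests)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println("uses runc:", uses)
	}
}
```

An empty defaultRuntime and non-ContainerRuntimeConfig kinds both count as "not runc" here, which lines up with the "empty runtime" and "MachineConfig" cases named in the test plan.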

Dependencies

#8675 (CNTRLPLANE-3022: Add osImageStream to NodePool spec/status) ✅ Merged
  └── #8719 (CNTRLPLANE-3023: CEL rule to prevent osImageStream removal) ✅ Merged
        └── #8730 (CNTRLPLANE-3553: Wire osImageStream into NodePool controller) 🔄 Open
              └── This PR (CNTRLPLANE-3553: Wire usesRunc detection) 🔄 Open
                    └── #8792 (CNTRLPLANE-3030: ignition-server os-stream consumption) 🔄 Open

Parallel dependency (no ordering constraint with this PR):

#8669 (CNTRLPLANE-3552: Multi-stream CoreOS metadata parsing) ✅ Merged
  └── #8699 (CNTRLPLANE-3026: Decouple AWS AMI resolution) ✅ Merged
        └── #8709 (CNTRLPLANE-3027: Decouple all platform boot image resolvers) 🔄 Open

Test plan

  • TestUsesRuncRuntime — 7 cases: no configs, runc, crun, empty runtime, MachineConfig, missing ConfigMap, multiple configs
  • Test_getRHELStream — extended with 5 runc-aware cases: runc+5.0→rhel-9, runc+rhel-10→error, crun+5.0→rhel-10, runc+4.x→rhel-9, missing config
  • TestValidateOSImageStream — extended with 2 runc-aware cases: rhel-10+runc→error, rhel-9+runc→ok
  • make verify after rebase onto merged #8730 (CNTRLPLANE-3553: Wire osImageStream into NodePool controller (hash, token, status, validation))
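
The runc-aware cases listed in the test plan can be read as a small decision table. The sketch below encodes only the combinations named in this description; resolveStream is a hypothetical stand-in for getRHELStream()/GetRHELStream, its signature is an assumption, and any other checks on explicit streams (for example version gating) are deliberately not modeled.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	streamRHEL9  = "rhel-9"
	streamRHEL10 = "rhel-10"
)

// resolveStream is a hypothetical helper. requested is the explicit stream
// name ("" means implicit), major is the OCP major version, and usesRunc is
// the result of the runc scan.
func resolveStream(requested string, major uint64, usesRunc bool) (string, error) {
	if requested == "" {
		// Implicit stream: rhel-10 only on OCP >= 5.0 without runc.
		if major >= 5 && !usesRunc {
			return streamRHEL10, nil
		}
		return streamRHEL9, nil
	}
	if requested == streamRHEL10 && usesRunc {
		return "", errors.New("rhel-10 cannot be combined with the runc runtime")
	}
	// Other explicit-stream validation is out of scope for this sketch.
	return requested, nil
}

func main() {
	cases := []struct {
		requested string
		major     uint64
		usesRunc  bool
	}{
		{"", 5, true},           // runc+5.0
		{"", 5, false},          // crun+5.0
		{"", 4, true},           // runc+4.x
		{streamRHEL10, 5, true}, // explicit rhel-10 + runc
		{streamRHEL9, 5, true},  // explicit rhel-9 + runc
	}
	for _, c := range cases {
		stream, err := resolveStream(c.requested, c.major, c.usesRunc)
		fmt.Printf("requested=%q major=%d runc=%t -> stream=%q err=%v\n",
			c.requested, c.major, c.usesRunc, stream, err)
	}
}
```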

JIRA: CNTRLPLANE-3553

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • NodePools now track and surface the resolved OS image stream more consistently during reconciliation.
    • Added broader support for RHEL stream selection across AWS, GCP, and token-based image handling.
  • Bug Fixes

    • Improved image validation so invalid OS image stream settings fail early with clearer errors.
    • Fixed AMI/image resolution to use the correct RHEL stream, reducing incorrect defaults and rollout noise.
    • Better detects the OS image stream from running Machines and updates status when a majority is consistent.

sdminonne and others added 2 commits June 19, 2026 08:59
Wire the osImageStream API field into the NodePool controller:

- Validate spec.osImageStream via GetRHELStream before config generation
  (fail fast on invalid stream/version/runc combinations).
- Return StreamRHEL9 (not "") for implicit pre-5.0 releases so that
  downstream consumers like StreamForName() always receive a concrete
  stream name, avoiding errors when legacy StreamMetadata is removed.
- Normalize rhelStream in rolloutConfig so that setting the default
  stream does not change the config hash (no spurious fleet-wide rollouts).
- Keep resolvedRHELStream on ConfigGenerator (not rolloutConfig) for
  downstream consumers that need a concrete stream name (GCP, AWS AMI,
  token secret).
- Add getRHELStream wrapper and validateOSImageStream in osstream.go,
  both delegating to GetRHELStream from stream.go.
- Write os-stream key to the token secret for future ignition-server
  consumption.
- Infer status.osImageStream from Machine NodeInfo.OSImage using
  majority consensus (rhcosStreamFromOSImage, osImageStreamFromMachines,
  setOSImageStreamStatus in version.go).
- Pass resolved stream to defaultNodePoolAMI / setAWSConditions for
  consistent boot image resolution.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Scan NodePool spec.config ConfigMaps for ContainerRuntimeConfig
resources with defaultRuntime set to "runc". When detected:
- Implicit stream on OCP >= 5.0 falls back to rhel-9 (instead of rhel-10)
- Explicit rhel-10 + runc returns a validation error

This replaces the hardcoded usesRunc=false TODO(CNTRLPLANE-3553) in
getRHELStream, validateOSImageStream, and the config hash normalization
in NewConfigGenerator.

JIRA: CNTRLPLANE-3553

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@openshift-merge-bot

Contributor

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after the lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will use /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To manually trigger all second-stage jobs, use the /pipeline required command.

This repository is configured in: LGTM mode

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jun 25, 2026
@openshift-ci-robot

openshift-ci-robot commented Jun 25, 2026


@sdminonne: This pull request references CNTRLPLANE-3553 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 25, 2026
@openshift-ci

openshift-ci Bot commented Jun 25, 2026

Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@coderabbitai

coderabbitai Bot commented Jun 25, 2026

Contributor
Walkthrough

NodePool reconciliation now infers status.osImageStream from machine OS images, validates requested OS image streams before machine config generation, and resolves RHEL stream values through shared helpers. The resolved stream is stored on ConfigGenerator, included in hash inputs, and passed into AWS and GCP image selection as well as token secret and Karpenter AMI label generation. GetRHELStream now defaults older releases with no explicit stream to StreamRHEL9, and tests were updated for the new stream resolution, validation, status, and hashing behavior.
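
The hash normalization mentioned above can be sketched as follows, under the assumption that "normalizing" means mapping an explicit stream that equals the release default back to the empty string before hashing. defaultStream, normalizeStream, and configHash are hypothetical names, and the FNV-32a usage here is illustrative rather than the repository's shared hash helper.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// defaultStream returns the stream an implicit NodePool would get
// (assumed rule: rhel-10 only on OCP >= 5.0 without runc).
func defaultStream(major uint64, usesRunc bool) string {
	if major >= 5 && !usesRunc {
		return "rhel-10"
	}
	return "rhel-9"
}

// normalizeStream maps an explicit stream that matches the default to "",
// so that spelling out the default does not change the hash.
func normalizeStream(requested string, major uint64, usesRunc bool) string {
	if requested == defaultStream(major, usesRunc) {
		return ""
	}
	return requested
}

// configHash hashes the config payload together with the normalized stream.
func configHash(config, requested string, major uint64, usesRunc bool) string {
	h := fnv.New32a()
	// hash.Hash documents that Write never returns an error.
	h.Write([]byte(config))
	h.Write([]byte{0}) // separator between the two inputs
	h.Write([]byte(normalizeStream(requested, major, usesRunc)))
	return fmt.Sprintf("%08x", h.Sum32())
}

func main() {
	// Explicit rhel-9 with runc on 5.x hashes like the implicit default.
	fmt.Println(configHash("cfg", "rhel-9", 5, true) == configHash("cfg", "", 5, true))
	// Explicit rhel-9 without runc on 5.x is a real override and is kept.
	fmt.Println(normalizeStream("rhel-9", 5, false))
}
```

The point of the design is that a user who sets the default stream explicitly should not trigger a fleet-wide rollout, while a genuine override still feeds into the hash.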

Sequence Diagram(s)

Status and validation

sequenceDiagram
  participant "NodePoolReconciler.reconcile" as Reconcile
  participant "setOSImageStreamStatus" as SetOSImageStreamStatus
  participant "osImageStreamFromMachines" as OsImageStreamFromMachines
  participant "validMachineConfigCondition" as ValidMachineConfigCondition
  participant "validateOSImageStream" as ValidateOSImageStream
  participant "GetRHELStream" as GetRHELStream

  Reconcile->>SetOSImageStreamStatus: infer status.osImageStream
  SetOSImageStreamStatus->>OsImageStreamFromMachines: inspect Machine NodeInfo.OSImage
  OsImageStreamFromMachines-->>SetOSImageStreamStatus: majority stream or empty
  Reconcile->>ValidMachineConfigCondition: validate requested OS image stream
  ValidMachineConfigCondition->>ValidateOSImageStream: check spec.osImageStream.name
  ValidateOSImageStream->>GetRHELStream: resolve stream from release image and runtime
  GetRHELStream-->>ValidateOSImageStream: stream or error
  ValidateOSImageStream-->>ValidMachineConfigCondition: validation result

AWS and GCP image resolution

sequenceDiagram
  participant "awsMachineTemplate" as AwsMachineTemplate
  participant "awsMachineTemplateSpec" as AwsMachineTemplateSpec
  participant "resolveAWSAMI" as ResolveAWSAMI
  participant "defaultNodePoolAMI" as DefaultNodePoolAMI
  participant "gcpMachineTemplate" as GcpMachineTemplate
  participant "gcpMachineTemplateSpec" as GcpMachineTemplateSpec
  participant "resolveGCPImage" as ResolveGCPImage
  participant "defaultNodePoolGCPImage" as DefaultNodePoolGCPImage

  AwsMachineTemplate->>AwsMachineTemplateSpec: pass c.resolvedRHELStream
  AwsMachineTemplateSpec->>ResolveAWSAMI: resolve AMI
  ResolveAWSAMI->>DefaultNodePoolAMI: default Linux/RHCOS AMI

  GcpMachineTemplate->>GcpMachineTemplateSpec: pass c.resolvedRHELStream
  GcpMachineTemplateSpec->>ResolveGCPImage: resolve image
  ResolveGCPImage->>DefaultNodePoolGCPImage: default GCP image

Token secret and AMI labels

sequenceDiagram
  participant "token secret reconciliation" as TokenSecretReconciliation
  participant "tokenSecret" as TokenSecret
  participant "setKarpenterAMILabels" as SetKarpenterAMILabels
  participant "defaultNodePoolAMI" as DefaultNodePoolAMI

  TokenSecretReconciliation->>TokenSecret: store TokenSecretOSStreamKey
  TokenSecretReconciliation->>SetKarpenterAMILabels: pass t.resolvedRHELStream
  SetKarpenterAMILabels->>DefaultNodePoolAMI: resolve AMI labels

Possibly related PRs

  • openshift/hypershift#8699: Also changes AWS AMI resolution to carry stream metadata through defaultNodePoolAMI and resolveAWSAMI.

Suggested reviewers

  • devguyio
  • muraee
🚥 Pre-merge checks | ✅ 11
✅ Passed checks (11 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly summarizes the main change: wiring usesRunc detection into RHEL stream resolution. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Stable And Deterministic Test Names | ✅ Passed | Touched test titles are static string literals; no generated names, timestamps, UUIDs, or runtime-derived values appear in any modified test title. |
| Test Structure And Quality | ✅ Passed | PASS: The PR only updates table-driven unit tests; no Ginkgo specs, cluster resources, or Eventually/BeforeEach patterns were added, so the checklist items don’t apply. |
| Topology-Aware Scheduling Compatibility | ✅ Passed | PR only changes RHEL-stream resolution/status/hash plumbing; no new affinity, topology-spread, nodeSelector, toleration, or replica scheduling logic appears in the PR files. |
| Ipv6 And Disconnected Network Test Compatibility | ✅ Passed | Not applicable: the PR adds only controller unit tests (testing/Gomega), not Ginkgo e2e tests, and I found no IPv4-only or external connectivity assumptions. |
| No-Weak-Crypto | ✅ Passed | Touched files add no MD5/SHA1/DES/RC4/3DES/Blowfish/ECB usage or secret/token comparisons; hashing still uses shared FNV helper. |
| Container-Privileges | ✅ Passed | PR files are controller Go changes only; no added privileged/hostPID/hostNetwork/hostIPC/SYS_ADMIN/allowPrivilegeEscalation settings were found in them. |
| No-Sensitive-Data-In-Logs | ✅ Passed | New logs are generic status/error messages; no added logging prints tokens, passwords, PII, hostnames, or customer data. |


@openshift-ci

openshift-ci Bot commented Jun 25, 2026

Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: sdminonne
Once this PR has been reviewed and has the lgtm label, please assign enxebre for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added area/hypershift-operator Indicates the PR includes changes for the hypershift operator and API - outside an OCP release area/platform/aws PR/issue for AWS (AWSPlatform) platform area/platform/gcp PR/issue for GCP (GCPPlatform) platform and removed do-not-merge/needs-area labels Jun 25, 2026
@codecov

codecov Bot commented Jun 25, 2026


Codecov Report

❌ Patch coverage is 69.31818% with 54 lines in your changes missing coverage. Please review.
✅ Project coverage is 42.07%. Comparing base (adccbd6) to head (6a62ea3).
⚠️ Report is 84 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...pershift-operator/controllers/nodepool/osstream.go | 78.33% | 8 Missing and 5 partials ⚠️ |
| hypershift-operator/controllers/nodepool/config.go | 52.00% | 8 Missing and 4 partials ⚠️ |
| hypershift-operator/controllers/nodepool/aws.go | 52.38% | 9 Missing and 1 partial ⚠️ |
| ...rshift-operator/controllers/nodepool/conditions.go | 0.00% | 10 Missing ⚠️ |
| ...erator/controllers/nodepool/nodepool_controller.go | 50.00% | 2 Missing and 2 partials ⚠️ |
| ...ypershift-operator/controllers/nodepool/version.go | 92.68% | 2 Missing and 1 partial ⚠️ |
| hypershift-operator/controllers/nodepool/gcp.go | 75.00% | 1 Missing ⚠️ |
| hypershift-operator/controllers/nodepool/token.go | 83.33% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #8832      +/-   ##
==========================================
+ Coverage   41.86%   42.07%   +0.21%     
==========================================
  Files         759      770      +11     
  Lines       94101    97005    +2904     
==========================================
+ Hits        39392    40813    +1421     
- Misses      51949    53367    +1418     
- Partials     2760     2825      +65     
| Files with missing lines | Coverage Δ |
|---|---|
| hypershift-operator/controllers/nodepool/stream.go | 100.00% <100.00%> (ø) |
| hypershift-operator/controllers/nodepool/gcp.go | 66.21% <75.00%> (-0.30%) ⬇️ |
| hypershift-operator/controllers/nodepool/token.go | 82.70% <83.33%> (+0.10%) ⬆️ |
| ...ypershift-operator/controllers/nodepool/version.go | 94.11% <92.68%> (-0.97%) ⬇️ |
| ...erator/controllers/nodepool/nodepool_controller.go | 44.25% <50.00%> (+1.07%) ⬆️ |
| hypershift-operator/controllers/nodepool/aws.go | 78.05% <52.38%> (-2.12%) ⬇️ |
| ...rshift-operator/controllers/nodepool/conditions.go | 53.27% <0.00%> (-0.66%) ⬇️ |
| hypershift-operator/controllers/nodepool/config.go | 81.96% <52.00%> (-3.56%) ⬇️ |
| ...pershift-operator/controllers/nodepool/osstream.go | 78.33% <78.33%> (ø) |

... and 48 files with indirect coverage changes

| Flag | Coverage Δ |
|---|---|
| cmd-support | 35.63% <ø> (+0.49%) ⬆️ |
| cpo-hostedcontrolplane | 44.88% <ø> (+0.73%) ⬆️ |
| cpo-other | 44.94% <ø> (+1.49%) ⬆️ |
| hypershift-operator | 50.30% <69.31%> (-1.76%) ⬇️ |
| other | 31.69% <ø> (+0.13%) ⬆️ |

Flags with carried forward coverage won't be shown. Click here to find out more.


@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 2

🧹 Nitpick comments (4)
hypershift-operator/controllers/nodepool/aws_test.go (1)

301-301: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Cover the non-empty stream path in the AWS tests.

These call sites still only exercise rhelStream == "". Since the production change is about selecting AWS images from the resolved stream, add a case with OSStreams["rhel-10"] and a non-empty stream argument so awsMachineTemplateSpec/resolveAWSAMI prove they read named-stream metadata instead of the legacy default path.

As per coding guidelines, **/*_test.go: Unit test code changes and additions; include e2e tests when changes impact consumer behaviour.

Also applies to: 1344-1344

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@hypershift-operator/controllers/nodepool/aws_test.go` at line 301, The AWS
test coverage still only exercises the empty rhelStream path, so add a non-empty
stream case using OSStreams["rhel-10"] to verify the AWS image selection logic.
Update the relevant test call sites around awsMachineTemplateSpec and
resolveAWSAMI so they assert the resolved stream metadata is used instead of the
legacy default path, and keep the existing empty-stream case as a separate
baseline.

Source: Coding guidelines

hypershift-operator/controllers/nodepool/token_test.go (1)

1184-1184: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Add a named-stream Karpenter label case.

This signature update still only tests the empty-stream path. Please add a case with per-stream AWS metadata and a non-empty rhelStream so setKarpenterAMILabels is pinned to resolvedRHELStream rather than the legacy default lookup.

As per coding guidelines, **/*_test.go: Unit test code changes and additions; include e2e tests when changes impact consumer behaviour.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@hypershift-operator/controllers/nodepool/token_test.go` at line 1184, The
test coverage for setKarpenterAMILabels only exercises the empty-stream path, so
add a new table-driven case in token_test.go with per-stream AWS metadata and a
non-empty rhelStream to verify the function uses resolvedRHELStream instead of
falling back to the legacy default lookup. Extend the existing
setKarpenterAMILabels call path in the test to pass the named stream scenario
and assert the expected Karpenter AMI label resolution for that case.

Source: Coding guidelines

hypershift-operator/controllers/nodepool/config.go (1)

107-116: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Avoid shadowing err in the normalization block.

Line 109 creates a new err scoped to the if, which makes this already branchy path harder to scan and goes against the repo’s Go guidance. Reusing the outer err keeps the flow clearer.

As per coding guidelines, **/!(*.pb).go: Avoid variable shadowing.

Suggested cleanup
  if rhelStream != "" {
-		version, err := semver.Parse(releaseImage.Version())
+		var version semver.Version
+		version, err = semver.Parse(releaseImage.Version())
 		if err != nil {
 			return nil, fmt.Errorf("failed to parse release image version %q: %w", releaseImage.Version(), err)
 		}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@hypershift-operator/controllers/nodepool/config.go` around lines 107 - 116,
The normalization block in the NodePool config path is shadowing the outer err
variable by redeclaring it inside the rhelStream branch, which goes against the
Go guidance for avoiding shadowing. Update the logic in the config normalization
flow around rhelStream, semver.Parse, and usesRuncRuntime to reuse the existing
err variable instead of introducing a new scoped one, while preserving the same
error handling and return behavior.

Source: Coding guidelines

hypershift-operator/controllers/nodepool/config_test.go (1)

250-305: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Add a usesRuncRuntime normalization case here.

These new cases only pin the release-version branch. NewConfigGenerator now also normalizes against usesRuncRuntime(...), so please add a 5.x case with a ContainerRuntimeConfig setting defaultRuntime: runc and assert that explicit rhel-9 hashes like the implicit default. That is the new branch this PR introduces.

As per coding guidelines, **/*_test.go: Unit test code changes and additions; include e2e tests when changes impact consumer behaviour.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@hypershift-operator/controllers/nodepool/config_test.go` around lines 250 -
305, The hash test table in config_test.go only covers release-version-based
normalization, but NewConfigGenerator also normalizes through
usesRuncRuntime(...). Add a 5.x test case that includes a ContainerRuntimeConfig
with defaultRuntime set to runc and verify that an explicit rhel-9 OSImageStream
hashes the same as the implicit default path. Use the existing test table around
the NodePool, releaseImage, and hostedCluster cases to extend coverage for this
new normalization branch.

Source: Coding guidelines

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: b26364bd-aea7-431e-9260-ca7e560c480c

📥 Commits

Reviewing files that changed from the base of the PR and between 4c582e0 and 6a62ea3.

📒 Files selected for processing (17)
  • hypershift-operator/controllers/nodepool/aws.go
  • hypershift-operator/controllers/nodepool/aws_test.go
  • hypershift-operator/controllers/nodepool/capi_test.go
  • hypershift-operator/controllers/nodepool/conditions.go
  • hypershift-operator/controllers/nodepool/config.go
  • hypershift-operator/controllers/nodepool/config_test.go
  • hypershift-operator/controllers/nodepool/gcp.go
  • hypershift-operator/controllers/nodepool/gcp_test.go
  • hypershift-operator/controllers/nodepool/nodepool_controller.go
  • hypershift-operator/controllers/nodepool/osstream.go
  • hypershift-operator/controllers/nodepool/osstream_test.go
  • hypershift-operator/controllers/nodepool/stream.go
  • hypershift-operator/controllers/nodepool/stream_test.go
  • hypershift-operator/controllers/nodepool/token.go
  • hypershift-operator/controllers/nodepool/token_test.go
  • hypershift-operator/controllers/nodepool/version.go
  • hypershift-operator/controllers/nodepool/version_test.go

Comment on lines +40 to +46
		if err := c.Get(ctx, client.ObjectKey{
			Namespace: nodePool.Namespace,
			Name:      ref.Name,
		}, cm); err != nil {
			// If the ConfigMap doesn't exist, skip — validation catches this elsewhere.
			continue
		}

Contributor

🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Handle non-NotFound ConfigMap read failures.

This currently treats any c.Get failure as “no runc config”. A transient API error or RBAC failure will silently disable the runc check, which can make validateOSImageStream and getRHELStream resolve rhel-10 when the pool should have been rejected or retried. Only NotFound should be skipped here; other errors need to bubble up.

Suggested fix
 		if err := c.Get(ctx, client.ObjectKey{
 			Namespace: nodePool.Namespace,
 			Name:      ref.Name,
 		}, cm); err != nil {
-			// If the ConfigMap doesn't exist, skip — validation catches this elsewhere.
-			continue
+			if apierrors.IsNotFound(err) {
+				// Validation catches missing ConfigMaps elsewhere.
+				continue
+			}
+			return false, fmt.Errorf("failed to get configmap %s/%s: %w", nodePool.Namespace, ref.Name, err)
 		}

Add the corresponding k8s.io/apimachinery/pkg/api/errors import.

As per path instructions, **/*.go: Go security (prodsec-skills): Never ignore error returns.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@hypershift-operator/controllers/nodepool/osstream.go` around lines 40 - 46,
The ConfigMap lookup in the osstream logic is swallowing all c.Get failures
instead of only missing objects, which can hide transient API/RBAC errors;
update the error handling around the client Get call to skip only NotFound and
return all other errors so the failure propagates through validateOSImageStream
and getRHELStream. Add the needed k8s.io/apimachinery/pkg/api/errors import and
use the existing ref.Name/nodePool.Namespace lookup path to keep the behavior
limited to true absent ConfigMaps.

Source: Path instructions

Comment on lines +165 to +174
func (r *NodePoolReconciler) setOSImageStreamStatus(ctx context.Context, nodePool *hyperv1.NodePool) error {
	machines, err := r.getMachinesForNodePool(ctx, nodePool)
	if err != nil {
		return fmt.Errorf("failed to get Machines for OSImageStream status: %w", err)
	}

	stream := osImageStreamFromMachines(machines)
	if stream != "" {
		nodePool.Status.OSImageStream = hyperv1.OSImageStreamReference{Name: stream}
	}


🗄️ Data Integrity & Integration | 🟠 Major | 🏗️ Heavy lift

status.osImageStream can become sticky and violate the API contract.

This helper only writes nodePool.Status.OSImageStream when it finds a majority and never clears it otherwise. Once set, the value survives scale-to-zero, split rollouts, and pools that are simply running the release-default stream. The NodePoolStatus.OSImageStream contract says the field should be omitted when the pool is using the release version’s default OS images, so this will publish stale/non-actionable status to clients.

This path needs enough context to compare the observed stream against the resolved default (GetRHELStream("", releaseVersion, usesRunc)) and clear the field whenever there is no majority or the observed stream matches that default.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@hypershift-operator/controllers/nodepool/version.go` around lines 165 - 174,
In setOSImageStreamStatus, the NodePool.Status.OSImageStream field is only ever
set and never cleared, which can leave stale status behind. Update this helper
to resolve the default stream via GetRHELStream using the release version and
usesRunc context, compare it with osImageStreamFromMachines(machines), and
assign an empty OSImageStreamReference when there is no majority or the observed
stream matches the default. Keep the existing getMachinesForNodePool and
osImageStreamFromMachines flow, but ensure the status field is omitted whenever
the pool is on the release-default OS images.
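
The clearing rule described above can be sketched with plain strings. This is a minimal model under stated assumptions: `majorityStream` is a hypothetical stand-in for `osImageStreamFromMachines` (assumed here to require a strict majority), and `statusStream` is a hypothetical helper that returns the value to publish, with the empty string meaning "omit the field". The real types and the resolved default would come from the controller.

```go
package main

import "fmt"

// majorityStream returns the stream reported by a strict majority of
// machines, or "" when no stream has one. Hypothetical stand-in for
// osImageStreamFromMachines; the strict-majority rule is an assumption.
func majorityStream(streams []string) string {
	counts := map[string]int{}
	for _, s := range streams {
		if s != "" {
			counts[s]++
		}
	}
	for s, n := range counts {
		// At most one stream can exceed half, so map order does not matter.
		if n*2 > len(streams) {
			return s
		}
	}
	return ""
}

// statusStream returns the value to publish in status. It yields "" (field
// omitted) when there is no majority or the pool runs the release default.
func statusStream(observed, releaseDefault string) string {
	if observed == "" || observed == releaseDefault {
		return ""
	}
	return observed
}

func main() {
	releaseDefault := "rhel-9" // assumed result of resolving the default stream

	cases := [][]string{
		{"rhel-10", "rhel-10", "rhel-9"}, // majority differs from default
		{"rhel-9", "rhel-9"},             // majority equals default
		{"rhel-9", "rhel-10"},            // split rollout, no majority
		nil,                              // scaled to zero
	}
	for _, machines := range cases {
		fmt.Printf("%q\n", statusStream(majorityStream(machines), releaseDefault))
	}
}
```

Because the helper always returns a value to assign, including the empty one, a previously published stream cannot survive scale-to-zero or a split rollout.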

@hypershift-jira-solve-ci

hypershift-jira-solve-ci Bot commented Jun 25, 2026


I have all the evidence. Let me verify the import structure of the new file to confirm the exact gci ordering problem:

The file osstream.go has this import block (lines 4-24):

import (
    "bufio"
    "context"
    coreerrors "errors"
    "fmt"
    "io"
    "strings"

    hyperv1 "github.com/openshift/hypershift/api/hypershift/v1beta1"
    "github.com/openshift/hypershift/support/releaseinfo"

    mcfgv1 "github.com/openshift/api/machineconfiguration/v1"

    corev1 "k8s.io/api/core/v1"
    "k8s.io/apimachinery/pkg/runtime"
    runtimeserializer "k8s.io/apimachinery/pkg/runtime/serializer"
    "k8s.io/apimachinery/pkg/util/yaml"

    "github.com/blang/semver"        // <-- line 21, gci error here

    "sigs.k8s.io/controller-runtime/pkg/client"
)

The gci linter requires imports to be grouped in a specific order (stdlib, then external packages sorted consistently). The github.com/blang/semver import is in its own separate group between k8s.io/ and sigs.k8s.io/ imports, violating the expected grouping order.

And the unparam error is on usesRuncRuntime at line 29 — the function signature returns (bool, error) but the error return is always nil (the function never returns a non-nil error).

Test Failure Analysis Complete

Job Information

Test Failure Analysis

Error

hypershift-operator/controllers/nodepool/osstream.go:21:1: File is not properly formatted (gci)
	"github.com/blang/semver"
	^
hypershift-operator/controllers/nodepool/osstream.go:29:95: usesRuncRuntime - result 1 (error) is always nil (unparam)
func usesRuncRuntime(ctx context.Context, c client.Client, nodePool *hyperv1.NodePool) (bool, error) {
                                                                                               ^
2 issues: gci: 1, unparam: 1

Summary

The make lint step failed with 2 golangci-lint violations in the newly added file hypershift-operator/controllers/nodepool/osstream.go. First, a gci (import grouping and ordering) error at line 21: the github.com/blang/semver import sits in its own isolated group between the k8s.io/ and sigs.k8s.io/ import blocks, violating the project's required import ordering. Second, an unparam error at line 29: the usesRuncRuntime function declares a return type of (bool, error) but every code path returns nil for the error, so the linter flags the error return value as always nil and therefore unnecessary.

Root Cause

Issue 1 — gci (import ordering) at line 21:

The new file osstream.go has its imports organized into 6 groups separated by blank lines:

  1. stdlib (bufio, context, errors, fmt, io, strings)
  2. github.com/openshift/... (hypershift API and releaseinfo)
  3. github.com/openshift/api/... (MCO types)
  4. k8s.io/... (core, runtime, serializer, yaml)
  5. github.com/blang/semver: misplaced (standalone group between k8s.io and sigs.k8s.io)
  6. sigs.k8s.io/controller-runtime/...

The project's .golangci.yml configures gci to enforce a specific import section ordering. Third-party imports like github.com/blang/semver must be grouped together with other non-stdlib, non-k8s imports (i.e. alongside the github.com/openshift/... imports) rather than in an isolated group. The blank-line separation creates a distinct import section that gci considers improperly formatted.

Issue 2 — unparam (unused error return) at line 29:

The function usesRuncRuntime is defined as:

func usesRuncRuntime(ctx context.Context, c client.Client, nodePool *hyperv1.NodePool) (bool, error)

However, examining all code paths:

  • If nodePool.Spec.Config is empty → returns false, nil
  • If c.Get() fails → continue (error is swallowed, not returned)
  • If YAML read error → break (error swallowed)
  • If decode error → silently skipped
  • If runc found → returns true, nil
  • Default → returns false, nil

Every path returns nil for the error. The unparam linter correctly identifies that result 1 (the error) is always nil and the function signature should either return only bool or actually propagate errors.

Note: The companion file config.go also imports github.com/blang/semver (line 34) in a separate group between k8s imports and sigs.k8s.io, but it passes lint because the main golangci-lint run (outside the api/ directory) uses --new-from-rev=origin/main on the API sub-module only — the root-level run doesn't use diff-mode, so the existing config.go import ordering was already present before this PR. However, osstream.go is an entirely new file, so all its lines are flagged.

Recommendations

Fix 1 — gci (import ordering):

Reorganize the imports in osstream.go to follow the project's gci configuration. Move github.com/blang/semver into the same group as other third-party imports, and consolidate according to the project convention (typically: stdlib, then all third-party including openshift/blang/k8s.io/sigs.k8s.io separated into the configured sections). The simplest fix is to run:

golangci-lint run --fix --enable gci hypershift-operator/controllers/nodepool/osstream.go

Or manually restructure the imports to match the ordering used in passing files (e.g., nodepool.go or capi.go).

Fix 2 — unparam (always-nil error):

Two options:

  1. Remove the error return — Change the signature to func usesRuncRuntime(ctx context.Context, c client.Client, nodePool *hyperv1.NodePool) bool and update all call sites. This is the cleaner option since the function intentionally swallows errors (ConfigMap-not-found is expected, decode errors are skipped).

  2. Actually return errors — If the intent is to propagate c.Get() failures or decode errors in the future, change the continue/break paths to return the error. This preserves the current signature but changes the function's behavior.

Option 1 is recommended since the current error-swallowing is intentional (comments say "validation catches this elsewhere"), and this aligns with the linter's finding.

Evidence
| Evidence | Detail |
|---|---|
| Failing step | Run make lint in GitHub Actions job lint / Lint |
| Linter version | golangci-lint 2.11.4 (go1.25.7) |
| Error count | 2 issues: 1 gci, 1 unparam |
| File | hypershift-operator/controllers/nodepool/osstream.go (new file in PR) |
| gci error location | Line 21, column 1: "github.com/blang/semver" import misplaced |
| unparam error location | Line 29, column 95: usesRuncRuntime error return is always nil |
| Active linters | dupword, durationcheck, errcheck, errorlint, fatcontext, gci, gocyclo, govet, ineffassign, misspell, nilerr, noctx, staticcheck, unparam, unused, usestdlibvars |
| Exit code | make: *** [Makefile:121: lint] Error 1 (exit code 2) |
| Run URL | https://github.com/openshift/hypershift/actions/runs/28174068709/job/83445532641 |


Labels

area/hypershift-operator: Indicates the PR includes changes for the hypershift operator and API - outside an OCP release
area/platform/aws: PR/issue for AWS (AWSPlatform) platform
area/platform/gcp: PR/issue for GCP (GCPPlatform) platform
do-not-merge/work-in-progress: Indicates that a PR should not merge because it is a work in progress.
jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.
