GCP-841: remove ClusterResourceSet feature gate from CAPG manager args by cristianoveiga · Pull Request #8795 · openshift/hypershift

cristianoveiga · 2026-06-22T13:47:21Z

Summary

- Removes ClusterResourceSet=false from the --feature-gates arg passed to the CAPG manager
- The ClusterResourceSet feature gate was promoted to GA in CAPI 1.10 and removed in CAPI 1.12 (kubernetes-sigs/cluster-api#12950)
- OCP 4.22+ ships CAPG built against CAPI 1.12.8, so the capi-provider pod crashes at startup with: unrecognized feature gate: ClusterResourceSet
- MachinePool=false is retained; it is still valid in CAPI 1.12 (Beta, default-on)

Fixes: https://redhat.atlassian.net/browse/GCP-841
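
To illustrate why a leftover gate name is fatal at startup, here is a minimal sketch of a strict feature gate parser. It is a simplified stand-in written for this example, not the parser CAPG actually uses; the function name parseFeatureGates and the known map are hypothetical. It only assumes that the binary rejects any gate name it has not registered, which matches the error text quoted above.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseFeatureGates is a hypothetical, simplified strict parser: every
// name in arg must already be present in known, otherwise parsing fails.
func parseFeatureGates(known map[string]bool, arg string) (map[string]bool, error) {
	// Start from the registered defaults.
	out := make(map[string]bool, len(known))
	for name, def := range known {
		out[name] = def
	}
	for _, pair := range strings.Split(arg, ",") {
		name, raw, ok := strings.Cut(pair, "=")
		if !ok {
			return nil, fmt.Errorf("missing bool value for %s", name)
		}
		if _, exists := known[name]; !exists {
			// A gate removed from the registry is treated like a typo.
			return nil, fmt.Errorf("unrecognized feature gate: %s", name)
		}
		val, err := strconv.ParseBool(raw)
		if err != nil {
			return nil, fmt.Errorf("invalid value of %s=%s: %w", name, raw, err)
		}
		out[name] = val
	}
	return out, nil
}

func main() {
	// Assumed registry of a binary that no longer registers ClusterResourceSet.
	known := map[string]bool{"MachinePool": true}

	_, err := parseFeatureGates(known, "MachinePool=false,ClusterResourceSet=false")
	fmt.Println(err)

	gates, err := parseFeatureGates(known, "MachinePool=false")
	fmt.Println(gates["MachinePool"], err)
}
```

In this sketch the first call fails on ClusterResourceSet even though the requested value is false, which is why passing the gate "disabled" is not harmless once the name has been removed.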

Test plan

- Existing unit tests pass (go test ./hypershift-operator/controllers/hostedcluster/internal/platform/gcp/)
- periodic-ci-openshift-hypershift-release-4.23-periodics-e2e-v2-gke no longer fails due to capi-provider crash
- capi-provider pod starts successfully on 4.22.x and 4.23.x without a CAPG image override

🤖 Generated with Claude Code

Summary by CodeRabbit

Refactor
- Simplified GCP controller feature gate configuration by removing version-dependent logic, now using a fixed set of feature gates instead of conditionally adjusting based on payload version.

ClusterResourceSet was promoted to GA in CAPI 1.10 and removed entirely in CAPI 1.12. OCP 4.22+ ships CAPG built against CAPI 1.12.8, causing the capi-provider pod to crash at startup with:

invalid argument "MachinePool=false,ClusterResourceSet=false" for "--feature-gates" flag: unrecognized feature gate: ClusterResourceSet

Fixes: GCP-841
Signed-off-by: Cristiano Veiga <cveiga@redhat.com>
Commit-Message-Assisted-by: Claude (via Claude Code)

openshift-ci · 2026-06-22T13:47:27Z

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

openshift-merge-bot · 2026-06-22T13:47:30Z

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after the lgtm label is added, depending on the repository configuration. The pipeline controller automatically detects which contexts are required and uses /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To manually trigger all second-stage jobs, use the /pipeline required command.

This repository is configured in: LGTM mode

coderabbitai · 2026-06-22T13:47:48Z

Walkthrough

In CAPIProviderDeploymentSpec within the GCP platform controller, the featureGates variable is now initialized with a single static entry (MachinePool=false). The previous conditional logic that parsed payloadVersion and appended ClusterResourceSet=false when the major version was 4 and the minor version was greater than 16 has been removed entirely.
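
The change described in the walkthrough can be sketched roughly as follows. This is an illustration only, not the code from gcp.go: the function names, the integer version parameters, and the way the gates are joined into a single --feature-gates=... argument are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// legacyFeatureGates mirrors the removed behaviour as the walkthrough
// describes it: ClusterResourceSet=false is appended when the payload
// major version is 4 and the minor version is greater than 16.
// (Hypothetical helper; the real code parsed payloadVersion itself.)
func legacyFeatureGates(major, minor uint64) []string {
	gates := []string{"MachinePool=false"}
	if major == 4 && minor > 16 {
		gates = append(gates, "ClusterResourceSet=false")
	}
	return gates
}

// staticFeatureGates mirrors the new behaviour: a single fixed entry,
// independent of the payload version.
func staticFeatureGates() []string {
	return []string{"MachinePool=false"}
}

// featureGatesArg joins the gates into one manager argument.
// The exact argument layout is an assumption of this sketch.
func featureGatesArg(gates []string) string {
	return "--feature-gates=" + strings.Join(gates, ",")
}

func main() {
	fmt.Println(featureGatesArg(legacyFeatureGates(4, 22)))
	fmt.Println(featureGatesArg(staticFeatureGates()))
}
```

Under these assumptions, the legacy path for a 4.22 payload yields MachinePool=false,ClusterResourceSet=false, the same value shown in the crash message, while the static path yields only MachinePool=false.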

🚥 Pre-merge checks | ✅ 11

✅ Passed checks (11 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names	✅ Passed	PR does not contain Ginkgo test definitions. Modified file (gcp.go) is non-test code; codebase uses standard Go testing, not Ginkgo.
Test Structure And Quality	✅ Passed	PR modifies only non-test code (gcp.go) and contains no Ginkgo tests. Custom check for Ginkgo test quality is not applicable to this pull request.
Topology-Aware Scheduling Compatibility	✅ Passed	This PR only modifies feature gate configuration strings for CAPI 1.12 compatibility; it introduces no scheduling constraints, affinity rules, topology assumptions, or replica changes whatsoever.
Ipv6 And Disconnected Network Test Compatibility	✅ Passed	This PR does not add any Ginkgo e2e tests. It modifies only the GCP platform controller configuration to remove an obsolete feature gate, making this check not applicable.
No-Weak-Crypto	✅ Passed	PR modifies GCP feature gate configuration, not cryptographic code. No MD5, SHA1, DES, RC4, 3DES, Blowfish, ECB, custom crypto, or insecure secret comparisons detected in changes.
Container-Privileges	✅ Passed	PR contains no container privilege escalations: AllowPrivilegeEscalation=false, RunAsNonRoot=true, all capabilities dropped. Changes are only to feature gates, not security configuration.
No-Sensitive-Data-In-Logs	✅ Passed	The PR removes a feature gate flag from CAPG controller configuration. No logging statements are added, modified, or exposed. No sensitive data (credentials, tokens, PII) is logged in this change.
Title check	✅ Passed	The title accurately and specifically describes the main change: removing the ClusterResourceSet feature gate from CAPG manager arguments, which aligns with the core objective of the PR.

openshift-ci-robot · 2026-06-22T13:50:43Z

@cristianoveiga: This pull request references GCP-841 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "5.0.0" version, but no target version was set.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

codecov · 2026-06-22T13:56:18Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 42.09%. Comparing base (8019810) to head (f11fc38).
⚠️ Report is 143 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #8795      +/-   ##
==========================================
- Coverage   42.09%   42.09%   -0.01%     
==========================================
  Files         766      766              
  Lines       95047    95043       -4     
==========================================
- Hits        40012    40008       -4     
  Misses      52221    52221              
  Partials     2814     2814

Files with missing lines	Coverage Δ
...rollers/hostedcluster/internal/platform/gcp/gcp.go	`83.67% <ø> (-0.20%)`	⬇️

Flag	Coverage Δ
cmd-support	`35.42% <ø> (ø)`
cpo-hostedcontrolplane	`44.48% <ø> (ø)`
cpo-other	`44.25% <ø> (ø)`
hypershift-operator	`51.91% <ø> (-0.01%)`	⬇️
other	`31.56% <ø> (ø)`

cristianoveiga · 2026-06-22T16:43:54Z

/test e2e-v2-gke

clebs · 2026-06-25T11:07:15Z

@cristianoveiga hypershift is still on CAPI 1.11, since you are removing a feature that is still there on that version we need to make sure it is fine.

cristianoveiga · 2026-06-25T13:03:04Z

> @cristianoveiga hypershift is still on CAPI 1.11, since you are removing a feature that is still there on that version we need to make sure it is fine.

Hi @clebs,

The deployed CAPG binary comes from the OCP payload image, built separately from HyperShift's own vendor. My understanding is that these versions are not required to match.

The OpenShift CAPG fork upgraded to CAPI 1.12.8 in openshift/cluster-api-provider-gcp@e049bbd, and the new payloads (GCP HCP minimum will be 4.23) ship that binary.

ClusterResourceSet doesn't exist in any supported CAPG binary, so the fix is safe.

clebs · 2026-06-26T08:43:59Z

@cristianoveiga I see, if older CAPG versions that are still on CAPI 1.11 do not have that either, it should work fine.

/lgtm

openshift-merge-bot · 2026-06-26T08:45:02Z

Scheduling tests matching the pipeline_run_if_changed or not excluded by pipeline_skip_if_only_changed parameters:
/test e2e-aks-4-22
/test e2e-aws-4-22
/test e2e-aks
/test e2e-aws
/test e2e-aws-upgrade-hypershift-operator
/test e2e-azure-v2-self-managed
/test e2e-kubevirt-aws-ovn-reduced
/test e2e-v2-aws
/test e2e-v2-gke

hypershift-jira-solve-ci · 2026-06-26T10:57:10Z

AI Test Failure Analysis

Job: pull-ci-openshift-hypershift-main-e2e-aks | Build: 2070428924499726336 | Cost: $2.93488025 | Failed step: hypershift-azure-run-e2e

Generated by hypershift-analyze-e2e-failure post-step using Claude claude-opus-4-6

cristianoveiga · 2026-06-26T14:29:59Z

/retest-required

hypershift-jira-solve-ci · 2026-06-26T17:00:22Z

AI Test Failure Analysis

Job: pull-ci-openshift-hypershift-main-e2e-aks | Build: 2070515044726083584 | Cost: $2.9783685 | Failed step: hypershift-azure-run-e2e

Generated by hypershift-analyze-e2e-failure post-step using Claude claude-opus-4-6

cristianoveiga · 2026-06-28T14:47:22Z

/retest-required

hypershift-jira-solve-ci · 2026-06-28T16:35:56Z

AI Test Failure Analysis

Job: pull-ci-openshift-hypershift-main-e2e-aks | Build: 2071244081966616576 | Cost: $3.4274917499999993 | Failed step: hypershift-azure-run-e2e

Generated by hypershift-analyze-e2e-failure post-step using Claude claude-opus-4-6

cristianoveiga · 2026-06-29T15:15:15Z

/retest-required

hypershift-jira-solve-ci · 2026-06-29T18:25:24Z

AI Test Failure Analysis

Job: pull-ci-openshift-hypershift-main-e2e-aks | Build: 2071613649356591104 | Cost: $4.784708499999997 | Failed step: hypershift-azure-run-e2e

Generated by hypershift-analyze-e2e-failure post-step using Claude claude-opus-4-6

cristianoveiga · 2026-06-29T18:52:36Z

/test e2e-aks

cristianoveiga · 2026-06-29T18:54:27Z

/verified later by @cristianoveiga

openshift-ci-robot · 2026-06-29T18:54:38Z

@cristianoveiga: Only users can be targets for the /verified later command.

cristianoveiga · 2026-06-29T19:24:12Z

/verified bypass

openshift-ci-robot · 2026-06-29T19:24:23Z

@cristianoveiga: The verified label has been added.

hypershift-jira-solve-ci · 2026-06-29T20:43:55Z

AI Test Failure Analysis

Job: pull-ci-openshift-hypershift-main-e2e-aks | Build: 2071668215800401920 | Cost: $5.610252449999998 | Failed step: hypershift-azure-run-e2e

Generated by hypershift-analyze-e2e-failure post-step using Claude claude-opus-4-6

cristianoveiga · 2026-06-29T20:58:27Z

/test e2e-aks

hypershift-jira-solve-ci · 2026-06-29T22:03:23Z

The e2e-aks step was never started (Started: None, Finished: None) because it depends on [release:initial] which failed. The failure is entirely in CI infrastructure — the release-images-initial pod could not be scheduled on the build01 cluster for the entire 1-hour timeout period. This is unrelated to the PR's code changes.

Test Failure Analysis Complete

Job Information

Prow Job: pull-ci-openshift-hypershift-main-e2e-aks
Build ID: 2071699859362025472
Target: e2e-aks
Cluster: build01
PR: #8795, GCP-841: remove ClusterResourceSet feature gate from CAPG manager args
Started: 2026-06-29T20:58:31Z
Completed: 2026-06-29T21:58:53Z (duration: ~60 min)

Test Failure Analysis

Error

step [release:initial] failed: release "release-images-initial" failed: pod pending for more than
1h0m0s: pod has not been scheduled in 1h0m0.000126666s:
0/47 nodes are available: 1 node(s) didn't match pod anti-affinity rules, 12 node(s) didn't match
Pod's node affinity/selector, 2 node(s) were unschedulable, 32 node(s) had untolerated taint(s).
preemption: 0/47 nodes are available: No preemption victims found for incoming pod.

Summary

This failure is a CI infrastructure scheduling issue completely unrelated to the PR's code changes. The job failed before any test code ever executed. The release-images-initial pod — responsible for importing the OCP 5.0 initial release payload — could not be scheduled on the build01 cluster for the entire 1-hour timeout. All 47–55 available nodes were excluded due to a combination of untolerated taints (~30–40 nodes), node affinity/selector mismatches (12 nodes), unschedulable nodes (1–7 nodes), and pod anti-affinity rules (1 node). Because the e2e-aks test step depends on [release:initial], it was never started.

Root Cause

The root cause is CI build cluster resource exhaustion / scheduling constraints on build01. The release-images-initial pod was created at 20:58:51Z and remained in Pending state for the full 60-minute timeout until 21:58:51Z, when ci-operator aborted the job with reason executing_graph:step_failed:importing_release:running_pod:pod_pending.

Across the 78 scheduling events recorded, the scheduler consistently could not place the pod because:

- ~30–40 nodes had untolerated taints: These nodes are reserved for other workloads (e.g., different CI profiles or infrastructure components) and cannot accept this pod without matching tolerations.
- 12 nodes didn't match Pod's node affinity/selector: The pod has node affinity rules (likely requiring amd64 architecture and specific worker labels) that excluded these nodes.
- 1–7 nodes were unschedulable: These nodes were cordoned for maintenance or draining.
- 1 node didn't match pod anti-affinity rules: The pod has anti-affinity constraints preventing co-location with certain other pods.

The multiarch-tuning-operator processed the pod correctly (gated it, detected amd64 architecture, removed the gate at 20:58:54Z), but after that the Kubernetes scheduler was never able to find a suitable node. The node counts fluctuated throughout the hour (47–55 total nodes), but the combination of constraints always eliminated all candidates.

This is a transient infrastructure condition. The actual test step e2e-aks was never reached — it has a dependency on [release:initial] and its Started and Finished timestamps are both None.

The PR changes (removing the ClusterResourceSet feature gate from CAPG manager args) were never tested because the failure occurred during release image import, long before any HyperShift or CAPG code was executed.

Recommendations

- Retry the job: This is a transient CI infrastructure issue. Use /retest or /test e2e-aks to re-trigger the job. The cluster scheduling pressure is likely to have resolved.
- No code changes needed: The PR's changes to remove the ClusterResourceSet feature gate are not implicated in this failure in any way.
- If retries continue to fail: Escalate to the CI infrastructure team (#forum-ocp-crt on Slack) about scheduling capacity on the build01 cluster, particularly for release-image import pods that require amd64 workers without restrictive taints.

Evidence

Evidence	Detail
Failed Step	`[release:initial]` — Import the release payload "initial" from an external source
Pod Name	`release-images-initial`
Pod Status	`Pending` for 60m0s (full timeout)
Build Cluster	`build01`
Scheduling Events	78 events, all showing no schedulable nodes
Node Exclusions	~30-40 untolerated taints, 12 affinity mismatches, 1-7 unschedulable, 1 anti-affinity conflict
e2e-aks Step	Never started (Started: None, Finished: None)
Job Reason	`executing_graph:step_failed:importing_release:running_pod:pod_pending`
Release Being Imported	`registry.ci.openshift.org/ocp/release-5:5.0.0-0.ci-2026-06-25-122017`
All Images Built	✅ hypershift, hypershift-operator, hypershift-cli, hypershift-tests — all succeeded

cristianoveiga · 2026-06-30T12:24:47Z

/test e2e-aks

openshift-ci · 2026-06-30T14:46:55Z

[APPROVALNOTIFIER] This PR is APPROVED

Approval requirements bypassed by manually added approval.

This pull-request has been approved by: cristianoveiga

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci · 2026-06-30T15:09:48Z

@cristianoveiga: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 22, 2026

openshift-ci Bot added the do-not-merge/needs-area label Jun 22, 2026

openshift-ci Bot added the area/hypershift-operator (Indicates the PR includes changes for the hypershift operator and API - outside an OCP release) and area/platform/gcp (PR/issue for GCP (GCPPlatform) platform) labels and removed the do-not-merge/needs-area label Jun 22, 2026

cristianoveiga changed the title ~~fix(gcp): remove ClusterResourceSet feature gate from CAPG manager args~~ GCP-841: remove ClusterResourceSet feature gate from CAPG manager args Jun 22, 2026

openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jun 22, 2026

cristianoveiga marked this pull request as ready for review June 22, 2026 15:47

openshift-ci Bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 22, 2026

openshift-ci Bot requested review from clebs and jimdaga June 22, 2026 15:47

openshift-ci Bot assigned clebs Jun 26, 2026

openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label Jun 26, 2026

openshift-ci-robot added the verified Signifies that the PR passed pre-merge verification criteria label Jun 29, 2026

csrwng added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 30, 2026

openshift-merge-bot Bot merged commit 4d087b3 into openshift:main Jun 30, 2026
41 checks passed

hypershift-jira-solve-ci Bot mentioned this pull request Jul 1, 2026

OCPCLOUD-3261: feat(cloud providers): inject centralized TLS configuration #8864

