
CNTRLPLANE-3631: Predictable NodePool Rollout Control#2042

Open
csrwng wants to merge 1 commit into openshift:master from csrwng:cntrlplane-3631

@csrwng csrwng commented Jun 15, 2026

Contributor

Summary

Enhancement proposal for OCPSTRAT-3298 — Predictable NodePool Rollout Control for Hosted Control Planes.

Today, the HyperShift NodePool controller uses a single hash over the entire rendered ignition config to drive rollout decisions. Any change — including automated management-side image digest bumps (e.g., HAProxy) — triggers a full Replace rollout of all worker nodes. This has caused production incidents.

This enhancement proposes:

  • A rollout hash derived only from spec-driven inputs (user MachineConfigs, release version, pull secret, trust bundle, global config), excluding management-side content
  • A new nodePoolCurrentRolloutConfig annotation with safe first-reconcile seeding for existing NodePools
  • A ConfigUpdatePending condition for observability into management-side configuration drift
  • Support for both Replace and InPlace upgrade strategies
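The first-reconcile seeding in the second bullet could look roughly like the Go sketch below. It assumes that a NodePool without the annotation is seeded with the current rollout hash and is not rolled, and that the annotation is only advanced once a rollout completes. `decideRollout` and the bare annotation key are hypothetical names for illustration, not HyperShift code.

```go
package main

import "fmt"

// rolloutAnnotation is the annotation named in the proposal. The exact key
// format used by the controller is not shown in this thread (assumption).
const rolloutAnnotation = "nodePoolCurrentRolloutConfig"

// decideRollout is a hypothetical helper, not HyperShift code. It returns the
// annotations to persist and whether a rollout should start.
func decideRollout(annotations map[string]string, rolloutHash string) (map[string]string, bool) {
	// Copy the map so the caller's annotations are never mutated.
	out := make(map[string]string, len(annotations)+1)
	for k, v := range annotations {
		out[k] = v
	}
	current, seeded := out[rolloutAnnotation]
	if !seeded {
		// Existing NodePool seen for the first time: record the hash, do not roll.
		out[rolloutAnnotation] = rolloutHash
		return out, false
	}
	// The annotation is left untouched here; in this sketch it is only
	// advanced elsewhere, once the rollout has completed.
	return out, current != rolloutHash
}

func main() {
	anns, roll := decideRollout(map[string]string{}, "a1b2c3")
	fmt.Println("seeded:", anns[rolloutAnnotation], "rollout:", roll)

	_, roll = decideRollout(anns, "d4e5f6")
	fmt.Println("spec change, rollout:", roll)
}
```

Seeding without rolling keeps an operator upgrade from replacing nodes on NodePools that predate the annotation.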

Jira

Test plan

🤖 Generated with Claude Code

@openshift-ci-robot added the jira/valid-reference label Jun 15, 2026

openshift-ci-robot commented Jun 15, 2026


@csrwng: This pull request references CNTRLPLANE-3631 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set.


@openshift-ci openshift-ci Bot requested review from enxebre and sjenning June 15, 2026 20:50

openshift-ci Bot commented Jun 15, 2026

Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign csrwng for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

1. An automated image-updater bumps the HAProxy image digest in the HyperShift
operator deployment.
2. The operator reconciles all NodePools. The full `Hash()` changes (because it
includes HAProxy), so a new user-data secret is generated with the latest

Member

why is a new userdata secret generated here if there's no consumers for it?

Besides, as soon as a new userdata is created, the existing one gets deleted via token.cleanupOutdated
which would make the nodepool unable to perform scaling operations.
And the token secret gets an expiration timestamp (IgnitionServerTokenExpirationTimestampAnnotation), which will result in the payload being deleted from the ignition server

Contributor Author

You're right — this is the same issue I addressed in my response to your comment on the workflow section header. The current wording in step 2 is wrong: when only management-side content changes and no rollout is triggered, the controller should NOT create new token/user-data secrets or run cleanupOutdated(). The existing secrets remain valid and the MachineDeployment continues to reference them.

I'll rewrite this workflow to reflect the corrected behavior:

  1. Operator reconciles all NodePools. The RolloutHashWithoutVersion() does NOT change.
  2. No new token or user-data secrets are created. The existing secrets remain valid.
  3. The ConfigUpdatePending condition transitions to True with reason ManagementConfigDrift.
  4. No MachineDeployment or MachineSet spec change occurs. Existing nodes remain undisturbed. Scale-up continues to work using the existing user-data secret.
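A minimal Go sketch of how step 3 of the corrected workflow might be evaluated. It assumes the controller can compare the full hash of the payload currently in use against the latest full hash. The `Condition` struct is a simplified stand-in for the Kubernetes condition type, and the `AsExpected` reason and the function name are hypothetical. Only `ConfigUpdatePending` and `ManagementConfigDrift` come from the proposal.

```go
package main

import "fmt"

// Condition is a simplified stand-in for a Kubernetes status condition.
type Condition struct {
	Type   string
	Status string
	Reason string
}

// configUpdatePending is a hypothetical helper. It reports drift only when the
// rollout hash is unchanged but the full (payload) hash has moved, i.e. only
// management-side content differs from what the nodes were provisioned with.
func configUpdatePending(appliedRollout, latestRollout, appliedFull, latestFull string) Condition {
	c := Condition{Type: "ConfigUpdatePending", Status: "False", Reason: "AsExpected"}
	if appliedRollout == latestRollout && appliedFull != latestFull {
		c.Status = "True"
		c.Reason = "ManagementConfigDrift"
	}
	return c
}

func main() {
	// HAProxy bump: full hash changes, rollout hash does not.
	fmt.Printf("%+v\n", configUpdatePending("r1", "r1", "f1", "f2"))
	// Nothing changed.
	fmt.Printf("%+v\n", configUpdatePending("r1", "r1", "f1", "f1"))
}
```

When the rollout hash itself differs, this sketch leaves the condition `False` on the assumption that the spec-driven rollout path reports progress through its own condition.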

`nodePoolCurrentRolloutConfig` annotation.
4. The MachineDeployment spec is updated with the new version and
`DataSecretName`, triggering a CAPI rolling Replace.
5. When the rollout completes (`MachineDeploymentComplete()`), the controller

Member

can we articulate the steps for what happens if during a rollout the service provider config changes again.

Contributor Author

Good question. I'll add a fourth workflow scenario covering this. The behavior:

Mid-rollout management-side config change:

  1. A spec-driven rollout is in progress — the MachineDeployment is rolling, UpdatingConfig condition is True.
  2. While the rollout is running, a management-side change occurs (e.g., HAProxy image bump).
  3. The rollout hash has NOT changed (management-side content is excluded), so no new rollout is triggered.
  4. The in-progress rollout continues using the existing token/user-data secrets. No new secrets are created (rollout hash unchanged).
  5. When the rollout completes, the nodePoolCurrentRolloutConfig annotation is updated to the current rollout hash. The nodes that were just replaced have the ignition payload that was generated at the start of the rollout — they do NOT automatically pick up the mid-rollout management-side change.
  6. The ConfigUpdatePending condition may transition to True if the management-side change means the current payload differs from what newly-created nodes would get on a fresh provision.

Mid-rollout spec-driven config change:

This is an existing behavior and is unchanged by this enhancement. CAPI handles this via the MachineDeployment's rolling update strategy — the new desired state supersedes the in-progress one, and CAPI continues rolling until all machines match the latest template.


### Goals

1. Management-plane image digest bumps (e.g., HAProxy, CPO) MUST NOT trigger

Member

can we be specific about what things owned by the platform might cause a rollout today? haproxy dataplane image/overrides. What else?
It's also probably worth mentioning that haproxy dataplane image bumping is a particular exception because of early haproxy needs. There's no reason for that image to not come from payload now, unless we ever support shared ingress in selfhosted.

Also should probably include as still a risk to be addressed separately that some config.openshift.io changes to defaults might result in accidental intent for a rollout

Contributor Author

Good points. I'll update the Goals section to enumerate the specific platform-owned inputs that can trigger rollouts today:

  1. HAProxy data plane image — the kube-apiserver-proxy static pod image. For shared ingress clusters (ROSA HCP, ARO HCP), this comes from the operator's IMAGE_SHARED_INGRESS_HAPROXY env var rather than the NodePool's release payload. For non-shared-ingress (self-hosted) clusters, it already comes from the NodePool's release payload (haproxy-router component), so it's not an issue there. This is a historical artifact from early shared ingress bootstrapping — there's no reason it can't come from the payload now.

  2. Registry overrides applied to the HAProxy image: --registry-overrides on the management cluster rewrites the image reference that gets embedded in the ignition payload, even though data plane CRI-O handles mirroring natively (tracked in OCPBUGS-86415).

  3. config.openshift.io computed defaults — the globalConfigString() function reconciles proxy and image configs with platform-specific defaults (e.g., Status.NoProxy entries like network CIDRs, 169.254.169.254 for AWS). If the operator code changes these defaults, the serialized config changes and triggers a rollout even though the user's spec didn't change.

For (3), the two-hash architecture can address this by hashing only the user's raw spec inputs (HostedCluster.Spec.Configuration.Proxy, HostedCluster.Spec.Configuration.Image) in the rollout hash — without reconciliation or computed defaults. The full reconciled config continues to be used for payload generation. This follows the same pattern as HAProxy: user intent drives rollouts, platform-computed values flow into the payload silently.

I'll update the enhancement to cover this as part of the rollout hash design rather than calling it out as a separate risk.
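To illustrate the idea for (3), here is a rough Go sketch that hashes the user's raw proxy spec for rollout decisions and the reconciled config for payload generation. `ProxySpec`, `withDefaults`, `rolloutHash` and `payloadHash` are simplified, hypothetical stand-ins and do not reproduce the real `globalConfigString()` logic. JSON plus SHA-256 is also just an assumed serialization.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
	"strings"
)

// ProxySpec is a simplified stand-in for the user-set proxy configuration.
type ProxySpec struct {
	HTTPProxy string `json:"httpProxy,omitempty"`
	NoProxy   string `json:"noProxy,omitempty"`
}

// hashOf returns a short hex digest of the JSON form of v.
func hashOf(v interface{}) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err) // not expected for the plain structs used here
	}
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:])[:8]
}

// withDefaults mimics platform defaulting by appending entries to NoProxy.
func withDefaults(spec ProxySpec, defaults []string) ProxySpec {
	entries := []string{}
	if spec.NoProxy != "" {
		entries = append(entries, spec.NoProxy)
	}
	entries = append(entries, defaults...)
	spec.NoProxy = strings.Join(entries, ",")
	return spec
}

// rolloutHash covers only what the user wrote.
func rolloutHash(spec ProxySpec) string { return hashOf(spec) }

// payloadHash covers the reconciled config, including computed defaults.
func payloadHash(spec ProxySpec, defaults []string) string {
	return hashOf(withDefaults(spec, defaults))
}

func main() {
	spec := ProxySpec{HTTPProxy: "http://proxy.example.com:3128", NoProxy: ".cluster.local"}
	oldDefaults := []string{"169.254.169.254"}
	newDefaults := []string{"169.254.169.254", "10.0.0.0/16"}

	fmt.Println("rollout hash:", rolloutHash(spec))
	fmt.Println("payload hash changed:", payloadHash(spec, oldDefaults) != payloadHash(spec, newDefaults))
}
```

In this sketch a change to the operator's defaults moves only the payload hash, while an edit to the user's spec moves both.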

not yet applied to existing nodes.

### Workflow Description

Member

can we articulate how each workflow impacts the lifecycle of the token secrets and the generation and expiration of the payload cache?

Contributor Author

Good call — this is a gap in the current design that needs to be addressed. Let me trace the problem:

When only management-side content changes (e.g., HAProxy bump):

  1. Hash() changes → isOutdated() returns true
  2. cleanupOutdated() expires the old token secret (2hr TTL) and deletes the old user-data secret
  3. New token/user-data secrets are created with names based on the new Hash()
  4. No rollout triggered (rollout hash unchanged) → MachineDeployment still references the old (now deleted) user-data secret
  5. Scale-up would reference a non-existent secret

The fix: when the rollout hash has not changed, the controller should not create new token/user-data secrets or cleanup existing ones. The existing secrets remain valid — they contain a working ignition payload, and the MachineDeployment continues to reference them. Scale-up nodes get the same config as existing nodes, which is the correct behavior since we're explicitly choosing not to roll out.

For spec-driven rollouts, the lifecycle stays the same as today: new secrets are created, old ones are expired/cleaned up, and the MachineDeployment is updated to reference the new secret.

I'll add a "Token secret and payload cache lifecycle" section to each workflow scenario covering this.
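The fix described above can be sketched as a small planning function in Go. This is an illustration under assumptions: `SecretPlan`, `planSecrets` and the `user-data-<hash>` naming scheme are hypothetical, and the real controller also manages the token secret and its expiration annotation, which are folded into a single `ExpireOld` flag here.

```go
package main

import "fmt"

// SecretPlan describes what the controller would do with bootstrap secrets.
type SecretPlan struct {
	CreateUserData string // name of the new user-data secret, empty if none
	ExpireOld      bool   // expire the old token secret and delete old user-data
	DataSecretName string // what the MachineDeployment should reference
}

// planSecrets is a hypothetical helper. Secrets are only regenerated when the
// rollout hash changes; otherwise the existing secret stays in place so that
// scale-up keeps working.
func planSecrets(currentSecret, appliedRollout, latestRollout, latestFull string) SecretPlan {
	if appliedRollout == latestRollout {
		// Management-side change only (or no change): keep everything as is.
		return SecretPlan{DataSecretName: currentSecret}
	}
	// Spec-driven rollout: same lifecycle as today, named after the full hash.
	name := "user-data-" + latestFull
	return SecretPlan{CreateUserData: name, ExpireOld: true, DataSecretName: name}
}

func main() {
	fmt.Printf("%+v\n", planSecrets("user-data-f1", "r1", "r1", "f2")) // HAProxy bump
	fmt.Printf("%+v\n", planSecrets("user-data-f1", "r1", "r2", "f3")) // spec change
}
```

Gating both creation and cleanup on the same check avoids the failure traced above, where the MachineDeployment ends up pointing at a deleted user-data secret.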

return cg.doParse(configs, cg.haproxyRawConfig)
}

func (cg *ConfigGenerator) parseWithoutHaproxy(configs []corev1.ConfigMap) (string, error) {

Member

might want to name this after rollout vs non rollout so it's extendable

Contributor Author

Agreed. I'll rename the table headers and descriptions to use "rollout" vs "non-rollout" terminology:

| Hash | Category | Inputs | Used for |
| --- | --- | --- | --- |
| Hash() / HashWithoutVersion() | Non-rollout | Full MCO config including HAProxy, pull secret name, additional trust bundle name, reconciled global config | User-data secret naming, payload generation |
| RolloutHash() / RolloutHashWithoutVersion() | Rollout | MCO config excluding HAProxy, pull secret name, additional trust bundle name, user-set global config (proxy spec, image spec, without computed defaults) | Rollout decisions |

This makes it clearer that the categories are extensible — if new management-side content is added in the future, it goes into the "non-rollout" hash only.
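One way to keep the categories extensible is to tag each input and derive both hashes from the same list. The Go sketch below is illustrative only: the `Input` and `Category` types, the length-prefixed encoding and the example values are assumptions, and the function names merely echo the table rather than the actual `ConfigGenerator` methods.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Category marks whether an input participates in rollout decisions.
type Category int

const (
	Rollout Category = iota
	NonRollout
)

// Input is one named piece of configuration feeding the hashes.
type Input struct {
	Name     string
	Value    string
	Category Category
}

// hashInputs hashes the selected inputs in slice order. Names and values are
// length-prefixed so that adjacent fields cannot be confused with each other.
func hashInputs(inputs []Input, include func(Input) bool) string {
	h := sha256.New()
	for _, in := range inputs {
		if !include(in) {
			continue
		}
		fmt.Fprintf(h, "%d:%s=%d:%s;", len(in.Name), in.Name, len(in.Value), in.Value)
	}
	return hex.EncodeToString(h.Sum(nil))[:8]
}

// FullHash covers every input (the "non-rollout" hash in the table).
func FullHash(inputs []Input) string {
	return hashInputs(inputs, func(Input) bool { return true })
}

// RolloutHash covers only inputs tagged Rollout.
func RolloutHash(inputs []Input) string {
	return hashInputs(inputs, func(in Input) bool { return in.Category == Rollout })
}

func main() {
	inputs := []Input{
		{Name: "release", Value: "4.x.y", Category: Rollout},
		{Name: "userMachineConfig", Value: "kubelet-tuning", Category: Rollout},
		{Name: "haproxyImage", Value: "sha256:aaa", Category: NonRollout},
	}
	fmt.Println(FullHash(inputs), RolloutHash(inputs))
}
```

With this shape, adding a new management-side input means appending one `NonRollout` entry, and the rollout hash is unaffected.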

Comment on lines +40 to +41
ignition payload so new nodes always receive the latest configuration, but they
no longer trigger rollouts of existing nodes.

Contributor

This doesn't sound correct. When only HAProxy changes:

  • A new user-data secret IS generated (different name from Hash())
  • But propagateVersionAndTemplate does NOT update MachineDeployment.Spec.Template.Spec.Bootstrap.DataSecretName (because rollout hash didn't change)
  • Therefore scale-up nodes created by the MachineDeployment will reference the OLD DataSecretName
  • Scale-up nodes get stale management-side config, not "the full payload"

Contributor Author

You're right — the summary as written is incorrect. With the corrected design (discussed in the thread with @enxebre), when only management-side content changes, no new token/user-data secrets are created at all. The existing secrets remain valid, and the MachineDeployment continues to reference them. Scale-up nodes get the same config as existing nodes — which is the correct and intended behavior, since we're explicitly choosing not to roll out.

I'll update the summary to reflect this accurately: "Management-side changes do not trigger rollouts, and both existing and scale-up nodes retain the current configuration until the next spec-driven rollout."

Introduces a two-hash architecture in the NodePool controller to decouple
rollout decisions from management-side configuration changes. A new "rollout
hash" derived only from customer-facing spec inputs (user MachineConfigs,
release version, pull secret, trust bundle, user-set proxy/image config)
determines whether to trigger Replace or InPlace rollouts. Management-side
changes (HAProxy image bumps, registry overrides, config.openshift.io computed
defaults) no longer trigger rollouts.

Tracking: CNTRLPLANE-3631
@csrwng csrwng force-pushed the cntrlplane-3631 branch from 8703185 to f1de573 Compare June 16, 2026 18:14
@openshift-ci

openshift-ci Bot commented Jun 16, 2026

Contributor

@csrwng: all tests passed!


reference embedded in the ignition payload, even though data plane CRI-O
handles mirroring natively
([OCPBUGS-86415](https://issues.redhat.com/browse/OCPBUGS-86415)).
- **`config.openshift.io` computed defaults**: The `globalConfigString()`

Member

how would we differentiate default changes from user intent changes? I don't think this is addressed by this enhancement

updated to the current rollout hash. The nodes that were replaced have the
ignition payload that was generated at the start of the rollout — they do NOT
automatically pick up the mid-rollout management-side change.
6. The `ConfigUpdatePending` condition may transition to `True` if the

@enxebre enxebre Jun 22, 2026

Member

can we detail how this would be implemented in the impl details section?
how does the controller decide what to set to current not rollout config, if there are two targets: the latest (written in the target annotation) and the one in flight (which is no longer stored in the annotation because the latest overrode it)?
