Bug OCPBUGS-94106: Fall back to TLSProfileIntermediateType ciphers when observedConfig is empty during bootstrap#1645

Open
redhat-chai-bot wants to merge 1 commit into
openshift:mainfrom
redhat-chai-bot:fix-bootstrap-cipher-suites-fallback

@redhat-chai-bot redhat-chai-bot commented Jun 30, 2026


Summary

During bootstrap, observedConfig is initially empty because the config observation controller hasn't converged yet. The getCipherSuites() function in pkg/etcdenvvar/etcd_env.go hard-errors when observedConfig.servingInfo.cipherSuites is empty, causing the EnvVarController to report Degraded. This blocks the InstallerController from creating the first etcd static pod revision, and if the config observer doesn't converge fast enough, bootstrap times out with zero control-plane etcd members started.

This has been observed as an intermittent (~19%) bootstrap failure in OCP 5.0 nightly CI since June 25, 2026.

Fixes: https://redhat.atlassian.net/browse/OCPBUGS-94106

Changes

pkg/etcdenvvar/etcd_env.go (+20, -1):

  • getCipherSuites(): When observedCipherSuites is empty (config observer hasn't converged), fall back to TLSProfileIntermediateType cipher suites — the same defaults used by the render/bootstrap path in pkg/cmd/render/env.go:getTLSCipherSuites(). Logs a warning when falling back. If ciphers were present but none were supported by etcd, the function still errors (distinguishing bootstrap-empty from genuinely-bad-config).
  • getObservedTLSMinVersion(): When observedMinTLSVersion is empty, fall back to TLS 1.2 (the TLSProfileIntermediateType default). Logs a warning.
  • Fixed typo in error message: "no supported cipherSuites not found" → "no supported cipherSuites found".
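For reviewers, the fallback behavior listed above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: resolveCipherSuites, supportedCiphers, and intermediateDefaults are hypothetical names, and the cipher lists are illustrative subsets rather than the operator's actual SupportedEtcdCiphers or TLSProfileIntermediateType contents.

```go
package main

import (
	"errors"
	"log"
	"strings"
)

// supportedCiphers is a hypothetical stand-in for the operator's
// SupportedEtcdCiphers filter (illustrative subset only).
var supportedCiphers = map[string]bool{
	"TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256": true,
	"TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256":   true,
	"TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384":   true,
}

// intermediateDefaults is a hypothetical stand-in for the
// TLSProfileIntermediateType cipher list (illustrative subset only).
var intermediateDefaults = []string{
	"TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256",
	"TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256",
}

// resolveCipherSuites returns a comma-separated cipher list. An empty
// observed list is treated as "observer has not converged yet" and falls
// back to the defaults; a non-empty list with no supported entry is
// treated as a bad configuration and still returns an error.
func resolveCipherSuites(observed []string) (string, error) {
	if len(observed) == 0 {
		log.Printf("warning: observed cipherSuites empty, falling back to intermediate defaults")
		observed = intermediateDefaults
	}
	var kept []string
	for _, c := range observed {
		if supportedCiphers[c] {
			kept = append(kept, c)
		}
	}
	if len(kept) == 0 {
		return "", errors.New("no supported cipherSuites found")
	}
	return strings.Join(kept, ","), nil
}
```

Calling resolveCipherSuites(nil) in this sketch yields the two default suites joined by a comma, while a list containing only unsupported names returns the error. The key design point is that the empty check happens before filtering, so the two failure causes stay distinguishable.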

pkg/etcdenvvar/etcd_env_test.go (+98):

  • Added TestGetCipherSuites with 3 table-driven test cases: (a) populated observedConfig returns expected ciphers, (b) empty observedConfig falls back to IntermediateType defaults, (c) observedConfig with unsupported ciphers still errors.
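The three cases (a) to (c) could be laid out in a table along these lines. This is a self-contained sketch under stated assumptions, not the test added in this PR: filterCiphers stands in for getCipherSuites, and sketchSupported, sketchDefaults, cipherCase, and checkCase are hypothetical names with illustrative cipher values.

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// Hypothetical stand-ins for the supported-cipher filter and the
// intermediate-profile defaults (illustrative values only).
var (
	sketchSupported = map[string]bool{
		"TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256": true,
		"TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384": true,
	}
	sketchDefaults = []string{
		"TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256",
		"TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384",
	}
)

// filterCiphers is a simplified stand-in for the function under test.
func filterCiphers(observed []string) (string, error) {
	if len(observed) == 0 {
		observed = sketchDefaults // bootstrap fallback
	}
	var kept []string
	for _, c := range observed {
		if sketchSupported[c] {
			kept = append(kept, c)
		}
	}
	if len(kept) == 0 {
		return "", fmt.Errorf("no supported cipherSuites found")
	}
	return strings.Join(kept, ","), nil
}

type cipherCase struct {
	name     string
	observed []string
	want     string
	wantErr  bool
}

var cipherCases = []cipherCase{
	{name: "populated observedConfig", observed: []string{"TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384"}, want: "TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384"},
	{name: "empty observedConfig falls back", want: "TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384"},
	{name: "unsupported ciphers still error", observed: []string{"TLS_RSA_WITH_RC4_128_SHA"}, wantErr: true},
}

// checkCase runs one table entry and reports a mismatch as an error.
func checkCase(tc cipherCase) error {
	got, err := filterCiphers(tc.observed)
	if (err != nil) != tc.wantErr {
		return fmt.Errorf("%s: unexpected error state: %v", tc.name, err)
	}
	if got != tc.want {
		return fmt.Errorf("%s: got %q, want %q", tc.name, got, tc.want)
	}
	return nil
}

// TestCipherFallbackSketch shows the table-driven shape with subtests.
func TestCipherFallbackSketch(t *testing.T) {
	for _, tc := range cipherCases {
		t.Run(tc.name, func(t *testing.T) {
			if err := checkCase(tc); err != nil {
				t.Fatal(err)
			}
		})
	}
}
```

Keeping the subtest names as static strings matches the pre-merge check on deterministic test names.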

Security Analysis

No security risk:

  • The fallback uses the same cipher suites that bootstrap etcd already runs with via the render path
  • Both paths filter through SupportedEtcdCiphers (same etcd cipher validation)
  • On default clusters, the fallback produces identical ciphers to what observedConfig would eventually contain
  • The fallback is temporary — once observedConfig converges, the normal path takes over
  • The current failure mode (no etcd → no cluster) is strictly worse

Evidence

  • Failing job: periodic-ci-openshift-release-main-nightly-5.0-e2e-gcp-ovn-serial/2071513972917407744
  • Bootkube log: 31 minutes of waiting on condition EtcdRunningInCluster before timeout
  • Error: no supported cipherSuites not found in observedConfig
  • StaticPodsAvailable: 0 nodes are active; 3 nodes are at revision 0
  • First observed Jun 25 in payload 5.0.0-0.nightly-2026-06-25-122140
  • 0 bootstrap failures in 16 runs Jun 20-24; 3 in 16 runs Jun 25-30

Summary by CodeRabbit

  • Bug Fixes

    • Improved startup handling when TLS settings are missing by falling back to safe defaults instead of failing during parsing.
    • Added a fallback for cipher suite settings during bootstrap, with clearer warning messages when no supported options are found.
  • Tests

    • Expanded coverage for cipher suite handling, including successful configuration output and unsupported-cipher error cases.

…en observedConfig is empty during bootstrap

During bootstrap, the config observation controller hasn't converged yet,
so observedConfig.servingInfo.cipherSuites and minTLSVersion are empty.
This caused getCipherSuites() and getObservedTLSMinVersion() to hard-error,
preventing the etcd static pod from rendering.

Fall back to TLSProfileIntermediateType defaults (matching the render path
in pkg/cmd/render/env.go:getTLSCipherSuites) when the observed values are
empty, and log a warning so the fallback is auditable. The fallback only
triggers when the config is genuinely empty — observedConfig with
unrecognized ciphers still errors as before.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@openshift-ci-robot added the jira/valid-reference and jira/invalid-bug labels on Jun 30, 2026. jira/valid-reference indicates that this PR references a valid Jira ticket of any type; jira/invalid-bug indicates that a referenced Jira bug is invalid for the branch this PR is targeting.
@openshift-ci-robot


@redhat-chai-bot: This pull request references Jira Issue OCPBUGS-94106, which is invalid:

  • expected the bug to target the "5.0.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.


coderabbitai Bot commented Jun 30, 2026


Walkthrough

Adds bootstrap-time fallback logic in getObservedTLSMinVersion and getCipherSuites: when the observed config's servingInfo fields are empty, each function emits a warning and returns safe defaults (TLS 1.2 and TLSProfileIntermediateType ciphers). A new table-driven test covers all three cipher suite scenarios.

Bootstrap TLS Fallback

| Layer / File(s) | Summary |
| --- | --- |
| Bootstrap fallbacks in TLS version and cipher suite resolution (pkg/etcdenvvar/etcd_env.go) | getObservedTLSMinVersion returns TLS 1.2 with a warning when minTLSVersion is empty; getCipherSuites falls back to TLSProfileIntermediateType defaults with a warning when cipherSuites is empty; the empty-supported-ciphers error message is updated. |
| Table-driven tests for getCipherSuites (pkg/etcdenvvar/etcd_env_test.go) | TestGetCipherSuites covers populated config, empty-config fallback, and unsupported-cipher error cases by marshaling observedConfig to YAML and asserting on ETCD_CIPHER_SUITES output and error messages. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 14 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (14 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly matches the main change: bootstrap now falls back to TLSProfileIntermediateType ciphers when observedConfig is empty. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Stable And Deterministic Test Names | ✅ Passed | New subtest names are static, descriptive strings; no dynamic IDs, dates, node/pod names, or Ginkgo titles were added. |
| Test Structure And Quality | ✅ Passed | PASS: These are plain table-driven unit tests, not Ginkgo; each subtest is single-purpose and there are no cluster waits or cleanup concerns. |
| Microshift Test Compatibility | ✅ Passed | The PR adds only a Go unit test (TestGetCipherSuites), with no Ginkgo e2e tests or MicroShift-unsupported OpenShift API usage. |
| Single Node Openshift (Sno) Test Compatibility | ✅ Passed | PASS: The only added test is a standard Go unit test in pkg/etcdenvvar, with no Ginkgo/e2e constructs or multi-node/SNO assumptions. |
| Topology-Aware Scheduling Compatibility | ✅ Passed | Changes only add TLS fallback/test logic in env-var generation; no node selectors, affinities, spread constraints, PDBs, or replica/topology assumptions were introduced. |
| Ote Binary Stdout Contract | ✅ Passed | No process-level stdout writes were added: touched files lack main/TestMain/init/suite setup, and new klog warnings stay inside helper functions. |
| Ipv6 And Disconnected Network Test Compatibility | ✅ Passed | No new Ginkgo/e2e tests were added; the diff only adds a standard unit test in pkg/etcdenvvar/etcd_env_test.go with no IPv4 or external-network assumptions. |
| No-Weak-Crypto | ✅ Passed | PR only falls back to TLSProfileIntermediateType ciphers (AES/CHACHA20) and filters via SupportedEtcdCiphers; no weak crypto, custom crypto, or secret compares added. |
| Container-Privileges | ✅ Passed | The PR only changes Go code and tests in pkg/etcdenvvar; no container/K8s manifests were touched, and no privileged/root settings appear in the modified files. |
| No-Sensitive-Data-In-Logs | ✅ Passed | New warnings are generic and do not interpolate secrets, tokens, hostnames, PII, or customer data. |

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.12.2)

Error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions
The command is terminated due to an error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions



@openshift-ci openshift-ci Bot requested review from dusk125 and ingvagabund June 30, 2026 12:54
openshift-ci Bot commented Jun 30, 2026

Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign atiratree for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (1)
pkg/etcdenvvar/etcd_env_test.go (1)

14-72: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Add direct coverage for the empty-minTLSVersion fallback.

The PR also adds a fallback in getObservedTLSMinVersion (empty → TLS 1.2), but no case asserts ETCD_TLS_MIN_VERSION. The empty-observedConfig case exercises it only indirectly via getCipherSuites and never checks the resulting min-version value. Consider a table case (or a sibling test on getTLSMinVersion) asserting the TLS 1.2 fallback output.
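A direct assertion on the min-version fallback might look roughly like the sketch below. It is an assumption-laden illustration rather than the operator's code: etcdMinTLSVersion and minVersionEnv are hypothetical helpers, and the "VersionTLS12"/"TLS1.2" string formats are assumed here rather than taken from this PR.

```go
package main

import "fmt"

// etcdMinTLSVersion is a hypothetical stand-in for the min-version helpers.
// It maps an observed profile value to the value exported for etcd and
// treats an empty input as the TLS 1.2 default. String formats are assumed.
func etcdMinTLSVersion(observed string) (string, error) {
	switch observed {
	case "", "VersionTLS12":
		return "TLS1.2", nil
	case "VersionTLS13":
		return "TLS1.3", nil
	default:
		return "", fmt.Errorf("unsupported minTLSVersion %q", observed)
	}
}

// minVersionEnv builds the env-var entry a test would assert on.
func minVersionEnv(observed string) (map[string]string, error) {
	v, err := etcdMinTLSVersion(observed)
	if err != nil {
		return nil, err
	}
	return map[string]string{"ETCD_TLS_MIN_VERSION": v}, nil
}
```

A table case would then call minVersionEnv("") and compare the ETCD_TLS_MIN_VERSION entry against the TLS 1.2 value, alongside the existing cipher-suite cases.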

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/etcdenvvar/etcd_env_test.go` around lines 14 - 72, Add direct test
coverage for the empty min-TLS-version fallback in getObservedTLSMinVersion /
getTLSMinVersion, since the current TestGetCipherSuites only exercises it
indirectly. Extend the existing table or add a sibling test that explicitly
asserts the ETCD_TLS_MIN_VERSION output is TLS 1.2 when observedConfig has no
minTLSVersion, while keeping the current cipher-suite cases unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: a98696eb-47b7-484e-8268-f78b7e1055c7

📥 Commits

Reviewing files that changed from the base of the PR and between b4786ae and d4c53e8.

📒 Files selected for processing (2)
  • pkg/etcdenvvar/etcd_env.go
  • pkg/etcdenvvar/etcd_env_test.go

openshift-ci Bot commented Jun 30, 2026

Contributor

@redhat-chai-bot: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-gcp-operator-disruptive | d4c53e8 | link | true | /test e2e-gcp-operator-disruptive |
| ci/prow/e2e-aws-ovn-serial-2of2 | d4c53e8 | link | true | /test e2e-aws-ovn-serial-2of2 |

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

// During bootstrap the config observation controller hasn't converged yet,
// so observedConfig.servingInfo.minTLSVersion will be empty. Fall back to
// TLSProfileIntermediateType defaults (TLS 1.2).
if observedMinTLSVersion == "" {

Contributor

This check isn't really necessary: crypto.TLSVersion already returns TLSVersion12 when the input is empty. See https://github.com/openshift/library-go/blob/257053230f0be3e7325f38dc00d98147c7f77e16/pkg/crypto/crypto.go#L86
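The point can be illustrated with a simplified stand-in for the linked helper. This is an approximation written against the standard crypto/tls constants, not a copy of the library-go function; tlsVersionOrDefault and knownTLSVersions are hypothetical names.

```go
package main

import (
	"crypto/tls"
	"fmt"
)

// knownTLSVersions maps profile-style version names to crypto/tls constants.
var knownTLSVersions = map[string]uint16{
	"VersionTLS10": tls.VersionTLS10,
	"VersionTLS11": tls.VersionTLS11,
	"VersionTLS12": tls.VersionTLS12,
	"VersionTLS13": tls.VersionTLS13,
}

// tlsVersionOrDefault approximates a lookup that already treats an empty
// name as TLS 1.2, which would make a separate empty-string check in the
// caller redundant.
func tlsVersionOrDefault(name string) (uint16, error) {
	if name == "" {
		return tls.VersionTLS12, nil
	}
	if v, ok := knownTLSVersions[name]; ok {
		return v, nil
	}
	return 0, fmt.Errorf("unknown tls version %q", name)
}
```

If the real helper behaves this way, the explicit guard mainly adds the warning log; whether that log is worth keeping is a separate question from correctness.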

Labels

jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

3 participants