Bug 0000: Use server-side apply for observedConfig to prevent bootstrap livelock#1643
Bug 0000: Use server-side apply for observedConfig to prevent bootstrap livelock#1643redhat-chai-bot wants to merge 2 commits into
Conversation
During OCP 5.0 bootstrap, 10+ controllers simultaneously update the
etcd operator CR's .status section, causing rapid resourceVersion churn.
The ConfigObserver's standard read-modify-write (optimistic concurrency)
pattern fails repeatedly with HTTP 409 Conflict because the
resourceVersion is bumped by other controllers' status updates before
the spec update can complete. This livelock persists for the entire
59-minute bootstrap window, preventing the observedConfig (cipher
suites, TLS version, control plane replicas) from ever being written,
which blocks etcd static pod creation.
This commit replaces the library-go ConfigObserver's UpdateSpec-based
write path with a custom sync loop that uses Server-Side Apply (SSA)
via ApplyOperatorSpec. SSA eliminates optimistic concurrency conflicts
because the API server handles field-level ownership merging rather
than requiring a matching resourceVersion. Each controller owns only
the fields it manages (the config observer owns .spec.observedConfig,
other controllers own .status fields), so concurrent updates no longer
interfere with each other.
The fix is entirely within the etcd operator's codebase — no changes
to library-go are required. The custom controller replicates the
observer execution, merging, and determinism-checking logic from the
library-go ConfigObserver, but writes the result using
operatorClient.ApplyOperatorSpec with a dedicated field manager
("cluster-etcd-operator-config-observer").
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository: openshift/coderabbit/.coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (1)
WalkthroughReplaces the library-go config observer wiring with an in-file SSA-based observer, adds randomized observer execution and merge determinism checks, and applies ChangesSSA config observer controller
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes 🚥 Pre-merge checks | ✅ 14 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (14 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 golangci-lint (2.12.2)Error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In
`@pkg/operator/configobservation/configobservercontroller/observe_config_controller.go`:
- Around line 190-195: The non-determinism check in the config observation flow
is correct, but `applyObservedConfig` still applies `mergedObservedConfig`,
which can be a random winner from `rand.Perm`. Update `observeConfigController`
so that when `equality.Semantic.DeepEqual` detects conflicting observer output,
the controller does not write that random merged result into
`.spec.observedConfig`; instead, skip the apply or preserve the last stable
observed config, and keep the degraded error path intact.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 405950cb-2c79-40eb-b7c4-187735a8b887
📒 Files selected for processing (1)
pkg/operator/configobservation/configobservercontroller/observe_config_controller.go
| if !equality.Semantic.DeepEqual(mergedObservedConfig, reverseMergedObservedConfig) { | ||
| errs = append(errs, errors.New("non-deterministic config observation detected")) | ||
| } | ||
|
|
||
| if err := c.applyObservedConfig(ctx, syncCtx, existingConfig, mergedObservedConfig); err != nil { | ||
| errs = []error{err} |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟠 Major | ⚡ Quick win
Do not apply a random merge result after detecting non-determinism.
Line 190 detects conflicting observer output, but Line 194 still applies mergedObservedConfig, whose winning values depend on rand.Perm. That can make .spec.observedConfig flap across syncs even while reporting degraded.
Suggested fix
+ nonDeterministic := false
if !equality.Semantic.DeepEqual(mergedObservedConfig, reverseMergedObservedConfig) {
+ nonDeterministic = true
errs = append(errs, errors.New("non-deterministic config observation detected"))
}
- if err := c.applyObservedConfig(ctx, syncCtx, existingConfig, mergedObservedConfig); err != nil {
+ if !nonDeterministic {
+ if err := c.applyObservedConfig(ctx, syncCtx, existingConfig, mergedObservedConfig); err != nil {
- errs = []error{err}
+ errs = []error{err}
+ }
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if !equality.Semantic.DeepEqual(mergedObservedConfig, reverseMergedObservedConfig) { | |
| errs = append(errs, errors.New("non-deterministic config observation detected")) | |
| } | |
| if err := c.applyObservedConfig(ctx, syncCtx, existingConfig, mergedObservedConfig); err != nil { | |
| errs = []error{err} | |
| nonDeterministic := false | |
| if !equality.Semantic.DeepEqual(mergedObservedConfig, reverseMergedObservedConfig) { | |
| nonDeterministic = true | |
| errs = append(errs, errors.New("non-deterministic config observation detected")) | |
| } | |
| if !nonDeterministic { | |
| if err := c.applyObservedConfig(ctx, syncCtx, existingConfig, mergedObservedConfig); err != nil { | |
| errs = []error{err} | |
| } | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@pkg/operator/configobservation/configobservercontroller/observe_config_controller.go`
around lines 190 - 195, The non-determinism check in the config observation flow
is correct, but `applyObservedConfig` still applies `mergedObservedConfig`,
which can be a random winner from `rand.Perm`. Update `observeConfigController`
so that when `equality.Semantic.DeepEqual` detects conflicting observer output,
the controller does not write that random merged result into
`.spec.observedConfig`; instead, skip the apply or preserve the last stable
observed config, and keep the degraded error path intact.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
@redhat-chai-bot: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
@redhat-chai-bot does this really cause bootstrap timeouts? how many conflicts do we run into while updating the status? |
Summary
During OCP 5.0 bootstrap, the ConfigObserver in the etcd operator cannot write cipher suite configuration to
.spec.observedConfigbecause of persistent HTTP 409 Conflict errors. This prevents etcd static pods from ever being created, causing a bootstrap timeout (~28% failure rate across 20+ GCP jobs).Root Cause
The library-go
ConfigObserverusesUpdateSpec()— a read-modify-write pattern with optimistic concurrency (resourceVersion check). During bootstrap, 10+ controllers simultaneously update the same Etcd CR's.statussection, causing rapidresourceVersionchurn. Every ConfigObserver write attempt fails with 409 Conflict because another controller has already bumped the resourceVersion between the read and write.This creates a vicious cycle:
observedConfigstays emptyEnvVarControllercan't find cipher suites → goes DegradedresourceVersion→ feeds the conflict stormFix
Replace the library-go
configobserver.NewConfigObserver()with a customssaConfigObserverthat uses Server-Side Apply (SSA) to write.spec.observedConfig. SSA eliminates optimistic concurrency conflicts because the API server handles field-level ownership merging rather than requiring a matchingresourceVersion.Key changes in
pkg/operator/configobservation/configobservercontroller/observe_config_controller.go:ssaConfigObserverstruct that implements the same observer logic (randomized merge, determinism check) but writes viaApplyOperatorSpec()with a dedicated field manager"cluster-etcd-operator-config-observer"ApplyOperatorStatus()ObserveTLSSecurityProfileandObserveControlPlaneReplicasare unchangedTesting
go build ./...— passesmake verify(gofmt + go vet) — passesgo test ./pkg/...— all tests pass (except pre-existingpkg/cmd/monitorfailure unrelated to this change)Summary by CodeRabbit
New Features
Bug Fixes
Chores