[shell-operator] fix: retry webhook manager initialization and submit calls #899
Merged
Signed-off-by: Ruslan Gorbunov <ruslan.gorbunov@flant.com>
Pull request overview
Adds retry-with-backoff layers around webhook manager initialization and admission webhook configuration submission so that brief Kubernetes API server unavailability during shell-operator startup does not cause a fatal exit. Introduces a new shared pkg/utils/retry helper, threads context.Context through the admission webhook registration path, and also tidies up logger propagation in the kube events manager so monitor/namespace informers don't rely on MonitorConfig.Logger being populated by callers.
Changes:
- New context-aware `retry.WithBackoff` helper with tests, used in two new places: admission webhook `submit` (5 retries, 2–15 s) and bootstrap-level webhook manager init (8 retries, 1–15 s).
- Admission `Register`/`submit` now take a `context.Context`; create/update errors are propagated (previously logged and swallowed).
- Kube events manager: `AddMonitor` defaults `MonitorConfig.Logger`; `namespaceInformer` carries its own logger; `resource_informer` and `namespace_informer` log sync errors via their informer/manager logger.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| pkg/utils/retry/retry.go | New retry-with-backoff helper honoring context cancellation. |
| pkg/utils/retry/retry_test.go | Tests for success/exhaustion/context-cancel/backoff cap. |
| pkg/webhook/admission/resource.go | Adds retry around list/create/update; propagates create/update errors; threads context. |
| pkg/webhook/admission/manager.go | Start now accepts context and passes it to Register. |
| pkg/shell-operator/operator.go | Passes op.ctx into AdmissionWebhookManager.Start. |
| pkg/shell-operator/bootstrap.go | Wraps both webhook init calls with retry.WithBackoff + config constants. |
| pkg/kube_events_manager/kube_events_manager.go | Ensures MonitorConfig.Logger is populated in AddMonitor. |
| pkg/kube_events_manager/monitor.go | Passes m.logger into NewNamespaceInformer. |
| pkg/kube_events_manager/namespace_informer.go | Stores informer-owned logger and uses it on sync failure. |
| pkg/kube_events_manager/resource_informer.go | Uses ei.logger instead of Monitor.Logger on sync failure. |
ldmonster
approved these changes
May 28, 2026
Overview
Improve webhook registration resilience by adding context-aware retry with exponential backoff for admission webhook configuration submit calls, and harden kube-events informer error logging paths.
What this PR does / why we need it
During shell-operator startup, the admission webhook manager must register ValidatingWebhookConfiguration and MutatingWebhookConfiguration via the Kubernetes API.
If the API server is briefly unavailable (rolling restart, leader election, etcd hiccup), these calls could previously fail immediately and abort startup.
This PR adds resilience at the admission resource submit layer:
- `ValidatingWebhookConfiguration` and `MutatingWebhookConfiguration` `list/create/update` errors are now returned (fail-fast) instead of being logged and silently ignored.

Additional hardening included in this PR: