feat: configurable reconcile worker pool (--max-concurrent-reconciles)#379
xrl wants to merge 1 commit into
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: xrl. The full list of commands accepted by this bot can be found here. Needs approval from an approver in each of these files. Approvers can indicate their approval by writing `/approve` in a comment.
Hi @xrl. Thanks for your PR. I'm waiting for an etcd-io member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test`. Regular contributors should join the org to skip this step. Once the patch is verified, the new status will be reflected by the `ok-to-test` label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Add a --max-concurrent-reconciles flag (default 5) and thread it into
SetupWithManager via builder.WithOptions(controller.Options{...}).
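A minimal sketch of the flag half of this change, assuming the operator parses flags with Go's standard `flag` package (it may well use a different flag library). `parseMaxConcurrentReconciles` is a hypothetical helper, not the PR's code; it applies the default of 5 and the `<= 0` fallback to one worker that this commit describes. The `{...}` in `controller.Options{...}` is assumed here to be controller-runtime's `MaxConcurrentReconciles` field; that wiring appears only in a comment because it needs the controller-runtime module.

```go
package main

import "flag"

// defaultMaxConcurrentReconciles mirrors the default of 5 described in the commit.
const defaultMaxConcurrentReconciles = 5

// parseMaxConcurrentReconciles is a hypothetical helper, not the PR's code. It
// reads --max-concurrent-reconciles with the standard flag package and applies
// the documented fallback: any value <= 0 becomes a single worker.
func parseMaxConcurrentReconciles(args []string) (int, error) {
	fs := flag.NewFlagSet("etcd-operator", flag.ContinueOnError)
	n := fs.Int("max-concurrent-reconciles", defaultMaxConcurrentReconciles,
		"maximum number of EtcdClusters reconciled in parallel; values <= 0 fall back to 1")
	if err := fs.Parse(args); err != nil {
		return 0, err
	}
	if *n <= 0 {
		return 1, nil
	}
	return *n, nil
}

// Assumption: the result is then handed to controller-runtime in
// SetupWithManager roughly as follows (the EtcdCluster type's package and the
// reconciler variable r are illustrative):
//
//	ctrl.NewControllerManagedBy(mgr).
//		For(&EtcdCluster{}).
//		WithOptions(controller.Options{MaxConcurrentReconciles: n}).
//		Complete(r)
```

With this sketch, omitting the flag yields 5 and `--max-concurrent-reconciles=0` yields 1.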
controller-runtime defaults to a single reconcile worker. Each EtcdCluster
is reconciled on its own workqueue key (deduped by namespaced name), so a
cluster is never reconciled by two workers at once -- concurrency only
parallelizes distinct clusters, with no intra-cluster races. Reconciles
here are heavy and long-running (StatefulSet patches, member-list/health
RPCs against managed etcd, certificate work), so a small pool improves
multi-cluster throughput; the cost of a larger pool is more simultaneous
apiserver and managed-etcd load. A value <= 0 falls back to the safe
default of 1.
Document the flag in its help text, a doc comment on the reconciler field,
and a new docs/operator-flags.md (linked from the README). Add an
envtest-backed test asserting SetupWithManager threads the pool size for
widened, single-worker, and non-positive fallback values.
test/e2e/STRESS.md records the measured spinup-burst budget behind the
stress e2e batching and why the worker pool, not namespace isolation, is
the lever for overlapping heavy spinups.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Signed-off-by: Xavier Lange <xrlange@gmail.com>
Force-pushed from 40cd013 to a189e28.
Split out the etcd-container QoS change: it edits the StatefulSet template, which #363 is about to replace, so it is parked on a separate branch until that lands. This PR now contains only the worker pool.
controller-runtime defaults to a single reconcile worker, so one slow cluster blocks progress on every other cluster. This adds a `--max-concurrent-reconciles` flag (default 5) threaded into `SetupWithManager` via `builder.WithOptions`. Each `EtcdCluster` keeps its own workqueue key, so concurrency only parallelizes distinct clusters, and a value `<= 0` falls back to a single worker. The flag is documented in `docs/operator-flags.md`, and `test/e2e/STRESS.md` records the measured spinup-burst budget behind the stress-tier batching. Tested with an envtest-backed assertion that the pool size is threaded through for widened, single-worker, and non-positive values.

PR series: operability fixes & TLS
Small, single-purpose PRs from live kind-cluster testing of the operator. Each stands alone unless an "After" dependency is listed. → marks this PR.
- `events.k8s.io` RBAC so operator Events are actually recorded
- `members[]`/`leaderID` from one health snapshot (consistent leader)
- `altNames.ipAddresses` into certificates
- `validityDuration` (`365d`, `100d12h`) as documented
- `Degraded` condition (was empty status)
- → configurable reconcile worker pool (`--max-concurrent-reconciles`)
- `spec.tls.{peer,client}` surfaces (breaking alpha API)
- `TLSReady` condition + TLS lifecycle Events
- `PeerCANotShared`
- `/metrics` endpoint
- `EtcdBackup` CR → object storage (S3/GCS)

🟢 ready · ⚪ draft