🤖 Generated by the Daily AI Assistant
Problem
The descheduler's HighNodeUtilization consolidation strategy (being introduced for #2468) is upstream-documented to pair with the kube-scheduler's MostAllocated scoring strategy. This cluster runs the default scoring (LeastAllocated-leaning), which prefers the emptiest node — so pods evicted off an under-used node tend to land on another under-used node (including the CapacityBuffer's warm spare), instead of packing onto the busy baseline. Without MostAllocated, consolidation converges slowly or can ping-pong; the v1 descheduler policy therefore keeps HighNodeUtilization at a deliberately low threshold (15%) with tight eviction caps.
Proposed direction
Set the scheduler's NodeResourcesFit scoring to MostAllocated declaratively via Talos machine config (cluster.scheduler.config — a KubeSchedulerConfiguration patch under talos/cluster/), the same mechanism as the existing cluster patches (e.g. terminated-pod-gc.yaml).
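A minimal sketch of what such a patch could look like, assuming the cluster's Kubernetes version serves the `kubescheduler.config.k8s.io/v1` API. The file name and the cpu/memory weights are illustrative assumptions, not values taken from this repository.

```yaml
# talos/cluster/scheduler-most-allocated.yaml (hypothetical file name)
cluster:
  scheduler:
    config:
      apiVersion: kubescheduler.config.k8s.io/v1
      kind: KubeSchedulerConfiguration
      profiles:
        - schedulerName: default-scheduler
          pluginConfig:
            - name: NodeResourcesFit
              args:
                scoringStrategy:
                  # Score fuller nodes higher so evicted pods pack onto busy nodes.
                  type: MostAllocated
                  resources:
                    # Equal weights are an assumption; tune to the cluster's bottleneck resource.
                    - name: cpu
                      weight: 1
                    - name: memory
                      weight: 1
```

Only the NodeResourcesFit scoring strategy is overridden here; the other default score plugins, including PodTopologySpread, stay enabled, which is what the trade-off discussion relies on.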
Trade-off to assess before shipping: packing reduces natural replica spread. Mitigations already in place: HA-critical components carry explicit topologySpreadConstraints + PDBs (coredns, kyverno, snapshot-controller, …), PodTopologySpread scoring still applies, and the descheduler's RemoveDuplicates/RemovePodsViolatingTopologySpreadConstraint strategies actively repair clumping. The rollout should verify Coroot raises no new single-node-co-location findings for HA tiers.
Once shipped, raise the descheduler's HighNodeUtilization thresholds (e.g. 15% → 25–30%) so consolidation actually bites.
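For illustration, the threshold bump might look roughly like this in a `descheduler/v1alpha2` policy. The profile name, the choice of resources, and the value 25 are assumptions; the actual policy layout in this repository may differ.

```yaml
apiVersion: descheduler/v1alpha2
kind: DeschedulerPolicy
profiles:
  - name: consolidation        # hypothetical profile name
    pluginConfig:
      - name: HighNodeUtilization
        args:
          thresholds:
            # Nodes below these utilization percentages are treated as
            # under-used and become candidates for draining.
            cpu: 25            # was 15 in the v1 policy; 25 is an example value
            memory: 25
    plugins:
      balance:
        enabled:
          - HighNodeUtilization
```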
Rough size
Small manifest change (one Talos cluster patch + a descheduler threshold bump), but it changes scheduling behavior cluster-wide, so it needs a careful rollout window and a Coroot observation pass. Not a blocker for the descheduler itself: v1 delivers scoring-independent rebalancing value on its own.
Part of #2043. Enables the full consolidation goal of #2468.