fix: retain draining brokers in external listener config until CruiseControl completes #252

Open
dobrerazvan wants to merge 1 commit into master from feat/draining-broker-listener-retention

dobrerazvan commented May 13, 2026

Brokers removed from the KafkaCluster spec were immediately excluded from envoy, istio, and contour config, even while CruiseControl was still draining them. Clients lost connectivity to brokers that still held partition leaders.

Root cause: ShouldIncludeBroker() returned false whenever brokerConfig == nil, which is the case once the broker is no longer in the spec. Fix: add a fallback path. When brokerConfig is nil, check the broker's CruiseControlState in status. If the state is a downscale that has not yet succeeded (IsDownscale && !IsSucceeded) and the broker was previously bound to the requested ingressConfig, keep it in the external listener resources.

Because the condition is "downscale and not succeeded", brokers stuck in CompletedWithError or Paused are also retained, which keeps clients connected while the failure is investigated manually. ShouldIncludeBroker is the single gatekeeper for all external listener reconcilers (envoy, istio, contour), so no other files need changes.
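
For reviewers who want the decision in one place, here is a minimal, self-contained sketch of the fallback described above. It is not the code in this PR: the types `ccState`, `brokerConfig`, and `brokerStatus`, the state constants, the `IngressConfigs` fields, and the simplified `shouldIncludeBroker` signature are all hypothetical stand-ins, assumed only for illustration. Only the `IsDownscale && !IsSucceeded` condition and the nil-`brokerConfig` fallback come from the description.

```go
package main

import "fmt"

// ccState is a hypothetical stand-in for the operator's CruiseControlState.
type ccState string

// Hypothetical state names; the real constants may differ.
const (
	downscaleRunning            ccState = "DownscaleRunning"
	downscaleSucceeded          ccState = "DownscaleSucceeded"
	downscaleCompletedWithError ccState = "DownscaleCompletedWithError"
	downscalePaused             ccState = "DownscalePaused"
	upscaleRunning              ccState = "UpscaleRunning"
)

// IsDownscale reports whether the state belongs to a downscale operation.
func (s ccState) IsDownscale() bool {
	switch s {
	case downscaleRunning, downscaleSucceeded, downscaleCompletedWithError, downscalePaused:
		return true
	}
	return false
}

// IsSucceeded reports whether the operation finished successfully.
func (s ccState) IsSucceeded() bool { return s == downscaleSucceeded }

// brokerConfig is a hypothetical, reduced view of a broker's spec entry.
type brokerConfig struct {
	IngressConfigs []string
}

// brokerStatus is a hypothetical, reduced view of a broker's status entry.
type brokerStatus struct {
	State          ccState
	IngressConfigs []string // ingress configs the broker was previously bound to
}

func contains(list []string, v string) bool {
	for _, item := range list {
		if item == v {
			return true
		}
	}
	return false
}

// shouldIncludeBroker sketches the gatekeeper with the fallback path.
func shouldIncludeBroker(cfg *brokerConfig, status *brokerStatus, ingressConfig string) bool {
	if cfg != nil {
		// Broker is still in the spec: decide from its configuration.
		return contains(cfg.IngressConfigs, ingressConfig)
	}
	if status == nil {
		// Not in the spec and no status: nothing to retain.
		return false
	}
	// Fallback: retain while a downscale has not succeeded
	// (covers running, paused, and completed-with-error).
	draining := status.State.IsDownscale() && !status.State.IsSucceeded()
	return draining && contains(status.IngressConfigs, ingressConfig)
}

func main() {
	states := []ccState{downscaleRunning, downscalePaused,
		downscaleCompletedWithError, downscaleSucceeded, upscaleRunning}
	for _, s := range states {
		st := &brokerStatus{State: s, IngressConfigs: []string{"external"}}
		fmt.Printf("%s -> retained: %v\n", s, shouldIncludeBroker(nil, st, "external"))
	}
}
```

In this sketch a broker that is still in the spec never reaches the fallback, so existing behaviour for configured brokers is unchanged; the status check only applies after removal from the spec.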

Type of Change

  • Bug Fix
  • New Feature
  • Breaking Change
  • Refactor
  • Documentation
  • Other (please describe)

Checklist

  • I have read the contributing guidelines
  • Existing issues have been referenced (where applicable)
  • I have verified this change is not present in other open pull requests
  • Functionality is documented
  • All code style checks pass
  • New code contribution is covered by automated tests
  • All new and existing tests pass

…Control completes

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>