Skip to content

CCDB-4523: Strip config.providers from kafka-ready preflight check#1667

Open
kunal sevkani (kunalmnnit) wants to merge 13 commits into
masterfrom
fix/ccdb-4523-resilient-config-provider-preflight
Open

CCDB-4523: Strip config.providers from kafka-ready preflight check#1667
kunal sevkani (kunalmnnit) wants to merge 13 commits into
masterfrom
fix/ccdb-4523-resilient-config-provider-preflight

Conversation

@kunalmnnit
Copy link
Copy Markdown
Member

@kunalmnnit kunal sevkani (kunalmnnit) commented May 6, 2026

Summary

When CUB_KAFKA_READY_SKIP_CONFIG_PROVIDERS=true is set, strips config.providers entries from the worker config before running the kafka-ready preflight check. Default behavior is unchanged.

Problem

When a Connect worker config includes config.providers entries (e.g. for AWS Secrets Manager) and the security protocol is non-PLAINTEXT, the kafka-ready preflight fails with ClassNotFoundException because config provider JARs are on the worker plugin.path but not on CUB_CLASSPATH.

Fix

Simple flag-gated approach:

  • Flag off (default): Original behavior, no change
  • Flag on: Strip config.providers.* entries from a copy of the config before passing to ClusterStatus.isKafkaReady(). The preflight only needs Kafka client properties (bootstrap.servers, security., ssl., sasl.*) to verify broker connectivity.

Usage

In Dockerfile or K8s pod spec:

ENV CUB_KAFKA_READY_SKIP_CONFIG_PROVIDERS=true

Test plan

  • 4 unit tests for stripConfigProviders: removes entries, returns original when none, multiple providers, does not remove similarly-named keys
  • Local Docker repro confirms SASL_SSL + config providers fails without fix and passes with flag

Fixes: https://confluentinc.atlassian.net/browse/CCDB-4523
Related: https://confluentinc.atlassian.net/browse/DP-6628

When the Connect worker config includes config.providers entries (e.g.
for AWS Secrets Manager), the kafka-ready preflight check fails with
ClassNotFoundException because the config provider plugin JARs are on
the worker's plugin.path but not on the CUB/UB classpath.

The kafka-ready check only needs Kafka client properties (bootstrap
servers, security config) to verify broker connectivity. Config
providers are irrelevant for this check, so strip them before creating
the AdminClient. Removed entries are logged at WARN level.
Copilot AI review requested due to automatic review settings May 6, 2026 07:13
@kunalmnnit kunal sevkani (kunalmnnit) requested a review from a team as a code owner May 6, 2026 07:13
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Updates the kafka-ready CLI preflight check to ignore Kafka Connect config provider properties (which can trigger ClassNotFoundException during AdminClient initialization when provider JARs aren’t on the preflight classpath), and adds unit tests for the stripping behavior.

Changes:

  • Strip config.providers and config.providers.* keys from the loaded --config properties before running the readiness check.
  • Add WARN logging to make stripped properties visible to operators.
  • Add a new unit test class covering the stripping behavior across multiple scenarios.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
utility-belt/src/main/java/io/confluent/admin/utils/cli/KafkaReadyCommand.java Strips config-provider properties prior to calling the readiness check; adds operator-facing WARN logs.
utility-belt/src/test/java/io/confluent/admin/utils/cli/KafkaReadyCommandTest.java Adds unit tests for config-provider stripping behavior.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread utility-belt/src/main/java/io/confluent/admin/utils/cli/KafkaReadyCommand.java Outdated
Comment thread utility-belt/src/main/java/io/confluent/admin/utils/cli/KafkaReadyCommand.java Outdated
…ass loading errors

When the Connect worker config includes config.providers entries (e.g.
for AWS Secrets Manager), the kafka-ready preflight check fails with
ClassNotFoundException because the config provider plugin JARs are on
the worker's plugin.path but not on the CUB/UB classpath.

The fix uses a try/catch/retry strategy:
1. Try the check with the full config (works if providers are on classpath)
2. On ConfigException from class loading, strip config.providers entries
3. If remaining properties have unresolved ${provider:...} variable
   references (security props depend on config providers), skip the
   check with a WARN — the Connect worker will verify connectivity
   on startup
4. Otherwise, retry the check without config.providers
…k trace

- Consolidate per-key WARN logs into a single message with count + key list
- Replace Unicode em dash with plain hyphen for ASCII-safe log parsing
- Include exception stack trace in config provider load failure warning
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

Comment thread utility-belt/src/main/java/io/confluent/admin/utils/cli/KafkaReadyCommand.java Outdated
Comment thread utility-belt/src/main/java/io/confluent/admin/utils/cli/KafkaReadyCommand.java Outdated
- Tighten CONFIG_PROVIDER_VAR_PATTERN to require at least one colon
  separator (${provider:path[:key]}), avoiding false matches on
  non-provider ${...} placeholders like ${ENV_VAR}
- Check exception cause chain for ClassNotFoundException /
  NoClassDefFoundError instead of brittle message string matching;
  keep message check as fallback
- Add 4 new tests: ClassNotFoundException cause, NoClassDefFoundError
  cause, nested cause chain, non-provider placeholder filtering
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

Comment thread utility-belt/src/main/java/io/confluent/admin/utils/cli/KafkaReadyCommand.java Outdated
…, cause chain check

- Extract ClusterStatus.isKafkaReady behind KafkaReadyChecker functional
  interface so checkKafkaReadyWithConfigProviderResilience orchestration
  logic can be unit tested with mocked checkers
- Add 5 orchestration tests covering all scenarios: no config providers,
  providers loadable (first try succeeds), class not found with literal
  props (strip + retry), class not found with unresolved vars (skip),
  non-provider ConfigException (rethrow)
- Tighten CONFIG_PROVIDER_VAR_PATTERN to require at least one colon
  separator, avoiding false matches on non-provider ${...} placeholders
- Check exception cause chain for ClassNotFoundException /
  NoClassDefFoundError instead of brittle message string matching;
  keep message check as fallback for truncated cause chains
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

Comment thread utility-belt/src/main/java/io/confluent/admin/utils/cli/KafkaReadyCommand.java Outdated
- Move ConfigException import into org.apache.kafka.* group for
  consistent import ordering
- Require exception message to mention "config.providers" before
  checking cause chain for ClassNotFoundException/NoClassDefFoundError,
  so SASL/SSL handler class loading failures are not misclassified
- Add tests for SASL and SSL CNFE cases that must NOT match
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

Comment thread utility-belt/src/main/java/io/confluent/admin/utils/cli/KafkaReadyCommand.java Outdated
- Fix 2 test failures: mock ConfigException messages now include
  "config.providers" to match the tightened isConfigProviderLoadFailure
  predicate from round 3
- Limit findUnresolvedConfigProviderVars to Kafka client connectivity
  keys only (security.*, sasl.*, ssl.*, bootstrap.servers) so
  unresolved references in non-client keys (e.g. connector configs)
  don't cause the preflight to be skipped unnecessarily
- Add test for non-client keys being ignored
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

…s in message

The resilience_classNotFound_unresolvedVars_skipsWithWarning test was
still using a bare "Could not load" message without "config.providers",
which the tightened isConfigProviderLoadFailure predicate rejects.
Convert KafkaReadyCommandTest from JUnit 5 to JUnit 4 for consistency
with the rest of the module. Add junit-vintage-engine dependency so
JUnit 4 tests are discovered by the JUnit Platform surefire provider.
…ures

The vintage engine causes surefire to discover existing JUnit 4 tests
that have a broken EmbeddedKafkaCluster setup. Removing it keeps the
PR scoped to the kafka-ready fix without inheriting unrelated failures.
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

Comment thread utility-belt/src/main/java/io/confluent/admin/utils/cli/KafkaReadyCommand.java Outdated
- Operate on a copy of workerProps when stripping config.providers,
  so the caller's map is never mutated
- Update javadoc to document fast-path and no-mutation guarantee
- Update test assertion to verify original map is untouched
Replace the complex try/catch/retry/unresolved-var logic with a simple
approach: when CUB_KAFKA_READY_SKIP_CONFIG_PROVIDERS=true, strip all
config.providers entries from the config before running the check.
When not set (default), original behavior is preserved.

This is safe because the kafka-ready preflight only needs Kafka client
properties (bootstrap.servers, security.*, ssl.*, sasl.*) to verify
broker connectivity. Config providers are only needed by the Connect
worker at runtime.
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

Comment thread utility-belt/src/main/java/io/confluent/admin/utils/cli/KafkaReadyCommand.java Outdated
Comment thread utility-belt/src/main/java/io/confluent/admin/utils/cli/KafkaReadyCommand.java Outdated
…og line

- Check for both config.providers and config.providers.* keys before
  returning early, so prefix-only entries are also stripped
- Rename isConfigProviderResilienceEnabled to isSkipConfigProvidersEnabled
  to match the env var name
- Replace per-key WARN with single INFO summarizing count of stripped keys
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.

static Map<String, String> stripConfigProviders(Map<String, String> props) {
boolean hasProviderKeys = false;
for (String key : props.keySet()) {
if (key.equals(CONFIG_PROVIDERS_PREFIX) || key.startsWith(CONFIG_PROVIDERS_PREFIX + ".")) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't this a redundant check?
can this be replaced with

if (key.startsWith(CONFIG_PROVIDERS_PREFIX))

Iterator<String> it = result.keySet().iterator();
while (it.hasNext()) {
String key = it.next();
if (key.equals(CONFIG_PROVIDERS_PREFIX) || key.startsWith(CONFIG_PROVIDERS_PREFIX + ".")) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants