CCDB-4523: Strip config.providers from kafka-ready preflight check#1667
Open
kunal sevkani (kunalmnnit) wants to merge 13 commits into
Open
CCDB-4523: Strip config.providers from kafka-ready preflight check#1667kunal sevkani (kunalmnnit) wants to merge 13 commits into
kunal sevkani (kunalmnnit) wants to merge 13 commits into
Conversation
When the Connect worker config includes config.providers entries (e.g. for AWS Secrets Manager), the kafka-ready preflight check fails with ClassNotFoundException because the config provider plugin JARs are on the worker's plugin.path but not on the CUB/UB classpath. The kafka-ready check only needs Kafka client properties (bootstrap servers, security config) to verify broker connectivity. Config providers are irrelevant for this check, so strip them before creating the AdminClient. Removed entries are logged at WARN level.
There was a problem hiding this comment.
Pull request overview
Updates the kafka-ready CLI preflight check to ignore Kafka Connect config provider properties (which can trigger ClassNotFoundException during AdminClient initialization when provider JARs aren’t on the preflight classpath), and adds unit tests for the stripping behavior.
Changes:
- Strip
config.providersandconfig.providers.*keys from the loaded--configproperties before running the readiness check. - Add WARN logging to make stripped properties visible to operators.
- Add a new unit test class covering the stripping behavior across multiple scenarios.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| utility-belt/src/main/java/io/confluent/admin/utils/cli/KafkaReadyCommand.java | Strips config-provider properties prior to calling the readiness check; adds operator-facing WARN logs. |
| utility-belt/src/test/java/io/confluent/admin/utils/cli/KafkaReadyCommandTest.java | Adds unit tests for config-provider stripping behavior. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…ass loading errors
When the Connect worker config includes config.providers entries (e.g.
for AWS Secrets Manager), the kafka-ready preflight check fails with
ClassNotFoundException because the config provider plugin JARs are on
the worker's plugin.path but not on the CUB/UB classpath.
The fix uses a try/catch/retry strategy:
1. Try the check with the full config (works if providers are on classpath)
2. On ConfigException from class loading, strip config.providers entries
3. If remaining properties have unresolved ${provider:...} variable
references (security props depend on config providers), skip the
check with a WARN — the Connect worker will verify connectivity
on startup
4. Otherwise, retry the check without config.providers
…k trace - Consolidate per-key WARN logs into a single message with count + key list - Replace Unicode em dash with plain hyphen for ASCII-safe log parsing - Include exception stack trace in config provider load failure warning
- Tighten CONFIG_PROVIDER_VAR_PATTERN to require at least one colon
separator (${provider:path[:key]}), avoiding false matches on
non-provider ${...} placeholders like ${ENV_VAR}
- Check exception cause chain for ClassNotFoundException /
NoClassDefFoundError instead of brittle message string matching;
keep message check as fallback
- Add 4 new tests: ClassNotFoundException cause, NoClassDefFoundError
cause, nested cause chain, non-provider placeholder filtering
…, cause chain check
- Extract ClusterStatus.isKafkaReady behind KafkaReadyChecker functional
interface so checkKafkaReadyWithConfigProviderResilience orchestration
logic can be unit tested with mocked checkers
- Add 5 orchestration tests covering all scenarios: no config providers,
providers loadable (first try succeeds), class not found with literal
props (strip + retry), class not found with unresolved vars (skip),
non-provider ConfigException (rethrow)
- Tighten CONFIG_PROVIDER_VAR_PATTERN to require at least one colon
separator, avoiding false matches on non-provider ${...} placeholders
- Check exception cause chain for ClassNotFoundException /
NoClassDefFoundError instead of brittle message string matching;
keep message check as fallback for truncated cause chains
- Move ConfigException import into org.apache.kafka.* group for consistent import ordering - Require exception message to mention "config.providers" before checking cause chain for ClassNotFoundException/NoClassDefFoundError, so SASL/SSL handler class loading failures are not misclassified - Add tests for SASL and SSL CNFE cases that must NOT match
- Fix 2 test failures: mock ConfigException messages now include "config.providers" to match the tightened isConfigProviderLoadFailure predicate from round 3 - Limit findUnresolvedConfigProviderVars to Kafka client connectivity keys only (security.*, sasl.*, ssl.*, bootstrap.servers) so unresolved references in non-client keys (e.g. connector configs) don't cause the preflight to be skipped unnecessarily - Add test for non-client keys being ignored
…s in message The resilience_classNotFound_unresolvedVars_skipsWithWarning test was still using a bare "Could not load" message without "config.providers", which the tightened isConfigProviderLoadFailure predicate rejects.
Convert KafkaReadyCommandTest from JUnit 5 to JUnit 4 for consistency with the rest of the module. Add junit-vintage-engine dependency so JUnit 4 tests are discovered by the JUnit Platform surefire provider.
…ures The vintage engine causes surefire to discover existing JUnit 4 tests that have a broken EmbeddedKafkaCluster setup. Removing it keeps the PR scoped to the kafka-ready fix without inheriting unrelated failures.
- Operate on a copy of workerProps when stripping config.providers, so the caller's map is never mutated - Update javadoc to document fast-path and no-mutation guarantee - Update test assertion to verify original map is untouched
Replace the complex try/catch/retry/unresolved-var logic with a simple approach: when CUB_KAFKA_READY_SKIP_CONFIG_PROVIDERS=true, strip all config.providers entries from the config before running the check. When not set (default), original behavior is preserved. This is safe because the kafka-ready preflight only needs Kafka client properties (bootstrap.servers, security.*, ssl.*, sasl.*) to verify broker connectivity. Config providers are only needed by the Connect worker at runtime.
…og line - Check for both config.providers and config.providers.* keys before returning early, so prefix-only entries are also stripped - Rename isConfigProviderResilienceEnabled to isSkipConfigProvidersEnabled to match the env var name - Replace per-key WARN with single INFO summarizing count of stripped keys
| static Map<String, String> stripConfigProviders(Map<String, String> props) { | ||
| boolean hasProviderKeys = false; | ||
| for (String key : props.keySet()) { | ||
| if (key.equals(CONFIG_PROVIDERS_PREFIX) || key.startsWith(CONFIG_PROVIDERS_PREFIX + ".")) { |
There was a problem hiding this comment.
isn't this a redundant check?
can this be replaced with
if (key.startsWith(CONFIG_PROVIDERS_PREFIX))
| Iterator<String> it = result.keySet().iterator(); | ||
| while (it.hasNext()) { | ||
| String key = it.next(); | ||
| if (key.equals(CONFIG_PROVIDERS_PREFIX) || key.startsWith(CONFIG_PROVIDERS_PREFIX + ".")) { |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
When
CUB_KAFKA_READY_SKIP_CONFIG_PROVIDERS=trueis set, stripsconfig.providersentries from the worker config before running the kafka-ready preflight check. Default behavior is unchanged.Problem
When a Connect worker config includes
config.providersentries (e.g. for AWS Secrets Manager) and the security protocol is non-PLAINTEXT, thekafka-readypreflight fails withClassNotFoundExceptionbecause config provider JARs are on the workerplugin.pathbut not onCUB_CLASSPATH.Fix
Simple flag-gated approach:
config.providers.*entries from a copy of the config before passing toClusterStatus.isKafkaReady(). The preflight only needs Kafka client properties (bootstrap.servers, security., ssl., sasl.*) to verify broker connectivity.Usage
In Dockerfile or K8s pod spec:
Test plan
stripConfigProviders: removes entries, returns original when none, multiple providers, does not remove similarly-named keysFixes: https://confluentinc.atlassian.net/browse/CCDB-4523
Related: https://confluentinc.atlassian.net/browse/DP-6628