Konnectivity retry proxy connection on timeout#8579
Konnectivity retry proxy connection on timeout#8579YamunadeviShanmugam wants to merge 3 commits into
Conversation
|
Pipeline controller notification For optional jobs, comment This repository is configured in: LGTM mode |
|
Skipping CI for Draft Pull Request. |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository YAML (base), Central YAML (inherited) Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (3)
📝 WalkthroughWalkthroughThis PR adds a resilient guard for the Konnectivity SOCKS5 proxy that bootstraps a Konnectivity dialer, verifies TCP reachability, starts the SOCKS5 server, and retries on transient errors using exponential backoff. It moves proxy startup to a guarded signal-aware flow in main, replaces panic-based startup with error propagation, adds TCP StartupProbes for the konnectivity container (defaulting serving port to 8090 when unset), and declares the konnectivity agent as an explicit dependency of the OAuth API server component. Extensive unit and integration-style tests are included. Sequence Diagram(s)sequenceDiagram
participant SignalHandler
participant GuardLoop as KonnectivityGuard
participant K8sAPI as KubernetesAPI
participant Dialer as KonnectivityDialer
participant Socks5 as SOCKS5Server
SignalHandler->>GuardLoop: start(ctx)
GuardLoop->>K8sAPI: load kube config
GuardLoop->>Dialer: tryBootstrapKonnectivity()
Dialer->>K8sAPI: use controller-runtime client
Dialer->>KonnectivityServer: dial TCP host:port (verify)
GuardLoop->>Socks5: start(server, Dialer)
Socks5-->>GuardLoop: exit error
GuardLoop->>GuardLoop: isTransientKonnectivityError?
GuardLoop->>GuardLoop: backoff & retry (on transient)
Suggested reviewers
🚥 Pre-merge checks | ✅ 11 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (11 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: YamunadeviShanmugam The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
The proxy was exiting hard on startup timeouts. This change adds exponential backoff retry logic with startup probes.
e81da62 to
c1f6355
Compare
|
/test all |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #8579 +/- ##
==========================================
+ Coverage 40.37% 40.62% +0.25%
==========================================
Files 755 756 +1
Lines 93227 93395 +168
==========================================
+ Hits 37644 37946 +302
+ Misses 52879 52736 -143
- Partials 2704 2713 +9
... and 3 files with indirect coverage changes
Flags with carried forward coverage won't be shown. Click here to find out more. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (3)
konnectivity-socks5-proxy/guard_test.go (2)
28-58: ⚡ Quick winConsider refactoring to a table-driven test.
The test cases all follow the same structure (create error, call
isTransientKonnectivityError, assert result). Table-driven tests improve maintainability and make it easier to add new cases. As per coding guidelines, prefer table-driven tests where possible.♻️ Proposed table-driven refactor
func TestIsTransientKonnectivityError(t *testing.T) { - t.Run("When error is nil, it should not be transient", func(t *testing.T) { - g := NewGomegaWithT(t) - g.Expect(isTransientKonnectivityError(nil)).To(BeFalse()) - }) - - t.Run("When error is context deadline exceeded, it should be transient", func(t *testing.T) { - g := NewGomegaWithT(t) - g.Expect(isTransientKonnectivityError(context.DeadlineExceeded)).To(BeTrue()) - }) - - t.Run("When error is connection refused, it should be transient", func(t *testing.T) { - g := NewGomegaWithT(t) - g.Expect(isTransientKonnectivityError(fmt.Errorf("dial tcp: connection refused"))).To(BeTrue()) - }) - - t.Run("When error is syscall ECONNREFUSED, it should be transient", func(t *testing.T) { - g := NewGomegaWithT(t) - g.Expect(isTransientKonnectivityError(syscall.ECONNREFUSED)).To(BeTrue()) - }) - - t.Run("When error is validation failure, it should not be transient", func(t *testing.T) { - g := NewGomegaWithT(t) - g.Expect(isTransientKonnectivityError(errors.New("failed validation: KonnectivityHost is required"))).To(BeFalse()) - }) - - t.Run("When error is file not found, it should not be transient", func(t *testing.T) { - g := NewGomegaWithT(t) - g.Expect(isTransientKonnectivityError(os.ErrNotExist)).To(BeFalse()) - }) + testCases := []struct { + name string + err error + isTransient bool + }{ + { + name: "When error is nil, it should not be transient", + err: nil, + isTransient: false, + }, + { + name: "When error is context deadline exceeded, it should be transient", + err: context.DeadlineExceeded, + isTransient: true, + }, + { + name: "When error is connection refused, it should be transient", + err: fmt.Errorf("dial tcp: connection refused"), + isTransient: true, + }, + { + name: "When error is syscall ECONNREFUSED, it should be transient", + err: syscall.ECONNREFUSED, + isTransient: true, + }, + { + name: "When error is validation failure, it should not be transient", + err: errors.New("failed validation: KonnectivityHost is required"), + isTransient: false, + }, + { + name: "When error is file not found, it should not be transient", + err: os.ErrNotExist, + isTransient: false, + }, + } + + for _, tc := range testCases { + t.Run(tc.name, func(t *testing.T) { + g := NewGomegaWithT(t) + if tc.isTransient { + g.Expect(isTransientKonnectivityError(tc.err)).To(BeTrue()) + } else { + g.Expect(isTransientKonnectivityError(tc.err)).To(BeFalse()) + } + }) + } }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@konnectivity-socks5-proxy/guard_test.go` around lines 28 - 58, Refactor TestIsTransientKonnectivityError into a table-driven test: create a slice of cases with fields name, err and want (bool) containing the current scenarios (nil, context.DeadlineExceeded, fmt.Errorf("dial tcp: connection refused"), syscall.ECONNREFUSED, errors.New("failed validation..."), os.ErrNotExist), then loop over cases calling t.Run(case.name, func(t *testing.T){ g := NewGomegaWithT(t); g.Expect(isTransientKonnectivityError(case.err)).To(Equal(case.want)) }); keep the test function name TestIsTransientKonnectivityError and reuse isTransientKonnectivityError and NewGomegaWithT to preserve existing assertions.
95-103: ⚡ Quick winAvoid mutating package-level state in tests.
Modifying the package-level
dialKonnectivityServervariable violates the guideline to avoid global state in tests. While the defer restores the original value, this pattern is unsafe for parallel test execution and can cause race conditions. As per coding guidelines, tests should run with parallel execution enabled.Consider refactoring the code under test to accept the dialer as a parameter (dependency injection) rather than relying on a package-level variable. This would make tests safer, support parallel execution, and improve testability.
Example approach:
// In guard.go, make dialKonnectivityServer a parameter or field func retryDialKonnectivityServer(ctx context.Context, log logr.Logger, host string, port uint32, dialFunc func(string, uint32) error) error { // ... use dialFunc instead of global dialKonnectivityServer } // In test func TestDialKonnectivityServerRetries(t *testing.T) { attempts := 0 mockDial := func(host string, port uint32) error { attempts++ if attempts < 2 { return fmt.Errorf("dial tcp: connection refused") } return dialKonnectivityServerTCP(host, port) } err := retryDialKonnectivityServer(ctx, log, "127.0.0.1", port, mockDial) // ... assertions }As per coding guidelines, avoid global state in tests and run tests with parallel execution.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@konnectivity-socks5-proxy/guard_test.go` around lines 95 - 103, The test mutates the package-level dialKonnectivityServer which is unsafe for parallel tests; refactor retryDialKonnectivityServer (in guard.go) to accept a dial function parameter (e.g., dialFunc func(string,uint32) error or as a field on a struct) and update the test to call retryDialKonnectivityServer with a local mockDial closure that increments attempts and returns the simulated error on the first call and delegates to the real dial (e.g., dialKonnectivityServerTCP) on success; remove any package-level reassignment in tests and use dependency injection to supply the mock instead.support/controlplane-component/konnectivity-container.go (1)
184-198: ⚡ Quick winRefactor to eliminate duplication and add defensive default case.
The HTTPS and Socks5 branches contain identical logic, violating DRY. Additionally, the switch lacks a default case—if
opts.Modeis neither HTTPS nor Socks5,servingPortremains zero, creating an invalid probe configuration.♻️ Proposed refactoring
- var servingPort int32 - switch opts.Mode { - case HTTPS: - if opts.HTTPSOptions.ServingPort != 0 { - servingPort = int32(opts.HTTPSOptions.ServingPort) - } else { - servingPort = 8090 - } - case Socks5: - if opts.Socks5Options.ServingPort != 0 { - servingPort = int32(opts.Socks5Options.ServingPort) - } else { - servingPort = 8090 - } - } + servingPort := int32(8090) + switch opts.Mode { + case HTTPS: + if opts.HTTPSOptions.ServingPort != 0 { + servingPort = int32(opts.HTTPSOptions.ServingPort) + } + case Socks5: + if opts.Socks5Options.ServingPort != 0 { + servingPort = int32(opts.Socks5Options.ServingPort) + } + default: + // Should never occur given validation at line 81-84, but defensive. + panic(fmt.Sprintf("unexpected konnectivity proxy mode: %s", opts.Mode)) + }As per coding guidelines, follow the 100-go-mistakes rules: prefer eliminating code duplication and using defensive programming patterns.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@support/controlplane-component/konnectivity-container.go` around lines 184 - 198, The servingPort selection duplicates identical logic for HTTPS and Socks5 and lacks a default branch, so update the code that sets servingPort (variable servingPort using opts.Mode, HTTPS, Socks5, opts.HTTPSOptions.ServingPort, opts.Socks5Options.ServingPort) to remove duplication by first selecting the candidate port (e.g., from opts.HTTPSOptions.ServingPort or opts.Socks5Options.ServingPort based on opts.Mode) and then applying a single fallback to 8090 if the candidate is zero; also add a defensive default for the switch (or handle unknown modes) to ensure servingPort is always set to a valid default.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@konnectivity-socks5-proxy/guard.go`:
- Around line 59-60: The retry backoff currently uses time.Sleep(delay()) which
ignores ctx cancellation; update both retry sites in guard.go (where
time.Sleep(delay()) is called) to use a cancellable wait: replace
time.Sleep(delay()) with a select that waits on time.After(delay()) and
ctx.Done(), returning or breaking when ctx is cancelled. Ensure the surrounding
function (the loop that calls delay()) respects ctx cancellation (e.g., check
ctx.Err() or return on ctx.Done()) so retries exit immediately when the provided
ctx is cancelled.
- Around line 30-35: coreGuardBackoff currently has Steps==0 so DelayFunc
returns a constant delay and exponential growth never happens; update
coreGuardBackoff to set a positive Steps (e.g., 5 or 10) so Factor and Cap take
effect, and replace every time.Sleep(delay()) call (where delay is produced by
coreGuardBackoff.DelayFunc or similar) with a ctx-aware wait: compute delay :=
delay() then use select { case <-ctx.Done(): return ctx.Err() (or break out
appropriately); case <-time.After(delay): /* continue retry */ } so retries exit
promptly on cancellation; reference coreGuardBackoff, DelayFunc and the sites
using time.Sleep(delay()) when making these edits.
In `@konnectivity-socks5-proxy/main.go`:
- Around line 55-71: signals.SetupSignalHandler() cancels ctx but the current
runWithCoreGuard call blocks on server.ListenAndServe and never observes
ctx.Done(), so SIGTERM doesn't stop the proxy and context cancellation becomes a
fatal error; change the socks5 server start to create a net.Listener (net.Listen
with fmt.Sprintf(":%d", servingPort)), run server.Serve(listener) in a
goroutine, and in the main goroutine select on ctx.Done() vs serve result: on
ctx.Done() call listener.Close() and wait for Serve to return and then return
nil (treat context cancellation as clean shutdown), and if Serve returns a
non-context error propagate it as before; update the block that constructs
socks5.Config and calls server.ListenAndServe to use this listener/Serve + ctx
cancellation handling so signals.SetupSignalHandler(), runWithCoreGuard,
server.Serve, and listener.Close() coordinate graceful shutdown.
---
Nitpick comments:
In `@konnectivity-socks5-proxy/guard_test.go`:
- Around line 28-58: Refactor TestIsTransientKonnectivityError into a
table-driven test: create a slice of cases with fields name, err and want (bool)
containing the current scenarios (nil, context.DeadlineExceeded,
fmt.Errorf("dial tcp: connection refused"), syscall.ECONNREFUSED,
errors.New("failed validation..."), os.ErrNotExist), then loop over cases
calling t.Run(case.name, func(t *testing.T){ g := NewGomegaWithT(t);
g.Expect(isTransientKonnectivityError(case.err)).To(Equal(case.want)) }); keep
the test function name TestIsTransientKonnectivityError and reuse
isTransientKonnectivityError and NewGomegaWithT to preserve existing assertions.
- Around line 95-103: The test mutates the package-level dialKonnectivityServer
which is unsafe for parallel tests; refactor retryDialKonnectivityServer (in
guard.go) to accept a dial function parameter (e.g., dialFunc
func(string,uint32) error or as a field on a struct) and update the test to call
retryDialKonnectivityServer with a local mockDial closure that increments
attempts and returns the simulated error on the first call and delegates to the
real dial (e.g., dialKonnectivityServerTCP) on success; remove any package-level
reassignment in tests and use dependency injection to supply the mock instead.
In `@support/controlplane-component/konnectivity-container.go`:
- Around line 184-198: The servingPort selection duplicates identical logic for
HTTPS and Socks5 and lacks a default branch, so update the code that sets
servingPort (variable servingPort using opts.Mode, HTTPS, Socks5,
opts.HTTPSOptions.ServingPort, opts.Socks5Options.ServingPort) to remove
duplication by first selecting the candidate port (e.g., from
opts.HTTPSOptions.ServingPort or opts.Socks5Options.ServingPort based on
opts.Mode) and then applying a single fallback to 8090 if the candidate is zero;
also add a defensive default for the switch (or handle unknown modes) to ensure
servingPort is always set to a valid default.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 56b1509a-1ce4-4bfd-9308-e453f94028b4
⛔ Files ignored due to path filters (40)
control-plane-operator/controllers/hostedcontrolplane/testdata/catalog-operator/AROSwift/zz_fixture_TestControlPlaneComponents_catalog_operator_deployment.yamlis excluded by!**/testdata/**control-plane-operator/controllers/hostedcontrolplane/testdata/catalog-operator/GCP/zz_fixture_TestControlPlaneComponents_catalog_operator_deployment.yamlis excluded by!**/testdata/**control-plane-operator/controllers/hostedcontrolplane/testdata/catalog-operator/IBMCloud/zz_fixture_TestControlPlaneComponents_catalog_operator_deployment.yamlis excluded by!**/testdata/**control-plane-operator/controllers/hostedcontrolplane/testdata/catalog-operator/TechPreviewNoUpgrade/zz_fixture_TestControlPlaneComponents_catalog_operator_deployment.yamlis excluded by!**/testdata/**control-plane-operator/controllers/hostedcontrolplane/testdata/catalog-operator/zz_fixture_TestControlPlaneComponents_catalog_operator_deployment.yamlis excluded by!**/testdata/**control-plane-operator/controllers/hostedcontrolplane/testdata/cluster-network-operator/AROSwift/zz_fixture_TestControlPlaneComponents_cluster_network_operator_deployment.yamlis excluded by!**/testdata/**control-plane-operator/controllers/hostedcontrolplane/testdata/cluster-network-operator/GCP/zz_fixture_TestControlPlaneComponents_cluster_network_operator_deployment.yamlis excluded by!**/testdata/**control-plane-operator/controllers/hostedcontrolplane/testdata/cluster-network-operator/IBMCloud/zz_fixture_TestControlPlaneComponents_cluster_network_operator_deployment.yamlis excluded by!**/testdata/**control-plane-operator/controllers/hostedcontrolplane/testdata/cluster-network-operator/TechPreviewNoUpgrade/zz_fixture_TestControlPlaneComponents_cluster_network_operator_deployment.yamlis excluded by!**/testdata/**control-plane-operator/controllers/hostedcontrolplane/testdata/cluster-network-operator/zz_fixture_TestControlPlaneComponents_cluster_network_operator_deployment.yamlis excluded by!**/testdata/**control-plane-operator/controllers/hostedcontrolplane/testdata/ingress-operator/AROSwift/zz_fixture_TestControlPlaneComponents_ingress_operator_deployment.yamlis excluded by!**/testdata/**control-plane-operator/controllers/hostedcontrolplane/testdata/ingress-operator/GCP/zz_fixture_TestControlPlaneComponents_ingress_operator_deployment.yamlis excluded by!**/testdata/**control-plane-operator/controllers/hostedcontrolplane/testdata/ingress-operator/IBMCloud/zz_fixture_TestControlPlaneComponents_ingress_operator_deployment.yamlis excluded by!**/testdata/**control-plane-operator/controllers/hostedcontrolplane/testdata/ingress-operator/TechPreviewNoUpgrade/zz_fixture_TestControlPlaneComponents_ingress_operator_deployment.yamlis excluded by!**/testdata/**control-plane-operator/controllers/hostedcontrolplane/testdata/ingress-operator/zz_fixture_TestControlPlaneComponents_ingress_operator_deployment.yamlis excluded by!**/testdata/**control-plane-operator/controllers/hostedcontrolplane/testdata/oauth-openshift/AROSwift/zz_fixture_TestControlPlaneComponents_oauth_openshift_deployment.yamlis excluded by!**/testdata/**control-plane-operator/controllers/hostedcontrolplane/testdata/oauth-openshift/GCP/zz_fixture_TestControlPlaneComponents_oauth_openshift_deployment.yamlis excluded by!**/testdata/**control-plane-operator/controllers/hostedcontrolplane/testdata/oauth-openshift/IBMCloud/zz_fixture_TestControlPlaneComponents_oauth_openshift_deployment.yamlis excluded by!**/testdata/**control-plane-operator/controllers/hostedcontrolplane/testdata/oauth-openshift/TechPreviewNoUpgrade/zz_fixture_TestControlPlaneComponents_oauth_openshift_deployment.yamlis excluded by!**/testdata/**control-plane-operator/controllers/hostedcontrolplane/testdata/oauth-openshift/zz_fixture_TestControlPlaneComponents_oauth_openshift_deployment.yamlis excluded by!**/testdata/**control-plane-operator/controllers/hostedcontrolplane/testdata/olm-operator/AROSwift/zz_fixture_TestControlPlaneComponents_olm_operator_deployment.yamlis excluded by!**/testdata/**control-plane-operator/controllers/hostedcontrolplane/testdata/olm-operator/GCP/zz_fixture_TestControlPlaneComponents_olm_operator_deployment.yamlis excluded by!**/testdata/**control-plane-operator/controllers/hostedcontrolplane/testdata/olm-operator/IBMCloud/zz_fixture_TestControlPlaneComponents_olm_operator_deployment.yamlis excluded by!**/testdata/**control-plane-operator/controllers/hostedcontrolplane/testdata/olm-operator/TechPreviewNoUpgrade/zz_fixture_TestControlPlaneComponents_olm_operator_deployment.yamlis excluded by!**/testdata/**control-plane-operator/controllers/hostedcontrolplane/testdata/olm-operator/zz_fixture_TestControlPlaneComponents_olm_operator_deployment.yamlis excluded by!**/testdata/**control-plane-operator/controllers/hostedcontrolplane/testdata/openshift-apiserver/AROSwift/zz_fixture_TestControlPlaneComponents_openshift_apiserver_deployment.yamlis excluded by!**/testdata/**control-plane-operator/controllers/hostedcontrolplane/testdata/openshift-apiserver/GCP/zz_fixture_TestControlPlaneComponents_openshift_apiserver_deployment.yamlis excluded by!**/testdata/**control-plane-operator/controllers/hostedcontrolplane/testdata/openshift-apiserver/IBMCloud/zz_fixture_TestControlPlaneComponents_openshift_apiserver_deployment.yamlis excluded by!**/testdata/**control-plane-operator/controllers/hostedcontrolplane/testdata/openshift-apiserver/TechPreviewNoUpgrade/zz_fixture_TestControlPlaneComponents_openshift_apiserver_deployment.yamlis excluded by!**/testdata/**control-plane-operator/controllers/hostedcontrolplane/testdata/openshift-apiserver/zz_fixture_TestControlPlaneComponents_openshift_apiserver_deployment.yamlis excluded by!**/testdata/**control-plane-operator/controllers/hostedcontrolplane/testdata/openshift-oauth-apiserver/AROSwift/zz_fixture_TestControlPlaneComponents_openshift_oauth_apiserver_controlplanecomponent.yamlis excluded by!**/testdata/**control-plane-operator/controllers/hostedcontrolplane/testdata/openshift-oauth-apiserver/GCP/zz_fixture_TestControlPlaneComponents_openshift_oauth_apiserver_controlplanecomponent.yamlis excluded by!**/testdata/**control-plane-operator/controllers/hostedcontrolplane/testdata/openshift-oauth-apiserver/IBMCloud/zz_fixture_TestControlPlaneComponents_openshift_oauth_apiserver_controlplanecomponent.yamlis excluded by!**/testdata/**control-plane-operator/controllers/hostedcontrolplane/testdata/openshift-oauth-apiserver/TechPreviewNoUpgrade/zz_fixture_TestControlPlaneComponents_openshift_oauth_apiserver_controlplanecomponent.yamlis excluded by!**/testdata/**control-plane-operator/controllers/hostedcontrolplane/testdata/openshift-oauth-apiserver/zz_fixture_TestControlPlaneComponents_openshift_oauth_apiserver_controlplanecomponent.yamlis excluded by!**/testdata/**control-plane-operator/controllers/hostedcontrolplane/testdata/packageserver/AROSwift/zz_fixture_TestControlPlaneComponents_packageserver_deployment.yamlis excluded by!**/testdata/**control-plane-operator/controllers/hostedcontrolplane/testdata/packageserver/GCP/zz_fixture_TestControlPlaneComponents_packageserver_deployment.yamlis excluded by!**/testdata/**control-plane-operator/controllers/hostedcontrolplane/testdata/packageserver/IBMCloud/zz_fixture_TestControlPlaneComponents_packageserver_deployment.yamlis excluded by!**/testdata/**control-plane-operator/controllers/hostedcontrolplane/testdata/packageserver/TechPreviewNoUpgrade/zz_fixture_TestControlPlaneComponents_packageserver_deployment.yamlis excluded by!**/testdata/**control-plane-operator/controllers/hostedcontrolplane/testdata/packageserver/zz_fixture_TestControlPlaneComponents_packageserver_deployment.yamlis excluded by!**/testdata/**
📒 Files selected for processing (5)
control-plane-operator/controllers/hostedcontrolplane/v2/oauth_apiserver/component.gokonnectivity-socks5-proxy/guard.gokonnectivity-socks5-proxy/guard_test.gokonnectivity-socks5-proxy/main.gosupport/controlplane-component/konnectivity-container.go
|
@YamunadeviShanmugam: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
konnectivity-socks5-proxy/guard_test.go (1)
79-110: 🏗️ Heavy liftTest the production retry path instead of a local clone.
retryDialKonnectivityServerreimplements its own loop, so this test can stay green whilebootstrapKonnectivityorrunWithCoreGuardregress. Prefer injecting the dial/bootstrap dependency into the real code path and asserting retries there.As per coding guidelines, "Always include unit tests when creating new functions or modifying existing ones" and "Unit test any code changes and additions."
Also applies to: 113-131
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@konnectivity-socks5-proxy/guard_test.go` around lines 79 - 110, The test currently calls retryDialKonnectivityServer directly and stubs dialKonnectivityServer, which doesn't exercise the real production path; refactor the code to inject the dialing/bootstrap dependency (e.g., accept a DialFunc or a KonnectivityBootstrap interface) into bootstrapKonnectivity/runWithCoreGuard and update the test to call the real entrypoint (bootstrapKonnectivity or runWithCoreGuard) with a test dial implementation that fails once then succeeds, assert the entrypoint eventually succeeds and that the injected dial was invoked multiple times; keep references to dialKonnectivityServer, retryDialKonnectivityServer, bootstrapKonnectivity, and runWithCoreGuard to locate and change signatures, and ensure tests restore any global state and use context timeouts as before.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@konnectivity-socks5-proxy/guard.go`:
- Around line 36-37: Thread a context through bootstrapKonnectivity →
tryBootstrapKonnectivity → dialKonnectivityServerTCP so the TCP probe is
cancellable: add a ctx parameter to bootstrapKonnectivity and propagate it into
tryBootstrapKonnectivity and dialKonnectivityServerTCP, then replace the
net.DialTimeout(...) call inside dialKonnectivityServerTCP with
(&net.Dialer{Timeout: konnectivityDialTimeout}).DialContext(ctx, network, addr)
so the in-progress probe is aborted when ctx is cancelled; update all callers to
pass the ctx accordingly.
---
Nitpick comments:
In `@konnectivity-socks5-proxy/guard_test.go`:
- Around line 79-110: The test currently calls retryDialKonnectivityServer
directly and stubs dialKonnectivityServer, which doesn't exercise the real
production path; refactor the code to inject the dialing/bootstrap dependency
(e.g., accept a DialFunc or a KonnectivityBootstrap interface) into
bootstrapKonnectivity/runWithCoreGuard and update the test to call the real
entrypoint (bootstrapKonnectivity or runWithCoreGuard) with a test dial
implementation that fails once then succeeds, assert the entrypoint eventually
succeeds and that the injected dial was invoked multiple times; keep references
to dialKonnectivityServer, retryDialKonnectivityServer, bootstrapKonnectivity,
and runWithCoreGuard to locate and change signatures, and ensure tests restore
any global state and use context timeouts as before.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: a56c9242-6df5-4a69-85b4-450d2a131e1a
⛔ Files ignored due to path filters (40)
control-plane-operator/controllers/hostedcontrolplane/testdata/catalog-operator/AROSwift/zz_fixture_TestControlPlaneComponents_catalog_operator_deployment.yamlis excluded by!**/testdata/**control-plane-operator/controllers/hostedcontrolplane/testdata/catalog-operator/GCP/zz_fixture_TestControlPlaneComponents_catalog_operator_deployment.yamlis excluded by!**/testdata/**control-plane-operator/controllers/hostedcontrolplane/testdata/catalog-operator/IBMCloud/zz_fixture_TestControlPlaneComponents_catalog_operator_deployment.yamlis excluded by!**/testdata/**control-plane-operator/controllers/hostedcontrolplane/testdata/catalog-operator/TechPreviewNoUpgrade/zz_fixture_TestControlPlaneComponents_catalog_operator_deployment.yamlis excluded by!**/testdata/**control-plane-operator/controllers/hostedcontrolplane/testdata/catalog-operator/zz_fixture_TestControlPlaneComponents_catalog_operator_deployment.yamlis excluded by!**/testdata/**control-plane-operator/controllers/hostedcontrolplane/testdata/cluster-network-operator/AROSwift/zz_fixture_TestControlPlaneComponents_cluster_network_operator_deployment.yamlis excluded by!**/testdata/**control-plane-operator/controllers/hostedcontrolplane/testdata/cluster-network-operator/GCP/zz_fixture_TestControlPlaneComponents_cluster_network_operator_deployment.yamlis excluded by!**/testdata/**control-plane-operator/controllers/hostedcontrolplane/testdata/cluster-network-operator/IBMCloud/zz_fixture_TestControlPlaneComponents_cluster_network_operator_deployment.yamlis excluded by!**/testdata/**control-plane-operator/controllers/hostedcontrolplane/testdata/cluster-network-operator/TechPreviewNoUpgrade/zz_fixture_TestControlPlaneComponents_cluster_network_operator_deployment.yamlis excluded by!**/testdata/**control-plane-operator/controllers/hostedcontrolplane/testdata/cluster-network-operator/zz_fixture_TestControlPlaneComponents_cluster_network_operator_deployment.yamlis excluded by!**/testdata/**control-plane-operator/controllers/hostedcontrolplane/testdata/ingress-operator/AROSwift/zz_fixture_TestControlPlaneComponents_ingress_operator_deployment.yamlis excluded by!**/testdata/**control-plane-operator/controllers/hostedcontrolplane/testdata/ingress-operator/GCP/zz_fixture_TestControlPlaneComponents_ingress_operator_deployment.yamlis excluded by!**/testdata/**control-plane-operator/controllers/hostedcontrolplane/testdata/ingress-operator/IBMCloud/zz_fixture_TestControlPlaneComponents_ingress_operator_deployment.yamlis excluded by!**/testdata/**control-plane-operator/controllers/hostedcontrolplane/testdata/ingress-operator/TechPreviewNoUpgrade/zz_fixture_TestControlPlaneComponents_ingress_operator_deployment.yamlis excluded by!**/testdata/**control-plane-operator/controllers/hostedcontrolplane/testdata/ingress-operator/zz_fixture_TestControlPlaneComponents_ingress_operator_deployment.yamlis excluded by!**/testdata/**control-plane-operator/controllers/hostedcontrolplane/testdata/oauth-openshift/AROSwift/zz_fixture_TestControlPlaneComponents_oauth_openshift_deployment.yamlis excluded by!**/testdata/**control-plane-operator/controllers/hostedcontrolplane/testdata/oauth-openshift/GCP/zz_fixture_TestControlPlaneComponents_oauth_openshift_deployment.yamlis excluded by!**/testdata/**control-plane-operator/controllers/hostedcontrolplane/testdata/oauth-openshift/IBMCloud/zz_fixture_TestControlPlaneComponents_oauth_openshift_deployment.yamlis excluded by!**/testdata/**control-plane-operator/controllers/hostedcontrolplane/testdata/oauth-openshift/TechPreviewNoUpgrade/zz_fixture_TestControlPlaneComponents_oauth_openshift_deployment.yamlis excluded by!**/testdata/**control-plane-operator/controllers/hostedcontrolplane/testdata/oauth-openshift/zz_fixture_TestControlPlaneComponents_oauth_openshift_deployment.yamlis excluded by!**/testdata/**control-plane-operator/controllers/hostedcontrolplane/testdata/olm-operator/AROSwift/zz_fixture_TestControlPlaneComponents_olm_operator_deployment.yamlis excluded by!**/testdata/**control-plane-operator/controllers/hostedcontrolplane/testdata/olm-operator/GCP/zz_fixture_TestControlPlaneComponents_olm_operator_deployment.yamlis excluded by!**/testdata/**control-plane-operator/controllers/hostedcontrolplane/testdata/olm-operator/IBMCloud/zz_fixture_TestControlPlaneComponents_olm_operator_deployment.yamlis excluded by!**/testdata/**control-plane-operator/controllers/hostedcontrolplane/testdata/olm-operator/TechPreviewNoUpgrade/zz_fixture_TestControlPlaneComponents_olm_operator_deployment.yamlis excluded by!**/testdata/**control-plane-operator/controllers/hostedcontrolplane/testdata/olm-operator/zz_fixture_TestControlPlaneComponents_olm_operator_deployment.yamlis excluded by!**/testdata/**control-plane-operator/controllers/hostedcontrolplane/testdata/openshift-apiserver/AROSwift/zz_fixture_TestControlPlaneComponents_openshift_apiserver_deployment.yamlis excluded by!**/testdata/**control-plane-operator/controllers/hostedcontrolplane/testdata/openshift-apiserver/GCP/zz_fixture_TestControlPlaneComponents_openshift_apiserver_deployment.yamlis excluded by!**/testdata/**control-plane-operator/controllers/hostedcontrolplane/testdata/openshift-apiserver/IBMCloud/zz_fixture_TestControlPlaneComponents_openshift_apiserver_deployment.yamlis excluded by!**/testdata/**control-plane-operator/controllers/hostedcontrolplane/testdata/openshift-apiserver/TechPreviewNoUpgrade/zz_fixture_TestControlPlaneComponents_openshift_apiserver_deployment.yamlis excluded by!**/testdata/**control-plane-operator/controllers/hostedcontrolplane/testdata/openshift-apiserver/zz_fixture_TestControlPlaneComponents_openshift_apiserver_deployment.yamlis excluded by!**/testdata/**control-plane-operator/controllers/hostedcontrolplane/testdata/openshift-oauth-apiserver/AROSwift/zz_fixture_TestControlPlaneComponents_openshift_oauth_apiserver_controlplanecomponent.yamlis excluded by!**/testdata/**control-plane-operator/controllers/hostedcontrolplane/testdata/openshift-oauth-apiserver/GCP/zz_fixture_TestControlPlaneComponents_openshift_oauth_apiserver_controlplanecomponent.yamlis excluded by!**/testdata/**control-plane-operator/controllers/hostedcontrolplane/testdata/openshift-oauth-apiserver/IBMCloud/zz_fixture_TestControlPlaneComponents_openshift_oauth_apiserver_controlplanecomponent.yamlis excluded by!**/testdata/**control-plane-operator/controllers/hostedcontrolplane/testdata/openshift-oauth-apiserver/TechPreviewNoUpgrade/zz_fixture_TestControlPlaneComponents_openshift_oauth_apiserver_controlplanecomponent.yamlis excluded by!**/testdata/**control-plane-operator/controllers/hostedcontrolplane/testdata/openshift-oauth-apiserver/zz_fixture_TestControlPlaneComponents_openshift_oauth_apiserver_controlplanecomponent.yamlis excluded by!**/testdata/**control-plane-operator/controllers/hostedcontrolplane/testdata/packageserver/AROSwift/zz_fixture_TestControlPlaneComponents_packageserver_deployment.yamlis excluded by!**/testdata/**control-plane-operator/controllers/hostedcontrolplane/testdata/packageserver/GCP/zz_fixture_TestControlPlaneComponents_packageserver_deployment.yamlis excluded by!**/testdata/**control-plane-operator/controllers/hostedcontrolplane/testdata/packageserver/IBMCloud/zz_fixture_TestControlPlaneComponents_packageserver_deployment.yamlis excluded by!**/testdata/**control-plane-operator/controllers/hostedcontrolplane/testdata/packageserver/TechPreviewNoUpgrade/zz_fixture_TestControlPlaneComponents_packageserver_deployment.yamlis excluded by!**/testdata/**control-plane-operator/controllers/hostedcontrolplane/testdata/packageserver/zz_fixture_TestControlPlaneComponents_packageserver_deployment.yamlis excluded by!**/testdata/**
📒 Files selected for processing (5)
control-plane-operator/controllers/hostedcontrolplane/v2/oauth_apiserver/component.gokonnectivity-socks5-proxy/guard.gokonnectivity-socks5-proxy/guard_test.gokonnectivity-socks5-proxy/main.gosupport/controlplane-component/konnectivity-container.go
The proxy was exiting hard on startup timeouts.
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
konnectivity-socks5-proxy/main.go (1)
94-96:⚠️ Potential issue | 🟠 Major | ⚡ Quick winTreat bootstrap-time context cancellation as a clean exit.
If
runWithCoreGuardreturnscontext.Canceledwhile still bootstrapping, Lines 94-96 still print an error andos.Exit(1). That makes an intentional SIGTERM during startup look like a crash.Suggested change
import ( + "context" + "errors" "fmt" "net" "os" @@ }) if err != nil { + if errors.Is(err, context.Canceled) { + return + } fmt.Fprintf(os.Stderr, "Error: %v\n", err) os.Exit(1) } }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@konnectivity-socks5-proxy/main.go` around lines 94 - 96, When handling the error returned from runWithCoreGuard, treat bootstrap-time cancellation as a clean exit by checking if errors.Is(err, context.Canceled) (or err == context.Canceled) and, if so, do not print an error and exit 0; otherwise keep the existing fmt.Fprintf(os.Stderr, ...) and os.Exit(1). Update the error handling around runWithCoreGuard to perform this conditional check (and add an import for the errors package if needed) so a SIGTERM during startup is treated as a normal shutdown.
🧹 Nitpick comments (1)
konnectivity-socks5-proxy/main_test.go (1)
100-134: ⚡ Quick winExercise the production shutdown path instead of a copied closure.
This subtest reimplements the listener/
Serve/ctx.Done()flow locally, so it can stay green even ifNewStartCommanddrifts. Extract that block into a small package-private helper and call the helper from bothmain.goand this test.As per coding guidelines, "Unit test any code changes and additions".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@konnectivity-socks5-proxy/main_test.go` around lines 100 - 134, The test duplicates the listener/Serve/ctx.Done() shutdown logic (serveFunc) instead of exercising the real production path; refactor that logic into a package-private helper (e.g., create a function serveWithGracefulShutdown(ctx context.Context, servingPort uint32, server *socks5.Server) error) and replace the test-local closure and the implementation in main.go to call this helper; update NewStartCommand (or the function in main.go that currently inlines the logic) to call serveWithGracefulShutdown so the test can simply invoke the real code path and verify graceful shutdown, and adjust tests to pass a server instance and listener port to the new helper.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@konnectivity-socks5-proxy/guard_test.go`:
- Around line 213-219: The test currently dials a hardcoded port 54321 which can
be occupied on CI; instead create a short-lived listener on "127.0.0.1:0" to get
an ephemeral port, read its actual port from listener.Addr(), close the listener
to free the port, then call dialKonnectivityServerTCP(ctx, "127.0.0.1", port)
and assert error with isTransientKonnectivityError; update the anonymous t.Run
block to use this ephemeral-port pattern and keep the same assertions using
dialKonnectivityServerTCP and isTransientKonnectivityError.
---
Duplicate comments:
In `@konnectivity-socks5-proxy/main.go`:
- Around line 94-96: When handling the error returned from runWithCoreGuard,
treat bootstrap-time cancellation as a clean exit by checking if errors.Is(err,
context.Canceled) (or err == context.Canceled) and, if so, do not print an error
and exit 0; otherwise keep the existing fmt.Fprintf(os.Stderr, ...) and
os.Exit(1). Update the error handling around runWithCoreGuard to perform this
conditional check (and add an import for the errors package if needed) so a
SIGTERM during startup is treated as a normal shutdown.
---
Nitpick comments:
In `@konnectivity-socks5-proxy/main_test.go`:
- Around line 100-134: The test duplicates the listener/Serve/ctx.Done()
shutdown logic (serveFunc) instead of exercising the real production path;
refactor that logic into a package-private helper (e.g., create a function
serveWithGracefulShutdown(ctx context.Context, servingPort uint32, server
*socks5.Server) error) and replace the test-local closure and the implementation
in main.go to call this helper; update NewStartCommand (or the function in
main.go that currently inlines the logic) to call serveWithGracefulShutdown so
the test can simply invoke the real code path and verify graceful shutdown, and
adjust tests to pass a server instance and listener port to the new helper.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 099bd414-6684-4f69-8d67-a453ed5f21b3
📒 Files selected for processing (5)
konnectivity-socks5-proxy/guard.gokonnectivity-socks5-proxy/guard_test.gokonnectivity-socks5-proxy/main.gokonnectivity-socks5-proxy/main_test.gosupport/controlplane-component/konnectivity-container_test.go
| t.Run("When port is not listening, it should return connection refused", func(t *testing.T) { | ||
| g := NewGomegaWithT(t) | ||
| // Try to connect to a port that's not listening (port 1 requires root) | ||
| err := dialKonnectivityServerTCP(context.Background(), "127.0.0.1", 54321) | ||
| g.Expect(err).To(HaveOccurred()) | ||
| g.Expect(isTransientKonnectivityError(err)).To(BeTrue()) | ||
| }) |
There was a problem hiding this comment.
Avoid a fixed localhost port in the refusal test.
Line 216 assumes 54321 is unused. If that port is bound on a CI worker, this turns into a false negative instead of a connection-refused path. Reserve an ephemeral port with 127.0.0.1:0, capture it, close the listener, then dial that port.
Suggested change
func TestDialKonnectivityServerTCP_ConnectionRefused(t *testing.T) {
t.Run("When port is not listening, it should return connection refused", func(t *testing.T) {
g := NewGomegaWithT(t)
- // Try to connect to a port that's not listening (port 1 requires root)
- err := dialKonnectivityServerTCP(context.Background(), "127.0.0.1", 54321)
+ ln, err := net.Listen("tcp", "127.0.0.1:0")
+ g.Expect(err).ToNot(HaveOccurred())
+
+ _, portStr, err := net.SplitHostPort(ln.Addr().String())
+ g.Expect(err).ToNot(HaveOccurred())
+ g.Expect(ln.Close()).To(Succeed())
+
+ var port uint32
+ _, err = fmt.Sscanf(portStr, "%d", &port)
+ g.Expect(err).ToNot(HaveOccurred())
+
+ err = dialKonnectivityServerTCP(context.Background(), "127.0.0.1", port)
g.Expect(err).To(HaveOccurred())
g.Expect(isTransientKonnectivityError(err)).To(BeTrue())
})📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| t.Run("When port is not listening, it should return connection refused", func(t *testing.T) { | |
| g := NewGomegaWithT(t) | |
| // Try to connect to a port that's not listening (port 1 requires root) | |
| err := dialKonnectivityServerTCP(context.Background(), "127.0.0.1", 54321) | |
| g.Expect(err).To(HaveOccurred()) | |
| g.Expect(isTransientKonnectivityError(err)).To(BeTrue()) | |
| }) | |
| t.Run("When port is not listening, it should return connection refused", func(t *testing.T) { | |
| g := NewGomegaWithT(t) | |
| ln, err := net.Listen("tcp", "127.0.0.1:0") | |
| g.Expect(err).ToNot(HaveOccurred()) | |
| _, portStr, err := net.SplitHostPort(ln.Addr().String()) | |
| g.Expect(err).ToNot(HaveOccurred()) | |
| g.Expect(ln.Close()).To(Succeed()) | |
| var port uint32 | |
| _, err = fmt.Sscanf(portStr, "%d", &port) | |
| g.Expect(err).ToNot(HaveOccurred()) | |
| err = dialKonnectivityServerTCP(context.Background(), "127.0.0.1", port) | |
| g.Expect(err).To(HaveOccurred()) | |
| g.Expect(isTransientKonnectivityError(err)).To(BeTrue()) | |
| }) |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@konnectivity-socks5-proxy/guard_test.go` around lines 213 - 219, The test
currently dials a hardcoded port 54321 which can be occupied on CI; instead
create a short-lived listener on "127.0.0.1:0" to get an ephemeral port, read
its actual port from listener.Addr(), close the listener to free the port, then
call dialKonnectivityServerTCP(ctx, "127.0.0.1", port) and assert error with
isTransientKonnectivityError; update the anonymous t.Run block to use this
ephemeral-port pattern and keep the same assertions using
dialKonnectivityServerTCP and isTransientKonnectivityError.
The proxy was exiting hard on startup timeouts.
|
Now I have all the information. Let me compile the final report. Test Failure Analysis CompleteJob Information
Test Failure AnalysisErrorSummaryBoth CI failures are caused by code style issues in the three new test files introduced by PR #8579. The lint job fails with 5 linter violations: two Root CauseThe root cause is three new test files that were not run through the project's linter and formatter before being committed:
Recommendations
Evidence
|
This PR fixes the intermittent E2E test failures where the konnectivity-proxy-socks5 sidecar container crashes and restarts during cluster bootstrap. Fix includes retry loop, native Kubernetes startup probes, and explicit component dependency ordering.
Proxy Internal Retry Logic
Native Startup Probes
konnectivity-container.go: Added a native TCP startup probe to all Konnectivity sidecar containers. This gives the sidecars a safe 60-second window to initialize during cluster bootstrap before any readiness or liveness checks trigger.
Component Dependency Ordering
Deployment graph is updated with dependency check. The openshift-oauth-apiserver pods will now explicitly wait for the konnectivity-agent to be completely Available and RolloutComplete before they begin scheduling.
Test Fixtures are updated accordingly
Summary by CodeRabbit
New Features
Bug Fixes
Tests