Skip to content

[WIP] Add taints package#27

Draft
Tzif-Morgen wants to merge 26 commits into
medik8s:mainfrom
Tzif-Morgen:add-taints-package
Draft

[WIP] Add taints package#27
Tzif-Morgen wants to merge 26 commits into
medik8s:mainfrom
Tzif-Morgen:add-taints-package

Conversation

@Tzif-Morgen

@Tzif-Morgen Tzif-Morgen commented Apr 27, 2026

Copy link
Copy Markdown

Why we need this PR

Both FAR and SNR have duplicated taint-related code, particularly:

  • Out-of-service taint detection and version checking logic
  • Generic taint manipulation utilities
  • K8s version parsing for taint feature support

Moving this code to common allows both projects to share tested, consistent implementations.

Changes made

  1. Add pkg/taints package
    New package providing shared taint utilities:
  • TaintExists() - Check if a taint exists in a taint slice
  • FilterOutTaint() - Return new slice without matching taints
  • CreateOutOfServiceTaint() - Create K8s out-of-service taint
  • InitOutOfServiceTaintFlagsWithRetry() - Initialize version detection flags with retry logic
  • Package-level flags: IsOutOfServiceTaintSupported (K8s 1.26+), IsOutOfServiceTaintGA (K8s 1.28+)

Tests include version parsing with various formats and edge cases (11 test cases).

  1. Upgrade k8s dependencies to v0.33.7
    Aligns common with FAR and SNR k8s versions, enables modern APIs like wait.PollUntilContextTimeout.

  2. Fix lease test compatibility
    Removed Kind/APIVersion assertions from lease tests. The new fake client (k8s v0.33) doesn't auto-populate TypeMeta fields like the old version did. These fields aren't used by manager logic. See
    kubernetes-sigs/controller-runtime#3229.

  3. Rename DeleteTaint to FilterOutTaint
    Clearer name reflecting that it's a pure function returning a filtered slice, not modifying the original or deleting from a node. Each operator can use this function based on their specific needs.

Which issue(s) this PR fixes

needed for RHWA-690

Summary by CodeRabbit

  • New Features

    • Added support for out-of-service taint management with automatic Kubernetes version detection.
  • Chores

    • Updated Go runtime from version 1.20 to 1.24.0.
    • Upgraded all major dependencies including Kubernetes client libraries and controller-runtime to latest compatible versions.
  • Tests

    • Expanded taint operation test coverage.

Tzif added 4 commits April 23, 2026 12:58
This package provides shared taint utilities for remediation operators:
- TaintExists: Check if a taint exists in a node's taint list
- DeleteTaint: Remove taints by key and effect
- CreateOutOfServiceTaint: Create out-of-service taint for node shutdown
- InitOutOfServiceTaintFlagsWithRetry: Initialize out-of-service taint support flags based on K8s version

Includes version detection for:
- Out-of-service taint support (K8s 1.26+)
- Out-of-service taint GA (K8s 1.28+)

Tests cover version parsing with various formats and edge cases.

Note: Tests not passing yet due to k8s version incompatibility.
wait.PollUntilContextTimeout not available in k8s.io/apimachinery v0.26.3.

Options to fix:
1. Use wait.PollWithContext instead (compatible with v0.26.3)
2. Upgrade common to k8s.io/apimachinery v0.33.x to match FAR/SNR
Upgraded k8s dependencies to support modern APIs:
- k8s.io/apimachinery v0.26.3 -> v0.33.7
- k8s.io/api v0.26.3 -> v0.33.7
- k8s.io/client-go v0.26.3 -> v0.33.7
- sigs.k8s.io/controller-runtime v0.14.5 -> v0.21.0

This upgrade enables use of wait.PollUntilContextTimeout in taints package
and aligns with FAR and SNR k8s versions.

Tests status:
- pkg/taints: PASS (11/11 tests)
- pkg/etcd: PASS (7/7 tests)
- pkg/events: PASS (9/9 tests)
- pkg/lease: FAIL (10/11 tests) - 1 failing test due to fake client behavior change
  Known issue: lease test expects Kind field populated by fake client

The lease test failure is unrelated to the taints package.
Will fix in the next commit.
Remove Kind/APIVersion assertions on Lease objects in testCreateLease.

The new fake client (k8s v0.33) doesn't auto-populate TypeMeta fields
like the old one (v0.26) did (see kubernetes-sigs/controller-runtime#3229).

These fields are not used by the manager logic and are not checked by
compareLeases(). The assertions were testing fake client behavior
rather than manager functionality.

All lease tests now pass (11/11).
Clearer name for a pure function that filters a taint slice.
Does not modify the original slice or delete from a node.

Each operator can use this function differently based on their needs.
@openshift-ci

openshift-ci Bot commented Apr 27, 2026

Copy link
Copy Markdown
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@coderabbitai

coderabbitai Bot commented Apr 27, 2026

Copy link
Copy Markdown

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Go toolchain updated to version 1.24.0 with comprehensive dependency upgrades across direct and transitive modules. Taint management API refactored with renamed functions and new capabilities for out-of-service taint detection. Test suite added for taint utilities with version-aware feature gating logic. Minor test assertion cleanup in lease manager.

Changes

Taints API Refactoring

Layer / File(s) Summary
Core API Redesign
pkg/taints/taints.go
Public functions renamed: TaintExistsContains, FilterOutTaintFilter. New type OutOfServiceTaintInfo added to cache Kubernetes version support flags (Supported ≥1.26, GA ≥1.28).
Taint Operations
pkg/taints/taints.go
New functions CreateOutOfServiceTaint(), AddTaintToNode(), and version detection via DetectOutOfServiceTaintInfoWithRetry() with retry-based Kubernetes server version discovery.
Version Detection Logic
pkg/taints/taints.go
Internal functions detectOutOfServiceTaintInfo() and getOutOfServiceTaintInfo() parse server version and compute feature support flags, enabling conditional out-of-service taint usage.
Test Suite Initialization
pkg/taints/taints_suite_test.go
New Ginkgo test suite with BeforeSuite hook configuring Zap logger and controller-runtime logging output for the taints package.
Comprehensive Tests
pkg/taints/taints_test.go
Table-driven tests validate Contains, Filter, CreateOutOfServiceTaint, version-aware flag setting, and node taint operations (AddTaintToNode, RemoveTaintFromNode) with edge cases.

Go Dependency Upgrade

Layer / File(s) Summary
Toolchain & Direct Dependencies
go.mod
Go version bumped to 1.24.0. Major direct dependencies upgraded in lockstep: go-logr, ginkgo, gomega, logrus, zap, Kubernetes client libraries (k8s.io/api, k8s.io/apimachinery, k8s.io/client-go), and sigs.k8s.io/controller-runtime.
Transitive Dependency Refresh
go.mod
Indirect modules version-bumped (e.g., go-restful, json-patch/v5, fsnotify, go-openapi/*, Prometheus packages). New indirect modules introduced (e.g., blang/semver/v4, gorilla/websocket, mxk/go-flowrate). Obsolete entries removed.
Test Cleanup
pkg/lease/manager_test.go
Lease test assertions for Kind and APIVersion fields removed; only compareLeases() validation retained. Control flow and error expectations unchanged.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 60.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title '[WIP] Add taints package' directly and accurately describes the main objective of the changeset—introducing a new taints package with shared utilities.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (3)
pkg/taints/taints.go (3)

38-60: Nit: inconsistent MatchTaint receiver direction between TaintExists and FilterOutTaint.

TaintExists calls taint.MatchTaint(taintToFind) (receiver = list element, arg = needle), while FilterOutTaint calls taintToDelete.MatchTaint(&taints[i]) (receiver = needle, arg = list element). The behavior is equivalent because MatchTaint compares Key+Effect symmetrically, but aligning them makes the code easier to scan.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/taints/taints.go` around lines 38 - 60, The two functions TaintExists and
FilterOutTaint use MatchTaint with different receiver/argument order which is
inconsistent; change FilterOutTaint to call the same direction as TaintExists
(use taints[i].MatchTaint(taintToDelete) instead of
taintToDelete.MatchTaint(&taints[i])) so both use the list element as the
receiver and the needle as the argument, keeping the comparison logic identical
and improving readability.

28-35: Recommended: add unit tests for TaintExists, FilterOutTaint, and CreateOutOfServiceTaint.

Only setOutOfTaintFlags is currently covered by taints_test.go. Since this package is meant to be the single shared implementation reused by FAR and SNR (per PR objectives), the public surface — taint matching, slice filtering with the deleted-flag return, and the constructed OutOfService taint shape — is exactly what consumers will rely on and what should be locked down by tests. Recommend table-driven cases covering: matching/non-matching keys, matching/non-matching effects, empty input slice, and the no-op delete (deleted == false) path.

Want me to draft the additional test cases?

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/taints/taints.go` around lines 28 - 35, Add unit tests for the public
functions TaintExists, FilterOutTaint, and CreateOutOfServiceTaint: write
table-driven tests that exercise matching and non-matching key scenarios and
matching/non-matching Effect values for TaintExists (use several taint slices
and assert boolean result), tests for FilterOutTaint that cover empty input
slice, removal of a matching taint (assert returned slice and deleted==true),
and the no-op delete path where no taint matches (assert deleted==false and
original slice unchanged), and tests for CreateOutOfServiceTaint that assert the
returned taint has the expected Key, Value, Effect and contains the intended
leadingDigits behavior if relevant; reference functions TaintExists,
FilterOutTaint, CreateOutOfServiceTaint and ensure edge cases (nil/empty slices,
different Effect enums) are covered in table cases.

91-107: Optional: flatten if/else if/else chain into early returns for readability.

The nested if/else if/else with multiple shadowed err declarations is harder to read than the equivalent early-return form, and avoids re-binding err inside if-init scopes.

♻️ Suggested refactor
 func initOutOfServiceTaintFlags(config *rest.Config) error {
-	if cs, err := kubernetes.NewForConfig(config); err != nil || cs == nil {
-		if cs == nil {
-			err = fmt.Errorf("k8s client set is nil")
-		}
-		loggerTaint.Error(err, "couldn't retrieve k8s client")
-		return err
-	} else if k8sVersion, err := cs.Discovery().ServerVersion(); err != nil || k8sVersion == nil {
-		if k8sVersion == nil {
-			err = fmt.Errorf("k8s server version is nil")
-		}
-		loggerTaint.Error(err, "couldn't retrieve k8s server version")
-		return err
-	} else {
-		return setOutOfTaintFlags(k8sVersion)
-	}
+	cs, err := kubernetes.NewForConfig(config)
+	if err != nil || cs == nil {
+		if err == nil {
+			err = fmt.Errorf("k8s client set is nil")
+		}
+		loggerTaint.Error(err, "couldn't retrieve k8s client")
+		return err
+	}
+	k8sVersion, err := cs.Discovery().ServerVersion()
+	if err != nil || k8sVersion == nil {
+		if err == nil {
+			err = fmt.Errorf("k8s server version is nil")
+		}
+		loggerTaint.Error(err, "couldn't retrieve k8s server version")
+		return err
+	}
+	return setOutOfTaintFlags(k8sVersion)
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/taints/taints.go` around lines 91 - 107, The initOutOfServiceTaintFlags
function uses a nested if/else-if/else chain with shadowed err variables;
refactor it to use early returns to improve readability: first call
kubernetes.NewForConfig(config) into cs, err, check if err != nil or cs == nil
and log via loggerTaint.Error with a clear message (set err = fmt.Errorf("k8s
client set is nil") if cs == nil) then return err; next call
cs.Discovery().ServerVersion() into k8sVersion, err, check if err != nil or
k8sVersion == nil and log similarly (set err = fmt.Errorf("k8s server version is
nil") if nil) then return err; finally call and return
setOutOfTaintFlags(k8sVersion). Ensure you do not rebind err in nested if-inits
so the flow is linear and easier to follow.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/taints/taints.go`:
- Around line 80-89: Update the stale comment to reference the unexported helper
name initOutOfServiceTaintFlags (instead of InitOutOfServiceTaintFlags) and add
a short inline justification next to the PollUntilContextTimeout callback (or a
//nolint:nilerr directive) clarifying that returning (false, nil) is intentional
to keep retrying and that the outer err variable preserves the last failure;
this will fix searchability and silence the nilerr linter false positive for the
closure used by PollUntilContextTimeout.

---

Nitpick comments:
In `@pkg/taints/taints.go`:
- Around line 38-60: The two functions TaintExists and FilterOutTaint use
MatchTaint with different receiver/argument order which is inconsistent; change
FilterOutTaint to call the same direction as TaintExists (use
taints[i].MatchTaint(taintToDelete) instead of
taintToDelete.MatchTaint(&taints[i])) so both use the list element as the
receiver and the needle as the argument, keeping the comparison logic identical
and improving readability.
- Around line 28-35: Add unit tests for the public functions TaintExists,
FilterOutTaint, and CreateOutOfServiceTaint: write table-driven tests that
exercise matching and non-matching key scenarios and matching/non-matching
Effect values for TaintExists (use several taint slices and assert boolean
result), tests for FilterOutTaint that cover empty input slice, removal of a
matching taint (assert returned slice and deleted==true), and the no-op delete
path where no taint matches (assert deleted==false and original slice
unchanged), and tests for CreateOutOfServiceTaint that assert the returned taint
has the expected Key, Value, Effect and contains the intended leadingDigits
behavior if relevant; reference functions TaintExists, FilterOutTaint,
CreateOutOfServiceTaint and ensure edge cases (nil/empty slices, different
Effect enums) are covered in table cases.
- Around line 91-107: The initOutOfServiceTaintFlags function uses a nested
if/else-if/else chain with shadowed err variables; refactor it to use early
returns to improve readability: first call kubernetes.NewForConfig(config) into
cs, err, check if err != nil or cs == nil and log via loggerTaint.Error with a
clear message (set err = fmt.Errorf("k8s client set is nil") if cs == nil) then
return err; next call cs.Discovery().ServerVersion() into k8sVersion, err, check
if err != nil or k8sVersion == nil and log similarly (set err = fmt.Errorf("k8s
server version is nil") if nil) then return err; finally call and return
setOutOfTaintFlags(k8sVersion). Ensure you do not rebind err in nested if-inits
so the flow is linear and easier to follow.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: c6bd1bdc-2caa-4e8e-90ee-367a63c023f1

📥 Commits

Reviewing files that changed from the base of the PR and between 2dab4f0 and f8e10b9.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (4)
  • go.mod
  • pkg/lease/manager_test.go
  • pkg/taints/taints.go
  • pkg/taints/taints_test.go
💤 Files with no reviewable changes (1)
  • pkg/lease/manager_test.go

Comment thread pkg/taints/taints.go Outdated
@Tzif-Morgen

Copy link
Copy Markdown
Author

we can also use PollWithContext instead of PollUntilContextTimeout to avoid all the k8s upgrades

@razo7 razo7 left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review: Add taints package

Must fix before merge

1. Concurrency hazard — mutable package-level state
IsOutOfServiceTaintSupported and IsOutOfServiceTaintGA are exported mutable package-level vars written by InitOutOfServiceTaintFlagsWithRetry and read by any consumer, with no synchronization. If two controllers in the same process call this, or a reader checks the flag while the retry loop is running, there's a data race.

Use sync.Once or atomic.Bool:

type OutOfServiceTaintInfo struct {
    Supported bool
    GA        bool
}

Or at minimum protect with sync.Once since the values are write-once.

2. Missing test coverage
TaintExists, FilterOutTaint, and CreateOutOfServiceTaint are all untested. Since this is a shared library replacing duplicated code from FAR and SNR, comprehensive coverage is essential. Both FAR and SNR presumably had tests for their own implementations — those should migrate here.

3. Nested if/else with shadowed err in initOutOfServiceTaintFlags
The deep if/else if/else chain re-declares err in each if init statement, shadowing the outer variable. While the current code happens to work, it's fragile. Refactor to early returns.

Should fix

4. Discarded PollUntilContextTimeout error

_ = wait.PollUntilContextTimeout(ctx, interval, timeout, true, func(ctx context.Context) (bool, error) {

Intentional but undocumented. If the parent context is cancelled, the poll returns ctx.Err(), but you return the last internal err. Add a comment explaining the intent.

5. Client-side TimeAdded in CreateOutOfServiceTaint
Setting TimeAdded at Go function call time, not when the taint is applied to the node. In Kubernetes, TimeAdded is typically set by the API server. Consider removing it and letting the API server handle it.

6. Stale comment
InitOutOfServiceTaintFlagsWithRetry comment says "retry for a limited number of times" but the implementation is time-bound (2s interval, 10s timeout), not count-bound. Update the comment.

7. Inconsistent MatchTaint call direction
TaintExists calls taint.MatchTaint(taintToFind) while FilterOutTaint calls taintToDelete.MatchTaint(&taints[i]). MatchTaint is symmetric on Key+Effect so it works, but the inconsistency hurts readability. Pick one direction.

Notes

  • Tests use stdlib testing while the rest of the repo (pkg/etcd, pkg/lease) uses Ginkgo/Gomega — worth aligning for consistency.
  • The dense test assertion in setOutOfTaintFlags checks three conditions in one if statement. Breaking into separate assertions would make failures easier to diagnose.
  • No suite_test.go file, unlike other packages in the repo.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/taints/taints.go`:
- Around line 88-104: The current InitOutOfServiceTaintFlagsWithRetry uses
initOnce.Do which permanently prevents retries if the first attempt fails;
change the logic so the "done" marker is only set when
initOutOfServiceTaintFlags succeeds: replace the initOnce.Do block with a
mutex-protected flow (or an atomic boolean) that runs the
wait.PollUntilContextTimeout retry loop and only sets the initialized flag and
initErr on success, leaving initErr unset/updated on transient failures so
subsequent calls to InitOutOfServiceTaintFlagsWithRetry will retry; keep
references to initOutOfServiceTaintFlags, initErr, and initOnce (or rename
initOnce to initializedFlag if you switch types) so the success-only
initialization semantics are clear.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 58a8decc-d29b-4375-bd40-ee84048d196d

📥 Commits

Reviewing files that changed from the base of the PR and between f8e10b9 and ca54e7b.

📒 Files selected for processing (2)
  • pkg/taints/taints.go
  • pkg/taints/taints_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • pkg/taints/taints_test.go

Comment thread pkg/taints/taints.go Outdated
Replace exported mutable vars with OutOfServiceTaintInfo struct accessed via
GetOutOfServiceTaintInfo() getter.

Update tests to use struct and follow common repo naming (expectedInfo, expectedErr).

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/taints/taints.go`:
- Around line 79-80: GetOutOfServiceTaintInfo reads the shared package variable
taintInfo while setOutOfTaintFlags mutates it, causing potential data races; add
a package-level sync.RWMutex (e.g. taintInfoMu) and use
taintInfoMu.RLock()/RUnlock() in GetOutOfServiceTaintInfo and
taintInfoMu.Lock()/Unlock() in setOutOfTaintFlags to synchronize access
(returning the struct copy under the read lock). Ensure all other accesses to
taintInfo are similarly guarded.
- Around line 91-97: Assign and check the error returned by
wait.PollUntilContextTimeout instead of discarding it: capture its return value
(e.g. pollErr := wait.PollUntilContextTimeout(...)), and return that error when
non-nil; if initOutOfServiceTaintFlags left a non-nil initialization error in
the outer err variable, wrap or combine both (e.g. include init error context)
before returning. Ensure you don’t shadow err inside the polling callback and
reference the existing initOutOfServiceTaintFlags and
wait.PollUntilContextTimeout symbols when making the change.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 544e21ca-fab7-4cec-87d8-72fa650ac40b

📥 Commits

Reviewing files that changed from the base of the PR and between ca54e7b and f41128c.

📒 Files selected for processing (2)
  • pkg/taints/taints.go
  • pkg/taints/taints_test.go
✅ Files skipped from review due to trivial changes (1)
  • pkg/taints/taints_test.go

Comment thread pkg/taints/taints.go Outdated
Comment thread pkg/taints/taints.go Outdated
@Tzif-Morgen

Copy link
Copy Markdown
Author
  1. Concurrency hazard — mutable package-level state

Both FAR and SNR call it once in main.go at startup; Single-threaded initialization - no concurrent calls (and, each operator runs in its own pod/process with separate memory - FAR and SNR don't share package-level variables)

Tzif added 8 commits April 29, 2026 09:20
Convert tests to Ginkgo/Gomega for consistency with other packages in common
(pkg/etcd, pkg/lease). Add test coverage for TaintExists, FilterOutTaint, and
CreateOutOfServiceTaint which were previously untested.

Total test coverage: 22 test cases covering all public functions.
Add envtest setup to Makefile following FAR/SNR pattern. The test target now
depends on envtest and sets KUBEBUILDER_ASSETS to enable integration testing.

Add integration tests that verify InitOutOfServiceTaintFlagsWithRetry works
with a real Kubernetes API server, testing both success and error cases.

All tests pass (25 total: 22 unit + 3 integration).
Simplify error handling to avoid err shadowing.
Check poll error for context cancellation and return it so caller knows the
operation was cancelled. On timeout, return the internal error which is more
specific than the generic timeout error.

While FAR and SNR currently use context.Background() (which never cancels), as
a library we should not rely on that assumption.
TimeAdded should be set when the taint is applied to the node, not when the
struct is created. Setting it at creation time results in inaccurate timestamps
if there's any delay before patching the node.

Both FAR and SNR set TimeAdded at patch time, making the creation-time value
incorrect and misleading. Callers should set TimeAdded when they apply the taint.
Adds two new functions for node taint management:
- AppendTaintToNode: adds taint to node and patches it, sets TimeAdded
- RemoveTaintFromNode: removes taint from node and patches it

Both use MergeFrom patch strategy to avoid update conflicts.
Includes integration tests using envtest.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@go.mod`:
- Line 43: The project currently pulls vulnerable github.com/moby/spdystream
v0.5.0 transitively; update the dependency graph by bumping
sigs.k8s.io/controller-runtime to a release that depends on k8s.io/client-go
v0.36.0+ (e.g., set sigs.k8s.io/controller-runtime to v0.17.0 or later in
go.mod), then run go get/sync (e.g., `go get
sigs.k8s.io/controller-runtime@v0.17.0`), followed by go mod tidy and verify
go.sum to ensure spdystream is now v0.5.1; alternatively, if you choose not to
upgrade, add a note and tests proving the spdystream code path is unreachable.

In `@Makefile`:
- Around line 70-79: The Makefile installs tools into a hardcoded
$(PROJECT_DIR)/bin while targets use the configurable LOCALBIN, causing
mismatched check vs install locations; update the tool install invocations so
they install into the same path referenced by the targets (use the per-tool path
variables SORT_IMPORTS and ENVTEST or LOCALBIN) by changing the go-install-tool
calls in the sort-imports and envtest targets to install to $(SORT_IMPORTS) and
$(ENVTEST) (or otherwise ensure the go-install-tool macro uses $(LOCALBIN)
instead of $(PROJECT_DIR)/bin) so checks and installs remain consistent.
- Around line 43-45: The check target uses a dockerized golang:1.20 image while
go.mod declares Go 1.24.0; update the Makefile's GO_VERSION variable (and any
references that construct the docker image tag) from 1.20 to 1.24.0 so the
`check` target (and the container run that executes `make lint test`) uses
golang:1.24.0; ensure any other occurrences of GO_VERSION in the Makefile refer
to the same updated variable so the test container and tooling match the module
toolchain.

In `@pkg/taints/taints_integration_test.go`:
- Around line 44-68: The tests currently call
InitOutOfServiceTaintFlagsWithRetry and then discard the result of
GetOutOfServiceTaintInfo(), which means regressions in taint detection can slip
by; update both specs so after calling GetOutOfServiceTaintInfo() you assert the
returned struct's key flags (e.g., Supported and GA) are true (since envtest is
pinned to k8s v1.33) and also assert the struct is not the zero value; reference
the functions InitOutOfServiceTaintFlagsWithRetry and GetOutOfServiceTaintInfo
to locate the assertions to tighten.

In `@pkg/taints/taints_test.go`:
- Around line 91-97: The test for CreateOutOfServiceTaint is missing an
assertion that the returned taint's TimeAdded is unset; update the spec in
Context("CreateOutOfServiceTaint") / It("should create out-of-service taint with
correct fields") to assert that taint.TimeAdded is nil (or zero/unset per the
type used) after calling CreateOutOfServiceTaint(), keeping the other Expect
checks for Key, Value, and Effect intact so regressions setting TimeAdded will
fail the test.

In `@pkg/taints/taints.go`:
- Around line 81-111: AppendTaintToNode and RemoveTaintFromNode currently build
merge patches from a potentially stale node and can clobber concurrent taint
updates; fix both by re-fetching the latest Node before mutating, capture a deep
copy as "stored" before changing node.Spec.Taints, then call client.Patch with
client.MergeFromWithOptions(stored, client.MergeFromWithOptimisticLock{}) so the
patch includes metadata.resourceVersion and will fail on concurrent
modifications; if Patch returns a conflict error, return that error so the
caller/controller can requeue and reconcile with a fresh fetch.
- Around line 87-89: The function that appends taints currently sets
taint.TimeAdded unconditionally; change AppendTaintToNode (or the code block
that does now := metav1.Now(); taint.TimeAdded = &now; node.Spec.Taints =
append(...)) to only set TimeAdded when taint.Effect ==
corev1.TaintEffectNoExecute, leaving TimeAdded nil for NoSchedule and
PreferNoSchedule; locate the taint append logic in pkg/taints/taints.go and wrap
the timestamp assignment in an if checking corev1.TaintEffectNoExecute, then
append the taint as before and adjust related tests that currently expect
TimeAdded for NoSchedule.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 90bff7ce-2a72-4752-b1f4-5566a0b515d5

📥 Commits

Reviewing files that changed from the base of the PR and between f41128c and 2ab160a.

📒 Files selected for processing (6)
  • Makefile
  • go.mod
  • pkg/taints/taints.go
  • pkg/taints/taints_integration_test.go
  • pkg/taints/taints_suite_test.go
  • pkg/taints/taints_test.go

Comment thread go.mod
github.com/matttproud/golang_protobuf_extensions v1.0.2 // indirect
github.com/moby/spdystream v0.2.0 // indirect
github.com/mailru/easyjson v0.7.7 // indirect
github.com/moby/spdystream v0.5.0 // indirect

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

Is github.com/moby/spdystream v0.5.0 affected by GHSA-pc3f-x583-g7j2, and which k8s.io/client-go or sigs.k8s.io/controller-runtime release removes or patches that transitive dependency?

💡 Result:

Yes, github.com/moby/spdystream v0.5.0 is affected by GHSA-pc3f-x583-g7j2, as affected versions are <= v0.5.0 and v0.5.1 patches it. k8s.io/client-go versions prior to v0.36.0 depend on spdystream v0.5.0 transitively (via k8s.io/apimachinery/pkg/util/httpstream/spdy or k8s.io/streaming), so they are affected. k8s.io/client-go v0.36.0 uses spdystream v0.5.1. sigs.k8s.io/controller-runtime depends on k8s.io/client-go; versions aligned with client-go v0.36.0+ (likely v0.17.0+, check compatibility table) will patch it transitively. No direct spdystream dep in controller-runtime go.mod; fix comes via client-go upgrade.

Citations:


Address known-vulnerable moby/spdystream v0.5.0 in the transitive dependency graph.

This version is affected by GHSA-pc3f-x583-g7j2 (DoS advisory). The fix is available in spdystream v0.5.1, which is included in k8s.io/client-go v0.36.0+. Upgrade sigs.k8s.io/controller-runtime to a version aligned with k8s.io/client-go v0.36.0 or later (likely v0.17.0+) to transitively pull the patched version. Alternatively, verify that the affected code path in spdystream is unreachable in your application before merge.

🧰 Tools
🪛 OSV Scanner (2.3.5)

[HIGH] 43-43: github.com/moby/spdystream 0.5.0: SpdyStream: DOS on CRI

(GHSA-pc3f-x583-g7j2)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@go.mod` at line 43, The project currently pulls vulnerable
github.com/moby/spdystream v0.5.0 transitively; update the dependency graph by
bumping sigs.k8s.io/controller-runtime to a release that depends on
k8s.io/client-go v0.36.0+ (e.g., set sigs.k8s.io/controller-runtime to v0.17.0
or later in go.mod), then run go get/sync (e.g., `go get
sigs.k8s.io/controller-runtime@v0.17.0`), followed by go mod tidy and verify
go.sum to ensure spdystream is now v0.5.1; alternatively, if you choose not to
upgrade, add a note and tests proving the spdystream code path is unreachable.

Comment thread Makefile Outdated
Comment on lines 43 to 45
test: envtest ## Run tests.
KUBEBUILDER_ASSETS="$(shell $(ENVTEST) use $(ENVTEST_K8S_VERSION) --bin-dir $(LOCALBIN)/k8s/$(ENVTEST_K8S_VERSION) -p path)" \
go test ./... -coverprofile cover.out -v

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# Find and read the go.mod file to check the Go version requirement
fd -t f 'go.mod$' -x cat {}

echo "---"

# Find and read the Makefile to check GO_VERSION and the check/test targets
fd -t f 'Makefile$' -x head -100 {}

Repository: medik8s/common

Length of output: 6694


make check runs tests in Go 1.20 but go.mod requires Go 1.24.0.

The dockerized check target runs make lint test inside a golang:1.20 container, which will fail if the code uses Go 1.24.0 features. Update GO_VERSION to match the module's declared toolchain.

🛠️ Fix
-GO_VERSION = 1.20
+GO_VERSION = 1.24.0
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Makefile` around lines 43 - 45, The check target uses a dockerized
golang:1.20 image while go.mod declares Go 1.24.0; update the Makefile's
GO_VERSION variable (and any references that construct the docker image tag)
from 1.20 to 1.24.0 so the `check` target (and the container run that executes
`make lint test`) uses golang:1.24.0; ensure any other occurrences of GO_VERSION
in the Makefile refer to the same updated variable so the test container and
tooling match the module toolchain.

Comment thread Makefile Outdated
Comment thread pkg/taints/taints_integration_test.go Outdated
Comment on lines +44 to +68
It("should successfully detect k8s version and set taint info", func() {
ctx := context.Background()
err := InitOutOfServiceTaintFlagsWithRetry(ctx, cfg)
Expect(err).NotTo(HaveOccurred())

// envtest typically runs a recent k8s version (>= 1.26)
// so we expect out-of-service taint to be supported
info := GetOutOfServiceTaintInfo()
// We can't assert exact values since envtest version varies
// but we can verify the function executed without error
_ = info
})

It("should populate taintInfo correctly", func() {
ctx := context.Background()
err := InitOutOfServiceTaintFlagsWithRetry(ctx, cfg)
Expect(err).NotTo(HaveOccurred())

// Verify taintInfo was populated (not zero value)
info := GetOutOfServiceTaintInfo()
// Since we can't know the exact envtest version, just verify
// the function completed and populated something
By("verifying taintInfo was populated")
_ = info
})

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

These specs don't currently validate the flag values.

Both cases only assert "no error" and then discard info, so a regression in GetOutOfServiceTaintInfo() would still pass. Since this PR’s Makefile pins envtest to Kubernetes 1.33, these tests can assert Supported == true and GA == true directly.

🔍 Tighten the assertions
-			info := GetOutOfServiceTaintInfo()
-			// We can't assert exact values since envtest version varies
-			// but we can verify the function executed without error
-			_ = info
+			info := GetOutOfServiceTaintInfo()
+			Expect(info.Supported).To(BeTrue())
+			Expect(info.GA).To(BeTrue())
@@
-			info := GetOutOfServiceTaintInfo()
-			// Since we can't know the exact envtest version, just verify
-			// the function completed and populated something
-			By("verifying taintInfo was populated")
-			_ = info
+			info := GetOutOfServiceTaintInfo()
+			By("verifying taintInfo was populated")
+			Expect(info.Supported).To(BeTrue())
+			Expect(info.GA).To(BeTrue())
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
It("should successfully detect k8s version and set taint info", func() {
ctx := context.Background()
err := InitOutOfServiceTaintFlagsWithRetry(ctx, cfg)
Expect(err).NotTo(HaveOccurred())
// envtest typically runs a recent k8s version (>= 1.26)
// so we expect out-of-service taint to be supported
info := GetOutOfServiceTaintInfo()
// We can't assert exact values since envtest version varies
// but we can verify the function executed without error
_ = info
})
It("should populate taintInfo correctly", func() {
ctx := context.Background()
err := InitOutOfServiceTaintFlagsWithRetry(ctx, cfg)
Expect(err).NotTo(HaveOccurred())
// Verify taintInfo was populated (not zero value)
info := GetOutOfServiceTaintInfo()
// Since we can't know the exact envtest version, just verify
// the function completed and populated something
By("verifying taintInfo was populated")
_ = info
})
It("should successfully detect k8s version and set taint info", func() {
ctx := context.Background()
err := InitOutOfServiceTaintFlagsWithRetry(ctx, cfg)
Expect(err).NotTo(HaveOccurred())
// envtest typically runs a recent k8s version (>= 1.26)
// so we expect out-of-service taint to be supported
info := GetOutOfServiceTaintInfo()
Expect(info.Supported).To(BeTrue())
Expect(info.GA).To(BeTrue())
})
It("should populate taintInfo correctly", func() {
ctx := context.Background()
err := InitOutOfServiceTaintFlagsWithRetry(ctx, cfg)
Expect(err).NotTo(HaveOccurred())
// Verify taintInfo was populated (not zero value)
info := GetOutOfServiceTaintInfo()
By("verifying taintInfo was populated")
Expect(info.Supported).To(BeTrue())
Expect(info.GA).To(BeTrue())
})
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/taints/taints_integration_test.go` around lines 44 - 68, The tests
currently call InitOutOfServiceTaintFlagsWithRetry and then discard the result
of GetOutOfServiceTaintInfo(), which means regressions in taint detection can
slip by; update both specs so after calling GetOutOfServiceTaintInfo() you
assert the returned struct's key flags (e.g., Supported and GA) are true (since
envtest is pinned to k8s v1.33) and also assert the struct is not the zero
value; reference the functions InitOutOfServiceTaintFlagsWithRetry and
GetOutOfServiceTaintInfo to locate the assertions to tighten.

Comment thread pkg/taints/taints_test.go
Comment on lines +91 to +97
Context("CreateOutOfServiceTaint", func() {
It("should create out-of-service taint with correct fields", func() {
taint := CreateOutOfServiceTaint()
Expect(taint.Key).To(Equal(corev1.TaintNodeOutOfService))
Expect(taint.Value).To(Equal("nodeshutdown"))
Expect(taint.Effect).To(Equal(corev1.TaintEffectNoExecute))
})

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Assert that TimeAdded stays unset here.

One of the new contract changes is that CreateOutOfServiceTaint() does not populate TimeAdded, but this test never checks that. A regression there would pass unnoticed.

✅ Add the missing assertion
 			taint := CreateOutOfServiceTaint()
 			Expect(taint.Key).To(Equal(corev1.TaintNodeOutOfService))
 			Expect(taint.Value).To(Equal("nodeshutdown"))
 			Expect(taint.Effect).To(Equal(corev1.TaintEffectNoExecute))
+			Expect(taint.TimeAdded).To(BeNil())
 		})
 	})
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
Context("CreateOutOfServiceTaint", func() {
It("should create out-of-service taint with correct fields", func() {
taint := CreateOutOfServiceTaint()
Expect(taint.Key).To(Equal(corev1.TaintNodeOutOfService))
Expect(taint.Value).To(Equal("nodeshutdown"))
Expect(taint.Effect).To(Equal(corev1.TaintEffectNoExecute))
})
Context("CreateOutOfServiceTaint", func() {
It("should create out-of-service taint with correct fields", func() {
taint := CreateOutOfServiceTaint()
Expect(taint.Key).To(Equal(corev1.TaintNodeOutOfService))
Expect(taint.Value).To(Equal("nodeshutdown"))
Expect(taint.Effect).To(Equal(corev1.TaintEffectNoExecute))
Expect(taint.TimeAdded).To(BeNil())
})
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/taints/taints_test.go` around lines 91 - 97, The test for
CreateOutOfServiceTaint is missing an assertion that the returned taint's
TimeAdded is unset; update the spec in Context("CreateOutOfServiceTaint") /
It("should create out-of-service taint with correct fields") to assert that
taint.TimeAdded is nil (or zero/unset per the type used) after calling
CreateOutOfServiceTaint(), keeping the other Expect checks for Key, Value, and
Effect intact so regressions setting TimeAdded will fail the test.

Comment thread pkg/taints/taints.go Outdated
Comment on lines +81 to +111
func AppendTaintToNode(ctx context.Context, c client.Client, node *corev1.Node, taint corev1.Taint) (bool, error) {
if TaintExists(node.Spec.Taints, &taint) {
return false, nil
}

patch := client.MergeFrom(node.DeepCopy())
now := metav1.Now()
taint.TimeAdded = &now
node.Spec.Taints = append(node.Spec.Taints, taint)

if err := c.Patch(ctx, node, patch); err != nil {
return false, err
}
return true, nil
}

// RemoveTaintFromNode removes the given taint from the node and patches it.
// Returns true if the taint was removed, false if it didn't exist.
func RemoveTaintFromNode(ctx context.Context, c client.Client, node *corev1.Node, taint corev1.Taint) (bool, error) {
newTaints, removed := FilterOutTaint(node.Spec.Taints, &taint)
if !removed {
return false, nil
}

patch := client.MergeFrom(node.DeepCopy())
node.Spec.Taints = newTaints

if err := c.Patch(ctx, node, patch); err != nil {
return false, err
}
return true, nil

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

In sigs.k8s.io/controller-runtime v0.21.0, what is the recommended way to patch corev1.Node.Spec.Taints without dropping concurrent taint updates? Does client.MergeFromWithOptimisticLock exist in this version?

💡 Result:

Recommended way: use MergeFrom (or StrategicMergeFrom) with optimistic locking so that the patch includes metadata.resourceVersion and will conflict/retry if another actor updated the Node (avoids clobbering concurrent taint changes). Concretely, in controller-runtime v0.21.0 patch the Node object using client.MergeFromWithOptions(stored, client.MergeFromWithOptimisticLock{}), where stored is node.DeepCopy taken immediately before you mutate node.Spec.Taints. Example: stored := node.DeepCopy // capture current state // ... add/remove taints on node.Spec.Taints ... if !equality.Semantic.DeepEqual(stored, node) { err := kubeClient.Patch(ctx, node, client.MergeFromWithOptions(stored, client.MergeFromWithOptimisticLock{})) // if err is a conflict, requeue/retry with fresh state } Does client.MergeFromWithOptimisticLock exist in this version? Yes. controller-runtime exposes client.MergeFromWithOptimisticLock (used via MergeFromWithOptions to set OptimisticLock=true for merge-from patches).

Citations:


These helpers can clobber concurrent taint updates.

Both functions build a merge patch from a potentially stale Node object. If another actor modifies spec.taints between when the caller fetches the node and when the patch is applied, the whole slice can be overwritten, silently dropping the concurrent change.

Instead, re-fetch the latest Node inside the helper and patch with optimistic locking. Use client.MergeFromWithOptions(stored, client.MergeFromWithOptimisticLock{}), where stored is a deep copy captured before mutating node.Spec.Taints. On conflict errors, requeue with a fresh fetch. This ensures the patch includes metadata.resourceVersion and will fail if the node was modified concurrently, preventing silent data loss.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/taints/taints.go` around lines 81 - 111, AppendTaintToNode and
RemoveTaintFromNode currently build merge patches from a potentially stale node
and can clobber concurrent taint updates; fix both by re-fetching the latest
Node before mutating, capture a deep copy as "stored" before changing
node.Spec.Taints, then call client.Patch with
client.MergeFromWithOptions(stored, client.MergeFromWithOptimisticLock{}) so the
patch includes metadata.resourceVersion and will fail on concurrent
modifications; if Patch returns a conflict error, return that error so the
caller/controller can requeue and reconcile with a fresh fetch.

Comment thread pkg/taints/taints.go
Comment on lines +87 to +89
now := metav1.Now()
taint.TimeAdded = &now
node.Spec.Taints = append(node.Spec.Taints, taint)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's examine the taints.go file to see the full context
cat -n pkg/taints/taints.go | head -100

Repository: medik8s/common

Length of output: 3848


🏁 Script executed:

# Look for the AppendTaintToNode function
rg -A 20 "func.*AppendTaintToNode" pkg/taints/

Repository: medik8s/common

Length of output: 1262


🏁 Script executed:

# Search for TimeAdded usage in the codebase
rg "TimeAdded" --type-list

Repository: medik8s/common

Length of output: 448


🏁 Script executed:

# Search for integration tests related to taints
fd -e "\.go$" -i test | xargs rg -l "TimeAdded|AppendTaint" 2>/dev/null | head -20

Repository: medik8s/common

Length of output: 117


🏁 Script executed:

# Look for documentation or comments about taint effects
rg "TaintEffect" -B 2 -A 2 --type go | head -50

Repository: medik8s/common

Length of output: 3067


🏁 Script executed:

cat -n pkg/taints/taints_integration_test.go

Repository: medik8s/common

Length of output: 7828


🏁 Script executed:

# Search for TimeAdded assertions in tests
rg "TimeAdded" pkg/taints/

Repository: medik8s/common

Length of output: 461


🌐 Web query:

Kubernetes taint TimeAdded NoExecute semantics

💡 Result:

In Kubernetes, the TimeAdded field in a taint specifies the timestamp when the taint was added to the node. This is particularly relevant for taints with the NoExecute effect. For NoExecute taints: - Pods without a matching toleration are evicted immediately upon taint addition. - Pods with a matching toleration but no tolerationSeconds remain bound indefinitely. - Pods with a matching toleration specifying tolerationSeconds remain bound for that duration after the TimeAdded timestamp. The node lifecycle controller evicts them after this grace period elapses, unless the taint is removed first. The eviction timer starts from the TimeAdded time. Kubernetes automatically adds TimeAdded when the node controller or kubelet applies taints like node.kubernetes.io/not-ready or node.kubernetes.io/unreachable with NoExecute effect (e.g., when a node becomes NotReady or unreachable). Example taint with TimeAdded (from node status): - effect: NoExecute key: node.kubernetes.io/unreachable timeAdded: "2023-09-21T06:32:46Z" Pods tolerating this with tolerationSeconds: 300 (default for auto-added tolerations) stay for 300 seconds post-TimeAdded. This behavior is unchanged in recent versions (docs updated 2026), with minor implementation details like occasional 5s delays in taint addition observed in issues but not affecting core semantics.

Citations:


🏁 Script executed:

# Search for how TimeAdded is used in Kubernetes and related code
rg "TimeAdded" -B 3 -A 3 --type go

Repository: medik8s/common

Length of output: 2377


🏁 Script executed:

# Look for any comments or documentation about taint effects
rg "TaintEffect|NoSchedule|NoExecute|PreferNoSchedule" -B 2 -A 2 --type go | head -100

Repository: medik8s/common

Length of output: 3446


Only set TimeAdded for NoExecute taints.

corev1.Taint.TimeAdded is only meaningful for NoExecute effects. Kubernetes uses it to calculate the grace period for pod eviction based on tolerationSeconds. Setting it unconditionally makes AppendTaintToNode semantically incorrect for NoSchedule and PreferNoSchedule taints, and the integration test currently asserts incorrect behavior by expecting TimeAdded to be set on a NoSchedule taint.

🎯 Narrow the timestamping to the supported effect
 	now := metav1.Now()
-	taint.TimeAdded = &now
+	if taint.Effect == corev1.TaintEffectNoExecute {
+		taint.TimeAdded = &now
+	}
 	node.Spec.Taints = append(node.Spec.Taints, taint)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
now := metav1.Now()
taint.TimeAdded = &now
node.Spec.Taints = append(node.Spec.Taints, taint)
now := metav1.Now()
if taint.Effect == corev1.TaintEffectNoExecute {
taint.TimeAdded = &now
}
node.Spec.Taints = append(node.Spec.Taints, taint)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/taints/taints.go` around lines 87 - 89, The function that appends taints
currently sets taint.TimeAdded unconditionally; change AppendTaintToNode (or the
code block that does now := metav1.Now(); taint.TimeAdded = &now;
node.Spec.Taints = append(...)) to only set TimeAdded when taint.Effect ==
corev1.TaintEffectNoExecute, leaving TimeAdded nil for NoSchedule and
PreferNoSchedule; locate the taint append logic in pkg/taints/taints.go and wrap
the timestamp assignment in an if checking corev1.TaintEffectNoExecute, then
append the taint as before and adjust related tests that currently expect
TimeAdded for NoSchedule.

Tzif added 5 commits April 29, 2026 15:10
- Rename taintInfo variable to OutOfServiceInfo for better clarity
- Remove GetOutOfServiceTaintInfo() getter function since variable is now exported
- Update all references in tests to use OutOfServiceInfo directly
- This allows consumers to access and mock the variable for testing
@slintes

slintes commented Apr 29, 2026

Copy link
Copy Markdown
Member

please add .DS_Store to your global .gitignore :)

@slintes slintes left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this probably is a lot of copy paste of existing code. But this is the chance to improve things :)
Main issue is the use a Init(...) function and storing the result in a global public var. Please rethink this.
Some more comments inline.

Comment thread pkg/taints/taints.go Outdated
Comment thread pkg/taints/taints.go Outdated
Comment thread pkg/taints/taints.go Outdated
Comment thread pkg/taints/taints.go Outdated
Comment thread pkg/taints/taints.go Outdated
Comment thread pkg/taints/taints.go Outdated
Tzif added 5 commits April 30, 2026 12:08
- Rename FilterOutTaint to Filter (package name makes 'Out' redundant)
- Use slices.DeleteFunc for cleaner implementation
- Update documentation to explain Kubernetes taint identity semantics
- Update all tests to use new function name
- Rename InitOutOfServiceTaintFlagsWithRetry to DetectOutOfServiceTaintInfoWithRetry
- Function now returns OutOfServiceTaintInfo instead of storing in global variable
- Rename initOutOfServiceTaintFlags to detectOutOfServiceTaintInfo
- Rename setOutOfTaintFlags to getOutOfServiceTaintInfo
- Update tests to check returned values instead of global variable
slintes
slintes previously approved these changes May 6, 2026

@slintes slintes left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good, will add a hold in case someone else want to do a review

/hold

@slintes

slintes commented May 6, 2026

Copy link
Copy Markdown
Member

@coderabbitai review

@coderabbitai

coderabbitai Bot commented May 6, 2026

Copy link
Copy Markdown
✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (2)
pkg/taints/taints_test.go (1)

96-103: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Still missing the TimeAdded nil assertion on CreateOutOfServiceTaint.

The contract is that CreateOutOfServiceTaint() does not set TimeAdded (the caller does, in AddTaintToNode). Without an explicit assertion, a regression that pre-populates TimeAdded here would slip through.

✅ Add the missing assertion
 			taint := CreateOutOfServiceTaint()
 			Expect(taint.Key).To(Equal(corev1.TaintNodeOutOfService))
 			Expect(taint.Value).To(Equal("nodeshutdown"))
 			Expect(taint.Effect).To(Equal(corev1.TaintEffectNoExecute))
+			Expect(taint.TimeAdded).To(BeNil())
 		})
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/taints/taints_test.go` around lines 96 - 103, The test for
CreateOutOfServiceTaint is missing an assertion that TimeAdded is nil; update
the spec in Context("CreateOutOfServiceTaint")/It(...) to assert that the
returned taint from CreateOutOfServiceTaint() has a nil TimeAdded (i.e.,
Expect(taint.TimeAdded).To(BeNil())), so the test enforces the contract that
CreateOutOfServiceTaint does not pre-populate TimeAdded.
go.mod (1)

42-42: ⚠️ Potential issue | 🟠 Major | 💤 Low value

moby/spdystream v0.5.0 (GHSA-pc3f-x583-g7j2) still in transitive graph.

This is unchanged from the previous review — client-go v0.33.7 still pulls v0.5.0 transitively, and the fix lands only in client-go v0.36.0+. Since the explicit goal here is alignment with FAR/SNR on v0.33.7, a transitive upgrade isn't on the table for this PR; consider adding a //indirect comment or a SECURITY note documenting that the affected SPDY code path is unreachable in this codebase, and/or tracking a follow-up issue to re-evaluate once FAR/SNR move past v0.33.x.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@go.mod` at line 42, The transitive dependency github.com/moby/spdystream
v0.5.0 remains in the go.mod via client-go v0.33.7; update the PR by annotating
that entry (github.com/moby/spdystream v0.5.0) with a comment documenting the
GHSA and explaining that the vulnerable SPDY code path is unreachable in this
codebase (or add a SECURITY.md note), and create or reference a follow-up issue
to re-evaluate/upgrade once client-go moves to v0.36.0+ to eliminate the
transitive vulnerability.
🧹 Nitpick comments (1)
pkg/taints/taints.go (1)

117-122: 💤 Low value

Optional: silence the recurring nilerr lint hit.

golangci-lint keeps reporting nilerr on line 119 (also visible in the static-analysis hints). The (false, nil) return is the canonical "keep polling on transient error" pattern, so the warning is a false positive — but a one-line //nolint:nilerr (or a brief inline comment) on the return would prevent CI noise from recurring across future commits.

♻️ Suggested tweak
 	pollErr := wait.PollUntilContextTimeout(ctx, interval, timeout, true, func(ctx context.Context) (bool, error) {
 		if info, err = detectOutOfServiceTaintInfo(config); err != nil {
-			return false, nil // Keep retrying
+			// Returning nil keeps wait polling on transient errors; the latest err is preserved via closure.
+			return false, nil //nolint:nilerr
 		}
 		return true, nil // Success
 	})
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/taints/taints.go` around lines 117 - 122, The linter flags the canonical
retry return (false, nil) inside the wait.PollUntilContextTimeout callback in
pollErr handling; update the anonymous func used by wait.PollUntilContextTimeout
(the closure that calls detectOutOfServiceTaintInfo) to silence the
false-positive by adding a one-line linter suppression comment (e.g., //
nolint:nilerr or a short inline comment) next to the return false, nil so
golangci-lint no longer reports nilerr for this deliberate retry path.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In `@go.mod`:
- Line 42: The transitive dependency github.com/moby/spdystream v0.5.0 remains
in the go.mod via client-go v0.33.7; update the PR by annotating that entry
(github.com/moby/spdystream v0.5.0) with a comment documenting the GHSA and
explaining that the vulnerable SPDY code path is unreachable in this codebase
(or add a SECURITY.md note), and create or reference a follow-up issue to
re-evaluate/upgrade once client-go moves to v0.36.0+ to eliminate the transitive
vulnerability.

In `@pkg/taints/taints_test.go`:
- Around line 96-103: The test for CreateOutOfServiceTaint is missing an
assertion that TimeAdded is nil; update the spec in
Context("CreateOutOfServiceTaint")/It(...) to assert that the returned taint
from CreateOutOfServiceTaint() has a nil TimeAdded (i.e.,
Expect(taint.TimeAdded).To(BeNil())), so the test enforces the contract that
CreateOutOfServiceTaint does not pre-populate TimeAdded.

---

Nitpick comments:
In `@pkg/taints/taints.go`:
- Around line 117-122: The linter flags the canonical retry return (false, nil)
inside the wait.PollUntilContextTimeout callback in pollErr handling; update the
anonymous func used by wait.PollUntilContextTimeout (the closure that calls
detectOutOfServiceTaintInfo) to silence the false-positive by adding a one-line
linter suppression comment (e.g., // nolint:nilerr or a short inline comment)
next to the return false, nil so golangci-lint no longer reports nilerr for this
deliberate retry path.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: f8ad9f94-1590-4fc2-9642-ecee60206ef0

📥 Commits

Reviewing files that changed from the base of the PR and between 2ab160a and 736c9a3.

📒 Files selected for processing (3)
  • go.mod
  • pkg/taints/taints.go
  • pkg/taints/taints_test.go

Comment thread pkg/taints/taints.go
Comment on lines +117 to +129
pollErr := wait.PollUntilContextTimeout(ctx, interval, timeout, true, func(ctx context.Context) (bool, error) {
if info, err = detectOutOfServiceTaintInfo(config); err != nil {
return false, nil // Keep retrying
}
return true, nil // Success
})

// Respect context cancellation - return poll error so caller knows operation was cancelled
if pollErr != nil && (errors.Is(pollErr, context.Canceled) || errors.Is(pollErr, context.DeadlineExceeded)) {
return info, pollErr
}
// Return info and internal error: nil on success, or last failure on timeout (more specific than generic timeout)
return info, err

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Internal timeout discards the more informative err.

The trailing comment ("or last failure on timeout (more specific than generic timeout)") describes the desired behavior, but the branch on line 125 catches context.DeadlineExceeded and returns pollErr for both parent-context deadline expiry and the internal 10s timeout from PollUntilContextTimeout. In the inner-timeout case, the captured err from the last detectOutOfServiceTaintInfo call is strictly more useful than a bare "context deadline exceeded".

Distinguish the two by inspecting the parent ctx:

🔧 Suggested fix
-	// Respect context cancellation - return poll error so caller knows operation was cancelled
-	if pollErr != nil && (errors.Is(pollErr, context.Canceled) || errors.Is(pollErr, context.DeadlineExceeded)) {
-		return info, pollErr
-	}
-	// Return info and internal error: nil on success, or last failure on timeout (more specific than generic timeout)
-	return info, err
+	// Respect parent context cancellation/deadline so the caller knows the operation was aborted externally.
+	if ctxErr := ctx.Err(); ctxErr != nil {
+		return info, ctxErr
+	}
+	// On the internal 10s timeout (or success), prefer the last detection error (more specific than the generic
+	// "context deadline exceeded" returned by PollUntilContextTimeout). Returns nil on success.
+	return info, err
🧰 Tools
🪛 golangci-lint (2.12.1)

[error] 119-119: error is not nil (line 118) but it returns nil

(nilerr)

@openshift-ci

openshift-ci Bot commented May 7, 2026

Copy link
Copy Markdown
Contributor

New changes are detected. LGTM label has been removed.

@openshift-ci openshift-ci Bot removed the approved label May 7, 2026

@clobrano clobrano left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a couple of comments. Moreover, also CodeRabbit gave interesting analysis, we should address its remarks :)

Comment thread pkg/taints/taints.go
Comment on lines +113 to +114
interval := 2 * time.Second // retry every 2 seconds
timeout := 10 * time.Second // for a period of 10 seconds

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would appreciate if the operators could decide the timing independently.
We could extract these values as package-level constants, or (better) pass them as arguments

Comment thread pkg/taints/taints_test.go Outdated
Expect(removed).To(BeTrue())
Expect(newTaints).To(HaveLen(2))
Expect(newTaints).To(ContainElement(taint1))
Expect(newTaints).To(ContainElement(taint3))

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we should also test that taint2 and duplicate have been removed

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Used ConsistOf instead. wdyt?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even better

Comment thread pkg/taints/taints_test.go

Context("getOutOfServiceTaintInfo", func() {
When("version is supported but not GA", func() {
DescribeTable("should return Supported=true, GA=false",

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice 👍

@openshift-ci

openshift-ci Bot commented May 7, 2026

Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Tzif-Morgen
Once this PR has been reviewed and has the lgtm label, please ask for approval from clobrano. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Comment thread pkg/taints/taints.go
})

// Respect context cancellation - return poll error so caller knows operation was cancelled
if pollErr != nil && (errors.Is(pollErr, context.Canceled) || errors.Is(pollErr, context.DeadlineExceeded)) {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC this might be a bug.
I'll break down how I understand it and correct me if I'm wrong.

Desired action:

  1. In case the context was cancelled outside of this method (likely before the 10 sec window) than we want to return the cancelled context error as it's the most relevant one.
  2. in case we catch an error during our 10 sec retry window we want to return that (last one caught) error.
    (For example if we kept retrying because we got an error X from the API Server than we should return error X)

Actual action:

  1. Working as designed
  2. We store the error we want to return in err var, but we never return it because we will exit here - our pollErr would be of context.DeadlineExceeded because our own 10 sec deadline context exceeded.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants