Fix RetryError not matched through ChainError wrapping #387
Open
databus23 wants to merge 1 commit into
errors.Is with pointer types (without an Is() method) falls back to pointer identity (==), so two different *RetryError instances never match. As a result, RetryErrors from trigger plugins (wrapped in ChainError) were treated as hard errors, which aborted the reconcile and skipped the node patch, so the cordon never took effect during drain. Fix by using errors.As, which matches by type through the error chain. Also change the eviction plugin to return RetryError for drain failures instead of hard errors, allowing the reconcile patch to succeed so that the cordon is applied to the API server.
Merging this branch changes the coverage (1 decrease, 1 increase)
anton-paulovich
approved these changes
May 22, 2026
    continue
}
-if errors.Is(err, retryErr) {
+if errors.As(err, &retryErr) {
Btw, since Go 1.26.0 we have the new errors.AsType() func:
if retryErr, ok := errors.AsType[*plugin.RetryError](err); ok {
...
}
Maybe you find it pretty :)
defo89
approved these changes
May 25, 2026
Summary
- Use errors.As instead of errors.Is when checking for *plugin.RetryError in state.Apply() and controllers.ApplyProfiles(). errors.Is with pointer types uses == comparison, which never matches different instances of the same type through an error chain.
- Return RetryError for drain failures instead of hard errors. Hard errors abort the reconcile and skip the node patch, preventing cordon from taking effect on the API server.

Root cause
When a trigger plugin (e.g. eviction/drain) encountered a PDB violation, it returned a hard error wrapped in ChainError. The callers tried errors.Is(err, &plugin.RetryError{}), which always returned false (pointer identity), causing the error to be treated as fatal. This aborted the reconcile loop before the node patch, so cordon was never applied; pods got rescheduled back, creating an infinite eviction loop.

Test plan
- errors.As extracts RetryError through ChainError
- errors.Is does NOT match (documenting the pitfall)
- gmake build/cover.out