scheduler: reject sticky on a static host volume #28097
Open
eyupcanakman wants to merge 1 commit into
mismithhisler
requested changes
Jun 15, 2026
mismithhisler
left a comment
Member
Hi @eyupcanakman, thanks for the contribution! The detailed description helped provide good background here for my review. Overall looks good with one suggestion.
// sticky is only supported on dynamic host volumes. Reject it on any static
// host volume up front, before the per-volume checks below, which can return
// early for another volume in the same request set.
for _, req := range h.volumeReqs {
Member
Since we're already iterating the volumeReqs below, can we move some of this logic there?
Author
Done, moved it into the static-volume branch of the loop. I also switched the two sticky checks from early returns to continue so the loop still reaches a later volume. Otherwise, a sticky dynamic volume ahead of the static one in volumeReqs could short-circuit the check.
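The short-circuit described in this reply can be illustrated with a minimal sketch. This is not the code from the PR: `volumeRequest`, `nodeVolume`, `hasVolumes`, and the `earlyReturn` switch are hypothetical stand-ins, and the only assumption carried over from the thread is that a static host volume has an empty ID.

```go
package main

// volumeRequest is a hypothetical, simplified volume request from a task group.
type volumeRequest struct {
	Name   string
	Sticky bool
}

// nodeVolume is a hypothetical view of a host volume on a node.
// Assumption from the PR description: static host volumes have an empty ID.
type nodeVolume struct {
	ID string
}

// hasVolumes reports whether every request is feasible on the node.
// earlyReturn=true mimics the old control flow, where a feasible sticky
// dynamic volume ended the loop; earlyReturn=false uses continue instead.
func hasVolumes(reqs []volumeRequest, node map[string]nodeVolume, claimedIDs map[string]bool, earlyReturn bool) bool {
	for _, req := range reqs {
		vol, ok := node[req.Name]
		if !ok {
			return false // the node does not have this volume at all
		}
		if vol.ID == "" {
			// Static volume: sticky is not supported, so reject it here.
			if req.Sticky {
				return false
			}
			continue
		}
		if req.Sticky {
			// A sticky dynamic volume must match an earlier claim, if any exist.
			if len(claimedIDs) > 0 && !claimedIDs[vol.ID] {
				return false
			}
			if earlyReturn {
				return true // skips every request after this one
			}
			continue
		}
	}
	return true
}
```

With a sticky dynamic volume listed ahead of a sticky static one, the early-return variant reports the node as feasible, while the continue variant reaches the static volume and rejects it.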
sticky is only supported on dynamic host volumes. A static host volume
requested with sticky passed feasibility, so the allocation was placed and
plan apply then failed inserting the volume claim with an empty volume ID
("object missing primary index"). Reject it in the per-volume feasibility
loop, and make the sticky checks continue instead of returning early so a
later volume in the same task group can't short-circuit the rejection.
Fixes hashicorp#27153
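To make the failure mode in this commit message concrete, here is a rough sketch of a table that keys claims by volume ID and refuses an empty key. It is an assumption-laden illustration, not the state store used by the scheduler: `volumeClaim`, `claimTable`, and `errMissingPrimaryIndex` are hypothetical names, and only the error text is taken from the message above.

```go
package main

import "errors"

// volumeClaim is a hypothetical claim record keyed by volume ID.
type volumeClaim struct {
	VolumeID string
	AllocID  string
}

// errMissingPrimaryIndex reuses the error text quoted in the commit message.
var errMissingPrimaryIndex = errors.New("object missing primary index")

// claimTable is a hypothetical in-memory table whose primary key is VolumeID.
type claimTable struct {
	byVolumeID map[string]volumeClaim
}

func newClaimTable() *claimTable {
	return &claimTable{byVolumeID: map[string]volumeClaim{}}
}

// insert stores the claim under its volume ID. A claim with an empty ID has
// no usable primary key, so it is rejected rather than stored.
func (t *claimTable) insert(c volumeClaim) error {
	if c.VolumeID == "" {
		return errMissingPrimaryIndex
	}
	t.byVolumeID[c.VolumeID] = c
	return nil
}
```

Under these assumptions, a claim for a static volume (empty `VolumeID`) fails at insert time, which is after placement has already been decided. Rejecting the request during feasibility moves the failure earlier.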
5d38813 to
869e782
Description
Setting `sticky = true` on a static host volume passed feasibility checking, so the scheduler placed the allocation and then failed to apply the plan with `Task group volume claim insert failed: object missing primary index`. Static host volumes have an empty ID, and the sticky claim is keyed by volume ID, so the claim insert hits an empty primary index. Static and dynamic host volumes share the same `type = "host"` jobspec, so the distinction is only known once a node is selected, which is why this can't be caught at job submission.

The fix rejects `sticky` on a static host volume during feasibility checking, before the allocation is placed. The check runs before the per-volume loop, so a feasible dynamic sticky volume in the same task group can't return early and skip it. A job that sets `sticky = true` on a static host volume will now fail feasibility with a clear constraint instead of failing the evaluation. Removing the `sticky` flag restores placement.

Testing & Reproduction steps
Added two tests.

`TestHostVolumeChecker_StaticSticky` covers the feasibility checker directly. Sticky read-write and read-only on a static volume are infeasible, and non-sticky stays feasible.

`TestServiceSched_JobRegister_StickyStaticHostVolume` runs the service scheduler end to end. On `main` it fails with the `object missing primary index` claim error, and with the fix the evaluation completes with the allocation left unplaced and the host volume constraint recorded.

`go test ./scheduler/feasible/ ./scheduler/` passes.

Links
Fixes #27153
Per @tgross's note on the issue, this can't be caught at job submission because the static/dynamic distinction isn't known until feasibility checking.
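A small sketch of why the same request can only be classified per node. The types here (`hostVolumeRequest`, `clientNode`) and the lookup order are hypothetical and chosen for illustration; the only facts taken from this PR are that both kinds of volume are requested with `type = "host"` and that a static volume has no ID.

```go
package main

// hostVolumeRequest is a hypothetical jobspec-level request. Nothing in it
// says whether the volume is static or dynamic.
type hostVolumeRequest struct {
	Type   string // always "host" for both kinds
	Source string
	Sticky bool
}

// clientNode is a hypothetical view of one node's host volumes.
type clientNode struct {
	StaticSources map[string]bool   // volumes declared in client config, no ID
	DynamicIDs    map[string]string // dynamic volume name -> volume ID
}

// volumeID returns the volume's ID and whether the node has the volume.
// A static volume is found but has an empty ID. Checking dynamic volumes
// first is an arbitrary choice made for this sketch.
func (n clientNode) volumeID(source string) (string, bool) {
	if id, ok := n.DynamicIDs[source]; ok {
		return id, true
	}
	if n.StaticSources[source] {
		return "", true
	}
	return "", false
}

// stickyFeasible applies the rule from this PR: sticky is rejected when the
// volume resolved on this particular node is static.
func stickyFeasible(req hostVolumeRequest, n clientNode) bool {
	id, found := n.volumeID(req.Source)
	if !found {
		return false
	}
	if req.Sticky && id == "" {
		return false
	}
	return true
}
```

The identical request is infeasible on a node where `data` is static and feasible on a node where `data` is dynamic, so a submission-time validator has nothing to decide on.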
Contributor Checklist
.changelog/27153.txt.
Changes to Security Controls
No changes to security controls.
This contribution was developed with AI assistance (Claude Code), used to trace the feasibility path and draft the fix and tests.