scheduler: reject sticky on a static host volume #28097

Open
eyupcanakman wants to merge 1 commit into hashicorp:main from eyupcanakman:fix/feasibility-sticky-static-host-volume

Conversation

@eyupcanakman

Description

Setting sticky = true on a static host volume passed feasibility checking, so the scheduler placed the allocation and then failed to apply the plan with the error "Task group volume claim insert failed: object missing primary index". Static host volumes have an empty ID, and the sticky claim is keyed by volume ID, so the claim insert fails on an empty primary index. Static and dynamic host volumes both use type = "host" in the jobspec, so the distinction is only known once a node is selected, which is why this can't be caught at job submission.

The fix rejects sticky on a static host volume during feasibility checking, before the allocation is placed. The check runs before the per-volume loop, so a feasible dynamic sticky volume in the same task group can't return early and skip it. A job that sets sticky = true on a static host volume will now fail feasibility with a clear constraint instead of failing the evaluation. Removing the sticky flag restores placement.
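A rough sketch of the up-front check described above, assuming a simplified model in which a static host volume is recognised by its empty ID. The types and the `stickyOnStatic` helper are hypothetical and only illustrate the idea; they are not the code in scheduler/feasible/feasible.go.

```go
package main

import "fmt"

// volumeRequest is a hypothetical, trimmed-down volume request from a task group.
type volumeRequest struct {
	Source string // name of the host volume on the node
	Sticky bool
}

// nodeVolume is a hypothetical view of a host volume on the candidate node.
// Assumption: static host volumes have an empty ID, dynamic ones do not.
type nodeVolume struct {
	ID string
}

// stickyOnStatic reports whether any request asks for sticky on a volume
// that is static on this node. It scans every request before any other
// per-volume check runs, so nothing can return early and skip it.
func stickyOnStatic(reqs []volumeRequest, node map[string]nodeVolume) bool {
	for _, req := range reqs {
		vol, ok := node[req.Source]
		if ok && req.Sticky && vol.ID == "" {
			return true
		}
	}
	return false
}

func main() {
	reqs := []volumeRequest{{Source: "data", Sticky: true}}

	staticNode := map[string]nodeVolume{"data": {ID: ""}}
	dynamicNode := map[string]nodeVolume{"data": {ID: "vol-1"}}

	fmt.Println(stickyOnStatic(reqs, staticNode))  // node should be treated as infeasible
	fmt.Println(stickyOnStatic(reqs, dynamicNode)) // sticky on a dynamic volume is fine
}
```

With these inputs the sketch prints `true` for the static node and `false` for the dynamic one. The same jobspec is accepted or rejected depending on the node, which is why the check can only live in feasibility.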

Testing & Reproduction steps

Added two tests. TestHostVolumeChecker_StaticSticky covers the feasibility checker directly: sticky read-write and sticky read-only requests on a static volume are infeasible, and a non-sticky request stays feasible. TestServiceSched_JobRegister_StickyStaticHostVolume runs the service scheduler end to end: on main it fails with the "object missing primary index" claim error, and with the fix the evaluation completes with the allocation left unplaced and the host volume constraint recorded. go test ./scheduler/feasible/ ./scheduler/ passes.

Links

Fixes #27153

Per @tgross's note on the issue, this can't be caught at job submission because the static/dynamic distinction isn't known until feasibility checking.

Contributor Checklist

  • Changelog Entry: Added at .changelog/27153.txt.
  • Testing: Added a feasibility-checker test and an end-to-end scheduler test.
  • Documentation: No product doc change. The behavior change may need an upgrade note in the web-unified-docs repo.

Changes to Security Controls

No changes to security controls.

This contribution was developed with AI assistance (Claude Code), used to trace the feasibility path and draft the fix and tests.

@mismithhisler mismithhisler left a comment

Member

Hi @eyupcanakman, thanks for the contribution! The detailed description helped provide good background here for my review. Overall looks good with one suggestion.

Comment thread scheduler/feasible/feasible.go Outdated
// sticky is only supported on dynamic host volumes. Reject it on any static
// host volume up front, before the per-volume checks below, which can return
// early for another volume in the same request set.
for _, req := range h.volumeReqs {

Member

Since we're already iterating the volumeReqs below, can we move some of this logic there?

Author

Done, moved it into the static-volume branch of the loop. I also switched the two sticky checks from early returns to continue so the loop still reaches later volumes. Otherwise, a sticky dynamic volume ahead of the static one in volumeReqs could short-circuit the check.
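To illustrate the ordering problem, here is a small Go sketch comparing an early-return loop with a continue-based one. It is a simplified model, not the actual checker: the types, `stickyOK`, and both `feasible*` functions are hypothetical, and the sticky rule (feasible when there are no prior claims or the volume ID is already claimed) is an assumption made for the example.

```go
package main

import "fmt"

// Hypothetical, simplified types; static volumes are assumed to have an empty ID.
type volumeRequest struct {
	Source string
	Sticky bool
}

type nodeVolume struct {
	ID string
}

// stickyOK is an assumed rule: a sticky dynamic volume is acceptable when
// nothing has been claimed yet or this volume ID is already claimed.
func stickyOK(claimed map[string]bool, id string) bool {
	return len(claimed) == 0 || claimed[id]
}

// feasibleEarlyReturn returns as soon as it sees a sticky dynamic volume,
// so a sticky static volume later in reqs is never inspected.
func feasibleEarlyReturn(reqs []volumeRequest, node map[string]nodeVolume, claimed map[string]bool) bool {
	for _, req := range reqs {
		vol, ok := node[req.Source]
		if !ok {
			return false
		}
		if vol.ID != "" && req.Sticky {
			return stickyOK(claimed, vol.ID)
		}
		if vol.ID == "" && req.Sticky {
			return false
		}
	}
	return true
}

// feasibleContinue only returns early on failure and otherwise keeps going,
// so every request in reqs is checked.
func feasibleContinue(reqs []volumeRequest, node map[string]nodeVolume, claimed map[string]bool) bool {
	for _, req := range reqs {
		vol, ok := node[req.Source]
		if !ok {
			return false
		}
		if vol.ID == "" {
			if req.Sticky {
				return false // sticky is not supported on static host volumes
			}
			continue
		}
		if req.Sticky && !stickyOK(claimed, vol.ID) {
			return false
		}
	}
	return true
}

func main() {
	// A sticky dynamic volume is listed ahead of a sticky static one.
	reqs := []volumeRequest{
		{Source: "dyn", Sticky: true},
		{Source: "static", Sticky: true},
	}
	node := map[string]nodeVolume{
		"dyn":    {ID: "vol-1"},
		"static": {ID: ""},
	}

	fmt.Println(feasibleEarlyReturn(reqs, node, nil))
	fmt.Println(feasibleContinue(reqs, node, nil))
}
```

In this model the early-return version prints `true` (the static volume is never reached) and the continue version prints `false`, which is the behaviour the switch to continue is meant to guarantee.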

sticky is only supported on dynamic host volumes. A static host volume
requested with sticky passed feasibility, so the allocation was placed and
plan apply then failed inserting the volume claim with an empty volume ID
("object missing primary index"). Reject it in the per-volume feasibility
loop, and make the sticky checks continue instead of returning early so a
later volume in the same task group can't short-circuit the rejection.

Fixes hashicorp#27153
@eyupcanakman force-pushed the fix/feasibility-sticky-static-host-volume branch from 5d38813 to 869e782 on June 16, 2026 16:03

Successfully merging this pull request may close these issues.

feasibility check should fail when setting sticky on a static host volume
