
bjq: add operator set-config ability and ability to manage multiple queues#28078

Open
mismithhisler wants to merge 14 commits into f-batch-job-queue from f-bjq-queue-mgr-and-set


@mismithhisler

Member

Description

These changes add the batch queue manager, which allows the Nomad server to manage queues for both the global scheduler config and the per-node-pool scheduler configs (an Enterprise feature). They also allow the nomad operator scheduler set-config command to update queues on a running server.
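
As a rough illustration of the idea described above (not the code in this PR), a manager could hold one queue per scheduler config, keyed by name. Everything here is hypothetical: the `Manager`, `Queue`, and `QueueConfig` types, the `MaxRunning` field, the `"global"` key, and the `"gpu-pool"` node pool name are assumptions made for the sketch.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// globalQueue is a hypothetical key for the queue derived from the global
// scheduler config. Node pool queues are keyed by pool name.
const globalQueue = "global"

// QueueConfig is a hypothetical stand-in for the queue settings carried by a
// scheduler config.
type QueueConfig struct {
	MaxRunning int
}

// Queue is a hypothetical stand-in for a single batch job queue.
type Queue struct {
	conf QueueConfig
}

// Manager owns one queue per scheduler config.
type Manager struct {
	mu     sync.Mutex
	queues map[string]*Queue
}

// NewManager returns a manager with no queues.
func NewManager() *Manager {
	return &Manager{queues: make(map[string]*Queue)}
}

// SetConfig creates the queue for name, or replaces it if one already exists.
func (m *Manager) SetConfig(name string, conf QueueConfig) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.queues[name] = &Queue{conf: conf}
}

// Names returns the managed queue names in sorted order.
func (m *Manager) Names() []string {
	m.mu.Lock()
	defer m.mu.Unlock()
	names := make([]string, 0, len(m.queues))
	for name := range m.queues {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

func main() {
	m := NewManager()
	m.SetConfig(globalQueue, QueueConfig{MaxRunning: 10})
	m.SetConfig("gpu-pool", QueueConfig{MaxRunning: 2})
	fmt.Println(m.Names())
}
```

Keying by name keeps a set-config update for one node pool from touching the queues of the others, which is one plausible reason to manage them through a single owner.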

Testing & Reproduction steps

Links

Contributor Checklist

  • Changelog Entry If this PR changes user-facing behavior, please generate and add a
    changelog entry using the make cl command.
  • Testing Please add tests to cover any new functionality or to demonstrate bug fixes and
    ensure regressions will be caught.
  • Documentation If the change impacts user-facing functionality such as the CLI, API, UI,
    and job configuration, please update the Nomad product documentation, which is stored in the
    web-unified-docs repo. Refer to the web-unified-docs contributor guide for docs guidelines.
    Please also consider whether the change requires notes within the upgrade guide. If you
    would like help with the docs, tag the nomad-docs team in this PR.

Reviewer Checklist

  • Backport Labels Please add the correct backport labels as described by the internal
    backporting document.
  • Commit Type Ensure the correct merge method is selected which should be "squash and merge"
    in the majority of situations. The main exceptions are long-lived feature branches or merges where
    history should be preserved.
  • Enterprise PRs If this is an enterprise only PR, please add any required changelog entry
    within the public repository.
  • If a change needs to be reverted, we will roll out an update to the code within 7 days.

Changes to Security Controls

Are there any changes to security controls (access controls, encryption, logging) in this pull request? If so, explain.

@mismithhisler mismithhisler self-assigned this Jun 2, 2026
@mismithhisler mismithhisler marked this pull request as ready for review June 3, 2026 20:14
@mismithhisler mismithhisler requested review from a team as code owners June 3, 2026 20:14

@allisonlarson allisonlarson left a comment

Contributor


I know I'll have more questions, but I'm starting out with a couple of them.

Comment thread nomad/queues/batch_queue_manager.go Outdated
	if b.state == nil {
		b.state = state
	}
	b.createQueues()

Contributor


This seems like it's changing my mental model of the queues' lifecycles, but I can't tell if it's an issue or not. If we create a new set of queues when this is enabled, do we lose anything? I think at some point there was a comment about rebuilding the state during the SetEnabled call, but that might have been misplaced?

Member Author


Yeah, similar to the eval broker and deployments watcher, the Nomad server isn't going to do this work if it's not the leader. Once a server becomes the leader, it actually runs the eval broker logic and, in this case, builds its queue state and starts processing the queue.
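
A minimal sketch of that leader-only lifecycle, assuming a hypothetical `StateStore` that lists pending job IDs and a simplified `SetEnabled` signature (the real types and signatures in this PR may differ): enabling rebuilds the in-memory queue from state, and disabling drops it.

```go
package main

import (
	"fmt"
	"sync"
)

// StateStore is a hypothetical stand-in for the server's state store.
type StateStore struct {
	PendingJobs []string
}

// Manager is a simplified, hypothetical queue manager.
type Manager struct {
	mu      sync.Mutex
	enabled bool
	state   *StateStore
	queued  []string
}

// SetEnabled is called with true when the server gains leadership and with
// false when it loses it. The in-memory queue only exists while enabled.
func (m *Manager) SetEnabled(enabled bool, state *StateStore) {
	m.mu.Lock()
	defer m.mu.Unlock()

	prev := m.enabled
	m.enabled = enabled

	switch {
	case enabled && !prev:
		// Newly elected leader: rebuild the queue from persisted state.
		m.state = state
		m.queued = nil
		if state != nil {
			m.queued = append([]string(nil), state.PendingJobs...)
		}
	case !enabled && prev:
		// Lost leadership: drop the in-memory queue. Nothing durable is
		// lost because the next leader rebuilds from its own state.
		m.queued = nil
	}
}

// Queued returns a copy of the job IDs currently held in memory.
func (m *Manager) Queued() []string {
	m.mu.Lock()
	defer m.mu.Unlock()
	return append([]string(nil), m.queued...)
}

func main() {
	state := &StateStore{PendingJobs: []string{"job-a", "job-b"}}
	m := &Manager{}

	m.SetEnabled(true, state)
	fmt.Println("as leader:", m.Queued())

	m.SetEnabled(false, nil)
	fmt.Println("as follower:", m.Queued())
}
```

Under this model, creating a fresh set of queues on enable loses nothing as long as everything the queue needs can be derived from the state store.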

Comment thread nomad/leader.go Outdated
		return
	}

	queue, err := NewQueue(b.state, conf, b.broker, b.logger)

Contributor


If a queue is updated, does it lose all of the queued workloads still waiting? I think that's the nagging question I didn't have words for in my previous review (sorry!)

Member Author


It does, but then it would be rebuilt when the new queue gets started. That's probably not as efficient as an in-place update of an existing queue, but I figured we could revisit if this actually needs better performance?

But now that I'm looking at this after a couple of days, I'm noticing some issues with how we wait for job placement and how we use Stop() that need to be fixed.
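
The stop-and-rebuild approach discussed in this thread can be sketched roughly as follows. This is a simplified illustration, not the PR's implementation: `newQueueFromState`, `UpdateQueue`, the `StateStore.Waiting` map, and the `MaxRunning` field are all hypothetical, and the sketch ignores concurrency and the job-placement wait mentioned above.

```go
package main

import "fmt"

// StateStore is a hypothetical stand-in for persisted server state, mapping a
// queue name to the job IDs still waiting for placement.
type StateStore struct {
	Waiting map[string][]string
}

// QueueConfig is a hypothetical stand-in for a queue's settings.
type QueueConfig struct {
	MaxRunning int
}

// Queue is a simplified, hypothetical in-memory queue.
type Queue struct {
	conf    QueueConfig
	waiting []string
	stopped bool
}

// newQueueFromState builds a queue and repopulates it from the state store.
func newQueueFromState(name string, state *StateStore, conf QueueConfig) *Queue {
	return &Queue{
		conf:    conf,
		waiting: append([]string(nil), state.Waiting[name]...),
	}
}

// Stop marks the queue as stopped and discards its in-memory contents.
func (q *Queue) Stop() {
	q.stopped = true
	q.waiting = nil
}

// Manager is a simplified, hypothetical owner of the queues.
type Manager struct {
	state  *StateStore
	queues map[string]*Queue
}

// UpdateQueue replaces the named queue: the old queue is stopped and a new
// one is rebuilt from state, rather than updating the old queue in place.
func (m *Manager) UpdateQueue(name string, conf QueueConfig) *Queue {
	if old, ok := m.queues[name]; ok {
		old.Stop()
	}
	q := newQueueFromState(name, m.state, conf)
	m.queues[name] = q
	return q
}

func main() {
	state := &StateStore{Waiting: map[string][]string{"global": {"job-a", "job-b"}}}
	m := &Manager{state: state, queues: map[string]*Queue{}}

	first := m.UpdateQueue("global", QueueConfig{MaxRunning: 1})
	second := m.UpdateQueue("global", QueueConfig{MaxRunning: 4})
	fmt.Println(first.stopped, len(second.waiting))
}
```

The trade-off is the one named in the thread: rebuilding costs a pass over state on every config change, while an in-place update would avoid that at the price of more careful handling of the running queue.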

