Unify pagination tokenizers on a shared helper (+ fix Namespace+ID cursor ordering) #4

Open
afreidah wants to merge 1 commit into main from pagination-tokenizers-shared-helper


What

This is the larger-scope alternative to hashicorp#28178 (the narrow /v1/jobs/statuses ModifyIndex cursor fix), opened on my fork so it can be linked for a side-by-side comparison and a scope decision.

It consolidates the four pagination tokenizers in nomad/state/paginator/tokenizer.go - IDTokenizer, NamespaceIDTokenizer, CreateIndexAndIDTokenizer, and ModifyIndexAndNamespaceIDTokenizer - onto a single tokenAndCompare helper. Each tokenizer now just declares its ordered fields (numeric or string); the helper serializes the .-joined token and compares it field-by-field, in order, matching how memdb iterates the underlying index.
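As a rough illustration of the idea (not the code in this PR), a field-by-field helper could look like the sketch below. The names `field`, `buildToken`, and `compareFields` are hypothetical stand-ins for `tokenField` and `tokenAndCompare`, whose real signatures are not shown here; parsing an incoming token back into fields is left out.

```go
package main

import (
	"strconv"
	"strings"
)

// field is a hypothetical stand-in for the PR's tokenField: one ordered
// component of a cursor, either numeric or string.
type field struct {
	numeric bool
	num     uint64
	str     string
}

// buildToken joins the fields with "." in declaration order, which is the
// token shape described above (for example "<namespace>.<id>").
func buildToken(fields []field) string {
	parts := make([]string, len(fields))
	for i, f := range fields {
		if f.numeric {
			parts[i] = strconv.FormatUint(f.num, 10)
		} else {
			parts[i] = f.str
		}
	}
	return strings.Join(parts, ".")
}

// compareFields compares two field lists position by position and returns
// -1, 0, or 1. Numeric fields compare as numbers and string fields as
// strings, so the first differing field decides the order.
// Assumption: both lists declare the same field kinds in the same order.
func compareFields(a, b []field) int {
	for i := 0; i < len(a) && i < len(b); i++ {
		var c int
		if a[i].numeric {
			switch {
			case a[i].num < b[i].num:
				c = -1
			case a[i].num > b[i].num:
				c = 1
			}
		} else {
			c = strings.Compare(a[i].str, b[i].str)
		}
		if c != 0 {
			return c
		}
	}
	// All shared positions are equal; the shorter list sorts first.
	switch {
	case len(a) < len(b):
		return -1
	case len(a) > len(b):
		return 1
	}
	return 0
}
```

With fields declared as (ModifyIndex, Namespace, ID), an index of 9 sorts before 10 here, whereas comparing the joined strings "9.default.job" and "10.default.job" byte by byte would put 10 first.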

Two payoffs:

  1. A tiebreaker-less cursor (the original /v1/jobs/statuses bug) becomes structurally impossible - every tokenizer carries its full key.
  2. It fixes a second, latent pagination bug in NamespaceIDTokenizer for free (below).

The Namespace+ID bug

NamespaceIDTokenizer built "<namespace>.<id>" and compared it as a whole string. memdb orders namespaced objects by the (Namespace, ID) compound index, whose fields are NUL-separated - and NUL sorts before every printable byte, so the index effectively compares namespace, then id. A whole-string compare with a . separator does not: when one namespace name is a prefix of another that differs by a dash (e.g. team vs team-a), "team.j1" vs "team-a.j1" compares . (0x2E) against - (0x2D), so the token orders team-a before team - the opposite of the state store.
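The byte-level argument above can be checked with a small standalone sketch. The three function names are illustrative only, and `nulSeparated` merely approximates a NUL-terminated compound key; it is not memdb's actual encoding code.

```go
package main

import "strings"

// wholeString mimics the old behaviour described above: join namespace and
// ID with "." and compare the result as one string.
func wholeString(nsA, idA, nsB, idB string) int {
	return strings.Compare(nsA+"."+idA, nsB+"."+idB)
}

// nulSeparated approximates how a (Namespace, ID) compound key sorts when
// each field is followed by a NUL byte (an assumption about the encoding).
func nulSeparated(nsA, idA, nsB, idB string) int {
	return strings.Compare(nsA+"\x00"+idA+"\x00", nsB+"\x00"+idB+"\x00")
}

// fieldByField compares namespace first and falls back to ID on a tie.
func fieldByField(nsA, idA, nsB, idB string) int {
	if c := strings.Compare(nsA, nsB); c != 0 {
		return c
	}
	return strings.Compare(idA, idB)
}
```

For ("team", "j1") against ("team-a", "j1"), `wholeString` returns 1 (team after team-a), while `nulSeparated` and `fieldByField` both return -1 (team before team-a), which is the disagreement described above.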

Result, when paginating across those namespaces: jobs in team are duplicated across pages, jobs in team-a become unreachable, and the cursor can return the same next_token forever. This affects every endpoint that uses the Namespace+ID cursor (jobs, variables, CSI/host volumes, service registrations, and others).

Comparing field-by-field (namespace, then id) - which the shared helper does for all tokenizers - matches the memdb order and fixes it. The token format is unchanged (namespace.id), so existing and in-flight tokens stay valid across an upgrade.

Reproduction

A/B harness, namespaces team and team-a, 4 jobs each, per_page=4:
https://github.com/afreidah/nomad/blob/repro-jobs-statuses-28132/repro-28132/ns-order-demo.sh

Before:

page 1: 4 job(s)  next_token=team-a.j1
page 2: 4 job(s)  next_token=team-a.j1   <- same token it was given; repeats forever
duplicated:     team/j1 team/j2 team/j3 team/j4
never returned: team-a/j1 team-a/j2 team-a/j3 team-a/j4

After: every job is returned exactly once and the walk terminates.
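The stall can also be reproduced without a cluster. The sketch below is a simplified model, not Nomad's paginator: it assumes a page is built by walking jobs in state-store order, skipping any job the comparator places before the cursor, and using the first job that does not fit on the page as `next_token`. All names are hypothetical, and it assumes namespaces contain no ".".

```go
package main

import (
	"sort"
	"strings"
)

type job struct{ ns, id string }

func (j job) token() string { return j.ns + "." + j.id }

// wholeStringBefore is the old comparison: the joined token as one string.
func wholeStringBefore(j job, cursor string) bool { return j.token() < cursor }

// fieldBefore compares namespace, then ID.
func fieldBefore(j job, cursor string) bool {
	ns, id, _ := strings.Cut(cursor, ".")
	if c := strings.Compare(j.ns, ns); c != 0 {
		return c < 0
	}
	return j.id < id
}

// storeOrder returns 4 jobs in each of "team" and "team-a", sorted by
// (namespace, ID) as the state store would iterate them.
func storeOrder() []job {
	var jobs []job
	for _, ns := range []string{"team-a", "team"} {
		for _, id := range []string{"j1", "j2", "j3", "j4"} {
			jobs = append(jobs, job{ns, id})
		}
	}
	sort.Slice(jobs, func(a, b int) bool {
		if jobs[a].ns != jobs[b].ns {
			return jobs[a].ns < jobs[b].ns
		}
		return jobs[a].id < jobs[b].id
	})
	return jobs
}

// page returns up to perPage jobs not before the cursor, plus the token of
// the first job that did not fit ("" when the listing is exhausted).
func page(sorted []job, cursor string, perPage int, before func(job, string) bool) ([]job, string) {
	var out []job
	for _, j := range sorted {
		if cursor != "" && before(j, cursor) {
			continue
		}
		if len(out) == perPage {
			return out, j.token()
		}
		out = append(out, j)
	}
	return out, ""
}

// walk follows next tokens for at most maxPages pages. It returns how often
// each job was seen and whether the walk terminated on its own.
func walk(sorted []job, perPage, maxPages int, before func(job, string) bool) (map[string]int, bool) {
	seen := map[string]int{}
	cursor := ""
	for i := 0; i < maxPages; i++ {
		jobs, next := page(sorted, cursor, perPage, before)
		for _, j := range jobs {
			seen[j.ns+"/"+j.id]++
		}
		if next == "" {
			return seen, true
		}
		cursor = next
	}
	return seen, false
}
```

In this model, `walk(storeOrder(), 4, 5, wholeStringBefore)` never terminates on its own and keeps returning the four team jobs, while the same walk with `fieldBefore` returns all eight jobs once and stops after two pages.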

Changes

  • nomad/state/paginator/tokenizer.go - add tokenAndCompare + tokenField; reimplement all four tokenizers on top of it. Token formats and the legacy bare-integer fallback are preserved.
  • nomad/state/paginator/tokenizer_test.go - unit tests for the Namespace+ID tiebreaker (including the team/team-a case) and edges.
  • nomad/job_endpoint_test.go - TestJobEndpoint_ListJobs_NamespacePagination, a Job.List walk across the team/team-a namespaces.

Testing

The new integration test fails on main and passes with this change:

FAIL (on main): pagination did not terminate after 11 pages (stall);
   seen = team/j1:6 team/j2:6 team/j3:5 team-a/j1:5   (team-a/j2, team-a/j3 never returned)
PASS (with this change): every job exactly once, walk terminates
  • All existing tokenizer tests pass.
  • List/pagination tests for every consumer endpoint pass (jobs, variables, CSI, host volumes, service registrations, node pools, evals, allocs, deployments, ACL).

Scope

This is a superset of hashicorp#28178. Open question for the maintainers: keep the narrow hashicorp#28178 and track the Namespace+ID bug and the refactor separately, or take this unified version instead? I'm equally happy either way; that's the call I'd like your input on.

AI usage

Flagging for transparency, consistent with hashicorp#28178. The root-cause analysis, the reproduction, and the verification are mine: I confirmed the team/team-a failure and the before/after using the harness and the integration test. This refactor's Go was written with AI assistance (the shared tokenAndCompare helper and the new tests), working from that analysis and the agreed design - unlike hashicorp#28178, where the change was a one-line bool flip and no AI touched the source. I have reviewed and understand every line and I stand behind it.

…e+ID cursor ordering

Consolidate the four pagination tokenizers (ID, Namespace+ID, CreateIndex+ID,
ModifyIndex+Namespace+ID) onto a single tokenAndCompare helper that builds a
'.'-joined token from an ordered list of numeric/string fields and compares it
field-by-field, matching the order memdb iterates the underlying index.

This fixes a latent bug in NamespaceIDTokenizer: it compared the joined
"<namespace>.<id>" token as a whole string, so a namespace whose name was a
prefix of another differing by a dash (e.g. "team" vs "team-a") ordered
"team-a.*" before "team.*" (because '.' > '-'), disagreeing with the state
store's (Namespace, ID) order. That made pagination duplicate jobs in one
namespace and skip the other, repeating a page forever.

Add unit tests for the Namespace+ID tiebreaker and a Job.List integration test
(TestJobEndpoint_ListJobs_NamespacePagination) that walks pages across the
"team"/"team-a" namespaces; it fails before this change and passes after.