Follow-up to #28132 / #28158 — this is the pagination issue I flagged on that PR, written up so it doesn't get lost.
What's going on
#28158 made the jobs modify_index index non-unique so jobs sharing a ModifyIndex stop disappearing from /v1/jobs/statuses. Good, but it uncovers a second bug: that endpoint paginates with ModifyIndexTokenizer, which uses the bare ModifyIndex as its cursor. That only worked because the ModifyIndex was unique. Now that it isn't, the cursor can't tell two jobs apart at a page boundary, so:
- a job at the boundary gets returned again on the next page (duplicate), and
- if more jobs share a ModifyIndex than fit in one
per_page, the cursor never gets past that group - the same page repeats forever and you can't reach anything older.
The jobs aren't actually gone (one big unpaginated request returns them all); it's purely the cursor.
Reproduction
A/B harness, 1 server + 1 client, no Consul. Step 7 walks the endpoint page by page following next_token: https://github.com/afreidah/nomad/tree/repro-jobs-statuses-28132/repro-28132
With #28158 in place, small per_page over jobs that share a ModifyIndex:
==> 7. Full pagination walk (next_token cursor, like the web UI)
page 1: 6 job(s) next_token=77
page 2: 6 job(s) next_token=76
page 3: 6 job(s) next_token=76
walk could not finish: page 3 returned the same next_token (76) it was given — repeats forever.
duplicated: j03 j04 j05 j20
never returned: j09 j10 j11 j12 j13 j14
Root cause
ModifyIndexTokenizer in nomad/state/paginator/tokenizer.go builds the token from ModifyIndex alone, no tiebreaker. The paginator resumes a page by skipping every item strictly past the cursor and including from the first item that equals it - which assumes the token points at exactly one job. With a non-unique index that's no longer true: "equals the cursor" now matches the whole group sharing that ModifyIndex, so the paginator resumes at the front of the group (re-emitting ones already shown), and because the overflow item that sets the next cursor is itself in that group, the cursor never advances.
Fix
(note: the issue is real, I'm still digging around the code making sure I understand it all adaquately, so please correct any wrong assumptions/understandings I have here)
Give the token a real position instead of a bare sort value: <modifyIndex>.<namespace>.<id> - compare the index numerically, then namespace, then ID.
- Namespace, not just ID:
modify_index spans all namespaces and memdb orders ties by the jobs key (Namespace, ID), so the token has to order the same way. The existing CreateIndexAndIDTokenizer only carries ID, so it can't be reused as-is here - the token needs namespace in it too.
- Still accept a plain-integer token for rolling upgrades.
- It's a server + UI change: the jobs page does integer math on
next_token (ui/app/controllers/jobs/index.js: +this.cursorAt + 1, and builds tokens from modifyIndex for prev/last). The UI needs to treat the token as opaque and keep a back-history stack. The endpoint test assumes integer tokens too.
Job.Statuses is the only consumer, so it's contained.
Potential tokenizer minor refactor opportunity
There are four of these in nomad/state/paginator/tokenizer.go, and they're all the same basic idea - a sort key plus a tiebreaker, serialized to a string:
IDTokenizer -> id
NamespaceIDTokenizer -> namespace.id
CreateIndexAndIDTokenizer -> createIndex.id
ModifyIndexTokenizer -> modifyIndex <- the one that bit us, no tiebreaker
Instead of separate hand-rolled functions they could share one helper that builds a token from an ordered list of fields (numeric or string) and compares them in order. Then the statuses fix is just modifyIndex + namespace + id, and a tiebreaker-less cursor becomes impossible.
so the question is: add the one focused tokenizer, or build that shared helper and move these onto it? I'd love to work this issue but I defer to you guys on the design decision there. Happy to work it either way just wanted to check with you guys on making that slight refactor part of the bug fix or not.
Affects main (2.0.x) post-#28158 and any backports.
Follow-up to #28132 / #28158 — this is the pagination issue I flagged on that PR, written up so it doesn't get lost.
What's going on
#28158 made the jobs
modify_indexindex non-unique so jobs sharing a ModifyIndex stop disappearing from/v1/jobs/statuses. Good, but it uncovers a second bug: that endpoint paginates withModifyIndexTokenizer, which uses the bare ModifyIndex as its cursor. That only worked because the ModifyIndex was unique. Now that it isn't, the cursor can't tell two jobs apart at a page boundary, so:per_page, the cursor never gets past that group - the same page repeats forever and you can't reach anything older.The jobs aren't actually gone (one big unpaginated request returns them all); it's purely the cursor.
Reproduction
A/B harness, 1 server + 1 client, no Consul. Step 7 walks the endpoint page by page following
next_token: https://github.com/afreidah/nomad/tree/repro-jobs-statuses-28132/repro-28132With #28158 in place, small
per_pageover jobs that share a ModifyIndex:Root cause
ModifyIndexTokenizerinnomad/state/paginator/tokenizer.gobuilds the token from ModifyIndex alone, no tiebreaker. The paginator resumes a page by skipping every item strictly past the cursor and including from the first item that equals it - which assumes the token points at exactly one job. With a non-unique index that's no longer true: "equals the cursor" now matches the whole group sharing that ModifyIndex, so the paginator resumes at the front of the group (re-emitting ones already shown), and because the overflow item that sets the next cursor is itself in that group, the cursor never advances.Fix
(note: the issue is real, I'm still digging around the code making sure I understand it all adaquately, so please correct any wrong assumptions/understandings I have here)
Give the token a real position instead of a bare sort value:
<modifyIndex>.<namespace>.<id>- compare the index numerically, then namespace, then ID.modify_indexspans all namespaces and memdb orders ties by the jobs key(Namespace, ID), so the token has to order the same way. The existingCreateIndexAndIDTokenizeronly carries ID, so it can't be reused as-is here - the token needs namespace in it too.next_token(ui/app/controllers/jobs/index.js:+this.cursorAt + 1, and builds tokens frommodifyIndexfor prev/last). The UI needs to treat the token as opaque and keep a back-history stack. The endpoint test assumes integer tokens too.Job.Statusesis the only consumer, so it's contained.Potential tokenizer minor refactor opportunity
There are four of these in
nomad/state/paginator/tokenizer.go, and they're all the same basic idea - a sort key plus a tiebreaker, serialized to a string:IDTokenizer->idNamespaceIDTokenizer->namespace.idCreateIndexAndIDTokenizer->createIndex.idModifyIndexTokenizer->modifyIndex<- the one that bit us, no tiebreakerInstead of separate hand-rolled functions they could share one helper that builds a token from an ordered list of fields (numeric or string) and compares them in order. Then the statuses fix is just
modifyIndex + namespace + id, and a tiebreaker-less cursor becomes impossible.so the question is: add the one focused tokenizer, or build that shared helper and move these onto it? I'd love to work this issue but I defer to you guys on the design decision there. Happy to work it either way just wanted to check with you guys on making that slight refactor part of the bug fix or not.
Affects main (2.0.x) post-#28158 and any backports.