
Range-complete acked tasks on queue Stop #10326

Open
prathyushpv wants to merge 1 commit into main from ppv/queue-final-checkpoint-on-stop


@prathyushpv prathyushpv commented May 19, 2026

What changed?

queueBase.Stop now runs a best-effort doCheckpoint(true) that range-deletes already-acked task rows via RangeCompleteHistoryTasks. The shard-state write (UpdateShard) is intentionally skipped: by the time engine.Stop runs from FinishStop, the shard is already Stopped, so updateShardInfo would reject the write anyway.
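For readers who don't know the queue code, here is a minimal, self-contained sketch of the shutdown path described above. It is not the actual implementation: `taskStore`, `shardState`, `queue`, `checkpoint`, and the `skipShardUpdate` parameter are hypothetical stand-ins, task keys are simplified to integers, and the sketch does not claim to reflect what the boolean argument of the real `doCheckpoint` means.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// taskStore is a hypothetical stand-in for the persisted task table.
type taskStore struct {
	rows        map[int64]struct{}
	failDeletes bool // simulates a persistence error during shutdown
}

// rangeComplete deletes every row with key < exclusiveMax.
func (s *taskStore) rangeComplete(exclusiveMax int64) (int, error) {
	if s.failDeletes {
		return 0, errors.New("persistence unavailable")
	}
	deleted := 0
	for key := range s.rows {
		if key < exclusiveMax {
			delete(s.rows, key) // deleting during range is safe in Go
			deleted++
		}
	}
	return deleted, nil
}

// remaining returns the surviving row keys in ascending order.
func (s *taskStore) remaining() []int64 {
	keys := make([]int64, 0, len(s.rows))
	for key := range s.rows {
		keys = append(keys, key)
	}
	sort.Slice(keys, func(i, j int) bool { return keys[i] < keys[j] })
	return keys
}

// shardState is a hypothetical stand-in for the shard; it rejects writes once stopped.
type shardState struct {
	stopped           bool
	persistedAckLevel int64
}

func (s *shardState) update(level int64) error {
	if s.stopped {
		return errors.New("shard is stopped")
	}
	s.persistedAckLevel = level
	return nil
}

type queue struct {
	store    *taskStore
	shard    *shardState
	ackLevel int64 // in memory: every task with a smaller key has been acked
}

// checkpoint deletes acked rows and, unless skipShardUpdate is set,
// also persists the ack level to the shard.
func (q *queue) checkpoint(skipShardUpdate bool) (int, error) {
	deleted, err := q.store.rangeComplete(q.ackLevel)
	if err != nil {
		return 0, err
	}
	if skipShardUpdate {
		return deleted, nil
	}
	return deleted, q.shard.update(q.ackLevel)
}

// stop runs one best-effort checkpoint: a failure is logged, never fatal.
func (q *queue) stop() int {
	deleted, err := q.checkpoint(true)
	if err != nil {
		fmt.Println("final checkpoint failed, continuing shutdown:", err)
	}
	return deleted
}

// newDemoQueue models a stopped shard whose last periodic checkpoint was at key 2
// while tasks 2 and 3 have since been acked in memory.
func newDemoQueue() *queue {
	rows := map[int64]struct{}{2: {}, 3: {}, 4: {}, 5: {}}
	return &queue{
		store:    &taskStore{rows: rows},
		shard:    &shardState{stopped: true, persistedAckLevel: 2},
		ackLevel: 4,
	}
}

func main() {
	q := newDemoQueue()
	deleted := q.stop()
	fmt.Println(deleted, q.store.remaining(), q.shard.persistedAckLevel) // prints: 2 [4 5] 2
}
```

In the sketch the persisted ack level stays at 2 after `stop`, mirroring the skipped shard-state write; only the acked rows disappear.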

Why?

Without this, the next shard owner reloads the task rows that were acked between the last periodic checkpoint and shutdown, and re-dispatches them. This is most visible on the transfer queue (high throughput, exact-key deletion). Deleting the rows is enough: the next owner's pagination returns empty for the stale reader-scope range, the slice compacts on the first post-startup checkpoint, and no tasks are reprocessed.
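A rough simulation of the reasoning above, under simplifying assumptions: task keys are plain integers, the persisted checkpoint is at key 10, and tasks 10 through 13 were acked before shutdown. `loadPage`, `reload`, and `rangeDelete` are hypothetical helpers for illustration, not functions from the codebase.

```go
package main

import (
	"fmt"
	"sort"
)

// loadPage returns up to pageSize existing task keys in [from, to), ascending.
func loadPage(rows map[int64]struct{}, from, to int64, pageSize int) []int64 {
	keys := make([]int64, 0, pageSize)
	for key := range rows {
		if key >= from && key < to {
			keys = append(keys, key)
		}
	}
	sort.Slice(keys, func(i, j int) bool { return keys[i] < keys[j] })
	if len(keys) > pageSize {
		keys = keys[:pageSize]
	}
	return keys
}

// reload pages through [from, to) the way a new owner would after startup
// and returns every task it would load and dispatch.
func reload(rows map[int64]struct{}, from, to int64, pageSize int) []int64 {
	var loaded []int64
	for {
		page := loadPage(rows, from, to, pageSize)
		if len(page) == 0 {
			return loaded
		}
		loaded = append(loaded, page...)
		from = page[len(page)-1] + 1 // resume after the last key seen
	}
}

// rangeDelete removes every row with key in [from, to).
func rangeDelete(rows map[int64]struct{}, from, to int64) {
	for key := range rows {
		if key >= from && key < to {
			delete(rows, key)
		}
	}
}

// demoRows builds a task table holding keys 10 through 15.
func demoRows() map[int64]struct{} {
	rows := make(map[int64]struct{})
	for key := int64(10); key <= 15; key++ {
		rows[key] = struct{}{}
	}
	return rows
}

func main() {
	const persisted, acked = int64(10), int64(14)

	// No final checkpoint: the stale range still has rows, so they are re-dispatched.
	before := demoRows()
	fmt.Println(reload(before, persisted, acked, 2)) // prints: [10 11 12 13]

	// Final checkpoint ran: pagination over the stale range comes back empty.
	after := demoRows()
	rangeDelete(after, persisted, acked)
	fmt.Println(reload(after, persisted, acked, 2)) // prints: []
}
```

The persisted checkpoint is the same in both runs; only the presence of the rows decides whether anything is reprocessed.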

How did you test it?

  • built
  • covered by existing tests

@prathyushpv prathyushpv requested review from a team as code owners May 19, 2026 16:39
@prathyushpv prathyushpv force-pushed the ppv/queue-final-checkpoint-on-stop branch from bade8e1 to 8071852 Compare May 19, 2026 17:14
@prathyushpv prathyushpv changed the title Flush queue checkpoint on shutdown Range-complete acked tasks on queue Stop May 19, 2026
@prathyushpv prathyushpv force-pushed the ppv/queue-final-checkpoint-on-stop branch 2 times, most recently from ed37dc2 to 9730acd Compare May 19, 2026 17:25
queueBase.Stop now does a best-effort doCheckpoint(true) that range-
deletes already-acked task rows. Without this, the next shard owner
reloads stale task rows from the gap between the last periodic
checkpoint and shutdown and re-dispatches them — most visible on the
transfer queue.

The shard state write is intentionally skipped on the shutdown path:
by the time engine.Stop runs from FinishStop, the shard is already
Stopped and updateShardInfo would reject it. The next owner instead
relies on the rangeCompleteTasks deletion — its pagination over the
stale reader-scope range returns empty for deleted rows, and the slice
compacts on the first post-startup checkpoint.
@prathyushpv prathyushpv force-pushed the ppv/queue-final-checkpoint-on-stop branch from 9730acd to 580b598 Compare May 19, 2026 17:26
