[improve][broker] Trim orphaned bucket snapshots when ledgers are deleted#25984
[improve][broker] Trim orphaned bucket snapshots when ledgers are deleted#25984dao-jun wants to merge 2 commits into
Conversation
| synchronized (this) { | ||
| immutableBuckets.remove(range); | ||
| numberDelayedMessages.addAndGet(-bucket.getNumberBucketDelayedMessages()); | ||
| } |
There was a problem hiding this comment.
Removing a bucket from immutableBuckets might not be enough. When a mutable bucket is sealed, the first snapshot segment has already been loaded into sharedBucketPriorityQueue, and snapshotSegmentLastIndexMap can still point to this bucket. After this trim removes the range and decrements numberDelayedMessages, getScheduledMessages() can still pop those queued entries and return positions from ledgers that have already been deleted, then decrement numberDelayedMessages again.
Could we either purge the in-memory state for the trimmed bucket under the same lock, such as rebuilding sharedBucketPriorityQueue and removing matching snapshotSegmentLastIndexMap entries, or otherwise avoid trimming buckets that still have loaded segments? Please also add a test that advances the clock after trim and asserts no positions below firstLedgerId are returned and the delayed-message counter remains consistent.
There was a problem hiding this comment.
Not clearing the data of sharedBucketPriorityQueue is intentionally designed, as ManagedCursorImpl.asyncReplayEntries will filter out invalid Positions.
I did miss cleaning the snapshotSegmentLastIndexMap, this is a problem.
Double decrease numberDelayedMessages is indeed a problem, good catch.
| CompletableFuture<Void> before = trimFuture != null && !trimFuture.isDone() | ||
| ? trimFuture : CompletableFuture.completedFuture(null); | ||
| trimFuture = before.thenCompose(__ -> { |
There was a problem hiding this comment.
If clear() is called while trimFuture is still in flight, and that trim/merge chain later completes exceptionally, this before.thenCompose(...) will not execute the clear block. The whenComplete in addMessage only logs the failure and preserves the exceptional completion, so clear() can return exceptionally while leaving immutableBuckets, sharedBucketPriorityQueue, lastMutableBucket, snapshotSegmentLastIndexMap, and numberDelayedMessages uncleared.
Could we normalize the previous future before chaining clear, for example with handle/exceptionally, so clear always runs after the in-flight trim/merge settles?
There was a problem hiding this comment.
Good catch! That’s a bug for sure.
| public void testTrimRemovesOrphanedBuckets() throws Exception { | ||
| TrackerWithStorage ts = createTrackerWithMockLedger(50L, 5); | ||
|
|
||
| for (int i = 1; i <= 31; i++) { | ||
| ts.tracker.addMessage(i, i, i * 10); | ||
| } | ||
| Awaitility.await().untilAsserted(() -> | ||
| Assert.assertTrue(ts.tracker.getImmutableBuckets().asMapOfRanges().values().stream() | ||
| .noneMatch(x -> x.merging))); | ||
|
|
||
| int bucketCount = ts.tracker.getImmutableBuckets().asMapOfRanges().size(); | ||
| assertTrue(bucketCount <= 5, | ||
| "Bucket count " + bucketCount + " should be <= maxNumBuckets=5 after trim+merge"); | ||
|
|
||
| ts.tracker.getImmutableBuckets().asMapOfRanges().forEach((range, bucket) -> | ||
| assertTrue(range.lowerEndpoint() >= 50L, | ||
| "Remaining bucket range " + range + " should be >= 50")); | ||
|
|
There was a problem hiding this comment.
This test only checks that immutableBuckets no longer contains ranges below firstLedgerId. It does not verify the externally visible behavior after trim. Please advance the mock clock and call getScheduledMessages(), then assert that no position from ledgers below firstLedgerId is returned and that getNumberOfDelayedMessages() remains consistent. That would catch stale entries left in sharedBucketPriorityQueue after the bucket range is removed.
Motivation
When a topic's managed ledger trims old ledgers (via retention or compaction), the immutable bucket snapshots in BucketDelayedDeliveryTracker that cover those deleted ledger ranges become orphaned — their underlying message data no longer exists, yet they:
Prior to this change, the tracker only merged buckets when the count exceeded the limit; it never cleaned up buckets whose ledger range had been fully trimmed. This change introduces a trim step that runs before merge, deleting orphaned buckets and decrementing the delayed-message counter accordingly.
Modifications
Source (BucketDelayedDeliveryTracker.java)
Test (BucketDelayedDeliveryTrackerTest.java)
Added 4 tests using a helper that injects a mocked ManagedLedger:
Verifying this change
(Please pick either of the following options)
This change is a trivial rework / code cleanup without any test coverage.
(or)
This change is already covered by existing tests, such as (please describe tests).
(or)
This change added tests and can be verified as follows:
(example:)
Does this pull request potentially affect one of the following parts:
If the box was checked, please highlight the changes