Skip to content

[fix][broker] Prevent backlog quota cursor updates during topic close#25914

Open
Denovo1998 wants to merge 2 commits into
apache:masterfrom
Denovo1998:backlog_quota_cursor_updates_during_topic_close
Open

[fix][broker] Prevent backlog quota cursor updates during topic close#25914
Denovo1998 wants to merge 2 commits into
apache:masterfrom
Denovo1998:backlog_quota_cursor_updates_during_topic_close

Conversation

@Denovo1998

@Denovo1998 Denovo1998 commented Jun 1, 2026

Copy link
Copy Markdown
Contributor

Follow-up to #25684

Motivation

BacklogQuotaManager already skips backlog quota handling when a topic is fenced or closing/deleting. However, checking the topic state only before entering the eviction path is not enough: a topic can begin close/delete after backlog quota eviction has started but before eviction mutates the slowest cursor.

This can let backlog quota eviction call skipEntries or markDelete while the topic teardown path is starting, which is the race this PR is intended to harden.

Modifications

  • Keep the existing entry guard for fenced or closing/deleting topics.
  • Add a topic close/delete read-lock helper in PersistentTopic.
  • Run direct backlog quota cursor mutations under that read lock, after checking isClosingOrDeleting().
  • Apply the guarded mutation path to size-based eviction before ManagedCursor#skipEntries.
  • Apply the guarded mutation path to non-precise time-based eviction before ManagedCursor#markDelete.
  • Keep the pending-ack cleanup hook after cursor advancement by continuing to call markDeletePositionMoveForward() inside the guarded mutation.
  • Add real broker/topic race tests that start topic close after backlog quota eviction reaches the cursor mutation point and verify the cursor mark-delete position does not move.

Verifying this change

This change added tests and can be verified as follows:

  • Added BacklogQuotaManagerTest.testSizeBacklogEvictionRaceWithTopicCloseDoesNotSkipEntries
  • Added BacklogQuotaManagerTest.testTimeBacklogEvictionRaceWithTopicCloseDoesNotMarkDelete

Verified locally with:

./gradlew :pulsar-broker:test --rerun-tasks --tests "org.apache.pulsar.broker.service.BacklogQuotaManagerTest.testSizeBacklogEvictionRaceWithTopicCloseDoesNotSkipEntries" --tests "org.apache.pulsar.broker.service.BacklogQuotaManagerTest.testTimeBacklogEvictionRaceWithTopicCloseDoesNotMarkDelete"
./gradlew :pulsar-broker:spotlessCheck :pulsar-broker:checkstyleMain :pulsar-broker:checkstyleTest
git diff --check

Does this pull request potentially affect one of the following parts:

If the box was checked, please highlight the changes

  • Dependencies (add or upgrade a dependency)
  • The public API
  • The schema
  • The default values of configurations
  • The threading model
  • The binary protocol
  • The REST endpoints
  • The admin CLI options
  • The metrics
  • Anything that affects deployment

@Denovo1998

Copy link
Copy Markdown
Contributor Author

@void-ptr974

Copy link
Copy Markdown
Contributor

Thanks for the follow-up. I think adding the second check before skipEntries / markDelete is a good improvement. The original entry check only handles the case where the topic is already closing before backlog eviction starts. This PR also handles the more important race where backlog eviction has already started, and then the topic begins closing before the cursor is mutated.

One thing I think we should decide is whether this PR is intended to be a best-effort mitigation or a strict guarantee.

With the current code, there is still a small window:

  1. shouldStopEvictionOnTopicClose() returns false
  2. topic close/delete starts immediately after that
  3. eviction still continues to call skipEntries / markDelete

So this change can reduce the race window a lot, but it cannot completely guarantee that cursor mutation never happens during topic close/delete.

If best-effort mitigation is enough for this issue, I think the current direction is reasonable. If the goal is to strictly prevent cursor mutation during teardown, then we may need stronger coordination with the topic close/delete path, instead of only checking isClosingOrDeleting() before mutation.

break;
}
beforeBacklogQuotaCursorMutation(persistentTopic);
if (shouldStopEvictionOnTopicClose(persistentTopic)) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check is useful because it is close to the actual cursor mutation point.

The remaining subtle case is:

  1. this check returns false
  2. topic close/delete starts right after that
  3. eviction still proceeds to skipEntries / markDelete

So I think we should clarify whether this is intended as a best-effort race reduction, or whether we need stronger coordination to strictly avoid cursor mutation once topic close/delete starts.

@lhotari lhotari added this to the 5.0.0-M1 milestone Jun 2, 2026
@Denovo1998

Copy link
Copy Markdown
Contributor Author

@void-ptr974

Thanks, that is a good point. The previous version was best-effort only.

I updated the code to use the existing PersistentTopic close/delete read-write lock. Backlog quota eviction now checks isClosingOrDeleting and performs skipEntries / markDelete while holding the read lock; close/delete takes the write lock before setting the closing/deleting state. This removes the window between the state check and the cursor mutation with respect to the topic teardown state transition.

@Denovo1998 Denovo1998 requested a review from void-ptr974 June 7, 2026 02:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants