Skip to content

[fix][ml] fix potential data lose when enable lazy cursor recovery #20814

Open
lifepuzzlefun wants to merge 2 commits into
apache:masterfrom
lifepuzzlefun:only_trim_ledger_when_all_cursor_are_recovered
Open

[fix][ml] fix potential data lose when enable lazy cursor recovery #20814
lifepuzzlefun wants to merge 2 commits into
apache:masterfrom
lifepuzzlefun:only_trim_ledger_when_all_cursor_are_recovered

Conversation

@lifepuzzlefun

@lifepuzzlefun lifepuzzlefun commented Jul 15, 2023

Copy link
Copy Markdown
Contributor

Motivation

when user enable lazyCursorRecovery.

if one cursor success recovered and ack some message, and the other cursor recovery is delayed for sometime.

if the finished cursor markdeletepos moved it may trigger internalTrimLedger and delete ledger only decide by this cursor.

so after the other cursor finished recovery, these cursor may occur "data lose" because of the early trim.

Modifications

  1. check if there is lazy recovery before trim ledgers.
  2. add programmed readEntry delay in PulsarMockLedgerHandle
  3. add CursorRecoveryInProcessException

Verifying this change

This change added tests and can be verified as follows:

add unit test in ManagedLedgerTest.onlyTrimCursorAfterLazyRecoverFinished

Does this pull request potentially affect one of the following parts:

If the box was checked, please highlight the changes

  • Dependencies (add or upgrade a dependency)
  • The public API
  • The schema
  • The default values of configurations
  • The threading model
  • The binary protocol
  • The REST endpoints
  • The admin CLI options
  • The metrics
  • Anything that affects deployment

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository:

2. add programmed readEntry delay in `PulsarMockLedgerHandle`
@github-actions

Copy link
Copy Markdown

@lifepuzzlefun Please add the following content to your PR description and select a checkbox:

- [ ] `doc` <!-- Your PR contains doc changes -->
- [ ] `doc-required` <!-- Your PR changes impact docs and you will update later -->
- [ ] `doc-not-needed` <!-- Your PR changes do not impact docs -->
- [ ] `doc-complete` <!-- Docs have been already added -->

@lifepuzzlefun lifepuzzlefun changed the title [improve][ml] ManagedLedger trim operation should always triggered after all cursor has been recovered. [fix][ml] ManagedLedger trim operation should always triggered after all cursor has been recovered. Jul 15, 2023
@github-actions github-actions Bot added doc-not-needed Your PR changes do not impact docs and removed doc-label-missing labels Jul 15, 2023
@lifepuzzlefun lifepuzzlefun changed the title [fix][ml] ManagedLedger trim operation should always triggered after all cursor has been recovered. [fix][ml] ManagedLedger trim operation should be delayed before lazy cursor recovery finished. Jul 15, 2023
@lifepuzzlefun lifepuzzlefun changed the title [fix][ml] ManagedLedger trim operation should be delayed before lazy cursor recovery finished. [fix][ml] fixlazy cursor recovery Jul 16, 2023
@lifepuzzlefun lifepuzzlefun changed the title [fix][ml] fixlazy cursor recovery [fix][ml] fix datalazy cursor recovery Jul 16, 2023
@lifepuzzlefun lifepuzzlefun changed the title [fix][ml] fix datalazy cursor recovery [fix][ml] fix potential data lose when enable lazy cursor recovery Jul 16, 2023
@Technoboy- Technoboy- added this to the 3.2.0 milestone Jul 31, 2023
@lifepuzzlefun

Copy link
Copy Markdown
Contributor Author

@poorbarcode @mattisonchao Could you take a look this pr ?

@poorbarcode poorbarcode left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch!

static final AtomicLongFieldUpdater<ManagedLedgerImpl> LAZY_RECOVERY_IN_PROCESS = AtomicLongFieldUpdater
.newUpdater(ManagedLedgerImpl.class, "lazyRecoveryInProcess");
@SuppressWarnings("unused")
private volatile long lazyRecoveryInProcess = 0;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the variable uninitializedCursors is tracing the cursor, which is in initialize progress. I think the new variable lazyRecoveryInProcess is not needed.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for review, I agree with you. The idea to use this variable here is to avoid get the synchronized lock of the managedledger instance. current uninitializedCursors is not thread safe. so each time a call to get synchronized lock is needed. I submit another to make this map thread safe. #20674. Need some guide to handle this. If acquire synchronized lock is acceptable. I will change the code in this style.

@github-actions

github-actions Bot commented Sep 7, 2023

Copy link
Copy Markdown

The pr had no activity for 30 days, mark with Stale label.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

doc-not-needed Your PR changes do not impact docs Stale type/bug The PR fixed a bug or issue reported a bug

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants