Skip to content

Improve robustness of NTL flows #2257

Description

@igamigo

#2127 fixed a durability gap in Client::send_private_note: if the on-chain transaction committed but send_note failed, the private note could be lost. The fix persists the payload to an outbox (stored in the client's settings table) before calling the transport, then retries it during sync_note_transport or an explicit flush_relay_outbox.

This is a pretty opinionated implementation of retries so I'm a bit reluctant to make it the long term solution. For example, if only a specific note has a problem against the NTL, then it will fail any time the user wants to try to send or resend the outbox. We can however try to iterate and provide a more flexible solution in future releases. I'll try opening an issue for this following this PR

Originally posted by @igamigo in #2127 (review)

This issue tracks the follow-up work discussed during review, since the current retry model is intentionally a first pass. However, there are some issues with the implementation:

Easier to tackle:

  • Failed notes retry forever:flush_relay_outbox now attempts each entry independently, which is good. But if one note keeps failing, it stays in the outbox and gets retried on every flush forever. There is no attempt limit, backoff, or drop permanent failjures, so a poison entry guarantees one failed send_note per sync.
  • Retry logic is mixed into the durable outbox flow: Transient network retries probably belong inside NoteTransportClient, behind configurable retry/backoff settings. The durable outbox should mainly handle process death/restart durability, not be the only retry mechanism. This may be its own issue/PR.

More related to the overall design we have:

  • sync_state flushing old sends is a bit inconsistent: sync_note_transport currently drains the outbox as a side effect, including entries from previous manual sends. That makes sync_state responsible for hidden retry work and swallows failures that the caller may care about. It is not intuitive that you can send notes on-demand, but a Client::sync_state fires off the dispatch of previously failed notes (which might fail again)
  • The outbox is stored as a settings value: The current implementation stores a serialized Vec<NoteInfo> under the note_transport_outbox settings key. This avoided a Store-trait change, but it is not ideal long term. Also, the fact that it is a plain list allows for NoteInfo duplication which allows the user to make mistakes by inserting it multiple times.

Potential solutions

Approach 1

A cleaner long-term model may be (as suggested in the PR reviews):

  • send_private_note queues the note locally (perhaps on a separate set of tables instead of using the settings one)
  • sync_state performs the actual relay
  • relay success/failure is exposed separately (maybe through SyncSummary)

Approach 2

Decide that we drop the built-in relay outbox and that it is more an application-level concern. Because it's the user that needs to send private notes through the NTL client, it is already effectively an application level concern to decide when, how, and which notes to send through the service. This means that dealing with failures should also be one. The user may want to fall back to other mechanisms or channels for distributing notes if the primary one fails, for example, and this is not allowed in the current model. The user could still use the settings table to build this themselves.

This would imply effectively reverting #2127 and adding the retry mechanisms to the wallet or webclient.


Overall I lean towards approach 2. cc @mmagician @Dominik1999 @WiktorStarczewski for opinions

Metadata

Metadata

Assignees

No one assigned

    Labels

    note transportRelated to note transport layer integration
    No fields configured for Refactoring.

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions