#2127 fixed a durability gap in Client::send_private_note: if the on-chain transaction committed but send_note failed, the private note could be lost. The fix persists the payload to an outbox (stored in the client's settings table) before calling the transport, then retries it during sync_note_transport or an explicit flush_relay_outbox.
This is a pretty opinionated implementation of retries so I'm a bit reluctant to make it the long term solution. For example, if only a specific note has a problem against the NTL, then it will fail any time the user wants to try to send or resend the outbox. We can however try to iterate and provide a more flexible solution in future releases. I'll try opening an issue for this following this PR
Originally posted by @igamigo in #2127 (review)
This issue tracks the follow-up work discussed during review, since the current retry model is intentionally a first pass. However, there are some issues with the implementation:
Easier to tackle:
- Failed notes retry forever:
flush_relay_outbox now attempts each entry independently, which is good. But if one note keeps failing, it stays in the outbox and gets retried on every flush forever. There is no attempt limit, backoff, or drop permanent failjures, so a poison entry guarantees one failed send_note per sync.
- Retry logic is mixed into the durable outbox flow: Transient network retries probably belong inside
NoteTransportClient, behind configurable retry/backoff settings. The durable outbox should mainly handle process death/restart durability, not be the only retry mechanism. This may be its own issue/PR.
More related to the overall design we have:
sync_state flushing old sends is a bit inconsistent: sync_note_transport currently drains the outbox as a side effect, including entries from previous manual sends. That makes sync_state responsible for hidden retry work and swallows failures that the caller may care about. It is not intuitive that you can send notes on-demand, but a Client::sync_state fires off the dispatch of previously failed notes (which might fail again)
- The outbox is stored as a settings value: The current implementation stores a serialized
Vec<NoteInfo> under the note_transport_outbox settings key. This avoided a Store-trait change, but it is not ideal long term. Also, the fact that it is a plain list allows for NoteInfo duplication which allows the user to make mistakes by inserting it multiple times.
Potential solutions
Approach 1
A cleaner long-term model may be (as suggested in the PR reviews):
send_private_note queues the note locally (perhaps on a separate set of tables instead of using the settings one)
sync_state performs the actual relay
- relay success/failure is exposed separately (maybe through
SyncSummary)
Approach 2
Decide that we drop the built-in relay outbox and that it is more an application-level concern. Because it's the user that needs to send private notes through the NTL client, it is already effectively an application level concern to decide when, how, and which notes to send through the service. This means that dealing with failures should also be one. The user may want to fall back to other mechanisms or channels for distributing notes if the primary one fails, for example, and this is not allowed in the current model. The user could still use the settings table to build this themselves.
This would imply effectively reverting #2127 and adding the retry mechanisms to the wallet or webclient.
Overall I lean towards approach 2. cc @mmagician @Dominik1999 @WiktorStarczewski for opinions
#2127 fixed a durability gap in
Client::send_private_note: if the on-chain transaction committed butsend_notefailed, the private note could be lost. The fix persists the payload to an outbox (stored in the client's settings table) before calling the transport, then retries it duringsync_note_transportor an explicitflush_relay_outbox.Originally posted by @igamigo in #2127 (review)
This issue tracks the follow-up work discussed during review, since the current retry model is intentionally a first pass. However, there are some issues with the implementation:
Easier to tackle:
flush_relay_outboxnow attempts each entry independently, which is good. But if one note keeps failing, it stays in the outbox and gets retried on every flush forever. There is no attempt limit, backoff, or drop permanent failjures, so a poison entry guarantees one failedsend_noteper sync.NoteTransportClient, behind configurable retry/backoff settings. The durable outbox should mainly handle process death/restart durability, not be the only retry mechanism. This may be its own issue/PR.More related to the overall design we have:
sync_stateflushing old sends is a bit inconsistent:sync_note_transportcurrently drains the outbox as a side effect, including entries from previous manual sends. That makessync_stateresponsible for hidden retry work and swallows failures that the caller may care about. It is not intuitive that you can send notes on-demand, but aClient::sync_statefires off the dispatch of previously failed notes (which might fail again)Vec<NoteInfo>under thenote_transport_outboxsettings key. This avoided aStore-trait change, but it is not ideal long term. Also, the fact that it is a plain list allows forNoteInfoduplication which allows the user to make mistakes by inserting it multiple times.Potential solutions
Approach 1
A cleaner long-term model may be (as suggested in the PR reviews):
send_private_notequeues the note locally (perhaps on a separate set of tables instead of using the settings one)sync_stateperforms the actual relaySyncSummary)Approach 2
Decide that we drop the built-in relay outbox and that it is more an application-level concern. Because it's the user that needs to send private notes through the NTL client, it is already effectively an application level concern to decide when, how, and which notes to send through the service. This means that dealing with failures should also be one. The user may want to fall back to other mechanisms or channels for distributing notes if the primary one fails, for example, and this is not allowed in the current model. The user could still use the settings table to build this themselves.
This would imply effectively reverting #2127 and adding the retry mechanisms to the wallet or webclient.
Overall I lean towards approach 2. cc @mmagician @Dominik1999 @WiktorStarczewski for opinions