Skip to content

TXM: Rpc liveness#562

Merged
norswap merged 8 commits into
masterfrom
gabriel/txm-rpc-liveness
Apr 10, 2025
Merged

TXM: Rpc liveness#562
norswap merged 8 commits into
masterfrom
gabriel/txm-rpc-liveness

Conversation

@GabrielMartinezRodriguez

@GabrielMartinezRodriguez GabrielMartinezRodriguez commented Mar 26, 2025

Copy link
Copy Markdown
Contributor

Linked Issues

Description

Added a liveness monitor to the Transaction Manager

  • Include all relevant context (but no need to repeat the issue's content).
  • Draw attention to new, noteworthy & unintuitive elements.
Toggle Checklist

Checklist

Basics

  • B1. I have applied the proper label & proper branch name (e.g. norswap/build-system-caching).
  • B2. This PR is not so big that it should be split & addresses only one concern.
  • B3. The PR targets the lowest branch it can (ideally master).

Reminder: PR review guidelines

Correctness

  • C1. Builds and passes tests.
  • C2. The code is properly parameterized & compatible with different environments (e.g. local,
    testnet, mainnet, standalone wallet, ...).
  • C3. I have manually tested my changes & connected features.

< INDICATE BROWSER, DEMO APP & OTHER ENV DETAILS USED FOR TESTING HERE >

< INDICATE TESTED SCENARIOS (USER INTERFACE INTERACTION, CODE FLOWS) HERE >

  • C4. I have performed a thorough self-review of my code after submitting the PR,
    and have updated the code & comments accordingly.

Architecture & Documentation

  • D1. I made it easy to reason locally about the code, by (1) using proper abstraction boundaries,
    (2) commenting these boundaries correctly, (3) adding inline comments for context when needed.
  • D2. All public-facing APIs & meaningful (non-local) internal APIs are properly documented in code
    comments.
  • D3. If appropriate, the general architecture of the code is documented in a code comment or
    in a Markdown document.
  • D4. An appropriate Changeset has been generated (and committed) for changes that touch npm published packages (currently packages/core and packages/react), see here for more info.

@cloudflare-workers-and-pages

cloudflare-workers-and-pages Bot commented Mar 26, 2025

Copy link
Copy Markdown

Deploying happychain with  Cloudflare Pages  Cloudflare Pages

Latest commit: c884815
Status: ✅  Deploy successful!
Preview URL: https://2608acea.happychain.pages.dev
Branch Preview URL: https://gabriel-txm-rpc-liveness.happychain.pages.dev

View logs

@GabrielMartinezRodriguez GabrielMartinezRodriguez mentioned this pull request Mar 26, 2025
11 tasks
@GabrielMartinezRodriguez GabrielMartinezRodriguez changed the title feat(txm): rpc liveness TXM: Rpc liveness Mar 26, 2025
@GabrielMartinezRodriguez GabrielMartinezRodriguez added the reviewing-1 Ready for, or undergoing first-line review label Mar 26, 2025
@GabrielMartinezRodriguez GabrielMartinezRodriguez marked this pull request as ready for review March 26, 2025 14:49
@linear

linear Bot commented Mar 26, 2025

Copy link
Copy Markdown
HAPPY-366 TXM: Add RPC liveness monitor

Goal: avoid creating an unbounded amount of attempts when the RPC is down, which then creates a lot of load on the service or on the RPC. At the same time, we want to retry once the RPC comes back up.

This service can receive pings from other components (e.g. block monitor or tx submitter) to determine if the service is alive.

This is a policy that could be customized by the user.

Comment thread packages/txm/lib/NonceManager.ts Outdated
return
}

this.txmgr.rpcLivenessMonitor.onSuccess()

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

onSuccess/onFailure sound like callback listeners to me

onSuccess(args => console.log("success", args))

maybe trackSuccess() or something?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, much better!

Comment thread packages/txm/lib/TransactionManager.ts
Comment thread packages/txm/lib/TransactionManager.ts Outdated
* @default 2000 (2 seconds)
* @unit milliseconds
*/
livenessPingInterval?: number

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think this was supposed to be used in conjunction with livenessSuccessCount but dont see it anywhere

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, good catch. It was renamed to livenessCheckInterval, and I forgot to remove livenessPingInterval

Comment thread packages/txm/lib/telemetry/metrics.ts
Comment thread packages/txm/lib/RpcLivenessMonitor.ts Outdated
occurredAt: new Date(),
success: true,
})
this.checkIfDown()

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if that doesn't add a lot of overhead? If we're making 1000 RPC calls per second (which is not insane, each tx might require a few of these calls), then we're calling this 1000/s, and there are ~10k events in the event window, so 1000 times per second we're filtering through a list of 10,000 events.

Maybe we could maintain a rolling list of counters? Like for 10s period, 10 counters for success events, 10 counters for error events. And just maintain a single timestamp corresponding to the second-granularity timestamp of the oldest counters?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @aodhgan for an opinion here

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're right. I refactored the code, and I believe this approach is much better

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! If we ever need to optimize this more, we can replace the object by an array, and we could also update the count dynamically instead of recomputing it in ratioOfSuccess but this works nicely, merging this.

@norswap norswap added updating Updating after review and removed reviewing-2 Ready for, or undergoing final review labels Apr 8, 2025
@GabrielMartinezRodriguez GabrielMartinezRodriguez force-pushed the gabriel/txm-rpc-liveness branch 3 times, most recently from a3d1d4d to 0c8f54d Compare April 10, 2025 08:37
@GabrielMartinezRodriguez GabrielMartinezRodriguez force-pushed the gabriel/txm-rpc-liveness branch 2 times, most recently from d72d783 to 75f46bf Compare April 10, 2025 09:12
Base automatically changed from gabriel/fix-randomnness to master April 10, 2025 09:17
@GabrielMartinezRodriguez GabrielMartinezRodriguez added reviewing-2 Ready for, or undergoing final review and removed updating Updating after review labels Apr 10, 2025
@norswap norswap merged commit d22cb22 into master Apr 10, 2025
@norswap norswap deleted the gabriel/txm-rpc-liveness branch April 10, 2025 20:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

reviewing-2 Ready for, or undergoing final review

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants