
Overlay V2 cleanup #5296

Open
drebelsky wants to merge 4 commits into stellar:overlay-v2-shared from drebelsky:v2-clean-up-data


@drebelsky
Contributor

@drebelsky drebelsky commented May 27, 2026

The goal of this PR is to create a more stable baseline for experiments to compare against. In particular, the goal is to reach a steady state with data sizes being capped and data being regularly cleaned up.

Changes

  • RemoveTxsFromMempool (which is called on ledger close) now also calls evict_expired to remove stale TXs
  • PendingRequests::process_timeouts now evicts the hashes that were given up on
  • Unify the two LRUs in InvTracker
  • TxSetCache now evicts the last tx set instead of a random one, and eviction now runs after insertion.
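
As a rough illustration of the `process_timeouts` change (a sketch, not the code in this PR): the `Pending` struct, the retry counter, the `track` and `len` helpers, and the `(retry, given_up)` return shape are all assumptions made for the example. The point it shows is that a hash that has been given up on is removed from the map in the same pass that detects the timeout, so the map cannot grow with abandoned entries.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for a transaction hash.
type Hash = [u8; 32];

/// One outstanding request (illustrative; not the PR's actual struct).
struct Pending {
    deadline: Instant,
    retries_left: u32,
}

/// Illustrative tracker; the real `PendingRequests` may look different.
struct PendingRequests {
    timeout: Duration,
    by_hash: HashMap<Hash, Pending>,
}

impl PendingRequests {
    fn new(timeout: Duration) -> Self {
        Self { timeout, by_hash: HashMap::new() }
    }

    /// Start tracking a request issued at `now`.
    fn track(&mut self, hash: Hash, now: Instant, retries: u32) {
        let deadline = now + self.timeout;
        self.by_hash.insert(hash, Pending { deadline, retries_left: retries });
    }

    /// Returns `(hashes to retry, hashes given up on)`.
    /// Given-up hashes are evicted from the map before returning.
    fn process_timeouts(&mut self, now: Instant) -> (Vec<Hash>, Vec<Hash>) {
        let timeout = self.timeout;
        let mut retry = Vec::new();
        let mut given_up = Vec::new();
        self.by_hash.retain(|hash, pending| {
            if now < pending.deadline {
                return true; // not timed out yet: keep as-is
            }
            if pending.retries_left == 0 {
                given_up.push(*hash);
                return false; // out of retries: evict the entry
            }
            // Timed out but retries remain: re-arm the deadline and keep it.
            pending.retries_left -= 1;
            pending.deadline = now + timeout;
            retry.push(*hash);
            true
        });
        (retry, given_up)
    }

    fn len(&self) -> usize {
        self.by_hash.len()
    }
}
```

Taking `now` as a parameter rather than calling `Instant::now()` inside keeps the sketch deterministic to test.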

Not changed, since we're assuming non-malicious peers and a reasonably bounded number of peers:

  • InvBatcher keeps an entry in the map for every peer that ever existed

  • SharedState::peer_streams (unbounded, but limited to total number of connected peers)

  • App::pending_scp_state_requests (unbounded, but as long as nodes aren't falling out of sync, it should have at most one entry per peer, from the message sent on startup)

  • App::{known_peers, peer_hostnames, configured_peers}: all of these are bounded by the number of configured peers

  • App::local_addrs: there shouldn't be too many local addresses

  • The following LRU cache sizes remain unchanged and cleanup remains just LRU eviction. The sizes are small enough that I think we should hit the steady state full usage relatively quickly (although, I'm still examining the sizes for scp_seen and scp_sent_to)

    • InvTracker's cache(s)
    • SharedState::{scp_seen, tx_seen, scp_sent_to, tx_set_sources}
    • TxBuffer
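
To make the "steady state" argument for the plain LRU caches concrete, here is a toy sketch (not taken from the codebase). `LruSet`, `touch`, and the `u64` key type are hypothetical, and the O(n) lookup is only acceptable for illustration. Once the cache has seen `capacity` distinct keys, every further insert evicts exactly one entry, so the size stays pinned at the cap without any separate cleanup pass.

```rust
use std::collections::VecDeque;

/// Toy LRU set for illustration only (O(n) per touch).
struct LruSet {
    capacity: usize,
    /// Front is the least recently used key, back is the most recent.
    order: VecDeque<u64>,
}

impl LruSet {
    fn new(capacity: usize) -> Self {
        Self { capacity, order: VecDeque::with_capacity(capacity + 1) }
    }

    /// Marks `key` as most recently used, inserting it if needed.
    /// Eviction runs after insertion; returns the evicted key, if any.
    fn touch(&mut self, key: u64) -> Option<u64> {
        if let Some(pos) = self.order.iter().position(|k| *k == key) {
            self.order.remove(pos);
        }
        self.order.push_back(key);
        if self.order.len() > self.capacity {
            self.order.pop_front()
        } else {
            None
        }
    }

    fn contains(&self, key: u64) -> bool {
        self.order.contains(&key)
    }

    fn len(&self) -> usize {
        self.order.len()
    }
}
```

With a capacity of 3, touching keys 1 through 10 leaves only 8, 9, and 10 in the set, and the length never exceeds 3 along the way.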

Otherwise unchanged:

  • I left the channels alone since, hopefully, under normal load they shouldn't start backing up.
  • Claude suggests that there is some unbounded state when accepting peers, but this will be fine for benchmarking
  • SharedState::pending_txset_requests: if the request isn't responded to, these continue to take up space. For benchmarking, this should only happen if a peer disconnects, in which case they do get cleaned up, and we try to fetch from the next peer. This is probably worth addressing with some reasonable timeouts later.

@drebelsky
Contributor Author

For many of the LRU caches without cleanup, it might be worth considering switching to some form of TTLCache (although the decreased cache locality from lru probably isn't that substantial in our workload).
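
For reference, a minimal sketch of what "some form of TTLCache" could look like, using only the standard library. `TtlCache` and its methods are hypothetical names for this example and do not refer to an existing crate or to code in this PR; a real replacement would also need a size cap. Entries carry an expiry time, reads ignore expired entries, and a periodic `evict_expired` call (for example on ledger close) reclaims the memory.

```rust
use std::collections::HashMap;
use std::hash::Hash;
use std::time::{Duration, Instant};

/// Hypothetical TTL cache: each entry expires `ttl` after insertion.
struct TtlCache<K, V> {
    ttl: Duration,
    /// Maps key to (value, expiry time).
    entries: HashMap<K, (V, Instant)>,
}

impl<K: Hash + Eq, V> TtlCache<K, V> {
    fn new(ttl: Duration) -> Self {
        Self { ttl, entries: HashMap::new() }
    }

    /// Inserts or refreshes `key`, expiring `ttl` after `now`.
    fn insert(&mut self, key: K, value: V, now: Instant) {
        self.entries.insert(key, (value, now + self.ttl));
    }

    /// Returns the value only if it has not expired as of `now`.
    fn get(&self, key: &K, now: Instant) -> Option<&V> {
        match self.entries.get(key) {
            Some((value, expires_at)) if now < *expires_at => Some(value),
            _ => None,
        }
    }

    /// Drops every expired entry and returns how many were removed.
    fn evict_expired(&mut self, now: Instant) -> usize {
        let before = self.entries.len();
        self.entries.retain(|_, (_, expires_at)| now < *expires_at);
        before - self.entries.len()
    }

    fn len(&self) -> usize {
        self.entries.len()
    }
}
```

Unlike an LRU, this bounds how long an entry lives rather than how many entries exist, which is why a size cap would still be needed alongside it.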

@drebelsky changed the title from "V2 clean up data" to "Overlay V2 cleanup" on May 27, 2026