
assoc: debounce bloom-based leave to tolerate stale/empty beacons #163

Closed
geonnave wants to merge 2 commits into DotBots:develop from geonnave:bloom-node-empty-guard

Conversation

@geonnave

Contributor

Draft - pending hardware test on the testbed.

Problem

The gateway-side availability guard (#162) stopped beacons from carrying a
partially rebuilt filter, but it does not stop a joined node from leaving in
two remaining transient cases:

  1. Empty filter (mid-recompute). While the gateway recomputes, the guard
    skips the copy and the beacon's bloom_filter ships all-zero. The node's
    membership test fails against all-zero and it leaves with
    MARI_PEER_LOST_BLOOM.
  2. Stale filter (recompute pending). A node joins; the gateway marks the
    filter dirty but emits a beacon before it recomputes, so the beacon carries
    the previous complete filter that does not yet include the new node. The
    filter is non-empty, so it looks like a real eviction, and the freshly-joined
    node drops itself.

Both are transient: the next beacon, after the gateway recomputes, includes the
node again. Leaving on a single beacon turns these transients into a
join/leave bounce, which accounts for most of the remaining slowness in
network formation.

Change

Make the node tolerant of transient beacons instead of acting on one:

  • An empty filter carries no membership info, so it is ignored entirely.
  • A non-empty filter that omits the node increments a miss counter; the node
    only leaves after MARI_BLOOM_MISS_THRESHOLD (3) consecutive omitting
    beacons. Any beacon that contains the node, or any empty beacon, resets the
    counter. The counter is also reset on join.
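
The rules above can be sketched as a small state machine. This is a minimal illustration, not the code in this PR: `BLOOM_BYTES`, `bloom_debounce_t`, `bloom_action_t`, and the function names are hypothetical, and the membership test is assumed to be computed by the caller and passed in as `node_in_filter`. Only `MARI_BLOOM_MISS_THRESHOLD` and its value come from the description.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MARI_BLOOM_MISS_THRESHOLD 3   // value stated in the PR description
#define BLOOM_BYTES 128               // hypothetical filter size, for illustration only

typedef enum {
    BLOOM_KEEP,   // stay joined
    BLOOM_LEAVE,  // caller should leave (e.g. with MARI_PEER_LOST_BLOOM)
} bloom_action_t;

typedef struct {
    uint8_t miss_count;  // consecutive non-empty beacons that omit this node
} bloom_debounce_t;

// An all-zero filter carries no membership information.
static bool bloom_is_empty(const uint8_t *filter, size_t len) {
    for (size_t i = 0; i < len; i++) {
        if (filter[i] != 0) {
            return false;
        }
    }
    return true;
}

// Call on join so a fresh association starts with no recorded misses.
static void bloom_debounce_reset(bloom_debounce_t *d) {
    d->miss_count = 0;
}

// Call once per received beacon. node_in_filter is the result of the
// membership test, computed by the caller (assumed, not shown here).
static bloom_action_t bloom_debounce_on_beacon(bloom_debounce_t *d,
                                               const uint8_t *filter,
                                               size_t len,
                                               bool node_in_filter) {
    // Empty beacons and beacons that contain the node both reset the counter.
    if (bloom_is_empty(filter, len) || node_in_filter) {
        d->miss_count = 0;
        return BLOOM_KEEP;
    }
    // Non-empty filter that omits the node: count it, leave only at threshold.
    d->miss_count++;
    if (d->miss_count >= MARI_BLOOM_MISS_THRESHOLD) {
        d->miss_count = 0;
        return BLOOM_LEAVE;
    }
    return BLOOM_KEEP;
}
```

In this sketch, an empty beacon arriving between two omitting beacons restarts the count, so only an uninterrupted run of omitting, non-empty beacons leads to a leave.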

A genuine eviction (the gateway permanently drops the node) still triggers a
leave, since its beacons consistently omit the node. With the 3 beacon slots at
the start of the slotframe, 3 consecutive misses span roughly one slotframe of
sustained omission.

Validation

Builds clean. Needs a testbed run (40-node formation) to confirm the bounce is
gone and formation time drops; commit 4ea35db is the reference point for the
comparison.

@geonnave

Contributor Author

Superseded by a simpler gateway-only approach: the gateway emits a pass-all (all-ones) bloom filter in beacons while the filter is being recomputed, so no node is ever falsely evicted during that window - no node-side changes (empty-guard or miss debounce) needed. See the replacement PR.
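
A rough sketch of the gateway-only idea, under stated assumptions: `GW_BLOOM_BYTES`, `gateway_bloom_t`, its `recomputing` flag, and `gateway_fill_beacon_bloom` are hypothetical names for illustration and are not taken from the replacement PR.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define GW_BLOOM_BYTES 128  // hypothetical filter size, for illustration only

typedef struct {
    uint8_t bits[GW_BLOOM_BYTES];  // last complete filter
    bool recomputing;              // hypothetical flag: true while the filter is being rebuilt
} gateway_bloom_t;

// Fill the beacon's bloom field. While a recompute is in progress, ship an
// all-ones filter so that every membership test passes on the node side.
static void gateway_fill_beacon_bloom(const gateway_bloom_t *gw,
                                      uint8_t *beacon_bloom) {
    if (gw->recomputing) {
        memset(beacon_bloom, 0xFF, GW_BLOOM_BYTES);
    } else {
        memcpy(beacon_bloom, gw->bits, GW_BLOOM_BYTES);
    }
}
```

Because any bit a node checks is set in an all-ones filter, the node-side membership test cannot fail during the recompute window, which is why this variant would need no node-side change.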
