assoc: debounce bloom-based leave to tolerate stale/empty beacons #163
Closed
geonnave wants to merge 2 commits into
AI-assisted: Claude Opus 4.8
Superseded by a simpler gateway-only approach: the gateway emits a pass-all (all-ones) bloom filter in beacons while the filter is being recomputed, so no node is ever falsely evicted during that window, and no node-side changes (empty guard or miss debounce) are needed. See the replacement PR.
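A minimal sketch of the gateway-only idea, assuming a fixed-size filter and a flag that marks an in-progress recompute. `gateway_bloom_t`, `gateway_fill_beacon_bloom` and `BLOOM_BYTES` are hypothetical names for illustration, not the firmware's API. It works because a Bloom membership test only checks that all of a node's bits are set, so an all-ones filter admits every node.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define BLOOM_BYTES 16  // hypothetical filter size in bytes

// Hypothetical gateway-side state, for illustration only.
typedef struct {
    uint8_t bits[BLOOM_BYTES];  // last completely computed filter
    bool recomputing;           // true while the filter is being rebuilt
} gateway_bloom_t;

// Copy the filter into an outgoing beacon. While a recompute is in progress,
// advertise an all-ones ("pass-all") filter instead, so every node's
// membership test succeeds and nobody is evicted during that window.
void gateway_fill_beacon_bloom(const gateway_bloom_t *gw, uint8_t out[BLOOM_BYTES]) {
    assert(gw != NULL && out != NULL);
    if (gw->recomputing) {
        memset(out, 0xFF, BLOOM_BYTES);
    } else {
        memcpy(out, gw->bits, BLOOM_BYTES);
    }
}
```

The cost of this design is that, for the duration of the recompute, a beacon cannot signal an eviction either.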
Draft - pending hardware test on the testbed.
Problem
The gateway-side availability guard (#162) stopped beacons from carrying a
partially rebuilt filter, but it does not stop a joined node from leaving in
two remaining transient cases:

- Empty filter: when the filter is not available, the guard skips the copy and
  the beacon's `bloom_filter` ships all-zero. The node's membership test fails
  against an all-zero filter, and the node leaves with `MARI_PEER_LOST_BLOOM`.
- Stale filter: the gateway marks the filter dirty when a node joins, but emits
  a beacon before it recomputes, so the beacon carries the previous complete
  filter, which does not yet include the new node. The filter is non-empty, so
  this looks like a real eviction, and the freshly joined node drops itself.
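To make both failure modes concrete, here is a minimal Bloom membership test. The filter size, the number of hashes and the splitmix64-style hash are assumptions for illustration; `bloom_add`, `bloom_contains` and `bloom_hash` are hypothetical names, and the firmware's actual hashing and layout may differ. The point is structural: the test returns true only when every bit for a node is set, so an all-zero filter rejects every node, and a filter built before a node joined will generally not contain it (barring a false positive).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define BLOOM_BITS   128u  // hypothetical filter size in bits
#define BLOOM_HASHES 2u    // hypothetical number of hash functions

// Hypothetical hash: mixes a 64-bit node ID with a seed using the
// splitmix64 finalizer, then maps it to a bit index.
static uint32_t bloom_hash(uint64_t id, uint32_t seed) {
    uint64_t z = id + 0x9E3779B97F4A7C15ull * (seed + 1u);
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ull;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBull;
    z ^= z >> 31;
    return (uint32_t)(z % BLOOM_BITS);
}

// Gateway side: set every bit belonging to this node.
void bloom_add(uint8_t filter[BLOOM_BITS / 8], uint64_t id) {
    assert(filter != NULL);
    for (uint32_t i = 0; i < BLOOM_HASHES; i++) {
        uint32_t bit = bloom_hash(id, i);
        filter[bit / 8] |= (uint8_t)(1u << (bit % 8));
    }
}

// Node side: true only if every bit for this ID is set. Against an
// all-zero filter no bit is set, so this is false for every node.
bool bloom_contains(const uint8_t filter[BLOOM_BITS / 8], uint64_t id) {
    for (uint32_t i = 0; i < BLOOM_HASHES; i++) {
        uint32_t bit = bloom_hash(id, i);
        if (!(filter[bit / 8] & (1u << (bit % 8)))) {
            return false;
        }
    }
    return true;
}
```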
Both are transient: the next beacon, sent after the gateway recomputes, includes
the node again. Leaving on a single beacon turns these transients into a
join/leave bounce, which accounts for most of the remaining slowness in network
formation.
Change
Make the node tolerant of transient beacons instead of acting on a single one:
the node only leaves after `MARI_BLOOM_MISS_THRESHOLD` (3) consecutive beacons
that omit it. Any beacon that contains the node, or any empty beacon, resets the
counter. The counter is also reset on join.
A genuine eviction (the gateway permanently drops the node) still triggers,
since its beacons consistently omit the node. With the 3 beacon slots at the
start of the slotframe, 3 consecutive misses span roughly one slotframe of
sustained omission.
Validation
Builds clean. Needs a testbed run (40-node formation) to confirm that the bounce
is gone and formation time drops; commit `4ea35db` is the reference point for
comparison.