fix(service): wait for finalization stage before computing the participants root #72
0xHansLee wants to merge 1 commit into
The kernel computed the participants root from registrations read at its light-client trusted height, which can lag the chain tip. When a dealer is invalidated in the same block that advances the round to the finalization stage, a lagging read still counts that dealer as an effective participant, so the kernel's root disagrees with the chain's post-invalidation set and finalize is rejected with a participants-root mismatch. Before computing the root, wait until the light client has observed the finalization stage (the block that performs missing-dealer invalidation), then read the registrations, retrying on lag and surfacing ErrLightClientLag if the stage is never observed.
The fix is logically sound: waiting for the light client to observe the finalization stage before reading registrations eliminates the stale-read window that caused participants-root mismatches. Test coverage is good (4 targeted cases, including the critical "never read while lagging" invariant). One reliability concern worth noting: the new retry loop holds the gRPC goroutine for up to 10 s while using
Review iteration 1 · Commit dd7eca0 · 2026-06-12T02:36:01Z
Problem
The kernel reads participant registrations at its light-client trusted height, which can lag the chain tip. Missing-dealer invalidation happens in the same block that advances the round to the finalization stage, so a lagging read still counts an invalidated dealer as an effective participant. The kernel's participants root then disagrees with the chain's post-invalidation set and finalize is rejected with a participants-root mismatch, stalling the round.
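To make the mismatch concrete, here is a minimal sketch of a lagging read producing a different root. Everything in it is an assumption for illustration: the dealer IDs, the heights, the `effectiveAt` and `rootsMatch` helpers, and in particular `participantsRoot`, which is a stand-in SHA-256 digest and not the root construction used by the kernel or the chain.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sort"
	"strings"
)

// participantsRoot is a stand-in digest (SHA-256 over the sorted IDs).
// The real root construction is not described in this PR.
func participantsRoot(ids []string) [32]byte {
	sorted := append([]string(nil), ids...)
	sort.Strings(sorted)
	return sha256.Sum256([]byte(strings.Join(sorted, "\n")))
}

// effectiveAt returns the hypothetical effective participants visible at a
// given height, where "dealer-b" is invalidated in the block at invalidatedAt.
func effectiveAt(height, invalidatedAt int64) []string {
	ids := []string{"dealer-a", "dealer-c"}
	if height < invalidatedAt {
		// A read below the invalidation block still counts dealer-b.
		ids = append(ids, "dealer-b")
	}
	return ids
}

// rootsMatch compares the root computed at the light client's trusted height
// with the root computed at the chain tip.
func rootsMatch(trustedHeight, tipHeight, invalidatedAt int64) bool {
	kernelRoot := participantsRoot(effectiveAt(trustedHeight, invalidatedAt))
	chainRoot := participantsRoot(effectiveAt(tipHeight, invalidatedAt))
	return kernelRoot == chainRoot
}

func main() {
	const finalizationHeight = 100 // block that invalidates and advances the stage

	// Trusted height one block behind the tip: the roots disagree.
	fmt.Println(rootsMatch(99, 100, finalizationHeight))

	// Trusted height has caught up to the finalization block: the roots agree.
	fmt.Println(rootsMatch(100, 100, finalizationHeight))
}
```

In this toy model a single block of lag is enough to change the set, which is why the read has to be gated on the stage transition rather than on a fixed delay.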
Fix
Before computing the participants root, wait until the light client has observed the finalization stage (the block that performs invalidation), then read the registrations:
- waitForFinalizationRegistrations polls GetDKGNetwork().Stage until it reaches DKG_STAGE_FINALIZATION, then queries registrations. Since the trusted height is monotonic, observing that stage guarantees the read reflects the invalidation, so the root matches the chain's set.
- It retries while the light client lags and surfaces ErrLightClientLag if the stage is never observed.
Tests
waitForFinalizationRegistrations: waits for stage / immediate / retry-exhausted (never reads a stale set while lagging) / network-error not retried.
issue: #71
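The wait-then-read behaviour described under Fix could look roughly like the sketch below. It is not the code from this PR: the `chainReader` interface, the `Stage` type, the `fakeReader` test double, the `runExample` helper, and the attempt and interval parameters are all hypothetical, and `ErrLightClientLag` is redefined locally only so the example compiles on its own.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// Stage is a hypothetical stand-in for the DKG stage enum.
type Stage int

const (
	StageDealing      Stage = iota // assumed earlier stage
	StageFinalization              // stands in for DKG_STAGE_FINALIZATION
)

// ErrLightClientLag is named in the PR; this local definition is an assumption.
var ErrLightClientLag = errors.New("light client has not observed the finalization stage")

// chainReader is a hypothetical view over light-client-verified queries.
type chainReader interface {
	Stage(ctx context.Context) (Stage, error)
	Registrations(ctx context.Context) ([]string, error)
}

// waitForFinalizationRegistrations polls the stage and reads registrations
// only after the finalization stage has been observed.
func waitForFinalizationRegistrations(ctx context.Context, r chainReader, attempts int, interval time.Duration) ([]string, error) {
	for i := 0; i < attempts; i++ {
		stage, err := r.Stage(ctx)
		if err != nil {
			return nil, err // network errors are returned, not retried
		}
		if stage == StageFinalization {
			return r.Registrations(ctx) // safe: the read reflects the invalidation
		}
		if i == attempts-1 {
			break // no point sleeping after the last attempt
		}
		select {
		case <-ctx.Done():
			return nil, ctx.Err()
		case <-time.After(interval):
		}
	}
	return nil, ErrLightClientLag
}

// fakeReader lags for lagPolls stage queries, then reports finalization.
type fakeReader struct {
	lagPolls, stageCalls, regCalls int
}

func (f *fakeReader) Stage(context.Context) (Stage, error) {
	f.stageCalls++
	if f.stageCalls <= f.lagPolls {
		return StageDealing, nil
	}
	return StageFinalization, nil
}

func (f *fakeReader) Registrations(context.Context) ([]string, error) {
	f.regCalls++
	return []string{"dealer-a", "dealer-c"}, nil
}

// runExample returns the registrations, the number of registration reads, and the error.
func runExample(lagPolls, attempts int) ([]string, int, error) {
	f := &fakeReader{lagPolls: lagPolls}
	regs, err := waitForFinalizationRegistrations(context.Background(), f, attempts, time.Millisecond)
	return regs, f.regCalls, err
}

func main() {
	regs, reads, err := runExample(2, 5) // catches up on the third poll
	fmt.Println(regs, reads, err)

	_, reads, err = runExample(10, 3) // never catches up within three polls
	fmt.Println(reads, errors.Is(err, ErrLightClientLag))
}
```

The second call shows the invariant the tests call out: when retries are exhausted, the registrations are never read, so a stale set cannot leak into the root computation. Passing a context with a deadline is one way a caller could bound how long the loop holds its goroutine.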