
fix(service): wait for finalization stage before computing the participants root#72

Open
0xHansLee wants to merge 1 commit into main from fix/dkg-finalize-wait-invalidated

@0xHansLee

Collaborator

Problem

The kernel reads participant registrations at its light-client trusted height, which can lag the chain tip. Missing-dealer invalidation happens in the same block that advances the round to the finalization stage, so a lagging read still counts an invalidated dealer as an effective participant. The kernel's participants root then disagrees with the chain's post-invalidation set and finalize is rejected with a participants-root mismatch, stalling the round.
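Here is a minimal Go sketch of the failure mode. The `Registration` type, its `Invalidated` flag, and the SHA-256-over-sorted-list commitment are hypothetical stand-ins, and the kernel's real root construction may differ. The sketch only shows that a read taken one block before invalidation commits to a different participant set than the chain does.

```go
package main

import (
	"crypto/sha256"
	"sort"
	"strings"
)

// Registration is a hypothetical, simplified view of an on-chain DKG
// registration; the real type and its fields are assumptions.
type Registration struct {
	Validator   string
	Invalidated bool // set by missing-dealer invalidation
}

// effectiveParticipants returns the sorted validators that are still valid.
func effectiveParticipants(regs []Registration) []string {
	var out []string
	for _, r := range regs {
		if !r.Invalidated {
			out = append(out, r.Validator)
		}
	}
	sort.Strings(out)
	return out
}

// participantsRoot is an illustrative commitment: SHA-256 over the sorted,
// newline-joined participant list. A real implementation might use a Merkle
// tree instead; only "different sets give different roots" matters here.
func participantsRoot(regs []Registration) [32]byte {
	return sha256.Sum256([]byte(strings.Join(effectiveParticipants(regs), "\n")))
}
```

A lagging read sees dealer `c` as still valid, so its root differs from the root over the post-invalidation set `{a, b}`.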

Fix

Before computing the participants root, wait until the light client has observed the finalization stage (the block that performs invalidation), then read the registrations:

  • waitForFinalizationRegistrations polls GetDKGNetwork().Stage until it reaches DKG_STAGE_FINALIZATION, then queries registrations. Since the trusted height is monotonic, observing that stage guarantees the read reflects the invalidation, so the root matches the chain's set.
  • Retries on lag (5 × 2s) and returns ErrLightClientLag if the stage is never observed.

Tests

  • waitForFinalizationRegistrations: covers waiting for the stage, the stage already reached (immediate read), retries exhausted (never reads a stale set while lagging), and a network error that is not retried.
  • Helper at 100% coverage; service suite passes.

issue: #71

The kernel computed the participants root from registrations read at its
light-client trusted height, which can lag the chain tip. When a dealer is
invalidated in the same block that advances the round to the finalization
stage, a lagging read still counts that dealer as an effective participant,
so the kernel's root disagrees with the chain's post-invalidation set and
finalize is rejected with a participants-root mismatch.

Before computing the root, wait until the light client has observed the
finalization stage (the block that performs missing-dealer invalidation),
then read the registrations, retrying on lag and surfacing ErrLightClientLag
if the stage is never observed.
@jinn-agent

jinn-agent Bot commented Jun 12, 2026


The fix is logically sound: waiting for the light client to observe the finalization stage before reading registrations eliminates the stale-read window that caused participants-root mismatches. Test coverage is good (4 targeted cases, including the critical "never read while lagging" invariant).

One reliability concern worth noting: the new retry loop holds the gRPC goroutine for up to 10 s while using context.Background() for all RPCs and sleeping without a cancellation check. The parent FinalizeDKG already discards its context (_ context.Context), so this is consistent with the rest of the file, but the new polling window amplifies the impact — if the validator's gRPC client times out before the 10 s budget is exhausted, the server keeps making RPCs and holding the goroutine with no way to bail out early.


Review iteration 1 · Commit dd7eca0 · 2026-06-12T02:36:01Z


2 participants