fix: QBFT#417
Conversation
|
@claude review the changes |
|
Claude finished @iamquang95's task in 7m 6s —— View job Claude PR Review — fix: QBFT
SummarySolid hardening PR. Three real fixes (FakeClock Strengths
Issues & Questions1.
|
varex83agent
left a comment
There was a problem hiding this comment.
Solid hardening pass on QBFT — the chain-split tests, fuzz cases, and Byzantine-input filters meaningfully tighten the implementation, and the compare refactor finally lets the round timer win over a blocked callback. A few items deserve attention before this fully matches Charon's robustness:
Minor (worth addressing):
comparecallback exiting/panicking without sending on the err channel kills the QBFT instance (UnexpectedCompareErrorpath). Charon Go quietly waits for the round timer instead — Pluto is strictly less resilient.- Each
UPON_JUSTIFIED_PRE_PREPAREnow spawns a detached OS thread. A callback that ignores the cancellation token leaks a thread per invocation; over hours this becomes a slow-burn DoS. (Documented as Charon parity, but OS threads aren't goroutines.) thread::sleep(Duration::from_millis(10))is used as test synchronization incompare_parent_cancel_cancels_callback_token— replace with acompare_started_rxsignal like the sister test already does.make_is_leaderreshapes Go's modulo so process 4 wins at round 4 where Charon would elect no leader. Hidden test-only parity gap that may mask bugs in chain-split scenarios.- The new
valid_round_change_prepared_roundByzantine check rejects messages Charon currently accepts — coordinate with the Charon team to add the same check upstream, or log a warning when it fires.
Plus several nits (naming, a dead filter, missing docs, missing boundary tests). Verdict: COMMENT — none of these block merge, but the compare-callback robustness deserves a follow-up.
| @@ -98,56 +98,86 @@ impl FakeClock { | |||
| } | |||
| } | |||
There was a problem hiding this comment.
nit: the previous impl Drop for FakeClock auto-cancelled timers when the clock went out of scope. Removing it is correct for the new test pattern (FakeClock is Clone; threads hold their own clones, so one clone being dropped must not cancel the others), but no comment explains the design. Add a one-line comment near cancel documenting the explicit-cleanup expectation so a future contributor doesn't re-introduce Drop — and consider a test-helper macro that wraps the cts.cancel(); clock.cancel(); panic!(...) boilerplate, since every panic path in test_qbft / test_qbft_chain_split must remember to cancel both before panicking.
This PR