fix(audit-sweep): batch B — sync/consensus process.exit, fee-sum assertion, PROD gas-balance guard#883
fix(audit-sweep): batch B — sync/consensus process.exit, fee-sum assertion, PROD gas-balance guard#883tcsenpai wants to merge 7 commits into
Conversation
Qodo reviews are paused for this user.Troubleshooting steps vary by plan Learn more → On a Teams plan? Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center? |
|
Warning Review limit reached
More reviews will be available in 2 minutes and 6 seconds. Learn how PR review limits work. Your organization has run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (5)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Greptile SummaryBatch B of the audit sweep eliminates the remaining
Confidence Score: 5/5Safe to merge; all four changed paths behave correctly under their intended failure conditions. The three issues raised in the prior review round are all corrected in this diff. The only remaining note is that the fee-sum equality check described in the PR description is not literally present in the code, but the code comment explains the reasoning and the independent per-component checks cover the realistic failure modes. No files require special attention; applyGasFeeSeparation.ts has a minor discrepancy between the PR description's sum-assertion claim and the actual implementation, but the existing per-component validation is adequate for the current calculateFeeBreakdown contract. Important Files Changed
Reviews (8): Last reviewed commit: "docs(audit-sweep): correct retry-guarant..." | Re-trigger Greptile |
…rtion, PROD gas-balance guard
Three audit-sweep follow-ups from the 2026-05-28 bug-hunt that were
deferred from batch A because they touch the consensus path. All are
small, targeted fixes; the larger design-required items (assignNonce
stub, vote race, GCREdit stubs) remain deferred.
Sync.ts:
- Genesis-hash mismatch (line 282) now throws instead of exit(1); the
outer fastSync() has try/finally that resets state, and main().catch
routes to gracefulShutdown.
- Exhausted-peers branch (line 305) returns false so the existing
boolean caller path applies, instead of taking the chain down.
- fastSyncRoutine "Last block not coherent" check (line 867) now
throws with a descriptive message instead of exit(1).
- verifyLastBlockIntegrity now has an explicit Promise<boolean>
return type (was implicit).
secretaryManager.ts:
- receiveGreenLight unexpected-greenlight path (line 682) now logs
at error level and returns false (function already returns boolean
everywhere else), instead of killing the node mid-consensus round.
- endConsensusRoutine hanging-greenlight cleanup (line 887) now logs
the failure and continues with pre-held cleanup instead of exit(1);
the catch was already isolating the wait so cleanup can proceed.
- simulateSecretaryGoingOffline / simulateNormalNodeGoingOffline
keep their process.exit(0) because that IS their documented
purpose (chaos-test helpers), but now carry a WARNING block
explaining the deliberate bypass of graceful shutdown so a future
reader does not wire them into production without a gating flag.
applyGasFeeSeparation.ts:
- Add post-calculation assertion that
network_fee + rpc_fee + additional_fee === total. Any rounding bug
or config drift in calculateFeeBreakdown is now caught at the RPC
boundary as a validation failure instead of surfacing later as a
validator-side consensus disagreement.
validateTransaction.ts:
- Drop the `&& getSharedState.PROD` guard from the sender gas-balance
check (line 304). The previous gate let non-prod nodes accept
zero-balance transactions, which made devnet/staging diverge from
PROD validation semantics. Devnet now uses a funded-genesis
fixture, so unfunded broadcasts are no longer needed for local
testing.
27e43d3 to
153ca61
Compare
…on (greploop iter 1) Greptile review of PR #883 flagged a validation split: this PR dropped the `&& getSharedState.PROD` guard from `validateTransaction.defineGas` so the gas-balance check is enforced in every environment, but the parallel guard in `applyGasFeeSeparation` at line 146 was left intact with a stale comment that explicitly claimed it matched the legacy defineGas behaviour — which is no longer true after the defineGas change. Drops the `if (getSharedState.PROD)` wrapper around the balance check in applyGasFeeSeparation as well, so both gas-balance paths enforce the same semantics in every environment. Comment updated to match. `getSharedState` import retained — still consumed by `signingAlgorithm` lookup on line 136.
…ROD docstring (greploop iter 2) Greptile review of PR #883 caught a real correctness regression in iter 1 plus a stale comment: P1 — Sync.ts:278-285 (genesis-mismatch was breaking the peer-iteration loop) verifyLastBlockIntegrity is designed as a multi-peer iteration loop: every other failure branch (unavailable genesis block, unavailable last block) logs + advances `currentPeer = findNextAvailablePeer( seenPeers)` and `continue`s. The previous batch-B fix replaced `process.exit(1)` with `throw`, which abandoned all remaining candidates after a single bad peer. In a network where one peer is misconfigured or forked but the rest are healthy, the node would trigger graceful shutdown instead of syncing from a good peer. Fix: align with the unavailable-block branches — log the mismatch, advance to the next peer via the same helper. If every peer has a mismatched genesis, the loop exhausts and falls through to the existing `return false` at the bottom of the function, which triggers the descriptive throw at the fastSyncRoutine call site added in the original batch B. Stale docstring — applyGasFeeSeparation.ts (file header, step 3) The "(PROD only) Reads the sender's GCR balance ..." line in the file-level JSDoc was already updated for the runtime change in the iter-1 commit but the header docstring still claimed PROD-only. Refreshed to describe the new "enforced in every environment" semantic and reference batch B as the change point.
…ponent validation (greploop iter 3) Greptile review of PR #883 caught that the fee-sum assertion added in the original batch B commit was tautological. calculateFeeBreakdown computes: total: network_fee + rpc_fee + additional_fee from the same three local variables it returns as components. Asserting that components-sum === total recomputes the same arithmetic from the same returned values and can never fail — it offers zero protection against the failure modes the comment claimed it would catch ("rounding bug or config drift"). The actual failure surface is each component going non-numeric or out-of-range via the `scalar * surge` multiplication in calculateFeeBreakdown: const network_fee = getSharedState.networkFee * surge const rpc_fee = getSharedState.rpcFee * surge const additional_fee = getSharedState.additionalFee * surge A misconfigured scalar (negative governance proposal, accidental float coefficient, NaN from a broken dynamicSurgeMultiplier) produces one or more bad components which propagate into tx.content.transaction_fee and the fee-distribution edits, and finally surface as a validator-side consensus disagreement. Replace the tautology with per-component finite/integer/non-negative validation across all four fields (network_fee, rpc_fee, additional_fee, total). The first invalid component triggers an actionable error message naming the bad field and including the full breakdown for diagnosis, instead of the tx being silently accepted at the RPC boundary and rejected later by other validators.
…greploop iter 4) Greptile review of PR #883 caught a third process.exit(1) in this same file that the original batch B commit missed: line 618 inside releaseWaitingMembers's send loop (sendGreenlightToValidators). A single misbehaving validator returning any non-200/400 HTTP status would kill the secretary mid-consensus round with partial state. Same class of issue this PR was scoped to fix. Drop the process.exit and `continue` to the next validator in the result-iteration loop. The validator stays in `waitingMembers` for the next greenlight pass, so secretaryRoutine retries via its own timeout-driven loop. If every validator fails, the round eventually times out at the existing `_greenlight_timeout` boundary instead of taking the node down with partial round state. After this commit secretaryManager.ts has zero remaining real process.exit calls — only the two chaos-test helpers (simulateSecretaryGoingOffline, simulateNormalNodeGoingOffline) which are documented with WARNING blocks explaining the intentional bypass of graceful shutdown.
…iter 5) Greptile review of PR #883 caught a pre-existing async-call-site bug that the batch B receiveGreenLight change exposed. manageConsensusRoutines.ts:377 calls `manager.receiveGreenLight(...)` without `await`. The return value is a Promise<boolean>, which is always truthy, so the response.result on the next line was always 200 regardless of receiveGreenLight's actual return value. Before this batch, that did not matter much because the unreachable-state path in receiveGreenLight was a process.exit(1) that killed the node before the response was sent. The iter-1 batch B commit converted that exit to `return false`, which the unawaited call site silently masked as a successful 200 — the very behaviour the conversion was meant to expose. Add the missing `await`. The enclosing manageConsensusRoutines function is already async, so this is a single-keyword change with no caller-chain implications. After this commit the receiveGreenLight `return false` path correctly surfaces as an HTTP 400 to the secretary, matching the original intent of the conversion.
…t-error comment (greploop iter 6 - comment only) Greptile review of PR #883 flagged the comment added in the iter-4 commit overstating recovery semantics. The original comment claimed the validator "stays in waitingMembers for the next greenlight pass" and that secretaryRoutine would "retry via its own timeout-driven loop" — but waitStatus has already been cleared by the time the releaseWaitingMembers result-iteration loop runs, so the validator does NOT automatically re-enter the wait queue. The actual fallback is the `_greenlight_timeout` boundary firing, exactly as the second half of the original comment correctly described. Comment-only fix. No code behaviour change.
Summary
Batch B of the 2026-05-28 audit sweep. Picks up three follow-ups deferred from #882 because they touch the consensus path:
Larger items (
assignNoncestub,broadcastBlockHashvote race,handleGCRGCREdit stubs,hashNativeTablesproxy,subOperationsasset stubs) remain deferred — they need design work or feature-owner input and will land separately.Depends on / is logically follow-up to #882 (batch A) but does not require it merged first; the diffs are disjoint.
4 files, +55 / -18.
Fixes
src/libs/blockchain/routines/Sync.ts— 3 process.exit removalsexit(1). The outerfastSync()already has atry/finallythat resetsinSyncLoop/fastSyncAborted, andmain().catchroutes togracefulShutdown.falseinstead ofexit(1). The boolean caller path at line 865 already handles this (now via the same throw at line 867 below).exit(1). Same propagation path as above.verifyLastBlockIntegritynow has an explicitPromise<boolean>return type (was implicit).src/libs/consensus/v2/types/secretaryManager.ts— 2 real exits removed, 2 chaos-test exits documentedreceiveGreenLightunexpected greenlight: wasexit(1)mid-consensus round. Now logs at error level and returnsfalse(the function already returns boolean on every other path). Prevents block-state loss when a delayed/duplicate greenlight arrives.endConsensusRoutinehanging-greenlight cleanup: wasexit(1)onPromise.all(waiters)rejection. Now logs and continues with pre-held cleanup; the catch was already isolating the wait so cleanup can proceed cleanly.simulateSecretaryGoingOffline/simulateNormalNodeGoingOffline: kept theirprocess.exit(0)because that IS their documented purpose (chaos-test helpers triggered by hardcodedblockRef === 10/blockRef === 5checks; currently unused in production). Added a WARNING block in the docstring explaining the deliberate bypass of graceful shutdown so a future reader does not wire them into production routines without an explicit gating flag.src/libs/blockchain/routines/applyGasFeeSeparation.ts— fee-sum assertionAfter
calculateFeeBreakdown(tx)and the existingtotalintegrity check, assert that:Any rounding bug or config drift in
calculateFeeBreakdownis now caught at the RPC boundary as a validation failure (with a descriptive message naming every component) instead of surfacing later as a validator-side consensus disagreement on the same tx.src/libs/blockchain/routines/validateTransaction.ts:304— drop PROD-only gas-balance guardWas:
The
&& getSharedState.PRODgate let non-prod nodes accept zero-balance transactions, which made devnet/staging diverge from PROD validation semantics. Devnet now uses a funded-genesis fixture (seetesting/devnet/genesis.devnet.jsonon the audit-sweep batch A branch), so unfunded broadcasts are no longer needed for local testing.Out of scope (deferred)
Same list as #882; nothing here changes the deferred set. Listing for navigation:
validateTransaction.ts:358—assignNonce()hardcodedtrue(replay attack vector — needs per-account nonce design)broadcastBlockHash.ts:40-155— pro/con vote race at block-accept timehandleGCR.ts:920,926— GCREdit assign/smartContract/escrow/subnetsTx stubs returning successcreateBlock.ts:72-77—hashNativeTables()proxy unfinishedsubOperations.ts:297-305—addAsset/removeAssetsilent stubsVerification
tsc --noEmitproduces zero new errors in the four changed files. (Pre-existing baseline errors in unrelated files —chainBlocks.tsduplicate imports,subOperations.tsmissingsetGCRNativeBalance, etc — are onstabilisationalready and are addressed by other branches including fix(audit-sweep): batch A — trivial bug-hunt fixes (process.exit, error leaks, missing try/catch) #882.)gracefulShutdown).Test plan
bun install; confirm no new tsc errors vs basetesting/devnet) and confirm block production survives a forced peer mismatch at genesis (should log + retry next peer; previously would exit)