Feat: Sequencer Final PR #966
Conversation
There was a problem hiding this comment.
Claude Code Review
This repository is configured for manual code reviews. Comment @claude review to trigger a review and subscribe this PR to future pushes, or @claude review once for a one-time review.
Tip: disable this comment in your organization's Code Review settings.
📝 WalkthroughWalkthroughMigrates L1Sequencer to a scheduled history model, regenerates bindings, implements SPEC-005 derivation (local/layer1 verify, reorg detection, finalized anchoring, tag advancer), adds Raft-based HA (FSM, service, RPC), rewires node startup/executor/syncer/retryable client, updates CLI/ops tooling, and bumps dependency pins. ChangesL1Sequencer history model and bindings
Derivation SPEC-005, reorg/finalization, HA service, and node wiring
CLI flags cleanup (remove validator/blocktag; add HA/derivation)
Ops/devnet, Docker, HA test/perf tooling
Build and module dependency updates
Estimated code review effort Possibly related PRs
Suggested reviewers
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
⚔️ Resolve merge conflicts
|
There was a problem hiding this comment.
Note
Due to the large number of review comments, Critical severity comments were prioritized as inline comments.
♻️ Duplicate comments (2)
tx-submitter/go.mod (1)
5-14:⚠️ Potential issue | 🟡 MinorVerify the version mismatch between
requireandreplace.Same pattern as in
ops/tools/go.modandoracle/go.mod: therequiredirective usesv0.5.0whilereplacepoints to the pseudo-versionv0.0.0-20260529141627-eb5fbf8f9748.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tx-submitter/go.mod` around lines 5 - 14, The go.mod has a version mismatch: the require entry for github.com/morph-l2/go-ethereum is v0.5.0 while the replace points to a pseudo-version commit v0.0.0-20260529141627-eb5fbf8f9748 (and similarly for github.com/morph-l2/tendermint); fix by making the require and replace consistent—either change the require for github.com/morph-l2/go-ethereum (and github.com/morph-l2/tendermint if needed) to the same pseudo-version used in replace, or update the replace directives to point to the tagged v0.5.0 release (or remove the replace if unnecessary) so module github names match exactly.oracle/go.mod (1)
5-12:⚠️ Potential issue | 🟡 MinorVerify the version mismatch between
requireandreplace.Same pattern as in
ops/tools/go.mod: therequiredirective usesv0.5.0whilereplacepoints to the pseudo-versionv0.0.0-20260529141627-eb5fbf8f9748.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@oracle/go.mod` around lines 5 - 12, The go.mod has a version mismatch: the require entry for github.com/morph-l2/go-ethereum lists v0.5.0 while the replace maps it to a pseudo-version; update the require to the same pseudo-version used in the replace (github.com/morph-l2/go-ethereum v0.0.0-20260529141627-eb5fbf8f9748) so the require and replace agree (also check and align the github.com/tendermint/tendermint require/replace entries if present); edit the require block accordingly to use the pseudo-versions referenced by the replace lines.
🟠 Major comments (22)
node/core/executor.go-374-390 (1)
374-390:⚠️ Potential issue | 🟠 Major | ⚡ Quick winData race on
e.syncerfield access.
Syncer()readse.syncerwithout synchronization whileSetSyncer()writes to it. If called concurrently (e.g., derivation checking syncer availability while executor is being wired up), this is a data race.Consider using the existing
muor an atomic pointer:🔒 Proposed fix using RWMutex
func (e *Executor) Syncer() *sync.Syncer { + e.mu.RLock() + defer e.mu.RUnlock() return e.syncer } func (e *Executor) SetSyncer(s *sync.Syncer) { + e.mu.Lock() + defer e.mu.Unlock() if e.syncer != nil { return } e.syncer = s e.l1MsgReader = s }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@node/core/executor.go` around lines 374 - 390, Syncer() reads the executor's e.syncer without synchronization while SetSyncer() writes it, causing a data race; fix by guarding accesses to e.syncer with the executor's existing mutex (mu) or by switching e.syncer to an atomic pointer: acquire mu.RLock() in Syncer() (or use atomic.LoadPointer for e.syncer) and mu.Lock() in SetSyncer() (or atomic.StorePointer) to ensure safe concurrent read/write on e.syncer; update Syncer(), SetSyncer(), and any other direct e.syncer reads/writes to use the chosen synchronization method.contracts/deploy/022-SequencerInit.ts-38-42 (1)
38-42:⚠️ Potential issue | 🟠 Major | ⚡ Quick winSeed L1Sequencer history during deployment (missing initializeHistory leaves proxy unusable).
contracts/deploy/022-SequencerInit.tsupgrades the proxy and calls onlyinitialize(owner)—there is no subsequentinitializeHistory(...)call anywhere in the deploy/initialize flow (contracts/tasks/deploy.ts), so the upgraded contract starts with an emptysequencerHistory.In
contracts/contracts/l1/L1Sequencer.sol,getSequencer/getSequencerAtrevert whensequencerHistory.length == 0, andupdateSequencerrequires the contract to be initialized viainitializeHistory().Either call
initializeHistory(firstSequencer, upgradeL2Block)inSequencerInitwhen the first sequencer / activation block are known, or add an explicit, ordered next deployment step that invokes it.Current upgrade call
await IL1SequencerProxy.upgradeToAndCall( L1SequencerImplAddress, L1SequencerFactory.interface.encodeFunctionData('initialize', [owner]) )🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@contracts/deploy/022-SequencerInit.ts` around lines 38 - 42, The sequencer proxy is upgraded only calling initialize(owner) which leaves sequencerHistory empty and makes getSequencer/getSequencerAt revert; update the deployment to seed history by calling initializeHistory(firstSequencer, upgradeL2Block) — either include initializeHistory in the same upgradeToAndCall payload (use IL1SequencerProxy.upgradeToAndCall with L1SequencerFactory.interface.encodeFunctionData to call initializeHistory after initialize) or add a follow-up deploy step that invokes initializeHistory on the upgraded contract; ensure you pass the correct firstSequencer and upgradeL2Block values to initializeHistory so updateSequencer/getSequencer work.ops/docker-sequencer-test/run-perf-test.sh-173-183 (1)
173-183:⚠️ Potential issue | 🟠 Major | ⚡ Quick winIncorrect Docker Compose service names.
The script references
morph-geth-0,morph-geth-1, etc. but the compose files (docker-compose-4nodes.yml,docker-compose.override.yml) define these services asmorph-el-0,morph-el-1, etc. Thesestopandupcommands will fail to match the intended services.Proposed fix
# Stop any existing - $COMPOSE_HA stop morph-geth-0 morph-geth-1 morph-geth-2 morph-geth-3 \ + $COMPOSE_HA stop morph-el-0 morph-el-1 morph-el-2 morph-el-3 \ node-0 node-1 node-2 node-3 2>/dev/null || true # Clean Raft state for fresh cluster rm -rf .devnet/node0/raft .devnet/node1/raft .devnet/node2/raft 2>/dev/null || true # Start geth nodes log_info "Starting geth nodes..." - $COMPOSE_HA up -d morph-geth-0 morph-geth-1 morph-geth-2 morph-geth-3 + $COMPOSE_HA up -d morph-el-0 morph-el-1 morph-el-2 morph-el-3🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@ops/docker-sequencer-test/run-perf-test.sh` around lines 173 - 183, The stop/up commands reference wrong service names; update the service names passed to $COMPOSE_HA in run-perf-test.sh from morph-geth-0/1/2/3 to the actual compose service names morph-el-0/morph-el-1/morph-el-2/morph-el-3 (both in the $COMPOSE_HA stop and $COMPOSE_HA up invocations) so the correct Docker Compose services are stopped/started; leave the node-0..node-3 entries and the raft cleanup logic unchanged.ops/docker-sequencer-test/run-test.sh-534-559 (1)
534-559:⚠️ Potential issue | 🟠 Major | ⚡ Quick winUndefined variable
BITGET_ROOT.
build_malicious_imageuses$BITGET_ROOT(lines 536, 546, 549, 556) but this variable is not defined in this script. Based on line 9, the correct variable should be$POLYREPO_ROOT.Proposed fix
build_malicious_image() { log_info "Building malicious node image from test/p2p-security branch..." - cd "$BITGET_ROOT" + cd "$POLYREPO_ROOT" # Save current tendermint branch state cd tendermint local original_branch original_branch=$(git branch --show-current) git stash 2>/dev/null || true # Switch to malicious branch git checkout test/p2p-security - cd "$BITGET_ROOT" + cd "$POLYREPO_ROOT" # Build using same Dockerfile, different tag docker build -t morph-node-malicious:latest \ -f morph/ops/docker-sequencer-test/Dockerfile.l2-node-test . # Switch back cd tendermint git checkout "$original_branch" git stash pop 2>/dev/null || true - cd "$BITGET_ROOT" + cd "$POLYREPO_ROOT" log_success "Malicious image built!" }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@ops/docker-sequencer-test/run-test.sh` around lines 534 - 559, In build_malicious_image() replace uses of the undefined BITGET_ROOT with the correct POLYREPO_ROOT: update all cd "$BITGET_ROOT" invocations and any references to that variable inside the function (lines around the git/tendermint operations and docker build) to cd "$POLYREPO_ROOT" so the script uses the same root variable declared earlier; ensure no other occurrences of BITGET_ROOT remain in that function.ops/docker-sequencer-test/run-perf-test.sh-454-461 (1)
454-461:⚠️ Potential issue | 🟠 Major | ⚡ Quick winSame incorrect service names in
do_stop.Proposed fix
do_stop() { log_section "Stopping all containers" stop_tx_load cd "$DOCKER_DIR" - $COMPOSE_HA stop morph-geth-0 morph-geth-1 morph-geth-2 morph-geth-3 \ + $COMPOSE_HA stop morph-el-0 morph-el-1 morph-el-2 morph-el-3 \ node-0 node-1 node-2 node-3 2>/dev/null || true log_ok "Stopped" }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@ops/docker-sequencer-test/run-perf-test.sh` around lines 454 - 461, do_stop currently calls $COMPOSE_HA stop with incorrect service names (the same wrong names repeated); update the service list in the $COMPOSE_HA stop invocation inside do_stop to exactly match the services used when bringing containers up (check the docker-compose service names or the do_start function) so the correct containers are stopped—modify the call that currently lists "morph-geth-0 morph-geth-1 morph-geth-2 morph-geth-3 node-0 node-1 node-2 node-3" to the exact service identifiers defined in the compose file (or used elsewhere in this script).node/derivation/derivation.go-859-877 (1)
859-877:⚠️ Potential issue | 🟠 Major | ⚡ Quick winPropagate
StartReactorsAfterReorgfailures instead of only logging them.If reactor restart fails here, the caller still treats the reorg/self-heal as successful, advances metrics/cursor, and keeps running with Tendermint reactors down. That leaves the node in a degraded state without surfacing the failure to the control flow.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@node/derivation/derivation.go` around lines 859 - 877, The deferred restart currently only logs StartReactorsAfterReorg failures; change it to propagate that error to the caller by using a named error return (e.g., retErr or err) for the surrounding function and, inside the defer, if startErr != nil then set the named return error (wrapping startErr with context) instead of only logging; keep the existing logging but ensure you import/used fmt or errors to wrap (e.g., fmt.Errorf("StartReactorsAfterReorg failed: %w", startErr)) so the caller can react to the failure.node/flags/flags.go-254-258 (1)
254-258:⚠️ Potential issue | 🟠 Major | ⚡ Quick winAvoid a fail-open default for HA write API auth.
The new flag explicitly makes HAKeeper write APIs unauthenticated when
ha.rpc-tokenis empty. In any deployment where that management endpoint is reachable off-host, this turns HA membership/control operations into an unauthenticated network surface. Please require a token when HA is enabled, or fail closed unless the server is bound to loopback only.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@node/flags/flags.go` around lines 254 - 258, The SequencerHARPCToken flag currently allows an empty value which leaves HAKeeper write APIs unauthenticated; update startup validation to reject an empty SequencerHARPCToken when HA is enabled (i.e., whenever HAKeeper/HA RPC is configured to listen) by returning a fatal error and refusing to start unless a token is provided, or alternatively allow empty only if the HA RPC bind address is strictly loopback; implement this check where flags are processed/startup logic (the code path that reads SequencerHARPCToken/"ha.rpc-token" and the HA RPC listen address) and produce a clear fatal error explaining that a token is required unless bound to loopback.node/derivation/finalizer.go-150-152 (1)
150-152:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy liftDon't finalize a later batch index against an older safe head.
This branch intentionally anchors finalized to the local safe head, but it still passes
committedAtFin.Uint64(). If L1 finalized has committed batches beyond what this node has verified locally, you'll record a finalized batch index that does not correspond tosafeHash/safeNum, which can break the tag-advancer's monotonicity/dedup assumptions.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@node/derivation/finalizer.go` around lines 150 - 152, The current branch anchors finalized to the local safe head but still passes committedAtFin.Uint64() into d.tagAdvancer.advanceFinalized, which can record a finalized batch index that doesn't match safeHash/safeNum; change the argument so the finalized index corresponds to the local safe head (use safeNum (or the uint64 conversion of safeNum) instead of committedAtFin.Uint64(), or take min(committedAtFin.Uint64(), safeNum) if types require) when calling d.tagAdvancer.advanceFinalized(d.ctx, ..., safeHash, safeNum) to preserve monotonicity and dedup assumptions.node/derivation/derivation.go-390-398 (1)
390-398:⚠️ Potential issue | 🟠 Major | ⚡ Quick winActually stop derivation on HA blob-hash divergence.
This
returnonly exits the current poll;Start()will callderivationBlockagain on the next ticker tick. In HA mode that contradicts the “hard-stop/manual intervention required” invariant and turns a fatal divergence into an endless retry loop.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@node/derivation/derivation.go` around lines 390 - 398, The code currently does a plain return when d.isHaMode detects a blob-hash mismatch in derivationBlock, which only exits the current poll but lets Start() schedule another run; replace that return with a hard stop so HA mode truly halts and requires manual intervention. Modify the block in derivationBlock (the d.isHaMode branch) to invoke the component shutdown used by this service (preferably call the existing Stop() or Shutdown() method on the derivation controller, e.g. d.Stop() or d.Shutdown() so Start() will not keep scheduling further derivations); if no graceful stop method exists, perform an immediate process exit (os.Exit(1)) or otherwise set the service state to permanently disabled (e.g. d.stopped = true) so no further derivationBlock runs occur. Ensure the chosen action is deterministic and documented in the log message so the error remains visible (keep the existing d.logger.Error call).node/db/store.go-210-224 (1)
210-224:⚠️ Potential issue | 🟠 Major | ⚡ Quick winDelete to a known upper bound instead of stopping at the first hole.
This loop assumes derivation L1 block checkpoints are gap-free, but
ReadDerivationL1BlockRangeexplicitly treats missing heights as valid state. In a sparse DB,Has(key)==falseon the first gap stops the cleanup early and leaves stale higher checkpoints behind, so a later reorg scan can compare against old hashes past that hole.Suggested fix
func (s *Store) DeleteDerivationL1BlocksFrom(height uint64) { + latest := s.ReadLatestDerivationL1Height() + if latest == nil || height > *latest { + return + } + batch := s.db.NewBatch() - for h := height; ; h++ { - key := DerivationL1BlockKey(h) - has, err := s.db.Has(key) - if err != nil || !has { - break - } - if err := batch.Delete(key); err != nil { + for h := height; ; h++ { + if err := batch.Delete(DerivationL1BlockKey(h)); err != nil { panic(fmt.Sprintf("failed to delete DerivationL1Block at %d, err: %v", h, err)) } + if h == *latest { + break + } } if err := batch.Write(); err != nil { panic(fmt.Sprintf("failed to write batch delete for DerivationL1Blocks, err: %v", err)) } }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@node/db/store.go` around lines 210 - 224, The deletion loop in Store.DeleteDerivationL1BlocksFrom stops at the first missing key and leaves higher checkpoints undeleted; instead compute a known upper bound and delete the full [height..upperBound] range. Modify DeleteDerivationL1BlocksFrom to obtain the upper bound (for example by calling ReadDerivationL1BlockRange or another method that returns the highest stored derivation L1 height starting from height) and then iterate h from height through that upperBound deleting each DerivationL1BlockKey(h) via the batch (and panic/log on batch.Delete errors), then batch.Write once; ensure you reference DeleteDerivationL1BlocksFrom, DerivationL1BlockKey, and ReadDerivationL1BlockRange when making the change.ops/docker-sequencer-test/check-whitelist-test.sh-49-54 (1)
49-54:⚠️ Potential issue | 🟠 Major | ⚡ Quick winFilter reconnect evidence to
node-0, not any peer.
RECONNECT_COUNTcurrently passes on any sentry reconnect. That can mark the whitelist test green even ifnode-0never reconnects after the forced disconnect. Scope this check the same way asSTOP_COUNT/BAN_COUNT, usingNODE0_NODEIDor the exact peer address logged fornode-0.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@ops/docker-sequencer-test/check-whitelist-test.sh` around lines 49 - 54, The RECONNECT_COUNT and the subsequent grep are too broad and should be scoped to reconnections for node-0 only; change the RECONNECT_COUNT command and the following grep to filter for the node-0 peer identifier used elsewhere (use NODE0_NODEID or the exact peer address like in STOP_COUNT/BAN_COUNT) so you count only "Reconnecting to peer" events that reference node-0 and similarly tail only those matching lines from sentry-node-0 logs.ops/docker/docker-compose-4nodes.yml-414-418 (1)
414-418:⚠️ Potential issue | 🟠 Major | ⚡ Quick winMount
static-nodes.jsononsentry-el-1as well.Every other non-primary L2 execution client in this compose file has the static peer list mounted, but
sentry-el-1is missing it. That makes the new sentry topology the odd one out and can leave it without the deterministic peer graph the devnet relies on.Suggested fix
volumes: - "sentry_el_data_1:/db" - "${PWD}/jwt-secret.txt:/jwt-secret.txt" - "${PWD}/../l2-genesis/.devnet/genesis-l2.json:/genesis.json" + - "${PWD}/static-nodes.json:/db/geth/static-nodes.json"🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@ops/docker/docker-compose-4nodes.yml` around lines 414 - 418, The sentry-el-1 service is missing the static peer list mount; update the volumes section for sentry-el-1 to mount the repo's static-nodes.json into the container (same pattern used by other non-primary L2 execution clients) so the devnet has a deterministic peer graph; specifically add a volume entry for static-nodes.json under the sentry-el-1 "volumes:" block (the block shown immediately above "entrypoint:" for sentry-el-1) so the container sees /static-nodes.json at runtime.ops/docker/ha-nodekey2-1-1 (1)
1-1:⚠️ Potential issue | 🟠 Major | ⚡ Quick winRemove the leading whitespace from this node key.
This file is the only HA node key with a prefixed space. Node key files need exact raw hex content, so this can prevent
ha-node-2from loading the intended identity.Suggested fix
- 5146627e8b60d6dca6666c4996083e5bd551fdbe56c6c4a7a53791274a574bd4 +5146627e8b60d6dca6666c4996083e5bd551fdbe56c6c4a7a53791274a574bd4🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@ops/docker/ha-nodekey2` at line 1, The file ha-nodekey2 contains a node key string with a leading whitespace before the hex ("5146627e8b60...") which must be removed; edit ha-nodekey2 to trim the leading space so the file contains only the exact raw hex characters (no leading/trailing whitespace or newlines) ensuring the node key used by ha-node-2 matches the intended identity.node/l1sequencer/verifier.go-56-59 (1)
56-59:⚠️ Potential issue | 🟠 Major | ⚡ Quick winFail verifier initialization when the first history sync fails.
This constructor currently logs the initial
syncHistory()error and still returns a live verifier. Downstream, startup treats that as success, butIsSequencerAtwill returnno sequencer recordfor every height until a later refresh succeeds. That means a node can enter consensus with signature verification effectively broken. Please return the init error here and let the caller abort startup instead of relying on the background retry loop.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@node/l1sequencer/verifier.go` around lines 56 - 59, The constructor currently swallows v.syncHistory() errors and starts v.refreshLoop(ctx) anyway; change it to fail fast by returning the sync error to the caller instead of only logging it: if v.syncHistory() returns non-nil, return that error (do not start the background goroutine), and only launch go v.refreshLoop(ctx) after a successful initial sync; this ensures callers relying on IsSequencerAt see a hard startup failure rather than an uninitialized verifier.node/cmd/node/main.go-105-108 (1)
105-108:⚠️ Potential issue | 🟠 Major | ⚡ Quick winDon't hard-require L1 sequencer wiring before the mock sequencer branch.
initL1SequencerComponents()now errors whenl1.sequencerContractis missing, butL2NodeMain()calls it before checkingisMockSequencer. In the mock path the verifier/signer are never used, so--mock.enablednow fails startup unless a real L1 sequencer contract is configured. Please gate this initialization behind the non-mock path, or make the mock path explicitly bypass the verifier requirement.Also applies to: 295-304
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@node/cmd/node/main.go` around lines 105 - 108, L2NodeMain currently calls initL1SequencerComponents unconditionally which fails when l1.sequencerContract is missing; move or guard the call so L1 sequencer wiring only runs when not using the mock sequencer (check isMockSequencer flag) or add an explicit bypass in initL1SequencerComponents for the mock path; specifically, wrap the initL1SequencerComponents(...) call in the non-mock branch (or short-circuit and return nil verifier/signer when isMockSequencer is true) so the mock.enabled startup path does not require a real L1 sequencer contract, and apply the same guard for the similar call later around the other instance referenced in the diff.node/hakeeper/block_fsm.go-127-131 (1)
127-131:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy liftAvoid lossy delivery after a successful apply.
Once
onAppliedsucceeds, this log entry is already committed, but thedefaultbranch silently drops the block whenblockChis full. That turns transient subscriber lag into permanent loss on the broadcast path, because nothing can replay that block from here. This handoff needs to stay lossless — e.g. backpressure, a durable queue, or failing before mutating state — instead of best-effort dropping.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@node/hakeeper/block_fsm.go` around lines 127 - 131, The current non-blocking send to f.blockCh in block_fsm.go silently drops blocks when the channel is full; change this handoff to be lossless by replacing the select/default with a blocking send (i.e., f.blockCh <- block) so callers (the onApplied path) apply backpressure rather than dropping the block, or alternatively return an error from onApplied before committing state if delivery cannot be guaranteed; update the send site that references f.blockCh (the select in BlockFSM) and ensure callers of onApplied handle a blocking send or propagated error appropriately.node/hakeeper/ha_service.go-152-160 (1)
152-160:⚠️ Potential issue | 🟠 Major | ⚡ Quick winDon't treat a reused
ServerIDwith a stale address as a successful join.This short-circuits as soon as the
ServerIDis present, even if the cluster still has the oldAddressfor that server. After a restart with a changedadvertisedAddror IP,Join()returns success but peers keep dialing the dead endpoint. Compare both ID and address here and either reconcile membership or return a hard error instead of silently accepting the mismatch.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@node/hakeeper/ha_service.go` around lines 152 - 160, The current membership check only compares membership.Servers[].ID to h.cfg.ServerID and treats any match as a successful join; change it to also compare the stored server Address to h.advertisedAddr and handle mismatches explicitly: if the ID matches but Address differs, do not return success — either reconcile the membership (e.g., call a suitable update/remove API on the client before re-adding) or return a hard error so callers will attempt corrective action instead of silently accepting a stale address; ensure this logic is applied where the loop currently lives (referencing membership.Servers, h.cfg.ServerID, h.advertisedAddr and the subsequent client.AddServerAsVoter call).node/hakeeper/config.go-203-208 (1)
203-208:⚠️ Potential issue | 🟠 Major | ⚡ Quick winReject port
0during validation.These checks currently allow
consensus.listen_port = 0andrpc.listen_port = 0. That binds an ephemeral port, but the rest of the HA flow still treats the configured address as stable, so peers and operators end up dialing the wrong endpoint. This should require1..65535instead.Suggested fix
- if c.Consensus.ListenPort < 0 || c.Consensus.ListenPort > math.MaxUint16 { + if c.Consensus.ListenPort <= 0 || c.Consensus.ListenPort > math.MaxUint16 { return fmt.Errorf("invalid consensus.listen_port: %d", c.Consensus.ListenPort) } - if c.RPC.ListenPort < 0 || c.RPC.ListenPort > math.MaxUint16 { + if c.RPC.ListenPort <= 0 || c.RPC.ListenPort > math.MaxUint16 { return fmt.Errorf("invalid rpc.listen_port: %d", c.RPC.ListenPort) }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@node/hakeeper/config.go` around lines 203 - 208, The port validation currently allows zero; update the checks for c.Consensus.ListenPort and c.RPC.ListenPort to reject 0 by changing the lower bound check from < 0 to < 1 so the valid range becomes 1..math.MaxUint16 (1..65535), and keep the same error formatting (e.g., "invalid consensus.listen_port: %d" / "invalid rpc.listen_port: %d") so invalid 0 values produce an error during config validation.node/hakeeper/rpc/types.go-37-50 (1)
37-50:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy liftPlumb
context.Contextthrough the consensus adapter.The RPC surface is context-aware, but
ConsensusAdapterdrops cancellation and deadlines entirely. That means a stuckAddVoter/RemoveServer/leader-transfer call can outlive the HTTP request and keep running even after the client times out or disconnects. Addcontext.Contextto the adapter methods, or enforce bounded internal timeouts at this boundary, so HA control-plane calls stay cancellable under quorum-loss/failover conditions.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@node/hakeeper/rpc/types.go` around lines 37 - 50, ConsensusAdapter currently lacks context.Context and so cannot honor cancellations; update the interface by adding a context.Context first parameter to all RPC-facing methods (e.g., Leader(ctx context.Context) bool, LeaderWithID(ctx context.Context) *ServerInfo, AddVoter(ctx context.Context, id, addr string, version uint64) error, AddNonVoter(...), DemoteVoter(...), RemoveServer(...), TransferLeader(ctx context.Context) error, TransferLeaderTo(ctx context.Context, id, addr string) error, ClusterMembership(ctx context.Context) (*ClusterMembership, error), ServerID(ctx context.Context) string, Addr(ctx context.Context) string), then propagate the new ctx into the implementing type (HAService in ha_service.go) and update all callers (RPC handlers) to pass the request context so long-running operations are cancellable; ensure tests and any internal call-sites are updated to compile and that implementations respect context cancellations.node/hakeeper/rpc/server.go-57-66 (1)
57-66:⚠️ Potential issue | 🟠 Major | ⚡ Quick winReturn bind failures synchronously from
Start().
Start()always returnsnilbecauseListenAndServe()runs in the background, so address/permission errors are only logged after the caller has already continued. That can leave the node running without a working HA management endpoint. Bind the socket first and return that error synchronously, then serve from the goroutine.Suggested shape
+import "net" + func (s *Server) Start() error { - s.log.Info("Starting hakeeper RPC server", "addr", s.httpServer.Addr) + ln, err := net.Listen("tcp", s.httpServer.Addr) + if err != nil { + return err + } + s.log.Info("Starting hakeeper RPC server", "addr", s.httpServer.Addr) s.wg.Add(1) go func() { defer s.wg.Done() - if err := s.httpServer.ListenAndServe(); err != nil && err != http.ErrServerClosed { + if err := s.httpServer.Serve(ln); err != nil && err != http.ErrServerClosed { s.log.Error("hakeeper RPC server error", "err", err) } }() return nil }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@node/hakeeper/rpc/server.go` around lines 57 - 66, Start currently spawns ListenAndServe in a goroutine so bind errors surface only via logs; change Start to perform the bind synchronously and return any error immediately: call net.Listen("tcp", s.httpServer.Addr) (or net.Listen on the configured address) and if it fails return the error, otherwise add to s.wg and launch a goroutine that calls s.httpServer.Serve(listener) (and logs non-closed errors), preserving defer s.wg.Done; update any shutdown logic to close the listener when stopping the server. Ensure you replace the direct use of s.httpServer.ListenAndServe() with the listen-then-Serve pattern in Start.node/hakeeper/rpc/auth.go-35-41 (1)
35-41:⚠️ Potential issue | 🟠 Major | ⚡ Quick winCap the buffered request body before
io.ReadAll.This middleware copies the full body into memory on every request it inspects. A large POST can therefore consume unbounded memory before auth or JSON-RPC validation runs. Wrap the body with
http.MaxBytesReaderor reject oversizedContent-Lengthbefore reading it.Suggested guard
func authMiddleware(token string, next http.Handler) http.Handler { return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { + const maxRPCBodyBytes = 1 << 20 // 1 MiB + if token == "" { next.ServeHTTP(w, r) return } // Read and immediately restore the body so downstream can read it. - body, err := io.ReadAll(r.Body) + r.Body = http.MaxBytesReader(w, r.Body, maxRPCBodyBytes) + body, err := io.ReadAll(r.Body) if err != nil { http.Error(w, "failed to read request body", http.StatusBadRequest) return }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@node/hakeeper/rpc/auth.go` around lines 35 - 41, The middleware currently reads the entire request body via io.ReadAll(r.Body) (see r.Body and io.ReadAll in this auth middleware); change this to guard against unbounded memory by wrapping the request body with http.MaxBytesReader (or explicitly rejecting requests with an oversized Content-Length) before reading. Introduce a reasonable MaxBodySize constant, replace the direct read with body, err := io.ReadAll(http.MaxBytesReader(w, r.Body, MaxBodySize)) (or check r.ContentLength and return http.Error if > MaxBodySize), then restore r.Body with io.NopCloser(bytes.NewReader(body)) as before and handle the max-bytes error case appropriately.node/derivation/tag_advance.go-101-124 (1)
101-124:⚠️ Potential issue | 🟠 Major | ⚡ Quick winPublish a conservative safe tag during reset instead of zeroing it and waiting for re-derivation.
Right now
resetonly mutates in-memory state. It zeroessafe, leavesfinalizedintact, and never flushes. That createsfinalized > safe, soflushTagswould reject the state anyway, and the nextadvanceFinalizedcall is usually deduped. The engine therefore keeps the stale pre-reorg safe tag until a lateradvanceSafe, which defeats the reset on exactly the window you are trying to rewind.Suggested direction
Reset
safeback to the current finalized head, then force aSetBlockTagspublication from the reset path. That preservesfinalized <= safeand immediately moves the engine to a conservative canonical point.-func (t *tagAdvancer) reset(toBatchIndex uint64) { +func (t *tagAdvancer) reset(ctx context.Context, toBatchIndex uint64) { t.mu.Lock() - defer t.mu.Unlock() - t.safeL2Hash = common.Hash{} - t.safeL2Number = 0 + t.safeL2Hash = t.finalizedL2Hash + t.safeL2Number = t.finalizedL2Number t.safeMaxBatchIndex = toBatchIndex t.lastNotifiedSafe = common.Hash{} t.metrics.IncL1ReorgReset() t.logger.Info("tag advancer reset on L1 reorg", "to_batch_index", toBatchIndex) + t.mu.Unlock() + + t.flushTags(ctx) }That also requires updating the
reorg.gocall site to pass a context.Also applies to: 146-156, 158-206
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@node/derivation/tag_advance.go` around lines 101 - 124, The reset path currently only zeroes the safe tag in memory and doesn't flush, causing finalized > safe and leaving a stale safe until the next advanceSafe; modify the reset logic in tagAdvancer.reset to set safe (safeL2Hash/safeL2Number) to the current finalized values (finalizedL2Hash/finalizedL2Number) and then call t.flushTags(ctx) to force a conservative SetBlockTags publication so finalized <= safe immediately; update the reorg.go call site that invokes reset to pass a context parameter so flushTags can run (ensure symbols referenced: tagAdvancer.reset, t.flushTags, safeL2Hash/safeL2Number, finalizedL2Hash/finalizedL2Number, and the reorg.go caller).
🟡 Minor comments (10)
ops/devnet-morph/devnet/setup_nodes.py-104-110 (1)
104-110:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winInconsistent error message after directory check refactor.
Line 109 says "Missing source or destination directory" but at that point only
source_diris being checked (dest_dir was already validated on line 104). The message should reflect only the source directory check.📝 Proposed fix
if not os.path.isdir(source_dir): - print(f"Error: Missing source or destination directory for {node}. Exiting.") + print(f"Error: Missing source directory for {node}. Exiting.") sys.exit(1)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@ops/devnet-morph/devnet/setup_nodes.py` around lines 104 - 110, The error message for the source directory check is misleading: after verifying dest_dir, the subsequent check for source_dir should only reference the missing source; update the print/sys.exit branch that checks source_dir (the block that inspects variable source_dir and node) to use a message like "Error: Missing source directory for {node}. Exiting." so it accurately describes the failure when source_dir is absent.contracts/contracts/test/L1Sequencer.t.sol-181-199 (1)
181-199:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winThe stress test masks the first exact-boundary transition.
sequencerAis alreadyaddress(0xA001), and the first loop iteration writes that same address again at height110. The assertion on Line 196 therefore cannot catch an off-by-one bug between records0and1. Use a distinct initial sequencer or assert every inserted boundary with unique addresses.Possible fix
function test_getSequencerAt_manyRecords_binarySearchStress() public { - _initHistory(sequencerA, 10); + address firstSequencer = address(uint160(0xA000)); + _initHistory(firstSequencer, 10); // Add 9 more records (10 total) for (uint64 i = 1; i < 10; i++) { address seq = address(uint160(0xA000 + i)); vm.prank(owner); l1Sequencer.updateSequencer(seq, 10 + i * 100); } assertEq(l1Sequencer.getSequencerHistoryLength(), 10); // Query each boundary - assertEq(l1Sequencer.getSequencerAt(10), sequencerA); - assertEq(l1Sequencer.getSequencerAt(99), sequencerA); + assertEq(l1Sequencer.getSequencerAt(10), firstSequencer); + assertEq(l1Sequencer.getSequencerAt(99), firstSequencer); assertEq(l1Sequencer.getSequencerAt(110), address(uint160(0xA001))); assertEq(l1Sequencer.getSequencerAt(910), address(uint160(0xA009))); assertEq(l1Sequencer.getSequencerAt(99999), address(uint160(0xA009))); }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@contracts/contracts/test/L1Sequencer.t.sol` around lines 181 - 199, The test test_getSequencerAt_manyRecords_binarySearchStress masks the first boundary because sequencerA equals the address written in the first loop iteration; fix by ensuring the initial record and each inserted record are unique: either initialize history with a distinct initial sequencer (modify the call to _initHistory or the value of sequencerA) or change the loop to start with an address different from sequencerA (updateSequencer calls in the for loop) and then assert each boundary return value (use unique addresses like address(uint160(0xA000 + i))). Ensure changes reference test_getSequencerAt_manyRecords_binarySearchStress, _initHistory, sequencerA, and l1Sequencer.updateSequencer so the first exact-boundary transition is actually tested.contracts/contracts/test/L1Sequencer.t.sol-19-23 (1)
19-23:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winStrengthen zero-owner init and binary-search boundary coverage
- The zero-owner test calls
initializeon the rawL1Sequencerimplementation; sinceL1Sequencerhas no constructor_disableInitializers(so direct calls revert with the same"invalid owner"), this isn’t likely to fail for the wrong reason—but deploying a freshTransparentUpgradeableProxyand asserting the revert there would better match the production path.- In
test_getSequencerAt_manyRecords_binarySearchStress, the first “boundary” (getSequencerAt(110)) doesn’t cross a sequencer change because the record at110is alsosequencerA(loop uses0xA000 + i, andi=1yields0xA001). Update the loop/queries so the tested boundary actually transitions to a different sequencer address.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@contracts/contracts/test/L1Sequencer.t.sol` around lines 19 - 23, The zero-owner initialization test should deploy and call initialize through a TransparentUpgradeableProxy instead of instantiating L1Sequencer directly: update the test to deploy a TransparentUpgradeableProxy pointing at a new L1Sequencer implementation and call initialize on the proxy, expecting the "invalid owner" revert to ensure the proxy path is exercised (reference L1Sequencer, TransparentUpgradeableProxy, and initialize). For the binary-search stress test, change the loop/queries so the boundary check actually crosses to a different sequencer: ensure the records written in test_getSequencerAt_manyRecords_binarySearchStress create a change at the boundary (adjust the loop index or the address generation so getSequencerAt(110) returns sequencerB instead of sequencerA), referencing getSequencerAt, sequencerA, sequencerB and the loop that uses 0xA000 + i.ops/docker-sequencer-test/run-ha-test.sh-146-158 (1)
146-158:⚠️ Potential issue | 🟡 Minor | ⚡ Quick win
wait_for_blockcan hang indefinitely.Unlike
wait_for_rpc(line 121) which has amax_retriesparameter,wait_for_blockruns in an infinitewhile trueloop with no timeout. If block production stalls, this will hang the test script.Proposed fix: add timeout parameter
wait_for_block() { local target=$1 local rpc_url="${2:-$L2_RPC_NODE0}" + local max_wait=${3:-300} + local waited=0 - while true; do + while [ $waited -lt $max_wait ]; do local cur=$(get_block_number "$rpc_url") if [ "$cur" -ge "$target" ] 2>/dev/null; then return 0 fi echo -ne "\r Block: $cur / $target " sleep 3 + waited=$((waited + 3)) done - echo "" + echo "" + return 1 }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@ops/docker-sequencer-test/run-ha-test.sh` around lines 146 - 158, wait_for_block currently loops forever if block production stalls; add an optional timeout/max_retries parameter (like wait_for_rpc) to the wait_for_block function, track retries or a deadline inside the loop, break and return non-zero with a clear error message when the timeout is reached, and ensure the function still returns 0 on success; update references to wait_for_block to pass the timeout when needed and keep the existing behavior when the parameter is omitted by using a sensible default.ops/docker-sequencer-test/run-test.sh-59-59 (1)
59-59:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winBehavior change:
get_block_numberreturns empty string on failure instead of "0".The fallback changed from
|| echo "0"to|| true, meaning on parse failure the function returns an empty string. This can break numeric comparisons like[ "$current" -ge "$target" ]which would fail with an arithmetic error when$currentis empty.Proposed fix
- echo "$result" | grep -o '"result":"[^"]*"' | cut -d'"' -f4 | xargs printf "%d" 2>/dev/null || true + echo "$result" | grep -o '"result":"[^"]*"' | cut -d'"' -f4 | xargs printf "%d" 2>/dev/null || echo "0"🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@ops/docker-sequencer-test/run-test.sh` at line 59, The get_block_number pipeline currently ends with "|| true" which causes an empty string on parse failure and breaks numeric comparisons; change the fallback so the pipeline returns a numeric "0" on failure (e.g., replace the final "|| true" behavior with a fallback that echoes "0") in the command used by get_block_number (the echo "$result" | grep -o '"result":"[^"]*"' | cut ... | xargs printf "%d" part) so callers like get_block_number return "0" instead of an empty string.ops/docker-sequencer-test/run-ha-test.sh-1073-1079 (1)
1073-1079:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winIncorrect path construction for raft directory cleanup.
The pattern
${target_follower_id/#node-/node}attempts to replace "node-" prefix with "node", buttarget_follower_idis set to values like "ha-node-0", "ha-node-1", etc. (from line 1009). The substitution won't match "ha-node-" and the resulting path.devnet/ha-node-0/raftwon't be cleaned correctly.Proposed fix
$COMPOSE_HA stop "$target_follower_id" 2>/dev/null || true - rm -rf "$DOCKER_DIR/.devnet/${target_follower_id/#node-/node}/raft" + rm -rf "$DOCKER_DIR/.devnet/${target_follower_id}/raft" $COMPOSE_HA up -d "$target_follower_id" 2>/dev/null || true🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@ops/docker-sequencer-test/run-ha-test.sh` around lines 1073 - 1079, The cleanup path uses an incorrect prefix substitution so rm -rf "$DOCKER_DIR/.devnet/${target_follower_id/#node-/node}/raft" doesn't match IDs like "ha-node-0"; update the substitution to strip the leading "ha-" instead (e.g. use "${target_follower_id/#ha-/}" ) so the rm targets "$DOCKER_DIR/.devnet/${target_follower_id/#ha-/}/raft"; adjust the rm/compose sequence around target_follower_id, DOCKER_DIR and COMPOSE_HA to use that corrected expansion.ops/tools/go.mod-5-10 (1)
5-10:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winAlign
github.com/morph-l2/go-ethereumrequireversions with thereplacetarget for readabilityIn
ops/tools/go.mod,require github.com/morph-l2/go-ethereum v0.5.0conflicts withreplace github.com/morph-l2/go-ethereum => ... v0.0.0-20260529141627-eb5fbf8f9748(behavior should be correct sincereplacewins), but the same mismatch pattern exists inoracle/go.modandtx-submitter/go.mod(bothrequire v0.5.0vs the samereplace), andtoken-price-oracle/go.mod(require v1.10.14-...vs that samereplace). Align therequireversions to the replaced pseudo-version (or document why they differ) to avoid ongoing confusion.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@ops/tools/go.mod` around lines 5 - 10, The require version for the module github.com/morph-l2/go-ethereum in your go.mod files should match the replace target pseudo-version to avoid confusion; update the require entry for github.com/morph-l2/go-ethereum to v0.0.0-20260529141627-eb5fbf8f9748 (the same pseudo-version used in the replace directive) in each go.mod where you currently have v0.5.0 or a different version, or alternatively add a brief comment next to the require explaining why it intentionally differs from the replace pseudo-version.node/derivation/static_scan_test.go-132-141 (1)
132-141:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winMake the SPEC-005 local-verify guard AST-based (avoid fragile string matching)
static_scan_test.gochecks raw substrings for"morph-l2/common/blob"andcommonblob.*, which currently align withverify_local.go(importscommonblob "morph-l2/common/blob"and callscommonblob.CompressBatchBytes/commonblob.MakeBlobTxSidecar).- This can still false-pass/fail on comments/literals or harmless alias/spelling changes; parsing
verify_local.gowithgo/parser/go/astto validate the import path and actualCallExpr/SelectorExprtargets would make the codec-drift check robust.Suggested direction
+// Parse verify_local.go and: +// 1. confirm an import exists for "morph-l2/common/blob" +// 2. record the local import name (default or explicit alias) +// 3. walk CallExpr/SelectorExpr nodes and require +// <blobImport>.CompressBatchBytes and <blobImport>.MakeBlobTxSidecar - if !strings.Contains(src, `"morph-l2/common/blob"`) { - t.Fatalf("verify_local.go must import morph-l2/common/blob") - } - required := []string{"commonblob.CompressBatchBytes", "commonblob.MakeBlobTxSidecar"} - for _, sym := range required { - if !strings.Contains(src, sym) { - t.Errorf("verify_local.go missing required call %q", sym) - } - }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@node/derivation/static_scan_test.go` around lines 132 - 141, Replace fragile substring checks in static_scan_test.go with AST-based validation: parse verify_local.go using go/parser.ParseFile, inspect imports to confirm there's an import with Path.Value `"morph-l2/common/blob"` and record its local name (alias or default, e.g., commonblob), then walk the AST (ast.Inspect) to find CallExprs whose Fun is a SelectorExpr with X being an Ident matching that import alias and Sel names matching "CompressBatchBytes" and "MakeBlobTxSidecar" (replace the current required []string and strings.Contains checks with these AST checks); update error messages to reference missing import or missing call selector accordingly.node/flags/flags.go-271-276 (1)
271-276:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winUpdate the verify-mode help text to match the implemented self-heal path.
The help string still says the blob-mismatch self-heal flow is “currently TODO”, but
node/derivation/derivation.gonow does fetch the real blob and re-derive on mismatch. Leaving the old text here will mislead operators about whatlocalmode actually does.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@node/flags/flags.go` around lines 271 - 276, The help text for the DerivationVerifyMode cli flag is stale—update the Usage string in the DerivationVerifyMode variable to remove the "currently TODO" note and accurately describe the implemented self-heal path: when using "local" (the default) it rebuilds blob bytes from local L2 blocks and on a versioned-hash mismatch will fetch the real beacon blob and re-derive the batch (no beacon fetch on the happy path); keep the descriptions of the "layer1" and "local" behaviours and note that the mode is selected at startup. Reference DerivationVerifyMode in node/flags/flags.go and the implemented behaviour in node/derivation/derivation.go when editing the Usage text.node/hakeeper/ha.toml.example-18-20 (1)
18-20:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winClarify that
join_addrsare RPC endpoints, not Raft transport addresses.
Join()dials these addresses throughhakeeperrpc.DialAPIClient, so using the Raftconsensus.advertised_addr/:9400endpoint will never succeed. The sample values are right, but the comment should say "RPC addresses" explicitly to avoid misconfiguration.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@node/hakeeper/ha.toml.example` around lines 18 - 20, The comment for join_addrs is ambiguous: clarify that join_addrs are RPC endpoints used by Join() via hakeeperrpc.DialAPIClient (not the Raft transport/consensus.advertised_addr :9400 addresses); update the sample comment to explicitly say "RPC addresses (used by hakeeperrpc.DialAPIClient/Join()), e.g. 10.0.0.1:9401", to prevent users from supplying Raft transport addresses like consensus.advertised_addr.
🧹 Nitpick comments (5)
node/types/retryable_client.go (1)
41-42: 💤 Low valueUnused constants
GethRetryAttemptsandGethRetryInterval.These constants are declared but not used in this file. The
RetryableClientusesExponentialBackOffwithMaxElapsedTimeinstead of a fixed retry count/interval.If these are intended for external consumers, consider adding a comment documenting the intended usage. Otherwise, remove to avoid confusion.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@node/types/retryable_client.go` around lines 41 - 42, The constants GethRetryAttempts and GethRetryInterval are declared but never used in this file; either remove them or document intended external usage. Fix by either deleting GethRetryAttempts and GethRetryInterval from node/types/retryable_client.go if the RetryableClient uses ExponentialBackOff with MaxElapsedTime exclusively, or add a short comment above those constants explaining they are exported for external consumers and how they relate to RetryableClient/ExponentialBackOff behavior (e.g., intended fallback for fixed retries/intervals) so their presence is not confusing.ops/docker-sequencer-test/run-test.sh (1)
563-591: 💤 Low valueDead code:
start_malicious_sentryis never called.The function
start_malicious_sentry(lines 569-591) is defined but unused. The actual function used isswap_sentry_to_malicious. Consider removing this dead code or documenting if it's intentionally kept for alternative usage.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@ops/docker-sequencer-test/run-test.sh` around lines 563 - 591, The function start_malicious_sentry is dead code (never invoked) — either remove its definition or make its intent explicit; if it's an alternative to the existing swap_sentry_to_malicious helper, delete start_malicious_sentry to avoid confusion and duplicate logic, or add a clear comment and a unit/integration test or a single-point call that delegates to swap_sentry_to_malicious so the behavior is exercised. Locate start_malicious_sentry in the script and either delete its whole block or replace it with a short comment referencing swap_sentry_to_malicious (or call swap_sentry_to_malicious from it) to keep code consistent.ops/docker-sequencer-test/run-ha-test.sh (1)
698-698: 💤 Low valueUnused log baseline file.
The file at
$log_baselineis written (line 698) but never read or compared against. If the intent was to compare log growth, the comparison logic is missing.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@ops/docker-sequencer-test/run-ha-test.sh` at line 698, The script writes to $log_baseline but never uses it; either remove the write or implement the intended comparison: capture the initial baseline (write to $log_baseline) and later read and compare it against the current log (e.g., use diff, comm, or grep -c to detect new lines or changes) to validate log growth. Update run-ha-test.sh to reference $log_baseline where comparisons are expected (replace the unused write with a proper read+compare step) and ensure any test assertions or exit codes reflect the comparison result.token-price-oracle/go.mod (1)
5-13: ⚡ Quick winDowngrade:
replaceoverrides therequireversion, so the timestamp mismatch isn’t a resolution bug.
Intoken-price-oracle/go.mod, therequire github.com/morph-l2/go-ethereum v1.10.14-0.20251219060125-03910bc750a2timestamp can be earlier than thereplace github.com/morph-l2/go-ethereum => github.com/morph-l2/go-ethereum v0.0.0-20260529141627-eb5fbf8f9748timestamp becausereplacepins the version actually used. Optional: align therequireversion to match thereplace(e.g., viago mod tidy) to reduce confusion.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@token-price-oracle/go.mod` around lines 5 - 13, The require and replace entries for github.com/morph-l2/go-ethereum are inconsistent (require shows v1.10.14-0.... while replace pins v0.0.0-20260529141627-eb5fbf8f9748); update the module metadata so they align by either running go mod tidy to refresh and align the require to the replace or manually change the require line to match the replace version (look for the require github.com/morph-l2/go-ethereum and replace github.com/morph-l2/go-ethereum lines and make them consistent).node/hakeeper/ha_service.go (1)
74-77: ⚡ Quick winFail fast when the apply callback was never registered.
BlockFSM.Applypanics ifonAppliedis nil, butStart()currently lets the service boot without checking that precondition. Returning an error here is much safer than crashing later on the first committed log.Suggested fix
func (h *HAService) Start() error { + if h.fsm.onApplied == nil { + return fmt.Errorf("HAService.Start: onApplied callback is not set") + } if err := h.initRaft(); err != nil { return fmt.Errorf("HAService.Start: %w", err) }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@node/hakeeper/ha_service.go` around lines 74 - 77, HAService.Start must fail fast if the FSM apply callback wasn't registered: check the BlockFSM.onApplied (or whatever field holds the apply callback) before calling initRaft/returning success and return a descriptive error if it's nil so the service doesn't boot and later panic in BlockFSM.Apply; locate the HAService.Start method and the BlockFSM struct (and its onApplied field) and add a nil-check that returns an error like "HAService.Start: apply callback not registered" when onApplied == nil.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@node/l1sequencer/enclave_signer.go`:
- Around line 245-269: In EnclaveSigner.dial, avoid leaking the post-timeout
goroutine by adding a done channel and signalling it when the timeout branch
runs; change the goroutine that calls vsock.Dial to, after receiving (c,e),
attempt to send the result into ch with a select that also listens on done and
closes c if done was already closed, and in the timeout branch close the done
channel before returning the timeout error; reference symbols:
EnclaveSigner.dial, ch, vsock.Dial, dialTimeout, and the new done channel.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 5d586711-3161-4eca-8aa5-ccccdfa63fb9
⛔ Files ignored due to path filters (1)
node/go.sumis excluded by!**/*.sum
📒 Files selected for processing (5)
node/cmd/node/main.gonode/flags/flags.gonode/go.modnode/l1sequencer/enclave_signer.gonode/l1sequencer/enclave_signer_test.go
🚧 Files skipped from review as they are similar to previous changes (2)
- node/go.mod
- node/cmd/node/main.go
| func (s *EnclaveSigner) dial() (net.Conn, error) { | ||
| type result struct { | ||
| conn net.Conn | ||
| err error | ||
| } | ||
| ch := make(chan result, 1) | ||
| go func() { | ||
| c, e := vsock.Dial(s.cid, s.port, nil) | ||
| ch <- result{c, e} | ||
| }() | ||
| select { | ||
| case r := <-ch: | ||
| if r.err != nil { | ||
| return nil, fmt.Errorf("vsock dial %d:%d: %w", s.cid, s.port, r.err) | ||
| } | ||
| return r.conn, nil | ||
| case <-time.After(dialTimeout): | ||
| go func() { | ||
| r := <-ch | ||
| if r.conn != nil { | ||
| _ = r.conn.Close() | ||
| } | ||
| }() | ||
| return nil, fmt.Errorf("vsock dial %d:%d timed out after %v", s.cid, s.port, dialTimeout) | ||
| } |
There was a problem hiding this comment.
Avoid leaking an extra goroutine on dial timeouts.
Line 262 starts a second goroutine that waits on ch forever when vsock.Dial is the part that hangs. Each timed-out retry therefore leaks an additional goroutine, so a stuck enclave can accumulate parked goroutines indefinitely under repeated Sign failures. Let the dialing goroutine close any late connection after a timeout signal instead of spawning a post-timeout receiver.
Suggested fix
func (s *EnclaveSigner) dial() (net.Conn, error) {
type result struct {
conn net.Conn
err error
}
ch := make(chan result, 1)
+ timedOut := make(chan struct{})
go func() {
c, e := vsock.Dial(s.cid, s.port, nil)
- ch <- result{c, e}
+ select {
+ case ch <- result{c, e}:
+ case <-timedOut:
+ if c != nil {
+ _ = c.Close()
+ }
+ }
}()
select {
case r := <-ch:
if r.err != nil {
return nil, fmt.Errorf("vsock dial %d:%d: %w", s.cid, s.port, r.err)
}
return r.conn, nil
case <-time.After(dialTimeout):
- go func() {
- r := <-ch
- if r.conn != nil {
- _ = r.conn.Close()
- }
- }()
+ close(timedOut)
return nil, fmt.Errorf("vsock dial %d:%d timed out after %v", s.cid, s.port, dialTimeout)
}
}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@node/l1sequencer/enclave_signer.go` around lines 245 - 269, In
EnclaveSigner.dial, avoid leaking the post-timeout goroutine by adding a done
channel and signalling it when the timeout branch runs; change the goroutine
that calls vsock.Dial to, after receiving (c,e), attempt to send the result into
ch with a select that also listens on done and closes c if done was already
closed, and in the timeout branch close the done channel before returning the
timeout error; reference symbols: EnclaveSigner.dial, ch, vsock.Dial,
dialTimeout, and the new done channel.
- L1Sequencer.sol: sequencerHistory[], updateSequencer, getSequencerAt, initializeHistory - Bindings: updated ABI for new contract interface - SequencerVerifier: L1 history cache with interval cursor optimization - Signer: simplified interface (removed IsActiveSequencer) - 022-SequencerInit.ts: fixed initialize call (1 param instead of 2) - Docker: added L1_SEQUENCER_CONTRACT env for all nodes Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- run-test.sh: added build-malicious and p2p-test commands - docker-compose.override.yml: malicious-geth-0 and malicious-node-0 services - Tests: T-01~T-05 (active attacks) + T-06 (BlockSync pollution) + T-07 (resilience) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix grep -c multiline: use || true instead of || echo "0" - Fix env var loss: malicious override must include full env list - Swap approach: reuse synced sentry instead of fresh malicious container - Uncomment CONSENSUS_SWITCH_HEIGHT for V2 mode activation - Add SEQUENCER_PRIVATE_KEY to node-0 override Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Use staking key (0xd998...) as SEQUENCER_PRIVATE_KEY for node-0 - Add initializeHistory() call in setup to register sequencer on L1 - Fixes "no sequencer record" error in V2 mode Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- T-06: use blocksync-forge (blocksync/reactor.go) instead of sync-forge (broadcast_reactor.go) - targets the actual V1 vulnerability path - T-06: stop node-3 to create gap, restart to trigger BlockSync - Phase 0: explicit checks for V2 mode, signer, and switch height - T-04: use futureHeight (currentHeight+10000) for deterministic unsolicited test - Separate log files per phase to prevent cross-contamination Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add L1Sequencer.t.sol: 27 Foundry tests covering initialize, initializeHistory, updateSequencer, getSequencerAt binary search edge cases, and access control - Regenerate l1sequencer.go with abigen (bytecode now matches current contract with sequencerHistory[], binary search, etc.) - Update verifier.go: L1SequencerHistoryRecord -> L1SequencerSequencerRecord - Add exponential backoff retry (10s -> 20s -> ... -> 5min) when initial history load fails, instead of waiting full 5 minutes Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Avoids stuttering in abigen output (L1SequencerSequencerRecord -> L1SequencerHistoryRecord). No ABI/storage layout change. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…om L1 contract Unify the upgrade height source: instead of a CLI flag / env var, the verifier now sets upgrade.UpgradeBlockHeight from the first history record fetched from the L1Sequencer contract. - node/l1sequencer/verifier.go: call SetUpgradeBlockHeight on first successful history load (prev==0) - node/cmd/node/main.go: remove ConsensusSwitchHeight flag read; require L1 Sequencer contract address - node/flags/flags.go: delete ConsensusSwitchHeight flag definition - docker-compose.override.yml: remove 5× MORPH_NODE_CONSENSUS_SWITCH_HEIGHT - run-test.sh: remove set_upgrade_height function, add wait_for_l1_finalized to ensure L1 contract data is finalized before L2 nodes start Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
These env var overrides (DEPOSIT_CONTRACT_ADDRESS, SYNC_START_HEIGHT) and the malicious_geth_data volume should be managed via overlay/override files, not by modifying the base compose file directly. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ncer Add hakeeper module implementing a 3-node Raft cluster for sequencer HA. The HA cluster provides automatic leader election, block replication, and failover without changing the on-chain sequencer identity. node/hakeeper/: - HAService: wraps hashicorp/raft, implements SequencerHA interface - Config: layered loading (defaults -> TOML file -> CLI flags -> resolve -> validate) supports auto-detection of server_id (hostname) and advertised_addr (local IP) - BlockFSM: Raft FSM for block replication; onApplied callback drives geth apply - leaderMonitor: gates block production behind Barrier to ensure log catch-up - rpc/: JSON-RPC admin API (ha_leader, ha_clusterMembership, ha_addServerAsVoter, ha_removeServer, ha_transferLeader, ha_transferLeaderToServer) with HTTP middleware token auth on write operations node/flags/flags.go: - New flags: --ha.enabled, --ha.config, --ha.bootstrap, --ha.join, --ha.server-id, --ha.advertised-addr, --ha.rpc-token node/cmd/node/main.go: - initHAService(): init HA from flags/config when --ha.enabled is set - Fix typed-nil interface bug: pass untyped nil when HA is disabled node/sequencer/tm_node.go: - Pass HA service to tendermint node setup node/go.mod: - Add hashicorp/raft v1.7.1, raft-boltdb/v2 ops/docker-sequencer-test/: - docker-compose.ha-override.yml: 3-node Raft cluster config for devnet - run-ha-test.sh: 29-case integration test suite (config, cluster, block production, failover, admin API, lifecycle) - run-perf-test.sh: performance test harness Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Wire up the new engine_newL2BlockV2 API for reorg support: - Executor.ApplyBlockV2 now returns (applied bool, err error) matching the updated L2Node interface; detects idempotent skips and reorgs using BlockNumber + BlockByNumber checks before calling NewL2BlockV2 - RetryableClient.NewL2BlockV2 wraps the new authclient method with exponential backoff retry; excludes WrongBlockNumberError and ParentNotFoundError from retry (permanent errors) - RetryableClient.AssembleL2BlockV2 added for parent-hash-based block assembly Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add BlockHashMismatchError and InvalidNextL1MsgIndexError to the retryableError() exclusion list so the executor stops re-sending invalid payloads back to geth. Made-with: Cursor
… not in PBFT validator set - Add Syncer()/SetSyncer() accessors to Executor for explicit syncer wiring - Start L1 syncer eagerly in main.go for separated-deployment / HA sequencers that are not PBFT validators (lazy-init path never fires for them) - Guard Syncer.Start() with atomic flag to prevent duplicate goroutines Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Increase blockCh buffer from 200 to 1000 to reduce drops under load. - Panic on nil onApplied callback in BlockFSM.Apply: this can only happen due to a programmer error (forgot to wire SetOnBlockApplied) and would otherwise silently succeed on the leader while followers diverge. - gofmt: realign one-line method bodies. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- go-ethereum: v0.0.0-20260508105911-56deb7072ae4 - tendermint: v0.0.0-20260508065906-9e56b04da3c8 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add explicit replace directives in every go.mod to override MVS, because token-price-oracle indirectly required v1.10.14-..., which caused all workspace modules to resolve to the older version and miss new APIs (NewL2BlockV2, AssembleL2BlockV2, SetBlockTags, MorphTxType, updated AssembleL2Block/NewL2Block signatures). Drop the local-path replace block from go.work so the pinned pseudo-version is actually used. Follow-up: investigate the indirect dep that requires v1.10.14 and bump it so these per-module replaces can be removed. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Pull in the persistent-peer ban exemption + sigStore.Close fixes from morph-l2/tendermint feat/sequencer-optimize (commit c6f7e21e4). Updated via 'make update' after bumping TENDERMINT_TARGET_VERSION in the Makefile. All sub-modules tidied. morphnode + tendermint binaries build cleanly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Implements SPEC-005 derivation verification: - verify_local.go: Path B local-rebuild blob verification (rebuild blockContext + L2 tx blob from local chain, compare against on-chain Rollup batch) - verify.go: extract verifyBatchRoots, gate stateException on real divergence verdicts (not transient errors) - finalizer.go + reorg.go: derivation-driven finalizer + L1 reorg detection (SPEC-005 §4.7.6), rewind-and-reset for canonicality - tag_advance.go + metrics.go: derivation-driven L2 tag management, structured failure diagnostics for Path B Common: export common/batch.BuildBlockContext for derivation reuse. go-ethereum: bump submodule to 045be0fdc7ca (NewL2BlockV2 self-heal). Ops: add second sentry node for derivation validation (4-nodes compose, node5 key, devnet setup). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaced by derivation-driven tag management (see prior commit): - node/validator/*: removed dedicated validator role - node/blocktag/*: removed standalone block-tag advancer service - node/cmd/node/main.go: drop validator/blocktag wiring - node/flags/flags.go: drop validator-specific CLI flags - ops-morph/docker-compose-validator.yml: drop validator compose file Tags are now advanced inline by the derivation loop, eliminating the extra service and the role-based branching in main. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- sync/syncer.go: make Syncer.Start idempotent via sync.Once so re-entry on retry no longer leaks goroutines or races on state - types/retryable_client.go (+test): treat ethereum.NotFound as a permanent failure rather than retrying forever - db/keys.go + db/store.go: derivation-related key helpers used by the new finalizer / tag-advance paths Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Shared L1 client: main.go does a single ethclient.Dial and threads
it into syncer, derivation, l1sequencer Tracker/Verifier/Signer, and
the rollup binding. Reads l1.rpc directly from CLI flags instead of
going through derivation.Config first.
* Derivation reorg: blob-hash mismatch now invokes the tendermint
Node.StopReactorsBeforeReorg → fetch full batch → deriveForce →
StartReactorsAfterReorg(post-reorg height) flow. HA-mode adds a
hard-stop guard (cluster invariant violation; logs full context and
returns instead of self-healing). Mock mode (d.node==nil) skips the
reactor cycle.
* deriveForce uses the new NewL2BlockV2 (*Header, error) return; the
redundant HeaderByNumber readback is gone, parent chains via the
returned header. lastHeader is initialised from the batch's parent
so it tracks the chain head end-to-end.
* Executor.ApplyBlockV2 + RetryableClient.NewL2BlockV2 updated for the
new signature. Executor.updateSequencerSet no longer stops the
syncer when this node ceases to be sequencer — derivation needs it
running on every node.
* deps: bump tendermint to 6393e1eaad71 (derivation reorg API,
StopReactorsBeforeReorg / StartReactorsAfterReorg) and go-ethereum
to 5c5b433f18f2 (NewL2BlockV2 returns header, NextL1MsgIndex
backfill on isSafe path so writeBlockStateWithoutHead's gate passes
when callers don't know the per-block index). Replace directives
for both grouped at the top of every go.mod for review locality.
* docker-sequencer-test: Dockerfile copies common/go.mod alongside
the others (workspace replace requires it). HA override adds
L1_ETH_BEACON_RPC env for ha-node-{0,1,2} (derivation refactor
validates it at startup). run-ha-test.sh service names corrected
from morph-geth-* / sentry-geth-0 to morph-el-* / sentry-el-0.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Moves the L1-derived block insertion path off of NewL2BlockV2(isSafe=true) onto NewSafeL2Block, which now accepts SafeL2Data.ParentHash for non-head parents and lets SetCanonical reorg the chain automatically. NewL2BlockV2 becomes sequencer-signed-only (caller supplies pre-computed execution results, gate validates). The previous isSafe=true path on NewL2BlockV2 wrote blocks with caller- supplied StateRoot — for derivation.deriveForce that was the zero hash, because L1 batch metadata only carries the batch-final PostStateRoot, not per-block roots. The resulting blocks had header.Root=0 even though their state was correctly executed and committed, breaking verifyBatchRoots forever and blocking derivation cursor advance. NewSafeL2Block executes internally and fills header.Root from stateDB.IntermediateRoot, so the header on disk is consistent with the state. * node/derivation/derivation.go: deriveForce builds SafeL2Data with ParentHash = lastHeader.Hash() and calls NewSafeL2Block instead of NewL2BlockV2(isSafe=true); safeL2DataToExecutable helper deleted. * node/types/retryable_client.go + node/core/executor.go: drop isSafe arg from NewL2BlockV2. * node/derivation/verify_local.go: outline path got a v0-parent compat shim — only reachable on test/devnet where genesis batch is v0 and V1 is day-1 enabled, so the only v0 batch in the chain is genesis; on prod (V1 day-1, V2 layered on V1) the branch is dead. Reorg semantics only exist post-V2 anyway, so processing pre-V2 via outline is not a load-bearing path. Comment explains the assumption. * deps: bump tendermint to b1b3a3a1d806 (drop dead reorg-restart test harness from Node) and go-ethereum to eb5fbf8f9748 (NewSafeL2Block ParentHash, drop NewL2BlockV2 isSafe). * ops/docker-sequencer-test/Dockerfile.tx-submitter-test: new polyrepo- context Dockerfile so tx-submitter builds against local ../tendermint and ../go-ethereum siblings — the original ops/docker/Dockerfile.submitter only sees the morph repo and can't resolve workspace replace directives. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: corey <corey.zhang@bitget.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Plumb the morph node into the Nitro Enclave signer (separate
morph-enclave-signer repo) so the sequencer can sign blocks without
ever holding the plaintext key. Two flags are mutually exclusive:
--sequencer.privateKey existing in-process LocalSigner
--sequencer.enclaveSignerAddr new EnclaveSigner over vsock (CID:port)
EnclaveSigner highlights:
- vsock-only, mdlayher/vsock dep added
- one persistent connection reused across Sign calls; dial wrapped
in a goroutine race with dialTimeout (mdlayher/vsock has no
context-aware Dial)
- probe + signOnce both bounded by SetDeadline(requestTimeout) so a
half-open or unresponsive enclave can't stall startup or hold
Sign()'s mutex on reconnect
- up to 3 retries per Sign call with reconnect on each failure,
Error log on exhaustion; caller (tendermint commitBlock) is
expected to treat that as fatal
- identity self-test at construction: sign a 32-byte zero hash,
recover the EVM address from the signature via secp256k1 ECDSA
recovery, abort node startup if it disagrees with the address
GetPubkey reported. Catches misconfig like wrong SECRET_ID baked
into the .eif or vsock-proxy MITM swap before tendermint comes up.
Co-Authored-By: Claude Opus 4.7 (1M context) <<EMAIL_ADDRESS>>
Unify the address format with ops-cli, which already takes
`vsock:CID:port`. parseAddr now accepts both that form and the
legacy bare `CID:port` so existing systemd / k8s configs keep
working through the rollout.
- enclave_signer.go: TrimPrefix("vsock:") in parseAddr
- flags.go: Usage doc reflects both forms
- enclave_signer_test.go: cases for both forms + new error cases
(vsock:16, vsock:abc:5000)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sequencer nodes (HA leader/follower or single-sequencer) produce blocks locally and replicate via Raft / broadcast — running derivation on a sequencer is both redundant (re-fetches L1 batches it produced) and unsafe (deriveForce would risk a self-reorg on transient divergence). Gate construction on signer == nil so only fullnodes start derivation. Drop the now-dead isHaMode field, the unreachable HA-mode hard-stop in rebuildBlob's mismatch path, and the constructor parameter.
- tendermint v0.0.0-20260602085346-ee68e1bcf49a - go-ethereum v0.0.0-20260602085100-22da419ae001
65e4e26 to
9af2c07
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
ops/devnet-morph/devnet/setup_nodes.py (1)
61-67:⚠️ Potential issue | 🟠 Major | ⚡ Quick winMissing sentry-node-1 from persistent_peers list.
The script creates 6 nodes (including
sentry-node-1added at line 48), but thepersistent_peers_valueonly lists 5 peers. Since PEX is disabled for validator nodes (lines 88-89), they rely on this static peer list for connectivity. The missing peer could cause connectivity issues in the devnet topology.🔧 Proposed fix to add sentry-node-1
persistent_peers_value = ( "93e27ea2306e158a8146d5f44caaab97496797d2@node-0:26656," "7f78b7d7a7e6bad4faf68d5731d437f4288d96d0@node-1:26656," "06c699be2f9aeb9f7ec79f508a95ff80576deb12@node-2:26656," "b1a131f40d5d3abefe0dd787513c936ef62ac2d6@node-3:26656," - "dae813274913aaf39e7cd3226a0aa8bce00644e1@sentry-node-0:26656" + "dae813274913aaf39e7cd3226a0aa8bce00644e1@sentry-node-0:26656," + "YOUR_NODE_ID_HERE@sentry-node-1:26656" )Note: Replace
YOUR_NODE_ID_HEREwith the actual node ID fromops/docker/node5/node_key.json.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@ops/devnet-morph/devnet/setup_nodes.py` around lines 61 - 67, persistent_peers_value is missing the sentry-node-1 entry which leaves validator nodes without a static peer because PEX is disabled; update the persistent_peers_value string to include the sentry-node-1 peer using the node ID from ops/docker/node5/node_key.json (the identifier for sentry-node-1) and the same :26656 port format so it becomes a comma-separated list including "…@sentry-node-1:26656"; ensure the change is made alongside the existing persistent_peers_value definition and verify the final string contains all six node entries referenced by the node creation (including sentry-node-0 and sentry-node-1).
🧹 Nitpick comments (2)
node/hakeeper/rpc/backend.go (1)
11-11: 💤 Low valueLogger field is stored but never used.
The
logfield is initialized in the constructor but none of the API methods perform any logging. If logging is planned for future observability, consider this a reminder. Otherwise, the field can be removed to reduce noise.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@node/hakeeper/rpc/backend.go` at line 11, The Backend struct's log field (Backend.log) is initialized but never used; either remove the field and its initialization in NewBackend (and any constructor parameter) to reduce dead code, or, if observability is desired, add logging statements in the Backend methods (e.g., all API handlers on Backend) using the log field; update NewBackend to stop accepting/setting the unused logger or add meaningful process-level logs (errors/info) in methods where operations occur.ops/devnet-morph/devnet/setup_nodes.py (1)
108-110: 💤 Low valueError message is slightly misleading.
At this point,
dest_dirhas already been validated (lines 104-106), so the error message should only reference the source directory.📝 Clearer error message
- if not os.path.isdir(source_dir): - print(f"Error: Missing source or destination directory for {node}. Exiting.") + if not os.path.isdir(source_dir): + print(f"Error: Missing source directory for {node}. Exiting.") sys.exit(1)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@ops/devnet-morph/devnet/setup_nodes.py` around lines 108 - 110, The error message inside the if not os.path.isdir(source_dir) check incorrectly references both source and destination; update the print to mention only the missing source directory for the affected node (e.g., "Error: Missing source directory for {node}. Exiting."), keeping the existing os.path.isdir(source_dir) check and the subsequent sys.exit(1) unchanged and leave the earlier dest_dir validation as is.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@node/derivation/derivation.go`:
- Around line 859-925: In deriveForce, clamp skipNumber to not exceed the batch
tip by setting skipNumber = min(skipNumber, rollupData.lastBlockNumber) (or
equivalent) before computing parentNum and entering the block loop; this ensures
the loop will not skip all blocks and return an anchor at skipNumber when the
batch contains later blocks—update deriveForce to perform this clamp using the
rollupData.lastBlockNumber and then proceed with existing parentNum,
HeaderByNumber, and NewSafeL2Block logic.
In `@node/hakeeper/rpc/backend.go`:
- Around line 22-52: The RPC methods (Leader, LeaderWithID, AddServerAsVoter,
AddServerAsNonvoter, RemoveServer, TransferLeader, TransferLeaderToServer,
ClusterMembership) ignore the provided context.Context and do not propagate
cancellation/timeouts to the underlying ConsensusAdapter calls; either update
the ConsensusAdapter interface methods (e.g., Leader(), LeaderWithID(),
AddVoter(), AddNonVoter(), RemoveServer(), TransferLeader(), TransferLeaderTo())
to accept context.Context and change the APIBackend methods to pass ctx through,
or if the consensus methods are intentionally non-cancellable, document that
constraint on the ConsensusAdapter and APIBackend methods and add a comment
explaining why ctx is unused; ensure you reference the exact adapter methods
(AddVoter/AddNonVoter/RemoveServer/TransferLeader/TransferLeaderTo/ClusterMembership/Leader/LeaderWithID)
when making the change.
---
Outside diff comments:
In `@ops/devnet-morph/devnet/setup_nodes.py`:
- Around line 61-67: persistent_peers_value is missing the sentry-node-1 entry
which leaves validator nodes without a static peer because PEX is disabled;
update the persistent_peers_value string to include the sentry-node-1 peer using
the node ID from ops/docker/node5/node_key.json (the identifier for
sentry-node-1) and the same :26656 port format so it becomes a comma-separated
list including "…@sentry-node-1:26656"; ensure the change is made alongside the
existing persistent_peers_value definition and verify the final string contains
all six node entries referenced by the node creation (including sentry-node-0
and sentry-node-1).
---
Nitpick comments:
In `@node/hakeeper/rpc/backend.go`:
- Line 11: The Backend struct's log field (Backend.log) is initialized but never
used; either remove the field and its initialization in NewBackend (and any
constructor parameter) to reduce dead code, or, if observability is desired, add
logging statements in the Backend methods (e.g., all API handlers on Backend)
using the log field; update NewBackend to stop accepting/setting the unused
logger or add meaningful process-level logs (errors/info) in methods where
operations occur.
In `@ops/devnet-morph/devnet/setup_nodes.py`:
- Around line 108-110: The error message inside the if not
os.path.isdir(source_dir) check incorrectly references both source and
destination; update the print to mention only the missing source directory for
the affected node (e.g., "Error: Missing source directory for {node}.
Exiting."), keeping the existing os.path.isdir(source_dir) check and the
subsequent sys.exit(1) unchanged and leave the earlier dest_dir validation as
is.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: fafcc7ff-eadf-4c94-8beb-58cdf1e85437
⛔ Files ignored due to path filters (10)
bindings/go.sumis excluded by!**/*.sumcommon/go.sumis excluded by!**/*.sumcontracts/go.sumis excluded by!**/*.sumgo.work.sumis excluded by!**/*.sumnode/go.sumis excluded by!**/*.sumops/l2-genesis/go.sumis excluded by!**/*.sumops/tools/go.sumis excluded by!**/*.sumoracle/go.sumis excluded by!**/*.sumtoken-price-oracle/go.sumis excluded by!**/*.sumtx-submitter/go.sumis excluded by!**/*.sum
📒 Files selected for processing (83)
Makefilebindings/bindings/l1sequencer.gobindings/go.modcommon/batch/batch_cache.gocommon/go.modcontracts/contracts/l1/L1Sequencer.solcontracts/contracts/test/L1Sequencer.t.solcontracts/contracts/test/base/L1SequencerBase.t.solcontracts/deploy/022-SequencerInit.tscontracts/go.modgo-ethereumnode/blocktag/config.gonode/blocktag/service.gonode/cmd/node/main.gonode/core/executor.gonode/core/sequencers.gonode/core/sync.gonode/db/keys.gonode/db/store.gonode/derivation/batch_decode.gonode/derivation/batch_info.gonode/derivation/config.gonode/derivation/config_test.gonode/derivation/database.gonode/derivation/derivation.gonode/derivation/finalizer.gonode/derivation/metrics.gonode/derivation/reorg.gonode/derivation/static_scan_test.gonode/derivation/tag_advance.gonode/derivation/tag_advance_test.gonode/derivation/verify.gonode/derivation/verify_local.gonode/flags/flags.gonode/go.modnode/hakeeper/block_fsm.gonode/hakeeper/block_payload.gonode/hakeeper/config.gonode/hakeeper/ha.toml.examplenode/hakeeper/ha_service.gonode/hakeeper/leader_monitor.gonode/hakeeper/rpc/api.gonode/hakeeper/rpc/auth.gonode/hakeeper/rpc/auth_test.gonode/hakeeper/rpc/backend.gonode/hakeeper/rpc/client.gonode/hakeeper/rpc/server.gonode/hakeeper/rpc/types.gonode/l1sequencer/enclave_signer.gonode/l1sequencer/enclave_signer_test.gonode/l1sequencer/signer.gonode/l1sequencer/verifier.gonode/ops-morph/docker-compose-validator.ymlnode/sequencer/tm_node.gonode/sync/syncer.gonode/sync/syncer_test.gonode/types/retryable_client.gonode/types/retryable_client_test.gonode/validator/config.gonode/validator/validator.gonode/validator/validator_test.goops/devnet-morph/devnet/setup_nodes.pyops/docker-sequencer-test/Dockerfile.l2-node-testops/docker-sequencer-test/Dockerfile.tx-submitter-testops/docker-sequencer-test/check-whitelist-test.shops/docker-sequencer-test/docker-compose.ha-override.ymlops/docker-sequencer-test/docker-compose.override.ymlops/docker-sequencer-test/docker-compose.reorg-test.override.ymlops/docker-sequencer-test/run-ha-test.shops/docker-sequencer-test/run-perf-test.shops/docker-sequencer-test/run-test.shops/docker/.envops/docker/docker-compose-4nodes.ymlops/docker/docker-compose-reth.ymlops/docker/ha-nodekey0ops/docker/ha-nodekey1ops/docker/ha-nodekey2ops/docker/node5/node_key.jsonops/l2-genesis/go.modops/tools/go.modoracle/go.modtoken-price-oracle/go.modtx-submitter/go.mod
💤 Files with no reviewable changes (8)
- node/validator/config.go
- node/ops-morph/docker-compose-validator.yml
- node/derivation/batch_decode.go
- node/validator/validator_test.go
- node/validator/validator.go
- node/blocktag/service.go
- ops/docker/docker-compose-reth.yml
- node/blocktag/config.go
✅ Files skipped from review due to trivial changes (5)
- ops/docker/ha-nodekey1
- ops/docker-sequencer-test/Dockerfile.l2-node-test
- node/l1sequencer/enclave_signer_test.go
- node/hakeeper/rpc/api.go
- go-ethereum
🚧 Files skipped from review as they are similar to previous changes (61)
- ops/docker-sequencer-test/docker-compose.reorg-test.override.yml
- node/hakeeper/ha.toml.example
- node/core/sequencers.go
- ops/docker/node5/node_key.json
- contracts/deploy/022-SequencerInit.ts
- contracts/go.mod
- ops/docker/ha-nodekey2
- node/derivation/batch_info.go
- ops/l2-genesis/go.mod
- node/sync/syncer_test.go
- ops/tools/go.mod
- node/hakeeper/leader_monitor.go
- common/batch/batch_cache.go
- Makefile
- contracts/contracts/test/base/L1SequencerBase.t.sol
- node/hakeeper/block_payload.go
- ops/docker-sequencer-test/Dockerfile.tx-submitter-test
- node/derivation/verify.go
- node/derivation/metrics.go
- node/hakeeper/rpc/auth.go
- bindings/go.mod
- node/sequencer/tm_node.go
- node/core/sync.go
- node/sync/syncer.go
- ops/docker-sequencer-test/check-whitelist-test.sh
- node/derivation/database.go
- ops/docker/ha-nodekey0
- node/go.mod
- node/db/store.go
- node/hakeeper/rpc/types.go
- ops/docker-sequencer-test/docker-compose.override.yml
- node/derivation/static_scan_test.go
- node/derivation/config_test.go
- ops/docker-sequencer-test/docker-compose.ha-override.yml
- node/derivation/finalizer.go
- node/l1sequencer/signer.go
- ops/docker/.env
- node/derivation/tag_advance_test.go
- node/derivation/verify_local.go
- node/types/retryable_client.go
- node/hakeeper/rpc/server.go
- node/types/retryable_client_test.go
- node/derivation/reorg.go
- node/hakeeper/config.go
- node/flags/flags.go
- node/hakeeper/ha_service.go
- ops/docker/docker-compose-4nodes.yml
- node/hakeeper/rpc/client.go
- node/cmd/node/main.go
- node/hakeeper/block_fsm.go
- node/l1sequencer/verifier.go
- node/hakeeper/rpc/auth_test.go
- node/l1sequencer/enclave_signer.go
- contracts/contracts/test/L1Sequencer.t.sol
- node/derivation/tag_advance.go
- bindings/bindings/l1sequencer.go
- ops/docker-sequencer-test/run-ha-test.sh
- node/core/executor.go
- ops/docker-sequencer-test/run-perf-test.sh
- ops/docker-sequencer-test/run-test.sh
- contracts/contracts/l1/L1Sequencer.sol
| func (d *Derivation) deriveForce(rollupData *BatchInfo, skipNumber uint64) (*eth.Header, error) { | ||
| firstNum := rollupData.firstBlockNumber | ||
| if firstNum == 0 { | ||
| return nil, fmt.Errorf("invalid firstBlockNumber 0 for batch %d", rollupData.batchIndex) | ||
| } | ||
|
|
||
| // Anchor: parent of the first block we will WRITE must exist locally. | ||
| // scenario B (skipNumber==0): firstNum-1. | ||
| // scenario C: max(firstNum-1, skipNumber). | ||
| parentNum := firstNum - 1 | ||
| if skipNumber > parentNum { | ||
| parentNum = skipNumber | ||
| } | ||
| lastHeader, err := d.l2Client.HeaderByNumber(d.ctx, big.NewInt(int64(parentNum))) | ||
| if err != nil { | ||
| return nil, fmt.Errorf("read parent header at %d: %w", parentNum, err) | ||
| } | ||
| if lastHeader == nil { | ||
| return nil, fmt.Errorf("parent header at %d missing", parentNum) | ||
| } | ||
|
|
||
| for _, blockData := range rollupData.blockContexts { | ||
| // Skip blocks already present locally (scenario C). For scenario B | ||
| // skipNumber == 0 means this branch is never taken. | ||
| if blockData.SafeL2Data.Number <= skipNumber { | ||
| continue | ||
| } | ||
|
|
||
| // Pin the parent so SetCanonical reorgs from the local fork to the | ||
| // L1-canonical chain. NewSafeL2Block executes the block internally | ||
| // and fills the header with the resulting state/receipt roots — | ||
| // the caller only knows block contents (txs + timestamp), not the | ||
| // post-execution roots, so this is the right API for the rewrite. | ||
| parentHash := lastHeader.Hash() | ||
| safeData := *blockData.SafeL2Data | ||
| safeData.ParentHash = &parentHash | ||
|
|
||
| err = func() error { | ||
| ctx, cancel := context.WithTimeout(context.Background(), time.Duration(60)*time.Second) | ||
| defer cancel() | ||
| next, err := d.l2Client.NewSafeL2Block(ctx, &safeData) | ||
| if err != nil { | ||
| d.logger.Error("NewSafeL2Block failed", | ||
| "batchIndex", rollupData.batchIndex, | ||
| "blockNumber", safeData.Number, | ||
| "parent", parentHash.Hex(), | ||
| "error", err, | ||
| ) | ||
| return err | ||
| } | ||
| if next == nil { | ||
| return fmt.Errorf("header at %d missing after NewSafeL2Block", safeData.Number) | ||
| } | ||
| lastHeader = next | ||
| return nil | ||
| }() | ||
| if err != nil { | ||
| return nil, fmt.Errorf("apply block %d: %w", safeData.Number, err) | ||
| } | ||
|
|
||
| d.logger.Info("block written via NewSafeL2Block", | ||
| "batchIndex", rollupData.batchIndex, | ||
| "blockNumber", safeData.Number, | ||
| "hash", lastHeader.Hash().Hex(), | ||
| ) | ||
| } | ||
| return lastHeader, nil |
There was a problem hiding this comment.
Clamp the fill-gap skip point to the batch tip.
If scenario C re-reads localLatest after P2P has already caught up past this batch, skipNumber can be greater than rollupData.lastBlockNumber. In that case this loop writes nothing and returns the anchor header at skipNumber, so the later root check and advanceSafe run against the wrong block.
💡 Suggested fix
func (d *Derivation) deriveForce(rollupData *BatchInfo, skipNumber uint64) (*eth.Header, error) {
firstNum := rollupData.firstBlockNumber
if firstNum == 0 {
return nil, fmt.Errorf("invalid firstBlockNumber 0 for batch %d", rollupData.batchIndex)
}
+
+ if skipNumber >= rollupData.lastBlockNumber {
+ lastHeader, err := d.l2Client.HeaderByNumber(d.ctx, big.NewInt(int64(rollupData.lastBlockNumber)))
+ if err != nil {
+ return nil, fmt.Errorf("read batch tip at %d: %w", rollupData.lastBlockNumber, err)
+ }
+ if lastHeader == nil {
+ return nil, fmt.Errorf("batch tip at %d missing", rollupData.lastBlockNumber)
+ }
+ return lastHeader, nil
+ }
// Anchor: parent of the first block we will WRITE must exist locally.
// scenario B (skipNumber==0): firstNum-1.
// scenario C: max(firstNum-1, skipNumber).📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| func (d *Derivation) deriveForce(rollupData *BatchInfo, skipNumber uint64) (*eth.Header, error) { | |
| firstNum := rollupData.firstBlockNumber | |
| if firstNum == 0 { | |
| return nil, fmt.Errorf("invalid firstBlockNumber 0 for batch %d", rollupData.batchIndex) | |
| } | |
| // Anchor: parent of the first block we will WRITE must exist locally. | |
| // scenario B (skipNumber==0): firstNum-1. | |
| // scenario C: max(firstNum-1, skipNumber). | |
| parentNum := firstNum - 1 | |
| if skipNumber > parentNum { | |
| parentNum = skipNumber | |
| } | |
| lastHeader, err := d.l2Client.HeaderByNumber(d.ctx, big.NewInt(int64(parentNum))) | |
| if err != nil { | |
| return nil, fmt.Errorf("read parent header at %d: %w", parentNum, err) | |
| } | |
| if lastHeader == nil { | |
| return nil, fmt.Errorf("parent header at %d missing", parentNum) | |
| } | |
| for _, blockData := range rollupData.blockContexts { | |
| // Skip blocks already present locally (scenario C). For scenario B | |
| // skipNumber == 0 means this branch is never taken. | |
| if blockData.SafeL2Data.Number <= skipNumber { | |
| continue | |
| } | |
| // Pin the parent so SetCanonical reorgs from the local fork to the | |
| // L1-canonical chain. NewSafeL2Block executes the block internally | |
| // and fills the header with the resulting state/receipt roots — | |
| // the caller only knows block contents (txs + timestamp), not the | |
| // post-execution roots, so this is the right API for the rewrite. | |
| parentHash := lastHeader.Hash() | |
| safeData := *blockData.SafeL2Data | |
| safeData.ParentHash = &parentHash | |
| err = func() error { | |
| ctx, cancel := context.WithTimeout(context.Background(), time.Duration(60)*time.Second) | |
| defer cancel() | |
| next, err := d.l2Client.NewSafeL2Block(ctx, &safeData) | |
| if err != nil { | |
| d.logger.Error("NewSafeL2Block failed", | |
| "batchIndex", rollupData.batchIndex, | |
| "blockNumber", safeData.Number, | |
| "parent", parentHash.Hex(), | |
| "error", err, | |
| ) | |
| return err | |
| } | |
| if next == nil { | |
| return fmt.Errorf("header at %d missing after NewSafeL2Block", safeData.Number) | |
| } | |
| lastHeader = next | |
| return nil | |
| }() | |
| if err != nil { | |
| return nil, fmt.Errorf("apply block %d: %w", safeData.Number, err) | |
| } | |
| d.logger.Info("block written via NewSafeL2Block", | |
| "batchIndex", rollupData.batchIndex, | |
| "blockNumber", safeData.Number, | |
| "hash", lastHeader.Hash().Hex(), | |
| ) | |
| } | |
| return lastHeader, nil | |
| func (d *Derivation) deriveForce(rollupData *BatchInfo, skipNumber uint64) (*eth.Header, error) { | |
| firstNum := rollupData.firstBlockNumber | |
| if firstNum == 0 { | |
| return nil, fmt.Errorf("invalid firstBlockNumber 0 for batch %d", rollupData.batchIndex) | |
| } | |
| if skipNumber >= rollupData.lastBlockNumber { | |
| lastHeader, err := d.l2Client.HeaderByNumber(d.ctx, big.NewInt(int64(rollupData.lastBlockNumber))) | |
| if err != nil { | |
| return nil, fmt.Errorf("read batch tip at %d: %w", rollupData.lastBlockNumber, err) | |
| } | |
| if lastHeader == nil { | |
| return nil, fmt.Errorf("batch tip at %d missing", rollupData.lastBlockNumber) | |
| } | |
| return lastHeader, nil | |
| } | |
| // Anchor: parent of the first block we will WRITE must exist locally. | |
| // scenario B (skipNumber==0): firstNum-1. | |
| // scenario C: max(firstNum-1, skipNumber). | |
| parentNum := firstNum - 1 | |
| if skipNumber > parentNum { | |
| parentNum = skipNumber | |
| } | |
| lastHeader, err := d.l2Client.HeaderByNumber(d.ctx, big.NewInt(int64(parentNum))) | |
| if err != nil { | |
| return nil, fmt.Errorf("read parent header at %d: %w", parentNum, err) | |
| } | |
| if lastHeader == nil { | |
| return nil, fmt.Errorf("parent header at %d missing", parentNum) | |
| } | |
| for _, blockData := range rollupData.blockContexts { | |
| // Skip blocks already present locally (scenario C). For scenario B | |
| // skipNumber == 0 means this branch is never taken. | |
| if blockData.SafeL2Data.Number <= skipNumber { | |
| continue | |
| } | |
| // Pin the parent so SetCanonical reorgs from the local fork to the | |
| // L1-canonical chain. NewSafeL2Block executes the block internally | |
| // and fills the header with the resulting state/receipt roots — | |
| // the caller only knows block contents (txs + timestamp), not the | |
| // post-execution roots, so this is the right API for the rewrite. | |
| parentHash := lastHeader.Hash() | |
| safeData := *blockData.SafeL2Data | |
| safeData.ParentHash = &parentHash | |
| err = func() error { | |
| ctx, cancel := context.WithTimeout(context.Background(), time.Duration(60)*time.Second) | |
| defer cancel() | |
| next, err := d.l2Client.NewSafeL2Block(ctx, &safeData) | |
| if err != nil { | |
| d.logger.Error("NewSafeL2Block failed", | |
| "batchIndex", rollupData.batchIndex, | |
| "blockNumber", safeData.Number, | |
| "parent", parentHash.Hex(), | |
| "error", err, | |
| ) | |
| return err | |
| } | |
| if next == nil { | |
| return fmt.Errorf("header at %d missing after NewSafeL2Block", safeData.Number) | |
| } | |
| lastHeader = next | |
| return nil | |
| }() | |
| if err != nil { | |
| return nil, fmt.Errorf("apply block %d: %w", safeData.Number, err) | |
| } | |
| d.logger.Info("block written via NewSafeL2Block", | |
| "batchIndex", rollupData.batchIndex, | |
| "blockNumber", safeData.Number, | |
| "hash", lastHeader.Hash().Hex(), | |
| ) | |
| } | |
| return lastHeader, nil | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@node/derivation/derivation.go` around lines 859 - 925, In deriveForce, clamp
skipNumber to not exceed the batch tip by setting skipNumber = min(skipNumber,
rollupData.lastBlockNumber) (or equivalent) before computing parentNum and
entering the block loop; this ensures the loop will not skip all blocks and
return an anchor at skipNumber when the batch contains later blocks—update
deriveForce to perform this clamp using the rollupData.lastBlockNumber and then
proceed with existing parentNum, HeaderByNumber, and NewSafeL2Block logic.
| func (api *APIBackend) Leader(ctx context.Context) (bool, error) { | ||
| return api.cons.Leader(), nil | ||
| } | ||
|
|
||
| func (api *APIBackend) LeaderWithID(ctx context.Context) (*ServerInfo, error) { | ||
| return api.cons.LeaderWithID(), nil | ||
| } | ||
|
|
||
| func (api *APIBackend) AddServerAsVoter(ctx context.Context, id string, addr string, version uint64) error { | ||
| return api.cons.AddVoter(id, addr, version) | ||
| } | ||
|
|
||
| func (api *APIBackend) AddServerAsNonvoter(ctx context.Context, id string, addr string, version uint64) error { | ||
| return api.cons.AddNonVoter(id, addr, version) | ||
| } | ||
|
|
||
| func (api *APIBackend) RemoveServer(ctx context.Context, id string, version uint64) error { | ||
| return api.cons.RemoveServer(id, version) | ||
| } | ||
|
|
||
| func (api *APIBackend) TransferLeader(ctx context.Context) error { | ||
| return api.cons.TransferLeader() | ||
| } | ||
|
|
||
| func (api *APIBackend) TransferLeaderToServer(ctx context.Context, id string, addr string) error { | ||
| return api.cons.TransferLeaderTo(id, addr) | ||
| } | ||
|
|
||
| func (api *APIBackend) ClusterMembership(ctx context.Context) (*ClusterMembership, error) { | ||
| return api.cons.ClusterMembership() | ||
| } |
There was a problem hiding this comment.
Context is systematically ignored across all RPC methods.
All API methods accept context.Context but none propagate it to the underlying ConsensusAdapter calls. For cluster management operations (adding/removing servers, leader transfers), which can block during network partitions or leadership elections, ignoring context prevents:
- RPC clients from canceling slow operations
- Timeout enforcement
- Graceful handling of client disconnections
If ConsensusAdapter methods are synchronous and non-cancellable by design, consider documenting this constraint. Otherwise, extend the ConsensusAdapter interface to accept context parameters.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@node/hakeeper/rpc/backend.go` around lines 22 - 52, The RPC methods (Leader,
LeaderWithID, AddServerAsVoter, AddServerAsNonvoter, RemoveServer,
TransferLeader, TransferLeaderToServer, ClusterMembership) ignore the provided
context.Context and do not propagate cancellation/timeouts to the underlying
ConsensusAdapter calls; either update the ConsensusAdapter interface methods
(e.g., Leader(), LeaderWithID(), AddVoter(), AddNonVoter(), RemoveServer(),
TransferLeader(), TransferLeaderTo()) to accept context.Context and change the
APIBackend methods to pass ctx through, or if the consensus methods are
intentionally non-cancellable, document that constraint on the ConsensusAdapter
and APIBackend methods and add a comment explaining why ctx is unused; ensure
you reference the exact adapter methods
(AddVoter/AddNonVoter/RemoveServer/TransferLeader/TransferLeaderTo/ClusterMembership/Leader/LeaderWithID)
when making the change.
GenesisBatchHeader produced a v0 (249-byte) header, while every
subsequent commitBatch on a fresh chain went through v1/v2 (257-byte)
codec. Tooling and audit had to special-case batchIndex 0 as a v0
outlier; the storage layout was inconsistent without payoff.
Bump genesis to v2:
- genesisBatchVersion 0 → 2
- genesisBatchHeaderLength 249 → 257
- write lastBlockNumber (=genesisHeader.Number, 0 for fresh chains)
at offset 249
Rollup.sol's importGenesisBatch is already version-agnostic at the
loader (`_loadBatchHeader` dispatches v1/v2 to BatchHeaderCodecV1
which validates the 257-byte length); field reads use V0 codec
offsets that match across v0/v1/v2 for the leading 249 bytes; the
ZERO_VERSIONED_HASH check on blobVersionedHash holds for v2's
"no blob attached" sentinel. No Solidity changes required.
Mainnet immutability note: chains already imported with a v0 genesis
keep their v0 committedBatches[0] forever — this commit only changes
genesis emission for fresh chains (devnet bootstrap, future mainnets).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
4e0de0d to
36e9ddb
Compare
Co-authored-by: allen.wu <allen.wu@bitget.com>
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
node/sync/config.go (1)
52-57:⚠️ Potential issue | 🟠 Major | ⚡ Quick winFix
SyncDepositContractAddrvalidation to reject the zero address
len(c.L1MessageQueueAddress.Bytes()) == 0can never fire becausecommon.Address.Bytes()always returns a 20-byte slice, so the config allows0x000…000to pass andNewSyncerwill proceed with the wrong contract.Suggested fix
if ctx.GlobalIsSet(flags.SyncDepositContractAddr.Name) { addr := common.HexToAddress(ctx.GlobalString(flags.SyncDepositContractAddr.Name)) c.L1MessageQueueAddress = &addr - if len(c.L1MessageQueueAddress.Bytes()) == 0 { + if addr == (common.Address{}) { return errors.New("invalid SyncDepositContractAddr") } } else if ctx.GlobalBool(flags.MainnetFlag.Name) {🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@node/sync/config.go` around lines 52 - 57, The validation currently checks len(c.L1MessageQueueAddress.Bytes()) which is always 20 and thus accepts the zero address; change the check to explicitly detect the zero address by comparing the parsed addr to an empty common.Address (e.g. addr == common.Address{}), and return the error from the same block where you set c.L1MessageQueueAddress (the ctx.GlobalIsSet / common.HexToAddress / c.L1MessageQueueAddress assignment). Ensure NewSyncer then never receives a zero address.
🧹 Nitpick comments (1)
token-price-oracle/go.mod (1)
5-5: ⚡ Quick winConsider aligning the require directive with the replace directive.
The replace directive (line 5) points to commit
e0a2cd340919from 2026-06-03, while the require directive (line 13) specifies versionv1.10.14-0.20251219060125-03910bc750a2from 2025-12-19. This ~6-month version drift could cause confusion, as the require directive documents an older version than what's actually used.While the replace directive takes precedence (so the newer code will be used), aligning both directives would improve clarity and prevent potential confusion during dependency analysis.
Note:
common/go.modexhibits the same pattern, suggesting this may be intentional across the repository.📦 Verify alignment across all go.mod files
Run the following script to check if this pattern is consistent across the repository:
#!/bin/bash # Description: Check go-ethereum version alignment across all go.mod files echo "Checking replace and require directives for go-ethereum across all go.mod files:" echo "" fd -t f '^go\.mod$' -x sh -c 'echo "=== {} ==="; grep -A1 "morph-l2/go-ethereum" {} | grep -E "(replace|require|=>)" | head -4; echo ""'Also applies to: 13-13
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@token-price-oracle/go.mod` at line 5, The require directive for github.com/morph-l2/go-ethereum is out of sync with the replace directive: update the require entry to match the replace pseudo-version (v0.0.0-20260603075727-e0a2cd340919) so both directives reference the same commit (e0a2cd340919) and remove the potential version-drift confusion when resolving dependencies.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@node/sync/config.go`:
- Around line 58-63: The flags handling for Mainnet and Hoodi currently uses an
else-if chain (checking ctx.GlobalBool(flags.MainnetFlag.Name) then
ctx.GlobalBool(flags.HoodiFlag.Name)) so if both are set Mainnet silently wins;
change the logic in the same block to explicitly detect when both
ctx.GlobalBool(flags.MainnetFlag.Name) and ctx.GlobalBool(flags.HoodiFlag.Name)
are true and return an error (or exit) instead of proceeding, otherwise set
c.L1MessageQueueAddress to the appropriate address
(types.MainnetSyncDepositContractAddress or
types.HoodiSyncDepositContractAddress) when only one is set; reference the
existing symbols ctx.GlobalBool, flags.MainnetFlag.Name, flags.HoodiFlag.Name
and c.L1MessageQueueAddress to locate and update the logic.
In `@node/types/networks.go`:
- Around line 13-18: MainnetL1SequencerContractAddress and
HoodiL1SequencerContractAddress are set to common.HexToAddress("") which yields
the zero address and causes the required-contract check to fail when presets
(--mainnet/--hoodi) are used without an explicit --l1.sequencerContract; fix by
replacing those zero-value constants with the correct deployed L1 sequencer
contract addresses (populate MainnetL1SequencerContractAddress and
HoodiL1SequencerContractAddress with their real hex strings) or alternatively
change the preset initialization code that uses these constants (e.g., where the
preset for l1.sequencerContract is applied in the startup logic) to always
supply a non-zero default when the preset is selected.
---
Outside diff comments:
In `@node/sync/config.go`:
- Around line 52-57: The validation currently checks
len(c.L1MessageQueueAddress.Bytes()) which is always 20 and thus accepts the
zero address; change the check to explicitly detect the zero address by
comparing the parsed addr to an empty common.Address (e.g. addr ==
common.Address{}), and return the error from the same block where you set
c.L1MessageQueueAddress (the ctx.GlobalIsSet / common.HexToAddress /
c.L1MessageQueueAddress assignment). Ensure NewSyncer then never receives a zero
address.
---
Nitpick comments:
In `@token-price-oracle/go.mod`:
- Line 5: The require directive for github.com/morph-l2/go-ethereum is out of
sync with the replace directive: update the require entry to match the replace
pseudo-version (v0.0.0-20260603075727-e0a2cd340919) so both directives reference
the same commit (e0a2cd340919) and remove the potential version-drift confusion
when resolving dependencies.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 51dc018b-025f-4cd8-9042-94f44bf86dbb
⛔ Files ignored due to path filters (9)
bindings/go.sumis excluded by!**/*.sumcommon/go.sumis excluded by!**/*.sumcontracts/go.sumis excluded by!**/*.sumnode/go.sumis excluded by!**/*.sumops/l2-genesis/go.sumis excluded by!**/*.sumops/tools/go.sumis excluded by!**/*.sumoracle/go.sumis excluded by!**/*.sumtoken-price-oracle/go.sumis excluded by!**/*.sumtx-submitter/go.sumis excluded by!**/*.sum
📒 Files selected for processing (17)
bindings/go.modcommon/go.modcontracts/go.modnode/cmd/node/main.gonode/core/config.gonode/derivation/config.gonode/derivation/derivation.gonode/flags/flags.gonode/go.modnode/sync/config.gonode/types/networks.goops/l2-genesis/go.modops/l2-genesis/morph-chain-ops/genesis/batch.goops/tools/go.modoracle/go.modtoken-price-oracle/go.modtx-submitter/go.mod
💤 Files with no reviewable changes (1)
- node/core/config.go
✅ Files skipped from review due to trivial changes (1)
- oracle/go.mod
🚧 Files skipped from review as they are similar to previous changes (10)
- ops/tools/go.mod
- contracts/go.mod
- common/go.mod
- ops/l2-genesis/go.mod
- bindings/go.mod
- tx-submitter/go.mod
- node/derivation/config.go
- node/go.mod
- node/derivation/derivation.go
- node/cmd/node/main.go
| } else if ctx.GlobalBool(flags.MainnetFlag.Name) { | ||
| addr := types.MainnetSyncDepositContractAddress | ||
| c.L1MessageQueueAddress = &addr | ||
| } else if ctx.GlobalBool(flags.HoodiFlag.Name) { | ||
| addr := types.HoodiSyncDepositContractAddress | ||
| c.L1MessageQueueAddress = &addr |
There was a problem hiding this comment.
Reject --mainnet and --hoodi together.
This else if chain silently makes --mainnet win when both presets are set. That's a bad failure mode for env-driven deployments: the process looks healthy but syncs against the wrong L1 queue contract.
Suggested fix
+ if ctx.GlobalBool(flags.MainnetFlag.Name) && ctx.GlobalBool(flags.HoodiFlag.Name) {
+ return errors.New("mainnet and hoodi are mutually exclusive")
+ }
+
if ctx.GlobalIsSet(flags.SyncDepositContractAddr.Name) {
addr := common.HexToAddress(ctx.GlobalString(flags.SyncDepositContractAddr.Name))
c.L1MessageQueueAddress = &addr
if addr == (common.Address{}) {
return errors.New("invalid SyncDepositContractAddr")🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@node/sync/config.go` around lines 58 - 63, The flags handling for Mainnet and
Hoodi currently uses an else-if chain (checking
ctx.GlobalBool(flags.MainnetFlag.Name) then
ctx.GlobalBool(flags.HoodiFlag.Name)) so if both are set Mainnet silently wins;
change the logic in the same block to explicitly detect when both
ctx.GlobalBool(flags.MainnetFlag.Name) and ctx.GlobalBool(flags.HoodiFlag.Name)
are true and return an error (or exit) instead of proceeding, otherwise set
c.L1MessageQueueAddress to the appropriate address
(types.MainnetSyncDepositContractAddress or
types.HoodiSyncDepositContractAddress) when only one is set; reference the
existing symbols ctx.GlobalBool, flags.MainnetFlag.Name, flags.HoodiFlag.Name
and c.L1MessageQueueAddress to locate and update the logic.
| MainnetL1SequencerContractAddress = common.HexToAddress("") | ||
|
|
||
| // L1 Hoodi Contract Addresses | ||
| HoodiRollupContractAddress = common.HexToAddress("0x57e0e6dde89dc52c01fe785774271504b1e04664") | ||
| HoodiSyncDepositContractAddress = common.HexToAddress("0xd7f39d837f4790b215ba67e0ab63665912648dbe") | ||
| HoodiL1SequencerContractAddress = common.HexToAddress("") |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify whether the zero-address defaults are wired into runtime config.
rg -n -C2 'MainnetL1SequencerContractAddress|HoodiL1SequencerContractAddress|l1\.sequencerContract|L1 Sequencer contract address is required'Repository: morph-l2/morph
Length of output: 2108
Fix zero-value L1 sequencer contract defaults (breaks --mainnet/--hoodi).
node/types/networks.go sets MainnetL1SequencerContractAddress and HoodiL1SequencerContractAddress to common.HexToAddress("") (zero). In node/cmd/node/main.go, when --mainnet/--hoodi are selected and --l1.sequencerContract isn’t explicitly provided, the zero address flows into the required-contract check and startup fails with L1 Sequencer contract address is required, check l1.sequencerContract configuration. Populate these constants with real contract addresses or ensure the presets always set --l1.sequencerContract.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@node/types/networks.go` around lines 13 - 18,
MainnetL1SequencerContractAddress and HoodiL1SequencerContractAddress are set to
common.HexToAddress("") which yields the zero address and causes the
required-contract check to fail when presets (--mainnet/--hoodi) are used
without an explicit --l1.sequencerContract; fix by replacing those zero-value
constants with the correct deployed L1 sequencer contract addresses (populate
MainnetL1SequencerContractAddress and HoodiL1SequencerContractAddress with their
real hex strings) or alternatively change the preset initialization code that
uses these constants (e.g., where the preset for l1.sequencerContract is applied
in the startup logic) to always supply a non-zero default when the preset is
selected.
Summary by CodeRabbit
New Features
Tests