This repository was archived by the owner on Dec 5, 2021. It is now read-only.
[pull] develop from ethereum-optimism:develop #573
Open
pull[bot] wants to merge 10000 commits into
…ctivation block (#20716) * fix(op-supernode): guard executing-message verifier against interop activation block verifyExecutingMessage previously checked timestamp ordering and expiry only, never consulting i.activationTimestamp. Both canonical fault-proof verifiers reject executing messages whose executing or initiating block is pre-activation or in the activation block of the relevant chain — kona's MessageGraph::check_single_dependency (rust/kona/crates/protocol/interop/src/ graph.rs) and op-program's cross.CrossUnsafeHazards → depset.LinkChecker. CanExecute (op-supervisor/supervisor/backend/depset/links.go, imported as library code) — so supernode diverges from the FP path at the activation boundary and accepts blocks the FP would replace with deposits-only. Adds the symmetric pair of guards inside verifyExecutingMessage, matching kona PR #20550's scope. Refs issue #20684. Adds six ActivationBoundary/* table rows under TestVerifyInteropMessages (four guard-firing, two positive controls) that fail on the un-guarded verifier and pass once the guards are in place. * fix(op-supernode): shorten activation-guard comments Address review: drop file/line refs from the activation-invariant comment in verifyExecutingMessage and from the boundary-test block in algo_test.go. * fix(op-supernode): drop PR-description reference from test comment * fix(op-supernode): document new activation-boundary checks in verifyExecutingMessage --------- Co-authored-by: wwared <541936+wwared@users.noreply.github.com>
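The symmetric pair of guards described above can be sketched roughly as follows. This is a minimal illustration, not the actual `verifyExecutingMessage` code: the function names, the sentinel error, and the idea of passing each chain's activation-block timestamp directly are all assumptions made for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrBeforeActivation is a hypothetical sentinel used only in this sketch.
var ErrBeforeActivation = errors.New("block is at or before interop activation")

// checkActivation rejects a block that is pre-activation or is the activation
// block itself. It assumes the caller knows the timestamp of the chain's
// activation block (an assumption of this sketch).
func checkActivation(side string, blockTime, activationBlockTime uint64) error {
	if blockTime <= activationBlockTime {
		return fmt.Errorf("%s block time %d, activation block time %d: %w",
			side, blockTime, activationBlockTime, ErrBeforeActivation)
	}
	return nil
}

// verifyActivationBoundary applies the same guard to the executing side and
// to the initiating side, each against its own chain's activation block.
func verifyActivationBoundary(execTime, execActivation, initTime, initActivation uint64) error {
	if err := checkActivation("executing", execTime, execActivation); err != nil {
		return err
	}
	return checkActivation("initiating", initTime, initActivation)
}

func main() {
	fmt.Println(verifyActivationBoundary(104, 100, 102, 100)) // both sides strictly after activation
	fmt.Println(verifyActivationBoundary(100, 100, 102, 100)) // executing block is the activation block
}
```

Keeping the two checks symmetric means neither side of a message can reference a block at or before its chain's activation, which is the property the commit says the fault-proof verifiers enforce.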
…P-7904) (#20689) Split the single gas constant in each FPVM-accelerated precompile into two named values: the L2 gas charge (unchanged) and the oracle/L1 staticcall gas required by EIP-7904, sent in the L1Precompile preimage hint. Without this fix, once Glamsterdam activates on L1, the `loadPrecompilePreimagePart` staticcall would silently OOG for KZG, BLS12-381 G1Add, G2Add, and bn256 pairing because the requiredGas embedded in the preimage key would be below the post-7904 L1 cost. Higher-than-needed oracle gas is always safe, so this can deploy proactively. L2 gas charging (`EthPrecompileOutput::gas_used`) and the OOG guards continue to use the existing values, preserving state-root agreement with op-reth pre-revm-bump. Affected (EIP-7904 L1 cost → embedded in oracle hint): - KZG point eval (0x0a): 89_363 - BLS12-381 G1Add (0x0b): 643 - BLS12-381 G2Add (0x0d): 765 - bn256 pairing (0x08): 45_000 + 34_103 × k Mirrors #19381 for kona-proof. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
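As a worked example of the bn256 pairing figure listed above (45_000 + 34_103 × k), here is a small sketch in Go. The constant and function names are made up for illustration, and the real change lives in the Rust precompile code. The only outside fact used is that the pairing precompile takes 192 bytes of input per pair, so k is the input length divided by 192.

```go
package main

import "fmt"

// Oracle-side (L1 staticcall) gas for bn256 pairing, as quoted in the commit
// message. The names are hypothetical; the L2 gas charge is a separate value.
const (
	pairingOracleBaseGas    uint64 = 45_000
	pairingOraclePerPairGas uint64 = 34_103
	pairingInputPairSize           = 192 // bytes per (G1, G2) pair
)

// pairingOracleGas returns the gas embedded in the oracle hint for an input
// of inputLen bytes, where k = inputLen / 192 is the number of pairs.
func pairingOracleGas(inputLen int) uint64 {
	k := uint64(inputLen / pairingInputPairSize)
	return pairingOracleBaseGas + pairingOraclePerPairGas*k
}

func main() {
	for _, pairs := range []int{0, 1, 2} {
		fmt.Println(pairs, pairingOracleGas(pairs*pairingInputPairSize))
	}
}
```

For two pairs this gives 45_000 + 2 × 34_103 = 113_206.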
gives clearer semantics and prevents, no change in behaviour other than logging
* feat(op-builder): vendor op-rbuilder:op-builder/v0.2.13 * feat(rollup-boost): vendor rollup-boost:rollup-boost/v0.7.11 * _ * fix path * ci(rust): build op-rbuilder and rollup-boost as vendored dirs, not submodules Adds rust-build-vendored job that hashes the directory tree via git ls-tree instead of reading a submodule gitlink SHA, and skips the git submodule update --init step since the code is checked in. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * ci(rust): fix op-rbuilder/rollup-boost binary paths The rust-build-vendored job saves binaries flat into .circleci-cache/rust-binaries/, so the env vars should not include the spurious rust/ subdirectory prefix. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> --------- Co-authored-by: Einar Rasmussen <einar@oplabs.com> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
…ate (#20734) * chore(op-reth/proofs): consistent `trie::*` log targets across the crate * align prune logging
#19758)

* fix(op-node,op-batcher): fix maxSafeLag stall, resume, and channel timeout

The `--sequencer.max-safe-lag` feature was effectively non-functional due to a state-management bug in the sequencer, compounded by a missing timeout re-check in the batcher.

## Root Cause (op-node)

In `onForkchoiceUpdate`, the maxSafeLag check ran **before** the head-advancement block. When a new block was confirmed (`UnsafeL2Head > latestHead`), the head-advancement block unconditionally set `nextActionOK = true`, immediately overriding the `nextActionOK = false` set by the maxSafeLag check. The sequencer continued producing blocks regardless of the safe-head lag.

## Fix (op-node)

Move the maxSafeLag check to **after** the head-advancement block so it can properly override `nextActionOK = true`. Additionally:

- **Stall/resume tracking**: Introduce a `stalledByMaxSafeLag` flag so the recovery branch only resumes sequencing when the stall was caused by maxSafeLag specifically. Without this, the resume logic would interfere with other `nextActionOK = false` states (pipeline reset, L1-derivation backoff, mid-seal wait).
- **Safe head catch-up recovery**: When the safe head catches up (gap drops below maxSafeLag), the sequencer automatically resumes via the `else if d.stalledByMaxSafeLag` branch. Uses `d.active.Load()` to match the resume pattern used throughout the file.
- **Runtime disable**: If an operator sets `maxSafeLag = 0` while the sequencer is stalled, the outer `else if d.stalledByMaxSafeLag` branch detects this and resumes immediately, preventing a permanent stall.
- **Lifecycle cleanup**: `stalledByMaxSafeLag` is cleared in `forceStart`, `onReset`, and `Stop` to prevent stale state across lifecycle transitions.

## Root Cause (op-batcher)

When the sequencer stalls and stops producing blocks, the batcher quickly consumes all pending blocks into a channel (`pendingBlocks = 0`). In `getReadyChannel`, the `pendingBlocks() == 0` early-return path skips `registerL1Block()`, the only call site that checks channel duration timeout. The channel never times out, never closes, and the data is never submitted to L1, leaving the safe head permanently stuck.

## Fix (op-batcher)

When `pendingBlocks == 0` but a non-full channel exists, still call `registerL1Block()` to re-evaluate the channel duration timeout. If the channel times out, flush it immediately. Respects the `ignoreMaxChannelDuration` flag for consistency with the normal path.

## Scenarios Covered

| Scenario | Behavior |
|----------|----------|
| Gap exceeds maxSafeLag | Sequencer stalls, stops producing blocks |
| Safe head catches up | Sequencer auto-resumes |
| Batcher running, sequencer stalled | Channel times out via batcher fix, data submitted, safe head advances |
| Batcher restarted during stall | New channel created, data submitted quickly |
| maxSafeLag disabled at runtime | Sequencer resumes immediately |
| Pipeline reset during stall | Flag cleared, reset proceeds normally |
| Conductor failover | forceStart clears flag, new leader starts clean |

Fixes #17936

* refactor(op-batcher): move channel timeout check before pendingBlocks early return

* test(op-acceptance-tests): add maxSafeLag stall/resume acceptance test

Adds an acceptance test that verifies sequencer.max-safe-lag behavior:
- sequencer stalls when unsafe/safe gap exceeds maxSafeLag
- sequencer auto-resumes once safe head catches up

Supporting devstack additions:
- L2CLSequencerMaxSafeLag option to configure max-safe-lag via WithGlobalL2CLOption (mirrors existing L2CLSequencer/L2CLIndexing)
- NewMinimalNoFaultProofs preset (and underlying sysgo runtime variant) that skips the proposer and challenger, following the existing NewMinimalWithConductors pattern. This avoids requiring cannon prestate artifacts in local test runs for tests that only exercise the sequencer + batcher + derivation loop.

* chore: trigger ci

* refactor(op-batcher): rename toBeAddedBlocks to havePendingBlocks

Address review feedback: clearer name for the boolean tracking whether pendingBlocks() > 0 in getReadyChannel.

* test(op-acceptance-tests): skip max-safe-lag test on kona-node

The max-safe-lag stall/resume logic lives in op-node's Go sequencer. kona-node has its own sequencer implementation that is out of scope for this regression test, so skip the test in the kona-node CI matrix variant. Also reformat the godoc list to satisfy goimports.
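The ordering fix in the op-node sequencer described above can be illustrated with a stripped-down model. This is a sketch under assumptions, not the real `onForkchoiceUpdate`: the struct, its fields, and the exact comparison at the boundary (stall when the gap is greater than or equal to maxSafeLag) are simplifications chosen for the example, and the `d.active.Load()` check is left out.

```go
package main

import "fmt"

// sequencer is a hypothetical, reduced stand-in for the op-node sequencer state.
type sequencer struct {
	latestHead          uint64
	nextActionOK        bool
	stalledByMaxSafeLag bool
}

// onForkchoiceUpdate runs head advancement first and the maxSafeLag check
// second, so the lag check has the final say over nextActionOK.
func (s *sequencer) onForkchoiceUpdate(unsafeHead, safeHead, maxSafeLag uint64) {
	// 1. Head advancement: a newly confirmed block re-enables sequencing.
	if unsafeHead > s.latestHead {
		s.latestHead = unsafeHead
		s.nextActionOK = true
	}
	// 2. maxSafeLag check, now after step 1 so it can override it.
	if maxSafeLag > 0 && safeHead+maxSafeLag <= unsafeHead {
		s.nextActionOK = false
		s.stalledByMaxSafeLag = true
	} else if s.stalledByMaxSafeLag {
		// Resume only if this flag caused the stall. This branch also covers
		// maxSafeLag being set to 0 at runtime while stalled.
		s.stalledByMaxSafeLag = false
		s.nextActionOK = true
	}
}

func main() {
	s := &sequencer{}
	s.onForkchoiceUpdate(10, 0, 5)
	fmt.Println(s.nextActionOK, s.stalledByMaxSafeLag) // stalled by lag
	s.onForkchoiceUpdate(10, 8, 5)
	fmt.Println(s.nextActionOK, s.stalledByMaxSafeLag) // safe head caught up
}
```

With the two steps in the original order, step 1 would overwrite the `false` set by the lag check whenever a new block arrived, which is the bug the commit describes.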
* refactor(op-core/nuts): write fork_lock.toml entries chronologically Iterate forks.All when encoding so adding a new fork doesn't reshuffle existing entries. Map iteration order was alphabetical, which would push new forks above older ones in the file. * feat(op-core/nuts): commit Interop NUT bundle and embed Captures the forge-script output as the canonical Interop bundle and exposes it to op-node/kona consumers. --------- Co-authored-by: maurelian <maurelian@protonmail.com>
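The chronological-ordering change in the first commit above amounts to driving the encoder from an ordered list rather than from map keys. A minimal sketch, assuming a hypothetical `allForks` slice standing in for `forks.All` and a made-up one-key entry layout (the real `fork_lock.toml` schema is not shown in this conversation):

```go
package main

import (
	"fmt"
	"strings"
)

// allForks is a hypothetical chronological list standing in for forks.All.
var allForks = []string{"bedrock", "canyon", "delta", "ecotone"}

// encodeForkLock writes one entry per known fork in chronological order,
// skipping forks with no bundle. The "hash" key is an assumption for
// illustration only.
func encodeForkLock(hashes map[string]string) string {
	var b strings.Builder
	for _, fork := range allForks {
		h, ok := hashes[fork]
		if !ok {
			continue
		}
		fmt.Fprintf(&b, "[%s]\nhash = %q\n", fork, h)
	}
	return b.String()
}

func main() {
	fmt.Print(encodeForkLock(map[string]string{"ecotone": "0x02", "canyon": "0x01"}))
}
```

Because the output order follows the slice, appending a new fork to the list appends its entry to the file instead of reshuffling older ones.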
…ers (#20652) The preset's initial CrossSafe match for ELSync-mode verifiers was a fixed 120-attempt poll (240s). Under CI resource contention, op-geth's beacon-driven EL sync sometimes does not complete within that budget — the verifier's unsafe head keeps advancing via CL gossip every 2s while the safe head stays at 0 because the EL is still snap-syncing historical blocks. The whole budget then burns and the test fails at setup. Replace the fixed-attempt poll with a progress-aware wait: keep polling for up to 8 minutes, but fail fast (within 30s) if the verifier's LocalUnsafe head stops advancing. LocalUnsafe is driven by CL P2P gossip and is independent of EL snap-sync, so a stall there means the test setup is genuinely stuck and more waiting will not help. Successful runs still finish in tens of seconds; only the worst-case CI runs use the extended window. Refs #20649.
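A progress-aware wait of the kind described above might look roughly like this. It is a sketch, not the preset's actual code: `waitWithProgress`, its parameters, and the two error values are hypothetical, and the real implementation would use the 8 minute and 30 second budgets mentioned in the commit.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

var (
	errStalled  = errors.New("unsafe head stopped advancing")
	errDeadline = errors.New("overall deadline exceeded")
)

// waitWithProgress polls done until it returns true. It gives up early if
// unsafeHead has not advanced for longer than stall, and gives up entirely
// once overall has elapsed.
func waitWithProgress(done func() bool, unsafeHead func() uint64, overall, stall, interval time.Duration) error {
	start := time.Now()
	lastHead := unsafeHead()
	lastProgress := start
	for {
		if done() {
			return nil
		}
		now := time.Now()
		if h := unsafeHead(); h > lastHead {
			lastHead, lastProgress = h, now // gossip is still delivering blocks
		} else if now.Sub(lastProgress) > stall {
			return errStalled // no progress: more waiting will not help
		}
		if now.Sub(start) > overall {
			return errDeadline
		}
		time.Sleep(interval)
	}
}

func main() {
	head := uint64(0)
	err := waitWithProgress(
		func() bool { return head >= 5 },
		func() uint64 { head++; return head },
		time.Second, 100*time.Millisecond, time.Millisecond,
	)
	fmt.Println(err)
}
```

The stall window is deliberately much shorter than the overall budget, so a genuinely stuck setup fails quickly while a slow but advancing one gets the full window.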
* feat(op-node): add follow source success metric Add follow_source_successes_total counter to track successful follow source updates, complementing the existing error metric. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(op-node): fix goimports formatting in metrics Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…d_hash / fetch_output_block_hash (#20724) * fix(kona/client): validate output-root version word in fetch_safe_head_hash / fetch_output_block_hash Both call sites previously sliced `output_preimage[96..128]` as the L2 block hash without checking the version word at `[0..32]`. Today only `OutputVersionV0` (the zero word) is defined; op-program's equivalent rejects any non-V0 word via `ErrInvalidOutputVersion`. The downstream defenses already refuse a hypothetical V1 claim, so this is not a consensus fix — the goal is forensic: surface "unknown output version" explicitly instead of masking it as a generic `InvalidClaim` later in the pipeline. Adds `OracleProviderError::UnknownOutputVersion(B256)` and a unit test on `fetch_safe_head_hash` that fails on the pre-fix code (returns `Ok(B256::ZERO)`) and passes after. * fix(kona/client): reject malformed output-root preimage length in fetch_output_block_hash Addresses the review nit on #20724: `fetch_output_block_hash` only guarded the version word, so a preimage shorter than 32 bytes silently fell through to the `[96..128]` slice and panicked, and longer-than-128 preimages were read past their meaningful payload. Add an explicit length-128 check that returns `Preimage(BufferLengthMismatch(128, n))`, matching the behavior that `single::fetch_safe_head_hash` already gets for free from `get_exact`. Test reorganization: * Move shared `MockOracle` from inline in `trace_extension.rs` into `tests/common/mod.rs` so version + length tests can share it. * Add `tests/output_root.rs` covering both checks (version word and preimage length) for each helper — `fetch_safe_head_hash` and `fetch_output_block_hash` — so regressions in either function are caught independently. * Bump `interop::util` and `fetch_output_block_hash` to `pub` to mirror the existing surface for `single::fetch_safe_head_hash`; the lib has no external consumers beyond these integration tests. --------- Co-authored-by: wwared <541936+wwared@users.noreply.github.com>
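The two checks described above (exact length 128, then a zero version word, then the block hash at bytes 96..128) can be shown in a few lines. The kona change is in Rust; the Go function and error names below are illustrative only and do not mirror either kona's or op-program's actual API.

```go
package main

import (
	"errors"
	"fmt"
)

var (
	errPreimageLength       = errors.New("output-root preimage must be exactly 128 bytes")
	errUnknownOutputVersion = errors.New("unknown output version")
)

// blockHashFromOutputPreimage validates a V0 output-root preimage and returns
// the L2 block hash stored in its last 32 bytes.
func blockHashFromOutputPreimage(preimage []byte) ([32]byte, error) {
	var hash [32]byte
	if len(preimage) != 128 {
		return hash, fmt.Errorf("%w: got %d", errPreimageLength, len(preimage))
	}
	var version [32]byte
	copy(version[:], preimage[:32])
	if version != ([32]byte{}) { // only the all-zero V0 word is defined
		return hash, fmt.Errorf("%w: %x", errUnknownOutputVersion, version)
	}
	copy(hash[:], preimage[96:128])
	return hash, nil
}

func main() {
	p := make([]byte, 128)
	p[96] = 0xAB
	h, err := blockHashFromOutputPreimage(p)
	fmt.Printf("%x %v\n", h[:1], err)

	p[31] = 1 // non-zero version word
	_, err = blockHashFromOutputPreimage(p)
	fmt.Println(err)
}
```

Checking the length before slicing is what prevents the out-of-range panic the review nit pointed out for short preimages.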
…s-safe head (#20769) * test: migrate TestL2ReorgAfterL1Reorg to supernode to reproduce cross-safe stall Migrate the L1 reorg test from NewSimpleInterop (supervisor) to NewTwoL2SupernodeInterop (supernode) to demonstrate a cross-safe head stall after deep L1 reorgs. The shallow reorg subtest (n=3) passes: the supernode rewinds one timestamp at a time and eventually recovers. The deep reorg subtest (n=10) fails: cross-safe permanently stalls because the batcher enters an infinite out-of-sync loop after the supernode resets currentL1 to zero during rewind. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(op-node): ignore non-canonical super authority safe head * fix(op-node): use finalized fallback for stale super authority safe * fix(op-node): tolerate unknown super authority safe head * fix(op-node): use super authority finalized safe fallback * test: fix race in TestL2ReorgAfterL1Reorg unsafe subtest (#20775) The unsafe (n=3) subtest captured crossSafeRef and localSafeRef after the manual L1 sequencing loop, so their L1 origins could land in the to-be-reorged window and the "should still be canonical" post-checks would flake when timing shifted them past the divergence point. Split the helper to run a pre-early callback before sys.L1CL.Stop(), where L1 origins are guaranteed to be in the pre-divergence prefix, and capture the stable refs there. The n=10 subtest expects all refs reorged, so it captures them after sequencing as before. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: wwared <541936+wwared@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: Karl Floersch <karl@oplabs.co>
* test(op-interop-filter): add integration suite for logsdb contract Adds 52 tests against the real on-disk logsdb covering the op-interop-filter ↔ logsdb boundary: rejection-reason classification through Backend.CheckAccessList, GetExecMsgsAtTimestamp round-trip, accessor behaviour, ingest happy/error paths, reorg recovery, init/resume, restart durability, and backend failsafe wiring. The suite is structured so it should pass against any logsdb implementation honouring the supervisor error sentinel contract — no mocks of logsdb. Cases that the real DB can't produce under correct ingester control flow (injected ErrDataCorruption / ErrConflict / ErrOutOfOrder etc. from AddLog/SealBlock) remain covered by the existing mock-based unit tests and are inventoried in plans/logsdb/op-interop-filter-test-plan.md. Also drops the old RealLogsDB happy-path test — fully subsumed by the new sequential-ingest and accessor coverage. All tests run with t.Parallel() and t.TempDir(). * test(op-interop-filter): cover SealBlock backwards-timestamp contract Adds TestIntegration_Ingest_BackwardsTimestamp_TripsBackendFailsafe — the one ca26218 ("align raft-wal LogsDB error behavior") harmonization change reachable through op-interop-filter's normal control flow. The other items affect paths the ingester pre-guards against and remain covered at the logsdb package level. writeFetchedBlock pre-checks block number and parent hash but not block timestamp, so a block whose timestamp regresses reaches the logsdb's SealBlock. The behavioural contract is that the resulting failure must trip Backend failsafe so subsequent CheckAccessList requests are rejected with the failsafe label — a logsdb that returns ErrOutOfOrder instead would silently retry and the operator would never be alerted. 
* test(op-interop-filter): consolidate duplicates into integration suite Removes 20 unit tests in logsdb_chain_ingester_test.go and backend_test.go whose coverage is now provided by the integration suite, after first plugging the gaps the unit tests caught that the integration suite did not: - Parameterised the RPC-fetch-error test over both InfoByNumber and FetchReceipts injection points (replaces RPCError + ReceiptsError). - Added ErrorEncounteredMidRange_StopsAndReportsBlock, distinct from AfterIngesterError_SubsequentIngestsSkipped (the latter pre-sets the error, this exercises an error encountered partway through a concurrently-fetched range). - Added Contains / GetExecMsgsAtTimestamp / RewindToFinalized BeforeInit_Uninitialized cases. - Extended RecoverReorg_HappyPath to assert the returned timestamp and the post-rewind applyPendingRewind resume value. - Added Backend_UnsupportedSafetyLevel_Rejected and Backend_EmptyAccessList_LocalUnsafe_Accepted. Kept: TestLogsDBChainIngester_ErrorTypes (enum stringer), the range-ordering and progress-metric tests, init/sealParent/Contains focused unit tests, and the cross-validator-specific failsafe + Ready tests — all of which exercise paths or APIs that the integration suite does not. * test(op-interop-filter): pin write-path sentinel dispatch via LogsDB interface writeFetchedBlock dispatches errors from processBlockLogs to IngesterError states (ErrConflict -> ErrorConflict, ErrDataCorruption -> ErrorDataCorruption, ErrInvalidLog -> ErrorInvalidExecutingMessage). The real on-disk logsdb cannot produce these sentinels from AddLog/SealBlock under correct ingester control flow because writeFetchedBlock pre-checks block number and parent hash before calling either method, so the integration suite can't exercise them. Introduces a LogsDB interface (the subset of *logs.DB that LogsDBChainIngester depends on) and a fakeLogsDB satisfying it. 
Two table-driven tests pin the positive dispatch (ErrConflict / ErrDataCorruption from each write method) and the negative passthrough (ErrFuture / ErrSkipped / ErrOutOfOrder / generic must not set IngesterError).
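The sentinel dispatch that these tests pin can be pictured as a small classification function. The sketch below declares its own sentinel errors and state enum as local stand-ins; the real code uses the op-supervisor `types` sentinels and the ingester's own error states, and `classifyWriteError` is a hypothetical name.

```go
package main

import (
	"errors"
	"fmt"
)

// Local stand-ins for the logsdb error sentinels named in the commit message.
var (
	ErrConflict       = errors.New("conflict")
	ErrDataCorruption = errors.New("data corruption")
	ErrInvalidLog     = errors.New("invalid log")
	ErrFuture         = errors.New("future")
	ErrSkipped        = errors.New("skipped")
	ErrOutOfOrder     = errors.New("out of order")
)

// IngesterErrorKind is a hypothetical enum for the ingester's sticky error state.
type IngesterErrorKind int

const (
	ErrorNone IngesterErrorKind = iota
	ErrorConflict
	ErrorDataCorruption
	ErrorInvalidExecutingMessage
)

// classifyWriteError maps a write-path error to an ingester error state.
// Anything not listed (ErrFuture, ErrSkipped, ErrOutOfOrder, generic errors)
// passes through without setting a state.
func classifyWriteError(err error) IngesterErrorKind {
	switch {
	case errors.Is(err, ErrConflict):
		return ErrorConflict
	case errors.Is(err, ErrDataCorruption):
		return ErrorDataCorruption
	case errors.Is(err, ErrInvalidLog):
		return ErrorInvalidExecutingMessage
	default:
		return ErrorNone
	}
}

func main() {
	wrapped := fmt.Errorf("seal block 42: %w", ErrConflict)
	fmt.Println(classifyWriteError(wrapped) == ErrorConflict)
	fmt.Println(classifyWriteError(ErrOutOfOrder) == ErrorNone)
}
```

Using `errors.Is` rather than direct comparison keeps the dispatch working when the storage layer wraps its sentinels with context.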
…ram (#20717) * fix(kona-client/interop): mirror SuperRoot trace-extension arm on TransitionState prestate When the agreed pre-state is a `PreState::TransitionState` and `transition_state.pre_state.timestamp >= claimed_l2_timestamp`, the interop `run()` previously short-circuited to `Err(InvalidClaim)` unconditionally, regardless of whether `claimed_post_state == agreed_pre_state_commitment`. The parallel `PreState::SuperRoot` arm already returned `Ok(())` in the matching-claim case (trace extension). This commit extends the TransitionState arm to mirror that behavior, bringing kona-client into parity with op-program's `stateTransition`/`ValidateClaim` semantics at the `>=` boundary on sub-case A (`T == GT AND claim == prestate`). Adds three integration tests in `bin/client/tests/interop_trace_extension.rs`: sub-case A (RED on baseline, GREEN after fix), sub-case B (fail-closed regression guard), and sub-case C-eq (symmetric strict-`>` half). * test(kona-client/interop): trim verbose comments from trace-extension tests * fix(kona-proof-interop/boot): reject future-timestamped prestate (#20727) Add an `assert!` in `BootInfo::load` rejecting any agreed pre-state whose timestamp exceeds `claimed_l2_timestamp`. The honest actor never agrees to such a pre-state; op-program panics on the same condition (see `op-program/client/interop/interop.go:87-97`). Without this guard, a malicious proposer could register a future-timestamped SuperRoot or TransitionState preimage (the oracle only verifies `key == keccak256(preimage)`, not the timestamp inside) and commit the same hash as both starting and disputed claim at trace-extended bisection positions, where kona's `claim == prestate => Ok(())` arm would resolve as `vmStatus = VALID`. With the guard, both arms of `interop::run` only need to handle the legitimate `==` boundary; tighten `>=` to `==` accordingly to make intent explicit. 
Tests: - Flip `trace_extension_transition_state_past_game_timestamp_accepts_matching_claim` to `#[should_panic]`; its previous assertion pinned the buggy lenient behavior. The flipped version is now the regression guard for the TransitionState arm. - Add `rejects_super_root_with_timestamp_after_game_timestamp` as the symmetric guard for the SuperRoot arm. - Refactor `setup_interop_preimages` to take a `PreState` so both arms reuse the fixture. Resolves the "narrow both kona arms to `==`" follow-up flagged in #20717. * test(kona-client/interop): cover SuperRoot ==-boundary trace-extension cases Adds the SuperRoot-arm counterparts of the existing TransitionState `==` trace-extension tests. Without them, a future refactor that breaks the SuperRoot `==` arm in `bin/client/src/interop/mod.rs` would be caught only by the strict-`>` panic test, leaving the consensus-critical `T == GT` boundary unguarded for the SuperRoot variant. - trace_extension_super_root_at_game_timestamp_accepts_matching_claim asserts `Ok(())` when `super_root.timestamp == claimed_l2_timestamp` and `claim == prestate_commitment`. - trace_extension_super_root_at_game_timestamp_rejects_mismatched_claim asserts `Err(InvalidClaim)` when the timestamps match but the claim differs from the prestate commitment. Reuses the existing `setup_interop_preimages` fixture which already takes a `PreState`, so no production or fixture changes. * fix(kona-client/interop): Use realistic TransitionState in unit tests Co-authored-by: Inphi <mlaw2501@gmail.com> --------- Co-authored-by: wwared <541936+wwared@users.noreply.github.com> Co-authored-by: Rodrigo Araújo <rod.dearaujo@gmail.com> Co-authored-by: Inphi <mlaw2501@gmail.com>
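The boundary rule these commits converge on can be summarized in a short decision function. This is a Go paraphrase for illustration, not the kona Rust code: the names are hypothetical, and where kona asserts (panics) on a future-timestamped prestate in `BootInfo::load`, the sketch returns an error instead.

```go
package main

import (
	"errors"
	"fmt"
)

var (
	errInvalidClaim   = errors.New("invalid claim")
	errFuturePrestate = errors.New("agreed prestate is after the claimed timestamp")
)

// checkBoundary reports whether the claim is resolved at the timestamp
// boundary. resolved=false means the prestate is behind the claimed timestamp
// and the normal state transition still has to run.
func checkBoundary(prestateTime, claimedTime uint64, prestate, claim [32]byte) (resolved bool, err error) {
	switch {
	case prestateTime > claimedTime:
		// Never agreed to by an honest actor; rejected up front.
		return true, errFuturePrestate
	case prestateTime == claimedTime:
		// Trace extension: the only valid claim repeats the prestate.
		if claim == prestate {
			return true, nil
		}
		return true, errInvalidClaim
	default:
		return false, nil
	}
}

func main() {
	pre := [32]byte{1}
	other := [32]byte{2}
	fmt.Println(checkBoundary(100, 100, pre, pre))   // matching claim at the boundary
	fmt.Println(checkBoundary(100, 100, pre, other)) // mismatched claim at the boundary
	fmt.Println(checkBoundary(101, 100, pre, pre))   // future-timestamped prestate
}
```

The same function applies to both the SuperRoot and TransitionState prestate variants, which is the symmetry the added tests are meant to guard.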
) (#20733) * fix(op-supernode): anchor startup backfill to EL finalized head * test(op-supernode): cover logsDB-ahead reconcile and act==genesis backfill clamp * fix(op-supernode): resume from verifiedDB unconditionally on warm restart Gating warm-restart resume on EL finalized rejects the normal case where cross-safe (and therefore verifiedDB.LastTimestamp) is ahead of finalized, and it also blocks sequencer-side liveness — the sequencer must produce blocks before L1 finality can advance. When verifiedDB is initialized, resume at LastTimestamp + 1 unconditionally. If EL finalized happens to be ahead (e.g. EL restored from snapshot while supernode was offline), the main loop rolls forward through it; finalized blocks cannot contain invalid exec messages. On cold start, anchor on min EL finalized head + 1 (clamped to activation). A finalized head at genesis (Number == 0) with a real hash is a valid anchor — only reject the zero-value response that signals an EL that isn't ready yet. The previous "Number == 0" guard rejected genesis and caused op-e2e action tests to hang forever during supernode startup. The "verifiedDB ahead of canonical L2" failure mode (e.g. a local L2 reorg invalidating a prior commit) cannot be detected via finalized — that needs a tip-hash check against the EL block-by-number lookup, out of scope here. * test(op-supernode): use valid hash in genesis EL finalized test The previous string "0xgenesis" is not valid hex and decoded to the zero hash; Time: 50 was the only thing making the L2BlockRef non-zero, so the case did not actually cover "genesis with a real hash". --------- Co-authored-by: Karl Floersch <karl@oplabs.co>
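The startup rule described above (warm restart resumes from verifiedDB, cold start anchors on the minimum EL finalized head) can be written down as a small function. This sketch makes several assumptions: all values are treated as L2 timestamps, "clamped to activation" is read as a lower bound, and the function and parameter names are invented for the example.

```go
package main

import (
	"errors"
	"fmt"
)

var errNoFinalizedHeads = errors.New("no EL finalized heads available yet")

// startTimestamp picks the first timestamp the supernode should verify.
// lastVerified is nil when verifiedDB is uninitialized (cold start).
func startTimestamp(lastVerified *uint64, finalizedTimes []uint64, activation uint64) (uint64, error) {
	if lastVerified != nil {
		// Warm restart: resume unconditionally, regardless of EL finalized.
		return *lastVerified + 1, nil
	}
	if len(finalizedTimes) == 0 {
		return 0, errNoFinalizedHeads
	}
	// Cold start: anchor on the minimum finalized head across chains.
	minTime := finalizedTimes[0]
	for _, t := range finalizedTimes[1:] {
		if t < minTime {
			minTime = t
		}
	}
	start := minTime + 1
	if start < activation {
		start = activation // never start before interop activation
	}
	return start, nil
}

func main() {
	last := uint64(100)
	fmt.Println(startTimestamp(&last, nil, 50))            // warm restart
	fmt.Println(startTimestamp(nil, []uint64{70, 60}, 50)) // cold start
	fmt.Println(startTimestamp(nil, []uint64{10, 20}, 50)) // clamped to activation
}
```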
* feat(op-supernode): add raft-wal-backed LogsDB implementation Adds a new raftwallogdb sub-package implementing the LogsDB interface on top of hashicorp/raft-wal. Each sealed block is stored as a single raft-wal entry whose payload is a fixed-width binary header followed by a logHash array and an execMsg array, so Contains is an O(1) memcpy regardless of how many logs the block carries. StoreLog fsyncs the entry to disk before returning, so SealBlock is durable on return and atomic with any pending logs buffered by AddLog. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(op-supernode,op-interop-filter): use raft-wal LogsDB Swaps op-supernode and op-interop-filter from the op-supervisor logs.DB implementation to the new raft-wal-backed raftwallogdb.DB. The op-supervisor implementation is untouched. The raft-wal store is directory-rooted rather than a single file, so the ingester now passes the chain directory to Open instead of a logs.db path. The no-op metrics adapters previously required by logs.DB are removed; raftwallogdb does not surface metrics. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(raftwallogdb): satisfy errorlint and gofmt errorlint requires every wrapped inner error to use %w rather than %v in fmt.Errorf. gofmt wanted the const block aligned. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(raftwallogdb): clarify the two logIdx fields inside an execMsg record The first logIdx is the *local* slot in this block that carries the executing message; the second is the *source-chain* log index of the initiating message it points at. They are not echoes of each other. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor: drop reads.Invalidator from LogsDB Rewind/Clear No production caller wires in a real Invalidator — op-supernode passes a hand-rolled noopInvalidator and op-interop-filter passes reads.NoopRegistry. 
The Invalidator parameter and the synthetic DerivedInvalidation rule built inside Rewind/Clear were dead weight inherited from op-supervisor's logs.DB API. This drops the unused parameter from the LogsDB interface, the raftwallogdb implementation, every test mock that has to satisfy LogsDB, and the call sites. op-supernode no longer imports op-supervisor/.../reads at all; the only remaining op-supervisor dependencies are 'types' (BlockSeal, ContainsQuery, ExecutingMessage, error sentinels) and 'processors' (helper functions that decode receipts into log hashes). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore: go mod tidy Promotes hashicorp/raft-wal from indirect to direct require (it is now imported by op-supernode's raftwallogdb package), and prunes stale entries that were left behind when the reads.Invalidator wiring was removed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Fix lint * test(raftwallogdb): expand coverage to match op-supervisor's LogsDB Brings raftwallogdb behavioural coverage up to (or above) what op-supervisor's logs.DB test suite asserts about the LogsDB contract. 
The new tests cover: - Empty-DB behaviour across every read entry point - SealBlock validation: parent-hash mismatch, wrong number, timestamp regression - AddLog validation: bad parent, non-zero first index, duplicate index, skipped index - Contains: every error path (wrong checksum, wrong timestamp, out-of-range logIdx, future block with future / past timestamp, block 0) - FindSealedBlock: ErrSkipped below first, ErrFuture above latest - OpenBlock: boundary checks plus block 0 happy-path - Multi-block roundtrip across 10 blocks with mixed executing-messages - Rewind edges: empty DB, at-latest no-op, above-latest ErrFuture, before-first clears, at-first keeps it, pending-buffer-dropped semantics - Clear on populated and empty DBs - Persistence across close/reopen and pre-seal-crash buffer loss - blockRecord and execMsg fixed-width encoding roundtrip Supervisor tests around checkpoint placement, file-format recovery, and the internal iterator API are intentionally not mirrored — they cover storage- engine internals that have no analogue in the raft-wal implementation. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(raftwallogdb): match logs.DB Contains semantics and broaden coverage Drop the special-case ErrConflict for BlockNum==0 in Contains so behavior matches op-supervisor's logs.DB (empty DB → ErrFuture; populated DB flows through the normal logIdx/timestamp checks). Add tests for: block 0 handling, multiple ExecutingMessages per block, last-index Contains boundary, and reopen-after-Rewind / reopen-after-Clear persistence. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(raftwallogdb): align error contract with op-supervisor logs DB Audit of error-type divergences between the original op-supervisor logs DB and the new raft-wal-backed implementation surfaced several behaviour differences that downstream consumers (op-supernode, op-interop-filter) implicitly depend on. 
Reconciled to match the old contract: - Pass-through transient read errors from GetLog/readBlockAt instead of wrapping them as ErrDataCorruption. Real structural corruption (short buffer in decodeBlockRecord, length checks on entry data) keeps the ErrDataCorruption tag. - OpenBlock returns ErrSkipped at blockNum == firstBlock when firstBlock is non-zero, matching the old DB so op-supernode's algo.go fallback remains a no-op replica of the success path. Tracked for cleanup in #20726. - AddLog rejects parentBlock == eth.BlockID{} with ErrOutOfOrder. Genesis blocks cannot carry receipts; structurally enforces the invariant at the write boundary so Contains does not need a read-path guard. - AddLog parent-identity mismatches return ErrOutOfOrder (was ErrConflict). Matches old DB. Prevents op-interop-filter's ErrorConflict failsafe from tripping on transient state-disagreement. - SealBlock block-number gap and backwards-timestamp now return ErrConflict (were ErrOutOfOrder). Matches old DB. Restores op-interop-filter's failsafe trip on structural desync. - Removed the dead raft.ErrLogNotFound tolerance in clearLocked. raft-wal's DeleteRange does not produce that sentinel — it returns nil for ranges outside the stored log. Test fixtures using blockID(0, 0x00) collided with eth.BlockID{} (real chains never have a zero genesis hash); switched to blockID(0, 0xA0). * refactor(raftwallogdb): centralize entry length validation in decodeSealedBlock Addresses review feedback on PR #20688: the truncation checks in Contains and OpenBlock duplicated the same offset arithmetic. decodeSealedBlock validates the full entry length up front (strict equality) and returns slices for the log-hash and execMsg regions so callers can index without re-checking bounds. * refactor(raftwallogdb): move entry accessors onto blockRecord decodeBlockRecord now validates the full entry length and populates slice views of the log-hash and execMsg regions on the returned record. 
New LogHash(i) and ExecMsg(i) methods expose them, removing the offset arithmetic from Contains and OpenBlock and dropping the standalone decodeSealedBlock helper. * fix(op-interop-filter): align LogsDB.Rewind signature with raftwallogdb The raftwallogdb.DB.Rewind has signature (eth.BlockID) error and no longer takes a reads.Invalidator. Update the LogsDB interface (and the fake in logsdb_dispatch_test) to match so *raftwallogdb.DB satisfies it and the package compiles. --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
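To make the "validate the full entry length up front, then expose slice views" idea concrete, here is a toy decoder. The byte layout below is entirely hypothetical (the real raftwallogdb header and execMsg widths are not given in this conversation); only the overall shape, a fixed-width header followed by a log-hash array and an execMsg array with a strict length check, follows the description.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

var errDataCorruption = errors.New("data corruption")

// Hypothetical layout: number(8) | timestamp(8) | logCount(4) | execCount(4),
// then logCount*32 bytes of log hashes, then execCount*16 bytes of execMsgs.
const (
	headerSize  = 24
	logHashSize = 32
	execMsgSize = 16
)

type blockRecord struct {
	Number, Timestamp uint64
	logHashes         []byte // view into the entry, validated at decode time
	execMsgs          []byte
}

// decodeBlockRecord checks the entry length with strict equality so accessors
// can index without re-checking bounds against the raw entry.
func decodeBlockRecord(data []byte) (*blockRecord, error) {
	if len(data) < headerSize {
		return nil, fmt.Errorf("%w: short header (%d bytes)", errDataCorruption, len(data))
	}
	logCount := int(binary.BigEndian.Uint32(data[16:20]))
	execCount := int(binary.BigEndian.Uint32(data[20:24]))
	logsEnd := headerSize + logCount*logHashSize
	want := logsEnd + execCount*execMsgSize
	if len(data) != want {
		return nil, fmt.Errorf("%w: entry is %d bytes, want %d", errDataCorruption, len(data), want)
	}
	return &blockRecord{
		Number:    binary.BigEndian.Uint64(data[0:8]),
		Timestamp: binary.BigEndian.Uint64(data[8:16]),
		logHashes: data[headerSize:logsEnd],
		execMsgs:  data[logsEnd:],
	}, nil
}

// LogHash returns the i-th log hash, or false if i is out of range.
func (r *blockRecord) LogHash(i int) ([32]byte, bool) {
	var h [32]byte
	if i < 0 || (i+1)*logHashSize > len(r.logHashes) {
		return h, false
	}
	copy(h[:], r.logHashes[i*logHashSize:])
	return h, true
}

func main() {
	entry := make([]byte, headerSize+2*logHashSize)
	entry[19] = 2 // logCount = 2
	entry[headerSize+logHashSize] = 0x07
	rec, err := decodeBlockRecord(entry)
	fmt.Println(err)
	h, ok := rec.LogHash(1)
	fmt.Println(h[0], ok)
}
```

Because every record has a computable size, looking up log i is plain offset arithmetic, which is what makes a constant-time `Contains` plausible.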
op-conductor no longer dials op-supervisor for health checks. Drop the supervisor.rpc flag/config, the SupervisorHealthAPI client and mock, the ErrSupervisorConnectionDown error path, and the associated tests.
* chore(op-proposer): remove op-supervisor proposal source The op-proposer can now drive super-root proposals only via supernode RPCs. Drops the SupervisorRpcs config field and --supervisor-rpcs flag, along with the unused supervisor-backed superproofs devstack preset and the interop e2e proposer wiring (which had no supernode option). * chore(op-proposer): drop now-unused supervisor super-root helper goimports cleanup after removing the supervisor superproofs path: deletes getSupervisorSuperRoot and reorders import groups.
TestWithdrawal_CannonKona flaked under op-reth with 'no state found for
block number N'. forGamePublished returns once the dispute game references
block N on L1, but op-reth persists L2 state asynchronously, so the
immediate follow-up eth_getProof for N races the persistence service flush.
Wrap GetProof in an Eventually that only retries on the specific
state-not-yet-persisted errors emitted by op-reth ('no state found for
block') and op-geth ('missing trie node'). Real errors still propagate
immediately. Refs #19964.
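A selective retry of this kind could be sketched as below. The helper names, the `rpcError` type, and the attempt/interval parameters are assumptions for the example; only the two error substrings come from the commit message, and the actual test uses an `Eventually` helper rather than a hand-written loop.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// rpcError is a hypothetical error type used to simulate RPC failures.
type rpcError string

func (e rpcError) Error() string { return string(e) }

// isStateNotYetPersisted matches only the two transient messages named above.
func isStateNotYetPersisted(err error) bool {
	msg := err.Error()
	return strings.Contains(msg, "no state found for block") ||
		strings.Contains(msg, "missing trie node")
}

// getProofEventually retries getProof while the node reports that state is
// not yet persisted. Any other error is returned immediately.
func getProofEventually(getProof func() (string, error), attempts int, interval time.Duration) (string, error) {
	var lastErr error
	for i := 0; i < attempts; i++ {
		proof, err := getProof()
		if err == nil {
			return proof, nil
		}
		if !isStateNotYetPersisted(err) {
			return "", err // real error: do not mask it with retries
		}
		lastErr = err
		time.Sleep(interval)
	}
	return "", fmt.Errorf("state still not persisted after %d attempts: %w", attempts, lastErr)
}

func main() {
	calls := 0
	proof, err := getProofEventually(func() (string, error) {
		calls++
		if calls < 3 {
			return "", rpcError("no state found for block number 7")
		}
		return "proof", nil
	}, 5, time.Millisecond)
	fmt.Println(proof, err, calls)
}
```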
…ng (#20788)
* perf(rust-e2e-restart): 1s L2 block time, shorter NotAdvanced window
  Drops L2 block time for the kona node-restart sysgo suite from the default (2s) to 1s, and trims TestSequencerRestart's NotAdvancedFn window from 50 to 20 slots. Both changes target the wall-clock dominated rust-e2e-restart CI job, which spends most of its ~24m runtime waiting for L2 blocks.
* ci(rust-e2e): persist prebuilt rust binaries to workspace
  The cannon-kona-host, kona-build-release, and op-reth-build jobs build release binaries that downstream e2e jobs already require but do not consume from the workspace. Each downstream job then re-runs rust-build, restoring the rust target cache and re-linking the same binary (~9m). Set persist_to_workspace: true on the three builders and drop the redundant rust-build step from rust-e2e-sysgo-tests, rust-restart-sysgo-tests, op-reth-e2e-sysgo-tests, and kona-proof-action-tests. The downstream jobs already attach the workspace and reference $WD/rust/target/release/<binary>, so the persisted globs land at the expected path.
* ci(rust-e2e): persist only from kona-build-release
  cannon-kona-host, kona-build-release, and op-reth-build write overlapping files into rust/target/release (kona-build-release builds the entire workspace, so it produces every binary). CircleCI rejects concurrent persists of the same file with 'Concurrent upstream jobs persisted the same file(s)'. Persist only from kona-build-release. The other two still build in parallel to prime caches but no longer persist.
* ci(rust-e2e): drop redundant cannon-kona-host and op-reth-build jobs
  kona-build-release builds the entire rust workspace, so it already produces kona-host and op-reth alongside kona-node. Running cannon-kona-host and op-reth-build in parallel was duplicate work: both built subsets of what kona-build-release produces, and they cannot persist to the workspace without colliding with it. Drop the two jobs entirely and route their former consumers to kona-build-release.
* ci(rust-e2e): drop redundant rust build jobs
  kona-build-release builds the entire rust workspace, so it already produces kona-host and op-reth alongside kona-node. The parallel cannon-kona-host and op-reth-build jobs were rebuilding subsets of the same output. They cannot persist alongside it without colliding on rust/target/release/* paths. Drop both, route all consumers to a single workspace build, and rename it to rust-workspace-release to reflect that it produces the full set of release binaries, not just kona.
…Converge (#20729)
* fix(op-acceptance-tests): stabilize TestFollowSource_HeadsDivergeThenConverge
  Cross-safety only commits at L2 timestamps where every chain in the interop set has a local-safe block. With a 2s block time, stopping two sequencers via sequential RPCs lets the leader sneak in one extra block while the laggard is still being told to stop (the gap was ~80ms in the original CI failure, well over the inter-block window in the race). The laggard then never produces a block at that timestamp, so the supernode verifier stalls at the laggard's last timestamp. The follower for the leader chain ends up with local-safe at the extra block but cross-safe stuck a block behind, and the convergence wait in the test times out. Replace the two sequential StopSequencer calls with a new dsl.StopSequencersSynced helper that issues all stops concurrently and then aligns the chains: any sequencer below the maximum unsafe head is restarted just long enough to produce the missing block(s) and stopped again. The alignment loop is bounded so a real bug fails fast instead of hanging. Chains in the same interop set share a genesis time and block time, so equal unsafe numbers imply equal timestamps and the verifier can advance. Refs #19821
* refactor(dsl): align StopSequencersSynced by timestamp, not block number
  The previous implementation aligned chains by unsafe-head block number, which only equals timestamp alignment when chains share genesis time AND block time. That assumption holds for the current acceptance-test presets but is not generally true for Superchain interop sets, where member chains can have different rollup configs. Cross-safety only cares about timestamps: the verifier walks one L2 timestamp at a time and requires every chain to have a local-safe block at that timestamp. Aligning by timestamp is therefore both correct and strictly more general. Drop the docstring claim about shared genesis/block time, and switch the leader/level-up comparison plus log fields to timestamps. No behaviour change for chains with identical configs (the only current call site).
* refactor: align chains via TestSequencer instead of a new DSL helper
  The start/poll/stop level-up approach in StopSequencersSynced is fundamentally racy: a restarted sequencer produces a burst of catch-up blocks faster than any sane poll interval, so stopping at exactly the target block is unreliable. Use TestSequencer.SequenceBlock instead, which builds exactly one block at parent.Time + blockTime deterministically. The test now stops both sequencers and then drives the trailing chain up to the leader's timestamp with single-block SequenceBlock calls, with no restart and no overshoot risk. Drop dsl.StopSequencersSynced entirely; it has no remaining callers and the TestSequencer path is strictly better for presets that expose one.
* fix(op-acceptance-tests): Remove unnecessary test sequencer check
---------
Co-authored-by: wwared <541936+wwared@users.noreply.github.com>
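The timestamp-alignment idea from the last refactor can be sketched roughly as follows. This is a minimal in-memory model, not the test's actual code: the `chain` type, `sequenceBlock`, `alignByTimestamp`, and the `maxSteps` bound are all hypothetical names chosen for illustration, and `sequenceBlock` only imitates the "exactly one block at parent.Time + blockTime" behaviour attributed to TestSequencer.SequenceBlock above.

```go
package main

import "fmt"

// chain is a hypothetical in-memory stand-in for one stopped L2 sequencer.
type chain struct {
	name      string
	headNum   uint64 // unsafe head block number
	headTime  uint64 // unsafe head timestamp, in seconds
	blockTime uint64 // seconds per block
}

// sequenceBlock imitates building exactly one block at parent.Time + blockTime.
func (c *chain) sequenceBlock() {
	c.headNum++
	c.headTime += c.blockTime
}

// alignByTimestamp drives every trailing chain up to the highest head
// timestamp, one block at a time. maxSteps is an assumed safety bound so a
// misconfigured chain fails fast instead of looping forever.
func alignByTimestamp(chains []*chain, maxSteps int) error {
	var target uint64
	for _, c := range chains {
		if c.headTime > target {
			target = c.headTime
		}
	}
	for _, c := range chains {
		for steps := 0; c.headTime < target; steps++ {
			if steps >= maxSteps {
				return fmt.Errorf("%s: still behind timestamp %d after %d blocks", c.name, target, maxSteps)
			}
			c.sequenceBlock()
		}
		// With differing block times a chain may step past the target.
		if c.headTime != target {
			return fmt.Errorf("%s: cannot land on timestamp %d (head at %d)", c.name, target, c.headTime)
		}
	}
	return nil
}

func main() {
	leader := &chain{name: "A", headNum: 6, headTime: 1012, blockTime: 2}
	laggard := &chain{name: "B", headNum: 5, headTime: 1010, blockTime: 2}
	if err := alignByTimestamp([]*chain{leader, laggard}, 8); err != nil {
		fmt.Println("align failed:", err)
		return
	}
	fmt.Println("aligned:", leader.headTime == laggard.headTime, "laggard head:", laggard.headNum)
}
```

Comparing timestamps rather than block numbers keeps the sketch valid when the chains have different block times; in that case a chain that cannot land exactly on the leader's timestamp is reported as an error rather than silently overshooting.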
…fig (#20613)
* refactor(circleci): implement compute-changes job for dynamic configuration
  Replaced path-filtering with a custom compute-changes job in CircleCI configuration to enhance flexibility in detecting changed paths. This update includes merging continuation configs and continuing the pipeline based on computed changes. Additionally, updated rust-e2e workflow to utilize a dedicated flag for E2E changes detection, improving clarity and functionality.
* chore(circleci): integrate continuation orb for pipeline management
  Added the CircleCI continuation orb to streamline the continuation of pipelines. Updated the configuration to utilize the orb for continuing the pipeline with merged configurations and parameters, enhancing the overall efficiency of the CI/CD process.
* fix(circleci): enhance YAML merging in merge-configs script
  Updated the merge-configs.sh script to utilize 'explode(.)' in the YAML merging process, ensuring that YAML anchors and aliases are resolved correctly before merging. This change prevents undefined alias references in the output, improving the reliability of the configuration merging.
* feat(circleci): normalize boolean environment variables in compute-changes script
  Added a new function to convert boolean environment variables from CircleCI into JSON-safe strings for use with jq. Updated the compute-changes script to utilize this function, enhancing the handling of boolean flags in the configuration process.
* feat(circleci): implement dynamic path detection in compute-changes script
  Enhanced the compute-changes script to automatically detect changes in specified paths using environment variables defined in the CircleCI configuration. This update simplifies the detection logic by iterating over DETECT_* variables, improving the flexibility and maintainability of the CI/CD process.
* refactor(circleci): enhance compute-changes script for parameter collection
  Updated the compute-changes script to streamline the collection of environment variables by categorizing them into string and boolean types. The script now supports a more structured approach to detect changes in specified paths, improving the overall efficiency and maintainability of the CircleCI configuration process.
* revert changes on rust-e2e.yml
* Extract remaining conditions
* centralizing conditions
* simplify
* remove unused
* revert change
* revert
* improve script and add exception for shellcheck
* Add collect-params.sh script to gather pipeline parameters
  This script collects environment variables prefixed with 'c-' and outputs them in JSON format. It supports three modes: string, boolean, and detection against changed files in the git diff. The output is appended to /tmp/pipeline-parameters.json.
* Refactor CircleCI configuration to use collect-params.sh for parameter collection
  Updated the CircleCI config to rename the job from compute-changes to prepare-continuation-config and replaced calls to compute-changes.sh with collect-params.sh for collecting string, boolean, and detection parameters. This change enhances clarity and maintains consistency in the parameter collection process.
* Enhance CircleCI configuration with improved decision tree for job execution
  Refactored the CircleCI config to streamline the decision-making process for job execution based on the TRIGGER_SOURCE. Introduced a new function to handle Rust and documentation workflows based on detected file changes, improving clarity and maintainability of the CI pipeline.
* Refactor CircleCI configuration to utilize workflow helper functions
  Introduced a new script, workflow-helpers.sh, to encapsulate helper functions for managing the workflow decision tree in the CircleCI config. This refactor simplifies the main config file by offloading JSON handling and decision logic, enhancing maintainability and clarity in the CI pipeline.
* Update CircleCI continuation orb version to 2.0.1 for improved functionality
* Enhance CircleCI configuration with detailed comments for the prepare-continuation-config job
  Added comprehensive comments to the CircleCI config to clarify the steps involved in preparing the dynamic continuation pipeline. This includes explanations for installing dependencies, collecting parameters, detecting file changes, and merging configurations, improving maintainability and understanding of the CI process.
* Refactor CircleCI configuration to enhance webhook job execution logic
  Updated the CircleCI config to improve the decision-making process for webhook job execution. Added detailed comments outlining three distinct lifecycle stages: feature branch pushes, merge queue validations, and post-merge actions. This refactor enhances clarity and maintainability of the CI pipeline.
* Refactor CircleCI configuration to streamline job execution commands
  Updated the CircleCI config to separate job execution commands for improved clarity and maintainability. This change enhances the organization of the CI pipeline by clearly delineating the execution of main, release, and feature test jobs across different branch scenarios.
* Refactor CircleCI configuration for improved readability in job execution logic
  Updated the CircleCI config to enhance the formatting of job execution commands, making the structure clearer and more maintainable. This change improves the organization of conditional statements for contracts, Rust, and documentation tests across various branch scenarios.
* Add decision tree testing to CircleCI configuration
  Introduced a new job in the CircleCI config to validate the decision tree logic against known scenarios. This addition ensures that any refactor will fail fast if it breaks expected workflow activation, enhancing the reliability of the CI pipeline.
* Refactor CircleCI configuration to optimize job execution for documentation changes
  Updated the CircleCI config to prioritize documentation-only changes in the job execution logic. This enhancement ensures that when only documentation is modified, the CI pipeline runs the documentation tests without executing other jobs, improving efficiency and reducing unnecessary resource usage.
* Add CODEOWNERS entries for CircleCI configuration files
  Updated the CODEOWNERS file to include ownership for CircleCI configuration and script files. This change assigns @raffaele-oplabs and @geoknee as code owners for the .circleci/config.yml and .circleci/scripts/test-decision-tree.sh files, ensuring proper oversight and accountability for these CI components.
* Update shellcheck directive in test-decision-tree.sh to disable SC1091 warning
  Modified the shellcheck directive in the test-decision-tree.sh script to disable the SC1091 warning, which pertains to sourcing files that may not exist. This change improves the script's compatibility with shellcheck without altering its functionality.
* revert code owner
* Update CircleCI configuration to dynamically set BASE_REVISION based on pull request base reference
  Modified the BASE_REVISION environment variable in the CircleCI config to use the pull request's base reference instead of a static 'develop' value. This change enhances the flexibility of the CI pipeline by ensuring it accurately reflects the context of the pull request being built.
* Update BASE_REVISION in CircleCI config to a static branch name for improved clarity
* Add feature tests for contracts and Rust in CircleCI configuration
  Enhanced the CircleCI config by adding jobs to run contracts feature tests and Rust CI checks when specific conditions are met. This update improves the testing coverage and ensures that relevant tests are executed for changes in contracts and Rust code, contributing to a more robust CI pipeline.
* Update BASE_REVISION in CircleCI config to 'develop' for consistency with branch naming
TestMultiELSync flakes because session.ResetSession was leaving the ELSyncPolicy.cache untouched. When L2CL2's verifier op-node manages to push a NewPayload + FCU to the SyncTester before sys.L2CL2.Stop() returns, the WindowSyncPolicy cache ends up holding the payload number from that in-flight call (commonly 2). ResetSession clears Validated, CurrentState, and Payloads, but the policy's private cache survives, so the test's first FCU(targetNum-2 = 3) immediately forms the consecutive sequence [2,3] required by cnt=2, the policy returns Valid, and session.Validated jumps to 3. The follow-up NewPayload(targetNum-1 = 4) then returns VALID instead of SYNCING, hitting the assertion at sync_test.go:60. The race is intermittent because it depends on how far L2CL2's engine driver has progressed when Stop() is invoked.

Add Reset() to the ELSyncPolicy interface, implement it on WindowSyncPolicy by zeroing the sliding-window cache, and invoke it from SyncTesterSession.ResetSession so the policy starts from scratch alongside the rest of the session state.

Add TestWindowSyncPolicy_Reset that primes the cache with 2, calls Reset, then observes 3 (must be SYNCING) and 4 (must be VALID). The test fails deterministically against a no-op Reset with the exact CI symptom (expected SYNCING, actual VALID), confirming the root cause.

Update TestMultiELSync to call ResetSession after stopping L2CL2, mirroring the pattern in TestSyncTesterELSync, so the policy cache is empty when the test starts driving the engine API manually regardless of how much L2CL2 work landed before Stop() returned.

Fixes #20780
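The stale-cache failure and the effect of Reset can be reproduced with a small standalone model. This is a sketch under assumptions, not the real WindowSyncPolicy: `windowPolicy`, `Observe`, `newWindowPolicy`, and the `Status` type are hypothetical names, and the rule "VALID once the last cnt observed numbers are consecutive" is inferred from the description above.

```go
package main

import "fmt"

// Status is a hypothetical stand-in for the engine API payload status.
type Status string

const (
	Syncing Status = "SYNCING"
	Valid   Status = "VALID"
)

// windowPolicy is an illustrative model of a sliding-window sync policy:
// it reports VALID once the last cnt observed payload numbers are consecutive.
// cnt is assumed to be at least 1.
type windowPolicy struct {
	cnt   int
	cache []uint64
}

func newWindowPolicy(cnt int) *windowPolicy {
	return &windowPolicy{cnt: cnt}
}

// Observe records a payload number and evaluates the window.
func (p *windowPolicy) Observe(num uint64) Status {
	p.cache = append(p.cache, num)
	if len(p.cache) > p.cnt {
		p.cache = p.cache[len(p.cache)-p.cnt:] // keep only the newest cnt entries
	}
	if len(p.cache) < p.cnt {
		return Syncing
	}
	for i := 1; i < len(p.cache); i++ {
		if p.cache[i] != p.cache[i-1]+1 {
			return Syncing
		}
	}
	return Valid
}

// Reset empties the sliding window so earlier observations cannot leak
// into the next session.
func (p *windowPolicy) Reset() {
	p.cache = p.cache[:0]
}

func main() {
	// Without a reset, a leftover 2 makes the first observation of 3 VALID.
	stale := newWindowPolicy(2)
	stale.Observe(2)
	fmt.Println("stale cache, observe 3:", stale.Observe(3))

	// With a reset, 3 is SYNCING and only 4 completes the window [3,4].
	fresh := newWindowPolicy(2)
	fresh.Observe(2)
	fresh.Reset()
	fmt.Println("after reset, observe 3:", fresh.Observe(3))
	fmt.Println("after reset, observe 4:", fresh.Observe(4))
}
```

The second half of `main` follows the same prime, reset, observe sequence that the commit message describes for TestWindowSyncPolicy_Reset.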
Acceptance tests on memory-all-opn-op-reth (and the op-geth/kona variants) write devnet state to /tmp on the runner's real disk. When many in-process tests run in parallel, op-reth's mdbx commit waits on fsync and stalls — we saw commit_duration spike from ~6ms to 60-125s, causing context-deadline cascades that failed ~36 unrelated tests in one merge-queue run. Durability is irrelevant for CI, so preload libeatmydata.so for the whole op-acceptance-tests job. Every subprocess (op-reth, op-node, op-geth, supernode, go test, just, ...) inherits LD_PRELOAD via $BASH_ENV and has fsync/fdatasync/msync turn into no-ops.
)
The workspace unification in #19034 moved op-reth from `reth/` to `rust/op-reth/` and consolidated all build output under `rust/target/`. Four path references in `rust/kona/tests/justfile` were never updated:
- `build-reth` recipe: `cd ../../reth` -> `cd ../../op-reth`
- `OP_RETH_EXEC_PATH` in `acceptance-tests-run`, `test-e2e-sysgo-run`, and `long-running-test`: `../../reth/target/debug/op-reth` -> `../../target/debug/op-reth` (unified workspace target directory)

CI was unaffected because it pre-sets `OP_RETH_EXEC_PATH` to the correct `rust/target/release/op-reth` before invoking justfile recipes, and never calls `build-reth` directly. These broken paths only affect local development workflows (`just build-reth`, `just acceptance-tests`, `just test-e2e-sysgo`, `just long-running-test`).

Related: #19569, #19929

Co-authored-by: wwared <541936+wwared@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…9874) Add SpanDecodingError::TxGases and map decode_tx_gases failures to it instead of TxNonces.
Created by
pull[bot]