This repository was archived by the owner on Dec 5, 2021. It is now read-only.

[pull] develop from ethereum-optimism:develop#573

Open
pull[bot] wants to merge 10000 commits into omgnetwork:develop from ethereum-optimism:develop

Conversation

pull[bot] commented Oct 13, 2021

See Commits and Changes for more details.


Created by pull[bot]

Can you help keep this open source service alive? 💖 Please sponsor : )

wwared and others added 30 commits May 13, 2026 19:30
…ctivation block (#20716)

* fix(op-supernode): guard executing-message verifier against interop activation block

verifyExecutingMessage previously checked timestamp ordering and expiry only,
never consulting i.activationTimestamp. Both canonical fault-proof verifiers
reject executing messages whose executing or initiating block is pre-activation
or in the activation block of the relevant chain — kona's
MessageGraph::check_single_dependency (rust/kona/crates/protocol/interop/src/
graph.rs) and op-program's cross.CrossUnsafeHazards → depset.LinkChecker.
CanExecute (op-supervisor/supervisor/backend/depset/links.go, imported as
library code) — so supernode diverges from the FP path at the activation
boundary and accepts blocks the FP would replace with deposits-only.

Adds the symmetric pair of guards inside verifyExecutingMessage, matching
kona PR #20550's scope. Refs issue #20684.

Adds six ActivationBoundary/* table rows under TestVerifyInteropMessages
(four guard-firing, two positive controls) that fail on the un-guarded
verifier and pass once the guards are in place.
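
The symmetric guard pair described above can be sketched in Go roughly as follows. This is an illustration only: `chainCfg`, `postActivation`, `checkActivation`, and `errPreActivation` are hypothetical names, not identifiers from op-supernode, and the sketch assumes the activation block can be recognized from the block timestamp and a fixed block time.

```go
package main

import "errors"

// errPreActivation is a hypothetical sentinel for this sketch.
var errPreActivation = errors.New("executing message touches a pre-activation or activation block")

// chainCfg is a hypothetical per-chain view of interop activation.
type chainCfg struct {
	activationTime uint64 // interop activation timestamp
	blockTime      uint64 // seconds between consecutive L2 blocks (assumed fixed)
}

// postActivation reports whether a block with timestamp ts comes strictly
// after the activation block. The activation block is the first block with
// ts >= activationTime, so every later block has a parent at or past activation.
func (c chainCfg) postActivation(ts uint64) bool {
	return ts >= c.activationTime+c.blockTime
}

// checkActivation applies the same guard to both sides of an executing message:
// the executing block on its chain and the initiating block on its chain.
func checkActivation(executing, initiating chainCfg, execTS, initTS uint64) error {
	if !executing.postActivation(execTS) || !initiating.postActivation(initTS) {
		return errPreActivation
	}
	return nil
}
```

With `activationTime: 100` and `blockTime: 2`, a message executed at timestamp 100 (the activation block) or initiated at 98 (pre-activation) is rejected, while one executed at 104 and initiated at 102 passes the guard and moves on to the existing ordering and expiry checks.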

* fix(op-supernode): shorten activation-guard comments

Address review: drop file/line refs from the activation-invariant comment in
verifyExecutingMessage and from the boundary-test block in algo_test.go.

* fix(op-supernode): drop PR-description reference from test comment

* fix(op-supernode): document new activation-boundary checks in verifyExecutingMessage

---------

Co-authored-by: wwared <541936+wwared@users.noreply.github.com>
…P-7904) (#20689)

Split the single gas constant in each FPVM-accelerated precompile into two
named values: the L2 gas charge (unchanged) and the oracle/L1 staticcall
gas required by EIP-7904, sent in the L1Precompile preimage hint.

Without this fix, once Glamsterdam activates on L1, the
`loadPrecompilePreimagePart` staticcall would silently OOG for KZG, BLS12-381
G1Add, G2Add, and bn256 pairing because the requiredGas embedded in the
preimage key would be below the post-7904 L1 cost. Higher-than-needed
oracle gas is always safe, so this can deploy proactively.

L2 gas charging (`EthPrecompileOutput::gas_used`) and the OOG guards
continue to use the existing values, preserving state-root agreement with
op-reth pre-revm-bump.

Affected (EIP-7904 L1 cost → embedded in oracle hint):
- KZG point eval (0x0a): 89_363
- BLS12-381 G1Add (0x0b): 643
- BLS12-381 G2Add (0x0d): 765
- bn256 pairing (0x08): 45_000 + 34_103 × k

Mirrors #19381 for kona-proof.
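
As a rough Go illustration of the split for the bn256 pairing case: the struct and function names are hypothetical, the oracle figures are the EIP-7904 costs listed above, and the L2 charge shown (the EIP-1108 schedule) is an assumption about what the "existing value" is rather than something stated in this commit.

```go
package main

// precompileGas separates the two roles that used to share one constant.
// Both field names are hypothetical.
type precompileGas struct {
	l2Charge uint64 // gas charged to the L2 transaction (unchanged by the fix)
	oracle   uint64 // requiredGas embedded in the L1Precompile preimage hint
}

// pairingGas returns both values for a bn256 pairing check over k pairs.
func pairingGas(k uint64) precompileGas {
	return precompileGas{
		// Assumed existing L2 charge (EIP-1108 schedule).
		l2Charge: 45_000 + 34_000*k,
		// EIP-7904 L1 cost from the list above.
		oracle: 45_000 + 34_103*k,
	}
}
```

Keeping the two values in separate named fields makes it harder to accidentally use the larger oracle figure for L2 gas accounting, which would change state roots.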

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
gives clearer semantics and prevents, no change in behaviour other than
logging
* feat(op-builder): vendor op-rbuilder:op-builder/v0.2.13

* feat(rollup-boost): vendor rollup-boost:rollup-boost/v0.7.11

* _

* fix path

* ci(rust): build op-rbuilder and rollup-boost as vendored dirs, not submodules

Adds rust-build-vendored job that hashes the directory tree via
git ls-tree instead of reading a submodule gitlink SHA, and skips
the git submodule update --init step since the code is checked in.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* ci(rust): fix op-rbuilder/rollup-boost binary paths

The rust-build-vendored job saves binaries flat into
.circleci-cache/rust-binaries/, so the env vars should not include
the spurious rust/ subdirectory prefix.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Einar Rasmussen <einar@oplabs.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
…ate (#20734)

* chore(op-reth/proofs): consistent `trie::*` log targets across the crate

* align prune logging
#19758)

* fix(op-node,op-batcher): fix maxSafeLag stall, resume, and channel timeout

The `--sequencer.max-safe-lag` feature was effectively non-functional due to
a state-management bug in the sequencer, compounded by a missing timeout
re-check in the batcher.

## Root Cause (op-node)

In `onForkchoiceUpdate`, the maxSafeLag check ran **before** the head-advancement
block. When a new block was confirmed (`UnsafeL2Head > latestHead`), the
head-advancement block unconditionally set `nextActionOK = true`, immediately
overriding the `nextActionOK = false` set by the maxSafeLag check. The sequencer
continued producing blocks regardless of the safe-head lag.

## Fix (op-node)

Move the maxSafeLag check to **after** the head-advancement block so it can
properly override `nextActionOK = true`. Additionally:

- **Stall/resume tracking**: Introduce a `stalledByMaxSafeLag` flag so the
  recovery branch only resumes sequencing when the stall was caused by
  maxSafeLag specifically. Without this, the resume logic would interfere with
  other `nextActionOK = false` states (pipeline reset, L1-derivation backoff,
  mid-seal wait).

- **Safe head catch-up recovery**: When the safe head catches up (gap drops
  below maxSafeLag), the sequencer automatically resumes via the
  `else if d.stalledByMaxSafeLag` branch. Uses `d.active.Load()` to match the
  resume pattern used throughout the file.

- **Runtime disable**: If an operator sets `maxSafeLag = 0` while the sequencer
  is stalled, the outer `else if d.stalledByMaxSafeLag` branch detects this and
  resumes immediately, preventing a permanent stall.

- **Lifecycle cleanup**: `stalledByMaxSafeLag` is cleared in `forceStart`,
  `onReset`, and `Stop` to prevent stale state across lifecycle transitions.
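
The ordering fix and the stall/resume flag can be sketched in Go as below. This is a simplified model, not the op-node code: the `seq` type and its fields are hypothetical stand-ins, and the exact lag comparison is an assumption.

```go
package main

// seq is a hypothetical, stripped-down stand-in for the op-node sequencer state.
type seq struct {
	maxSafeLag          uint64
	latestHead          uint64
	nextActionOK        bool
	stalledByMaxSafeLag bool
	active              bool
}

func (d *seq) onForkchoiceUpdate(unsafeHead, safeHead uint64) {
	// Head advancement runs first and may set nextActionOK = true.
	if unsafeHead > d.latestHead {
		d.latestHead = unsafeHead
		d.nextActionOK = true
	}
	// The lag check runs afterwards so that it can override the line above.
	if d.maxSafeLag > 0 && safeHead+d.maxSafeLag <= unsafeHead {
		d.nextActionOK = false
		d.stalledByMaxSafeLag = true
	} else if d.stalledByMaxSafeLag {
		// Either the safe head caught up or maxSafeLag was set to 0 at runtime.
		// Only a stall caused by maxSafeLag is undone here.
		d.stalledByMaxSafeLag = false
		d.nextActionOK = d.active
	}
}
```

Because the resume branch is gated on `stalledByMaxSafeLag`, other reasons for `nextActionOK == false` (pipeline reset, backoff, mid-seal wait) are left untouched in this model.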

## Root Cause (op-batcher)

When the sequencer stalls and stops producing blocks, the batcher quickly
consumes all pending blocks into a channel (`pendingBlocks = 0`). In
`getReadyChannel`, the `pendingBlocks() == 0` early-return path skips
`registerL1Block()` — the only call site that checks channel duration timeout.
The channel never times out, never closes, and the data is never submitted to
L1, leaving the safe head permanently stuck.

## Fix (op-batcher)

When `pendingBlocks == 0` but a non-full channel exists, still call
`registerL1Block()` to re-evaluate the channel duration timeout. If the channel
times out, flush it immediately. Respects the `ignoreMaxChannelDuration` flag
for consistency with the normal path.
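
A minimal Go sketch of the reordered check, under the assumption that a channel tracks the L1 block at which it was opened. The `channel`, `manager`, and `action` types are hypothetical; only `registerL1Block`, `pendingBlocks`, and `ignoreMaxChannelDuration` echo names from the description above.

```go
package main

// channel is a hypothetical stand-in for an open batcher channel.
type channel struct {
	openedAtL1  uint64
	maxDuration uint64 // in L1 blocks; 0 disables the timeout
	full        bool
	timedOut    bool
}

// registerL1Block re-evaluates the duration timeout against the current L1 block.
func (c *channel) registerL1Block(l1Num uint64) {
	if c.maxDuration > 0 && l1Num >= c.openedAtL1+c.maxDuration {
		c.timedOut = true
	}
}

type action int

const (
	wait      action = iota // nothing to do yet
	addBlocks               // feed pending blocks into the channel
	flush                   // close the channel and submit it to L1
)

type manager struct {
	current                  *channel
	pendingBlocks            int
	ignoreMaxChannelDuration bool
}

// next decides what to do for the current L1 block. The timeout check runs
// before the pendingBlocks early return, so an idle channel can still time out.
func (m *manager) next(l1Num uint64) action {
	if m.current != nil {
		if !m.current.full && !m.ignoreMaxChannelDuration {
			m.current.registerL1Block(l1Num)
		}
		if m.current.full || m.current.timedOut {
			return flush
		}
	}
	if m.pendingBlocks == 0 {
		return wait
	}
	return addBlocks
}
```

In this model a stalled sequencer leaves `pendingBlocks` at 0, yet the open channel is still flushed once `maxDuration` L1 blocks have passed, which is what lets the safe head advance again.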

## Scenarios Covered

| Scenario | Behavior |
|----------|----------|
| Gap exceeds maxSafeLag | Sequencer stalls, stops producing blocks |
| Safe head catches up | Sequencer auto-resumes |
| Batcher running, sequencer stalled | Channel times out via batcher fix, data submitted, safe head advances |
| Batcher restarted during stall | New channel created, data submitted quickly |
| maxSafeLag disabled at runtime | Sequencer resumes immediately |
| Pipeline reset during stall | Flag cleared, reset proceeds normally |
| Conductor failover | forceStart clears flag, new leader starts clean |

Fixes #17936

* refactor(op-batcher): move channel timeout check before pendingBlocks early return

* test(op-acceptance-tests): add maxSafeLag stall/resume acceptance test

Adds an acceptance test that verifies sequencer.max-safe-lag behavior:
- sequencer stalls when unsafe/safe gap exceeds maxSafeLag
- sequencer auto-resumes once safe head catches up

Supporting devstack additions:
- L2CLSequencerMaxSafeLag option to configure max-safe-lag via
  WithGlobalL2CLOption (mirrors existing L2CLSequencer/L2CLIndexing)
- NewMinimalNoFaultProofs preset (and underlying sysgo runtime
  variant) that skips the proposer and challenger, following the
  existing NewMinimalWithConductors pattern. This avoids requiring
  cannon prestate artifacts in local test runs for tests that only
  exercise the sequencer + batcher + derivation loop.

* chore: trigger ci

* refactor(op-batcher): rename toBeAddedBlocks to havePendingBlocks

Address review feedback: clearer name for the boolean tracking
whether pendingBlocks() > 0 in getReadyChannel.

* test(op-acceptance-tests): skip max-safe-lag test on kona-node

The max-safe-lag stall/resume logic lives in op-node's Go sequencer.
kona-node has its own sequencer implementation that is out of scope
for this regression test, so skip the test in the kona-node CI matrix
variant.

Also reformat the godoc list to satisfy goimports.
* refactor(op-core/nuts): write fork_lock.toml entries chronologically

Iterate forks.All when encoding so adding a new fork doesn't reshuffle
existing entries. Map iteration order was alphabetical, which would
push new forks above older ones in the file.
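
The idea can be illustrated with a small Go sketch that drives the encoder from an ordered slice instead of from the map. The `allForks` list stands in for `forks.All`, and the entry layout (a `hash` key per fork table) is hypothetical; the real fork_lock.toml schema is not shown in this commit.

```go
package main

import (
	"fmt"
	"strings"
)

// allForks is a hypothetical stand-in for forks.All: names in activation order.
var allForks = []string{"bedrock", "canyon", "delta", "ecotone"}

// encodeLock writes one TOML table per fork present in entries, following the
// chronological order of allForks rather than the order of the map keys.
func encodeLock(entries map[string]string) string {
	var b strings.Builder
	for _, name := range allForks {
		hash, ok := entries[name]
		if !ok {
			continue // fork has no lock entry yet
		}
		fmt.Fprintf(&b, "[%s]\nhash = %q\n", name, hash)
	}
	return b.String()
}
```

Appending a new fork to the end of the ordered list then only appends a table to the file, which keeps diffs of the lock file small.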

* feat(op-core/nuts): commit Interop NUT bundle and embed

Captures the forge-script output as the canonical Interop bundle and
exposes it to op-node/kona consumers.

---------

Co-authored-by: maurelian <maurelian@protonmail.com>
…ers (#20652)

The preset's initial CrossSafe match for ELSync-mode verifiers was a fixed
120-attempt poll (240s). Under CI resource contention, op-geth's beacon-driven
EL sync sometimes does not complete within that budget — the verifier's
unsafe head keeps advancing via CL gossip every 2s while the safe head stays
at 0 because the EL is still snap-syncing historical blocks. The whole
budget then burns and the test fails at setup.

Replace the fixed-attempt poll with a progress-aware wait: keep polling for
up to 8 minutes, but fail fast (within 30s) if the verifier's LocalUnsafe
head stops advancing. LocalUnsafe is driven by CL P2P gossip and is
independent of EL snap-sync, so a stall there means the test setup is
genuinely stuck and more waiting will not help. Successful runs still finish
in tens of seconds; only the worst-case CI runs use the extended window.
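
A progress-aware wait of this kind might look like the following Go sketch. The function name, error values, and callback signatures are hypothetical; in the scenario above `done` would check the CrossSafe match and `progress` would return the verifier's LocalUnsafe block number, with an 8-minute total budget and a 30-second stall window.

```go
package main

import (
	"errors"
	"time"
)

var (
	errStalled = errors.New("progress indicator stopped advancing")
	errTimeout = errors.New("overall deadline exceeded")
)

// waitWithProgress polls done until it returns true. It gives up once total
// has elapsed, or earlier if progress has not increased for the stall window.
func waitWithProgress(done func() bool, progress func() uint64, total, stall, interval time.Duration) error {
	start := time.Now()
	last, lastChange := progress(), start
	for {
		if done() {
			return nil
		}
		now := time.Now()
		if cur := progress(); cur > last {
			last, lastChange = cur, now
		} else if now.Sub(lastChange) >= stall {
			return errStalled // fail fast: more waiting will not help
		}
		if now.Sub(start) >= total {
			return errTimeout
		}
		time.Sleep(interval)
	}
}
```

Separating the two failure reasons also makes CI logs easier to read: a stall points at a stuck setup, a timeout at a slow but live one.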

Refs #20649.
* feat(op-node): add follow source success metric

Add follow_source_successes_total counter to track successful follow
source updates, complementing the existing error metric.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(op-node): fix goimports formatting in metrics

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…d_hash / fetch_output_block_hash (#20724)

* fix(kona/client): validate output-root version word in fetch_safe_head_hash / fetch_output_block_hash

Both call sites previously sliced `output_preimage[96..128]` as the L2 block
hash without checking the version word at `[0..32]`. Today only
`OutputVersionV0` (the zero word) is defined; op-program's equivalent rejects
any non-V0 word via `ErrInvalidOutputVersion`. The downstream defenses already
refuse a hypothetical V1 claim, so this is not a consensus fix — the goal is
forensic: surface "unknown output version" explicitly instead of masking it as
a generic `InvalidClaim` later in the pipeline.

Adds `OracleProviderError::UnknownOutputVersion(B256)` and a unit test on
`fetch_safe_head_hash` that fails on the pre-fix code (returns `Ok(B256::ZERO)`)
and passes after.

* fix(kona/client): reject malformed output-root preimage length in fetch_output_block_hash

Addresses the review nit on #20724: `fetch_output_block_hash` only guarded
the version word, so a preimage shorter than 32 bytes silently fell through
to the `[96..128]` slice and panicked, and longer-than-128 preimages were
read past their meaningful payload. Add an explicit length-128 check that
returns `Preimage(BufferLengthMismatch(128, n))`, matching the behavior
that `single::fetch_safe_head_hash` already gets for free from `get_exact`.
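
For comparison, the two checks (exact length, then version word) can be sketched in Go as follows. This is not the kona or op-program code; the function and error names are hypothetical, and the layout assumed is the V0 output root: version, state root, message-passer storage root, block hash, 32 bytes each.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
)

var errUnknownOutputVersion = errors.New("unknown output version")

// blockHashFromOutputRoot extracts the L2 block hash from a V0 output-root
// preimage, rejecting malformed lengths and non-V0 version words up front.
func blockHashFromOutputRoot(preimage []byte) ([32]byte, error) {
	var hash [32]byte
	if len(preimage) != 128 {
		return hash, fmt.Errorf("output preimage length mismatch: want 128, got %d", len(preimage))
	}
	v0 := make([]byte, 32) // OutputVersionV0 is the zero word
	if !bytes.Equal(preimage[:32], v0) {
		return hash, fmt.Errorf("%w: %x", errUnknownOutputVersion, preimage[:32])
	}
	copy(hash[:], preimage[96:128])
	return hash, nil
}
```

Checking the length first means the later slices can never go out of bounds, so a short preimage yields an error instead of a panic.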

Test reorganization:
* Move shared `MockOracle` from inline in `trace_extension.rs` into
  `tests/common/mod.rs` so version + length tests can share it.
* Add `tests/output_root.rs` covering both checks (version word and
  preimage length) for each helper — `fetch_safe_head_hash` and
  `fetch_output_block_hash` — so regressions in either function are caught
  independently.
* Bump `interop::util` and `fetch_output_block_hash` to `pub` to mirror
  the existing surface for `single::fetch_safe_head_hash`; the lib has no
  external consumers beyond these integration tests.

---------

Co-authored-by: wwared <541936+wwared@users.noreply.github.com>
…s-safe head (#20769)

* test: migrate TestL2ReorgAfterL1Reorg to supernode to reproduce cross-safe stall

Migrate the L1 reorg test from NewSimpleInterop (supervisor) to
NewTwoL2SupernodeInterop (supernode) to demonstrate a cross-safe head
stall after deep L1 reorgs.

The shallow reorg subtest (n=3) passes: the supernode rewinds one
timestamp at a time and eventually recovers. The deep reorg subtest
(n=10) fails: cross-safe permanently stalls because the batcher enters
an infinite out-of-sync loop after the supernode resets currentL1 to
zero during rewind.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(op-node): ignore non-canonical super authority safe head

* fix(op-node): use finalized fallback for stale super authority safe

* fix(op-node): tolerate unknown super authority safe head

* fix(op-node): use super authority finalized safe fallback

* test: fix race in TestL2ReorgAfterL1Reorg unsafe subtest (#20775)

The unsafe (n=3) subtest captured crossSafeRef and localSafeRef after
the manual L1 sequencing loop, so their L1 origins could land in the
to-be-reorged window and the "should still be canonical" post-checks
would flake when timing shifted them past the divergence point.

Split the helper to run a pre-early callback before sys.L1CL.Stop(),
where L1 origins are guaranteed to be in the pre-divergence prefix,
and capture the stable refs there. The n=10 subtest expects all refs
reorged, so it captures them after sequencing as before.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: wwared <541936+wwared@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Karl Floersch <karl@oplabs.co>
* test(op-interop-filter): add integration suite for logsdb contract

Adds 52 tests against the real on-disk logsdb covering the op-interop-filter
↔ logsdb boundary: rejection-reason classification through
Backend.CheckAccessList, GetExecMsgsAtTimestamp round-trip, accessor
behaviour, ingest happy/error paths, reorg recovery, init/resume, restart
durability, and backend failsafe wiring.

The suite is structured so it should pass against any logsdb implementation
honouring the supervisor error sentinel contract — no mocks of logsdb.
Cases that the real DB can't produce under correct ingester control flow
(injected ErrDataCorruption / ErrConflict / ErrOutOfOrder etc. from
AddLog/SealBlock) remain covered by the existing mock-based unit tests
and are inventoried in plans/logsdb/op-interop-filter-test-plan.md.

Also drops the old RealLogsDB happy-path test — fully subsumed by the new
sequential-ingest and accessor coverage.

All tests run with t.Parallel() and t.TempDir().

* test(op-interop-filter): cover SealBlock backwards-timestamp contract

Adds TestIntegration_Ingest_BackwardsTimestamp_TripsBackendFailsafe — the
one ca26218 ("align raft-wal LogsDB error behavior") harmonization
change reachable through op-interop-filter's normal control flow. The
other items affect paths the ingester pre-guards against and remain
covered at the logsdb package level.

writeFetchedBlock pre-checks block number and parent hash but not block
timestamp, so a block whose timestamp regresses reaches the logsdb's
SealBlock. The behavioural contract is that the resulting failure must
trip Backend failsafe so subsequent CheckAccessList requests are
rejected with the failsafe label — a logsdb that returns ErrOutOfOrder
instead would silently retry and the operator would never be alerted.

* test(op-interop-filter): consolidate duplicates into integration suite

Removes 20 unit tests in logsdb_chain_ingester_test.go and backend_test.go
whose coverage is now provided by the integration suite, after first
plugging the gaps the unit tests caught that the integration suite did
not:

- Parameterised the RPC-fetch-error test over both InfoByNumber and
  FetchReceipts injection points (replaces RPCError + ReceiptsError).
- Added ErrorEncounteredMidRange_StopsAndReportsBlock, distinct from
  AfterIngesterError_SubsequentIngestsSkipped (the latter pre-sets the
  error, this exercises an error encountered partway through a
  concurrently-fetched range).
- Added Contains / GetExecMsgsAtTimestamp / RewindToFinalized
  BeforeInit_Uninitialized cases.
- Extended RecoverReorg_HappyPath to assert the returned timestamp and
  the post-rewind applyPendingRewind resume value.
- Added Backend_UnsupportedSafetyLevel_Rejected and
  Backend_EmptyAccessList_LocalUnsafe_Accepted.

Kept: TestLogsDBChainIngester_ErrorTypes (enum stringer), the
range-ordering and progress-metric tests, init/sealParent/Contains
focused unit tests, and the cross-validator-specific failsafe + Ready
tests — all of which exercise paths or APIs that the integration suite
does not.

* test(op-interop-filter): pin write-path sentinel dispatch via LogsDB interface

writeFetchedBlock dispatches errors from processBlockLogs to IngesterError
states (ErrConflict -> ErrorConflict, ErrDataCorruption ->
ErrorDataCorruption, ErrInvalidLog -> ErrorInvalidExecutingMessage). The
real on-disk logsdb cannot produce these sentinels from AddLog/SealBlock
under correct ingester control flow because writeFetchedBlock pre-checks
block number and parent hash before calling either method, so the
integration suite can't exercise them.

Introduces a LogsDB interface (the subset of *logs.DB that
LogsDBChainIngester depends on) and a fakeLogsDB satisfying it. Two
table-driven tests pin the positive dispatch (ErrConflict /
ErrDataCorruption from each write method) and the negative passthrough
(ErrFuture / ErrSkipped / ErrOutOfOrder / generic must not set
IngesterError).
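
The dispatch being pinned can be pictured with the Go sketch below. It is a reduced model: the interface has a single hypothetical `SealBlock(uint64) error` method, and the sentinel and state names only mirror those mentioned above rather than reproducing the real declarations.

```go
package main

import "errors"

// Hypothetical copies of the sentinels named above.
var (
	errConflict       = errors.New("conflict")
	errDataCorruption = errors.New("data corruption")
	errInvalidLog     = errors.New("invalid log")
	errOutOfOrder     = errors.New("out of order")
)

type ingesterError int

const (
	errorNone ingesterError = iota
	errorConflict
	errorDataCorruption
	errorInvalidExecutingMessage
)

// logsDB is the narrow interface the ingester depends on in this sketch.
type logsDB interface {
	SealBlock(num uint64) error
}

// fakeLogsDB returns a preconfigured error so tests can inject sentinels
// that the real on-disk DB cannot produce under correct control flow.
type fakeLogsDB struct{ sealErr error }

func (f fakeLogsDB) SealBlock(uint64) error { return f.sealErr }

// writeBlock maps write-path sentinels to ingester error states. Anything
// else (including errOutOfOrder) passes through without setting a state.
func writeBlock(db logsDB, num uint64) ingesterError {
	err := db.SealBlock(num)
	switch {
	case errors.Is(err, errConflict):
		return errorConflict
	case errors.Is(err, errDataCorruption):
		return errorDataCorruption
	case errors.Is(err, errInvalidLog):
		return errorInvalidExecutingMessage
	default:
		return errorNone
	}
}
```

Depending on an interface rather than the concrete DB type is what makes the injected-error rows of a table-driven test possible.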
…ram (#20717)

* fix(kona-client/interop): mirror SuperRoot trace-extension arm on TransitionState prestate

When the agreed pre-state is a `PreState::TransitionState` and
`transition_state.pre_state.timestamp >= claimed_l2_timestamp`, the interop
`run()` previously short-circuited to `Err(InvalidClaim)` unconditionally,
regardless of whether `claimed_post_state == agreed_pre_state_commitment`.
The parallel `PreState::SuperRoot` arm already returned `Ok(())` in the
matching-claim case (trace extension). This commit extends the
TransitionState arm to mirror that behavior, bringing kona-client into
parity with op-program's `stateTransition`/`ValidateClaim` semantics at the
`>=` boundary on sub-case A (`T == GT AND claim == prestate`).

Adds three integration tests in `bin/client/tests/interop_trace_extension.rs`:
sub-case A (RED on baseline, GREEN after fix), sub-case B (fail-closed
regression guard), and sub-case C-eq (symmetric strict-`>` half).

* test(kona-client/interop): trim verbose comments from trace-extension tests

* fix(kona-proof-interop/boot): reject future-timestamped prestate (#20727)

Add an `assert!` in `BootInfo::load` rejecting any agreed pre-state whose
timestamp exceeds `claimed_l2_timestamp`. The honest actor never agrees to
such a pre-state; op-program panics on the same condition (see
`op-program/client/interop/interop.go:87-97`). Without this guard, a
malicious proposer could register a future-timestamped SuperRoot or
TransitionState preimage (the oracle only verifies
`key == keccak256(preimage)`, not the timestamp inside) and commit the
same hash as both starting and disputed claim at trace-extended bisection
positions, where kona's `claim == prestate => Ok(())` arm would resolve
as `vmStatus = VALID`.

With the guard, both arms of `interop::run` only need to handle the
legitimate `==` boundary; tighten `>=` to `==` accordingly to make intent
explicit.
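
Expressed as a Go sketch (kona itself is Rust, so this is only a model with hypothetical names), the load-time guard plus the narrowed `==` arm look roughly like this:

```go
package main

import "errors"

var errInvalidClaim = errors.New("invalid claim")

// bootInfo is a hypothetical, flattened view of the boot inputs.
type bootInfo struct {
	prestateTimestamp  uint64
	claimedL2Timestamp uint64
	prestateCommitment [32]byte
	claimedPostState   [32]byte
}

// load rejects any agreed pre-state that lies in the future of the claim.
// It panics, mirroring the assert described above.
func load(b bootInfo) bootInfo {
	if b.prestateTimestamp > b.claimedL2Timestamp {
		panic("agreed pre-state timestamp exceeds claimed L2 timestamp")
	}
	return b
}

// run handles only the == boundary; the > case is unreachable after load.
// handled is false when normal derivation should continue (omitted here).
func run(b bootInfo) (handled bool, err error) {
	if b.prestateTimestamp == b.claimedL2Timestamp {
		if b.claimedPostState == b.prestateCommitment {
			return true, nil // trace extension: claim repeats the pre-state
		}
		return true, errInvalidClaim
	}
	return false, nil
}
```

Putting the rejection in `load` means every arm of `run` inherits it, instead of each arm having to remember the `>` case on its own.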

Tests:
- Flip
`trace_extension_transition_state_past_game_timestamp_accepts_matching_claim`
  to `#[should_panic]`; its previous assertion pinned the buggy lenient
  behavior. The flipped version is now the regression guard for the
  TransitionState arm.
- Add `rejects_super_root_with_timestamp_after_game_timestamp` as the
  symmetric guard for the SuperRoot arm.
- Refactor `setup_interop_preimages` to take a `PreState` so both arms
  reuse the fixture.

Resolves the "narrow both kona arms to `==`" follow-up flagged in #20717.

* test(kona-client/interop): cover SuperRoot ==-boundary trace-extension cases

Adds the SuperRoot-arm counterparts of the existing TransitionState `==`
trace-extension tests. Without them, a future refactor that breaks the
SuperRoot `==` arm in `bin/client/src/interop/mod.rs` would be caught only
by the strict-`>` panic test, leaving the consensus-critical `T == GT`
boundary unguarded for the SuperRoot variant.

- trace_extension_super_root_at_game_timestamp_accepts_matching_claim
  asserts `Ok(())` when `super_root.timestamp == claimed_l2_timestamp` and
  `claim == prestate_commitment`.
- trace_extension_super_root_at_game_timestamp_rejects_mismatched_claim
  asserts `Err(InvalidClaim)` when the timestamps match but the claim
  differs from the prestate commitment.

Reuses the existing `setup_interop_preimages` fixture which already takes
a `PreState`, so no production or fixture changes.

* fix(kona-client/interop): Use realistic TransitionState in unit tests

Co-authored-by: Inphi <mlaw2501@gmail.com>

---------

Co-authored-by: wwared <541936+wwared@users.noreply.github.com>
Co-authored-by: Rodrigo Araújo <rod.dearaujo@gmail.com>
Co-authored-by: Inphi <mlaw2501@gmail.com>
) (#20733)

* fix(op-supernode): anchor startup backfill to EL finalized head

* test(op-supernode): cover logsDB-ahead reconcile and act==genesis backfill clamp

* fix(op-supernode): resume from verifiedDB unconditionally on warm restart

Gating warm-restart resume on EL finalized rejects the normal case where
cross-safe (and therefore verifiedDB.LastTimestamp) is ahead of finalized,
and it also blocks sequencer-side liveness — the sequencer must produce
blocks before L1 finality can advance.

When verifiedDB is initialized, resume at LastTimestamp + 1 unconditionally.
If EL finalized happens to be ahead (e.g. EL restored from snapshot while
supernode was offline), the main loop rolls forward through it; finalized
blocks cannot contain invalid exec messages.

On cold start, anchor on min EL finalized head + 1 (clamped to activation).
A finalized head at genesis (Number == 0) with a real hash is a valid
anchor — only reject the zero-value response that signals an EL that
isn't ready yet. The previous "Number == 0" guard rejected genesis and
caused op-e2e action tests to hang forever during supernode startup.

The "verifiedDB ahead of canonical L2" failure mode (e.g. a local L2 reorg
invalidating a prior commit) cannot be detected via finalized — that needs
a tip-hash check against the EL block-by-number lookup, out of scope here.
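
The start-point selection described above can be summarized in a Go sketch. Names and signatures are hypothetical, and taking the minimum finalized timestamp across chains is this sketch's reading of "min EL finalized head".

```go
package main

import "errors"

var errELNotReady = errors.New("EL finalized head not available yet")

// blockRef is a hypothetical, reduced L2 block reference.
type blockRef struct {
	Hash   [32]byte
	Number uint64
	Time   uint64
}

// startTimestamp picks the first timestamp the main loop should process.
func startTimestamp(verifiedInit bool, lastVerified uint64, finalized []blockRef, activation uint64) (uint64, error) {
	// Warm restart: resume right after the last verified timestamp,
	// regardless of where EL finalized currently is.
	if verifiedInit {
		return lastVerified + 1, nil
	}
	// Cold start: anchor on the smallest finalized timestamp across chains.
	if len(finalized) == 0 {
		return 0, errELNotReady
	}
	anchor := ^uint64(0)
	for _, f := range finalized {
		// Only the zero-value response means "not ready". A finalized
		// head at genesis (Number == 0) with a real hash is valid.
		if f == (blockRef{}) {
			return 0, errELNotReady
		}
		if f.Time < anchor {
			anchor = f.Time
		}
	}
	start := anchor + 1
	if start < activation {
		start = activation // clamp to interop activation
	}
	return start, nil
}
```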

* test(op-supernode): use valid hash in genesis EL finalized test

The previous string "0xgenesis" is not valid hex and decoded to the
zero hash; Time: 50 was the only thing making the L2BlockRef non-zero,
so the case did not actually cover "genesis with a real hash".

---------

Co-authored-by: Karl Floersch <karl@oplabs.co>
* feat(op-supernode): add raft-wal-backed LogsDB implementation

Adds a new raftwallogdb sub-package implementing the LogsDB interface on top
of hashicorp/raft-wal. Each sealed block is stored as a single raft-wal entry
whose payload is a fixed-width binary header followed by a logHash array and
an execMsg array, so Contains is an O(1) memcpy regardless of how many logs
the block carries. StoreLog fsyncs the entry to disk before returning, so
SealBlock is durable on return and atomic with any pending logs buffered by
AddLog.
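
To make the "fixed-width header plus arrays" idea concrete, here is a Go sketch of one possible entry layout and a constant-offset lookup. The field order, the 24-byte header, and the 48-byte execMsg width are all hypothetical; the actual raftwallogdb encoding is not given in this commit message.

```go
package main

import (
	"encoding/binary"
	"errors"
)

const (
	headerSize  = 8 + 8 + 4 + 4 // number, timestamp, log count, execMsg count (hypothetical)
	hashSize    = 32
	execMsgSize = 48 // hypothetical fixed width of one execMsg record
)

var (
	errCorrupt    = errors.New("entry length does not match header")
	errOutOfRange = errors.New("log index out of range")
)

// encodeBlock packs one sealed block into a single entry payload.
func encodeBlock(num, ts uint64, logs [][hashSize]byte, msgs [][execMsgSize]byte) []byte {
	buf := make([]byte, headerSize, headerSize+len(logs)*hashSize+len(msgs)*execMsgSize)
	binary.BigEndian.PutUint64(buf[0:8], num)
	binary.BigEndian.PutUint64(buf[8:16], ts)
	binary.BigEndian.PutUint32(buf[16:20], uint32(len(logs)))
	binary.BigEndian.PutUint32(buf[20:24], uint32(len(msgs)))
	for _, h := range logs {
		buf = append(buf, h[:]...)
	}
	for _, m := range msgs {
		buf = append(buf, m[:]...)
	}
	return buf
}

// logHashAt returns the i-th log hash. The offset is computed directly from
// the index, so the cost does not depend on how many logs the block holds.
func logHashAt(entry []byte, i uint32) ([hashSize]byte, error) {
	var out [hashSize]byte
	if len(entry) < headerSize {
		return out, errCorrupt
	}
	nLogs := binary.BigEndian.Uint32(entry[16:20])
	nMsgs := binary.BigEndian.Uint32(entry[20:24])
	if len(entry) != headerSize+int(nLogs)*hashSize+int(nMsgs)*execMsgSize {
		return out, errCorrupt
	}
	if i >= nLogs {
		return out, errOutOfRange
	}
	off := headerSize + int(i)*hashSize
	copy(out[:], entry[off:off+hashSize])
	return out, nil
}
```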

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(op-supernode,op-interop-filter): use raft-wal LogsDB

Swaps op-supernode and op-interop-filter from the op-supervisor logs.DB
implementation to the new raft-wal-backed raftwallogdb.DB. The op-supervisor
implementation is untouched.

The raft-wal store is directory-rooted rather than a single file, so the
ingester now passes the chain directory to Open instead of a logs.db path. The
no-op metrics adapters previously required by logs.DB are removed;
raftwallogdb does not surface metrics.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(raftwallogdb): satisfy errorlint and gofmt

errorlint requires every wrapped inner error to use %w rather than %v in
fmt.Errorf. gofmt wanted the const block aligned.
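
The practical difference between the two verbs is shown in the Go sketch below; the sentinel and helper names are hypothetical. With `%v` the inner error is flattened to text, so `errors.Is` can no longer find it, while `%w` keeps it in the chain.

```go
package main

import (
	"errors"
	"fmt"
)

// errDataCorruption is a hypothetical sentinel for this sketch.
var errDataCorruption = errors.New("data corruption")

// wrapV formats the inner error as text only; the chain is lost.
func wrapV(inner error) error { return fmt.Errorf("decode block: %v", inner) }

// wrapW wraps the inner error so errors.Is and errors.As can still see it.
func wrapW(inner error) error { return fmt.Errorf("decode block: %w", inner) }

// isCorruption reports whether err is, or wraps, errDataCorruption.
func isCorruption(err error) bool { return errors.Is(err, errDataCorruption) }
```

Both wrappers produce the same message string, which is why the difference is easy to miss without a linter.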

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(raftwallogdb): clarify the two logIdx fields inside an execMsg record

The first logIdx is the *local* slot in this block that carries the executing
message; the second is the *source-chain* log index of the initiating message
it points at. They are not echoes of each other.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor: drop reads.Invalidator from LogsDB Rewind/Clear

No production caller wires in a real Invalidator — op-supernode passes a
hand-rolled noopInvalidator and op-interop-filter passes reads.NoopRegistry.
The Invalidator parameter and the synthetic DerivedInvalidation rule built
inside Rewind/Clear were dead weight inherited from op-supervisor's logs.DB API.

This drops the unused parameter from the LogsDB interface, the raftwallogdb
implementation, every test mock that has to satisfy LogsDB, and the call
sites. op-supernode no longer imports op-supervisor/.../reads at all; the
only remaining op-supervisor dependencies are 'types' (BlockSeal,
ContainsQuery, ExecutingMessage, error sentinels) and 'processors' (helper
functions that decode receipts into log hashes).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: go mod tidy

Promotes hashicorp/raft-wal from indirect to direct require (it is now
imported by op-supernode's raftwallogdb package), and prunes stale entries
that were left behind when the reads.Invalidator wiring was removed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Fix lint

* test(raftwallogdb): expand coverage to match op-supervisor's LogsDB

Brings raftwallogdb behavioural coverage up to (or above) what op-supervisor's
logs.DB test suite asserts about the LogsDB contract. The new tests cover:

- Empty-DB behaviour across every read entry point
- SealBlock validation: parent-hash mismatch, wrong number, timestamp regression
- AddLog validation: bad parent, non-zero first index, duplicate index, skipped index
- Contains: every error path (wrong checksum, wrong timestamp, out-of-range
  logIdx, future block with future / past timestamp, block 0)
- FindSealedBlock: ErrSkipped below first, ErrFuture above latest
- OpenBlock: boundary checks plus block 0 happy-path
- Multi-block roundtrip across 10 blocks with mixed executing-messages
- Rewind edges: empty DB, at-latest no-op, above-latest ErrFuture, before-first
  clears, at-first keeps it, pending-buffer-dropped semantics
- Clear on populated and empty DBs
- Persistence across close/reopen and pre-seal-crash buffer loss
- blockRecord and execMsg fixed-width encoding roundtrip

Supervisor tests around checkpoint placement, file-format recovery, and the
internal iterator API are intentionally not mirrored — they cover storage-
engine internals that have no analogue in the raft-wal implementation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(raftwallogdb): match logs.DB Contains semantics and broaden coverage

Drop the special-case ErrConflict for BlockNum==0 in Contains so behavior
matches op-supervisor's logs.DB (empty DB → ErrFuture; populated DB flows
through the normal logIdx/timestamp checks). Add tests for: block 0
handling, multiple ExecutingMessages per block, last-index Contains
boundary, and reopen-after-Rewind / reopen-after-Clear persistence.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(raftwallogdb): align error contract with op-supervisor logs DB

Audit of error-type divergences between the original op-supervisor logs DB
and the new raft-wal-backed implementation surfaced several behaviour
differences that downstream consumers (op-supernode, op-interop-filter)
implicitly depend on. Reconciled to match the old contract:

- Pass-through transient read errors from GetLog/readBlockAt instead of
  wrapping them as ErrDataCorruption. Real structural corruption (short
  buffer in decodeBlockRecord, length checks on entry data) keeps the
  ErrDataCorruption tag.
- OpenBlock returns ErrSkipped at blockNum == firstBlock when firstBlock
  is non-zero, matching the old DB so op-supernode's algo.go fallback
  remains a no-op replica of the success path. Tracked for cleanup in
  #20726.
- AddLog rejects parentBlock == eth.BlockID{} with ErrOutOfOrder. Genesis
  blocks cannot carry receipts; structurally enforces the invariant at
  the write boundary so Contains does not need a read-path guard.
- AddLog parent-identity mismatches return ErrOutOfOrder (was
  ErrConflict). Matches old DB. Prevents op-interop-filter's
  ErrorConflict failsafe from tripping on transient state-disagreement.
- SealBlock block-number gap and backwards-timestamp now return
  ErrConflict (were ErrOutOfOrder). Matches old DB. Restores
  op-interop-filter's failsafe trip on structural desync.
- Removed the dead raft.ErrLogNotFound tolerance in clearLocked.
  raft-wal's DeleteRange does not produce that sentinel — it returns nil
  for ranges outside the stored log.

Test fixtures using blockID(0, 0x00) collided with eth.BlockID{} (real
chains never have a zero genesis hash); switched to blockID(0, 0xA0).

* refactor(raftwallogdb): centralize entry length validation in decodeSealedBlock

Addresses review feedback on PR #20688: the truncation checks in Contains
and OpenBlock duplicated the same offset arithmetic. decodeSealedBlock
validates the full entry length up front (strict equality) and returns
slices for the log-hash and execMsg regions so callers can index without
re-checking bounds.

* refactor(raftwallogdb): move entry accessors onto blockRecord

decodeBlockRecord now validates the full entry length and populates
slice views of the log-hash and execMsg regions on the returned record.
New LogHash(i) and ExecMsg(i) methods expose them, removing the offset
arithmetic from Contains and OpenBlock and dropping the standalone
decodeSealedBlock helper.

* fix(op-interop-filter): align LogsDB.Rewind signature with raftwallogdb

The raftwallogdb.DB.Rewind has signature (eth.BlockID) error and no
longer takes a reads.Invalidator. Update the LogsDB interface (and the
fake in logsdb_dispatch_test) to match so *raftwallogdb.DB satisfies it
and the package compiles.

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
op-conductor no longer dials op-supervisor for health checks. Drop the
supervisor.rpc flag/config, the SupervisorHealthAPI client and mock, the
ErrSupervisorConnectionDown error path, and the associated tests.
* chore(op-proposer): remove op-supervisor proposal source

The op-proposer can now drive super-root proposals only via supernode
RPCs. Drops the SupervisorRpcs config field and --supervisor-rpcs flag,
along with the unused supervisor-backed superproofs devstack preset and
the interop e2e proposer wiring (which had no supernode option).

* chore(op-proposer): drop now-unused supervisor super-root helper

goimports cleanup after removing the supervisor superproofs path:
deletes getSupervisorSuperRoot and reorders import groups.
TestWithdrawal_CannonKona flaked under op-reth with 'no state found for
block number N'. forGamePublished returns once the dispute game references
block N on L1, but op-reth persists L2 state asynchronously, so the
immediate follow-up eth_getProof for N races the persistence service flush.
Wrap GetProof in an Eventually that only retries on the specific
state-not-yet-persisted errors emitted by op-reth ('no state found for
block') and op-geth ('missing trie node'). Real errors still propagate
immediately. Refs #19964.
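
A rough sketch of the retry-only-on-known-errors idea, not the actual test helper. retryOnPendingState, newFlakyGetProof, and the attempt-bounded loop are hypothetical; only the two error substrings come from the description above:

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"time"
)

// Substrings quoted in the commit message for op-reth and op-geth.
var retryableSubstrings = []string{"no state found for block", "missing trie node"}

// Example errors used by the demo below (the block numbers are made up).
var (
	errNotPersisted = errors.New("no state found for block 42")
	errMissingTrie  = errors.New("missing trie node abc (path )")
	errUnrelated    = errors.New("connection refused")
)

const pollInterval = 5 * time.Millisecond

// isStateNotPersisted reports whether err looks like a not-yet-flushed state error.
func isStateNotPersisted(err error) bool {
	if err == nil {
		return false
	}
	for _, s := range retryableSubstrings {
		if strings.Contains(err.Error(), s) {
			return true
		}
	}
	return false
}

// retryOnPendingState calls fn up to attempts times, retrying only while the
// error is a state-not-persisted error. Any other error is returned at once.
func retryOnPendingState[T any](attempts int, interval time.Duration, fn func() (T, error)) (T, error) {
	var (
		v   T
		err error
	)
	for i := 0; i < attempts; i++ {
		v, err = fn()
		if err == nil || !isStateNotPersisted(err) {
			return v, err
		}
		if i < attempts-1 {
			time.Sleep(interval)
		}
	}
	return v, fmt.Errorf("state still not persisted after %d attempts: %w", attempts, err)
}

// newFlakyGetProof simulates an RPC that fails `failures` times with failWith
// and then succeeds. The returned counter records how often it was called.
func newFlakyGetProof(failures int, failWith error) (func() (string, error), *int) {
	calls := 0
	return func() (string, error) {
		calls++
		if calls <= failures {
			return "", failWith
		}
		return "proof", nil
	}, &calls
}

func main() {
	fetch, calls := newFlakyGetProof(2, errNotPersisted)
	got, err := retryOnPendingState(5, pollInterval, fetch)
	fmt.Println(got, err, *calls)

	fetch, calls = newFlakyGetProof(2, errUnrelated)
	_, err = retryOnPendingState(5, pollInterval, fetch)
	fmt.Println(err, *calls)
}
```

Matching on error strings is brittle, so keeping the list of accepted substrings short and explicit limits the risk of masking a real failure.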
…ng (#20788)

* perf(rust-e2e-restart): 1s L2 block time, shorter NotAdvanced window

Drops L2 block time for the kona node-restart sysgo suite from the default
(2s) to 1s, and trims TestSequencerRestart's NotAdvancedFn window from 50
to 20 slots. Both changes target the wall-clock dominated rust-e2e-restart
CI job, which spends most of its ~24m runtime waiting for L2 blocks.

* ci(rust-e2e): persist prebuilt rust binaries to workspace

The cannon-kona-host, kona-build-release, and op-reth-build jobs build
release binaries that downstream e2e jobs already require but do not
consume from the workspace. Each downstream job then re-runs rust-build,
restoring the rust target cache and re-linking the same binary (~9m).

Set persist_to_workspace: true on the three builders and drop the
redundant rust-build step from rust-e2e-sysgo-tests,
rust-restart-sysgo-tests, op-reth-e2e-sysgo-tests, and
kona-proof-action-tests. The downstream jobs already attach the workspace
and reference $WD/rust/target/release/<binary>, so the persisted globs
land at the expected path.

* ci(rust-e2e): persist only from kona-build-release

cannon-kona-host, kona-build-release, and op-reth-build write overlapping
files into rust/target/release (kona-build-release builds the entire
workspace, so it produces every binary). CircleCI rejects concurrent
persists of the same file with 'Concurrent upstream jobs persisted the
same file(s)'.

Persist only from kona-build-release. The other two still build in
parallel to prime caches but no longer persist.

* ci(rust-e2e): drop redundant cannon-kona-host and op-reth-build jobs

kona-build-release builds the entire rust workspace, so it already
produces kona-host and op-reth alongside kona-node. Running cannon-kona-host
and op-reth-build in parallel was duplicate work — both built subsets of
what kona-build-release produces, and they cannot persist to the workspace
without colliding with it. Drop the two jobs entirely and route their
former consumers to kona-build-release.

* ci(rust-e2e): drop redundant rust build jobs

kona-build-release builds the entire rust workspace, so it already
produces kona-host and op-reth alongside kona-node. The parallel
cannon-kona-host and op-reth-build jobs were rebuilding subsets of the
same output. They cannot persist alongside it without colliding on
rust/target/release/* paths.

Drop both, route all consumers to a single workspace build, and rename
it to rust-workspace-release to reflect that it produces the full set
of release binaries — not just kona.
…Converge (#20729)

* fix(op-acceptance-tests): stabilize TestFollowSource_HeadsDivergeThenConverge

Cross-safety only commits at L2 timestamps where every chain in the
interop set has a local-safe block. With a 2s block time, stopping two
sequencers via sequential RPCs lets the leader sneak in one extra
block while the laggard is still being told to stop (the gap was ~80ms
in the original CI failure, well over the inter-block window in the
race). The laggard then never produces a block at that timestamp, so
the supernode verifier stalls at the laggard's last timestamp. The
follower for the leader chain ends up with local-safe at the extra
block but cross-safe stuck a block behind, and the convergence wait in
the test times out.

Replace the two sequential StopSequencer calls with a new
dsl.StopSequencersSynced helper that issues all stops concurrently and
then aligns the chains: any sequencer below the maximum unsafe head is
restarted just long enough to produce the missing block(s) and stopped
again. The alignment loop is bounded so a real bug fails fast instead
of hanging. Chains in the same interop set share a genesis time and
block time, so equal unsafe numbers imply equal timestamps and the
verifier can advance.

Refs #19821

* refactor(dsl): align StopSequencersSynced by timestamp, not block number

The previous implementation aligned chains by unsafe-head block number, which
only equals timestamp alignment when chains share genesis time AND block time.
That assumption holds for the current acceptance-test presets but is not
generally true for Superchain interop sets, where member chains can have
different rollup configs.

Cross-safety only cares about timestamps: the verifier walks one L2 timestamp
at a time and requires every chain to have a local-safe block at that
timestamp. Aligning by timestamp is therefore both correct and strictly more
general.

Drop the docstring claim about shared genesis/block time, and switch the
leader/level-up comparison plus log fields to timestamps. No behaviour change
for chains with identical configs (the only current call site).
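
To illustrate why equal block numbers only imply equal timestamps under shared configs, here is a small sketch. rollupCfg and its fields are simplified stand-ins for a rollup config, and the numbers are made up:

```go
package main

import "fmt"

// rollupCfg is a simplified, hypothetical view of a chain's rollup config.
type rollupCfg struct {
	genesisTime uint64 // L2 genesis timestamp, in seconds
	blockTime   uint64 // seconds between L2 blocks
}

// timestampAt returns the L2 timestamp of block number num.
func (c rollupCfg) timestampAt(num uint64) uint64 {
	return c.genesisTime + num*c.blockTime
}

// blockAt returns the block number at timestamp ts, or false if the chain
// has no block at exactly that timestamp.
func (c rollupCfg) blockAt(ts uint64) (uint64, bool) {
	if ts < c.genesisTime || (ts-c.genesisTime)%c.blockTime != 0 {
		return 0, false
	}
	return (ts - c.genesisTime) / c.blockTime, true
}

func main() {
	a := rollupCfg{genesisTime: 1000, blockTime: 2}
	b := rollupCfg{genesisTime: 1000, blockTime: 1}

	// Same block number, different timestamps: number alignment is not enough.
	fmt.Println(a.timestampAt(10), b.timestampAt(10))

	// Aligning on chain A's timestamp gives a different block number on B.
	ts := a.timestampAt(10)
	num, ok := b.blockAt(ts)
	fmt.Println(ts, num, ok)
}
```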

* refactor: align chains via TestSequencer instead of a new DSL helper

The start/poll/stop level-up approach in StopSequencersSynced is fundamentally
racy: a restarted sequencer produces a burst of catch-up blocks faster than any
sane poll interval, so stopping at exactly the target block is unreliable.

Use TestSequencer.SequenceBlock instead, which builds exactly one block at
parent.Time + blockTime deterministically. The test now stops both sequencers
and then drives the trailing chain up to the leader's timestamp with
single-block SequenceBlock calls — no restart, no overshoot risk.

Drop dsl.StopSequencersSynced entirely; it has no remaining callers and the
TestSequencer path is strictly better for presets that expose one.
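
The single-block level-up can be sketched roughly as follows. The chain type and its sequenceBlock method are hypothetical stand-ins for a stopped sequencer driven through TestSequencer.SequenceBlock; the real DSL types differ:

```go
package main

import "fmt"

// chain is a hypothetical stand-in for a stopped sequencer plus its unsafe head.
type chain struct {
	name      string
	headTime  uint64 // timestamp of the current unsafe head
	blockTime uint64
}

// sequenceBlock models building exactly one block at parent.Time + blockTime.
func (c *chain) sequenceBlock() {
	c.headTime += c.blockTime
}

// alignToLeader drives every trailing chain up to the highest head timestamp,
// one block at a time, and returns how many blocks were built. It fails if a
// chain cannot land exactly on the leader's timestamp.
func alignToLeader(chains []*chain) (int, error) {
	var target uint64
	for _, c := range chains {
		if c.headTime > target {
			target = c.headTime
		}
	}
	built := 0
	for _, c := range chains {
		for c.headTime < target {
			c.sequenceBlock()
			built++
		}
		if c.headTime != target {
			return built, fmt.Errorf("chain %s overshot: head %d, target %d", c.name, c.headTime, target)
		}
	}
	return built, nil
}

func main() {
	leader := &chain{name: "A", headTime: 110, blockTime: 2}
	laggard := &chain{name: "B", headTime: 106, blockTime: 2}
	built, err := alignToLeader([]*chain{leader, laggard})
	fmt.Println(built, laggard.headTime, err)
}
```

Because each call builds exactly one block, the loop cannot produce a burst of catch-up blocks the way a restarted sequencer can.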

* fix(op-acceptance-tests): Remove unnecessary test sequencer check

---------

Co-authored-by: wwared <541936+wwared@users.noreply.github.com>
…fig (#20613)

* refactor(circleci): implement compute-changes job for dynamic configuration

Replaced path-filtering with a custom compute-changes job in CircleCI configuration to enhance flexibility in detecting changed paths. This update includes merging continuation configs and continuing the pipeline based on computed changes. Additionally, updated rust-e2e workflow to utilize a dedicated flag for E2E changes detection, improving clarity and functionality.

* chore(circleci): integrate continuation orb for pipeline management

Added the CircleCI continuation orb to streamline the continuation of pipelines. Updated the configuration to utilize the orb for continuing the pipeline with merged configurations and parameters, enhancing the overall efficiency of the CI/CD process.

* fix(circleci): enhance YAML merging in merge-configs script

Updated the merge-configs.sh script to utilize 'explode(.)' in the YAML merging process, ensuring that YAML anchors and aliases are resolved correctly before merging. This change prevents undefined alias references in the output, improving the reliability of the configuration merging.

* feat(circleci): normalize boolean environment variables in compute-changes script

Added a new function to convert boolean environment variables from CircleCI into JSON-safe strings for use with jq. Updated the compute-changes script to utilize this function, enhancing the handling of boolean flags in the configuration process.

* feat(circleci): implement dynamic path detection in compute-changes script

Enhanced the compute-changes script to automatically detect changes in specified paths using environment variables defined in the CircleCI configuration. This update simplifies the detection logic by iterating over DETECT_* variables, improving the flexibility and maintainability of the CI/CD process.

* refactor(circleci): enhance compute-changes script for parameter collection

Updated the compute-changes script to streamline the collection of environment variables by categorizing them into string and boolean types. The script now supports a more structured approach to detect changes in specified paths, improving the overall efficiency and maintainability of the CircleCI configuration process.

* revert changes on rust-e2e.yml

* Extract remaining conditions

* centralizing conditions

* simplify

* remove unused

* revert change

* revert

* improve script and add exception for shellcheck

* Add collect-params.sh script to gather pipeline parameters

This script collects environment variables prefixed with 'c-' and outputs them in JSON format. It supports three modes: string, boolean, and detection against changed files in the git diff. The output is appended to /tmp/pipeline-parameters.json.

* Refactor CircleCI configuration to use collect-params.sh for parameter collection

Updated the CircleCI config to rename the job from compute-changes to prepare-continuation-config and replaced calls to compute-changes.sh with collect-params.sh for collecting string, boolean, and detection parameters. This change enhances clarity and maintains consistency in the parameter collection process.

* Enhance CircleCI configuration with improved decision tree for job execution

Refactored the CircleCI config to streamline the decision-making process for job execution based on the TRIGGER_SOURCE. Introduced a new function to handle Rust and documentation workflows based on detected file changes, improving clarity and maintainability of the CI pipeline.

* Refactor CircleCI configuration to utilize workflow helper functions

Introduced a new script, workflow-helpers.sh, to encapsulate helper functions for managing the workflow decision tree in the CircleCI config. This refactor simplifies the main config file by offloading JSON handling and decision logic, enhancing maintainability and clarity in the CI pipeline.

* Update CircleCI continuation orb version to 2.0.1 for improved functionality

* Enhance CircleCI configuration with detailed comments for the prepare-continuation-config job

Added comprehensive comments to the CircleCI config to clarify the steps involved in preparing the dynamic continuation pipeline. This includes explanations for installing dependencies, collecting parameters, detecting file changes, and merging configurations, improving maintainability and understanding of the CI process.

* Refactor CircleCI configuration to enhance webhook job execution logic

Updated the CircleCI config to improve the decision-making process for webhook job execution. Added detailed comments outlining three distinct lifecycle stages: feature branch pushes, merge queue validations, and post-merge actions. This refactor enhances clarity and maintainability of the CI pipeline.

* Refactor CircleCI configuration to streamline job execution commands

Updated the CircleCI config to separate job execution commands for improved clarity and maintainability. This change enhances the organization of the CI pipeline by clearly delineating the execution of main, release, and feature test jobs across different branch scenarios.

* Refactor CircleCI configuration for improved readability in job execution logic

Updated the CircleCI config to enhance the formatting of job execution commands, making the structure clearer and more maintainable. This change improves the organization of conditional statements for contracts, Rust, and documentation tests across various branch scenarios.

* Add decision tree testing to CircleCI configuration

Introduced a new job in the CircleCI config to validate the decision tree logic against known scenarios. This addition ensures that any refactor will fail fast if it breaks expected workflow activation, enhancing the reliability of the CI pipeline.

* Refactor CircleCI configuration to optimize job execution for documentation changes

Updated the CircleCI config to prioritize documentation-only changes in the job execution logic. This enhancement ensures that when only documentation is modified, the CI pipeline runs the documentation tests without executing other jobs, improving efficiency and reducing unnecessary resource usage.

* Add CODEOWNERS entries for CircleCI configuration files

Updated the CODEOWNERS file to include ownership for CircleCI configuration and script files. This change assigns @raffaele-oplabs and @geoknee as code owners for the .circleci/config.yml and .circleci/scripts/test-decision-tree.sh files, ensuring proper oversight and accountability for these CI components.

* Update shellcheck directive in test-decision-tree.sh to disable SC1091 warning

Modified the shellcheck directive in the test-decision-tree.sh script to disable the SC1091 warning, which pertains to sourcing files that may not exist. This change improves the script's compatibility with shellcheck without altering its functionality.

* revert code owner

* Update CircleCI configuration to dynamically set BASE_REVISION based on pull request base reference

Modified the BASE_REVISION environment variable in the CircleCI config to use the pull request's base reference instead of a static 'develop' value. This change enhances the flexibility of the CI pipeline by ensuring it accurately reflects the context of the pull request being built.

* Update BASE_REVISION in CircleCI config to a static branch name for improved clarity

* Add feature tests for contracts and Rust in CircleCI configuration

Enhanced the CircleCI config by adding jobs to run contracts feature tests and Rust CI checks when specific conditions are met. This update improves the testing coverage and ensures that relevant tests are executed for changes in contracts and Rust code, contributing to a more robust CI pipeline.

* Update BASE_REVISION in CircleCI config to 'develop' for consistency with branch naming
TestMultiELSync flakes because session.ResetSession was leaving the
ELSyncPolicy.cache untouched. When L2CL2's verifier op-node manages to
push a NewPayload + FCU to the SyncTester before sys.L2CL2.Stop() returns,
the WindowSyncPolicy cache ends up holding the payload number from that
in-flight call (commonly 2). ResetSession clears Validated, CurrentState,
and Payloads, but the policy's private cache survives — so the test's
first FCU(targetNum-2 = 3) immediately forms the consecutive sequence
[2,3] required by cnt=2, the policy returns Valid, and session.Validated
jumps to 3. The follow-up NewPayload(targetNum-1 = 4) then returns VALID
instead of SYNCING, hitting the assertion at sync_test.go:60. The race is
intermittent because it depends on how far L2CL2's engine driver has
progressed when Stop() is invoked.

Add Reset() to the ELSyncPolicy interface, implement it on WindowSyncPolicy
by zeroing the sliding-window cache, and invoke it from
SyncTesterSession.ResetSession so the policy starts from scratch alongside
the rest of the session state. Add TestWindowSyncPolicy_Reset that primes
the cache with 2, calls Reset, then observes 3 (must be SYNCING) and 4
(must be VALID). The test fails deterministically against a no-op Reset
with the exact CI symptom (expected SYNCING, actual VALID), confirming
the root cause.
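
The scenario in TestWindowSyncPolicy_Reset can be reproduced with a toy policy. This is a simplified sketch: windowSyncPolicy, Observe, and the status constants are hypothetical, and the real WindowSyncPolicy may track its window differently:

```go
package main

import "fmt"

type status string

const (
	syncing status = "SYNCING"
	valid   status = "VALID"
)

// windowSyncPolicy reports VALID once it has seen cnt consecutive payload numbers.
type windowSyncPolicy struct {
	cnt   int
	cache []uint64 // most recent run of consecutive numbers, at most cnt long
}

// Observe records num and returns the resulting status.
func (p *windowSyncPolicy) Observe(num uint64) status {
	// A gap breaks the run, so the window starts over.
	if n := len(p.cache); n > 0 && p.cache[n-1]+1 != num {
		p.cache = p.cache[:0]
	}
	p.cache = append(p.cache, num)
	if len(p.cache) > p.cnt {
		p.cache = p.cache[len(p.cache)-p.cnt:]
	}
	if len(p.cache) == p.cnt {
		return valid
	}
	return syncing
}

// Reset drops the window so the policy starts from scratch with the session.
func (p *windowSyncPolicy) Reset() {
	p.cache = p.cache[:0]
}

func main() {
	// Without Reset: a stale 2 lets the very next 3 complete the window.
	stale := &windowSyncPolicy{cnt: 2}
	stale.Observe(2)
	fmt.Println(stale.Observe(3)) // VALID

	// With Reset: 3 starts a fresh window, and 4 completes it.
	fresh := &windowSyncPolicy{cnt: 2}
	fresh.Observe(2)
	fresh.Reset()
	fmt.Println(fresh.Observe(3), fresh.Observe(4)) // SYNCING VALID
}
```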

Update TestMultiELSync to call ResetSession after stopping L2CL2,
mirroring the pattern in TestSyncTesterELSync, so the policy cache is
empty when the test starts driving the engine API manually regardless of
how much L2CL2 work landed before Stop() returned.

Fixes #20780
Acceptance tests on memory-all-opn-op-reth (and the op-geth/kona variants)
write devnet state to /tmp on the runner's real disk. When many in-process
tests run in parallel, op-reth's mdbx commit waits on fsync and stalls — we
saw commit_duration spike from ~6ms to 60-125s, causing context-deadline
cascades that failed ~36 unrelated tests in one merge-queue run.

Durability is irrelevant for CI, so preload libeatmydata.so for the whole
op-acceptance-tests job. Every subprocess (op-reth, op-node, op-geth,
supernode, go test, just, ...) inherits LD_PRELOAD via $BASH_ENV and has
fsync/fdatasync/msync turn into no-ops.
)

The workspace unification in #19034 moved op-reth from `reth/` to
`rust/op-reth/` and consolidated all build output under `rust/target/`.
Four path references in `rust/kona/tests/justfile` were never updated:

- `build-reth` recipe: `cd ../../reth` -> `cd ../../op-reth`
- `OP_RETH_EXEC_PATH` in `acceptance-tests-run`, `test-e2e-sysgo-run`,
  and `long-running-test`: `../../reth/target/debug/op-reth` ->
  `../../target/debug/op-reth` (unified workspace target directory)

CI was unaffected because it pre-sets `OP_RETH_EXEC_PATH` to the correct
`rust/target/release/op-reth` before invoking justfile recipes, and never
calls `build-reth` directly. These broken paths only affect local
development workflows (`just build-reth`, `just acceptance-tests`,
`just test-e2e-sysgo`, `just long-running-test`).

Related: #19569, #19929

Co-authored-by: wwared <541936+wwared@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…9874)

Add SpanDecodingError::TxGases and map decode_tx_gases failures to it instead of TxNonces.

Labels

⤵️ pull merge-conflict Resolve conflicts manually