Skip to content

Summarize getter storage/mem-copy loops as (Storage)CopySlice#1063

Open
msooseth wants to merge 1 commit into
write-disjoint-readwordfrom
storagecopyslice
Open

Summarize getter storage/mem-copy loops as (Storage)CopySlice#1063
msooseth wants to merge 1 commit into
write-disjoint-readwordfrom
storagecopyslice

Conversation

@msooseth
Copy link
Copy Markdown
Collaborator

@msooseth msooseth commented May 12, 2026

Fixes #697 — public-getter copy loops blowing up symbolic execution.

What

  • New Expr Buf node StorageCopySlice with read-through rules, SMT unroll (capped 4096 words), CSE/Format/Traversals.
  • New module EVM.GetterDetection: detects Solidity SLOAD→MSTORE and MLOAD→MSTORE copy loops (both backward-JUMPI and backward-JUMP+forward-exit-JUMPI shapes). Refuses detection if stack-layout extraction fails.
  • SymExec consults Contract.getterLoops at JUMPI, takes the exit branch, and replaces memory with a single CopySlice / StorageCopySlice summary node so concrete-offset reads resolve directly.
  • New PartialExec variant SymbolicCopySliceSize: symbolic-size CopySlice/StorageCopySlice leaves now surface as Partial instead of a hard SMT Error.
  • New readWord disjointness rule: skips a WriteWord at Add (Lit c) X when X has a derivable static upper bound and byte ranges are provably disjoint mod 2^256.
  • CLI flag --no-skip-getter-loops (library default is on).

Tests

  • Detector unit tests + StorageCopySlice read-through unit tests.
  • End-to-end public getters: bytes, string, string[] collapse to non-Partial; uint256[], uint8[10] confirm no mis-fire.
  • Soundness: cross-contract bytes-getter test asserts no Cex and no Error.
  • 5 disjointness unit tests + 1 end-to-end test.

Checklist

  • tested locally
  • added automated tests
  • updated the docs (doc/src/common-options.md)
  • updated the changelog

@msooseth msooseth changed the title [DO NOT REVIEW] Experimental storagecopyslice so we can represent copy loops AND hopefully more in the future Summarize getter mem-copy loops as CopySlice May 13, 2026
@msooseth msooseth requested a review from blishko May 18, 2026 16:42
@msooseth msooseth marked this pull request as ready for review May 18, 2026 16:43
@msooseth msooseth changed the title Summarize getter mem-copy loops as CopySlice Summarize getter storage/mem-copy loops as (Storage)CopySlice May 18, 2026
@msooseth msooseth force-pushed the storagecopyslice branch 2 times, most recently from 031373c to 0f3765a Compare May 21, 2026 08:52
@msooseth msooseth force-pushed the storagecopyslice branch from 0f3765a to 825350b Compare May 21, 2026 09:32
@msooseth msooseth changed the base branch from main to write-disjoint-readword May 21, 2026 09:32
@msooseth msooseth force-pushed the storagecopyslice branch 2 times, most recently from b5cda57 to 825350b Compare May 21, 2026 10:00
add missing GetterDetection.hs (fixup for 79720af)

The previous commit referenced EVM.GetterDetection from hevm.cabal and
src/EVM.hs but the file itself was not staged, leaving the branch
unbuildable from a fresh checkout.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

tests: cover string/array public getters

Extend the GetterDetection test group with end-to-end cases for the
other public-getter shapes the issue (#697) is concerned with:

- string public name        — same SLOAD->MSTORE shape as bytes
- uint256[] public arr      — no copy loop; detection must not mis-fire
- uint8[10] public arr      — fixed-size; no copy loop either
- string[] public strs      — element-wise string copy still loops

Also adds TODO-storagecopyslice.md tracking the residual work on this
branch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

tests: soundness check for the bytes-getter summary

Adds 'bytes-getter-summary-sound' which exercises the same contract as
the existing no-partial test but additionally asserts:

  - no Cex results from checkAssert (no false-positive panics introduced
    by the summary);
  - no Error results (no StorageCopySlice with symbolic numWords leaking
    to exprToSMT, which would currently surface as a Left).

Catches both a wrong summary (false Cex) and the SMT fallback gap
called out in TODO-storagecopyslice.md task #3.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

SMT: cap StorageCopySlice unroll and align error wording with CopySlice

Two small SMT-encoder changes around StorageCopySlice:

  - Add an explicit unroll cap (4096 words) for the (Lit numWords) case
    so a pathologically large concrete count can't blow up the encoder
    before SMT even sees the query.
  - Use the same error wording as CopySlice for the symbolic-size case,
    making it clear this is the same limitation that already exists for
    CopySlice rather than a separate bug.

A real fix for the symbolic-numWords case (over-approximating to a
fresh AbstractBuf so it surfaces as Partial instead of Error) would
need to touch the buf-discovery preprocessing — same fix that
CopySlice would need.  Tracked in TODO-storagecopyslice.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

SymExec: fail loudly when summary can't find the executing contract

The fallback to AbstractStore(LitAddr 0) silently aimed the summary at
slot 0 of a wrong address when the executing contract was missing from
env.contracts.  That's a real-bug-masker (wrong-answer-not-crash) for a
case that, by EVM-state invariants, can't actually happen.  Replace it
with internalError so we crash loudly if the invariant is ever broken.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

GetterDetection: drop unreachable empty-stack arms

The inner case-expressions for OpPop/OpMload/OpMstore/OpMstore8/OpSload/
OpSwap all had `[] -> Nothing` arms that GHC was flagging as redundant:
the outer `go{Mem,Stor} _ [] _ _ _ = Nothing` clause already catches the
empty-stack case, so by the time we reach the inner case `stk` is
provably non-empty.  Removes 7 redundant-pattern warnings.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

docs: changelog + common-options entry for --skip-getter-loops

Add a CHANGELOG line under [Unreleased]/Added and a documentation
section in doc/src/common-options.md describing what the flag does,
when to reach for it, and the conservative detector scope.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Let's add TODO.

Trun on skipGetterLoops

SymExec: demote symbolic-size CopySlice leaves to Partial

Adds a new PartialExec variant, SymbolicCopySliceSize, and a check in
verifyInputsWithHandler that, after simplifying the leaf and its SMT
props, looks for any CopySlice or StorageCopySlice whose size is not a
Lit.  If found, the leaf is replaced with a Partial(SymbolicCopySliceSize)
before checkSatWithProps runs, so the case surfaces as a graceful
Partial rather than a hard Error from exprToSMT's "symbolically sized
region not currently implemented" path.  Closes task #3 in
TODO-storagecopyslice.md.

Also adds two cross-contract tests:

  - cross-contract-bytes-getter-with-detection
    C deploys A, seeds A's length slot via inline assembly (so the
    long-string branch of myBytes() iterates symbolically), then
    cross-calls a.myBytes() and asserts the decoded length.  With
    skipGetterLoops=True the summary fires; the ReadWord for b.length
    is at a symbolic offset against memory containing symbolic-offset
    WriteWords and a symbolic-size CopySlice, which the simplifier
    cannot fold without interval analysis.  The new demotion kicks in
    and we verify: no Cex (no false-positive panic) and no Error.

  - cross-contract-bytes-getter-without-detection
    Same setup with skipGetterLoops=False and a small maxIter; the
    symbolic getter loop must blow the budget and surface as Partial.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Expr: sound disjointness rule for readWord/WriteWord with bounded offset

Adds a new readWord clause that elides a WriteWord whose offset is of
the form `Add (Lit c) X`, when X has a derivable static upper bound and
the write region is provably disjoint from the read in EVM (mod 2^256)
arithmetic.  Backed by a new conservative `upperBound :: Expr EWord ->
Maybe W256` helper that handles Lit, And-with-Lit, Mod-by-Lit, Div-by-
Lit, Mul/Add with one Lit operand, and SHR by a Lit shift.

The disjointness check uses Integer arithmetic to detect wrap on both
the read and write byte ranges before concluding the write can be
skipped, so the rule does not assume non-wrapping addition.

Tests:

  - 5 local Expr unit tests in memoryTests:
      * positive: WriteWord at Add(Lit, And-bounded) skipped
      * positive: WriteWord at Add(Lit, Mul-And-bounded) skipped
      * negative: unbounded X — not skipped
      * negative: wrap-risk near maxBound — not skipped
      * negative: read region overlaps write min offset — not skipped

  - 1 end-to-end contract test: a function that reads from
    `below[0]` at 0x80 after writing to `arr[idx]` at
    `0xa0 + 32 * (And 0xff idx)`.  No Partial / Cex / Error.

Also updates three pre-existing tests that asserted the old
"symbolic-size CopySlice surfaces as Error" behaviour to instead
assert the new task-#3 behaviour (graceful Partial with
SymbolicCopySliceSize).  Both branches of the relevant assert paths
carry the symbolic-size buf in their constraints, so the path count
is 2.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

GetterDetection: refuse detection on layout-extraction failure

If extractMemStackLayout / extractStorageStackLayout returned Nothing,
analyseBody would still insert a CopyLoop with a Nothing layout; the
runtime then applied a no-op summary but still took the exit branch,
silently skipping the loop body with stale memory. Pull layout extraction
into the detection's Maybe chain so detection fails when extraction fails,
and turn the Nothing arms in the apply* functions into internalError to
make any future regression loud rather than silent.

Also:

- Flip --skip-getter-loops to --no-skip-getter-loops so the CLI matches
  the library default (on). Update the docs accordingly.
- Update the two hand-crafted GetterDetection unit tests so the
  comparison includes the original loop bound at stack[2] (DUP1 DUP4
  instead of DUP2 DUP2), matching how Solidity actually generates
  these loops. Without this, both compare operands were Computed and
  the now-strict layout extractor (correctly) rejected them.
- Drop the leftover Data.List import in SymExecTests now that
  getResError-based assertions are gone.
- CHANGELOG: trim the storage-copy-slice entry and add bullets for
  the readWord disjointness rule and the symbolic-size->Partial demotion.
- Remove TODO-storagecopyslice.md and storagecopyslice.md scratchpads
  from the repo (and the two test comments referencing them).
- Dedupe a doubled paragraph in verifyInputsWithHandler.

Condense comments and add bytes[] getter-loop Foundry tests

Trim verbose multi-line comments across the storagecopyslice work to
one- or two-liners; no behavioral change. Rename a pattern binding in
simplifyNoLitToKeccak for readability.

Add test/contracts/pass/getterLoopBytesArray.t.sol with four Forge-style
scenarios exercising the SLOAD->MSTORE copy loop in auto-generated
`bytes[] public` getters (single-contract, cross-contract consumer,
NFT registry, Merkle-proof reader), and wire them into FoundryTests.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Cleanup

Expr: fold single-word StorageCopySlice to WriteWord

simplifyNoLitToKeccak already collapses a zero-word StorageCopySlice to
its dst. A numWords=1 StorageCopySlice is, by definition, a single
WriteWord of SLoad(baseSlot, store) at dstOff -- so rewrite it as such.
Sound for symbolic baseSlot/dstOff and abstract store, and it turns a
StorageCopySlice node (which exprToSMT only handles via a bounded
unroll) into a WriteWord-of-SLoad that encodes and folds cleanly.

Mirrors the existing 32-byte CopySlice -> WriteWord rule in the
copySlice smart constructor.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

SMT: lower storageCopySliceUnrollLimit from 4096 to 32

A StorageCopySlice that reaches exprToSMT with a concrete numWords is
unrolled into a WriteWord(SLoad) chain. 4096 words is 8192 SMT nodes
for a single slice -- well past the point where the query is tractable.
In practice readByte/readWord fold concrete-offset reads before
encoding, so a slice this large reaching the encoder signals an
intractable query anyway; refusing it early is better than emitting it.

No tests depend on the old value; the getter-loop tests all copy 2-3
words.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

GetterDetection: deduplicate the stack-layout extractors

extractMemStackLayout and extractStorageStackLayout were ~90 lines each
and near-identical -- the only real difference was MLOAD vs SLOAD and
the result record's field names. Merge the abstract-stack simulation
into a single extractStackLayout parameterised by the load opcode,
returning a plain (load, bound, store) depth triple that the two thin
wrappers rename into LoopStackInfo / StorageLoopStackInfo.

Also tighten the rest of the module, all behaviour-preserving:
  - buildJumpdestSet: a one-line list comprehension
  - findBackwardJumps: a list comprehension instead of manual index
    recursion + a checkPrev helper
  - analyseBody: factor the two near-identical CopyLoop literals into a
    local mkLoop
  - fold duplicated OpJump/OpJumpi branches via a shared isJumpOp, and
    the five comparison opcodes via isCmpOp

Net ~120 lines smaller; GetterDetection test group unchanged and green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

GetterDetection: drop mem-to-mem loop summarization

Storage-to-mem detection is sufficient for the public-getter case this
PR targets; keeping the mem-to-mem branch added a parallel code path
(LoopKind, memStackLayout, applyMemCopyLoopSummary) with no validated
in-tree need.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@msooseth msooseth force-pushed the storagecopyslice branch from 825350b to 2aba41a Compare May 21, 2026 10:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

bytes public data getter loops forever for equivalence checking

1 participant