refactor(crypto): codec wire-format KATs + round-trip fixes + Array[Byte]→Hex #53
Merged
scasplte2 merged 7 commits on Jun 17, 2026
…itment
Codec hardening pass (consensus-hashed serializations). The new MerkleCodecKatSuite
pins the EXACT ordered field-name list + ADT discriminator + decode(encode(x))==x for
every merkle codec — a rename/reorder/discriminator drift now fails a test before it can
silently re-hash a Merkle root.
The KAT immediately caught two pre-existing encoder/decoder asymmetries:
- MerkleInclusionProof: `witness` is ENCODED as [{"digest":..,"side":..}] (objects) but was
DECODED with circe's default tuple form [[..,..]] (arrays) — round-trip broken.
- MerkleTree: identical mismatch on `leafDigestIndex` ({"digest","index"} out, arrays in).
Both fixed by aligning the DECODER to the object form the encoder emits — decoder-only, so
the encoder output (the hashed/persisted bytes) is unchanged (no consensus impact).
MerkleCommitment.Leaf/Internal: hand-rolled Json.obj/downField codecs replaced with derevo
@derive(encoder, decoder) — byte-identical to the prior form (same field names + order),
confirmed by the KAT (key order + round-trip) and the existing determinism / known-answer
suites (node digests unchanged). The {type,contents} ADT discriminators stay hand-rolled
(circe's derived sealed-trait format differs). 53/53 merkle suites green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
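The witness asymmetry and the kind of check that catches it can be sketched as follows. This is a minimal illustration, assuming circe is on the classpath; `Proof`, `WitnessKat`, and the use of plain `String` for digest and side are hypothetical stand-ins, not the real `MerkleInclusionProof` or `MerkleCodecKatSuite`.

```scala
import io.circe.{Decoder, Encoder, Json}
import io.circe.syntax._

// Hypothetical stand-in for a proof carrying (digest, side) pairs.
final case class Proof(witness: List[(String, String)])

object Proof {
  // Encoder side: each pair is emitted as an object {"digest": .., "side": ..}.
  private val pairEncoder: Encoder[(String, String)] =
    Encoder.instance[(String, String)] { case (digest, side) =>
      Json.obj("digest" -> digest.asJson, "side" -> side.asJson)
    }

  // Decoder side: reads the same object form, rather than circe's default
  // tuple form (a two-element array), so decode(encode(x)) == x holds.
  private val pairDecoder: Decoder[(String, String)] =
    Decoder.instance[(String, String)] { c =>
      for {
        digest <- c.downField("digest").as[String]
        side   <- c.downField("side").as[String]
      } yield (digest, side)
    }

  implicit val encoder: Encoder[Proof] = Encoder.instance[Proof] { p =>
    Json.obj("witness" -> Json.fromValues(p.witness.map(pairEncoder(_))))
  }

  implicit val decoder: Decoder[Proof] = Decoder.instance[Proof] { c =>
    c.downField("witness")
      .as[List[(String, String)]](Decoder.decodeList(pairDecoder))
      .map(Proof(_))
  }
}

object WitnessKat {
  val fixed: Proof = Proof(List(("aa", "Left"), ("bb", "Right")))
  val json: Json   = fixed.asJson

  // Ordered field names of the first witness entry, as they appear on the wire.
  val witnessKeys: List[String] =
    json.hcursor
      .downField("witness")
      .downArray
      .focus
      .flatMap(_.asObject)
      .map(_.keys.toList)
      .getOrElse(Nil)

  // Round trip through the symmetric decoder.
  val roundTrip: Decoder.Result[Proof] = json.as[Proof]
}
```

A KAT in this style would assert that `WitnessKat.witnessKeys` equals the pinned list `List("digest", "side")` and that `WitnessKat.roundTrip` equals `Right(WitnessKat.fixed)`. Swapping `pairDecoder` for circe's default tuple decoder would make the second assertion fail, which is the class of bug described above.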
Investigation (paused codec pass) into whether Map/Set iteration order in hand-rolled .asJson encoders can leak into a consensus-hashed root (cross-Scala/JVM-version fork risk).
Conclusion: NO risk. Everything hashed routes through JsonBinaryHasher.computeDigest -> JsonBinaryCodec.serialize, which dropNulls + RFC 8785 canonicalizes, and the canonicalizer sorts object keys with a fixed UTF-16BE comparator (not HashMap order), so .asJson Map iteration order is irrelevant to the hashed bytes. The earlier MPT Branch "HashMap-order" concern is a false alarm (the sortBy(...).toMap dead-sort is harmless; JCS re-sorts at hash time).
Residual classes JCS does not cover were scanned and are safe: the only Sets on a hashed path are SortedSet (CommitDelta/StateDelta.removes); raw Hash.fromBytes bypasses hash fixed-order bytes (names, raw values, concatenated roots).
Documents the invariant for future hashed content (route through serialize; SortedSet for sets; JCS does not sort arrays).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
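Why key sorting makes Map iteration order irrelevant can be shown with a small stdlib-only sketch. It is not `JsonBinaryCodec.serialize`; `CanonicalKeys.canonicalize` is a hypothetical toy that handles only flat objects of string values and assumes nothing needs JSON escaping. It relies on `String.compareTo`, which compares UTF-16 code units, the same ordering RFC 8785 prescribes for object members.

```scala
import scala.collection.immutable.ListMap

object CanonicalKeys {
  // Toy canonical form: members sorted by key using String's natural ordering
  // (String.compareTo, i.e. UTF-16 code-unit order), then rendered compactly.
  // Assumption: keys and values contain no characters that need escaping.
  def canonicalize(fields: Map[String, String]): String =
    fields.toList
      .sortBy(_._1)
      .map { case (k, v) => "\"" + k + "\":\"" + v + "\"" }
      .mkString("{", ",", "}")

  // Same fields, different insertion order (ListMap iterates in insertion order).
  val a: Map[String, String] = ListMap("b" -> "2", "a" -> "1", "c" -> "3")
  val b: Map[String, String] = ListMap("c" -> "3", "a" -> "1", "b" -> "2")
}
```

Here `a` and `b` iterate in different orders, yet `canonicalize` yields the same string for both, so a hash over the canonical bytes would not depend on the Map implementation. Arrays get no such treatment, which is why the invariant above still asks for `SortedSet` on hashed paths.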
Wire-format KATs for the consensus-hashed MPT codecs (commitment Leaf/Branch/Extension + ADT, node Leaf/Branch/Extension + ADT, inclusion proof, trie): pin the exact ordered field-name list + ADT `type` discriminator + decode(encode(x))==x, so a rename/reorder/discriminator drift fails here before it can re-hash an MPT root.
No round-trip asymmetry found (unlike the merkle witness/index tuple bug): MPT uses explicit symmetric Nibble seq codecs.
Finding: the MPT codecs are NOT cleanly derivable. Leaf/Extension carry `Seq[Nibble]`, and the custom `Nibble.nibbleSeqEncoder` is ambiguous with circe's generic `encodeSeq` under magnolia derivation (the hand-rolled encoders pass it explicitly). So they stay hand-rolled, now KAT-guarded.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…KATs
Per review, Array[Byte] is a poor case-class field type: no circe codec (forcing a custom value codec), no structural equals (forcing a sameElements Eq), and mutable.
Replace `value: Array[Byte]` with `value: Hex` (tessellation) on the SERIALIZED / result types: SparseMerkleProof.Inclusion (the wire proof, the one with a codec) and SparseMerkleEntry.Present (the verified result). This removes the bespoke valueEncoder/valueDecoder and the sameElements Eq, and makes the value immutable with a real codec + structural equality.
Wire- and hash-COMPATIBLE: Inclusion.value already serialized as Hex.fromBytes(bytes), so a `value: Hex` field emits the identical string; the verifier's value-binding is unchanged (`Hash.fromBytes(value.toBytes)`); the SMT tree/root is untouched (the in-memory SparseMerkleNode.Leaf keeps raw Array[Byte]; it is never serialized, so it is not a wire format).
Consumers convert at the boundary with `.toBytes` (verifier, AuthDbOps smt_verify opcode, OrdinalCatalogProof sub-root decode) and `Hex.fromBytes` (NodeOps proof construction).
New SparseMerkleCodecKatSuite pins the exact field-name list + ADT discriminator + round-trip (Eq for the value-bearing proof) for every SMT codec. The Hash-only types (Sibling, Commitment Leaf/Internal, Root) stay hand-rolled here; they are cleanly derivable (Hash fields) and can be moved to derevo @derive in a follow-up.
smt + committed + AuthDb suites: 44 green (incl. value-binding/tamper + committed proofs).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
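The equality problem behind this change can be reproduced with the standard library alone. In the sketch below, `RawValue` and `HexValue` are hypothetical names for illustration; `HexValue` is a toy hex wrapper, not tessellation's `Hex`, and only mirrors the `fromBytes` / `toBytes` boundary conversions mentioned above.

```scala
object ByteFieldEquality {
  // Array-backed field: the generated case-class equals compares the arrays
  // with ==, which for Array is reference identity.
  final case class RawValue(value: Array[Byte])

  // Toy hex wrapper (assumption: lowercase hex, two characters per byte).
  final case class HexValue(value: String) {
    def toBytes: Array[Byte] =
      value.grouped(2).map(s => Integer.parseInt(s, 16).toByte).toArray
  }

  object HexValue {
    def fromBytes(bytes: Array[Byte]): HexValue =
      HexValue(bytes.map(b => f"${b & 0xff}%02x").mkString)
  }

  val bytes: Array[Byte] = Array[Byte](1, 2, -1)

  // Two instances with equal contents but distinct arrays are NOT equal.
  val rawEqual: Boolean = RawValue(bytes.clone) == RawValue(bytes.clone)

  // The hex-backed instances compare structurally, via String equality.
  val hexEqual: Boolean = HexValue.fromBytes(bytes.clone) == HexValue.fromBytes(bytes.clone)
}
```

With the array field, `rawEqual` is `false` even though the contents match, which is what forces a custom `sameElements` Eq. The string-backed wrapper gets structural equality and immutability for free, and `toBytes` recovers the original bytes at the boundary.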
Audit of Array[Byte] as a case-class field across metakit crypto/committed/json_logic (and ottochain): the only wire-format offenders were smt Inclusion/Present (fixed -> Hex); the rest (SparseMerkleNode.Leaf.value, sigma PropNode/ProofNode bytes) have no circe codec and are internal/raw byte representations where Array[Byte] is appropriate. ottochain has no Array[Byte] case-class fields.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
JsonLogicValueCodecKatSuite pins the JSON-IDENTITY contract of JsonLogicValue (the
JLVM value type / committed JLVM state): each variant encodes to its bare JSON form
(true / 5 / [..] / {..}), NOT a tagged ADT, and decode(encode(x)) === x. Deriving the
sealed trait would tag it and silently change committed state hashes — this fails first.
CommittedRootsCodecKatSuite pins the committed breadcrumb codecs (CommitKey bare-string
newtype; CommittedRoots / CommittedBreadcrumb field-name + round-trip) — the constant-
size on-chain commitment a syncing node trusts.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
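The difference between a JSON-identity encoding and a tagged ADT encoding can be sketched as below, assuming circe is on the classpath. `Value`, `BoolV`, `IntV`, and `ArrV` are a hypothetical miniature, not the real `JsonLogicValue`, and the `{type,contents}` shape in `tagged` is only an assumed example of what a discriminated encoding could look like. Only the encoder side is shown.

```scala
import io.circe.{Encoder, Json}
import io.circe.syntax._

// Hypothetical miniature of a JSON-identity value type.
sealed trait Value
final case class BoolV(b: Boolean) extends Value
final case class IntV(n: Long) extends Value
final case class ArrV(items: List[Value]) extends Value

object Value {
  // Identity encoding: each variant becomes its bare JSON form,
  // with no discriminator wrapper around it.
  implicit val identityEncoder: Encoder[Value] = Encoder.instance[Value] {
    case BoolV(b)    => Json.fromBoolean(b)
    case IntV(n)     => Json.fromLong(n)
    case ArrV(items) => Json.fromValues(items.map(identityEncoder(_)))
  }

  // For contrast only: a tagged encoding of the same value (shape assumed).
  def tagged(v: Value): Json = {
    val name = v match {
      case _: BoolV => "BoolV"
      case _: IntV  => "IntV"
      case _: ArrV  => "ArrV"
    }
    Json.obj("type" -> Json.fromString(name), "contents" -> v.asJson)
  }
}
```

Under the identity encoder, `IntV(5)` serializes to `5`, while the tagged form wraps it in an object. Any hash computed over the serialized value would differ between the two, which is the drift the KAT is meant to catch first.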
Summary
Codec-hardening pass over the consensus-hashed serializations (merkle, mpt, smt, JLVM value, committed-state). Goal: pin every codec's wire contract so an accidental field rename / reorder / discriminator change fails a test before it can silently re-hash a committed root, and remove the `Array[Byte]`-as-a-field smell. The pass also surfaced + fixed two real round-trip bugs.

Wire-format KATs (the drift guard)
New `*CodecKatSuite` for each module pins the exact ordered field-name list + ADT `type` discriminator + `decode(encode(x)) == x` of fixed instances: `MerkleCodecKatSuite`, `MerklePatriciaCodecKatSuite`, `SparseMerkleCodecKatSuite`, `JsonLogicValueCodecKatSuite`, `CommittedRootsCodecKatSuite`.

Bugs found + fixed (decoder-only; encoder/hashed bytes unchanged)
The merkle KAT immediately caught two pre-existing encoder/decoder asymmetries:
- `MerkleInclusionProof` encoded `witness` as `[{"digest","side"}]` (objects) but decoded with circe's default tuple form `[[..]]` (arrays), so the round-trip was broken.
- `MerkleTree` had the identical mismatch on `leafDigestIndex`.
Both fixed by aligning the decoder to the object form the encoder emits. The fix is decoder-only, so the hashed/persisted bytes are unchanged.

`Array[Byte]` → `Hex`
`SparseMerkleProof.Inclusion.value` (a circe-coded proof, the bespoke wire format) and `SparseMerkleEntry.Present.value` (custom `sameElements` Eq) → `Hex` (tessellation): a codec + structural equality + immutability, no hand-rolled array wire format. Wire- and hash-compatible (the value already serialized as `Hex.fromBytes(bytes)`; the in-memory `SparseMerkleNode.Leaf` keeps raw `Array[Byte]`, which is never serialized).

Derivation
`MerkleCommitment.Leaf/Internal` → derevo `@derive` (byte-identical, KAT-gated). Most other codecs are not cleanly derivable (finding documented): mpt's custom `Nibble` seq codec is ambiguous with circe's generic `encodeSeq`; ADT discriminators use a custom `{type,contents}`; `JsonLogicValue` is a JSON-identity codec. Those stay hand-rolled, now KAT-guarded.

Audit docs (`docs/codec-determinism-audit.md`)
- `JsonBinaryCodec.serialize` → RFC 8785 (JCS), which sorts object keys with a fixed UTF-16BE comparator; the only hashed `Set`s are `SortedSet`; raw `Hash.fromBytes` paths hash fixed-order bytes. (The MPT Branch `sortBy(...).toMap` is dead but harmless.)
- `Array[Byte]` value-type audit: the only wire-format offender was the smt one (fixed); the rest (`SparseMerkleNode.Leaf.value`, sigma `PropNode`/`ProofNode` bytes) have no codec and are internal/raw, which is appropriate. ottochain has none.

Tests
`sbt clean scalafmtAll scalafmtCheckAll test` → 1225 green (+15 KATs), scalafmt clean.

🤖 Generated with Claude Code