
refactor(crypto): codec wire-format KATs + round-trip fixes + Array[Byte]→Hex #53

Merged
scasplte2 merged 7 commits into Constellation-Labs:dev from ottobot-ai:refactor/codec-derive-and-kats
Jun 17, 2026
@ottobot-ai

Contributor

Summary

Codec-hardening pass over the consensus-hashed serializations (merkle, mpt, smt, JLVM value, committed-state). Goal: pin every codec's wire contract so an accidental field rename / reorder / discriminator change fails a test before it can silently re-hash a committed root, and remove the Array[Byte]-as-a-field smell. The pass also surfaced + fixed two real round-trip bugs.

Wire-format KATs (the drift guard)

A new *CodecKatSuite per module pins the exact ordered field-name list, the ADT type discriminator, and decode(encode(x)) == x for fixed instances: MerkleCodecKatSuite, MerklePatriciaCodecKatSuite, SparseMerkleCodecKatSuite, JsonLogicValueCodecKatSuite, CommittedRootsCodecKatSuite.
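As a rough illustration of what one of these KATs checks, here is a dependency-free Scala sketch. The tiny Json AST, the Commitment ADT and its fields, and the encode/decode/kat helpers are hypothetical stand-ins (the real suites use circe and the actual codecs); only the {type,contents} discriminator shape and the three checked properties come from this PR.

```scala
// Hypothetical minimal JSON AST standing in for circe's Json; object fields keep insertion order.
sealed trait Json
final case class JStr(value: String) extends Json
final case class JObj(fields: List[(String, Json)]) extends Json

// Hypothetical ADT using a {type, contents} discriminator like the one described in this PR.
sealed trait Commitment
final case class Leaf(digest: String) extends Commitment
final case class Internal(left: String, right: String) extends Commitment

def encode(c: Commitment): Json = c match {
  case Leaf(d) =>
    JObj(List("type" -> JStr("Leaf"), "contents" -> JObj(List("digest" -> JStr(d)))))
  case Internal(l, r) =>
    JObj(List("type" -> JStr("Internal"), "contents" -> JObj(List("left" -> JStr(l), "right" -> JStr(r)))))
}

// Look up a field by name (order-insensitive, like decoding through a cursor).
def field(j: Json, name: String): Option[Json] = j match {
  case JObj(fs) => fs.collectFirst { case (k, v) if k == name => v }
  case _        => None
}

def str(j: Option[Json]): Option[String] = j.collect { case JStr(s) => s }

def decode(j: Json): Option[Commitment] = {
  val contents = field(j, "contents")
  str(field(j, "type")) match {
    case Some("Leaf") =>
      contents.flatMap(c => str(field(c, "digest"))).map(d => Leaf(d))
    case Some("Internal") =>
      for {
        c <- contents
        l <- str(field(c, "left"))
        r <- str(field(c, "right"))
      } yield Internal(l, r)
    case _ => None
  }
}

// Ordered key list of an object (empty for non-objects).
def keys(j: Json): List[String] = j match {
  case JObj(fs) => fs.map(_._1)
  case _        => Nil
}

// One KAT: exact ordered keys, the discriminator value, and decode(encode(x)) == x.
def kat(x: Commitment, expectedType: String, expectedContentKeys: List[String]): Boolean = {
  val j = encode(x)
  keys(j) == List("type", "contents") &&
  str(field(j, "type")).contains(expectedType) &&
  field(j, "contents").map(keys).contains(expectedContentKeys) &&
  decode(j).contains(x)
}
```

In this sketch decoding looks fields up by name, so swapping the left/right entries inside encode would still round-trip, yet the key-order check would fail. That is the kind of drift a round-trip-only test misses.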

Bugs found + fixed (decoder-only — encoder/hashed bytes unchanged)

The merkle KAT immediately caught two pre-existing encoder/decoder asymmetries:

  • MerkleInclusionProof encoded witness as [{"digest","side"}] (objects) but decoded it with circe's default tuple form [[..]] (arrays), so the round-trip was broken.
  • MerkleTree had the identical mismatch on leafDigestIndex ({"digest","index"} objects out, arrays expected in).

Both fixed by aligning the decoder to the object form the encoder emits — decoder-only, so the hashed/persisted bytes are unchanged.
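The asymmetry can be reproduced with a small stand-in. This sketch assumes a hypothetical Step(digest, side) element and a minimal Json AST instead of circe; it only mirrors the shapes described above (objects out, arrays expected in) and the decoder-only fix.

```scala
// Hypothetical minimal JSON AST (stand-in for circe's Json).
sealed trait Json
final case class JStr(value: String) extends Json
final case class JArr(items: List[Json]) extends Json
final case class JObj(fields: List[(String, Json)]) extends Json

// Hypothetical witness step: a sibling digest and the side it sits on.
final case class Step(digest: String, side: String)

// Sequence a list of Options: None if any element fails to decode.
def traverse[A, B](as: List[A])(f: A => Option[B]): Option[List[B]] =
  as.foldRight(Option(List.empty[B])) { (a, acc) =>
    for { b <- f(a); bs <- acc } yield b :: bs
  }

// Encoder: object form, [{"digest":..,"side":..}, ...].
def encodeWitness(w: List[Step]): Json =
  JArr(w.map(s => JObj(List("digest" -> JStr(s.digest), "side" -> JStr(s.side)))))

// Old (mismatched) decoder: expects the tuple/array form [[digest, side], ...].
def decodeWitnessAsTuples(j: Json): Option[List[Step]] = j match {
  case JArr(items) =>
    traverse[Json, Step](items) {
      case JArr(List(JStr(d), JStr(s))) => Some(Step(d, s))
      case _                            => None
    }
  case _ => None
}

// Fixed decoder: reads the same object form the encoder emits.
def decodeWitnessAsObjects(j: Json): Option[List[Step]] = j match {
  case JArr(items) =>
    traverse[Json, Step](items) {
      case JObj(fs) =>
        val m = fs.toMap
        (m.get("digest"), m.get("side")) match {
          case (Some(JStr(d)), Some(JStr(s))) => Some(Step(d, s))
          case _                              => None
        }
      case _ => None
    }
  case _ => None
}
```

In this sketch an empty witness decodes under both decoders, so a fixture needs at least one step to expose the mismatch.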

Array[Byte] → Hex

SparseMerkleProof.Inclusion.value (part of the circe-coded wire proof, previously with a bespoke value codec) and SparseMerkleEntry.Present.value (previously with a custom sameElements Eq) are now Hex (tessellation), which gives them a codec, structural equality, and immutability, with no hand-rolled array wire format. The change is wire- and hash-compatible: the value was already serialized as Hex.fromBytes(bytes). The in-memory SparseMerkleNode.Leaf keeps raw Array[Byte], since it is never serialized.
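A minimal sketch of why a raw Array[Byte] field is awkward in a case class and what a hex-string value type buys. HexLike is a hypothetical stand-in for tessellation's Hex; only the fromBytes/toBytes names echo this PR, and the real type's API may differ. InclusionRaw and InclusionHex are likewise illustrative, not the real SparseMerkleProof types.

```scala
// Hypothetical stand-in for tessellation's Hex: an immutable lowercase hex string.
final case class HexLike(value: String) {
  // Decode pairs of hex digits back into bytes.
  def toBytes: Array[Byte] =
    value.grouped(2).map(pair => Integer.parseInt(pair, 16).toByte).toArray
}

object HexLike {
  // Encode each byte as two lowercase hex digits.
  def fromBytes(bytes: Array[Byte]): HexLike =
    HexLike(bytes.map(b => f"${b & 0xff}%02x").mkString)
}

// Before: a raw array field. Case-class equality compares arrays by reference.
final case class InclusionRaw(key: String, value: Array[Byte])

// After: a hex-string field. Equality is structural and the value cannot be mutated in place.
final case class InclusionHex(key: String, value: HexLike)
```

Converting at the boundary (fromBytes when building a proof, toBytes when hashing or verifying) keeps raw bytes available where they are needed without exposing a mutable field on the serialized type.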

Derivation

MerkleCommitment.Leaf/Internal now use derevo @derive (byte-identical to the previous codecs, KAT-gated). Most other codecs are not cleanly derivable, and the reasons are documented: mpt's custom Nibble seq codec is ambiguous with circe's generic encodeSeq under derivation; the ADT discriminators use a custom {type,contents} shape; and JsonLogicValue is a JSON-identity codec. These stay hand-rolled and are now KAT-guarded.

Audit docs (docs/codec-determinism-audit.md)

  • Determinism: Map/Set ordering does not threaten consensus. Hashing routes through JsonBinaryCodec.serialize → RFC 8785 (JCS), which sorts object keys with a fixed UTF-16BE comparator; the only hashed Sets are SortedSet; and raw Hash.fromBytes paths hash fixed-order bytes. (The MPT Branch sortBy(...).toMap is dead code but harmless, since JCS re-sorts at hash time.)
  • Array[Byte] value-type audit: the only wire-format offenders were the smt Inclusion/Present values (fixed). The rest (SparseMerkleNode.Leaf.value, sigma PropNode/ProofNode bytes) have no codec; they are internal raw byte representations where Array[Byte] is appropriate. ottochain has no Array[Byte] case-class fields.
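A simplified sketch of the canonicalization argument: if object keys are sorted at serialization time, the order in which a Map hands them to the encoder cannot reach the hashed bytes. This is not JsonBinaryCodec or a full RFC 8785 implementation (string escaping and number formatting are skipped); the Json AST and the canonical function are hypothetical.

```scala
// Hypothetical minimal JSON AST; object fields may arrive in any order (e.g. from a HashMap).
sealed trait Json
final case class JNum(value: Long) extends Json
final case class JStr(value: String) extends Json
final case class JArr(items: List[Json]) extends Json
final case class JObj(fields: List[(String, Json)]) extends Json

// Simplified JCS-style serializer: object keys sorted, arrays left in order, no whitespace.
// Assumes plain keys/strings without quotes, backslashes, or control characters.
def canonical(j: Json): String = j match {
  case JNum(n)     => n.toString
  case JStr(s)     => "\"" + s + "\""
  case JArr(items) => items.map(canonical).mkString("[", ",", "]")
  case JObj(fields) =>
    // String.compareTo compares UTF-16 code units, the key order RFC 8785 specifies.
    fields
      .sortWith((a, b) => a._1.compareTo(b._1) < 0)
      .map { case (k, v) => "\"" + k + "\":" + canonical(v) }
      .mkString("{", ",", "}")
}
```

The comparator matters for non-BMP keys: U+1F600 (the surrogate pair D83D DE00) sorts before U+E000 by UTF-16 code units, even though its code point is larger.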

Tests

sbt clean scalafmtAll scalafmtCheckAll test: 1225 green (+15 KATs), scalafmt clean.

🤖 Generated with Claude Code

scasplte2 and others added 7 commits June 17, 2026 02:46
…itment

Codec hardening pass (consensus-hashed serializations). The new MerkleCodecKatSuite
pins the EXACT ordered field-name list + ADT discriminator + decode(encode(x))==x for
every merkle codec — a rename/reorder/discriminator drift now fails a test before it can
silently re-hash a Merkle root.

The KAT immediately caught two pre-existing encoder/decoder asymmetries:
- MerkleInclusionProof: `witness` is ENCODED as [{"digest":..,"side":..}] (objects) but was
  DECODED with circe's default tuple form [[..,..]] (arrays) — round-trip broken.
- MerkleTree: identical mismatch on `leafDigestIndex` ({"digest","index"} out, arrays in).
Both fixed by aligning the DECODER to the object form the encoder emits — decoder-only, so
the encoder output (the hashed/persisted bytes) is unchanged (no consensus impact).

MerkleCommitment.Leaf/Internal: hand-rolled Json.obj/downField codecs replaced with derevo
@derive(encoder, decoder) — byte-identical to the prior form (same field names + order),
confirmed by the KAT (key order + round-trip) and the existing determinism / known-answer
suites (node digests unchanged). The {type,contents} ADT discriminators stay hand-rolled
(circe's derived sealed-trait format differs). 53/53 merkle suites green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Investigation (paused codec pass) into whether Map/Set iteration order in
hand-rolled .asJson encoders can leak into a consensus-hashed root (cross-Scala/
JVM-version fork risk). Conclusion: NO risk.

Everything hashed routes through JsonBinaryHasher.computeDigest ->
JsonBinaryCodec.serialize, which dropNulls + RFC 8785 canonicalizes, and the
canonicalizer sorts object keys with a fixed UTF-16BE comparator (not HashMap
order) — so .asJson Map iteration order is irrelevant to the hashed bytes. The
earlier MPT Branch "HashMap-order" concern is a false alarm (the sortBy(...).toMap
dead-sort is harmless; JCS re-sorts at hash time).

Residual classes JCS does not cover were scanned and are safe: the only Sets on a
hashed path are SortedSet (CommitDelta/StateDelta.removes), and the raw Hash.fromBytes
paths that bypass JCS hash fixed-order bytes (names, raw values, concatenated roots).
Documents the invariant for future hashed content (route through serialize; use
SortedSet for sets; JCS does not sort arrays).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
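Because JCS leaves arrays untouched, a set encoded as a JSON array is only deterministic if its iteration order is. A minimal sketch, assuming a hypothetical encodeRemoves helper that writes a set of removed keys as a JSON array (the real CommitDelta/StateDelta codecs are not shown here):

```scala
import scala.collection.immutable.SortedSet

// Hypothetical encoder for a set of removed keys, emitted as a JSON array of strings.
// JCS sorts object keys but never reorders arrays, so the emitted order is exactly
// the collection's iteration order; SortedSet makes that order independent of insertion.
def encodeRemoves(removes: SortedSet[String]): String =
  removes.iterator.map(k => "\"" + k + "\"").mkString("[", ",", "]")
```

Typing the parameter as SortedSet rather than Set turns the ordering invariant into something the compiler enforces, rather than a convention.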
Wire-format KATs for the consensus-hashed MPT codecs (commitment Leaf/Branch/
Extension + ADT, node Leaf/Branch/Extension + ADT, inclusion proof, trie): pin the
exact ordered field-name list + ADT `type` discriminator + decode(encode(x))==x, so
a rename/reorder/discriminator drift fails here before it can re-hash an MPT root.

No round-trip asymmetry found (unlike the merkle witness/index tuple bug) — MPT uses
explicit symmetric Nibble seq codecs. Finding: the MPT codecs are NOT cleanly
derivable — Leaf/Extension carry `Seq[Nibble]`, and the custom `Nibble.nibbleSeqEncoder`
is ambiguous with circe's generic `encodeSeq` under magnolia derivation (the
hand-rolled encoders pass it explicitly). So they stay hand-rolled, now KAT-guarded.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…KATs

Per review: Array[Byte] is a poor case-class field type — no circe codec (forcing a
custom value codec), no structural equals (forcing a sameElements Eq), and mutable.
Replace `value: Array[Byte]` with `value: Hex` (tessellation) on the SERIALIZED /
result types — SparseMerkleProof.Inclusion (the wire proof, the one with a codec)
and SparseMerkleEntry.Present (the verified result). This removes the bespoke
valueEncoder/valueDecoder and the sameElements Eq, and makes the value immutable
with a real codec + structural equality.

Wire- and hash-COMPATIBLE: Inclusion.value already serialized as Hex.fromBytes(bytes),
so a `value: Hex` field emits the identical string; the verifier's value-binding is
unchanged (`Hash.fromBytes(value.toBytes)`); the SMT tree/root is untouched (the
in-memory SparseMerkleNode.Leaf keeps raw Array[Byte] — it is never serialized, so it
is not a wire format). Consumers convert at the boundary with `.toBytes` (verifier,
AuthDbOps smt_verify opcode, OrdinalCatalogProof sub-root decode) and `Hex.fromBytes`
(NodeOps proof construction).

New SparseMerkleCodecKatSuite pins the exact field-name list + ADT discriminator +
round-trip (Eq for the value-bearing proof) for every SMT codec. The Hash-only types
(Sibling, Commitment Leaf/Internal, Root) stay hand-rolled here; they are cleanly
derivable (Hash fields) and can be moved to derevo @derive in a follow-up.

smt + committed + AuthDb suites: 44 green (incl. value-binding/tamper + committed proofs).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Audit of Array[Byte] as a case-class field across metakit crypto/committed/json_logic
(and ottochain): the only wire-format offenders were smt Inclusion/Present (fixed ->
Hex); the rest (SparseMerkleNode.Leaf.value, sigma PropNode/ProofNode bytes) have no
circe codec — internal/raw byte representations where Array[Byte] is appropriate.
ottochain has no Array[Byte] case-class fields.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
JsonLogicValueCodecKatSuite pins the JSON-IDENTITY contract of JsonLogicValue (the
JLVM value type / committed JLVM state): each variant encodes to its bare JSON form
(true / 5 / [..] / {..}), NOT a tagged ADT, and decode(encode(x)) === x. Deriving the
sealed trait would tag it and silently change committed state hashes — this fails first.

CommittedRootsCodecKatSuite pins the committed breadcrumb codecs (CommitKey bare-string
newtype; CommittedRoots / CommittedBreadcrumb field-name + round-trip) — the constant-
size on-chain commitment a syncing node trusts.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
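A sketch of why the JSON-identity contract matters for hashing. The Value ADT and its variant names are hypothetical stand-ins for JsonLogicValue, encodeTagged is an invented example of a tagged encoding (not circe's exact derived format), and SHA-256 stands in for the real hasher; the point is only that a bare and a tagged encoding of the same value produce different bytes, and therefore different digests.

```scala
import java.nio.charset.StandardCharsets
import java.security.MessageDigest

// Hypothetical value ADT standing in for JsonLogicValue; variant names are illustrative.
sealed trait Value
final case class BoolV(b: Boolean) extends Value
final case class IntV(n: Long) extends Value
final case class ArrV(items: List[Value]) extends Value
final case class MapV(fields: List[(String, Value)]) extends Value

// Identity encoding: every variant is its bare JSON form (no tag, no wrapper).
def encodeBare(v: Value): String = v match {
  case BoolV(b) => b.toString
  case IntV(n)  => n.toString
  case ArrV(xs) => xs.map(encodeBare).mkString("[", ",", "]")
  case MapV(fs) => fs.map { case (k, x) => "\"" + k + "\":" + encodeBare(x) }.mkString("{", ",", "}")
}

// A hypothetical tagged encoding, of the kind a derived sealed-trait codec could produce.
def encodeTagged(v: Value): String = {
  val tag = v match {
    case _: BoolV => "BoolV"
    case _: IntV  => "IntV"
    case _: ArrV  => "ArrV"
    case _: MapV  => "MapV"
  }
  s"""{"type":"$tag","value":${encodeBare(v)}}"""
}

// Hex-encoded SHA-256 of the UTF-8 bytes, standing in for the real state hasher.
def sha256Hex(s: String): String =
  MessageDigest.getInstance("SHA-256")
    .digest(s.getBytes(StandardCharsets.UTF_8))
    .map(b => f"${b & 0xff}%02x")
    .mkString
```

A KAT asserting encodeBare-style output for each variant fails as soon as a codec change introduces a tag, before any committed state hash can move.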
scasplte2 merged commit a2e40b6 into Constellation-Labs:dev Jun 17, 2026
1 check passed
scasplte2 deleted the refactor/codec-derive-and-kats branch June 17, 2026 18:01