Skip to content

fix(storage,sdk): RGA/CRDT correctness — HLC receive-path, tombstone-aware merge, deterministic id-seed, RekeyTarget compile-guard (D1–D5)#2950

Open
rtb-12 wants to merge 20 commits into
masterfrom
fix/rga-crdt-correctness
Open

fix(storage,sdk): RGA/CRDT correctness — HLC receive-path, tombstone-aware merge, deterministic id-seed, RekeyTarget compile-guard (D1–D5)#2950
rtb-12 wants to merge 20 commits into
masterfrom
fix/rga-crdt-correctness

Conversation

@rtb-12

@rtb-12 rtb-12 commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

Summary

Five correctness fixes from the core RGA/CRDT audit. D1–D4 fix concrete convergence and data-loss defects in the storage layer; D5 turns the underlying footgun — a nested collection silently receiving a per-replica-random id — into a compile error so it cannot recur.

Each section below states the bug, the fix, and the regression tests. All work is on this branch and verified green (see Validation).

Defect Severity Fix location
D1 Local HLC not advanced on the receive path High collections/root.rs
D2 RGA root-level merge resurrects deleted chars High collections/rga.rs, crdt_impls.rs, unordered_map.rs
D3 sync_compliance test over-stated its coverage Low node/tests/sync_compliance/crdt_merge.rs
D4 Executor-id → HLC seed collides on shared prefixes High logical_clock.rs, env.rs
D5 Mergeable can nest a collection without re-keying it High collections/crdt_meta.rs, sdk/macros/src/state.rs

D1 — Advance the local HLC on the receive path

Bug. env::update_hlc had no callers on the apply path: a node observed a remote delta's HLC only as a u64 copy in metadata, never feeding it back into its own clock. Since RGA stamps CharIds from the local clock and orders siblings by Reverse(CharId), a node could stamp its next local insert with a timestamp ≤ the remote char it causally follows — convergent, but to a causally-wrong interleave.

Fix. Root::sync now feeds a CausalActions delta's delta_hlc into env::update_hlc before applying its actions. A drift-rejected timestamp (>5s ahead) is logged and skipped, not fatal — the actions remain valid replicated state.

Tests (src/tests/rga.rs): the next local timestamp sorts strictly after an observed remote delta; an end-to-end insert-after-observe renders causally as "AB"; a drift-rejected (>5s) HLC still applies without advancing the clock.


D2 — Make the RGA blob merge tombstone-aware

Bug. The root-level RGA merge (cold-join / full-state conflict path) re-inserted any char present in other but absent from self, gated only on is_none(). A char self had concurrently deleted is absent from its live set, so the merge resurrected it — dropping the tombstone and diverging the chars-map hash from peers that saw the delete.

Fix. Before re-inserting a char absent from self, consult its tombstone via Index::is_deleted; a tombstone means delete wins. Logic extracted to a generic merge_chars_from for cross-store unit testing. This path is add-only (delete propagation in the reverse direction travels via the DeleteRef delta-sync path, as before).

Tests (collections/rga.rs): non-resurrection of a concurrently-deleted char across two isolated stores (asserted at the index level, since get_text masks tombstones); genuinely-new chars are still added; the reverse direction (other-deleted / self-live) is characterized.


D3 — Honest sync_compliance scope + a real interleave test

Bug. crdt_merge.rs::test_i5_rga_interleave_merge inserted opaque non-RGA blobs and asserted only crdt_type == Rga, yet a comment claimed the merge was "verified by the actual sync execution."

Fix. Documented the harness's true scope (metadata + protocol selection, not value-level merge), renamed the test to match what it asserts, and added a real interleave test in calimero-storage: two replicas concurrently insert distinct runs at the same gap, exchange deltas both ways, and converge to identical text/hash with each run kept contiguous.


D4 — Collision-resistant executor-id → HLC seed

Bug. The WASM HLC seeding path derived the 16-byte instance id from only the first 16 bytes of the 32-byte executor key. Two executors sharing a 16-byte prefix collapsed to one HLC id, so two concurrently-minted CharIds could collide and silently drop one replica's character during RGA sync.

Fix. Extracted a pure, unit-testable hlc_seed_from_executor_id that takes the first 16 bytes of SHA-256(executor_id) (sha2 is already a crate dependency). SHA-256 is collision-resistant over the full key; a naive XOR-fold was rejected because it collapses any [k; 32] key to an all-zero seed.

Tests (logical_clock.rs): distinct executor ids — including the adversarial repeated-byte and shared-prefix families — yield distinct seeds and distinct HLC ids; a test documents why the XOR-fold alternative is unsafe.


D5 — RekeyTarget as a supertrait of Mergeable (compile-guard)

Bug (the footgun). Collections created on demand get a random internal id. The sync model converges collections by matching entity ids, so two nodes that independently first-create the same logical nested CRDT mint different ids; their children never merge — a permanent divergence with no runtime error (the #2577 data loss). The deterministic re-key machinery (RekeyTarget) fixes it, but a hand-written Mergeable impl could simply forget to register its re-key.

Fix. Make RekeyTarget a supertrait of Mergeable, so a missing re-key impl is a compile error rather than silent divergence.

  • #[derive(Mergeable)] already emits the matching RekeyTarget; derive sites are unaffected.
  • #[app::state] now generates a RekeyTarget impl mirroring the derive (per-field re-key cascade + nested-value-type registration).
  • Added the hand-written RekeyTarget impls the new bound surfaces across the workspace (collection wrappers, test fixtures, app crates with hand-rolled merges). Leaf records take a no-op via the new impl_atomic_lww! helper, which emits both the LWW Mergeable and the matching no-op RekeyTarget in one line.
  • Net: only a hand-written impl Mergeable that omits RekeyTarget fails to compile — the intended guard.

Tests. A compile_fail case proves a hand-written Mergeable without RekeyTarget is rejected (E0277); a feature-insensitive compile guard pins the supertrait bound in every feature set; and a dedicated #[app::state] two-replica convergence test proves the macro-generated re-key cascade converges a nested collection at runtime (not just compiles).


Validation

All green on this branch (after merging master):

  • cargo fmt --check — clean.
  • cargo build --workspace --tests — builds.
  • cargo test -p calimero-storage --features testing — 640 lib unit tests + all integration binaries (incl. rga, rekey_record, converge, merge_registry_*, compile_fail).
  • cargo test -p calimero-node --test sync_sim — incl. the D3 sync_compliance suite.
  • cargo test -p calimero-runtime (incl. crdt_conformance), cargo test -p calimero-sdk, cargo test -p calimero-sdk-macros.
  • cargo clippy --workspace --all-targets --features calimero-storage/testing -- -D warnings — clean.

Review follow-up

Addressed human and automated-review feedback: corrected doc/comment accuracy (notably the D4 history — the fix is SHA-256, not a fold), removed transient defect labels and over-long rationale from source comments, and added the test coverage noted above (the #[app::state] cascade convergence test, the supertrait compile guard, and the D1 drift-rejection and D2 reverse-direction cases).

Follow-ups (deferred)

  • Tombstone GC — unbounded growth of deleted_children / per-entity tombstones is a design change, not a correctness defect, and needs a causal-stability-based prune. Tracked separately.
  • Map/Set root-level merge still resolves add-wins (does not consult deleted_at); delete-wins for those collections is a separate concern from this RGA-scoped fix and is tracked separately.

@meroreviewer meroreviewer Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 MeroReviewer

Reviewed by 1 agents | Quality score: 30% | Review time: 90.5s


🟡 Warning (2)

1. XOR-fold seed can produce all-zero output for certain key pairs

File: crates/storage/src/logical_clock.rs (line 237-265) | Consensus: 1/1 agents ✓

The XOR-fold in hlc_seed_from_executor_id produces an all-zero seed when the low 16 bytes and high 16 bytes of the executor key are identical (e.g., [0xAB; 32]). The comment says "the constructor's zero→1 guard still covers the all-zero input", but this relies on an undocumented invariant in LogicalClock::new that zero seeds are promoted to 1. If that guard is ever removed or the seed is used in another context, two executors with key[i] == key[i+16] for all i would silently share an HLC id. A simple mitigation is to XOR in a non-zero constant (e.g., seed[0] ^= 0x01 after the fold) to guarantee the seed is never all-zero regardless of the constructor's behavior.

Suggested fix:

After the fold loop, add: `if seed == [0u8; 16] { seed[0] = 1; }` — or document the dependency on the constructor guard explicitly with a `// SAFETY:` comment referencing the specific guard in `LogicalClock::new`.

Found by: security-reviewer

2. HLC update on receive path silently skips drift-rejected timestamps without advancing the clock

File: crates/storage/src/collections/root.rs (line 207-229) | Consensus: 1/1 agents ✓

When a remote delta_hlc is rejected by the drift guard (>5s ahead), the code logs a warning and continues applying the delta's actions without advancing the local clock. This is correct for safety, but the comment says "applying without advancing the local clock" — the actions are still applied. This means a node can apply a causally-ordered delta (with a far-future HLC) without its clock observing that timestamp, so subsequent local inserts may still get timestamps that sort before the remote chars. The PR description says drift-rejected timestamps are "not fatal", but the causal ordering guarantee (the whole point of D1) is silently broken for those deltas. This should at minimum be documented as a known limitation, or the drift tolerance should be widened/the behavior reconsidered.

Suggested fix:

Add a comment explicitly stating that causal ordering is not guaranteed for drift-rejected deltas, and consider whether the drift tolerance (5s) is appropriate for the use case. Alternatively, clamp the remote HLC to `local + drift_tolerance` and advance to that instead of skipping entirely.

Found by: security-reviewer

💡 Suggestion (1)

1. Interleave convergence test asserts only two possible orderings but RGA tie-breaking may produce others

File: crates/storage/src/tests/rga.rs (line 963-1249) | Consensus: 1/1 agents ✓

In test_rga_real_interleave_merge_converges, the final assertion checks xy_text == "aXXYYb" || xy_text == "aYYXXb". This is correct for a well-behaved RGA (each run stays contiguous between anchors), but the assertion comment says "higher-CharId run sorts first". If the RGA's Reverse(CharId) ordering is deterministic, only ONE of the two orderings should be possible for a given pair of executor ids ([1;32] vs [2;32]). The disjunction hides whether the ordering is actually deterministic — a future regression that produces non-contiguous interleaving (e.g. "aXYXYb") would still pass if it happens to equal one of the two strings. Consider asserting the exact expected string based on which executor id produces a higher CharId.

Suggested fix:

Determine which executor (`[1;32]` or `[2;32]`) produces a higher HLC id after `hlc_seed_from_executor_id`, and assert the single expected ordering rather than a disjunction. This makes the test a true regression guard for the interleave order.

Found by: security-reviewer


🤖 Generated by MeroReviewer | Review ID: review-8f6d3682

@meroreviewer

meroreviewer Bot commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

Documentation Review

The following documentation may need updates based on the changes in this PR:

  • 🟡 architecture/crates/node.html: Files matching crates/node/** were changed but architecture/crates/node.html was not updated (per source_to_docs_mapping).
  • 🟡 architecture/crates/sdk.html: Files matching crates/sdk/** were changed but architecture/crates/sdk.html was not updated (per source_to_docs_mapping).
  • 🟡 architecture/crates/sync.html: Files matching crates/node/** were changed but architecture/crates/sync.html was not updated (per source_to_docs_mapping).
  • 🟡 architecture/migrations.html: Files matching crates/storage/** were changed but architecture/migrations.html was not updated (per source_to_docs_mapping).
  • 🟡 architecture/storage-schema.html: Files matching crates/storage/** were changed but architecture/storage-schema.html was not updated (per source_to_docs_mapping).

Comment thread apps/scaffolding-e2e/src/lib.rs Outdated

@meroreviewer meroreviewer Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 AI Code Reviewer

Reviewed by 3 agents | Quality score: 51% | Review time: 267.4s

🟡 5 warnings, 💡 1 suggestions. See inline comments.


🤖 Generated by AI Code Reviewer | Review ID: review-75342916

Comment thread crates/storage/src/logical_clock.rs Outdated
Comment thread crates/storage/src/logical_clock.rs Outdated
Comment thread crates/storage/src/collections/rga.rs
Comment thread crates/storage/src/collections/rga.rs
Comment thread crates/storage/src/tests/rga.rs
Comment thread crates/storage/src/tests/rga.rs
rtb-12 added a commit that referenced this pull request Jun 25, 2026
…2950)

The maintainer flagged that hand-writing `RekeyTarget` for every app
(`apps/scaffolding-e2e/src/lib.rs:62`, etc.) "is a bit insane — implement
this directly into the storage crate." D5 had added hand-written
`RekeyTarget` impls to five app structs because each had a hand-written
`impl Mergeable` (and `RekeyTarget` is now its supertrait).

Mechanism (storage-crate-provided, ZERO app `RekeyTarget` code):

1. Field-by-field merges → `#[derive(Mergeable)]`. The derive already
   emits BOTH the field-by-field `Mergeable` AND the matching
   `RekeyTarget` (re-keying each nested collection field, no-op for
   leaves). Converting these removes the hand-written `Mergeable` too:
   - `NestedMap` (scaffolding-e2e, kv-store-with-user-and-frozen-storage)
   - `MergeableRecord` (abi_conformance)
   - `Profile` (state-schema-conformance)
   Each was exactly `self.field.merge(&other.field)` per field — what the
   derive generates — so behavior is identical.

2. Whole-record atomic LWW → new `calimero_storage::impl_atomic_lww!`.
   `FileRecord` (blobs, scaffolding-e2e) is an immutable upload record of
   plain `String`/`u64`/`[u8;32]` fields, replaced atomically by
   `uploaded_at`. It CANNOT derive Mergeable (the forbidden-type lint
   rejects bare primitives — correctly). Its `RekeyTarget` was pure noise
   (a leaf with no nested id). The new macro lives in the storage crate
   and emits BOTH the LWW `Mergeable` and the matching no-op `RekeyTarget`
   in one line, so the app writes no `RekeyTarget`.

Net: every app now reaches `Mergeable` through a storage-crate-provided
path (`#[derive(Mergeable)]` or `impl_atomic_lww!`) and writes ZERO
`RekeyTarget`. The supertrait footgun guard is intact: a HAND-WRITTEN
`impl Mergeable` on a collection-nesting struct that forgets re-keying
still fails to compile (the compile-fail test is unchanged).

`apps/team-metrics-custom` keeps its hand-written Mergeable+RekeyTarget
deliberately — it is a teaching example (paired with team-metrics-macro's
derive path) and is not part of this PR.

Macro unit-tested for LWW-by-tie-field semantics (idempotent,
order-independent) and the no-op re-key.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@cursor

cursor Bot commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

Bugbot is paused — on-demand spend limit reached

Bugbot uses usage-based billing for this team and has hit its on-demand spend limit.

A team admin can raise the spend limit in the Cursor dashboard, or wait for the next billing cycle to continue.

@rtb-12

rtb-12 commented Jun 25, 2026

Copy link
Copy Markdown
Contributor Author

Review follow-up (D1–D5) — pushed f2fb1866..5c9b1282

Addressed the human + AI review items. Per-item disposition:

1. Remove per-app RekeyTarget boilerplate (@chefsale, scaffolding-e2e/src/lib.rs:62)

Reworked so no app writes any RekeyTarget code — every app now reaches Mergeable through a storage-crate-provided path:

  • Field-by-field merges → #[derive(Mergeable)]. The derive already emits both the field-by-field Mergeable and the matching RekeyTarget. The four app structs whose hand-written merge was exactly self.field.merge(&other.field) per field now just derive — removing both the hand-written Mergeable and the RekeyTarget:

    • NestedMap (scaffolding-e2e, kv-store-with-user-and-frozen-storage)
    • MergeableRecord (abi_conformance)
    • Profile (state-schema-conformance)
  • Whole-record atomic LWW → new calimero_storage::impl_atomic_lww!(Type, tie_field). FileRecord (blobs, scaffolding-e2e) is an immutable upload record of plain String/u64/[u8;32] fields, replaced atomically by uploaded_at. It genuinely cannot #[derive(Mergeable)] — the forbidden-type lint correctly rejects bare primitives — and its RekeyTarget was pure noise (a leaf with no nested id). The new macro lives in the storage crate (crates/storage/src/collections/crdt_meta.rs) and emits both the LWW Mergeable and the matching no-op RekeyTarget in one line.

Result: app authors using the normal path write zero RekeyTarget. The supertrait footgun guard is intact — a hand-written impl Mergeable on a collection-nesting struct that forgets re-keying still fails to compile (tests/compile_fail/mergeable_without_rekeytarget.rs unchanged and passing).

apps/team-metrics-custom keeps its hand-written Mergeable+RekeyTarget deliberately — it's a teaching example paired with team-metrics-macro's derive path, and is not part of this PR.

2. D4 — XOR-fold HLC seed (logical_clock.rs)

Fixed. The XOR-fold (seed[i % 16] ^= key[i]) collapsed any key[i] == key[i+16] key — e.g. [k; 32] for any byte k — to an all-zero seed, defeating the zero→1 guard for a whole family of keys. Replaced with SHA-256 over the full 32-byte id, taking the first 16 bytes (sha2 already used by compute_id/index hashing). Collision-resistant, no structural all-zero input. Tests now demonstrate the [k;32]→0 collapse of the old fold and assert distinct seeds/HLC-ids across an adversarial key set ([1;32], [2;32], [255;32], shared-prefix, all-zero).

3. D2 — tombstone check in merge_chars_fromverified correct, no logic change

The entry_id/S-vs-S2 usage is correct: self.chars.entry_id(&key) derives compute_id(self.chars.id(), key.bytes) — id taken from self's own map, so the lookup addresses self's tombstone (the "did self delete this char?" question); S2 is used only to read other.chars.entries(). Index::<S>::is_deleted reads self's store at that id. key.bytes is borsh(CharId), map-id-independent, so the same CharId resolves to the same entity under a shared map — and two replicas converge an RGA only when their chars maps share an id (different ids ⇒ unrelated entities that never converge anyway, so there's no reachable "wrong tombstone" case). Expanded the in-code comment and added a mutation-checked test (merge_distinguishes_tombstoned_from_unseen_in_one_pass) across two isolated MockedStorage scopes. Bypassing the guard fails it (and the existing blob_merge_does_not_resurrect_*), asserting at the index/wire level since get_text filters tombstoned ids.

4. D1 — receive-path advance test was trivially passing

Fixed. It read hlc_timestamp() and used A's real-clock HLC, so B's own wall clock was already ahead — verified it passed even with Root::sync's update_hlc removed. Now: A's delta_hlc is set ~2s ahead of the wall clock (within drift tolerance), and the assertion is on the HLC B minted for its insert (from B's captured insert delta), proving the mint moved past A's observed HLC. Mutation-checked: with the receive-path advance disabled, B's minted CharId sorts before A's HLC and the test fails.

5. Interleave test — dead || branch

Fixed. Pinned the timestamps (base "ab" low tick, X's "XX" higher, Y's "YY" highest) via insert_str_at_timestamp, so the merged order is a pure function of the CharIds rather than wall-clock spacing. Assert the exact aYYXXb (Y's higher CharId sorts first under Reverse(CharId)). The commutativity assertion is unchanged.

Validation (all green, default features = what CI runs)

cargo build --workspace --tests; cargo test -p calimero-storage (629 unit + integration); cargo test -p calimero-node --test sync_sim (314, incl. D3 sync_compliance); cargo test -p calimero-sdk-macros (80); cargo test -p calimero-sdk/-p calimero-runtime; cargo clippy --workspace --all-targets -- -D warnings; cargo fmt --check. All five converted apps build to wasm32-unknown-unknown.

Note on the failing Rust CI job (not this PR)

The Rust job fails on crates/store/blobs/src/utils.rs (path-validation) and crates/dag/.../test_bug_missing_handler_broadcast. This PR touches neither crate (empty diff vs master for both); their last commits predate this branch. The blobstore tests pass locally here (CI-filesystem-specific), and the dag test is a known-bug divergence test. Not chasing these.

Note: the mergeable_without_rekeytarget trybuild snapshot only matches under default features (what CI runs). Under --features testing the truncated "other types implement RekeyTarget" help list shifts (a testing-only EmptyData fixture enters the alphabetical slice) — the actual BadStruct: RekeyTarget is not satisfied error is identical. This is pre-existing (present at the D5 commit) and documented in the test's own comment.

@meroreviewer meroreviewer Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 AI Code Reviewer

Reviewed by 3 agents | Quality score: 59% | Review time: 297.6s

🟡 2 warnings. See inline comments.


🤖 Generated by AI Code Reviewer | Review ID: review-dfbfcd26

Comment thread crates/storage/src/tests/rga.rs
Comment thread crates/storage/src/tests/rga.rs

@meroreviewer meroreviewer Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 AI Code Reviewer

Reviewed by 3 agents | Quality score: 55% | Review time: 332.0s

🟡 1 warnings. See inline comments.


🤖 Generated by AI Code Reviewer | Review ID: review-c5bef692

Comment thread crates/storage/src/tests/rga.rs
rtb-12 added a commit that referenced this pull request Jun 26, 2026
…2950)

The `mergeable_without_rekeytarget` trybuild case captured rustc's
"the following other types implement trait `RekeyTarget`" help block,
whose first-8 (alphabetical, then `and $N others`) membership shifts with
the active feature set: `calimero-storage/testing` pulls
`tests::common::EmptyData` into scope, which sorts into the window and
displaces `PermissionedStorage<T, A>`.

CI runs `cargo test` over the whole workspace, where feature unification
(`calimero-dag`/`calimero-node` enable `calimero-storage/testing`) turns
`testing` ON for this crate's own test binary too — so the previously
blessed default-feature snapshot mismatched ("1 of 7 tests failed").
trybuild has no partial-line wildcard, so a single literal snapshot can't
satisfy both feature sets.

Fix: bless the snapshot for the `testing`-ON output (CI's reality) and gate
the case behind `#[cfg(feature = "testing")]` so it only runs against the
snapshot it was blessed for. The other six compile-fail cases are
feature-stable and keep running under every feature set. Verified under
`cargo test -p calimero-storage --test compile_fail` (default),
`--features testing`, and `cargo test --workspace --test compile_fail`
(CI-faithful, testing unified ON).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
rtb-12 and others added 12 commits June 26, 2026 07:19
The WASM HLC seeding path seeded the 16-byte instance id from only the
first 16 bytes of the 32-byte executor public key (`take(16)`), with a
dead `for i in 32..16` fold loop that never ran. Two executors sharing a
16-byte key prefix collapsed to one HLC id, so two concurrently-minted
CharIds could collide and silently drop one replica's character during
RGA sync.

Extract a pure, unit-testable `hlc_seed_from_executor_id` that XOR-folds
all 32 bytes into the 16-byte seed and call it from the WASM seeding
path. Adds a reproducer documenting the old shared-prefix collision and
tests proving the fix yields distinct seeds and distinct HLC ids.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The root-level RGA `Mergeable::merge` (the cold-join / full-state conflict
path, `merge_root_state_typed` → `RGA::merge`) re-inserted any char present
in `other` but absent from `self`, gated only on `is_none()`. A char that
`self` had concurrently deleted is absent from its live set (tombstoned), so
the merge re-added it: it un-tombstoned the char entity, dropped it from the
parent's `deleted_children` wire advertisement, and re-listed it as a live
child — diverging the chars-map Merkle hash from peers that saw the delete.
The visible `get_text` masks this (the map iterator silently drops the
tombstoned id via `find_by_id`), but the lost tombstone + hash divergence is
the real damage on the sync wire.

Fix: before re-inserting a char absent from `self`, consult its entity
tombstone via `Index::is_deleted`. A tombstone means delete wins (RGA chars
are immutable, so the only conflict is presence-vs-tombstone — no
update-vs-delete race), so the char is not resurrected. The element-level
DeleteRef LWW path is unchanged and still carries the tombstone for
convergence.

Convergence stays intact: the merge is still commutative/associative/
idempotent over the (live-set, tombstone-set) lattice. Logic extracted to a
generic `merge_chars_from<S2>` so it is unit-testable across two isolated
`MockedStorage` scopes (two replicas with shared CharIds) — the reproducer
fails without the fix (asserts the wire tombstone + hash) and passes with it.
Adds an `entry_id` accessor on `UnorderedMap`.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`env::update_hlc` / `LogicalClock::update` had ZERO callers on the
receive/apply path: a node observed a remote delta's HLC only as a u64
`updated_at` nanos copy in metadata, never feeding the full HLC back into its
own clock. Because RGA stamps `CharId`s from the local clock and orders
siblings by `Reverse(CharId)` (high bits = wall-clock time), a node that
observed a remote insert could then stamp its NEXT local insert with a
timestamp ≤ the remote char it causally follows — converging, but with a
causally-wrong interleave.

Fix: `Root::sync` now feeds a `CausalActions` delta's `delta_hlc` into
`env::update_hlc` BEFORE applying its actions, so the local clock's
happens-before guarantee holds across the receive boundary. A drift-rejected
remote timestamp (>5s ahead) is logged and skipped, not fatal — the actions
are still valid replicated state. `Actions` (host-side replay) carries no
delta HLC, so there is nothing to observe.

Reproducers:
- `test_sync_advances_local_hlc_to_observe_remote_delta`: applying a
  CausalActions delta with a high delta_hlc must make the next local
  timestamp sort strictly after it (fails without the fix).
- `test_rga_insert_after_observing_remote_is_causally_ordered`: B observes
  A's char (carrying A's HLC) with its own clock behind, inserts locally, and
  its CharId + rendered order are causal ("AB").

The clock's `update` rule itself was already correct (b9ffc90); this wires
it into the path that was missing.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… scope (D3)

`sync_compliance/crdt_merge.rs::test_i5_rga_interleave_merge` inserted opaque
non-RGA blobs, forced HashComparison (which runs NO merge in the sync_sim
harness), and asserted only `crdt_type == Rga` — yet a comment claimed the
merge was "verified by the actual sync execution".

- Document the harness's true scope (metadata + protocol selection, not
  value-level merge) in the module header and correct the misleading comments.
- Rename `test_i5_rga_interleave_merge` →
  `test_i5_rga_crdt_type_and_protocol_selection` to reflect what it asserts.
- Add `test_rga_real_interleave_merge_converges` in calimero-storage: two
  replicas concurrently insert distinct runs at the same gap, exchange deltas
  in both orders, and must converge to identical text/hash with each run kept
  contiguous between the base anchors.

Also tightens the D1/D2 doc comments per review.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… nested-collection re-key (D5)

Make `RekeyTarget` a supertrait of `Mergeable`
(`crates/storage/src/collections/crdt_meta.rs:88`) so the
nested-collection footgun behind #2577 (the data loss that removed RGA
from mero-drive) becomes a COMPILE error instead of silent divergence:
a struct that nests a CRDT collection but never registers a
deterministic-id re-key thunk would otherwise mint a per-replica
`Id::random()` for that nested collection, so its children land under
different parents and never converge — with no runtime error.

`#[derive(Mergeable)]` already emits the matching `RekeyTarget` impl
(PRs #2577/#2581), so derive sites are unaffected. The missing keystone
was the `#[app::state]` macro: it generates a `Mergeable` impl but not a
`RekeyTarget` one, so every `#[app::state]` struct in the workspace
failed the new supertrait bound. Generate the `RekeyTarget` impl there
too (`crates/sdk/macros/src/state.rs`), mirroring the derive: a per-field
`rekey_relative_to` cascade (`rekey_field_if_supported!` under a
field-namespaced child id) and a `register_nested_value_types` reusing
the existing `rekey_register_calls` scan. State enums get no impl (they
get no `Mergeable` impl either). This keeps all `#[app::state]` and
`#[derive(Mergeable)]` sites compiling automatically; only a HAND-WRITTEN
`impl Mergeable` that omits `RekeyTarget` now fails to compile — the
intended guard.

Add the matching hand-written `RekeyTarget` impls the supertrait surfaces
across the workspace: real collection types (the prior D5 work in
access_control/authored_map/authored_vector/crdt_impls/frozen/
frozen_value/permissioned/shared/user/merge), in-crate test fixtures
(tests/common.rs, src/tests/merge_integration.rs's 24 hand-written
fixtures, tests/converge.rs, tests/rekey_record.rs,
tests/merge_registry_*), and app crates with hand-rolled merges
(abi_conformance, blobs, scaffolding-e2e, state-schema-conformance,
kv-store-with-user-and-frozen-storage). Leaf structs (no nested
collection — e.g. blobs::FileRecord, atomic LWW records) take the no-op
default; nesting structs re-key each collection field.

In `tests/rekey_record.rs`, `UnfixedStats` (the pre-fix data-loss
fixture) now must impl `RekeyTarget` to be `Mergeable` at all — "Mergeable
without RekeyTarget" is no longer expressible, which is the point of D5.
The runtime data-loss case is preserved by NOT registering its thunk
(dispatch is by registry lookup, not the trait bound), so
`unregistered_value_loses_data_pre_fix` still demonstrates the LWW loss.

Regenerate two `#[app::state]` compile-fail snapshots
(`crates/sdk/tests/macros/{invalid_nested_generics,error_bad_authorizer}.stderr`)
whose diagnostics now additionally mention the `RekeyTarget` supertrait
chain — expected and additive for those already-invalid inputs.

Design note (enum/generic Mergeable): `#[derive(Mergeable)]` already
rejects enums/unions; `#[app::state]` enums emit no Mergeable/RekeyTarget;
generic `#[app::state]<T>` was already rejected ("generic types are not
supported") and now also surfaces the `T: RekeyTarget`/`'static`
supertrait requirement — no new generic Mergeable surface to handle.

Validation: `cargo build --workspace --tests`, `cargo test -p
calimero-storage` (626+ unit, all integration), node `sync_sim` (314),
`calimero-runtime`, `calimero-sdk`, `calimero-sdk-macros`, and `cargo
clippy --workspace --all-targets --features calimero-storage/testing -- -D
warnings` all green; `cargo fmt --check` clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…keyTarget (D5)

Add a trybuild compile-fail case proving the #D5 supertrait guard fires:
a struct with a HAND-WRITTEN `impl Mergeable` that nests a `Counter` but
does NOT impl `RekeyTarget` must fail to compile (E0277, "the trait bound
`BadStruct: RekeyTarget` is not satisfied", required by the `Mergeable`
supertrait bound) instead of silently last-writer-wins'ing the nested
collection (the #2577 data loss). Follows the existing
`tests/compile_fail/*.rs` + `.stderr` pattern.

The captured `.stderr` is for the default feature set (what CI's `cargo
test` runs); the truncated "other types implement RekeyTarget" help list
varies with features, noted in the test header.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The D4 fix XOR-folded the high 16 bytes of the 32-byte executor id onto
the low 16 (`seed[i % 16] ^= key[i]`). Any key with `key[i] ==
key[i + 16]` for all i — most starkly `[k; 32]` for any byte k —
XOR-cancels to an ALL-ZERO seed, so every such executor collapses to the
same (zero→1-guarded) HLC id. Two distinct executors then share one HLC
id space, two concurrently-minted CharIds collide, and one replica's
character is silently lost during RGA sync. The zero→1 guard only covered
a pre-fold all-zero input, not this fold-induced family.

Replace the fold with SHA-256 over the full 32-byte executor id, taking
the first 16 bytes of the digest as the seed (sha2 is already a crate
dep, used by compute_id and the index hashing). This is
collision-resistant — distinct keys yield distinct digests — and has no
structural input that collapses to all-zero (an all-zero 16-byte prefix
would need an infeasible preimage). The constructor's zero→1 guard stays
as a final safety net.

Tests: replace the stale `take(16)` reproduction with one that
demonstrates the XOR-fold's `[k; 32]`→all-zero collapse AND the resulting
collision between distinct repeated keys; assert SHA-256 seeds (and
production HLC ids) are distinct across an adversarial key set including
the `[k; 32]` family, shared-prefix keys, and all-zero; assert no
repeated-byte key seeds to all-zero.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Reviewer raised whether `merge_chars_from`'s tombstone check addresses
the right entity: `self.chars.entry_id(&key)` then
`Index::<S>::is_deleted`, given `self` (adaptor `S`) and `other`
(adaptor `S2`) may have different `chars` map ids.

Verified correct, no code-logic change:
- `entry_id` derives `compute_id(self.chars.id(), key.bytes)` — the id is
  taken from `self`'s OWN map, so we address the tombstone in `self`'s
  store: exactly the "did self delete this char?" question. `S2` is used
  only to read `other.chars.entries()`; it never enters the lookup.
- `Index::<S>::is_deleted` reads `self`'s store at that id — same map,
  same store.
- `key.bytes` is `borsh(CharId)`, independent of any map id, so the same
  CharId derives the same entity id under a given map. Two replicas
  converge an RGA only when their chars maps share an id; under that
  shared id a char tombstoned on self resolves to the same entity other
  holds live. Different ids ⇒ unrelated entities that never converge
  anyway, so the real sync path has no "wrong tombstone" case.

Expanded the in-code comment to spell this out, and added a
mutation-checked test (`merge_distinguishes_tombstoned_from_unseen_in_one_pass`)
that, in ONE merge across two isolated MockedStorage scopes, asserts a
tombstoned char stays deleted (index-level: `is_deleted` +
`deleted_children`) while a never-seen char is still added (add-wins) —
so the guard can't be satisfied by suppressing all inserts. Bypassing the
guard fails this test (confirmed), as it does the existing
`blob_merge_does_not_resurrect_*`. Asserts at the index/wire level, not
`get_text`, since the map iterator filters tombstoned ids and would mask a
resurrection.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…al (D1)

`test_rga_insert_after_observing_remote_is_causally_ordered` asserted on
a `hlc_timestamp()` READ and used A's real-clock HLC, so B's own
wall-clock mint was naturally already ahead of A's HLC — the test passed
even with the receive-path advance removed (verified: bypassing
`Root::sync`'s `update_hlc` did NOT fail the old assertion).

Strengthen on both axes:
- Set A's advertised `delta_hlc` ~2s AHEAD of the real wall clock (within
  the 5s drift tolerance, so `update_hlc` accepts it). B mints from the
  real clock, so it is naturally BEHIND A's HLC; B's mint can only end up
  after A's HLC if the receive path actually advanced B's clock.
- Assert on the timestamp B MINTED for its insert (read from B's captured
  insert delta's `hlc`), not a `hlc_timestamp()` read — proving the mint
  itself moved past A's observed HLC.

Mutation-checked: with `Root::sync`'s receive-path `update_hlc` disabled,
B's minted CharId now sorts BEFORE A's HLC and the test fails; with it
enabled the mint sorts strictly after and the rendered order stays "AB".

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`test_rga_real_interleave_merge_converges` asserted
`xy_text == "aXXYYb" || "aYYXXb"`. With the real node-local HLC the order
is decided by which replica's wall clock ticks later — incidentally
stable but timing-dependent, and one `||` branch never runs.

Pin the timestamps so the order is a pure function of the CharIds, as the
reviewer asked: base "ab" at a low tick, X's "XX" at a higher tick, Y's
"YY" at a strictly-higher tick still (`insert_str_at_timestamp`). Both
inserts sit above the base tick so they land in the a|b gap (not after the
higher-CharId 'b'); Y's CharIds are strictly higher than X's, so under
`Reverse(CharId)` Y's block sorts first. Assert the exact "aYYXXb".

The commutativity assertion (X-then-Y vs Y-then-X converge to identical
text+hash) is unchanged and still the primary property. Verified
deterministic across repeated runs.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…2950)

The maintainer flagged that hand-writing `RekeyTarget` for every app
(`apps/scaffolding-e2e/src/lib.rs:62`, etc.) "is a bit insane — implement
this directly into the storage crate." D5 had added hand-written
`RekeyTarget` impls to five app structs because each had a hand-written
`impl Mergeable` (and `RekeyTarget` is now its supertrait).

Mechanism (storage-crate-provided, ZERO app `RekeyTarget` code):

1. Field-by-field merges → `#[derive(Mergeable)]`. The derive already
   emits BOTH the field-by-field `Mergeable` AND the matching
   `RekeyTarget` (re-keying each nested collection field, no-op for
   leaves). Converting these removes the hand-written `Mergeable` too:
   - `NestedMap` (scaffolding-e2e, kv-store-with-user-and-frozen-storage)
   - `MergeableRecord` (abi_conformance)
   - `Profile` (state-schema-conformance)
   Each was exactly `self.field.merge(&other.field)` per field — what the
   derive generates — so behavior is identical.

2. Whole-record atomic LWW → new `calimero_storage::impl_atomic_lww!`.
   `FileRecord` (blobs, scaffolding-e2e) is an immutable upload record of
   plain `String`/`u64`/`[u8;32]` fields, replaced atomically by
   `uploaded_at`. It CANNOT derive Mergeable (the forbidden-type lint
   rejects bare primitives — correctly). Its `RekeyTarget` was pure noise
   (a leaf with no nested id). The new macro lives in the storage crate
   and emits BOTH the LWW `Mergeable` and the matching no-op `RekeyTarget`
   in one line, so the app writes no `RekeyTarget`.

Net: every app now reaches `Mergeable` through a storage-crate-provided
path (`#[derive(Mergeable)]` or `impl_atomic_lww!`) and writes ZERO
`RekeyTarget`. The supertrait footgun guard is intact: a HAND-WRITTEN
`impl Mergeable` on a collection-nesting struct that forgets re-keying
still fails to compile (the compile-fail test is unchanged).

`apps/team-metrics-custom` keeps its hand-written Mergeable+RekeyTarget
deliberately — it is a teaching example (paired with team-metrics-macro's
derive path) and is not part of this PR.

Macro unit-tested for LWW-by-tie-field semantics (idempotent,
order-independent) and the no-op re-key.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…2950)

The `mergeable_without_rekeytarget` trybuild case captured rustc's
"the following other types implement trait `RekeyTarget`" help block,
whose first-8 (alphabetical, then `and $N others`) membership shifts with
the active feature set: `calimero-storage/testing` pulls
`tests::common::EmptyData` into scope, which sorts into the window and
displaces `PermissionedStorage<T, A>`.

CI runs `cargo test` over the whole workspace, where feature unification
(`calimero-dag`/`calimero-node` enable `calimero-storage/testing`) turns
`testing` ON for this crate's own test binary too — so the previously
blessed default-feature snapshot mismatched ("1 of 7 tests failed").
trybuild has no partial-line wildcard, so a single literal snapshot can't
satisfy both feature sets.

Fix: bless the snapshot for the `testing`-ON output (CI's reality) and gate
the case behind `#[cfg(feature = "testing")]` so it only runs against the
snapshot it was blessed for. The other six compile-fail cases are
feature-stable and keep running under every feature set. Verified under
`cargo test -p calimero-storage --test compile_fail` (default),
`--features testing`, and `cargo test --workspace --test compile_fail`
(CI-faithful, testing unified ON).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@rtb-12 rtb-12 force-pushed the fix/rga-crdt-correctness branch from b7c83bf to 26b292e Compare June 26, 2026 01:50
@cursor

cursor Bot commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

Bugbot is paused — on-demand spend limit reached

Bugbot uses usage-based billing for this team and has hit its on-demand spend limit.

A team admin can raise the spend limit in the Cursor dashboard, or wait for the next billing cycle to continue.

@meroreviewer meroreviewer Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 AI Code Reviewer

Reviewed by 3 agents | Quality score: 68% | Review time: 230.2s


🟡 Warning (4)

1. impl_atomic_lww! macro can be misused on structs containing nested collections

File: crates/storage/src/collections/crdt_meta.rs (line 267-325) | Consensus: 1/3 agents

The impl_atomic_lww! macro emits a no-op RekeyTarget impl unconditionally. The doc comment warns 'a HAND-WRITTEN impl Mergeable on a struct that DOES nest a collection cannot use this macro (it would silently never re-key) — that path stays a compile error until the author derives or re-keys explicitly.' However, the macro itself does NOT enforce this: it will happily compile for a struct that contains a nested collection (e.g. UnorderedMap), silently emitting a no-op rekey_relative_to that never re-keys the nested collection. The D5 supertrait guard only catches missing RekeyTarget impls, not incorrect (no-op) ones. A developer who uses impl_atomic_lww!(MyStruct, ts) on a struct with a nested collection will get silent data loss — the exact footgun D5 was designed to prevent — with no compile error.

Suggested fix:

Add a compile-time check inside the macro (e.g. via a marker trait or a `const` assertion) that the type does not implement `RekeyTarget` for any nested collection, or at minimum add a `#[doc(hidden)]` marker trait `AtomicLwwLeaf` that the macro requires and that collection types do not implement, so misuse fails to compile. Alternatively, document this limitation prominently in the macro's rustdoc with a `# Panics` or `# Safety` section.

Found by: security-reviewer

2. merge_chars_from tombstone lookup uses self's map id but other may have a different map id in practice

File: crates/storage/src/collections/rga.rs (line 453-519) | Consensus: 1/3 agents

The comment at line 480–495 argues that the tombstone lookup is correct because both replicas share the same chars map id (derived from the field name / parent). This is true when the RGA is constructed via new_with_field_name and the deterministic-id machinery runs. However, merge_chars_from is pub(crate) and called from Mergeable::merge in crdt_impls.rs. If two replicas ever end up with chars maps under different ids (e.g. a replica that was constructed before the D5 re-key fix ran, or a test that constructs RGAs with UnorderedMap::new() directly), then self.chars.entry_id(&key) computes an id under self's map id, but the char in other lives under other's map id. The tombstone check would then look up the wrong entity id — either finding no tombstone (resurrection) or a spurious tombstone (suppressing a genuine add). The fix is sound for the post-D5 world, but there is no runtime assertion that self.chars.id() == other.chars.id() before the merge, so a misconfigured replica silently gets wrong behavior. Consider adding a debug assertion.

Found by: logic-reviewer

3. merge_chars_from uses self.chars.entry_id but other may have a different map id in production paths

File: crates/storage/src/collections/rga.rs (line 453-519) | Consensus: 1/3 agents

The tombstone check at line ~490 calls self.chars.entry_id(&key) to derive the entity id for the tombstone lookup. The doc comment correctly explains that this is intentional — we want to check self's tombstone. However, the correctness argument rests on the assumption that both replicas' chars maps share the same id (deterministic from field name / parent). If two replicas' RGA chars maps have different ids (e.g. due to a pre-D5 random-id bug on one replica), entry_id on self and other would derive different entity ids for the same CharId key, and the tombstone check would look up the wrong entity — potentially allowing resurrection or suppressing valid adds. The fix (D5) prevents new occurrences, but existing data with mismatched ids would silently misbehave. This is a pre-existing data hazard, not introduced by this PR, but the new code path now silently depends on the invariant without asserting it.

Suggested fix:

Consider adding a debug_assert or a comment explicitly noting that if the maps have different ids (pre-D5 data), the tombstone lookup may address the wrong entity and the merge degrades to the old (resurrection-possible) behavior for that data. This makes the invariant dependency explicit for future maintainers.

Found by: patterns-reviewer

4. SHA-256 truncation discards half the entropy; XOR-fold would be simpler and equally collision-resistant for this use-case

File: crates/storage/src/logical_clock.rs (line 237-272) | Consensus: 1/3 agents

The function takes the first 16 bytes of a SHA-256 digest and discards the remaining 16. While SHA-256 is collision-resistant, the doc comment's stated motivation — "no trivial all-zero seed" — is the real driver, and the XOR-fold it replaces was broken for a different reason (it XOR-cancelled symmetric keys to zero). The SHA-256 approach is correct and safe, but the doc comment overstates the problem with XOR-fold: the real bug was that seed[i % 16] ^= key[i] for i in 0..32 XOR-cancels when key[i] == key[i+16], not a general XOR-fold weakness. The fix is sound, but the comment in test_old_xor_fold_collapses_repeated_key_to_zero says "XOR-fold collapses [{k}; 32] to an all-zero seed" — this is only true for the specific i % 16 fold, not XOR-fold in general. This is a documentation accuracy issue, not a correctness bug, but it may mislead future maintainers into thinking any XOR-fold is broken.

Suggested fix:

Clarify the doc comment: the bug was specifically `seed[i % 16] ^= key[i]` for a 32-byte key over a 16-byte seed, which cancels when `key[i] == key[i+16]` for all i. The SHA-256 approach is a valid fix but the comment should not imply XOR-fold is inherently broken.

Found by: patterns-reviewer

💡 Suggestion (3)

1. impl_atomic_lww! tie-break is strict >, making merge non-commutative on equal tie values

File: crates/storage/src/collections/crdt_meta.rs (line 267-325) | Consensus: 1/3 agents

The macro uses if other.$tie > self.$tie { *self = clone(other) }. When two replicas have the same tie-breaker value (e.g. two uploads at the exact same millisecond), the outcome depends on which side is self and which is other — i.e. on merge order. The comment says 'ties keep self, so merge is idempotent and order-independent for distinct tie values' but the qualifier 'for distinct tie values' is the escape hatch: equal ties are NOT order-independent. For a true LWW CRDT, ties must be broken deterministically (e.g. by a secondary key like a node id or a hash of the record). This is a known limitation of timestamp-only LWW and may be acceptable for the use cases here, but the macro doc should explicitly state that equal tie values produce non-deterministic merge outcomes rather than implying idempotency covers all cases.

Found by: logic-reviewer

2. merge_chars_from correctness comment overstates the 'no wrong tombstone' guarantee

File: crates/storage/src/collections/rga.rs (line 453-519) | Consensus: 1/3 agents

The inline comment at line ~490 states: 'If the maps had different ids the chars would be unrelated entities that never converge regardless — so there is no "wrong tombstone" case the real sync path can reach.' This is true for the normal sync path, but merge_chars_from is pub(crate) and called from crdt_impls's Mergeable::merge. If a caller constructs two ReplicatedGrowableArray instances with different chars map ids (e.g. via UnorderedMap::new() without a field name) and calls merge_chars_from, the entry_id lookup will address the wrong entity in self's store — is_deleted will return false for an unrelated entity, and the char will be re-inserted even if self had tombstoned it under a different id. The comment should acknowledge this precondition rather than asserting it is unreachable.

Suggested fix:

Add a debug assertion that `self.chars.id() == other.chars.id()` at the top of `merge_chars_from`, or document the precondition explicitly as a `# Preconditions` section in the rustdoc. This makes the invariant explicit and catches misuse in tests.

Found by: security-reviewer

3. Feature-gated compile-fail test creates a CI/local divergence that is easy to miss

File: crates/storage/tests/compile_fail.rs (line 14-44) | Consensus: 1/3 agents

The mergeable_without_rekeytarget compile-fail test is gated behind #[cfg(feature = "testing")] because the .stderr snapshot differs between feature sets. The comment explains this is because rustc's 'the following other types implement trait' help block is truncated after 8 entries and its membership shifts with the active feature set. This means running cargo test -p calimero-storage in isolation (without --features testing) silently skips the D5 compile-fail guard test. A developer adding a new RekeyTarget impl that shifts the truncation window could break the snapshot without any local signal. The $N others wildcard in the .stderr file partially mitigates this but does not cover membership changes in the first 8 entries.

Suggested fix:

Consider using `trybuild`'s `TRYBUILD=overwrite` workflow in CI to regenerate the snapshot on every PR, or split the snapshot into a feature-independent portion (the error message and note) and a feature-dependent portion (the help list), asserting only the former. Alternatively, use `insta` with a wildcard matcher for the help block.

Found by: security-reviewer


🤖 Generated by AI Code Reviewer | Review ID: review-e7ea84c7

@meroreviewer meroreviewer Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 AI Code Reviewer

Reviewed by 3 agents | Quality score: 52% | Review time: 417.2s


🟡 Warning (7)

1. impl_atomic_lww! can be misapplied to structs that DO nest collections

File: crates/storage/src/collections/crdt_meta.rs (line 267-320) | Consensus: 1/3 agents

The macro doc comment says "The supertrait guard still bites: a HAND-WRITTEN impl Mergeable on a struct that DOES nest a collection cannot use this macro (it would silently never re-key) — that path stays a compile error until the author derives or re-keys explicitly." However, the macro itself emits a no-op RekeyTarget impl unconditionally. If a user calls impl_atomic_lww!(MyStruct, ts) on a struct that contains a Counter or UnorderedMap, the no-op RekeyTarget satisfies the supertrait bound and the nested collection is silently never re-keyed — exactly the D5 footgun the supertrait was meant to prevent. The compile guard does NOT bite here because RekeyTarget IS implemented (as a no-op).

Suggested fix:

Add a `#[doc(hidden)]` marker or a compile-time assertion that the type has no nested collections, or at minimum strengthen the doc comment to warn that this macro is ONLY safe for leaf structs with no CRDT fields. Consider naming it `impl_leaf_lww!` to make the constraint self-documenting.

Found by: patterns-reviewer

2. impl_atomic_lww! macro allows misuse: tie-breaker field not constrained to be monotonic

File: crates/storage/src/collections/crdt_meta.rs (line 267-325) | Consensus: 1/3 agents

The impl_atomic_lww! macro accepts any field name as the tie-breaker ($tie) and requires only that the field implements PartialOrd. Nothing prevents a caller from passing a non-monotonic field (e.g. a random nonce, a boolean, or a field that decreases over time). If the tie-breaker is not monotonic, the merge is not idempotent for distinct tie values: two replicas can flip-flop between states depending on merge order. The macro's doc comment says 'ties keep self, so merge is idempotent and order-independent for distinct tie values' — but this is only true if the tie-breaker is strictly monotonic across writes. A caller who passes a non-monotonic field gets silent data loss with no compile-time or runtime error. The existing uses (uploaded_at) are correct, but the macro is public (#[macro_export]) and the footgun is real for future callers.

Suggested fix:

Add a note to the macro's doc comment explicitly warning that `$tie` MUST be a monotonically increasing field (e.g. a timestamp or version counter) for the merge to be idempotent and order-independent. Consider renaming the parameter to `$monotonic_tie` to make the contract visible at call sites.

Found by: security-reviewer

3. merge_chars_from tombstone lookup uses self's map id but other's char data — correctness depends on shared map id invariant that is not enforced at the call site

File: crates/storage/src/collections/rga.rs (line 453-519) | Consensus: 1/3 agents

The tombstone check in merge_chars_from calls self.chars.entry_id(&key) to derive the entity id, then checks Index::<S>::is_deleted(entry_id). The inline comment correctly explains that this is correct only when both replicas' chars maps share the same id (derived from the same field name / parent). If two replicas' chars maps have different ids (e.g. because the RGA was constructed with UnorderedMap::new() instead of new_with_field_name_and_crdt_type), the entity id derived from self.chars for a given CharKey would differ from the entity id other.chars uses for the same key — meaning the tombstone lookup would check the wrong entity and either falsely resurrect a deleted char or falsely suppress a live one. The method is pub(crate) and the callers in the test suite use new_with_field_name_and_crdt_type, so the invariant holds in practice. However, the method has no assertion or debug-assert that self.chars.id() == other.chars.id(), so a future caller that constructs RGAs differently could silently corrupt state.

Suggested fix:

Add a `debug_assert_eq!(self.chars.id(), other.chars.id(), "merge_chars_from: both RGAs must share the same chars-map id for tombstone lookup to be correct");` at the top of `merge_chars_from`. This catches the invariant violation in test/debug builds at zero cost in release.

Found by: security-reviewer

4. merge_chars_from uses self.chars.entry_id but other may have a different map id — tombstone lookup is only correct when both replicas share the same map id

File: crates/storage/src/collections/rga.rs (line 453-519) | Consensus: 1/3 agents

The tombstone check at line ~490 calls self.chars.entry_id(&key) to derive the entity id, then checks Index::<S>::is_deleted(entry_id). The comment in the code correctly notes that this is only valid when both replicas' chars maps share the same id (deterministic from the field name / parent). However, merge_chars_from is a pub(crate) method with a generic S2 adaptor — it can be called with an other whose chars map has a different id (e.g. if the RGA was constructed with UnorderedMap::new() instead of new_with_field_name_and_crdt_type, giving it a random id). In that case other.chars.entries() yields keys whose entity ids under other's map differ from those under self's map, so self.chars.entry_id(&key) computes the wrong entity id and the tombstone lookup silently returns false (not deleted), causing the char to be re-inserted — the exact resurrection bug D2 was meant to fix. The fix should either assert/verify that both maps share the same id, or document that callers are responsible for ensuring id alignment. The tombstone_merge_tests do enforce this via rga_in with a shared field name, but the method itself has no guard.

Suggested fix:

Add a debug_assert or a runtime check: `debug_assert_eq!(self.chars.id(), other.chars.id(), "merge_chars_from requires both RGAs to share the same chars map id");` at the top of `merge_chars_from`. Alternatively, document the precondition prominently in the method's doc comment and make the method `unsafe` or rename it to signal the invariant.

Found by: logic-reviewer

5. test_rga_insert_after_observing_remote_is_causally_ordered asserts on b_delta.hlc but commit_causal_delta returns the root hash's HLC, not the minted CharId's HLC

File: crates/storage/src/tests/rga.rs (line 963-1330) | Consensus: 1/3 agents

The test at line ~1200 captures b_delta.hlc and asserts it is strictly greater than a_hlc. The comment says 'b_delta.hlc is the timestamp embedded in the CharId B just minted'. However, commit_causal_delta returns a delta whose hlc field is the HLC at commit time (the root-level delta HLC), not necessarily the HLC embedded in the specific CharId minted by b.content.insert(1, 'B'). If the insert and the commit happen in the same logical tick, they will match; but if commit_causal_delta re-reads the clock (which may have advanced further), the assertion proves only that the commit HLC is after a_hlc, not that the CharId itself is. The test comment acknowledges this gap ('A test that only read hlc_timestamp() could pass for an unrelated reason') but the fix — asserting on b_delta.hlc — has the same gap if commit_causal_delta's HLC is not the CharId's HLC. To close this, the test should extract the actual CharId from the inserted char and compare its timestamp directly.

Found by: logic-reviewer

6. SHA-256 truncation to 16 bytes reduces collision resistance for HLC seeding

File: crates/storage/src/logical_clock.rs (line 237-272) | Consensus: 1/3 agents

The function hlc_seed_from_executor_id takes the first 16 bytes of a SHA-256 digest as the HLC instance seed. While SHA-256 is collision-resistant over its full 256-bit output, truncating to 128 bits means the birthday bound for collisions is 2^64 operations — not 2^128. For a system with many nodes (executor keys), this is acceptable in practice, but the doc comment claims 'cryptographic probability' of distinctness without acknowledging the truncation halves the security margin. More importantly, the doc comment says 'No trivial all-zero seed' but the zero→1 guard in LogicalClock::new is described as a 'final safety net' — if the guard fires for a real key (probability ~2^-128), two executors whose SHA-256 prefixes both happen to be all-zero would get the same HLC id (both guarded to 1). This is astronomically unlikely but the comment should not claim the guard 'only ever fires for genuine input that happens to hash to a zero prefix — which no real key does' since that is not provable. The real concern is correctness: the guard maps all-zero seeds to id=1, so two distinct executors whose SHA-256 prefixes are both all-zero would collide at id=1. The fix is correct and a major improvement over XOR-fold; the documentation overstates the guarantee.

Suggested fix:

Amend the doc comment to note that the 16-byte truncation gives 128-bit collision resistance (birthday bound 2^64), and that the zero→1 guard is a safety net that could in principle map two distinct keys to the same id (with probability ~2^-128). This is not a code change but a documentation accuracy issue.

Found by: security-reviewer

7. D1 HLC advance happens before deserialization — a malformed delta payload skips the clock advance entirely

File: crates/storage/src/collections/root.rs (line 207-229) | Consensus: 1/3 agents

The D1 fix pattern is: (1) match on &artifact to extract delta_hlc, (2) call env::update_hlc, (3) proceed to apply. However, artifact is the already-deserialized StorageDelta — the deserialization happens before this block. If deserialization fails (malformed bytes), the function returns an error before reaching the HLC advance, which is correct. But the ordering means the HLC is advanced even for a delta whose actions will later fail to apply (e.g. a valid envelope with corrupt action payloads). This is a minor issue: advancing the clock for a rejected delta is safe (the clock only moves forward), but it means a malicious peer can advance the local clock up to the 5s drift limit by sending a valid envelope with a garbage body. This is noted as 'drift-rejected timestamp is logged, not fatal' but the 5s window is still exploitable for clock skew. This is a known trade-off in HLC designs, but worth flagging.

Found by: logic-reviewer

💡 Suggestion (6)

1. merge_chars_from is pub(crate) but the tombstone-check comment assumes shared map ids — worth asserting or documenting the precondition

File: crates/storage/src/collections/rga.rs (line 453-519) | Consensus: 1/3 agents

The correctness comment at line 480–497 explains that self.chars.entry_id(&key) is correct because both replicas share the same map id (deterministic from field name/parent). This is a non-trivial precondition: if two RGA instances were constructed with different field names or parent ids, entry_id would compute a different entity id than the one other holds, and the tombstone lookup would be against the wrong entity — silently passing (no tombstone found) and resurrecting the char. The method is pub(crate) so callers are limited, but a debug assertion or a comment at the call site in crdt_impls.rs would make the precondition explicit.

Suggested fix:

Add `debug_assert_eq!(self.chars.id(), other.chars.id(), "merge_chars_from requires both RGAs to share the same chars-map id");` at the top of `merge_chars_from`, or document the precondition in the method signature.

Found by: patterns-reviewer

2. HLC update on receive path skips update on drift rejection but continues applying actions — potential causal ordering violation

File: crates/storage/src/collections/root.rs (line 207-229) | Consensus: 1/3 agents

When update_hlc returns Err(HlcDriftError) (remote timestamp >5s ahead), the code logs a warning and continues applying the delta's actions without advancing the local clock. This means any CharIds minted during or after this delta application may still sort before the remote chars, preserving the causal ordering bug D1 was meant to fix — just for the drift case. The PR description says "A drift-rejected remote timestamp (>5s ahead) is logged and skipped, not fatal" which is a deliberate design choice, but the consequence (causal ordering not guaranteed for drift-rejected deltas) is not documented at the call site.

Suggested fix:

Add a comment at the drift-rejection branch explaining that causal ordering is not guaranteed when a remote delta is drift-rejected, and that this is an accepted trade-off (the alternative — rejecting the delta entirely — would cause data loss for legitimate clock-skewed nodes).

Found by: patterns-reviewer

3. D1 test test_rga_insert_after_observing_remote_is_causally_ordered asserts on b_delta.hlc but commit_causal_delta returns the delta's own HLC, not the minted CharId's HLC

File: crates/storage/src/tests/rga.rs (line 963-1330) | Consensus: 1/3 agents

The test asserts b_minted_hlc.get_time().as_u64() > a_hlc.get_time().as_u64() where b_minted_hlc = b_delta.hlc. The comment says this proves the mint advanced past A's HLC. However, commit_causal_delta returns the delta's HLC which is the clock state at commit time, not necessarily the HLC embedded in the specific CharId minted by insert. If insert mints the CharId from the clock BEFORE commit_causal_delta advances it further, the delta HLC could be >= the CharId's HLC but the CharId itself might still be <= A's HLC. The test would pass without actually proving the CharId ordering. To be definitive, the test should read the CharId directly from the RGA's char list and compare its embedded timestamp.

Suggested fix:

After `b.content.insert(1, 'B')`, read the last char's `CharId` directly from `b.content` (e.g. via `chars.entries()`) and assert that `char_id.timestamp().get_time().as_u64() > a_hlc.get_time().as_u64()`. This directly proves the minted CharId is causally after A's HLC.

Found by: patterns-reviewer

4. impl_atomic_lww! macro uses strict > for LWW — ties keep self, making merge order-dependent for equal tie values

File: crates/storage/src/collections/crdt_meta.rs (line 267-325) | Consensus: 1/3 agents

The macro comment says 'Strict > keeps self on ties, so the merge is idempotent and the outcome is independent of which side is self for distinct tie values.' The qualifier 'for distinct tie values' is load-bearing: if two replicas concurrently write the same uploaded_at value with different content, the merge outcome depends on which replica is self — i.e. which replica's merge runs first. This is a standard LWW trade-off, but the macro doc says 'order-independent' without the qualifier, which is misleading. More importantly, the test impl_atomic_lww_is_last_write_wins_by_tie_field only tests distinct tie values; it does not test the equal-tie case. A user who stores two records with the same uploaded_at will get non-deterministic merge results depending on sync order.

Suggested fix:

Either document the equal-tie non-determinism explicitly in the macro doc, or change the tie-break to use a secondary deterministic comparator (e.g. a hash of the full record) when `uploaded_at` values are equal. At minimum, add a test for the equal-tie case that documents the current behavior.

Found by: logic-reviewer

5. Compile-fail test for mergeable_without_rekeytarget is gated behind #[cfg(feature = "testing")] in the test binary, not in the test file itself

File: crates/storage/tests/compile_fail.rs (line 14-44) | Consensus: 1/3 agents

The mergeable_without_rekeytarget compile-fail case is conditionally included via #[cfg(feature = "testing")] inside the compile_fail() test function. This means running cargo test -p calimero-storage (without the testing feature) silently skips the D5 compile-guard regression test. The comment explains the reason (trybuild's .stderr snapshot differs between feature sets), but a developer running the storage crate in isolation will never see this test fail even if the RekeyTarget supertrait bound is accidentally removed. The test is only exercised in CI via workspace-level feature unification.

Suggested fix:

Document in the test file (or a README) that `cargo test -p calimero-storage --features testing` is required to run the full compile-fail suite, and consider adding a CI step that explicitly runs `cargo test -p calimero-storage --features testing --test compile_fail` to make the coverage explicit rather than relying on feature unification.

Found by: security-reviewer

6. SHA-256 truncation discards half the entropy; consider using full 32-byte digest or a dedicated KDF

File: crates/storage/src/logical_clock.rs (line 237-272) | Consensus: 1/3 agents

The function takes the first 16 bytes of a SHA-256 digest and discards the remaining 16. While SHA-256 is collision-resistant, truncating to 16 bytes reduces the security margin to 64-bit collision resistance (birthday bound ~2^32 operations). For a HLC instance-id that must be globally unique across all executor keys, using the full 32-byte digest (or at least a 128-bit UUID derived via a proper KDF like HKDF) would be more robust. The current approach is a significant improvement over the XOR-fold bug, but the truncation is worth documenting as a deliberate trade-off.

Suggested fix:

Either document explicitly why 16 bytes (128-bit) collision resistance is sufficient for the expected number of executors, or use all 32 bytes by XOR-folding the two halves of the SHA-256 digest: `seed[i] = digest[i] ^ digest[i+16]` — this preserves all 256 bits of entropy in the 16-byte output without truncation.

Found by: patterns-reviewer


🤖 Generated by AI Code Reviewer | Review ID: review-b71286fa

@meroreviewer meroreviewer Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 AI Code Reviewer

Reviewed by 3 agents | Quality score: 50% | Review time: 703.1s

🟡 1 warnings. See inline comments.


🤖 Generated by AI Code Reviewer | Review ID: review-79a59d56

Comment thread crates/storage/src/tests/rga.rs
…nup, test coverage

Docs/correctness:
- crdt_meta: retract impl_atomic_lww! false "compile error" misuse claim
  (state the no-collection-fields precondition + monotonicity instead); make
  the Mergeable: RekeyTarget supertrait doc honest (enforces impl existence,
  not registration/cascade correctness).
- logical_clock/env: correct the D4 history — the fix is SHA-256, and the
  XOR-fold was a rejected alternative, never the prior production code (which
  was a first-16-byte prefix copy). Rename the reproducer test accordingly.

Comments:
- Strip ephemeral D1–D5 defect labels from all comments (they rot).
- Trim multi-paragraph rationale essays to load-bearing lines; dedup the
  24x repeated RekeyTarget comment in merge_integration into one note.

Tests:
- rekey_record: add a dedicated #[app::state] cascade convergence test driven
  through the two-replica harness via the macro-generated __calimero_register_rekey
  (mutation-checked: drops to LWW loss without it).
- crdt_meta: feature-insensitive compile guard that Mergeable: RekeyTarget holds.
- rga: characterize that blob merge does not propagate other's delete to a
  char self holds live (delete travels via the DeleteRef path).
- rga tests: Root::sync with a drift-rejected (>5s ahead) HLC still applies
  without advancing the local clock.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@cursor

cursor Bot commented Jun 27, 2026

Copy link
Copy Markdown
Contributor

Bugbot is paused — on-demand spend limit reached

Bugbot uses usage-based billing for this team and has hit its on-demand spend limit.

A team admin can raise the spend limit in the Cursor dashboard, or wait for the next billing cycle to continue.

@meroreviewer meroreviewer Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 AI Code Reviewer

Reviewed by 3 agents | Quality score: 66% | Review time: 342.8s

🟡 3 warnings. See inline comments.


🤖 Generated by AI Code Reviewer | Review ID: review-d91425a8

Comment thread crates/storage/src/collections/rga.rs
Comment thread crates/storage/src/tests/rga.rs
Comment thread crates/storage/src/tests/rga.rs
…tness

# Conflicts:
#	crates/storage/src/collections/crdt_impls.rs
@rtb-12 rtb-12 changed the title fix(storage,sdk): RGA/CRDT correctness — HLC receive-path, tombstone-aware merge, id-seed fold, RekeyTarget compile-guard (D1–D5) fix(storage,sdk): RGA/CRDT correctness — HLC receive-path, tombstone-aware merge, deterministic id-seed, RekeyTarget compile-guard (D1–D5) Jun 27, 2026

@meroreviewer meroreviewer Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 AI Code Reviewer

Reviewed by 3 agents | Quality score: 53% | Review time: 348.7s

🟡 2 warnings, 💡 1 suggestions. See inline comments.


🤖 Generated by AI Code Reviewer | Review ID: review-f30fae67

Comment thread crates/storage/src/tests/rga.rs
Comment thread crates/storage/src/tests/rga.rs
Comment thread crates/storage/src/tests/rga.rs
…ard + test note

- merge_chars_from: debug_assert both replicas' chars maps share a map id
  (the tombstone guard derives entry ids from self's map id, so an unrelated
  map would silently resurrect a deleted char). Adds UnorderedMap::id() accessor.
- interleave test ts_at: note that all pinned timestamps share one HLC instance
  id, so ordering is by tick only and the ID-based tiebreaker is not exercised.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@meroreviewer meroreviewer Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 AI Code Reviewer

Reviewed by 3 agents | Quality score: 59% | Review time: 320.8s

🟡 2 warnings. See inline comments.


🤖 Generated by AI Code Reviewer | Review ID: review-fdf81e43

&mut self,
other: &Self,
) -> ::core::result::Result<(), $crate::collections::crdt_meta::MergeError> {
// Last-write-wins by the monotonic tie-breaker. Strict `>` keeps

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 impl_atomic_lww! silently produces a broken RekeyTarget when applied to a struct with nested collections

The macro's doc comment warns 'MUST only be used on a struct with NO collection fields' and 'The macro cannot check this', but the emitted RekeyTarget is an unconditional no-op regardless. If a caller applies impl_atomic_lww!(MyStruct, ts) to a struct that contains an UnorderedMap or Counter, the nested collection will keep a per-replica random id and diverge permanently — exactly the #2577 data loss the PR is trying to prevent. The macro provides no compile-time or runtime guard against this misuse, so the footgun it was meant to close (hand-written Mergeable without RekeyTarget) is replaced by a new footgun (hand-written impl_atomic_lww! on the wrong type).

Suggested fix:

Add a `register_nested_value_types` override that calls `register_rekey_if_supported!` on the type's fields, or at minimum add a `#[doc(hidden)]` compile-time marker (e.g. a const assertion or a sealed trait) that prevents the macro from being applied to types that implement `RekeyTarget` non-trivially. At minimum, the doc comment should be a `compile_error!`-backed restriction rather than a prose warning.

}

type S = MainStorage;
let root_hash = || {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 HLC causality test is non-discriminating when A's wall-clock is naturally ahead of B's

The test test_rga_insert_after_observing_remote_is_causally_ordered sets A's advertised HLC to now + 2s and then asserts that B's minted CharId HLC is strictly greater than A's advertised HLC. The comment acknowledges this is meant to be discriminating, but the test still passes trivially if Root::sync does NOT advance B's clock, as long as B's wall clock happens to be more than 2s ahead of A's wall clock at the time the test runs (e.g. on a slow CI machine where the two SystemTime::now() calls are separated by >2s). The test is serial but not isolated from wall-clock drift. A truly discriminating test would freeze the clock (use a mock time source) so B's natural wall-clock cannot exceed A's pinned future HLC without the receive-path advance.

Suggested fix:

Use the existing `env::reset_for_testing()` + a mock/pinned time source (if available) to freeze B's clock at a value strictly below A's advertised HLC before calling `Root::sync`, then assert the minted CharId is above A's HLC. This removes the wall-clock race condition.

The per-type "derive generates both Mergeable and RekeyTarget" / "why not
derive" rationale duplicated the impl_atomic_lww! and derive macro docs at
every use site. Cut to a one-line type description (apps: abi_conformance,
blobs, kv-store-with-user-and-frozen-storage, scaffolding-e2e,
state-schema-conformance) and shorten the merge_chars_from debug_assert note.

Also fmt the rekey_record app_state test asserts (was not re-fmt-checked when
added).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@meroreviewer meroreviewer Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 AI Code Reviewer

Reviewed by 3 agents | Quality score: 73% | Review time: 349.5s

🟡 1 warnings. See inline comments.


🤖 Generated by AI Code Reviewer | Review ID: review-75a8c3b8

Comment thread crates/storage/src/collections/rga.rs Outdated
Apply the codebase's short-comment norm to the doc/inline comments this PR
added: keep the load-bearing fact, drop restated macro/derive docs, historical
bug notes, and review-thread rationale (those live in the PR description).

- crdt_meta: impl_atomic_lww! doc 21→~8 (keep the no-collection-fields
  precondition) and the Mergeable supertrait doc 11→7.
- logical_clock: hlc_seed_from_executor_id doc 13→7.
- rga / tests/rga: merge_chars_from doc 12→4; D1 and interleave test essays
  trimmed to the discrimination/ordering fact (~3 lines each).
- compile_fail + mergeable_without_rekeytarget: feature-gating essays → ~5.
- sync_compliance/crdt_merge: SCOPE module header and RGA test doc tightened.
- env (D4 seed), unordered_map (entry_id), rekey_record (UnfixedStats),
  state.rs (generate_struct_rekey): tightened.

Comment-only; fmt clean, storage tests green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@cursor

cursor Bot commented Jun 27, 2026

Copy link
Copy Markdown
Contributor

Bugbot is paused — on-demand spend limit reached

Bugbot uses usage-based billing for this team and has hit its on-demand spend limit.

A team admin can raise the spend limit in the Cursor dashboard, or wait for the next billing cycle to continue.

…eceive, tombstone)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@meroreviewer meroreviewer Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 AI Code Reviewer

Reviewed by 3 agents | Quality score: 57% | Review time: 284.0s


🟡 Warning (6)

1. impl_atomic_lww! cannot enforce its own 'no nested collection' precondition

File: crates/storage/src/collections/crdt_meta.rs (line 275-320) | Consensus: 1/3 agents

The macro's doc comment states it "MUST only be used on a struct with NO collection fields" because the emitted RekeyTarget is an unconditional no-op. However, the macro has no way to enforce this at compile time or runtime. A user who applies impl_atomic_lww!(MyStruct, ts) to a struct containing an UnorderedMap field will silently get the #2577 divergence the PR is trying to prevent — the very footgun D5 is designed to eliminate. The compile guard (D5) only catches hand-written impl Mergeable without RekeyTarget; impl_atomic_lww! satisfies the supertrait bound with a no-op, so the guard passes even for structs with nested collections.

Suggested fix:

Consider adding a `const _: () = assert!(/* some check */);` or a marker trait that the macro requires, or at minimum upgrade the doc comment to a `# Safety` / `# Panics` section and add a clippy lint or test that exercises the macro only on leaf structs. Alternatively, rename the macro to `impl_leaf_lww!` to make the constraint more obvious at the call site.

Found by: security-reviewer

2. debug_assert on shared map id may silently pass in release builds

File: crates/storage/src/collections/rga.rs (line 453-501) | Consensus: 1/3 agents

The merge_chars_from method uses debug_assert_eq! to verify that both RGA instances share the same chars map id before consulting tombstones. In release builds this assertion is compiled out, so a cross-store merge with mismatched map ids would silently proceed: Index::is_deleted(entry_id) would query tombstones derived from self's map id against an entry id derived from other's map id, potentially misidentifying live chars as tombstoned (or vice versa) and corrupting the array. The comment acknowledges this risk but relies on the caller always satisfying the precondition — which is not enforced at the type level.

Suggested fix:

Either promote to a hard `assert_eq!` (cheap: called only on the full-state merge path, not per-op), or restructure so the tombstone lookup uses the entry id derived from `self.chars.entry_id(&key)` (which is already done on line ~487) and document that the precondition is guaranteed by the sync path always sharing map ids. The current code already derives `entry_id` from `self.chars`, so the assertion is only needed to catch bugs in callers — a hard assert is the safer choice.

Found by: security-reviewer

3. merge_chars_from debug_assert on map id equality is the only guard against cross-map resurrection

File: crates/storage/src/collections/rga.rs (line 453-501) | Consensus: 1/3 agents

The tombstone check Index::<S>::is_deleted(entry_id) is correct only when self.chars and other.chars share the same map id (entry ids are derived from the map id). The debug_assert_eq! fires only in debug builds; in release builds a caller passing two RGAs with different map ids would silently skip the tombstone check for every key (because entry_id would address the wrong entity in self's store), potentially resurrecting deleted chars in production. The function is pub(crate) so callers are limited, but the invariant is load-bearing for correctness.

Suggested fix:

Promote to a hard `assert_eq!` (not `debug_assert_eq!`) since the cost is a single comparison and the consequence of a mismatch is data corruption, not a performance regression.

Found by: patterns-reviewer

4. Cross-store tombstone check uses wrong store adaptor

File: crates/storage/src/collections/rga.rs (line 453-501) | Consensus: 1/3 agents

In merge_chars_from<S2>, the tombstone check calls Index::<S>::is_deleted(entry_id) where entry_id is derived from self.chars.entry_id(&key). This is correct — it checks self's store (adaptor S). However, the debug_assert_eq! on map ids (line ~467) asserts that self.chars.id() == other.chars.id(), which is a necessary precondition for the tombstone check to be meaningful. If this assertion is violated in a release build (where debug_assert is compiled out), entry_id is derived from self's map id but the key came from other's entries — if the two maps have different ids, the derived entry_id would be wrong and the tombstone check would silently pass on a garbage id, potentially resurrecting a deleted char. The debug_assert should be a hard assert! or a returned Err since this is a correctness invariant, not a performance hint.

Suggested fix:

Replace `debug_assert_eq!` with a hard `assert_eq!` or return an error: `if self.chars.id() != other.chars.id() { return Err(StoreError::StorageError(...)); }`

Found by: logic-reviewer

5. SHA-256 truncation to 16 bytes reduces collision resistance for HLC instance ids

File: crates/storage/src/logical_clock.rs (line 237-252) | Consensus: 1/3 agents

hlc_seed_from_executor_id takes the first 16 bytes of SHA-256(executor_id) as the HLC instance id. SHA-256 gives 256-bit preimage resistance, but the 16-byte (128-bit) truncation means the birthday bound for HLC id collisions is ~2^64 executor ids — adequate for a small network but worth documenting. More concretely, the HLC id is used as a tiebreaker in CharId ordering; a collision between two live nodes causes their concurrently-minted CharIds to be indistinguishable by id, falling back to whatever secondary ordering exists. The PR description says this is collision-resistant, which is true, but the effective security level is 64-bit birthday, not 128-bit. This is a documentation/awareness issue rather than an immediate bug, but should be noted given the stated goal of preventing silent character loss.

Suggested fix:

Add a comment to `hlc_seed_from_executor_id` noting that the 16-byte truncation gives a 64-bit birthday bound (~2^64 nodes before a collision becomes likely), which is sufficient for the expected network scale. If the HLC library supports a larger id, consider using all 32 bytes of the digest.

Found by: security-reviewer

6. HLC update applied before delta deserialization succeeds — but after the match on artifact type

File: crates/storage/src/collections/root.rs (line 207-227) | Consensus: 1/3 agents

The HLC update is applied by matching on &artifact (a borrow) before the subsequent match artifact { ... } that actually applies the actions. If the action-application step fails and returns an error, the local HLC has already been advanced to the remote delta's timestamp. This means a failed sync still permanently advances the local clock, which could cause future local inserts to have timestamps higher than expected even though the remote state was never applied. While not a data-corruption issue (HLC only moves forward), it violates the principle that a failed operation should leave state unchanged — and could cause causal ordering anomalies if the failed delta is retried with a lower timestamp.

Suggested fix:

Advance the HLC only after the actions are successfully applied, or document explicitly that HLC advancement is intentionally unconditional (fire-and-forget) even on apply failure.

Found by: logic-reviewer

💡 Suggestion (3)

1. HLC advance on receive path only covers CausalActions, not Actions

File: crates/storage/src/collections/root.rs (line 207-227) | Consensus: 1/3 agents

The D1 fix advances the local HLC only for StorageDelta::CausalActions (which carries delta_hlc). The StorageDelta::Actions variant does not carry an HLC, so actions received via that path do not advance the local clock. If a peer sends an Actions delta containing chars stamped with a future HLC (e.g. after the peer itself observed a CausalActions delta and advanced its clock), the local node's next insert could still sort before those chars. The PR description acknowledges Actions don't carry an HLC, but the causal ordering guarantee is then weaker than stated for the Actions path.

Suggested fix:

Document explicitly in the `Root::sync` code that the HLC advance is only possible for `CausalActions` deltas, and that `Actions` deltas are assumed to carry only causally-prior or concurrent chars (i.e. the sender's clock was not advanced by an unobserved remote). If stronger guarantees are needed, consider adding an HLC field to `Actions` as a follow-up.

Found by: security-reviewer

2. End-to-end HLC advance test relies on wall-clock timing and could be flaky under load

File: crates/storage/src/tests/rga.rs (line 963-1356) | Consensus: 1/3 agents

test_sync_advances_local_hlc_to_observe_remote_delta and test_rga_insert_after_observing_remote_is_causally_ordered construct a "remote" HLC that is 2 seconds ahead of std::time::SystemTime::now(). If the test machine's clock jumps, the system is under heavy load causing the test to run slowly, or the HLC drift tolerance is tightened, the 2-second margin could be insufficient or could accidentally exceed the 5-second drift guard. The tests are #[serial] which helps, but the wall-clock dependency is still fragile.

Suggested fix:

Mock the wall clock (the `LogicalClock` already accepts a time source closure) so the test controls both the "local" and "remote" times deterministically, removing the dependency on `SystemTime::now()`.

Found by: patterns-reviewer

3. Interleave test pins all CharIds to the same HLC instance id, making the ID tiebreaker untested

File: crates/storage/src/tests/rga.rs (line 963-1356) | Consensus: 1/3 agents

In test_rga_real_interleave_merge_converges, ts_at(tick) constructs timestamps using *HybridTimestamp::zero().get_id() — the same instance id for every timestamp. The comment acknowledges this: 'All share one HLC instance id, so ordering is by tick only (the ID tiebreaker isn't exercised).' The RGA ordering is Reverse(CharId) where CharId is (HybridTimestamp, seq). If two concurrent inserts happen to have the same tick (e.g., due to clock skew), the ID tiebreaker determines order. The test does not cover this case, so a bug in the ID tiebreaker path would not be caught. This is a test coverage gap, not a code bug.

Suggested fix:

Add a variant of the test where X and Y use the same tick but different HLC instance ids, to exercise the ID tiebreaker and verify deterministic ordering in that case.

Found by: logic-reviewer


🤖 Generated by AI Code Reviewer | Review ID: review-b4b38685

@meroreviewer meroreviewer Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 AI Code Reviewer

Reviewed by 3 agents | Quality score: 73% | Review time: 571.3s

🟡 1 warnings. See inline comments.


🤖 Generated by AI Code Reviewer | Review ID: review-b4c4d353

}

/// End-to-end: B observes A's char (CausalActions delta carrying A's HLC),
/// then inserts locally. The receive path advances B's clock past A's HLC, so

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 D1 test constructs a_hlc from a manually-set future time rather than from the actual delta's HLC

In test_rga_insert_after_observing_remote_is_causally_ordered, the test builds a_hlc by taking a_delta.hlc's ID but replacing the time component with now + 2s (lines ~1090-1097). The assertion at line ~1100 then checks that a_hlc > a_delta.hlc, which is trivially true by construction. The actual CausalActions delta sent to B carries this synthetic a_hlc, not the real HLC from A's insert. This means the test does not prove that Root::sync advances the clock when a real delta arrives — it proves it advances when a hand-crafted future HLC arrives. The test still has value but its discriminating power is weaker than the comment claims.

Suggested fix:

Either (a) use `a_delta.hlc` directly as the `delta_hlc` (and adjust the test to use a wall-clock-frozen environment so A's natural HLC is ahead of B's), or (b) document explicitly in the test that `a_hlc` is a synthetic future timestamp used to force the discriminating condition, and that the real-delta case is covered by the tombstone/interleave tests.

…pile_fail snapshot

CI surfaced two regressions I introduced:

1. The thread-11 `debug_assert!(self.chars.id() == other.chars.id())` in
   `merge_chars_from` was a false invariant — `crdt_impls`'s RGA merge unit tests
   (and `merge_integration`/`linearization`) legitimately merge two
   independently-keyed RGAs in a shared store, where the tombstone check always
   uses `self`'s own id so there is no aliasing. The assert fired in debug (CI),
   failing 5 RGA-merge tests. Removed it (and the now-unused `UnorderedMap::id()`);
   the shared-id property is the real-sync-path condition, not a function precondition.

2. Trimming the `mergeable_without_rekeytarget.rs` header shifted line numbers, so
   its trybuild `.stderr` (pinned at :35) no longer matched. Regenerated (now :19).

Verified: full `cargo test -p calimero-storage --features testing` green
(640 lib + all integration), clippy -D warnings clean, fmt clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@meroreviewer meroreviewer Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 AI Code Reviewer

Reviewed by 3 agents | Quality score: 56% | Review time: 263.3s

🟡 1 warnings, 💡 1 suggestions. See inline comments.


🤖 Generated by AI Code Reviewer | Review ID: review-174f6b19

}

/// End-to-end: B observes A's char (CausalActions delta carrying A's HLC),
/// then inserts locally. The receive path advances B's clock past A's HLC, so

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 End-to-end HLC causality test asserts on b_delta.hlc, not on the minted CharId

The test test_rga_insert_after_observing_remote_is_causally_ordered asserts that b_minted_hlc.get_time().as_u64() > a_hlc.get_time().as_u64(), where b_minted_hlc is the delta's commit HLC (b_delta.hlc). The delta HLC is the clock value at commit time, not necessarily the CharId stamped inside the RGA char. If commit_causal_delta samples the HLC independently of the CharId minted during insert, the assertion proves the commit clock advanced but not that the char's CharId itself sorts after a_hlc. The load-bearing invariant for RGA ordering is the CharId timestamp, not the delta's commit HLC.

Suggested fix:

After the merge, read back the RGA's char list and assert that the `CharId` of B's inserted char (the second char in the linearized order) has a timestamp strictly greater than A's char's `CharId` timestamp. This directly tests the invariant that matters for RGA ordering.

};
}

/// Whole-record last-write-wins `Mergeable` for a leaf struct stored as a

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 impl_atomic_lww! cannot enforce its "no collection fields" precondition

The macro's doc comment states "MUST only be used on a struct with NO collection fields" because the emitted RekeyTarget is an unconditional no-op. However, the macro has no way to enforce this at compile time or runtime. A caller who uses impl_atomic_lww! on a struct that contains an UnorderedMap or Counter will silently get the pre-D5 divergence bug (nested collection id never re-keyed), with no diagnostic. The comment warns about this but the warning is easy to miss.

Suggested fix:

Consider adding a `#[doc(hidden)]` marker or a `compile_error!` triggered by a feature flag, or at minimum elevate the warning in the doc comment to a `# Safety` / `# Panics`-style `# MUST` section that is harder to overlook. Alternatively, rename the macro to `impl_atomic_lww_leaf!` to signal that it is only for leaf (non-collection) types.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants