fix(storage,sdk): RGA/CRDT correctness — HLC receive-path, tombstone-aware merge, deterministic id-seed, RekeyTarget compile-guard (D1–D5)#2950
Conversation
There was a problem hiding this comment.
🤖 MeroReviewer
Reviewed by 1 agents | Quality score: 30% | Review time: 90.5s
🟡 Warning (2)
1. XOR-fold seed can produce all-zero output for certain key pairs
File: crates/storage/src/logical_clock.rs (line 237-265) | Consensus: 1/1 agents ✓
The XOR-fold in hlc_seed_from_executor_id produces an all-zero seed when the low 16 bytes and high 16 bytes of the executor key are identical (e.g., [0xAB; 32]). The comment says "the constructor's zero→1 guard still covers the all-zero input", but this relies on an undocumented invariant in LogicalClock::new that zero seeds are promoted to 1. If that guard is ever removed or the seed is used in another context, two executors with key[i] == key[i+16] for all i would silently share an HLC id. A simple mitigation is to XOR in a non-zero constant (e.g., seed[0] ^= 0x01 after the fold) to guarantee the seed is never all-zero regardless of the constructor's behavior.
Suggested fix:
After the fold loop, add: `if seed == [0u8; 16] { seed[0] = 1; }` — or document the dependency on the constructor guard explicitly with a `// SAFETY:` comment referencing the specific guard in `LogicalClock::new`.
Found by: security-reviewer
2. HLC update on receive path silently skips drift-rejected timestamps without advancing the clock
File: crates/storage/src/collections/root.rs (line 207-229) | Consensus: 1/1 agents ✓
When a remote delta_hlc is rejected by the drift guard (>5s ahead), the code logs a warning and continues applying the delta's actions without advancing the local clock. This is correct for safety, but the comment says "applying without advancing the local clock" — the actions are still applied. This means a node can apply a causally-ordered delta (with a far-future HLC) without its clock observing that timestamp, so subsequent local inserts may still get timestamps that sort before the remote chars. The PR description says drift-rejected timestamps are "not fatal", but the causal ordering guarantee (the whole point of D1) is silently broken for those deltas. This should at minimum be documented as a known limitation, or the drift tolerance should be widened/the behavior reconsidered.
Suggested fix:
Add a comment explicitly stating that causal ordering is not guaranteed for drift-rejected deltas, and consider whether the drift tolerance (5s) is appropriate for the use case. Alternatively, clamp the remote HLC to `local + drift_tolerance` and advance to that instead of skipping entirely.
Found by: security-reviewer
💡 Suggestion (1)
1. Interleave convergence test asserts only two possible orderings but RGA tie-breaking may produce others
File: crates/storage/src/tests/rga.rs (line 963-1249) | Consensus: 1/1 agents ✓
In test_rga_real_interleave_merge_converges, the final assertion checks xy_text == "aXXYYb" || xy_text == "aYYXXb". This is correct for a well-behaved RGA (each run stays contiguous between anchors), but the assertion comment says "higher-CharId run sorts first". If the RGA's Reverse(CharId) ordering is deterministic, only ONE of the two orderings should be possible for a given pair of executor ids ([1;32] vs [2;32]). The disjunction hides whether the ordering is actually deterministic — a future regression that produces non-contiguous interleaving (e.g. "aXYXYb") would still pass if it happens to equal one of the two strings. Consider asserting the exact expected string based on which executor id produces a higher CharId.
Suggested fix:
Determine which executor (`[1;32]` or `[2;32]`) produces a higher HLC id after `hlc_seed_from_executor_id`, and assert the single expected ordering rather than a disjunction. This makes the test a true regression guard for the interleave order.
Found by: security-reviewer
🤖 Generated by MeroReviewer | Review ID: review-8f6d3682
Documentation ReviewThe following documentation may need updates based on the changes in this PR:
|
There was a problem hiding this comment.
🤖 AI Code Reviewer
Reviewed by 3 agents | Quality score: 51% | Review time: 267.4s
🟡 5 warnings, 💡 1 suggestions. See inline comments.
🤖 Generated by AI Code Reviewer | Review ID: review-75342916
…2950) The maintainer flagged that hand-writing `RekeyTarget` for every app (`apps/scaffolding-e2e/src/lib.rs:62`, etc.) "is a bit insane — implement this directly into the storage crate." D5 had added hand-written `RekeyTarget` impls to five app structs because each had a hand-written `impl Mergeable` (and `RekeyTarget` is now its supertrait). Mechanism (storage-crate-provided, ZERO app `RekeyTarget` code): 1. Field-by-field merges → `#[derive(Mergeable)]`. The derive already emits BOTH the field-by-field `Mergeable` AND the matching `RekeyTarget` (re-keying each nested collection field, no-op for leaves). Converting these removes the hand-written `Mergeable` too: - `NestedMap` (scaffolding-e2e, kv-store-with-user-and-frozen-storage) - `MergeableRecord` (abi_conformance) - `Profile` (state-schema-conformance) Each was exactly `self.field.merge(&other.field)` per field — what the derive generates — so behavior is identical. 2. Whole-record atomic LWW → new `calimero_storage::impl_atomic_lww!`. `FileRecord` (blobs, scaffolding-e2e) is an immutable upload record of plain `String`/`u64`/`[u8;32]` fields, replaced atomically by `uploaded_at`. It CANNOT derive Mergeable (the forbidden-type lint rejects bare primitives — correctly). Its `RekeyTarget` was pure noise (a leaf with no nested id). The new macro lives in the storage crate and emits BOTH the LWW `Mergeable` and the matching no-op `RekeyTarget` in one line, so the app writes no `RekeyTarget`. Net: every app now reaches `Mergeable` through a storage-crate-provided path (`#[derive(Mergeable)]` or `impl_atomic_lww!`) and writes ZERO `RekeyTarget`. The supertrait footgun guard is intact: a HAND-WRITTEN `impl Mergeable` on a collection-nesting struct that forgets re-keying still fails to compile (the compile-fail test is unchanged). `apps/team-metrics-custom` keeps its hand-written Mergeable+RekeyTarget deliberately — it is a teaching example (paired with team-metrics-macro's derive path) and is not part of this PR. Macro unit-tested for LWW-by-tie-field semantics (idempotent, order-independent) and the no-op re-key. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Bugbot is paused — on-demand spend limit reachedBugbot uses usage-based billing for this team and has hit its on-demand spend limit. A team admin can raise the spend limit in the Cursor dashboard, or wait for the next billing cycle to continue. |
Review follow-up (D1–D5) — pushed
|
There was a problem hiding this comment.
🤖 AI Code Reviewer
Reviewed by 3 agents | Quality score: 59% | Review time: 297.6s
🟡 2 warnings. See inline comments.
🤖 Generated by AI Code Reviewer | Review ID: review-dfbfcd26
There was a problem hiding this comment.
🤖 AI Code Reviewer
Reviewed by 3 agents | Quality score: 55% | Review time: 332.0s
🟡 1 warnings. See inline comments.
🤖 Generated by AI Code Reviewer | Review ID: review-c5bef692
…2950) The `mergeable_without_rekeytarget` trybuild case captured rustc's "the following other types implement trait `RekeyTarget`" help block, whose first-8 (alphabetical, then `and $N others`) membership shifts with the active feature set: `calimero-storage/testing` pulls `tests::common::EmptyData` into scope, which sorts into the window and displaces `PermissionedStorage<T, A>`. CI runs `cargo test` over the whole workspace, where feature unification (`calimero-dag`/`calimero-node` enable `calimero-storage/testing`) turns `testing` ON for this crate's own test binary too — so the previously blessed default-feature snapshot mismatched ("1 of 7 tests failed"). trybuild has no partial-line wildcard, so a single literal snapshot can't satisfy both feature sets. Fix: bless the snapshot for the `testing`-ON output (CI's reality) and gate the case behind `#[cfg(feature = "testing")]` so it only runs against the snapshot it was blessed for. The other six compile-fail cases are feature-stable and keep running under every feature set. Verified under `cargo test -p calimero-storage --test compile_fail` (default), `--features testing`, and `cargo test --workspace --test compile_fail` (CI-faithful, testing unified ON). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The WASM HLC seeding path seeded the 16-byte instance id from only the first 16 bytes of the 32-byte executor public key (`take(16)`), with a dead `for i in 32..16` fold loop that never ran. Two executors sharing a 16-byte key prefix collapsed to one HLC id, so two concurrently-minted CharIds could collide and silently drop one replica's character during RGA sync. Extract a pure, unit-testable `hlc_seed_from_executor_id` that XOR-folds all 32 bytes into the 16-byte seed and call it from the WASM seeding path. Adds a reproducer documenting the old shared-prefix collision and tests proving the fix yields distinct seeds and distinct HLC ids. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The root-level RGA `Mergeable::merge` (the cold-join / full-state conflict path, `merge_root_state_typed` → `RGA::merge`) re-inserted any char present in `other` but absent from `self`, gated only on `is_none()`. A char that `self` had concurrently deleted is absent from its live set (tombstoned), so the merge re-added it: it un-tombstoned the char entity, dropped it from the parent's `deleted_children` wire advertisement, and re-listed it as a live child — diverging the chars-map Merkle hash from peers that saw the delete. The visible `get_text` masks this (the map iterator silently drops the tombstoned id via `find_by_id`), but the lost tombstone + hash divergence is the real damage on the sync wire. Fix: before re-inserting a char absent from `self`, consult its entity tombstone via `Index::is_deleted`. A tombstone means delete wins (RGA chars are immutable, so the only conflict is presence-vs-tombstone — no update-vs-delete race), so the char is not resurrected. The element-level DeleteRef LWW path is unchanged and still carries the tombstone for convergence. Convergence stays intact: the merge is still commutative/associative/ idempotent over the (live-set, tombstone-set) lattice. Logic extracted to a generic `merge_chars_from<S2>` so it is unit-testable across two isolated `MockedStorage` scopes (two replicas with shared CharIds) — the reproducer fails without the fix (asserts the wire tombstone + hash) and passes with it. Adds an `entry_id` accessor on `UnorderedMap`. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`env::update_hlc` / `LogicalClock::update` had ZERO callers on the
receive/apply path: a node observed a remote delta's HLC only as a u64
`updated_at` nanos copy in metadata, never feeding the full HLC back into its
own clock. Because RGA stamps `CharId`s from the local clock and orders
siblings by `Reverse(CharId)` (high bits = wall-clock time), a node that
observed a remote insert could then stamp its NEXT local insert with a
timestamp ≤ the remote char it causally follows — converging, but with a
causally-wrong interleave.
Fix: `Root::sync` now feeds a `CausalActions` delta's `delta_hlc` into
`env::update_hlc` BEFORE applying its actions, so the local clock's
happens-before guarantee holds across the receive boundary. A drift-rejected
remote timestamp (>5s ahead) is logged and skipped, not fatal — the actions
are still valid replicated state. `Actions` (host-side replay) carries no
delta HLC, so there is nothing to observe.
Reproducers:
- `test_sync_advances_local_hlc_to_observe_remote_delta`: applying a
CausalActions delta with a high delta_hlc must make the next local
timestamp sort strictly after it (fails without the fix).
- `test_rga_insert_after_observing_remote_is_causally_ordered`: B observes
A's char (carrying A's HLC) with its own clock behind, inserts locally, and
its CharId + rendered order are causal ("AB").
The clock's `update` rule itself was already correct (b9ffc90); this wires
it into the path that was missing.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… scope (D3) `sync_compliance/crdt_merge.rs::test_i5_rga_interleave_merge` inserted opaque non-RGA blobs, forced HashComparison (which runs NO merge in the sync_sim harness), and asserted only `crdt_type == Rga` — yet a comment claimed the merge was "verified by the actual sync execution". - Document the harness's true scope (metadata + protocol selection, not value-level merge) in the module header and correct the misleading comments. - Rename `test_i5_rga_interleave_merge` → `test_i5_rga_crdt_type_and_protocol_selection` to reflect what it asserts. - Add `test_rga_real_interleave_merge_converges` in calimero-storage: two replicas concurrently insert distinct runs at the same gap, exchange deltas in both orders, and must converge to identical text/hash with each run kept contiguous between the base anchors. Also tightens the D1/D2 doc comments per review. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… nested-collection re-key (D5) Make `RekeyTarget` a supertrait of `Mergeable` (`crates/storage/src/collections/crdt_meta.rs:88`) so the nested-collection footgun behind #2577 (the data loss that removed RGA from mero-drive) becomes a COMPILE error instead of silent divergence: a struct that nests a CRDT collection but never registers a deterministic-id re-key thunk would otherwise mint a per-replica `Id::random()` for that nested collection, so its children land under different parents and never converge — with no runtime error. `#[derive(Mergeable)]` already emits the matching `RekeyTarget` impl (PRs #2577/#2581), so derive sites are unaffected. The missing keystone was the `#[app::state]` macro: it generates a `Mergeable` impl but not a `RekeyTarget` one, so every `#[app::state]` struct in the workspace failed the new supertrait bound. Generate the `RekeyTarget` impl there too (`crates/sdk/macros/src/state.rs`), mirroring the derive: a per-field `rekey_relative_to` cascade (`rekey_field_if_supported!` under a field-namespaced child id) and a `register_nested_value_types` reusing the existing `rekey_register_calls` scan. State enums get no impl (they get no `Mergeable` impl either). This keeps all `#[app::state]` and `#[derive(Mergeable)]` sites compiling automatically; only a HAND-WRITTEN `impl Mergeable` that omits `RekeyTarget` now fails to compile — the intended guard. Add the matching hand-written `RekeyTarget` impls the supertrait surfaces across the workspace: real collection types (the prior D5 work in access_control/authored_map/authored_vector/crdt_impls/frozen/ frozen_value/permissioned/shared/user/merge), in-crate test fixtures (tests/common.rs, src/tests/merge_integration.rs's 24 hand-written fixtures, tests/converge.rs, tests/rekey_record.rs, tests/merge_registry_*), and app crates with hand-rolled merges (abi_conformance, blobs, scaffolding-e2e, state-schema-conformance, kv-store-with-user-and-frozen-storage). Leaf structs (no nested collection — e.g. blobs::FileRecord, atomic LWW records) take the no-op default; nesting structs re-key each collection field. In `tests/rekey_record.rs`, `UnfixedStats` (the pre-fix data-loss fixture) now must impl `RekeyTarget` to be `Mergeable` at all — "Mergeable without RekeyTarget" is no longer expressible, which is the point of D5. The runtime data-loss case is preserved by NOT registering its thunk (dispatch is by registry lookup, not the trait bound), so `unregistered_value_loses_data_pre_fix` still demonstrates the LWW loss. Regenerate two `#[app::state]` compile-fail snapshots (`crates/sdk/tests/macros/{invalid_nested_generics,error_bad_authorizer}.stderr`) whose diagnostics now additionally mention the `RekeyTarget` supertrait chain — expected and additive for those already-invalid inputs. Design note (enum/generic Mergeable): `#[derive(Mergeable)]` already rejects enums/unions; `#[app::state]` enums emit no Mergeable/RekeyTarget; generic `#[app::state]<T>` was already rejected ("generic types are not supported") and now also surfaces the `T: RekeyTarget`/`'static` supertrait requirement — no new generic Mergeable surface to handle. Validation: `cargo build --workspace --tests`, `cargo test -p calimero-storage` (626+ unit, all integration), node `sync_sim` (314), `calimero-runtime`, `calimero-sdk`, `calimero-sdk-macros`, and `cargo clippy --workspace --all-targets --features calimero-storage/testing -- -D warnings` all green; `cargo fmt --check` clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…keyTarget (D5) Add a trybuild compile-fail case proving the #D5 supertrait guard fires: a struct with a HAND-WRITTEN `impl Mergeable` that nests a `Counter` but does NOT impl `RekeyTarget` must fail to compile (E0277, "the trait bound `BadStruct: RekeyTarget` is not satisfied", required by the `Mergeable` supertrait bound) instead of silently last-writer-wins'ing the nested collection (the #2577 data loss). Follows the existing `tests/compile_fail/*.rs` + `.stderr` pattern. The captured `.stderr` is for the default feature set (what CI's `cargo test` runs); the truncated "other types implement RekeyTarget" help list varies with features, noted in the test header. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The D4 fix XOR-folded the high 16 bytes of the 32-byte executor id onto the low 16 (`seed[i % 16] ^= key[i]`). Any key with `key[i] == key[i + 16]` for all i — most starkly `[k; 32]` for any byte k — XOR-cancels to an ALL-ZERO seed, so every such executor collapses to the same (zero→1-guarded) HLC id. Two distinct executors then share one HLC id space, two concurrently-minted CharIds collide, and one replica's character is silently lost during RGA sync. The zero→1 guard only covered a pre-fold all-zero input, not this fold-induced family. Replace the fold with SHA-256 over the full 32-byte executor id, taking the first 16 bytes of the digest as the seed (sha2 is already a crate dep, used by compute_id and the index hashing). This is collision-resistant — distinct keys yield distinct digests — and has no structural input that collapses to all-zero (an all-zero 16-byte prefix would need an infeasible preimage). The constructor's zero→1 guard stays as a final safety net. Tests: replace the stale `take(16)` reproduction with one that demonstrates the XOR-fold's `[k; 32]`→all-zero collapse AND the resulting collision between distinct repeated keys; assert SHA-256 seeds (and production HLC ids) are distinct across an adversarial key set including the `[k; 32]` family, shared-prefix keys, and all-zero; assert no repeated-byte key seeds to all-zero. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Reviewer raised whether `merge_chars_from`'s tombstone check addresses the right entity: `self.chars.entry_id(&key)` then `Index::<S>::is_deleted`, given `self` (adaptor `S`) and `other` (adaptor `S2`) may have different `chars` map ids. Verified correct, no code-logic change: - `entry_id` derives `compute_id(self.chars.id(), key.bytes)` — the id is taken from `self`'s OWN map, so we address the tombstone in `self`'s store: exactly the "did self delete this char?" question. `S2` is used only to read `other.chars.entries()`; it never enters the lookup. - `Index::<S>::is_deleted` reads `self`'s store at that id — same map, same store. - `key.bytes` is `borsh(CharId)`, independent of any map id, so the same CharId derives the same entity id under a given map. Two replicas converge an RGA only when their chars maps share an id; under that shared id a char tombstoned on self resolves to the same entity other holds live. Different ids ⇒ unrelated entities that never converge anyway, so the real sync path has no "wrong tombstone" case. Expanded the in-code comment to spell this out, and added a mutation-checked test (`merge_distinguishes_tombstoned_from_unseen_in_one_pass`) that, in ONE merge across two isolated MockedStorage scopes, asserts a tombstoned char stays deleted (index-level: `is_deleted` + `deleted_children`) while a never-seen char is still added (add-wins) — so the guard can't be satisfied by suppressing all inserts. Bypassing the guard fails this test (confirmed), as it does the existing `blob_merge_does_not_resurrect_*`. Asserts at the index/wire level, not `get_text`, since the map iterator filters tombstoned ids and would mask a resurrection. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…al (D1) `test_rga_insert_after_observing_remote_is_causally_ordered` asserted on a `hlc_timestamp()` READ and used A's real-clock HLC, so B's own wall-clock mint was naturally already ahead of A's HLC — the test passed even with the receive-path advance removed (verified: bypassing `Root::sync`'s `update_hlc` did NOT fail the old assertion). Strengthen on both axes: - Set A's advertised `delta_hlc` ~2s AHEAD of the real wall clock (within the 5s drift tolerance, so `update_hlc` accepts it). B mints from the real clock, so it is naturally BEHIND A's HLC; B's mint can only end up after A's HLC if the receive path actually advanced B's clock. - Assert on the timestamp B MINTED for its insert (read from B's captured insert delta's `hlc`), not a `hlc_timestamp()` read — proving the mint itself moved past A's observed HLC. Mutation-checked: with `Root::sync`'s receive-path `update_hlc` disabled, B's minted CharId now sorts BEFORE A's HLC and the test fails; with it enabled the mint sorts strictly after and the rendered order stays "AB". Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`test_rga_real_interleave_merge_converges` asserted `xy_text == "aXXYYb" || "aYYXXb"`. With the real node-local HLC the order is decided by which replica's wall clock ticks later — incidentally stable but timing-dependent, and one `||` branch never runs. Pin the timestamps so the order is a pure function of the CharIds, as the reviewer asked: base "ab" at a low tick, X's "XX" at a higher tick, Y's "YY" at a strictly-higher tick still (`insert_str_at_timestamp`). Both inserts sit above the base tick so they land in the a|b gap (not after the higher-CharId 'b'); Y's CharIds are strictly higher than X's, so under `Reverse(CharId)` Y's block sorts first. Assert the exact "aYYXXb". The commutativity assertion (X-then-Y vs Y-then-X converge to identical text+hash) is unchanged and still the primary property. Verified deterministic across repeated runs. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…2950) The maintainer flagged that hand-writing `RekeyTarget` for every app (`apps/scaffolding-e2e/src/lib.rs:62`, etc.) "is a bit insane — implement this directly into the storage crate." D5 had added hand-written `RekeyTarget` impls to five app structs because each had a hand-written `impl Mergeable` (and `RekeyTarget` is now its supertrait). Mechanism (storage-crate-provided, ZERO app `RekeyTarget` code): 1. Field-by-field merges → `#[derive(Mergeable)]`. The derive already emits BOTH the field-by-field `Mergeable` AND the matching `RekeyTarget` (re-keying each nested collection field, no-op for leaves). Converting these removes the hand-written `Mergeable` too: - `NestedMap` (scaffolding-e2e, kv-store-with-user-and-frozen-storage) - `MergeableRecord` (abi_conformance) - `Profile` (state-schema-conformance) Each was exactly `self.field.merge(&other.field)` per field — what the derive generates — so behavior is identical. 2. Whole-record atomic LWW → new `calimero_storage::impl_atomic_lww!`. `FileRecord` (blobs, scaffolding-e2e) is an immutable upload record of plain `String`/`u64`/`[u8;32]` fields, replaced atomically by `uploaded_at`. It CANNOT derive Mergeable (the forbidden-type lint rejects bare primitives — correctly). Its `RekeyTarget` was pure noise (a leaf with no nested id). The new macro lives in the storage crate and emits BOTH the LWW `Mergeable` and the matching no-op `RekeyTarget` in one line, so the app writes no `RekeyTarget`. Net: every app now reaches `Mergeable` through a storage-crate-provided path (`#[derive(Mergeable)]` or `impl_atomic_lww!`) and writes ZERO `RekeyTarget`. The supertrait footgun guard is intact: a HAND-WRITTEN `impl Mergeable` on a collection-nesting struct that forgets re-keying still fails to compile (the compile-fail test is unchanged). `apps/team-metrics-custom` keeps its hand-written Mergeable+RekeyTarget deliberately — it is a teaching example (paired with team-metrics-macro's derive path) and is not part of this PR. Macro unit-tested for LWW-by-tie-field semantics (idempotent, order-independent) and the no-op re-key. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…2950) The `mergeable_without_rekeytarget` trybuild case captured rustc's "the following other types implement trait `RekeyTarget`" help block, whose first-8 (alphabetical, then `and $N others`) membership shifts with the active feature set: `calimero-storage/testing` pulls `tests::common::EmptyData` into scope, which sorts into the window and displaces `PermissionedStorage<T, A>`. CI runs `cargo test` over the whole workspace, where feature unification (`calimero-dag`/`calimero-node` enable `calimero-storage/testing`) turns `testing` ON for this crate's own test binary too — so the previously blessed default-feature snapshot mismatched ("1 of 7 tests failed"). trybuild has no partial-line wildcard, so a single literal snapshot can't satisfy both feature sets. Fix: bless the snapshot for the `testing`-ON output (CI's reality) and gate the case behind `#[cfg(feature = "testing")]` so it only runs against the snapshot it was blessed for. The other six compile-fail cases are feature-stable and keep running under every feature set. Verified under `cargo test -p calimero-storage --test compile_fail` (default), `--features testing`, and `cargo test --workspace --test compile_fail` (CI-faithful, testing unified ON). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
b7c83bf to
26b292e
Compare
Bugbot is paused — on-demand spend limit reachedBugbot uses usage-based billing for this team and has hit its on-demand spend limit. A team admin can raise the spend limit in the Cursor dashboard, or wait for the next billing cycle to continue. |
There was a problem hiding this comment.
🤖 AI Code Reviewer
Reviewed by 3 agents | Quality score: 68% | Review time: 230.2s
🟡 Warning (4)
1. impl_atomic_lww! macro can be misused on structs containing nested collections
File: crates/storage/src/collections/crdt_meta.rs (line 267-325) | Consensus: 1/3 agents
The impl_atomic_lww! macro emits a no-op RekeyTarget impl unconditionally. The doc comment warns 'a HAND-WRITTEN impl Mergeable on a struct that DOES nest a collection cannot use this macro (it would silently never re-key) — that path stays a compile error until the author derives or re-keys explicitly.' However, the macro itself does NOT enforce this: it will happily compile for a struct that contains a nested collection (e.g. UnorderedMap), silently emitting a no-op rekey_relative_to that never re-keys the nested collection. The D5 supertrait guard only catches missing RekeyTarget impls, not incorrect (no-op) ones. A developer who uses impl_atomic_lww!(MyStruct, ts) on a struct with a nested collection will get silent data loss — the exact footgun D5 was designed to prevent — with no compile error.
Suggested fix:
Add a compile-time check inside the macro (e.g. via a marker trait or a `const` assertion) that the type does not implement `RekeyTarget` for any nested collection, or at minimum add a `#[doc(hidden)]` marker trait `AtomicLwwLeaf` that the macro requires and that collection types do not implement, so misuse fails to compile. Alternatively, document this limitation prominently in the macro's rustdoc with a `# Panics` or `# Safety` section.
Found by: security-reviewer
2. merge_chars_from tombstone lookup uses self's map id but other may have a different map id in practice
File: crates/storage/src/collections/rga.rs (line 453-519) | Consensus: 1/3 agents
The comment at line 480–495 argues that the tombstone lookup is correct because both replicas share the same chars map id (derived from the field name / parent). This is true when the RGA is constructed via new_with_field_name and the deterministic-id machinery runs. However, merge_chars_from is pub(crate) and called from Mergeable::merge in crdt_impls.rs. If two replicas ever end up with chars maps under different ids (e.g. a replica that was constructed before the D5 re-key fix ran, or a test that constructs RGAs with UnorderedMap::new() directly), then self.chars.entry_id(&key) computes an id under self's map id, but the char in other lives under other's map id. The tombstone check would then look up the wrong entity id — either finding no tombstone (resurrection) or a spurious tombstone (suppressing a genuine add). The fix is sound for the post-D5 world, but there is no runtime assertion that self.chars.id() == other.chars.id() before the merge, so a misconfigured replica silently gets wrong behavior. Consider adding a debug assertion.
Found by: logic-reviewer
3. merge_chars_from uses self.chars.entry_id but other may have a different map id in production paths
File: crates/storage/src/collections/rga.rs (line 453-519) | Consensus: 1/3 agents
The tombstone check at line ~490 calls self.chars.entry_id(&key) to derive the entity id for the tombstone lookup. The doc comment correctly explains that this is intentional — we want to check self's tombstone. However, the correctness argument rests on the assumption that both replicas' chars maps share the same id (deterministic from field name / parent). If two replicas' RGA chars maps have different ids (e.g. due to a pre-D5 random-id bug on one replica), entry_id on self and other would derive different entity ids for the same CharId key, and the tombstone check would look up the wrong entity — potentially allowing resurrection or suppressing valid adds. The fix (D5) prevents new occurrences, but existing data with mismatched ids would silently misbehave. This is a pre-existing data hazard, not introduced by this PR, but the new code path now silently depends on the invariant without asserting it.
Suggested fix:
Consider adding a debug_assert or a comment explicitly noting that if the maps have different ids (pre-D5 data), the tombstone lookup may address the wrong entity and the merge degrades to the old (resurrection-possible) behavior for that data. This makes the invariant dependency explicit for future maintainers.
Found by: patterns-reviewer
4. SHA-256 truncation discards half the entropy; XOR-fold would be simpler and equally collision-resistant for this use-case
File: crates/storage/src/logical_clock.rs (line 237-272) | Consensus: 1/3 agents
The function takes the first 16 bytes of a SHA-256 digest and discards the remaining 16. While SHA-256 is collision-resistant, the doc comment's stated motivation — "no trivial all-zero seed" — is the real driver, and the XOR-fold it replaces was broken for a different reason (it XOR-cancelled symmetric keys to zero). The SHA-256 approach is correct and safe, but the doc comment overstates the problem with XOR-fold: the real bug was that seed[i % 16] ^= key[i] for i in 0..32 XOR-cancels when key[i] == key[i+16], not a general XOR-fold weakness. The fix is sound, but the comment in test_old_xor_fold_collapses_repeated_key_to_zero says "XOR-fold collapses [{k}; 32] to an all-zero seed" — this is only true for the specific i % 16 fold, not XOR-fold in general. This is a documentation accuracy issue, not a correctness bug, but it may mislead future maintainers into thinking any XOR-fold is broken.
Suggested fix:
Clarify the doc comment: the bug was specifically `seed[i % 16] ^= key[i]` for a 32-byte key over a 16-byte seed, which cancels when `key[i] == key[i+16]` for all i. The SHA-256 approach is a valid fix but the comment should not imply XOR-fold is inherently broken.
Found by: patterns-reviewer
💡 Suggestion (3)
1. impl_atomic_lww! tie-break is strict >, making merge non-commutative on equal tie values
File: crates/storage/src/collections/crdt_meta.rs (line 267-325) | Consensus: 1/3 agents
The macro uses if other.$tie > self.$tie { *self = clone(other) }. When two replicas have the same tie-breaker value (e.g. two uploads at the exact same millisecond), the outcome depends on which side is self and which is other — i.e. on merge order. The comment says 'ties keep self, so merge is idempotent and order-independent for distinct tie values' but the qualifier 'for distinct tie values' is the escape hatch: equal ties are NOT order-independent. For a true LWW CRDT, ties must be broken deterministically (e.g. by a secondary key like a node id or a hash of the record). This is a known limitation of timestamp-only LWW and may be acceptable for the use cases here, but the macro doc should explicitly state that equal tie values produce non-deterministic merge outcomes rather than implying idempotency covers all cases.
Found by: logic-reviewer
2. merge_chars_from correctness comment overstates the 'no wrong tombstone' guarantee
File: crates/storage/src/collections/rga.rs (line 453-519) | Consensus: 1/3 agents
The inline comment at line ~490 states: 'If the maps had different ids the chars would be unrelated entities that never converge regardless — so there is no "wrong tombstone" case the real sync path can reach.' This is true for the normal sync path, but merge_chars_from is pub(crate) and called from crdt_impls's Mergeable::merge. If a caller constructs two ReplicatedGrowableArray instances with different chars map ids (e.g. via UnorderedMap::new() without a field name) and calls merge_chars_from, the entry_id lookup will address the wrong entity in self's store — is_deleted will return false for an unrelated entity, and the char will be re-inserted even if self had tombstoned it under a different id. The comment should acknowledge this precondition rather than asserting it is unreachable.
Suggested fix:
Add a debug assertion that `self.chars.id() == other.chars.id()` at the top of `merge_chars_from`, or document the precondition explicitly as a `# Preconditions` section in the rustdoc. This makes the invariant explicit and catches misuse in tests.
Found by: security-reviewer
3. Feature-gated compile-fail test creates a CI/local divergence that is easy to miss
File: crates/storage/tests/compile_fail.rs (line 14-44) | Consensus: 1/3 agents
The mergeable_without_rekeytarget compile-fail test is gated behind #[cfg(feature = "testing")] because the .stderr snapshot differs between feature sets. The comment explains this is because rustc's 'the following other types implement trait' help block is truncated after 8 entries and its membership shifts with the active feature set. This means running cargo test -p calimero-storage in isolation (without --features testing) silently skips the D5 compile-fail guard test. A developer adding a new RekeyTarget impl that shifts the truncation window could break the snapshot without any local signal. The $N others wildcard in the .stderr file partially mitigates this but does not cover membership changes in the first 8 entries.
Suggested fix:
Consider using `trybuild`'s `TRYBUILD=overwrite` workflow in CI to regenerate the snapshot on every PR, or split the snapshot into a feature-independent portion (the error message and note) and a feature-dependent portion (the help list), asserting only the former. Alternatively, use `insta` with a wildcard matcher for the help block.
Found by: security-reviewer
🤖 Generated by AI Code Reviewer | Review ID: review-e7ea84c7
There was a problem hiding this comment.
🤖 AI Code Reviewer
Reviewed by 3 agents | Quality score: 52% | Review time: 417.2s
🟡 Warning (7)
1. impl_atomic_lww! can be misapplied to structs that DO nest collections
File: crates/storage/src/collections/crdt_meta.rs (line 267-320) | Consensus: 1/3 agents
The macro doc comment says "The supertrait guard still bites: a HAND-WRITTEN impl Mergeable on a struct that DOES nest a collection cannot use this macro (it would silently never re-key) — that path stays a compile error until the author derives or re-keys explicitly." However, the macro itself emits a no-op RekeyTarget impl unconditionally. If a user calls impl_atomic_lww!(MyStruct, ts) on a struct that contains a Counter or UnorderedMap, the no-op RekeyTarget satisfies the supertrait bound and the nested collection is silently never re-keyed — exactly the D5 footgun the supertrait was meant to prevent. The compile guard does NOT bite here because RekeyTarget IS implemented (as a no-op).
Suggested fix:
Add a `#[doc(hidden)]` marker or a compile-time assertion that the type has no nested collections, or at minimum strengthen the doc comment to warn that this macro is ONLY safe for leaf structs with no CRDT fields. Consider naming it `impl_leaf_lww!` to make the constraint self-documenting.
Found by: patterns-reviewer
2. impl_atomic_lww! macro allows misuse: tie-breaker field not constrained to be monotonic
File: crates/storage/src/collections/crdt_meta.rs (line 267-325) | Consensus: 1/3 agents
The impl_atomic_lww! macro accepts any field name as the tie-breaker ($tie) and requires only that the field implements PartialOrd. Nothing prevents a caller from passing a non-monotonic field (e.g. a random nonce, a boolean, or a field that decreases over time). If the tie-breaker is not monotonic, the merge is not idempotent for distinct tie values: two replicas can flip-flop between states depending on merge order. The macro's doc comment says 'ties keep self, so merge is idempotent and order-independent for distinct tie values' — but this is only true if the tie-breaker is strictly monotonic across writes. A caller who passes a non-monotonic field gets silent data loss with no compile-time or runtime error. The existing uses (uploaded_at) are correct, but the macro is public (#[macro_export]) and the footgun is real for future callers.
Suggested fix:
Add a note to the macro's doc comment explicitly warning that `$tie` MUST be a monotonically increasing field (e.g. a timestamp or version counter) for the merge to be idempotent and order-independent. Consider renaming the parameter to `$monotonic_tie` to make the contract visible at call sites.
Found by: security-reviewer
3. merge_chars_from tombstone lookup uses self's map id but other's char data — correctness depends on shared map id invariant that is not enforced at the call site
File: crates/storage/src/collections/rga.rs (line 453-519) | Consensus: 1/3 agents
The tombstone check in merge_chars_from calls self.chars.entry_id(&key) to derive the entity id, then checks Index::<S>::is_deleted(entry_id). The inline comment correctly explains that this is correct only when both replicas' chars maps share the same id (derived from the same field name / parent). If two replicas' chars maps have different ids (e.g. because the RGA was constructed with UnorderedMap::new() instead of new_with_field_name_and_crdt_type), the entity id derived from self.chars for a given CharKey would differ from the entity id other.chars uses for the same key — meaning the tombstone lookup would check the wrong entity and either falsely resurrect a deleted char or falsely suppress a live one. The method is pub(crate) and the callers in the test suite use new_with_field_name_and_crdt_type, so the invariant holds in practice. However, the method has no assertion or debug-assert that self.chars.id() == other.chars.id(), so a future caller that constructs RGAs differently could silently corrupt state.
Suggested fix:
Add a `debug_assert_eq!(self.chars.id(), other.chars.id(), "merge_chars_from: both RGAs must share the same chars-map id for tombstone lookup to be correct");` at the top of `merge_chars_from`. This catches the invariant violation in test/debug builds at zero cost in release.
Found by: security-reviewer
4. merge_chars_from uses self.chars.entry_id but other may have a different map id — tombstone lookup is only correct when both replicas share the same map id
File: crates/storage/src/collections/rga.rs (line 453-519) | Consensus: 1/3 agents
The tombstone check at line ~490 calls self.chars.entry_id(&key) to derive the entity id, then checks Index::<S>::is_deleted(entry_id). The comment in the code correctly notes that this is only valid when both replicas' chars maps share the same id (deterministic from the field name / parent). However, merge_chars_from is a pub(crate) method with a generic S2 adaptor — it can be called with an other whose chars map has a different id (e.g. if the RGA was constructed with UnorderedMap::new() instead of new_with_field_name_and_crdt_type, giving it a random id). In that case other.chars.entries() yields keys whose entity ids under other's map differ from those under self's map, so self.chars.entry_id(&key) computes the wrong entity id and the tombstone lookup silently returns false (not deleted), causing the char to be re-inserted — the exact resurrection bug D2 was meant to fix. The fix should either assert/verify that both maps share the same id, or document that callers are responsible for ensuring id alignment. The tombstone_merge_tests do enforce this via rga_in with a shared field name, but the method itself has no guard.
Suggested fix:
Add a debug_assert or a runtime check: `debug_assert_eq!(self.chars.id(), other.chars.id(), "merge_chars_from requires both RGAs to share the same chars map id");` at the top of `merge_chars_from`. Alternatively, document the precondition prominently in the method's doc comment and make the method `unsafe` or rename it to signal the invariant.
Found by: logic-reviewer
5. test_rga_insert_after_observing_remote_is_causally_ordered asserts on b_delta.hlc but commit_causal_delta returns the root hash's HLC, not the minted CharId's HLC
File: crates/storage/src/tests/rga.rs (line 963-1330) | Consensus: 1/3 agents
The test at line ~1200 captures b_delta.hlc and asserts it is strictly greater than a_hlc. The comment says 'b_delta.hlc is the timestamp embedded in the CharId B just minted'. However, commit_causal_delta returns a delta whose hlc field is the HLC at commit time (the root-level delta HLC), not necessarily the HLC embedded in the specific CharId minted by b.content.insert(1, 'B'). If the insert and the commit happen in the same logical tick, they will match; but if commit_causal_delta re-reads the clock (which may have advanced further), the assertion proves only that the commit HLC is after a_hlc, not that the CharId itself is. The test comment acknowledges this gap ('A test that only read hlc_timestamp() could pass for an unrelated reason') but the fix — asserting on b_delta.hlc — has the same gap if commit_causal_delta's HLC is not the CharId's HLC. To close this, the test should extract the actual CharId from the inserted char and compare its timestamp directly.
Found by: logic-reviewer
6. SHA-256 truncation to 16 bytes reduces collision resistance for HLC seeding
File: crates/storage/src/logical_clock.rs (line 237-272) | Consensus: 1/3 agents
The function hlc_seed_from_executor_id takes the first 16 bytes of a SHA-256 digest as the HLC instance seed. While SHA-256 is collision-resistant over its full 256-bit output, truncating to 128 bits means the birthday bound for collisions is 2^64 operations — not 2^128. For a system with many nodes (executor keys), this is acceptable in practice, but the doc comment claims 'cryptographic probability' of distinctness without acknowledging the truncation halves the security margin. More importantly, the doc comment says 'No trivial all-zero seed' but the zero→1 guard in LogicalClock::new is described as a 'final safety net' — if the guard fires for a real key (probability ~2^-128), two executors whose SHA-256 prefixes both happen to be all-zero would get the same HLC id (both guarded to 1). This is astronomically unlikely but the comment should not claim the guard 'only ever fires for genuine input that happens to hash to a zero prefix — which no real key does' since that is not provable. The real concern is correctness: the guard maps all-zero seeds to id=1, so two distinct executors whose SHA-256 prefixes are both all-zero would collide at id=1. The fix is correct and a major improvement over XOR-fold; the documentation overstates the guarantee.
Suggested fix:
Amend the doc comment to note that the 16-byte truncation gives 128-bit collision resistance (birthday bound 2^64), and that the zero→1 guard is a safety net that could in principle map two distinct keys to the same id (with probability ~2^-128). This is not a code change but a documentation accuracy issue.
Found by: security-reviewer
7. D1 HLC advance happens before deserialization — a malformed delta payload skips the clock advance entirely
File: crates/storage/src/collections/root.rs (line 207-229) | Consensus: 1/3 agents
The D1 fix pattern is: (1) match on &artifact to extract delta_hlc, (2) call env::update_hlc, (3) proceed to apply. However, artifact is the already-deserialized StorageDelta — the deserialization happens before this block. If deserialization fails (malformed bytes), the function returns an error before reaching the HLC advance, which is correct. But the ordering means the HLC is advanced even for a delta whose actions will later fail to apply (e.g. a valid envelope with corrupt action payloads). This is a minor issue: advancing the clock for a rejected delta is safe (the clock only moves forward), but it means a malicious peer can advance the local clock up to the 5s drift limit by sending a valid envelope with a garbage body. This is noted as 'drift-rejected timestamp is logged, not fatal' but the 5s window is still exploitable for clock skew. This is a known trade-off in HLC designs, but worth flagging.
Found by: logic-reviewer
💡 Suggestion (6)
1. merge_chars_from is pub(crate) but the tombstone-check comment assumes shared map ids — worth asserting or documenting the precondition
File: crates/storage/src/collections/rga.rs (line 453-519) | Consensus: 1/3 agents
The correctness comment at line 480–497 explains that self.chars.entry_id(&key) is correct because both replicas share the same map id (deterministic from field name/parent). This is a non-trivial precondition: if two RGA instances were constructed with different field names or parent ids, entry_id would compute a different entity id than the one other holds, and the tombstone lookup would be against the wrong entity — silently passing (no tombstone found) and resurrecting the char. The method is pub(crate) so callers are limited, but a debug assertion or a comment at the call site in crdt_impls.rs would make the precondition explicit.
Suggested fix:
Add `debug_assert_eq!(self.chars.id(), other.chars.id(), "merge_chars_from requires both RGAs to share the same chars-map id");` at the top of `merge_chars_from`, or document the precondition in the method signature.
Found by: patterns-reviewer
2. HLC update on receive path skips update on drift rejection but continues applying actions — potential causal ordering violation
File: crates/storage/src/collections/root.rs (line 207-229) | Consensus: 1/3 agents
When update_hlc returns Err(HlcDriftError) (remote timestamp >5s ahead), the code logs a warning and continues applying the delta's actions without advancing the local clock. This means any CharIds minted during or after this delta application may still sort before the remote chars, preserving the causal ordering bug D1 was meant to fix — just for the drift case. The PR description says "A drift-rejected remote timestamp (>5s ahead) is logged and skipped, not fatal" which is a deliberate design choice, but the consequence (causal ordering not guaranteed for drift-rejected deltas) is not documented at the call site.
Suggested fix:
Add a comment at the drift-rejection branch explaining that causal ordering is not guaranteed when a remote delta is drift-rejected, and that this is an accepted trade-off (the alternative — rejecting the delta entirely — would cause data loss for legitimate clock-skewed nodes).
Found by: patterns-reviewer
3. D1 test test_rga_insert_after_observing_remote_is_causally_ordered asserts on b_delta.hlc but commit_causal_delta returns the delta's own HLC, not the minted CharId's HLC
File: crates/storage/src/tests/rga.rs (line 963-1330) | Consensus: 1/3 agents
The test asserts b_minted_hlc.get_time().as_u64() > a_hlc.get_time().as_u64() where b_minted_hlc = b_delta.hlc. The comment says this proves the mint advanced past A's HLC. However, commit_causal_delta returns the delta's HLC which is the clock state at commit time, not necessarily the HLC embedded in the specific CharId minted by insert. If insert mints the CharId from the clock BEFORE commit_causal_delta advances it further, the delta HLC could be >= the CharId's HLC but the CharId itself might still be <= A's HLC. The test would pass without actually proving the CharId ordering. To be definitive, the test should read the CharId directly from the RGA's char list and compare its embedded timestamp.
Suggested fix:
After `b.content.insert(1, 'B')`, read the last char's `CharId` directly from `b.content` (e.g. via `chars.entries()`) and assert that `char_id.timestamp().get_time().as_u64() > a_hlc.get_time().as_u64()`. This directly proves the minted CharId is causally after A's HLC.
Found by: patterns-reviewer
4. impl_atomic_lww! macro uses strict > for LWW — ties keep self, making merge order-dependent for equal tie values
File: crates/storage/src/collections/crdt_meta.rs (line 267-325) | Consensus: 1/3 agents
The macro comment says 'Strict > keeps self on ties, so the merge is idempotent and the outcome is independent of which side is self for distinct tie values.' The qualifier 'for distinct tie values' is load-bearing: if two replicas concurrently write the same uploaded_at value with different content, the merge outcome depends on which replica is self — i.e. which replica's merge runs first. This is a standard LWW trade-off, but the macro doc says 'order-independent' without the qualifier, which is misleading. More importantly, the test impl_atomic_lww_is_last_write_wins_by_tie_field only tests distinct tie values; it does not test the equal-tie case. A user who stores two records with the same uploaded_at will get non-deterministic merge results depending on sync order.
Suggested fix:
Either document the equal-tie non-determinism explicitly in the macro doc, or change the tie-break to use a secondary deterministic comparator (e.g. a hash of the full record) when `uploaded_at` values are equal. At minimum, add a test for the equal-tie case that documents the current behavior.
Found by: logic-reviewer
5. Compile-fail test for mergeable_without_rekeytarget is gated behind #[cfg(feature = "testing")] in the test binary, not in the test file itself
File: crates/storage/tests/compile_fail.rs (line 14-44) | Consensus: 1/3 agents
The mergeable_without_rekeytarget compile-fail case is conditionally included via #[cfg(feature = "testing")] inside the compile_fail() test function. This means running cargo test -p calimero-storage (without the testing feature) silently skips the D5 compile-guard regression test. The comment explains the reason (trybuild's .stderr snapshot differs between feature sets), but a developer running the storage crate in isolation will never see this test fail even if the RekeyTarget supertrait bound is accidentally removed. The test is only exercised in CI via workspace-level feature unification.
Suggested fix:
Document in the test file (or a README) that `cargo test -p calimero-storage --features testing` is required to run the full compile-fail suite, and consider adding a CI step that explicitly runs `cargo test -p calimero-storage --features testing --test compile_fail` to make the coverage explicit rather than relying on feature unification.
Found by: security-reviewer
6. SHA-256 truncation discards half the entropy; consider using full 32-byte digest or a dedicated KDF
File: crates/storage/src/logical_clock.rs (line 237-272) | Consensus: 1/3 agents
The function takes the first 16 bytes of a SHA-256 digest and discards the remaining 16. While SHA-256 is collision-resistant, truncating to 16 bytes reduces the security margin to 64-bit collision resistance (birthday bound ~2^32 operations). For a HLC instance-id that must be globally unique across all executor keys, using the full 32-byte digest (or at least a 128-bit UUID derived via a proper KDF like HKDF) would be more robust. The current approach is a significant improvement over the XOR-fold bug, but the truncation is worth documenting as a deliberate trade-off.
Suggested fix:
Either document explicitly why 16 bytes (128-bit) collision resistance is sufficient for the expected number of executors, or use all 32 bytes by XOR-folding the two halves of the SHA-256 digest: `seed[i] = digest[i] ^ digest[i+16]` — this preserves all 256 bits of entropy in the 16-byte output without truncation.
Found by: patterns-reviewer
🤖 Generated by AI Code Reviewer | Review ID: review-b71286fa
There was a problem hiding this comment.
🤖 AI Code Reviewer
Reviewed by 3 agents | Quality score: 50% | Review time: 703.1s
🟡 1 warnings. See inline comments.
🤖 Generated by AI Code Reviewer | Review ID: review-79a59d56
…nup, test coverage Docs/correctness: - crdt_meta: retract impl_atomic_lww! false "compile error" misuse claim (state the no-collection-fields precondition + monotonicity instead); make the Mergeable: RekeyTarget supertrait doc honest (enforces impl existence, not registration/cascade correctness). - logical_clock/env: correct the D4 history — the fix is SHA-256, and the XOR-fold was a rejected alternative, never the prior production code (which was a first-16-byte prefix copy). Rename the reproducer test accordingly. Comments: - Strip ephemeral D1–D5 defect labels from all comments (they rot). - Trim multi-paragraph rationale essays to load-bearing lines; dedup the 24x repeated RekeyTarget comment in merge_integration into one note. Tests: - rekey_record: add a dedicated #[app::state] cascade convergence test driven through the two-replica harness via the macro-generated __calimero_register_rekey (mutation-checked: drops to LWW loss without it). - crdt_meta: feature-insensitive compile guard that Mergeable: RekeyTarget holds. - rga: characterize that blob merge does not propagate other's delete to a char self holds live (delete travels via the DeleteRef path). - rga tests: Root::sync with a drift-rejected (>5s ahead) HLC still applies without advancing the local clock. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Bugbot is paused — on-demand spend limit reachedBugbot uses usage-based billing for this team and has hit its on-demand spend limit. A team admin can raise the spend limit in the Cursor dashboard, or wait for the next billing cycle to continue. |
There was a problem hiding this comment.
🤖 AI Code Reviewer
Reviewed by 3 agents | Quality score: 66% | Review time: 342.8s
🟡 3 warnings. See inline comments.
🤖 Generated by AI Code Reviewer | Review ID: review-d91425a8
…tness # Conflicts: # crates/storage/src/collections/crdt_impls.rs
There was a problem hiding this comment.
🤖 AI Code Reviewer
Reviewed by 3 agents | Quality score: 53% | Review time: 348.7s
🟡 2 warnings, 💡 1 suggestions. See inline comments.
🤖 Generated by AI Code Reviewer | Review ID: review-f30fae67
…ard + test note - merge_chars_from: debug_assert both replicas' chars maps share a map id (the tombstone guard derives entry ids from self's map id, so an unrelated map would silently resurrect a deleted char). Adds UnorderedMap::id() accessor. - interleave test ts_at: note that all pinned timestamps share one HLC instance id, so ordering is by tick only and the ID-based tiebreaker is not exercised. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
🤖 AI Code Reviewer
Reviewed by 3 agents | Quality score: 59% | Review time: 320.8s
🟡 2 warnings. See inline comments.
🤖 Generated by AI Code Reviewer | Review ID: review-fdf81e43
| &mut self, | ||
| other: &Self, | ||
| ) -> ::core::result::Result<(), $crate::collections::crdt_meta::MergeError> { | ||
| // Last-write-wins by the monotonic tie-breaker. Strict `>` keeps |
There was a problem hiding this comment.
🟡 impl_atomic_lww! silently produces a broken RekeyTarget when applied to a struct with nested collections
The macro's doc comment warns 'MUST only be used on a struct with NO collection fields' and 'The macro cannot check this', but the emitted RekeyTarget is an unconditional no-op regardless. If a caller applies impl_atomic_lww!(MyStruct, ts) to a struct that contains an UnorderedMap or Counter, the nested collection will keep a per-replica random id and diverge permanently — exactly the #2577 data loss the PR is trying to prevent. The macro provides no compile-time or runtime guard against this misuse, so the footgun it was meant to close (hand-written Mergeable without RekeyTarget) is replaced by a new footgun (hand-written impl_atomic_lww! on the wrong type).
Suggested fix:
Add a `register_nested_value_types` override that calls `register_rekey_if_supported!` on the type's fields, or at minimum add a `#[doc(hidden)]` compile-time marker (e.g. a const assertion or a sealed trait) that prevents the macro from being applied to types that implement `RekeyTarget` non-trivially. At minimum, the doc comment should be a `compile_error!`-backed restriction rather than a prose warning.
| } | ||
|
|
||
| type S = MainStorage; | ||
| let root_hash = || { |
There was a problem hiding this comment.
🟡 HLC causality test is non-discriminating when A's wall-clock is naturally ahead of B's
The test test_rga_insert_after_observing_remote_is_causally_ordered sets A's advertised HLC to now + 2s and then asserts that B's minted CharId HLC is strictly greater than A's advertised HLC. The comment acknowledges this is meant to be discriminating, but the test still passes trivially if Root::sync does NOT advance B's clock, as long as B's wall clock happens to be more than 2s ahead of A's wall clock at the time the test runs (e.g. on a slow CI machine where the two SystemTime::now() calls are separated by >2s). The test is serial but not isolated from wall-clock drift. A truly discriminating test would freeze the clock (use a mock time source) so B's natural wall-clock cannot exceed A's pinned future HLC without the receive-path advance.
Suggested fix:
Use the existing `env::reset_for_testing()` + a mock/pinned time source (if available) to freeze B's clock at a value strictly below A's advertised HLC before calling `Root::sync`, then assert the minted CharId is above A's HLC. This removes the wall-clock race condition.
The per-type "derive generates both Mergeable and RekeyTarget" / "why not derive" rationale duplicated the impl_atomic_lww! and derive macro docs at every use site. Cut to a one-line type description (apps: abi_conformance, blobs, kv-store-with-user-and-frozen-storage, scaffolding-e2e, state-schema-conformance) and shorten the merge_chars_from debug_assert note. Also fmt the rekey_record app_state test asserts (was not re-fmt-checked when added). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
🤖 AI Code Reviewer
Reviewed by 3 agents | Quality score: 73% | Review time: 349.5s
🟡 1 warnings. See inline comments.
🤖 Generated by AI Code Reviewer | Review ID: review-75a8c3b8
Apply the codebase's short-comment norm to the doc/inline comments this PR added: keep the load-bearing fact, drop restated macro/derive docs, historical bug notes, and review-thread rationale (those live in the PR description). - crdt_meta: impl_atomic_lww! doc 21→~8 (keep the no-collection-fields precondition) and the Mergeable supertrait doc 11→7. - logical_clock: hlc_seed_from_executor_id doc 13→7. - rga / tests/rga: merge_chars_from doc 12→4; D1 and interleave test essays trimmed to the discrimination/ordering fact (~3 lines each). - compile_fail + mergeable_without_rekeytarget: feature-gating essays → ~5. - sync_compliance/crdt_merge: SCOPE module header and RGA test doc tightened. - env (D4 seed), unordered_map (entry_id), rekey_record (UnfixedStats), state.rs (generate_struct_rekey): tightened. Comment-only; fmt clean, storage tests green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Bugbot is paused — on-demand spend limit reachedBugbot uses usage-based billing for this team and has hit its on-demand spend limit. A team admin can raise the spend limit in the Cursor dashboard, or wait for the next billing cycle to continue. |
…eceive, tombstone) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
🤖 AI Code Reviewer
Reviewed by 3 agents | Quality score: 57% | Review time: 284.0s
🟡 Warning (6)
1. impl_atomic_lww! cannot enforce its own 'no nested collection' precondition
File: crates/storage/src/collections/crdt_meta.rs (line 275-320) | Consensus: 1/3 agents
The macro's doc comment states it "MUST only be used on a struct with NO collection fields" because the emitted RekeyTarget is an unconditional no-op. However, the macro has no way to enforce this at compile time or runtime. A user who applies impl_atomic_lww!(MyStruct, ts) to a struct containing an UnorderedMap field will silently get the #2577 divergence the PR is trying to prevent — the very footgun D5 is designed to eliminate. The compile guard (D5) only catches hand-written impl Mergeable without RekeyTarget; impl_atomic_lww! satisfies the supertrait bound with a no-op, so the guard passes even for structs with nested collections.
Suggested fix:
Consider adding a `const _: () = assert!(/* some check */);` or a marker trait that the macro requires, or at minimum upgrade the doc comment to a `# Safety` / `# Panics` section and add a clippy lint or test that exercises the macro only on leaf structs. Alternatively, rename the macro to `impl_leaf_lww!` to make the constraint more obvious at the call site.
Found by: security-reviewer
2. debug_assert on shared map id may silently pass in release builds
File: crates/storage/src/collections/rga.rs (line 453-501) | Consensus: 1/3 agents
The merge_chars_from method uses debug_assert_eq! to verify that both RGA instances share the same chars map id before consulting tombstones. In release builds this assertion is compiled out, so a cross-store merge with mismatched map ids would silently proceed: Index::is_deleted(entry_id) would query tombstones derived from self's map id against an entry id derived from other's map id, potentially misidentifying live chars as tombstoned (or vice versa) and corrupting the array. The comment acknowledges this risk but relies on the caller always satisfying the precondition — which is not enforced at the type level.
Suggested fix:
Either promote to a hard `assert_eq!` (cheap: called only on the full-state merge path, not per-op), or restructure so the tombstone lookup uses the entry id derived from `self.chars.entry_id(&key)` (which is already done on line ~487) and document that the precondition is guaranteed by the sync path always sharing map ids. The current code already derives `entry_id` from `self.chars`, so the assertion is only needed to catch bugs in callers — a hard assert is the safer choice.
Found by: security-reviewer
3. merge_chars_from debug_assert on map id equality is the only guard against cross-map resurrection
File: crates/storage/src/collections/rga.rs (line 453-501) | Consensus: 1/3 agents
The tombstone check Index::<S>::is_deleted(entry_id) is correct only when self.chars and other.chars share the same map id (entry ids are derived from the map id). The debug_assert_eq! fires only in debug builds; in release builds a caller passing two RGAs with different map ids would silently skip the tombstone check for every key (because entry_id would address the wrong entity in self's store), potentially resurrecting deleted chars in production. The function is pub(crate) so callers are limited, but the invariant is load-bearing for correctness.
Suggested fix:
Promote to a hard `assert_eq!` (not `debug_assert_eq!`) since the cost is a single comparison and the consequence of a mismatch is data corruption, not a performance regression.
Found by: patterns-reviewer
4. Cross-store tombstone check uses wrong store adaptor
File: crates/storage/src/collections/rga.rs (line 453-501) | Consensus: 1/3 agents
In merge_chars_from<S2>, the tombstone check calls Index::<S>::is_deleted(entry_id) where entry_id is derived from self.chars.entry_id(&key). This is correct — it checks self's store (adaptor S). However, the debug_assert_eq! on map ids (line ~467) asserts that self.chars.id() == other.chars.id(), which is a necessary precondition for the tombstone check to be meaningful. If this assertion is violated in a release build (where debug_assert is compiled out), entry_id is derived from self's map id but the key came from other's entries — if the two maps have different ids, the derived entry_id would be wrong and the tombstone check would silently pass on a garbage id, potentially resurrecting a deleted char. The debug_assert should be a hard assert! or a returned Err since this is a correctness invariant, not a performance hint.
Suggested fix:
Replace `debug_assert_eq!` with a hard `assert_eq!` or return an error: `if self.chars.id() != other.chars.id() { return Err(StoreError::StorageError(...)); }`
Found by: logic-reviewer
5. SHA-256 truncation to 16 bytes reduces collision resistance for HLC instance ids
File: crates/storage/src/logical_clock.rs (line 237-252) | Consensus: 1/3 agents
hlc_seed_from_executor_id takes the first 16 bytes of SHA-256(executor_id) as the HLC instance id. SHA-256 gives 256-bit preimage resistance, but the 16-byte (128-bit) truncation means the birthday bound for HLC id collisions is ~2^64 executor ids — adequate for a small network but worth documenting. More concretely, the HLC id is used as a tiebreaker in CharId ordering; a collision between two live nodes causes their concurrently-minted CharIds to be indistinguishable by id, falling back to whatever secondary ordering exists. The PR description says this is collision-resistant, which is true, but the effective security level is 64-bit birthday, not 128-bit. This is a documentation/awareness issue rather than an immediate bug, but should be noted given the stated goal of preventing silent character loss.
Suggested fix:
Add a comment to `hlc_seed_from_executor_id` noting that the 16-byte truncation gives a 64-bit birthday bound (~2^64 nodes before a collision becomes likely), which is sufficient for the expected network scale. If the HLC library supports a larger id, consider using all 32 bytes of the digest.
Found by: security-reviewer
6. HLC update applied before delta deserialization succeeds — but after the match on artifact type
File: crates/storage/src/collections/root.rs (line 207-227) | Consensus: 1/3 agents
The HLC update is applied by matching on &artifact (a borrow) before the subsequent match artifact { ... } that actually applies the actions. If the action-application step fails and returns an error, the local HLC has already been advanced to the remote delta's timestamp. This means a failed sync still permanently advances the local clock, which could cause future local inserts to have timestamps higher than expected even though the remote state was never applied. While not a data-corruption issue (HLC only moves forward), it violates the principle that a failed operation should leave state unchanged — and could cause causal ordering anomalies if the failed delta is retried with a lower timestamp.
Suggested fix:
Advance the HLC only after the actions are successfully applied, or document explicitly that HLC advancement is intentionally unconditional (fire-and-forget) even on apply failure.
Found by: logic-reviewer
💡 Suggestion (3)
1. HLC advance on receive path only covers CausalActions, not Actions
File: crates/storage/src/collections/root.rs (line 207-227) | Consensus: 1/3 agents
The D1 fix advances the local HLC only for StorageDelta::CausalActions (which carries delta_hlc). The StorageDelta::Actions variant does not carry an HLC, so actions received via that path do not advance the local clock. If a peer sends an Actions delta containing chars stamped with a future HLC (e.g. after the peer itself observed a CausalActions delta and advanced its clock), the local node's next insert could still sort before those chars. The PR description acknowledges Actions don't carry an HLC, but the causal ordering guarantee is then weaker than stated for the Actions path.
Suggested fix:
Document explicitly in the `Root::sync` code that the HLC advance is only possible for `CausalActions` deltas, and that `Actions` deltas are assumed to carry only causally-prior or concurrent chars (i.e. the sender's clock was not advanced by an unobserved remote). If stronger guarantees are needed, consider adding an HLC field to `Actions` as a follow-up.
Found by: security-reviewer
2. End-to-end HLC advance test relies on wall-clock timing and could be flaky under load
File: crates/storage/src/tests/rga.rs (line 963-1356) | Consensus: 1/3 agents
test_sync_advances_local_hlc_to_observe_remote_delta and test_rga_insert_after_observing_remote_is_causally_ordered construct a "remote" HLC that is 2 seconds ahead of std::time::SystemTime::now(). If the test machine's clock jumps, the system is under heavy load causing the test to run slowly, or the HLC drift tolerance is tightened, the 2-second margin could be insufficient or could accidentally exceed the 5-second drift guard. The tests are #[serial] which helps, but the wall-clock dependency is still fragile.
Suggested fix:
Mock the wall clock (the `LogicalClock` already accepts a time source closure) so the test controls both the "local" and "remote" times deterministically, removing the dependency on `SystemTime::now()`.
Found by: patterns-reviewer
3. Interleave test pins all CharIds to the same HLC instance id, making the ID tiebreaker untested
File: crates/storage/src/tests/rga.rs (line 963-1356) | Consensus: 1/3 agents
In test_rga_real_interleave_merge_converges, ts_at(tick) constructs timestamps using *HybridTimestamp::zero().get_id() — the same instance id for every timestamp. The comment acknowledges this: 'All share one HLC instance id, so ordering is by tick only (the ID tiebreaker isn't exercised).' The RGA ordering is Reverse(CharId) where CharId is (HybridTimestamp, seq). If two concurrent inserts happen to have the same tick (e.g., due to clock skew), the ID tiebreaker determines order. The test does not cover this case, so a bug in the ID tiebreaker path would not be caught. This is a test coverage gap, not a code bug.
Suggested fix:
Add a variant of the test where X and Y use the same tick but different HLC instance ids, to exercise the ID tiebreaker and verify deterministic ordering in that case.
Found by: logic-reviewer
🤖 Generated by AI Code Reviewer | Review ID: review-b4b38685
There was a problem hiding this comment.
🤖 AI Code Reviewer
Reviewed by 3 agents | Quality score: 73% | Review time: 571.3s
🟡 1 warnings. See inline comments.
🤖 Generated by AI Code Reviewer | Review ID: review-b4c4d353
| } | ||
|
|
||
| /// End-to-end: B observes A's char (CausalActions delta carrying A's HLC), | ||
| /// then inserts locally. The receive path advances B's clock past A's HLC, so |
There was a problem hiding this comment.
🟡 D1 test constructs a_hlc from a manually-set future time rather than from the actual delta's HLC
In test_rga_insert_after_observing_remote_is_causally_ordered, the test builds a_hlc by taking a_delta.hlc's ID but replacing the time component with now + 2s (lines ~1090-1097). The assertion at line ~1100 then checks that a_hlc > a_delta.hlc, which is trivially true by construction. The actual CausalActions delta sent to B carries this synthetic a_hlc, not the real HLC from A's insert. This means the test does not prove that Root::sync advances the clock when a real delta arrives — it proves it advances when a hand-crafted future HLC arrives. The test still has value but its discriminating power is weaker than the comment claims.
Suggested fix:
Either (a) use `a_delta.hlc` directly as the `delta_hlc` (and adjust the test to use a wall-clock-frozen environment so A's natural HLC is ahead of B's), or (b) document explicitly in the test that `a_hlc` is a synthetic future timestamp used to force the discriminating condition, and that the real-delta case is covered by the tombstone/interleave tests.
…pile_fail snapshot CI surfaced two regressions I introduced: 1. The thread-11 `debug_assert!(self.chars.id() == other.chars.id())` in `merge_chars_from` was a false invariant — `crdt_impls`'s RGA merge unit tests (and `merge_integration`/`linearization`) legitimately merge two independently-keyed RGAs in a shared store, where the tombstone check always uses `self`'s own id so there is no aliasing. The assert fired in debug (CI), failing 5 RGA-merge tests. Removed it (and the now-unused `UnorderedMap::id()`); the shared-id property is the real-sync-path condition, not a function precondition. 2. Trimming the `mergeable_without_rekeytarget.rs` header shifted line numbers, so its trybuild `.stderr` (pinned at :35) no longer matched. Regenerated (now :19). Verified: full `cargo test -p calimero-storage --features testing` green (640 lib + all integration), clippy -D warnings clean, fmt clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
🤖 AI Code Reviewer
Reviewed by 3 agents | Quality score: 56% | Review time: 263.3s
🟡 1 warnings, 💡 1 suggestions. See inline comments.
🤖 Generated by AI Code Reviewer | Review ID: review-174f6b19
| } | ||
|
|
||
| /// End-to-end: B observes A's char (CausalActions delta carrying A's HLC), | ||
| /// then inserts locally. The receive path advances B's clock past A's HLC, so |
There was a problem hiding this comment.
🟡 End-to-end HLC causality test asserts on b_delta.hlc, not on the minted CharId
The test test_rga_insert_after_observing_remote_is_causally_ordered asserts that b_minted_hlc.get_time().as_u64() > a_hlc.get_time().as_u64(), where b_minted_hlc is the delta's commit HLC (b_delta.hlc). The delta HLC is the clock value at commit time, not necessarily the CharId stamped inside the RGA char. If commit_causal_delta samples the HLC independently of the CharId minted during insert, the assertion proves the commit clock advanced but not that the char's CharId itself sorts after a_hlc. The load-bearing invariant for RGA ordering is the CharId timestamp, not the delta's commit HLC.
Suggested fix:
After the merge, read back the RGA's char list and assert that the `CharId` of B's inserted char (the second char in the linearized order) has a timestamp strictly greater than A's char's `CharId` timestamp. This directly tests the invariant that matters for RGA ordering.
| }; | ||
| } | ||
|
|
||
| /// Whole-record last-write-wins `Mergeable` for a leaf struct stored as a |
There was a problem hiding this comment.
💡 impl_atomic_lww! cannot enforce its "no collection fields" precondition
The macro's doc comment states "MUST only be used on a struct with NO collection fields" because the emitted RekeyTarget is an unconditional no-op. However, the macro has no way to enforce this at compile time or runtime. A caller who uses impl_atomic_lww! on a struct that contains an UnorderedMap or Counter will silently get the pre-D5 divergence bug (nested collection id never re-keyed), with no diagnostic. The comment warns about this but the warning is easy to miss.
Suggested fix:
Consider adding a `#[doc(hidden)]` marker or a `compile_error!` triggered by a feature flag, or at minimum elevate the warning in the doc comment to a `# Safety` / `# Panics`-style `# MUST` section that is harder to overlook. Alternatively, rename the macro to `impl_atomic_lww_leaf!` to signal that it is only for leaf (non-collection) types.
Summary
Five correctness fixes from the core RGA/CRDT audit. D1–D4 fix concrete convergence and data-loss defects in the storage layer; D5 turns the underlying footgun — a nested collection silently receiving a per-replica-random id — into a compile error so it cannot recur.
Each section below states the bug, the fix, and the regression tests. All work is on this branch and verified green (see Validation).
collections/root.rscollections/rga.rs,crdt_impls.rs,unordered_map.rssync_compliancetest over-stated its coveragenode/tests/sync_compliance/crdt_merge.rslogical_clock.rs,env.rsMergeablecan nest a collection without re-keying itcollections/crdt_meta.rs,sdk/macros/src/state.rsD1 — Advance the local HLC on the receive path
Bug.
env::update_hlchad no callers on the apply path: a node observed a remote delta's HLC only as au64copy in metadata, never feeding it back into its own clock. Since RGA stampsCharIds from the local clock and orders siblings byReverse(CharId), a node could stamp its next local insert with a timestamp ≤ the remote char it causally follows — convergent, but to a causally-wrong interleave.Fix.
Root::syncnow feeds aCausalActionsdelta'sdelta_hlcintoenv::update_hlcbefore applying its actions. A drift-rejected timestamp (>5s ahead) is logged and skipped, not fatal — the actions remain valid replicated state.Tests (
src/tests/rga.rs): the next local timestamp sorts strictly after an observed remote delta; an end-to-end insert-after-observe renders causally as"AB"; a drift-rejected (>5s) HLC still applies without advancing the clock.D2 — Make the RGA blob merge tombstone-aware
Bug. The root-level RGA merge (cold-join / full-state conflict path) re-inserted any char present in
otherbut absent fromself, gated only onis_none(). A charselfhad concurrently deleted is absent from its live set, so the merge resurrected it — dropping the tombstone and diverging the chars-map hash from peers that saw the delete.Fix. Before re-inserting a char absent from
self, consult its tombstone viaIndex::is_deleted; a tombstone means delete wins. Logic extracted to a genericmerge_chars_fromfor cross-store unit testing. This path is add-only (delete propagation in the reverse direction travels via theDeleteRefdelta-sync path, as before).Tests (
collections/rga.rs): non-resurrection of a concurrently-deleted char across two isolated stores (asserted at the index level, sinceget_textmasks tombstones); genuinely-new chars are still added; the reverse direction (other-deleted /self-live) is characterized.D3 — Honest
sync_compliancescope + a real interleave testBug.
crdt_merge.rs::test_i5_rga_interleave_mergeinserted opaque non-RGA blobs and asserted onlycrdt_type == Rga, yet a comment claimed the merge was "verified by the actual sync execution."Fix. Documented the harness's true scope (metadata + protocol selection, not value-level merge), renamed the test to match what it asserts, and added a real interleave test in
calimero-storage: two replicas concurrently insert distinct runs at the same gap, exchange deltas both ways, and converge to identical text/hash with each run kept contiguous.D4 — Collision-resistant executor-id → HLC seed
Bug. The WASM HLC seeding path derived the 16-byte instance id from only the first 16 bytes of the 32-byte executor key. Two executors sharing a 16-byte prefix collapsed to one HLC id, so two concurrently-minted
CharIds could collide and silently drop one replica's character during RGA sync.Fix. Extracted a pure, unit-testable
hlc_seed_from_executor_idthat takes the first 16 bytes ofSHA-256(executor_id)(sha2is already a crate dependency). SHA-256 is collision-resistant over the full key; a naive XOR-fold was rejected because it collapses any[k; 32]key to an all-zero seed.Tests (
logical_clock.rs): distinct executor ids — including the adversarial repeated-byte and shared-prefix families — yield distinct seeds and distinct HLC ids; a test documents why the XOR-fold alternative is unsafe.D5 —
RekeyTargetas a supertrait ofMergeable(compile-guard)Bug (the footgun). Collections created on demand get a random internal id. The sync model converges collections by matching entity ids, so two nodes that independently first-create the same logical nested CRDT mint different ids; their children never merge — a permanent divergence with no runtime error (the #2577 data loss). The deterministic re-key machinery (
RekeyTarget) fixes it, but a hand-writtenMergeableimpl could simply forget to register its re-key.Fix. Make
RekeyTargeta supertrait ofMergeable, so a missing re-key impl is a compile error rather than silent divergence.#[derive(Mergeable)]already emits the matchingRekeyTarget; derive sites are unaffected.#[app::state]now generates aRekeyTargetimpl mirroring the derive (per-field re-key cascade + nested-value-type registration).RekeyTargetimpls the new bound surfaces across the workspace (collection wrappers, test fixtures, app crates with hand-rolled merges). Leaf records take a no-op via the newimpl_atomic_lww!helper, which emits both the LWWMergeableand the matching no-opRekeyTargetin one line.impl Mergeablethat omitsRekeyTargetfails to compile — the intended guard.Tests. A
compile_failcase proves a hand-writtenMergeablewithoutRekeyTargetis rejected (E0277); a feature-insensitive compile guard pins the supertrait bound in every feature set; and a dedicated#[app::state]two-replica convergence test proves the macro-generated re-key cascade converges a nested collection at runtime (not just compiles).Validation
All green on this branch (after merging
master):cargo fmt --check— clean.cargo build --workspace --tests— builds.cargo test -p calimero-storage --features testing— 640 lib unit tests + all integration binaries (incl.rga,rekey_record,converge,merge_registry_*,compile_fail).cargo test -p calimero-node --test sync_sim— incl. the D3sync_compliancesuite.cargo test -p calimero-runtime(incl.crdt_conformance),cargo test -p calimero-sdk,cargo test -p calimero-sdk-macros.cargo clippy --workspace --all-targets --features calimero-storage/testing -- -D warnings— clean.Review follow-up
Addressed human and automated-review feedback: corrected doc/comment accuracy (notably the D4 history — the fix is SHA-256, not a fold), removed transient defect labels and over-long rationale from source comments, and added the test coverage noted above (the
#[app::state]cascade convergence test, the supertrait compile guard, and the D1 drift-rejection and D2 reverse-direction cases).Follow-ups (deferred)
deleted_children/ per-entity tombstones is a design change, not a correctness defect, and needs a causal-stability-based prune. Tracked separately.deleted_at); delete-wins for those collections is a separate concern from this RGA-scoped fix and is tracked separately.