feat(shadows): gate publish methods on initial cloud sync#128

Merged
KennethKnudsen97 merged 11 commits into master from feat/shadow-ensure-initialized
May 21, 2026

@KennethKnudsen97 KennethKnudsen97 commented May 19, 2026

Summary

Closes the race where a partial update_reported could land on a brand-new shadow before wait_delta's first GET, leaving the cloud doc permanently incomplete.

Hit in factbird-mini PR #666 after factory reset + re-provisioning:

  1. Device re-provisions to a new customer's IoT thing (no shadow exists yet).
  2. ConnectionReporter calls update_reported(state: Disabled).
  3. Wifi delta loop calls wait_delta, which on its first call subscribes and then issues a GET.
  4. Reporter wins (no subscribe step), AWS auto-creates the shadow with just reported.state = "disabled".
  5. By the time wait_delta's GET runs, the shadow exists (200, not 404) — create_shadow fallback never fires.
  6. Result: { "state": { "reported": { "state": "disabled" } } } and nothing else, forever.
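
The ordering problem in the steps above can be reproduced with a toy model. This is only a sketch: `FakeCloud`, `initial_sync`, and the default fields `ssid` and `rssi` are hypothetical stand-ins, not the crate's types or the real shadow schema. It shows that once the partial update has auto-created the document, the GET no longer sees a 404 and the full create never runs.

```rust
use std::collections::BTreeMap;

/// Toy stand-in for the shadow service: `None` means no document exists yet.
#[derive(Default)]
pub struct FakeCloud {
    pub reported: Option<BTreeMap<String, String>>,
}

impl FakeCloud {
    /// Partial update: auto-creates the document if it is missing (step 4).
    pub fn update_reported(&mut self, key: &str, value: &str) {
        self.reported
            .get_or_insert_with(BTreeMap::new)
            .insert(key.to_string(), value.to_string());
    }

    /// GET: `None` models a 404, `Some` models a 200 with the current document.
    pub fn get(&self) -> Option<&BTreeMap<String, String>> {
        self.reported.as_ref()
    }

    /// Full create from local defaults, as the 404 fallback would do.
    pub fn create_from_defaults(&mut self, defaults: &[(&str, &str)]) {
        self.reported = Some(
            defaults
                .iter()
                .map(|(k, v)| (k.to_string(), v.to_string()))
                .collect(),
        );
    }
}

/// First-call sync: GET, and push the full default document only on a 404.
pub fn initial_sync(cloud: &mut FakeCloud, defaults: &[(&str, &str)]) {
    if cloud.get().is_none() {
        cloud.create_from_defaults(defaults);
    }
}

/// Hypothetical local defaults; the real shadow fields are not shown in this PR.
pub const DEFAULTS: &[(&str, &str)] = &[("state", "enabled"), ("ssid", ""), ("rssi", "0")];

/// Reporter wins the race: the partial update lands before the initial sync.
pub fn reporter_first() -> FakeCloud {
    let mut cloud = FakeCloud::default();
    cloud.update_reported("state", "disabled");
    initial_sync(&mut cloud, DEFAULTS); // sees a 200, so the fallback never fires
    cloud
}

/// Gated order: the initial sync runs before any partial update.
pub fn sync_first() -> FakeCloud {
    let mut cloud = FakeCloud::default();
    initial_sync(&mut cloud, DEFAULTS);
    cloud.update_reported("state", "disabled");
    cloud
}
```

In this model `reporter_first()` ends with a single-field document, while `sync_first()` ends with every default field present and `state` overwritten to `"disabled"`.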

Change

  • Add initialized: Mutex<NoopRawMutex, bool> to Shadow.
  • Add private ensure_initialized() that takes the lock and runs subscribe + GET + 404-fallback exactly once.
  • update_reported, update_desired, wait_delta all await ensure_initialized first.
  • delete_shadow resets the flag so a follow-up publish (e.g. button-driven "reset wifi" where the front-end expects to configure new credentials right after) re-initializes the cloud doc from local defaults.
  • Not re-run on clean-session reconnect — only the delta subscription is re-established. The cloud doc still exists.
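
A minimal sketch of the gate described in this list, under stated assumptions: `std::sync::Mutex` and threads stand in for the async `Mutex<NoopRawMutex, bool>`, and the subscribe + GET + 404-fallback is reduced to a counter. The struct layout and the `sync_runs` and `publishes` fields are illustrative only, not the crate's real `Shadow`.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;

pub struct Shadow {
    /// One-shot gate: `true` once the initial cloud sync has run.
    initialized: Mutex<bool>,
    /// Counts how often the simulated initial sync ran (illustration only).
    pub sync_runs: AtomicUsize,
    /// Counts simulated publishes (illustration only).
    pub publishes: AtomicUsize,
}

impl Shadow {
    pub fn new() -> Self {
        Shadow {
            initialized: Mutex::new(false),
            sync_runs: AtomicUsize::new(0),
            publishes: AtomicUsize::new(0),
        }
    }

    /// Takes the lock and runs the initial sync at most once per "initialized" period.
    fn ensure_initialized(&self) {
        // The lock is held across both the check and the work, so concurrent
        // first-callers queue here and then see `true`.
        let mut done = self.initialized.lock().unwrap();
        if *done {
            return;
        }
        // Stand-in for subscribe + GET + 404-fallback.
        self.sync_runs.fetch_add(1, Ordering::SeqCst);
        *done = true;
    }

    pub fn update_reported(&self) {
        self.ensure_initialized();
        self.publishes.fetch_add(1, Ordering::SeqCst);
    }

    pub fn update_desired(&self) {
        self.ensure_initialized();
        self.publishes.fetch_add(1, Ordering::SeqCst);
    }

    /// Resets the gate so the next publish re-initializes the cloud document.
    pub fn delete_shadow(&self) {
        *self.initialized.lock().unwrap() = false;
    }
}

/// Spawns `n` concurrent first-callers and returns how many initial syncs ran.
pub fn concurrent_first_calls(n: usize) -> usize {
    let shadow = Shadow::new();
    std::thread::scope(|s| {
        for _ in 0..n {
            s.spawn(|| shadow.update_reported());
        }
    });
    shadow.sync_runs.load(Ordering::SeqCst)
}
```

However many callers race on a fresh `Shadow`, `concurrent_first_calls` reports one sync. After `delete_shadow`, the next publish runs the sync a second time.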

Properties

  • Concurrent first-callers serialize on the lock; only one does the work.
  • Subsequent calls fall through on the first instruction.
  • create_shadow and sync_shadow are deliberately not gated — they're explicit force-push operations and gating them would create a recursion via get_shadow_from_cloud's 404 fallback.
  • The first-call drop-guard moved from wait_delta into ensure_initialized; the cancellation invariant ("subscription == Some implies the sync ran to completion") is preserved.
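
The cancellation invariant in the last bullet can be sketched with a plain drop guard. This is an assumption about the general shape, not the crate's actual guard: `Subscription`, `SyncGuard`, and `first_call_sync` are hypothetical names, and an early return stands in for the future being dropped mid-sync.

```rust
/// Hypothetical stand-in for the cached delta subscription.
#[derive(Debug, PartialEq)]
pub struct Subscription;

/// Clears the cached subscription on drop unless `disarm` was called.
pub struct SyncGuard<'a> {
    slot: &'a mut Option<Subscription>,
    armed: bool,
}

impl<'a> SyncGuard<'a> {
    pub fn new(slot: &'a mut Option<Subscription>) -> Self {
        SyncGuard { slot, armed: true }
    }

    /// Call once the sync has run to completion; the subscription is then kept.
    pub fn disarm(mut self) {
        self.armed = false;
    }
}

impl Drop for SyncGuard<'_> {
    fn drop(&mut self) {
        if self.armed {
            // The sync did not finish: do not leave a subscription behind.
            *self.slot = None;
        }
    }
}

/// Simplified first-call sync: subscribe, then GET. `get_succeeds = false`
/// models the operation being cut short before the GET completes.
pub fn first_call_sync(
    slot: &mut Option<Subscription>,
    get_succeeds: bool,
) -> Result<(), &'static str> {
    *slot = Some(Subscription); // subscribe first
    let guard = SyncGuard::new(slot);
    if !get_succeeds {
        return Err("sync interrupted"); // guard drops here and clears the slot
    }
    guard.disarm();
    Ok(())
}
```

On every exit path the slot is `Some` only if the sync reached `disarm`, which is the "subscription == Some implies the sync ran to completion" property.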

Test plan

  • cargo check with shadow features clean
  • cargo test --lib shadows — all 138 unit tests pass
  • End-to-end test against AWS IoT via factbird-mini — pinned in PR #666
  • Confirm update_reported from ConnectionReporter no longer creates a partial cloud doc on a fresh thing
  • Confirm clean-session reconnect resubscribes without re-running the initial sync
  • Confirm delete_shadow followed by update_reported re-initializes the cloud doc

@KennethKnudsen97 KennethKnudsen97 force-pushed the feat/shadow-ensure-initialized branch 3 times, most recently from a00bd68 to 693976d on May 19, 2026 13:00
Before this change, calling update_reported (or update_desired) on a
fresh shadow could race with wait_delta's first-call subscribe + GET.
If update_reported won, AWS would auto-create the shadow doc with only
the field that arrived first, leaving the rest of the doc permanently
missing. This was hit in factbird-mini after factory reset and
re-provisioning: the connection reporter pushed reported.state=Disabled
before the wifi delta loop did its initial sync, so the new customer's
shadow ended up holding just {reported: {state: "disabled"}} forever.

Add an `initialized: Mutex<NoopRawMutex, bool>` to Shadow and a private
ensure_initialized() that takes the lock, runs the subscribe + GET +
404-fallback once, and flips the flag. update_reported, update_desired,
and wait_delta now await ensure_initialized before doing anything else.
Concurrent first-callers serialize on the lock; subsequent callers fall
through on the first instruction.

Not re-run on clean-session reconnect — the cloud doc still exists, only
the delta subscription needs re-establishing (handle_delta resubscribes
without re-syncing). Reset to false by delete_shadow so a follow-up
publish (e.g. button-driven "reset wifi" where the front-end expects to
configure new credentials immediately afterwards) re-initializes the
cloud doc from local defaults.

wait_delta's first-call drop-guard moves into ensure_initialized; its
invariant ("subscription == Some implies the sync ran to completion") is
the same.
@KennethKnudsen97 KennethKnudsen97 force-pushed the feat/shadow-ensure-initialized branch from 693976d to 26348e6 on May 19, 2026 13:32
Comment thread src/shadows/shadow/mod.rs
pub(crate) subscription: Mutex<NoopRawMutex, Option<C::Subscription<'m, 1>>>,
/// One-shot gate: `true` once the initial cloud sync has run.
/// Reset by `delete_shadow`. See `ensure_initialized`.
pub(crate) initialized: Mutex<NoopRawMutex, bool>,

Seems like this should just be an AtomicBool?

Kenneth Sylvest Knudsen and others added 10 commits May 20, 2026 09:29
- handle_delta owns lazy (re)subscribe + GET drain + apply/ack/return.
  Clean-session reconnect now also drains pending state via GET.
- wait_delta and sync_shadow drop ensure_initialized; sync_shadow flips
  the gate itself after a successful GET.
- sync_shadow_inner inlined into sync_shadow, signature changed to
  Result<(S, Option<S::Delta>), Error>.
- delete_shadow: drop ensure_initialized, treat 404 as idempotent success.
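
The "treat 404 as idempotent success" rule for delete_shadow might look roughly like this. The `Error` enum and its variants are hypothetical, and the cloud call is reduced to a result passed in by the caller.

```rust
/// Hypothetical error type; the crate's real `Error` variants may differ.
#[derive(Debug, PartialEq)]
pub enum Error {
    NotFound,
    Timeout,
}

/// Maps the outcome of the cloud-side delete: a shadow that is already gone
/// counts as success, any other failure is passed through unchanged.
pub fn delete_shadow(cloud_result: Result<(), Error>) -> Result<(), Error> {
    match cloud_result {
        Ok(()) | Err(Error::NotFound) => Ok(()),
        Err(e) => Err(e),
    }
}
```

With this mapping, deleting twice in a row succeeds both times, so callers do not need to track whether the shadow still exists.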
The Mutex serves a single-flight purpose: only one caller per shadow
runs the initial GET; the rest wait on the mutex and short-circuit.
AtomicBool loses that, so concurrent first-time callers would both
GET (and both 404→create) — wasteful and a potential cloud-side race
on a virgin shadow.
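
The single-flight argument above can be illustrated with a deliberately unlucky interleaving. In this sketch threads and a `Barrier` stand in for two tasks that both check the flag before either finishes its GET; `atomic_gate_gets` is a hypothetical helper, not code from this PR.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::Barrier;

/// Runs `callers` concurrent first-time calls against a plain load-then-store
/// `AtomicBool` gate and returns how many simulated GETs were issued.
pub fn atomic_gate_gets(callers: usize) -> usize {
    let initialized = AtomicBool::new(false);
    let gets = AtomicUsize::new(0);
    let all_checked = Barrier::new(callers);

    std::thread::scope(|s| {
        for _ in 0..callers {
            s.spawn(|| {
                // Check the flag...
                let already = initialized.load(Ordering::SeqCst);
                // ...then force the bad interleaving: nobody proceeds until
                // every caller has finished its check.
                all_checked.wait();
                if !already {
                    gets.fetch_add(1, Ordering::SeqCst); // stand-in for GET (+ 404 create)
                    initialized.store(true, Ordering::SeqCst);
                }
            });
        }
    });

    gets.load(Ordering::SeqCst)
}
```

Every caller observes `false` and issues its own GET. A `compare_exchange` on the flag would let exactly one caller claim the work, but the others would then continue without waiting for the sync to finish, so the lock is still what makes them wait.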
Clearing the subscription on apply_delta_and_ack failure forces a
re-drain via GET on the next call, which re-fetches the same bad
delta and loops indefinitely on invalid payloads. Leave the
subscription cached and wait on the next desired change instead.
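
A toy model of the retry loop described in this commit message, assuming a much simplified `handle_delta`: the cloud permanently holds one delta that fails to apply, and `DeltaLoop` and `gets_after` are hypothetical names used only for this sketch.

```rust
/// Simplified delta-loop state for a shadow whose pending delta never applies.
pub struct DeltaLoop {
    pub subscribed: bool,
    pub gets: usize,
    clear_on_failure: bool,
}

impl DeltaLoop {
    pub fn new(clear_on_failure: bool) -> Self {
        DeltaLoop { subscribed: false, gets: 0, clear_on_failure }
    }

    /// One call to the simplified handler.
    pub fn handle_delta(&mut self) -> Result<(), &'static str> {
        if !self.subscribed {
            self.subscribed = true; // lazy (re)subscribe
            self.gets += 1; // GET drain fetches the pending, invalid delta
            if self.clear_on_failure {
                // Old behaviour: dropping the subscription forces another
                // drain, and therefore the same bad delta, on the next call.
                self.subscribed = false;
            }
            return Err("invalid delta payload");
        }
        // Subscription still cached: the real code would wait here for the
        // next desired change; the model simply returns.
        Ok(())
    }
}

/// Number of GETs issued over `calls` consecutive handler calls.
pub fn gets_after(calls: usize, clear_on_failure: bool) -> usize {
    let mut state = DeltaLoop::new(clear_on_failure);
    for _ in 0..calls {
        let _ = state.handle_delta();
    }
    state.gets
}
```

With clearing enabled the model issues one GET per call, with no bound. With the subscription left cached it issues a single GET and then idles until something changes.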
@KennethKnudsen97 KennethKnudsen97 merged commit 4b42af5 into master May 21, 2026
5 checks passed
@KennethKnudsen97 KennethKnudsen97 deleted the feat/shadow-ensure-initialized branch May 21, 2026 08:42