feat(shadows): gate publish methods on initial cloud sync#128
Merged
Conversation
a00bd68 to
693976d
Compare
Before this change, calling update_reported (or update_desired) on a
fresh shadow could race with wait_delta's first-call subscribe + GET.
If update_reported won, AWS would auto-create the shadow doc with only
the field that arrived first, leaving the rest of the doc permanently
missing. This was hit in factbird-mini after factory reset and
re-provisioning: the connection reporter pushed reported.state=Disabled
before the wifi delta loop did its initial sync, so the new customer's
shadow ended up holding just {reported: {state: "disabled"}} forever.
Add an `initialized: Mutex<NoopRawMutex, bool>` to Shadow and a private
ensure_initialized() that takes the lock, runs the subscribe + GET +
404-fallback once, and flips the flag. update_reported, update_desired,
and wait_delta now await ensure_initialized before doing anything else.
Concurrent first-callers serialize on the lock; subsequent callers fall
through on the first instruction.
Not re-run on clean-session reconnect — the cloud doc still exists, only
the delta subscription needs re-establishing (handle_delta resubscribes
without re-syncing). Reset to false by delete_shadow so a follow-up
publish (e.g. button-driven "reset wifi" where the front-end expects to
configure new credentials immediately afterwards) re-initializes the
cloud doc from local defaults.
wait_delta's first-call drop-guard moves into ensure_initialized; its
invariant ("subscription == Some implies the sync ran to completion") is
the same.
693976d to
26348e6
Compare
MathiasKoch
requested changes
May 19, 2026
| pub(crate) subscription: Mutex<NoopRawMutex, Option<C::Subscription<'m, 1>>>, | ||
| /// One-shot gate: `true` once the initial cloud sync has run. | ||
| /// Reset by `delete_shadow`. See `ensure_initialized`. | ||
| pub(crate) initialized: Mutex<NoopRawMutex, bool>, |
Member
There was a problem hiding this comment.
Seems like this should just be an AtomicBool?
MathiasKoch
approved these changes
May 20, 2026
- handle_delta owns lazy (re)subscribe + GET drain + apply/ack/return. Clean-session reconnect now also drains pending state via GET. - wait_delta and sync_shadow drop ensure_initialized; sync_shadow flips the gate itself after a successful GET. - sync_shadow_inner inlined into sync_shadow, signature changed to Result<(S, Option<S::Delta>), Error>. - delete_shadow: drop ensure_initialized, treat 404 as idempotent success.
The Mutex serves a single-flight purpose: only one caller per shadow runs the initial GET; the rest wait on the mutex and short-circuit. AtomicBool loses that, so concurrent first-time callers would both GET (and both 404→create) — wasteful and a potential cloud-side race on a virgin shadow.
Clearing the subscription on apply_delta_and_ack failure forces a re-drain via GET on the next call, which re-fetches the same bad delta and loops indefinitely on invalid payloads. Leave the subscription cached and wait on the next desired change instead.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Closes the race where a partial
update_reportedcould land on a brand-new shadow beforewait_delta's first GET, leaving the cloud doc permanently incomplete.Hit in factbird-mini PR #666 after factory reset + re-provisioning:
update_reported(state: Disabled).wait_deltawhich on first-call subscribes + GETs.reported.state = "disabled".wait_delta's GET runs, the shadow exists (200, not 404) —create_shadowfallback never fires.{ "state": { "reported": { "state": "disabled" } } }and nothing else, forever.Change
initialized: Mutex<NoopRawMutex, bool>toShadow.ensure_initialized()that takes the lock and runs subscribe + GET + 404-fallback exactly once.update_reported,update_desired,wait_deltaall awaitensure_initializedfirst.delete_shadowresets the flag so a follow-up publish (e.g. button-driven "reset wifi" where the front-end expects to configure new credentials right after) re-initializes the cloud doc from local defaults.Properties
create_shadowandsync_shadoware deliberately not gated — they're explicit force-push operations and gating them would create a recursion viaget_shadow_from_cloud's 404 fallback.wait_deltaintoensure_initialized; the cancellation invariant ("subscription == Some implies the sync ran to completion") is preserved.Test plan
cargo checkwith shadow features cleancargo test --lib shadows— all 138 unit tests passupdate_reportedfromConnectionReporterno longer creates a partial cloud doc on a fresh thingdelete_shadow→update_reportedre-initializes the cloud doc