ci(mobile): stop OTA workflow from wiping CodePush release history #14339
Merged
The OTA jobs called `code-push create-history` on every push and every
workflow_dispatch. That command writes a fresh history JSON containing
only the binary-version placeholder and uploads it to
s3://.../mobile-ota/histories/<platform>/<channel>/<binary>.json,
overwriting every prior entry. The subsequent `code-push release` step
then read that wiped file and wrote it back with just
`[placeholder, latest_OTA]`, so each run silently dropped every
previously-published OTA bundle for that binary version.
Live state today across all four files (verified via
download.audius.co/mobile-ota/histories/{ios,android}/{rc,production}/1.5.179.json):
iOS rc -> [1.5.179, 1.5.100776]
iOS prod -> [1.5.179, 1.5.100775]
Android rc -> [1.5.179, 1.5.100776]
Android prod -> [1.5.179, 1.5.100775]
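The overwrite sequence described above can be modelled in a few lines. This is a minimal sketch, not the CodePush CLI: `createHistory`, `release`, the `ReleaseHistory` shape, the `example.invalid` URLs, and the OTA version `1.5.100774` are all hypothetical stand-ins chosen for illustration.

```typescript
// Hypothetical in-memory model of one history file (assumed shape, not the real schema).
type ReleaseInfo = { enabled: boolean; downloadUrl: string };
type ReleaseHistory = Record<string, ReleaseInfo>;

// Models `create-history`: a fresh file holding only the binary-version placeholder.
function createHistory(binaryVersion: string): ReleaseHistory {
  return { [binaryVersion]: { enabled: true, downloadUrl: "" } };
}

// Models `release`: read the current history, append one entry, write it back.
function release(current: ReleaseHistory, otaVersion: string): ReleaseHistory {
  return {
    ...current,
    [otaVersion]: { enabled: true, downloadUrl: `https://example.invalid/${otaVersion}` },
  };
}

const binary = "1.5.179";
let oldStore: ReleaseHistory = createHistory(binary);
let fixedStore: ReleaseHistory = createHistory(binary);

for (const ota of ["1.5.100774", "1.5.100775", "1.5.100776"]) {
  // Old workflow: every run wipes the file first, then releases into the wiped file.
  oldStore = release(createHistory(binary), ota);
  // Fixed workflow: release only, so earlier entries survive.
  fixedStore = release(fixedStore, ota);
}

console.log(Object.keys(oldStore));   // placeholder plus the latest OTA only
console.log(Object.keys(fixedStore)); // placeholder plus all three OTAs
```

Under this model the old loop always ends in the `[placeholder, latest_OTA]` shape, however many runs precede it.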
Per packages/mobile/OTA_UPDATES.md, `create-history` is meant to be a
one-time init when shipping a new native binary, not part of the OTA
release loop. The client (packages/mobile/src/app/ota-updates.ts)
already substitutes a no-update placeholder if the history JSON is
missing or empty, so the call isn't needed to keep CodePush from
throwing "There is no latest release."
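As a rough illustration of that client-side fallback, the sketch below substitutes a placeholder when the fetched history is missing or empty. It is not the code in ota-updates.ts: the function name `withPlaceholder` and the `enabled`, `mandatory`, and `downloadUrl` fields are assumptions made for this example.

```typescript
// Assumed history shape; the real client types may differ.
type ReleaseInfo = { enabled: boolean; mandatory: boolean; downloadUrl: string };
type ReleaseHistory = Record<string, ReleaseInfo>;

// Returns the fetched history unchanged when it has entries; otherwise returns a
// single "no update" placeholder keyed by the binary version (empty downloadUrl).
function withPlaceholder(
  fetched: ReleaseHistory | null | undefined,
  binaryVersion: string
): ReleaseHistory {
  if (fetched && Object.keys(fetched).length > 0) {
    return fetched;
  }
  return { [binaryVersion]: { enabled: true, mandatory: false, downloadUrl: "" } };
}

// A missing file (modelled here as undefined) and an empty object both fall back.
const fromMissing = withPlaceholder(undefined, "1.5.179");
const fromEmpty = withPlaceholder({}, "1.5.179");
console.log(Object.keys(fromMissing), Object.keys(fromEmpty));
```

With a fallback of this kind in place, a history file that does not exist yet is indistinguishable, from the client's point of view, from one that `create-history` just initialised.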
Also adds a per-platform concurrency group so two pushes landing close
together can't race on the same S3 history file: with the wipe gone,
each run still does a read-modify-write on the history JSON, and an
unserialized race would let a later writer clobber an earlier release's
entry. `cancel-in-progress: false` queues subsequent runs instead of
killing an in-flight publish mid-upload.
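The race that the concurrency group guards against can be spelled out with a deterministic interleaving. This sketch is an assumption-laden model: `readHistory`, `writeWithRelease`, the `Store` type, and the labels `ota-A` and `ota-B` are hypothetical and do not correspond to real CLI calls or bundle versions.

```typescript
// Assumed minimal shapes for the illustration.
type ReleaseHistory = Record<string, { downloadUrl: string }>;
type Store = { history: ReleaseHistory };

// Step 1 of a release: take a snapshot of the stored history.
function readHistory(store: Store): ReleaseHistory {
  return { ...store.history };
}

// Step 2 of a release: write back the snapshot plus one new entry.
function writeWithRelease(store: Store, snapshot: ReleaseHistory, otaVersion: string): void {
  store.history = {
    ...snapshot,
    [otaVersion]: { downloadUrl: `https://example.invalid/${otaVersion}` },
  };
}

function newStore(): Store {
  return { history: { "1.5.179": { downloadUrl: "" } } };
}

// Unserialized: both runs read before either writes, so the later write
// is based on a stale snapshot and drops the earlier run's entry.
function unserialized(): string[] {
  const store = newStore();
  const seenByA = readHistory(store);
  const seenByB = readHistory(store);
  writeWithRelease(store, seenByA, "ota-A");
  writeWithRelease(store, seenByB, "ota-B");
  return Object.keys(store.history);
}

// Serialized: each run completes its read-modify-write before the next starts.
function serialized(): string[] {
  const store = newStore();
  for (const ota of ["ota-A", "ota-B"]) {
    writeWithRelease(store, readHistory(store), ota);
  }
  return Object.keys(store.history);
}

const racedKeys = unserialized();
const queuedKeys = serialized();
console.log(racedKeys, queuedKeys);
```

In the unserialized ordering `ota-A` is lost; in the serialized ordering both entries survive, which is the behaviour the per-platform group is meant to enforce.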
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
The OTA jobs in `.github/workflows/mobile.yml` call `code-push create-history` on every push and every `workflow_dispatch`. That command writes a fresh history JSON containing only the binary-version placeholder and uploads it to `s3://.../mobile-ota/histories/<platform>/<channel>/<binary>.json`, overwriting every prior entry. The subsequent `code-push release` step then reads that wiped file and writes it back with just `[placeholder, latest_OTA]`, so each run silently drops every previously-published OTA bundle for that binary version.
User-facing symptom: cycling
On RC, every push to main triggers a new OTA run. While each run is in flight, the history file sits in one of two bad states:
- Wipe window (~3–5 minutes between the `create-history` upload and the `release` upload): the file holds only the binary placeholder. A device that opens the app and runs `CodePush.sync()` during this window fetches a history whose "latest enabled release" is the placeholder (empty `downloadUrl`). That is treated as "no update available," and CodePush has no record of the OTA the device already has pending, so the pending-update banner (packages/mobile/src/components/ota-update-banner/OtaUpdateBanner.tsx:75) goes away.
- Concurrent runs racing on the same JSON: when two pushes land close together, both call `getReleaseHistory → setReleaseHistory` on the same S3 path. With the wipe gone there's still a read-modify-write race; with the wipe present it's worse, because a later `create-history` truncates back to `[placeholder]` before the earlier run's `release` step has finished its read-modify-write.
Together those two flows produce the symptom the user described: the banner appears, disappears a few minutes later when the next run's `create-history` lands, then comes back when that run's `release` step writes a new bundle.
Live evidence
All four history files currently hold exactly one real OTA each, with every earlier release gone:
- `download.audius.co/mobile-ota/histories/ios/rc/1.5.179.json` → `[1.5.179, 1.5.100776]`
- `download.audius.co/mobile-ota/histories/ios/production/1.5.179.json` → `[1.5.179, 1.5.100775]`
- `download.audius.co/mobile-ota/histories/android/rc/1.5.179.json` → `[1.5.179, 1.5.100776]`
- `download.audius.co/mobile-ota/histories/android/production/1.5.179.json` → `[1.5.179, 1.5.100775]`
Why removing the call is safe
packages/mobile/OTA_UPDATES.md:55 describes `create-history` as a one-time init when shipping a new native binary, not part of the OTA release loop. The client (packages/mobile/src/app/ota-updates.ts:94) already substitutes a no-update placeholder when the history JSON is missing or empty, so the call isn't needed to keep CodePush from throwing "There is no latest release." On a brand-new binary version, `code-push release` will see a 404 from `getReleaseHistory`, return `{}`, append its entry, and create the file from scratch with just the new OTA, which the client treats correctly.
Concurrency group
Adds a per-platform concurrency group on both OTA jobs (`mobile-ota-release-rc-<platform>` and `mobile-ota-release-production-<platform>`). With the wipe removed, each run still does a read-modify-write on the history JSON; an unserialized race between two pushes landing close together would let a later writer clobber an earlier release's entry. `cancel-in-progress: false` queues subsequent runs rather than killing a publish mid-upload.
Out of scope
Test plan
- Workflow YAML parses (`python3 -c "import yaml; yaml.safe_load(...)"`).
- Trigger an RC OTA run (`workflow_dispatch` on `rc`) and confirm `code-push release` no longer prints a wipe step, only the release/upload, and the resulting `histories/ios/rc/1.5.179.json` keeps the previous OTA entry and appends the new one.
- Run `workflow_dispatch ota_channel=production` and confirm the production history file grows instead of resetting.
🤖 Generated with Claude Code