fix(pwa): auto-reload on iOS WebKit render freeze (Paint Timing self-heal) #221

Merged: jasoneplumb merged 2 commits into mainline from fix/render-freeze-autoreload on Jun 5, 2026

@jasoneplumb

Owner

Summary

The blank-on-cold-start is a whole-page render freeze in iOS WebKit (iOS 26 / WKWebView), not a content-level bug. On a blank load, neither the consent overlay nor the separate inline diagnostic overlay paints. With two unrelated DOM elements both invisible, the entire page is failing to paint and stays white. The DOM is complete, the bundle runs (bundleRan=true), and the service worker is uninvolved (swController=no). The user confirmed that only a full refresh clears it (scroll/tap/rotate do not) and that the browser chrome is normal. Content is never painted and won't be until a fresh navigation, which is why none of the prior content-level fixes could have worked.

Fix: Paint-Timing self-heal

Since a reload reliably clears it, detect and reload:

  • index.html inline script: ~3s after load, check the Paint Timing API for a first-contentful-paint entry. Absent ⇒ the page never painted ⇒ location.reload() once (sessionStorage-guarded). Runs independent of the bundle, so it works through the freeze. If Paint Timing is unavailable, treated as painted (never risk a loop). Complements the boot-watchdog (bundle-never-runs case).
  • consent.ts — revert the requestAnimationFrame mount from fix(consent): mount consent overlay after first paint to fix iOS blank screen #219 (wrong layer; 5/9-vs-3/7 suggested it slightly regressed).
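
As a rough illustration of the detection step described above, the sketch below checks for a first-contentful-paint entry and treats any failure as "painted". It is a minimal sketch, not the PR's actual inline script: the function name hasPainted and the injected perf parameter are hypothetical, and it assumes "Paint Timing unavailable" means getEntriesByType is missing or throws.

```javascript
// Hypothetical sketch of the paint check. `perf` is injected so the logic can
// be exercised outside a browser; in the page it would be window.performance.
function hasPainted(perf) {
  try {
    if (!perf || typeof perf.getEntriesByType !== 'function') {
      return true; // Paint Timing unavailable: assume painted, never reload
    }
    var entries = perf.getEntriesByType('paint');
    for (var i = 0; i < entries.length; i++) {
      if (entries[i].name === 'first-contentful-paint') {
        return true; // the browser recorded a first contentful paint
      }
    }
    return false; // API present, but no FCP entry was recorded
  } catch (e) {
    return true; // any unexpected failure is also treated as painted
  }
}
```

In the page itself this would be called as hasPainted(window.performance) from a timer of about 3000 ms, with the reload decision left to the sessionStorage guard.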

This doesn't prevent the WKWebView freeze (un-patchable from JS), but the app now auto-recovers (brief white flash → reload → working page) instead of staying stuck on a dead page.

Test plan

  • type-check (app + worker), lint, test (96), build — all green; self-heal verified in built index.html
  • Real-device soak (Edge/iPhone) — clear data, cold-start ×N. Expect: any freeze now auto-reloads within ~3s and lands on a working page (brief white flash, no stuck blank). If a stuck blank still survives, Paint Timing isn't catching the freeze and we escalate (animation-start probe / CSS trigger bisect).

Notes

Once confirmed in the field, the temporary diagnostics (#207/#208 + SW-diag) can be removed and the SW simplified (it was never the cause).

Closes #220
Refs #218

🤖 Generated with Claude Code

…heal)

On-device evidence resolved the real nature of the blank-on-cold-start: on a
blank load, neither the consent overlay nor the (separate) inline diagnostic
overlay paints — two unrelated DOM elements both invisible. The whole page fails
to paint (white) on ~5/9 cold loads in Edge/iOS (iOS 26 / WKWebView). DOM is
complete, bundle runs (bundleRan=true), SW uninvolved (swController=no). User
confirms only a full refresh clears it — scroll/tap/rotate do not — and the
browser chrome is fine. That's a hard iOS-WebKit render-process freeze: content
is never painted and won't be until a fresh navigation. So none of the content-
level theories (service worker, consent timing) could have fixed it.

Since a reload reliably clears it, self-heal: ~3s after load, check the Paint
Timing API for a 'first-contentful-paint' entry; if absent, the page never
painted → reload once (sessionStorage-guarded against loops). Runs in the inline
script, independent of the bundle, so it works through the freeze. Complements
the boot-watchdog (which covers a bundle that never runs). If Paint Timing is
unavailable, treat as painted so we never risk a reload loop.

Also revert the requestAnimationFrame consent-overlay mount (#219): it targeted
the wrong layer and 5/9-vs-3/7 suggested it made things slightly worse.

Closes #220
Refs #218

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@jasoneplumb added the bug (Something isn't working) and review-requested (Triggers Claude code review) labels Jun 5, 2026
@claude

claude Bot commented Jun 5, 2026


Code Review — PR #221

Overview

This PR fixes an iOS WebKit render-freeze (blank cold-start) by detecting the absence of a first-contentful-paint Paint Timing entry ~3s after load, then issuing one sessionStorage-guarded reload. It also reverts the requestAnimationFrame consent-overlay mount from #219 (confirmed wrong layer).

The diagnosis is solid: two unrelated DOM elements both invisible, DOM complete, bundle ran, SW uninvolved — that's a renderer freeze only a reload can clear. The self-heal approach is the right level to address it.


index.html — Render-freeze self-heal

Strengths

  • Correct use of var (engine-agnostic, matching existing inline-script convention)
  • Safe fallback: Paint Timing unavailable → painted = true → never risks a loop
  • sessionStorage guard prevents reload loops in the common case
  • Independent of the bundle, so it fires even through a render freeze

Potential issue — infinite reload if sessionStorage is fully unavailable

try { rAlready = sessionStorage.getItem(RKEY) === '1'; } catch (e) { rAlready = false; }
if (!rAlready) {
  try { sessionStorage.setItem(RKEY, '1'); } catch (e) {}
  location.reload();        // ← called even if setItem threw
}

If both getItem and setItem throw (e.g. storage blocked by ITP in a private-browsing edge case), rAlready stays false each reload and setItem never persists the guard — causing an infinite reload loop. The boot-watchdog above avoids this by gating the reload on persisted = true (it only reloads if setItem succeeded). Consider the same pattern here:

if (!rAlready) {
  var persisted = false;
  try { sessionStorage.setItem(RKEY, '1'); persisted = true; } catch (e) {}
  if (persisted) { location.reload(); }
}
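
To make the difference between the two variants concrete, here is a small simulation, offered as a sketch under stated assumptions rather than the PR's code. The names blockedStorage, makeMemoryStorage, ungatedWouldReload, gatedWouldReload and countReloads are hypothetical; each call to a "wouldReload" function models one page load on which the page never paints, and the loop is capped so the runaway case terminates.

```javascript
var RKEY = 'webmap-render-reload'; // guard key name as cited in this review

// Stand-in for a sessionStorage whose every method throws (storage blocked).
var blockedStorage = {
  getItem: function () { throw new Error('SecurityError'); },
  setItem: function () { throw new Error('SecurityError'); }
};

// Stand-in for a working sessionStorage that survives across reloads.
function makeMemoryStorage() {
  var data = {};
  return {
    getItem: function (k) {
      return Object.prototype.hasOwnProperty.call(data, k) ? data[k] : null;
    },
    setItem: function (k, v) { data[k] = String(v); }
  };
}

// Original pattern: reloads even when the guard could not be written.
function ungatedWouldReload(storage) {
  var rAlready;
  try { rAlready = storage.getItem(RKEY) === '1'; } catch (e) { rAlready = false; }
  if (rAlready) { return false; }
  try { storage.setItem(RKEY, '1'); } catch (e) {}
  return true;
}

// Suggested pattern: reloads only if the guard actually persisted.
function gatedWouldReload(storage) {
  var rAlready;
  try { rAlready = storage.getItem(RKEY) === '1'; } catch (e) { rAlready = false; }
  if (rAlready) { return false; }
  var persisted = false;
  try { storage.setItem(RKEY, '1'); persisted = true; } catch (e) {}
  return persisted;
}

// Counts consecutive reloads across simulated page loads, capped at maxLoads.
function countReloads(wouldReload, storage, maxLoads) {
  var reloads = 0;
  for (var i = 0; i < maxLoads; i++) {
    if (!wouldReload(storage)) { break; }
    reloads++;
  }
  return reloads;
}
```

With working storage both variants reload exactly once. With blocked storage the ungated variant reloads on every simulated load until the cap, while the gated variant never reloads, trading self-heal for loop safety in that environment.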

Minor — first-load false positive on slow connections

The 3s timer starts when the inline script is parsed (before load), not after load. On a first install (SW not yet caching), a slow network could mean FCP hasn't fired within 3s even though the page is fine. The sessionStorage guard prevents a loop but adds an unnecessary reload. Not a regression vs. the current codebase, but worth noting for when temporary diagnostics are removed.
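
One way to reduce that false positive, sketched here purely as an assumption and not something the PR does, is to arm the timer from the load event instead of at parse time. The helper name scheduleAfterLoad and the injected win parameter are hypothetical. Whether load still fires during the WKWebView freeze is not established in this thread, so this would need device verification.

```javascript
// Hypothetical helper: start the paint-check countdown only once the page has
// finished loading. `win` is the real window in a browser; it is injected here
// so the scheduling logic can be exercised without a DOM.
function scheduleAfterLoad(win, delayMs, check) {
  function arm() {
    win.setTimeout(check, delayMs);
  }
  if (win.document.readyState === 'complete') {
    arm(); // load already happened: start counting now
  } else {
    win.addEventListener('load', arm); // otherwise wait for load first
  }
}
```

The trade-off is that a page whose load event never fires would no longer self-heal from this path, so a parse-time fallback with a longer timeout may still be wanted.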

Interaction with boot-watchdog at the same 3000ms tick — benign

Both fire at 3000ms. If the bundle never ran AND the render froze, both watchdogs check their own distinct keys and both could call location.reload(). Only the first navigation takes effect. On the reloaded page, the boot-watchdog is blocked by KEY='1' and the render-freeze self-heal is blocked by RKEY='1'. No issue.


src/consent.ts — Revert of #219

Correct diagnosis

The data (5/9 vs 3/7) showing the rAF made things slightly worse, combined with identifying the root cause as a renderer-level freeze (not a paint-timing issue at the JS layer), makes reverting the right call.

Minor — document.getElementById vs panel.querySelector

The shift from scoped panel.querySelector('#consent-accept') to document.getElementById('consent-accept') loses encapsulation. It works correctly here (overlay is appended before listeners), but panel.querySelector is safer against hypothetical future ID collisions and makes the scope of the lookup explicit. Not blocking, just a mild style preference.

Focus timing

Calling focus() synchronously after appendChild (without rAF) matches the pre-#219 behavior that was working. Browsers may silently drop focus() if the document doesn't have user focus at that moment, but that's a pre-existing limitation, not introduced here.


Test coverage

The CI suite (type-check, lint, 96 unit tests, build) is the right bar for this change — the failure mode is device-specific rendering behavior that can't be unit-tested. The real-device soak plan in the test checklist is appropriate. One suggestion: document the expected sessionStorage key (webmap-render-reload) and its lifecycle in a brief comment alongside RKEY, so future readers know to check for it in browser devtools when debugging.


Summary

The approach is well-reasoned and correctly targeted at the platform-level root cause. The main concern worth addressing before merge is the potential infinite reload when sessionStorage is fully unavailable — gating location.reload() on a successful setItem (same pattern as the boot-watchdog) closes that hole. Everything else is minor.

Suggested action: Patch the setItem guard and it's ready for the device soak.

Prevents an infinite reload loop when sessionStorage is fully unavailable
(private mode / storage disabled): only reload if the guard key actually
persisted, mirroring the boot-watchdog. Addresses review feedback on #221.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@jasoneplumb merged commit 92d32c8 into mainline Jun 5, 2026
2 checks passed
@jasoneplumb deleted the fix/render-freeze-autoreload branch June 5, 2026 19:41
@claude

claude Bot commented Jun 5, 2026


Code Review

Good diagnosis and a clean, minimal fix. Here are my observations:


index.html — Paint Timing self-heal

Logic correctness

The guard logic is sound:

  • painted = true fallback on catch correctly prevents reload loops in environments without the Paint Timing API.
  • The two-step setItem → check rPersisted before calling reload() elegantly handles private-browsing / storage-disabled scenarios, mirroring the existing boot-watchdog pattern.
  • Clearing RKEY on the painted path (sessionStorage.removeItem(RKEY)) correctly resets the guard after a successful reload so it can fire again if needed in a future session.
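
The guard lifecycle described in these three points can be sketched as a single decision function. This is a minimal sketch under assumptions, not the merged script: decide and makeMemoryStorage are hypothetical names, the string results stand in for the real side effects (location.reload() or doing nothing), and the painted flag is assumed to come from the Paint Timing check.

```javascript
var RKEY = 'webmap-render-reload'; // guard key name as cited in the earlier review

// In-memory stand-in for sessionStorage, so the flow can run outside a browser.
function makeMemoryStorage() {
  var data = {};
  return {
    getItem: function (k) {
      return Object.prototype.hasOwnProperty.call(data, k) ? data[k] : null;
    },
    setItem: function (k, v) { data[k] = String(v); },
    removeItem: function (k) { delete data[k]; }
  };
}

// Returns what the self-heal would do when its timer fires.
function decide(painted, storage) {
  if (painted) {
    // Painted path: clear the guard so a later freeze can trigger a reload again.
    try { storage.removeItem(RKEY); } catch (e) {}
    return 'painted';
  }
  var rAlready = false;
  try { rAlready = storage.getItem(RKEY) === '1'; } catch (e) { rAlready = false; }
  if (rAlready) { return 'give-up'; } // already reloaded once without a paint
  var rPersisted = false;
  try { storage.setItem(RKEY, '1'); rPersisted = true; } catch (e) {}
  return rPersisted ? 'reload' : 'give-up'; // reload only if the guard was saved
}
```

A freeze followed by a successful paint leaves the guard cleared, so a later freeze reloads again. Two consecutive unpainted loads stop after the first reload.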

Minor observations

  1. 3-second threshold and slow connections. This is a PWA with Workbox caching so subsequent loads should be fast, but on a genuine first install over a slow connection FCP could legitimately arrive after 3s — triggering an unnecessary reload. Consider whether 4–5s (matching the 5s blank-state probe below) would give more headroom without meaningfully delaying recovery. Alternatively, document the rationale for 3s explicitly (e.g. "matches the boot-watchdog above").

  2. The try/catch around sessionStorage.removeItem(RKEY) is worth keeping. removeItem on a non-existent key never throws per spec, but reading the sessionStorage property itself can throw a SecurityError when storage access is blocked, so the wrapper is a real guard and stays consistent with the rest of the script. Not a bug.

  3. ES5 style is correct here. Using var and named function expressions (no arrow functions, no const/let) is the right call for an inline script that must survive any browser version before the module bundle is evaluated. Consistent with the existing watchdog above it.

  4. Interaction with the boot-watchdog. Both timers fire at ~3s and use separate sessionStorage keys, so they're independent. Worth a brief comment noting the keys are distinct to avoid future confusion: something like // separate guard key from 'webmap-boot-reload'.


src/consent.ts — rAF revert

Correctness

The revert is clean. Since the true cause of the blank screen is the WKWebView render freeze (not when the overlay is injected), the requestAnimationFrame workaround was solving the wrong problem and the simpler synchronous mount is correct.

One subtle shift: document.getElementById vs panel.querySelector

The old code wired listeners on the detached element using panel.querySelector, explicitly because getElementById requires the element to be in the document. The new code appends first, then uses document.getElementById — which is correct, but the ordering is now load-bearing. If the append line were ever moved below the listener-wiring in a future edit, getElementById would return null and the ! assertions would throw. A small panel.querySelector call here would be equally valid and slightly more defensive since it doesn't depend on document attachment:

// Slightly more robust — works whether or not overlay is in the document
panel.querySelector<HTMLElement>('#consent-accept')!.addEventListener('click', () => cleanup(true));
panel.querySelector<HTMLElement>('#consent-decline')!.addEventListener('click', () => cleanup(false));
// ...
(panel.querySelector('#consent-accept') as HTMLElement).focus();

That said, the current approach is functionally correct as written.

Focus: non-null assertion vs optional chain

The old code used ?.focus() (safe). The new code uses !.focus() (asserts non-null). Since the element was just appended, the assertion is safe, but the non-null assertion will crash harder than a silent no-op if something ever changes. Either style is fine; just flagging the difference.


Overall

  • Diagnosis is well-supported by the described evidence (bundleRan=true, swController=no, full-page white, only reload clears it).
  • The fix is appropriately scoped — it doesn't try to prevent the WKWebView freeze (which JS can't), it just auto-recovers.
  • Both changes are minimal and don't introduce new dependencies.
  • The test plan correctly identifies that real-device cold-start testing is the only definitive validation.

Nits aside, this looks good to ship. The 3s threshold question is the only one worth a quick deliberate decision before merging.

jasoneplumb added a commit that referenced this pull request Jun 5, 2026
…e worker

The Edge/iOS blank-on-cold-start was traced to an iOS-26 WKWebView whole-page
render freeze — not the app, the service worker, the consent modal, or first-paint
compositing (see #224 for the full record). Remove the investigation scaffolding now
that it's documented:

- index.html: drop the on-screen error/blank-state diagnostic overlay (#207/#208)
  and the inert Paint-Timing render-reload (#221). Keep the boot-watchdog (#206) —
  a legit, general recovery for a bundle that never executes (stale SW serving a
  404'd chunk), unrelated to this saga.
- src/sw.ts: remove the SW-side diagnostics (recordSwError + webmap-sw-diag cache)
  and the empty-body navigation check (it was probing the disproven "SW returns an
  empty navigation" theory). The SW keeps its sound shape: precache, network-first
  navigation with an install-verified precache offline fallback, tile
  StaleWhileRevalidate with purgeOnQuotaError, geocode NetworkOnly, manual
  SKIP_WAITING + clientsClaim.
- src/main.ts: remove surfaceSwDiagnostics() and its call/import.
- src/sw-constants.ts: remove the SW_DIAG_* constants.

Net -190 lines. Keeps the injectManifest architecture (working + deployed); a full
revert to generateSW is a possible future simplification but not worth destabilizing
a working SW here.

Refs #224

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Linked issue (may be closed by this pull request): iOS WebKit whole-page render freeze on cold load — auto-reload self-heal (Paint Timing)