Session history: a navigation is silently abandoned, with no recovery, when a traversal re-stamps the navigable’s ongoing navigation during the navigation’s in-parallel unload check #12581

@sideshowbarker

What is the issue with the HTML Standard?

The navigate algorithm claims the navigable’s ongoing navigation and then, in an “in parallel” block, performs an asynchronous unload check before continuing. Three of its steps abandon the navigation when the ongoing navigation is, or becomes, a traversal:

  • Step 18 — “If navigable’s ongoing navigation is traversal: 1. Invoke WebDriver BiDi navigation failed … 2. Return.”
  • Step 19 — “Set the ongoing navigation for navigable to navigationId. This will have the effect of aborting other ongoing navigations of navigable, since at certain points during navigation changes to the ongoing navigation will cause further work to be abandoned.”
  • Step 23.2 (inside the in-parallel block, after the unload check) — “If unloadPromptCanceled is not continue, or navigable’s ongoing navigation is no longer navigationId: 1. Invoke WebDriver BiDi navigation failed … 2. Abort these steps.”

Step 18 catches a traversal that’s already ongoing when the navigation begins. But a traversal can also begin after step 19 and before step 23.2 — that is, during the navigation’s own asynchronous unload check. Apply the history step sets a navigable’s ongoing navigation to "traversal", and it runs on the session history traversal queue — concurrently with the navigate algorithm’s in-parallel block. When that interleaving happens, step 23.2 observes that the ongoing navigation is no longer navigationId — and aborts the navigation.
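The interleaving can be sketched as a small deterministic model. This is illustrative Python, not spec text: `Navigable`, `begin_navigation`, `apply_history_step`, `resume_after_unload_check`, and `run_race` are hypothetical names, and the real algorithms run concurrently rather than as three sequential calls. The traversal here is assumed to cause no document change for the navigable.

```python
from dataclasses import dataclass, field
from typing import List, Optional

TRAVERSAL = "traversal"  # the value "apply the history step" writes into the slot


@dataclass
class Navigable:
    ongoing_navigation: Optional[str] = None
    log: List[str] = field(default_factory=list)


def begin_navigation(navigable: Navigable, navigation_id: str) -> bool:
    """Steps 18-19: fail if a traversal is already ongoing, else claim the slot."""
    if navigable.ongoing_navigation == TRAVERSAL:
        navigable.log.append(f"{navigation_id}: failed at step 18")
        return False
    navigable.ongoing_navigation = navigation_id
    return True


def apply_history_step(navigable: Navigable) -> None:
    """Hypothetical stand-in for a traversal that changes no document here."""
    navigable.ongoing_navigation = TRAVERSAL
    navigable.log.append("traversal: re-stamped ongoing navigation, no load")


def resume_after_unload_check(
    navigable: Navigable,
    navigation_id: str,
    unload_prompt_canceled: str = "continue",
) -> bool:
    """Step 23.2: abort if the prompt was canceled or the slot changed."""
    if (
        unload_prompt_canceled != "continue"
        or navigable.ongoing_navigation != navigation_id
    ):
        navigable.log.append(f"{navigation_id}: aborted at step 23.2")
        return False
    navigable.log.append(f"{navigation_id}: load proceeds")
    return True


def run_race() -> List[str]:
    navigable = Navigable()
    begin_navigation(navigable, "nav-1")           # step 19 claims the slot
    apply_history_step(navigable)                  # lands during the unload check
    resume_after_unload_check(navigable, "nav-1")  # step 23.2 sees the mismatch
    return navigable.log
```

In this model, the log returned by `run_race()` contains no "load proceeds" entry for any party: the navigation is aborted, and the traversal never loads anything.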

The problem is what happens next: Nothing. The spec prescribes no recovery, no re-queue, and no retry for the abandoned navigation. And it provides no guarantee that the traversal that re-stamped the ongoing navigation will itself produce a document or fire a load for that navigable.

Step 19’s note frames the abandonment as intentional — but its stated rationale (“aborting other ongoing navigations”) is about a newer navigation superseding an older one — where the newer navigation goes on to complete and fire its own load.

But a traversal is different: It can be a same-document traversal, or otherwise resolve to no document change for the navigable — in which case, the abandoned navigation’s intended load is simply lost and the navigable is left with no load event ever firing.

So a navigation and a concurrent traversal of the same navigable can race such that the navigation is silently dropped and never realized.

And so this is kind of a sibling of #12576: It’s the same navigation-vs-traversal concurrency — but where #12576 corrupts session-history step numbers, this entirely loses a navigation.

Implementations

In Ladybird, this showed up as an intermittent CI hang: A cross-document navigation (the about:blank load our harness uses to reset between tests) is dropped when a traversal left draining by a prior test re-stamps the navigable’s ongoing navigation during its unload check. The load never completes, and the navigable wedges.

LadybirdBrowser/ladybird#10122 describes that, and LadybirdBrowser/ladybird#10123 has the code change I ended up implementing for it.

Along with making a change for this in Ladybird, I looked at the code in other engines. None reproduces the silent drop: In all three, an in-flight navigation can be cancelled when a same-frame traversal takes over — but the traversal is then structurally guaranteed to drive the navigable to a committed load. No engine does the spec’s “compare the ongoing-navigation identity, and if it changed, abort with nothing loading.”

| Engine | How the silent-drop is avoided | Telling artifact |
| --- | --- | --- |
| Chromium | Single FrameTreeNode::navigation_request_ slot; a new navigation/traversal aborts the in-flight one but is installed before the async beforeunload window, and on resume the browser drives whoever owns the slot now (Navigator::BeforeUnloadCompleted re-reads navigation_request(), not a captured id); the arriving commit is identity-matched (no match = renderer kill) | A comment calls the takeover-during-beforeunload case explicitly safe; open TODOs for the residual edges: history.pushState() racing a pending entry (https://crbug.com/41437754), a beforeunload ack for a different navigation (https://crbug.com/402545469) |
| WebKit | Single provisional DocumentLoader (a second is forbidden by assertion); beforeunload (shouldClose()) is dispatched synchronously at commit under a NavigationDisabler, so there is no separable in-parallel stage to take over; a same-frame back/forward calls stopAllLoaders up front and becomes the sole provisional load | Long-standing FIXME for a lost provisional item: https://bugs.webkit.org/show_bug.cgi?id=146842 |
| Gecko (session-history-in-parent) | Session history is authoritative in the parent, which guarantees the traversal commits a load; in-flight loads are an id-keyed list (CanonicalBrowsingContext::mLoadingEntries), not a clobberable slot; and where it has the spec’s resume-and-re-check shape (nsDocShell), the mismatch case drops only stale state and still calls LoadURI; it proceeds where the spec aborts | FIXMEs in the same code: “UpdateIndex() here may update index too early”, “XXX Should the loading entries before [i] be removed?” |

So, similar to what I reported in #12576, it seems that in this case too, engines have ended up implementing extra bookkeeping that goes beyond the spec (id-keyed loading lists, identity-matched commits, parent-authoritative liveness), plus documented residual races.
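To make the "id-keyed list, not a clobberable slot" distinction concrete, here is a minimal sketch. It is an abstraction of the two bookkeeping shapes only, not code from any engine: `SingleSlot` and `KeyedLoads` are hypothetical names, and the URLs are placeholders.

```python
from typing import Dict, Optional


class SingleSlot:
    """One ongoing-navigation slot: a later writer clobbers the earlier one."""

    def __init__(self) -> None:
        self.ongoing: Optional[str] = None

    def start(self, load_id: str) -> None:
        self.ongoing = load_id

    def is_current(self, load_id: str) -> bool:
        return self.ongoing == load_id


class KeyedLoads:
    """In-flight loads keyed by id: a new entry does not erase older ones."""

    def __init__(self) -> None:
        self.entries: Dict[str, str] = {}

    def start(self, load_id: str, url: str) -> None:
        self.entries[load_id] = url

    def is_pending(self, load_id: str) -> bool:
        return load_id in self.entries

    def finish(self, load_id: str) -> Optional[str]:
        # Removes the entry and returns its URL, or None if it was unknown.
        return self.entries.pop(load_id, None)


slot = SingleSlot()
slot.start("nav-1")
slot.start("traversal")  # the traversal overwrites the navigation's claim

keyed = KeyedLoads()
keyed.start("nav-1", "about:blank")
keyed.start("traversal-1", "https://example.test/previous")
```

After these calls, `slot.is_current("nav-1")` is false, which is the condition step 23.2 treats as grounds to abort, while `keyed.is_pending("nav-1")` is still true, so the navigation can still be found and driven.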

Possible directions

Engine behavior suggests a few complementary directions, at least one of which is already demonstrated in practice:

  • Don’t abort on an identity mismatch caused by a traversal — proceed. When the ongoing navigation has become traversal (as opposed to a newer navigation’s id), step 23.2 shouldn’t abandon the navigation: Instead, defer it and re-run it once the traversal completes (my Ladybird fix), or else just proceed with the load, and drop only the now-stale state (exactly what Gecko already does in its CurrentLoadIdentifier re-check).
  • Guarantee the traversal re-drives the navigable. Make abandoning the navigation actually safe by requiring that the traversal which re-stamped the ongoing navigation drives that navigable to a document, or fires its load — which is what every other engine I looked at relies on structurally (Chromium drives whatever owns the slot to commit; WebKit’s traversal becomes the sole provisional load; Gecko’s parent-authoritative session history initiates and commits the load).
  • Distinguish the two reasons the ongoing navigation can change: superseded by a newer navigation versus taken over by a traversal — and define the handoff for the traversal case so the navigation’s intended load isn’t silently lost.
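As a rough sketch of the first and third directions combined, the step 23.2 check could distinguish why the ongoing navigation changed and defer rather than abort in the traversal case. This is one possible shape under stated assumptions, not proposed spec text and not the Ladybird patch: `Outcome`, `classify_resume`, and `DeferredNavigations` are hypothetical names.

```python
from enum import Enum
from typing import List, Optional

TRAVERSAL = "traversal"


class Outcome(Enum):
    PROCEED = "proceed"
    ABORT = "abort"
    DEFER = "defer until the traversal completes"


def classify_resume(
    ongoing_navigation: Optional[str],
    navigation_id: str,
    unload_prompt_canceled: str = "continue",
) -> Outcome:
    """Decide what to do when the navigation resumes after its unload check."""
    if unload_prompt_canceled != "continue":
        return Outcome.ABORT      # the user or page canceled the unload
    if ongoing_navigation == navigation_id:
        return Outcome.PROCEED    # nothing changed while we waited
    if ongoing_navigation == TRAVERSAL:
        return Outcome.DEFER      # taken over by a traversal: keep the load
    return Outcome.ABORT          # superseded by a newer navigation, or cleared


class DeferredNavigations:
    """Holds navigations to re-run once the interfering traversal finishes."""

    def __init__(self) -> None:
        self._pending: List[str] = []

    def defer(self, navigation_id: str) -> None:
        self._pending.append(navigation_id)

    def traversal_completed(self) -> List[str]:
        # Hand back everything that was waiting, in arrival order, and reset.
        ready, self._pending = self._pending, []
        return ready
```

With this split, only the "superseded by a newer navigation" and "unload canceled" cases abandon the load; the traversal case hands the navigation to a queue that is drained when the traversal completes.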
