What is the issue with the HTML Standard?
The navigate algorithm claims the navigable’s ongoing navigation and then, in an “in parallel” block, performs an asynchronous unload check before continuing. Three of its steps abandon the navigation when the ongoing navigation is, or becomes, a traversal:
- Step 18: “If navigable’s ongoing navigation is traversal: 1. Invoke WebDriver BiDi navigation failed … 2. Return.”
- Step 19: “Set the ongoing navigation for navigable to navigationId. This will have the effect of aborting other ongoing navigations of navigable, since at certain points during navigation changes to the ongoing navigation will cause further work to be abandoned.”
- Step 23.2 (inside the in-parallel block, after the unload check): “If unloadPromptCanceled is not continue, or navigable’s ongoing navigation is no longer navigationId: 1. Invoke WebDriver BiDi navigation failed … 2. Abort these steps.”
Step 18 catches a traversal that’s already ongoing when the navigation begins. But a traversal can also begin after step 19 and before step 23.2 — that is, during the navigation’s own asynchronous unload check. Apply the history step sets a navigable’s ongoing navigation to "traversal", and it runs on the session history traversal queue — concurrently with the navigate algorithm’s in-parallel block. When that interleaving happens, step 23.2 observes that the ongoing navigation is no longer navigationId — and aborts the navigation.
The problem is what happens next: Nothing. The spec prescribes no recovery, no re-queue, and no retry for the abandoned navigation. And it provides no guarantee that the traversal that re-stamped the ongoing navigation will itself produce a document or fire a load for that navigable.
Step 19’s note frames the abandonment as intentional — but its stated rationale (“aborting other ongoing navigations”) is about a newer navigation superseding an older one — where the newer navigation goes on to complete and fire its own load.
But a traversal is different: It can be a same-document traversal, or otherwise resolve to no document change for the navigable — in which case, the abandoned navigation’s intended load is simply lost and the navigable is left with no load event ever firing.
So a navigation and a concurrent traversal of the same navigable can race such that the navigation is silently dropped and never realized.
And so this is kind of a sibling of #12576: It’s the same navigation-vs-traversal concurrency — but where #12576 corrupts session-history step numbers, this entirely loses a navigation.
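To make the interleaving concrete, here is a minimal sketch of the race as a toy model. It is not spec text or engine code: the Navigable class, the function names, and the "nav-1" id are all hypothetical, and the unload check is reduced to the gap between two function calls.

```python
from dataclasses import dataclass, field

TRAVERSAL = "traversal"  # stands in for the spec's "traversal" sentinel


@dataclass
class Navigable:
    # Toy model: the slot holds None, a navigation id, or TRAVERSAL.
    ongoing_navigation: object = None
    loads_fired: list = field(default_factory=list)


def navigate_before_unload_check(navigable, navigation_id):
    """Models steps 18-19: fail if a traversal is ongoing, else claim the slot."""
    if navigable.ongoing_navigation == TRAVERSAL:
        return False  # step 18: navigation failed
    navigable.ongoing_navigation = navigation_id  # step 19
    return True


def navigate_after_unload_check(navigable, navigation_id, url):
    """Models step 23.2: abort if the slot no longer holds our id."""
    if navigable.ongoing_navigation != navigation_id:
        return "aborted"  # nothing re-queues or retries the navigation
    navigable.loads_fired.append(url)
    navigable.ongoing_navigation = None
    return "loaded"


def apply_history_step_same_document(navigable):
    """Models a traversal that re-stamps the slot but changes no document."""
    navigable.ongoing_navigation = TRAVERSAL
    # Same-document traversal: no new document, so no load is recorded.


navigable = Navigable()
assert navigate_before_unload_check(navigable, "nav-1")
# The traversal lands while the navigation's unload check is still pending.
apply_history_step_same_document(navigable)
result = navigate_after_unload_check(navigable, "nav-1", "about:blank")
print(result, navigable.loads_fired)  # prints: aborted []
```

In this model the navigation is aborted and no load is ever recorded for the navigable, which is the silent drop described above.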
Implementations
In Ladybird, this showed up as an intermittent CI hang: A cross-document navigation (the about:blank load our harness uses to reset between tests) is dropped when a traversal still draining from a prior test re-stamps the navigable’s ongoing navigation during the navigation’s unload check. The load never completes, and the navigable wedges.
LadybirdBrowser/ladybird#10122 describes that, and LadybirdBrowser/ladybird#10123 has the code change I ended up implementing for it.
Along with making a change for this in Ladybird, I looked at the code in other engines. None reproduces the silent drop: In all three, an in-flight navigation can be cancelled when a same-frame traversal takes over — but the traversal is then structurally guaranteed to drive the navigable to a committed load. No engine does the spec’s “compare the ongoing-navigation identity, and if it changed, abort with nothing loading.”
| Engine | How the silent-drop is avoided | Telling artifact |
| --- | --- | --- |
| Chromium | Single FrameTreeNode::navigation_request_ slot; a new navigation/traversal aborts the in-flight one but is installed before the async beforeunload window, and on resume the browser drives whoever owns the slot now (Navigator::BeforeUnloadCompleted re-reads navigation_request(), not a captured id); the arriving commit is identity-matched (no match = renderer kill) | A comment calls the takeover-during-beforeunload case explicitly safe; open TODOs for the residual edges: history.pushState() racing a pending entry (https://crbug.com/41437754), a beforeunload ack for a different navigation (https://crbug.com/402545469) |
| WebKit | Single provisional DocumentLoader (a second is forbidden by assertion); beforeunload (shouldClose()) is dispatched synchronously at commit under a NavigationDisabler, so there is no separable in-parallel stage to take over; a same-frame back/forward calls stopAllLoaders up front and becomes the sole provisional load | Long-standing FIXME for a lost provisional item: https://bugs.webkit.org/show_bug.cgi?id=146842 |
| Gecko (session-history-in-parent) | Session history is authoritative in the parent, which guarantees the traversal commits a load; in-flight loads are an id-keyed list (CanonicalBrowsingContext::mLoadingEntries), not a clobberable slot; and where it has the spec’s resume-and-re-check shape (nsDocShell), the mismatch case drops only stale state and still calls LoadURI, so it proceeds where the spec aborts | FIXMEs in the same code: “UpdateIndex() here may update index too early”, “XXX Should the loading entries before [i] be removed?” |
So, as with what I reported in #12576, it seems that engines have ended up implementing bookkeeping beyond what the spec describes (id-keyed loading lists, identity-matched commits, parent-authoritative liveness), plus documented residual races.
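As a rough illustration of the “drive whoever owns the slot now” shape, here is a toy model. It is only loosely based on my reading summarized in the table and is not Chromium code: Frame, Request, begin, and before_unload_completed are hypothetical names, and the example URL is made up.

```python
from dataclasses import dataclass


@dataclass
class Request:
    kind: str  # "navigation" or "traversal"
    url: str


class Frame:
    """Toy model of a frame with a single in-flight request slot."""

    def __init__(self):
        self.slot = None
        self.committed = []

    def begin(self, request):
        # Installing a request replaces (cancels) whatever was in flight.
        self.slot = request

    def before_unload_completed(self):
        # Re-read the slot instead of comparing against an id captured
        # before the asynchronous wait, then drive its current owner.
        request = self.slot
        if request is None:
            return
        self.committed.append(request.url)
        self.slot = None


frame = Frame()
frame.begin(Request("navigation", "about:blank"))
# A traversal takes over while the beforeunload round-trip is pending.
frame.begin(Request("traversal", "https://example.test/previous"))
frame.before_unload_completed()
print(frame.committed)  # prints: ['https://example.test/previous']
```

The navigation is still cancelled here, but the frame always ends up with a committed load, because resumption acts on the current owner of the slot rather than aborting on a mismatch.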
Possible directions
Engine behavior suggests a few complementary directions, at least one of which is already demonstrated in practice:
- Don’t abort on an identity mismatch caused by a traversal; proceed instead. When the ongoing navigation has become traversal (as opposed to a newer navigation’s id), step 23.2 shouldn’t abandon the navigation: Instead, defer it and re-run it once the traversal completes (my Ladybird fix), or else just proceed with the load, and drop only the now-stale state (exactly what Gecko already does in its CurrentLoadIdentifier re-check).
- Guarantee the traversal re-drives the navigable. Make abandoning the navigation actually safe by requiring that the traversal which re-stamped the ongoing navigation drives that navigable to a document, or fires its load — which is what every other engine I looked at relies on structurally (Chromium drives whatever owns the slot to commit; WebKit’s traversal becomes the sole provisional load; Gecko’s parent-authoritative session history initiates and commits the load).
- Distinguish the two reasons the ongoing navigation can change: superseded by a newer navigation versus taken over by a traversal — and define the handoff for the traversal case so the navigation’s intended load isn’t silently lost.
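The first and third directions can be sketched together as a toy model: classify why the ongoing navigation changed, and defer rather than abort when a traversal took over. This is an illustration under stated assumptions, not the Ladybird change or proposed spec text. classify_mismatch, the deferred list, traversal_finished, and the "retry:" ids are all hypothetical, and the re-run is reduced to re-claiming the slot.

```python
TRAVERSAL = "traversal"  # stands in for the spec's "traversal" sentinel


def classify_mismatch(ongoing_navigation, navigation_id):
    """Decide what a step 23.2 analogue should do with the navigation."""
    if ongoing_navigation == navigation_id:
        return "proceed"
    if ongoing_navigation == TRAVERSAL:
        return "defer"  # taken over by a traversal
    return "abort"  # superseded by a newer navigation, or stopped


class Navigable:
    def __init__(self):
        self.ongoing_navigation = None
        self.deferred = []  # URLs of navigations waiting for a traversal
        self.loads_fired = []

    def resume_after_unload_check(self, navigation_id, url):
        decision = classify_mismatch(self.ongoing_navigation, navigation_id)
        if decision == "proceed":
            self.loads_fired.append(url)
            self.ongoing_navigation = None
        elif decision == "defer":
            self.deferred.append(url)
        return decision

    def traversal_finished(self):
        # Release the slot, then re-run every navigation that was deferred.
        self.ongoing_navigation = None
        pending, self.deferred = self.deferred, []
        for url in pending:
            navigation_id = f"retry:{url}"
            self.ongoing_navigation = navigation_id
            self.resume_after_unload_check(navigation_id, url)


navigable = Navigable()
navigable.ongoing_navigation = "nav-1"    # step 19 claims the slot
navigable.ongoing_navigation = TRAVERSAL  # traversal takes over mid-check
decision = navigable.resume_after_unload_check("nav-1", "about:blank")
navigable.traversal_finished()
print(decision, navigable.loads_fired)  # prints: defer ['about:blank']
```

A mismatch against a newer navigation’s id still aborts, which keeps the supersession behavior that step 19’s note describes.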