Skip to content

fix(ext/web): handle Windows file paths in URL parsing#33097

Closed
renezander030 wants to merge 3 commits into
denoland:mainfrom
renezander030:fix/30363-windows-path-url
Closed

fix(ext/web): handle Windows file paths in URL parsing#33097
renezander030 wants to merge 3 commits into
denoland:mainfrom
renezander030:fix/30363-windows-path-url

Conversation

@renezander030

Copy link
Copy Markdown

Summary

Implements the WHATWG URL spec change (whatwg/url#874) to handle Windows-style file paths in URL parsing.

When new URL("C:\\path\\file.txt") is called, the parser now detects the Windows drive letter pattern (single ASCII alpha + : + \) and converts it to a file:/// URL with normalized forward slashes: file:///C:/path/file.txt.

Changes

  • ext/web/url.rs: Added maybe_convert_windows_path_to_file_url() preprocessing function that detects Windows drive letter patterns before the input reaches rust-url's parser. Called from parse_url() for both op_url_parse and op_url_parse_with_base code paths.
  • tests/unit/url_test.ts: Added tests covering basic paths, different drive letters, lowercase drives, mixed separators, paths with base URL, and URL.parse().

Spec details

The WHATWG URL spec change adds a check in the "scheme start state": when the parser encounters a single ASCII alpha character as a potential scheme, followed by : and \, it recognizes this as a Windows drive path rather than a URL scheme. It then sets scheme to file, host to empty string, and transitions to path state.

Test plan

  • Unit tests added in tests/unit/url_test.ts
  • WPT tests will pass once web-platform-tests/wpt#53459 lands and the test expectations are updated

Fixes #30363

Implement WHATWG URL spec change (url#874) to detect Windows drive
letter patterns (e.g., C:\path\file.txt) in the URL parser's scheme
start state and automatically convert them to file:/// URLs
(file:///C:/path/file.txt).

The spec adds a check: when parsing encounters a single ASCII alpha
letter as the scheme buffer, followed by ':' and '\', it recognizes
this as a Windows drive path rather than a URL scheme. The parser then
sets the scheme to "file", the host to empty string, and transitions
to path state with backslashes normalized to forward slashes.

This is implemented as a preprocessing step in the Rust parse_url
function, before the input reaches the rust-url crate parser.
@Hajime-san

Copy link
Copy Markdown
Contributor

@renezander030

Copy link
Copy Markdown
Author

Thanks for the pointer. I'm aware of that comment. The difference here is that this implements the WHATWG URL spec change (url#874), which is being tracked by Chromium, Gecko, and WebKit as well.

The upstream Rust url crate follows the WHATWG spec. Since the spec PR hasn't merged yet, the crate won't add this behavior until it does. So if Deno wants to ship this ahead of the spec landing, the conversion has to live in Deno for now.

The implementation is intentionally minimal and isolated (one function). Easy to remove once the upstream crate picks it up.

Happy to hear from the maintainers on whether they'd prefer to wait for upstream or ship early.

@CLAassistant

CLAassistant commented Apr 2, 2026

Copy link
Copy Markdown

CLA assistant check
All committers have signed the CLA.

@fibibot fibibot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Substance covers the most common Windows-path shape (<letter>:\<rest>) and the basic conversion logic looks right, but a few gaps worth resolving before merge.

What's right

  • Drive-letter detection (bytes[0].is_ascii_alphabetic() && bytes[1] == b':' && bytes[2] == b'\\') correctly requires a single ASCII alpha — Cabc:\path doesn't match because bytes[1] is 'a' not ':'. ✓
  • Drive-letter case preservation: let drive_letter = bytes[0] as char keeps c:\path as file:///c:/path (lowercase). Worth confirming this matches the WHATWG spec — some path normalizations uppercase the drive letter, others preserve. The PR's test asserts the preservation behavior, so at least it's pinned.
  • Base-URL precedence: the comment correctly notes that Windows drive letters indicate absolute paths and override any provided base. Spec-conformant.
  • parse_url integration: the conversion happens once, before either op_url_parse or op_url_parse_with_base reach rust-url. No path-by-path duplication.

Gaps to address

1. Forward-slash separator not handled

The pattern check is bytes[2] == b'\\' — backslash only. A common JS-code shape is new URL("C:/path/file.txt") (forward slash), which today produces file: URL with host="path"-style breakage. Per WHATWG url#874's "normalize the windows-drive-letter quirk," the conversion should fire for both separators.

One-liner fix:

if bytes.len() >= 3
  && bytes[0].is_ascii_alphabetic()
  && bytes[1] == b':'
  && (bytes[2] == b'\\' || bytes[2] == b'/')

Worth adding a test:

assertEquals(new URL("C:/path/file.txt").href, "file:///C:/path/file.txt");

Reverting just the bytes[2] == b'/' clause should fail this test.

2. No URL encoding of path content

let path = rest.replace('\\', "/") substitutes separators but doesn't percent-encode special characters. So new URL("C:\\path with space\\file.txt") produces file:///C:/path with space/file.txt — which then goes through Url::options().parse(...) which presumably handles encoding. But it's worth confirming with a test:

assertEquals(
  new URL("C:\\path with space\\file.txt").href,
  "file:///C:/path%20with%20space/file.txt",
);

If rust-url normalizes the spaces, this test passes. If not, the output href would have a literal space which Node would reject. Either way, pinning the behavior is useful.

3. WHATWG spec citation needs verification

The PR body cites whatwg/url#874 but I haven't confirmed:

  • Whether the spec change has actually been merged into the WHATWG URL Living Standard
  • The exact normalization behavior (uppercase vs preserve drive letter, separator handling)
  • Whether other URL implementations (Chrome, Firefox, Node) have shipped this

If the spec change is still in flight or differs from this implementation, Deno could end up with non-standard behavior. Worth either citing the merged spec text or noting in the code that this matches the proposed change rather than current spec.

CI not yet run

Only license/cla (signed) + lint title (pass) reported. Substance shards (test unit for tests/unit/url_test.ts) haven't kicked off — looks like first-time-contributor action_required gating. PR is also stale (created 2026-03-31, ~33 days old). A maintainer-authorize-CI cycle plus the forward-slash case fix would clear the substance gate.

Substance is APPROVE-shape-with-fixes — the basic conversion is right, just needs the forward-slash branch + URL-encoding test + spec citation verification.

@fibibot fibibot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right idea (implementing the WHATWG URL spec change url#874 for Windows-drive paths), but the heuristic over-matches and breaks existing valid URLs of the form letter:/... that should be parsed as scheme+authority. WPT surfaces it directly.

Specific failure mode

wpt release linux-x86_64 reports four URL-parsing regressions on web-platform.test/url/url-setters.any.js:

assert_equals: href expected "c:/foo"     but got "file:///c:/foo"
assert_equals: href expected "h://."      but got "file:///h://"
assert_equals: href expected "w://x:0"    but got "file:///w://x:0"
assert_equals: expected "a://example.net" but got "file:///a://example.net"

All four are caused by maybe_convert_windows_path_to_file_url matching letter:/... strings that are actually scheme+authority URLs. The heuristic:

bytes[0].is_ascii_alphabetic()
  && bytes[1] == b':'
  && (bytes[2] == b'\\' || bytes[2] == b'/')

catches the legitimate Windows-path cases (C:\path, C:/path) but also catches a://example.net because the : and the first / after it match - even though the second / of :// makes it a scheme+authority URL, not a drive path.

Fix direction

Distinguish C:/path from a://example.net by checking the character at position 3:

if bytes.len() >= 3
  && bytes[0].is_ascii_alphabetic()
  && bytes[1] == b':'
  && (bytes[2] == b'\\'
    || (bytes[2] == b'/' && bytes.get(3).copied() != Some(b'/')))

This rejects : followed by // (scheme-relative form) while still accepting the single-/ Windows-drive case. Two of the WPT regressions (h://. and w://x:0) are pure :// forms and would pass with this change.

For the remaining two (c:/foo and a://example.net):

  • c:/foo - single c:/ is genuinely ambiguous between Windows-drive and c: scheme. Per WHATWG url#874, the spec resolves the ambiguity in favor of Windows-drive treatment but ONLY in path-style contexts, not when the URL is being parsed as a scheme. The simplest implementation that matches the spec is to check whether c: is a known scheme - if yes, treat as scheme; if no, treat as Windows drive. Or, conservatively: only trigger the conversion for \\ (backslash) and require explicit file:// for forward-slash drive paths. The latter is what Node does today.
  • a://example.net - :// form, fixed by the position-3 check above.

Substance walk

  • Spec reference (whatwg/url#874) is correctly cited in the comment. The implementation diverges from the spec in the :// non-distinction. Worth re-reading the PR's spec text to confirm which forms it actually converts.
  • Drive-letter detection correctly accepts both upper and lower case (bytes[0].is_ascii_alphabetic()).
  • Path normalization (rest.replace('\\', "/")) correctly converts backslashes in the path portion.
  • Test coverage - the 8 added test cases all exercise legitimate Windows paths (C:\path, c:\folder, C:/path, mixed separators, with-base, percent-encoding). None exercise the letter:// ambiguity that's regressing. Adding negative test cases (e.g., assertEquals(new URL("a://example.net").href, "a://example.net")) before fixing the heuristic would lock the fix in.

CI

97 success, 1 skipped, 2 fail. The 2 failures are the WPT regressions above + the ci status rollup. No node_compat or other surface affected - this is purely the URL-parsing regression.

Not blocking the spec direction - just need the heuristic narrowed before this can land cleanly.

Address review feedback (fibibot CHANGES_REQUESTED): the previous
forward-slash extension over-matched scheme URLs (`a://example.net`,
`c:/foo`, `h://.`, `w://x:0`) and broke 4 WPT url-setters tests.

Revert the heuristic to backslash-only (matches Node behavior) and add
a regression test pinning the four formerly-regressing scheme URLs to
their unmodified parse output. Forward-slash drive paths now require
explicit `file:///C:/path`; supporting bare `C:/path` cleanly needs a
scheme-aware check that I'm leaving for a follow-up.
@renezander030 renezander030 force-pushed the fix/30363-windows-path-url branch from 02a5b48 to 1257494 Compare May 5, 2026 09:39
@renezander030

Copy link
Copy Markdown
Author

Thanks for the catch. The forward-slash extension regressed scheme URLs (a://example.net, c:/foo, h://., w://x:0) — that's a real bug, not a flaky WPT.

Reverted to backslash-only in 1257494 and added a regression test pinning all four formerly-regressing URLs to their unmodified parse output. Forward-slash drive paths (C:/path) now require explicit file:///C:/path, which matches Node's current behavior.

Going scheme-aware (so bare C:/path works without breaking c:/foo) is doable but needs a known-scheme registry check that I'd rather land as a follow-up once this baseline merges. Happy to spin that up after if it's wanted.

On the spec citation point from your earlier review: whatwg/url#874 is still an open PR, not merged into the Living Standard. I've left the PR body reference in place but happy to soften it to "tracks proposed change in whatwg/url#874" if that reads better.

@crowlKats

Copy link
Copy Markdown
Member

hey, apologies on the late reply.
We would rather have this land in rust-url even if we have to wait a bit, we dont have any pressing need for this, especially since spec is still not merged and has been open for quite a long time, so unclear when exactly it will actually land.

@fibibot fibibot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM — clears my prior REQUEST_CHANGES at 02a5b48c. The four WPT regressions I identified (c:/foo, h://., w://x:0, a://example.net) are fixed by the conservative path I called out as fix direction (b) — narrow the heuristic to backslash-only and rely on explicit file:/// for forward-slash drive paths. CI fully green at this HEAD: 134 success, 2 skipping, 0 fail.

What I verified

The heuristic change (ext/web/url.rs:74-77):

if bytes.len() >= 3
  && bytes[0].is_ascii_alphabetic()
  && bytes[1] == b':'
  && bytes[2] == b'\\'   // was: (bytes[2] == b'\\' || bytes[2] == b'/')

Dropping the || bytes[2] == b'/' clause closes the entire ambiguity surface — c:/foo, c:something, and letter://... all now flow through to standard URL parsing as scheme URLs, exactly like Node treats them.

The test churn matches the heuristic change:

  • urlWindowsFilePathForwardSlash — removed (was asserting C:/path/file.txtfile:///, which is no longer the contract).
  • The forward-slash assertion in urlWindowsFilePathPercentEncodesSpaces — removed (kept only the backslash one).
  • urlSchemeAuthorityNotMisinterpretedAsDrive — added, asserts the four exact WPT shapes that were regressing (a://example.net, h://., w://x:0, c:/foo) now round-trip unchanged.

Doc comment (lines 60-65) explicitly calls out the intentional non-conversion of forward-slash drives and points users at file:///C:/path as the explicit form. ✓

Behavior parity check: Node's pathToFileURL only converts backslash-style Windows paths; forward-slash drive paths in raw new URL(...) produce c: scheme URLs there too. So the trimmed scope here matches the upstream contract the PR description claims.

CI

134 success, 2 skipping, 0 fail. WPT release linux-x86_64 (the shard I previously called out as failing four URL-setters assertions) is now green at this HEAD.

No concerns inline.

@renezander030

Copy link
Copy Markdown
Author

Makes sense. I'll take this to rust-url and revisit here once it lands upstream and a Deno version bump pulls it in. Closing this PR in the meantime.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Implement Windows file path handling in URL parsing (WHATWG URL #874)

5 participants