Implement WHATWG URL spec change for Windows drive letter paths (whatwg/url#874) #1121

Closed
renezander030 wants to merge 1 commit into servo:main from renezander030:fix/whatwg-url-874-windows-paths

@renezander030

Summary

Implements whatwg/url#874: in the URL parser's scheme start state, <ASCII alpha> : \ is recognized as a Windows drive letter path and parsed into a file:///<drive>:/... URL.

Url::parse(r"C:\path\file.txt")?;        // file:///C:/path/file.txt
Url::parse(r"D:\foo\bar.exe")?;          // file:///D:/foo/bar.exe
Url::parse(r"c:\folder\file.txt")?;      // file:///c:/folder/file.txt  (case preserved)

Conservative scope — only the backslash shape triggers the conversion. Forward-slash drive paths (c:/foo) are intentionally not rewritten because <alpha>:/ is ambiguous with single-letter scheme URLs (c: scheme, a://example.net, h://., w://x:0). Tests guard against regressing those.

Status: draft — pending whatwg/url#874 merge

The spec PR is still open (last activity 2025-11-28). Opening this as a draft so the implementation can be reviewed in parallel and land quickly once the spec merges. The spec PR cites multi-implementer interest (Ladybird, jsdom/Node, Chromium, Gecko, WebKit, Deno).

Context: a Deno-side port (denoland/deno#33097) was redirected upstream to rust-url by @crowlKats — "we would rather have this land in rust-url even if we have to wait a bit." This PR is that upstream landing.

Implementation

url/src/parser.rs:

  • New helper starts_with_windows_drive_letter_path peeks 3 chars for <alpha>:\ (placed alongside the existing starts_with_windows_drive_letter_segment family).
  • New method parse_windows_drive_letter_path runs the spec's path state directly: pushes file://, sets empty host, dispatches to parse_path with the original input. parse_path handles \ → / normalization for special schemes, producing /C:/path/file.txt.
  • parse_url checks the new helper before parse_scheme, short-circuiting only when the pattern matches.
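
The three-character peek described above can be sketched in isolation. This is a minimal illustration only: the function name comes from the description, but the body is an assumption rather than the code in this pull request, and it does not model the whitespace stripping the URL parser performs before scheme detection.

```rust
/// Illustrative re-creation of the three-character peek.
/// The body is an assumption, not the code from the pull request.
fn starts_with_windows_drive_letter_path(input: &str) -> bool {
    let mut chars = input.chars();
    // Match exactly: one ASCII letter, then ':', then a backslash.
    matches!(
        (chars.next(), chars.next(), chars.next()),
        (Some(letter), Some(':'), Some('\\')) if letter.is_ascii_alphabetic()
    )
}
```

Under this sketch, `C:\path\file.txt` matches, while `c:/foo` and `a://example.net` do not, so single-letter scheme URLs would continue to reach the normal scheme parser.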

The Windows-drive shortcut deliberately drops any base URL — the spec's scheme start state ignores base when scheme is set, and <alpha>:\ is unambiguously absolute.

Tests (url/tests/unit.rs)

8 new unit tests:

  • windows_drive_path_basic: C:\path\file.txt → file:///C:/path/file.txt
  • windows_drive_path_different_drives: D:, Z:
  • windows_drive_path_preserves_drive_case: lowercase + uppercase preserved
  • windows_drive_path_mixed_separators: \ and / interchangeable in body
  • windows_drive_path_percent_encodes_spaces: path encoding still applies
  • windows_drive_path_drops_base: base URL ignored when shortcut fires
  • windows_drive_path_with_query_and_fragment: ?q=1#frag flows through correctly
  • windows_drive_path_does_not_rewrite_scheme_urls: regression guard for c:/foo, a://example.net, h://., w://x:0
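
The expectations these tests encode can be approximated with a small standalone model. It is a hedged sketch, not rust-url code: `has_drive_backslash_prefix` and `model_drive_path_to_file_url` are hypothetical names, and the model covers only separator normalization and space encoding. The real path state also resolves dot segments and applies the full percent-encode sets.

```rust
/// Hypothetical stand-in for the `<alpha>:\` check; not the PR's helper.
fn has_drive_backslash_prefix(input: &str) -> bool {
    let b = input.as_bytes();
    b.len() >= 3 && b[0].is_ascii_alphabetic() && b[1] == b':' && b[2] == b'\\'
}

/// Simplified model of the conversion (an assumption for illustration).
/// Returns None when the input is not a backslash drive path, meaning the
/// ordinary URL parser would handle it instead.
fn model_drive_path_to_file_url(input: &str) -> Option<String> {
    if !has_drive_backslash_prefix(input) {
        return None;
    }
    // Leave any query or fragment untouched; only the path part is rewritten.
    let split = input
        .find(|c: char| c == '?' || c == '#')
        .unwrap_or(input.len());
    let (path, rest) = input.split_at(split);

    let mut out = String::from("file:///");
    for ch in path.chars() {
        match ch {
            '\\' => out.push('/'),       // backslash becomes a path separator
            ' ' => out.push_str("%20"), // spaces are percent-encoded
            other => out.push(other),    // drive letter case is preserved
        }
    }
    out.push_str(rest);
    Some(out)
}
```

In this model, `C:\path\file.txt` maps to `file:///C:/path/file.txt`, while `c:/foo` returns `None` and would be left to the regular scheme parser.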

Full suite (cargo test -p url) and WPT (cargo test --test url_wpt -p url) pass with no regressions.

Test plan

  • All existing unit tests pass (74/74)
  • All existing doctests pass (67/67)
  • WPT suite passes (urltestdata.json)
  • cargo fmt --check clean
  • cargo clippy --all-targets clean
  • WPT test vectors will be added to urltestdata.json once web-platform-tests/wpt#53459 lands and the spec PR merges

Per whatwg/url#874 (still open at time of writing), the URL parser's
scheme start state recognizes `<ASCII alpha> : \` (a Windows drive
letter pattern) as a Windows drive path rather than a single-letter
URL scheme. The parser sets scheme to "file", host to empty string,
and transitions to path state, producing a URL of the form
`file:///<drive>:/path`.

Forward-slash drive paths (`c:/foo`) are intentionally not covered:
the `<alpha> : /` shape is ambiguous with single-letter scheme URLs
(e.g. `c:` scheme, `a://example.net`) and rewriting them would
regress legitimate scheme URLs.

Includes 8 unit tests covering basic conversion, drive-letter case
preservation, mixed separators, percent-encoded path components,
query/fragment, base-URL override, and regression guards for
single-letter scheme URLs that must NOT be rewritten.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@mrobinson

mrobinson commented May 6, 2026

Member

@renezander030 Did you use generative AI to produce the code in this pull request or the PR description?

@renezander030

Author

Yes. I used Claude (an LLM) to help draft both the code and the PR description, working from my earlier Deno-side PR (denoland/deno#33097) that @crowlKats redirected here.

The substance is mine: the conservative scope (backslash-only, not forward-slash drive paths) and the regression guards for c:/foo, a://example.net, h://., w://x:0 came directly from issues I hit during the Deno review. The full test suite, WPT suite, cargo fmt --check, and cargo clippy --all-targets all pass locally.

Happy to comply with whatever policy Servo / rust-url has on AI-assisted contributions — if you'd prefer I rewrite the PR description by hand, or if there's a contributor attestation you'd like me to add, please let me know.

@mrobinson

Member

Here's Servo's AI use policy: https://book.servo.org/contributing/getting-started.html#ai-contributions

It's fine to use LLMs to help understand an issue, find bugs, or to plan / architect a change, but all code, issue text, and PR text must be your own (apart from translation and transcription).

@renezander030

Author

> Here's Servo's AI use policy: https://book.servo.org/contributing/getting-started.html#ai-contributions
>
> It's fine to use LLMs to help understand an issue, find bugs, or to plan / architect a change, but all code, issue text, and PR text must be your own (apart from translation and transcription).

I decline. Closing this PR now.
