Comments on URL-interop.md #10

I’m reading it at commit 863655160ffe6696ece399e4e8ac0e0bf08f7941.

> 86: must have the scheme present
> 
> TWUS: Describes in the 4.2 URL parsing section how a parser should
> accept URLs without a scheme.

IIRC the TWUS parser only accepts input without a scheme when there’s a base URL. In these cases, the input is treated as relative and resolved against that base.

86 has this grammar, which seems equivalent?

    URI-reference = URI / relative-ref
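
To illustrate the point about relative input, here is a minimal sketch using Python's standard `urllib.parse` (which is neither the TWUS parser nor a strict RFC 3986 implementation). The base and relative strings are made up for the example.

```python
from urllib.parse import urljoin, urlsplit

# Hypothetical inputs, chosen only for illustration.
base = "http://example.com/a/b"
relative = "c/d?x=1"

# On its own, the relative reference carries no scheme and no host.
parts = urlsplit(relative)
assert parts.scheme == "" and parts.netloc == ""

# It only becomes an absolute URL once it is resolved against a base.
resolved = urljoin(base, relative)
print(resolved)  # http://example.com/a/c/d?x=1
```

Without the base, there is nothing to say which scheme or host the input belongs to, which is the situation both grammars seem to cover.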


> It also there divides parsers into "Non-web-browser implementations"
> without specifying how to make that distinction.

In this specific instance, I think "Non-web-browser" means anything that doesn’t also implement https://w3c.github.io/FileAPI/ since the difference between "basic URL parser" and "URL parser" is all about blob: URLs.


> TWUS: says a parser must accept one to an infinite amount of slashes

I think this is really not a big deal. It could just as well be 5 max, but 5 is arbitrary and less theoretically pleasing than http://www.catb.org/jargon/html/Z/Zero-One-Infinity-Rule.html
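
As a rough sketch of why "any number" is no harder to implement than a fixed maximum: a tolerant parser can collapse the run of slashes in one step. The helper name `collapse_scheme_slashes` is hypothetical, it only handles `http` and `https`, and it is not the algorithm from TWUS.

```python
import re

# One or more slashes after an http/https scheme (case-insensitive).
_SLASHES = re.compile(r"^(https?):/+", re.IGNORECASE)

def collapse_scheme_slashes(url: str) -> str:
    """Rewrite 'http:/', 'http://', 'http:////', ... to 'http://'."""
    return _SLASHES.sub(lambda m: m.group(1).lower() + "://", url, count=1)

print(collapse_scheme_slashes("http:////example.com/"))  # http://example.com/
```

A limit of 5 would need an extra check and an extra error path, while the unbounded version needs neither.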


> Real world: 32 bit numbers occur, and are automagically supported if
> typical OS level name resolver funcitons

When I looked into it, it seemed hard to choose to *not* support it in such functions. (The most a program could do is recognize such "exotic" IPv4 syntax up front and reject it with a parse error, if it doesn’t want the resolver to accept the address.)
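
A minimal sketch of that kind of pre-check, assuming "exotic" means anything an `inet_aton`-style resolver might read as a number (single 32-bit values, hex or octal parts, fewer than four parts) that is not a plain dotted quad. The function name and the three labels are hypothetical.

```python
import re

_STRICT_OCTET = r"(?:0|[1-9][0-9]{0,2})"
_STRICT_QUAD = re.compile(rf"^{_STRICT_OCTET}(?:\.{_STRICT_OCTET}){{3}}$")
_NUMERIC_PART = re.compile(r"^(?:0[xX][0-9a-fA-F]*|[0-9]+)$")

def classify_host(host: str) -> str:
    """Return 'dotted-quad', 'exotic-ipv4' or 'name' for a host string."""
    parts = host.split(".")
    # Four decimal octets, no leading zeros, each at most 255.
    if _STRICT_QUAD.match(host) and all(int(p) <= 255 for p in parts):
        return "dotted-quad"
    # One to four all-numeric parts: a resolver may still treat this as IPv4.
    if 1 <= len(parts) <= 4 and all(_NUMERIC_PART.match(p) for p in parts):
        return "exotic-ipv4"
    return "name"

for host in ["127.0.0.1", "2130706433", "0x7f.1", "0177.0.0.1", "example.com"]:
    print(host, classify_host(host))
```

A program that wants to refuse such input would raise a parse error on `exotic-ipv4` before the string ever reaches the name resolver.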


> TWUS: Doesn't specify IDNA 2003 nor 2008, but somehow that's still clear

It specifies Unicode TR46, which fully defines its algorithms independently of IDNA 2003 or 2008. (Though TR46 is still based on the Punycode RFC.)
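
For reference, the Punycode layer that all of these share can be sketched with Python's built-in `punycode` codec. This shows only the encoding step for a single label; the TR46 mapping and validation steps are not shown, and Python's built-in `idna` codec follows RFC 3490 (IDNA 2003) rather than TR46, so it is deliberately not used here.

```python
# A single non-ASCII label, used purely as an example.
label = "bücher"

# Punycode (RFC 3492) encodes the label into ASCII.
encoded = label.encode("punycode")

# The ASCII-compatible form adds the "xn--" prefix.
ace_label = "xn--" + encoded.decode("ascii")
print(ace_label)  # xn--bcher-kva

# The encoding is reversible.
assert encoded.decode("punycode") == label
```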


> Real world: at least curl and wget2 ignore "rubbish" entered after the number all the way to the next component divider

Personal opinion: it sounds problematic to silently ignore part of the input?
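
To make the difference concrete, here is a small sketch, assuming "the number" in the quote is the port number. Both function names are hypothetical; the lenient one only mimics the behaviour described in the quote and is not taken from curl or wget2.

```python
import re

def parse_port_strict(text: str) -> int:
    """Accept ASCII digits only; anything else is a parse error."""
    if not (text.isascii() and text.isdigit()):
        raise ValueError(f"invalid port: {text!r}")
    port = int(text)
    if port > 65535:
        raise ValueError(f"port out of range: {port}")
    return port

def parse_port_lenient(text: str) -> int:
    """Take the leading digits and silently drop whatever follows."""
    match = re.match(r"[0-9]+", text)
    if match is None:
        raise ValueError(f"invalid port: {text!r}")
    return int(match.group())

print(parse_port_lenient("80abc"))  # 80
try:
    parse_port_strict("80abc")
except ValueError as err:
    print("rejected:", err)
```

With the lenient version, `80abc` and `80` become indistinguishable, so a typo or an injection attempt goes unnoticed.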

> A TWUS URL thus needs other magic to know where a URL ends.

For example, in `<a href="…">`, HTML syntax defines exactly where the value of the href attribute ends, so there is no need for magic.

If URLs need to be found in the middle of a free-form paragraph of text without any markup, there’s a lot more magic (and heuristics) required than splitting on spaces. I think defining this does not belong in a URL spec.

> TWUS has a test suite (that only runs in javacript-enabled browsers).

Part (arguably the most important part) of this test suite has its test cases in a JSON file that can be used without JavaScript (and is [in rust-url](https://github.com/servo/rust-url/blob/7c7ff55702c3070769955ea956f657700e852398/tests/data.rs#L112)).
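
A sketch of how such a file could be consumed from any language, here Python. The layout is an assumption: an array that mixes comment strings with objects carrying `input`, `base`, and either `href` or `failure`. The sample entries below are invented for illustration and are not copied from the real test suite.

```python
import json

# Invented sample in the assumed layout; not real test data.
SAMPLE = """
[
  "A comment string between test cases",
  {"input": "/a/b", "base": "http://example.com/x",
   "href": "http://example.com/a/b"},
  {"input": "http://exa mple.com/", "base": null, "failure": true}
]
"""

def load_cases(text: str) -> list:
    """Keep only the test-case objects, skipping comment strings."""
    return [entry for entry in json.loads(text) if isinstance(entry, dict)]

def split_cases(cases: list) -> tuple:
    """Separate cases expected to parse from cases expected to fail."""
    ok = [c for c in cases if not c.get("failure")]
    bad = [c for c in cases if c.get("failure")]
    return ok, bad

expected_ok, expected_fail = split_cases(load_cases(SAMPLE))
print(len(expected_ok), len(expected_fail))  # 1 1
```

A test harness would then feed each `input` and `base` to the parser under test and compare the result with `href`, or check that parsing fails.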
