diff --git a/url.bs b/url.bs
index 5058807d..e6f65bcc 100644
--- a/url.bs
+++ b/url.bs
@@ -107,25 +107,30 @@ valid input. User agents, especially conformance checkers, are encouraged to rep
domain-to-ASCII -

Unicode ToASCII records an error or returns the empty string. - [[UTS46]] +

Unicode ToASCII records an error when CheckHyphens, + UseSTD3ASCIIRules, and VerifyDnsLength are all set to true. [[UTS46]]

If details about Unicode ToASCII errors are recorded, user agents are encouraged to pass those along. - Yes
(unless domain is an ASCII string) - - domain-invalid-code-point - -

The input's host contains a forbidden domain code point.

Hosts are percent-decoded before being processed when the URL is special, which would result in the following host portion becoming "exa#mple.org" and thus triggering this error.

"https://exa%23mple.org"

- Yes + Yes
(when beStrict is true, or domain is + not an ASCII string and Unicode ToASCII with relaxed + parameters also fails) Host parsing + + + domain-percent-encoded + +

The input's host to be processed as a domain contains a + percent-encoded byte. +

"https://exam%70le.org" + · host-invalid-code-point @@ -907,65 +912,48 @@ concepts. steps. They return failure or a domain.

    +
  1. Let strictResult be the result of running domain parser ToASCII with + domain and true. + +

  2. If strictResult is a failure value, domain-to-ASCII + validation error. This step does not return. +

  3. If beStrict is true:

      -
    1. Let result be the result of running - Unicode ToASCII with domain_name set to domain, - CheckHyphens set to true, CheckBidi set to true, CheckJoiners set to true, - UseSTD3ASCIIRules set to true, Transitional_Processing set to false, - VerifyDnsLength set to true, and IgnoreInvalidPunycode set to false. [[!UTS46]] - -

    2. If result is a failure value, domain-to-ASCII validation error, - return failure. +

    3. If strictResult is a failure value, then return failure. -

    4. Return result. +

    5. Return strictResult.

  4. Let result be null.

  5. -

    If domain is an ASCII string: - -

      -
    1. If running Unicode ToASCII with domain_name set to - domain, CheckHyphens set to false, CheckBidi set to true, - CheckJoiners set to true, UseSTD3ASCIIRules set to false, - Transitional_Processing set to false, VerifyDnsLength set to false, and - IgnoreInvalidPunycode set to false is a failure value, domain-to-ASCII - validation error. [[!UTS46]] - -

    2. Set result to domain, lowercased. -

    +

    If domain is an ASCII string, then set result to + domain, lowercased.

    When beStrict is false and domain is an ASCII string, - Unicode ToASCII failures only result in validation errors - (instead of failing the whole algorithm) due to web compatibility. IgnoreInvalidPunycode - is not sufficient on its own, as Punycode can decode successfully yet still fail validity - criteria. E.g., xn--8i7caa decodes to www, whose code points have - status "mapped". [[UTS46]] + the algorithm returns domain lowercased regardless of + Unicode ToASCII's outcome, due to web compatibility. + IgnoreInvalidPunycode is not sufficient on its own, as Punycode can decode successfully + yet still fail validity criteria. E.g., xn--8i7caa decodes to www, + whose code points have status "mapped". [[UTS46]]

  6. Otherwise:

      -
    1. Set result to the result of running - Unicode ToASCII with domain_name set to domain, - CheckHyphens set to false, CheckBidi set to true, CheckJoiners set to true, - UseSTD3ASCIIRules set to false, Transitional_Processing set to false, - VerifyDnsLength set to false, and IgnoreInvalidPunycode set to false. [[!UTS46]] +

    2. Set result to the result of running domain parser ToASCII with + domain and false. -

    3. If result is a failure value, domain-to-ASCII validation error, - return failure. +

    4. If result is a failure value, then return failure.

    -
  7. If result is the empty string, domain-to-ASCII validation error, - return failure. +

  8. If result is the empty string, then return failure.

  9. -

    If result contains a forbidden domain code point, - domain-invalid-code-point validation error, return failure. +

    If result contains a forbidden domain code point, then return failure.

    Due to web compatibility and compatibility with non-DNS-based systems the forbidden domain code points are a subset of those disallowed when UseSTD3ASCIIRules
@@ -979,6 +967,16 @@ steps. They return failure or a domain.
    ☕.example becomes xn--53h.example and not failure. [[UTS46]] [[RFC5890]] +

    +

    The domain parser ToASCII algorithm, given a scalar value string +domain and a boolean beStrict, returns the result of running +Unicode ToASCII with domain_name set to domain, +CheckHyphens set to beStrict, CheckBidi set to true, CheckJoiners +set to true, UseSTD3ASCIIRules set to beStrict, Transitional_Processing +set to false, VerifyDnsLength set to beStrict, and IgnoreInvalidPunycode +set to false. [[!UTS46]] +
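Reviewer's sketch, not part of the patch: the refactored control flow of domain to ASCII can be modelled roughly as below. The Python standard library has no UTS46 implementation, so `to_ascii` is a hypothetical callable standing in for Unicode ToASCII. `FAILURE`, `uts46_flags`, and the `errors` list are likewise names invented for this illustration. The forbidden domain code point check is left out for brevity.

```python
from typing import Callable, Dict, List, Optional

FAILURE = None  # hypothetical sentinel standing in for the spec's "failure"

# Hypothetical stand-in for Unicode ToASCII: takes the domain and the flag
# set, returns the ASCII form or FAILURE.
ToAscii = Callable[[str, Dict[str, bool]], Optional[str]]


def uts46_flags(be_strict: bool) -> Dict[str, bool]:
    """Flags that domain parser ToASCII hands to Unicode ToASCII."""
    return {
        "CheckHyphens": be_strict,
        "CheckBidi": True,
        "CheckJoiners": True,
        "UseSTD3ASCIIRules": be_strict,
        "Transitional_Processing": False,
        "VerifyDnsLength": be_strict,
        "IgnoreInvalidPunycode": False,
    }


def domain_to_ascii(
    domain: str, be_strict: bool, to_ascii: ToAscii, errors: List[str]
) -> Optional[str]:
    # The strict pass always runs, so the validation error is reported even
    # when the relaxed path later succeeds.
    strict_result = to_ascii(domain, uts46_flags(True))
    if strict_result is FAILURE:
        errors.append("domain-to-ASCII")  # validation error only, no return
    if be_strict:
        return strict_result  # FAILURE propagates as failure
    if domain.isascii():
        # Web-compatibility path: ASCII input is only lowercased.
        result = domain.lower()
    else:
        result = to_ascii(domain, uts46_flags(False))
        if result is FAILURE:
            return FAILURE
    if result == "":
        return FAILURE
    # The forbidden domain code point check would go here (omitted).
    return result
```

With a stub `to_ascii` that rejects `_` only when UseSTD3ASCIIRules is true, a non-strict call on `a_b.example` returns the lowercased input and records one validation error, while a strict call returns failure.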

    +

    The domain to Unicode algorithm, given a domain domain, runs these steps:
@@ -1072,6 +1070,9 @@ false), and then runs these steps. They return failure or a host.

  10. Assert: input is not the empty string. +

  11. If input contains a percent-encoded byte, + domain-percent-encoded validation error. +

  12. Let domain be the result of running UTF-8 decode without BOM on the percent-decoding of input.
@@ -1677,10 +1678,15 @@ unified model would be, please file an issue.
✅ file:///C:/ - file://loc%61lhost/ + file://localhost/ file:/// + + file://loc%61lhost/ + + ❌ + file:/// https://user:password@example.org/
@@ -2014,7 +2020,8 @@ an absolute-URL string, optionally followed by U+0023 (#) and a URL-fr
special scheme and not an ASCII case-insensitive match for "file", followed by U+003A (:) and a scheme-relative-special-URL string

  13. a URL-scheme string that is not an ASCII case-insensitive match for a - special scheme, followed by U+003A (:) and a relative-URL string + special scheme, followed by U+003A (:) and one of: a scheme-relative-URL string, a + path-absolute-URL string, or zero or more URL units

  14. a URL-scheme string that is an ASCII case-insensitive match for "file", followed by U+003A (:) and a scheme-relative-file-URL string
@@ -2970,6 +2977,8 @@ and then runs these steps:

    Otherwise, if c is U+0020 SPACE:

      +
    1. Invalid-URL-unit validation error. +

    2. If remaining starts with U+003F (?) or U+0023 (#), then append "%20" to url's path.
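Reviewer's sketch, not part of the patch: the new host-parser step that reports domain-percent-encoded needs to detect a percent-encoded byte, that is, U+0025 (%) followed by two ASCII hex digits. A minimal way to express that check, with the function name `contains_percent_encoded_byte` chosen purely for illustration:

```python
import re

# A percent-encoded byte is U+0025 (%) followed by two ASCII hex digits.
_PERCENT_ENCODED_BYTE = re.compile(r"%[0-9A-Fa-f]{2}")


def contains_percent_encoded_byte(host_input: str) -> bool:
    """Return True if the host input would trigger domain-percent-encoded."""
    return _PERCENT_ENCODED_BYTE.search(host_input) is not None
```

For the example host in the new table row, `contains_percent_encoded_byte("exam%70le.org")` is true, whereas a bare `%` not followed by two hex digits does not match.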