diff --git a/url.bs b/url.bs
index c69d2349..b2834297 100644
--- a/url.bs
+++ b/url.bs
@@ -111,7 +111,7 @@ valid input. User agents, especially conformance checkers, are encouraged to rep
[[UTS46]]

If details about Unicode ToASCII errors are recorded, user agents are encouraged to pass those along. - Yes + Yes
(unless domain is an ASCII string) domain-invalid-code-point
@@ -123,12 +123,6 @@ valid input. User agents, especially conformance checkers, are encouraged to rep

"https://exa%23mple.org" Yes - - domain-to-Unicode - -

Unicode ToUnicode records an error. [[UTS46]] -

The same considerations as with domain-to-ASCII apply. - · Host parsing
@@ -913,43 +907,68 @@ concepts.

  1. -

    Let result be the result of running Unicode ToASCII - with domain_name set to domain, CheckHyphens set to beStrict, - CheckBidi set to true, CheckJoiners set to true, UseSTD3ASCIIRules set to - beStrict, Transitional_Processing set to false, VerifyDnsLength set to - beStrict, and IgnoreInvalidPunycode set to false. [[!UTS46]] - -

    If beStrict is false, domain is an ASCII string, and - strictly splitting domain on U+002E (.) does not produce any - item that starts with an ASCII case-insensitive match for - "xn--", this step is equivalent to ASCII lowercasing domain. - -

  2. If result is a failure value, domain-to-ASCII validation error, - return failure. +

    If beStrict is true: + +

      +
    1. Let result be the result of running + Unicode ToASCII with domain_name set to domain, + CheckHyphens set to true, CheckBidi set to true, CheckJoiners set to true, + UseSTD3ASCIIRules set to true, Transitional_Processing set to false, + VerifyDnsLength set to true, and IgnoreInvalidPunycode set to false. [[!UTS46]] + +

    2. If result is a failure value, domain-to-ASCII validation error, + return failure. + +

    3. Return result. +

    + +
  3. Let result be null.

  4. -

    If beStrict is false: +

    If domain is an ASCII string:

      -
    1. If result is the empty string, domain-to-ASCII validation error, - return failure. +

    2. If the result of running Unicode ToASCII with domain_name set to + domain, CheckHyphens set to false, CheckBidi set to true, + CheckJoiners set to true, UseSTD3ASCIIRules set to false, + Transitional_Processing set to false, VerifyDnsLength set to false, and + IgnoreInvalidPunycode set to false is a failure value, domain-to-ASCII + validation error. [[!UTS46]] + +

    3. Set result to domain, lowercased. +

    -
  5. -

    If result contains a forbidden domain code point, - domain-invalid-code-point validation error, return failure. +

    When beStrict is false and domain is an ASCII string, + Unicode ToASCII failures only result in validation errors + (instead of failing the whole algorithm) due to web compatibility. Setting IgnoreInvalidPunycode + to true would not suffice, as a label's Punycode can decode successfully and still fail the validity + criteria. E.g., xn--8i7caa decodes to ｗｗｗ (three U+FF57 FULLWIDTH LATIN SMALL LETTER W), whose code points have + status "mapped" rather than "valid". [[UTS46]] + +

  6. +

    Otherwise: -

    Due to web compatibility and compatibility with non-DNS-based systems the - forbidden domain code points are a subset of those disallowed when - UseSTD3ASCIIRules is true. See also - issue #397. +

      +
    1. Set result to the result of running + Unicode ToASCII with domain_name set to domain, + CheckHyphens set to false, CheckBidi set to true, CheckJoiners set to true, + UseSTD3ASCIIRules set to false, Transitional_Processing set to false, + VerifyDnsLength set to false, and IgnoreInvalidPunycode set to false. [[!UTS46]] + +

    2. If result is a failure value, domain-to-ASCII validation error, + return failure.

    +
  7. If result is the empty string, domain-to-ASCII validation error, + return failure. +

  8. -

    Assert: result is not the empty string and does not contain a - forbidden domain code point. +

    If result contains a forbidden domain code point, + domain-invalid-code-point validation error, return failure. -

    Unicode IDNA Compatibility Processing guarantees this holds when - beStrict is true. [[UTS46]] +

    Due to web compatibility and compatibility with non-DNS-based systems, the + forbidden domain code points are a subset of those disallowed when UseSTD3ASCIIRules + is true. See also issue #397.

  9. Return result.
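The revised domain to ASCII steps can be sketched in Python as follows. This is a minimal sketch, not a conforming implementation. It assumes a hypothetical caller-supplied `to_ascii` callable standing in for UTS #46 ToASCII that returns `None` for a failure value. Validation errors are modeled by appending names to a list, and the forbidden domain code point set is restated from the URL Standard's definition as we understand it (it is not part of this diff).

```python
from typing import Callable, List, Optional

# Hypothetical ToASCII signature: returns the converted domain, or None for a
# failure value. CheckBidi=true, CheckJoiners=true,
# Transitional_Processing=false and IgnoreInvalidPunycode=false are the same
# in every call in the steps above, so they are assumed fixed inside it.
ToASCII = Callable[..., Optional[str]]

# Forbidden host code points, restated from the URL Standard (not this diff).
FORBIDDEN_HOST_CODE_POINTS = set("\x00\t\n\r #/:<>?@[\\]^|")


def is_forbidden_domain_code_point(c: str) -> bool:
    # A forbidden host code point, a C0 control, U+0025 (%), or U+007F DELETE.
    return c in FORBIDDEN_HOST_CODE_POINTS or ord(c) <= 0x1F or c in "%\x7f"


def domain_to_ascii(domain: str, be_strict: bool, to_ascii: ToASCII,
                    errors: List[str]) -> Optional[str]:
    """Return the ASCII domain, or None for failure.

    Validation errors are appended to `errors`; on their own they never
    stop processing.
    """
    if be_strict:
        result = to_ascii(domain, check_hyphens=True,
                          use_std3_ascii_rules=True, verify_dns_length=True)
        if result is None:
            errors.append("domain-to-ASCII")
            return None
        return result  # the strict branch returns directly, per the diff

    lenient = dict(check_hyphens=False, use_std3_ascii_rules=False,
                   verify_dns_length=False)
    if domain.isascii():
        # ToASCII still runs, but its failure is only a validation error.
        if to_ascii(domain, **lenient) is None:
            errors.append("domain-to-ASCII")
        result = domain.lower()  # ASCII lowercase, since domain is ASCII
    else:
        result = to_ascii(domain, **lenient)
        if result is None:
            errors.append("domain-to-ASCII")
            return None

    if result == "":
        errors.append("domain-to-ASCII")
        return None
    if any(is_forbidden_domain_code_point(c) for c in result):
        errors.append("domain-invalid-code-point")
        return None
    return result
```

With a stub `to_ascii` that fails on any domain containing xn--8i7caa, `domain_to_ascii("xn--8i7caa.example", False, stub, errs)` returns `"xn--8i7caa.example"` and leaves a single `"domain-to-ASCII"` entry in `errs`, whereas the strict call returns `None`. ASCII input is thus reported but not rejected, which is the web-compatibility trade-off the note on ASCII domains describes.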

@@ -970,8 +989,15 @@ concepts.
set to true, UseSTD3ASCIIRules set to beStrict, Transitional_Processing set to false, and IgnoreInvalidPunycode set to false. [[!UTS46]] -
  • Signify domain-to-Unicode validation errors for any returned errors, and then, - return result. +

  • +

    If an error was recorded, then return domain. + +

    Because domain can only have been produced by the host parser, any recorded + errors will already have been signified as validation errors during host parsing. Returning domain + ensures that domain to ASCII and domain to Unicode round-trip on input such as + xn--8i7caa. + +
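A toy Python sketch of the proposed domain to Unicode behavior follows. The function `toy_to_unicode` is hypothetical and is not UTS #46 ToUnicode. It only decodes `xn--` labels with Python's standard `punycode` codec, and it records an error whenever NFKC normalization plus case folding changes a decoded label. That check is a rough proxy for the label's code points not all having status "valid". The sketch illustrates the round-trip: xn--8i7caa decodes to ｗｗｗ (three U+FF57 code points), an error is recorded, and domain is returned unchanged.

```python
import unicodedata
from typing import List, Tuple


def toy_to_unicode(domain: str) -> Tuple[str, bool]:
    """Toy stand-in for UTS #46 ToUnicode (hypothetical; NOT the real algorithm).

    Decodes "xn--" labels with the stdlib punycode codec and records an error
    when NFKC plus case folding changes a decoded label, a rough proxy for
    its code points not all having status "valid".
    Returns (result, error_recorded).
    """
    labels: List[str] = []
    error = False
    for label in domain.split("."):
        if label.lower().startswith("xn--"):
            try:
                label = label[4:].encode("ascii").decode("punycode")
            except UnicodeError:
                error = True  # undecodable Punycode: keep the label as-is
            else:
                if unicodedata.normalize("NFKC", label.casefold()) != label:
                    error = True
        labels.append(label)
    return ".".join(labels), error


def domain_to_unicode(domain: str) -> str:
    result, error = toy_to_unicode(domain)
    # The proposed step: if an error was recorded, return domain unchanged.
    # Its validation errors were already signified when the host was parsed.
    return domain if error else result
```

With this toy, `domain_to_unicode("xn--mnchen-3ya.de")` gives `"münchen.de"`, while `domain_to_unicode("xn--8i7caa.example")` returns its input unchanged, so feeding that output back through domain to ASCII yields the same string.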

  • Return result.
@@ -4167,6 +4193,7 @@ Ian Hickson,
 Ilya Grigorik,
 Italo A. Casas,
 Jakub Gieryluk,
+James C. Wise,
 James Graham,
 James Manger,
 James Ross,