From 270cd1ffd1f34ce30468d00c154ff516dbd0aa9d Mon Sep 17 00:00:00 2001
From: Anne van Kesteren
Date: Mon, 8 Jun 2026 12:01:16 -0700
Subject: [PATCH 1/6] Add Unicode ToASCII fallback for ASCII domains

Unfortunately implementing Unicode ToASCII ends up rejecting websites such as

- http://xn--72czcrhaj7cpt0ed1dxb4mb1s1.blogspot.com/2018/11/blog-post_60.html
- https://xn--8i7caa.famitei.net/sekou/all
- https://xn--board-ngr.palungjit.org/members/

which is not really acceptable. So when ToASCII fails, domain is all ASCII, and there are no forbidden domain code points, we just return the lowercased domain. If an implementation does not surface validation errors this is indistinguishable from having an ASCII fast path.

Tests: https://github.com/web-platform-tests/wpt/pull/60476
---
 url.bs | 36 ++++++++++++++++++++++--------------
 1 file changed, 22 insertions(+), 14 deletions(-)

diff --git a/url.bs b/url.bs
index c69d2349..222a87f7 100644
--- a/url.bs
+++ b/url.bs
@@ -111,7 +111,7 @@ valid input. User agents, especially conformance checkers, are encouraged to rep
[[UTS46]]

If details about Unicode ToASCII errors are recorded, user agents are encouraged to pass those along. - Yes + Yes
(unless domain is an ASCII string) domain-invalid-code-point
@@ -912,20 +912,24 @@ concepts.
domain and a boolean beStrict, runs these steps:

    +
  1. Let result be the result of running + Unicode ToASCII with domain_name set to domain, + CheckHyphens set to beStrict, CheckBidi set to true, CheckJoiners + set to true, UseSTD3ASCIIRules set to beStrict, Transitional_Processing + set to false, VerifyDnsLength set to beStrict, and IgnoreInvalidPunycode + set to false. [[!UTS46]] +

  2. -

    Let result be the result of running Unicode ToASCII - with domain_name set to domain, CheckHyphens set to beStrict, - CheckBidi set to true, CheckJoiners set to true, UseSTD3ASCIIRules set to - beStrict, Transitional_Processing set to false, VerifyDnsLength set to - beStrict, and IgnoreInvalidPunycode set to false. [[!UTS46]] - -

    If beStrict is false, domain is an ASCII string, and - strictly splitting domain on U+002E (.) does not produce any - item that starts with an ASCII case-insensitive match for - "xn--", this step is equivalent to ASCII lowercasing domain. - -

  3. If result is a failure value, domain-to-ASCII validation error, - return failure. +

    If result is a failure value: + +

      +
    1. domain-to-ASCII validation error. + +

    2. If beStrict is false and domain is an ASCII string, then set + result to domain, lowercased. + +

    3. Otherwise, return failure. +

4. If beStrict is false:
@@ -954,6 +958,10 @@ concepts.

  5. Return result.

+

If beStrict is false and domain is an ASCII string, this +algorithm is effectively ASCII lowercasing domain and +checking the result for forbidden domain code points. This is done for web compatibility. +

This document and the web platform at large use Unicode IDNA Compatibility Processing and not IDNA2008. For instance, ☕.example becomes xn--53h.example and not failure. [[UTS46]] [[RFC5890]]

From d906311aa6fbe62cd69fb95dab507fc71ac86de7 Mon Sep 17 00:00:00 2001
From: Anne van Kesteren
Date: Wed, 10 Jun 2026 06:04:04 -0700
Subject: [PATCH 2/6] go with Domenic's model which is nicer

---
 url.bs | 41 ++++++++++++++++++++++++++---------------
 1 file changed, 26 insertions(+), 15 deletions(-)

diff --git a/url.bs b/url.bs
index 222a87f7..7bb63791 100644
--- a/url.bs
+++ b/url.bs
@@ -912,23 +912,38 @@ concepts.
domain and a boolean beStrict, runs these steps:

    -
  1. Let result be the result of running - Unicode ToASCII with domain_name set to domain, - CheckHyphens set to beStrict, CheckBidi set to true, CheckJoiners - set to true, UseSTD3ASCIIRules set to beStrict, Transitional_Processing - set to false, VerifyDnsLength set to beStrict, and IgnoreInvalidPunycode - set to false. [[!UTS46]] +

  2. Let result be null.

  3. -

    If result is a failure value: +

    If beStrict is false and domain is an ASCII string:

      -
    1. domain-to-ASCII validation error. +

    2. If running Unicode ToASCII with domain_name set to + domain, CheckHyphens set to false, CheckBidi set to true, + CheckJoiners set to true, UseSTD3ASCIIRules set to false, + Transitional_Processing set to false, VerifyDnsLength set to false, and + IgnoreInvalidPunycode set to false is a failure value, domain-to-ASCII + validation error. [[!UTS46]] + +

    3. Set result to domain, lowercased. +

    + +

    Due to web compatibility Unicode ToASCII only records + validation errors when domain is an ASCII string. -

  4. If beStrict is false and domain is an ASCII string, then set - result to domain, lowercased. +

  5. +

    Otherwise: -

  6. Otherwise, return failure. +

      +
    1. Set result to the result of running + Unicode ToASCII with domain_name set to domain, + CheckHyphens set to beStrict, CheckBidi set to true, + CheckJoiners set to true, UseSTD3ASCIIRules set to beStrict, + Transitional_Processing set to false, VerifyDnsLength set to beStrict, + and IgnoreInvalidPunycode set to false. [[!UTS46]] + +

    2. If result is a failure value, domain-to-ASCII validation error, + return failure.

7.
@@ -958,10 +973,6 @@ concepts.
  8. Return result.

-

If beStrict is false and domain is an ASCII string, this -algorithm is effectively ASCII lowercasing domain and -checking the result for forbidden domain code points. This is done for web compatibility. -

This document and the web platform at large use Unicode IDNA Compatibility Processing and not IDNA2008. For instance, ☕.example becomes xn--53h.example and not failure. [[UTS46]] [[RFC5890]]

From 969eecf29e228a7ff6a8ede5e48fa06384e6609f Mon Sep 17 00:00:00 2001
From: Anne van Kesteren
Date: Mon, 15 Jun 2026 13:05:42 +0200
Subject: [PATCH 3/6] handle beStrict=true first

---
 url.bs | 57 ++++++++++++++++++++++++++++++---------------------
 1 file changed, 30 insertions(+), 27 deletions(-)

diff --git a/url.bs b/url.bs
index 7bb63791..fba7b513 100644
--- a/url.bs
+++ b/url.bs
@@ -912,10 +912,26 @@ concepts.
domain and a boolean beStrict, runs these steps:

    +
  1. +

    If beStrict is true: + +

      +
    1. Let result be the result of running + Unicode ToASCII with domain_name set to domain, + CheckHyphens set to true, CheckBidi set to true, CheckJoiners set to true, + UseSTD3ASCIIRules set to true, Transitional_Processing set to false, + VerifyDnsLength set to true, and IgnoreInvalidPunycode set to false. [[!UTS46]] + +

    2. If result is a failure value, domain-to-ASCII validation error, + return failure. + +

    3. Return result. +

    +
  2. Let result be null.

  3. -

    If beStrict is false and domain is an ASCII string: +

    If domain is an ASCII string:

1. If running Unicode ToASCII with domain_name set to
@@ -928,8 +944,9 @@ concepts.

    2. Set result to domain, lowercased.

    -

    Due to web compatibility Unicode ToASCII only records - validation errors when domain is an ASCII string. +

    When beStrict is false and domain is an ASCII string, + Unicode ToASCII failures only result in validation errors + (instead of failing the whole algorithm) due to web compatibility.

4. Otherwise:
@@ -937,38 +954,24 @@ concepts.

    1. Set result to the result of running Unicode ToASCII with domain_name set to domain, - CheckHyphens set to beStrict, CheckBidi set to true, - CheckJoiners set to true, UseSTD3ASCIIRules set to beStrict, - Transitional_Processing set to false, VerifyDnsLength set to beStrict, - and IgnoreInvalidPunycode set to false. [[!UTS46]] + CheckHyphens set to false, CheckBidi set to true, CheckJoiners set to true, + UseSTD3ASCIIRules set to false, Transitional_Processing set to false, + VerifyDnsLength set to false, and IgnoreInvalidPunycode set to false. [[!UTS46]]

    2. If result is a failure value, domain-to-ASCII validation error, return failure.

    -
  5. -

    If beStrict is false: - -

      -
    1. If result is the empty string, domain-to-ASCII validation error, - return failure. - -

    2. -

      If result contains a forbidden domain code point, - domain-invalid-code-point validation error, return failure. - -

      Due to web compatibility and compatibility with non-DNS-based systems the - forbidden domain code points are a subset of those disallowed when - UseSTD3ASCIIRules is true. See also - issue #397. -

    +
  6. If result is the empty string, domain-to-ASCII validation error, + return failure.

  7. -

    Assert: result is not the empty string and does not contain a - forbidden domain code point. +

    If result contains a forbidden domain code point, + domain-invalid-code-point validation error, return failure. -

    Unicode IDNA Compatibility Processing guarantees this holds when - beStrict is true. [[UTS46]] +

    Due to web compatibility and compatibility with non-DNS-based systems the + forbidden domain code points are a subset of those disallowed when UseSTD3ASCIIRules + is true. See also issue #397.

  8. Return result.
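As of this patch, domain to ASCII can be illustrated with the minimal Python sketch below. It is only an illustration, not reference code from the standard, and it rests on several assumptions. `to_ascii` is a hypothetical, caller-supplied UTS 46 ToASCII function that returns `None` on failure; its `strict` flag stands for the beStrict=true option set. Validation errors are simply appended to a list, and `None` stands in for failure. The forbidden domain code point set is written out by hand.

```python
from typing import Callable, List, Optional

# Hypothetical interface: a UTS 46 ToASCII implementation supplied by the caller.
# to_ascii(domain, strict) returns the ASCII result, or None on failure.
# strict=True stands for CheckHyphens, UseSTD3ASCIIRules and VerifyDnsLength all true.
ToAscii = Callable[[str, bool], Optional[str]]

# Forbidden host code points.
FORBIDDEN_HOST = set("\x00\t\n\r #/:<>?@[\\]^|")
# Forbidden domain code points add the C0 controls, U+0025 (%) and U+007F DELETE.
FORBIDDEN_DOMAIN = FORBIDDEN_HOST | {chr(cp) for cp in range(0x20)} | {"%", "\x7f"}


def domain_to_ascii(domain: str, be_strict: bool, to_ascii: ToAscii,
                    errors: List[str]) -> Optional[str]:
    if be_strict:
        # Strict mode is plain ToASCII: any failure fails the whole algorithm.
        result = to_ascii(domain, True)
        if result is None:
            errors.append("domain-to-ASCII")
        return result

    if domain.isascii():
        # ToASCII still runs, but its failure is only a validation error.
        if to_ascii(domain, False) is None:
            errors.append("domain-to-ASCII")
        result = domain.lower()  # Equals ASCII lowercase because domain is ASCII.
    else:
        # Non-ASCII input still fails hard when ToASCII fails.
        result = to_ascii(domain, False)
        if result is None:
            errors.append("domain-to-ASCII")
            return None

    if result == "":
        errors.append("domain-to-ASCII")
        return None
    if any(c in FORBIDDEN_DOMAIN for c in result):
        errors.append("domain-invalid-code-point")
        return None
    return result


def toy_to_ascii(domain: str, strict: bool) -> Optional[str]:
    # Toy stand-in, NOT UTS 46: it only rejects the label "xn--8i7caa".
    labels = domain.lower().split(".")
    return None if "xn--8i7caa" in labels else domain.lower()


errors: List[str] = []
print(domain_to_ascii("XN--8I7CAA.famitei.net", False, toy_to_ascii, errors))
# xn--8i7caa.famitei.net, with errors == ["domain-to-ASCII"]
```

The `to_ascii` function is passed in rather than imported because Python's built-in `idna` codec implements IDNA 2003, not UTS 46. A real implementation would substitute a conforming ToASCII.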

From 9b998f7e8003908aa15f34ebdbe60bab04f113fd Mon Sep 17 00:00:00 2001
From: Anne van Kesteren
Date: Wed, 17 Jun 2026 16:48:55 +0200
Subject: [PATCH 4/6] address a comment from Henri

---
 url.bs | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/url.bs b/url.bs
index fba7b513..17dee401 100644
--- a/url.bs
+++ b/url.bs
@@ -946,7 +946,10 @@ concepts.

When beStrict is false and domain is an ASCII string, Unicode ToASCII failures only result in validation errors - (instead of failing the whole algorithm) due to web compatibility. + (instead of failing the whole algorithm) due to web compatibility. IgnoreInvalidPunycode + is not sufficient on its own, as Punycode can decode successfully yet still fail validity + criteria. E.g., xn--8i7caa decodes to ｗｗｗ (three U+FF57 FULLWIDTH LATIN SMALL LETTER W), whose code points have status "mapped". [[UTS46]]
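The Punycode point in this note can be checked with a short Python sketch. It uses the standard library's `punycode` codec, which implements only RFC 3492 decoding with no IDNA validity checks. It also uses NFKC normalization as a stand-in for the UTS 46 mapping step. That substitution is an assumption, although the two agree for fullwidth Latin letters.

```python
import unicodedata

label = "xn--8i7caa"
# Strip the ACE prefix and run plain RFC 3492 Punycode decoding.
decoded = label[len("xn--"):].encode("ascii").decode("punycode")
print([f"U+{ord(c):04X}" for c in decoded])  # ['U+FF57', 'U+FF57', 'U+FF57']

# NFKC stands in for the UTS 46 mapping here. A decoded label that changes
# under mapping contains "mapped" code points, so it fails the validity criteria.
mapped = unicodedata.normalize("NFKC", decoded)
print(mapped)             # www
print(mapped == decoded)  # False
```

Decoding succeeds, so IgnoreInvalidPunycode never comes into play. The failure comes later, from the validity check on the decoded label.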

• Otherwise:

From 8556a69731545039f75a510ab7a5915101ed66a6 Mon Sep 17 00:00:00 2001
From: Anne van Kesteren
Date: Sun, 21 Jun 2026 14:42:06 +0200
Subject: [PATCH 5/6] fix domain to Unicode as well

---
 url.bs | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/url.bs b/url.bs
index 17dee401..34b04223 100644
--- a/url.bs
+++ b/url.bs
@@ -123,12 +123,6 @@ valid input. User agents, especially conformance checkers, are encouraged to rep

    "https://exa%23mple.org" Yes - - domain-to-Unicode - -

    Unicode ToUnicode records an error. [[UTS46]] -

    The same considerations as with domain-to-ASCII apply. - · Host parsing @@ -995,8 +989,15 @@ concepts. set to true, UseSTD3ASCIIRules set to beStrict, Transitional_Processing set to false, and IgnoreInvalidPunycode set to false. [[!UTS46]] -

  • Signify domain-to-Unicode validation errors for any returned errors, and then, - return result. +

  • +

    If an error was recorded, then return domain. + +

    Because domain can only result from the host parser, any recorded + errors will already have been signified as validation errors. Returning domain + as-is therefore is sound and ensures domain to ASCII and domain to Unicode roundtrip + on input such as xn--8i7caa. + +

• Return result.
@@ -4192,6 +4193,7 @@ Ian Hickson,
 Ilya Grigorik,
 Italo A. Casas,
 Jakub Gieryluk,
+James C. Wise,
 James Graham,
 James Manger,
 James Ross,

From e1a97e5801e2f630a66497966d7e37ba1f394100 Mon Sep 17 00:00:00 2001
From: Anne van Kesteren
Date: Sun, 21 Jun 2026 15:12:06 +0200
Subject: [PATCH 6/6] nit

---
 url.bs | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/url.bs b/url.bs
index 34b04223..b2834297 100644
--- a/url.bs
+++ b/url.bs
@@ -994,8 +994,8 @@ concepts.

    Because domain can only result from the host parser, any recorded errors will already have been signified as validation errors. Returning domain - as-is therefore is sound and ensures domain to ASCII and domain to Unicode roundtrip - on input such as xn--8i7caa. + ensures domain to ASCII and domain to Unicode roundtrip on input such as + xn--8i7caa.

  • Return result.
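The revised domain to Unicode steps reduce to a small fallback, shown here as a minimal Python sketch under stated assumptions. `to_unicode` is a hypothetical, caller-supplied UTS 46 ToUnicode function that returns the converted string plus a flag saying whether an error was recorded. `toy_to_unicode` is a toy that only pretends to handle the `xn--8i7caa` case and does not implement UTS 46.

```python
from typing import Callable, Tuple

# Hypothetical interface: a UTS 46 ToUnicode implementation supplied by the caller.
# It always returns a string, plus whether any error was recorded.
ToUnicode = Callable[[str], Tuple[str, bool]]


def domain_to_unicode(domain: str, to_unicode: ToUnicode) -> str:
    result, had_error = to_unicode(domain)
    if had_error:
        # domain came from the host parser, which already signified any such
        # error, so fall back to the input rather than a result that would
        # not convert back to the same ASCII domain.
        return domain
    return result


def toy_to_unicode(domain: str) -> Tuple[str, bool]:
    # Toy stand-in, NOT UTS 46: pretends "xn--8i7caa" decodes to fullwidth
    # "www" with a recorded validity error, and passes everything else through.
    if domain == "xn--8i7caa":
        return "\uff57" * 3, True
    return domain, False


print(domain_to_unicode("xn--8i7caa", toy_to_unicode))  # xn--8i7caa
```

Returning the input unchanged means that applying domain to ASCII afterwards only ASCII-lowercases it, so the pair round-trips on such labels.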