diff --git a/url.bs b/url.bs
index 5058807d..e6f65bcc 100644
--- a/url.bs
+++ b/url.bs
@@ -107,25 +107,30 @@ valid input. User agents, especially conformance checkers, are encouraged to rep
Unicode ToASCII records an error or returns the empty string.
- [[UTS46]]
+ Unicode ToASCII records an error when CheckHyphens,
+ UseSTD3ASCIIRules, and VerifyDnsLength are all set to true. [[UTS46]]
If details about Unicode ToASCII errors are recorded, user agents are encouraged to pass those along.
- The input's host contains a forbidden domain code point.
Hosts are percent-decoded before being processed when the URL
is special, which would result in the following host portion becoming
"exa#mple.org" and thus triggering this error.
"https://exa%23mple.org"
The input's host to be processed as a domain contains a
+ percent-encoded byte.
+ "https://exam%70le.org"
+
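The two host examples in the table above can be checked with a minimal Python sketch. The helper names `contains_percent_encoded_byte` and `percent_decode_host` are hypothetical, and the sketch assumes a percent-encoded byte is U+0025 (%) followed by two ASCII hex digits; it is an illustration, not the specification's algorithm.

```python
import re
from urllib.parse import unquote

# Assumption: a percent-encoded byte is "%" followed by two ASCII hex digits.
_PERCENT_ENCODED_BYTE = re.compile(r"%[0-9A-Fa-f]{2}")


def contains_percent_encoded_byte(host: str) -> bool:
    """Return True if host contains at least one percent-encoded byte."""
    return _PERCENT_ENCODED_BYTE.search(host) is not None


def percent_decode_host(host: str) -> str:
    """Percent-decode host and read the resulting bytes as UTF-8 (illustrative only)."""
    return unquote(host, encoding="utf-8", errors="replace")


for host in ("exam%70le.org", "exa%23mple.org", "example.org"):
    print(host, contains_percent_encoded_byte(host), percent_decode_host(host))
```

Under these assumptions, "exam%70le.org" decodes to "example.org", while "exa%23mple.org" decodes to "exa#mple.org", which then contains a forbidden domain code point.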
Let strictResult be the result of running domain parser ToASCII with
+ domain and true.
+
+ If strictResult is a failure value, domain-to-ASCII
+ validation error. This step does not return.
+ If beStrict is true:
Let result be the result of running
- Unicode ToASCII with domain_name set to domain,
- CheckHyphens set to true, CheckBidi set to true, CheckJoiners set to true,
- UseSTD3ASCIIRules set to true, Transitional_Processing set to false,
- VerifyDnsLength set to true, and IgnoreInvalidPunycode set to false. [[!UTS46]]
-
- If result is a failure value, domain-to-ASCII validation error,
- return failure.
+ If strictResult is a failure value, then return failure.
- Return result.
+ Return strictResult.
Let result be null.
If domain is an ASCII string:
-
- If running Unicode ToASCII with domain_name set to
- domain, CheckHyphens set to false, CheckBidi set to true,
- CheckJoiners set to true, UseSTD3ASCIIRules set to false,
- Transitional_Processing set to false, VerifyDnsLength set to false, and
- IgnoreInvalidPunycode set to false is a failure value, domain-to-ASCII
- validation error. [[!UTS46]]
-
- Set result to domain, lowercased.
-
If domain is an ASCII string, then set result to
+ domain, lowercased.
When beStrict is false and domain is an ASCII string,
- Unicode ToASCII failures only result in validation errors
- (instead of failing the whole algorithm) due to web compatibility. IgnoreInvalidPunycode
- is not sufficient on its own, as Punycode can decode successfully yet still fail validity
- criteria. E.g., xn--8i7caa decodes to www, whose code points have
- status "mapped". [[UTS46]]
+ the algorithm returns domain lowercased regardless of
+ Unicode ToASCII's outcome, due to web compatibility.
+ IgnoreInvalidPunycode is not sufficient on its own, as Punycode can decode successfully
+ yet still fail validity criteria. E.g., xn--8i7caa decodes to www,
+ whose code points have status "mapped". [[UTS46]]
Otherwise:
Set result to the result of running
- Unicode ToASCII with domain_name set to domain,
- CheckHyphens set to false, CheckBidi set to true, CheckJoiners set to true,
- UseSTD3ASCIIRules set to false, Transitional_Processing set to false,
- VerifyDnsLength set to false, and IgnoreInvalidPunycode set to false. [[!UTS46]]
+ Set result to the result of running domain parser ToASCII with
+ domain and false.
- If result is a failure value, domain-to-ASCII validation error,
- return failure.
+ If result is a failure value, then return failure.
If result is the empty string, domain-to-ASCII validation error,
- return failure.
+ If result is the empty string, then return failure.
If result contains a forbidden domain code point,
- domain-invalid-code-point validation error, return failure.
+ If result contains a forbidden domain code point, then return failure.
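The post-change steps above can be read as the following Python sketch. It is a reading aid under stated assumptions, not the specification's text: `to_ascii` is a hypothetical stand-in for domain parser ToASCII that returns `None` on failure, `report` is a hypothetical validation-error callback, the forbidden code point set is written from the URL Standard's definition as recalled here, and the final `return result` is assumed because the hunk does not show it.

```python
from typing import Callable, Optional

# Assumption: forbidden domain code points are the forbidden host code points,
# the C0 controls, U+0025 (%), and U+007F DELETE.
FORBIDDEN_DOMAIN_CODE_POINTS = (
    {chr(c) for c in range(0x00, 0x20)}
    | set(" #/:<>?@[\\]^|")
    | {"%", "\x7f"}
)

# Hypothetical signature for "domain parser ToASCII": (domain, beStrict) -> str or None.
ToAscii = Callable[[str, bool], Optional[str]]


def domain_to_ascii(
    domain: str,
    be_strict: bool,
    to_ascii: ToAscii,
    report: Callable[[str], None] = lambda name: None,
) -> Optional[str]:
    """Return the ASCII domain, or None for failure."""
    strict_result = to_ascii(domain, True)
    if strict_result is None:
        report("domain-to-ASCII")  # validation error only; this step does not return
    if be_strict:
        return strict_result  # None here means failure
    if domain.isascii():
        # Web compatibility: ASCII input is lowercased regardless of ToASCII's outcome.
        result = domain.lower()
    else:
        result = to_ascii(domain, False)
        if result is None:
            return None
    if result == "":
        return None
    if any(ch in FORBIDDEN_DOMAIN_CODE_POINTS for ch in result):
        return None
    return result  # assumed final step, not shown in the hunk
```

Passing the ToASCII implementation in as a parameter keeps the sketch independent of any particular UTS46 library.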
Due to web compatibility and compatibility with non-DNS-based systems the
forbidden domain code points are a subset of those disallowed when UseSTD3ASCIIRules
@@ -979,6 +967,16 @@ steps. They return failure or a domain.
☕.example becomes xn--53h.example and not failure. [[UTS46]] [[RFC5890]]
+
The domain parser ToASCII algorithm, given a scalar value string
+domain and a boolean beStrict, returns the result of running
+Unicode ToASCII with domain_name set to domain,
+CheckHyphens set to beStrict, CheckBidi set to true, CheckJoiners
+set to true, UseSTD3ASCIIRules set to beStrict, Transitional_Processing
+set to false, VerifyDnsLength set to beStrict, and IgnoreInvalidPunycode
+set to false. [[!UTS46]]
+
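As a quick illustration of the definition above, the flag assignment can be tabulated in Python. The function name `domain_parser_to_ascii_flags` is hypothetical; the sketch only builds the parameter set and does not call any UTS46 implementation.

```python
def domain_parser_to_ascii_flags(be_strict: bool) -> dict:
    """Map beStrict to the Unicode ToASCII parameters named in the definition."""
    return {
        "CheckHyphens": be_strict,
        "CheckBidi": True,
        "CheckJoiners": True,
        "UseSTD3ASCIIRules": be_strict,
        "Transitional_Processing": False,
        "VerifyDnsLength": be_strict,
        "IgnoreInvalidPunycode": False,
    }


for strict in (True, False):
    print(strict, domain_parser_to_ascii_flags(strict))
```

Only CheckHyphens, UseSTD3ASCIIRules, and VerifyDnsLength vary with beStrict; the other four parameters are fixed.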
The domain to Unicode algorithm, given a domain domain, runs these steps:
@@ -1072,6 +1070,9 @@ false), and then runs these steps. They return failure or a host.
Assert: input is not the empty string.
+ If input contains a percent-encoded byte,
+ domain-percent-encoded validation error.
+
Let domain be the result of running UTF-8 decode without BOM on the percent-decoding of input.
@@ -1677,10 +1678,15 @@ unified model would be, please file an issue.
file:///C:/
file://loc%61lhost/
+ file://localhost/
file:///
+ file://loc%61lhost/
+ file:///
https://user:password@example.org/
"file",
followed by U+003A (:) and a scheme-relative-special-URL string
a URL-scheme string that is not an ASCII case-insensitive match for a
- special scheme, followed by U+003A (:) and a relative-URL string
+ special scheme, followed by U+003A (:) and one of: a scheme-relative-URL string, a
+ path-absolute-URL string, or zero or more URL units
a URL-scheme string that is an ASCII case-insensitive match for
"file", followed by U+003A (:) and a scheme-relative-file-URL string
@@ -2970,6 +2977,8 @@ and then runs these steps:
Otherwise, if c is U+0020 SPACE: