103 changes: 56 additions & 47 deletions url.bs
@@ -107,25 +107,30 @@ valid input. User agents, especially conformance checkers, are encouraged to rep
<tr>
<td><dfn id=validation-error-domain-to-ascii>domain-to-ASCII</dfn>
<td>
<p><a abstract-op lt=ToASCII>Unicode ToASCII</a> records an error or returns the empty string.
[[UTS46]]
<p><a abstract-op lt=ToASCII>Unicode ToASCII</a> records an error when <i>CheckHyphens</i>,
<i>UseSTD3ASCIIRules</i>, and <i>VerifyDnsLength</i> are all set to true. [[UTS46]]
<p class=note>If details about <a abstract-op lt=ToASCII>Unicode ToASCII</a> errors are
recorded, user agents are encouraged to pass those along.
<td class=yes>Yes<br>(unless <var>domain</var> is an <a>ASCII string</a>)
<tr>
<td><dfn>domain-invalid-code-point</dfn>
<td>
<p>The input's <a for=/>host</a> contains a <a>forbidden domain code point</a>.
<div class=example id=example-domain-invalid-code-point>
<p>Hosts are <a for=string>percent-decoded</a> before being processed when the URL
<a>is special</a>, which would result in the following host portion becoming
"<code>exa#mple.org</code>" and thus triggering this error.
<p>"<code>https://exa%23mple.org</code>"
</div>
<td class=yes>Yes
<td class=yes>Yes<br>(when <var ignore>beStrict</var> is true, or <var ignore>domain</var> is
not an <a>ASCII string</a> and <a abstract-op lt=ToASCII>Unicode ToASCII</a> with relaxed
parameters also fails)
<tbody>
<tr>
<th colspan=3 scope=rowgroup><a href=#host-parsing>Host parsing</a>
<!-- domain host -->
<tr>
<td><dfn>domain-percent-encoded</dfn>
<td>
<p>The input's <a for=/>host</a> to be processed as a domain contains a
<a>percent-encoded byte</a>.
<p class=example id=example-domain-percent-encoded>"<code>https://exam%70le.org</code>"
<td class=no>·
<!-- opaque-host parser -->
<tr>
<td><dfn>host-invalid-code-point</dfn>
@@ -907,65 +912,48 @@ concepts.
steps. They return failure or a <a for=/>domain</a>.

<ol>
<li><p>Let <var>strictResult</var> be the result of running <a>domain parser ToASCII</a> with
<var>domain</var> and true.

<li><p>If <var>strictResult</var> is a failure value, <a>domain-to-ASCII</a>
<a>validation error</a>. <span class=note>This step does not return.</span>

<li>
<p>If <var>beStrict</var> is true:

<ol>
<li><p>Let <var>result</var> be the result of running
<a abstract-op lt=ToASCII>Unicode ToASCII</a> with <i>domain_name</i> set to <var>domain</var>,
<i>CheckHyphens</i> set to true, <i>CheckBidi</i> set to true, <i>CheckJoiners</i> set to true,
<i>UseSTD3ASCIIRules</i> set to true, <i>Transitional_Processing</i> set to false,
<i>VerifyDnsLength</i> set to true, and <i>IgnoreInvalidPunycode</i> set to false. [[!UTS46]]

<li><p>If <var>result</var> is a failure value, <a>domain-to-ASCII</a> <a>validation error</a>,
return failure.
<li><p>If <var>strictResult</var> is a failure value, then return failure.

<li><p>Return <var>result</var>.
<li><p>Return <var>strictResult</var>.
</ol>

<li><p>Let <var>result</var> be null.

<li>
<p>If <var>domain</var> is an <a>ASCII string</a>:

<ol>
<li><p>If running <a abstract-op lt=ToASCII>Unicode ToASCII</a> with <i>domain_name</i> set to
<var>domain</var>, <i>CheckHyphens</i> set to false, <i>CheckBidi</i> set to true,
<i>CheckJoiners</i> set to true, <i>UseSTD3ASCIIRules</i> set to false,
<i>Transitional_Processing</i> set to false, <i>VerifyDnsLength</i> set to false, and
<i>IgnoreInvalidPunycode</i> set to false is a failure value, <a>domain-to-ASCII</a>
<a>validation error</a>. [[!UTS46]]

<li><p>Set <var>result</var> to <var>domain</var>, <a lt="ASCII lowercase">lowercased</a>.
</ol>
<p>If <var>domain</var> is an <a>ASCII string</a>, then set <var>result</var> to
<var>domain</var>, <a lt="ASCII lowercase">lowercased</a>.

<p class=note>When <var>beStrict</var> is false and <var>domain</var> is an <a>ASCII string</a>,
<a abstract-op lt=ToASCII>Unicode ToASCII</a> failures only result in <a>validation errors</a>
(instead of failing the whole algorithm) due to web compatibility. <i>IgnoreInvalidPunycode</i>
is not sufficient on its own, as Punycode can decode successfully yet still fail validity
criteria. E.g., <code>xn--8i7caa</code> decodes to <code>www</code>, whose code points have
status "mapped". [[UTS46]]
the algorithm returns <var>domain</var> <a lt="ASCII lowercase">lowercased</a> regardless of
<a abstract-op lt=ToASCII>Unicode ToASCII</a>'s outcome, due to web compatibility.
<i>IgnoreInvalidPunycode</i> is not sufficient on its own, as Punycode can decode successfully
yet still fail validity criteria. E.g., <code>xn--8i7caa</code> decodes to <code>www</code>,
whose code points have status "mapped". [[UTS46]]

<li>
<p>Otherwise:

<ol>
<li><p>Set <var>result</var> to the result of running
<a abstract-op lt=ToASCII>Unicode ToASCII</a> with <i>domain_name</i> set to <var>domain</var>,
<i>CheckHyphens</i> set to false, <i>CheckBidi</i> set to true, <i>CheckJoiners</i> set to true,
<i>UseSTD3ASCIIRules</i> set to false, <i>Transitional_Processing</i> set to false,
<i>VerifyDnsLength</i> set to false, and <i>IgnoreInvalidPunycode</i> set to false. [[!UTS46]]
<li><p>Set <var>result</var> to the result of running <a>domain parser ToASCII</a> with
<var>domain</var> and false.

<li><p>If <var>result</var> is a failure value, <a>domain-to-ASCII</a> <a>validation error</a>,
return failure.
<li><p>If <var>result</var> is a failure value, then return failure.
</ol>

<li><p>If <var>result</var> is the empty string, <a>domain-to-ASCII</a> <a>validation error</a>,
return failure.
<li><p>If <var>result</var> is the empty string, then return failure.

<li>
<p>If <var>result</var> contains a <a>forbidden domain code point</a>,
<a>domain-invalid-code-point</a> <a>validation error</a>, return failure.
<p>If <var>result</var> contains a <a>forbidden domain code point</a>, then return failure.

<p class=note>Due to web compatibility and compatibility with non-DNS-based systems the
<a>forbidden domain code points</a> are a subset of those disallowed when <i>UseSTD3ASCIIRules</i>
@@ -979,6 +967,16 @@ steps. They return failure or a <a for=/>domain</a>.
<code>☕.example</code> becomes <code>xn--53h.example</code> and not failure. [[UTS46]] [[RFC5890]]
</div>

<div algorithm>
<p>The <dfn>domain parser ToASCII</dfn> algorithm, given a <a for=/>scalar value string</a>
<var>domain</var> and a boolean <var>beStrict</var>, returns the result of running
<a abstract-op lt=ToASCII>Unicode ToASCII</a> with <i>domain_name</i> set to <var>domain</var>,
<i>CheckHyphens</i> set to <var>beStrict</var>, <i>CheckBidi</i> set to true, <i>CheckJoiners</i>
set to true, <i>UseSTD3ASCIIRules</i> set to <var>beStrict</var>, <i>Transitional_Processing</i>
set to false, <i>VerifyDnsLength</i> set to <var>beStrict</var>, and <i>IgnoreInvalidPunycode</i>
set to false. [[!UTS46]]
</div>

<div algorithm>
<p>The <dfn id=concept-domain-to-unicode>domain to Unicode</dfn> algorithm, given a <a>domain</a>
<var>domain</var>, runs these steps:
@@ -1072,6 +1070,9 @@ false), and then runs these steps. They return failure or a <a for=/>host</a>.

<li><p>Assert: <var>input</var> is not the empty string.

<li><p>If <var>input</var> contains a <a>percent-encoded byte</a>,
<a>domain-percent-encoded</a> <a>validation error</a>.

<li>
<p>Let <var>domain</var> be the result of running <a>UTF-8 decode without BOM</a> on the
<a for=string>percent-decoding</a> of <var>input</var>.
@@ -1677,10 +1678,15 @@ unified model would be, please file an issue.
<td>✅
<td><code>file:///C:/</code>
<tr>
<td><code>file://loc%61lhost/</code>
<td><code>file://localhost/</code>
<td>
<td>✅
<td><code>file:///</code>
<tr>
<td><code>file://loc%61lhost/</code>
<td>
<td>❌
<td><code>file:///</code>
<tr>
<td><code>https://user:password@example.org/</code>
<td>
@@ -2014,7 +2020,8 @@ an <a>absolute-URL string</a>, optionally followed by U+0023 (#) and a <a>URL-fr
<a>special scheme</a> and not an <a>ASCII case-insensitive</a> match for "<code>file</code>",
followed by U+003A (:) and a <a>scheme-relative-special-URL string</a>
<li><p>a <a>URL-scheme string</a> that is <em>not</em> an <a>ASCII case-insensitive</a> match for a
<a>special scheme</a>, followed by U+003A (:) and a <a>relative-URL string</a>
<a>special scheme</a>, followed by U+003A (:) and one of: a <a>scheme-relative-URL string</a>, a
<a>path-absolute-URL string</a>, or zero or more <a>URL units</a>
<li><p>a <a>URL-scheme string</a> that is an <a>ASCII case-insensitive</a> match for
"<code>file</code>", followed by U+003A (:) and a <a>scheme-relative-file-URL string</a>
</ul>
@@ -2970,6 +2977,8 @@ and then runs these steps:
<p>Otherwise, if <a>c</a> is U+0020 SPACE:

<ol>
<li><p><a>Invalid-URL-unit</a> <a>validation error</a>.

<li><p>If <a>remaining</a> starts with U+003F (?) or U+0023 (#), then append
"<code>%20</code>" to <var>url</var>'s <a for=url>path</a>.
