This was generated by AI during triage.
Summary
WP_HTML_Processor::serialize_token() rebuilds the current token from parsed token state without first applying all queued mutations. Some pending changes are visible through getters, but attribute additions and removals are not consistently reflected when serializing the current token.
This makes code like this produce incorrect output:
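    // Assumes a fragment parser; $html holds the markup shown under "Given".
    $processor = WP_HTML_Processor::create_fragment( $html );
    $output    = "";
    while ( $processor->next_token() ) {
        // Drop STYLE elements entirely.
        if ( "STYLE" === $processor->get_token_name() ) {
            continue;
        }
        // Strip inline styles from every opening tag.
        if ( "#tag" === $processor->get_token_type() && ! $processor->is_tag_closer() ) {
            $processor->remove_attribute( "style" );
        }
        $output .= $processor->serialize_token();
    }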
Given:
<p style="color:red" id="x">Hi <b style="font-weight:bold">there</b></p><style>.x{}</style>
Actual output:
<p style id="x">Hi <b style>there</b></p>
Expected output:
<p id="x">Hi <b>there</b></p>
Observed mutation behavior
Confirmed against the current local checkout:
| Mutation | serialize_token() result |
| --- | --- |
| Existing attribute value change, for example set_attribute( "id", "y" ) | Works: <p id="y"> |
| Existing attribute removal, for example remove_attribute( "style" ) | Fails: <p style id="x"> |
| New attribute addition, for example set_attribute( "data-x", "1" ) | Fails: serializes <p> |
| add_class() when class already exists | Works: <p class="old added"> |
| add_class() when no class exists | Fails: serializes <p> |
| remove_class() leaving a non-empty class list | Works: <p class="keep"> |
| remove_class() leaving no classes | Fails: serializes <p class> |
| set_modifiable_text() on #text | Works when it returns true |
| set_modifiable_text() on atomic HTML tags (STYLE, SCRIPT, TEXTAREA, TITLE) | Works when it returns true |
| set_modifiable_text() on normal tags or foreign atomic-like tags | Correctly returns false; no change serialized |
Likely cause
serialize_token() enumerates attributes with get_attribute_names_with_prefix( "" ) before queued lexical updates have been applied. It then reads values through get_attribute(), so updates to existing attributes may be visible, and class updates may be flushed when the original class attribute exists. However, added attributes are missing from the original parsed attribute list, and removed attributes remain in that list and can serialize as boolean attributes.
Calling get_updated_html() after a successful mutation, before serialize_token(), works around the problem because it flushes and reparses the current token.
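The workaround can be sketched as follows. This is a minimal illustration, not a proposed fix: strip_style_markup() is a hypothetical helper name, the processor is assumed to come from WP_HTML_Processor::create_fragment(), and the flush is performed only when remove_attribute() reports success, as described above.

```php
<?php
/**
 * Hypothetical helper (not a core function): serializes every token except
 * STYLE elements, removing style attributes from opening tags.
 *
 * @param object $processor Expected to be a WP_HTML_Processor instance.
 * @return string Serialized output.
 */
function strip_style_markup( $processor ) {
	$output = '';
	while ( $processor->next_token() ) {
		if ( 'STYLE' === $processor->get_token_name() ) {
			continue;
		}
		if ( '#tag' === $processor->get_token_type() && ! $processor->is_tag_closer() ) {
			if ( $processor->remove_attribute( 'style' ) ) {
				// Workaround: flush queued updates so the current token is
				// reparsed before it is serialized. The return value is unused.
				$processor->get_updated_html();
			}
		}
		$output .= $processor->serialize_token();
	}
	return $output;
}

// Usage, guarded so the file still loads outside WordPress.
if ( class_exists( 'WP_HTML_Processor' ) ) {
	$html      = '<p style="color:red" id="x">Hi <b style="font-weight:bold">there</b></p><style>.x{}</style>';
	$processor = WP_HTML_Processor::create_fragment( $html );
	if ( null !== $processor ) {
		echo strip_style_markup( $processor );
	}
}
```

If the workaround behaves as described, this should produce the expected output from the report rather than the boolean style attributes. The extra flush on every mutated tag adds reparsing work, so it is a stopgap and not a substitute for fixing serialize_token() itself.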