HTML API: serialize_token() does not reflect queued attribute additions/removals on the current tag #67

@sirreal

This was generated by AI during triage.

Summary

WP_HTML_Processor::serialize_token() rebuilds the current token from parsed token state without first applying all queued mutations. Some pending changes are visible through getters, but attribute additions and removals are not consistently reflected when serializing the current token.

As a result, code like the following produces incorrect output:

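```php
// $html holds the input shown below.
$processor = WP_HTML_Processor::create_fragment( $html );
$output    = "";
while ( $processor->next_token() ) {
	// STYLE is atomic, so skipping its single token drops the whole element.
	if ( "STYLE" === $processor->get_token_name() ) {
		continue;
	}

	// Strip the style attribute from every opening tag.
	if ( "#tag" === $processor->get_token_type() && ! $processor->is_tag_closer() ) {
		$processor->remove_attribute( "style" );
	}

	$output .= $processor->serialize_token();
}
```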

Given:

```html
<p style="color:red" id="x">Hi <b style="font-weight:bold">there</b></p><style>.x{}</style>
```

Actual output:

```html
<p style id="x">Hi <b style>there</b></p>
```

Expected output:

```html
<p id="x">Hi <b>there</b></p>
```

Observed mutation behavior

Confirmed against the current local checkout:

| Mutation | serialize_token() result |
| --- | --- |
| Change an existing attribute value, e.g. `set_attribute( "id", "y" )` | Works: `<p id="y">` |
| Remove an existing attribute, e.g. `remove_attribute( "style" )` | Fails: `<p style id="x">` |
| Add a new attribute, e.g. `set_attribute( "data-x", "1" )` | Fails: serializes `<p>` |
| `add_class()` when a class already exists | Works: `<p class="old added">` |
| `add_class()` when no class exists | Fails: serializes `<p>` |
| `remove_class()` leaving a non-empty class list | Works: `<p class="keep">` |
| `remove_class()` leaving no classes | Fails: serializes `<p class>` |
| `set_modifiable_text()` on `#text` | Works when it returns true |
| `set_modifiable_text()` on atomic HTML tags (STYLE, SCRIPT, TEXTAREA, TITLE) | Works when it returns true |
| `set_modifiable_text()` on normal tags or foreign atomic-like tags | Correctly returns false; no change serialized |

Likely cause

serialize_token() enumerates attributes with get_attribute_names_with_prefix( "" ) before queued lexical updates have been applied. It then reads each value through get_attribute(), so changes to existing attribute values are visible, and queued class changes are flushed when the token already had a class attribute. However, added attributes are absent from the original parsed attribute list, so they are never enumerated, while removed attributes remain in that list and are emitted without a value, i.e. as boolean attributes.
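
The mechanism described above can be reproduced with a small, self-contained toy model. This is a hypothetical sketch, not WordPress code: toy_get_attribute() and toy_serialize_stale() are invented names, and the real processor stores attributes and lexical updates differently. It only assumes two things taken from the description: queued updates shadow parsed values when read (as with get_attribute()), and attribute names are enumerated from the parsed list alone.

```php
<?php
declare(strict_types=1);

/*
 * Toy model (hypothetical, not WordPress code).
 * $parsed: attributes as parsed from the source (true = boolean attribute).
 * $queued: pending updates; null means "removed", like remove_attribute().
 */
function toy_get_attribute( array $parsed, array $queued, string $name ): string|bool|null {
	// Queued updates win, mirroring a getter that consults pending changes.
	if ( array_key_exists( $name, $queued ) ) {
		return $queued[ $name ];
	}
	return $parsed[ $name ] ?? null;
}

function toy_serialize_stale( string $tag, array $parsed, array $queued ): string {
	$html = "<{$tag}";
	// Names come only from the parsed list, so queued additions are never seen.
	foreach ( array_keys( $parsed ) as $name ) {
		$html .= " {$name}";
		$value = toy_get_attribute( $parsed, $queued, $name );
		// A removed attribute reads back as null, so only its bare name is emitted.
		if ( is_string( $value ) ) {
			$html .= '="' . htmlspecialchars( $value, ENT_QUOTES ) . '"';
		}
	}
	return $html . '>';
}

$parsed = array( 'style' => 'color:red', 'id' => 'x' );
$queued = array( 'style' => null, 'data-x' => '1' );
echo toy_serialize_stale( 'p', $parsed, $queued ), "\n"; // <p style id="x">
```

The toy output matches the observed failure for remove_attribute( "style" ), and the queued data-x addition never appears.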

Calling get_updated_html() after a successful mutation, before serialize_token(), works around the problem because it flushes and reparses the current token.
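
Why flushing first helps can be sketched with a second toy model. Again this is hypothetical, not WordPress code: toy_flush() and toy_serialize() are invented names, and the sketch assumes only that get_updated_html() has the net effect of folding queued updates into the token's attribute list before anything enumerates it. In the real API the workaround is simply to call $processor->get_updated_html() before $processor->serialize_token().

```php
<?php
declare(strict_types=1);

/*
 * Toy model (hypothetical, not WordPress code) of "flush, then serialize".
 * In $queued, null means "removed" and true means "boolean attribute".
 */
function toy_flush( array $parsed, array $queued ): array {
	foreach ( $queued as $name => $value ) {
		if ( null === $value ) {
			unset( $parsed[ $name ] ); // Removal drops the attribute entirely.
		} else {
			$parsed[ $name ] = $value; // Changes overwrite; additions are appended.
		}
	}
	return $parsed;
}

function toy_serialize( string $tag, array $attributes ): string {
	$html = "<{$tag}";
	// Enumerate the refreshed list, so it already reflects every queued update.
	foreach ( $attributes as $name => $value ) {
		$html .= " {$name}";
		if ( is_string( $value ) ) {
			$html .= '="' . htmlspecialchars( $value, ENT_QUOTES ) . '"';
		}
	}
	return $html . '>';
}

$parsed = array( 'style' => 'color:red', 'id' => 'x' );
$queued = array( 'style' => null, 'data-x' => '1' );
echo toy_serialize( 'p', toy_flush( $parsed, $queued ) ), "\n"; // <p id="x" data-x="1">
```

The toy appends new attributes at the end of the tag; the real processor may place them elsewhere, so compare only which attributes appear, not their order.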
