HTML API: Preserve decoder match length on named-reference miss#66

Open
sirreal wants to merge 1 commit into trunk from fix/html-decoder-token-map-null

@sirreal sirreal commented Jun 12, 2026

Owner

What

Fixes WP_HTML_Decoder::read_character_reference() so unmatched named character references preserve the by-reference match length value.

Issue

WP_Token_Map::read_token() returns null when no token matches. The decoder checked for false instead. On an unmatched named reference, that allowed null to flow into the later semicolonless-reference path, where the decoder could calculate a non-zero match length even though no character reference was matched.

Callers use the by-reference match length to advance through a string only when a reference is actually found. A miss must return null and leave the supplied match length untouched.
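The caller-side contract described above can be sketched roughly as follows. This is a minimal illustration, not the WordPress implementation: `toy_read_reference()` and `toy_decode()` are hypothetical stand-ins that know only `&amp;` and `&lt;`, and they assume the same convention of returning null and leaving the by-reference length untouched on a miss.

```php
<?php
// Hypothetical stand-in for a by-reference reader (NOT WP_HTML_Decoder).
// Recognises only "&amp;" and "&lt;"; on a miss it returns null and
// deliberately does not write to $match_byte_length.
function toy_read_reference( string $text, int $at, &$match_byte_length = null ): ?string {
	$known = array(
		'&amp;' => '&',
		'&lt;'  => '<',
	);

	foreach ( $known as $name => $replacement ) {
		if ( substr( $text, $at, strlen( $name ) ) === $name ) {
			$match_byte_length = strlen( $name );
			return $replacement;
		}
	}

	return null;
}

// Hypothetical caller: advances by the match length only when a reference was found.
function toy_decode( string $text ): string {
	$out = '';
	$at  = 0;
	$end = strlen( $text );

	while ( $at < $end ) {
		$next = strpos( $text, '&', $at );
		if ( false === $next ) {
			break;
		}

		// Copy the plain text that precedes the ampersand.
		$out .= substr( $text, $at, $next - $at );

		$length      = null;
		$replacement = toy_read_reference( $text, $next, $length );

		if ( null === $replacement ) {
			// Miss: keep the literal ampersand and move one byte forward.
			$out .= '&';
			$at   = $next + 1;
			continue;
		}

		// Hit: emit the replacement and skip the whole reference.
		$out .= $replacement;
		$at   = $next + $length;
	}

	return $out . substr( $text, $at );
}

echo toy_decode( 'a &amp; b &bogus; c &lt;' ), "\n"; // a & b &bogus; c <
```

If a miss were allowed to write a non-zero length, a caller that trusted that value without first checking the return for null could skip bytes that were never part of a reference.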

Reproduction

On trunk, a miss in data context alters the by-reference length:

$match_byte_length = "sentinel";
$result = WP_HTML_Decoder::read_character_reference( "data", "&bogus;", 0, $match_byte_length );

var_dump( $result );
var_dump( $match_byte_length );

Expected:

NULL
string(8) "sentinel"

Actual on trunk:

NULL
int(1)

The previously shown attribute-context &bogus; demo does not reproduce this bug because the attribute ambiguity branch returns before mutating the match length. The underlying contract still applies to both contexts: a failed match should not set $match_byte_length.

Fix

Check null === $replacement after WP_Token_Map::read_token(), matching the token-map API contract.
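As a small illustration of why the guard matters, the sketch below uses a hypothetical `toy_lookup()` helper (not `WP_Token_Map::read_token()`) that returns null on a miss, as the issue description says the token map does. Under PHP's strict comparison, null and false are distinct values, so a `false ===` guard never fires for such a return value.

```php
<?php
// Hypothetical lookup that returns null on a miss, mirroring the
// contract described above; it is not the WordPress token map.
function toy_lookup( array $map, string $name ): ?string {
	return $map[ $name ] ?? null;
}

$replacement = toy_lookup( array( 'amp;' => '&' ), 'bogus;' );

$caught_by_false_check = ( false === $replacement );
$caught_by_null_check  = ( null === $replacement );

var_dump( $caught_by_false_check ); // bool(false): the miss slips past this guard.
var_dump( $caught_by_null_check );  // bool(true): this guard catches the miss.
```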

Validation

vendor/bin/phpunit --filter test_unmatched_named_character_reference_does_not_set_match_byte_length tests/phpunit/tests/html-api/wpHtmlDecoder.php

Result: OK, 4 tests, 8 assertions.

Trac ticket: TBD

Use of AI Tools

AI assistance: Yes
Tool(s): Codex
Model(s): GPT-5
Used for: splitting the fuzzer-discovered fix into a focused PR, drafting reproduction notes, and running validation. Final implementation was reviewed against the branch diff.


This Pull Request is for code review only. Please keep all other discussion in the Trac ticket. Do not merge this Pull Request. See GitHub Pull Requests for Code Review in the Core Handbook for more details.

@sirreal sirreal marked this pull request as ready for review June 12, 2026 22:11
@github-actions

The following accounts have interacted with this PR and/or linked issues. I will continue to update these lists as activity occurs. You can also manually ask me to refresh this list by adding the props-bot label.

Core Committers: Use this line as a base for the props when committing in SVN:

Props jonsurrell.

To understand the WordPress project's expectations around crediting contributors, please review the Contributor Attribution page in the Core Handbook.
