Handoff: Three Pre-Existing WP_HTML_Tag_Processor Issues for #65372 Follow-ups
Origin
These were surfaced by adversarial review panels during the 2026-06-12 session that fixed the PRE/LISTING leading-newline seek replay bug (commits 63ac0ac and 613bb18 on local branch fix-pre-seek-2-fable, worktree fix-pre-seek). All three reproduce on clean trunk — none were introduced or worsened by those commits. Suggested ticket: #65372 (HTML API fuzz umbrella).
All work is in src/wp-includes/html-api/class-wp-html-tag-processor.php; tests belong in tests/phpunit/tests/html-api/wpHtmlTagProcessorModifiableText.php or wpHtmlTagProcessor-bookmark.php. Line numbers below reflect the post-613bb18a4b state of the branch.
Suggested skills: diagnose (light — repros are already known), then tdd per item.
Item 1: Bookmark head-drift on equal-start text replacement
Symptom: Bookmarking a text node, replacing its text with set_modifiable_text(), then seeking to that bookmark lands mid-token.
Repro:
$p = new WP_HTML_Tag_Processor( "<pre>\nfoo</pre>" );
$p->next_token(); // PRE
$p->next_token(); // #text
$p->set_bookmark( 'text' );
$p->set_modifiable_text( 'bar' );
$p->get_updated_html();
$p->seek( 'text' );
$p->get_modifiable_text(); // '>bar' — bookmark drifted into the PRE opener's '>'
A variant: replacement "\n\nbar" reads 'bar' after seek where a fresh parse of the same updated HTML reads "\nbar" (the restored newline-skip flag eats one newline at the drifted offset).
Cause: In the bookmark adjustment loop of apply_attributes_updates() (~line 2616), the head condition $bookmark->start >= $diff->start applies head_delta for a diff starting exactly at the bookmark start. A replacement of the bookmarked token's own text starts exactly there, so the bookmark head shifts by the full delta instead of staying put.
Direction: A diff starting exactly at bookmark->start should likely not shift the head (> instead of >=), but verify against the cases that motivated >= — check test_updates_bookmark_* tests in wpHtmlTagProcessor-bookmark.php and zero-length insertion diffs at the same point before changing the comparison. Reviewers noted this would also resolve the 'nger' divergence (probe M: opener grow + equal-start text replacement + flagged-bookmark seek).
Item 2: set_modifiable_text() doesn't protect a leading newline for PRE/LISTING text
Symptom: Setting text with an intentional leading \n on a #text node that immediately follows a PRE/LISTING opener reads back without it.
Repro:
$p = new WP_HTML_Tag_Processor( "<pre>\nfoo" );
$p->next_token(); // PRE
$p->next_token(); // #text
$p->set_modifiable_text( "\nkeepme" );
$p->get_modifiable_text(); // 'keepme' — the set newline was eaten
Cause: The TEXTAREA/TITLE branch of set_modifiable_text() (~lines 3949–3958) deliberately doubles an intended leading newline so the ignore-one-newline rule preserves it. The #text branch (~lines 3834–3850) has no equivalent protection for a node flagged by skip_newline_at.
Direction: Mirror the TEXTAREA newline-doubling in the #text branch when skip_newline_at === token_starts_at. Note the interaction with item 1: the doubled newline lands in a replacement starting exactly at the bookmark/skip position — fix or account for item 1 first, or test the combination explicitly.
Item 3: Latent in-loop-mutation flaw on bytes_already_parsed
Symptom: None observable today — this is a masked latent bug, the same coordinate-mixing flaw that caused the rev-1 rejection of 613bb18.
Location: apply_attributes_updates() (~line 2564): if ( $diff->start < $this->bytes_already_parsed ) { $this->bytes_already_parsed += $shift; } mutates the cursor inside the per-diff loop while later diffs' start values are still in original document coordinates. With multiple updates where an early large shrink drags the running value below a later diff's original start, the later shift is dropped.
Why it's masked: Every current caller overwrites the cursor immediately after the flush — get_updated_html() re-points from $before_current_tag (computed via the correct $accumulated_shift_for_given_point pattern), and seek() re-points from the bookmark.
Direction: Either convert to the accumulate-then-apply pattern (now used twice in the same function — $accumulated_shift_for_given_point and $accumulated_shift_for_skip_newline) for consistency and safety, or document why the in-loop form is acceptable. Low priority; no failing repro exists. Don't claim a behavior fix in the commit message — there's no observable change unless a new caller relies on the post-flush cursor.
Constraints
- Test command:
WP_TESTS_SKIP_INSTALL=1 ./vendor/bin/phpunit --group html-api,html-api-html5lib-tests. Baseline on the branch: 2927 tests, 421 skipped, 0 failures. No regressions allowed.
@ticket 65372 on tests; See #65372. in commit messages; trunk @since is 7.1.0.
- PHP 8.4's
Dom\HTMLDocument is the reference oracle for expected text content; vendor/bin/phpcs -q <files> from repo root must exit 0.
- Items 1 and 2 interact — sequence them deliberately and test the combined case.
Handoff: Three Pre-Existing
WP_HTML_Tag_ProcessorIssues for #65372 Follow-upsOrigin
These were surfaced by adversarial review panels during the 2026-06-12 session that fixed the PRE/LISTING leading-newline seek replay bug (commits 63ac0ac and 613bb18 on local branch
fix-pre-seek-2-fable, worktreefix-pre-seek). All three reproduce on clean trunk — none were introduced or worsened by those commits. Suggested ticket: #65372 (HTML API fuzz umbrella).All work is in
src/wp-includes/html-api/class-wp-html-tag-processor.php; tests belong intests/phpunit/tests/html-api/wpHtmlTagProcessorModifiableText.phporwpHtmlTagProcessor-bookmark.php. Line numbers below reflect the post-613bb18a4b state of the branch.Suggested skills:
diagnose(light — repros are already known), thentddper item.Item 1: Bookmark head-drift on equal-start text replacement
Symptom: Bookmarking a text node, replacing its text with
set_modifiable_text(), then seeking to that bookmark lands mid-token.Repro:
A variant: replacement
"\n\nbar"reads'bar'after seek where a fresh parse of the same updated HTML reads"\nbar"(the restored newline-skip flag eats one newline at the drifted offset).Cause: In the bookmark adjustment loop of
apply_attributes_updates()(~line 2616), the head condition$bookmark->start >= $diff->startapplieshead_deltafor a diff starting exactly at the bookmark start. A replacement of the bookmarked token's own text starts exactly there, so the bookmark head shifts by the full delta instead of staying put.Direction: A diff starting exactly at
bookmark->startshould likely not shift the head (>instead of>=), but verify against the cases that motivated>=— checktest_updates_bookmark_*tests inwpHtmlTagProcessor-bookmark.phpand zero-length insertion diffs at the same point before changing the comparison. Reviewers noted this would also resolve the'nger'divergence (probe M: opener grow + equal-start text replacement + flagged-bookmark seek).Item 2:
set_modifiable_text()doesn't protect a leading newline for PRE/LISTING textSymptom: Setting text with an intentional leading
\non a#textnode that immediately follows a PRE/LISTING opener reads back without it.Repro:
Cause: The TEXTAREA/TITLE branch of
set_modifiable_text()(~lines 3949–3958) deliberately doubles an intended leading newline so the ignore-one-newline rule preserves it. The#textbranch (~lines 3834–3850) has no equivalent protection for a node flagged byskip_newline_at.Direction: Mirror the TEXTAREA newline-doubling in the
#textbranch whenskip_newline_at === token_starts_at. Note the interaction with item 1: the doubled newline lands in a replacement starting exactly at the bookmark/skip position — fix or account for item 1 first, or test the combination explicitly.Item 3: Latent in-loop-mutation flaw on
bytes_already_parsedSymptom: None observable today — this is a masked latent bug, the same coordinate-mixing flaw that caused the rev-1 rejection of 613bb18.
Location:
apply_attributes_updates()(~line 2564):if ( $diff->start < $this->bytes_already_parsed ) { $this->bytes_already_parsed += $shift; }mutates the cursor inside the per-diff loop while later diffs'startvalues are still in original document coordinates. With multiple updates where an early large shrink drags the running value below a later diff's original start, the later shift is dropped.Why it's masked: Every current caller overwrites the cursor immediately after the flush —
get_updated_html()re-points from$before_current_tag(computed via the correct$accumulated_shift_for_given_pointpattern), andseek()re-points from the bookmark.Direction: Either convert to the accumulate-then-apply pattern (now used twice in the same function —
$accumulated_shift_for_given_pointand$accumulated_shift_for_skip_newline) for consistency and safety, or document why the in-loop form is acceptable. Low priority; no failing repro exists. Don't claim a behavior fix in the commit message — there's no observable change unless a new caller relies on the post-flush cursor.Constraints
WP_TESTS_SKIP_INSTALL=1 ./vendor/bin/phpunit --group html-api,html-api-html5lib-tests. Baseline on the branch: 2927 tests, 421 skipped, 0 failures. No regressions allowed.@ticket 65372on tests;See #65372.in commit messages; trunk@sinceis 7.1.0.Dom\HTMLDocumentis the reference oracle for expected text content;vendor/bin/phpcs -q <files>from repo root must exit 0.