Skip to content

possible html api seek follow-ups #59

@sirreal

Description

@sirreal

Handoff: Three Pre-Existing WP_HTML_Tag_Processor Issues for #65372 Follow-ups

Origin

These were surfaced by adversarial review panels during the 2026-06-12 session that fixed the PRE/LISTING leading-newline seek replay bug (commits 63ac0ac and 613bb18 on local branch fix-pre-seek-2-fable, worktree fix-pre-seek). All three reproduce on clean trunk — none were introduced or worsened by those commits. Suggested ticket: #65372 (HTML API fuzz umbrella).

All work is in src/wp-includes/html-api/class-wp-html-tag-processor.php; tests belong in tests/phpunit/tests/html-api/wpHtmlTagProcessorModifiableText.php or wpHtmlTagProcessor-bookmark.php. Line numbers below reflect the post-613bb18a4b state of the branch.

Suggested skills: diagnose (light — repros are already known), then tdd per item.

Item 1: Bookmark head-drift on equal-start text replacement

Symptom: Bookmarking a text node, replacing its text with set_modifiable_text(), then seeking to that bookmark lands mid-token.

Repro:

$p = new WP_HTML_Tag_Processor( "<pre>\nfoo</pre>" );
$p->next_token();                 // PRE
$p->next_token();                 // #text
$p->set_bookmark( 'text' );
$p->set_modifiable_text( 'bar' );
$p->get_updated_html();
$p->seek( 'text' );
$p->get_modifiable_text();        // '>bar' — bookmark drifted into the PRE opener's '>'

A variant: replacement "\n\nbar" reads 'bar' after seek where a fresh parse of the same updated HTML reads "\nbar" (the restored newline-skip flag eats one newline at the drifted offset).

Cause: In the bookmark adjustment loop of apply_attributes_updates() (~line 2616), the head condition $bookmark->start >= $diff->start applies head_delta for a diff starting exactly at the bookmark start. A replacement of the bookmarked token's own text starts exactly there, so the bookmark head shifts by the full delta instead of staying put.

Direction: A diff starting exactly at bookmark->start should likely not shift the head (> instead of >=), but verify against the cases that motivated >= — check test_updates_bookmark_* tests in wpHtmlTagProcessor-bookmark.php and zero-length insertion diffs at the same point before changing the comparison. Reviewers noted this would also resolve the 'nger' divergence (probe M: opener grow + equal-start text replacement + flagged-bookmark seek).

Item 2: set_modifiable_text() doesn't protect a leading newline for PRE/LISTING text

Symptom: Setting text with an intentional leading \n on a #text node that immediately follows a PRE/LISTING opener reads back without it.

Repro:

$p = new WP_HTML_Tag_Processor( "<pre>\nfoo" );
$p->next_token();                 // PRE
$p->next_token();                 // #text
$p->set_modifiable_text( "\nkeepme" );
$p->get_modifiable_text();        // 'keepme' — the set newline was eaten

Cause: The TEXTAREA/TITLE branch of set_modifiable_text() (~lines 3949–3958) deliberately doubles an intended leading newline so the ignore-one-newline rule preserves it. The #text branch (~lines 3834–3850) has no equivalent protection for a node flagged by skip_newline_at.

Direction: Mirror the TEXTAREA newline-doubling in the #text branch when skip_newline_at === token_starts_at. Note the interaction with item 1: the doubled newline lands in a replacement starting exactly at the bookmark/skip position — fix or account for item 1 first, or test the combination explicitly.

Item 3: Latent in-loop-mutation flaw on bytes_already_parsed

Symptom: None observable today — this is a masked latent bug, the same coordinate-mixing flaw that caused the rev-1 rejection of 613bb18.

Location: apply_attributes_updates() (~line 2564): if ( $diff->start < $this->bytes_already_parsed ) { $this->bytes_already_parsed += $shift; } mutates the cursor inside the per-diff loop while later diffs' start values are still in original document coordinates. With multiple updates where an early large shrink drags the running value below a later diff's original start, the later shift is dropped.

Why it's masked: Every current caller overwrites the cursor immediately after the flush — get_updated_html() re-points from $before_current_tag (computed via the correct $accumulated_shift_for_given_point pattern), and seek() re-points from the bookmark.

Direction: Either convert to the accumulate-then-apply pattern (now used twice in the same function — $accumulated_shift_for_given_point and $accumulated_shift_for_skip_newline) for consistency and safety, or document why the in-loop form is acceptable. Low priority; no failing repro exists. Don't claim a behavior fix in the commit message — there's no observable change unless a new caller relies on the post-flush cursor.

Constraints

  • Test command: WP_TESTS_SKIP_INSTALL=1 ./vendor/bin/phpunit --group html-api,html-api-html5lib-tests. Baseline on the branch: 2927 tests, 421 skipped, 0 failures. No regressions allowed.
  • @ticket 65372 on tests; See #65372. in commit messages; trunk @since is 7.1.0.
  • PHP 8.4's Dom\HTMLDocument is the reference oracle for expected text content; vendor/bin/phpcs -q <files> from repo root must exit 0.
  • Items 1 and 2 interact — sequence them deliberately and test the combined case.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions