Skip to content

HTML API: Ignore leading newline after seeking to text following PRE#60

Open
sirreal wants to merge 2 commits into
trunkfrom
fix-pre-seek-2-fable
Open

HTML API: Ignore leading newline after seeking to text following PRE#60
sirreal wants to merge 2 commits into
trunkfrom
fix-pre-seek-2-fable

Conversation

@sirreal

@sirreal sirreal commented Jun 12, 2026

Copy link
Copy Markdown
Owner

What

Fixes WP_HTML_Tag_Processor::seek() so that seeking back to a bookmarked text token reproduces the same public token state observed on the first scan. Found by the fuzzer seek invariant.

Trac ticket: https://core.trac.wordpress.org/ticket/65372

The bug

The first newline after a LISTING or PRE opening tag is ignored as an authoring convenience. The Tag Processor tracks a single offset (skip_newline_at) for this, set when scanning past such an opening tag — so scanning past a later PRE/LISTING overwrites it, and seek() restored the cursor without restoring it:

$processor = new WP_HTML_Tag_Processor( "<pre>\nX<pre>" );
$processor->next_token();                  // PRE.
$processor->next_token();                  // #text.
$processor->get_modifiable_text();         // 'X'
$processor->set_bookmark( 'text' );
$processor->next_token();                  // Second PRE clobbers state.
$processor->seek( 'text' );
$processor->get_modifiable_text();         // "\nX" - wrong!

The fix

Two commits:

  1. Restore the state on seek. set_bookmark() records, per bookmark, whether its token immediately follows a LISTING/PRE opening tag; seek() restores skip_newline_at from the (update-shifted) bookmark start when flagged, and clears it otherwise. This removes the limitation previously documented on get_modifiable_text() and un-skips the existing test that covered it. It also prevents a stale offset from colliding with an unrelated token's post-update location, which could strip a newline that should have been preserved.
  2. Shift the position when applying updates. apply_attributes_updates() shifted the cursor and bookmarks but never skip_newline_at, so flushing updates while paused on the affected text node changed its reading ('foo'"\nfoo") and poisoned the flag recorded for later bookmarks. The shift is accumulated against original document coordinates and applied once after the updates are written, matching the existing pointer-shift pattern.

Behavior for normal text nodes, TEXTAREA, and seeking to the PRE/LISTING opener itself is unchanged.

Testing

WP_TESTS_SKIP_INSTALL=1 ./vendor/bin/phpunit --group html-api,html-api-html5lib-tests

2927 tests, 0 failures (baseline 2917; +10 new regression cases parameterized over PRE and LISTING, 1 previously-skipped test now runs).

sirreal added 2 commits June 12, 2026 17:45
The first newline after LISTING and PRE opening tags is ignored as an
authoring convenience. The Tag Processor tracks where such a newline
would be found and ignores it when reading modifiable text, but this
single tracked offset is overwritten upon scanning past any later
LISTING or PRE opening tag. Seeking directly to a bookmarked text node
restored the cursor without restoring that state, so the same token
could report different modifiable text on a repeat visit:

    $processor = new WP_HTML_Tag_Processor( "<pre>\nX<pre>" );
    $processor->next_token();                  // PRE.
    $processor->next_token();                  // #text.
    $processor->get_modifiable_text();         // 'X'
    $processor->set_bookmark( 'text' );
    $processor->next_token();                  // Second PRE clobbers state.
    $processor->seek( 'text' );
    $processor->get_modifiable_text();         // "\nX" - wrong!

Resolve this by recording, for each bookmark, whether its token
immediately follows a LISTING or PRE opening tag, and restoring the
ignored-newline state from the bookmark when seeking. Because the
recorded state is keyed to the bookmark, it shifts together with the
bookmark when applied document updates move token offsets.

This also prevents a stale tracked offset from colliding with an
unrelated token's post-update location, which could previously strip
a newline that should have been preserved.

Removes the corresponding documented limitation on
get_modifiable_text() and the skipped-test guard which covered it.

See #65372.
The position where an ignorable newline may follow a LISTING or PRE
opening tag was never adjusted when applying enqueued updates, even
though those updates shift the offsets of everything after them in
the document. Applying updates while paused on the text node following
one of these tags changed what get_modifiable_text() returned:

    $processor = new WP_HTML_Tag_Processor( "<pre class=\"x\">\nfoo" );
    $processor->next_token();                  // PRE.
    $processor->remove_attribute( 'class' );
    $processor->next_token();                  // #text.
    $processor->get_modifiable_text();         // 'foo'
    $processor->get_updated_html();
    $processor->get_modifiable_text();         // "\nfoo" - wrong!

A stale position also poisoned the ignored-newline state recorded for
bookmarks set after applying updates, which could reintroduce the
inconsistency through seek().

Resolve this by accumulating, against the original document
coordinates, the shift from every update located before the ignored-
newline position, and applying the sum once after the updates are
written - the same pattern used for shifting a given pointer. Comparing
against original coordinates matters: adjusting the position inside the
update loop would compare earlier-shifted values against later updates'
unshifted offsets and apply the wrong total shift when multiple updates
are applied together.

Follow-up to [63ac0ac].

See #65372.
@github-actions

Copy link
Copy Markdown

The following accounts have interacted with this PR and/or linked issues. I will continue to update these lists as activity occurs. You can also manually ask me to refresh this list by adding the props-bot label.

Core Committers: Use this line as a base for the props when committing in SVN:

Props jonsurrell.

To understand the WordPress project's expectations around crediting contributors, please review the Contributor Attribution page in the Core Handbook.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant