HTML API: Ignore leading newline after seeking to text following PRE by sirreal · Pull Request #60 · sirreal/wordpress-develop

sirreal · 2026-06-12T18:50:35Z

What

Fixes WP_HTML_Tag_Processor::seek() so that seeking back to a bookmarked text token reproduces the same public token state observed on the first scan. Found by the fuzzer seek invariant.

Trac ticket: https://core.trac.wordpress.org/ticket/65372

The bug

The first newline after a LISTING or PRE opening tag is ignored as an authoring convenience. The Tag Processor tracks a single offset (skip_newline_at) for this, set when scanning past such an opening tag — so scanning past a later PRE/LISTING overwrites it, and seek() restored the cursor without restoring it:

$processor = new WP_HTML_Tag_Processor( "<pre>\nX<pre>" );
$processor->next_token();                  // PRE.
$processor->next_token();                  // #text.
$processor->get_modifiable_text();         // 'X'
$processor->set_bookmark( 'text' );
$processor->next_token();                  // Second PRE clobbers state.
$processor->seek( 'text' );
$processor->get_modifiable_text();         // "\nX" - wrong!

The fix

Two commits:

Restore the state on seek. set_bookmark() records, per bookmark, whether its token immediately follows a LISTING/PRE opening tag; seek() restores skip_newline_at from the (update-shifted) bookmark start when flagged, and clears it otherwise. This removes the limitation previously documented on get_modifiable_text() and un-skips the existing test that covered it. It also prevents a stale offset from colliding with an unrelated token's post-update location, which could strip a newline that should have been preserved.
Shift the position when applying updates. apply_attributes_updates() shifted the cursor and bookmarks but never skip_newline_at, so flushing updates while paused on the affected text node changed its reading ('foo' → "\nfoo") and poisoned the flag recorded for later bookmarks. The shift is accumulated against original document coordinates and applied once after the updates are written, matching the existing pointer-shift pattern.

Behavior for normal text nodes, TEXTAREA, and seeking to the PRE/LISTING opener itself is unchanged.

Testing

WP_TESTS_SKIP_INSTALL=1 ./vendor/bin/phpunit --group html-api,html-api-html5lib-tests

2927 tests, 0 failures (baseline 2917; +10 new regression cases parameterized over PRE and LISTING, 1 previously-skipped test now runs).

The first newline after LISTING and PRE opening tags is ignored as an authoring convenience. The Tag Processor tracks where such a newline would be found and ignores it when reading modifiable text, but this single tracked offset is overwritten upon scanning past any later LISTING or PRE opening tag. Seeking directly to a bookmarked text node restored the cursor without restoring that state, so the same token could report different modifiable text on a repeat visit: $processor = new WP_HTML_Tag_Processor( "<pre>\nX<pre>" ); $processor->next_token(); // PRE. $processor->next_token(); // #text. $processor->get_modifiable_text(); // 'X' $processor->set_bookmark( 'text' ); $processor->next_token(); // Second PRE clobbers state. $processor->seek( 'text' ); $processor->get_modifiable_text(); // "\nX" - wrong! Resolve this by recording, for each bookmark, whether its token immediately follows a LISTING or PRE opening tag, and restoring the ignored-newline state from the bookmark when seeking. Because the recorded state is keyed to the bookmark, it shifts together with the bookmark when applied document updates move token offsets. This also prevents a stale tracked offset from colliding with an unrelated token's post-update location, which could previously strip a newline that should have been preserved. Removes the corresponding documented limitation on get_modifiable_text() and the skipped-test guard which covered it. See #65372.

The position where an ignorable newline may follow a LISTING or PRE opening tag was never adjusted when applying enqueued updates, even though those updates shift the offsets of everything after them in the document. Applying updates while paused on the text node following one of these tags changed what get_modifiable_text() returned: $processor = new WP_HTML_Tag_Processor( "<pre class=\"x\">\nfoo" ); $processor->next_token(); // PRE. $processor->remove_attribute( 'class' ); $processor->next_token(); // #text. $processor->get_modifiable_text(); // 'foo' $processor->get_updated_html(); $processor->get_modifiable_text(); // "\nfoo" - wrong! A stale position also poisoned the ignored-newline state recorded for bookmarks set after applying updates, which could reintroduce the inconsistency through seek(). Resolve this by accumulating, against the original document coordinates, the shift from every update located before the ignored- newline position, and applying the sum once after the updates are written - the same pattern used for shifting a given pointer. Comparing against original coordinates matters: adjusting the position inside the update loop would compare earlier-shifted values against later updates' unshifted offsets and apply the wrong total shift when multiple updates are applied together. Follow-up to [63ac0ac]. See #65372.

github-actions · 2026-06-12T18:50:46Z

The following accounts have interacted with this PR and/or linked issues. I will continue to update these lists as activity occurs. You can also manually ask me to refresh this list by adding the props-bot label.

Core Committers: Use this line as a base for the props when committing in SVN:

Props jonsurrell.

To understand the WordPress project's expectations around crediting contributors, please review the Contributor Attribution page in the Core Handbook.

sirreal added 2 commits June 12, 2026 17:45

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

HTML API: Ignore leading newline after seeking to text following PRE#60

HTML API: Ignore leading newline after seeking to text following PRE#60
sirreal wants to merge 2 commits into
trunkfrom
fix-pre-seek-2-fable

sirreal commented Jun 12, 2026

Uh oh!

github-actions Bot commented Jun 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

sirreal commented Jun 12, 2026

What

The bug

The fix

Testing

Uh oh!

github-actions Bot commented Jun 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant