Fix block content leaking out of marker-line nested-list items #251

Merged: dereuromark merged 1 commit into master from fix/marker-line-block-absorption on Jun 20, 2026
@dereuromark

Contributor

The bug

A nested list opened on its parent item's marker line (`- - A`) was treated as line-scoped. The inner list was materialized from an isolated single-line slice, so its item closed immediately at the line break. As a result:

  • a block placed in the inner item leaked out to the outer item, and
  • a following same-indent marker fragmented the list into two separate lists.

The reference djot.js 0.3.2 keeps the inner item open: it absorbs blocks indented to its content column and continues the list across following same-indent markers. djot-php now matches that.

(carve-php inherits this parser and was affected by the same bug.)

The fix

When an item's content itself begins with a list marker (the marker-line sublist case), collect the whole nested region (lines indented past the inner marker column, plus markers sitting at it, across blank lines) into the item's lines and parse them as blocks. The existing recursive list parser then builds a persistent inner list - reusing the nested-list handling that already works for a sublist appearing on a following line, rather than adding a parallel path.

Boundary rules (verified against reference djot.js 0.3.2):

  • A line indented past the inner marker column is inner-item content or a deeper sublist.
  • A list marker exactly at the inner marker column continues the inner list as a sibling item.
  • A non-marker line at the inner marker column is inner-item content only when it lazily continues an open paragraph; after a blank line (or with nothing open before it) it is outer-item content and ends the region.
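
The boundary rules above can be sketched as a small line classifier. This Python is an illustration only, not the djot-php code in `BlockParser::tryParseList()`: the name `collect_nested_region`, the `inner_col` parameter, the bullet-only marker regex, and the approximation "a paragraph stays open until the next blank line" are all assumptions made for the example. Lines indented less than the inner marker column are treated as ending the region.

```python
import re

# Assumption: only bullet markers followed by content are recognized here;
# ordered markers, task markers, etc. are left out of the sketch.
BULLET = re.compile(r"[-*+] +\S")


def indent_of(line: str) -> int:
    """Number of leading spaces on a line."""
    return len(line) - len(line.lstrip(" "))


def collect_nested_region(lines, inner_col):
    """Split an outer item's lines into (nested region, remaining lines).

    lines[0] is the marker line, e.g. "- - A"; inner_col is the column
    of the inner list marker (2 for "- - A").
    """
    end = 1                 # index just past the last line kept in the region
    paragraph_open = True   # the inner item's text on the marker line is open
    for i in range(1, len(lines)):
        line = lines[i]
        if not line.strip():
            # Blank lines do not end the region, but they close the paragraph.
            paragraph_open = False
            continue
        col = indent_of(line)
        is_marker = BULLET.match(line.lstrip(" ")) is not None
        if col > inner_col:
            keep = True                    # inner content or deeper sublist
        elif col == inner_col and is_marker:
            keep = True                    # sibling item of the inner list
        elif col == inner_col and paragraph_open:
            keep = True                    # lazy paragraph continuation
        else:
            keep = False                   # outer-item content: region ends
        if not keep:
            break
        end = i + 1
        paragraph_open = True              # simplification: kept line holds text
    return lines[:end], lines[end:]


region, rest = collect_nested_region(
    ["- - A", "", "    block for A", "  - B"], inner_col=2
)
print(region)  # all four lines belong to the nested region
print(rest)    # nothing is left for the outer item
```

In the real parser the collected lines are handed to the existing recursive block parser; the sketch stops at deciding where the region ends.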

Cases now matching the reference

Case 1 - input `- - A\n\n    block for A\n  - B`:

<ul>
<li>
<ul>
<li>
<p>A</p>
<p>block for A</p>
</li>
<li>
<p>B</p>
</li>
</ul>
</li>
</ul>

Case 2 - input `- - A\n\n    block under A` (block stays inside inner item A, no leak):

<ul>
<li>
<ul>
<li>
<p>A</p>
<p>block under A</p>
</li>
</ul>
</li>
</ul>

Case 3 - input `- - A\n  - B\n  - C` (already worked, kept as a regression guard): single tight inner list [A, B, C].
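
Because the cases hinge on exact indentation, it can help to see the three inputs as string literals with the indent of each non-blank line computed. This is a small Python sketch for illustration; the helper names `indent_of` and `indents` are made up here and are not part of djot-php. The input strings are the ones given in the commit message.

```python
# The three inputs, with whitespace spelled out exactly.
CASES = {
    "case 1": "- - A\n\n    block for A\n  - B",
    "case 2": "- - A\n\n    block under A",
    "case 3": "- - A\n  - B\n  - C",
}


def indent_of(line: str) -> int:
    """Number of leading spaces on a line."""
    return len(line) - len(line.lstrip(" "))


def indents(source: str) -> list:
    """Indent of every non-blank line, in order."""
    return [indent_of(line) for line in source.split("\n") if line.strip()]


for name, source in CASES.items():
    print(name, indents(source))
```

For `- - A` the inner marker sits at column 2 and the inner item's content starts at column 4, so an indent of 4 marks a block belonging to inner item A and an indent of 2 marks a sibling marker of the inner list.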

Scope

  • The paragraph-interrupt rule ("a list needs a blank line to follow a paragraph") is untouched.
  • Bare-marker rejection is untouched.
  • Single-pass parsing is preserved.
  • Diff is confined to the marker-line (loose) sublist path in BlockParser::tryParseList(); a corpus diff of 33 non-marker-line list inputs shows zero output changes versus master. New tests pin Cases 1, 2, and 3 to exact HTML.

A nested list opened on its parent item's marker line (`- - A`) was
treated as line-scoped: the inner list was materialized from an isolated
single-line slice, so its item closed immediately. A block placed in that
item then leaked out to the outer item, and a following same-indent
marker fragmented the list into two.

The reference djot.js 0.3.2 keeps the inner item open: it absorbs blocks
indented to its content column and continues the list across following
same-indent markers. djot-php now matches that.

When an item's content itself begins with a list marker, collect the
whole nested region (lines indented past the inner marker column, plus
markers at it, across blank lines) into the item's lines and parse them
as blocks. The existing recursive list parser then builds a persistent
inner list - reusing the nested-list handling that already works for a
sublist appearing on a following line, rather than adding a parallel
path. A non-marker line at the inner marker column, or anything less
indented, stays outer-item content.

Cases now matching the reference:
- `- - A\n\n    block for A\n  - B` -> inner item A keeps the block; B
  stays in the same inner list.
- `- - A\n\n    block under A` -> block stays inside inner item A (no
  leak to the outer item).
- `- - A\n  - B\n  - C` -> single tight inner list [A, B, C]
  (unchanged).

The paragraph-interrupt rule and bare-marker rejection are untouched.
carve-php inherits this parser and was affected by the same bug.
@codecov

codecov Bot commented Jun 20, 2026

Codecov Report

❌ Patch coverage is 84.21053% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 92.34%. Comparing base (58a181a) to head (28f4ade).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/Parser/BlockParser.php | 84.21% | 6 Missing ⚠️ |

Additional details and impacted files
@@             Coverage Diff              @@
##             master     #251      +/-   ##
============================================
- Coverage     92.37%   92.34%   -0.04%     
- Complexity     3576     3593      +17     
============================================
  Files           107      107              
  Lines         10129    10165      +36     
============================================
+ Hits           9357     9387      +30     
- Misses          772      778       +6     

@dereuromark dereuromark merged commit 78f8aec into master Jun 20, 2026
4 of 6 checks passed
@dereuromark dereuromark deleted the fix/marker-line-block-absorption branch June 20, 2026 11:41