Skip to content

[Review] Fix math block delimiter parsing#13

Closed
wargoblin wants to merge 1 commit into
upstream-masterfrom
fix/math-block-delimiters
Closed

[Review] Fix math block delimiter parsing#13
wargoblin wants to merge 1 commit into
upstream-masterfrom
fix/math-block-delimiters

Conversation

@wargoblin
Copy link
Copy Markdown
Owner

Internal review PR before updating upstream alecdotdev#142.

@wargoblin
Copy link
Copy Markdown
Owner Author

@codex /review focus: current head ff206bc, markdown preprocessing boundaries, multiline display math blocks, KaTeX data-math rendering, inline math regressions

@wargoblin wargoblin requested a review from Copilot May 7, 2026 15:06
@wargoblin
Copy link
Copy Markdown
Owner Author

@claude /review focus on multiline display math rendering, DOM text/br reconstruction, repeated KaTeX rendering, and inline/display delimiter edge cases

@claude
Copy link
Copy Markdown

claude Bot commented May 7, 2026

Claude finished @wargoblin's task in 3m 30s —— View job


PR Review: Fix math block delimiter parsing

  • Read changed files
  • Analyze multiline display math rendering
  • Analyze DOM text/br reconstruction
  • Analyze repeated KaTeX rendering
  • Analyze inline/display delimiter edge cases
  • Post full review

Overview

This PR introduces a proper preprocessing pipeline for display math blocks (replacing the fragile span[data-math]-only approach) and rewrites inline math delimiter conversion to handle adjacency and $$ passthrough. The data-math-source attribute is the key structural improvement — it decouples the KaTeX render source from the DOM text content so re-renders remain correct.

Overall the approach is sound, but there are a few correctness issues worth addressing before merging.


1. processInlineMath walks inside [data-math] elements (Medium)

src/lib/utils/markdown.ts:93-118

The tree walker only skips CODE, PRE, SCRIPT, STYLE. After processDisplayMathBlocks runs (line 560) and creates <div> with the raw LaTeX as textContent, processInlineMath (line 563) will walk those same text nodes.

For display math containing a bare $ — e.g. $$\text{price: } \$5$$ — the escaped \$ is safe because the backslash guard fires. But LaTeX with adjacent $ like $$a \cdot b + c$$ is fine too. The real problem emerges when processInlineMath converts something inside the display math text, because that corrupts textContent. Since renderRichContent uses data-math-source (not textContent) for the KaTeX direct-render loop, that path is safe. But renderMathInElement runs after it and scans the whole body for \(…\). Any \(…\) injected into display math textContent by processInlineMath would be found and re-rendered by renderMathInElement inside the already-KaTeX-rendered element.

Fix: add [data-math] to the rejected tagName set in the tree walker filter, or better, check the ancestor for data-math attribute.

// In processInlineMath's acceptNode:
if (curr.hasAttribute?.('data-math')) return NodeFilter.FILTER_REJECT;

Fix this →


2. extractDisplayMathBlock — silent drop of empty display math (Low)

src/lib/utils/markdown.ts:157-158

const math = trimmed.slice(2, -2).trim();
return math ? math : null;

$$\n$$ (an intentionally blank display block) returns null and the <p> is left in place, effectively silently swallowing the block. This may be fine in practice (empty display math is unusual), but a debug console warning would prevent confusion.


3. findInlineMathEnd returns -1 on trailing digit — overly broad (Medium)

src/lib/utils/markdown.ts:241

if (/\s/.test(beforeEnd) || /\d/.test(afterEnd)) return -1;

When afterEnd is a digit, -1 is returned immediately, meaning the entire opening $ is treated as a literal for the rest of the string. This prevents $n^2$2 or $f(x)$42 from ever converting the valid math span. The original regex also had this guard ((?![\d])), but at the position of the closing delimiter only, not as an abort of the whole scan. Because this is return -1 (not continue), there's no second chance to find a valid close.

This is a known conservative heuristic, but worth a comment explaining the decision.


4. DOM text/br reconstruction — previousWasBreak strips only one newline (Low)

src/lib/utils/markdown.ts:140

if (previousWasBreak && value.startsWith("\n")) value = value.slice(1);

If a text node following <br> starts with "\n\n" (some parsers emit double-newlines), only the first \n is stripped. The math source would then contain an extra blank line, which KaTeX renders correctly but looks inconsistent. A trimStart applied only when the text node is first-in-line, or stripping all leading \ns after <br>, would be more robust.


5. [data-math] selector change is correct — no issue

src/lib/MarkdownViewer.svelte:665 and src/lib/utils/markdown.ts:729

Widening from span[data-math] to [data-math] is the right call: new display blocks use <div> and parser-generated inline spans use <span data-math>. Both are now covered. The selector change is applied consistently in both code paths.


6. data-math-source as re-render guard — correct (Good)

src/lib/utils/markdown.ts:732

const mathSource = el.getAttribute('data-math-source') || el.textContent || '';

data-math-source is set during processMarkdownHtml (preprocessing phase) and never touched by KaTeX. On a re-render triggered by $effect, KaTeX will always read the original LaTeX source, not the rendered HTML. The el.textContent fallback for <span data-math> elements (emitted by the markdown parser without a data-math-source attribute) works because KaTeX preserves the original source in an aria-hidden span inside the rendered output, keeping textContent equal to the original source. This is correct.


7. Ordering: processDisplayMathBlocks before processInlineMath (Correct)

src/lib/utils/markdown.ts:560-563

processDisplayMathBlocks(doc.body, doc);   // creates <div data-math>
processBlockIds(doc.body, doc);
processTaskItems(doc.body);
processInlineMath(doc.body);               // converts $…$ in remaining text

Correct order. Display blocks are replaced before inline scan, preventing $$…$$ display content from being re-parsed as inline. The issue in point 1 above is that processInlineMath still visits the replacement element's text content — the fix there closes the loop.


8. Display math in tight list items not processed (Low)

src/lib/utils/markdown.ts:121

for (const element of Array.from(root.querySelectorAll("p"))) {

Tight Markdown lists produce <li> content without a wrapping <p>. A $$…$$ on its own line in a tight list item would remain as a <li> text node and not be converted to a display block. This is an edge case; the more common loose-list case wraps content in <p> and works correctly.


Summary

Issue Severity Status
processInlineMath walks inside [data-math] elements Medium Needs fix
findInlineMathEnd -1 on digit aborts whole span Medium Acceptable with a comment
Empty $$\n$$ silently dropped Low Consider a warning
previousWasBreak strips only one \n Low Acceptable
Tight-list display math missed Low Acceptable
[data-math] selector widened consistently ✅ Correct
data-math-source re-render guard ✅ Correct
processDisplayMathBlocks call order ✅ Correct

The structural approach is solid. The data-math-source + [data-math] pattern is a meaningful improvement over the previous el.textContent approach for re-render correctness. Addressing item 1 (walker skipping [data-math] elements) before merging to upstream would prevent a subtle renderMathInElement double-pass on display math content.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ff206bc0a0

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/lib/utils/markdown.ts Outdated
}

function processDisplayMathBlocks(root: Element, doc: Document) {
for (const element of Array.from(root.querySelectorAll("p"))) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Include tight list items in display math preprocessing

When a multiline $$ block is the only content of a tight list item, Comrak renders the item body as text/<br> children of <li> rather than wrapping it in a <p>, so this loop never calls extractDisplayMathBlock for inputs like - $$\n a+b\n $$. Because KaTeX auto-render still cannot match $$ delimiters split across separate text/br nodes, the math remains literal in list items even though top-level paragraphs now work.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in c5f0ae8: display math preprocessing now also considers tight-list li nodes, while still requiring the element to contain only text/BR children before marking it as math.

@wargoblin wargoblin force-pushed the fix/math-block-delimiters branch from ff206bc to dd84b2c Compare May 7, 2026 15:12
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR improves Markdown math delimiter handling by introducing explicit display-math block extraction ($$...$$) and replacing the prior regex-based inline parsing with a more robust inline delimiter conversion. It also updates KaTeX rendering to support both inline and display math elements via a unified [data-math] selector and a stable data-math-source attribute.

Changes:

  • Add display-math block detection for paragraphs that contain only $$...$$ (including cases split by <br>), emitting data-math="display" + data-math-source.
  • Replace regex-based $...$ inline conversion with convertInlineMathDelimiters(...) to better handle escapes/adjacent spans.
  • Update KaTeX rendering to read from data-math-source and to target all [data-math] elements; remove duplicated inline-math preprocessing in MarkdownViewer.svelte.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File Description
src/lib/utils/markdown.ts Adds display-math block extraction + new inline delimiter conversion and updates KaTeX rendering input/selection.
src/lib/MarkdownViewer.svelte Removes redundant inline-math preprocessing and aligns KaTeX rendering with the new data-math(-source) approach.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@wargoblin
Copy link
Copy Markdown
Owner Author

@codex /review current head dd84b2c; focus on data-math ancestor skipping, multiline display math preprocessing, repeated KaTeX rendering, and inline math regression risk

@wargoblin wargoblin requested a review from Copilot May 7, 2026 15:13
@wargoblin
Copy link
Copy Markdown
Owner Author

@claude /review current head dd84b2c; verify the data-math ancestor skip addresses the previous actionable comment and check for remaining correctness issues only

@claude
Copy link
Copy Markdown

claude Bot commented May 7, 2026

Claude finished @wargoblin's task in 2m 40s —— View job


PR Review: dd84b2c — data-math ancestor skip verification

  • Read changed files
  • Verify data-math ancestor skip addresses previous actionable comment
  • Check for remaining correctness issues
  • Post final review

Actionable comment from prior review — RESOLVED ✅

src/lib/utils/markdown.ts:94-106

The ancestor-walking data-math check is correctly in place:

acceptNode(node) {
    let curr = node.parentElement;
    while (curr && curr !== root) {
        if (curr.hasAttribute("data-math"))        // ← new guard
            return NodeFilter.FILTER_REJECT;
        if (["CODE", "PRE", "SCRIPT", "STYLE"].includes(curr.tagName))
            return NodeFilter.FILTER_REJECT;
        curr = curr.parentElement;
    }
    return NodeFilter.FILTER_ACCEPT;
},

This fully closes the loop identified previously. Because processInlineMath no longer visits text nodes inside [data-math] containers, nothing injects \(...\) into display math content before renderMathInElement runs. The double-pass risk is gone.

The bonus fix at line 142 is also correct — replace(/^\n+/, "") now strips all leading newlines after a <br>, not just one, addressing the Low-severity previousWasBreak issue from the prior review.


Remaining correctness issues

1. findInlineMathEnd digit guard — comment mismatch (Low)

src/lib/utils/markdown.ts:242-243

// A following `$` may open an adjacent inline span; the outer loop handles it.
if (/\s/.test(beforeEnd) || /\d/.test(afterEnd)) return -1;

The comment explains the $-adjacency case but says nothing about the /\d/.test(afterEnd) branch. A reader has no context for why a trailing digit should abort the entire scan (rather than continue to search for a later closing $). The prior review flagged this as acceptable but asked for a comment. A one-liner would suffice:

// Trailing digit after closing $ likely means a unit/number context (e.g. "$10 USD"), not math.
if (/\s/.test(beforeEnd) || /\d/.test(afterEnd)) return -1;

2. Tight-list display math — still not reached (Low, pre-existing)

src/lib/utils/markdown.ts:123

processDisplayMathBlocks only iterates p elements. A $$ block in a tight list item (<li> with no wrapping <p>) is never extracted. renderMathInElement can't recover it either because $$ is split across text/br nodes. Codex's P2 comment on the same line confirms this. Still low-priority but worth noting since it affects any math-heavy list.

3. Empty display block — silently dropped (Low, pre-existing)

src/lib/utils/markdown.ts:159-160

$$\n$$ returns null with no indication; the <p> is left in the output. A console.warn would help during development. No change needed for correctness.


No new correctness issues found

The KaTeX render pipeline in MarkdownViewer.svelte (lines 664–689) is consistent: the direct-render loop reads data-math-source for all [data-math] elements, leaving no raw delimiters for renderMathInElement to double-process. The removal of the duplicate processInlineMath function from the Svelte component is clean. Both [data-math] selector usages are aligned.


Verdict

The Medium-severity blocker is resolved. The two remaining low-severity pre-existing issues (tight lists, empty block) are acceptable scope for a follow-up. Only the digit-guard comment (item 1 above) is a minor omission worth a one-line fix before merging.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

Comment thread src/lib/utils/markdown.ts Outdated
Comment on lines +127 to +131
const mathElement = doc.createElement("div");
mathElement.setAttribute("data-math", "display");
mathElement.setAttribute("data-math-source", math);
mathElement.textContent = math;
element.replaceWith(mathElement);
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in c5f0ae8: the preprocessing now preserves the original p/li element and its attributes such as data-sourcepos, replacing only its children with the normalized math source.

@wargoblin wargoblin force-pushed the fix/math-block-delimiters branch from dd84b2c to c5f0ae8 Compare May 7, 2026 15:21
@wargoblin
Copy link
Copy Markdown
Owner Author

@codex /review current head c5f0ae8; verify previous comments are addressed: tight-list display math and preserving sourcepos/attributes by reusing p/li elements

@wargoblin wargoblin requested a review from Copilot May 7, 2026 15:22
@wargoblin
Copy link
Copy Markdown
Owner Author

@claude /review current head c5f0ae8; verify previous Codex/Copilot comments are addressed and flag only remaining correctness blockers

@chatgpt-codex-connector
Copy link
Copy Markdown

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.
To continue using code reviews, add credits to your account and enable them for code reviews in your settings.

@claude
Copy link
Copy Markdown

claude Bot commented May 7, 2026

Claude finished @wargoblin's task in 1m 42s —— View job


PR Review: c5f0ae8 — Codex/Copilot comment verification

  • Read changed files at current head
  • Verify Codex P2: tight-list <li> nodes now included
  • Verify Copilot: original element and attributes preserved
  • Flag any remaining correctness blockers

Previous Codex/Copilot comments — both resolved ✅

Codex P2 — Tight-list <li> nodes in display math preprocessing

src/lib/utils/markdown.ts:119

for (const element of Array.from(root.querySelectorAll("p, li"))) {

Confirmed. The selector now covers both <p> and <li>, so a $$...$$ block that Comrak renders as direct text/<br> children of a <li> (tight list) is caught. extractDisplayMathBlock still guards against <li> elements with non-text/BR children (nested <p>, inline elements, etc.) by returning null, so loose-list items with complex content are unaffected.


Copilot — Original <p>/<li> element and its attributes preserved

src/lib/utils/markdown.ts:122-126

element.setAttribute("data-math", "display");
element.setAttribute("data-math-source", math);
element.replaceChildren(doc.createTextNode(math));

Confirmed. The original element is mutated in-place — no replaceWith(). Attributes like data-sourcepos that MarkdownViewer uses for scroll/line sync survive. The prior Copilot concern (creating a new <div> and calling replaceWith) is fully addressed.


Remaining items — all low severity, pre-existing, non-blocking

1. findInlineMathEnd digit-abort still lacks a comment

src/lib/utils/markdown.ts:241

// A following `$` may open an adjacent inline span; the outer loop handles it.
if (/\s/.test(beforeEnd) || /\d/.test(afterEnd)) return -1;

The comment explains the $-adjacency branch but says nothing about why a trailing digit causes a hard abort of the entire scan rather than a continue. A one-liner would close this off:

// Trailing digit (e.g. "$10") is treated as currency, not math; abort rather than risk a false match later.
if (/\s/.test(beforeEnd) || /\d/.test(afterEnd)) return -1;

No correctness impact — the behavior is intentionally conservative.

2. Empty display block silently dropped

src/lib/utils/markdown.ts:157-160

$$\n$$ returns null from extractDisplayMathBlock and the element is left untouched. No regression from prior behavior; a console.warn would help during development but is not a blocker.


Verdict

Both actionable comments from Codex and Copilot are correctly resolved in c5f0ae8. No new correctness blockers introduced. The two low-severity pre-existing items are acceptable scope for a follow-up. This is ready to merge to upstream.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.

@wargoblin
Copy link
Copy Markdown
Owner Author

Internal review complete for c5f0ae8: fork CI passed, Claude confirmed the previous findings are addressed, Copilot re-reviewed with no new comments, and Codex review was skipped due usage limit.

@wargoblin
Copy link
Copy Markdown
Owner Author

Closing after internal review completion; upstream PR alecdotdev#142 is updated from this branch.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants