feat(extension): clean reading mode via allowlist HTML sanitizer (Phase 1)#6
Conversation
…se 1) Render each post's rich contentHtml in the side panel instead of flattening it to plain text. Untrusted forum HTML is parsed inertly (a detached <template>, no browsing context) and rebuilt by a build-up allowlist sanitizer that keeps only safe semantic tags and a sanitized href on links — dropping scripts, styles, embeds, media, inline handlers, and unsafe URL schemes. Images are dropped by default to avoid third-party requests. render.ts falls back to contentText when a post has no usable HTML or sanitizes to nothing. The security boundary (sanitizeFragment) is a pure function with exhaustive XSS unit tests. Hand-rolled rather than pulling in DOMPurify to keep the extension's zero-runtime-dependency stance; clean reading mode wants a small normalized allowlist anyway. Docs synced: app README, ROADMAP/Initial Plan checklists, SECURITY threat model, and .memory lessons 0003/0004. Verification: pnpm -r typecheck, pnpm test (70 pass; 15 new), pnpm build. https://claude.ai/code/session_01WRRAzKehcHPUhGNibdiGp7
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 94faec5b9a
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Body contentHtml is captured as raw innerHTML, so its hrefs are unresolved — the sanitizer's absolute-only scheme allowlist dropped common internal forum links like /thread/5#post-2, leaving an <a> with no href. Expose the page base the parser already computes as ExtractedThread.baseUrl, thread it through renderThread to the sanitizer, and resolve relative/fragment hrefs against it before the scheme allowlist. Resolution never widens safety: the allowlist is applied to the resolved URL, so a resolved javascript: URL is still dropped, and relative links are still dropped when no base is known. Verification: pnpm -r typecheck, pnpm test (75 pass; +5), pnpm build. https://claude.ai/code/session_01WRRAzKehcHPUhGNibdiGp7
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 78f3185334
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| body.append(sanitizeHtml(doc, post.contentHtml, baseUrl)); | ||
| // Sanitizing can empty out a body (e.g. an image-only post): keep the rich | ||
| // body only if it has visible text, otherwise fall back to plain text. | ||
| if ((body.textContent ?? "").trim() !== "") return body; |
There was a problem hiding this comment.
Preserve newlines for text-only HTML bodies
When a forum body contains literal text with newline separators but no semantic tags (common on older pages that rely on white-space/pre-line CSS), this return takes the sanitized contentHtml path because textContent is non-empty. That bypasses the .ff-post__text fallback that still has white-space: pre-wrap, while .ff-post__body no longer preserves whitespace, so line breaks that Phase 0 rendered correctly collapse into a single paragraph; consider preserving whitespace for sanitized text-only bodies or falling back to contentText in this case.
Useful? React with 👍 / 👎.
Summary
Starts Phase 1 with the first roadmap item, clean reading mode — the
piece #5 explicitly
deferred. The side panel now renders each post's rich
contentHtml(links,quotes, lists, code blocks) instead of flattening it to plain text, while
keeping the same hard guarantee that nothing from an untrusted forum page can
execute in the panel.
How it's kept safe
apps/extension/src/sanitize.tsis a build-up allowlist sanitizer, not atear-down one:
<template>— its contentlives in a document with no browsing context, so nothing executes or loads
during parse (no script, no
onerror, no image fetch).tags (
p,a,blockquote,pre/code, lists, headings, tables, inlineemphasis…). It copies no attributes except a scheme-checked
hrefonlinks (to which it adds
target="_blank"+rel="noopener noreferrer nofollow"). Soon*handlers,style,class,id, andjavascript:/data:URLs simply never get carried over.script/style/embeds/media/form controls are dropped with their subtree;unknown layout wrappers (
div,span,font) are unwrapped, keeping theirsanitized children.
stance). Opt-in image handling is a later clean-reading sub-feature.
render.tsrenders the sanitized fragment and falls back tocontentTextwhen a post has no usable HTML or sanitizes to nothing.
Why hand-rolled, not DOMPurify
Deliberate (decision confirmed with the maintainer): keep the extension's
zero-runtime-dependency stance, and clean reading mode wants a small,
normalized allowlist anyway. The accepted trade-off is that
sanitize.tsissecurity-critical code we own — so the security boundary (
sanitizeFragment)is a pure function with exhaustive XSS unit tests.
Files
src/sanitize.ts— the allowlist sanitizer (new).src/render.ts— renders sanitized rich body, with text fallback.public/sidepanel.html— reading styles for quotes, code, lists, links.test/sanitize.test.ts(new, 13 tests) +test/render.test.ts(updated).ROADMAP.md/Initial Plan.mdchecklists,SECURITY.mdthreat model, and
.memorylessons 0003/0004.Verification
pnpm -r typecheck— passes (core + parser + storage + extension)pnpm test— 70 tests pass (15 new: 13 sanitizer + 2 render)pnpm --filter @forumforge/extension build— bundlesdist/Intentionally not done
sanitizer logic and render wiring are unit-tested against a DOM.
options, quote collapsing, keyboard navigation) are deferred follow-ups.
https://claude.ai/code/session_01WRRAzKehcHPUhGNibdiGp7
Generated by Claude Code