Skip to content

feat(extension): clean reading mode via allowlist HTML sanitizer (Phase 1)#6

Merged
erichuang1425 merged 2 commits into
mainfrom
claude/youthful-curie-zbs6yh
Jun 15, 2026
Merged

feat(extension): clean reading mode via allowlist HTML sanitizer (Phase 1)#6
erichuang1425 merged 2 commits into
mainfrom
claude/youthful-curie-zbs6yh

Conversation

@erichuang1425

Copy link
Copy Markdown
Owner

Summary

Starts Phase 1 with the first roadmap item, clean reading mode — the
piece #5 explicitly
deferred. The side panel now renders each post's rich contentHtml (links,
quotes, lists, code blocks) instead of flattening it to plain text, while
keeping the same hard guarantee that nothing from an untrusted forum page can
execute in the panel.

post.contentHtml ─▶ inert <template> parse ─▶ allowlist sanitizer ─▶ side panel

How it's kept safe

apps/extension/src/sanitize.ts is a build-up allowlist sanitizer, not a
tear-down one:

  • Untrusted HTML is parsed inertly via a detached <template> — its content
    lives in a document with no browsing context, so nothing executes or loads
    during parse
    (no script, no onerror, no image fetch).
  • The sanitizer rebuilds a fresh tree containing only allowlisted semantic
    tags
    (p, a, blockquote, pre/code, lists, headings, tables, inline
    emphasis…). It copies no attributes except a scheme-checked href on
    links (to which it adds target="_blank" + rel="noopener noreferrer nofollow"). So on* handlers, style, class, id, and javascript: /
    data: URLs simply never get carried over.
  • script/style/embeds/media/form controls are dropped with their subtree;
    unknown layout wrappers (div, span, font) are unwrapped, keeping their
    sanitized children.
  • Images are dropped by default to avoid third-party requests (privacy
    stance). Opt-in image handling is a later clean-reading sub-feature.

render.ts renders the sanitized fragment and falls back to contentText
when a post has no usable HTML or sanitizes to nothing.

Why hand-rolled, not DOMPurify

Deliberate (decision confirmed with the maintainer): keep the extension's
zero-runtime-dependency stance, and clean reading mode wants a small,
normalized allowlist anyway. The accepted trade-off is that sanitize.ts is
security-critical code we own — so the security boundary (sanitizeFragment)
is a pure function with exhaustive XSS unit tests.

Files

  • src/sanitize.ts — the allowlist sanitizer (new).
  • src/render.ts — renders sanitized rich body, with text fallback.
  • public/sidepanel.html — reading styles for quotes, code, lists, links.
  • test/sanitize.test.ts (new, 13 tests) + test/render.test.ts (updated).
  • Docs: app README, ROADMAP.md / Initial Plan.md checklists, SECURITY.md
    threat model, and .memory lessons 0003/0004.

Verification

  • pnpm -r typecheck — passes (core + parser + storage + extension)
  • pnpm test70 tests pass (15 new: 13 sanitizer + 2 render)
  • pnpm --filter @forumforge/extension build — bundles dist/

Intentionally not done

  • Rendering in a real browser is a manual step (needs a browser); the
    sanitizer logic and render wiring are unit-tested against a DOM.
  • Image rendering and the rest of the clean-reading polish (typography
    options, quote collapsing, keyboard navigation) are deferred follow-ups.

https://claude.ai/code/session_01WRRAzKehcHPUhGNibdiGp7


Generated by Claude Code

…se 1)

Render each post's rich contentHtml in the side panel instead of flattening it
to plain text. Untrusted forum HTML is parsed inertly (a detached <template>,
no browsing context) and rebuilt by a build-up allowlist sanitizer that keeps
only safe semantic tags and a sanitized href on links — dropping scripts,
styles, embeds, media, inline handlers, and unsafe URL schemes. Images are
dropped by default to avoid third-party requests. render.ts falls back to
contentText when a post has no usable HTML or sanitizes to nothing.

The security boundary (sanitizeFragment) is a pure function with exhaustive
XSS unit tests. Hand-rolled rather than pulling in DOMPurify to keep the
extension's zero-runtime-dependency stance; clean reading mode wants a small
normalized allowlist anyway.

Docs synced: app README, ROADMAP/Initial Plan checklists, SECURITY threat
model, and .memory lessons 0003/0004.

Verification: pnpm -r typecheck, pnpm test (70 pass; 15 new), pnpm build.

https://claude.ai/code/session_01WRRAzKehcHPUhGNibdiGp7

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 94faec5b9a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread apps/extension/src/sanitize.ts Outdated
Body contentHtml is captured as raw innerHTML, so its hrefs are unresolved —
the sanitizer's absolute-only scheme allowlist dropped common internal forum
links like /thread/5#post-2, leaving an <a> with no href. Expose the page base
the parser already computes as ExtractedThread.baseUrl, thread it through
renderThread to the sanitizer, and resolve relative/fragment hrefs against it
before the scheme allowlist. Resolution never widens safety: the allowlist is
applied to the resolved URL, so a resolved javascript: URL is still dropped,
and relative links are still dropped when no base is known.

Verification: pnpm -r typecheck, pnpm test (75 pass; +5), pnpm build.

https://claude.ai/code/session_01WRRAzKehcHPUhGNibdiGp7
@erichuang1425 erichuang1425 merged commit 717b63b into main Jun 15, 2026
1 check passed

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 78f3185334

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

body.append(sanitizeHtml(doc, post.contentHtml, baseUrl));
// Sanitizing can empty out a body (e.g. an image-only post): keep the rich
// body only if it has visible text, otherwise fall back to plain text.
if ((body.textContent ?? "").trim() !== "") return body;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve newlines for text-only HTML bodies

When a forum body contains literal text with newline separators but no semantic tags (common on older pages that rely on white-space/pre-line CSS), this return takes the sanitized contentHtml path because textContent is non-empty. That bypasses the .ff-post__text fallback that still has white-space: pre-wrap, while .ff-post__body no longer preserves whitespace, so line breaks that Phase 0 rendered correctly collapse into a single paragraph; consider preserving whitespace for sanitized text-only bodies or falling back to contentText in this case.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants