feat(extension): clean reading mode via allowlist HTML sanitizer (Phase 1) by erichuang1425 · Pull Request #6 · erichuang1425/ForumForge

erichuang1425 · 2026-06-15T23:05:13Z

Summary

Starts Phase 1 with the first roadmap item, clean reading mode — the
piece #5 explicitly
deferred. The side panel now renders each post's rich contentHtml (links,
quotes, lists, code blocks) instead of flattening it to plain text, while
keeping the same hard guarantee that nothing from an untrusted forum page can
execute in the panel.

post.contentHtml ─▶ inert <template> parse ─▶ allowlist sanitizer ─▶ side panel

How it's kept safe

apps/extension/src/sanitize.ts is a build-up allowlist sanitizer, not a
tear-down one:

Untrusted HTML is parsed inertly via a detached <template> — its content
lives in a document with no browsing context, so nothing executes or loads
during parse (no script, no onerror, no image fetch).
The sanitizer rebuilds a fresh tree containing only allowlisted semantic
tags (p, a, blockquote, pre/code, lists, headings, tables, inline
emphasis…). It copies no attributes except a scheme-checked href on
links (to which it adds target="_blank" + rel="noopener noreferrer nofollow"). So on* handlers, style, class, id, and javascript: /
data: URLs simply never get carried over.
script/style/embeds/media/form controls are dropped with their subtree;
unknown layout wrappers (div, span, font) are unwrapped, keeping their
sanitized children.
Images are dropped by default to avoid third-party requests (privacy
stance). Opt-in image handling is a later clean-reading sub-feature.

render.ts renders the sanitized fragment and falls back to contentText
when a post has no usable HTML or sanitizes to nothing.

Why hand-rolled, not DOMPurify

Deliberate (decision confirmed with the maintainer): keep the extension's
zero-runtime-dependency stance, and clean reading mode wants a small,
normalized allowlist anyway. The accepted trade-off is that sanitize.ts is
security-critical code we own — so the security boundary (sanitizeFragment)
is a pure function with exhaustive XSS unit tests.

Files

src/sanitize.ts — the allowlist sanitizer (new).
src/render.ts — renders sanitized rich body, with text fallback.
public/sidepanel.html — reading styles for quotes, code, lists, links.
test/sanitize.test.ts (new, 13 tests) + test/render.test.ts (updated).
Docs: app README, ROADMAP.md / Initial Plan.md checklists, SECURITY.md
threat model, and .memory lessons 0003/0004.

Verification

pnpm -r typecheck — passes (core + parser + storage + extension)
pnpm test — 70 tests pass (15 new: 13 sanitizer + 2 render)
pnpm --filter @forumforge/extension build — bundles dist/

Intentionally not done

Rendering in a real browser is a manual step (needs a browser); the
sanitizer logic and render wiring are unit-tested against a DOM.
Image rendering and the rest of the clean-reading polish (typography
options, quote collapsing, keyboard navigation) are deferred follow-ups.

https://claude.ai/code/session_01WRRAzKehcHPUhGNibdiGp7

Generated by Claude Code

…se 1) Render each post's rich contentHtml in the side panel instead of flattening it to plain text. Untrusted forum HTML is parsed inertly (a detached <template>, no browsing context) and rebuilt by a build-up allowlist sanitizer that keeps only safe semantic tags and a sanitized href on links — dropping scripts, styles, embeds, media, inline handlers, and unsafe URL schemes. Images are dropped by default to avoid third-party requests. render.ts falls back to contentText when a post has no usable HTML or sanitizes to nothing. The security boundary (sanitizeFragment) is a pure function with exhaustive XSS unit tests. Hand-rolled rather than pulling in DOMPurify to keep the extension's zero-runtime-dependency stance; clean reading mode wants a small normalized allowlist anyway. Docs synced: app README, ROADMAP/Initial Plan checklists, SECURITY threat model, and .memory lessons 0003/0004. Verification: pnpm -r typecheck, pnpm test (70 pass; 15 new), pnpm build. https://claude.ai/code/session_01WRRAzKehcHPUhGNibdiGp7

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 94faec5b9a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Body contentHtml is captured as raw innerHTML, so its hrefs are unresolved — the sanitizer's absolute-only scheme allowlist dropped common internal forum links like /thread/5#post-2, leaving an <a> with no href. Expose the page base the parser already computes as ExtractedThread.baseUrl, thread it through renderThread to the sanitizer, and resolve relative/fragment hrefs against it before the scheme allowlist. Resolution never widens safety: the allowlist is applied to the resolved URL, so a resolved javascript: URL is still dropped, and relative links are still dropped when no base is known. Verification: pnpm -r typecheck, pnpm test (75 pass; +5), pnpm build. https://claude.ai/code/session_01WRRAzKehcHPUhGNibdiGp7

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 78f3185334

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-15T23:17:27Z

+    body.append(sanitizeHtml(doc, post.contentHtml, baseUrl));
+    // Sanitizing can empty out a body (e.g. an image-only post): keep the rich
+    // body only if it has visible text, otherwise fall back to plain text.
+    if ((body.textContent ?? "").trim() !== "") return body;


Preserve newlines for text-only HTML bodies

When a forum body contains literal text with newline separators but no semantic tags (common on older pages that rely on white-space/pre-line CSS), this return takes the sanitized contentHtml path because textContent is non-empty. That bypasses the .ff-post__text fallback that still has white-space: pre-wrap, while .ff-post__body no longer preserves whitespace, so line breaks that Phase 0 rendered correctly collapse into a single paragraph; consider preserving whitespace for sanitized text-only bodies or falling back to contentText in this case.

Useful? React with 👍 / 👎.

chatgpt-codex-connector Bot reviewed Jun 15, 2026

View reviewed changes

Comment thread apps/extension/src/sanitize.ts Outdated

erichuang1425 merged commit 717b63b into main Jun 15, 2026
1 check passed

chatgpt-codex-connector Bot reviewed Jun 15, 2026

View reviewed changes

erichuang1425 mentioned this pull request Jun 19, 2026

feat(extension): OP highlighting in the side panel (Phase 1) #7

Merged

8 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(extension): clean reading mode via allowlist HTML sanitizer (Phase 1)#6

feat(extension): clean reading mode via allowlist HTML sanitizer (Phase 1)#6
erichuang1425 merged 2 commits into
mainfrom
claude/youthful-curie-zbs6yh

erichuang1425 commented Jun 15, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot Jun 15, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

erichuang1425 commented Jun 15, 2026

Summary

How it's kept safe

Why hand-rolled, not DOMPurify

Files

Verification

Intentionally not done

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Jun 15, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants