Release 0.18.3 — native screenshot renderer + cross-platform OCR by ronaldeddings · Pull Request #116 · Hacker-Valley-Media/Interceptor

ronaldeddings · 2026-06-20T10:36:58Z

Release 0.18.3 — native screenshot renderer + cross-platform OCR

Bumps every surface to 0.18.3. Two headline changes.

Screenshots — native DOM renderer (html-to-image removed)

The default DOM-render screenshot hung for the full CLI timeout on any backgrounded tab — html-to-image resolved its image load inside requestAnimationFrame, which Chrome suspends for hidden tabs, so the render never completed. Replaced the dependency with a direct native renderer (getComputedStyle → inline cssText, XMLSerializer → <svg><foreignObject>, Image/decode, Canvas, FileReader/fetch for embedding images, canvas snapshots, and background-image resources).
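The serialization step in the middle of that pipeline can be sketched in isolation. This is a minimal sketch under stated assumptions, not the PR's code: `buildForeignObjectSvg` and `svgToDataUrl` are hypothetical names, and the input is assumed to be markup already produced by `XMLSerializer` from a clone with inlined styles. In a browser, the resulting URL would then be assigned to an `Image`, awaited with `decode()`, and drawn onto a canvas.

```typescript
// Hypothetical helpers illustrating the <svg><foreignObject> wrapping step.
// Assumes `xhtml` is well-formed XHTML (for example XMLSerializer output)
// whose root element carries the XHTML namespace.

const SVG_NS = "http://www.w3.org/2000/svg";

function buildForeignObjectSvg(xhtml: string, width: number, height: number): string {
  if (width <= 0 || height <= 0) {
    throw new RangeError("width and height must be positive");
  }
  return (
    `<svg xmlns="${SVG_NS}" width="${width}" height="${height}">` +
    `<foreignObject x="0" y="0" width="100%" height="100%">${xhtml}</foreignObject>` +
    `</svg>`
  );
}

// Percent-encode the SVG so it can serve as an image source without base64.
function svgToDataUrl(svg: string): string {
  return `data:image/svg+xml;charset=utf-8,${encodeURIComponent(svg)}`;
}

const markup = `<div xmlns="http://www.w3.org/1999/xhtml" style="color: red">hi</div>`;
const dataUrl = svgToDataUrl(buildForeignObjectSvg(markup, 200, 100));
```

Nothing in this step waits on `requestAnimationFrame`, which is the property the description relies on for hidden tabs.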

Works fully backgrounded, no focus required.
Much faster on large pages: inline <svg> subtrees are deep-cloned wholesale instead of inlining styles per descendant, so SVG-heavy pages (a full Wikipedia article, ~11k px, ~2.6k icons) render in ~7s on a hidden tab instead of timing out.
DOM-render timeout guard fails fast with a clear error instead of a silent 45s hang.
html-to-image, its vendored runner, and its patches are removed entirely.
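The fail-fast guard can be sketched as a promise race against a timer. The review walkthrough names `withCaptureTimeout` and `DOM_RENDER_TIMEOUT_MS` (30s); the signature, the `label` parameter, and the error wording below are assumptions made for illustration.

```typescript
// A minimal sketch of a fail-fast timeout guard. The names come from the
// walkthrough; the signature and error message are assumed.
const DOM_RENDER_TIMEOUT_MS = 30_000;

function withCaptureTimeout<T>(work: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms);
  });
  // Whichever settles first wins; the timer is always cleared afterwards.
  return Promise.race([work, timeout]).finally(() => {
    if (timer !== undefined) clearTimeout(timer);
  });
}

// Demonstration with a promise that never settles, standing in for a hung render.
async function demo(): Promise<string> {
  const neverSettles = new Promise<string>(() => {});
  try {
    return await withCaptureTimeout(neverSettles, 20, "DOM render");
  } catch (err) {
    return (err as Error).message;
  }
}
```

In real use the second argument would be `DOM_RENDER_TIMEOUT_MS`. The race does not cancel the underlying work; it only stops the caller from waiting on it.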

OCR — native-first + cross-platform (ocrad.js removed, Tesseract.js added)

canvas ocr now returns the canvas's native accessible text (aria + fallback subtree + figcaption) plus the page's semantic textbox model, instead of low-quality pixel OCR. ocrad.js (an unmaintained pixel-OCR blob) is removed.
New interceptor ocr <selector|element|region>: renders the target natively and OCRs it with a bundled Tesseract.js engine — offline, cross-platform, no native bridge required; returns a deterministic text string with a confidence score. canvas ocr falls back to it for pixel-only canvases. WASM core + worker + English data are bundled and loaded from extension-local URLs (wasm-unsafe-eval added to the extension CSP).
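The native-first ordering can be sketched as a small decision function. This follows the order given in the review walkthrough (accessible text, then semantic textbox text, then pixel OCR); `resolveCanvasText`, the `source` labels, and the result shape are hypothetical, and the pixel OCR step is injected as a callback so the sketch stays independent of any engine.

```typescript
// Hypothetical sketch of the native-first fallback order for canvas text.
type OcrSource = "accessibility" | "semantic" | "pixel-ocr" | "none";

interface PixelOcrResult {
  text: string;
  confidence: number;
}

interface CanvasTextResult {
  text: string;
  source: OcrSource;
  confidence?: number;
}

async function resolveCanvasText(
  accessibleText: string,
  semanticText: string,
  runPixelOcr: () => Promise<PixelOcrResult>,
): Promise<CanvasTextResult> {
  const accessible = accessibleText.trim();
  if (accessible) return { text: accessible, source: "accessibility" };

  const semantic = semanticText.trim();
  if (semantic) return { text: semantic, source: "semantic" };

  // Only pixel-only canvases reach the comparatively expensive OCR step.
  const ocr = await runPixelOcr();
  const recognized = ocr.text.trim();
  return recognized
    ? { text: recognized, source: "pixel-ocr", confidence: ocr.confidence }
    : { text: "", source: "none" };
}
```

Passing the OCR step as a function means it is never invoked when a cheaper source already produced text.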

Notes

Net +841 / −1218 (removed the html-to-image runner and ocrad).
All surfaces bumped to 0.18.3. Test suite: 484 pass / 10 skip / 0 fail.

🤖 Generated with Claude Code

Summary by CodeRabbit

New Features

Added OCR command to the CLI for extracting text from page elements, regions, and CSS selectors.
Improved canvas OCR to intelligently prioritize native accessible text sources before using pixel-based recognition.

Bug Fixes

Enhanced DOM screenshot rendering with better timeout handling and improved error guidance.

Chores

Version updated to 0.18.3.
Upgraded OCR processing for better accuracy and performance.

Bumps every surface to 0.18.3. Two headline changes: the DOM-render screenshot path is now a dependency-free native renderer (html-to-image removed), and OCR is reworked to be native-first with a bundled cross-platform engine (ocrad.js removed, Tesseract.js added).

Screenshots — native DOM renderer

- The default DOM-render screenshot hung for the full CLI timeout on any backgrounded tab: html-to-image resolved its image load inside requestAnimationFrame, which Chrome suspends for hidden tabs, so the render never completed. Replaced the html-to-image dependency with a direct native renderer: getComputedStyle -> inline cssText, XMLSerializer -> <svg><foreignObject>, Image/decode, Canvas, and FileReader/fetch for embedding images, canvas snapshots, and background-image resources.
- Works fully backgrounded with no focus, and is dramatically faster on large pages: inline <svg> subtrees are deep-cloned wholesale instead of having styles inlined per descendant, so SVG-heavy pages (e.g. a full Wikipedia article, ~11k px tall, ~2.6k inline icons) render in ~7s on a hidden tab instead of timing out.
- A DOM-render timeout guard now fails fast with a clear error instead of a silent 45s hang.
- html-to-image, its vendored runner, and its patches are removed entirely.

OCR — native-first + cross-platform

- `canvas ocr` now returns the canvas's native accessible text (aria-label / aria-labelledby / fallback subtree / figcaption) plus the page's semantic textbox model, instead of low-quality pixel OCR. ocrad.js — an unmaintained pixel-OCR blob — is removed.
- New `interceptor ocr <selector|element|region>`: renders the target via the native path and OCRs it with a bundled Tesseract.js engine. Offline, cross-platform, no native bridge required; returns a deterministic text string with a confidence score. `canvas ocr` falls back to it for pixel-only canvases. The WASM core, worker, and English language data are bundled and loaded from extension-local URLs (works on any page CSP; wasm-unsafe-eval added to the extension CSP).

Bump all surfaces to 0.18.3. Suite 484/0.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

coderabbitai · 2026-06-20T10:37:11Z

Caution

Review failed

Pull request was closed or merged during review

Walkthrough

Replaces the html-to-image vendor library with a native SVG-foreignObject DOM-cloning renderer in the content script. Swaps the ocrad.js OCR engine for tesseract.js with extension-local WASM assets. Adds a new top-level ocr CLI command and background action. Changes canvas OCR to prefer accessibility text (ARIA/fallback subtree/figcaption) before falling back to pixel OCR. Bumps to v0.18.3.

Changes

Native DOM Renderer + Tesseract OCR Pipeline

Layer / File(s)	Summary
Native DOM renderer replaces html-to-image: `extension/src/content/dom-screenshot.ts`, `extension/dist-mv2/content.js`, `extension/src/background/capabilities/screenshot.ts`, `extension/dist-mv2/background-electron.js`, `package.json`, `scripts/build.sh`	Removes the `screenshot-runner.js` injection of `html-to-image` and replaces the rendering pipeline with a native clone + inline-styles + SVG `foreignObject` rasterizer. `handleDomScreenshot` is rewritten to use `nativeRenderToDataUrl` with mode-aware dimensions and optional region cropping via `cropDataUrl`.
Tesseract.js OCR engine replaces ocrad.js: `extension/src/offscreen.ts`, `extension/manifest.json`, `scripts/build.sh`, `package.json`	`offscreen.ts` gains a lazily-initialized Tesseract.js worker loading WASM/traineddata from bundled `tesseract/` extension assets. `ocrImage` is rewritten to use the worker, returning normalized text with confidence. `ocrad.js` and `html-to-image` are removed from dependencies. Manifest gains `wasm-unsafe-eval` CSP and build bundles Tesseract assets.
Canvas OCR, accessibility-first with pixel OCR fallback: `extension/src/background/capabilities/canvas.ts`, `extension/dist-mv2/background-electron.js`, `.agents/skills/interceptor-browser/references/command-catalog.md`	`canvasAccessibleText()` aggregates aria-label, aria-labelledby, fallback subtree textContent, figcaption, aria-describedby, and title for a canvas element. The `canvas_ocr` handler now selects accessibility text first, then semantic textbox text, and only runs pixel OCR as a last resort. Diagnostics fields and docs updated accordingly.
New `ocr` background action + DOM-render timeout guard: `extension/src/background/capabilities/screenshot.ts`, `extension/src/background/router.ts`, `extension/dist-mv2/background-electron.js`	`handleOcr` renders PNG via `handleDomRenderScreenshot`, sends the data URL to the Tesseract offscreen worker, and returns text with confidence/dimensions. The DOM-render path is wrapped in `withCaptureTimeout(DOM_RENDER_TIMEOUT_MS=30s)`. `SCREENSHOT_ACTIONS` and the dispatcher are updated to route `"ocr"`.
CLI `ocr` command, transport timeouts, and routing: `cli/commands/screenshot.ts`, `cli/help.ts`, `cli/index.ts`, `cli/transport.ts`	`parseScreenshotCommand` gains an `"ocr"` case parsing selector/ref/region/scale/target-max-long-edge. CLI routing adds `"ocr"` to known commands and introduces `unwrapResult`. Help text adds `interceptor ocr` entries and updates the canvas ocr description. `ACTION_TIMEOUT_OVERRIDES_MS` sets `canvas_ocr` and `ocr` to 60s.
Tests, version bumps, and manifest: `test/screenshot-minimized-preflight.test.ts`, `cli/version.ts`, `package.json`, `extension/dist-mv2/manifest.json`, `extension/manifest.json`	Preflight tests switch from tracking `scripting.executeScript` injection to tracking `sendMessage` dispatch. Minimized-window case asserts zero `sendMessage` calls; non-minimized asserts dispatch proceeds. All versions bumped to 0.18.3.
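The per-action timeout override in the CLI transport row can be sketched as a lookup with a default. Only the name `ACTION_TIMEOUT_OVERRIDES_MS` and the two 60s entries come from the table; the `Map` representation, the `timeoutForAction` helper, and the 45s default are assumptions made for illustration.

```typescript
// Hypothetical sketch of per-action CLI timeouts.
// The 45s default is an assumed placeholder, not a value confirmed by the PR.
const DEFAULT_ACTION_TIMEOUT_MS = 45_000;

const ACTION_TIMEOUT_OVERRIDES_MS = new Map<string, number>([
  ["canvas_ocr", 60_000],
  ["ocr", 60_000],
]);

function timeoutForAction(action: string): number {
  return ACTION_TIMEOUT_OVERRIDES_MS.get(action) ?? DEFAULT_ACTION_TIMEOUT_MS;
}

// The 30s DOM-render guard sits below the 60s CLI allowance for `ocr`,
// so the guard's error can reach the CLI before the CLI itself gives up.
const DOM_RENDER_TIMEOUT_MS = 30_000;
const guardFiresFirst = DOM_RENDER_TIMEOUT_MS < timeoutForAction("ocr");
```

The remaining 30s of headroom is what is left for the OCR step after a render that runs up to its limit.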

Sequence Diagram(s)

sequenceDiagram
  participant CLI
  participant Background as screenshot.ts
  participant Content as dom-screenshot.ts
  participant Offscreen as offscreen.ts / Tesseract

  rect rgba(100, 149, 237, 0.5)
    note over CLI,Offscreen: New top-level OCR flow
    CLI->>Background: action { type: "ocr", selector/region }
    Background->>Content: sendToContentScript dom_screenshot (PNG, withCaptureTimeout 30s)
    Content->>Content: clone DOM, inline styles, fetch resources as data URLs
    Content->>Content: serialize SVG foreignObject → Image.decode() → canvas.toDataURL()
    Content-->>Background: { dataUrl, width, height }
    Background->>Offscreen: { type: "ocr", dataUrl }
    Offscreen->>Offscreen: getOcrWorker() lazy-init Tesseract WASM
    Offscreen-->>Background: { success: true, text, confidence }
    Background-->>CLI: { text, source, confidence, width, height }
  end

flowchart LR
  A[canvas_ocr action] --> B[canvasAccessibleText]
  B --> C{ARIA / figcaption found?}
  C -- yes --> G[return accessibility text]
  C -- no --> D[hostCanvasSignals semantic textbox]
  D --> E{semantic text found?}
  E -- yes --> H[return semantic text]
  E -- no --> F[canvas_read + Tesseract pixel OCR]
  F --> I[return OCR text or no-text hint]
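The `getOcrWorker()` lazy-init step in the sequence diagram can be sketched as a memoized async initializer. This is a sketch under stated assumptions: `lazyAsync` and the `OcrWorker` interface are hypothetical, the worker is a stub rather than a real Tesseract.js worker, and the whitespace collapsing is only one guess at what "normalized text" means.

```typescript
// Hypothetical sketch: create an expensive async resource once, on first use.
function lazyAsync<T>(init: () => Promise<T>): () => Promise<T> {
  let pending: Promise<T> | undefined;
  return () => {
    if (!pending) {
      pending = init().catch((err) => {
        pending = undefined; // a failed init may be retried on the next call
        throw err;
      });
    }
    return pending;
  };
}

interface OcrWorker {
  recognize(dataUrl: string): Promise<{ text: string; confidence: number }>;
}

let initCount = 0;

// Stub initializer; a real one would load the bundled WASM core and language data.
const getOcrWorker = lazyAsync<OcrWorker>(async () => {
  initCount += 1;
  return {
    recognize: async (_dataUrl: string) => ({ text: "  hello \n world ", confidence: 91 }),
  };
});

async function ocrImage(dataUrl: string) {
  const worker = await getOcrWorker();
  const { text, confidence } = await worker.recognize(dataUrl);
  // Assumed normalization: collapse whitespace runs and trim the ends.
  return { success: true as const, text: text.replace(/\s+/g, " ").trim(), confidence };
}
```

Caching the promise rather than the resolved worker means concurrent first calls share a single initialization.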

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Hacker-Valley-Media/Interceptor#44: Introduced the original DOM-render screenshot pipeline (html-to-image runner injection) that this PR replaces wholesale.
Hacker-Valley-Media/Interceptor#97: Modified handleDomRenderScreenshot and the surrounding DOM-render flow in screenshot.ts, the same code path extended with the timeout guard and OCR handler here.

Poem

🐇 No more borrowing libraries from the shelf,
I draw the DOM myself, myself, myself!
Tesseract reads the pixels with a trained eye,
While ARIA text skips the OCR — oh my!
Canvas speaks first through its accessible voice,
Version 0.18.3 — what a wonderful choice! 🎉

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 10.64% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately describes the main changes: a native screenshot renderer replacing html-to-image and cross-platform OCR replacing ocrad.js, matching the core focus of the entire changeset.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

ronaldeddings merged commit 56040f7 into main Jun 20, 2026
1 of 2 checks passed

ronaldeddings deleted the release/0.18.3 branch June 20, 2026 10:44

ronaldeddings merged 1 commit into main from release/0.18.3
