Release 0.18.3 — native screenshot renderer + cross-platform OCR #116

Merged
ronaldeddings merged 1 commit into main from release/0.18.3 on Jun 20, 2026

ronaldeddings (Collaborator) commented Jun 20, 2026

Release 0.18.3 — native screenshot renderer + cross-platform OCR

Bumps every surface to 0.18.3. Two headline changes.

Screenshots — native DOM renderer (html-to-image removed)

The default DOM-render screenshot hung for the full CLI timeout on any backgrounded tab: html-to-image resolved its image load inside requestAnimationFrame, which Chrome suspends for hidden tabs, so the render never completed. Replaced the dependency with a direct native renderer (getComputedStyle → inline cssText, XMLSerializer → <svg><foreignObject>, Image/decode, Canvas, and FileReader/fetch for embedding images, canvas snapshots, and background-image resources).

  • Works fully backgrounded, no focus required.
  • Much faster on large pages: inline <svg> subtrees are deep-cloned wholesale instead of inlining styles per descendant, so SVG-heavy pages (a full Wikipedia article, ~11k px, ~2.6k icons) render in ~7s on a hidden tab instead of timing out.
  • DOM-render timeout guard fails fast with a clear error instead of a silent 45s hang.
  • html-to-image, its vendored runner, and its patches are removed entirely.
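The serialization step in the pipeline above can be sketched as a pure string helper. This is a minimal illustration, not the code in `dom-screenshot.ts`: the function name `buildForeignObjectSvgDataUrl` and its parameters are hypothetical. In a browser the `xhtml` argument would come from `new XMLSerializer().serializeToString(clone)` on the style-inlined clone, and the resulting URL would be loaded through `Image`/`decode()` and drawn onto a canvas.

```typescript
// Hypothetical helper (not from the PR): wraps already-serialized XHTML in an
// SVG <foreignObject> and returns a data URL that an <img> element can load.
function buildForeignObjectSvgDataUrl(xhtml: string, width: number, height: number): string {
  if (!(width > 0) || !(height > 0)) {
    throw new RangeError("width and height must be positive");
  }
  const svg =
    `<svg xmlns="http://www.w3.org/2000/svg" width="${width}" height="${height}">` +
    `<foreignObject x="0" y="0" width="100%" height="100%">${xhtml}</foreignObject>` +
    `</svg>`;
  // Percent-encode so characters such as '#' and '%' survive inside the URL.
  return `data:image/svg+xml;charset=utf-8,${encodeURIComponent(svg)}`;
}

// Example input: the root element must carry the XHTML namespace.
const exampleXhtml = '<div xmlns="http://www.w3.org/1999/xhtml" style="color: red">hi</div>';
const exampleUrl = buildForeignObjectSvgDataUrl(exampleXhtml, 200, 100);
```

Nothing in this step waits on requestAnimationFrame, which is consistent with the PR's claim that the native path works on hidden tabs.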

OCR — native-first + cross-platform (ocrad.js removed, Tesseract.js added)

  • canvas ocr now returns the canvas's native accessible text (aria + fallback subtree + figcaption) plus the page's semantic textbox model, instead of low-quality pixel OCR. ocrad.js (an unmaintained pixel-OCR blob) is removed.
  • New interceptor ocr <selector|element|region>: renders the target natively and OCRs it with a bundled Tesseract.js engine — offline, cross-platform, no native bridge required; returns a deterministic text string with a confidence score. canvas ocr falls back to it for pixel-only canvases. WASM core + worker + English data are bundled and loaded from extension-local URLs (wasm-unsafe-eval added to the extension CSP).
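One way to picture the "deterministic text string with a confidence score" is a small normalization pass over the raw engine output. The sketch below is an assumption about what such normalization could look like, not the logic in `offscreen.ts`: the name `normalizeOcrResult`, the whitespace rules, and the 0 to 100 confidence range are all illustrative.

```typescript
interface OcrResultSketch {
  text: string;
  confidence: number;
}

// Hypothetical normalization (not from the PR): collapses runs of spaces and
// tabs, drops empty lines, and clamps confidence into an assumed 0..100 range.
function normalizeOcrResult(rawText: string, rawConfidence: number): OcrResultSketch {
  const text = rawText
    .replace(/\r\n?/g, "\n")
    .split("\n")
    .map((line) => line.replace(/[ \t]+/g, " ").trim())
    .filter((line) => line.length > 0)
    .join("\n");
  const confidence = isFinite(rawConfidence)
    ? Math.min(100, Math.max(0, rawConfidence))
    : 0;
  return { text, confidence };
}

const normalizedExample = normalizeOcrResult("  Hello   world \r\n\r\n  second\tline  ", 87.5);
// normalizedExample.text is "Hello world\nsecond line", confidence stays 87.5
```

Normalizing in one place means the same input image yields the same string regardless of incidental whitespace in the engine output.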

Notes

  • Net +841 / −1218 (removed the html-to-image runner and ocrad).
  • All surfaces bumped to 0.18.3. Test suite: 484 pass / 10 skip / 0 fail.

🤖 Generated with Claude Code

Summary by CodeRabbit

New Features

  • Added OCR command to the CLI for extracting text from page elements, regions, and CSS selectors.
  • Improved canvas OCR to intelligently prioritize native accessible text sources before using pixel-based recognition.

Bug Fixes

  • Enhanced DOM screenshot rendering with better timeout handling and improved error guidance.

Chores

  • Version updated to 0.18.3.
  • Upgraded OCR processing for better accuracy and performance.

Bumps every surface to 0.18.3. Two headline changes: the DOM-render screenshot
path is now a dependency-free native renderer (html-to-image removed), and OCR
is reworked to be native-first with a bundled cross-platform engine (ocrad.js
removed, Tesseract.js added).

Screenshots — native DOM renderer
- The default DOM-render screenshot hung for the full CLI timeout on any
  backgrounded tab: html-to-image resolved its image load inside
  requestAnimationFrame, which Chrome suspends for hidden tabs, so the render
  never completed. Replaced the html-to-image dependency with a direct native
  renderer: getComputedStyle -> inline cssText, XMLSerializer ->
  <svg><foreignObject>, Image/decode, Canvas, and FileReader/fetch for
  embedding images, canvas snapshots, and background-image resources.
- Works fully backgrounded with no focus, and is dramatically faster on large
  pages: inline <svg> subtrees are deep-cloned wholesale instead of having
  styles inlined per descendant, so SVG-heavy pages (e.g. a full Wikipedia
  article, ~11k px tall, ~2.6k inline icons) render in ~7s on a hidden tab
  instead of timing out.
- A DOM-render timeout guard now fails fast with a clear error instead of a
  silent 45s hang.
- html-to-image, its vendored runner, and its patches are removed entirely.
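The speedup on SVG-heavy pages comes from skipping per-descendant style inlining inside inline `<svg>` subtrees. The sketch below models that with a plain object tree rather than the real DOM, so the names `SketchNode` and `countStyleInlinePasses` are hypothetical, and treating the `<svg>` root as a single pass is an assumption of this example.

```typescript
interface SketchNode {
  tag: string;
  children: SketchNode[];
}

// Counts how many nodes would need a getComputedStyle -> cssText pass.
// With wholesaleSvg enabled, an <svg> root counts once and its descendants
// are assumed to be copied as-is (as a deep clone would do).
function countStyleInlinePasses(node: SketchNode, wholesaleSvg: boolean): number {
  if (wholesaleSvg && node.tag === "svg") {
    return 1;
  }
  let passes = 1;
  for (const child of node.children) {
    passes += countStyleInlinePasses(child, wholesaleSvg);
  }
  return passes;
}

const leaf = (tag: string): SketchNode => ({ tag, children: [] });
const iconNode = (): SketchNode => ({
  tag: "svg",
  children: [leaf("path"), leaf("path"), leaf("circle"), leaf("g")],
});
const pageNode: SketchNode = { tag: "div", children: [iconNode(), iconNode(), iconNode()] };

const perDescendantPasses = countStyleInlinePasses(pageNode, false); // 1 + 3 * (1 + 4) = 16
const wholesalePasses = countStyleInlinePasses(pageNode, true); // 1 + 3 * 1 = 4
```

With thousands of icons, each containing several child elements, the number of computed-style reads drops roughly in proportion to the average icon size.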

OCR — native-first + cross-platform
- `canvas ocr` now returns the canvas's native accessible text (aria-label /
  aria-labelledby / fallback subtree / figcaption) plus the page's semantic
  textbox model, instead of low-quality pixel OCR. ocrad.js — an unmaintained
  pixel-OCR blob — is removed.
- New `interceptor ocr <selector|element|region>`: renders the target via the
  native path and OCRs it with a bundled Tesseract.js engine. Offline,
  cross-platform, no native bridge required; returns a deterministic text
  string with a confidence score. `canvas ocr` falls back to it for pixel-only
  canvases. The WASM core, worker, and English language data are bundled and
  loaded from extension-local URLs (works on any page CSP; wasm-unsafe-eval
  added to the extension CSP).
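Because the WASM core and language data are expensive to load, a lazily initialized worker that is shared across calls is a natural fit. The following is a sketch of that pattern with an injected factory; `createLazyWorkerGetter`, `OcrWorkerLike`, and the stub factory are hypothetical names for illustration. In the extension, the factory would presumably call Tesseract.js's `createWorker` with extension-local asset paths, which is not shown here.

```typescript
interface OcrWorkerLike {
  recognize(image: string): Promise<{ text: string; confidence: number }>;
}

type WorkerFactory = () => Promise<OcrWorkerLike>;

// Returns a getter that runs the factory at most once while it succeeds.
// A failed initialization clears the cached promise so a later call can retry.
function createLazyWorkerGetter(factory: WorkerFactory): () => Promise<OcrWorkerLike> {
  let pending: Promise<OcrWorkerLike> | null = null;
  return () => {
    if (pending === null) {
      pending = factory().catch((err) => {
        pending = null;
        throw err;
      });
    }
    return pending;
  };
}

// Stub factory standing in for the real engine startup.
let factoryCalls = 0;
const getOcrWorkerSketch = createLazyWorkerGetter(async () => {
  factoryCalls += 1;
  return { recognize: async () => ({ text: "stub", confidence: 0 }) };
});

const firstRequest = getOcrWorkerSketch();
const secondRequest = getOcrWorkerSketch();
// Both requests share one promise, so the factory ran once.
```

Caching the promise rather than the resolved worker means concurrent first calls do not start two engines.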

Bump all surfaces to 0.18.3. Suite: 484 pass / 0 fail.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
coderabbitai Bot commented Jun 20, 2026


Caution

Review failed

Pull request was closed or merged during review

📝 Walkthrough

Replaces the html-to-image vendor library with a native SVG-foreignObject DOM-cloning renderer in the content script. Swaps the ocrad.js OCR engine for tesseract.js with extension-local WASM assets. Adds a new top-level ocr CLI command and background action. Changes canvas OCR to prefer accessibility text (ARIA/fallback subtree/figcaption) before falling back to pixel OCR. Bumps to v0.18.3.

Changes

Native DOM Renderer + Tesseract OCR Pipeline

| Layer | File(s) | Summary |
|---|---|---|
| Native DOM renderer replaces html-to-image | extension/src/content/dom-screenshot.ts, extension/dist-mv2/content.js, extension/src/background/capabilities/screenshot.ts, extension/dist-mv2/background-electron.js, package.json, scripts/build.sh | Removes the screenshot-runner.js injection of html-to-image and replaces the rendering pipeline with a native clone + inline-styles + SVG foreignObject rasterizer. handleDomScreenshot is rewritten to use nativeRenderToDataUrl with mode-aware dimensions and optional region cropping via cropDataUrl. |
| Tesseract.js OCR engine replaces ocrad.js | extension/src/offscreen.ts, extension/manifest.json, scripts/build.sh, package.json | offscreen.ts gains a lazily-initialized Tesseract.js worker loading WASM/traineddata from bundled tesseract/ extension assets. ocrImage is rewritten to use the worker, returning normalized text with confidence. ocrad.js and html-to-image are removed from dependencies. Manifest gains wasm-unsafe-eval CSP and build bundles Tesseract assets. |
| Canvas OCR: accessibility-first, pixel OCR fallback | extension/src/background/capabilities/canvas.ts, extension/dist-mv2/background-electron.js, .agents/skills/interceptor-browser/references/command-catalog.md | canvasAccessibleText() aggregates aria-label, aria-labelledby, fallback subtree textContent, figcaption, aria-describedby, and title for a canvas element. The canvas_ocr handler now selects accessibility text first, then semantic textbox text, and only runs pixel OCR as a last resort. Diagnostics fields and docs updated accordingly. |
| New ocr background action + DOM-render timeout guard | extension/src/background/capabilities/screenshot.ts, extension/src/background/router.ts, extension/dist-mv2/background-electron.js | handleOcr renders PNG via handleDomRenderScreenshot, sends the data URL to the Tesseract offscreen worker, and returns text with confidence/dimensions. The DOM-render path is wrapped in withCaptureTimeout(DOM_RENDER_TIMEOUT_MS=30s). SCREENSHOT_ACTIONS and the dispatcher are updated to route "ocr". |
| CLI ocr command, transport timeouts, and routing | cli/commands/screenshot.ts, cli/help.ts, cli/index.ts, cli/transport.ts | parseScreenshotCommand gains an "ocr" case parsing selector/ref/region/scale/target-max-long-edge. CLI routing adds "ocr" to known commands and introduces unwrapResult. Help text adds interceptor ocr entries and updates the canvas ocr description. ACTION_TIMEOUT_OVERRIDES_MS sets canvas_ocr and ocr to 60s. |
| Tests, version bumps, and manifest | test/screenshot-minimized-preflight.test.ts, cli/version.ts, package.json, extension/dist-mv2/manifest.json, extension/manifest.json | Preflight tests switch from tracking scripting.executeScript injection to tracking sendMessage dispatch. Minimized-window case asserts zero sendMessage calls; non-minimized asserts dispatch proceeds. All versions bumped to 0.18.3. |
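The per-action timeout override in the CLI transport can be pictured as a lookup with a fallback. This is a sketch only: the walkthrough names `ACTION_TIMEOUT_OVERRIDES_MS` and the 60s values, while the `Map` representation, the `timeoutForAction` helper, and the 15s default are assumptions made for this example.

```typescript
// Hypothetical default; the real CLI default is not stated in this PR.
const DEFAULT_ACTION_TIMEOUT_MS = 15_000;

// The two 60s entries mirror the walkthrough; the Map shape is an assumption.
const ACTION_TIMEOUT_OVERRIDES_MS: ReadonlyMap<string, number> = new Map([
  ["canvas_ocr", 60_000],
  ["ocr", 60_000],
]);

// Hypothetical helper: returns the override when present, else the default.
function timeoutForAction(action: string): number {
  return ACTION_TIMEOUT_OVERRIDES_MS.get(action) ?? DEFAULT_ACTION_TIMEOUT_MS;
}

const ocrTimeoutMs = timeoutForAction("ocr"); // 60000
const otherTimeoutMs = timeoutForAction("screenshot"); // falls back to the default
```

Keeping the CLI timeout above the 30s DOM-render guard lets the background report its own timeout error before the CLI gives up.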

Sequence Diagram(s)

sequenceDiagram
  participant CLI
  participant Background as screenshot.ts
  participant Content as dom-screenshot.ts
  participant Offscreen as offscreen.ts / Tesseract

  rect rgba(100, 149, 237, 0.5)
    note over CLI,Offscreen: New top-level OCR flow
    CLI->>Background: action { type: "ocr", selector/region }
    Background->>Content: sendToContentScript dom_screenshot (PNG, withCaptureTimeout 30s)
    Content->>Content: clone DOM, inline styles, fetch resources as data URLs
    Content->>Content: serialize SVG foreignObject → Image.decode() → canvas.toDataURL()
    Content-->>Background: { dataUrl, width, height }
    Background->>Offscreen: { type: "ocr", dataUrl }
    Offscreen->>Offscreen: getOcrWorker() lazy-init Tesseract WASM
    Offscreen-->>Background: { success: true, text, confidence }
    Background-->>CLI: { text, source, confidence, width, height }
  end
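The `withCaptureTimeout` step in the diagram above can be approximated by racing the capture against a timer. The signature below is a guess for illustration (the PR only shows the name and the 30s constant), and the error wording is invented.

```typescript
const DOM_RENDER_TIMEOUT_MS = 30_000;

// Hypothetical signature: rejects with a descriptive error if `work` has not
// settled within `timeoutMs`; otherwise passes the result or error through.
function withCaptureTimeout<T>(work: Promise<T>, timeoutMs: number, label: string): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => {
      reject(new Error(`${label} timed out after ${timeoutMs}ms`));
    }, timeoutMs);
    work.then(
      (value) => {
        clearTimeout(timer); // avoid keeping a stale timer alive
        resolve(value);
      },
      (err) => {
        clearTimeout(timer);
        reject(err);
      },
    );
  });
}

// Usage sketch: a capture that never settles fails fast with a clear message.
const neverSettles = new Promise<string>(() => {});
const guardedCapture = withCaptureTimeout(neverSettles, 10, "DOM render");
```

In the extension the second argument would be `DOM_RENDER_TIMEOUT_MS`; the 10 ms value here only keeps the example quick.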
flowchart LR
  A[canvas_ocr action] --> B[canvasAccessibleText]
  B --> C{ARIA / figcaption found?}
  C -- yes --> G[return accessibility text]
  C -- no --> D[hostCanvasSignals semantic textbox]
  D --> E{semantic text found?}
  E -- yes --> H[return semantic text]
  E -- no --> F[canvas_read + Tesseract pixel OCR]
  F --> I[return OCR text or no-text hint]
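The fallback order in the flowchart can be expressed as a short selection function. This sketch is synchronous and takes its inputs as plain strings for clarity; the real handler is presumably asynchronous, and the names `pickCanvasText` and `CanvasTextSource` are hypothetical.

```typescript
type CanvasTextSource = "accessibility" | "semantic" | "pixel-ocr" | "none";

interface CanvasTextResult {
  source: CanvasTextSource;
  text: string;
}

// Picks the first non-empty source in priority order. Pixel OCR is passed as
// a callback so it only runs when both cheaper sources are empty.
function pickCanvasText(
  accessibleText: string,
  semanticText: string,
  runPixelOcr: () => string,
): CanvasTextResult {
  const fromAccessibility = accessibleText.trim();
  if (fromAccessibility) return { source: "accessibility", text: fromAccessibility };
  const fromSemantic = semanticText.trim();
  if (fromSemantic) return { source: "semantic", text: fromSemantic };
  const fromPixels = runPixelOcr().trim();
  return fromPixels
    ? { source: "pixel-ocr", text: fromPixels }
    : { source: "none", text: "" };
}

let pixelOcrRuns = 0;
const stubPixelOcr = (): string => {
  pixelOcrRuns += 1;
  return "Q3 revenue";
};

const labelledCanvas = pickCanvasText("Sales chart", "", stubPixelOcr); // accessibility wins
const pixelOnlyCanvas = pickCanvasText("  ", "", stubPixelOcr); // falls through to pixel OCR
```

Reporting the `source` alongside the text lets callers tell authored accessible text apart from recognized pixels.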

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • Hacker-Valley-Media/Interceptor#44: Introduced the original DOM-render screenshot pipeline (html-to-image runner injection) that this PR replaces wholesale.
  • Hacker-Valley-Media/Interceptor#97: Modified handleDomRenderScreenshot and the surrounding DOM-render flow in screenshot.ts, the same code path extended with the timeout guard and OCR handler here.

Poem

🐇 No more borrowing libraries from the shelf,
I draw the DOM myself, myself, myself!
Tesseract reads the pixels with a trained eye,
While ARIA text skips the OCR — oh my!
Canvas speaks first through its accessible voice,
Version 0.18.3 — what a wonderful choice! 🎉

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 10.64% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main changes: a native screenshot renderer replacing html-to-image and cross-platform OCR replacing ocrad.js, matching the core focus of the entire changeset. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


ronaldeddings merged commit 56040f7 into main on Jun 20, 2026
1 of 2 checks passed
ronaldeddings deleted the release/0.18.3 branch on June 20, 2026 10:44