Release 0.18.3: native screenshot renderer + cross-platform OCR #116
Bumps every surface to 0.18.3. Two headline changes: the DOM-render screenshot path is now a dependency-free native renderer (html-to-image removed), and OCR is reworked to be native-first with a bundled cross-platform engine (ocrad.js removed, Tesseract.js added).

Screenshots: native DOM renderer

- The default DOM-render screenshot hung for the full CLI timeout on any backgrounded tab. html-to-image resolved its image load inside `requestAnimationFrame`, which Chrome suspends for hidden tabs, so the render never completed. The html-to-image dependency is replaced with a direct native renderer: `getComputedStyle` -> inline `cssText`, `XMLSerializer` -> `<svg><foreignObject>`, `Image`/`decode()`, `Canvas`, and `FileReader`/`fetch` for embedding images, canvas snapshots, and background-image resources.
- Works fully backgrounded with no focus required, and is much faster on large pages. Inline `<svg>` subtrees are deep-cloned wholesale instead of having styles inlined per descendant, so SVG-heavy pages (e.g. a full Wikipedia article, ~11k px tall, ~2.6k inline icons) render in ~7s on a hidden tab instead of timing out.
- A DOM-render timeout guard now fails fast with a clear error instead of hanging silently for 45s.
- html-to-image, its vendored runner, and its patches are removed entirely.

OCR: native-first + cross-platform

- `canvas ocr` now returns the canvas's native accessible text (`aria-label` / `aria-labelledby` / fallback subtree / `figcaption`) plus the page's semantic textbox model, instead of low-quality pixel OCR. ocrad.js, an unmaintained pixel-OCR blob, is removed.
- New `interceptor ocr <selector|element|region>`: renders the target via the native path and OCRs it with a bundled Tesseract.js engine. It works offline and cross-platform with no native bridge required, and returns a deterministic text string with a confidence score. `canvas ocr` falls back to it for pixel-only canvases.
- The WASM core, worker, and English language data are bundled and loaded from extension-local URLs, so OCR works regardless of the host page's CSP. `'wasm-unsafe-eval'` is added to the extension CSP.

Suite 484/0.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
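The renderer described above has two steps that are pure enough to sketch outside a browser. What follows is a minimal TypeScript sketch that uses a simplified `NodeLike` tree in place of real DOM elements. `elementsToInline` shows the SVG short-circuit, where per-element style inlining stops at an `<svg>` root. `toForeignObjectDataUrl` shows the `<svg><foreignObject>` wrapping. `NodeLike` and both function names are hypothetical and do not come from the extension's source.

```typescript
// Illustrative sketch only; these names are hypothetical, not the extension's API.

/** Minimal stand-in for an element so the walk can run outside a browser. */
interface NodeLike {
  tagName: string;
  children: NodeLike[];
}

/**
 * Return, in document order, the elements that would get per-element inlined
 * styles. Descendants of an <svg> are skipped because the whole subtree is
 * assumed to be deep-cloned as-is. Whether the <svg> root itself is styled is
 * an assumption of this sketch.
 */
function elementsToInline(root: NodeLike): NodeLike[] {
  const out: NodeLike[] = [];
  const stack: NodeLike[] = [root];
  while (stack.length > 0) {
    const node = stack.pop()!;
    out.push(node);
    if (node.tagName.toLowerCase() === "svg") continue; // do not descend into SVG
    // Push children in reverse so they are visited in document order.
    for (let i = node.children.length - 1; i >= 0; i--) {
      stack.push(node.children[i]);
    }
  }
  return out;
}

/** Wrap serialized XHTML in an SVG <foreignObject> and return a data URL. */
function toForeignObjectDataUrl(xhtml: string, width: number, height: number): string {
  if (!(width > 0 && height > 0)) {
    throw new RangeError(`invalid render size ${width}x${height}`);
  }
  const svg =
    `<svg xmlns="http://www.w3.org/2000/svg" width="${width}" height="${height}">` +
    `<foreignObject x="0" y="0" width="100%" height="100%">${xhtml}</foreignObject>` +
    `</svg>`;
  // Percent-encode so characters such as '#' and '%' cannot break the URL.
  return `data:image/svg+xml;charset=utf-8,${encodeURIComponent(svg)}`;
}
```

In a content script, `xhtml` would come from `new XMLSerializer().serializeToString(clone)`, which adds the XHTML namespace to the root element. The resulting data URL would then be assigned to an `Image`'s `src`, awaited with `decode()`, drawn onto a canvas with `drawImage`, and exported with `toDataURL("image/png")`.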
CodeRabbit walkthrough: Native DOM Renderer + Tesseract OCR Pipeline
Sequence diagram: new top-level OCR flow

1. CLI -> Background (`screenshot.ts`): action `{ type: "ocr", selector/region }`.
2. Background -> Content (`dom-screenshot.ts`): `sendToContentScript` `dom_screenshot` (PNG, `withCaptureTimeout` 30s).
3. Content: clones the DOM, inlines styles, and fetches resources as data URLs.
4. Content: serializes to SVG `foreignObject` -> `Image.decode()` -> `canvas.toDataURL()`.
5. Content -> Background: `{ dataUrl, width, height }`.
6. Background -> Offscreen (`offscreen.ts` / Tesseract): `{ type: "ocr", dataUrl }`.
7. Offscreen: `getOcrWorker()` lazily initializes the Tesseract WASM worker.
8. Offscreen -> Background: `{ success: true, text, confidence }`.
9. Background -> CLI: `{ text, source, confidence, width, height }`.
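The sequence diagram above bounds the content-script render with `withCaptureTimeout` (30s), and the PR describes a DOM-render timeout guard that fails fast with a clear error. Below is a minimal sketch of such a guard. It assumes the render is exposed as a promise. `withTimeout` and `CaptureTimeoutError` are hypothetical names and do not reflect the actual `withCaptureTimeout` signature.

```typescript
// Hypothetical sketch of a render timeout guard; not the extension's real helper.

class CaptureTimeoutError extends Error {
  constructor(label: string, ms: number) {
    super(`${label} did not finish within ${ms} ms`);
    this.name = "CaptureTimeoutError";
  }
}

/**
 * Race `work` against a timer. If the timer wins, reject with a descriptive
 * error instead of leaving the caller waiting indefinitely.
 */
function withTimeout<T>(work: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new CaptureTimeoutError(label, ms)), ms);
  });
  // Clear the timer on either outcome so a fast render leaves nothing pending.
  return Promise.race([work, timeout]).then(
    (value) => {
      clearTimeout(timer);
      return value;
    },
    (err) => {
      clearTimeout(timer);
      throw err;
    },
  );
}
```

Racing against a timer does not cancel the underlying render. It only guarantees that the caller gets a bounded, descriptive failure rather than a silent hang.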
Flowchart: `canvas ocr` fallback chain

1. The `canvas_ocr` action calls `canvasAccessibleText`. If ARIA / figcaption text is found, it returns the accessibility text.
2. Otherwise it checks `hostCanvasSignals` (semantic textbox). If semantic text is found, it returns that text.
3. Otherwise it runs `canvas_read` + Tesseract pixel OCR and returns the OCR text or a no-text hint.
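The fallback order in the flowchart above can be expressed as a chain of lazily evaluated sources. With this structure, the expensive pixel-OCR step runs only when the cheaper text sources come up empty. The sketch below assumes synchronous accessibility and semantic lookups and an asynchronous OCR call. `resolveCanvasText`, `CanvasTextProviders`, and the `source` labels are hypothetical and are not the extension's actual types.

```typescript
// Hypothetical sketch of the canvas-text fallback chain; names are illustrative.

type TextSource = "accessibility" | "semantic" | "ocr" | "none";

interface CanvasTextResult {
  text: string;
  source: TextSource;
  confidence?: number; // only set when pixel OCR produced the text
}

interface CanvasTextProviders {
  accessibleText: () => string | null; // ARIA, fallback subtree, figcaption
  semanticText: () => string | null; // page's semantic textbox model
  pixelOcr: () => Promise<{ text: string; confidence: number }>; // e.g. a Tesseract call
}

const nonEmpty = (s: string | null): s is string => s !== null && s.trim().length > 0;

/** Try each source in order, calling the next one only if the previous was empty. */
async function resolveCanvasText(p: CanvasTextProviders): Promise<CanvasTextResult> {
  const a11y = p.accessibleText();
  if (nonEmpty(a11y)) return { text: a11y.trim(), source: "accessibility" };

  const semantic = p.semanticText();
  if (nonEmpty(semantic)) return { text: semantic.trim(), source: "semantic" };

  const ocr = await p.pixelOcr();
  if (nonEmpty(ocr.text)) {
    return { text: ocr.text.trim(), source: "ocr", confidence: ocr.confidence };
  }
  // Nothing found: the caller can surface a "no text" hint.
  return { text: "", source: "none" };
}
```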
Notes: +841 / −1218 (removed the html-to-image runner and ocrad).
🤖 Generated with Claude Code