rakers


A lightweight, single-binary JavaScript renderer for SPAs and SSR sites, built for cases where startup latency (milliseconds vs. 1–2 seconds for a headless browser) and memory footprint (~10 MB vs. ~300 MB) matter more than compatibility breadth.

rakers renders JavaScript into HTML. Give it an HTML file, a URL, or a bare JS script and it returns the post-execution HTML — including content rendered by React, Vue, Angular, Svelte, Preact, Mithril, Elm, Riot, and other JS frameworks.

Built on html5ever (Servo's HTML5 parser) with a choice of JS engine: QuickJS via rquickjs (default) or boa_engine (pure-Rust, no C compiler required).

Install

Pre-built binaries (recommended)

Download the latest release binary for your platform from the releases page:

| Platform | Binary |
| --- | --- |
| Linux x86-64 | rakers-linux-x86_64 |
| macOS Apple Silicon | rakers-macos-aarch64 |

# Linux example
curl -L https://github.com/tbro/rakers/releases/latest/download/rakers-linux-x86_64 -o rakers
chmod +x rakers
sudo mv rakers /usr/local/bin/

Build from source

Requires Rust and a C compiler (for the default QuickJS engine).

cargo install --path .

For a pure-Rust build without a C compiler, use the boa engine instead:

cargo install --path . --no-default-features --features boa

Usage

rakers [OPTIONS] [INPUT]

INPUT is a file path or an http/https URL; omit it to read from stdin.

| Input type | Example |
| --- | --- |
| URL | rakers https://example.com |
| HTML file | rakers page.html |
| JS file | rakers script.js |
| stdin | echo '<script>document.write("hi")</script>' \| rakers |

By default output goes to stdout. Use -o to write to a file:

rakers https://example.com -o rendered.html

Options

| Flag | Description |
| --- | --- |
| -o FILE | Write output to FILE instead of stdout |
| -A UA | Set the User-Agent header for all HTTP requests |
| -H "Name: Value" | Add a custom request header (repeatable) |
| --clean | Strip <script> elements and unwrap <noscript> (see Clean mode) |
| --pretty | Format output HTML with two-space indentation |
| --json | Emit {"raw_bytes":N,"rendered_bytes":N,"html":"..."} instead of bare HTML |
| --diff | Show a unified diff of raw vs rendered HTML (both sides are pretty-printed first) |
| --selector SELECTOR | Filter output to elements matching a CSS selector; multiple matches are newline-separated |
| --max-scripts N | Limit the number of remote <script src> fetches (inline scripts are not counted) |
| --timeout SECS | Per-script wall-clock timeout in seconds; fractions allowed (e.g. 0.5). Default: 30 |
| --no-timeout | Remove the per-script timeout entirely (conflicts with --timeout) |
| --verbose | Print informational messages to stderr: [fetch], [skip], [console], [module-shim] |
| --forward-headers | Forward custom -H headers on XHR requests made by page scripts (off by default; see note below) |

Note on custom headers: Headers passed via -H (including Authorization, cookies, and API keys) are sent on the page fetch and any external script fetches, but are not forwarded on XHR requests the page's JavaScript initiates. Use --forward-headers to opt in to forwarding them on XHR too — but avoid this when rendering untrusted HTML, as page scripts could trigger XHR requests to arbitrary cross-origin destinations carrying your credentials.
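When rakers is driven from a script, the header rules above can be enforced before the process is launched. The following is a minimal Python sketch, not part of rakers: the function name rakers_argv and its trusted parameter are hypothetical, and only the flags documented in the table above are used.

```python
from typing import List, Mapping, Optional


def rakers_argv(
    target: str,
    headers: Optional[Mapping[str, str]] = None,
    forward_headers: bool = False,
    trusted: bool = False,
    timeout: Optional[float] = None,
) -> List[str]:
    """Build an argument list for the rakers CLI from documented flags only."""
    if forward_headers and not trusted:
        # Policy of this sketch (not of rakers): never forward credentials
        # on XHR when the page being rendered is untrusted.
        raise ValueError("refusing --forward-headers for untrusted input")
    argv = ["rakers"]
    for name, value in (headers or {}).items():
        argv += ["-H", f"{name}: {value}"]
    if forward_headers:
        argv.append("--forward-headers")
    if timeout is not None:
        argv += ["--timeout", str(timeout)]
    argv.append(target)  # INPUT comes last: rakers [OPTIONS] [INPUT]
    return argv
```

The resulting list can be passed to subprocess.run(argv, capture_output=True, text=True) without going through a shell, so header values need no quoting.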

How it works

  1. Fetches the page (or reads from file/stdin)
  2. Parses HTML with html5ever into a DOM tree
  3. Collects <script> tags — inline and external (src="...") — and fetches any external scripts
    • External scripts that open with import/export (ES module files requiring a module loader) are automatically skipped; self-contained bundles tagged type="module" still execute
    • Cloudflare Rocket Loader (type="<hash>-text/javascript") is recognized and executed
  4. Executes all scripts in order in a sandboxed JS context with browser globals stubbed out
  5. Flushes any deferred callbacks (setTimeout, requestAnimationFrame, MessageChannel, queueMicrotask) so async-rendered frameworks have a chance to run
  6. Reads back document.body.innerHTML and serializes the final HTML
    • Large server-rendered bodies (SSR sites) are preserved when the JS-rendered body is substantially smaller, which prevents measurement/analytics divs from clobbering real content
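Two of the decisions in the steps above (which scripts to run, and which body to keep) can be illustrated in a few lines. This is a Python sketch under stated assumptions, not the rakers implementation: the set of executable type values, the hex pattern for the Rocket Loader hash, and the min_ratio threshold are all hypothetical.

```python
import re
from typing import Optional

# Assumption: Rocket Loader rewrites type to "<hex hash>-text/javascript".
ROCKET_LOADER_TYPE = re.compile(r"^[0-9a-f]+-text/javascript$")
# Assumption: type values this sketch treats as executable JavaScript.
EXECUTABLE_TYPES = {"", "text/javascript", "application/javascript", "module"}
# A static import (binding, brace, star, or string) or an export statement.
MODULE_KEYWORD = re.compile(r"""import(?:\s+[\w$]|\s*[{*"'])|export\b""")


def opens_with_module_syntax(source: str) -> bool:
    """True if the first token after whitespace/comments is import or export."""
    rest = source.lstrip()
    while True:
        if rest.startswith("//"):
            _, _, rest = rest.partition("\n")
        elif rest.startswith("/*"):
            _, _, rest = rest[2:].partition("*/")
        else:
            break
        rest = rest.lstrip()
    return MODULE_KEYWORD.match(rest) is not None


def should_execute(script_type: Optional[str], source: str, external: bool) -> bool:
    t = (script_type or "").strip().lower()
    if t not in EXECUTABLE_TYPES and not ROCKET_LOADER_TYPE.match(t):
        return False  # data blocks such as application/json
    if external and opens_with_module_syntax(source):
        return False  # would need a real module loader
    return True


def pick_body(original_body: str, rendered_body: str, min_ratio: float = 0.5) -> str:
    """Keep the server-rendered body when the JS result is much smaller."""
    if len(rendered_body) < len(original_body) * min_ratio:
        return original_body
    return rendered_body
```

Note that the module check looks only at how the file opens, so a self-contained bundle tagged type="module" still passes, as described in step 3.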

.js files are automatically wrapped in a minimal HTML document before processing.
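As an illustration of that wrapping step, here is a small Python sketch. The exact document rakers generates is not specified here, so the markup below and the function name wrap_js are assumptions.

```python
def wrap_js(source: str) -> str:
    """Wrap a bare script in a minimal HTML document (hypothetical layout)."""
    # Escape a literal "</script" so the HTML parser does not close the
    # element early; inside a JS string "<\/script" means the same thing.
    safe = source.replace("</script", "<\\/script")
    return (
        "<!DOCTYPE html><html><head></head><body>"
        f"<script>{safe}</script>"
        "</body></html>"
    )
```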

console.log, console.warn, and console.error print to stderr with a [console] prefix when --verbose is set. Script errors are non-fatal — execution continues with the next script.
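The "run everything, then flush" behaviour described in steps 4 and 5 and in the paragraph above can be modelled with a plain queue. This Python sketch is a model of the idea only: scripts are represented as callables, and the class name DeferredQueue and the max_tasks cap are hypothetical.

```python
from collections import deque
from typing import Callable, Deque, Iterable, List

Task = Callable[[], None]


class DeferredQueue:
    """Collects callbacks (timers, microtasks) to run after all scripts."""

    def __init__(self) -> None:
        self._tasks: Deque[Task] = deque()

    def defer(self, task: Task) -> None:
        self._tasks.append(task)

    def flush(self, max_tasks: int = 10_000) -> int:
        """Run queued tasks in FIFO order, including tasks queued meanwhile."""
        ran = 0
        # The cap keeps a self-rescheduling callback from looping forever.
        while self._tasks and ran < max_tasks:
            task = self._tasks.popleft()
            ran += 1
            try:
                task()
            except Exception:
                pass  # a failing callback must not stop the flush
        return ran


def run_scripts(scripts: Iterable[Task], queue: DeferredQueue) -> List[str]:
    """Run scripts in order; errors are recorded, never fatal."""
    errors: List[str] = []
    for script in scripts:
        try:
            script()
        except Exception as exc:
            errors.append(str(exc))
    queue.flush()
    return errors
```

A script that raises does not prevent later scripts from running, and a callback registered by an early script still fires once every script has finished.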

Output modes

--pretty

Pretty-print the rendered HTML with two-space indentation. Block elements each start on their own line; inline elements and their content stay together.

rakers --pretty https://example.com

--json

Emit a JSON object useful for scripting and size comparisons:

{"raw_bytes":645,"rendered_bytes":4210,"html":"<html>..."}

--json and --pretty can be combined: the html field then contains pretty-printed, JSON-escaped HTML.
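For scripting, the JSON object can be loaded into a small typed record. The sketch below is Python and uses only the three field names shown above; the names RenderResult, growth, and parse_result are hypothetical helpers, not part of rakers.

```python
import json
from dataclasses import dataclass


@dataclass
class RenderResult:
    raw_bytes: int
    rendered_bytes: int
    html: str

    @property
    def growth(self) -> float:
        """How many times larger the rendered HTML is than the raw HTML."""
        if self.raw_bytes == 0:
            return float("inf")
        return self.rendered_bytes / self.raw_bytes


def parse_result(stdout: str) -> RenderResult:
    """Parse the object printed by rakers --json."""
    data = json.loads(stdout)
    return RenderResult(
        raw_bytes=int(data["raw_bytes"]),
        rendered_bytes=int(data["rendered_bytes"]),
        html=data["html"],
    )
```

A typical call would feed it the stdout of subprocess.run(["rakers", "--json", url], capture_output=True, text=True, check=True). A growth value close to 1 suggests the page was already server-rendered.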

--diff

Show a unified diff of the raw vs rendered HTML. Both sides are pretty-printed before diffing for a readable result:

rakers --diff https://example.com/spa
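If the raw and rendered HTML are already available in a script (for example from two saved files), a comparable diff can be produced with Python's standard difflib. This is a sketch of the same idea, not the code rakers uses; the file labels raw and rendered are assumptions.

```python
import difflib


def unified(raw_html: str, rendered_html: str) -> str:
    """Return a unified diff of two (ideally pretty-printed) HTML strings."""
    lines = difflib.unified_diff(
        raw_html.splitlines(),
        rendered_html.splitlines(),
        fromfile="raw",
        tofile="rendered",
        lineterm="",
    )
    return "\n".join(lines)
```

Pretty-printing both sides first matters because a line-based diff of minified HTML would report the whole document as one changed line.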

--selector

Extract specific elements from the rendered output using a CSS selector. All matching elements are printed, newline-separated:

# Extract just the article elements from a news site
rakers --selector "article" https://example.com

# Combine with --pretty for readable output
rakers --selector "#root" --pretty https://example.com/spa

Returns an empty string (exit 0) when no elements match. Returns an error for an invalid selector.

Clean mode

--clean applies a post-processing pass that produces a static, crawlable snapshot — similar to what prerendering services (Prerender.io, rendertron) deliver to search-engine bots:

  • Removes all <script> elements (inline and external)
  • Removes <link rel="modulepreload"> and <link rel="preload" as="script">
  • Unwraps <noscript>: strips the tags but keeps the inner content, so crawlers see any fallback markup (meta redirects, image links, etc.)

rakers --clean https://example.com -o static.html

The output is self-contained HTML with no executable code — safe to serve directly to crawlers or store as a static snapshot.
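The three rules above can be approximated with Python's standard html.parser, which may help when post-processing HTML that was rendered without --clean. This is a rough stream filter written for illustration; rakers itself works on an html5ever DOM tree, and the names Cleaner and clean are hypothetical.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple

Attrs = List[Tuple[str, Optional[str]]]


def is_script_preload(attrs: Attrs) -> bool:
    values = {name: (value or "").lower() for name, value in attrs}
    rel = values.get("rel", "").split()
    return "modulepreload" in rel or ("preload" in rel and values.get("as") == "script")


class Cleaner(HTMLParser):
    def __init__(self) -> None:
        super().__init__(convert_charrefs=False)
        self.out: List[str] = []
        self.in_script = False

    def emit(self, text: str) -> None:
        if not self.in_script:  # drop everything inside <script>
            self.out.append(text)

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True
        elif tag != "noscript" and not (tag == "link" and is_script_preload(attrs)):
            self.emit(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        skip = tag in ("script", "noscript") or (tag == "link" and is_script_preload(attrs))
        if not skip:
            self.emit(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False
        elif tag != "noscript":  # unwrap: drop the tag, keep its children
            self.emit(f"</{tag}>")

    def handle_data(self, data):
        self.emit(data)

    def handle_entityref(self, name):
        self.emit(f"&{name};")

    def handle_charref(self, name):
        self.emit(f"&#{name};")

    def handle_comment(self, data):
        self.emit(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.emit(f"<!{decl}>")


def clean(html: str) -> str:
    parser = Cleaner()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```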

JS engine choice

rakers supports two JS engines selectable at compile time.

| | rquickjs (default) | boa |
| --- | --- | --- |
| Build deps | Requires a C compiler | Pure Rust, no C compiler |
| ES standard | ES2023 | ES2021 (partial) |
| Real-world bundles | Good | Limited; may stack-overflow on large bundles |
| React / Vue SPAs | Works | Often hits stack limits |
| When to use | Real-world sites (default) | CI without C toolchain |

Building

# rquickjs (default — recommended)
cargo build
cargo install --path .

# boa (pure Rust, no C compiler needed)
cargo build --no-default-features --features boa
cargo install --path . --no-default-features --features boa

Only one engine can be enabled at a time; the build will fail with a clear error if both or neither are selected.

Running tests

Unit tests run with either engine:

cargo test                                       # rquickjs (default)
cargo test --no-default-features --features boa  # boa

Integration tests that fetch real SPAs require rquickjs (boa overflows the native stack on large React/Rocket Loader bundles):

cargo test --test integration

Browser environment

The following globals are stubbed so typical JS bundles run without errors:

  • document: createElement, getElementById, querySelector / querySelectorAll (including compound comma-separated selectors and script[type="X"] queries), body, head, currentScript, and the full DOM manipulation API (appendChild with move semantics, insertBefore, removeChild, setAttribute, innerHTML, firstChild, lastChild, childNodes, etc.)
  • window.location: all fields (href, pathname, hostname, protocol, host, port, search, hash, origin) are parsed from the page URL; setting hash fires onhashchange via the deferred-callback queue; assign, replace, and reload are no-ops
  • window.history: pushState and replaceState update history.state; navigation methods are no-ops
  • window: navigator, screen, performance, localStorage, sessionStorage, matchMedia, getComputedStyle, and all standard event/observer constructors
  • URL / URLSearchParams: relative URL resolution against the page URL; searchParams with full get/set/has
  • fetch: returns Promise.resolve(response) with an empty 200 OK body; .then() chains run and apps don't crash, but no data is loaded
  • XMLHttpRequest: synchronous mode (open(method, url, false)) fetches via the same HTTP client as the main page fetch; async mode schedules onload / onreadystatechange callbacks with the real response body, and they fire during the deferred-callback flush pass
  • Script injection: dynamically appended <script> elements are executed. Inline child.text is evaled directly; elements with a src attribute are fetched via the same blocking HTTP client as the page load and then evaled. This supports compilers (e.g. Riot 2.x) that register components and frameworks that load chunked scripts at runtime.
  • DOMException / customElements: DOM exception constructor and Web Components registry
  • process: Node.js-style globals for webpack/Vite bundler compatibility
  • Timers: setTimeout, setInterval, requestAnimationFrame, queueMicrotask, and MessageChannel callbacks are collected and flushed after scripts finish
  • import(): dynamic imports return Promise.resolve({}) (a stub module); .then() chains run but no real module is loaded
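To make the window.location and URL entries above concrete, the sketch below derives the same field names from a page URL using Python's urllib.parse. It illustrates the mapping only and is not taken from rakers; in particular, a browser omits default ports and userinfo from host and origin, which this sketch does not attempt.

```python
from typing import Dict
from urllib.parse import urljoin, urlsplit


def location_fields(page_url: str) -> Dict[str, str]:
    """Split a page URL into the fields exposed on window.location."""
    parts = urlsplit(page_url)
    return {
        "href": page_url,
        "protocol": parts.scheme + ":",  # includes the trailing colon
        "host": parts.netloc,            # hostname plus port, if present
        "hostname": parts.hostname or "",
        "port": str(parts.port) if parts.port else "",
        "pathname": parts.path or "/",
        "search": "?" + parts.query if parts.query else "",
        "hash": "#" + parts.fragment if parts.fragment else "",
        "origin": f"{parts.scheme}://{parts.netloc}",
    }


def resolve(page_url: str, relative: str) -> str:
    """Resolve a relative URL against the page URL, as new URL(rel, base) does."""
    return urljoin(page_url, relative)
```

Resolving against the page URL is what lets a bundle request "../api/items" and have the stubbed environment map it to the right absolute URL.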

Comparison

| | rakers | headless Chrome | Playwright / Puppeteer | Splash |
| --- | --- | --- | --- | --- |
| JS compatibility | Good (QuickJS / ES2023) | Full | Full | Full (WebKit) |
| Requires browser | No | Yes | Yes | Yes (via Docker) |
| Startup time | ~10 ms | ~1–2 s | ~1–2 s | ~500 ms |
| Memory | ~10 MB | ~150–300 MB | ~150–300 MB | ~200 MB |
| Network calls from JS | No (stubbed) | Yes | Yes | Yes |
| CSS / layout | No | Yes | Yes | Yes |
| Embeddable as library | Yes (Rust crate) | No | No | No |
| Installation | Single binary | Chrome + chromedriver | Browser + Node | Docker image |
| Language | Rust | Any | JS / many bindings | Python / Lua |

When to use rakers — fast HTML extraction in a scraping pipeline, CI environments without a browser, embedding in a Rust service, or anywhere startup latency and memory footprint matter more than pixel-perfect rendering.

When to use a headless browser — pages that rely on CSS-driven layout, canvas, WebGL, WebSockets, or JavaScript that makes authenticated network requests during render.

Demo

TodoMVC React is the canonical demo. The server returns a 645-byte skeleton:

<section class="todoapp" id="root"></section>

rakers executes the React bundle and returns the fully rendered app:

<section class="todoapp" id="root">
  <header class="header">
    <h1>todos</h1>
    <div class="input-container">
      <input class="new-todo" type="text">
      ...
    </div>
  </header>
  ...
</section>

Compatibility

Tested against real-world sites with rquickjs:

| Site | Framework | Result |
| --- | --- | --- |
| react.dev | Next.js (SSR) | ✓ no errors |
| svelte.dev | SvelteKit (SSR) | ✓ no errors |
| vuejs.org | Vite (SSR) | ✓ no errors |
| tailwindcss.com | Next.js (SSR) | ✓ no errors |
| remix.run | Remix (SSR) | ✓ no errors |
| jsbench.me | React SPA | ✓ full render |
| babylonbee.com | Cloudflare Rocket Loader | ✓ articles intact |
| linear.app | Next.js | ✓ renders (1 minor error) |
| github.com | Custom SSR | ✓ renders (4 minor errors) |

TodoMVC sweep

19 of 20 TodoMVC examples render correctly. The sweep runs automatically on every push via the todomvc-compat CI job.

| Framework | Result |
| --- | --- |
| React | ✓ full render |
| React + Redux | ✓ full render |
| Vue | ✓ full render |
| Preact | ✓ full render |
| Svelte | ✓ full render |
| Angular | ✓ full render |
| Mithril | ✓ full render |
| Elm | ✓ full render |
| Riot | ✓ template rendered |
| Ember | ✓ app shell rendered |
| Backbone, KnockoutJS, jQuery, Dojo, Aurelia, Backbone Marionette, Vanilla ES5/ES6, Web Components | ✓ prerendered content preserved |
| Lit | ✗ native ES-module bundle (no IIFE fallback); needs a full module loader |
