Skip to content

Add config validation and a test suite#39

Merged
jeremyfelt merged 17 commits into
trunkfrom
task/tests-and-validation
Jun 3, 2026
Merged

Add config validation and a test suite#39
jeremyfelt merged 17 commits into
trunkfrom
task/tests-and-validation

Conversation

@jeremyfelt
Copy link
Copy Markdown
Member

Closes #18. Closes #19.

#19 — Configuration validation

Validates config up front in src/config.mjs so problems surface with an actionable message instead of failing deep inside a Playwright capture:

  • Viewports (validateViewports): each must be an object with a non-empty name and positive-integer width/height. A typo'd viewport previously flowed untouched into page.setViewportSize() and threw an opaque error mid-capture.
  • Domain (normalizeDomain): rejects empty/non-string domains and validates the result with new URL() so a malformed domain is caught before navigation.

Scoped intentionally: the original issue's JSON-Schema machinery is overkill for this config shape, and network/permission errors are already handled (capture retries and logs per-target; mkdirSync throws clearly). Those are noted on the issue rather than built.

#18 — Test suite (docs were already done)

Adds a node:test suite (zero new deps, matches the all-ESM setup) — npm test runs node --test:

  • test/config.test.mjs — domain normalization, URL building, viewport validation, and loadConfig (missing file, bad JSON, empty paths, domain override, viewport defaults, pixelmatch merge).
  • test/control.test.mjs — the control move semantics (captures are moved, not copied), missing-capture skips, and the --only filter.
  • test/compare.test.mjspadImage and compareSlug (now exported) against in-memory PNGs: identical → 0%, opposite → 100%, height-mismatch padding, and the missing-control / width-mismatch skips.

Tier 3 (Playwright capture()) is intentionally not automated — it needs a live server + browser and would be flaky. CI runs lint + tests on Node 20 and 22 via .github/workflows/test.yml, with PLAYWRIGHT_SKIP_BROWSER_DOWNLOAD set since the unit tests don't need Chromium.

All 33 tests pass; lint is clean (the only remaining warnings are pre-existing console statements).

🤖 Generated with Claude Code

jeremyfelt and others added 13 commits June 3, 2026 11:43
Addresses #19 by validating viewports (positive-integer dimensions, a
required name) and domain format in src/config.mjs, surfacing actionable
errors before they fail deep inside a Playwright capture.

Addresses #18 by adding a node:test suite covering config normalization
and validation, the control move semantics, and the compare diff math
(padImage/compareSlug, now exported), plus a Lint-and-test GitHub Actions
workflow on Node 20 and 22.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`--concurrency=0` previously flowed into the capture batching loop as the
step size, so `i += 0` never advanced and capture hung forever with no
error; negatives ran the loop backward to the same effect, and `parseInt`
silently truncated `8x` to 8.

Extract flag parsing into `src/args.mjs` as a testable `toInt` that uses
`Number` (rejecting trailing garbage) and enforces a minimum — 1 for
concurrency, 0 for stagger (which legitimately disables staggering). It
throws an actionable error caught by the CLI's top-level handler instead
of calling process.exit, which makes it unit-testable. Also clamp the
batch step to a positive value as defense in depth.

Fixes HIGH-004.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`capture` already guarded against a `--only` filter that matched nothing,
but `control` and `compare` applied the same filter with no guard: a typo'd
key (`reglance control blgo`) silently filtered the target list to empty and
printed a success-looking summary ("Promoted 0 capture(s)" / "Nothing to
compare").

Factor the filter into a shared `filterTargets` helper in config.mjs that
throws — listing the known keys to help spot the typo — and use it in all
three commands. capture's behavior is now consistent (a non-zero exit via
the CLI's error handler instead of a swallowed exit 0).

Fixes MED-004.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
When the retry loop exhausted its attempts it set `success = true`, logged
"continuing anyway", and screenshotted whatever was on screen — an error
page, a blank page, a half-loaded page — with no signal. The empty
`catch {}` discarded the navigation error, and `capture` printed
"✅ Capture complete" and exited 0 regardless. A silently-captured error
page then becomes a baseline via `control`, defeating the tool's core
promise, with CI staying green.

Per the D-004 decision (permissive + loud):
- The retry loop now records the real error and pushes the slug to a
  `failures` list instead of faking success; best-effort screenshotting is
  kept but the slug is marked degraded.
- The outer catch records which slug failed rather than swallowing it.
- `capture` prints a loud summary of every degraded slug and tells the
  operator to review before running `control`.
- A new `--fail-on-degraded` flag opts into a non-zero exit so CI can gate
  on it; the default still exits 0 to preserve partial/best-effort runs.

The exit-policy decision is factored into a pure, unit-tested
`shouldFailRun` helper; the browser-driven path is covered by manual
verification (capture against an unreachable domain).

Fixes HIGH-002.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
control moved whatever captures existed and printed "Promoted N", never
revealing that the controls for missing slugs were left untouched — so a
partial capture (a failed target, or a --only run) silently produced a
baseline mixing captures from different eras, which compare then trusted.

Per the D-004 decision (permissive + loud, with a manifest):
- control now reports "Promoted N of M" and warns when N < M that the
  untouched controls are now stale relative to the freshly promoted ones,
  and returns { moved, expected }.
- Each promotion is recorded in controls/manifest.json (new src/manifest.mjs)
  with a shared per-run timestamp.
- compare reads the manifest and warns when the baseline mixes controls from
  more than one control run, naming the oldest slug.

Also fixes LOW-015: an orphan HTML snapshot (HTML present, PNG absent from a
failed prior run) is now removed rather than left to mispair with a future
PNG.

Fixes HIGH-003, LOW-015.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
When the control and capture differed in width, compareSlug warned and
returned null, omitting the slug entirely — no row, no diff image, no entry
in the comparison count. A width change is itself a visual regression (a
collapsed layout, an appearing/disappearing scrollbar), so the report showed
"all clear" for exactly the pages that changed most, with only a console
warning that scrolls past.

Generalize padImage to pad onto a common width-and-height canvas (returning
the image untouched when it already fits, which also avoids a needless copy),
and pad both images that way before pixelmatch. A width change now surfaces
as a large diff and the slug appears in the report. The test that asserted a
width mismatch returns null is rewritten to assert the new contract.

Fixes MED-005.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
compare processed every (target × viewport) pair in a single serial loop,
each iteration decoding two PNGs and running pixelmatch synchronously on the
main thread — so the whole run was single-core and the event loop was blocked
start to finish, while capture (next to it) already ran in parallel.

- Run each comparison in a worker_threads pool (src/compare-worker.mjs) that
  reuses workers off a shared queue, so diffs spread across cores, the main
  thread stays responsive, and a slow slug doesn't gate the others
  (HIGH-005).
- Bound peak memory by capping the pool: each in-flight comparison holds
  several large RGBA buffers, so concurrency defaults to CPU-count − 1 and is
  tunable via --compare-concurrency for very tall pages / constrained runners.
  padImage now returns the image untouched when no padding is needed, dropping
  a redundant full-buffer copy (HIGH-006).
- Guard the HTML line-diff: skip snapshots over 2MB (which otherwise grind
  through diffLines for no useful result) and build the diff via array-join
  instead of repeated string `+=` (MED-008).

A full dimension-aware memory budget was left out as a deliberate
simplification; the configurable cap plus reduced allocation is the pragmatic
bound. An end-to-end compare() test exercises the pool.

Fixes HIGH-005, HIGH-006, MED-008.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Capture ran targets in fixed batches joined by `Promise.all`, with two costs:
the stagger delay was `j * staggerDelay` (an absolute delay from batch start),
so the last context in each batch waited `(concurrency-1) * stagger` before
starting — raising concurrency to go faster grew that tail and re-burst the
herd at every batch boundary; and `Promise.all` waited for the whole batch, so
one slow target (a page hitting the timeout × retries) blocked the finished
contexts from picking up new work.

Use a queue-based pool instead: `concurrency` workers each pull the next
target as soon as they finish, so throughput is bounded by total work over
concurrency rather than by the slowest target per batch, and the stagger is
applied once per worker at startup. Failure collection is unchanged.

Fixes MED-007.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Each capture stacked three overlapping waits: goto with waitUntil
'networkidle' (already idle-bounded), a fixed-cadence auto-scroll (100ms per
500px step — ~3s to scroll a 15000px page regardless of load speed), and a
hand-rolled busy-poll waitForNetworkIdle that re-confirmed idle with a 3s cap
shorter than the 15s goto timeout (so lazy assets could be cut off).

Reconcile MED-006 (drop the redundant double wait, speed the scroll) and
LOW-013 (the post-scroll cap is too short and not configurable) into one
change:
- autoScroll now scrolls to the bottom and repeats only until the page height
  stabilizes (50ms between checks, capped at 100 iterations), so short pages
  settle almost instantly and tall ones keep going as content loads.
- Replace the busy-poll with a single event-driven
  page.waitForLoadState('networkidle') after scrolling, bounded by a
  configurable timeout and best-effort (a never-idle page still screenshots).
- goto and settle timeouts are now configurable via a `timeouts` block in
  reglance.json (defaults: goto 15000, settle 8000 — up from the old 3000 cap).

Fixes MED-006, LOW-013.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Capture launched Chromium with --ignore-certificate-errors unconditionally,
so TLS validation was off for every run — including when pointed at a public
HTTPS host, where a swapped or expired cert would be captured silently.
(The ignoreHTTPSErrors flag was also passed to launch(), where it has no
effect; it belongs on the browser context.)

Per the D-002 decision: ignore certificate errors only for local development
hosts (localhost, 127.0.0.1, ::1, and .test/.local/.localhost domains), where
self-signed certs are normal. For any other host, validation stays on unless
the new --insecure flag is passed, and capture warns when it disables checks
for a non-local host. ignoreHTTPSErrors is now correctly applied on
newContext.

Fixes LOW-002.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
LOW-012: capture and compare wrote artifacts with no handling for write
failures, so a full disk or read-only output surfaced as an opaque error
(or, in capture, was swallowed). ensureOutputDir now fails fast with a clear
message when the output directory is not writable, and compare wraps its diff
and html-diff writes to report the path and error code (EACCES/ENOSPC/...)
instead of a bare stack.

INFO-001: a `paths` value that is an absolute URL bypasses the configured
domain and is navigated as-is. Per the D-001 decision (treat config as
potentially untrusted) this stays supported but is no longer silent — capture
now warns which path keys point off the configured domain, so off-domain
(and potentially internal-network) capture is a conscious choice. The hosts
are not blocked, since local development legitimately uses loopback/.test
addresses; the documentation note covers the remaining surface.

Fixes LOW-012, INFO-001.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The postinstall ran `playwright install chromium` directly. Because reglance
is a dev dependency, that hook runs in every consuming install, and a failed
download (air-gapped runner, blocked CDN, proxy) aborted the consumer's entire
`npm ci`. The PLAYWRIGHT_SKIP_BROWSER_DOWNLOAD env var set in CI did nothing,
because the explicit `playwright install` command ignores it — so CI also
re-downloaded Chromium on every leg.

- Move the hook to scripts/postinstall.mjs (added to the published `files`):
  it returns early when PLAYWRIGHT_SKIP_BROWSER_DOWNLOAD is set, otherwise runs
  the install and, on failure, prints an actionable message but always exits 0
  so it can never break a consumer's install.
- capture now catches a missing-browser launch error and tells the user to run
  `npx playwright install chromium`, instead of a raw Playwright stack.
- Pin playwright to an exact version so the installed browser revision is
  deterministic (Dependabot already bumps it).

The CI skip flag in test.yml is now meaningful (the comment is finally true).

Fixes HIGH-001.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Update the README and example config for the changes in this branch:
new --fail-on-degraded / --insecure / --compare-concurrency flags, the
configurable `timeouts` block, the off-domain path warning, and the
degraded-capture and partial-promotion (manifest) safeguards.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
jeremyfelt and others added 4 commits June 3, 2026 13:08
Node 20 is approaching end-of-life and Node 24 is the current line; test
against 22 (active LTS) and 24. engines stays at >=20 — nothing here requires
a newer minimum.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The multi-persona code review writes its artifacts under .reviews/; keep them
out of git like the other generated output directories.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Fix audit findings: 6 high + 5 medium + 4 low/info
@jeremyfelt jeremyfelt merged commit b8e9045 into trunk Jun 3, 2026
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Implement configuration validation and error handling Add comprehensive testing and documentation

1 participant