Add config validation and a test suite by jeremyfelt · Pull Request #39 · happyprime/reglance

jeremyfelt · 2026-06-03T18:43:58Z

Closes #18. Closes #19.

#19 — Configuration validation

Validates config up front in src/config.mjs so problems surface with an actionable message instead of failing deep inside a Playwright capture:

Viewports (validateViewports): each must be an object with a non-empty name and positive-integer width/height. A typo'd viewport previously flowed untouched into page.setViewportSize() and threw an opaque error mid-capture.
Domain (normalizeDomain): rejects empty/non-string domains and validates the result with new URL() so a malformed domain is caught before navigation.

Scoped intentionally: the original issue's JSON-Schema machinery is overkill for this config shape, and network/permission errors are already handled (capture retries and logs per-target; mkdirSync throws clearly). Those are noted on the issue rather than built.

#18 — Test suite (docs were already done)

Adds a node:test suite (zero new deps, matches the all-ESM setup) — npm test runs node --test:

test/config.test.mjs — domain normalization, URL building, viewport validation, and loadConfig (missing file, bad JSON, empty paths, domain override, viewport defaults, pixelmatch merge).
test/control.test.mjs — the control move semantics (captures are moved, not copied), missing-capture skips, and the --only filter.
test/compare.test.mjs — padImage and compareSlug (now exported) against in-memory PNGs: identical → 0%, opposite → 100%, height-mismatch padding, and the missing-control / width-mismatch skips.

Tier 3 (Playwright capture()) is intentionally not automated — it needs a live server + browser and would be flaky. CI runs lint + tests on Node 20 and 22 via .github/workflows/test.yml, with PLAYWRIGHT_SKIP_BROWSER_DOWNLOAD set since the unit tests don't need Chromium.

All 33 tests pass; lint is clean (the only remaining warnings are pre-existing console statements).

🤖 Generated with Claude Code

Addresses #19 by validating viewports (positive-integer dimensions, a required name) and domain format in src/config.mjs, surfacing actionable errors before they fail deep inside a Playwright capture. Addresses #18 by adding a node:test suite covering config normalization and validation, the control move semantics, and the compare diff math (padImage/compareSlug, now exported), plus a Lint-and-test GitHub Actions workflow on Node 20 and 22. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

`--concurrency=0` previously flowed into the capture batching loop as the step size, so `i += 0` never advanced and capture hung forever with no error; negatives ran the loop backward to the same effect, and `parseInt` silently truncated `8x` to 8. Extract flag parsing into `src/args.mjs` as a testable `toInt` that uses `Number` (rejecting trailing garbage) and enforces a minimum — 1 for concurrency, 0 for stagger (which legitimately disables staggering). It throws an actionable error caught by the CLI's top-level handler instead of calling process.exit, which makes it unit-testable. Also clamp the batch step to a positive value as defense in depth. Fixes HIGH-004. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

`capture` already guarded against a `--only` filter that matched nothing, but `control` and `compare` applied the same filter with no guard: a typo'd key (`reglance control blgo`) silently filtered the target list to empty and printed a success-looking summary ("Promoted 0 capture(s)" / "Nothing to compare"). Factor the filter into a shared `filterTargets` helper in config.mjs that throws — listing the known keys to help spot the typo — and use it in all three commands. capture's behavior is now consistent (a non-zero exit via the CLI's error handler instead of a swallowed exit 0). Fixes MED-004. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

When the retry loop exhausted its attempts it set `success = true`, logged "continuing anyway", and screenshotted whatever was on screen — an error page, a blank page, a half-loaded page — with no signal. The empty `catch {}` discarded the navigation error, and `capture` printed "✅ Capture complete" and exited 0 regardless. A silently-captured error page then becomes a baseline via `control`, defeating the tool's core promise, with CI staying green. Per the D-004 decision (permissive + loud): - The retry loop now records the real error and pushes the slug to a `failures` list instead of faking success; best-effort screenshotting is kept but the slug is marked degraded. - The outer catch records which slug failed rather than swallowing it. - `capture` prints a loud summary of every degraded slug and tells the operator to review before running `control`. - A new `--fail-on-degraded` flag opts into a non-zero exit so CI can gate on it; the default still exits 0 to preserve partial/best-effort runs. The exit-policy decision is factored into a pure, unit-tested `shouldFailRun` helper; the browser-driven path is covered by manual verification (capture against an unreachable domain). Fixes HIGH-002. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

control moved whatever captures existed and printed "Promoted N", never revealing that the controls for missing slugs were left untouched — so a partial capture (a failed target, or a --only run) silently produced a baseline mixing captures from different eras, which compare then trusted. Per the D-004 decision (permissive + loud, with a manifest): - control now reports "Promoted N of M" and warns when N < M that the untouched controls are now stale relative to the freshly promoted ones, and returns { moved, expected }. - Each promotion is recorded in controls/manifest.json (new src/manifest.mjs) with a shared per-run timestamp. - compare reads the manifest and warns when the baseline mixes controls from more than one control run, naming the oldest slug. Also fixes LOW-015: an orphan HTML snapshot (HTML present, PNG absent from a failed prior run) is now removed rather than left to mispair with a future PNG. Fixes HIGH-003, LOW-015. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

When the control and capture differed in width, compareSlug warned and returned null, omitting the slug entirely — no row, no diff image, no entry in the comparison count. A width change is itself a visual regression (a collapsed layout, an appearing/disappearing scrollbar), so the report showed "all clear" for exactly the pages that changed most, with only a console warning that scrolls past. Generalize padImage to pad onto a common width-and-height canvas (returning the image untouched when it already fits, which also avoids a needless copy), and pad both images that way before pixelmatch. A width change now surfaces as a large diff and the slug appears in the report. The test that asserted a width mismatch returns null is rewritten to assert the new contract. Fixes MED-005. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

compare processed every (target × viewport) pair in a single serial loop, each iteration decoding two PNGs and running pixelmatch synchronously on the main thread — so the whole run was single-core and the event loop was blocked start to finish, while capture (next to it) already ran in parallel. - Run each comparison in a worker_threads pool (src/compare-worker.mjs) that reuses workers off a shared queue, so diffs spread across cores, the main thread stays responsive, and a slow slug doesn't gate the others (HIGH-005). - Bound peak memory by capping the pool: each in-flight comparison holds several large RGBA buffers, so concurrency defaults to CPU-count − 1 and is tunable via --compare-concurrency for very tall pages / constrained runners. padImage now returns the image untouched when no padding is needed, dropping a redundant full-buffer copy (HIGH-006). - Guard the HTML line-diff: skip snapshots over 2MB (which otherwise grind through diffLines for no useful result) and build the diff via array-join instead of repeated string `+=` (MED-008). A full dimension-aware memory budget was left out as a deliberate simplification; the configurable cap plus reduced allocation is the pragmatic bound. An end-to-end compare() test exercises the pool. Fixes HIGH-005, HIGH-006, MED-008. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Capture ran targets in fixed batches joined by `Promise.all`, with two costs: the stagger delay was `j * staggerDelay` (an absolute delay from batch start), so the last context in each batch waited `(concurrency-1) * stagger` before starting — raising concurrency to go faster grew that tail and re-burst the herd at every batch boundary; and `Promise.all` waited for the whole batch, so one slow target (a page hitting the timeout × retries) blocked the finished contexts from picking up new work. Use a queue-based pool instead: `concurrency` workers each pull the next target as soon as they finish, so throughput is bounded by total work over concurrency rather than by the slowest target per batch, and the stagger is applied once per worker at startup. Failure collection is unchanged. Fixes MED-007. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Each capture stacked three overlapping waits: goto with waitUntil 'networkidle' (already idle-bounded), a fixed-cadence auto-scroll (100ms per 500px step — ~3s to scroll a 15000px page regardless of load speed), and a hand-rolled busy-poll waitForNetworkIdle that re-confirmed idle with a 3s cap shorter than the 15s goto timeout (so lazy assets could be cut off). Reconcile MED-006 (drop the redundant double wait, speed the scroll) and LOW-013 (the post-scroll cap is too short and not configurable) into one change: - autoScroll now scrolls to the bottom and repeats only until the page height stabilizes (50ms between checks, capped at 100 iterations), so short pages settle almost instantly and tall ones keep going as content loads. - Replace the busy-poll with a single event-driven page.waitForLoadState('networkidle') after scrolling, bounded by a configurable timeout and best-effort (a never-idle page still screenshots). - goto and settle timeouts are now configurable via a `timeouts` block in reglance.json (defaults: goto 15000, settle 8000 — up from the old 3000 cap). Fixes MED-006, LOW-013. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Capture launched Chromium with --ignore-certificate-errors unconditionally, so TLS validation was off for every run — including when pointed at a public HTTPS host, where a swapped or expired cert would be captured silently. (The ignoreHTTPSErrors flag was also passed to launch(), where it has no effect; it belongs on the browser context.) Per the D-002 decision: ignore certificate errors only for local development hosts (localhost, 127.0.0.1, ::1, and .test/.local/.localhost domains), where self-signed certs are normal. For any other host, validation stays on unless the new --insecure flag is passed, and capture warns when it disables checks for a non-local host. ignoreHTTPSErrors is now correctly applied on newContext. Fixes LOW-002. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

LOW-012: capture and compare wrote artifacts with no handling for write failures, so a full disk or read-only output surfaced as an opaque error (or, in capture, was swallowed). ensureOutputDir now fails fast with a clear message when the output directory is not writable, and compare wraps its diff and html-diff writes to report the path and error code (EACCES/ENOSPC/...) instead of a bare stack. INFO-001: a `paths` value that is an absolute URL bypasses the configured domain and is navigated as-is. Per the D-001 decision (treat config as potentially untrusted) this stays supported but is no longer silent — capture now warns which path keys point off the configured domain, so off-domain (and potentially internal-network) capture is a conscious choice. The hosts are not blocked, since local development legitimately uses loopback/.test addresses; the documentation note covers the remaining surface. Fixes LOW-012, INFO-001. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The postinstall ran `playwright install chromium` directly. Because reglance is a dev dependency, that hook runs in every consuming install, and a failed download (air-gapped runner, blocked CDN, proxy) aborted the consumer's entire `npm ci`. The PLAYWRIGHT_SKIP_BROWSER_DOWNLOAD env var set in CI did nothing, because the explicit `playwright install` command ignores it — so CI also re-downloaded Chromium on every leg. - Move the hook to scripts/postinstall.mjs (added to the published `files`): it returns early when PLAYWRIGHT_SKIP_BROWSER_DOWNLOAD is set, otherwise runs the install and, on failure, prints an actionable message but always exits 0 so it can never break a consumer's install. - capture now catches a missing-browser launch error and tells the user to run `npx playwright install chromium`, instead of a raw Playwright stack. - Pin playwright to an exact version so the installed browser revision is deterministic (Dependabot already bumps it). The CI skip flag in test.yml is now meaningful (the comment is finally true). Fixes HIGH-001. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Update the README and example config for the changes in this branch: new --fail-on-degraded / --insecure / --compare-concurrency flags, the configurable `timeouts` block, the off-domain path warning, and the degraded-capture and partial-promotion (manifest) safeguards. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Node 20 is approaching end-of-life and Node 24 is the current line; test against 22 (active LTS) and 24. engines stays at >=20 — nothing here requires a newer minimum. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The multi-persona code review writes its artifacts under .reviews/; keep them out of git like the other generated output directories. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Fix audit findings: 6 high + 5 medium + 4 low/info

jeremyfelt and others added 13 commits June 3, 2026 11:43

jeremyfelt mentioned this pull request Jun 3, 2026

Fix audit findings: 6 high + 5 medium + 4 low/info #40

Merged

jeremyfelt and others added 4 commits June 3, 2026 13:08

Test on Node 22 and 24 instead of 20 and 22

671916d

Node 20 is approaching end-of-life and Node 24 is the current line; test against 22 (active LTS) and 24. engines stays at >=20 — nothing here requires a newer minimum. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Merge branch 'task/tests-and-validation' into task/audit-fixes

1f6f1cb

Ignore the .reviews/ code-review output directory

380237e

The multi-persona code review writes its artifacts under .reviews/; keep them out of git like the other generated output directories. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Merge pull request #40 from happyprime/task/audit-fixes

6d38731

Fix audit findings: 6 high + 5 medium + 4 low/info

jeremyfelt merged commit b8e9045 into trunk Jun 3, 2026
2 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add config validation and a test suite#39

Add config validation and a test suite#39
jeremyfelt merged 17 commits into
trunkfrom
task/tests-and-validation

jeremyfelt commented Jun 3, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

jeremyfelt commented Jun 3, 2026

#19 — Configuration validation

#18 — Test suite (docs were already done)

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant