Harden visual-bug verification harness + usage docs (adversarial review + fault-injection) by intendednull · Pull Request #68 · intendednull/buiy

intendednull · 2026-06-16T08:55:53Z

What this is

Post-landing adversarial review + hardening of the buiy_verify visual-bug verification harness, a usage guide for it, and an empirical fault-injection proof that it detects real bugs. Also merges origin/main (text-editing E2–E6 + README) into the branch.

Hardening — 6 real bugs + 1 maintainability trap (all fixed, TDD)

A fresh-agent adversarial review (20-agent find → verify → synthesize) of the just-landed harness found genuine defects, each fixed red-test-first:

| Sev | Bug | Fix |
|---|---|---|
| HIGH | A blank/failed render (`0×0`) silently PASSED any golden: the empty fast path short-circuited before the dimension-mismatch check, on the exact path the golden gate uses | `924ce89` |
| HIGH | `GoldenKey` dropped the `forced_colors` axis: `fc0`/`fc1` cells collapsed onto one baseline, so an FC regression would pass silently once blessed | `880a38a` |
| MED | Per-timestamp snapshots advanced a virtual clock from the wall clock (non-deterministic for animated fixtures) | `8992d1f` |
| MED | Same-`Name` siblings were tie-broken on `Entity::index()` (spawn-order-dependent, flaky snapshots) | `8992d1f` |
| MED | Per-positive ledger budget was inert (gated with the caller's budget, never `positive.budget`) | `a68b655` |
| LOW | A saturated diff made a reftest `Mismatch` pass vacuously | `ebfbd24` |
| maint | A hardcoded `assert_eq!(cells, 24)` broke the "zero edits to add a fixture" guarantee | `85007b6` |
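For the first HIGH row, the fix is an ordering change. Below is a minimal sketch of the corrected control flow, assuming a tightly packed RGBA8 image (`Image`, `Verdict`, and `compare` are illustrative names, not buiy_verify's API). The dimension check runs before the empty fast path, so a `0×0` render compared against a non-empty golden fails instead of short-circuiting to a pass.

```rust
/// Illustrative image type: tightly packed RGBA8, row-major.
pub struct Image {
    pub width: u32,
    pub height: u32,
    pub rgba: Vec<u8>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Verdict {
    /// Dimensions disagree: must fail every budget.
    Saturated,
    /// Same dimensions: number of pixels whose RGBA bytes differ.
    Compared { differing_pixels: u32 },
}

pub fn compare(a: &Image, b: &Image) -> Verdict {
    // 1. Dimension check first, so a blank 0x0 render can never reach
    //    the empty fast path while the golden is non-empty.
    if (a.width, a.height) != (b.width, b.height) {
        return Verdict::Saturated;
    }
    // 2. Empty fast path: now only reachable when BOTH images are empty.
    if a.width == 0 || a.height == 0 {
        return Verdict::Compared { differing_pixels: 0 };
    }
    // 3. Full per-pixel scan.
    let differing = a
        .rgba
        .chunks_exact(4)
        .zip(b.rgba.chunks_exact(4))
        .filter(|(p, q)| p != q)
        .count() as u32;
    Verdict::Compared { differing_pixels: differing }
}
```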

Plus: the overstated docstrings that the review flagged as its recurring theme were reconciled (transform_roundtrips, scene.rs, the matrix_goldens vacuity message, the invariants.md stability clause, the reftests.md RefCase::multi deferral), and the determinism gate's first probe gained headless coverage (`87cd098`).

Fault-injection — does it actually catch bugs?

Injected real one-line bugs into buiy_core production code, confirmed the gate goes RED for each, then reverted them. The runs are documented in docs/reports/2026-06-15-verification-harness-adversarial-review.md:

- Layout (+7px position) → RED via the Tier-1 layout snapshot
- Color/visual (kill the red channel) → RED via the Tier-2 display-list snapshot
- Paint order (reverse the z-sort) → RED via buiy_core's z_index_* tests

Two honest findings recorded as follow-ups: an R↔B swap was invisible on the button's symmetric colors (fixture-coverage note), and the Tier-3 invariant misses a production paint-order-assembly bug because realize re-implements sub-pass 6f instead of calling it (hardening follow-up).

Usage docs for future contributors / LLMs

- .claude/skills/using-buiy-verification/SKILL.md — task-oriented how-to (tier selection, add a fixture, write each tier's test, the bless workflow, gotchas).
- crates/buiy_verify/src/lib.rs — rewrote the stale crate doc into an accurate five-tier map.
- CLAUDE.md — discovery pointer.

Doc accuracy was itself adversarially verified (a fact-check workflow caught a wrong fixture path + 3 overstatements, all corrected).

Merge of `origin/main`

Integrates text-editing E2–E6 + the README refresh. Conflicts (both additive) were resolved in the buiy_core dev-deps and follow-ups.md. origin/main's 3 new #[ignore] GPU re-capture tests call the now-deprecated perceptual_diff, so they get the same #![allow(deprecated)] interim policy the 4 sibling golden suites already use (full migration to metric::compare is tracked in follow-ups.md).

Verification

Green on the merged tree: `cargo fmt --check`, `cargo clippy --workspace --all-targets -- -D warnings`, `RUSTDOCFLAGS="-D warnings" cargo doc`, `cargo test --workspace` (175 result sections, 0 failed). The GPU --ignored lanes passed pre-merge on an RX 6700 XT (the re-pathed forced_colors goldens match on the real adapter); the merge touched no verification GPU tests.

🤖 Generated with Claude Code

…-art, spec, plan Strategy report (5-tier reftests-first pyramid) re-grounded on canonical main; 5 prior-art folders (wpt-reftests, vello, skia-gold, flutter-golden-testing, wgpu-testing); the buiy-verification-design multi-file spec realizing foundation gates #2/#5/#11/#12; and the phased TDD implementation plan. docs/README catalog wired. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Phase 0.1 of the verification pyramid: the advisory MSSIM channel (image-compare) and the tier-1/2 snapshot driver (insta, glob feature) land in buiy_verify with exact patch pins. cargo deny check passes; any new transitive license is added explicitly to deny.toml's allow list. pixelmatch is NOT added here — Phase 1a vendors its algorithm. No code consumes them yet — the metric/snapshot modules land in Phase 1/2. insta pinned to =1.48.0 (latest 1.x patch at impl time, not the plan's =1.43.2 placeholder, per the plan's 'pin the exact latest 1.x' directive). Spec: docs/specs/2026-06-15-buiy-verification-design/metric.md § Crate choice. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Phase 0.2 of the verification pyramid: the #[ignore] GPU re-capture tests in tests/text_*_gpu.rs migrate (Phase 1a) off the deprecated L1 perceptual_diff onto buiy_verify::metric::compare, so buiy_core's tests need to name buiy_verify. Added under [dev-dependencies] only — this forms a DEV-ONLY cycle (core → verify → core) that Cargo permits because dev-dep edges are excluded from the normal build graph. Confined to #[cfg(test)]. Spec: docs/specs/2026-06-15-buiy-verification-design/metric.md § Migration. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Phase 0.3 of the verification pyramid: Dpr is device-pixel-ratio as integer milliscale (1000 = 1×, 2000 = 2×) so it is Eq+Hash+Ord — a fixture axis that keys goldens/coverage cells, never a tolerance. Defined ONCE here; goldens and coverage import it. from_f32/as_f32 round-trip the window's f32 scale_factor at the capture boundary; serde-derived for the bless ledger. Added serde.workspace = true to buiy_core [dependencies]: the plan made this conditional on 'if serde isn't already a direct dep'. Verified it was NOT (buiy_core's src had no serde use and the manifest no serde line), and the derive emits ::serde:: paths that bevy's re-export does not satisfy, so the direct dep is required. Rides the workspace serde pin — no new crate. Spec: docs/specs/2026-06-15-buiy-verification-design/determinism.md § Extending GoldenConfig. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
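A minimal sketch of the milliscale idea. This `Dpr` is a stand-in: the serde derives are omitted so the snippet has no dependencies, and `ONE`/`TWO` are illustrative constants. Because the wrapped value is an integer, the type can derive `Eq`, `Hash`, and `Ord`, and two scale factors that round to the same milli-step collapse onto one key.

```rust
/// Device-pixel ratio as integer milliscale (1000 = 1x, 2000 = 2x).
/// An integer payload gives Eq + Hash + Ord, which an f32 cannot.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub struct Dpr(pub u32);

impl Dpr {
    pub const ONE: Dpr = Dpr(1000);
    pub const TWO: Dpr = Dpr(2000);

    /// Convert the window's f32 scale_factor at the capture boundary.
    /// Rounds to the nearest milli-step; negative or NaN input saturates to 0.
    pub fn from_f32(scale: f32) -> Self {
        Dpr((scale * 1000.0).round() as u32)
    }

    /// Back to the f32 the window API expects.
    pub fn as_f32(self) -> f32 {
        self.0 as f32 / 1000.0
    }
}
```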

Phase 0.4 of the verification pyramid: the shared GPU capture seam moves out of tests/support into render::golden src as capture_to_image(&mut App, &GoldenConfig) -> image::RgbaImage, so buiy_verify's reftest + golden tiers can call it. Sizes the offscreen target to the window's physical pixel grid, paints under CAPTURE_MSAA (single-sampled, dither off), and reads back into an RgbaImage. buiy_core gains image as a direct dep (README § Crate-dependency note: the only new GPU dep). #[ignore] GPU meta-test asserts physical dimensions + non-vacuous paint. readback_rgba_into is promoted to pub alongside capture_to_image; the tests/support readback_rgba now delegates to it so the readback poll + the 256-byte row-padding strip live in exactly one place (anti-drift). The dead CapturedBytes resource + Readback/ReadbackComplete/Mutex imports drop from tests/support as a result. Phase-0 scope is the capture mechanics; the four-condition quiescence flush and the scale_factor==dpr assertion are Phase 3.3's hardening. Spec: docs/specs/2026-06-15-buiy-verification-design/determinism.md § Where the code lives. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
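The 256-byte row-padding strip exists because wgpu requires `bytes_per_row` in a texture-to-buffer copy to be a multiple of `wgpu::COPY_BYTES_PER_ROW_ALIGNMENT` (256). A dependency-free sketch of that step for RGBA8 follows; the function names are hypothetical and this is not readback_rgba_into's signature.

```rust
/// Mirrors `wgpu::COPY_BYTES_PER_ROW_ALIGNMENT` (256) without the dependency.
const ROW_ALIGN: u32 = 256;

/// Row stride the readback buffer must use for an RGBA8 texture of `width`.
pub fn padded_bytes_per_row(width: u32) -> u32 {
    (width * 4).div_ceil(ROW_ALIGN) * ROW_ALIGN
}

/// Copy the meaningful `width * 4` bytes out of each padded row.
pub fn strip_row_padding(padded: &[u8], width: u32, height: u32) -> Vec<u8> {
    if width == 0 || height == 0 {
        return Vec::new(); // also avoids chunks_exact(0), which panics
    }
    let stride = padded_bytes_per_row(width) as usize;
    let tight = (width * 4) as usize;
    let mut out = Vec::with_capacity(tight * height as usize);
    for row in padded.chunks_exact(stride).take(height as usize) {
        out.extend_from_slice(&row[..tight]);
    }
    out
}
```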

Type shapes + empty-case compare stub, wired into lib.rs. Algorithm lands next. Realizes metric.md § Types. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Ports pixelmatch's luminance-weighted YIQ delta (verbatim constants) and adds the raw L∞ max_channel_delta scan. Single-wrong-pixel is now caught at N in {16,256,2048} — the §4 dilution regression. AA exclusion and MSSIM follow. The yiq_luminance_outweighs_chroma fixture is corrected from the plan's [180,120,60]@0.05 (which does not separate luma from chroma — both exceed max_delta=88) to an equal-L∞ pure-luma (+30 all) vs chroma-leaning (+30R/-30B) pair @0.1, where the YIQ weighting (luma 455 vs chroma 244, max_delta=352) is what separates them. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
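The YIQ delta can be sketched as below. The coefficients are pixelmatch's; the function names are illustrative, the sketch handles opaque pixels only (pixelmatch additionally blends translucent pixels with white first), and `max_channel_delta` is a hypothetical shape for the raw L∞ scan. The test reproduces the corrected fixture's numbers: the pure-luma +30 pair scores about 455, the chroma-leaning pair about 244, against a `max_delta` of about 352 at threshold 0.1.

```rust
// pixelmatch's RGB -> YIQ conversion coefficients.
fn rgb2y(r: f64, g: f64, b: f64) -> f64 {
    r * 0.29889531 + g * 0.58662247 + b * 0.11448223
}
fn rgb2i(r: f64, g: f64, b: f64) -> f64 {
    r * 0.59597799 - g * 0.27417610 - b * 0.32180189
}
fn rgb2q(r: f64, g: f64, b: f64) -> f64 {
    r * 0.21147017 - g * 0.52261711 + b * 0.31114694
}

/// Luminance-weighted YIQ delta between two opaque RGB pixels.
pub fn yiq_delta(a: [u8; 3], b: [u8; 3]) -> f64 {
    let [r1, g1, b1] = a.map(f64::from);
    let [r2, g2, b2] = b.map(f64::from);
    let y = rgb2y(r1, g1, b1) - rgb2y(r2, g2, b2);
    let i = rgb2i(r1, g1, b1) - rgb2i(r2, g2, b2);
    let q = rgb2q(r1, g1, b1) - rgb2q(r2, g2, b2);
    0.5053 * y * y + 0.299 * i * i + 0.1957 * q * q
}

/// A pixel counts as different when its delta exceeds this.
pub fn max_delta(threshold: f64) -> f64 {
    35215.0 * threshold * threshold
}

/// Raw L-infinity channel delta (no perceptual weighting).
pub fn max_channel_delta(a: &[u8], b: &[u8]) -> u8 {
    a.iter().zip(b).map(|(x, y)| x.abs_diff(*y)).max().unwrap_or(0)
}
```

Both pairs have the same L∞ delta (30), so only the luma weighting separates them at threshold 0.1; at 0.05 both exceed `max_delta`, which is why the original fixture could not isolate the property.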

A differing pixel that is AA in either image is excluded unless include_aa. EXACT (0,0) now holds across residual AA jitter while still catching an isolated real defect. Vendored verbatim from pixelmatch. The aa_edge_pair fixture is corrected from the plan's hard-2-tone diagonal step (which pixelmatch correctly never classifies as AA — a pure black/white edge has no pixel with both a brighter and darker sibling, so excluded would equal counted=16, not 0) to a genuine antialiased vertical edge (black | gray AA column | white) whose gray column jitters 128->180 between a and b — the canonical sub-LSB re-rasterization the AA exclusion exists to tolerate (excluded=0, counted=16). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Two-axis gate (both bind); within() pins the fuzzy-if floor so an unexpectedly-clean render reds. A dimension mismatch folds into a saturated Diff that fails EVERY budget — the loud-red replacement for the naive silent 1.0. Adds a `saturated: bool` discriminator to Diff so passes() can honor metric.md's "false for every budget, including a maximal (255, u32::MAX)" contract: the pure two-axis formula would otherwise ACCEPT a saturated diff under a maximal budget. The flag also keeps a saturated mismatch categorically distinct from an in-bounds all-different frame (which a wide budget may legitimately accept). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
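A sketch of the two-axis gate with the `saturated` discriminator. The field names (`max_channel_delta`, `differing_pixels`, `max_pixels`), the `dimension_mismatch` constructor, and the `MAXIMAL` constant standing in for the `(255, u32::MAX)` budget are all hypothetical.

```rust
/// Two-axis fuzz budget: both axes must hold for a pass.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct FuzzBudget {
    pub max_channel_delta: u8,
    pub max_pixels: u32,
}

impl FuzzBudget {
    pub const EXACT: FuzzBudget = FuzzBudget { max_channel_delta: 0, max_pixels: 0 };
    pub const MAXIMAL: FuzzBudget = FuzzBudget { max_channel_delta: 255, max_pixels: u32::MAX };
}

#[derive(Clone, Copy, Debug)]
pub struct Diff {
    pub max_channel_delta: u8,
    pub differing_pixels: u32,
    /// Set only for a dimension mismatch.
    pub saturated: bool,
}

impl Diff {
    /// What a dimension mismatch folds into.
    pub fn dimension_mismatch() -> Diff {
        Diff { max_channel_delta: 255, differing_pixels: u32::MAX, saturated: true }
    }

    /// Both axes bind, and a saturated diff fails every budget.
    pub fn passes(&self, budget: &FuzzBudget) -> bool {
        !self.saturated
            && self.max_channel_delta <= budget.max_channel_delta
            && self.differing_pixels <= budget.max_pixels
    }
}
```

Without the `!self.saturated` term, `dimension_mismatch()` would pass `MAXIMAL`, while an in-bounds all-different frame may still legitimately pass a wide budget.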

Diff::mssim is filled from image-compare's rgba_blended_hybrid_compare as an Option (None when disabled/errored — never silently 0.0). Proven non-gating: a 1-LSB wash (0 differing pixels) still passes a budget admitting its 1-LSB L∞ delta despite a sub-1 MSSIM. The mssim_never_gates fixture is corrected from the plan's passes(&EXACT) form (EXACT rejects the 1-LSB wash on the *channel* axis, so it cannot isolate the MSSIM-non-gating property) to a budget that tolerates the L∞ delta and 0 diff pixels, leaving MSSIM as the only possible gate. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

pixelmatch palette: differing pixels red, AA pixels yellow. Off in the hot reftest path; on for tier-5 golden triage HTML. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

metric.md § Verification: identity, scale-invariant single defect, saturated dim-mismatch, and an exact-integer constants pin guarding the vendored YIQ/AA numbers. (insta-snapshot upgrade deferred to Phase 2.) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

… to metric metric.md § Migration step 1: the RMSE metric and DiffResult are gone; tests/visual.rs and smoke.rs move onto metric::compare + Diff::passes (in-memory fixtures replace baseline/tinted PNGs). One metric now. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

metric.md § Migration step 2: buiy_core cannot depend on buiy_verify in its normal graph, so perceptual_diff carries a #[deprecated] gravestone pointing at buiy_verify::metric::compare; its L1 body stays for the unmigrated ignored GPU re-capture tests (Phase 3). Callers gain a file-level allow(deprecated) until they migrate. text_gpu.rs gains a TEMPORARY allow here (removed in 1a.10 when it migrates) so this commit stays clippy -D warnings clean; the plan's split leaves it warning otherwise. The deprecation note avoids literal #[ignore] brackets — rustdoc parses [ignore] as an intra-doc link and fails the -D warnings doc gate. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
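A sketch of the gravestone pattern. The L1 body below is a hypothetical stand-in (mean absolute byte difference), not perceptual_diff's actual code; the point is the attribute pair. The doc comment keeps `#[ignore]` and `[0, 1]` inside backticks so rustdoc does not try to resolve them as intra-doc links.

```rust
/// Legacy L1 image distance, kept only for the unmigrated `#[ignore]`
/// GPU re-capture tests. Returns a value in `[0, 1]`.
#[deprecated(note = "use buiy_verify::metric::compare")]
pub fn perceptual_diff(a: &[u8], b: &[u8]) -> f64 {
    // Hypothetical body: mean absolute byte difference, normalised.
    if a.len() != b.len() {
        return 1.0;
    }
    if a.is_empty() {
        return 0.0;
    }
    let total: u64 = a.iter().zip(b).map(|(x, y)| u64::from(x.abs_diff(*y))).sum();
    total as f64 / (a.len() as f64 * 255.0)
}

// An unmigrated caller opts in explicitly. The PR uses a file-level
// `#![allow(deprecated)]`; an item-level allow silences the same lint here.
#[allow(deprecated)]
pub fn legacy_is_stable(a: &[u8], b: &[u8]) -> bool {
    perceptual_diff(a, b) < 1e-4
}
```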

…pare The #[ignore] GPU re-capture tests reach the unified metric over the dev-only buiy_core -> buiy_verify edge (landed Phase 0.2). Stable re-capture sites -> passes(&EXACT) via assert_stable; the must-differ anti-tests (:152, :271) -> !passes(&EXACT) via assert_differs. The TEMPORARY allow(deprecated) added in 1a.9 is removed (the file no longer names perceptual_diff). Verified on the RX 6700 XT GPU lane: all 6 #[ignore] tests pass, the stable sites bit-exact at EXACT (0,0) — the old < 1e-4 tolerance was not masking drift. The stored-baseline sites in the other text_*_gpu.rs files stay on deprecated perceptual_diff until Phase 3. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The MSSIM/threshold doc comments wrote the range as a bare [0,1], which rustdoc parses as an intra-doc link and fails the RUSTDOCFLAGS="-D warnings" doc gate (unresolved link to `0,1`). Wrapped in backticks so it renders as code, not a link. No behavior change. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

AA-exclusion on, MSSIM advisory, no diff-image alloc in the hot path — the options run_reftest passes to metric::compare (reftests.md § API). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

RefKind{Match,Mismatch} and reftest_kind(&str) — the token parser the reftest! macro calls. reftests.md § Module & public API. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The pairing (name/kind/test/reference/fuzz) and its outcome (passed/diff/report_path). reftests.md § Module & public API. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Match passes within budget, Mismatch passes outside it (the silent-no-op guard). Pure CPU so it gates headless. reftests.md § Verification #1. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
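The kind/outcome rule can be sketched as follows. Assumptions: the tokens are WPT-style `==`/`!=` (the PR's authoring examples use that notation), `reftest_kind` returns an `Option`, and this `Diff` is a minimal stand-in. Treating a saturated diff as failing a `Mismatch` too is one way to express the vacuous-pass fix listed in the hardening table.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum RefKind {
    Match,
    Mismatch,
}

/// Assumed tokens: WPT-style `==` and `!=`.
pub fn reftest_kind(token: &str) -> Option<RefKind> {
    match token.trim() {
        "==" => Some(RefKind::Match),
        "!=" => Some(RefKind::Mismatch),
        _ => None,
    }
}

/// Minimal diff summary for this sketch.
#[derive(Clone, Copy, Debug)]
pub struct Diff {
    pub max_channel_delta: u8,
    pub differing_pixels: u32,
    pub saturated: bool,
}

/// `budget` is (max channel delta, max differing pixels).
fn fits(d: &Diff, budget: (u8, u32)) -> bool {
    !d.saturated && d.max_channel_delta <= budget.0 && d.differing_pixels <= budget.1
}

pub fn reftest_passes(kind: RefKind, diff: &Diff, budget: (u8, u32)) -> bool {
    match kind {
        // `==`: the renders must agree within the budget.
        RefKind::Match => fits(diff, budget),
        // `!=`: the renders must differ beyond the budget. A saturated diff
        // (e.g. a size mismatch) is not evidence of a real difference.
        RefKind::Mismatch => !diff.saturated && !fits(diff, budget),
    }
}
```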

run_reftest captures test+reference in ONE app via capture_to_image (re-target + re-readback) and diffs with metric::compare; the painting-app builder is promoted from tests/support into render::golden::capture_app so buiy_verify builds its app from src (the test-support gpu_render_app* builders now delegate to the single src body — anti-drift). GPU known-good/known-bad pairs prove the harness can both pass and fail (vacuous-green guard). reftests.md §§ API, Verification #3. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

A != that tolerates difference is vacuous — mismatch_floor_ok gates it pure-CPU and run_reftest asserts it as a belt (replacing the 1b.5 inline stub). reftests.md § Verification #2. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

reftest!(kind, fn_ident, test, reference[, fuzz=(d,p)]) emits one #[test] #[ignore] per pairing; a non-(0,0) floor on a mismatch fails to COMPILE via a const assert. reftests.md § 'The reftest! macro'. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
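The compile-time floor check can be illustrated with a stripped-down macro. `reftest_fuzz!` is hypothetical: it keeps only the fuzz clause of `reftest!` and emits a const instead of a `#[test]`, so it runs without a GPU. The `const _: () = assert!(..)` item is evaluated during compilation, so a `!=` pairing with a non-`(0, 0)` floor fails to build.

```rust
/// Hypothetical, cut-down stand-in for the fuzz clause of `reftest!`.
macro_rules! reftest_fuzz {
    // `==` pairings may declare any fuzz budget.
    (==, $name:ident, fuzz = ($d:expr, $p:expr)) => {
        pub const $name: (u8, u32) = ($d, $p);
    };
    // `!=` pairings: a non-(0, 0) floor would let "must differ" tolerate
    // difference, so it is rejected while compiling.
    (!=, $name:ident, fuzz = ($d:expr, $p:expr)) => {
        const _: () = assert!(
            $d == 0 && $p == 0,
            "a != reftest must use the (0, 0) fuzz floor"
        );
        pub const $name: (u8, u32) = ($d, $p);
    };
}

reftest_fuzz!(==, FLEX_SPACE_BETWEEN_FUZZ, fuzz = (2, 16));
reftest_fuzz!(!=, CV_HIDDEN_FUZZ, fuzz = (0, 0));
// reftest_fuzz!(!=, BAD_FUZZ, fuzz = (1, 0)); // would fail to compile
```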

assert_reference_independent builds the reference into a no-GPU App and rejects any forbidden marker (ContentVisibility/ContainerQuery/TopLayer/Translate). Value-encoded features fall to human review (documented). The lint is itself RED/GREEN-tested. reftests.md §§ Reference independence, Verification #4. Two deviations forced by the live API (both keep the lint structural): - TopLayer is a FIELD on the Stacking component, not a component of its own, so the marker queries Stacking and checks top_layer != None — structurally equivalent to the Containment/content_visibility routing. - Style is a Bundle that already supplies Containment + Stacking; the self-test sets content_visibility via Style::containment() (spawning a second Containment alongside is a duplicate-component panic, not a lint trip), and the markers check the FIELD VALUE so a default-Visible Containment on a disjoint reference does not trip the lint. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Promotes the CPU SDF port from scalar probes to a full-tile rasterizer mirroring shader.wgsl:60/:76-:79 (sdf_rounded_rect + fwidth->smoothstep). Pinned to the render_instance.rs point-probes. reftests.md §§ CPU-vs-GPU cross-check, Verification #5. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
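A CPU sketch of the same idea: the rounded-box SDF in its common closed form, plus a `1 - smoothstep(-w, w, d)` coverage ramp. This is not a transcription of shader.wgsl. The exact smoothstep edges are an assumption, and `fwidth` is approximated with one-pixel forward differences rather than the GPU's 2×2-quad derivatives.

```rust
/// Rounded-rect signed distance in pixels (negative inside) for a box
/// centred at the origin with half-extents `half` and uniform radius `r`.
pub fn sdf_rounded_rect(p: (f32, f32), half: (f32, f32), r: f32) -> f32 {
    let qx = p.0.abs() - half.0 + r;
    let qy = p.1.abs() - half.1 + r;
    let outside = qx.max(0.0).hypot(qy.max(0.0));
    outside + qx.max(qy).min(0.0) - r
}

fn smoothstep(e0: f32, e1: f32, x: f32) -> f32 {
    let t = ((x - e0) / (e1 - e0)).clamp(0.0, 1.0);
    t * t * (3.0 - 2.0 * t)
}

/// Coverage at a pixel centre; `w` approximates fwidth(d).
pub fn coverage(p: (f32, f32), half: (f32, f32), r: f32) -> f32 {
    let d = sdf_rounded_rect(p, half, r);
    let dx = sdf_rounded_rect((p.0 + 1.0, p.1), half, r) - d;
    let dy = sdf_rounded_rect((p.0, p.1 + 1.0), half, r) - d;
    let w = (dx.abs() + dy.abs()).max(1e-4);
    1.0 - smoothstep(-w, w, d)
}

/// Rasterise a `width x height` tile with the box centred in it.
pub fn rasterize(width: u32, height: u32, half: (f32, f32), r: f32) -> Vec<f32> {
    let (cx, cy) = (width as f32 / 2.0, height as f32 / 2.0);
    (0..height)
        .flat_map(|y| {
            (0..width).map(move |x| {
                let p = (x as f32 + 0.5 - cx, y as f32 + 0.5 - cy);
                coverage(p, half, r)
            })
        })
        .collect()
}
```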

Renders one rounded-rect on the GPU and via the CPU oracle, then diffs them within a measured AA fuzz budget. Zero stored bytes; kept permanently (one shared analytic SDF). reftests.md § CPU-vs-GPU SDF cross-check. Root-causing a divergence that covered 60% of the frame, and driving it to green, forced two corrections: - The corner radius for a Background fill is carried on Border.radius (Corners::all(Radius::circular(..))) — the component draw_for_node reads (render/mod.rs:373) — NOT a bare Radius component (which the fill path ignores). spawn_single_primitive now uses a zero-width Border. - The CPU oracle must match the full CAPTURE chain, not just the fragment shader: the capture camera clears to OPAQUE BLACK and the pipeline blends linear-space SrcOver into an Rgba8UnormSrgb target. The oracle now composites coverage over opaque black in linear space then sRGB-encodes, so interior + exterior agree and only the ~1px AA rim differs (measured 87/24000 px on RX 6700 XT; budget bounds it at 200). The 1b.10 oracle point-probe test moves to the same capture-matched convention (filled = opaque white, empty = opaque black) — same geometry, composited. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
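The capture-matched oracle step can be sketched per channel: SrcOver of a linear source with alpha = coverage onto opaque black, then the standard sRGB encode and a round to u8. `oracle_channel` is a hypothetical name, and the GPU's own float-to-unorm rounding is not modeled, which is one reason the rim still needs a fuzz budget.

```rust
/// sRGB transfer function (linear -> encoded), IEC 61966-2-1.
fn srgb_encode(linear: f32) -> f32 {
    if linear <= 0.003_130_8 {
        12.92 * linear
    } else {
        1.055 * linear.powf(1.0 / 2.4) - 0.055
    }
}

/// One channel of the capture-matched oracle: a linear source value drawn
/// with alpha = `coverage`, SrcOver onto an opaque-black clear, then
/// sRGB-encoded and quantised to u8.
pub fn oracle_channel(src_linear: f32, coverage: f32) -> u8 {
    let clear_linear = 0.0; // opaque black
    let blended = src_linear * coverage + clear_linear * (1.0 - coverage);
    (srgb_encode(blended.clamp(0.0, 1.0)) * 255.0).round() as u8
}
```

Interior (coverage 1) and exterior (coverage 0) land on 255 and 0 exactly; only partial-coverage rim pixels depend on blending in linear space before encoding.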

flex justify-content: SpaceBetween == three literal-offset boxes (reference routes through the primitive/literal-Node layer, NOT flex — independence by construction); content-visibility: hidden != the visible subtree (the != anti-test). The cv reference's independence is asserted pure-CPU. Both pass on the RX 6700 XT at the default (0,0) fuzz. reftests.md § Authoring patterns. Adaptations to the live API: - content-visibility set via Style::containment() (Style is a Bundle that already supplies Containment — a second one is a duplicate-component panic). - the independence lint builds the reference under ThemePlugin + LayoutPlugin (no GPU) so theme-token-installing scenes build; the lint still reads only component DATA, no render systems run. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ons) Tier 1-2 structured-snapshot module (snapshots.md). Task 2.1 lands the shared dump primitives both tiers consume: - `round(f32) -> String` — round to ROUND_DP=2 decimals, strip trailing zeros + bare trailing dot, normalize -0 to "0". Kills last-ULP churn from the Taffy / clip-space math while staying diff-readable. - `LAYOUT_DUMP_VERSION` / `DISPLAY_LIST_DUMP_VERSION` — format-version headers so a formatter change is one conscious, visible diff line. The `#[track_caller]` insta bridge (`assert_named_snapshot`) writes each `.snap` beside the CALLING test file via `Location::caller()` + `prepend_module_to_snapshot(false)`, so the dump helpers can live in buiy_verify while their `.snap`s live next to the buiy_core tests that call them. `bytemuck.workspace = true` added to buiy_verify (already a workspace dep used by buiy_core for the PackedInstance POD layout; the Tier-2 hex check needs bytes_of / pod_read_unaligned). No new supply-chain crate, no new cargo-deny surface. Deviation: snapshots.md § Verification #2's `round(1.005) == "1.0"` vector is self-inconsistent with `round(50.0) == "50"` (1.005_f32 is 1.00499…, formats to "1.00" — same .00 suffix as 50.0's "50.00", so one trailing-zero rule cannot strip one to "1.0" and the other to "50"). The self-consistent rule strips all trailing zeros; `round(1.005) == "1"` preserves the vector's intent (1.005 rounds DOWN to 1.00, never up). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
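A sketch of the `round` rule as described (two decimals, strip trailing zeros and a bare trailing dot, normalize `-0` to `"0"`). It is a reconstruction from the message, not the crate's code; the test includes the deviation's vectors.

```rust
/// Decimal places kept in snapshot dumps.
const ROUND_DP: usize = 2;

pub fn round(v: f32) -> String {
    // Fixed-precision formatting of the exact f32 value.
    let mut s = format!("{:.*}", ROUND_DP, v);
    // Strip trailing zeros, then a bare trailing dot ("50.00" -> "50").
    if s.contains('.') {
        while s.ends_with('0') {
            s.pop();
        }
        if s.ends_with('.') {
            s.pop();
        }
    }
    // "-0.00" (negative zero or a tiny negative) ends up as "-0": normalise.
    if s == "-0" {
        s = "0".to_string();
    }
    s
}
```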

`layout_dump(world)` emits one `(name, pos, size)` line per ResolvedLayout entity, indented by ChildOf depth, siblings ordered by Name then Entity index — the Name-key is what makes the dump invariant to ECS spawn / archetype order (proved by the entity-order-invariant self-test). Floats via the shared `round`; unnamed entities fall back to `entity#<index>`; version-headered. `assert_layout_snapshot(app, name)` runs one update() then snapshots the dump via the #[track_caller] insta bridge, so the `.snap` lands beside the CALLING test (verified: buiy_core's flex_row_basic.snap landed under crates/buiy_core/tests/snapshots/, not buiy_verify's tree). Self-tests (plain assert_eq!, non-vacuous): entity-order invariance, version-header tripwire, unnamed-fallback. Migration (layout.rs:33): the child-only `(size - 50).abs() < 0.5` pair becomes one `assert_layout_snapshot(&mut app, "flex_row_basic")` over a Name-tagged root + TWO 50x50 children — the snapshot pins every box's position+size (strictly more than the old tolerance assert) and exercises sibling ordering. The two layout_tree_garbage_collects_* tests STAY plain assert_eq! (LayoutTree cardinality, not geometry — a length snapshot is lower-density). Robustness: collect_layout_entries / NameLookup::from_world / extract_nodes_from_world look up Name/ChildOf/Background per-entity via world.get and tolerate try_query returning None for an unregistered component (a fixture that tags none) — fixes a panic on a nameless, childless fixture. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

`instance_hex(p)` hex-dumps `bytemuck::bytes_of(&PackedInstance)` (52 B → 104 hex chars) — a byte-exact, format-version-free snapshot of the GPU upload payload, the complement to the diff-readable Display dump: a packing arithmetic change flips the hex even when the rounded dump rounds it away. `NameLookup` (entity→name, World-built once) keeps the display-list dump World-free. Self-tests (plain assert_eq!, non-vacuous): - hex_round_trips_bytes: hex → parse → pod_read_unaligned reconstructs the exact instance bytes (lossless, matches the GPU payload). - hex_flips_on_a_packing_change: a negated height (the half-size sign bug render_instance.rs regression-tests) flips the hex — proves teeth. Endianness: bytes_of is host-endian; CI + dev are little-endian x86-64 and the hex is a within-repo regression artifact, documented in the fn. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
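A dependency-free sketch of the hex round-trip. It serializes a stand-in of thirteen f32 fields (52 bytes, the size quoted above; the real PackedInstance field layout is not shown here) with explicit `to_le_bytes`, which sidesteps the host-endianness caveat that `bytemuck::bytes_of` carries.

```rust
use std::fmt::Write;

/// Stand-in for the GPU instance: 13 f32 fields = 52 bytes (layout hypothetical).
pub type Instance = [f32; 13];

/// Serialise with an explicit byte order so the dump is host-independent.
pub fn instance_bytes(inst: &Instance) -> Vec<u8> {
    inst.iter().flat_map(|f| f.to_le_bytes()).collect()
}

/// Lowercase hex, two characters per byte.
pub fn to_hex(bytes: &[u8]) -> String {
    let mut s = String::with_capacity(bytes.len() * 2);
    for b in bytes {
        write!(s, "{b:02x}").expect("writing to a String cannot fail");
    }
    s
}

/// Inverse of `to_hex`; `None` on odd length or an unparsable pair.
pub fn from_hex(s: &str) -> Option<Vec<u8>> {
    if s.len() % 2 != 0 {
        return None;
    }
    (0..s.len())
        .step_by(2)
        .map(|i| u8::from_str_radix(s.get(i..i + 2)?, 16).ok())
        .collect()
}
```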

…sensitivity Phase 3.5 of the verification pyramid (determinism.md § Verification #1/#2). All #[ignore], GPU lane — the headless gate stays green without them. #1 idempotent capture (the headline proof): the SAME scene captured TWICE through two fresh DeterministicApps is byte-identical — compare(a, b, default).passes(&FuzzBudget::EXACT) at (0, 0). Covers a rounded-rect fixture AND an Ahem-text fixture (the box-font collapse holds frame-to-frame). Verified on the RX 6700 XT: both pass at (0,0). The brief's second verification: ahem_text_is_font_availability_invariant — the same Ahem text scene captured with vs without an extra host-style family registered is byte-identical, because the fixture names only "Ahem" and that is the sole resolvable family. Proves host-font-independence at the pixel level. #2 knob sensitivity (negatives): knob_sensitivity_dpr (1× vs 2× differ — a different physical grid), knob_sensitivity_font_mode (Real vs Ahem of the same text differ — outlines vs em-boxes). Each flip changes the bytes ⇒ the knobs are load-bearing. FINDING — MSAA is inert for this pipeline, by design. The test that asserted a 4× MSAA capture *differs* from the single-sampled one FAILED with 0 differing pixels: Buiy antialiases the SDF analytically in-shader and paints axis-aligned pixel-covering quads, so a hardware MSAA resolve is identity. That is exactly determinism.md's rationale ("in-shader analytic AA … MSAA buys nothing here"). The test is reframed (msaa_is_inert_for_the_in_shader_aa_pipeline) to assert the verified truth — a 4× capture is byte-identical to CAPTURE_MSAA — which is WHY pinning MSAA off is free. No nondeterminism source; an honest reframe. Spec: docs/specs/2026-06-15-buiy-verification-design/determinism.md. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

… landed Phase 3.10 of the verification pyramid (determinism.md § CI software-rasterizer pin). A CONFIG/DOC deliverable — lavapipe is not installed locally, so this is validated on the real RX 6700 XT here; the lavapipe leg is the CI stored-baseline gate. - .github/actions/install-mesa/action.yml: a composite action that consumes gfx-rs/ci-build's prebuilt, VERSION-PINNED lavapipe tarball (no self-build; MESA_VERSION + ci-build tag pinned), writes its OWN ICD JSON (the upstream path is build-host-absolute), and exports the adapter-selection env contract: VK_DRIVER_FILES (the modern variable, NOT the deprecated VK_ICD_FILENAMES — deviation #2) + WGPU_ADAPTER_NAME=llvmpipe. LP_NUM_THREADS is deliberately NOT set (deviation #1 — determinism comes from the pinned Mesa version, not thread count). - .github/workflows/ci.yml: a new `gpu` job invoking the action, a one-line llvmpipe-adapter smoke guard (determinism.md § Verification #5 — the pin is active, not silently falling back to hardware), then the #[ignore] GPU lane serialized at --test-threads=1. Additive: the headless `test` job stays green with no adapter. Also records a "Landed" section in determinism.md (tasks 3.1-3.5, 3.10) and corrects Verification #2's MSAA claim to the VERIFIED finding: 4× MSAA is byte-identical to CAPTURE_MSAA for Buiy's in-shader analytic-AA pipeline, which confirms (not contradicts) the MSAA-pin rationale. Tier-5 golden corpus (3.6-3.9) remains future work; status stays draft until the 4.7 flip. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Phase 3.6 (verification-design goldens.md). Adds the `buiy_verify::golden` module: the `GoldenKey` trace identity (widget × state × theme × viewport × backend × dpr) with a deterministic lower-kebab slug + `from_slug` inverse, the `Backend` enum, and the human-diffable `BlessLedger`/`Positive` TOML accept record. The key schema is fixed before any golden is generated — a Skia-Gold lesson, since adding a field later re-baselines the whole corpus. The module scaffolds all three submodules (`check`, `ledger`, `report`) so the `pub use` re-exports resolve and the crate compiles; 3.7/3.8/3.9 land the per-area test coverage and the GPU round-trip over this same code. `FuzzBudget` gains serde derives so `Positive.budget` persists a per-fixture widened budget directly. New workspace deps `toml = "0.8"` (ledger) and `base64 = "0.22"` (HTML report PNG inlining) — both MIT/Apache-2.0, cleared by `cargo deny check` before the add. RED→GREEN: `golden_keys.rs` proptest pins `slug()`→`from_slug` round-trip and no-collision over canonical keys (goldens.md § Verification #6), plus deterministic/lower-kebab/dir/ledger-TOML unit tests. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
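A sketch of a slug/`from_slug` pair that includes the `forced_colors` axis (the axis whose omission is the HIGH bug fixed in `880a38a`). The field set follows the key description above plus that axis; the `.` separator, the `fc0`/`fc1` and `dpr<milli>` tokens, and the string-typed fields are assumptions for illustration.

```rust
/// Golden trace identity (string-typed axes for brevity).
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct GoldenKey {
    pub widget: String,
    pub state: String,
    pub theme: String,
    pub viewport: String,
    pub backend: String,
    pub forced_colors: bool,
    pub dpr_milli: u32,
}

impl GoldenKey {
    /// Every axis appears in the slug, so fc0 and fc1 never share a baseline.
    pub fn slug(&self) -> String {
        format!(
            "{}.{}.{}.{}.{}.fc{}.dpr{}",
            self.widget,
            self.state,
            self.theme,
            self.viewport,
            self.backend,
            u8::from(self.forced_colors),
            self.dpr_milli
        )
    }

    /// Inverse of `slug`: `from_slug(&k.slug()) == Some(k)` for any key
    /// whose string axes contain no `.`.
    pub fn from_slug(slug: &str) -> Option<Self> {
        let parts: Vec<&str> = slug.split('.').collect();
        let [widget, state, theme, viewport, backend, fc, dpr] = parts.as_slice() else {
            return None;
        };
        let forced_colors = match *fc {
            "fc0" => false,
            "fc1" => true,
            _ => return None,
        };
        let dpr_milli = dpr.strip_prefix("dpr")?.parse().ok()?;
        Some(GoldenKey {
            widget: widget.to_string(),
            state: state.to_string(),
            theme: theme.to_string(),
            viewport: viewport.to_string(),
            backend: backend.to_string(),
            forced_colors,
            dpr_milli,
        })
    }
}
```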

Phase 3.7 (verification-design goldens.md § Verification #1–#4). Lands the test coverage + the env-decoupling refactor over the `golden::check` code from 3.6. `check_golden` compares `actual` against the stored multi-positive baseline set and passes if ANY positive clears the budget (Skia-Gold "many positives per config"); on a miss it carries the closest (smallest-Diff) candidate. `assert_golden` is the fail-closed panicking wrapper — empty/non-matching corpus panics with the bless instruction (the BUIY_ACCEPT_SHAPING shape); under BUIY_BLESS=1 it blesses instead, writing the PNG + recording commit/timestamp/budget/reason in the human-diffable ledger (never a silent overwrite). Refactor: the bless decision is resolved into an explicit `BlessMode` at the single public env-read site, so `check_golden_in`/`assert_golden_in` drive bless/assert against a temp corpus with no process-global `BUIY_BLESS` race — the seam the harness self-tests and the Phase-4 coverage matrix driver consume. RED→GREEN (golden_persistence.rs, pure-CPU, synthetic images): match/mismatch, multi-positive any-matches (second positive ⇒ matched_positive: 1), bless round-trip (re-check passes + ledger provenance), bless-replace-in-place, fail-closed panic on empty corpus, and the structured missing⇒Fail{best:None}. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
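A sketch of the any-positive rule, reduced to plain pixel buffers. Names are hypothetical, and the "closest" ordering (fewest differing pixels, then smallest channel delta) is an assumption, since the message only says "smallest-Diff". Each positive is gated by its own budget, which is the behavior the `a68b655` fix restored.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Budget {
    pub max_channel_delta: u8,
    pub max_pixels: u32,
}

/// Reduced stored positive: pixels plus its own (possibly widened) budget.
pub struct Positive {
    pub pixels: Vec<[u8; 4]>,
    pub budget: Budget,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Outcome {
    Pass { matched_positive: usize },
    /// `best` is the index of the closest positive, `None` if there was none.
    Fail { best: Option<usize> },
}

/// (max channel delta, differing pixels); `None` when sizes differ.
fn diff(a: &[[u8; 4]], b: &[[u8; 4]]) -> Option<(u8, u32)> {
    if a.len() != b.len() {
        return None;
    }
    let mut max_delta = 0u8;
    let mut differing = 0u32;
    for (p, q) in a.iter().zip(b) {
        let d = p.iter().zip(q).map(|(x, y)| x.abs_diff(*y)).max().unwrap_or(0);
        if d > 0 {
            differing += 1;
            max_delta = max_delta.max(d);
        }
    }
    Some((max_delta, differing))
}

pub fn check_golden(actual: &[[u8; 4]], positives: &[Positive]) -> Outcome {
    let mut best: Option<(usize, (u8, u32))> = None;
    for (i, pos) in positives.iter().enumerate() {
        let Some((delta, pixels)) = diff(actual, &pos.pixels) else {
            continue; // size mismatch: never a match, never "closest"
        };
        // Gate with THIS positive's budget, not a caller-wide one.
        if delta <= pos.budget.max_channel_delta && pixels <= pos.budget.max_pixels {
            return Outcome::Pass { matched_positive: i };
        }
        let closer = match best {
            None => true,
            Some((_, (bd, bp))) => (pixels, delta) < (bp, bd),
        };
        if closer {
            best = Some((i, (delta, pixels)));
        }
    }
    Outcome::Fail { best: best.map(|(i, _)| i) }
}
```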

Phase 3.8 (verification-design goldens.md § Verification #5). Lands the test coverage for the `golden::report` TriageReport/TriageCard built in 3.6. The report base64-inlines the actual / closest-baseline / diff-heatmap PNGs into one HTML file with three views per card — side-by-side, a pure-JS opacity-slider overlay, and the diff heatmap — so it opens straight from a CI artifact with no network and no external asset (offline-first, no SaaS). RED→GREEN (golden_report.rs, pure-CPU): assert every `src=` is a data URI, no http(s)/relative/`<script src>` reference, the three views + slug label + JS slider are present, write() emits the same self-contained file to disk, and multiple cards accumulate with unique overlay ids. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Phase 3.9 (verification-design goldens.md § Verification #7). Lands the GPU `#[ignore]` golden lane (tests/goldens.rs) over the persistence machinery from 3.6–3.8, plus the first blessed in-git corpus. `golden_round_trip_on_real_adapter` is the self-verifying machinery proof (needs no committed PNG): on the RX 6700 XT it captures a deterministic rounded-rect, blesses it to a temp corpus, re-captures + asserts it passes at FuzzBudget::EXACT (the determinism pin makes re-capture bit-identical), then asserts a deliberately-tampered image FAILS and emits a diff-PNG heatmap + a self-contained HTML triage report carrying the expected sections (slug, base64-inlined PNGs, diff-heatmap view, overlay slider, no external URL). The full bless→pass→fail→report cycle, verified end to end on real hardware. The committed residue goldens assert against the in-git corpus: - `golden_ahem_layout_class` double-asserts the box-font collapse — two fresh Ahem captures are byte-identical AND equal to the stored positive. - `golden_sdf_corner` pins the irreducible SDF corner AA rim. Blessed corpus (reviewed: each PNG decoded + eyeballed as the intended scene): - rect-rounded: a blue rounded fill on black, 5 distinct colors incl. AA rim pixels (genuine SDF corner residue, not a hard rectangle). - text-ahem: two solid orange em-boxes ("Hi" under the Ahem box-font). Each ledger records the bless commit + RFC3339 timestamp + (0,0) budget + reason; PNGs total 44K (well under the 50MB object-store migration trigger). `.gitattributes` pins `crates/buiy_verify/tests/goldens/**/*.png binary` so the nested per-key corpus is never eol-converted (mirrors the *.snap pin; the `**` glob crosses the per-key dirs). Deferred (harness-ready, renderer-blocked): the drop-shadow-kernel golden (no BoxShadow extract/draw path yet) and the color-emoji fidelity golden (pinned bundled emoji font). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ll (Phase 4.1-4.3, 4.5) Coverage-by-construction substrate (coverage.md): a Fixture corpus crossed with a global Matrix Cartesian product, so adding one fixture auto-enrolls it across every tier with no test-file edit. - coverage/fixture.rs: Fixture{name,state,spawn}, the `fixture!` macro emitting an inventory::submit!, catalog()/sorted_catalog() over the inventory registry. - coverage/matrix.rs: Matrix{themes,viewports,forced_colors,dprs}, ThemeAxis, Viewport, Cell, ci_default (2×3×2×2 = 24 cells/fixture), cells() Cartesian product in stable axis-declaration order, CELL_CEILING_PER_FIXTURE budget. - coverage/key.rs: CoverageKey (Cell × Fixture) deriving Eq+Hash because dpr is the canonical milliscale Dpr, not f32 — so keys (not just stems) collect into a HashSet. stem()/from_stem() round-trip losslessly. - coverage/enroll.rs: build_app (CPU-only deterministic app: cell theme installed, synthetic PrimaryWindow sized to viewport×dpr, forced_colors on UserPreferences, fixture spawned) + enroll_all over catalog×cells. - Added Backend::Cpu to the golden Backend enum so CPU (Tiers 1-3) and GPU golden cells key off one enum (coverage.md §146). - fixtures/button/resting.rs: the live Button::new bundle as the catalog row, with forced-colors-safe system-color paint inserted (the default Button uses a brand token, NOT yet forced-colors-safe — a buiy-widget-catalog-design concern). - New dep: inventory 0.3 (MIT/Apache-2.0, already in lockfile; deny-clean) and a path edge on buiy_widgets (acyclic). Self-tests (coverage_meta.rs) green: verify_catalog_matches_glob, verify_keys_unique, verify_cell_count_under_ceiling, enrollment_fan_out, build_app_pins_viewport_theme_and_dpr. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
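The Cartesian product can be sketched as nested loops in axis-declaration order, with the innermost axis varying fastest. The axis values below are hypothetical; only the 2×3×2×2 shape comes from the message. Deriving the expected count from the axis lengths, rather than hardcoding 24, keeps "zero edits to add a fixture" intact.

```rust
/// One matrix cell (axis values are illustrative).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Cell {
    pub theme: &'static str,
    pub viewport: (u32, u32),
    pub forced_colors: bool,
    pub dpr_milli: u32,
}

pub struct Matrix {
    pub themes: Vec<&'static str>,
    pub viewports: Vec<(u32, u32)>,
    pub forced_colors: Vec<bool>,
    pub dprs: Vec<u32>,
}

impl Matrix {
    /// Hypothetical axis values with the 2 x 3 x 2 x 2 shape of ci_default.
    pub fn ci_like() -> Self {
        Matrix {
            themes: vec!["light", "dark"],
            viewports: vec![(320, 568), (768, 1024), (1440, 900)],
            forced_colors: vec![false, true],
            dprs: vec![1000, 2000],
        }
    }

    /// Expected cell count, derived from the axes rather than hardcoded.
    pub fn cell_count(&self) -> usize {
        self.themes.len() * self.viewports.len() * self.forced_colors.len() * self.dprs.len()
    }

    /// Cartesian product in axis-declaration order; the last axis varies fastest.
    pub fn cells(&self) -> Vec<Cell> {
        let mut out = Vec::with_capacity(self.cell_count());
        for &theme in &self.themes {
            for &viewport in &self.viewports {
                for &forced_colors in &self.forced_colors {
                    for &dpr_milli in &self.dprs {
                        out.push(Cell { theme, viewport, forced_colors, dpr_milli });
                    }
                }
            }
        }
        out
    }
}
```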

… (Phase 4.4, 4.6)

Compose the matrix across all tiers + close gate #11's live-catalog half.

Enrollment drivers (Task 4.4), each a thin enroll_all caller with no per-widget test code; adding a fixture enrolls it into every tier with zero edits:
- coverage_layout.rs (Tier 1, gate #5): assert_layout_snapshot per cell, plus a baseline-free structural guard (version header + names the widget root).
- coverage_display_list.rs (Tier 2): display-list dump per cell at t=0.
- coverage_invariants.rs (Tier 3, gate #12): finite + non-negative-extent predicates on the realized live scene per cell.
- coverage_golden.rs (Tier 5, #[ignore] GPU): captures each cell on the real adapter, keyed by a GoldenKey derived from the CoverageKey. No PNGs committed (blessed on a GPU host).
- 48 CPU-deterministic .snap baselines committed (24 layout + 24 display-list).

Forced-colors live wiring (Task 4.6), gate #11 over the LIVE catalog:
- coverage/forced_colors.rs: live_catalog_paint()/paint_for_fixtures() derive CatalogPaint from the spawned Background/Border/Outline off each fixture's Name-tagged root; the analyzers run UNCHANGED, only the input source moves from hand-built descriptors to the live tree (closes follow-ups.md:462-473).
- coverage_forced_colors.rs: live_catalog_has_no_forced_colors_violations (the production scan), broken_fixture_produces_violation (teeth: a brand-token fixture MUST violate, proving the producer reads real paint), safe_fixture_produces_no_violation (non-vacuous companion), and an #[ignore]'d boxshadow_visual_reftest_is_blocked placeholder documenting the BLOCKED BoxShadow draw-skip dependency (follow-ups.md:474-478, NOT faked green).

Auto-enroll-by-construction proof (Task 4.5 extension): the enroll_fixtures seam + adding_one_fixture_grows_corpus_by_axes asserts that adding one fixture grows the corpus by exactly |axes| (24) cells.

Boundary documented: Buiy's wholesale forced-colors theme swap means no token resolves in both light and forced themes, so the system-color-safe button renders the magenta sentinel under the light theme, recorded faithfully in the `*.light.*` display-list baselines (the forced-colors-safe default widget is a buiy-widget-catalog-design concern).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…rustdoc

`RUSTDOCFLAGS="-D warnings" cargo doc` flagged a redundant-explicit-link in the golden Backend::Cpu doc and required explicit cross-crate / cross-module paths for the coverage module's intra-doc references (the `fixture!` macro, the buiy_core forced-colors analyzers, and the Matrix/CoverageKey/Fixture/ThemeAxis links from modules that do not `use` those types). Doc-only; no behavior change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ctive

Closes the verification campaign's docs flip. Reconciles every buiy-verification-design child file against the actually-landed code and flips the spec draft -> active/landed:
- metric.md: record that pixelmatch is VENDORED (the crate is unusable: PNG-stream API, private primitives, image-0.24-bound), not a dep; the Diff.saturated field; compare is infallible; corrected crate-choice table.
- snapshots.md: display_list_dump renders color as resolved #rrggbbaa hex (ExtractedNode.color is post-theme; no token rendering), NameLookup::from_pairs, the trailing-zero-stripping round() rule, assert_display_list_snapshot_at; top-layer asserts live in render_paint_order.rs.
- invariants.md: top_layer_paint_rank promotion (systems.rs:3816), cosmic_text::Cursor (not a Buiy struct), module = invariant.rs + invariant/.
- reftests.md: two-captures-in-one-App without a capture_scene shape; TopLayer-via-Stacking marker; Style-as-Bundle; the SDF cross-check root-cause (Border.radius is the consumed radius; linear-blend-over-opaque-black + sRGB-encode capture chain); the value-encoded independence caveat.
- determinism.md: status landed; PendingCaptureAssets, VK_DRIVER_FILES, MSAA-inert all as-landed; Ahem real font shipped.
- goldens.md: Backend::Cpu added; `__`-separated slugs; BlessMode + `*_in` hermetic variants; corpus started (rect-rounded, text-ahem); honest GPU-lane state.
- coverage.md: Matrix/enroll_all/CoverageKey final shapes + the live forced-colors wiring (with broken-fixture teeth) + the BLOCKED BoxShadow visual reftest + the wholesale-swap magenta-sentinel deviation.

Also: README spec entry + reading-order draft->active/landed; foundation verification.md gates #2/#5/#11/#12 get realization notes (definitions unchanged); plan status -> landed with a per-phase table; follow-ups.md marks the stored-PNG golden machinery / metric / determinism / layout snapshots / proptest invariants / forced-colors live wiring as DONE and records the DEFERRED set (shadow-kernel/color-emoji goldens, BoxShadow forced-colors visual reftest, multi-reference aggregation, golden-prune bin, object-store migration, and the matrix_goldens bless-on-demand GPU-lane gap).

Docs-only; no source touched. Headless gate (fmt/clippy/doc/test) + cargo deny green; GPU lane green except the pre-existing coverage_golden::matrix_goldens fail-closed (button corpus un-blessed since a73de05), documented as deferred.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

coverage_golden::matrix_goldens asserted a golden for every cell of ci_default over every catalog fixture, but Tier-5 goldens are the minimal rasterization residue (goldens.md § Storage) and only the rect-rounded/text-ahem classes are blessed, so it fail-closed on the un-blessed button cells, reddening the GPU lane (CLAUDE.md: the GPU lane must pass on a GPU host). Add golden::committed_positives(key); make matrix_goldens bless-on-demand (no committed baseline ⇒ pending/skipped; a blessed cell still must match on fresh capture via assert_golden). BUIY_BLESS=1 still spans the full matrix.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
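The bless-on-demand rule can be sketched as a per-cell decision. This is a simplified model, not the crate's code: `CellOutcome`, `run_cell`, and the use of a content hash as a stand-in for the fuzzy `assert_golden` comparison are all hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical per-cell result of a bless-on-demand golden sweep.
#[derive(Debug, PartialEq, Eq)]
enum CellOutcome {
    /// BUIY_BLESS=1: the fresh capture is recorded as a new positive.
    Blessed,
    /// No committed baseline: skipped and reported, not failed.
    Pending,
    /// A committed positive exists and the fresh capture matches it.
    Passed,
    /// A committed positive exists and the fresh capture does not match.
    Failed,
}

/// `committed` maps a golden slug to its committed positives. Here each positive
/// is a content hash, standing in for an image compared under a fuzz budget.
fn run_cell(
    committed: &HashMap<String, Vec<u64>>,
    slug: &str,
    capture: u64,
    bless: bool,
) -> CellOutcome {
    if bless {
        // Blessing spans the full matrix, blessed or not.
        return CellOutcome::Blessed;
    }
    match committed.get(slug) {
        None => CellOutcome::Pending,
        Some(positives) if positives.is_empty() => CellOutcome::Pending,
        // A blessed cell must still match on fresh capture.
        Some(positives) if positives.contains(&capture) => CellOutcome::Passed,
        Some(_) => CellOutcome::Failed,
    }
}
```

The key property is that a missing baseline never reads as a pass or a failure; only cells that have a committed positive can turn the lane red.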

…ent-pass (HIGH)

The empty-image fast path short-circuited BEFORE the dimension-mismatch check, so compare(0×0, real) returned a non-saturated zero-Diff and passed every budget. The golden gate (golden/check.rs) feeds the live capture as arg `a`, so a render that emitted a 0×0 image was silently accepted against any stored golden, defeating the saturated sentinel on the exact visual-regression path. Reorder: dimension-mismatch check first, then the (now equal-dim ⇒ both-empty) fast-path. Adds an asymmetric both-orders regression test; the prior empty/mismatch tests only covered equal-dim and empty-vs-empty. Found by the fresh-agent quality review of the verification campaign.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
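The fix is purely an ordering change. A minimal sketch, assuming simplified `Image`, `Diff`, and `FuzzBudget` shapes: only the `saturated` flag and a (max channel delta, differing pixels) style budget are suggested by the commit text, and the exact per-pixel loop stands in for the vendored pixelmatch algorithm.

```rust
/// Hypothetical RGBA8 image view.
#[derive(Clone, Copy, Debug)]
struct Image<'a> {
    width: u32,
    height: u32,
    rgba: &'a [u8],
}

/// Hypothetical budget: a (max channel delta, max differing pixels) pair.
#[derive(Clone, Copy, Debug)]
struct FuzzBudget {
    max_channel_delta: u8,
    max_differing_pixels: u64,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Diff {
    max_channel_delta: u8,
    differing_pixels: u64,
    /// Structural failure (e.g. dimension mismatch): never passes any budget.
    saturated: bool,
}

impl Diff {
    fn passes(&self, budget: FuzzBudget) -> bool {
        !self.saturated
            && self.max_channel_delta <= budget.max_channel_delta
            && self.differing_pixels <= budget.max_differing_pixels
    }
}

fn compare(a: Image, b: Image) -> Diff {
    // 1. Dimension mismatch FIRST, so a 0×0 capture against a real golden saturates.
    if (a.width, a.height) != (b.width, b.height) {
        return Diff { max_channel_delta: u8::MAX, differing_pixels: 0, saturated: true };
    }
    // 2. Equal dimensions with one empty side means BOTH are empty: a true zero diff.
    if a.width == 0 || a.height == 0 {
        return Diff { max_channel_delta: 0, differing_pixels: 0, saturated: false };
    }
    // 3. Exact per-pixel comparison (stand-in for the perceptual algorithm).
    let mut max_delta = 0u8;
    let mut differing = 0u64;
    for (p, q) in a.rgba.chunks_exact(4).zip(b.rgba.chunks_exact(4)) {
        let d = p.iter().zip(q).map(|(x, y)| x.abs_diff(*y)).max().unwrap_or(0);
        if d > 0 {
            differing += 1;
            max_delta = max_delta.max(d);
        }
    }
    Diff { max_channel_delta: max_delta, differing_pixels: differing, saturated: false }
}
```

Swapping steps 1 and 2 reproduces the original bug: `compare(empty, real)` would hit the empty fast path and return a passing zero diff.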

… (HIGH)

Matrix::ci_default crosses forced_colors (24 cells) and CoverageKey encodes fc0/fc1, but GoldenKey had NO forced_colors field, so the GPU-tier golden_key() mapping collapsed fc=false and fc=true onto one baseline. The two cells produce DIFFERENT captures (the BoxShadow draw-skip reads UserPreferences::forced_colors), so once blessed a forced-colors visual regression would silently pass against the other mode's baseline: the exact hole gate #11 exists to close.
- GoldenKey gains `forced_colors: bool`; slug/from_slug carry an fc0/fc1 token (schema is now `widget/state/theme__viewport__fc__backend__dpr`).
- golden_key() threads cov.forced_colors through; new headless regression test golden_key_is_injective_over_the_matrix asserts no two cells share a slug.
- Re-path the 2 committed residue goldens (rect-rounded, text-ahem) to the fc0 slug; PNG bytes unchanged (captured at default fc=false), ledgers gain forced_colors = false.
- Reconcile goldens.md (struct + slug schema + Backend::Cpu drift).

No button golden is committed yet, so fixing the key schema now costs zero re-baseline. Found by the fresh-agent quality review.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
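A sketch of why carrying the field through the slug restores injectivity. It assumes plain string fields and hypothetical token spellings for the viewport, backend, and DPR segments; only the segment order and the `fc0`/`fc1` token come from the commit text.

```rust
/// Hypothetical simplified GoldenKey; the real field types differ.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct GoldenKey {
    widget: String,
    state: String,
    theme: String,
    viewport: String,
    forced_colors: bool,
    backend: String,
    dpr_milli: u32,
}

impl GoldenKey {
    /// widget/state/theme__viewport__fc__backend__dpr
    fn slug(&self) -> String {
        format!(
            "{}/{}/{}__{}__fc{}__{}__{}",
            self.widget,
            self.state,
            self.theme,
            self.viewport,
            u8::from(self.forced_colors),
            self.backend,
            self.dpr_milli
        )
    }

    /// Inverse of `slug`; returns None on any malformed segment.
    fn from_slug(slug: &str) -> Option<Self> {
        let mut path = slug.splitn(3, '/');
        let widget = path.next()?.to_string();
        let state = path.next()?.to_string();
        let fields: Vec<&str> = path.next()?.split("__").collect();
        let [theme, viewport, fc, backend, dpr] = fields.as_slice() else {
            return None;
        };
        let forced_colors = match *fc {
            "fc0" => false,
            "fc1" => true,
            _ => return None,
        };
        Some(GoldenKey {
            widget,
            state,
            theme: theme.to_string(),
            viewport: viewport.to_string(),
            forced_colors,
            backend: backend.to_string(),
            dpr_milli: dpr.parse().ok()?,
        })
    }
}
```

Without the `fc` segment, two keys differing only in `forced_colors` would format to the same string, which is exactly the collision golden_key_is_injective_over_the_matrix guards against.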

Two determinism holes the fresh-agent review reproduced, both biting exactly the patterns a scaling app introduces:

1. Per-timestamp display-list snapshots ran on a WALL-CLOCK virtual clock. build_app adds TimePlugin but never pinned TimeUpdateStrategy, so each app.update() advanced `Time<Virtual>` by the wall-clock delta: the captured frame's logical time was t + accumulated-wall-clock, non-reproducible (and advance_virtual_to's checked_sub silently underflowed to ZERO once drift exceeded a step). assert_display_list_snapshot_at now pins ManualDuration(ZERO) so advance_virtual_to is the SOLE clock driver. Regression: wall_clock_does_not_leak_into_the_per_timestamp_clock (phase (a) proves the leak is real, so the test isn't a tautology).
2. Same-Name sibling sort tiebroke on Entity::index() (spawn-order dependent), so list rows all named Name::new("row") dumped in spawn order: a flaky snapshot, the worst failure mode for a verification harness. Both the Tier-1 layout sort and the Tier-2 display-list extract now tiebreak by CONTENT (position then size via f32::total_cmp); genuinely-indistinguishable siblings (same name+box) fail loudly rather than emit a flaky dump. Regression: dump_is_invariant_for_same_name_siblings (the existing determinism test used UNIQUE names, so it never hit this).

Found by the fresh-agent quality review.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
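The content tiebreak in item 2 can be sketched with `f32::total_cmp`, which gives a total order over floats (NaN included), so the sort never falls back to spawn order. A minimal sketch, assuming a hypothetical flattened `Node` record; the real Tier-1 and Tier-2 sorts work on Bevy entities and their computed layout, not on this struct.

```rust
use std::cmp::Ordering;

/// Hypothetical flattened node as seen by a snapshot dumper.
#[derive(Clone, Debug, PartialEq)]
struct Node {
    name: String,
    x: f32,
    y: f32,
    width: f32,
    height: f32,
}

/// Name first, then position, then size; never spawn order.
fn content_order(a: &Node, b: &Node) -> Ordering {
    a.name
        .cmp(&b.name)
        .then_with(|| a.x.total_cmp(&b.x))
        .then_with(|| a.y.total_cmp(&b.y))
        .then_with(|| a.width.total_cmp(&b.width))
        .then_with(|| a.height.total_cmp(&b.height))
}

/// Sorts siblings deterministically; errors if two are indistinguishable,
/// since no content-based order could make their dump stable.
fn sort_for_dump(nodes: &mut [Node]) -> Result<(), String> {
    nodes.sort_by(content_order);
    for pair in nodes.windows(2) {
        if content_order(&pair[0], &pair[1]) == Ordering::Equal {
            return Err(format!("indistinguishable siblings named {:?}", pair[0].name));
        }
    }
    Ok(())
}
```

Two different spawn orders of the same `"row"` siblings now produce the same sorted sequence, which is the property dump_is_invariant_for_same_name_siblings checks.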

…DIUM)

Positive.budget is written by bless(), persisted to TOML, round-trip-tested and documented (ledger.rs: "the budget this positive is asserted against") as the per-fixture widened budget a baseline is matched under, but check_golden gated with the caller's FuzzBudget parameter and never read positive.budget. The documented per-fixture widened-budget workflow was therefore inert: an SDF/shadow positive blessed with a widened tolerance would be re-checked at the caller's (often EXACT) budget and spuriously fail.
- check_golden_in now gates positive i against ledger.positives[i].budget; the caller's check-time budget remains the budget recorded when blessing a NEW positive.
- The failure triage card reports the closest positive's own budget (which bar was missed), not the caller's.
- Regression: positive_is_gated_by_its_own_recorded_widened_budget.

Latent today (both committed ledgers store (0,0)=EXACT, equal to the caller budget). Found by the fresh-agent quality review.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
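A minimal sketch of the corrected gating, assuming simplified `FuzzBudget`, `Diff`, and `Positive` types (the real ledger also stores the image, bless commit, timestamp, and reason). `bless` and `check_golden` here are hypothetical stand-ins for the crate's functions, and the per-positive diffs are assumed to be precomputed.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct FuzzBudget {
    max_channel_delta: u8,
    max_differing_pixels: u64,
}

impl FuzzBudget {
    const EXACT: FuzzBudget = FuzzBudget { max_channel_delta: 0, max_differing_pixels: 0 };
}

#[derive(Clone, Copy, Debug)]
struct Diff {
    max_channel_delta: u8,
    differing_pixels: u64,
}

impl Diff {
    fn passes(&self, budget: FuzzBudget) -> bool {
        self.max_channel_delta <= budget.max_channel_delta
            && self.differing_pixels <= budget.max_differing_pixels
    }
}

/// A ledger entry: the budget recorded at bless time travels with the positive.
struct Positive {
    budget: FuzzBudget,
}

/// Blessing records the caller's budget on the NEW positive.
fn bless(caller_budget: FuzzBudget) -> Positive {
    Positive { budget: caller_budget }
}

/// Checking gates positive i against positives[i].budget, not a caller budget.
/// `diffs[i]` is the fresh capture's diff against positive i.
fn check_golden(positives: &[Positive], diffs: &[Diff]) -> Option<usize> {
    positives.iter().zip(diffs).position(|(p, d)| d.passes(p.budget))
}
```

A positive blessed with a widened budget now accepts a diff (here, a channel delta of 35) that an EXACT caller budget would reject.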

evaluate_outcome returned !diff.passes(fuzz) for a Mismatch, so a saturated diff (dimension mismatch, a structural capture error) made the Mismatch pass vacuously: !false == true. A broken capture must FAIL both kinds, never be mistaken for a legitimate render difference. Early-return false when saturated. Latent today (run_reftest captures both images from one app at a fixed shared viewport, so the dimension-mismatch branch never fires), but a real invariant gap. Regression: saturated_diff_fails_both_kinds. Found by the fresh-agent review.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
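The corrected decision, as a minimal sketch. The `RefKind` variants match the reftest module, but the `Diff`/`FuzzBudget` fields and this `evaluate_outcome` signature are assumptions.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum RefKind {
    Match,
    Mismatch,
}

#[derive(Clone, Copy, Debug)]
struct FuzzBudget {
    max_channel_delta: u8,
    max_differing_pixels: u64,
}

#[derive(Clone, Copy, Debug)]
struct Diff {
    max_channel_delta: u8,
    differing_pixels: u64,
    saturated: bool,
}

impl Diff {
    fn passes(&self, b: FuzzBudget) -> bool {
        !self.saturated
            && self.max_channel_delta <= b.max_channel_delta
            && self.differing_pixels <= b.max_differing_pixels
    }
}

fn evaluate_outcome(kind: RefKind, diff: &Diff, fuzz: FuzzBudget) -> bool {
    // A saturated diff is a broken capture, never a legitimate difference:
    // fail BOTH kinds before the Mismatch negation can turn it into a pass.
    if diff.saturated {
        return false;
    }
    match kind {
        RefKind::Match => diff.passes(fuzz),
        RefKind::Mismatch => !diff.passes(fuzz),
    }
}
```

Removing the early return makes the Mismatch arm evaluate `!false` for a saturated diff, reproducing the vacuous pass.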

…24 (maintainability)

The Tier-1/Tier-3 enrollment tests asserted cells.get() == 24, which a SECOND fixture would redden, breaking the central 'zero test edits to add a fixture' guarantee the coverage-by-construction design exists to provide. Derive the expected count from sorted_catalog().len() * cells_per_fixture(); the literal 24 stays pinned in exactly one place (matrix.rs's cells_per_fixture unit test). Found by the fresh-agent quality review.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ss test

The fresh-agent review's headline quality finding was 'docstrings oversell what the tests guarantee'. Reconcile each (doc-as-deliverable):
- transform_roundtrips: was 'a transposed factor reds this', but it is blind to INTER-factor order (each relation uses one non-identity factor); note the scope and point at buiy_core's compose-order unit tests that DO pin it.
- scene.rs: was 'can never diverge from what the engine paints'; bound it to the generated domain and record the PositionKind (tier-2 positioned/auto-z) generator-coverage gap.
- matrix_goldens: the vacuity guard's message read like a non-vacuousness check; make the 0-compared case loud (green != covered) and annotate the guard.
- invariants.md: the paint-order stability clause is inexpressible at the predicate's input boundary (the stable sort already ran); record the waiver.
- reftests.md: mark RefCase::multi OR/AND aggregation DEFERRED (already in follow-ups; single-reference covers current pairings).

Also, give the determinism gate's first probe headless teeth: a unit test that quiescence_unmet blocks on an unloaded required asset (condition 1), so a vacuous-check regression there fails without a GPU. follow-ups.md gains the PositionKind, quiescence conditions-2-4, and CPU-SDF-oracle-numeric-pin gaps.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

One-shot report capturing the fresh-agent cold-context review of the landed buiy_verify harness: 7 confirmed bugs (2 high) + 1 maintainability trap found and fixed TDD, the doc-overstatement theme reconciled, 3 coverage gaps deferred. Indexed in docs/README.md under Reports.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Injected real one-line bugs into buiy_core PRODUCTION code and confirmed the gate goes RED (reverted each):
- layout +7px position -> RED via Tier-1 layout snapshot
- color red-channel kill -> RED via Tier-2 display-list snapshot
- paint-order reversal -> RED via buiy_core `z_index_*` tests (NOT the new Tier-3 invariant)

Two honest findings recorded:
- A color R<->B swap was initially missed because the button fixture's colors (white, magenta sentinel) are symmetric under R<->B; an asymmetric kill was caught. Fixture-coverage note, not a harness defect.
- A production paint-order bug is invisible to the Tier-3 invariant because scene.rs::realize re-implements the painters_z assembly (sub-pass 6f) instead of calling it. Caught by buiy_core's `z_index_*` tests + the GPU golden tier. Added a hardening follow-up (make realize CALL the production assembly).

Documents the fault-injection pass in the adversarial-review report + follow-ups.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
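Why the R<->B swap slipped through can be checked in a few lines: an injected color fault is only observable if it changes at least one color the fixture actually paints. This sketch assumes the magenta sentinel is `#ff00ff`; `fault_is_observable` and the fault functions are hypothetical helpers, not part of buiy_verify.

```rust
type Rgba = [u8; 4];

const WHITE: Rgba = [255, 255, 255, 255];
/// Assumed value of the magenta sentinel.
const MAGENTA: Rgba = [255, 0, 255, 255];

/// Injected fault 1: swap the red and blue channels.
fn swap_r_b(c: Rgba) -> Rgba {
    [c[2], c[1], c[0], c[3]]
}

/// Injected fault 2: zero the red channel.
fn kill_red(c: Rgba) -> Rgba {
    [0, c[1], c[2], c[3]]
}

/// True if the fault changes at least one color in the fixture's palette,
/// i.e. a snapshot of that fixture could possibly go RED.
fn fault_is_observable(palette: &[Rgba], fault: fn(Rgba) -> Rgba) -> bool {
    palette.iter().any(|&c| fault(c) != c)
}
```

With a palette of only white and magenta, the swap is a no-op on every painted color, so no snapshot can detect it; the red-channel kill changes both. Adding a fixture color with R != B closes the gap.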

Future LLMs (and humans) had no USAGE guide for the verification harness, only the design spec (target state) and the build plan (historical). The crate root doc was itself stale ("Phase 0 ships the perceptual metric..."). Add:
- .claude/skills/using-buiy-verification/SKILL.md, the task-oriented how-to: tier-selection rule, add a fixture (+ the #[path] mod wiring step), write each tier's test, the reftest! / fixture! macro syntax, the BUIY_BLESS golden workflow, the headless vs GPU --ignored gates, and the gotchas that each cost a real bug (same-Name siblings, asymmetric fixture colors, forced_colors key axis, the realize-mirror paint-order blind spot, saturated-diff loud-fail).
- crates/buiy_verify/src/lib.rs: rewrite the stale crate doc into an accurate five-tier map with entry-point intra-doc links (cargo doc -D warnings clean) pointing at the skill + spec + report.
- CLAUDE.md: a Code Conventions pointer so it's discoverable every session.

Accuracy adversarially verified by a fact-check workflow against the code (4 agents + synthesis): it caught a wrong fixture path (tests/fixtures -> fixtures) and three overstatements (harness-"enforced" Camera2d/Name -> contract; "zero central-list edits" -> one #[path] mod line; the GPU-golden paint-order catch is potential, not current), all corrected before commit.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

… branch

Integrates the text-editing campaign (E2–E6: input/keymap, caret/selection, clipboard/undo, IME, lifecycle) + the README refresh that landed on main while the visual-bug verification work was in flight.

Conflicts resolved (both additive):
- crates/buiy_core/Cargo.toml `[dev-dependencies]`: keep BOTH proptest (theirs) and the buiy_verify dev-only cycle edge (ours).
- docs/plans/follow-ups.md: keep our verification follow-up entries + their text-editing follow-up sections; merge the closing Owner/Spec-touchpoint.

Integration fix: origin/main's three new #[ignore] GPU re-capture tests (text_caret_selection_e3_gpu, text_placeholder_gpu, text_ime_preedit_gpu) call buiy_core::perceptual_diff, which this branch DEPRECATED, so clippy --all-targets -D warnings would reject them. Applied the same `#![allow(deprecated)]` interim policy the four sibling GPU golden suites already use (migration to buiy_verify::metric::compare tracked in follow-ups.md). Gate on the merged tree: clippy --workspace --all-targets -D warnings clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

DeterministicApp::build() instantiates the capture render stack (capture_app_scaled → RenderPlugin), which REQUIRES a wgpu adapter. Three determinism_build.rs tests called build() but were NOT #[ignore], despite the file claiming HEADLESS, so they ran in the every-PR gate. They passed anywhere with an adapter (local GPU, macOS/Windows CI) but panicked 'Unable to find a GPU!' on adapter-less Linux CI, the gate that must stay green without one. My local GPU masked this; CI ubuntu caught it.

Split the file: config-level knob checks (default/override DPR, font mode, the MSAA constant) stay HEADLESS via .config() (no build()); the built-app observables (window scale_factor, the manual TimeUpdateStrategy) move to #[ignore] GPU-lane tests. Verified: headless 4 passed/3 ignored (no adapter touched), --ignored 3 passed on RX 6700 XT.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The install-mesa action defaulted to ci-build-tag build19 + mesa-version 24.3.4, but that release pairing does not exist: gfx-rs/ci-build build19 carries mesa-24.2.3; mesa-24.3.4 ships under build20. So the GPU (pinned lavapipe) CI lane 404'd at the Mesa download and has never actually run. Fix the tag to build20 (keeping mesa-24.3.4 so lavapipe pixel output, which the stored goldens are blessed against, does not shift). Verified the corrected URL returns HTTP 200 and the broken one 404. Documented the tag↔version pairing gotcha.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The GPU (pinned lavapipe) lane now installs Mesa successfully (prior commit), so for the first time it reaches `cargo test`. The release `wgpu-info` build + the large bevy #[ignore] GPU test binaries + the pinned Mesa exhaust the ~14 GB ubuntu-runner disk: 'No space left on device'. Add the standard free-disk-space step (remove preinstalled dotnet/android/ghc/CodeQL/boost/toolcache + prune docker images, ~25 GB reclaimed) before the rust-cache restore and the compiles. Inline rm -rf, no new third-party action.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

After the disk fix the GPU lane reached the tests and the buiy_core #[ignore] suite passed (goldens included: the residue corpus matches on CI's pinned lavapipe 24.3.4). But linking buiy_verify's large bevy test binaries then crashed with 'ld terminated with signal 7 [Bus error], core dumped'. The GPU lane builds wgpu-info (release) and runs the buiy_core GPU suite first, so it has far less memory/disk headroom than the plain Test job when it reaches that link. Debug info is the bulk of a bevy test binary's link size, and GPU pixel/invariant checks don't need backtraces. Add -C debuginfo=0 (preserving -D warnings) for the GPU job only, which shrinks both the link memory and the disk footprint. The other jobs keep full debuginfo.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The SDF-corner-AA residue golden was blessed on dev hardware (RX 6700 XT) but is keyed to CI's pinned lavapipe, which rasterizes the corner AA differently (perceptually identical, differing_pixels=0, but max_channel_delta=35), so it fails EXACT on CI. The determinism design requires goldens captured ON the pinned rasterizer. Add a sentinel-gated job (runs while BLESS_GOLDENS_NOW exists) that drops the dev-hardware positives and re-blesses fresh single-positive lavapipe captures, uploaded as the lavapipe-blessed-goldens artifact. Sentinel-gated (not workflow_dispatch) because dispatch requires the workflow on the default branch. Next: download the artifact, commit the lavapipe goldens, remove the sentinel + this job.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…l if)

The previous attempt gated the job on `if: hashFiles('BLESS_GOLDENS_NOW')`, but hashFiles is not allowed in a job-level if, so GitHub rejected the whole workflow ('workflow file issue') and no jobs/checks ran. Drop the gate (and the sentinel file); the job is temporary and will be removed once the lavapipe goldens are committed, so unconditional is fine: it only writes the runner's ephemeral checkout and uploads an artifact, never committing back.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ere)

The rect-rounded SDF-corner-AA golden was blessed on dev hardware (RX 6700 XT) but is keyed to CI's pinned lavapipe, which rasterizes the corner AA differently (perceptually identical, differing_pixels=0, but max_channel_delta=35), so it failed EXACT on CI. The determinism design requires goldens captured ON the pinned rasterizer. Re-captured both residue goldens on CI's lavapipe (Mesa 24.3.4) via the temporary bless-goldens job and committed the artifact: the rect-rounded SDF PNG now matches lavapipe EXACT; the text-ahem PNG is byte-identical (Ahem boxes are rasterizer-invariant; only its ledger provenance updated). Removed the now-done temporary bless-goldens CI job.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

intendednull and others added 30 commits June 15, 2026 04:35

feat(verify): metric module skeleton — Diff/FuzzBudget/CompareOpts

9525cca

Type shapes + empty-case compare stub, wired into lib.rs. Algorithm lands next. Realizes metric.md § Types. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

feat(verify): diff_image heatmap on emit_diff_image

b085f91

pixelmatch palette: differing pixels red, AA pixels yellow. Off in the hot reftest path; on for tier-5 golden triage HTML. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

feat(verify): add CompareOpts::reftest_default for tier-4

61901e3

AA-exclusion on, MSSIM advisory, no diff-image alloc in the hot path — the options run_reftest passes to metric::compare (reftests.md § API). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

feat(verify): reftest module skeleton + RefKind parser

3aacb18

RefKind{Match,Mismatch} and reftest_kind(&str) — the token parser the reftest! macro calls. reftests.md § Module & public API. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

feat(verify): RefCase + RefOutcome reftest types

5960161

The pairing (name/kind/test/reference/fuzz) and its outcome (passed/diff/report_path). reftests.md § Module & public API. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

feat(verify): pure evaluate_outcome pass-decision + truth table

bd6d969

Match passes within budget, Mismatch passes outside it (the silent-no-op guard). Pure CPU so it gates headless. reftests.md § Verification #1. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

intendednull and others added 29 commits June 15, 2026 09:10

intendednull merged commit 62b000d into main Jun 16, 2026
7 checks passed



Harden visual-bug verification harness + usage docs (adversarial review + fault-injection) #68

intendednull merged 71 commits into main from worktree-visual-bug-detection-report

intendednull commented Jun 16, 2026


1 participant


What this is

Hardening — 7 real bugs + 1 maintainability trap (all fixed, TDD)

Fault-injection — does it actually catch bugs?

Usage docs for future contributors / LLMs

Merge of origin/main

Verification


Merge of `origin/main`