Harden visual-bug verification harness + usage docs (adversarial review + fault-injection)#68

Merged
intendednull merged 71 commits into main from worktree-visual-bug-detection-report
Jun 16, 2026

@intendednull

Owner

What this is

Post-landing adversarial review + hardening of the buiy_verify visual-bug verification harness, a usage guide for it, and an empirical fault-injection proof that it detects real bugs. Also merges origin/main (text-editing E2–E6 + README) into the branch.

Hardening: 6 real bugs + 1 maintainability trap (all fixed, TDD)

A fresh-agent adversarial review (20-agent find → verify → synthesize) of the just-landed harness found genuine defects, each fixed red-test-first:

| Sev | Bug | Fix |
|---|---|---|
| HIGH | A blank/failed render (0×0) silently PASSED any golden: the empty fast-path short-circuited before the dimension-mismatch check, on the exact path the golden gate uses | 924ce89 |
| HIGH | GoldenKey dropped the forced_colors axis: fc0/fc1 cells collapsed onto one baseline, so an FC regression would pass silently once blessed | 880a38a |
| MED | Per-timestamp snapshots ran on a wall-clock virtual clock (non-deterministic for animated fixtures) | 8992d1f |
| MED | Same-Name sibling sort tiebroke on Entity::index() (spawn-order-dependent flaky snapshots) | 8992d1f |
| MED | Per-positive ledger budget was inert (gated with caller budget, never positive.budget) | a68b655 |
| LOW | Saturated diff made a reftest Mismatch pass vacuously | ebfbd24 |
| maint | Hardcoded assert_eq!(cells, 24) broke "zero edits to add a fixture" | 85007b6 |

Plus reconciliation of the overstated docstrings the review flagged as the recurring theme (transform_roundtrips, scene.rs, the matrix_goldens vacuity message, invariants.md stability clause, reftests.md RefCase::multi deferral) and headless coverage for the determinism gate's first probe (87cd098).
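
The first HIGH defect listed above is a check-ordering problem. A minimal sketch of the safe ordering follows; `Triage` and `triage` are hypothetical names for illustration, not buiy_verify's API. Dimensions are compared before any empty-image fast path, so a 0×0 render can never be treated as matching a real golden.

```rust
/// Hypothetical outcome of the cheap pre-checks that run before a pixel scan.
#[derive(Debug, PartialEq, Eq)]
enum Triage {
    /// Sizes differ: report a loud failure that no budget can accept.
    DimensionMismatch,
    /// Both images have the same size and contain no pixels.
    BothEmpty,
    /// Same non-zero size: continue to the per-pixel comparison.
    ScanPixels,
}

fn triage(actual: (u32, u32), golden: (u32, u32)) -> Triage {
    // Dimension check first, so an empty render cannot short-circuit to "equal".
    if actual != golden {
        return Triage::DimensionMismatch;
    }
    // Only now is the empty fast path safe: both sides are known to agree.
    if actual.0 == 0 || actual.1 == 0 {
        return Triage::BothEmpty;
    }
    Triage::ScanPixels
}
```

With this ordering, `triage((0, 0), (100, 40))` reports a mismatch instead of falling into the empty fast path.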

Fault-injection — does it actually catch bugs?

Injected real one-line bugs into buiy_core production code and confirmed the gate goes RED (reverted each), documented in docs/reports/2026-06-15-verification-harness-adversarial-review.md:

  • Layout (+7px position) → RED via Tier-1 layout snapshot
  • Color/visual (kill red channel) → RED via Tier-2 display-list snapshot
  • Paint order (reverse z-sort) → RED via buiy_core's z_index_* tests

Two honest findings recorded as follow-ups: an R↔B swap was invisible on the button's symmetric colors (fixture-coverage note), and the Tier-3 invariant misses a production paint-order-assembly bug because realize re-implements sub-pass 6f instead of calling it (hardening follow-up).

Usage docs for future contributors / LLMs

  • .claude/skills/using-buiy-verification/SKILL.md — task-oriented how-to (tier selection, add a fixture, write each tier's test, the bless workflow, gotchas).
  • crates/buiy_verify/src/lib.rs — rewrote the stale crate doc into an accurate five-tier map.
  • CLAUDE.md — discovery pointer.

Doc accuracy was itself adversarially verified (a fact-check workflow caught a wrong fixture path + 3 overstatements, all corrected).

Merge of origin/main

Integrates text-editing E2–E6 + the README refresh. Conflicts (both additive) resolved in the buiy_core dev-deps and follow-ups.md. origin/main's 3 new #[ignore] GPU re-capture tests call the now-deprecated perceptual_diff; applied the same #![allow(deprecated)] interim policy the 4 sibling golden suites already use (full migration to metric::compare tracked in follow-ups.md).

Verification

Green on the merged tree: cargo fmt --check, clippy --workspace --all-targets -D warnings, cargo doc -D warnings, cargo test --workspace (175 result sections, 0 failed). The GPU --ignored lanes passed pre-merge on an RX 6700 XT (re-pathed forced_colors goldens match on the real adapter); the merge touched no verification GPU tests.

🤖 Generated with Claude Code

intendednull and others added 30 commits June 15, 2026 04:35
…-art, spec, plan

Strategy report (5-tier reftests-first pyramid) re-grounded on canonical main; 5 prior-art folders (wpt-reftests, vello, skia-gold, flutter-golden-testing, wgpu-testing); the buiy-verification-design multi-file spec realizing foundation gates #2/#5/#11/#12; and the phased TDD implementation plan. docs/README catalog wired.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Phase 0.1 of the verification pyramid: the advisory MSSIM channel
(image-compare) and the tier-1/2 snapshot driver (insta, glob feature)
land in buiy_verify with exact patch pins. cargo deny check passes; any
new transitive license is added explicitly to deny.toml's allow list.
pixelmatch is NOT added here — Phase 1a vendors its algorithm.

No code consumes them yet — the metric/snapshot modules land in Phase 1/2.
insta pinned to =1.48.0 (latest 1.x patch at impl time, not the plan's
=1.43.2 placeholder, per the plan's 'pin the exact latest 1.x' directive).
Spec: docs/specs/2026-06-15-buiy-verification-design/metric.md § Crate choice.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Phase 0.2 of the verification pyramid: the #[ignore] GPU re-capture tests
in tests/text_*_gpu.rs migrate (Phase 1a) off the deprecated L1
perceptual_diff onto buiy_verify::metric::compare, so buiy_core's tests need
to name buiy_verify. Added under [dev-dependencies] only — this forms a
DEV-ONLY cycle (core → verify → core) that Cargo permits because dev-dep
edges are excluded from the normal build graph. Confined to #[cfg(test)].

Spec: docs/specs/2026-06-15-buiy-verification-design/metric.md § Migration.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Phase 0.3 of the verification pyramid: Dpr is device-pixel-ratio as integer
milliscale (1000 = 1×, 2000 = 2×) so it is Eq+Hash+Ord — a fixture axis that
keys goldens/coverage cells, never a tolerance. Defined ONCE here; goldens
and coverage import it. from_f32/as_f32 round-trip the window's f32
scale_factor at the capture boundary; serde-derived for the bless ledger.
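
A rough sketch of the milliscale idea described above, assuming a plain `u32` newtype. The real `Dpr` also carries serde derives, which are omitted here to keep the example dependency-free, and its exact rounding rule is an assumption.

```rust
/// Device-pixel-ratio stored as integer thousandths (1000 = 1x, 2000 = 2x).
/// An integer payload is what allows Eq + Hash + Ord, which f32 cannot derive.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash, PartialOrd, Ord)]
struct Dpr(u32);

impl Dpr {
    /// Convert a window scale factor at the capture boundary.
    /// Rounding to the nearest thousandth is an assumption of this sketch.
    fn from_f32(scale_factor: f32) -> Self {
        Dpr((scale_factor * 1000.0).round() as u32)
    }

    /// Convert back to the f32 the windowing layer expects.
    fn as_f32(self) -> f32 {
        self.0 as f32 / 1000.0
    }
}
```

Because the key is an integer, two cells at the same ratio hash to the same bucket, and 1× sorts before 2× without any float comparison.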

Added serde.workspace = true to buiy_core [dependencies]: the plan made this
conditional on 'if serde isn't already a direct dep'. Verified it was NOT
(buiy_core's src had no serde use and the manifest no serde line), and the
derive emits ::serde:: paths that bevy's re-export does not satisfy, so the
direct dep is required. Rides the workspace serde pin — no new crate.

Spec: docs/specs/2026-06-15-buiy-verification-design/determinism.md
      § Extending GoldenConfig.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Phase 0.4 of the verification pyramid: the shared GPU capture seam moves out
of tests/support into render::golden src as
capture_to_image(&mut App, &GoldenConfig) -> image::RgbaImage, so
buiy_verify's reftest + golden tiers can call it. Sizes the offscreen target
to the window's physical pixel grid, paints under CAPTURE_MSAA (single-
sampled, dither off), and reads back into an RgbaImage. buiy_core gains
image as a direct dep (README § Crate-dependency note: the only new GPU
dep). #[ignore] GPU meta-test asserts physical dimensions + non-vacuous paint.

readback_rgba_into is promoted to pub alongside capture_to_image; the
tests/support readback_rgba now delegates to it so the readback poll + the
256-byte row-padding strip live in exactly one place (anti-drift). The dead
CapturedBytes resource + Readback/ReadbackComplete/Mutex imports drop from
tests/support as a result.
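
For readers unfamiliar with the row-padding strip: wgpu requires the `bytes_per_row` of a texture-to-buffer copy to be a multiple of 256, so a readback buffer usually carries trailing padding on every row. The sketch below shows the general technique with hypothetical helper names; it is not the body of `readback_rgba_into`.

```rust
/// wgpu's required alignment for `bytes_per_row` in buffer copies.
const ROW_ALIGN: usize = 256;

/// Bytes per padded row for an RGBA8 image of the given width.
fn padded_bytes_per_row(width: usize) -> usize {
    let unpadded = width * 4;
    (unpadded + ROW_ALIGN - 1) / ROW_ALIGN * ROW_ALIGN
}

/// Copy the tightly packed pixels out of a row-padded readback buffer.
/// Assumes `padded` holds at least `height` full padded rows.
fn strip_row_padding(padded: &[u8], width: usize, height: usize) -> Vec<u8> {
    if width == 0 || height == 0 {
        return Vec::new();
    }
    let unpadded = width * 4;
    let stride = padded_bytes_per_row(width);
    let mut tight = Vec::with_capacity(unpadded * height);
    for row in padded.chunks_exact(stride).take(height) {
        // Keep only the real pixel bytes; drop the alignment tail.
        tight.extend_from_slice(&row[..unpadded]);
    }
    tight
}
```

Keeping this in one function is what makes the anti-drift argument work: every capture path sees the same stride arithmetic.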

Phase-0 scope is the capture mechanics; the four-condition quiescence flush
and the scale_factor==dpr assertion are Phase 3.3's hardening.
Spec: docs/specs/2026-06-15-buiy-verification-design/determinism.md
      § Where the code lives.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Type shapes + empty-case compare stub, wired into lib.rs. Algorithm
lands next. Realizes metric.md § Types.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Ports pixelmatch's luminance-weighted YIQ delta (verbatim constants)
and adds the raw L∞ max_channel_delta scan. Single-wrong-pixel is now
caught at N in {16,256,2048} — the §4 dilution regression. AA exclusion
and MSSIM follow.

The yiq_luminance_outweighs_chroma fixture is corrected from the plan's
[180,120,60]@0.05 (which does not separate luma from chroma — both
exceed max_delta=88) to an equal-L∞ pure-luma (+30 all) vs chroma-leaning
(+30R/-30B) pair @0.1, where the YIQ weighting (luma 455 vs chroma 244,
max_delta=352) is what separates them.
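
The numbers quoted above can be reproduced with a small sketch of the YIQ delta for opaque pixels, using the coefficients published in pixelmatch's `colorDelta`. The function names are illustrative. Alpha blending and the signed result that newer pixelmatch versions return are left out, so this is not a full port.

```rust
/// Squared, perceptually weighted YIQ distance between two opaque RGB pixels.
fn yiq_delta(a: [u8; 3], b: [u8; 3]) -> f64 {
    let d = |i: usize| a[i] as f64 - b[i] as f64;
    let (dr, dg, db) = (d(0), d(1), d(2));
    // RGB difference projected onto the Y (luma), I and Q (chroma) axes.
    let y = dr * 0.29889531 + dg * 0.58662247 + db * 0.11448223;
    let i = dr * 0.59597799 - dg * 0.27417610 - db * 0.32180189;
    let q = dr * 0.21147017 - dg * 0.52261711 + db * 0.31114694;
    // Luma is weighted most heavily.
    0.5053 * y * y + 0.299 * i * i + 0.1957 * q * q
}

/// Largest delta still treated as "same" at a given threshold in 0..=1.
fn max_delta(threshold: f64) -> f64 {
    35215.0 * threshold * threshold
}
```

For a mid-gray base pixel, a +30 shift on all channels scores about 455 and a +30R/−30B shift about 244, which fall on opposite sides of `max_delta(0.1)` (about 352) even though both have the same L∞ distance of 30.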

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A differing pixel that is AA in either image is excluded unless
include_aa. EXACT (0,0) now holds across residual AA jitter while still
catching an isolated real defect. Vendored verbatim from pixelmatch.

The aa_edge_pair fixture is corrected from the plan's hard-2-tone
diagonal step to a genuine antialiased vertical edge. pixelmatch
correctly never classifies the hard step as AA: a pure black/white edge
has no pixel with both a brighter and a darker sibling, so nothing is
excluded and counted would be 16, not 0. The new fixture (black | gray
AA column | white) jitters its gray column 128->180 between a and b,
the canonical sub-LSB re-rasterization the AA exclusion exists to
tolerate (excluded=16, counted=0).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Two-axis gate (both bind); within() pins the fuzzy-if floor so an
unexpectedly-clean render reds. A dimension mismatch folds into a
saturated Diff that fails EVERY budget — the loud-red replacement for
the naive silent 1.0.

Adds a `saturated: bool` discriminator to Diff so passes() can honor
metric.md's "false for every budget, including a maximal (255, u32::MAX)"
contract: the pure two-axis formula would otherwise ACCEPT a saturated
diff under a maximal budget. The flag also keeps a saturated mismatch
categorically distinct from an in-bounds all-different frame (which a
wide budget may legitimately accept).
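
A sketch of why the discriminator is needed, using simplified shapes. The field names other than `saturated` are guesses, and the `within()` floor is omitted. Without the flag, a dimension mismatch encoded as a maximal diff would be accepted by a maximal budget.

```rust
/// Two-axis tolerance: both limits must hold for a diff to pass.
#[derive(Clone, Copy, Debug)]
struct FuzzBudget {
    max_channel_delta: u8,
    max_diff_pixels: u32,
}

impl FuzzBudget {
    const EXACT: FuzzBudget = FuzzBudget { max_channel_delta: 0, max_diff_pixels: 0 };
}

#[derive(Clone, Copy, Debug)]
struct Diff {
    max_channel_delta: u8,
    diff_pixels: u32,
    /// Set for a dimension mismatch: the images were never comparable.
    saturated: bool,
}

impl Diff {
    fn passes(&self, budget: &FuzzBudget) -> bool {
        // The flag is checked first, so no budget can admit a saturated diff.
        !self.saturated
            && self.max_channel_delta <= budget.max_channel_delta
            && self.diff_pixels <= budget.max_diff_pixels
    }
}
```

An in-bounds frame where every pixel differs still passes a wide enough budget, which keeps the two cases categorically distinct.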

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Diff::mssim from rgba_blended_hybrid_compare, Option (None when
disabled/errored — never silently 0.0). Proven non-gating: a 1-LSB wash
(0 differing pixels) still passes a budget admitting its 1-LSB L∞ delta
despite a sub-1 MSSIM.

The mssim_never_gates fixture is corrected from the plan's passes(&EXACT)
form (EXACT rejects the 1-LSB wash on the *channel* axis, so it cannot
isolate the MSSIM-non-gating property) to a budget that tolerates the
L∞ delta and 0 diff pixels, leaving MSSIM as the only possible gate.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
pixelmatch palette: differing pixels red, AA pixels yellow. Off in the
hot reftest path; on for tier-5 golden triage HTML.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
metric.md § Verification: identity, scale-invariant single defect,
saturated dim-mismatch, and an exact-integer constants pin guarding the
vendored YIQ/AA numbers. (insta-snapshot upgrade deferred to Phase 2.)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… to metric

metric.md § Migration step 1: the RMSE metric and DiffResult are gone;
tests/visual.rs and smoke.rs move onto metric::compare + Diff::passes
(in-memory fixtures replace baseline/tinted PNGs). One metric now.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
metric.md § Migration step 2: buiy_core cannot depend on buiy_verify in
its normal graph, so perceptual_diff carries a #[deprecated] gravestone
pointing at buiy_verify::metric::compare; its L1 body stays for the
unmigrated ignored GPU re-capture tests (Phase 3). Callers gain a
file-level allow(deprecated) until they migrate.

text_gpu.rs gains a TEMPORARY allow here (removed in 1a.10 when it
migrates) so this commit stays clippy -D warnings clean; the plan's
split leaves it warning otherwise.

The deprecation note avoids literal #[ignore] brackets — rustdoc parses
[ignore] as an intra-doc link and fails the -D warnings doc gate.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…pare

The #[ignore] GPU re-capture tests reach the unified metric over the
dev-only buiy_core -> buiy_verify edge (landed Phase 0.2). Stable
re-capture sites -> passes(&EXACT) via assert_stable; the must-differ
anti-tests (:152, :271) -> !passes(&EXACT) via assert_differs. The
TEMPORARY allow(deprecated) added in 1a.9 is removed (the file no longer
names perceptual_diff). Verified on the RX 6700 XT GPU lane: all 6
#[ignore] tests pass, the stable sites bit-exact at EXACT (0,0) — the
old < 1e-4 tolerance was not masking drift. The stored-baseline sites in
the other text_*_gpu.rs files stay on deprecated perceptual_diff until
Phase 3.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The MSSIM/threshold doc comments wrote the range as a bare [0,1], which
rustdoc parses as an intra-doc link and fails the RUSTDOCFLAGS="-D warnings"
doc gate (unresolved link to `0,1`). Wrapped in backticks so it renders
as code, not a link. No behavior change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
AA-exclusion on, MSSIM advisory, no diff-image alloc in the hot path —
the options run_reftest passes to metric::compare (reftests.md § API).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
RefKind{Match,Mismatch} and reftest_kind(&str) — the token parser the
reftest! macro calls. reftests.md § Module & public API.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The pairing (name/kind/test/reference/fuzz) and its outcome
(passed/diff/report_path). reftests.md § Module & public API.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Match passes within budget, Mismatch passes outside it (the silent-no-op
guard). Pure CPU so it gates headless. reftests.md § Verification #1.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
run_reftest captures test+reference in ONE app via capture_to_image
(re-target + re-readback) and diffs with metric::compare; the painting-app
builder is promoted from tests/support into render::golden::capture_app so
buiy_verify builds its app from src (the test-support gpu_render_app*
builders now delegate to the single src body — anti-drift). GPU known-good/
known-bad pairs prove the harness can both pass and fail (vacuous-green
guard). reftests.md §§ API, Verification #3.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A != that tolerates difference is vacuous — mismatch_floor_ok gates it
pure-CPU and run_reftest asserts it as a belt (replacing the 1b.5 inline
stub). reftests.md § Verification #2.
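
The verdict and floor rules for the two reftest kinds fit in a few lines. This is a sketch with assumed signatures: the real `mismatch_floor_ok` may take different arguments, and `reftest_passed` is a hypothetical name.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum RefKind {
    /// `==`: test and reference must render the same.
    Match,
    /// `!=`: test and reference must render differently.
    Mismatch,
}

/// Verdict given whether the measured diff fell within the fuzz budget.
fn reftest_passed(kind: RefKind, within_budget: bool) -> bool {
    match kind {
        RefKind::Match => within_budget,
        RefKind::Mismatch => !within_budget,
    }
}

/// A mismatch pairing may not carry any tolerance: with a non-zero fuzz
/// budget, two identical-enough renders would count as "within budget"
/// and the `!=` assertion could never be trusted.
fn mismatch_floor_ok(kind: RefKind, fuzz: (u8, u32)) -> bool {
    kind == RefKind::Match || fuzz == (0, 0)
}
```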

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
reftest!(kind, fn_ident, test, reference[, fuzz=(d,p)]) emits one
#[test] #[ignore] per pairing; a non-(0,0) floor on a mismatch fails to
COMPILE via a const assert. reftests.md § 'The reftest! macro'.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
assert_reference_independent builds the reference into a no-GPU App and
rejects any forbidden marker (ContentVisibility/ContainerQuery/TopLayer/
Translate). Value-encoded features fall to human review (documented). The
lint is itself RED/GREEN-tested. reftests.md §§ Reference independence,
Verification #4.

Two deviations forced by the live API (both keep the lint structural):
- TopLayer is a FIELD on the Stacking component, not a component of its
  own, so the marker queries Stacking and checks top_layer != None —
  structurally equivalent to the Containment/content_visibility routing.
- Style is a Bundle that already supplies Containment + Stacking; the
  self-test sets content_visibility via Style::containment() (spawning a
  second Containment alongside is a duplicate-component panic, not a lint
  trip), and the markers check the FIELD VALUE so a default-Visible
  Containment on a disjoint reference does not trip the lint.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Promotes the CPU SDF port from scalar probes to a full-tile rasterizer
mirroring shader.wgsl:60/:76-:79 (sdf_rounded_rect + fwidth->smoothstep).
Pinned to the render_instance.rs point-probes. reftests.md §§ CPU-vs-GPU
cross-check, Verification #5.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Renders one rounded-rect on the GPU and via the CPU oracle, diffs within a
measured AA fuzz budget. Zero stored bytes; kept permanently (one shared
analytic SDF). reftests.md § CPU-vs-GPU SDF cross-check.

Two corrections forced by root-causing a 60%-of-frame divergence to green:
- The corner radius for a Background fill is carried on Border.radius
  (Corners::all(Radius::circular(..))) — the component draw_for_node reads
  (render/mod.rs:373) — NOT a bare Radius component (which the fill path
  ignores). spawn_single_primitive now uses a zero-width Border.
- The CPU oracle must match the full CAPTURE chain, not just the fragment
  shader: the capture camera clears to OPAQUE BLACK and the pipeline blends
  linear-space SrcOver into an Rgba8UnormSrgb target. The oracle now
  composites coverage over opaque black in linear space then sRGB-encodes,
  so interior + exterior agree and only the ~1px AA rim differs (measured
  87/24000 px on RX 6700 XT; budget bounds it at 200). The 1b.10 oracle
  point-probe test moves to the same capture-matched convention (filled =
  opaque white, empty = opaque black) — same geometry, composited.
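
The capture-matched compositing step can be sketched as follows. It assumes the fill colour is given as sRGB bytes and uses the standard sRGB transfer functions; how the real oracle stores colours is not stated above, and `composite_over_black` is a hypothetical name.

```rust
/// Standard sRGB encode (linear -> display), input and output in 0..=1.
fn srgb_encode(linear: f32) -> f32 {
    if linear <= 0.0031308 {
        12.92 * linear
    } else {
        1.055 * linear.powf(1.0 / 2.4) - 0.055
    }
}

/// Standard sRGB decode (display -> linear), input and output in 0..=1.
fn srgb_decode(encoded: f32) -> f32 {
    if encoded <= 0.04045 {
        encoded / 12.92
    } else {
        ((encoded + 0.055) / 1.055).powf(2.4)
    }
}

/// Blend `coverage` of a fill over opaque black in linear space, then
/// sRGB-encode, mirroring a SrcOver blend into an sRGB render target.
fn composite_over_black(fill_srgb: [u8; 3], coverage: f32) -> [u8; 4] {
    let cov = coverage.clamp(0.0, 1.0);
    let mut out = [0u8, 0, 0, 255]; // alpha stays opaque
    for (dst, &src) in out.iter_mut().zip(fill_srgb.iter()) {
        let linear = srgb_decode(src as f32 / 255.0);
        // SrcOver onto black: src * a + 0 * (1 - a).
        *dst = (srgb_encode(linear * cov) * 255.0).round() as u8;
    }
    out
}
```

Blending in linear space matters at the rim: half coverage of white encodes to roughly 188, not 128, which is why an oracle that blends in sRGB space would disagree with the GPU exactly on the AA pixels.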

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
flex justify-content: SpaceBetween == three literal-offset boxes (reference
routes through the primitive/literal-Node layer, NOT flex — independence by
construction); content-visibility: hidden != the visible subtree (the !=
anti-test). The cv reference's independence is asserted pure-CPU. Both pass
on the RX 6700 XT at the default (0,0) fuzz. reftests.md § Authoring patterns.

Adaptations to the live API:
- content-visibility set via Style::containment() (Style is a Bundle that
  already supplies Containment — a second one is a duplicate-component panic).
- the independence lint builds the reference under ThemePlugin + LayoutPlugin
  (no GPU) so theme-token-installing scenes build; the lint still reads only
  component DATA, no render systems run.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ons)

Tier 1-2 structured-snapshot module (snapshots.md). Task 2.1 lands the
shared dump primitives both tiers consume:

- `round(f32) -> String` — round to ROUND_DP=2 decimals, strip trailing
  zeros + bare trailing dot, normalize -0 to "0". Kills last-ULP churn
  from the Taffy / clip-space math while staying diff-readable.
- `LAYOUT_DUMP_VERSION` / `DISPLAY_LIST_DUMP_VERSION` — format-version
  headers so a formatter change is one conscious, visible diff line.

The `#[track_caller]` insta bridge (`assert_named_snapshot`) writes each
`.snap` beside the CALLING test file via `Location::caller()` +
`prepend_module_to_snapshot(false)`, so the dump helpers can live in
buiy_verify while their `.snap`s live next to the buiy_core tests that
call them.

`bytemuck.workspace = true` added to buiy_verify (already a workspace
dep used by buiy_core for the PackedInstance POD layout; the Tier-2 hex
check needs bytes_of / pod_read_unaligned). No new supply-chain crate,
no new cargo-deny surface.

Deviation: snapshots.md § Verification #2's `round(1.005) == "1.0"`
vector is self-inconsistent with `round(50.0) == "50"` (1.005_f32 is
1.00499…, formats to "1.00" — same .00 suffix as 50.0's "50.00", so one
trailing-zero rule cannot strip one to "1.0" and the other to "50"). The
self-consistent rule strips all trailing zeros; `round(1.005) == "1"`
preserves the vector's intent (1.005 rounds DOWN to 1.00, never up).
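
One way to write the self-consistent rule described above, as a sketch rather than the crate's exact code:

```rust
/// Decimal places kept in snapshot dumps.
const ROUND_DP: usize = 2;

/// Format a float for a snapshot: fixed precision, then strip all
/// trailing zeros and a bare trailing dot, and fold negative zero to "0".
fn round(value: f32) -> String {
    let mut s = format!("{:.*}", ROUND_DP, value);
    if s.contains('.') {
        let keep = s.trim_end_matches('0').trim_end_matches('.').len();
        s.truncate(keep);
    }
    if s == "-0" {
        s = "0".to_string();
    }
    s
}
```

Here `round(50.0)` gives `"50"` and `round(1.005)` gives `"1"`, because `1.005_f32` is stored slightly below 1.005 and formats as `"1.00"` before stripping.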

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`layout_dump(world)` emits one `(name, pos, size)` line per ResolvedLayout
entity, indented by ChildOf depth, siblings ordered by Name then Entity
index — the Name-key is what makes the dump invariant to ECS spawn /
archetype order (proved by the entity-order-invariant self-test). Floats
via the shared `round`; unnamed entities fall back to `entity#<index>`;
version-headered.

`assert_layout_snapshot(app, name)` runs one update() then snapshots the
dump via the #[track_caller] insta bridge, so the `.snap` lands beside the
CALLING test (verified: buiy_core's flex_row_basic.snap landed under
crates/buiy_core/tests/snapshots/, not buiy_verify's tree).

Self-tests (plain assert_eq!, non-vacuous): entity-order invariance,
version-header tripwire, unnamed-fallback.

Migration (layout.rs:33): the child-only `(size - 50).abs() < 0.5` pair
becomes one `assert_layout_snapshot(&mut app, "flex_row_basic")` over a
Name-tagged root + TWO 50x50 children — the snapshot pins every box's
position+size (strictly more than the old tolerance assert) and exercises
sibling ordering. The two layout_tree_garbage_collects_* tests STAY plain
assert_eq! (LayoutTree cardinality, not geometry — a length snapshot is
lower-density).

Robustness: collect_layout_entries / NameLookup::from_world /
extract_nodes_from_world look up Name/ChildOf/Background per-entity via
world.get and tolerate try_query returning None for an unregistered
component (a fixture that tags none) — fixes a panic on a nameless,
childless fixture.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`instance_hex(p)` hex-dumps `bytemuck::bytes_of(&PackedInstance)` (52 B →
104 hex chars) — a byte-exact, format-version-free snapshot of the GPU
upload payload, the complement to the diff-readable Display dump: a
packing arithmetic change flips the hex even when the rounded dump rounds
it away.

`NameLookup` (entity→name, World-built once) keeps the display-list dump
World-free.

Self-tests (plain assert_eq!, non-vacuous):
- hex_round_trips_bytes: hex → parse → pod_read_unaligned reconstructs the
  exact instance bytes (lossless, matches the GPU payload).
- hex_flips_on_a_packing_change: a negated height (the half-size sign bug
  render_instance.rs regression-tests) flips the hex — proves teeth.

Endianness: bytes_of is host-endian; CI + dev are little-endian x86-64
and the hex is a within-repo regression artifact, documented in the fn.
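
The two self-tests can be illustrated without bytemuck. In this sketch a 13-float array stands in for `PackedInstance` purely to reach 52 bytes (the real field layout is not shown above), and `to_hex`/`from_hex`/`instance_bytes` are hypothetical helpers.

```rust
/// Lowercase hex encoding, two characters per byte.
fn to_hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{b:02x}")).collect()
}

/// Inverse of `to_hex`; returns None on odd length or a non-hex pair.
fn from_hex(hex: &str) -> Option<Vec<u8>> {
    if hex.len() % 2 != 0 {
        return None;
    }
    (0..hex.len())
        .step_by(2)
        .map(|i| u8::from_str_radix(hex.get(i..i + 2)?, 16).ok())
        .collect()
}

/// Stand-in for the GPU payload: 13 f32 fields = 52 host-endian bytes.
fn instance_bytes(fields: &[f32; 13]) -> Vec<u8> {
    fields.iter().flat_map(|f| f.to_ne_bytes()).collect()
}
```

Flipping the sign of a single field changes only the sign bit of one float, yet the hex string differs, which is the property a rounded display dump cannot guarantee.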

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
intendednull and others added 29 commits June 15, 2026 09:10
…sensitivity

Phase 3.5 of the verification pyramid (determinism.md § Verification #1/#2). All
#[ignore], GPU lane — the headless gate stays green without them.

#1 idempotent capture (the headline proof): the SAME scene captured TWICE
through two fresh DeterministicApps is byte-identical — compare(a, b, default)
.passes(FuzzBudget::EXACT) at (0, 0). Covers a rounded-rect fixture AND an
Ahem-text fixture (the box-font collapse holds frame-to-frame). Verified on the
RX 6700 XT: both pass at (0,0).

The brief's second verification: ahem_text_is_font_availability_invariant —
the same Ahem text scene captured with vs without an extra host-style family
registered is byte-identical, because the fixture names only "Ahem" and that is
the sole resolvable family. Proves host-font-independence at the pixel level.

#2 knob sensitivity (negatives): knob_sensitivity_dpr (1× vs 2× differ — a
different physical grid), knob_sensitivity_font_mode (Real vs Ahem of the same
text differ — outlines vs em-boxes). Each flip changes the bytes ⇒ the knobs
are load-bearing.

FINDING — MSAA is inert for this pipeline, by design. The test that asserted a
4× MSAA capture *differs* from the single-sampled one FAILED with 0 differing
pixels: Buiy antialiases the SDF analytically in-shader and paints axis-aligned
pixel-covering quads, so a hardware MSAA resolve is identity. That is exactly
determinism.md's rationale ("in-shader analytic AA … MSAA buys nothing here").
The test is reframed (msaa_is_inert_for_the_in_shader_aa_pipeline) to assert
the verified truth — a 4× capture is byte-identical to CAPTURE_MSAA — which is
WHY pinning MSAA off is free. No nondeterminism source; an honest reframe.
Spec: docs/specs/2026-06-15-buiy-verification-design/determinism.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… landed

Phase 3.10 of the verification pyramid (determinism.md § CI software-rasterizer
pin). A CONFIG/DOC deliverable — lavapipe is not installed locally, so this is
validated on the real RX 6700 XT here; the lavapipe leg is the CI stored-baseline
gate.

- .github/actions/install-mesa/action.yml: a composite action that consumes
  gfx-rs/ci-build's prebuilt, VERSION-PINNED lavapipe tarball (no self-build;
  MESA_VERSION + ci-build tag pinned), writes its OWN ICD JSON (the upstream
  path is build-host-absolute), and exports the adapter-selection env contract:
  VK_DRIVER_FILES (the modern variable, NOT the deprecated VK_ICD_FILENAMES —
  deviation #2) + WGPU_ADAPTER_NAME=llvmpipe. LP_NUM_THREADS is deliberately
  NOT set (deviation #1 — determinism comes from the pinned Mesa version, not
  thread count).
- .github/workflows/ci.yml: a new `gpu` job invoking the action, a one-line
  llvmpipe-adapter smoke guard (determinism.md § Verification #5 — the pin is
  active, not silently falling back to hardware), then the #[ignore] GPU lane
  serialized at --test-threads=1. Additive: the headless `test` job stays green
  with no adapter.

Also records a "Landed" section in determinism.md (tasks 3.1-3.5, 3.10) and
corrects Verification #2's MSAA claim to the VERIFIED finding: 4× MSAA is
byte-identical to CAPTURE_MSAA for Buiy's in-shader analytic-AA pipeline, which
confirms (not contradicts) the MSAA-pin rationale. Tier-5 golden corpus
(3.6-3.9) remains future work; status stays draft until the 4.7 flip.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Phase 3.6 (verification-design goldens.md). Adds the `buiy_verify::golden`
module: the `GoldenKey` trace identity (widget × state × theme × viewport ×
backend × dpr) with a deterministic lower-kebab slug + `from_slug` inverse, the
`Backend` enum, and the human-diffable `BlessLedger`/`Positive` TOML accept
record. The key schema is fixed before any golden is generated — a Skia-Gold
lesson, since adding a field later re-baselines the whole corpus.

The module scaffolds all three submodules (`check`, `ledger`, `report`) so the
`pub use` re-exports resolve and the crate compiles; 3.7/3.8/3.9 land the
per-area test coverage and the GPU round-trip over this same code.

`FuzzBudget` gains serde derives so `Positive.budget` persists a per-fixture
widened budget directly. New workspace deps `toml = "0.8"` (ledger) and
`base64 = "0.22"` (HTML report PNG inlining) — both MIT/Apache-2.0, cleared by
`cargo deny check` before the add.

RED→GREEN: `golden_keys.rs` proptest pins `slug()`→`from_slug` round-trip and
no-collision over canonical keys (goldens.md § Verification #6), plus
deterministic/lower-kebab/dir/ledger-TOML unit tests.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Phase 3.7 (verification-design goldens.md § Verification #1-#4). Lands the
test coverage + the env-decoupling refactor over the `golden::check` code from
3.6.

`check_golden` compares `actual` against the stored multi-positive baseline
set and passes if ANY positive clears the budget (Skia-Gold "many positives per
config"); on a miss it carries the closest (smallest-Diff) candidate.
`assert_golden` is the fail-closed panicking wrapper — empty/non-matching
corpus panics with the bless instruction (the BUIY_ACCEPT_SHAPING shape); under
BUIY_BLESS=1 it blesses instead, writing the PNG + recording commit/timestamp/
budget/reason in the human-diffable ledger (never a silent overwrite).

Refactor: the bless decision is resolved into an explicit `BlessMode` at the
single public env-read site, so `check_golden_in`/`assert_golden_in` drive
bless/assert against a temp corpus with no process-global `BUIY_BLESS` race —
the seam the harness self-tests and the Phase-4 coverage matrix driver consume.

RED→GREEN (golden_persistence.rs, pure-CPU, synthetic images): match/mismatch,
multi-positive any-matches (second positive ⇒ matched_positive: 1), bless
round-trip (re-check passes + ledger provenance), bless-replace-in-place,
fail-closed panic on empty corpus, and the structured missing⇒Fail{best:None}.
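
The any-positive rule can be sketched over raw byte buffers. The `Pass { matched_positive }` and `Fail { best }` shapes follow the wording above, but the index-based `best`, the byte-count metric, and the function names are simplifications for illustration only.

```rust
#[derive(Debug, PartialEq, Eq)]
enum Check {
    /// Index of the first stored positive that cleared the budget.
    Pass { matched_positive: usize },
    /// Index of the closest stored positive, or None for an empty corpus.
    Fail { best: Option<usize> },
}

/// Number of differing bytes, or None when the buffers are not comparable.
fn diff_count(a: &[u8], b: &[u8]) -> Option<usize> {
    if a.len() != b.len() {
        return None;
    }
    Some(a.iter().zip(b).filter(|(x, y)| x != y).count())
}

fn check_any_positive(actual: &[u8], positives: &[Vec<u8>], max_diff: usize) -> Check {
    let mut best: Option<(usize, usize)> = None; // (index, rank)
    for (index, positive) in positives.iter().enumerate() {
        let diff = diff_count(actual, positive);
        if matches!(diff, Some(d) if d <= max_diff) {
            return Check::Pass { matched_positive: index };
        }
        // Incomparable candidates rank last but are still reported.
        let rank = diff.unwrap_or(usize::MAX);
        if best.map_or(true, |(_, r)| rank < r) {
            best = Some((index, rank));
        }
    }
    Check::Fail { best: best.map(|(i, _)| i) }
}
```

An empty corpus falls through the loop and yields `Fail { best: None }`, so a missing baseline can never be mistaken for a pass.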

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Phase 3.8 (verification-design goldens.md § Verification #5). Lands the test
coverage for the `golden::report` TriageReport/TriageCard built in 3.6.

The report base64-inlines the actual / closest-baseline / diff-heatmap PNGs
into one HTML file with three views per card — side-by-side, a pure-JS
opacity-slider overlay, and the diff heatmap — so it opens straight from a CI
artifact with no network and no external asset (offline-first, no SaaS).

RED→GREEN (golden_report.rs, pure-CPU): assert every `src=` is a data URI, no
http(s)/relative/`<script src>` reference, the three views + slug label + JS
slider are present, write() emits the same self-contained file to disk, and
multiple cards accumulate with unique overlay ids.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Phase 3.9 (verification-design goldens.md § Verification #7). Lands the GPU
`#[ignore]` golden lane (tests/goldens.rs) over the persistence machinery from
3.6–3.8, plus the first blessed in-git corpus.

`golden_round_trip_on_real_adapter` is the self-verifying machinery proof
(needs no committed PNG): on the RX 6700 XT it captures a deterministic
rounded-rect, blesses it to a temp corpus, re-captures + asserts it passes at
FuzzBudget::EXACT (the determinism pin makes re-capture bit-identical), then
asserts a deliberately-tampered image FAILS and emits a diff-PNG heatmap + a
self-contained HTML triage report carrying the expected sections (slug,
base64-inlined PNGs, diff-heatmap view, overlay slider, no external URL). The
full bless→pass→fail→report cycle, verified end to end on real hardware.

The committed residue goldens assert against the in-git corpus:
- `golden_ahem_layout_class` double-asserts the box-font collapse — two fresh
  Ahem captures are byte-identical AND equal to the stored positive.
- `golden_sdf_corner` pins the irreducible SDF corner AA rim.

Blessed corpus (reviewed: each PNG decoded + eyeballed as the intended scene):
- rect-rounded: a blue rounded fill on black, 5 distinct colors incl. AA rim
  pixels (genuine SDF corner residue, not a hard rectangle).
- text-ahem: two solid orange em-boxes ("Hi" under the Ahem box-font).
Each ledger records the bless commit + RFC3339 timestamp + (0,0) budget +
reason; PNGs total 44K (well under the 50MB object-store migration trigger).

`.gitattributes` pins `crates/buiy_verify/tests/goldens/**/*.png binary` so the
nested per-key corpus is never eol-converted (mirrors the *.snap pin; the `**`
glob crosses the per-key dirs). Deferred (harness-ready, renderer-blocked): the
drop-shadow-kernel golden (no BoxShadow extract/draw path yet) and the
color-emoji fidelity golden (pinned bundled emoji font).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ll (Phase 4.1-4.3, 4.5)

Coverage-by-construction substrate (coverage.md): a Fixture corpus crossed with
a global Matrix Cartesian product, so adding one fixture auto-enrolls it across
every tier with no test-file edit.

- coverage/fixture.rs: Fixture{name,state,spawn}, the `fixture!` macro emitting
  an inventory::submit!, catalog()/sorted_catalog() over the inventory registry.
- coverage/matrix.rs: Matrix{themes,viewports,forced_colors,dprs}, ThemeAxis,
  Viewport, Cell, ci_default (2×3×2×2 = 24 cells/fixture), cells() Cartesian
  product in stable axis-declaration order, CELL_CEILING_PER_FIXTURE budget.
- coverage/key.rs: CoverageKey (Cell × Fixture) deriving Eq+Hash because dpr is
  the canonical milliscale Dpr, not f32 — so keys (not just stems) collect into
  a HashSet. stem()/from_stem() round-trip losslessly.
- coverage/enroll.rs: build_app (CPU-only deterministic app: cell theme
  installed, synthetic PrimaryWindow sized to viewport×dpr, forced_colors on
  UserPreferences, fixture spawned) + enroll_all over catalog×cells.
- Added Backend::Cpu to the golden Backend enum so CPU (Tiers 1-3) and GPU
  golden cells key off one enum (coverage.md §146).
- fixtures/button/resting.rs: the live Button::new bundle as the catalog row,
  with forced-colors-safe system-color paint inserted (the default Button uses a
  brand token, NOT yet forced-colors-safe — a buiy-widget-catalog-design concern).
- New dep: inventory 0.3 (MIT/Apache-2.0, already in lockfile; deny-clean) and a
  path edge on buiy_widgets (acyclic).

Self-tests (coverage_meta.rs) green: verify_catalog_matches_glob,
verify_keys_unique, verify_cell_count_under_ceiling, enrollment_fan_out,
build_app_pins_viewport_theme_and_dpr.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… (Phase 4.4, 4.6)

Compose the matrix across all tiers + close gate #11's live-catalog half.

Enrollment drivers (Task 4.4) — each a thin enroll_all caller, no per-widget
test code; adding a fixture enrolls it into every tier with zero edits:
- coverage_layout.rs (Tier 1, gate #5): assert_layout_snapshot per cell, plus a
  baseline-free structural guard (version header + names the widget root).
- coverage_display_list.rs (Tier 2): display-list dump per cell at t=0.
- coverage_invariants.rs (Tier 3, gate #12): finite + non-negative-extent
  predicates on the realized live scene per cell.
- coverage_golden.rs (Tier 5, #[ignore] GPU): captures each cell on the real
  adapter, keyed by a GoldenKey derived from the CoverageKey. No PNGs committed
  (blessed on a GPU host).
- 48 CPU-deterministic .snap baselines committed (24 layout + 24 display-list).

Forced-colors live wiring (Task 4.6) — gate #11 over the LIVE catalog:
- coverage/forced_colors.rs: live_catalog_paint()/paint_for_fixtures() derive
  CatalogPaint from the spawned Background/Border/Outline off each fixture's
  Name-tagged root; the analyzers run UNCHANGED, only the input source moves
  from hand-built descriptors to the live tree (closes follow-ups.md:462-473).
- coverage_forced_colors.rs: live_catalog_has_no_forced_colors_violations (the
  production scan), broken_fixture_produces_violation (teeth — a brand-token
  fixture MUST violate, proving the producer reads real paint),
  safe_fixture_produces_no_violation (non-vacuous companion), and an #[ignore]'d
  boxshadow_visual_reftest_is_blocked placeholder documenting the BLOCKED
  BoxShadow draw-skip dependency (follow-ups.md:474-478 — NOT faked green).

Auto-enroll-by-construction proof (Task 4.5 extension): enroll_fixtures seam +
adding_one_fixture_grows_corpus_by_axes asserts adding one fixture grows the
corpus by exactly |axes| (24) cells.

Boundary documented: Buiy's wholesale forced-colors theme swap means no token
resolves in both light and forced themes, so the system-color-safe button
renders the magenta sentinel under the light theme — recorded faithfully in the
*.light.* display-list baselines (the forced-colors-safe default widget is a
buiy-widget-catalog-design concern).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…rustdoc

RUSTDOCFLAGS="-D warnings" cargo doc flagged a redundant-explicit-link in the
golden Backend::Cpu doc and required explicit cross-crate / cross-module paths
for the coverage module's intra-doc references (the `fixture!` macro, the
buiy_core forced-colors analyzers, and the Matrix/CoverageKey/Fixture/ThemeAxis
links from modules that do not `use` those types). Doc-only; no behavior change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ctive

Closes the verification campaign's docs flip. Reconciles every
buiy-verification-design child file against the actually-landed code and
flips the spec draft -> active/landed:

- metric.md: record that pixelmatch is VENDORED (the crate is unusable —
  PNG-stream API, private primitives, image-0.24-bound), not a dep; the
  Diff.saturated field; compare is infallible; corrected crate-choice table.
- snapshots.md: display_list_dump renders color as resolved #rrggbbaa hex
  (ExtractedNode.color is post-theme; no token rendering), NameLookup::from_pairs,
  the trailing-zero-stripping round() rule, assert_display_list_snapshot_at;
  top-layer asserts live in render_paint_order.rs.
- invariants.md: top_layer_paint_rank promotion (systems.rs:3816),
  cosmic_text::Cursor (not a Buiy struct), module = invariant.rs + invariant/.
- reftests.md: two-captures-in-one-App without a capture_scene shape;
  TopLayer-via-Stacking marker; Style-as-Bundle; the SDF cross-check root-cause
  (Border.radius is the consumed radius; linear-blend-over-opaque-black +
  sRGB-encode capture chain); the value-encoded independence caveat.
- determinism.md: status landed; PendingCaptureAssets, VK_DRIVER_FILES,
  MSAA-inert all as-landed; Ahem real font shipped.
- goldens.md: Backend::Cpu added; __-separated slugs; BlessMode + *_in hermetic
  variants; corpus started (rect-rounded, text-ahem); honest GPU-lane state.
- coverage.md: Matrix/enroll_all/CoverageKey final shapes + the live
  forced-colors wiring (with broken-fixture teeth) + the BLOCKED BoxShadow
  visual reftest + the wholesale-swap magenta-sentinel deviation.

Also: README spec entry + reading-order draft->active/landed; foundation
verification.md gates #2/#5/#11/#12 get realization notes (definitions
unchanged); plan status -> landed with a per-phase table; follow-ups.md marks
the stored-PNG golden machinery / metric / determinism / layout snapshots /
proptest invariants / forced-colors live wiring as DONE and records the
DEFERRED set (shadow-kernel/color-emoji goldens, BoxShadow forced-colors visual
reftest, multi-reference aggregation, golden-prune bin, object-store migration,
and the matrix_goldens bless-on-demand GPU-lane gap).

Docs-only; no source touched. Headless gate (fmt/clippy/doc/test) + cargo deny
green; GPU lane green except the pre-existing coverage_golden::matrix_goldens
fail-closed (button corpus un-blessed since a73de05), documented as deferred.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
coverage_golden::matrix_goldens asserted a golden for every cell of ci_default
over every catalog fixture, but Tier-5 goldens are the minimal rasterization
residue (goldens.md § Storage) and only the rect-rounded/text-ahem classes are
blessed. It therefore failed closed on the un-blessed button cells, reddening
the GPU lane (CLAUDE.md: the GPU lane must pass on a GPU host).

Add golden::committed_positives(key); make matrix_goldens bless-on-demand (no
committed baseline ⇒ pending/skipped; a blessed cell still must match on fresh
capture via assert_golden). BUIY_BLESS=1 still spans the full matrix.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ent-pass (HIGH)

The empty-image fast path short-circuited BEFORE the dimension-mismatch check,
so compare(0×0, real) returned a non-saturated zero-Diff and passed every
budget. The golden gate (golden/check.rs) feeds the live capture as arg `a`, so
a render that emitted a 0×0 image was silently accepted against any stored
golden — defeating the saturated sentinel on the exact visual-regression path.

Reorder: dimension-mismatch check first, then the (now equal-dim ⇒ both-empty)
fast-path. Adds an asymmetric both-orders regression test; the prior empty/
mismatch tests only covered equal-dim and empty-vs-empty.
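
The reordering can be sketched as follows. This is a simplified stand-in, not the harness's metric: the real `compare` uses a vendored pixelmatch, whereas this sketch counts raw per-channel deltas. The `Image` struct, the field names on `Diff`, and the two-argument `passes` signature are all assumptions for illustration. The point is only the order of the two early exits.

```rust
/// Hypothetical RGBA8 image; `rgba.len()` is assumed to equal width * height * 4.
struct Image {
    width: u32,
    height: u32,
    rgba: Vec<u8>,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Diff {
    differing_pixels: u64,
    max_channel_delta: u8,
    /// Set when the inputs could not be compared structurally.
    saturated: bool,
}

impl Diff {
    /// A saturated diff never passes, whatever the budget.
    fn passes(&self, max_pixels: u64, max_delta: u8) -> bool {
        !self.saturated
            && self.differing_pixels <= max_pixels
            && self.max_channel_delta <= max_delta
    }
}

fn compare(a: &Image, b: &Image) -> Diff {
    // 1. Dimension mismatch FIRST: a 0x0 capture against a real golden
    //    must saturate instead of reaching the empty fast path.
    if a.width != b.width || a.height != b.height {
        return Diff { differing_pixels: u64::MAX, max_channel_delta: u8::MAX, saturated: true };
    }
    // 2. Dimensions are equal here, so "a is empty" now implies both are empty.
    if a.rgba.is_empty() {
        return Diff { differing_pixels: 0, max_channel_delta: 0, saturated: false };
    }
    let mut differing_pixels = 0u64;
    let mut max_channel_delta = 0u8;
    for (pa, pb) in a.rgba.chunks_exact(4).zip(b.rgba.chunks_exact(4)) {
        // Largest per-channel delta within this pixel.
        let delta = pa.iter().zip(pb).map(|(x, y)| x.abs_diff(*y)).max().unwrap_or(0);
        if delta > 0 {
            differing_pixels += 1;
            max_channel_delta = max_channel_delta.max(delta);
        }
    }
    Diff { differing_pixels, max_channel_delta, saturated: false }
}
```

With the checks in this order, comparing an empty image against a real one is saturated in both argument orders, which is what the asymmetric regression test exercises.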

Found by the fresh-agent quality review of the verification campaign.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… (HIGH)

Matrix::ci_default crosses forced_colors (24 cells) and CoverageKey encodes
fc0/fc1, but GoldenKey had NO forced_colors field, so the GPU-tier golden_key()
mapping collapsed fc=false and fc=true onto one baseline. The two cells produce
DIFFERENT captures (the BoxShadow draw-skip reads UserPreferences::forced_colors),
so once blessed a forced-colors visual regression would silently pass against the
other mode's baseline — the exact hole gate #11 exists to close.

- GoldenKey gains `forced_colors: bool`; slug/from_slug carry an fc0/fc1 token
  (schema is now widget/state/theme__viewport__fc__backend__dpr).
- golden_key() threads cov.forced_colors through; new headless regression test
  golden_key_is_injective_over_the_matrix asserts no two cells share a slug.
- Re-path the 2 committed residue goldens (rect-rounded, text-ahem) to the fc0
  slug; PNG bytes unchanged (captured at default fc=false), ledgers gain
  forced_colors = false.
- Reconcile goldens.md (struct + slug schema + Backend::Cpu drift).

No button golden is committed yet, so fixing the key schema now costs zero
re-baseline. Found by the fresh-agent quality review.
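
A rough sketch of why the extra key axis matters, under stated assumptions: the field types, the exact token spellings for viewport and DPR, and the `slugs_are_injective` helper are hypothetical. Only the `widget/state/theme__viewport__fc__backend__dpr` ordering and the `fc0`/`fc1` token come from the message above.

```rust
use std::collections::HashSet;

#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct GoldenKey {
    widget: String,
    state: String,
    theme: String,
    viewport: (u32, u32),
    forced_colors: bool,
    backend: String,
    dpr_milli: u32,
}

impl GoldenKey {
    /// widget/state/theme__viewport__fc__backend__dpr
    /// (the viewport and dpr token formats here are assumptions).
    fn slug(&self) -> String {
        format!(
            "{}/{}/{}__{}x{}__fc{}__{}__dpr{}",
            self.widget,
            self.state,
            self.theme,
            self.viewport.0,
            self.viewport.1,
            u8::from(self.forced_colors),
            self.backend,
            self.dpr_milli,
        )
    }
}

/// True when every key maps to a distinct slug, i.e. no two cells
/// would share one stored baseline.
fn slugs_are_injective(keys: &[GoldenKey]) -> bool {
    let mut seen = HashSet::new();
    keys.iter().all(|k| seen.insert(k.slug()))
}
```

Without the `fc` token, two keys differing only in `forced_colors` would produce the same slug, and the injectivity check would fail; with it, each mode gets its own baseline path.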

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Two determinism holes the fresh-agent review reproduced — both bite exactly the
patterns a scaling app introduces:

1. Per-timestamp display-list snapshots ran on a WALL-CLOCK virtual clock.
   build_app adds TimePlugin but never pinned TimeUpdateStrategy, so each
   app.update() advanced Time<Virtual> by the wall-clock delta — the captured
   frame's logical time was t + accumulated-wall-clock, non-reproducible (and
   advance_virtual_to's checked_sub silently underflowed to ZERO once drift
   exceeded a step). assert_display_list_snapshot_at now pins
   ManualDuration(ZERO) so advance_virtual_to is the SOLE clock driver.
   Regression: wall_clock_does_not_leak_into_the_per_timestamp_clock (phase (a)
   proves the leak is real, so the test isn't a tautology).

2. Same-Name sibling sort tiebroke on Entity::index() (spawn-order dependent),
   so list rows all Name::new("row") dumped in spawn order — a flaky snapshot,
   the worst failure mode for a verification harness. Both the Tier-1 layout
   sort and the Tier-2 display-list extract now tiebreak by CONTENT (position
   then size via f32::total_cmp); genuinely-indistinguishable siblings
   (same name+box) fail loudly rather than emit a flaky dump.
   Regression: dump_is_invariant_for_same_name_siblings (the existing
   determinism test used UNIQUE names, so it never hit this).
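
The content tiebreak from item 2 might look roughly like this. The `Row` struct and both function names are hypothetical; the sketch only illustrates ordering same-named siblings by position then size via `f32::total_cmp`, and failing loudly when two siblings cannot be told apart.

```rust
use std::cmp::Ordering;

/// Hypothetical flattened sibling: a name plus its laid-out box.
#[derive(Clone, Debug, PartialEq)]
struct Row {
    name: String,
    x: f32,
    y: f32,
    w: f32,
    h: f32,
}

/// Order by name, then position, then size. No entity index is consulted,
/// so the result does not depend on spawn order.
fn content_order(a: &Row, b: &Row) -> Ordering {
    a.name
        .cmp(&b.name)
        .then(a.x.total_cmp(&b.x))
        .then(a.y.total_cmp(&b.y))
        .then(a.w.total_cmp(&b.w))
        .then(a.h.total_cmp(&b.h))
}

/// Sort for a deterministic dump; reject siblings with identical name and box
/// rather than emit them in an arbitrary order.
fn sort_for_dump(mut rows: Vec<Row>) -> Result<Vec<Row>, String> {
    rows.sort_by(content_order);
    for pair in rows.windows(2) {
        if content_order(&pair[0], &pair[1]) == Ordering::Equal {
            return Err(format!("indistinguishable siblings named {:?}", pair[0].name));
        }
    }
    Ok(rows)
}
```

Feeding the same rows in two different spawn orders produces the same sorted output, which is the property the regression test pins.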

Found by the fresh-agent quality review.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…DIUM)

Positive.budget is written by bless(), persisted to TOML, round-trip-tested and
documented (ledger.rs: "the budget this positive is asserted against") as the
per-fixture widened budget a baseline is matched under — but check_golden gated
with the caller's FuzzBudget parameter and never read positive.budget. The
documented per-fixture widened-budget workflow was therefore inert: an SDF/shadow
positive blessed with a widened tolerance would be re-checked at the caller's
(often EXACT) budget and spuriously fail.

- check_golden_in now gates positive i against ledger.positives[i].budget; the
  caller's check-time budget remains the budget recorded when blessing a NEW
  positive.
- The failure triage card reports the closest positive's own budget (which bar
  was missed), not the caller's.
- Regression: positive_is_gated_by_its_own_recorded_widened_budget.
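
A simplified sketch of the gating change, assuming hypothetical shapes for `FuzzBudget`, `Diff`, and `Positive` (the real ledger types are not shown in this message). Each positive is judged against the budget recorded with it; the diffs are passed in precomputed to keep the example self-contained.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct FuzzBudget {
    max_differing_pixels: u64,
    max_channel_delta: u8,
}

/// The (0,0) budget: no pixel may differ.
const EXACT: FuzzBudget = FuzzBudget { max_differing_pixels: 0, max_channel_delta: 0 };

#[derive(Clone, Copy, Debug)]
struct Diff {
    differing_pixels: u64,
    max_channel_delta: u8,
}

/// Hypothetical ledger entry: the budget this positive was blessed under.
struct Positive {
    budget: FuzzBudget,
}

fn within(diff: &Diff, budget: FuzzBudget) -> bool {
    diff.differing_pixels <= budget.max_differing_pixels
        && diff.max_channel_delta <= budget.max_channel_delta
}

/// `diffs[i]` is the capture compared against positive `i`.
/// Returns the index of the first positive the capture matches,
/// gating each one with ITS OWN recorded budget.
fn check_golden(diffs: &[Diff], positives: &[Positive]) -> Option<usize> {
    diffs
        .iter()
        .zip(positives)
        .position(|(diff, positive)| within(diff, positive.budget))
}
```

In this sketch a positive blessed with a widened budget accepts a small anti-aliasing difference that an `EXACT` check would reject, which is the workflow the ledger field documents.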

Latent today (both committed ledgers store (0,0)=EXACT, equal to the caller
budget). Found by the fresh-agent quality review.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
evaluate_outcome returned !diff.passes(fuzz) for a Mismatch, so a saturated diff
(dimension mismatch — a structural capture error) made the Mismatch pass
vacuously: !false == true. A broken capture must FAIL both kinds, never be
mistaken for a legitimate render difference. Early-return false when saturated.

Latent today (run_reftest captures both images from one app at a fixed shared
viewport, so the dimension-mismatch branch never fires), but a real invariant
gap. Regression: saturated_diff_fails_both_kinds. Found by the fresh-agent review.
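
The early return can be illustrated with a small sketch. The `Kind` enum, the reduced `Diff`, and the single-number budget are assumptions for illustration; only the rule "a saturated diff fails both kinds" comes from the message above.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Kind {
    /// The two renders are expected to be equal within the budget.
    Match,
    /// The two renders are expected to differ.
    Mismatch,
}

#[derive(Clone, Copy, Debug)]
struct Diff {
    differing_pixels: u64,
    /// Structural capture error, e.g. a dimension mismatch.
    saturated: bool,
}

fn evaluate_outcome(kind: Kind, diff: &Diff, max_differing_pixels: u64) -> bool {
    // A broken capture is never evidence of a legitimate difference,
    // so it fails Match AND Mismatch.
    if diff.saturated {
        return false;
    }
    let within_budget = diff.differing_pixels <= max_differing_pixels;
    match kind {
        Kind::Match => within_budget,
        Kind::Mismatch => !within_budget,
    }
}
```

Without the early return, the `Mismatch` arm would negate a failed budget check on a saturated diff and report success.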

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…24 (maintainability)

The Tier-1/Tier-3 enrollment tests asserted cells.get() == 24, which a SECOND
fixture would redden — breaking the central 'zero test edits to add a fixture'
guarantee the coverage-by-construction design exists to provide. Derive the
expected count from sorted_catalog().len() * cells_per_fixture(); the literal 24
stays pinned in exactly one place (matrix.rs's cells_per_fixture unit test).

Found by the fresh-agent quality review.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ss test

The fresh-agent review's headline quality finding was 'docstrings oversell what
the tests guarantee'. Reconcile each (doc-as-deliverable):

- transform_roundtrips: was 'a transposed factor reds this' — it is blind to
  INTER-factor order (each relation uses one non-identity factor); note the scope
  and point at buiy_core's compose-order unit tests that DO pin it.
- scene.rs: was 'can never diverge from what the engine paints' — bound it to the
  generated domain and record the PositionKind (tier-2 positioned/auto-z)
  generator-coverage gap.
- matrix_goldens: the vacuity guard's message read like a non-vacuousness check;
  make the 0-compared case loud (green != covered) and annotate the guard.
- invariants.md: the paint-order stability clause is inexpressible at the
  predicate's input boundary (the stable sort already ran); record the waiver.
- reftests.md: mark RefCase::multi OR/AND aggregation DEFERRED (already in
  follow-ups; single-reference covers current pairings).

Also: give the determinism gate's first probe headless teeth — a unit test that
quiescence_unmet blocks on an unloaded required asset (condition 1), so a
vacuous-check regression there fails without a GPU. follow-ups.md gains the
PositionKind, quiescence conditions-2-4, and CPU-SDF-oracle-numeric-pin gaps.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
One-shot report capturing the fresh-agent cold-context review of the landed
buiy_verify harness: 7 confirmed bugs (2 high) + 1 maintainability trap found and
fixed TDD, the doc-overstatement theme reconciled, 3 coverage gaps deferred.
Indexed in docs/README.md under Reports.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Injected real one-line bugs into buiy_core PRODUCTION code and confirmed the gate
goes RED (reverted each):
- layout +7px position    -> RED via Tier-1 layout snapshot
- color red-channel kill   -> RED via Tier-2 display-list snapshot
- paint-order reversal     -> RED via buiy_core z_index_* tests (NOT the new
                              Tier-3 invariant)

Two honest findings recorded:
- A color R<->B swap was initially missed because the button fixture's colors
  (white, magenta sentinel) are symmetric under R<->B; an asymmetric kill was
  caught. Fixture-coverage note, not a harness defect.
- A production paint-order bug is invisible to the Tier-3 invariant because
  scene.rs::realize re-implements the painters_z assembly (sub-pass 6f) instead
  of calling it. Caught by buiy_core's z_index_* tests + the GPU golden tier.
  Added a hardening follow-up (make realize CALL the production assembly).
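
The first finding is easy to reproduce in isolation. The sketch below is illustrative only: the `Rgba` alias, the fault functions, and `detects` are hypothetical, and the orange value is an arbitrary asymmetric color. It shows that a palette of white and magenta is a fixed point of an R<->B swap, so no snapshot over those colors can notice it.

```rust
type Rgba = [u8; 4];

/// Injected fault: swap the red and blue channels.
fn swap_red_blue(c: Rgba) -> Rgba {
    [c[2], c[1], c[0], c[3]]
}

/// Injected fault: zero the red channel.
fn kill_red(c: Rgba) -> Rgba {
    [0, c[1], c[2], c[3]]
}

/// A fault is observable only if it changes at least one color the fixture paints.
fn detects(fault: fn(Rgba) -> Rgba, palette: &[Rgba]) -> bool {
    palette.iter().any(|&c| fault(c) != c)
}

const WHITE: Rgba = [255, 255, 255, 255];
const MAGENTA: Rgba = [255, 0, 255, 255];
/// Arbitrary color with R != B, standing in for an asymmetric fixture color.
const ORANGE: Rgba = [255, 165, 0, 255];
```

With `[WHITE, MAGENTA]` the swap goes unnoticed while the red-channel kill is caught; adding `ORANGE` to the palette makes the swap observable too.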

Documents the fault-injection pass in the adversarial-review report + follow-ups.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Future LLMs (and humans) had no USAGE guide for the verification harness — only
the design spec (target state) and the build plan (historical). The crate root
doc was itself stale ("Phase 0 ships the perceptual metric..."). Add:

- .claude/skills/using-buiy-verification/SKILL.md — the task-oriented how-to:
  tier-selection rule, add a fixture (+ the #[path] mod wiring step), write each
  tier's test, the reftest! / fixture! macro syntax, the BUIY_BLESS golden
  workflow, the headless vs GPU --ignored gates, and the gotchas that each cost
  a real bug (same-Name siblings, asymmetric fixture colors, forced_colors key
  axis, the realize-mirror paint-order blind spot, saturated-diff loud-fail).
- crates/buiy_verify/src/lib.rs — rewrite the stale crate doc into an accurate
  five-tier map with entry-point intra-doc links (cargo doc -D warnings clean)
  pointing at the skill + spec + report.
- CLAUDE.md — a Code Conventions pointer so it's discoverable every session.

Accuracy adversarially verified by a fact-check workflow against the code (4
agents + synthesis): it caught a wrong fixture path (tests/fixtures -> fixtures)
and three overstatements (harness-"enforced" Camera2d/Name -> contract; "zero
central-list edits" -> one #[path] mod line; the GPU-golden paint-order catch is
potential, not current) — all corrected before commit.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… branch

Integrates the text-editing campaign (E2–E6: input/keymap, caret/selection,
clipboard/undo, IME, lifecycle) + the README refresh that landed on main while
the visual-bug verification work was in flight.

Conflicts resolved (both additive):
- crates/buiy_core/Cargo.toml [dev-dependencies]: keep BOTH proptest (theirs)
  and the buiy_verify dev-only cycle edge (ours).
- docs/plans/follow-ups.md: keep our verification follow-up entries + their
  text-editing follow-up sections; merge the closing Owner/Spec-touchpoint.

Integration fix: origin/main's three new #[ignore] GPU re-capture tests
(text_caret_selection_e3_gpu, text_placeholder_gpu, text_ime_preedit_gpu) call
buiy_core::perceptual_diff, which this branch DEPRECATED — so clippy
--all-targets -D warnings would reject them. Applied the same #![allow(deprecated)]
interim policy the four sibling GPU golden suites already use (migration to
buiy_verify::metric::compare tracked in follow-ups.md).

Gate on the merged tree: clippy --workspace --all-targets -D warnings clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
DeterministicApp::build() instantiates the capture render stack
(capture_app_scaled → RenderPlugin), which REQUIRES a wgpu adapter. Three
determinism_build.rs tests called build() but were NOT #[ignore], despite the
file claiming HEADLESS — so they ran in the every-PR gate. They passed anywhere
with an adapter (local GPU, macOS/Windows CI) but panicked 'Unable to find a
GPU!' on adapter-less Linux CI, the gate that must stay green without one. A
local GPU masked this; the Ubuntu CI runner caught it.

Split the file: config-level knob checks (default/override DPR, font mode, the
MSAA constant) stay HEADLESS via .config() (no build()); the built-app
observables (window scale_factor, the manual TimeUpdateStrategy) move to
#[ignore] GPU-lane tests. Verified: headless 4 passed/3 ignored (no adapter
touched), --ignored 3 passed on RX 6700 XT.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The install-mesa action defaulted to ci-build-tag build19 + mesa-version 24.3.4,
but that release pairing does not exist: gfx-rs/ci-build build19 carries
mesa-24.2.3; mesa-24.3.4 ships under build20. So the GPU (pinned lavapipe) CI
lane 404'd at the Mesa download and has never actually run.

Fix the tag to build20 (keeping mesa-24.3.4 so lavapipe pixel output — what the
stored goldens are blessed against — does not shift). Verified the corrected URL
returns HTTP 200, the broken one 404. Documented the tag↔version pairing gotcha.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The GPU (pinned lavapipe) lane now installs Mesa successfully (prior commit), so
for the first time it reaches `cargo test`. The release `wgpu-info` build + the
large bevy #[ignore] GPU test binaries + the pinned Mesa exhaust the ~14 GB
ubuntu-runner disk: 'No space left on device'. Add the standard free-disk-space
step (remove preinstalled dotnet/android/ghc/CodeQL/boost/toolcache + prune
docker images, ~25 GB reclaimed) before the rust-cache restore and the compiles.
Inline rm -rf, no new third-party action.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
After the disk fix the GPU lane reached the tests and the buiy_core #[ignore]
suite passed (goldens included — the residue corpus matches on CI's pinned
lavapipe 24.3.4). But linking buiy_verify's large bevy test binaries then crashed
'ld terminated with signal 7 [Bus error], core dumped' — the GPU lane builds
wgpu-info (release) + runs the buiy_core GPU suite first, so it has far less
memory/disk headroom than the plain Test job when it gets to that link.

Debug info is the bulk of a bevy test binary's link size, and GPU
pixel/invariant checks don't need backtraces. Add -C debuginfo=0 (preserving
-D warnings) for the GPU job only — shrinks both the link memory and the disk
footprint. The other jobs keep full debuginfo.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The SDF-corner-AA residue golden was blessed on dev hardware (RX 6700 XT) but is
keyed to CI's pinned lavapipe, which rasterizes the corner AA differently
(perceptually identical, differing_pixels=0, but max_channel_delta=35) — so it
fails EXACT on CI. The determinism design requires goldens captured ON the pinned
rasterizer.

Add a sentinel-gated job (runs while BLESS_GOLDENS_NOW exists) that drops the
dev-hardware positives and re-blesses fresh single-positive lavapipe captures,
uploaded as the lavapipe-blessed-goldens artifact. Sentinel-gated (not
workflow_dispatch) because dispatch requires the workflow on the default branch.
Next: download the artifact, commit the lavapipe goldens, remove the sentinel +
this job.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…l if)

The previous attempt gated the job on `if: hashFiles('BLESS_GOLDENS_NOW')`, but
hashFiles is not allowed in a job-level if — GitHub rejected the whole workflow
('workflow file issue'), so no jobs/checks ran. Drop the gate (and the sentinel
file); the job is temporary and will be removed once the lavapipe goldens are
committed, so unconditional is fine — it only writes the runner's ephemeral
checkout + uploads an artifact, never commits back.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ere)

The rect-rounded SDF-corner-AA golden was blessed on dev hardware (RX 6700 XT)
but is keyed to CI's pinned lavapipe, which rasterizes the corner AA differently
(perceptually identical — differing_pixels=0 — but max_channel_delta=35), so it
failed EXACT on CI. The determinism design requires goldens captured ON the
pinned rasterizer.

Re-captured both residue goldens on CI's lavapipe (Mesa 24.3.4) via the temporary
bless-goldens job and committed the artifact: the rect-rounded SDF PNG now
matches lavapipe EXACT; the text-ahem PNG is byte-identical (Ahem boxes are
rasterizer-invariant — only its ledger provenance updated). Removed the
now-done temporary bless-goldens CI job.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@intendednull intendednull merged commit 62b000d into main Jun 16, 2026
7 checks passed