
Cross-backend golden-image (visual regression) test harness#4236

Draft
mattleibow wants to merge 13 commits into main from mattleibow-dev-golden-image-tests

mattleibow commented Jun 24, 2026


Fixes #4295

Summary

This PR adds a holistic golden-image (visual regression) test harness to the main test suite. It renders a set of deterministic scenes through every available SkiaSharp backend (renderer) and diffs the result against committed golden PNGs with a strict per-renderer tolerance. Each (renderer × scene) pair is one xUnit theory cell.

The harness lives under tests/Tests/SkiaSharp/Visual/ and runs in-process inside the test runners we already ship — no separate render-host app, no Playwright/adb/simctl orchestration. Because the portable code is linked by the shared SkiaSharp.Tests project, the same matrix compiles and runs in SkiaSharp.Tests.Console (desktop), SkiaSharp.Tests.Devices (MAUI: Android/iOS/Mac Catalyst), and SkiaSharp.Tests.Wasm (browser).

Backends that need an extra NuGet package (Vulkan → SharpVk, Direct3D → Vortice) live in SkiaSharp's existing satellite host projects (SkiaSharp.Vulkan.Tests.Console, SkiaSharp.Direct3D.Tests.Console) instead of the base host, so that dependency never reaches the MAUI/WASM builds. They share the same engine via a small VisualMatrixTestsBase and a ~15-line driver class — see Project structure.

Important

This is a draft handoff. The framework, discipline, seeding tooling, CI provisioning, and the full macOS golden set are done and verified locally. What remains is mechanical and can only be done on CI/other platforms: harvest and commit the non-macOS goldens, and (optionally) add the device/WASM GPU renderers. See Remaining work below.

Why this exists

The in-flight Graphite backend PR (#3968) carries its own ad-hoc visual harness. This PR supersedes that infra with a clean, reusable one on main, so #3968 can rebase and only add its Graphite renderer classes + golden PNGs — no new test/CI/infra. See The Graphite seam.


What's in the box

Architecture

ISkiaScene.Draw(canvas) ─▶ IRenderer.RenderAsync ─▶ byte[] RGBA8888/Premul
                                                        │
              emit ##SKIA-GOLDEN-IMAGE## marker (PNG) into the test results (TRX)
                                                        │
   GoldenStore.TryLoad: {renderer}.{platform} ▸ {renderer}
                                                        │
                              SKPixelComparer.Compare(golden, actual, tolerance)
                                                        │
   pass  │  FAIL (out of tolerance, OR unseeded — captured PNG is in the TRX)  │  Skip (backend genuinely absent)
| Piece | Type | Role |
| --- | --- | --- |
| Scene | ISkiaScene | Deterministic draw op; same bytes every run on a backend |
| Renderer | IRenderer | Renders a scene through one backend → RGBA8888/premul pixels |
| Catalogs | SceneCatalog, RendererCatalog | Reflection-discover every public parameterless scene/renderer |
| Engine | VisualMatrixTestsBase | Shared per-cell pipeline (render → emit → compare-or-fail); reused by every host |
| Matrix | VisualMatrixTests | [Theory] over the catalog product in the base assembly; calls the engine |
| Comparison | SKPixelComparer (extended) | Tolerance-aware per-channel diff + colored diff image |
| Tolerance | GoldenTolerance | Per-renderer + per-(renderer, scene) tolerance |
| Golden I/O | GoldenStore | Resolves/loads goldens (read-only); encodes captures to PNG |
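The catalogs' "reflection-discover every public parameterless scene/renderer" rule can be sketched in Python, where walking `__subclasses__()` stands in for .NET reflection. The scene names mirror this PR, but `Scene`, `discover`, and the underscore-means-non-public convention are illustrative assumptions, not the harness's C# code. Sorting the result keeps theory rows in a stable order from run to run.

```python
class Scene:
    """Hypothetical stand-in for the C# ISkiaScene interface."""

    def draw(self, canvas):
        raise NotImplementedError


class DiagonalLines(Scene):
    def draw(self, canvas):
        canvas.append("lines")


class FilledCircle(Scene):
    def draw(self, canvas):
        canvas.append("circle")


class _ScratchScene(Scene):
    """Underscore prefix stands in for 'not public', so it is skipped."""


class SizedScene(Scene):
    """Needs a constructor argument, so it is not discoverable."""

    def __init__(self, size):
        self.size = size


def all_subclasses(base):
    # Depth-first walk of every (transitive) subclass of `base`.
    for sub in base.__subclasses__():
        yield sub
        yield from all_subclasses(sub)


def discover(base):
    """Return {name: class} for every public subclass constructible with no arguments."""
    found = {}
    for cls in all_subclasses(base):
        if cls.__name__.startswith("_"):
            continue
        try:
            cls()  # probe for a parameterless constructor
        except TypeError:
            continue
        found[cls.__name__] = cls
    return dict(sorted(found.items()))
```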

Scenes (5)

DiagonalLines, FilledCircle, RedRoundedRectOnWhite, GradientBlend, Text (uses a bundled Roboto font from tests/Content/fonts — no system fonts, so it's deterministic).

Renderers (4 today)

| Renderer | Where it runs | How it gets its context |
| --- | --- | --- |
| raster | everywhere (shared) | CPU, always available |
| ganesh-metal | macOS Console + iOS/Mac Catalyst device hosts (shared, Apple-gated) | in-process Metal |
| ganesh-gl | Console only (desktop) | reuses the existing tests/Tests/SkiaSharp/GlContexts/ abstraction (CGL/GLX/WGL) |
| ganesh-vulkan | SkiaSharp.Vulkan.Tests.Console satellite (Linux/Windows) | headless SharpVk; bring-up failure → skip, EntryPoint/MissingMethod → fail |
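The ganesh-vulkan split (bring-up failure → skip, EntryPoint/MissingMethod → fail) has a close analogue in Python's ctypes: a library that cannot be loaded raises OSError, while a missing export raises AttributeError. The sketch below maps only the first to a skip. `RendererUnavailable` and `load_backend` are hypothetical names, and the real renderer probes through SharpVk rather than ctypes.

```python
import ctypes


class RendererUnavailable(Exception):
    """Hypothetical analogue of RendererUnavailableException: backend absent, so skip."""


def load_backend(library_name, required_symbols):
    """Load a native backend library and check the exports a renderer needs."""
    try:
        lib = ctypes.CDLL(library_name)
    except OSError as exc:
        # No loader/driver on this host: the backend is genuinely absent.
        raise RendererUnavailable(f"{library_name}: {exc}") from exc
    for symbol in required_symbols:
        # Deliberately not caught: a missing export raises AttributeError and,
        # like EntryPointNotFoundException, must fail the cell rather than skip.
        getattr(lib, symbol)
    return lib
```

On a Linux agent the Vulkan loader is usually `libvulkan.so.1`; that name is given only as an example of what `library_name` would be.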

Project structure — base host vs. satellites

The base SkiaSharp.Tests.Console must not take a SharpVk/Vortice dependency: it is the host the MAUI-device and WASM builds consume, and pulling a GPU-vendor package into it would bloat or break those. SkiaSharp already ships dedicated satellite projects for exactly these backends, both already built and run by tests-netcore:

| Project | Adds | Visual renderers it hosts |
| --- | --- | --- |
| SkiaSharp.Tests.Console (base) | (none) | raster, ganesh-gl (reuses in-repo GlContexts/), ganesh-metal (Apple-gated) |
| SkiaSharp.Vulkan.Tests.Console | SharpVk | ganesh-vulkan (+ future graphite-vulkan) |
| SkiaSharp.Direct3D.Tests.Console | Vortice | direct3d (future) |

The shared VisualMatrixTests [Theory] only runs in the assembly that compiles it (the base host), driving the renderers auto-discovered there. A satellite adds a thin VulkanVisualTests : VisualMatrixTestsBase that iterates RendererCatalog.NamesIn(Assembly.GetExecutingAssembly()) × scenes — i.e. only the renderers compiled into that satellite — so raster/GL/Metal are never double-run, and a new Vulkan-family renderer (e.g. Graphite's) joins automatically with no test-class edit. ganesh-gl/ganesh-metal stay in the base host because they need no extra package (GL reuses the existing GlContexts/ abstraction; Metal is in-process P/Invoke).
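A sketch of how NamesIn(Assembly) keeps each host's theory rows disjoint, in Python with plain strings standing in for assemblies. It models the satellite process, where both the base and Vulkan assemblies are loaded; `RendererInfo`, `names_in`, `theory_data`, and the assembly strings are illustrative, not the C# API. The base host ends up with 3 × 5 = 15 cells, matching the count reported in this PR, and the satellite with only its 5 ganesh-vulkan cells.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class RendererInfo:
    name: str
    assembly: str  # stands in for the System.Reflection.Assembly that declares it


# What the catalog sees inside the Vulkan satellite, where both assemblies are loaded.
CATALOG = [
    RendererInfo("raster", "SkiaSharp.Tests"),
    RendererInfo("ganesh-gl", "SkiaSharp.Tests"),
    RendererInfo("ganesh-metal", "SkiaSharp.Tests"),
    RendererInfo("ganesh-vulkan", "SkiaSharp.Vulkan.Tests"),
]
SCENES = ["DiagonalLines", "FilledCircle", "GradientBlend", "RedRoundedRectOnWhite", "Text"]


def names_in(assembly):
    """Renderer names declared in one assembly (the NamesIn filter)."""
    return [r.name for r in CATALOG if r.assembly == assembly]


def theory_data(assembly):
    """(renderer, scene) rows for the test class compiled into `assembly`."""
    return list(product(names_in(assembly), SCENES))
```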

Golden storage & lookup — two layers, generalizing over platform only

tests/Content/Goldens/
  {renderer}.{platform}/{scene}.png   ← per-platform override (this OS/driver diverges)
  {renderer}/{scene}.png              ← the renderer's golden, shared across platforms

GoldenStore.TryLoad resolves the per-platform path first, then the shared {renderer}/ path. {platform} is a short VisualPlatform.Tag (macos/windows/linux/android/ios/maccatalyst/tvos/browser).

  • The fallback never crosses renderers — a missing GPU golden is never satisfied by the CPU baseline (that aliases GPU-vs-CPU antialiasing and hides regressions). The "shared" layer is per renderer, at {renderer}/.
  • {renderer}/ is the cross-platform share for the common case (CPU raster + software GL are deterministic across OSes/arches); a genuinely divergent OS/driver gets a {renderer}.{platform}/ override that wins.

Goldens live under tests/Content/Goldens/ so they ride the existing Content pipeline (Console copies them next to the binary; Devices/Wasm embed them as resources). No per-project globbing.
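The two-layer resolution order can be sketched in Python over a plain directory tree. This models only the on-disk (Console) case; the Devices/Wasm hosts read the same layout from embedded resources, which is not shown. `golden_candidates` and `try_load` are hypothetical names, not GoldenStore's members. The candidate list never names another renderer, which is what stops a GPU cell from being satisfied by a CPU golden.

```python
from pathlib import Path


def golden_candidates(root, renderer, platform, scene):
    """Lookup order: per-platform override first, then the renderer's shared golden."""
    root = Path(root)
    return [
        root / f"{renderer}.{platform}" / f"{scene}.png",
        root / renderer / f"{scene}.png",
    ]


def try_load(root, renderer, platform, scene):
    """Return the golden PNG bytes, or None when the cell is unseeded."""
    for path in golden_candidates(root, renderer, platform, scene):
        if path.is_file():
            return path.read_bytes()
    return None
```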

Failure discipline (the property this harness exists to guarantee)

A cell skips only when the backend is genuinely absent: IRenderer.IsAvailable == false, or RenderAsync throws RendererUnavailableException (the runtime probe found no device/driver/context). Everything else is a hard failure, including:

  • a render that throws anything else (incl. EntryPointNotFoundException/MissingMethodException from a broken binding),
  • pixels out of tolerance against a golden that exists,
  • an unseeded cell — the backend ran and produced pixels but no golden is committed. This is a fail, not a skip (a green here would be a coverage hole). It's safe to enforce because the captured PNG is already in the TRX to harvest (next section).

There is no path that downgrades a real regression to a skip or a warning.
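The discipline reduces to a small decision function. The Python sketch below is a hypothetical model, not VisualMatrixTestsBase itself: `renderer` is any object with `is_available` and `render(scene)`, and `load_golden`/`compare` are injected callables. Skips come from exactly two places; every other exception propagates, which a test runner reports as a failure.

```python
class RendererUnavailable(Exception):
    """Hypothetical analogue of RendererUnavailableException."""


def run_cell(renderer, scene, load_golden, compare):
    """Return ("pass" | "fail" | "skip", reason) for one renderer x scene cell."""
    if not renderer.is_available:
        return ("skip", "backend not available on this host")
    try:
        pixels = renderer.render(scene)
    except RendererUnavailable as exc:
        # Runtime probe found no device/driver/context: genuinely absent.
        return ("skip", str(exc))
    # Any other exception (e.g. a broken binding) propagates and fails the cell.
    golden = load_golden()
    if golden is None:
        # Pixels exist but there is nothing to compare against: a coverage hole, so fail.
        return ("fail", "unseeded: harvest the captured PNG from the TRX")
    if not compare(golden, pixels):
        return ("fail", "pixels out of tolerance")
    return ("pass", "")
```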

Seeding goldens — harvest from the test results (TRX)

There is no in-process record mode and no environment variable. Every cell emits its rendered PNG into the test log on pass and fail as a single XML-safe line:

##SKIA-GOLDEN-IMAGE## path={renderer}.{platform}/{scene}.png size=WxH base64=<png bytes>

The TRX is the one output channel that exists uniformly on every host — desktop, MAUI device, and WASM — including the device/browser hosts where the filesystem is sandboxed/embedded and an in-process write-to-source-tree is impossible. So seeding is:

# 1. run with a TRX report (from the test build output dir):
./SkiaSharp.Tests --filter-trait "Category=Visual" --report-trx --report-trx-filename visual.trx
# 2. harvest the markers into tests/Content/Goldens and commit (from repo root):
python3 scripts/infra/tests/extract-visual-goldens.py path/to/visual.trx --dry-run   # preview
python3 scripts/infra/tests/extract-visual-goldens.py path/to/visual.trx             # write
git add tests/Content/Goldens && git commit
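The marker is plain text inside the TRX's captured output, so harvesting comes down to a regex scan plus a base64 decode. A Python sketch of both directions, assuming the exact `path=... size=WxH base64=...` field order shown above; `emit_marker` and `parse_markers` are illustrative names, not functions from extract-visual-goldens.py. Base64 output never contains `<`, `&`, or `>`, which is why the line survives XML embedding unchanged.

```python
import base64
import re

MARKER = re.compile(
    r"##SKIA-GOLDEN-IMAGE## path=(?P<path>\S+) "
    r"size=(?P<w>\d+)x(?P<h>\d+) "
    r"base64=(?P<data>[A-Za-z0-9+/]+=*)"
)


def emit_marker(path, width, height, png_bytes):
    """Format one single-line, XML-safe marker for the test log."""
    data = base64.b64encode(png_bytes).decode("ascii")
    return f"##SKIA-GOLDEN-IMAGE## path={path} size={width}x{height} base64={data}"


def parse_markers(trx_text):
    """Map each marker's relative golden path to (width, height, png_bytes)."""
    found = {}
    for m in MARKER.finditer(trx_text):
        found[m["path"]] = (int(m["w"]), int(m["h"]), base64.b64decode(m["data"]))
    return found
```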

A new cell fails as unseeded on its first run; after harvest+commit it compares strictly and goes green. To share a golden across platforms, move byte-identical PNGs up to {renderer}/; the harvest then skips re-creating the per-platform copy whenever an existing {renderer}/ golden is byte-identical, so the promotion sticks.
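The byte-aware shared-skip is what makes a promotion stick: before writing `{renderer}.{platform}/{scene}.png`, the harvest checks whether `{renderer}/{scene}.png` already holds the same bytes. A minimal Python sketch of that decision; `harvest_target` is a hypothetical name, and it assumes renderer names never contain a dot (true of raster, ganesh-gl, ganesh-metal, and ganesh-vulkan), so the platform is everything after the last dot.

```python
from pathlib import Path


def harvest_target(root, rel_path, png):
    """Return where a harvested capture should be written, or None to skip it.

    rel_path is the marker's path, e.g. "raster.linux/FilledCircle.png".
    """
    root = Path(root)
    platform_dir, scene_file = rel_path.split("/", 1)
    renderer = platform_dir.rsplit(".", 1)[0]  # "raster.linux" -> "raster"
    shared = root / renderer / scene_file
    if shared.is_file() and shared.read_bytes() == png:
        return None  # identical to the shared golden: keep the promotion
    return root / platform_dir / scene_file
```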

CI wiring (scripts/azure-templates-stages-test.yml)

  • Software GPU on the Linux .NET Core agent: installs xvfb mesa-utils libgl1-mesa-dri mesa-vulkan-drivers vulkan-tools, starts Xvfb, and pins GL/Vulkan to Mesa's software rasterizers (LIBGL_ALWAYS_SOFTWARE=1 + llvmpipe + lavapipe VK_ICD_FILENAMES). This makes ganesh-gl/ganesh-vulkan actually render on a headless agent so their captures can be harvested. Fail-safe: if any piece is missing, context creation throws RendererUnavailableException and the cell skips — a provisioning gap never turns into a red build. (Bonus: the existing GRContextTest/GRGlInterfaceTest GL tests now also exercise llvmpipe.)
  • Existing PublishTestResults + the testlogs_* artifacts carry the .trx files back for harvesting — no extra collection step.
  • The matrix runs inside the existing stages (tests-netcore, tests-android, tests-ios, tests-maccatalyst, tests-wasm). There is no dedicated visual stage and no tests-visual build target.

What's verified vs. not

Verified locally (macOS Console, net10 arm64):

  • SkiaSharp.Tests.Console + shared SkiaSharp.Tests.csproj build clean.
  • Base SkiaSharp.Tests.Console matrix → 15 pass / 0 skip (raster + ganesh-gl + ganesh-metal × 5 scenes).
  • SkiaSharp.Vulkan.Tests.Console satellite → 5 ganesh-vulkan cells skip cleanly (no Vulkan ICD on the box), driven by the thin VulkanVisualTests. --filter-trait "Category=Visual" works both directions in both hosts.
  • extract-visual-goldens.py against a real TRX: all 15 markers parsed with correct two-layer paths; harvest is byte-idempotent (re-harvest → git status clean); the byte-aware shared-skip works (a promoted raster/FilledCircle.png makes the per-platform copy skip while GPU cells with different bytes still write).

NOT verifiable on this macOS box (needs CI / other platforms):

  • Linux Mesa/lavapipe provisioning actually lighting up ganesh-gl/ganesh-vulkan.
  • Whether a single {renderer}/ raster golden is portable across arches within tolerance, or whether some scenes need per-platform raster goldens.
  • Windows / Android / iOS / Mac Catalyst / WASM cells (all currently unseeded → will fail until their goldens are harvested on their agents — by design).

Remaining work (handoff checklist)

The framework is complete. The rest is seeding goldens per platform (mechanical, but must run on each platform's CI agent) and optionally adding device/WASM GPU renderers.

  • Seed desktop GPU goldens from CI. Run the matrix on the Linux and Windows agents (provisioning is already wired for Linux). The GPU cells will fail as unseeded on the first run — that's expected. Download each lane's testlogs_* artifact, run extract-visual-goldens.py against it, review, and commit ganesh-gl.linux/, ganesh-vulkan.linux/, ganesh-gl.windows/ (and ganesh-vulkan.windows/ if a Windows ICD is present).
  • Decide raster portability. After Linux/Windows raster cells run, check whether their captures match the committed raster.macos/ bytes within (2, 0.002). If yes, promote raster.macos/* to raster/* (and let the harvest's byte-aware skip keep them shared). If a scene diverges, leave it per-platform.
  • Windows software-Vulkan provisioning (optional). The Linux apt path is done; Windows currently runs on whatever ICD is present (fail-safe). Add a software ICD if deterministic Windows Vulkan is wanted.
  • P2 — device/WASM GPU renderers (the only coverage gap vs Dev/graphite backend #3968). Add, gated and skip-safe:
    • ganesh-gles on Android (raw P/Invoke EGL/GLES, OS-gated),
    • ganesh-vulkan on Android,
    • WebGL2 on WASM (OffscreenCanvas interop).
      These were left out deliberately so an unverified device P/Invoke can't cause a false red on a host that can't be tested locally. Each is a new IRenderer class + golden folder only — no harness changes. Seed their goldens via the same TRX harvest on the device/browser lanes.
  • Tighten tolerances for the software-driver GPU cells once their goldens are stable (they can move toward the deterministic end). Use per-(renderer, scene) overrides in GoldenTolerance.ByRendererScene for individually divergent cells rather than loosening a whole renderer.
  • direct3d cell (P3, optional) — add in the separate Direct3D console project; CatalogReflection discovers renderers from the entry assembly too.
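The (channel, fraction) tolerance pairs can be read as: a pixel is an outlier when any channel differs by more than the first number, and the cell passes while outliers make up at most the second number's fraction of all pixels. That reading is an assumption inferred from the commit notes ("maxDelta <= 2, 0 outliers"; "1.83% of pixels over the raster tolerance"), not a transcription of SKPixelComparer. The Python sketch also models a per-(renderer, scene) override taking precedence; `BY_RENDERER_SCENE` is a hypothetical stand-in for GoldenTolerance.ByRendererScene.

```python
RASTER = (2, 0.002)
GPU = (12, 0.02)
BY_RENDERER = {"raster": RASTER}
BY_RENDERER_SCENE = {}  # hypothetical: per-cell overrides keyed by (renderer, scene)


def tolerance_for(renderer, scene):
    """Per-(renderer, scene) override, then per-renderer, then the GPU default."""
    return BY_RENDERER_SCENE.get((renderer, scene), BY_RENDERER.get(renderer, GPU))


def within_tolerance(golden, actual, channel_tolerance, max_fraction):
    """Compare two RGBA8888 buffers of equal size."""
    if len(golden) != len(actual) or len(golden) % 4:
        return False
    pixel_count = len(golden) // 4
    outliers = 0
    for i in range(0, len(golden), 4):
        delta = max(abs(golden[i + c] - actual[i + c]) for c in range(4))
        if delta > channel_tolerance:
            outliers += 1
    return outliers <= max_fraction * pixel_count
```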

The Graphite seam

#3968 rebases onto this by adding renderer classes and golden PNGs only — no test/csproj/CI changes. Concretely it adds:

  • tests/VulkanTests/Visual/GraphiteVulkanRenderer.cs in the SkiaSharp.Vulkan.Tests.Console satellite (beside GaneshVulkanRenderer); the satellite's VulkanVisualTests discovers it via RendererCatalog.NamesIn(thisAssembly), so it joins that matrix with no test-class edit,
  • Visual/Renderers/GraphiteMetalRenderer.cs (shared + Apple-gated, beside GaneshMetalRenderer, auto-discovered by the base VisualMatrixTests),
  • Content/Goldens/graphite-*.{platform}/*.png (seeded per platform by harvesting its CI TRX).

The catalogs auto-discover both. The Graphite renderer files take a small (~5-line) rebase edit: implement SkiaSharp.Tests.Visual.IRenderer, acquire the GPU context from the shared TestConfig/GlContexts providers (or the satellite's existing VkContext/GRSharpVkBackendContext) instead of #3968's VulkanLoader/WglLoader/EglLoader, drop the inline ComputeDiff (the harness compares), and drop the out-of-process host sessions + VisualFactAttribute opt-in gate (the matrix runs by default).


File guide

Harness core (tests/Tests/SkiaSharp/Visual/)

  • ISkiaScene.cs, IRenderer.cs, RendererUnavailableException.cs — the seam.
  • SceneCatalog.cs, RendererCatalog.cs, CatalogReflection.cs — reflection discovery.
  • Tests/VisualMatrixTestsBase.cs — the reusable per-cell engine: emits the ##SKIA-GOLDEN-IMAGE## marker, compares, enforces discipline. Shared by every host.
  • Tests/VisualMatrixTests.cs — the base host's thin [Theory] over every auto-discovered renderer × scene.
  • GoldenStore.cs — two-layer resolve/load (read-only) + PNG encode.
  • GoldenTolerance.cs: raster (2, 0.002), GPU (12, 0.02), per-cell overrides.
  • VisualPlatform.cs: Tag (the {platform} segment).
  • RendererPixels.cs, GpuRenderGate.cs — pixel normalization + a GPU serialization lock.
  • Scenes/*.cs (5), Renderers/*.cs (raster, ganesh-metal), Renderers/Desktop/*.cs (ganesh-gl — excluded from the shared csproj, Console-only).
  • RendererCatalog.NamesIn(Assembly) — filters the catalog to renderers declared in one assembly; the seam a satellite uses to run only its own renderers.

Satellite host (tests/VulkanTests/Visual/, compiled into SkiaSharp.Vulkan.Tests.Console)

  • GaneshVulkanRenderer.cs — moved here from the base host so SharpVk stays out of the shared test code.
  • VulkanVisualTests.cs — the ~15-line VisualMatrixTestsBase driver running NamesIn(thisAssembly) × scenes.

Comparison (tests/Tests/Utils/SKPixelComparer.cs): tolerance-aware Compare(...channelTolerance) (+ MaxChannelDelta) and GenerateDifferenceImage (red = over tolerance, amber = minor).
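To make the color coding concrete, here is a Python sketch that classifies each RGBA pixel as the description states: red when the largest channel delta exceeds the tolerance, amber when the pixel differs but stays within it. Drawing unchanged pixels as transparent is an assumption for the sketch; the real GenerateDifferenceImage may render them differently, and the exact RGBA values for red and amber here are illustrative.

```python
RED = (255, 0, 0, 255)      # over tolerance
AMBER = (255, 191, 0, 255)  # differs, but within tolerance
CLEAR = (0, 0, 0, 0)        # identical (assumed rendering for the sketch)


def difference_pixels(golden, actual, channel_tolerance):
    """Classify each RGBA8888 pixel of two equal-size buffers into a diff color."""
    if len(golden) != len(actual):
        raise ValueError("buffers must be the same size")
    colors = []
    for i in range(0, len(golden), 4):
        delta = max(abs(golden[i + c] - actual[i + c]) for c in range(4))
        if delta > channel_tolerance:
            colors.append(RED)
        elif delta > 0:
            colors.append(AMBER)
        else:
            colors.append(CLEAR)
    return colors
```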

Tooling (scripts/infra/tests/extract-visual-goldens.py): harvests markers from TRX → writes goldens, with a path-safety guard, --dry-run, and the byte-aware shared-skip.
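The harvest writes file paths that come from test output, so the path-safety guard matters: a crafted or corrupted marker must not escape tests/Content/Goldens. A hypothetical Python version of such a guard, assuming the only legal shape is the two-segment `{folder}/{scene}.png` layout; the actual checks in extract-visual-goldens.py may differ.

```python
from pathlib import Path, PurePosixPath


def safe_golden_path(root, rel_path):
    """Resolve a marker path under root, or raise ValueError if it is unsafe."""
    rel = PurePosixPath(rel_path)
    if (
        "\\" in rel_path
        or rel.is_absolute()
        or ".." in rel.parts
        or len(rel.parts) != 2
        or rel.suffix != ".png"
    ):
        raise ValueError(f"refusing unsafe golden path: {rel_path!r}")
    base = Path(root).resolve()
    target = base.joinpath(*rel.parts).resolve()
    # Second check after resolving: a symlink could still point outside the tree.
    if base not in target.parents:
        raise ValueError(f"path escapes golden root: {rel_path!r}")
    return target
```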

CI (scripts/azure-templates-stages-test.yml): Linux software-GPU provisioning.

csproj: SkiaSharp.Tests.csproj excludes Visual/Renderers/Desktop/** (same as GlContexts/*); base SkiaSharp.Tests.Console.csproj includes everything but takes no GPU-vendor package. SkiaSharp.Vulkan.Tests.Console.csproj already references SharpVk + the base host and compiles ..\VulkanTests\**, so the moved renderer needed no csproj change.

Docs (documentation/dev/golden-image-tests.md): the full durable design doc (architecture, discipline, seeding, CI, extending, the Graphite seam).


How to run locally

dotnet cake --target=externals-download            # C#-only change → pre-built natives are fine
dotnet build tests/SkiaSharp.Tests.Console/SkiaSharp.Tests.Console.csproj -c Release
cd tests/SkiaSharp.Tests.Console/bin/Release/net*/
./SkiaSharp.Tests --filter-trait "Category=Visual"        # run only the visual matrix
./SkiaSharp.Tests --filter-not-trait "Category=Visual"    # run everything else

On macOS the base host shows 15 pass / 0 skip. The ganesh-vulkan cells live in the Vulkan satellite:

dotnet build tests/SkiaSharp.Vulkan.Tests.Console/SkiaSharp.Vulkan.Tests.Console.csproj -c Release
cd tests/SkiaSharp.Vulkan.Tests.Console/bin/Release/net*/
./SkiaSharp.Vulkan.Tests --filter-trait "Category=Visual"   # 5 ganesh-vulkan cells; skip without an ICD

🤖 Generated with Copilot CLI

mattleibow and others added 9 commits June 24, 2026 21:25
Introduce a holistic golden-image matrix in the main test suite under
tests/Tests/SkiaSharp/Visual/. Each (renderer x scene) pair is one xUnit
theory cell that renders a deterministic scene through a backend and
compares the pixels to a committed golden PNG.

Core design:
- Runs in-process inside the existing runners (Console/Devices/Wasm) via
  the shared SkiaSharp.Tests project; no out-of-process render hosts,
  Playwright, adb, or simctl.
- Reuses existing primitives instead of reinventing them: SKPixelComparer
  (extended with a tolerance-aware Compare overload + colored diff image),
  the GlContexts/ abstraction via TestConfig.CreateGlContext(), and the
  Content embed/copy pipeline (goldens under tests/Content/Goldens/).
- Strict failure discipline: a cell skips only when the backend is
  genuinely absent (IsAvailable=false or RendererUnavailableException);
  any other thrown exception, out-of-tolerance pixels, or a missing golden
  is a hard failure. Broken bindings (EntryPointNotFoundException) fail.
- Platform-aware golden lookup: {renderer}.{platform} -> {renderer} ->
  _shared, resolved disk-first then from embedded resources.
- Per-renderer + per-(renderer,scene) tolerance (raster tight, GPU wider).
- Record/update goldens with SKIASHARP_UPDATE_GOLDENS=1 and an optional
  SKIASHARP_GOLDEN_SCOPE (shared|renderer|platform).

Phase 1 backends (verified on macOS Console): raster (all hosts),
ganesh-gl (CGL/GLX/WGL), ganesh-metal (in-process P/Invoke). Desktop GPU
renderers live under Renderers/Desktop/ and are excluded from the shared
project (mirroring the GlContexts exclusion) so the MAUI/Devices/Wasm
builds keep working.

RendererCatalog is the single seam the in-flight Graphite backend PR
plugs into: it adds renderer classes + golden folders and nothing else.
Includes documentation/dev/golden-image-tests.md.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Tag every visual-matrix cell with [Trait("Category", "Visual")] so CI and
developers can run only the visual suite (--filter-trait Category=Visual) or
skip it (--filter-not-trait Category=Visual); cells still run by default.

Move GaneshMetalRenderer out of Renderers/Desktop into the shared harness and
gate availability on all Apple platforms (macOS/iOS/MacCatalyst/tvOS) via
OperatingSystem.Is* runtime checks. Because Metal is reached purely through
runtime P/Invoke, the same renderer now runs in-process on the macOS Console
host and on the iOS / Mac Catalyst MAUI device hosts, while skipping cleanly
on non-Apple platforms.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add GaneshVulkanRenderer reusing the existing SharpVk vehicle and the
GRSharpVkBackendContext bridge that SkiaSharp.Vulkan.Tests already exercises,
rather than reinventing a Vulkan loader. The context is fully headless: it
creates only Instance -> PhysicalDevice -> graphics Queue -> Device with no
VK_KHR_surface/swapchain and no window, which is all GRContext.CreateVulkan
needs to render to an offscreen SKSurface.

The renderer lives under Renderers/Desktop/ so the SharpVk dependency is
compiled only into the Console host and never reaches the MAUI device or WASM
builds. On a host without a Vulkan ICD (default macOS agent, or a driverless
Linux/Windows agent) it skips with a reason; a missing native entry point is
rethrown so a broken binding fails rather than skipping.

Console gains SharpVk + SkiaSharp.Vulkan.SharpVk references. Verified on macOS:
matrix is 20 cells, 15 pass, 5 ganesh-vulkan cells skip cleanly (no ICD).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Runs only the [Trait("Category","Visual")] cells in the desktop Console
host, with strict failure discipline (non-zero exit fails the target) and a
--updateGoldens record mode for seeding per-platform GPU goldens in CI.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Two correctness fixes plus a CI-friendly discipline for platforms that can
only be seeded on the agent (Linux/Windows/device/browser):

* The shared _shared baseline is now used ONLY by the portable raster backend
  for portable scenes. GPU backends no longer fall through to the CPU baseline
  (which compared GPU output against CPU goldens and would falsely fail every
  non-macOS GPU cell), and platform-dependent scenes (text) no longer share one
  platform's reference. Scenes declare ISkiaScene.IsPlatformDependent; the
  macOS raster Text golden moves from _shared to raster.macos accordingly.

* A cell whose renderer is available but has no golden recorded yet is an
  explicit, loud 'unseeded' skip instead of a hard failure, so CI stays green
  until each platform's goldens are seeded. A golden that EXISTS is still
  compared strictly (fail on diff), and an unexpected exception still fails.
  Set SKIASHARP_VISUAL_REQUIRE_GOLDENS=1 (CI does this per platform once seeded)
  to turn unseeded cells into failures and lock the coverage in.

Verified on macOS Console: 15 pass + 5 ganesh-vulkan skip unchanged; un-seeding
a cell skips by default and fails under REQUIRE_GOLDENS.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…s-arch

The portable _shared CPU baseline is now only used on desktop hosts (macOS /
Windows / Linux); device and browser hosts (Android / iOS / MacCatalyst / WASM)
record raster goldens per platform, because their architecture rounds
antialiasing differently. This mirrors what the prior-art harness found
empirically (a shared desktop set plus separate android-/ios-/wasm- sets) and
means an unseeded device raster cell skips rather than falsely comparing against
the desktop baseline. The raster tolerance widens to 2 LSB on <=0.2% of pixels
so the shared baseline survives cross-architecture (arm64<->x64) AA rounding
while still failing hard on any real regression.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Provision the Linux .NET Core test agent with software GL (Mesa llvmpipe),
  software Vulkan (Mesa lavapipe) and a virtual X server so the matrix can
  render and golden-compare ganesh-gl / ganesh-vulkan deterministically on a
  headless agent. Driver/display selection is fail-safe: if anything is missing
  the GPU cells skip rather than fail. (This also lets the existing GL tests run
  on llvmpipe instead of skipping.)
* Collect the matrix's *.actual.png / *.diff.png failure artifacts into the
  published test-log tree on every .NET Core agent (Windows/macOS/Linux) via
  collect-visual-failures.ps1, so a red visual cell is triageable from the build
  artifacts, not just the base64 in the TRX.

The matrix already runs in-process inside the existing test stages (it is shared
test code), so no dedicated visual stage is added.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Failure discipline: document the unseeded-cell skip (renderer ran but no golden
  recorded yet for this platform) and the SKIASHARP_VISUAL_REQUIRE_GOLDENS knob
  that flips unseeded->fail per platform once seeded. A golden that *exists* is
  always compared strictly.
* Golden lookup: the _shared baseline is now scoped to desktop portable raster
  only (renderer==raster && !IsPlatformDependent && IsDesktop); GPU, Text, and
  device/browser raster each carry per-platform goldens. Add ISkiaScene.IsPlatformDependent.
* Tolerance: raster is (2, 0.002) to absorb cross-architecture CPU AA rounding on
  the shared desktop baseline; document when to split per-platform instead.
* Running locally: document the Category=Visual trait filter and the tests-visual
  Cake target (incl. --updateGoldens/--goldenScope record mode).
* CI: document the implemented wiring (Linux Mesa/lavapipe software-GPU + Xvfb,
  fail-safe selection; collect-visual-failures.ps1 artifact collection on all
  netcore agents), the per-platform seed->enforce lifecycle, and the env-var table.
* Coverage/seam: ganesh-metal is shared + Apple-gated (runs on iOS/MacCatalyst
  device hosts, not just macOS Console); fix the Graphite seam file locations.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace the env-var record mode with a TRX-harvest seeding model and simplify
the golden lookup to two layers, so the same flow works on every host (desktop,
MAUI device, WASM) without a writable source tree.

- Every matrix cell now emits its rendered PNG into the test results on pass and
  fail as a single-line ##SKIA-GOLDEN-IMAGE## marker (base64, XML-safe). The TRX
  is the one channel that exists on all hosts, so it is the seed channel.
- Seeding is now: run -> scripts/infra/tests/extract-visual-goldens.py harvests
  the markers from the TRX -> commit. No SKIASHARP_UPDATE_GOLDENS /
  SKIASHARP_GOLDEN_SCOPE / SKIASHARP_VISUAL_REQUIRE_GOLDENS, no in-process record
  mode. The harvest skips re-creating a {renderer}.{platform} file when a
  byte-identical shared {renderer}/ golden exists, so promotions survive.
- Golden lookup is now two layers: {renderer}.{platform}/{scene}.png ->
  {renderer}/{scene}.png. Generalizes over platform only, never over renderer.
  Removed the cross-renderer _shared/ folder, ISkiaScene.IsPlatformDependent, and
  VisualPlatform.IsDesktop. Migrated _shared/*.png -> raster.macos/.
- An unseeded cell now FAILS instead of skipping: the backend produced pixels, so
  green would be a coverage hole, and the PNG is already in the TRX to harvest.
- Removed scripts/infra/tests/tests-visual.cake (+ build.cake task) and
  collect-visual-failures.ps1 (+ its 3 YAML postBuildSteps); the matrix runs in
  the existing stages and the TRX carries the images. Kept the Linux Mesa/lavapipe
  /Xvfb provisioning so GPU cells render and can be harvested.
- Rewrote documentation/dev/golden-image-tests.md for the new model.

Verified on macOS Console (net10 arm64): Console + shared SkiaSharp.Tests build
clean; 20-cell matrix = 15 pass + 5 ganesh-vulkan skip (no ICD); harvest of the
real TRX is byte-idempotent and the shared-golden skip works.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions


📦 Try the packages from this PR

Warning

Do not run these scripts without first reviewing the code in this PR.

Step 1 — Download the packages

bash / macOS / Linux:

curl -fsSL https://raw.githubusercontent.com/mono/SkiaSharp/main/scripts/get-skiasharp-pr.sh | bash -s -- 4236

PowerShell / Windows:

iex "& { $(irm https://raw.githubusercontent.com/mono/SkiaSharp/main/scripts/get-skiasharp-pr.ps1) } 4236"

Step 2 — Add the local NuGet source

dotnet nuget add source ~/.skiasharp/hives/pr-4236/packages --name skiasharp-pr-4236
More options
| Option | Description |
| --- | --- |
| --successful-only / -SuccessfulOnly | Only use successful builds |
| --force / -Force | Overwrite previously downloaded packages |
| --list / -List | List available artifacts without downloading |
| --build-id ID / -BuildId ID | Download from a specific build |

Or download manually from Azure Pipelines — look for the nuget artifact on the build for this PR.

Remove the source when you're done:

dotnet nuget remove source skiasharp-pr-4236

@github-actions


📖 Documentation Preview

The documentation for this PR has been deployed and is available at:

🔗 View Staging Site
🔗 View Staging Docs
🔗 View Staging Gallery (Blazor)
🔗 View Staging Gallery (Uno Platform)
🔗 View Staging SkiaFiddle

This preview will be updated automatically when you push new commits to this PR.


This comment is automatically updated by the documentation staging workflow.

mattleibow and others added 4 commits June 25, 2026 00:01
The base SkiaSharp.Tests.Console must not take a SharpVk (or Vortice)
dependency — it is the host the MAUI device and WASM builds consume
indirectly, and SkiaSharp already ships dedicated satellite projects
(SkiaSharp.Vulkan.Tests.Console, SkiaSharp.Direct3D.Tests.Console) for
exactly these GPU backends, both already built and run by tests-netcore.

- Revert the SharpVk PackageReference + SkiaSharp.Vulkan.SharpVk
  ProjectReference accidentally added to the base Console project.
- Extract the per-cell pipeline (render -> emit ##SKIA-GOLDEN-IMAGE## ->
  compare-or-fail / fail-unseeded) into a reusable VisualMatrixTestsBase
  so every host shares one engine. The shared VisualMatrixTests becomes a
  thin [Theory] over the renderers auto-discovered in the base assembly.
- Move GaneshVulkanRenderer into the Vulkan satellite
  (tests/VulkanTests/Visual/) beside a thin VulkanVisualTests driver that
  runs only the renderers compiled into that assembly via
  RendererCatalog.NamesIn(thisAssembly) -- so raster/GL/Metal are never
  double-run, and a future graphite-vulkan renderer joins automatically.
- Make GpuRenderGate and RendererPixels public (the satellite renderer
  consumes them).
- Update documentation/dev/golden-image-tests.md (hosting/wiring, extend,
  coverage, Graphite seam) for the satellite structure.

Verified on macOS: base Console matrix 15 pass / 0 skip; Vulkan satellite
5 ganesh-vulkan cells skip cleanly (no ICD).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…r goldens

Addresses issues found reviewing and running the golden-image harness:

- GaneshMetalRenderer: guard OperatingSystem.Is*() behind NET5_0_OR_GREATER
  (falling back to TestConfig.Current.IsMac on net48). These probes don't
  exist on .NET Framework 4.8, which the Console host targets on Windows, so
  the matrix failed to compile there with CS0117.
- TextScene: a missing/unloadable bundled font is now a hard error instead of
  a silent SKTypeface.CreateDefault() fallback. The fallback would capture a
  host-dependent, non-portable golden and hide the real "font not bundled"
  failure, defeating the scene's determinism.
- SKPixelComparer.GenerateDifferenceImage: dispose the diff SKBitmap (it is
  snapshot-copied by SKImage.FromBitmap), fixing a native leak on every failed
  comparison.
- Promote raster.macos/ to a shared raster/ golden set. Verified portable:
  Windows and browser raster match macOS within the (2, 0.002) tolerance
  (maxDelta <= 2, 0 outliers), so one shared set covers macOS/Windows/WASM and
  the always-available raster row no longer fails as unseeded off macOS.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Failure images were only available as base64 embedded in the TRX. Surface
them as ordinary PNGs in the published test-logs artifact, grouped by outcome.

- VisualMatrixTestsBase: each cell now emits a ##SKIA-VISUAL-CELL## marker
  recording its outcome (pass | mismatch | unseeded), and a failing cell emits
  its golden and colored diff as ##SKIA-VISUAL-IMAGE## markers (replacing the
  unstructured base64 dumps). The outcome tag lets triage tell an unseeded cell
  (seed its golden) from a regression (investigate; never blindly harvest).
- extract-visual-goldens.py: add --failures-out DIR triage mode that decodes
  the markers from the TRX into
  visual-failures/unseeded/{r}.{platform}/{scene}.actual.png and
  visual-failures/mismatch/{r}.{platform}/{scene}.{actual,golden,diff}.png.
  Default golden-seeding behavior is unchanged; a missing TRX is a no-op here.
- azure-templates-stages-test.yml: run that extractor as an always() post-test
  step on every test lane (netfx/netcore desktop, android, ios, maccatalyst,
  wasm), writing into the already-published testlogs_* artifact. It reads the
  TRX, the one channel present on every host, so it works on desktop, device,
  and WASM alike.
- Document the markers and the triage step in golden-image-tests.md.
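The following sketch maps a cell outcome to the PNG paths the --failures-out triage mode writes. It models only the directory layout listed above and does not parse the ##SKIA-VISUAL-CELL## / ##SKIA-VISUAL-IMAGE## markers, whose exact syntax isn't shown here. The function name is hypothetical, and it assumes out_dir is the root that holds unseeded/ and mismatch/.

```python
from pathlib import PurePosixPath


def failure_paths(out_dir, renderer, platform, scene, outcome):
    """Return the PNG paths triage would write for one cell (hypothetical helper)."""
    if outcome == "pass":
        return []  # passing cells produce no failure images
    if outcome == "unseeded":
        kinds = ["actual"]  # nothing to compare against, so only the render
    elif outcome == "mismatch":
        kinds = ["actual", "golden", "diff"]
    else:
        raise ValueError(f"unknown outcome: {outcome!r}")
    base = PurePosixPath(out_dir) / outcome / f"{renderer}.{platform}"
    return [str(base / f"{scene}.{kind}.png") for kind in kinds]
```

Splitting by outcome at the directory level makes the unseeded-versus-regression distinction visible before anyone opens an image.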

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The Text scene drew "Skia"/"Sharp", but the bundled Roboto2-Regular_NoEmbed.ttf
is a 33-glyph subset that only maps "!,DEHLORW" — none of S/k/i/a/h/r/p. So the
scene rendered nothing and the committed goldens were blank, making the cell pass
vacuously (blank render == blank golden) while asserting nothing about text. The
existing font tests only check the family name, so this went unnoticed.
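A glyph-coverage guard would have caught this. Here is a minimal sketch, assuming the subset's mapped characters are exactly the "!,DEHLORW" set quoted above. A real test would query the typeface for coverage instead of hard-coding the set, and the helper name is hypothetical.

```python
# Characters the bundled subset font maps, transcribed from the commit message.
SUBSET_CHARS = frozenset("!,DEHLORW")


def unmapped(text, charset=SUBSET_CHARS):
    """Return the sorted non-whitespace characters in `text` that the font cannot draw."""
    return sorted({ch for ch in text if not ch.isspace()} - charset)
```

Each of the old strings has characters the font can't draw, while both new strings return an empty list.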

- TextScene: draw "HELLO" / "WORLD!", which use only glyphs the subset font maps,
  so the scene actually exercises glyph rasterization.
- Regenerated the Text golden with real content (verified non-blank). Text glyph
  rendering genuinely diverges across platforms (browser/FreeType vs Windows:
  maxDelta 59, 1.83% of pixels over the raster tolerance), exactly as the design
  doc predicts, so Text is stored per-platform (raster.windows/, raster.browser/)
  rather than in the shared raster/ layer. The 4 geometric scenes stay shared.
- Removed the blank ganesh-gl.macos/Text.png and ganesh-metal.macos/Text.png; they
  were blank for the same reason and will be reseeded with real text from macOS CI.
  macOS/Linux raster Text is likewise seeded per-platform on its own CI run.
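The shared-versus-per-platform split implies a two-layer golden lookup. The sketch below assumes the platform-specific folder (for example raster.windows/) takes precedence and the shared renderer folder (raster/) is the fallback. The precedence order and the function name are assumptions; the design doc is authoritative. Returning None models an unseeded cell.

```python
from pathlib import Path


def find_golden(root, renderer, platform, scene):
    """Return the golden PNG path for a cell, or None if it is unseeded."""
    candidates = [
        Path(root) / f"{renderer}.{platform}" / f"{scene}.png",  # per-platform layer
        Path(root) / renderer / f"{scene}.png",                  # shared layer
    ]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None
```

Under this rule, the four geometric scenes resolve to raster/ on every platform, Text resolves per-platform, and macOS Text returns None until it is reseeded.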

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@mattleibow mattleibow added this to the 4.151.0-preview.2 milestone Jun 30, 2026
@mattleibow

Contributor Author

🤝 Resume notes for whoever takes this over

Short orientation + tribal knowledge that isn't obvious from the PR body. The durable design doc is documentation/dev/golden-image-tests.md — read it first.

Current state (framework done; seeding is what's left)

  • Harness, two-layer golden lookup, TRX-harvest seeding, and CI wiring are in place. Failure PNGs are published on every test lane: scripts/azure-templates-stages-test.yml runs extract-visual-goldens.py as an always() post-test step that decodes the TRX markers into the testlogs_* artifact.
  • Committed goldens today: ganesh-gl.macos/ + ganesh-metal.macos/ (4 geometric scenes each), shared raster/ (4 geometric — verified portable across macOS/Windows/WASM within the raster tolerance), and per-platform Text (raster.windows/Text.png, raster.browser/Text.png).
  • Verified locally on macOS just now: base Console visual matrix = 12 pass / 3 fail. The 3 failures are {raster, ganesh-gl, ganesh-metal} × Text failing as unseeded — by design. Commit b29a8302d23 removed the old macOS Text goldens because they were silently blank (the bundled subset font didn't map the old "Skia"/"Sharp" glyphs; the scene now draws "HELLO"/"WORLD!"). They need a real-text reseed (see below). Unseeded = hard FAIL is intentional, not a regression.

Critical-path next action: seed goldens (the only real work left)

Every non-macOS cell, plus the 3 macOS Text cells, currently has no golden and will fail-as-unseeded on first run. That failure is the signal to harvest:

  1. Let the test lane run (visual cells fail-as-unseeded, emitting their PNG into the TRX).
  2. Download that lane's testlogs_* artifact.
  3. python3 scripts/infra/tests/extract-visual-goldens.py <path-to.trx> → writes tests/Content/Goldens/**. Triage with --failures-out DIR to dump actual/golden/diff PNGs and tell an unseeded cell (seed it) apart from a mismatch (investigate — never blind-harvest a regression).
  4. Eyeball the PNGs, commit, re-run → green.
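The seed-or-investigate decision in the steps above can be sketched as follows. It plans harvesting from (renderer, platform, scene, outcome) records: unseeded cells get a golden path under tests/Content/Goldens/{renderer}.{platform}/, and mismatches are set aside and never harvested. The record shape and the function name are hypothetical. Promoting a per-platform golden to a shared folder such as raster/ is a separate, deliberate step, as the earlier commit did.

```python
def plan_harvest(cells):
    """Split cell outcomes into golden paths to seed and cells to investigate."""
    seed, investigate = [], []
    for renderer, platform, scene, outcome in cells:
        if outcome == "unseeded":
            seed.append(f"tests/Content/Goldens/{renderer}.{platform}/{scene}.png")
        elif outcome == "mismatch":
            # A mismatch is a possible regression: never overwrite its golden blindly.
            investigate.append((renderer, platform, scene))
        elif outcome != "pass":
            raise ValueError(f"unknown outcome: {outcome!r}")
    return seed, investigate
```
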

Still to seed: macOS Text (raster/gl/metal — harvestable from a macOS lane or even a macOS dev box, which is how the existing macOS goldens were made); Linux raster+gl+vulkan; Windows raster+gl (+vulkan if an ICD is present); Android / iOS / MacCatalyst / WASM raster + their GPU cells.

Gotchas that cost time

  • No Vulkan ICD on the macOS dev box: ganesh-vulkan cells skip locally, and you cannot reproduce Linux/Windows/device/WASM runs here. Real GPU verification only happens on CI (Linux agents are provisioned with Mesa GL + Lavapipe Vulkan; see the CI provisioning steps).
  • gh pr edit and any GraphQL gh call are SAML-blocked with the available token. Edit the PR via REST instead: gh api -X PATCH repos/mono/SkiaSharp/pulls/4236 -F body=@file; set milestone/labels with gh api -X PATCH repos/mono/SkiaSharp/issues/{n} (issues + PRs share the /issues/ endpoint).
  • Always diff against origin/main, not the local main ref (the stale local main shows a spurious docs submodule change that isn't real).
  • This is a C#/test-only change → bootstrap with dotnet cake --target=externals-download. Do not rebuild natives.

Run the matrix locally (macOS)

export PATH="/usr/local/share/dotnet:$PATH"
dotnet build tests/SkiaSharp.Tests.Console/SkiaSharp.Tests.Console.csproj -c Release
cd tests/SkiaSharp.Tests.Console/bin/Release/net10.0
export DYLD_LIBRARY_PATH="$PWD:$DYLD_LIBRARY_PATH"
./SkiaSharp.Tests --filter-trait "Category=Visual"

Vulkan satellite: build SkiaSharp.Vulkan.Tests.Console, then ./SkiaSharp.Vulkan.Tests --filter-trait "Category=Visual".

Architecture cheat-sheet

  • Shared CPU/GL cells live in tests/Tests/SkiaSharp/Visual/** (project SkiaSharp.Tests) and compile into every host (Console / Devices / Wasm).
  • Package-dependent GPU renderers live in satellite hosts so the base Console stays dependency-clean: Vulkan → SkiaSharp.Vulkan.Tests.Console (SharpVk), Direct3D → SkiaSharp.Direct3D.Tests.Console (Vortice). Each is a thin VisualMatrixTestsBase subclass driving RendererCatalog.NamesIn(thisAssembly) × scenes.

The Graphite seam (the whole point of this PR)

When dev/graphite-backend (#3968) rebases onto this, it adds only:

  • graphite-vulkan: one IRenderer dropped into tests/VulkanTests/Visual/ → auto-joins the Vulkan satellite via RendererCatalog.NamesIn.
  • graphite-metal: one shared Apple-gated IRenderer → auto-joins the base matrix.
  • a golden folder per renderer.

Zero test-class / csproj / CI edits. The only rebase friction is a ~5-line edit pointing those renderers at our shared context-provider names (documented in the design doc).
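The "drop in one class, zero other edits" seam can be illustrated with a Python analog. Here a class decorator stands in for the C# harness discovering IRenderer types in its assembly, and an availability probe stands in for gating such as GpuRenderGate. The registry, the decorator, and the availability flags are all hypothetical. The point is that adding a renderer class adds cells without touching the code that enumerates them, and that an unavailable backend skips instead of failing.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Renderer:
    name: str
    assembly: str
    is_available: Callable[[], bool]


# Hypothetical registry standing in for reflection over IRenderer types.
REGISTRY: List[Renderer] = []


def renderer(name, assembly, is_available=lambda: True):
    """Class decorator: defining the class is the only step needed to register it."""
    def register(cls):
        REGISTRY.append(Renderer(name, assembly, is_available))
        return cls
    return register


def cells(assembly, scenes):
    """Yield (renderer, scene, action); unavailable backends skip rather than fail."""
    for r in REGISTRY:
        if r.assembly != assembly:
            continue
        action = "run" if r.is_available() else "skip"
        for scene in scenes:
            yield r.name, scene, action


# Existing satellite renderer; this imaginary host has no Vulkan ICD.
@renderer("ganesh-vulkan", "SkiaSharp.Vulkan.Tests", is_available=lambda: False)
class GaneshVulkanRenderer:
    pass


# The Graphite rebase would add only a class like this; cells() is unchanged.
@renderer("graphite-vulkan", "SkiaSharp.Vulkan.Tests", is_available=lambda: False)
class GraphiteVulkanRenderer:
    pass
```
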



Development

Successfully merging this pull request may close these issues.

Add a cross-backend golden-image (visual regression) test harness
