Add cross-version benchmarks CI workflow #4276

Open. mattleibow wants to merge 5 commits into main from mattleibow-benchmarks-ci-workflow

mattleibow commented Jun 29, 2026


What

Adds a PR-triggered Benchmarks workflow that runs the SkiaSharp micro-benchmarks across multiple SkiaSharp versions on Linux, Windows, and macOS, then merges the results into a single comparison report posted to the job summary.

This grew out of validating the native blur fix (SK_AVOID_SLOW_RASTER_PIPELINE_BLURS): there was no way to measure such changes with real numbers across versions and platforms. Now there is.

Why a build matrix instead of one process

BenchmarkDotNet cannot host two different versions of the same assembly in a single process (type-identity clash — there is no built-in WithNuGet). So each (os, version) combination is benchmarked in its own job and the JSON exports are merged afterwards.
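
As a rough illustration of the one-job-per-combination idea, the sketch below expands a list of OSes and versions into matrix entries. The function name `build_matrix` and the runner labels are hypothetical and not taken from the workflow; only the two version strings come from this PR.

```python
import itertools
import json

def build_matrix(oses, versions):
    """Return one entry per (os, version) pair, shaped like a GitHub Actions 'include' list."""
    return {
        "include": [
            {"os": os_name, "version": version}
            for os_name, version in itertools.product(oses, versions)
        ]
    }

# Runner labels are illustrative assumptions; the versions are the PR's stated defaults.
matrix = build_matrix(
    ["ubuntu-latest", "windows-latest", "macos-latest"],
    ["4.150.0-preview.2.1", "3.119.4"],
)
print(json.dumps(matrix))
```

Three OSes and two versions give six entries, which matches the six published cells reported in the validation run.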

How it works

flowchart LR
  setup[setup: resolve versions] --> pub[published matrix: os x version]
  setup --> cur[current matrix: linux + macos, native build]
  pub --> rep[report: merge -> job summary]
  cur --> rep
  • published: SkiaSharp.Benchmarks.Compare, an isolated harness that links the same benchmark sources but restores a published SkiaSharp NuGet version from nuget.org. It deliberately opts out of the repo build infrastructure (empty Directory.Build.props/.targets) and carries its own NuGet.config so it resolves the exact released version requested. Fast and reliable: it always reports real numbers on all three OSes. Default versions: 4.150.0-preview.2.1 (baseline) and 3.119.4.
  • current: benchmarks the working tree's native code. Rather than building the multi-targeted in-repo binding graph (which drags in the mobile native-asset projects and needs the Android/iOS workloads the runners don't have, failing with NETSDK1147), it reuses the same Compare harness: it restores the baseline published managed package, then replaces that package's libSkiaSharp in the NuGet global cache with the one freshly built from this PR. BenchmarkDotNet builds its child project against that cache at run time, so the benchmark exercises this PR's native code with a known, stable managed API. The job logs the SHA before/after the swap to prove the working-tree binary is the one being measured. Best-effort (continue-on-error); scoped to the OSes whose native library reliably builds from source on a hosted runner (Linux + macOS).
  • report: scripts/benchmarks/merge-benchmarks.py merges every run into one Markdown table (mean µs + ratio-vs-baseline) written to $GITHUB_STEP_SUMMARY and uploaded as an artifact.
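
The report step can be pictured with the minimal sketch below. It is not the actual merge-benchmarks.py: the helper names are hypothetical, and it assumes each BenchmarkDotNet JSON export has a top-level "Benchmarks" list whose entries carry "FullName" and a "Statistics" object with "Mean" in nanoseconds.

```python
import json
from pathlib import Path

def load_means(report_path):
    """Map benchmark FullName -> mean in microseconds from one JSON export."""
    data = json.loads(Path(report_path).read_text(encoding="utf-8"))
    means = {}
    for bench in data.get("Benchmarks", []):
        stats = bench.get("Statistics") or {}
        if "Mean" in stats:
            means[bench["FullName"]] = stats["Mean"] / 1000.0  # assumed ns -> µs
    return means

def to_markdown(runs, baseline):
    """runs: {column label: {benchmark name: mean µs}}; returns a Markdown table."""
    labels = [baseline] + sorted(k for k in runs if k != baseline)
    names = sorted({name for run in runs.values() for name in run})
    lines = [
        "| Benchmark | " + " | ".join(labels) + " |",
        "|---|" + "---:|" * len(labels),
    ]
    for name in names:
        base = runs.get(baseline, {}).get(name)
        cells = []
        for label in labels:
            mean = runs.get(label, {}).get(name)
            if mean is None:
                cells.append("")  # this run did not produce the benchmark
            elif label == baseline or not base:
                cells.append(f"{mean:,.0f}")
            else:
                cells.append(f"{mean:,.0f} ({mean / base:.2f}x)")
        lines.append(f"| {name} | " + " | ".join(cells) + " |")
    return "\n".join(lines)

# Values taken from the osx-arm64 row of the validation table in this PR.
table = to_markdown(
    {"4.150.0-preview.2.1": {"BlurImage 1024/σ1": 414.0},
     "3.119.4": {"BlurImage 1024/σ1": 472.0}},
    baseline="4.150.0-preview.2.1",
)
print(table)
```

Putting the baseline in the first column keeps every ratio readable as "times the baseline mean", so values above 1.00x are slower than the baseline.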

Reading the current column ⚠️

The from-source externals-* native is not built with the same optimization/official-build flags as the shipped NuGet native, so its absolute timings differ systematically from (and are typically slower than) those of the optimized published packages. Do not read a current-vs-published ratio as the effect of a PR's code. To measure a native change, compare two current runs built the same way (the PR branch vs its base). The published columns are the apples-to-apples comparison across released versions.

The Windows hosted image currently cannot build the native library from source (missing Windows SDK 10.0.19041 and Spectre-mitigated MSVC libs on the VS preview image), so Windows is excluded from current by default and still appears in the published comparison. It can be opted back in via the current_oses dispatch input once the toolchain is available.

Other changes

  • BenchmarkConfig (shared) adds JsonExporter.FullCompressed — that export cannot be selected from the command line in this BenchmarkDotNet version, so it is configured in code to guarantee every run produces the JSON the merge step consumes.
  • BlurImageFilterBenchmark exercises the 8888 raster blur path affected by the native flag (small-sigma slow path vs large-sigma control).
  • SurfaceCanvasBenchmark switched to SKPath (CS0618 suppressed) so the shared sources compile against older releases that predate SKPathBuilder.

Triggers

  • pull_request touching benchmarks/**, scripts/benchmarks/**, or the workflow file.
  • workflow_dispatch with inputs: versions, filter, job (short/default), build_current, current_oses, extra_feed (e.g. point at a PR preview package source).

Validation — real CI numbers

Proven on real GitHub-hosted runners (run 28379961911): all six published cells (3 OSes × 2 versions) plus both current cells (Linux x64, macOS arm64) succeeded and merged into one report. The current cells logged the native SHA swap, e.g. macOS built 726ab1f0… replacing the published 4523f8db…, confirming the working-tree binary was measured. Example (mean µs, baseline 4.150.0-preview.2.1):

| OS | Benchmark | baseline | 3.119.4 | current |
|---|---|---:|---:|---:|
| windows-x64 | BlurImage 1024/σ1 | 1,793 | 2,134 | |
| linux-x64 | BlurImage 1024/σ1 | 4,199 | 3,955 | 4,502 |
| osx-arm64 | BlurImage 1024/σ1 | 414 | 472 | 643 |

(The macOS current being slower than published despite identical m150 code is exactly the build-flag caveat above.)
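
The native swap and its before/after SHA log could look roughly like the sketch below. The function names and file names are hypothetical, and SHA-256 is an assumption, since the PR only says the job "logs the SHA".

```python
import hashlib
import shutil
import tempfile
from pathlib import Path

def sha256_of(path):
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def swap_native(built_lib, cached_lib):
    """Overwrite cached_lib with built_lib and return the (before, after) digests."""
    before = sha256_of(cached_lib)
    shutil.copyfile(built_lib, cached_lib)
    after = sha256_of(cached_lib)
    if after != sha256_of(built_lib):
        raise RuntimeError("cached native library does not match the freshly built one")
    print(f"native swap: {before[:8]} -> {after[:8]}")
    return before, after

# Stand-in files; a real run would point at the built library and the NuGet cache copy.
with tempfile.TemporaryDirectory() as tmp:
    built = Path(tmp, "built-libSkiaSharp")
    cached = Path(tmp, "cached-libSkiaSharp")
    built.write_bytes(b"new")
    cached.write_bytes(b"old")
    before, after = swap_native(built, cached)
```

Re-hashing the cached file after the copy, rather than trusting the copy call, is what lets the log serve as evidence that the working-tree binary is the one being measured.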

Note: this is infrastructure only — no library code or public API changes.

Run the SkiaSharp micro-benchmarks on every PR that touches benchmarks and
compare performance across multiple SkiaSharp versions, so changes like the
SK_AVOID_SLOW_RASTER_PIPELINE_BLURS native blur fix can be measured with real
numbers instead of being eyeballed.

BenchmarkDotNet cannot host two versions of the same assembly in one process,
so the workflow benchmarks each (operating system, version) combination in its
own job and merges the JSON exports afterwards:

* SkiaSharp.Benchmarks.Compare - an isolated harness that links the same
  benchmark sources but restores a published SkiaSharp NuGet version. It
  deliberately opts out of the repo build infrastructure (empty
  Directory.Build.props/targets) and carries its own NuGet.config so it can
  resolve the exact released version it is asked to benchmark. Used for the
  reliable "published" comparison columns (latest stable, a 3.x release).
* The in-repo SkiaSharp.Benchmarks project benchmarks the working tree after a
  single-arch native build (Linux via the repo's cross Docker image, Windows
  and macOS via cake). This "current" path is best-effort (continue-on-error)
  because a from-source native build on a hosted runner is slow and can fail
  for reasons unrelated to the benchmarks.

A shared BenchmarkConfig adds the JsonExporter.FullCompressed export (which
cannot be selected from the command line in this BenchmarkDotNet version) so
every run produces the JSON the merge step consumes.
scripts/benchmarks/merge-benchmarks.py combines the per-run results into one
Markdown table (mean microseconds plus a ratio-vs-baseline column) written to
the job summary.

Also adds a BlurImageFilterBenchmark that exercises the 8888 raster blur path
affected by the native flag (small-sigma slow path vs large-sigma control), and
switches SurfaceCanvasBenchmark to SKPath so the shared sources compile against
older SkiaSharp releases that predate SKPathBuilder.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions


📦 Try the packages from this PR

Warning

Do not run these scripts without first reviewing the code in this PR.

Step 1 — Download the packages

bash / macOS / Linux:

curl -fsSL https://raw.githubusercontent.com/mono/SkiaSharp/main/scripts/get-skiasharp-pr.sh | bash -s -- 4276

PowerShell / Windows:

iex "& { $(irm https://raw.githubusercontent.com/mono/SkiaSharp/main/scripts/get-skiasharp-pr.ps1) } 4276"

Step 2 — Add the local NuGet source

dotnet nuget add source ~/.skiasharp/hives/pr-4276/packages --name skiasharp-pr-4276
More options

| Option | Description |
|---|---|
| --successful-only / -SuccessfulOnly | Only use successful builds |
| --force / -Force | Overwrite previously downloaded packages |
| --list / -List | List available artifacts without downloading |
| --build-id ID / -BuildId ID | Download from a specific build |

Or download manually from Azure Pipelines — look for the nuget artifact on the build for this PR.

Remove the source when you're done:

dotnet nuget remove source skiasharp-pr-4276

mattleibow and others added 3 commits June 29, 2026 16:04
BenchmarkDotNet 0.13.5 treats --filter as a single list option and errors with
"Option 'f, filter' is defined multiple times" when the flag is repeated, so
the multiple benchmark globs must be passed as values of one --filter instead of
one flag each. Also point --artifacts at the artifact root so the JSON lands in
bench-out/results next to meta.json (no redundant results/results nesting).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
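
The single --filter rule described in the commit message above can be sketched as follows. The helper `benchmark_args` is hypothetical; `--filter` and `--artifacts` are the options the commit names, and the glob patterns are illustrative.

```python
def benchmark_args(globs, artifacts_dir):
    """Build benchmark CLI args with every glob passed as a value of one --filter."""
    args = []
    if globs:
        # One flag followed by all patterns; repeating the flag is what triggered the error.
        args += ["--filter", *globs]
    args += ["--artifacts", artifacts_dir]
    return args

args = benchmark_args(
    ["*BlurImageFilterBenchmark*", "*SurfaceCanvasBenchmark*"],
    "bench-out",
)
print(args)
```
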
GitHub's macOS runners resolve "shell: bash" to Apple's /bin/bash 3.2, where
expanding an empty array under "set -u" (the optional EXTRA_FEED args) aborts
the step with "unbound variable" before dotnet runs. Linux and Windows use
bash 5.x and were unaffected. Use the bash 3.2-safe ${arr[@]+"${arr[@]}"}
expansion so the optional feed args are omitted cleanly when unset.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The 'current' cells previously ran the in-repo SkiaSharp.Benchmarks project,
whose ProjectReference/native-asset chain pulls the multi-targeted binding
graph (Android/iOS native-asset projects), which requires mobile workloads
that the benchmark runners do not have (NETSDK1147), so the run never built.

Instead, reuse the proven Compare harness: restore the baseline published
managed package, then replace its native libSkiaSharp in the NuGet global
cache with the one freshly built from this PR. BenchmarkDotNet builds its
child project against that cache at run time, so the benchmark exercises the
working tree's native code with a known, stable managed API. This avoids the
multi-TFM/workload build entirely and works on the runners that already build
the native library (Linux, macOS).

Also attempt to install the MSVC Spectre-mitigated libs on Windows so the
from-source native build can link; this stays best-effort (continue-on-error).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions


📖 Documentation Preview

The documentation for this PR has been deployed and is available at:

🔗 View Staging Site
🔗 View Staging Docs
🔗 View Staging Gallery (Blazor)
🔗 View Staging Gallery (Uno Platform)
🔗 View Staging SkiaFiddle

This preview will be updated automatically when you push new commits to this PR.


This comment is automatically updated by the documentation staging workflow.

…build-flag caveat

The Windows hosted image cannot build libSkiaSharp from source (missing
Windows SDK 10.0.19041 and Spectre-mitigated MSVC libs on the VS preview
image), so it produced a perpetually failing 'current' cell. Restrict the
'current' matrix to the runners where the from-source native build works
(Linux and macOS) via a new current_oses input, while every OS still takes
part in the published comparison.

Also document an important caveat: the from-source 'externals-*' native is not
built with the official NuGet's optimization flags, so 'current' absolute
timings differ from the optimized published packages. A native PR should be
assessed by comparing two 'current' runs (PR vs base) built the same way, not
by reading a current-vs-published ratio.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>