Add cross-version benchmarks CI workflow #4276
Open
mattleibow wants to merge 5 commits into
Run the SkiaSharp micro-benchmarks on every PR that touches benchmarks and compare performance across multiple SkiaSharp versions, so changes like the SK_AVOID_SLOW_RASTER_PIPELINE_BLURS native blur fix can be measured with real numbers instead of being eyeballed.

BenchmarkDotNet cannot host two versions of the same assembly in one process, so the workflow benchmarks each (operating system, version) combination in its own job and merges the JSON exports afterwards:

* SkiaSharp.Benchmarks.Compare - an isolated harness that links the same benchmark sources but restores a published SkiaSharp NuGet version. It deliberately opts out of the repo build infrastructure (empty Directory.Build.props/targets) and carries its own NuGet.config so it can resolve the exact released version it is asked to benchmark. Used for the reliable "published" comparison columns (latest stable, a 3.x release).
* The in-repo SkiaSharp.Benchmarks project benchmarks the working tree after a single-arch native build (Linux via the repo's cross Docker image, Windows and macOS via cake). This "current" path is best-effort (continue-on-error) because a from-source native build on a hosted runner is slow and can fail for reasons unrelated to the benchmarks.

A shared BenchmarkConfig adds the JsonExporter.FullCompressed export (which cannot be selected from the command line in this BenchmarkDotNet version) so every run produces the JSON the merge step consumes.

scripts/benchmarks/merge-benchmarks.py combines the per-run results into one Markdown table (mean microseconds plus a ratio-vs-baseline column) written to the job summary.

Also adds a BlurImageFilterBenchmark that exercises the 8888 raster blur path affected by the native flag (small-sigma slow path vs large-sigma control), and switches SurfaceCanvasBenchmark to SKPath so the shared sources compile against older SkiaSharp releases that predate SKPathBuilder.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
📦 Try the packages from this PR

Warning: Do not run these scripts without first reviewing the code in this PR.

Step 1: Download the packages

bash / macOS / Linux:

    curl -fsSL https://raw.githubusercontent.com/mono/SkiaSharp/main/scripts/get-skiasharp-pr.sh | bash -s -- 4276

PowerShell / Windows:

    iex "& { $(irm https://raw.githubusercontent.com/mono/SkiaSharp/main/scripts/get-skiasharp-pr.ps1) } 4276"

Step 2: Add the local NuGet source

    dotnet nuget add source ~/.skiasharp/hives/pr-4276/packages --name skiasharp-pr-4276

Or download manually from Azure Pipelines.

Remove the source when you're done:

    dotnet nuget remove source skiasharp-pr-4276
BenchmarkDotNet 0.13.5 treats --filter as a single list option and errors with "Option 'f, filter' is defined multiple times" when the flag is repeated, so the multiple benchmark globs must be passed as values of one --filter instead of one flag each. Also point --artifacts at the artifact root so the JSON lands in bench-out/results next to meta.json (no redundant results/results nesting).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
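To illustrate the single-option rule from this commit, here is a minimal Python sketch of how an argument list could be assembled so that `--filter` appears exactly once. The helper name `benchmark_args` and the example globs are hypothetical and are not taken from the workflow; only the `--filter` and `--artifacts` options come from the commit message.

```python
def benchmark_args(globs, artifacts_dir):
    """Build BenchmarkDotNet CLI arguments with a single --filter option.

    `benchmark_args` is a hypothetical helper for illustration only.
    """
    globs = [g for g in globs if g]  # drop empty entries
    if not globs:
        raise ValueError("at least one benchmark glob is required")
    # One --filter followed by every glob, rather than one --filter per glob,
    # because repeating the flag is rejected as "defined multiple times".
    return ["--filter", *globs, "--artifacts", artifacts_dir]


if __name__ == "__main__":
    # Example globs are assumptions; the real filter values live in the workflow.
    print(benchmark_args(["*BlurImageFilterBenchmark*", "*SurfaceCanvasBenchmark*"], "bench-out"))
```

The resulting list can be appended to the benchmark command line as-is; keeping the globs as separate list elements avoids shell quoting problems with `*`.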
GitHub's macOS runners resolve "shell: bash" to Apple's /bin/bash 3.2, where
expanding an empty array under "set -u" (the optional EXTRA_FEED args) aborts
the step with "unbound variable" before dotnet runs. Linux and Windows use
bash 5.x and were unaffected. Use the bash 3.2-safe ${arr[@]+"${arr[@]}"}
expansion so the optional feed args are omitted cleanly when unset.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The 'current' cells previously ran the in-repo SkiaSharp.Benchmarks project. Its ProjectReference/native-asset chain pulls in the multi-targeted binding graph (Android/iOS native-asset projects), which requires mobile workloads that the benchmark runners do not have (NETSDK1147), so the run never built.

Instead, reuse the proven Compare harness: restore the baseline published managed package, then replace its native libSkiaSharp in the NuGet global cache with the one freshly built from this PR. BenchmarkDotNet builds its child project against that cache at run time, so the benchmark exercises the working tree's native code with a known, stable managed API. This avoids the multi-TFM/workload build entirely and works on the runners that already build the native library (Linux, macOS).

Also attempt to install the MSVC Spectre-mitigated libs on Windows so the from-source native build can link; this stays best-effort (continue-on-error).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
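The native swap described in this commit can be sketched in Python as follows. This is an illustration under stated assumptions, not the workflow's actual step: the function names, the use of SHA-256, and the file paths are all hypothetical. The source only says that a freshly built libSkiaSharp replaces the one in the restored package inside the NuGet global cache.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """Return the hex SHA-256 digest of a file (hash choice is an assumption)."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def swap_native(built_lib, cached_lib):
    """Overwrite the cached native library with the freshly built one.

    Both paths are supplied by the caller; the real locations inside the
    NuGet global cache are not reproduced here.
    """
    before = sha256_of(cached_lib)
    shutil.copyfile(built_lib, cached_lib)
    after = sha256_of(cached_lib)
    # Log both digests so a reader can see which binary will be measured.
    print(f"native before swap: {before}")
    print(f"native after swap:  {after}")
    if after != sha256_of(built_lib):
        raise RuntimeError("cached library does not match the built library")
    return before, after
```

Comparing the post-swap digest against the built file guards against a silent copy failure, which would otherwise cause the published native to be benchmarked under the 'current' label.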
…build-flag caveat

The Windows hosted image cannot build libSkiaSharp from source (missing Windows SDK 10.0.19041 and Spectre-mitigated MSVC libs on the VS preview image), so it produced a perpetually failing 'current' cell. Restrict the 'current' matrix to the runners where the from-source native build works (Linux and macOS) via a new current_oses input, while every OS still takes part in the published comparison.

Also document an important caveat: the from-source 'externals-*' native is not built with the official NuGet's optimization flags, so 'current' absolute timings differ from the optimized published packages. A native PR should be assessed by comparing two 'current' runs (PR vs base) built the same way, not by reading a current-vs-published ratio.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
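As a rough model of the matrix this commit describes, the Python sketch below expands every OS against every published version and adds 'current' cells only for the OSes listed in `current_oses`. The function name `build_matrix`, the dictionary shape, and the OS labels are hypothetical; the real matrix is defined in the workflow file and may look different. The two version strings are the defaults named in the PR description.

```python
from itertools import product


def build_matrix(oses, versions, current_oses, build_current=True):
    """Return one cell per benchmark job (hypothetical shape, for illustration)."""
    # Every OS takes part in the published comparison.
    cells = [
        {"os": os_name, "kind": "published", "version": version}
        for os_name, version in product(oses, versions)
    ]
    if build_current:
        # 'current' is limited to OSes whose native library builds from source.
        cells += [
            {"os": os_name, "kind": "current", "version": "current"}
            for os_name in oses
            if os_name in current_oses
        ]
    return cells


if __name__ == "__main__":
    matrix = build_matrix(
        oses=["linux", "windows", "macos"],          # labels are assumptions
        versions=["4.150.0-preview.2.1", "3.119.4"],
        current_oses={"linux", "macos"},
    )
    for cell in matrix:
        print(cell)
```

With three OSes, two versions, and Windows excluded from `current_oses`, this yields six published cells and two 'current' cells.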
What
Adds a PR-triggered Benchmarks workflow that runs the SkiaSharp micro-benchmarks across multiple SkiaSharp versions on Linux, Windows, and macOS, then merges the results into a single comparison report posted to the job summary.
This grew out of validating the native blur fix (`SK_AVOID_SLOW_RASTER_PIPELINE_BLURS`): there was no way to measure such changes with real numbers across versions and platforms. Now there is.

Why a build matrix instead of one process
BenchmarkDotNet cannot host two different versions of the same assembly in a single process (type-identity clash; there is no built-in `WithNuGet`). So each `(os, version)` combination is benchmarked in its own job and the JSON exports are merged afterwards.

How it works
* `published`: `SkiaSharp.Benchmarks.Compare`, an isolated harness that links the same benchmark sources but restores a published SkiaSharp NuGet version from nuget.org. It deliberately opts out of the repo build infrastructure (empty `Directory.Build.props`/`.targets`) and carries its own `NuGet.config` so it resolves the exact released version requested. Fast and reliable: it always reports real numbers on all three OSes. Default versions: `4.150.0-preview.2.1` (baseline) and `3.119.4`.
* `current`: benchmarks the working tree's native code. Rather than building the multi-targeted in-repo binding graph (which drags in the mobile native-asset projects and needs the Android/iOS workloads the runners don't have, hence `NETSDK1147`), it reuses the same `Compare` harness: it restores the baseline published managed package, then replaces that package's `libSkiaSharp` in the NuGet global cache with the one freshly built from this PR. BenchmarkDotNet builds its child project against that cache at run time, so the benchmark exercises this PR's native code with a known, stable managed API. The job logs the SHA before/after the swap to prove the working-tree binary is the one being measured. Best-effort (`continue-on-error`); scoped to the OSes whose native library reliably builds from source on a hosted runner (Linux + macOS).
* `report`: `scripts/benchmarks/merge-benchmarks.py` merges every run into one Markdown table (mean µs + ratio-vs-baseline) written to `$GITHUB_STEP_SUMMARY` and uploaded as an artifact.

⚠️ Reading the `current` column

The from-source `externals-*` native is not built with the same optimization/official-build flags as the shipped NuGet native, so its absolute timings are systematically different (typically slower) than the optimized published packages. Do not read a `current`-vs-`published` ratio as the effect of a PR's code. To measure a native change, compare two `current` runs built the same way (the PR branch vs its base). The `published` columns are the apples-to-apples comparison across released versions.

The Windows hosted image currently cannot build the native library from source (missing Windows SDK `10.0.19041` and Spectre-mitigated MSVC libs on the VS preview image), so Windows is excluded from `current` by default and still appears in the `published` comparison. It can be opted back in via the `current_oses` dispatch input once the toolchain is available.

Other changes
* `BenchmarkConfig` (shared) adds `JsonExporter.FullCompressed`. That export cannot be selected from the command line in this BenchmarkDotNet version, so it is configured in code to guarantee every run produces the JSON the merge step consumes.
* `BlurImageFilterBenchmark` exercises the 8888 raster blur path affected by the native flag (small-sigma slow path vs large-sigma control).
* `SurfaceCanvasBenchmark` switched to `SKPath` (CS0618 suppressed) so the shared sources compile against older releases that predate `SKPathBuilder`.

Triggers
* `pull_request` touching `benchmarks/**`, `scripts/benchmarks/**`, or the workflow file.
* `workflow_dispatch` with inputs: `versions`, `filter`, `job` (short/default), `build_current`, `current_oses`, `extra_feed` (e.g. point at a PR preview package source).

Validation: real CI numbers
Proven on real GitHub-hosted runners (run `28379961911`): all six `published` cells (3 OSes × 2 versions) plus both `current` cells (Linux x64, macOS arm64) succeeded and merged into one report. The `current` cells logged the native SHA swap, e.g. macOS built `726ab1f0…` replacing the published `4523f8db…`, confirming the working-tree binary was measured. Example (mean µs, baseline `4.150.0-preview.2.1`):

(The macOS `current` being slower than `published` despite identical m150 code is exactly the build-flag caveat above.)
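For readers unfamiliar with the report format, the following Python sketch shows one way a mean-plus-ratio table could be rendered. It is not the contents of `scripts/benchmarks/merge-benchmarks.py`: the function name `ratio_table`, the input shape (a mapping of version label to per-benchmark mean in microseconds), and the column layout are assumptions. The numbers in the usage example are made up for illustration and are not CI results.

```python
def ratio_table(results, baseline):
    """Render a Markdown table of mean µs plus ratio-vs-baseline.

    results: {version_label: {benchmark_name: mean_us}} (hypothetical shape).
    """
    versions = [baseline] + sorted(v for v in results if v != baseline)
    names = sorted({name for per_version in results.values() for name in per_version})

    header = ["Benchmark"]
    for version in versions:
        header.append(f"{version} (µs)")
        if version != baseline:
            header.append(f"{version} ratio")

    lines = [
        "| " + " | ".join(header) + " |",
        "|" + "|".join(" --- " for _ in header) + "|",
    ]
    for name in names:
        base = results.get(baseline, {}).get(name)
        row = [name]
        for version in versions:
            mean = results.get(version, {}).get(name)
            row.append("n/a" if mean is None else f"{mean:.2f}")
            if version != baseline:
                # The ratio is only defined when both values exist and base is non-zero.
                usable = mean is not None and base
                row.append(f"{mean / base:.2f}x" if usable else "n/a")
        lines.append("| " + " | ".join(row) + " |")
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up numbers, purely to show the output shape.
    demo = {"base": {"Blur": 10.0, "Path": 4.0}, "other": {"Blur": 15.0}}
    print(ratio_table(demo, baseline="base"))
```

A ratio above 1.00x means the compared version is slower than the baseline for that benchmark. Missing cells are rendered as `n/a` so that a failed best-effort job does not break the table.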