Add blame-hang-timeout to .NET test runs so hangs fail fast #4142
Open
mattleibow wants to merge 1 commit into
The macOS (.NET Core) test job intermittently hangs for 30 min to 3 h (vs a healthy ~8 min) and occasionally hits the 180-min job timeout. There is currently no per-test or process-level hang timeout, so a stuck test host burns to the job cap and AzDO retries the whole tests-netcore step up to 3x.

Add a process-level safety net via the VSTest blame-hang collector in the shared RunDotNetTest helper (covers all .NET Core desktop test projects run by the tests-netcore target). 15m is safely above the worst observed passing run (~13 min) so it won't false-fire, while bounding a true hang. A 'none' dump type avoids large artifacts. --blame-hang-timeout auto-enables blame mode, so no separate --blame flag is needed.

The device/WASM runner path (RunDeviceRunnersTest) is intentionally left unchanged: those tests run in-process inside a MAUI/Blazor host and do not use the VSTest blame collector.

Refs #4139

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
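Expressed as a plain command line, the change amounts to appending two VSTest flags to `dotnet test`. The sketch below is a hedged illustration, not the actual Cake code in `test-shared.cake`: `BLAME_ARGS` and `dotnet_test_with_blame` are hypothetical names, and the project path and filter in the usage comment are placeholders.

```bash
#!/usr/bin/env bash
set -euo pipefail

# VSTest blame-hang flags described above: a 15-minute hang timeout
# (above the ~13 min worst observed passing run) and no dump file.
# --blame-hang-timeout implies blame-hang mode, so --blame is not passed.
BLAME_ARGS=(--blame-hang-timeout 15m --blame-hang-dump-type none)

# Hypothetical helper mirroring what the ArgumentCustomization appends:
# the caller supplies the project and any extra arguments, and the
# blame-hang flags are added at the end.
dotnet_test_with_blame() {
  local project="$1"; shift
  dotnet test "$project" "$@" "${BLAME_ARGS[@]}"
}

# Usage (placeholder project path and filter):
#   dotnet_test_with_blame path/to/Some.Tests.csproj \
#     --filter "FullyQualifiedName~SKBitmapThreadingTest"
```

With `--blame-hang-dump-type none`, a hang surfaces as a terminated test host and a failed step rather than as a dump; switching to `mini` or `full` trades artifact size for a stack trace.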
Contributor
📦 Try the packages from this PR

Warning: Do not run these scripts without first reviewing the code in this PR.

Step 1: Download the packages

bash / macOS / Linux:
curl -fsSL https://raw.githubusercontent.com/mono/SkiaSharp/main/scripts/get-skiasharp-pr.sh | bash -s -- 4142

PowerShell / Windows:
iex "& { $(irm https://raw.githubusercontent.com/mono/SkiaSharp/main/scripts/get-skiasharp-pr.ps1) } 4142"

Step 2: Add the local NuGet source
dotnet nuget add source ~/.skiasharp/hives/pr-4142/packages --name skiasharp-pr-4142

Or download manually from Azure Pipelines (look for the …).

Remove the source when you're done:
dotnet nuget remove source skiasharp-pr-4142
Contributor
📖 Documentation Preview

The documentation for this PR has been deployed and is available at: 🔗 View Staging Site

This preview will be updated automatically when you push new commits to this PR.

This comment is automatically updated by the documentation staging workflow.
mattleibow added a commit that referenced this pull request Jun 11, 2026
Follow-up from dual-model PR review:

- Add the same always-run managed-only smoke test to the migrated SkiaSharp.Views.Gtk4.Tests project. Its other tests initialise native GTK4 in their constructors and skip every test on a headless/GTK-less agent; under Microsoft.Testing.Platform that all-skipped run would exit 8 (failure). The smoke test exercises pure-managed SkiaSharp geometry types (SKPointI/SKSizeI: no GTK, no native call). Verified: the suite runs 30 tests (1 executed, 29 skipped) and exits 0. (This project is not currently wired into CI, but it was migrated to MTP, so the guard keeps it safe if it is ever run standalone or added to a leg.)
- Document in test-shared.cake why the hang-dump uses `--hangdump-type Mini` rather than #4142's VSTest `none`: MTP only detects a per-test hang via the HangDump extension, which always writes a dump (no "none" type exists); the global `--timeout` aborts without a dump but is a whole-session timeout, not per-test, so it is unsuitable. Mini is the smallest dump and only materialises on an actual hang.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
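The exit-8 behaviour is what makes the smoke test necessary: a wrapper script sees the same non-zero code whether zero tests were discovered or every test was skipped. The following is a hypothetical sketch (neither `describe_mtp_exit` nor `run_and_report` is part of the SkiaSharp scripts) that labels the Microsoft.Testing.Platform exit codes relevant here and propagates them unchanged, so exit 8 is reported rather than masked. Only codes 0, 2, and 8 are named; everything else falls through to a generic message.

```bash
#!/usr/bin/env bash

# Hypothetical helper: translate a Microsoft.Testing.Platform exit code into a
# human-readable label. Only the codes relevant to this discussion are named.
describe_mtp_exit() {
  case "$1" in
    0) echo "success" ;;
    2) echo "at least one test failed" ;;
    8) echo "zero tests ran (an all-skipped suite also ends here)" ;;
    *) echo "other failure (exit code $1)" ;;
  esac
}

# Hypothetical wrapper: run a test command, log what its exit code means on
# stderr, and return the same code so a zero-test run still fails the step.
run_and_report() {
  local code=0
  "$@" || code=$?
  echo "test run: $(describe_mtp_exit "$code")" >&2
  return "$code"
}

# Usage (placeholder executable path):
#   run_and_report ./path/to/SkiaSharp.Views.Gtk4.Tests
```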
mattleibow added a commit that referenced this pull request Jun 16, 2026
Migrate test suite from xUnit v2 to xUnit v3 (#4143)

Context: #4139

The macOS CI legs intermittently hung under the xUnit v2 console runner, which had no per-test timeout to recover a wedged test host. Rather than only paper over the hang, this migrates the entire test suite to xUnit v3 and drops two pieces of hand-rolled infrastructure in favour of native v3 features.

Runners:

* Desktop `dotnet test` projects move to Microsoft.Testing.Platform (MTP): OutputType=Exe, xunit.v3 + Microsoft.Testing.Extensions.TrxReport/HangDump, dropping Microsoft.NET.Test.Sdk, xunit.runner.visualstudio and XunitXml.TestLogger.
* Device (MAUI) and WASM (Blazor) in-app runners move to DeviceRunners *.Xunit3 (.AddXunit3()) at 0.1.0-preview.11.
* SkiaSharp.Views.Gtk4.Tests is migrated and wired into the tests-netcore leg so its conversion tests run in CI for the first time.
* net48 (x86 and x64) is unchanged: each architecture now builds its own runnable exe, so the old console-runner `is32` bitness selector is removed as dead code.

Native dynamic skip (drops Xunit.SkippableFact):

* [SkippableFact]/[SkippableTheory] -> [Fact]/[Theory]
* Skip.If/Skip.IfNot -> Assert.SkipWhen/Assert.SkipUnless
* throw new SkipException(...) -> Assert.Skip(...)

Assembly fixtures (drops the custom test framework): CustomTestFramework.cs and AssemblyFixtureAttribute.cs are deleted and the GarbageCleanupFixture rebinds to v3's native Xunit.AssemblyFixtureAttribute. ITestOutputHelper moves from Xunit.Abstractions to Xunit, and IAsyncLifetime now returns ValueTask.

Zero-executed-test guard, with no masking: MTP exits 8 when zero tests run, and a dynamically skipped test counts as not-run, so a fully skipped suite also exits 8 (v2/VSTest treated all-skipped as success). Instead of suppressing exit 8 (which would also hide a real zero-discovery misconfiguration), the hardware-gated Vulkan and Direct3D suites each gain an always-run SmokeTest exercising a backend type that needs no GPU runtime (GRVkImageInfo, GRD3DTextureResourceInfo). The headless Linux Gtk4 leg is handled the same way: gtk_init/gtk_init_check call native exit() with no display, so the three display-dependent tests are gated behind a managed DISPLAY/WAYLAND_DISPLAY check before any GTK call while the ~26 conversion tests still run (libgtk-4-1 is installed on the agent). No --ignore-exit-code or allowNoTests masking remains.

Hang protection / #4142 reconciliation: RunDotNetTest forwards MTP hang-dump args (--hangdump --hangdump-timeout 15m --hangdump-type Mini), the MTP equivalent of #4142's VSTest --blame-hang-timeout. MTP HangDump has no "none" type, so the smallest (Mini) is used; these supersede #4142's VSTest blame flags on merge.

Packaging: DeviceRunners.*.Xunit3 are now mirrored to the dnceng dotnet-public feed, so nuget.config restores exclusively from the two dnceng mirrors with no nuget.org source and no packageSourceMapping. No package uses a floating `*` version; SkiaSharp.Tests.Integration requires an explicit -p:SkiaSharpVersion= and fails fast via a ValidateVersions target. CI test-results publishing is switched from xUnit/TestResults.xml to VSTest/*.trx to match MTP's output so the desktop legs (including the 32-bit run) keep publishing.

Closes #4139.

Co-authored-by: Matthew Leibowitz <mattleibow@live.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
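For reference, the hang-dump arguments from the hang-protection paragraph can be passed straight to a Microsoft.Testing.Platform test executable, since the migrated projects build as `OutputType=Exe`. This is a sketch under assumptions: `HANGDUMP_ARGS` and `run_mtp_tests` are hypothetical names, the executable path in the usage comment is a placeholder, and how `RunDotNetTest` actually forwards the arguments through Cake is not shown here.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hang-dump arguments as listed in the #4143 commit message:
#   --hangdump               enable the HangDump extension
#   --hangdump-timeout 15m   timeout after which HangDump writes a dump
#   --hangdump-type Mini     Mini dump type (the smallest, per the commit)
HANGDUMP_ARGS=(--hangdump --hangdump-timeout 15m --hangdump-type Mini)

# Hypothetical wrapper: run an MTP test executable with caller-supplied
# arguments followed by the hang-dump arguments.
run_mtp_tests() {
  local test_exe="$1"; shift
  "$test_exe" "$@" "${HANGDUMP_ARGS[@]}"
}

# Usage (placeholder path; --report-trx comes from the TrxReport extension):
#   run_mtp_tests ./path/to/SkiaSharp.Tests --report-trx
```

Unlike the VSTest `none` setting, any hang detected this way leaves a Mini dump behind, which is the trade-off the commit message accepts in exchange for per-test hang detection.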
Summary
Adds a process-level hang-timeout safety net so a stuck .NET Core test fails fast (~15 min) instead of burning to the 180-min job cap.
This came out of the investigation into intermittent `macOS (.NET Core)` test-job hangs; see #4139 for the full root cause.

Background
The `macOS (.NET Core)` test job intermittently hangs for 30 min – 3 h (vs a healthy ~8 min) and sometimes hits the 180-min job timeout. Root cause: a GC-finalizer-bound stress test (`SKBitmapThreadingTest.ImageScalingMultipleThreadsTest`) can take 13–15 min or crash the test host on a contended Microsoft-hosted macOS VM. When the host crashes, AzDO retries the entire `tests-netcore` step up to 3× (`retryCountOnTaskFailure: 3`), bounded only by `timeoutInMinutes: 180`. There is currently no per-test or process-level hang timeout.

Change
In the shared `RunDotNetTest` helper (`scripts/infra/tests/test-shared.cake`), append the VSTest blame-hang collector flags to the `ArgumentCustomization`. This single place covers all .NET Core desktop test projects run by the `tests-netcore` target. The flags set a 15-minute hang timeout with the `none` dump type:

- `--blame-hang-timeout` auto-enables blame-hang mode (no separate `--blame` needed).
- The `none` dump type avoids large dump artifacts (can revisit to `mini`/`full` later if a stack trace is wanted).

The device/WASM runner path (`RunDeviceRunnersTest`) is intentionally left unchanged: those tests run in-process inside a MAUI/Blazor host (DeviceRunners) and do not use the VSTest blame collector, so these flags don't apply there.

Verification
C#-only/infra change: bootstrapped with `externals-download` (no native rebuild). Ran a filtered subset of the netcore tests locally; the runner accepted the `--blame-hang-timeout`/`--blame-hang-dump-type` flags and executed tests normally (no argument-parsing rejection).

Refs #4139