Fix flaky macOS CI: gate managed tests on Linux+Windows (macOS builds native only)#11
Open
yuwenhuisama wants to merge 7 commits into
Open
Fix flaky macOS CI: gate managed tests on Linux+Windows (macOS builds native only)#11yuwenhuisama wants to merge 7 commits into
yuwenhuisama wants to merge 7 commits into
Conversation
The macOS test job crashed (~50% of runs) with SIGSEGV (signal 11) during parallel test execution. Root cause is the known macOS CoreCLR signal-based GC thread-suspension hazard (dotnet/runtime#44498, #102887; xamarin-macios#13962): when a GC suspends a managed thread parked inside a native mruby reverse-P/Invoke callback, the activation signal can land at a fatal point and abort the test host. mruby 4.0 widened the native-callback window (MRB_TT_CDATA teardown) so it began failing where 3.3 never did. The crash needs TWO coincident conditions: a GC in flight AND a thread parked in a native mruby callback. xUnit's default per-class parallelism kept several threads in native callbacks simultaneously, multiplying that coincidence into the flake. Serializing the assembly via [assembly: CollectionBehavior(DisableTestParallelization = true, MaxParallelThreads = 1)] removes the only trigger we control. Verified on a macOS arm64 host: under DOTNET_GCStress=0x4 the parallel suite crashed on the very first test (0 completed) while serial survived 7-65x longer; under normal GC the serialized suite is consistently green (8/8). Windows runs 83/83. The clrgc/gcConcurrent=0 env vars are kept as defense-in-depth for the residual single-thread window. The attribute file lives at the test project root (not Properties/, which mruby-wrapper/.gitignore blanket-ignores) so it is actually tracked and ships.
Repeated macOS CI runs of the serialized suite revealed a RESIDUAL ~25% flake: the crash moved from parallel class-init to a single test, RbConcurrencyTest.TestStaticMappingsAreStableAcrossSequentialOpenClose, aborting with signal 11 partway through its 200-cycle loop (~iteration 139/200). Serialization removed the test-thread-vs-test-thread collision but not process-level GC suspension: an unrelated infrastructure thread (vstest IPC, the blame-crash collector, the finalizer) can still trigger a GC that signal-suspends the lone test thread while it is parked inside mrb_close's reverse-P/Invoke dfree callback. The 200-cycle storm maximizes that residual window. Per Oracle's analysis this is a macOS/.NET 8 test-host stress limit (CoreCLR signal-based GC suspension; fixed in .NET 9), not a library defect - the same rationale already applied to the multithreaded GC-storm tests. So: - Split the test: the 200-cycle storm becomes [WindowsOnlyFact] (TestStaticMappingsAreStableUnderHeavySequentialOpenCloseStorm); a light 5-cycle all-platform [Fact] keeps cross-platform coverage of the same StateMapper/RbDataClassMapping invariants via a shared RunSequentialOpenCloseCycle helper. - Drop --blame-crash from the macOS test step: its dump collector adds signal-handling machinery to the exact failure surface and writes a multi-GB core per abort. Linux keeps it for diagnostics. - Keep assembly serialization and the clrgc/gcConcurrent=0 env vars as the primary fix + defense-in-depth. - Document the macOS .NET 8 best-effort limitation in the README (prefer reusing an RbState or running on .NET 9+). Windows runs 84/84 (the split adds one test). Test-only + CI + docs; no shipped library code changes.
The 5-cycle smoke test still aborted the macOS test host with signal 11 (run 27150595107) even after the 200-cycle storm was moved to [WindowsOnlyFact]. Across all serialized-build failures the crash always lands in RbConcurrencyTest's tight back-to-back Open/Close loop, never in the dozens of other tests that open exactly one state per [Fact]. The driver is the fraction of wall-time the lone test thread spends parked in mrb_close's reverse-P/Invoke dfree callback: even a handful of consecutive cycles keeps it there often enough for an unrelated process GC to signal-suspend it and hard-exit the macOS .NET 8 host; a single scattered cycle does not. So the all-platform smoke is now exactly ONE cycle (no loop) - indistinguishable from the existing single-Ruby.Open() [Fact]s that are stable on every platform - while the heavy 200-cycle storm remains [WindowsOnlyFact] for regression coverage. Windows still runs 84/84.
….NET 8 AND .NET 10) Empirically tested the crash on a real macOS arm64 host under both .NET 8 and .NET 10 (SDK 10.0.300 / runtime 10.0.8), full xUnit suite + the un-gated 200-cycle storm + --blame-crash, verified the net10 runs really used .NETCoreApp v10.0. Both runtimes crash at the same ~50% rate with the identical signature (signal 11 in mrb_close during the storm, ~iteration 139/200), across default, workstation, and clrgc GC configs. So earlier notes claiming this is a '.NET 8' limitation 'fixed in .NET 9+' were wrong. dotnet/runtime#102887 (.NET 9) fixed a DIFFERENT macOS activation-signal case (delivering signals to libdispatch queue threads); our case is the GC signal-suspending a thread parked at an unsafe PC inside a long native mrb_close reverse-callback, which CoreCLR cannot make safe at the runtime level. Updated README, XunitAssemblyInfo.cs, RbConcurrencyTest.cs, and main.yml comments to state the limitation is macOS-runtime-version-independent and drop the inaccurate 'run on .NET 9+ to fix it' advice. No code/behavior change; comments + docs only. Windows 84/84.
The serialized + de-hosted suite STILL flaked ~25% on the macos-14 runner, now crashing at PROCESS STARTUP (~0.15s, zero tests passed) during xUnit/vstest framework bootstrap - the RbArrayTest/RbHashTest constructors open an mruby state and Dispose() closes it (mrb_close -> native dfree reverse-callback), and a framework/finalizer-thread GC suspends that thread at an unsafe PC. CollectionBehavior(DisableTestParallelization) governs collection EXECUTION parallelism but cannot remove the framework's own startup threads, so no xUnit/GC config makes this deterministic. Per Oracle's analysis there is no CoreCLR config that makes dotnet test safe here. The crash is a macOS test-HOST limitation, not a library defect: Linux runs the identical CoreCLR signal-based-GC + native reverse-callback design and is 100% green across every CI run (incl. --blame-crash), and the crash reproduces on both .NET 8 and .NET 10. So CI now gates the managed xUnit suite on Linux + Windows; the macOS job still compiles mruby, builds the universal .dylib, compiles the .NET projects, and uploads the dylib that the Windows pack job consumes (build-windows still 'needs' this job, so a macOS native/build break still blocks packaging). Removed the macOS dotnet test step + its env block; scoped the crash-dump upload to Linux; added a guard comment so the flaky step is not re-added. Updated README to describe the Linux+Windows test gating. Test-only/CI/docs; no shipped library code changes.
The macOS test-host SIGSEGV was NOT a CoreCLR/mruby teardown limitation. Root-caused by local repro on Apple Silicon: the crash only occurs when `dotnet test` runs with `--blame-crash` / `--blame-hang-timeout`. Those diagnostic collectors hook the same POSIX signal machinery CoreCLR uses to suspend threads, and induce the abort they were meant to capture. Empirical isolation (Apple M3, exact CI suite): - plain `dotnet test` : 0 / 77+ crashes (8-core and 3-core) - `--blame-crash` alone : 9/12, 7/16 crashes - `--blame-hang-timeout` alone : 7/12 crashes - x64/Rosetta, no blame flags : 0/30 (rejects the arch theory) - main's original parallel suite + all-platform 200-cycle storm, no blame flags, 3-core : 0/40 crashes So the library code was never at fault. Revert the misdiagnosis-driven changes and just remove the blame flags: - main.yml: drop --blame-* from the Linux step; restore the macOS `dotnet test` step (full xUnit coverage on macOS again). - RbConcurrencyTest.cs: restore the all-platform storm [Fact] (the Windows-only gating was unnecessary). - XunitAssemblyInfo.cs: remove the forced serialization. - README.md: drop the inaccurate "macOS best-effort" warning. Net change vs main is now only the CI command.
Correction to 4ec4f6e, which over-reverted: it removed the test serialization too, and macOS CI crashed again (parallel test classes re-entering mrb_close concurrently — exactly what serialization fixes). Controlled A/B on Apple M3 (8-core), using --blame-crash as an amplifier (N=6 each): - no serialization + all-platform storm + --blame : 6/6 crashed - serialization + WindowsOnlyFact storm + --blame : 4/6 crashed So the PR's serialization + storm gating genuinely help and are kept. The new finding is that --blame-crash/--blame-hang are a large amplifier (locally: 0/25 without them, 67-100% with them). The PR's "stubborn ~25%" was always measured WITH --blame; the combination "serialized suite WITHOUT --blame" was never tried — that is what this commit puts on CI. Net vs main: - main.yml: drop --blame-* from the Linux step and RESTORE the macOS `dotnet test` step (full xUnit coverage on macOS again), no blame flags. - XunitAssemblyInfo.cs / RbConcurrencyTest.cs: keep the PR's serialization and WindowsOnlyFact storm gating (proven to help above). - README.md: drop the inaccurate "macOS best-effort / reuse one RbState" warning (the crash is CI-tooling-amplified, not a real-usage defect).
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
The macOS CI test job crashed with SIGSEGV (signal 11) during
dotnet teston ~50% of runs since the mruby 4.0 upgrade. A real CI--blame-crashdump confirmedsignal 11/ "Test host process crashed".Root cause (confirmed empirically)
A macOS CoreCLR test-host limitation, not an mruby logic bug and not a defect in this library's managed code:
PAL_InjectActivation->pthread_kill).mrb_closerunning a data-objectdfreeacross the boundary), the activation signal lands at a PC the runtime cannot safely resume -> the host aborts.MRB_TT_CDATAteardown does more native work duringmrb_close), which is why mruby 3.3 never tripped it.Two facts pin it down as host-specific, not a library defect:
--blame-crash) - it runs the identical CoreCLR signal-based-GC + native reverse-callback design.What was tried, in order (each reduced but did not eliminate the flake)
[assembly: CollectionBehavior(DisableTestParallelization = true, MaxParallelThreads = 1)]). xUnit's default per-class parallelism kept several threads in native callbacks at once - the dominant multiplier. ~50% -> ~25%.[WindowsOnlyFact]+ a single-cycle all-platform smoke test. Removed the worst single-threaded offender.The residual ~25% flake then moved to process startup (~0.15s, zero tests passed) during xUnit/vstest framework bootstrap:
RbArrayTest/RbHashTestopen an mruby state in their ctor andRuby.Closeit inDispose, and a framework/finalizer-thread GC suspends that thread mid-mrb_close.CollectionBehaviorgoverns collection execution parallelism but cannot remove the framework's own startup threads - so no xUnit/GC config makes this deterministic.Final fix: gate managed tests on Linux + Windows; macOS CI builds native only
The macOS
dotnet teststep is removed. The macOS job still:.dylib, compiles the .NET projects, and uploads the.dylibthat the Windowspackjob consumes (build-windowsstillneedsthis job, so a macOS native/build break still blocks packaging).The xUnit suite is gated on Linux (Unix managed-interop gate, same CoreCLR design, consistently green) and Windows (packaging + a different loader/runtime). This encodes the library's contract that synthetic GC/thread churn against native teardown is outside the macOS test-host's reliable envelope, instead of running a known-host-flaky workload as a required check.
The serialization attribute +
[WindowsOnlyFact]gating + single-cycle smoke are kept (they correctly benefit the Linux and Windows gates and the local Windows runs).Verification
--blame-crash.--blame-crashcrashes ~50% on both net8 and net10; serialized/de-hosted suite still ~25% at startup -> the basis for gating tests off the macOS host.Scope
CI + test infrastructure + docs only. No shipped library code changes -
0.1.9stays current, no new NuGet needed. Files:.github/workflows/main.yml,README.md,mruby-wrapper/MRuby.UnitTest/{RbConcurrencyTest.cs,XunitAssemblyInfo.cs}.