Perf #856: throw (not a returning call) at loop-backedge cancel path — closes objects/strings/closures to Node parity #876
Merged
…path

Every compiled loop polls a cooperative-cancellation flag at its backedge (#74). The flag read is free, but the cold path was `call CheckCancellation()`, a helper that throws internally. From RyuJIT's flow-graph view that is a *returning* call. On SysV x64 every XMM register is caller-saved, so a returning call inside a loop forces the loop-carried doubles (and the counter) to be stack-resident on every iteration. That means a load/store per use, which roughly doubles a tight numeric loop.

Fix: the backedge now emits `call $Runtime.BuildCancellationException(); throw`. The new factory only *constructs* the OperationCanceledException; the `throw` opcode happens at the backedge. Because `throw` does not return, the loop vars are dead on the cancel path and stay in registers on the hot path. CheckCancellation() is retained for the non-hot-loop sites (event loop, deep-recursion guard).

Cancellation semantics are unchanged: same exception, same message, thrown at the same point.

Controlled microbench (result*=i loop): `call CheckCancellation()` 2.09 ns/iter vs `throw Factory()` 1.15 ns/iter. The volatile-read-only loop is also 1.15 ns/iter, so the read was never the cost.

Real benchmarks @largest size, compiled vs Node:
- objects 2.52x slower -> 1.00x (parity)
- strings 1.26x slower -> 0.91x (faster than Node)
- closures 1.13x slower -> 1.02x (parity)
- count-primes 1.45x slower -> 1.13x
- factorial 2.27x slower -> 1.22x

5/7 workloads now meet-or-beat Node; the change benefits all compiled loops. This supersedes the inline-volatile form (#874), which removed the unconditional call overhead but left the returning call in the loop's flow graph. IL verifies; the #74 infinite-loop cancellation test still unwinds.
Summary
Closes most of the remaining compiled-vs-Node perf gap (epic #856) with a single low-risk codegen change to the loop-backedge cancellation check.
Every compiled loop polls a cooperative-cancellation flag at its backedge so the runner can unwind runaway loops (#74). The flag read is free, but the cold path was `call $Runtime.CheckCancellation()`, a helper that throws internally. From RyuJIT's flow-graph view that is a returning call. On SysV x64 every XMM register is caller-saved, so a returning call inside a loop forces the loop-carried doubles (and the counter) to be stack-resident on every iteration. That means a load/store per use, which roughly doubles a tight numeric loop.

Fix: the backedge now emits `call $Runtime.BuildCancellationException(); throw`. The new factory only constructs the `OperationCanceledException`; the `throw` opcode happens at the backedge. Because `throw` does not return, the loop vars are dead on the cancel path and stay in registers on the hot path. `CheckCancellation()` is retained for the non-hot-loop sites (event loop, deep-recursion guard).

This supersedes the inline-volatile form (#874). That form removed the unconditional call overhead but left the returning call in the loop's flow graph, so the XMM spill remained. A throttle-every-N variant performed no better than #874, for the same reason. The fix is structural: make the cancel path non-returning.
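Below is a minimal sketch of the new backedge shape using `System.Reflection.Emit`. It assumes a hypothetical `CancelRuntime` class in place of the emitted `$Runtime`. The class names, the flag field and the exception message are all illustrative; this is not the project's `StatementEmitterBase.EmitCancellationCheck`. The sketch builds a `result *= i` loop whose backedge emits the following sequence:

- `volatile. ldsfld`
- `brfalse`
- `call BuildCancellationException`
- `throw`

As a result, the cold block ends in a non-returning `throw`.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

// Hypothetical stand-in for the emitted $Runtime class (names and message are illustrative).
public static class CancelRuntime
{
    // Cooperative-cancellation flag polled at every loop backedge.
    public static bool CancelRequested;

    // Factory: only constructs the exception; the caller's `throw` opcode raises it.
    public static OperationCanceledException BuildCancellationException() =>
        new OperationCanceledException("Execution cancelled.");
}

public static class LoopEmitterSketch
{
    static readonly FieldInfo Flag =
        typeof(CancelRuntime).GetField(nameof(CancelRuntime.CancelRequested));
    static readonly MethodInfo Factory =
        typeof(CancelRuntime).GetMethod(nameof(CancelRuntime.BuildCancellationException));

    // Emits: if (volatile read of flag) throw BuildCancellationException();
    // The cold block ends in `throw`, so no path returns from it into the loop.
    static void EmitCancellationCheck(ILGenerator il)
    {
        Label noCancel = il.DefineLabel();
        il.Emit(OpCodes.Volatile);            // volatile. prefix on the flag load
        il.Emit(OpCodes.Ldsfld, Flag);
        il.Emit(OpCodes.Brfalse_S, noCancel); // hot path: flag clear, keep looping
        il.Emit(OpCodes.Call, Factory);       // cold path: build the exception...
        il.Emit(OpCodes.Throw);               // ...and throw it here, at the backedge
        il.MarkLabel(noCancel);
    }

    // Builds: double Product(int n) { double r = 1; for (int i = 1; i <= n; i++) { r *= i; <check> } return r; }
    public static Func<int, double> BuildProductLoop()
    {
        var dm = new DynamicMethod("Product", typeof(double), new[] { typeof(int) },
                                   typeof(LoopEmitterSketch).Module);
        ILGenerator il = dm.GetILGenerator();
        LocalBuilder result = il.DeclareLocal(typeof(double));
        LocalBuilder i = il.DeclareLocal(typeof(int));
        Label body = il.DefineLabel();
        Label condition = il.DefineLabel();

        il.Emit(OpCodes.Ldc_R8, 1.0);   // r = 1.0
        il.Emit(OpCodes.Stloc, result);
        il.Emit(OpCodes.Ldc_I4_1);      // i = 1
        il.Emit(OpCodes.Stloc, i);
        il.Emit(OpCodes.Br, condition);

        il.MarkLabel(body);
        il.Emit(OpCodes.Ldloc, result); // r *= i
        il.Emit(OpCodes.Ldloc, i);
        il.Emit(OpCodes.Conv_R8);
        il.Emit(OpCodes.Mul);
        il.Emit(OpCodes.Stloc, result);
        il.Emit(OpCodes.Ldloc, i);      // i++
        il.Emit(OpCodes.Ldc_I4_1);
        il.Emit(OpCodes.Add);
        il.Emit(OpCodes.Stloc, i);
        EmitCancellationCheck(il);      // backedge poll

        il.MarkLabel(condition);
        il.Emit(OpCodes.Ldloc, i);      // loop while i <= n
        il.Emit(OpCodes.Ldarg_0);
        il.Emit(OpCodes.Ble, body);

        il.Emit(OpCodes.Ldloc, result);
        il.Emit(OpCodes.Ret);
        return (Func<int, double>)dm.CreateDelegate(typeof(Func<int, double>));
    }
}
```

Invoking `LoopEmitterSketch.BuildProductLoop()(5)` returns 120. With `CancelRuntime.CancelRequested` set to `true`, the same call throws `OperationCanceledException` at the first backedge. Whether the JIT actually keeps `result` in a register still depends on the tier and platform, so check the generated machine code to confirm.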
Why it works (controlled microbench, `result*=i` loop, Linux x64 / .NET 10)

| Backedge check | ns/iter |
|---|---|
| volatile flag read only | 1.15 |
| `if(flag) call CheckCancellation()` (old) | 2.09 |
| `if(flag) throw Factory()` (call factory, throw result) | 1.15 |

The entire penalty was the returning call, not the flag read.
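Below is a rough C# source-level sketch of the two backedge shapes compared above. It assumes hypothetical names (`BackedgeMicrobench`, `Cancel`) and uses a plain `Stopwatch` timer; it is not the harness behind the PR's numbers. `NoInlining` keeps `CheckCancellation` an out-of-line returning call, so `OldShape` mirrors the old cold path. `NewShape` instead ends its cold path in `throw`.

```csharp
using System;
using System.Diagnostics;
using System.Runtime.CompilerServices;
using System.Threading;

// Hypothetical source-level model of the two backedge shapes; not the PR's harness.
public static class BackedgeMicrobench
{
    public static bool Cancel; // polled with Volatile.Read, like the emitted backedge check

    // Old cold path: an out-of-line helper that throws internally, but is a returning call to the JIT.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static void CheckCancellation()
    {
        if (Volatile.Read(ref Cancel)) throw BuildCancellationException();
    }

    // New cold path: the factory only constructs the exception.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static OperationCanceledException BuildCancellationException() =>
        new OperationCanceledException("Execution cancelled.");

    public static double OldShape(int n)
    {
        double result = 1;
        for (int i = 1; i <= n; i++)
        {
            result *= i;
            if (Volatile.Read(ref Cancel)) CheckCancellation(); // returning call inside the loop
        }
        return result;
    }

    public static double NewShape(int n)
    {
        double result = 1;
        for (int i = 1; i <= n; i++)
        {
            result *= i;
            if (Volatile.Read(ref Cancel)) throw BuildCancellationException(); // never returns
        }
        return result;
    }

    // Rough ns/iteration from one timed run after a few warm-up calls.
    public static double NsPerIter(Func<int, double> loop, int n)
    {
        for (int w = 0; w < 3; w++) loop(n);
        var sw = Stopwatch.StartNew();
        loop(n);
        sw.Stop();
        return sw.Elapsed.TotalMilliseconds * 1e6 / n;
    }
}
```

For example, you can compare `BackedgeMicrobench.NsPerIter(BackedgeMicrobench.OldShape, 100_000_000)` with the same call for `NewShape`. Single `Stopwatch` runs are sensitive to tiered compilation and on-stack replacement, though, so a harness such as BenchmarkDotNet is the better tool for numbers worth quoting. Results will also vary by machine.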
Results — compiled vs Node, min times at largest input
5 of 7 workloads now meet or beat Node. The other two are within ~1.2× and now at the codegen floor: V8's multiply loop is ~0.2 ns/iter tighter, and count-primes' residual is `List<bool>` index-write bounds checks. The change benefits all compiled loops, not just the benchmark suite.

Correctness
- IL verifies (`--compile … --verify`).
- `Execute_InfiniteLoop_CancellationUnwindsCooperatively` passes. This is the #74 guard ("Test262 runner: compile-mode cooperative cancellation"), and it confirms that runaway loops still unwind with the same `OperationCanceledException`.

Files
- `Compilation/EmittedRuntime.cs`: `BuildCancellationExceptionMethod` field
- `Compilation/RuntimeEmitter.RuntimeClass.cs`: emit the factory method
- `Compilation/StatementEmitterBase.cs`: `EmitCancellationCheck` emits `call … BuildCancellationException(); throw`
- `STATUS.md`: §18 updated

Refs #856, #74. Supersedes the codegen approach in #874.
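The factory method can be sketched the same way, again under assumptions. A throwaway dynamic assembly and a hypothetical `SketchRuntime.Runtime` type stand in for the project's runtime class and `RuntimeEmitter.RuntimeClass.cs`, and the exception message is a parameter rather than the project's actual text. The emitted body is just `ldstr`, `newobj`, `ret`. It constructs and returns the exception without throwing it, which leaves the `throw` to the backedge.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

// Hypothetical sketch; the assembly, type name and message are illustrative.
public static class FactoryEmitSketch
{
    // Emits a static type with:
    //   public static OperationCanceledException BuildCancellationException()
    //       => new OperationCanceledException(message);
    public static MethodInfo EmitFactory(string message)
    {
        AssemblyBuilder asm = AssemblyBuilder.DefineDynamicAssembly(
            new AssemblyName("SketchRuntime"), AssemblyBuilderAccess.Run);
        ModuleBuilder module = asm.DefineDynamicModule("SketchRuntime");
        TypeBuilder type = module.DefineType("SketchRuntime.Runtime",
            TypeAttributes.Public | TypeAttributes.Abstract | TypeAttributes.Sealed);

        MethodBuilder factory = type.DefineMethod("BuildCancellationException",
            MethodAttributes.Public | MethodAttributes.Static,
            typeof(OperationCanceledException), Type.EmptyTypes);
        ConstructorInfo ctor =
            typeof(OperationCanceledException).GetConstructor(new[] { typeof(string) });

        ILGenerator il = factory.GetILGenerator();
        il.Emit(OpCodes.Ldstr, message);
        il.Emit(OpCodes.Newobj, ctor); // construct only
        il.Emit(OpCodes.Ret);          // hand it back; the loop backedge executes `throw`

        Type built = type.CreateType();
        return built.GetMethod("BuildCancellationException");
    }
}
```

`FactoryEmitSketch.EmitFactory("cancelled").Invoke(null, null)` yields an `OperationCanceledException` whose `Message` is `"cancelled"`. Nothing is thrown until a caller executes `throw` on the result.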