Perf #861: de-virtualize annotated-param array HOF callbacks #868
Merged
Part of #856 (perf epic). The array HOFs (map/filter/reduce/forEach/find/some/every) already had a direct-delegate fast path (TryEmitDirectDelegateCall -> Array*Direct(List<object>, Func<object,...>)), but it bailed at `if (p.Type != null) return false`: an annotated callback like `(x: number): number => x*2` compiles to a typed static method (double(double)) that cannot bind to Func<object,object>, so it fell back to the reflective ArrayMap + $TSFunction/MethodInvoker per-element path (MethodInvoker.Invoke + a 3-element object[] alloc + boxing per element). Idiomatic TypeScript annotates callback params, so the array-methods benchmark hit the reflective path on every callback. This is the same reflective-dispatch cost class that #858 just won ~40x on, not a boxing problem.
Fix: a new ArrowBoxedAdapterEmitter emits a per-arrow boxed adapter object(object[,object]) that unboxes/casts each boxed element into the arrow's typed parameter slot, calls the typed arrow, then reboxes the result. It binds to the existing Func<object,object>/Func<object,object,object> and feeds the unchanged Array*Direct helpers, so the per-element MethodInvoker dispatch is gone. The unbox/box marshalling reuses DelegateAdapterEmitter.EmitUnboxForReturn/EmitBoxForTS (now internal static), which matches the reflective MethodInvoker's no-arg-conversion regime exactly for concrete double/bool/string params (unions/nullable already widen to object in ParameterTypeResolver).
The adapter is emitted onto the arrow's staticMethod.DeclaringType (the $Program TypeBuilder), NOT ctx.ProgramType: ProgramType is set only on the module-top-level context, so keying on it would have gated the optimization to top-level code and left the in-function benchmark calls reflective. Gated conservatively: non-capturing arrows only (capturing defers to the reflective path, a follow-up) and marshallable param/return types only.
arrayMethodWork@10k warm (Release): ~3.31 -> ~1.22 ms/call (~2.7x), identical output, zero reflective dispatch / object[] per element; chained map().filter().reduce() emits 3 adapters, 0 ldtoken. Standalone-DLL constraint preserved (adapter is an emitted static method, no SharpTS.dll reference). Deferred to follow-ups: chained-stage List<->$Array round-trip elimination (L2), capturing annotated arrows (L3), bool-return adapter -> *DirectBool (L4). 24 new dual-mode tests (ArrayHofAnnotatedCallbackTests). Full dotnet test green except known flaky (UsingDeclaration/NumericSeparator pass in isolation) and the stale Test262 baselines (the only Test262 "regressions" are Array/isArray TypeCheckError/Proxy drift that also appears in interpreter mode; a compiled-only change cannot cause those). --verify clean.
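To make the L1 mechanism concrete, here is a TypeScript model of the two dispatch shapes. It is a sketch under stated assumptions, not SharpTS code: TypeScript has no boxing, so `unknown` stands in for a boxed `object` and a type assertion stands in for the IL unbox, and every name below (`arrayMapDirect`, `reflectiveMap`, `makeBoxedAdapter`, `ArrowInfo`, `canUseAdapter`, `MARSHALLABLE`) is hypothetical.

```typescript
// Conceptual model only (hypothetical names, not SharpTS APIs).
// `Boxed` stands in for a .NET boxed `object`; TypeScript has no boxing.
type Boxed = unknown;

// Stand-in for Array*Direct(List<object>, Func<object,object>):
// it only ever sees boxed elements and a boxed-in/boxed-out callback.
function arrayMapDirect(list: Boxed[], fn: (x: Boxed) => Boxed): Boxed[] {
  const out: Boxed[] = [];
  for (const item of list) out.push(fn(item));
  return out;
}

// The reflective path being replaced: a fresh 3-element args array per
// element, dispatched through a generic invoker (Function.prototype.apply
// plays the role of MethodInvoker.Invoke here).
function reflectiveMap(list: Boxed[], target: Function): Boxed[] {
  const out: Boxed[] = [];
  for (let i = 0; i < list.length; i++) {
    const args: Boxed[] = [list[i], i, list]; // per-element allocation
    out.push(target.apply(undefined, args));
  }
  return out;
}

// Per-arrow adapter, object(object): unbox into the typed slot, call the
// typed arrow directly, then rebox the result.
function makeBoxedAdapter(typed: (x: number) => number): (x: Boxed) => Boxed {
  return (boxed: Boxed): Boxed => {
    const arg = boxed as number; // stands in for unbox.any (no conversion)
    return typed(arg); // the result is implicitly "reboxed" as Boxed
  };
}

// Hypothetical descriptor for the L1 gate: non-capturing arrows whose
// param and return types are all in an assumed marshallable set.
interface ArrowInfo {
  captures: boolean;
  paramTypes: string[];
  returnType: string;
}

const MARSHALLABLE = new Set(["number", "boolean", "string"]);

function canUseAdapter(arrow: ArrowInfo): boolean {
  if (arrow.captures) return false; // L1 defers capturing arrows
  return arrow.paramTypes.concat(arrow.returnType).every((t) => MARSHALLABLE.has(t));
}

// `(x: number): number => x * 2` compiles to a typed double(double) method.
const timesTwo = (x: number): number => x * 2;
```

Both `arrayMapDirect([1, 2, 3], makeBoxedAdapter(timesTwo))` and `reflectiveMap([1, 2, 3], timesTwo)` produce `[2, 4, 6]`; only the reflective version allocates an argument array and goes through a generic invoke for every element.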
Follow-up to the #861 Layer 1 boxed-adapter work. A chained array expression like arr.map(f).filter(g).reduce(h) wrapped each intermediate result back into a $Array (EmitPostCallAdjust, returnsNewArray) only for the next stage to immediately unwrap it to a List<object> again (EmitGetListFromArrayOrList). The intermediate array is anonymous: it can only ever be the receiver of the one following array-method call, so its $Array identity is unobservable.
TryEmitMethodCall now threads a leaveResultAsBareList flag: when a call's receiver is itself a fresh-array-producing array method on an array receiver (TryEmitChainedArrayReceiverAsBareList), the inner call is emitted with its $Array wrap suppressed (bare List<object> left on the stack) and the outer call skips the unwrap. The flag recurses, so a 3-stage chain drops both intermediate boundaries while the FINAL stage still wraps to $Array when its result is consumed as an array value (.length, assignment, ===). Disabled for returnsReceiver methods (sort/reverse/fill/copyWithin, which need the original wrapper) and the await-spill path.
The benchmark's map().filter().reduce() now emits zero newobj $Array / get_Elements between stages (IL-verified). Warm throughput is flat (~1.4 ms/call, identical to L1 in a controlled A/B): the win is allocation/GC reduction and cold-start, not warm steady-state, since per-element work dominates at large n. 8 new dual-mode chain tests (incl. final-wrap-preserved, map->slice plain-args boundary, mixed typed/untyped). Full dotnet test green except known flaky (pass in isolation) and the stale Test262 baselines (regressions are the same pre-existing Array/isArray TypeCheckError/Proxy drift, also present in interpreter mode and unchanged by L2). --verify clean.
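A small TypeScript model of the L2 idea, under assumptions: `TsArray` is a hypothetical stand-in for the emitted `$Array` wrapper, `getList` for `EmitGetListFromArrayOrList`, and a module-level `wrapCount` just counts wrapper allocations. Passing `leaveResultAsBareList = true` to an inner stage leaves a bare list for the next stage, while a final stage whose result is used as an array value still wraps.

```typescript
// Hypothetical model of the L2 chain optimization (not SharpTS code).
let wrapCount = 0; // number of $Array-style wrappers allocated

// Stand-in for the emitted $Array wrapper around a List<object>.
class TsArray {
  readonly elements: unknown[];
  constructor(elements: unknown[]) {
    this.elements = elements;
    wrapCount++;
  }
}

type Receiver = TsArray | unknown[];

// Stand-in for EmitGetListFromArrayOrList: accepts a wrapper or a bare list.
function getList(r: Receiver): unknown[] {
  return r instanceof TsArray ? r.elements : r;
}

// Fresh-array-producing stages. When leaveResultAsBareList is true, the
// result is only ever the receiver of the next call, so the wrap is skipped.
function mapStage(r: Receiver, f: (x: unknown) => unknown, leaveResultAsBareList: boolean): Receiver {
  const out = getList(r).map((x) => f(x));
  return leaveResultAsBareList ? out : new TsArray(out);
}

function filterStage(r: Receiver, p: (x: unknown) => boolean, leaveResultAsBareList: boolean): Receiver {
  const out = getList(r).filter((x) => p(x));
  return leaveResultAsBareList ? out : new TsArray(out);
}

// reduce produces a scalar, so it never wraps.
function reduceStage(r: Receiver, h: (acc: unknown, x: unknown) => unknown, init: unknown): unknown {
  return getList(r).reduce((acc, x) => h(acc, x), init);
}

const timesTwo = (x: unknown): unknown => (x as number) * 2;
const overFour = (x: unknown): boolean => (x as number) > 4;
const add = (acc: unknown, x: unknown): unknown => (acc as number) + (x as number);
```

On `[1, 2, 3, 4]`, the naive `map(timesTwo).filter(overFour).reduce(add, 0)` chain allocates two intermediate wrappers and the flagged chain allocates none; both return 14. Wrapping is kept for a final `filterStage(..., false)`, whose result callers may treat as an array value.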
Layers 1/2 handled non-capturing annotated callbacks. A capturing arrow (e.g.
arr.map((x: number): number => x + k)) compiles to a typed INSTANCE Invoke on a
display class, not a static $Program method, so L1's adapter path bailed
(DisplayClasses.ContainsKey) and it fell back to per-element reflective
$TSFunction/MethodInvoker dispatch.
L3 emits the boxed adapter as an INSTANCE method on the arrow's display class
(object Invoke$box(object[,object]) -> unbox -> callvirt this.Invoke -> box) and
binds the delegate to (displayInstance, ldftn adapter). The display instance is
built fresh at the call site via the existing EmitCapturingArrowDisplayInstance
(exposed through IEmitterContext.TryEmitCapturingArrowDisplayInstance), exactly as
the reflective path does — so per-iteration fresh-binding semantics are preserved
(verified: a loop pushing arr.map(x => x + i)[0] yields 10,11,12, not a shared i).
ArrowBoxedAdapterEmitter.GetOrEmit gained an `instance` flag selecting the carrier
(static $Program vs the display class), arg base (arg0 = `this` for instance), and
call opcode (call vs callvirt); the adapter name {method.Name}$box{arity} stays
collision-free ("Invoke$box1" is unique within the per-arrow display class, since
two arrows sharing a class would already collide on "Invoke"). TryBindTypedArrowAdapter
now branches on DisplayClasses: capturing -> instance adapter + display instance;
non-capturing -> the L1 static path.
Capturing chained map().filter().reduce() now emits instance adapters with zero
reflective ldtoken/$TSFunction (IL-verified), ~1.1 ms/call warm (a speedup similar to the
non-capturing case's ~2.4x over main's reflective path). 5 new dual-mode tests (capturing
map/reduce/chained + the per-iteration fresh-capture guard). Full dotnet test green
except known flaky (all pass in isolation; failing set disjoint across runs) and the
stale Test262 baselines (regressions are the same pre-existing Array/isArray drift,
present in interpreter mode too — no closure/arrow test regressed). --verify clean.
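A TypeScript model of the L3 shape, under assumptions and not emitted code: `DisplayClass`, `invokeBox1`, `arrayMapDirect`, `mapCapturing`, and `mapCapturingCached` are hypothetical names. `invokeBox1` mirrors the `Invoke$box1` instance adapter, and `bind` stands in for building the delegate from `(displayInstance, ldftn adapter)`. The cached variant is a deliberately wrong counter-example showing why the display instance has to be built fresh at the call site.

```typescript
// Hypothetical model of L3 (not SharpTS code): a capturing arrow
// `(x: number): number => x + k` becomes a display class that holds the
// captured `k` and exposes a typed instance `invoke`.
class DisplayClass {
  k: number;
  constructor(k: number) {
    this.k = k;
  }
  // The arrow body as a typed instance method (double Invoke(double)).
  invoke(x: number): number {
    return x + this.k;
  }
  // Boxed adapter on the same class (Invoke$box1): unbox, call this.invoke,
  // rebox. It is an instance method, so `this` carries the captures.
  invokeBox1(boxed: unknown): unknown {
    return this.invoke(boxed as number);
  }
}

// Stand-in for Array*Direct(List<object>, Func<object,object>).
function arrayMapDirect(list: unknown[], fn: (x: unknown) => unknown): unknown[] {
  const out: unknown[] = [];
  for (const item of list) out.push(fn(item));
  return out;
}

// Call site as described for L3: build the display instance fresh, then
// bind the delegate to (instance, adapter).
function mapCapturing(list: unknown[], k: number): unknown[] {
  const display = new DisplayClass(k);
  return arrayMapDirect(list, display.invokeBox1.bind(display));
}

// Counter-example: reusing one display instance built on the first call
// freezes the capture at its first value.
let cachedDisplay: DisplayClass | undefined;
function mapCapturingCached(list: unknown[], k: number): unknown[] {
  const display = cachedDisplay === undefined ? (cachedDisplay = new DisplayClass(k)) : cachedDisplay;
  return arrayMapDirect(list, display.invokeBox1.bind(display));
}
```

With `arr = [10, 20, 30]` and `i` running from 0 to 2, pushing `mapCapturing(arr, i)[0]` collects 10, 11, 12, matching the guard described above, while the cached variant collects 10, 10, 10.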
… + IsTruthy)
Layers 1/3 routed every typed annotated callback through an object-returning boxed adapter bound to Func<object,object> + the object-callback helper, which for a boolean predicate (filter/find/findIndex/some/every) boxes the bool result and then calls IsTruthy on it per element. The predicate helpers already have Func<object,bool> *DirectBool variants (used by the untyped path for bool(object) arrows). L4 routes a typed bool-returning predicate to them: when the arrow returns bool and a boolHelper exists, the adapter is emitted to return the unboxed bool directly (no rebox) and the call site uses the *DirectBool helper, dropping the per-element box + IsTruthy.
ArrowBoxedAdapterEmitter.GetOrEmit gained a `boolReturn` flag (adapter return type bool vs object, skip the trailing box, distinct cache key + name marker $bbox); TryBindTypedArrowAdapter binds Func<object,bool>; TryEmitDirectDelegateCall computes wantBool = boolHelper != null && arrow returns bool and selects the variant. Composes with L3: a capturing bool predicate emits a bool-returning INSTANCE adapter bound to ArrayFilterDirectBool.
filter/some/every/find/findIndex with typed predicates now emit $bbox adapters + *DirectBool helpers with zero object-variant IsTruthy (IL-verified). +4 dual-mode tests (capturing bool predicate, some/every short-circuit). Full dotnet test green except the stale Test262 baselines (same pre-existing Array/isArray drift; no predicate/filter/some/every/find test regressed). --verify clean.
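The difference between the object and bool variants can be sketched in TypeScript. Assumptions: `arrayFilterDirect`, `arrayFilterDirectBool`, and `arraySomeDirectBool` are hypothetical stand-ins for the `Func<object,object>` and `Func<object,bool>` helpers, `isTruthy` is modeled with `Boolean()` (JavaScript ToBoolean), and `boxAdapter`/`bboxAdapter` only mirror the `$box` and `$bbox` shapes.

```typescript
// Hypothetical model of L4 (not SharpTS code).
// JavaScript truthiness (ToBoolean), standing in for the runtime IsTruthy.
function isTruthy(v: unknown): boolean {
  return Boolean(v);
}

// Object-callback helper: the callback's result comes back "boxed" and must
// go through isTruthy for every element.
function arrayFilterDirect(list: unknown[], fn: (x: unknown) => unknown): unknown[] {
  const out: unknown[] = [];
  for (const item of list) if (isTruthy(fn(item))) out.push(item);
  return out;
}

// *DirectBool-style helper: the callback already returns a bool.
function arrayFilterDirectBool(list: unknown[], fn: (x: unknown) => boolean): unknown[] {
  const out: unknown[] = [];
  for (const item of list) if (fn(item)) out.push(item);
  return out;
}

// some() over a bool callback stops at the first match.
function arraySomeDirectBool(list: unknown[], fn: (x: unknown) => boolean): boolean {
  for (const item of list) if (fn(item)) return true;
  return false;
}

// Typed predicate `(x: number): boolean => x > 2`.
const greaterThanTwo = (x: number): boolean => x > 2;

// $box shape: object return, so the bool is "boxed" for the caller.
const boxAdapter = (boxed: unknown): unknown => greaterThanTwo(boxed as number);
// $bbox shape: bool return, no rebox and no isTruthy at the call site.
const bboxAdapter = (boxed: unknown): boolean => greaterThanTwo(boxed as number);

// Call-site choice, mirroring wantBool = boolHelper != null && arrow returns bool.
function wantBool(hasBoolHelper: boolean, arrowReturnsBool: boolean): boolean {
  return hasBoolHelper && arrowReturnsBool;
}
```

Both filters return `[3, 4]` for `[1, 2, 3, 4]`; the bool path just skips the box-and-test step. On the same input `arraySomeDirectBool` calls the predicate three times, because it returns at the first match.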
Part of #856 (perf epic), closes #861. This removes the last reflective-dispatch hotspot in the epic: the same class of cost #858 won ~40× on, now for array higher-order-function callbacks.
Problem
`arr.map/filter/reduce/forEach/find/some/every` already had a direct-delegate fast path (`TryEmitDirectDelegateCall` → `Array*Direct(List<object>, Func<object,…>)`, "#96 Phase A"), but it bailed on annotated callback params (`if (p.Type != null) return false`). An annotated arrow `(x: number): number => x*2` compiles to a typed static method `double(double)` that cannot bind to `Func<object,object>`, so idiomatic TypeScript (which annotates callback params, including the entire `array-methods` benchmark) fell back to the reflective `ArrayMap` + `$TSFunction`/`MethodInvoker` path: a `MethodInvoker.Invoke` + a 3-element `object[]` allocation + boxing per element.
Fix (4 layers)
L1: boxed adapter (non-capturing). A new `ArrowBoxedAdapterEmitter` emits a per-arrow `object(object[,object])` adapter that unboxes each boxed element into the arrow's typed slot, calls the typed arrow, and reboxes the result; it binds to the existing `Func<object,…>` and feeds the unchanged `Array*Direct` helpers. Marshalling reuses `DelegateAdapterEmitter.EmitUnboxForReturn`/`EmitBoxForTS` (now `internal static`), which matches the reflective `MethodInvoker` no-arg-conversion semantics exactly for concrete `double`/`bool`/`string` params (unions/nullable already widen to `object` in `ParameterTypeResolver`). The adapter is emitted onto the arrow's `staticMethod.DeclaringType` (not `ctx.ProgramType`, which is set only on the top-level context), so it also fires inside function/method bodies, the dominant benchmark case.
L2: chained-stage round-trip elimination. `a.map(f).filter(g).reduce(h)` wrapped each intermediate result into a `$Array` only for the next stage to unwrap it. A recursive `leaveResultAsBareList` flag now lets the inner stage leave a bare `List<object>` and the outer skip the unwrap; the final stage still wraps when consumed as an array value (`.length`/assign/`===`). The intermediate is anonymous, so its `$Array` identity is unobservable. Disabled for `returnsReceiver` mutators (`sort`/`reverse`/`fill`/`copyWithin`) and the await-spill path.
L3: capturing annotated arrows. A capturing arrow compiles to a typed instance `Invoke` on a display class. L3 emits the adapter as an instance method on that display class (`callvirt this.Invoke`) and binds `(displayInstance, ldftn adapter)`, building the display instance fresh at the call site via the existing machinery (exposed through `IEmitterContext.TryEmitCapturingArrowDisplayInstance`). Per-iteration fresh-binding is preserved (a loop over `arr.map(x => x + i)[0]` yields `10,11,12`, not a shared `i`).
L4: bool-returning adapter for typed predicates. `filter`/`find`/`findIndex`/`some`/`every` with a typed `bool` predicate now bind `Func<object,bool>` + the existing `*DirectBool` helper, with the adapter returning the unboxed `bool`, dropping the per-element result box + `IsTruthy`.
Performance (warm, .NET 10 Release, controlled same-session A/B)
L2's gain is in allocation/GC and cold-start rather than warm throughput (warm-flat at large n, where per-element work dominates): the chained `map().filter().reduce()` emits zero `newobj $Array`/`get_Elements` between stages (IL-verified). All outputs are identical to `main`.
Correctness / scope
Conservative and additive: anything outside the fast path (callbacks failing the capture/marshallable gates, multi-arg callbacks, rest/optional/default params, non-array receivers) falls back to the unchanged reflective path. The standalone-DLL constraint is preserved (adapters are emitted methods, no `SharpTS.dll` reference). Annotated callbacks such as `arr.map((x:number)...)`, capturing or not, now emit a direct delegate call with no `$TSFunction`/`GetMethodFromHandle`/`object[]` per element.
Tests
New dual-mode tests (`ArrayHofAnnotatedCallbackTests`): annotated map/filter/reduce/forEach/find/some/every, string-param + NaN coercion parity, chained stages (final-wrap-preserved, `map`→`slice` boundary, mixed typed/untyped), capturing (map/reduce/chained + the per-iteration fresh-capture guard), bool predicates, and fallback cases (`any[]`, multi-arg, untyped). `--verify` clean on all shapes. `dotnet test` green except the two known pre-existing stale/flaky Test262 baselines: their only "regressions" are `Array/isArray` TypeCheckError/Proxy drift that also appears in interpreter mode (a compiled-only change cannot cause those), and the non-Test262 flaky failures pass in isolation with disjoint sets across runs. No HOF/callback/closure/predicate test changed status.
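For readers unfamiliar with the suite, a program in the spirit of these shapes might look like the following. It is a hypothetical illustration, not an excerpt from `ArrayHofAnnotatedCallbackTests`; a dual-mode test presumably runs the same source in interpreted and compiled mode and compares the printed output, so each shape contributes one line.

```typescript
// Hypothetical sample in the spirit of the listed test shapes.
const nums: number[] = [1, 2, 3, 4, 5];
const lines: string[] = [];

// Annotated map / filter / reduce (typed callbacks, the L1 shape).
lines.push(nums.map((x: number): number => x * 2).join(","));
lines.push(nums.filter((x: number): boolean => x % 2 === 1).join(","));
lines.push(String(nums.reduce((acc: number, x: number): number => acc + x, 0)));

// Annotated find / some / every (typed bool predicates, the L4 shape).
lines.push(String(nums.find((x: number): boolean => x > 3)));
lines.push(String(nums.some((x: number): boolean => x > 4)));
lines.push(String(nums.every((x: number): boolean => x > 0)));

// Chained stages with mixed typed/untyped callbacks (the L2 shape); the final
// result is used as an array value (.length), so it must stay a real array.
const chained = nums.map((x: number): number => x * 10).filter((x) => x > 20);
lines.push(`${chained.length}:${chained.join(",")}`);

// map -> slice: slice takes plain arguments, not a callback.
lines.push(nums.map((x: number): number => x + 1).slice(1, 3).join(","));

// String-annotated parameter where coercion yields NaN.
lines.push(["1", "x"].map((s: string): number => Number(s)).join(","));

// any[] with an untyped callback: outside the typed-adapter path.
const loose: any[] = [1, "2", 3];
lines.push(loose.map((v) => v + 1).join(","));

console.log(lines.join("\n"));
```

Under standard JavaScript semantics this prints `2,4,6,8,10`, `1,3,5`, `15`, `4`, `true`, `true`, `3:30,40,50`, `3,4`, `1,NaN`, and `2,21,4`, one per line; the point of a dual-mode comparison is that the compiled fast paths must reproduce exactly these lines.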