
Perf #861: de-virtualize annotated-param array HOF callbacks #868

Merged
nickna merged 4 commits into main from wrk/issue-861-array-hof-annotated-callbacks on Jun 21, 2026
Conversation

@nickna nickna commented Jun 21, 2026

Owner

Part of #856 (perf epic), closes #861. The last reflective-dispatch hotspot in the epic — the same class of cost #858 won ~40× on, now for array higher-order-function callbacks.

Problem

arr.map/filter/reduce/forEach/find/some/every already had a direct-delegate fast path (TryEmitDirectDelegateCall → Array*Direct(List<object>, Func<object,…>), "#96 Phase A"), but it bailed on annotated callback params (if (p.Type != null) return false). An annotated arrow (x: number): number => x*2 compiles to a typed static method double(double) that cannot bind to Func<object,object>. Idiomatic TypeScript annotates callback params (the entire array-methods benchmark does), so it fell back to the reflective ArrayMap + $TSFunction/MethodInvoker path: a MethodInvoker.Invoke, a 3-element object[] allocation, and boxing per element.
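
To make the per-element cost concrete, here is a rough TypeScript model of the reflective-style dispatch described above. It is only an analogy: `mapReflective`, `argArraysAllocated`, and the `Boxed` alias are hypothetical names, and the assumption that the three argument slots hold (element, index, array) is mine, not something the PR states.

```typescript
// Hypothetical model for illustration; none of these names are SharpTS APIs.
type Boxed = unknown;

// Stand-in for the annotated arrow from the description.
const double = (x: number): number => x * 2;

let argArraysAllocated = 0;

// Reflective-style dispatch: a fresh argument array and a dynamic invoke per element.
function mapReflective(list: Boxed[], fn: Function): Boxed[] {
  const out: Boxed[] = [];
  for (let i = 0; i < list.length; i++) {
    // Assumption: the three slots are (element, index, array).
    const args: Boxed[] = [list[i], i, list];
    argArraysAllocated += 1;
    out.push(Reflect.apply(fn, undefined, args));
  }
  return out;
}

const reflectiveResult = mapReflective([1, 2, 3], double);
```

In this model every element pays for one extra array allocation and one dynamic call, which is the shape of overhead the fix below is meant to remove.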

Fix (4 layers)

L1 — boxed adapter (non-capturing). A new ArrowBoxedAdapterEmitter emits a per-arrow object(object[,object]) adapter that unboxes each boxed element into the arrow's typed slot, calls the typed arrow, and reboxes the result; it binds to the existing Func<object,…> and feeds the unchanged Array*Direct helpers. Marshalling reuses DelegateAdapterEmitter.EmitUnboxForReturn/EmitBoxForTS (now internal static), which matches the reflective MethodInvoker no-arg-conversion semantics exactly for concrete double/bool/string params (unions/nullable already widen to object in ParameterTypeResolver). The adapter is emitted onto the arrow's staticMethod.DeclaringType (not ctx.ProgramType, which is set only on the top-level context) so it fires inside function/method bodies too — the dominant benchmark case.
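
A minimal TypeScript sketch of the L1 idea, assuming a single-parameter arrow. The real adapter is emitted IL; `typedArrow`, `boxedAdapter`, and `arrayMapDirect` here are hypothetical stand-ins that only mirror the unbox, call, rebox sequence.

```typescript
// Hypothetical analogue of the L1 adapter; not the emitted IL.
type Boxed = unknown;

// Stand-in for the typed static method the arrow compiles to: double(double).
function typedArrow(x: number): number {
  return x * 2;
}

// Adapter with the object -> object shape the existing helper expects.
function boxedAdapter(boxed: Boxed): Boxed {
  const x = boxed as number; // "unbox": a plain cast, no value conversion
  const result = typedArrow(x); // direct call to the typed arrow
  return result; // "rebox": hand the result back as an opaque value
}

// Stand-in for an unchanged Array*Direct helper taking Func<object,object>.
function arrayMapDirect(list: Boxed[], fn: (x: Boxed) => Boxed): Boxed[] {
  const out: Boxed[] = [];
  for (const item of list) out.push(fn(item));
  return out;
}

const doubled = arrayMapDirect([1, 2, 3], boxedAdapter);
```

The helper never learns the arrow's parameter type; only the adapter does, which is why the existing helpers can stay unchanged.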

L2 — chained-stage round-trip elimination. a.map(f).filter(g).reduce(h) wrapped each intermediate result into a $Array only for the next stage to unwrap it. A recursive leaveResultAsBareList flag now lets the inner stage leave a bare List<object> and the outer skip the unwrap; the final stage still wraps when consumed as an array value (.length/assign/===). The intermediate is anonymous, so its $Array identity is unobservable. Disabled for returnsReceiver mutators (sort/reverse/fill/copyWithin) and the await-spill path.
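
The round trip that L2 removes can be modelled in a few lines of TypeScript. This is a sketch under assumptions: `TSArray` is a hypothetical stand-in for the $Array wrapper, and the two chain functions are written by hand rather than driven by a flag as in the compiler.

```typescript
// Hypothetical stand-in for the $Array wrapper; counts how often it is constructed.
class TSArray {
  static wraps = 0;
  constructor(public readonly elements: number[]) {
    TSArray.wraps += 1;
  }
}

const f = (x: number): number => x * 2;
const g = (x: number): boolean => x > 2;
const h = (acc: number, x: number): number => acc + x;

// Before: each stage wraps its result, and the next stage unwraps it again.
function chainWithRoundTrips(src: TSArray): number {
  const mapped = new TSArray(src.elements.map(f));
  const filtered = new TSArray(mapped.elements.filter(g));
  return filtered.elements.reduce(h, 0);
}

// After: intermediates stay as bare lists, since nothing can observe them.
function chainWithBareLists(src: TSArray): number {
  const mapped = src.elements.map(f);
  const filtered = mapped.filter(g);
  return filtered.reduce(h, 0);
}

const src = new TSArray([1, 2, 3]);

const beforeRoundTrips = TSArray.wraps;
const roundTripResult = chainWithRoundTrips(src);
const wrapsWithRoundTrips = TSArray.wraps - beforeRoundTrips;

const beforeBare = TSArray.wraps;
const bareResult = chainWithBareLists(src);
const wrapsWithBareLists = TSArray.wraps - beforeBare;
```

Both chains compute the same value; the only difference in this model is the number of intermediate wrappers allocated (two versus zero for a three-stage chain).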

L3 — capturing annotated arrows. A capturing arrow compiles to a typed instance Invoke on a display class. L3 emits the adapter as an instance method on that display class (callvirt this.Invoke) and binds (displayInstance, ldftn adapter), building the display instance fresh at the call site via the existing machinery (exposed through IEmitterContext.TryEmitCapturingArrowDisplayInstance). Per-iteration fresh-binding is preserved (a loop over arr.map(x => x + i)[0] yields 10,11,12, not a shared i).
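
The fresh-binding guard mentioned above can be written as a small TypeScript check. The array contents and loop bounds are assumptions chosen to reproduce the 10,11,12 sequence; the deferred variant is an extra illustration of why a shared `i` would be observable, not something taken from the PR's test suite.

```typescript
// Assumed input: the first element is 10 and i runs over 0, 1, 2.
const arr: number[] = [10, 20, 30];

// Immediate use: each iteration maps with its own value of i.
const firsts: number[] = [];
for (let i = 0; i < 3; i++) {
  firsts.push(arr.map((x: number): number => x + i)[0]);
}

// Deferred use: the closures run after the loop has finished, so a single
// shared i would give the same value three times.
const thunks: Array<() => number> = [];
for (let i = 0; i < 3; i++) {
  thunks.push(() => arr.map((x: number): number => x + i)[0]);
}
const deferred = thunks.map((t) => t());
```

With per-iteration bindings, both `firsts` and `deferred` come out as 10, 11, 12.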

L4 — bool-returning adapter for typed predicates. filter/find/findIndex/some/every with a typed bool predicate now bind Func<object,bool> + the existing *DirectBool helper, with the adapter returning the unboxed bool — dropping the per-element result box + IsTruthy.
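
As a rough illustration of what L4 saves, the following TypeScript model compares an object-returning predicate adapter with a bool-returning one. `arrayFilterDirect`, `arrayFilterDirectBool`, `isTruthy`, and the counter are hypothetical analogues named after the helpers in the description, not the runtime's actual code.

```typescript
// Hypothetical analogues of the object and bool filter helpers.
type Boxed = unknown;

let truthyChecks = 0;
function isTruthy(v: Boxed): boolean {
  truthyChecks += 1;
  return Boolean(v);
}

// Object-returning path: the predicate result is opaque, so it must be truth-tested.
function arrayFilterDirect(list: Boxed[], pred: (x: Boxed) => Boxed): Boxed[] {
  const out: Boxed[] = [];
  for (const item of list) if (isTruthy(pred(item))) out.push(item);
  return out;
}

// Bool-returning path: the predicate result is used as-is.
function arrayFilterDirectBool(list: Boxed[], pred: (x: Boxed) => boolean): Boxed[] {
  const out: Boxed[] = [];
  for (const item of list) if (pred(item)) out.push(item);
  return out;
}

const isEven = (x: number): boolean => x % 2 === 0;
const objectAdapter = (b: Boxed): Boxed => isEven(b as number);
const boolAdapter = (b: Boxed): boolean => isEven(b as number);

const viaObject = arrayFilterDirect([1, 2, 3, 4], objectAdapter);
const checksAfterObject = truthyChecks;
const viaBool = arrayFilterDirectBool([1, 2, 3, 4], boolAdapter);
const checksAfterBool = truthyChecks;
```

Both paths keep the same elements; in this model the bool path simply skips the per-element truthiness test, which matches the small (~5%) delta reported for pure filter.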

Performance (warm, .NET 10 Release, controlled same-session A/B)

| Workload | main | this PR | Speedup |
|---|---|---|---|
| array-methods chain (non-capturing, n=10k) | ~3.31 ms/call | ~1.4 ms/call | ~2.4× |
| chain with a capturing callback | ~3.16 ms/call | ~1.10 ms/call | ~2.9× |
| pure filter (L4 delta vs L3) | ~0.83 ms/call | ~0.79 ms/call | ~5% |

L2's win is allocation/GC and cold-start; warm throughput is flat at large n, where per-element work dominates. The chained map().filter().reduce() emits zero newobj $Array/get_Elements between stages (IL-verified). All outputs are identical to main.

Correctness / scope

Conservative and additive: anything outside the fast path (capturing/marshallable gates, multi-arg callbacks, rest/optional/default params, non-array receivers) falls back to the unchanged reflective path. The standalone-DLL constraint is preserved (adapters are emitted methods, no SharpTS.dll reference). arr.map((x:number)...) etc. — capturing or not — now emit a direct delegate call with no $TSFunction/GetMethodFromHandle/object[] per element.
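
For orientation, here are TypeScript callback shapes on either side of the gate, as I read the list above. Treating "multi-arg" as "takes extra arguments such as the index" is my interpretation; the results shown are plain TypeScript semantics, which both paths are expected to preserve.

```typescript
const nums: number[] = [1, 2, 3];

// Annotated single-parameter arrow: the shape the fast path targets.
const fastShape = nums.map((x: number): number => x * 2);

// Shapes the description lists as falling back to the reflective path.
const withIndex = nums.map((x: number, i: number): number => x + i); // extra argument
const withDefault = nums.map((x: number = 0): number => x * 2); // default param
const withOptional = nums.map((x?: number): number => (x ?? 0) + 1); // optional param
const withRest = nums.map((...args: unknown[]): number => args.length); // rest param
```

Whichever path a shape takes, the observable result should not change; only the dispatch cost differs.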

Tests

  • 42 new dual-mode tests (ArrayHofAnnotatedCallbackTests): annotated map/filter/reduce/forEach/find/some/every, string-param + NaN coercion parity, chained stages (final-wrap-preserved, map→slice boundary, mixed typed/untyped), capturing (map/reduce/chained + the per-iteration fresh-capture guard), bool predicates, and fallback cases (any[], multi-arg, untyped).
  • --verify clean on all shapes.
  • Full dotnet test green except the two known pre-existing stale/flaky Test262 baselines: their only "regressions" are Array/isArray TypeCheckError/Proxy drift that also appears in interpreter mode (a compiled-only change cannot cause those), and the non-Test262 flaky failures pass in isolation with disjoint sets across runs. No HOF/callback/closure/predicate test changed status.

nickna added 4 commits June 20, 2026 14:45
Part of #856 (perf epic). The array HOFs (map/filter/reduce/forEach/find/some/
every) already had a direct-delegate fast path (TryEmitDirectDelegateCall ->
Array*Direct(List<object>, Func<object,...>)), but it bailed at
`if (p.Type != null) return false`: an annotated callback like
`(x: number): number => x*2` compiles to a typed static method (double(double))
that cannot bind to Func<object,object>, so it fell back to the reflective
ArrayMap + $TSFunction/MethodInvoker per-element path (MethodInvoker.Invoke + a
3-element object[] alloc + boxing per element). Idiomatic TypeScript annotates
callback params, so the array-methods benchmark hit the reflective path on every
callback -- the same reflective-dispatch class #858 just won ~40x on, not boxing.

Fix: a new ArrowBoxedAdapterEmitter emits a per-arrow boxed adapter
object(object[,object]) that unboxes/casts each boxed element into the arrow's
typed parameter slot, calls the typed arrow, then reboxes the result. It binds to
the existing Func<object,object>/Func<object,object,object> and feeds the
unchanged Array*Direct helpers -- so the per-element MethodInvoker dispatch is
gone. The unbox/box marshalling reuses DelegateAdapterEmitter.EmitUnboxForReturn/
EmitBoxForTS (now internal static), which matches the reflective MethodInvoker's
no-arg-conversion regime exactly for concrete double/bool/string params (unions/
nullable already widen to object in ParameterTypeResolver).

The adapter is emitted onto the arrow's staticMethod.DeclaringType (the $Program
TypeBuilder), NOT ctx.ProgramType -- ProgramType is set only on the module-top-
level context, so keying on it would have gated the optimization to top-level and
left the in-function benchmark calls reflective. Gated conservatively: non-
capturing arrows only (capturing defers to the reflective path, a follow-up),
marshallable param/return types only. arrayMethodWork@10k warm (Release):
~3.31 -> ~1.22 ms/call (~2.7x), identical output, zero reflective dispatch /
object[] per element; chained map().filter().reduce() emits 3 adapters, 0 ldtoken.

Standalone-DLL constraint preserved (adapter is an emitted static method, no
SharpTS.dll reference). Deferred to follow-ups: chained-stage List<->$Array
round-trip elimination (L2), capturing annotated arrows (L3), bool-return adapter
-> *DirectBool (L4).

24 new dual-mode tests (ArrayHofAnnotatedCallbackTests). Full dotnet test green
except known flaky (UsingDeclaration/NumericSeparator pass in isolation) and the
stale Test262 baselines (the only Test262 "regressions" are Array/isArray
TypeCheckError/Proxy drift that also appears in interpreter mode -- a compiled-
only change cannot cause those). --verify clean.
Follow-up to the #861 Layer 1 boxed-adapter work. A chained array expression like
arr.map(f).filter(g).reduce(h) wrapped each intermediate result back into a $Array
(EmitPostCallAdjust, returnsNewArray) only for the next stage to immediately
unwrap it to a List<object> again (EmitGetListFromArrayOrList). The intermediate
array is anonymous -- it can only ever be the receiver of the one following
array-method call -- so its $Array identity is unobservable.

TryEmitMethodCall now threads a leaveResultAsBareList flag: when a call's receiver
is itself a fresh-array-producing array method on an array receiver
(TryEmitChainedArrayReceiverAsBareList), the inner call is emitted with its $Array
wrap suppressed (bare List<object> left on the stack) and the outer call skips the
unwrap. The flag recurses, so a 3-stage chain drops both intermediate boundaries
while the FINAL stage still wraps to $Array when its result is consumed as an array
value (.length, assignment, ===). Disabled for returnsReceiver methods (sort/
reverse/fill/copyWithin, which need the original wrapper) and the await-spill path.

The benchmark's map().filter().reduce() now emits zero newobj $Array / get_Elements
between stages (IL-verified). Warm throughput is flat (~1.4 ms/call, identical to
L1 in a controlled A/B) -- the win is allocation/GC reduction and cold-start, not
warm steady-state, since per-element work dominates at large n. 8 new dual-mode
chain tests (incl. final-wrap-preserved, map->slice plain-args boundary, mixed
typed/untyped). Full dotnet test green except known flaky (pass in isolation) and
the stale Test262 baselines (regressions are the same pre-existing Array/isArray
TypeCheckError/Proxy drift, also present in interpreter mode -- unchanged by L2).
--verify clean.
Layers 1/2 handled non-capturing annotated callbacks. A capturing arrow (e.g.
arr.map((x: number): number => x + k)) compiles to a typed INSTANCE Invoke on a
display class, not a static $Program method, so L1's adapter path bailed
(DisplayClasses.ContainsKey) and it fell back to per-element reflective
$TSFunction/MethodInvoker dispatch.

L3 emits the boxed adapter as an INSTANCE method on the arrow's display class
(object Invoke$box(object[,object]) -> unbox -> callvirt this.Invoke -> box) and
binds the delegate to (displayInstance, ldftn adapter). The display instance is
built fresh at the call site via the existing EmitCapturingArrowDisplayInstance
(exposed through IEmitterContext.TryEmitCapturingArrowDisplayInstance), exactly as
the reflective path does — so per-iteration fresh-binding semantics are preserved
(verified: a loop pushing arr.map(x => x + i)[0] yields 10,11,12, not a shared i).

ArrowBoxedAdapterEmitter.GetOrEmit gained an `instance` flag selecting the carrier
(static $Program vs the display class), arg base (arg0 = `this` for instance), and
call opcode (call vs callvirt); the adapter name {method.Name}$box{arity} stays
collision-free ("Invoke$box1" is unique within the per-arrow display class, since
two arrows sharing a class would already collide on "Invoke"). TryBindTypedArrowAdapter
now branches on DisplayClasses: capturing -> instance adapter + display instance;
non-capturing -> the L1 static path.

Capturing chained map().filter().reduce() now emits instance adapters with zero
reflective ldtoken/$TSFunction (IL-verified), ~1.1 ms/call warm (similar to the
non-capturing ~2.4x over main's reflective path). 5 new dual-mode tests (capturing
map/reduce/chained + the per-iteration fresh-capture guard). Full dotnet test green
except known flaky (all pass in isolation; failing set disjoint across runs) and the
stale Test262 baselines (regressions are the same pre-existing Array/isArray drift,
present in interpreter mode too — no closure/arrow test regressed). --verify clean.
… + IsTruthy)

Layers 1/3 routed every typed annotated callback through an object-returning boxed
adapter bound to Func<object,object> + the object-callback helper, which for a
boolean predicate (filter/find/findIndex/some/every) boxes the bool result and then
calls IsTruthy on it per element.

The predicate helpers already have Func<object,bool> *DirectBool variants (used by
the untyped path for bool(object) arrows). L4 routes a typed bool-returning predicate
to them: when the arrow returns bool and a boolHelper exists, the adapter is emitted
to return the unboxed bool directly (no rebox) and the call site uses the *DirectBool
helper — dropping the per-element box + IsTruthy.

ArrowBoxedAdapterEmitter.GetOrEmit gained a `boolReturn` flag (adapter return type
bool vs object, skip the trailing box, distinct cache key + name marker $bbox);
TryBindTypedArrowAdapter binds Func<object,bool>; TryEmitDirectDelegateCall computes
wantBool = boolHelper != null && arrow returns bool and selects the variant. Composes
with L3: a capturing bool predicate emits a bool-returning INSTANCE adapter bound to
ArrayFilterDirectBool.

filter/some/every/find/findIndex with typed predicates now emit $bbox adapters +
*DirectBool helpers with zero object-variant IsTruthy (IL-verified). +4 dual-mode
tests (capturing bool predicate, some/every short-circuit). Full dotnet test green
except the stale Test262 baselines (same pre-existing Array/isArray drift; no
predicate/filter/some/every/find test regressed). --verify clean.
@nickna nickna merged commit 1ef13a3 into main Jun 21, 2026
3 checks passed

Development

Successfully merging this pull request may close these issues.

Perf: array HOFs (map/filter/reduce) cache callback MethodInfo + delegate-specialize
