Part of #856 (perf epic). Surfaced by the new cross-runtime typed-arrays benchmark (benchmarks/scripts/typed-arrays.ts): a Float64Array fill + 3-point stencil sweep.
Symptom
Per-call mean (ms):
| n | Compiled | Node | Bun | Compiled / Node |
|---|---|---|---|---|
| 1000 | 0.366 | 0.0036 | 0.0022 | 102× |
| 100000 | 10.42 | 0.310 | 0.213 | 34× |
| 1000000 | 100.3 | 4.18 | 2.23 | 24× |
A real typed buffer with a tight arithmetic loop should be near-native; instead it is 24–102× slower than Node (V8) and, from the same table, roughly 45–166× slower than Bun (JSC). Notably, a plain number[] is faster than a Float64Array here, because number[] gets the #857/#872 List<double> typed path and the typed array does not.
Root cause
a[i] lowers to the boxed central dispatcher Call Runtime.GetIndex(object, object) → object (Compilation/ILEmitter.Properties.cs:734-756). There is a typed fast path in EmitGetIndex — the #857 "promoted typed-array local" branch (ILEmitter.Properties.cs:820-833) — but it only matches number[]/boolean[] promoted to List<double>/List<bool>. A real Float64Array is an emitted $TypedArray object, doesn't match that branch, and falls through to the boxed path.
For typed arrays, GetIndex routes to $Runtime.GetTypedArrayElement(object, int) → object (Compilation/RuntimeEmitter.Worker.cs:725-752), which per access:
- does isinst + castclass against the $TypedArray base type,
- callvirts a per-type element accessor (TypedArrayElementGet, RuntimeEmitter.TSTypedArray.cs:62),
- returns object, boxing the double on every read; writes take a boxed object (SetTypedArrayElement, RuntimeEmitter.Worker.cs:758).
So every element read/write pays box/unbox + a type check + a virtual dispatch, versus V8/JSC compiling typed-array access to a direct memory load. Over 1M elements (fill + stencil) that overhead dominates. The backing storage itself is fine — it is the access wrapper that is slow.
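The per-access cost described above can be modeled roughly in TypeScript. This is an analogy, not the emitted IL: BoxedDouble, TypedArrayBase, Float64Backed, getIndexBoxed, sumBoxed, and sumDirect are all hypothetical names, standing in for the box, the $TypedArray base type, a per-type subclass, and the GetIndex → GetTypedArrayElement route.

```typescript
// Hypothetical model of the boxed access path; all names are illustrative.
class BoxedDouble {
  constructor(readonly value: number) {}
}

abstract class TypedArrayBase {
  // Stands in for the per-type virtual element accessor.
  abstract elementGet(index: number): BoxedDouble | undefined;
}

class Float64Backed extends TypedArrayBase {
  constructor(readonly backing: Float64Array) {
    super();
  }
  elementGet(index: number): BoxedDouble | undefined {
    if (index < 0 || index >= this.backing.length) return undefined;
    return new BoxedDouble(this.backing[index]); // one allocation per read
  }
}

// Boxed route: type check, virtual call, box; the caller then unboxes.
function getIndexBoxed(target: unknown, index: unknown): unknown {
  if (target instanceof TypedArrayBase && typeof index === "number") {
    return target.elementGet(index);
  }
  return undefined;
}

function sumBoxed(a: Float64Backed): number {
  let s = 0;
  for (let i = 0; i < a.backing.length; i++) {
    const boxed = getIndexBoxed(a, i);
    if (boxed instanceof BoxedDouble) s += boxed.value; // unbox at use site
  }
  return s;
}

// Direct route: what a statically typed fast path would reduce to.
function sumDirect(backing: Float64Array): number {
  let s = 0;
  for (let i = 0; i < backing.length; i++) s += backing[i];
  return s;
}
```

Both loops produce the same sum; the only difference is the wrapper work done per element, which is the overhead this issue is about.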
Suggested fix
Add a typed-array-aware fast path analogous to #857/#872: when the static type of the indexed object is a known TypedArray (Float64Array/Int32Array/…), emit a direct unboxed access to the backing buffer instead of GetIndex → GetTypedArrayElement. Options:
- expose an unboxed typed accessor on $TypedArray (e.g. double GetF64(int) / void SetF64(int, double)) and bind it directly when the element type is statically known; or
- expose the backing buffer so the arithmetic loop can ldelem.r8 / stelem.r8 directly.
Either removes the per-element boxing + virtual dispatch on the hot path. Out-of-range and detached-buffer semantics must be preserved (typed arrays read OOB as undefined).
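A minimal sketch of the semantics the first option would need to keep, written in TypeScript as a model rather than as the runtime's actual code. F64View, getF64, and setF64 are hypothetical names (echoing the GetF64/SetF64 example above). The sketch assumes a single range check is enough for both edge cases, since a typed array over a detached buffer reports length 0.

```typescript
// Hypothetical sketch: unboxed accessors that keep typed-array edge semantics.
class F64View {
  constructor(private readonly backing: Float64Array) {}

  // Fast path: a bounds check, then a raw double load with no boxing.
  // A detached buffer reports length 0, so the same check covers it.
  getF64(index: number): number | undefined {
    if (Number.isInteger(index) && index >= 0 && index < this.backing.length) {
      return this.backing[index];
    }
    return undefined; // out-of-range reads yield undefined, not an exception
  }

  // Out-of-range writes are silently ignored, matching typed-array behavior.
  setF64(index: number, value: number): void {
    if (Number.isInteger(index) && index >= 0 && index < this.backing.length) {
      this.backing[index] = value;
    }
  }
}
```

In a real emitter the in-range branch would return an unboxed double and only the out-of-range branch would need to fall back to the existing boxed path to produce undefined, so the hot loop stays free of allocation.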
Repro
benchmarks/scripts/typed-arrays.ts via benchmarks/run-benchmarks.ps1. All four runtimes compute identical results — equal work — so the gap is pure codegen.
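For readers without the repository at hand, the kernel has roughly the following shape. This is a guess at the structure based on the description above (Float64Array fill + 3-point stencil sweep), not a copy of benchmarks/scripts/typed-arrays.ts; fill, stencil, run, and the fill formula are assumptions.

```typescript
// Hypothetical sketch of the kernel shape; the real script may differ.
function fill(a: Float64Array): void {
  for (let i = 0; i < a.length; i++) a[i] = i * 0.5;
}

// 3-point stencil: each interior output is the mean of an element and
// its two neighbors; the two boundary elements are left at 0.
function stencil(src: Float64Array, dst: Float64Array): void {
  for (let i = 1; i < src.length - 1; i++) {
    dst[i] = (src[i - 1] + src[i] + src[i + 1]) / 3;
  }
}

// One benchmark call: fill, sweep, and reduce to a checksum so that
// results can be compared across runtimes.
function run(n: number): number {
  const a = new Float64Array(n);
  const b = new Float64Array(n);
  fill(a);
  stencil(a, b);
  let checksum = 0;
  for (let i = 0; i < n; i++) checksum += b[i];
  return checksum;
}
```

Every a[i] and b[i] in these loops is an indexed access on a value whose static type is Float64Array, which is the case the suggested fast path targets.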