Perf #878: Float64Array element access fast path — unboxed get/set#881

Merged: nickna merged 1 commit into main from wrk/878-float64-fast-path on Jun 22, 2026

Conversation

@nickna (Owner) commented Jun 22, 2026

Closes #878.

Problem

Compiled Float64Array element access, a[i] (read) and a[i] = x (write), routed through the boxed dispatcher chain Runtime.GetIndex → GetTypedArrayElement (isinst + castclass + virtual Get) → BitConverter + box per element. Unlike plain number[] (which got a List<double> typed path in #857/#872), typed arrays had no fast path, so Float64Array ran 24–101× slower than Node and, ironically, slower than a plain number[].

Fix

Add unboxed accessors double GetUnboxed(int) / void SetUnboxed(int, double) to the emitted $Float64Array (RuntimeEmitter.TSTypedArray.cs) — they mirror the existing boxed byte logic but drop the Box / Convert.ToDouble coercion. Then bind them directly in EmitGetIndex / EmitSetIndex (ILEmitter.Properties.cs) when:

  • the receiver is a variable statically typed Float64Array (side-effect-free, loaded once), and
  • (writes only) the RHS is statically numeric — a non-numeric RHS falls through to the boxed path, which does JS ToNumber coercion.

Reads leave a native double on the stack; writes take one. This eliminates the GetIndex/SetIndex dispatch, the isinst ladder, the virtual Get/Set, and the per-element box. The calls target a type in the output assembly, so the result stays fully standalone (no SharpTS.dll dependency).
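To make the two gating conditions concrete, here is a minimal TypeScript sketch of source-level accesses and the path each one would take according to the rules above. The comments restate the description; they are not compiler output, and the helper `first` is a hypothetical name used only for illustration.

```typescript
// Illustrative only: the path annotations paraphrase the rules in this PR
// description and are not produced by the compiler.
const a: Float64Array = new Float64Array(4);
const x: number = 1.5;
const s: any = "2.5"; // not statically numeric

a[0] = x; // receiver is a Float64Array variable and RHS is a number: eligible for the unboxed write
a[1] = s; // RHS is not statically numeric: boxed path, which applies ToNumber("2.5") and stores 2.5

// Both reads use a variable statically typed Float64Array: eligible for the unboxed read.
const sum: number = a[0] + a[1];

// Hypothetical helper: the receiver is typed `any`, so by the stated rule it
// would presumably stay on the existing generic index path.
function first(arr: any): number {
  return arr[0];
}
```

Either way the observable values are the same; only the emitted access path differs.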

OOB semantics unchanged: like the existing Get/Set, the unboxed accessors are not bounds-checked, so out-of-range access faults via BitConverter/Array.Copy exactly as today. (Both SharpTS modes already throw on OOB — interpreter RangeError, compiled ArgumentOutOfRangeException; only Node returns undefined. Fixing OOB→undefined is a separate pre-existing correctness matter, out of scope here.)
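For reference, the Node behavior that SharpTS does not yet match can be shown in a few lines. This sketch reflects standard JavaScript typed-array semantics only; as stated above, both SharpTS modes throw on the same accesses.

```typescript
// Standard JavaScript (Node) behavior for out-of-range typed-array access.
const a = new Float64Array(2);

// An out-of-range read yields undefined rather than throwing.
const outOfRange: number | undefined = a[5];

// An out-of-range write is silently ignored; the array does not grow.
a[5] = 1;
const lengthAfterWrite = a.length;
```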

Result

| Float64Array fill + 3-pt stencil | Compiled | Node | vs Node |
| --- | --- | --- | --- |
| n=100000 (was ~33×) | ~2.7 ms | ~0.8 ms | ~3.4× |
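The benchmark source is not included in this PR, so the following is a hypothetical reconstruction of what a "fill + 3-point stencil" kernel might look like. The function name `fillAndStencil`, the fill formula, and the averaging stencil are all assumptions. What matters is that every `src[i]` and `dst[i]` access uses a variable statically typed Float64Array with a numeric right-hand side, which is the shape the fast path targets.

```typescript
// Hypothetical kernel: names and formulas are assumptions, not the PR's benchmark.
function fillAndStencil(n: number): Float64Array {
  const src: Float64Array = new Float64Array(n);
  const dst: Float64Array = new Float64Array(n);

  // Fill: one indexed write per element.
  for (let i = 0; i < n; i++) {
    src[i] = i * 0.5;
  }

  // 3-point stencil: three indexed reads and one indexed write per interior element.
  for (let i = 1; i < n - 1; i++) {
    dst[i] = (src[i - 1] + src[i] + src[i + 1]) / 3;
  }
  return dst;
}

const out = fillAndStencil(100000);
```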

The catastrophic gap is closed. The residual ~3.4× is the byte[]+BitConverter backing and the per-element accessor call vs V8's raw double memory — a follow-up (a sealed concrete type to let RyuJIT devirtualize/inline the accessor, or a native double[] backing).
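As a rough analogy for the residual cost, the sketch below contrasts a byte-backed accessor pair with direct element access. It is not the SharpTS implementation: `DataView` over an `ArrayBuffer` stands in for a byte[] + BitConverter backing, and `getViaBytes` / `setViaBytes` are hypothetical names. Both routes store the same values; the byte-backed one pays for an accessor call and an encode or decode step on every element.

```typescript
// Analogy only: DataView stands in for a byte[] + BitConverter-style backing.
const n = 8;
const bytes = new ArrayBuffer(n * 8);
const view = new DataView(bytes);

// Hypothetical accessors: each call decodes or encodes 8 bytes (little-endian).
function getViaBytes(i: number): number {
  return view.getFloat64(i * 8, true);
}
function setViaBytes(i: number, v: number): void {
  view.setFloat64(i * 8, v, true);
}

// Direct element access on raw double storage.
const raw = new Float64Array(n);

for (let i = 0; i < n; i++) {
  setViaBytes(i, i + 0.25);
  raw[i] = i + 0.25;
}

// Both backings hold identical values.
let same = true;
for (let i = 0; i < n; i++) {
  same = same && getViaBytes(i) === raw[i];
}
```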

Verification

  • IL verifies (--compile … --verify); 223 ILVerification/typed-array + 51 more unit tests pass.
  • Behavior-preserving — proven. Compiled output is byte-identical to the prior boxed path across fill/stencil, NaN/±Infinity/-0, assignment-expression result, the non-numeric-RHS coercion fallback, and OOB (verified by a base-vs-fix diff).
  • Test262 — no conformance change. Ran 633 TypedArray index-semantics files (TypedArrayConstructors/internals, Float64Array, prototype/{set,fill,subarray,slice,copyWithin}) in-process compiled on both base and fix — per-file outcomes are identical. (Run directly because the committed baseline differ is stale/crashing in this environment; the pre-existing failures are unchanged typed-array spec gaps.)
  • No interpreter/type-checker changes.
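The edge cases named in the behavior-preservation check can be written as a short TypeScript program. This is a sketch of the kind of inputs involved, not the actual base-vs-fix harness, which is not shown here. The expected values follow standard JavaScript semantics.

```typescript
const a: Float64Array = new Float64Array(4);

// Special values must round-trip through element storage unchanged.
a[0] = NaN;
a[1] = Infinity;
a[2] = -Infinity;
a[3] = -0;
const keepsNaN = Number.isNaN(a[0]);
const keepsInfinities = a[1] === Infinity && a[2] === -Infinity;
const keepsNegativeZero = Object.is(a[3], -0);

// An assignment expression evaluates to its right-hand side.
const x: number = 1.5;
const assigned: number = (a[0] = x);

// With a non-numeric RHS the expression still evaluates to the original RHS,
// while the element holds the ToNumber-coerced value.
const s: any = "2.5";
const assignedRaw = (a[1] = s);
const storedCoerced = a[1];
```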

Scope: limited to Float64Array (the issue + benchmark). The same pattern extends to the other numeric typed-array kinds (Int32Array, Float32Array, …) with their respective coercions — separate follow-up.

Compiled Float64Array element access (`a[i]` read, `a[i] = x` write) routed
through the boxed dispatcher chain Runtime.GetIndex → GetTypedArrayElement
(isinst + castclass + virtual Get) → BitConverter + Box per element. With no
typed fast path (unlike number[] → List<double> per #857), Float64Array ran
24–101× slower than Node — ironically slower than a plain number[].

Add unboxed `double GetUnboxed(int)` / `void SetUnboxed(int, double)` on the
emitted $Float64Array (mirroring the boxed byte logic without the box or
Convert.ToDouble coercion), and bind them directly in EmitGetIndex/EmitSetIndex
when the receiver is a variable statically typed Float64Array (write also gated
on a statically-numeric RHS; a non-numeric RHS falls back to the boxed path for
ToNumber coercion). Reads leave a native double on the stack; writes take one —
no GetIndex/SetIndex dispatch, no isinst, no per-element box. Calls stay in the
output assembly → fully standalone. OOB faults exactly as the boxed Get/Set do
today (both SharpTS modes already throw on OOB — semantics unchanged).

Result (Float64Array fill + 3-point stencil, n=100000): ~33x → ~3.4x vs Node.

Verification:
- IL verifies (--verify); 223 ILVerification/typed-array + 51 more unit tests.
- Compiled output BYTE-IDENTICAL to the prior boxed path across fill/stencil,
  NaN/±Inf/-0, assignment-result, non-numeric-RHS coercion fallback, and OOB.
- Test262 (633 TypedArray index-semantics files, in-process compiled): per-file
  outcomes identical base-vs-fix — zero conformance change.

Residual ~3.4x is the byte[]+BitConverter backing and the per-element accessor
call vs V8's raw double memory — a follow-up (sealed concrete type for JIT
devirtualization/inlining, or a native double[] backing).
@nickna nickna merged commit 5c67ddd into main Jun 22, 2026
3 checks passed


Development

Successfully merging this pull request may close these issues.

Perf #856: compiled Float64Array/TypedArray element access is boxed + virtually dispatched (no typed fast path) — 24–101× slower than Node
