Perf #878: Float64Array element access fast path — unboxed get/set by nickna · Pull Request #881 · nickna/SharpTS

nickna · 2026-06-22T07:24:26Z

Closes #878.

Problem

Compiled Float64Array element access — a[i] (read) and a[i] = x (write) — routed through the boxed dispatcher chain Runtime.GetIndex → GetTypedArrayElement (isinst + castclass + virtual Get) → BitConverter + box per element. Unlike plain number[] (which got a List<double> typed path in #857/#872), typed arrays had no fast path, so Float64Array ran 24–101× slower than Node — ironically slower than a plain number[].

Fix

Add unboxed accessors double GetUnboxed(int) / void SetUnboxed(int, double) to the emitted $Float64Array (RuntimeEmitter.TSTypedArray.cs) — they mirror the existing boxed byte logic but drop the Box / Convert.ToDouble coercion. Then bind them directly in EmitGetIndex / EmitSetIndex (ILEmitter.Properties.cs) when:

- the receiver is a variable statically typed Float64Array (side-effect-free, loaded once), and
- (writes only) the RHS is statically numeric; a non-numeric RHS falls through to the boxed path, which does JS ToNumber coercion.
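To make the two conditions concrete, here is a small TypeScript sketch of source code on each side of the gate. The variable names are hypothetical, and the comments about which path is taken are a reading of the conditions above, not output from the compiler.

```typescript
// Receiver is a plain variable statically typed Float64Array.
const samples: Float64Array = new Float64Array(4);

// Statically numeric RHS: per the conditions above, this write should
// qualify for the unboxed SetUnboxed binding.
samples[0] = 1.5;

// Read through the same statically typed variable: should qualify for
// the unboxed GetUnboxed binding.
const first: number = samples[0];

// RHS typed `any` is not statically numeric, so this write should fall
// through to the boxed path, where JS ToNumber turns "2.5" into 2.5.
const raw: any = "2.5";
samples[1] = raw;
```

Either way the observable result is the same; only the emitted access path differs.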

Reads leave a native double on the stack; writes take one. This eliminates the GetIndex/SetIndex dispatch, the isinst ladder, the virtual Get/Set, and the per-element box. The calls target a type in the output assembly, so the result stays fully standalone (no SharpTS.dll dependency).
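The accessor split can be modelled in TypeScript as a rough analogy. The class below is hypothetical and is not the emitted $Float64Array: it uses a DataView over an ArrayBuffer to stand in for the byte[] + BitConverter backing, `unknown` to stand in for a boxed object, and `Number()` to stand in for the coercion step.

```typescript
// Hypothetical model of a byte-backed Float64 store with both accessor styles.
class ByteBackedFloat64 {
  private readonly view: DataView;

  constructor(readonly length: number) {
    this.view = new DataView(new ArrayBuffer(length * 8));
  }

  // Boxed-style read: the caller gets an untyped value and must narrow it.
  get(index: number): unknown {
    return this.view.getFloat64(index * 8, true);
  }

  // Boxed-style write: accepts any value and coerces it to a number first.
  set(index: number, value: unknown): void {
    this.view.setFloat64(index * 8, Number(value), true);
  }

  // Unboxed read: same byte logic, but the result is a plain number.
  getUnboxed(index: number): number {
    return this.view.getFloat64(index * 8, true);
  }

  // Unboxed write: same byte logic, no coercion because the type is known.
  setUnboxed(index: number, value: number): void {
    this.view.setFloat64(index * 8, value, true);
  }
}

const model = new ByteBackedFloat64(3);
model.setUnboxed(0, 1.25);      // statically numeric value, no coercion
model.set(1, "2.5");            // untyped value, coerced to 2.5
const fast: number = model.getUnboxed(0);
const slow: unknown = model.get(1);
```

In this model, as in the description above, neither accessor pair does its own bounds check; an out-of-range index faults inside the underlying byte access (here, DataView throws a RangeError).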

OOB semantics unchanged: like the existing Get/Set, the unboxed accessors are not bounds-checked, so out-of-range access faults via BitConverter/Array.Copy exactly as today. (Both SharpTS modes already throw on OOB — interpreter RangeError, compiled ArgumentOutOfRangeException; only Node returns undefined. Fixing OOB→undefined is a separate pre-existing correctness matter, out of scope here.)
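For reference, the Node behavior that SharpTS does not yet match looks like this in TypeScript (variable names are illustrative):

```typescript
const probe: Float64Array = new Float64Array(2);

// Out-of-range read: no exception, the value is undefined at runtime
// even though the static type is number.
const outOfRange: number | undefined = probe[5];

// Out-of-range write: silently ignored, the array is not resized.
probe[5] = 1;
const lengthAfterWrite: number = probe.length;
```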

Result

Benchmark (Float64Array fill + 3-pt stencil)	Compiled	Node	Compiled vs Node
n=100000	~2.7 ms	~0.8 ms	~3.4× (was ~33×)

The catastrophic gap is closed. The residual ~3.4× comes from the byte[]+BitConverter backing and the per-element accessor call, compared with V8's raw double memory. Closing it is a follow-up: either a sealed concrete type that lets RyuJIT devirtualize/inline the accessor, or a native double[] backing.
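The benchmark source is not included in this description. A kernel of the kind named in the table (fill followed by a 3-point stencil) might look like the following sketch; the function name, fill values, and averaging weights are assumptions made for illustration.

```typescript
// Hypothetical fill + 3-point stencil kernel over Float64Array.
function fillAndStencil(n: number): Float64Array {
  const src: Float64Array = new Float64Array(n);
  const dst: Float64Array = new Float64Array(n);

  // Fill: one typed-array write per element with a numeric RHS.
  for (let i = 0; i < n; i++) {
    src[i] = i * 0.5;
  }

  // Stencil: three reads and one write per interior element.
  for (let i = 1; i < n - 1; i++) {
    dst[i] = (src[i - 1] + src[i] + src[i + 1]) / 3;
  }
  return dst;
}

const stencilResult = fillAndStencil(100000);
```

Every access in both loops goes through a variable statically typed Float64Array with a numeric right-hand side, which is the shape the fast path targets.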

Verification

- IL verifies (--compile … --verify); 223 ILVerification/typed-array + 51 more unit tests pass.
- Behavior-preserving: proven. Compiled output is byte-identical to the prior boxed path across fill/stencil, NaN/±Infinity/-0, assignment-expression result, the non-numeric-RHS coercion fallback, and OOB (verified by a base-vs-fix diff).
- Test262: no conformance change. Ran 633 TypedArray index-semantics files (TypedArrayConstructors/internals, Float64Array, prototype/{set,fill,subarray,slice,copyWithin}) in-process compiled on both base and fix; per-file outcomes are identical. (Run directly because the committed baseline differ is stale/crashing in this environment; the pre-existing failures are unchanged typed-array spec gaps.)
- No interpreter/type-checker changes.
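The edge cases listed above (NaN, ±Infinity, -0, and the assignment-expression result) can be exercised with a short TypeScript program like the one below. This is an illustrative check written for this description, not the test actually used in the base-vs-fix diff.

```typescript
const edgeValues: number[] = [NaN, Infinity, -Infinity, -0, 0];
const store: Float64Array = new Float64Array(edgeValues.length);

for (let i = 0; i < edgeValues.length; i++) {
  store[i] = edgeValues[i];
}

// Object.is distinguishes -0 from 0 and treats NaN as equal to NaN,
// so it checks that each value survives the write/read round trip.
function allRoundTrip(): boolean {
  for (let i = 0; i < edgeValues.length; i++) {
    if (!Object.is(store[i], edgeValues[i])) return false;
  }
  return true;
}
const roundTripOk: boolean = allRoundTrip();

// An assignment expression evaluates to the assigned value.
const assigned: number = (store[0] = 42);
```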

Scope: limited to Float64Array (the issue + benchmark). The same pattern extends to the other numeric typed-array kinds (Int32Array, Float32Array, …) with their respective coercions — separate follow-up.
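The "respective coercions" are what make the other kinds more than a copy of the Float64Array path. In standard JavaScript semantics, shown here in TypeScript with illustrative names, an Int32Array write converts to a 32-bit integer and a Float32Array write rounds to single precision:

```typescript
const ints: Int32Array = new Int32Array(2);
ints[0] = 3.7;          // fractional part dropped: stored as 3
ints[1] = 2147483648;   // 2^31 wraps modulo 2^32: stored as -2147483648

const singles: Float32Array = new Float32Array(1);
singles[0] = 0.1;       // rounded to the nearest float32

const truncated: number = ints[0];
const wrapped: number = ints[1];
const rounded: number = singles[0];
```

An unboxed setter for these kinds would still need to apply the matching conversion, even when the right-hand side is statically numeric.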

Compiled Float64Array element access (`a[i]` read, `a[i] = x` write) routed through the boxed dispatcher chain Runtime.GetIndex → GetTypedArrayElement (isinst + castclass + virtual Get) → BitConverter + Box per element. With no typed fast path (unlike number[] → List<double> per #857), Float64Array ran 24–101× slower than Node, ironically slower than a plain number[].

Add unboxed `double GetUnboxed(int)` / `void SetUnboxed(int, double)` on the emitted $Float64Array (mirroring the boxed byte logic without the box or Convert.ToDouble coercion), and bind them directly in EmitGetIndex/EmitSetIndex when the receiver is a variable statically typed Float64Array (write also gated on a statically-numeric RHS; a non-numeric RHS falls back to the boxed path for ToNumber coercion). Reads leave a native double on the stack; writes take one: no GetIndex/SetIndex dispatch, no isinst, no per-element box. Calls stay in the output assembly → fully standalone. OOB faults exactly as the boxed Get/Set do today (both SharpTS modes already throw on OOB; semantics unchanged).

Result (Float64Array fill + 3-point stencil, n=100000): ~33x → ~3.4x vs Node.

Verification:
- IL verifies (--verify); 223 ILVerification/typed-array + 51 more unit tests.
- Compiled output BYTE-IDENTICAL to the prior boxed path across fill/stencil, NaN/±Inf/-0, assignment-result, non-numeric-RHS coercion fallback, and OOB.
- Test262 (633 TypedArray index-semantics files, in-process compiled): per-file outcomes identical base-vs-fix; zero conformance change.

Residual ~3.4x is the byte[]+BitConverter backing and the per-element accessor call vs V8's raw double memory, a follow-up (sealed concrete type for JIT devirtualization/inlining, or a native double[] backing).

nickna merged commit 5c67ddd into main Jun 22, 2026
3 checks passed

nickna merged 1 commit into main from wrk/878-float64-fast-path
