Perf #878: Float64Array element access fast path — unboxed get/set #881
Merged
Compiled Float64Array element access (`a[i]` read, `a[i] = x` write) routed through the boxed dispatcher chain Runtime.GetIndex → GetTypedArrayElement (isinst + castclass + virtual Get) → BitConverter + Box per element. With no typed fast path (unlike number[] → List<double> per #857), Float64Array ran 24–101× slower than Node, ironically slower than a plain number[].

Add unboxed `double GetUnboxed(int)` / `void SetUnboxed(int, double)` on the emitted $Float64Array (mirroring the boxed byte logic without the box or Convert.ToDouble coercion), and bind them directly in EmitGetIndex/EmitSetIndex when the receiver is a variable statically typed Float64Array (write also gated on a statically-numeric RHS; a non-numeric RHS falls back to the boxed path for ToNumber coercion). Reads leave a native double on the stack; writes take one: no GetIndex/SetIndex dispatch, no isinst, no per-element box. Calls stay in the output assembly, so the result is fully standalone. OOB faults exactly as the boxed Get/Set do today (both SharpTS modes already throw on OOB; semantics unchanged).

Result (Float64Array fill + 3-point stencil, n=100000): ~33x → ~3.4x vs Node.

Verification:
- IL verifies (--verify); 223 ILVerification/typed-array + 51 more unit tests.
- Compiled output BYTE-IDENTICAL to the prior boxed path across fill/stencil, NaN/±Inf/-0, assignment-result, non-numeric-RHS coercion fallback, and OOB.
- Test262 (633 TypedArray index-semantics files, in-process compiled): per-file outcomes identical base-vs-fix; zero conformance change.

Residual ~3.4x is the byte[]+BitConverter backing and the per-element accessor call vs V8's raw double memory, left for a follow-up (sealed concrete type for JIT devirtualization/inlining, or a native double[] backing).
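To make the difference between the generic dispatcher and the direct typed accessors concrete, here is a minimal TypeScript model. It is only an illustration of the idea, not the emitted IL or the SharpTS runtime: the class `Float64Model`, the functions `getIndex`/`setIndex`, the `DataView` backing, and the little-endian byte order are all assumptions made for the sketch.

```typescript
// Illustrative model only. All names here are hypothetical stand-ins,
// not SharpTS APIs.
class Float64Model {
  private readonly view: DataView;
  readonly length: number;

  constructor(length: number) {
    this.length = length;
    // Byte-backed storage, 8 bytes per element (a stand-in for byte[]).
    this.view = new DataView(new ArrayBuffer(length * 8));
  }

  // Typed accessors: take and return a plain number, with no runtime
  // type test on the receiver and no coercion of the value.
  getUnboxed(index: number): number {
    return this.view.getFloat64(index * 8, true); // little-endian assumed
  }

  setUnboxed(index: number, value: number): void {
    this.view.setFloat64(index * 8, value, true);
  }
}

// Generic path: everything is `unknown`, so each access pays for a
// receiver type test, and writes coerce the value ToNumber-style.
function getIndex(receiver: unknown, index: unknown): unknown {
  if (receiver instanceof Float64Model && typeof index === "number") {
    return receiver.getUnboxed(index);
  }
  throw new TypeError("unsupported receiver");
}

function setIndex(receiver: unknown, index: unknown, value: unknown): void {
  if (receiver instanceof Float64Model && typeof index === "number") {
    receiver.setUnboxed(index, Number(value));
    return;
  }
  throw new TypeError("unsupported receiver");
}

const model = new Float64Model(4);
model.setUnboxed(0, 1.25); // direct binding: statically typed receiver and value
setIndex(model, 1, "2.5"); // generic path: value is coerced to a number first
```

In this model an out-of-range index makes `DataView` throw, which loosely parallels the note that the accessors fault on OOB rather than returning `undefined`.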
Closes #878.
Problem

Compiled `Float64Array` element access, `a[i]` (read) and `a[i] = x` (write), routed through the boxed dispatcher chain `Runtime.GetIndex` → `GetTypedArrayElement` (`isinst` + `castclass` + virtual `Get`) → `BitConverter` + box per element. Unlike plain `number[]` (which got a `List<double>` typed path in #857/#872), typed arrays had no fast path, so `Float64Array` ran 24–101× slower than Node, ironically slower than a plain `number[]`.

Fix
Add unboxed accessors `double GetUnboxed(int)` / `void SetUnboxed(int, double)` to the emitted `$Float64Array` (`RuntimeEmitter.TSTypedArray.cs`). They mirror the existing boxed byte logic but drop the `Box`/`Convert.ToDouble` coercion. Then bind them directly in `EmitGetIndex`/`EmitSetIndex` (`ILEmitter.Properties.cs`) when:

- the receiver is a variable statically typed `Float64Array` (side-effect-free, loaded once), and
- for writes, the RHS is statically numeric; a non-numeric RHS falls back to the boxed path for `ToNumber` coercion.

Reads leave a native `double` on the stack; writes take one. This eliminates the `GetIndex`/`SetIndex` dispatch, the `isinst` ladder, the virtual `Get`/`Set`, and the per-element box. The calls target a type in the output assembly, so the result stays fully standalone (no `SharpTS.dll` dependency).

OOB semantics unchanged: like the existing `Get`/`Set`, the unboxed accessors are not bounds-checked, so out-of-range access faults via `BitConverter`/`Array.Copy` exactly as today. (Both SharpTS modes already throw on OOB: interpreter `RangeError`, compiled `ArgumentOutOfRangeException`; only Node returns `undefined`. Fixing OOB→`undefined` is a separate pre-existing correctness matter, out of scope here.)

Result
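The catastrophic gap is closed: on the `Float64Array` fill + 3-point stencil benchmark (n=100000), the slowdown vs Node drops from ~33× to ~3.4×. The residual ~3.4× is the `byte[]`+`BitConverter` backing and the per-element accessor call vs V8's raw `double` memory, left for a follow-up (a sealed concrete type to let RyuJIT devirtualize/inline the accessor, or a native `double[]` backing).

Verification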
- IL verifies (`--compile … --verify`); 223 ILVerification/typed-array + 51 more unit tests pass.
- Compiled output is byte-identical to the prior boxed path across fill/stencil, `NaN`/`±Infinity`/`-0`, assignment-expression result, the non-numeric-RHS coercion fallback, and OOB (verified by a base-vs-fix diff).
- Test262: 633 `TypedArray` index-semantics files (`TypedArrayConstructors/internals`, `Float64Array`, `prototype/{set,fill,subarray,slice,copyWithin}`) compiled in-process on both base and fix; per-file outcomes are identical. (Run directly because the committed baseline differ is stale/crashing in this environment; the pre-existing failures are unchanged typed-array spec gaps.)

Scope: limited to `Float64Array` (the issue + benchmark). The same pattern extends to the other numeric typed-array kinds (`Int32Array`, `Float32Array`, …) with their respective coercions; that is a separate follow-up.
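For reference, the source-level shapes discussed under Fix can be sketched as follows. This is a hypothetical reconstruction of a fill + 3-point stencil workload, not the benchmark file from the issue; the function names `fill` and `stencil3` and the variable names are invented for illustration. The comments mark which accesses match the conditions the description gives for the unboxed path and which one would fall back.

```typescript
// Hypothetical workload in the shape the PR describes. Names are
// illustrative only.
function fill(a: Float64Array): void {
  for (let i = 0; i < a.length; i++) {
    // Receiver is a variable statically typed Float64Array and the RHS is
    // statically numeric: the shape described for the unboxed write.
    a[i] = i * 0.5;
  }
}

function stencil3(src: Float64Array, dst: Float64Array): void {
  for (let i = 1; i < src.length - 1; i++) {
    // Three reads and one write per iteration, all on typed receivers.
    dst[i] = (src[i - 1] + src[i] + src[i + 1]) / 3;
  }
}

const n = 100000;
const srcArr = new Float64Array(n);
const dstArr = new Float64Array(n);
fill(srcArr);
stencil3(srcArr, dstArr);

// RHS is not statically numeric. Per the description, this shape falls
// back to the boxed path so that ToNumber coercion still applies.
const untyped: any = "2.5";
srcArr[0] = untyped;

// Special values that the verification list says must round-trip unchanged.
srcArr[1] = -0;
srcArr[2] = NaN;
srcArr[3] = Infinity;
```

Whichever path a given access takes, the observable values should be the same; the description's byte-identical comparison is checking exactly that property.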