Perf #877: compiled Array.sort — replace O(n²) insertion sort with stable merge sort#880
Merged
Conversation
…able merge sort Compiled `Array.prototype.sort`/`toSorted` emitted a hand-written in-place insertion sort (Θ(n²)), making compiled sort slower than the tree-walking interpreter and ~998× slower than Node at n=10000. The interpreter already uses a stable Θ(n log n) sort (LINQ OrderBy); only the IL path was quadratic. Replace Phase 2 of EmitSortBodyOnLocal with a pure-IL stable bottom-up merge sort that ping-pongs between two object[] buffers, reusing the exact comparison semantics (custom compareFn → double sign with NaN/non-number ⇒ equal/stable; otherwise ECMA-262 ToJsString/CompareOrdinal). Phases 1 (partition undefined/ holes) and 3 (rebuild) are unchanged, so undefined-to-end, holes, and the default comparator are preserved. Stays fully standalone (no SharpTS.dll reference). Index/width arithmetic is overflow-safe (saturating) so the sort is correct up to the array allocation limit. Result (sort benchmark, compiled, n=10000): 3235 ms → 46.6 ms (998× → ~14× vs Node); the O(n²) scaling is gone. Verification: - IL verifies (--verify); compiled == interpreter on stability, undefined, default lexicographic, toSorted, strings, empty/single, 5000-elem random. - ArraySort unit tests 38/38. - Test262 sort/toSorted (75 files, in-process compiled): per-file outcomes identical base-vs-fix — zero conformance change. The residual ~14× vs Node is the secondary per-comparison de-virtualization noted in #877 (boxed object[2] + InvokeValue per compare), tracked separately.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #877.
Problem
Compiled
Array.prototype.sort(comparator)/toSortedemitted a hand-written in-place insertion sort (Θ(n²)) inEmitSortBodyOnLocal. That made compiled sort scale quadratically — slower than our own tree-walking interpreter, and ~998× slower than Node at n=10000. The interpreter never hit this: it uses a stable Θ(n·log n) sort (LINQOrderBy). Only the IL path was quadratic.Fix
Replace Phase 2 of
EmitSortBodyOnLocal(Compilation/RuntimeEmitter.Arrays.Mutators.cs) with a pure-IL stable bottom-up merge sort that ping-pongs between twoobject[]buffers. It reuses the exact comparison semantics that were already there:compareFn→InvokeValue, double → sign, NaN / non-number ⇒0(equal → stable);undefinedcomparator → ECMA-262 defaultToJsString+CompareOrdinal.Phases 1 (partition
undefined/holes) and 3 (rebuild + appendundefined) are untouched, soundefined-to-end, holes, and the default comparator are preserved. Stability is structural (ties take the left run first). The path stays fully standalone — noSharpTS.dllreference. Index/width arithmetic is overflow-safe (saturating), so the sort is correct up to the array-allocation limit.Both compiled entry points (
ArraySortdirect emit and the reflective$Arraydispatch) flow through this body, so both are fixed.Result
O(n²) is gone (10× n now ~12× time = n·log n). The residual ~14× vs Node is the secondary per-comparison de-virtualization #877 already calls out (boxed
object[2]+InvokeValueper compare) — separate follow-up.Verification
--compile … --verify); compiled == interpreter on stability,undefined-to-end, default lexicographic,toSortednon-mutation, strings, empty/single, and a 5000-element random sort.Array/prototype/sort+toSortedTest262 files in-process compiled on both base and fix — the per-file outcomes are byte-identical. The 34 pre-existing failures (non-callable comparefn, primitive-receiver coercion, oldthis-handling tests) are unchanged spec gaps, unrelated to ordering.