
Perf #877: compiled Array.sort - replace O(n²) insertion sort with stable merge sort #880

Merged: nickna merged 1 commit into main from wrk/877-sort-merge-sort on Jun 22, 2026
nickna (Owner) commented Jun 22, 2026

Closes #877.

Problem

Compiled Array.prototype.sort(comparator) / toSorted emitted a hand-written in-place insertion sort (Θ(n²)) in EmitSortBodyOnLocal. That made compiled sort scale quadratically — slower than our own tree-walking interpreter, and ~998× slower than Node at n=10000. The interpreter never hit this: it uses a stable Θ(n·log n) sort (LINQ OrderBy). Only the IL path was quadratic.

Fix

Replace Phase 2 of EmitSortBodyOnLocal (Compilation/RuntimeEmitter.Arrays.Mutators.cs) with a pure-IL stable bottom-up merge sort that ping-pongs between two object[] buffers. It reuses the exact comparison semantics that were already there:

  • custom compareFn → InvokeValue, double → sign, NaN / non-number ⇒ 0 (equal → stable);
  • absent/undefined comparator → ECMA-262 default ToJsString + CompareOrdinal.
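
The two comparison rules above can be sketched in TypeScript as follows. This is an illustration of the semantics only, not the emitted IL: `compareElements`, `compareOrdinal`, and the `Comparator` type are hypothetical names, and `String(...)` merely stands in for ToJsString.

```typescript
// Hypothetical helper names; a sketch of the comparison rules, not the emitted IL.
type Comparator = (a: unknown, b: unknown) => unknown;

// Ordinal comparison by UTF-16 code units, standing in for CompareOrdinal.
function compareOrdinal(x: string, y: string): number {
  return x < y ? -1 : x > y ? 1 : 0;
}

function compareElements(a: unknown, b: unknown, compareFn?: Comparator): number {
  if (compareFn === undefined) {
    // Default path: compare the string forms of both elements.
    return compareOrdinal(String(a), String(b));
  }
  const result = compareFn(a, b);
  // A NaN or non-number result is treated as "equal", so the
  // stable sort keeps the two elements in their original order.
  if (typeof result !== "number" || Number.isNaN(result)) {
    return 0;
  }
  return Math.sign(result);
}
```

For example, `compareElements(10, 9)` is negative because `"10"` orders before `"9"` as a string, while a comparator that returns `NaN` always yields 0.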

Phases 1 (partition undefined/holes) and 3 (rebuild + append undefined) are untouched, so undefined-to-end, holes, and the default comparator are preserved. Stability is structural (ties take the left run first). The path stays fully standalone — no SharpTS.dll reference. Index/width arithmetic is overflow-safe (saturating), so the sort is correct up to the array-allocation limit.
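
For readers unfamiliar with the technique, here is a minimal TypeScript sketch of a stable bottom-up merge sort that ping-pongs between two buffers. It is not the emitted IL; `mergeSortStable` and its parameters are hypothetical names. JavaScript numbers do not overflow at array-sized indices, so the saturating arithmetic the IL needs is replaced here by a plain `Math.min` clamp.

```typescript
// Sketch only: `mergeSortStable` is a hypothetical name, not a SharpTS API.
function mergeSortStable<T>(items: T[], cmp: (a: T, b: T) => number): T[] {
  const n = items.length;
  let src = items.slice();     // buffer holding the current sorted runs
  let dst = new Array<T>(n);   // buffer receiving the merged runs

  // Run width doubles each pass: 1, 2, 4, ... until one run covers everything.
  for (let width = 1; width < n; width *= 2) {
    for (let lo = 0; lo < n; lo += 2 * width) {
      const mid = Math.min(lo + width, n);      // end of left run, clamped to n
      const hi = Math.min(lo + 2 * width, n);   // end of right run, clamped to n
      let i = lo;
      let j = mid;
      let k = lo;
      while (i < mid && j < hi) {
        // `<= 0` takes the left run on ties, which is what makes the sort stable.
        dst[k++] = cmp(src[i], src[j]) <= 0 ? src[i++] : src[j++];
      }
      while (i < mid) dst[k++] = src[i++];
      while (j < hi) dst[k++] = src[j++];
    }
    // Ping-pong: this pass's output becomes the next pass's input.
    const swap = src;
    src = dst;
    dst = swap;
  }
  return src;
}
```

Each pass touches every element once and there are about log₂ n passes, which is where the Θ(n·log n) bound comes from. The second buffer costs O(n) extra memory in exchange for never shifting elements in place.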

Both compiled entry points (ArraySort direct emit and the reflective $Array dispatch) flow through this body, so both are fixed.

Result

| sort (compiled) | Before | After | vs Node |
|---|---|---|---|
| n=1000 | 31.9 ms | 6.17 ms | |
| n=10000 | 3235 ms | 46.6 ms | 998× → ~14× |

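O(n²) is gone: per the table, 10× n now costs ~7.6× the time (46.6 ms / 6.17 ms) instead of ~101× (3235 ms / 31.9 ms), which is consistent with n·log n rather than n² growth. The residual ~14× vs Node is the secondary per-comparison de-virtualization #877 already calls out (boxed object[2] + InvokeValue per compare), which is a separate follow-up.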
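
As a rough sanity check on the scaling claim, the expected slowdown for a 10× larger input can be worked out under idealized cost models, ignoring constant factors and fixed overhead (so real measurements can land below the n·log n figure):

```latex
\frac{T_{n \log n}(10n)}{T_{n \log n}(n)}
  = \frac{10n \,\log(10n)}{n \,\log n}
  = 10\left(1 + \frac{\log 10}{\log n}\right)
  \;\overset{n = 1000}{=}\; 10 \cdot \frac{4}{3} \approx 13.3,
\qquad
\frac{T_{n^2}(10n)}{T_{n^2}(n)} = \frac{(10n)^2}{n^2} = 100 .
```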

Verification

  • IL verifies (--compile … --verify); compiled == interpreter on stability, undefined-to-end, default lexicographic, toSorted non-mutation, strings, empty/single, and a 5000-element random sort.
  • ArraySort unit tests: 38/38.
  • Test262 conformance — no regression, proven directly. The committed baseline differ is stale/crashing in this environment (the interpreted control fails on baseline drift unrelated to this change; the compiled batched runner crashes the test host). So I ran all 75 Array/prototype/sort + toSorted Test262 files in-process compiled on both base and fix — the per-file outcomes are byte-identical. The 34 pre-existing failures (non-callable comparefn, primitive-receiver coercion, old this-handling tests) are unchanged spec gaps, unrelated to ordering.
  • No interpreter/type-checker changes, so language conformance is unaffected.
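
Several of the behaviours compared above are standard ECMAScript semantics, so a short script like the one below should produce the same output under any conforming backend. It is a hypothetical illustration of the kind of check involved, not the PR's actual test file.

```typescript
// Stability: elements with equal keys keep their original relative order.
const rows = [
  { key: 1, tag: "a" },
  { key: 0, tag: "b" },
  { key: 1, tag: "c" },
  { key: 0, tag: "d" },
];
const stableOrder = rows
  .slice()
  .sort((x, y) => x.key - y.key)
  .map((r) => r.tag)
  .join(""); // "bdac"

// undefined always sorts to the end; the comparator never sees it.
const withUndefined = [3, undefined, 1, undefined, 2].sort(
  (x, y) => (x as number) - (y as number),
); // [1, 2, 3, undefined, undefined]

// No comparator: elements are compared as strings, so 10 sorts before 9.
const lexicographic = [10, 9, 1].sort(); // [1, 10, 9]

console.log(stableOrder, withUndefined, lexicographic);
```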

…able merge sort

Compiled `Array.prototype.sort`/`toSorted` emitted a hand-written in-place
insertion sort (Θ(n²)), making compiled sort slower than the tree-walking
interpreter and ~998× slower than Node at n=10000. The interpreter already uses
a stable Θ(n log n) sort (LINQ OrderBy); only the IL path was quadratic.

Replace Phase 2 of EmitSortBodyOnLocal with a pure-IL stable bottom-up merge
sort that ping-pongs between two object[] buffers, reusing the exact comparison
semantics (custom compareFn → double sign with NaN/non-number ⇒ equal/stable;
otherwise ECMA-262 ToJsString/CompareOrdinal). Phases 1 (partition undefined/
holes) and 3 (rebuild) are unchanged, so undefined-to-end, holes, and the
default comparator are preserved. Stays fully standalone (no SharpTS.dll
reference). Index/width arithmetic is overflow-safe (saturating) so the sort
is correct up to the array allocation limit.

Result (sort benchmark, compiled, n=10000): 3235 ms → 46.6 ms
(998× → ~14× vs Node); the O(n²) scaling is gone.

Verification:
- IL verifies (--verify); compiled == interpreter on stability, undefined,
  default lexicographic, toSorted, strings, empty/single, 5000-elem random.
- ArraySort unit tests 38/38.
- Test262 sort/toSorted (75 files, in-process compiled): per-file outcomes
  identical base-vs-fix — zero conformance change.

The residual ~14× vs Node is the secondary per-comparison de-virtualization
noted in #877 (boxed object[2] + InvokeValue per compare), tracked separately.

nickna merged commit 5310d2c into main on Jun 22, 2026
3 checks passed
Development

Successfully merging this pull request may close these issues.

Perf #856: compiled Array.sort(comparator) is O(n²) insertion sort — ~998× slower than Node, slower than our own interpreter
