
feat(vectorization): broaden tuple operators and recover precise types 📐#146

Merged: timfennis merged 3 commits into master from feature/vectorization-redesign on May 24, 2026

@timfennis
Owner

Summary

An alternate take on #141 with the same four user-visible features but
cleaner separation of concerns and matching-or-better performance on
every benchmark.

Vectorization now works for anything that has a scalar overload.
Unary, n-ary, non-numeric, scalar broadcast — same as #141:

-(1, 2, 3)               → (-1, -2, -3)
("a", "b") ++ ("c", "d") → ("ac", "bd")
([1], [2]) ++ ([3], [4]) → ([1, 3], [2, 4])
(1, 2) + 5               → (6, 7)

Vectorization only fires on operator syntax. id((1, 2, 3)) still
returns the tuple, never three calls to id.

Precise types survive chained operations. Operator calls that the
analyser can pin to one scalar overload keep their precise return
type — chains of Tuple<Int, Int> ops no longer widen to Any.

More problems caught at compile time. Per-position vec resolution now
reports an error for (1, "a") + (2, "b") and (1,1,(1,)) + (1,1,(1,)) at
analysis time instead of crashing mid-iteration.
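
To make the per-position check concrete, here is a minimal sketch of how an analyser could resolve `+` position by position and return a precise tuple type or an error. Everything in it is an assumption for illustration: `StaticType` is reduced to three variants, `scalar_plus` is a hypothetical one-entry overload table, and nested tuples are assumed not to be broadcast recursively.

```rust
/// Simplified static types; a stand-in for the analyser's real type lattice.
#[derive(Debug, Clone, PartialEq)]
pub enum StaticType {
    Int,
    Str,
    Tuple(Vec<StaticType>),
}

/// Hypothetical scalar overload table for `+`: only Int + Int is defined.
fn scalar_plus(lhs: &StaticType, rhs: &StaticType) -> Option<StaticType> {
    match (lhs, rhs) {
        (StaticType::Int, StaticType::Int) => Some(StaticType::Int),
        _ => None,
    }
}

/// Resolve `lhs + rhs`: try a scalar overload first, then a per-position
/// broadcast over two tuples of equal length. Returns the precise result
/// type, or an error naming the first position with no matching overload.
pub fn resolve_plus(lhs: &StaticType, rhs: &StaticType) -> Result<StaticType, String> {
    if let Some(ty) = scalar_plus(lhs, rhs) {
        return Ok(ty);
    }
    match (lhs, rhs) {
        (StaticType::Tuple(a), StaticType::Tuple(b)) if a.len() == b.len() => {
            let mut out = Vec::with_capacity(a.len());
            for (i, (x, y)) in a.iter().zip(b).enumerate() {
                let ty = scalar_plus(x, y)
                    .ok_or_else(|| format!("no overload of + at position {i}: {x:?} + {y:?}"))?;
                out.push(ty);
            }
            Ok(StaticType::Tuple(out))
        }
        _ => Err(format!("no overload of + for {lhs:?} + {rhs:?}")),
    }
}
```

Under these assumptions, a pair of `Tuple<Int, Int>` operands resolves to `Tuple<Int, Int>` rather than widening, while the `(Int, Str)` and nested-tuple cases are rejected before any code runs.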

Design

Three structural moves vs the existing PR:

  • New AST variant Expression::OperatorCall for desugared operator
    syntax. Distinct from Call, so downstream layers match exhaustively
    and the parser is the only crate that knows which token names are
    operators. No operator_form: bool riding Expression::Call across
    every layer.
  • Candidate::{Scalar, Vec} as a sum type rather than a struct
    with a bool. Binding::Resolved(Candidate::Vec(scalar)) reads cleanly
    in pattern matches without if c.vectorized.
  • ScopeTree::resolve_call is a single walk that returns both the
    binding and the inferred return type. The analyser no longer runs
    per-position resolution twice per Dynamic operator-form call.
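
The first two moves can be pictured with a small sketch. The shapes below are hypothetical and heavily simplified (field names, the `Binding::Dynamic` payload, and the helper functions are assumptions, not the definitions in this PR); they only illustrate why a distinct variant and a sum type read well in `match`.

```rust
/// Stand-in for the analyser's resolved variable slot (assumed shape).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ResolvedVar(pub usize);

/// A candidate is either a plain scalar overload or a scalar overload that
/// is to be broadcast element-wise. No `vectorized: bool` field is needed.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Candidate {
    Scalar(ResolvedVar),
    Vec(ResolvedVar),
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Binding {
    Resolved(Candidate),
    Dynamic(Vec<Candidate>),
}

/// Operator syntax gets its own variant, separate from ordinary calls.
#[derive(Debug, Clone, PartialEq)]
pub enum Expression {
    Int(i64),
    Tuple(Vec<Expression>),
    Call { function: String, arguments: Vec<Expression> },
    OperatorCall { operator: String, arguments: Vec<Expression> },
}

/// Only operator-form calls are eligible for element-wise broadcast.
pub fn may_vectorize(expr: &Expression) -> bool {
    matches!(expr, Expression::OperatorCall { .. })
}

/// Exhaustive match over bindings; adding a variant forces an update here.
pub fn describe(binding: &Binding) -> &'static str {
    match binding {
        Binding::Resolved(Candidate::Scalar(_)) => "pinned scalar call",
        Binding::Resolved(Candidate::Vec(_)) => "pinned vectorized call",
        Binding::Dynamic(_) => "resolved at runtime",
    }
}
```

With this layout, `id((1, 2, 3))` would parse to `Expression::Call` and never reach the vec path, whereas `-(1, 2, 3)` would parse to `Expression::OperatorCall`.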

Runtime dispatch

  • OpCode::CallVec(args) for analyser-pinned vec calls. The
    compiler emits the scalar function directly (no OverloadSet wrapper)
    and the VM broadcasts it across the tuple axis without overload
    probing. This is the missing "step 6" optimisation from #141's RFC,
    brought forward into the same change.
  • Object::OverloadSet { scalars, vec_candidates } keeps the hot
    scalar walk at master's footprint — a unified Vec<Candidate> was
    the source of the numerics-heavy regressions.
  • Vm::resolve_callee still returns Option<Function> (same shape
    as master), so the dispatch loop's OpCode::Call arm stays compact.
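
As a rough illustration of what a `CallVec`-style handler does once the scalar is already known, the sketch below broadcasts one binary scalar function across the tuple axis with no overload probing. The `Value` type, the `call_vec` and `add_int` names, the length-mismatch error, and the scalar-on-the-left case are all assumptions made for the example, not the VM's actual code.

```rust
/// Minimal runtime value model for the sketch.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Int(i64),
    Tuple(Vec<Value>),
}

/// A scalar function that the analyser has already pinned.
pub type ScalarFn = fn(&Value, &Value) -> Result<Value, String>;

/// Broadcast `scalar` across the tuple axis. A non-tuple operand is reused
/// for every position (scalar broadcast).
pub fn call_vec(scalar: ScalarFn, lhs: &Value, rhs: &Value) -> Result<Value, String> {
    match (lhs, rhs) {
        (Value::Tuple(a), Value::Tuple(b)) => {
            if a.len() != b.len() {
                return Err(format!("tuple length mismatch: {} vs {}", a.len(), b.len()));
            }
            a.iter()
                .zip(b)
                .map(|(x, y)| scalar(x, y))
                .collect::<Result<Vec<_>, _>>()
                .map(Value::Tuple)
        }
        (Value::Tuple(a), s) => a
            .iter()
            .map(|x| scalar(x, s))
            .collect::<Result<Vec<_>, _>>()
            .map(Value::Tuple),
        (s, Value::Tuple(b)) => b
            .iter()
            .map(|y| scalar(s, y))
            .collect::<Result<Vec<_>, _>>()
            .map(Value::Tuple),
        (x, y) => scalar(x, y),
    }
}

/// Example scalar: integer addition.
pub fn add_int(a: &Value, b: &Value) -> Result<Value, String> {
    match (a, b) {
        (Value::Int(x), Value::Int(y)) => Ok(Value::Int(x + y)),
        _ => Err("add_int expects two Int values".to_string()),
    }
}
```

Calling `call_vec(add_int, &(1, 2), &5)` in this model yields the tuple `(6, 7)`, mirroring the `(1, 2) + 5` example from the summary.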

Full design write-up at docs/design/vectorization.md.

Behaviour changes

  • Mixed-element tuples error at compile time instead of crashing
    mid-iteration. Existing test 003_vector_error2.ndc updated to match
    the new analyser-side error message.
  • BinaryOperator::supports_vectorization and the
    StaticType::supports_vectorization{,_with} helpers deleted — vec
    decisions live entirely in the analyser now.

Benchmarks

Hyperfine, release-with-debug, 20+ runs per command:

Script               Master     This branch   Δ
vec_hot_loop (new)   58.1 ms    42.3 ms       −27%
fibonacci            69.1 ms    63.7 ms       −8%
hof_pipeline         35.7 ms    34.6 ms       −3%
enumerate_for_loop   107.4 ms   107.4 ms      0%
sieve                107.7 ms   107.3 ms      0%
matrix_mul           57.3 ms    56.1 ms       −2%
ackermann            127.6 ms   124.6 ms      −2%
quicksort            73.6 ms    76.5 ms       +4%

The vec win comes from CallVec skipping the per-element overload
probe. No bench regresses outside noise — versus #141's reported +27%
on AoC 2025/08 vec-heavy workloads.
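
For readers checking the Δ column, the figures are consistent with a relative change against master, rounded to the nearest whole percent. The helper below is a hypothetical sketch of that arithmetic, not part of the benchmark tooling.

```rust
/// Relative change of `branch_ms` versus `master_ms`, rounded to whole percent.
/// Negative values mean the branch is faster.
pub fn delta_percent(master_ms: f64, branch_ms: f64) -> i64 {
    ((branch_ms - master_ms) / master_ms * 100.0).round() as i64
}
```

For example, `vec_hot_loop` gives (42.3 − 58.1) / 58.1 ≈ −27.2%, reported as −27%.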

Test plan

  • cargo test --workspace — 298 functional + 18 compiler + 64
    unit tests, all green
  • cargo clippy --workspace --lib --tests — zero new warnings
    from this change
  • cargo fmt --check — clean
  • REPL spot-checks of the four feature areas (unary vec, string
    ++, vec op=, mixed-element error)

🤖 Generated with Claude Code

Extends element-wise tuple broadcast beyond binary numeric operators —
unary forms (`-(1,2,3)`), n-ary scalar overloads (`("a","b") ++ ("c","d")`),
per-position heterogeneous dispatch, and scalar broadcast — and restores
the type-inference precision PR #140 widened to `Any` for soundness.

Vec dispatch is gated on a new `Expression::OperatorCall` AST variant
emitted by the parser for operator desugars, so regular calls never
accidentally broadcast over tuple arguments. The analyser resolves
operator calls through a single `ScopeTree::resolve_call` walk that
returns both the binding and the inferred return type, with per-position
candidate lookups catching mixed-element tuples (`(1, "a") + (2, "b")`)
at compile time instead of mid-iteration at runtime.

When the analyser pins a homogeneous vec call to one scalar overload,
the compiler emits a dedicated `OpCode::CallVec(args)` whose handler
broadcasts a directly-loaded scalar across the tuple axis without any
overload probing. `Object::OverloadSet` now stores scalars and vec
candidates in separate `Vec<ResolvedVar>`s so the hot scalar walk keeps
master's footprint.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 82909b8c12

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread ndc_analyser/src/scope.rs Outdated
…th 🔁

`extend_dedup` was comparing `Candidate`s by inner `ResolvedVar`, so vec
candidates were stripped from the Dynamic binding's candidate list when
their scalar twin (same slot) had already been added. The compiler then
emitted an `OverloadSet` with no `vec_candidates`, and any call where
both args were statically `Any` but turned out to be tuples at runtime
(e.g. `a - b` where `a, b` were produced by `combinations(2)`-style
destructuring) fell through to the "no function found" error.
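
The dedup bug described above can be reproduced in miniature. This is a sketch under assumed shapes: `ResolvedVar`, `Candidate`, and both helper functions are simplified stand-ins, and the "fixed" version shows one plausible repair (comparing the whole candidate, variant included), which may differ from the actual patch.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ResolvedVar(pub usize);

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Candidate {
    Scalar(ResolvedVar),
    Vec(ResolvedVar),
}

impl Candidate {
    /// The slot this candidate points at, regardless of variant.
    fn inner(self) -> ResolvedVar {
        match self {
            Candidate::Scalar(v) | Candidate::Vec(v) => v,
        }
    }
}

/// Buggy variant: compares by inner slot only, so a `Vec` candidate is
/// dropped whenever its `Scalar` twin with the same slot is already present.
pub fn extend_dedup_by_slot(dst: &mut Vec<Candidate>, src: &[Candidate]) {
    for &c in src {
        if !dst.iter().any(|d| d.inner() == c.inner()) {
            dst.push(c);
        }
    }
}

/// Repaired variant: compares the whole candidate, so `Scalar(slot)` and
/// `Vec(slot)` are treated as distinct entries.
pub fn extend_dedup(dst: &mut Vec<Candidate>, src: &[Candidate]) {
    for &c in src {
        if !dst.contains(&c) {
            dst.push(c);
        }
    }
}
```

Feeding `[Scalar(slot), Vec(slot)]` through the slot-only version leaves just the scalar entry, which corresponds to the empty `vec_candidates` the compiler ended up emitting.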

Also speed up runtime vec dispatch for the heterogeneous-element case:

* `Vm::dispatch_vec_call_dynamic` resolves vec candidates lazily from
  `&[ResolvedVar]` instead of materialising a `Vec<Function>` up front
  on every outer call — matches master's `try_vectorized_call` pattern.
* Both vec dispatchers now cache the last-matched scalar across
  positions, so homogeneous tuples (the common shape, including the
  AoC 2025/08 hot loop) pay one candidate probe per outer call.
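
The caching idea in the second bullet can be sketched as follows. The `Scalar` struct, its `accepts`/`apply` function pointers, and the probe counter are hypothetical devices for the example; the real dispatcher works on VM functions and `ResolvedVar` slots. The sketch also assumes both tuples have the same length.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Int(i64),
    Str(String),
}

/// A scalar overload: a type guard plus the function itself (assumed shape).
pub struct Scalar {
    pub accepts: fn(&Value, &Value) -> bool,
    pub apply: fn(&Value, &Value) -> Value,
}

fn both_int(a: &Value, b: &Value) -> bool {
    matches!((a, b), (Value::Int(_), Value::Int(_)))
}

fn add_int(a: &Value, b: &Value) -> Value {
    match (a, b) {
        (Value::Int(x), Value::Int(y)) => Value::Int(x + y),
        _ => unreachable!("guarded by both_int"),
    }
}

fn both_str(a: &Value, b: &Value) -> bool {
    matches!((a, b), (Value::Str(_), Value::Str(_)))
}

fn concat(a: &Value, b: &Value) -> Value {
    match (a, b) {
        (Value::Str(x), Value::Str(y)) => Value::Str(format!("{x}{y}")),
        _ => unreachable!("guarded by both_str"),
    }
}

/// Two example overloads to dispatch over.
pub fn example_candidates() -> Vec<Scalar> {
    vec![
        Scalar { accepts: both_int, apply: add_int },
        Scalar { accepts: both_str, apply: concat },
    ]
}

/// Apply the matching scalar at each position, reusing the last match when it
/// still fits. Returns the results and the number of full candidate probes,
/// or `None` if some position has no matching candidate.
pub fn dispatch_vec_cached(
    candidates: &[Scalar],
    lhs: &[Value],
    rhs: &[Value],
) -> Option<(Vec<Value>, usize)> {
    let mut cached: Option<usize> = None;
    let mut probes = 0;
    let mut out = Vec::with_capacity(lhs.len());
    for (x, y) in lhs.iter().zip(rhs) {
        let hit = match cached {
            Some(i) if (candidates[i].accepts)(x, y) => i,
            _ => {
                probes += 1;
                let i = candidates.iter().position(|c| (c.accepts)(x, y))?;
                cached = Some(i);
                i
            }
        };
        out.push((candidates[hit].apply)(x, y));
    }
    Some((out, probes))
}
```

In this model a homogeneous tuple pays a single full probe for the whole call, while a tuple that alternates element types falls back to probing at each change.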

Brings the AoC 2025/08 part1 regression from +22% to +8% vs master
while keeping every other bench at parity or better.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread benches/programs/vec_hot_loop.ndc Outdated
Comment thread ndc_analyser/src/analyser.rs Outdated
Comment thread ndc_analyser/src/analyser.rs Outdated
Addresses review feedback on PR #146 from @timfennis:

* scrub all mentions of an external AoC repo (`benches/programs/vec_hot_loop.ndc`,
  three comments in `ndc_vm/src/vm.rs`) — those references don't belong here
* simplify the OpAssignment "both `op=` and `op`" comment in
  `ndc_analyser/src/analyser.rs` — drop the jargon, keep the rationale
* tighten the `analyse_call` doc comment — say what it does, skip the
  side-table mechanics that the caller already documents

No behaviour change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@timfennis
Owner Author

@codex please check the PR again

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. Nice work!

@timfennis timfennis merged commit ae1e616 into master May 24, 2026
1 check passed
@timfennis timfennis deleted the feature/vectorization-redesign branch May 24, 2026 10:19