feat(vectorization): broaden tuple operators and recover precise types 📐#146
Merged
Conversation
Extends element-wise tuple broadcast beyond binary numeric operators —
unary forms (`-(1,2,3)`), n-ary scalar overloads (`("a","b") ++ ("c","d")`),
per-position heterogeneous dispatch, and scalar broadcast — and restores
the type-inference precision PR #140 widened to `Any` for soundness.
Vec dispatch is gated on a new `Expression::OperatorCall` AST variant
emitted by the parser for operator desugars, so regular calls never
accidentally broadcast over tuple arguments. The analyser resolves
operator calls through a single `ScopeTree::resolve_call` walk that
returns both the binding and the inferred return type, with per-position
candidate lookups catching mixed-element tuples (`(1, "a") + (2, "b")`)
at compile time instead of mid-iteration at runtime.
When the analyser pins a homogeneous vec call to one scalar overload,
the compiler emits a dedicated `OpCode::CallVec(args)` whose handler
broadcasts a directly-loaded scalar across the tuple axis without any
overload probing. `Object::OverloadSet` now stores scalars and vec
candidates in separate `Vec<ResolvedVar>`s so the hot scalar walk keeps
master's footprint.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 82909b8c12
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
…th 🔁 `extend_dedup` was comparing `Candidate`s by inner `ResolvedVar`, so vec candidates were stripped from the Dynamic binding's candidate list when their scalar twin (same slot) had already been added. The compiler then emitted an `OverloadSet` with no `vec_candidates`, and any call where both args were statically `Any` but turned out to be tuples at runtime (e.g. `a - b` where `a, b` were produced by `combinations(2)`-style destructuring) fell through to the "no function found" error. Also speed up runtime vec dispatch for the heterogeneous-element case: * `Vm::dispatch_vec_call_dynamic` resolves vec candidates lazily from `&[ResolvedVar]` instead of materialising a `Vec<Function>` up front on every outer call — matches master's `try_vectorized_call` pattern. * Both vec dispatchers now cache the last-matched scalar across positions, so homogeneous tuples (the common shape, including the AoC 2025/08 hot loop) pay one candidate probe per outer call. Brings the AoC 2025/08 part1 regression from +22% to +8% vs master while keeping every other bench at parity or better. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
timfennis
commented
May 24, 2026
Addresses review feedback on PR #146 from @timfennis: * scrub all mentions of an external AoC repo (`benches/programs/vec_hot_loop.ndc`, three comments in `ndc_vm/src/vm.rs`) — those references don't belong here * simplify the OpAssignment "both `op=` and `op`" comment in `ndc_analyser/src/analyser.rs` — drop the jargon, keep the rationale * tighten the `analyse_call` doc comment — say what it does, skip the side-table mechanics that the caller already documents No behaviour change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Owner
Author
|
@codex please check the PR again |
|
Codex Review: Didn't find any major issues. Nice work! ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
An alternate take on #141 with the same four user-visible features but
cleaner separation of concerns and matching-or-better performance on
every benchmark.
Vectorization now works for anything that has a scalar overload.
Unary, n-ary, non-numeric, scalar broadcast — same as #141:
Vectorization only fires on operator syntax.
id((1, 2, 3))stillreturns the tuple, never three calls to
id.Precise types survive chained operations. Operator calls that the
analyser can pin to one scalar overload keep their precise return
type — chains of
Tuple<Int, Int>ops no longer widen toAny.More problems caught at compile time. Per-position vec resolution
errors on
(1, "a") + (2, "b")and(1,1,(1,)) + (1,1,(1,))atanalysis time instead of crashing mid-iteration.
Design
Three structural moves vs the existing PR:
Expression::OperatorCallfor desugared operatorsyntax. Distinct from
Call, so downstream layers match exhaustivelyand the parser is the only crate that knows which token names are
operators. No
operator_form: boolridingExpression::Callacrossevery layer.
Candidate::{Scalar, Vec}as a sum type rather than a structwith a bool.
Binding::Resolved(Candidate::Vec(scalar))reads cleanlyin pattern matches without
if c.vectorized.ScopeTree::resolve_callis a single walk that returns both thebinding and the inferred return type. The analyser no longer runs
per-position resolution twice per Dynamic operator-form call.
Runtime dispatch
OpCode::CallVec(args)for analyser-pinned vec calls. Thecompiler emits the scalar function directly (no
OverloadSetwrapper)and the VM broadcasts it across the tuple axis without overload
probing. This is the missing "step 6" optimisation from feat(vectorization): broaden tuple operators and recover precise types 📐 #141's RFC,
brought forward into the same change.
Object::OverloadSet { scalars, vec_candidates }keeps the hotscalar walk at master's footprint — a unified
Vec<Candidate>wasthe source of the numerics-heavy regressions.
Vm::resolve_calleestill returnsOption<Function>(same shapeas master), so the dispatch loop's
OpCode::Callarm stays compact.Full design write-up at
docs/design/vectorization.md.Behaviour changes
mid-iteration. Existing test
003_vector_error2.ndcupdated to matchthe new analyser-side error message.
BinaryOperator::supports_vectorizationand theStaticType::supports_vectorization{,_with}helpers deleted — vecdecisions live entirely in the analyser now.
Benchmarks
Hyperfine, release-with-debug, 20+ runs per command:
vec_hot_loop(new)fibonaccihof_pipelineenumerate_for_loopsievematrix_mulackermannquicksortThe vec win comes from
CallVecskipping the per-element overloadprobe. No bench regresses outside noise — versus #141's reported +27%
on AoC 2025/08 vec-heavy workloads.
Test plan
cargo test --workspace— 298 functional + 18 compiler + 64unit tests, all green
cargo clippy --workspace --lib --tests— zero new warningsfrom this change
cargo fmt --check— clean++, vec op=, mixed-element error)🤖 Generated with Claude Code