feat(vectorization): broaden tuple operators and recover precise types 📐 by timfennis · Pull Request #146 · timfennis/andy-cpp

timfennis · 2026-05-24T09:00:21Z

Summary

An alternate take on #141 with the same four user-visible features but
cleaner separation of concerns and matching-or-better performance on
every benchmark.

Vectorization now works for anything that has a scalar overload.
Unary, n-ary, non-numeric, scalar broadcast — same as #141:

-(1, 2, 3)               → (-1, -2, -3)
("a", "b") ++ ("c", "d") → ("ac", "bd")
([1], [2]) ++ ([3], [4]) → ([1, 3], [2, 4])
(1, 2) + 5               → (6, 7)

Vectorization only fires on operator syntax. id((1, 2, 3)) still
returns the tuple, never three calls to id.
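
As a rough illustration of the broadcast rules listed above, here is a minimal Rust sketch. The `Value` enum, the `add` and `concat` overloads, and `apply_operator` are all hypothetical names invented for this example; they are not the types or functions used in the PR. The sketch only covers the tuple-on-the-left scalar broadcast shown in the examples.

```rust
// Hypothetical value model; the real VM types are not shown in this PR text.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Int(i64),
    Str(String),
    Tuple(Vec<Value>),
}

/// Scalar overload standing in for `+`: only Int + Int is defined here.
fn add(a: &Value, b: &Value) -> Option<Value> {
    match (a, b) {
        (Value::Int(x), Value::Int(y)) => Some(Value::Int(x + y)),
        _ => None,
    }
}

/// Scalar overload standing in for `++`: string concatenation only.
fn concat(a: &Value, b: &Value) -> Option<Value> {
    match (a, b) {
        (Value::Str(x), Value::Str(y)) => Some(Value::Str(format!("{x}{y}"))),
        _ => None,
    }
}

/// Applies a scalar overload, broadcasting one level over tuples.
/// Returns None when no scalar overload matches some element pair.
fn apply_operator(
    op: fn(&Value, &Value) -> Option<Value>,
    lhs: &Value,
    rhs: &Value,
) -> Option<Value> {
    match (lhs, rhs) {
        // Tuple op tuple: pair elements up position by position.
        (Value::Tuple(xs), Value::Tuple(ys)) => {
            if xs.len() != ys.len() {
                return None; // assumption: length mismatch is an error
            }
            xs.iter()
                .zip(ys)
                .map(|(x, y)| op(x, y))
                .collect::<Option<Vec<_>>>()
                .map(Value::Tuple)
        }
        // Tuple op scalar: reuse the scalar for every element.
        (Value::Tuple(xs), scalar) => xs
            .iter()
            .map(|x| op(x, scalar))
            .collect::<Option<Vec<_>>>()
            .map(Value::Tuple),
        // Plain scalar call.
        (a, b) => op(a, b),
    }
}
```

Because the broadcast goes only one level deep, a nested tuple element reaches the scalar overload unchanged and fails to match, which mirrors the `(1,1,(1,)) + (1,1,(1,))` error case described in the summary.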

Precise types survive chained operations. Operator calls that the
analyser can pin to one scalar overload keep their precise return
type — chains of Tuple<Int, Int> ops no longer widen to Any.

More problems caught at compile time. Per-position vec resolution
errors on (1, "a") + (2, "b") and (1,1,(1,)) + (1,1,(1,)) at
analysis time instead of crashing mid-iteration.
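
The per-position check can be pictured with a small Rust sketch over static types. Everything here (`Ty`, `plus_overload`, `resolve_vec_plus`, the error wording, and the treatment of length mismatches) is assumed for illustration and is not taken from the analyser's code.

```rust
// Hypothetical static types; the analyser's real StaticType is richer.
#[derive(Debug, Clone, PartialEq)]
enum Ty {
    Int,
    Str,
    Tuple(Vec<Ty>),
}

/// The only scalar overload of `+` in this sketch: Int + Int -> Int.
fn plus_overload(a: &Ty, b: &Ty) -> Option<Ty> {
    match (a, b) {
        (Ty::Int, Ty::Int) => Some(Ty::Int),
        _ => None,
    }
}

/// Resolves `lhs + rhs` for two tuple types one position at a time.
/// On success the precise tuple type is returned instead of widening to Any.
fn resolve_vec_plus(lhs: &[Ty], rhs: &[Ty]) -> Result<Ty, String> {
    if lhs.len() != rhs.len() {
        // Assumption: the sketch rejects tuples of different lengths.
        return Err(format!("tuple lengths differ: {} vs {}", lhs.len(), rhs.len()));
    }
    let mut out = Vec::with_capacity(lhs.len());
    for (i, (a, b)) in lhs.iter().zip(rhs).enumerate() {
        match plus_overload(a, b) {
            Some(t) => out.push(t),
            None => {
                return Err(format!(
                    "no scalar overload for `+` at position {i}: {a:?} + {b:?}"
                ))
            }
        }
    }
    Ok(Ty::Tuple(out))
}
```

With these definitions, `(Int, Str) + (Int, Str)` fails at position 1 and `(Int, Int, (Int,)) + (Int, Int, (Int,))` fails at position 2, both before any code runs.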

Design

Three structural moves vs the existing PR:

* New AST variant Expression::OperatorCall for desugared operator
  syntax. Distinct from Call, so downstream layers match exhaustively
  and the parser is the only crate that knows which token names are
  operators. No operator_form: bool riding Expression::Call across
  every layer.
* Candidate::{Scalar, Vec} as a sum type rather than a struct
  with a bool. Binding::Resolved(Candidate::Vec(scalar)) reads cleanly
  in pattern matches without if c.vectorized.
* ScopeTree::resolve_call is a single walk that returns both the
  binding and the inferred return type. The analyser no longer runs
  per-position resolution twice per Dynamic operator-form call.
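
To make the sum-type argument concrete, here is a small Rust sketch. The variant names `Candidate::Scalar`, `Candidate::Vec`, and `Binding::Resolved` come from the description above; the `ResolvedVar` field, the `Dynamic` and `Unresolved` variant shapes, and the `describe` function are assumptions made only for this example.

```rust
// Hypothetical stand-in; the real ResolvedVar layout is not shown in the PR text.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct ResolvedVar {
    slot: usize,
}

// Sum type instead of `struct Candidate { var: ResolvedVar, vectorized: bool }`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Candidate {
    Scalar(ResolvedVar),
    Vec(ResolvedVar),
}

// Assumed shape of the analyser's binding result.
#[derive(Debug, Clone, PartialEq, Eq)]
enum Binding {
    Resolved(Candidate),
    Dynamic(Vec<Candidate>),
    Unresolved,
}

/// Each case is a distinct pattern, so the compiler checks exhaustiveness
/// and no arm needs an `if c.vectorized` guard.
fn describe(binding: &Binding) -> String {
    match binding {
        Binding::Resolved(Candidate::Scalar(v)) => {
            format!("direct call to slot {}", v.slot)
        }
        Binding::Resolved(Candidate::Vec(v)) => {
            format!("broadcast slot {} across the tuple axis", v.slot)
        }
        Binding::Dynamic(candidates) => {
            format!("runtime dispatch over {} candidates", candidates.len())
        }
        Binding::Unresolved => "no matching function".to_string(),
    }
}
```

If a new variant were added to `Candidate`, every `match` like the one in `describe` would stop compiling until it handled the new case, which is the exhaustiveness benefit the design notes point to.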

Runtime dispatch

* OpCode::CallVec(args) for analyser-pinned vec calls. The
  compiler emits the scalar function directly (no OverloadSet wrapper)
  and the VM broadcasts it across the tuple axis without overload
  probing. This is the missing "step 6" optimisation from #141's RFC,
  brought forward into the same change.
* Object::OverloadSet { scalars, vec_candidates } keeps the hot
  scalar walk at master's footprint; a unified Vec<Candidate> was
  the source of the numerics-heavy regressions.
* Vm::resolve_callee still returns Option<Function> (same shape
  as master), so the dispatch loop's OpCode::Call arm stays compact.
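
The difference between the two runtime paths can be sketched in Rust as follows. This is an illustrative model only: `Value`, `Scalar`, `op_int`, `op_str`, and `call_vec` are hypothetical, and the `OverloadSet` here borrows just the two field names from the description, not the real object layout.

```rust
// Hypothetical value model for the sketch.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Int(i64),
    Str(String),
    Tuple(Vec<Value>),
}

/// A scalar overload: returns None when the argument types do not match.
type Scalar = fn(&Value, &Value) -> Option<Value>;

fn op_int(a: &Value, b: &Value) -> Option<Value> {
    match (a, b) {
        (Value::Int(x), Value::Int(y)) => Some(Value::Int(x + y)),
        _ => None,
    }
}

fn op_str(a: &Value, b: &Value) -> Option<Value> {
    match (a, b) {
        (Value::Str(x), Value::Str(y)) => Some(Value::Str(format!("{x}{y}"))),
        _ => None,
    }
}

/// Scalars and vec candidates live in separate lists, so a plain scalar
/// call never walks over vec-only entries.
struct OverloadSet {
    scalars: Vec<Scalar>,
    vec_candidates: Vec<Scalar>,
}

impl OverloadSet {
    /// Dynamic path: probe scalars first, then fall back to an element-wise
    /// broadcast that probes the vec candidates for every position.
    fn call(&self, a: &Value, b: &Value) -> Option<Value> {
        if let Some(v) = self.scalars.iter().find_map(|f| f(a, b)) {
            return Some(v);
        }
        match (a, b) {
            (Value::Tuple(xs), Value::Tuple(ys)) if xs.len() == ys.len() => xs
                .iter()
                .zip(ys)
                .map(|(x, y)| self.vec_candidates.iter().find_map(|f| f(x, y)))
                .collect::<Option<Vec<_>>>()
                .map(Value::Tuple),
            _ => None,
        }
    }
}

/// Pinned path, in the spirit of CallVec: the overload is already known,
/// so each element pair is a direct call with no probing.
fn call_vec(pinned: Scalar, xs: &[Value], ys: &[Value]) -> Option<Value> {
    if xs.len() != ys.len() {
        return None;
    }
    xs.iter()
        .zip(ys)
        .map(|(x, y)| pinned(x, y))
        .collect::<Option<Vec<_>>>()
        .map(Value::Tuple)
}
```

In this model the dynamic path does up to `vec_candidates.len()` probes per element, while `call_vec` does exactly one call per element, which is the kind of saving the CallVec description attributes to skipping overload probing.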

Full design write-up at docs/design/vectorization.md.

Behaviour changes

* Mixed-element tuples error at compile time instead of crashing
  mid-iteration. Existing test 003_vector_error2.ndc updated to match
  the new analyser-side error message.
* BinaryOperator::supports_vectorization and the
  StaticType::supports_vectorization{,_with} helpers deleted; vec
  decisions live entirely in the analyser now.

Benchmarks

Hyperfine, release-with-debug, 20+ runs per command:

| Script | Master | This branch | Δ |
|---|---|---|---|
| `vec_hot_loop` (new) | 58.1 ms | 42.3 ms | −27% |
| `fibonacci` | 69.1 ms | 63.7 ms | −8% |
| `hof_pipeline` | 35.7 ms | 34.6 ms | −3% |
| `enumerate_for_loop` | 107.4 ms | 107.4 ms | 0% |
| `sieve` | 107.7 ms | 107.3 ms | 0% |
| `matrix_mul` | 57.3 ms | 56.1 ms | −2% |
| `ackermann` | 127.6 ms | 124.6 ms | −2% |
| `quicksort` | 73.6 ms | 76.5 ms | +4% |

The vec win comes from CallVec skipping the per-element overload
probe. No bench regresses outside noise — versus #141's reported +27%
on AoC 2025/08 vec-heavy workloads.
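
For readers checking the Δ column, the figures are consistent with a plain relative difference against master, rounded to the nearest percent. The helper below is a hypothetical one-liner written for this note, not part of the benchmark tooling.

```rust
/// Relative change of `branch_ms` versus `master_ms`, in percent.
/// Negative values mean the branch is faster than master.
fn delta_percent(master_ms: f64, branch_ms: f64) -> f64 {
    (branch_ms - master_ms) / master_ms * 100.0
}
```

For example, `delta_percent(58.1, 42.3)` is about -27.2 for `vec_hot_loop`, and `delta_percent(73.6, 76.5)` is about 3.9 for `quicksort`.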

Test plan

* cargo test --workspace: 298 functional + 18 compiler + 64
  unit tests, all green
* cargo clippy --workspace --lib --tests: zero new warnings
  from this change
* cargo fmt --check: clean
* REPL spot-checks of the four feature areas (unary vec, string
  ++, vec op=, mixed-element error)

🤖 Generated with Claude Code

Extends element-wise tuple broadcast beyond binary numeric operators to unary forms (`-(1,2,3)`), n-ary scalar overloads (`("a","b") ++ ("c","d")`), per-position heterogeneous dispatch, and scalar broadcast, and restores the type-inference precision PR #140 widened to `Any` for soundness.

Vec dispatch is gated on a new `Expression::OperatorCall` AST variant emitted by the parser for operator desugars, so regular calls never accidentally broadcast over tuple arguments. The analyser resolves operator calls through a single `ScopeTree::resolve_call` walk that returns both the binding and the inferred return type, with per-position candidate lookups catching mixed-element tuples (`(1, "a") + (2, "b")`) at compile time instead of mid-iteration at runtime.

When the analyser pins a homogeneous vec call to one scalar overload, the compiler emits a dedicated `OpCode::CallVec(args)` whose handler broadcasts a directly-loaded scalar across the tuple axis without any overload probing. `Object::OverloadSet` now stores scalars and vec candidates in separate `Vec<ResolvedVar>`s so the hot scalar walk keeps master's footprint.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 82909b8c12

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

…th 🔁

`extend_dedup` was comparing `Candidate`s by inner `ResolvedVar`, so vec candidates were stripped from the Dynamic binding's candidate list when their scalar twin (same slot) had already been added. The compiler then emitted an `OverloadSet` with no `vec_candidates`, and any call where both args were statically `Any` but turned out to be tuples at runtime (e.g. `a - b` where `a, b` were produced by `combinations(2)`-style destructuring) fell through to the "no function found" error.

Also speed up runtime vec dispatch for the heterogeneous-element case:

* `Vm::dispatch_vec_call_dynamic` resolves vec candidates lazily from `&[ResolvedVar]` instead of materialising a `Vec<Function>` up front on every outer call, matching master's `try_vectorized_call` pattern.
* Both vec dispatchers now cache the last-matched scalar across positions, so homogeneous tuples (the common shape, including the AoC 2025/08 hot loop) pay one candidate probe per outer call.

Brings the AoC 2025/08 part1 regression from +22% to +8% vs master while keeping every other bench at parity or better.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
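
The deduplication bug described in this commit message can be reproduced in miniature. The Rust sketch below is a reconstruction under assumptions: the `ResolvedVar` field, the `inner` helper, and both function bodies are hypothetical and only mimic the behaviour the message describes.

```rust
// Hypothetical stand-in for the analyser's resolved variable.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct ResolvedVar {
    slot: usize,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Candidate {
    Scalar(ResolvedVar),
    Vec(ResolvedVar),
}

impl Candidate {
    /// The variable underneath, ignoring the Scalar/Vec distinction.
    fn inner(&self) -> ResolvedVar {
        match self {
            Candidate::Scalar(v) | Candidate::Vec(v) => *v,
        }
    }
}

/// Buggy variant: treats Scalar(slot) and Vec(slot) as duplicates,
/// so the vec candidate is dropped when its scalar twin is present.
fn extend_dedup_by_inner(dst: &mut Vec<Candidate>, src: &[Candidate]) {
    for c in src {
        if !dst.iter().any(|d| d.inner() == c.inner()) {
            dst.push(*c);
        }
    }
}

/// Fixed variant: compares whole candidates, so the variant counts too.
fn extend_dedup(dst: &mut Vec<Candidate>, src: &[Candidate]) {
    for c in src {
        if !dst.contains(c) {
            dst.push(*c);
        }
    }
}
```

Starting from a list that already holds `Candidate::Scalar` for some slot, the buggy version refuses to add `Candidate::Vec` for the same slot, leaving no vec candidates for the runtime to fall back on; the fixed version keeps both.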

@timfennis

Addresses review feedback on PR #146 from @timfennis:

* scrub all mentions of an external AoC repo (`benches/programs/vec_hot_loop.ndc`, three comments in `ndc_vm/src/vm.rs`); those references don't belong here
* simplify the OpAssignment "both `op=` and `op`" comment in `ndc_analyser/src/analyser.rs`: drop the jargon, keep the rationale
* tighten the `analyse_call` doc comment: say what it does, skip the side-table mechanics that the caller already documents

No behaviour change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

timfennis · 2026-05-24T10:04:13Z

@codex please check the PR again

chatgpt-codex-connector · 2026-05-24T10:12:26Z

Codex Review: Didn't find any major issues. Nice work!

chatgpt-codex-connector Bot reviewed May 24, 2026

Comment thread ndc_analyser/src/scope.rs Outdated

timfennis commented May 24, 2026

Comment thread benches/programs/vec_hot_loop.ndc Outdated

Comment thread ndc_analyser/src/analyser.rs Outdated

Comment thread ndc_analyser/src/analyser.rs Outdated

timfennis merged commit ae1e616 into master May 24, 2026
1 check passed

timfennis deleted the feature/vectorization-redesign branch May 24, 2026 10:19

timfennis mentioned this pull request May 24, 2026

Vec dispatch should pick a scalar per element pair, not one for all pairs #145

Closed

timfennis merged 3 commits into master from feature/vectorization-redesign
