perf(vm): pass callback args by slice to skip per-call Vec allocs 🩻 by timfennis · Pull Request #154 · timfennis/andy-cpp

timfennis · 2026-05-24T13:01:31Z

Summary

dispatch_vec_call / dispatch_vec_call_dynamic previously allocated a fresh Vec<Value> per broadcast element via std::mem::replace(&mut elem_args, Vec::with_capacity(args)). They now reuse a single elem_args buffer via clear().
All 16 stdlib HOF call sites (map, filter, sort_by, reduce, max_by_key, …) previously did comp.call(vec![…]) — one heap alloc per element. They now build a stack array and pass &[…].
Three call sites (filter, find, by_key) where the slice was &[x.clone()] switched to std::slice::from_ref(&x) per Clippy's suggestion, eliminating a real Rc::clone + drop per element on object-heavy iterables.
Signatures updated: Vm::call_callback, Vm::call_function, VmCallable::call all take &[Value]. The native dispatch path was already using &args; the closure path now does self.stack.extend(args.iter().cloned()). For numeric tuples (the main vec-dispatch case), elements are Int/Float and clone is a bitwise copy — no extra Rc work.
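
The call-site changes listed above can be sketched in isolation. This is a minimal sketch, not the project's code: the `Value` enum, `add`, and `call_closure` are hypothetical stand-ins, since the real `Value`, `Vm::call_callback`, and `VmCallable::call` definitions are not shown in this PR. It illustrates a stack array borrowed as a slice, `std::slice::from_ref` avoiding a refcount bump, and a closure path that clones the borrowed arguments onto a stack.

```rust
use std::rc::Rc;

// Hypothetical stand-in for the VM's value type (assumption of this sketch).
#[derive(Clone, Debug, PartialEq)]
enum Value {
    Int(i64),
    Str(Rc<str>),
}

// Hypothetical native-path callee: reads its arguments straight from the slice.
fn add(args: &[Value]) -> Value {
    match args {
        [Value::Int(a), Value::Int(b)] => Value::Int(a + b),
        _ => panic!("this sketch only adds two ints"),
    }
}

// Hypothetical closure-path callee: copies the borrowed arguments onto a
// stack, mirroring `self.stack.extend(args.iter().cloned())`.
fn call_closure(stack: &mut Vec<Value>, args: &[Value]) -> usize {
    stack.extend(args.iter().cloned());
    stack.len()
}

fn main() {
    // A stack array borrowed as a slice: no `vec![…]`, so no heap allocation.
    assert_eq!(add(&[Value::Int(40), Value::Int(2)]), Value::Int(42));

    let text: Rc<str> = Rc::from("hello");
    let x = Value::Str(Rc::clone(&text));
    assert_eq!(Rc::strong_count(&text), 2);

    // `[x.clone()]` bumps the refcount for as long as the array lives...
    {
        let cloned = [x.clone()];
        assert_eq!(cloned.len(), 1);
        assert_eq!(Rc::strong_count(&text), 3);
    }
    // ...while `from_ref` views `x` as a one-element slice without cloning.
    let args: &[Value] = std::slice::from_ref(&x);
    assert_eq!(Rc::strong_count(&text), 2);

    // The closure path still clones into the stack: a bitwise copy for ints,
    // a refcount bump for Rc-backed values.
    let mut stack = Vec::new();
    assert_eq!(call_closure(&mut stack, args), 1);
    assert_eq!(Rc::strong_count(&text), 3);
}
```

The refcount assertions make the difference visible: the cloned array pays one `Rc` increment and one decrement per call, while the `from_ref` slice pays neither.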

Benchmarks

Bench	Baseline	Patched	Ratio
`vec_hot_loop` (200k Tuple+Tuple)	41.0 ± 3.4 ms	39.7 ± 3.1 ms	1.03× ±0.12
`hof_pipeline` (filter/map/reduce, 100k items)	34.7 ± 2.9 ms	35.2 ± 2.3 ms	no change

Neither benchmark moved meaningfully (the allocator caches these small Vecs well), but the dispatch loop and HOF call sites read cleaner, and the from_ref change has measurable upside for non-trivial element types. The advent-of-brian comparison run will be added as a comment.

Test plan

cargo fmt
cargo clippy --all-targets — no new warnings introduced
cargo test — all 718 tests pass

In `dispatch_vec_call` and `dispatch_vec_call_dynamic`, every broadcast element used `std::mem::replace(&mut elem_args, Vec::with_capacity(args))` to hand a fresh `Vec<Value>` to `call_callback`: N allocations per outer call. Likewise every stdlib HOF callsite did `comp.call(vec![…])`, one heap allocation per element of `map`/`filter`/`sort_by`/`reduce`/etc.

`call_callback` and `VmCallable::call` now take `&[Value]`. The native path was already passing `&args`; the closure path becomes `extend(args.iter().cloned())`. The vec dispatch loops reuse a single `elem_args` buffer via `clear()`. Stdlib HOFs build stack arrays. Three `vec![x.clone()]` sites that clippy flagged switch to `std::slice::from_ref(&x)`, eliminating a real Rc bump+drop per element on object-heavy iterables.

`vec_hot_loop` and `hof_pipeline` benches show no meaningful movement (the allocator caches these small Vecs well), but the dispatch loops and HOF callsites read cleaner and the slice-from-ref change has measurable upside for non-trivial element types.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
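
The buffer reuse described in the commit message can be sketched as follows. This is a simplified illustration under stated assumptions: plain `i64` stands in for the VM's `Value`, the function below only borrows the name `dispatch_vec_call` and does not reproduce its real signature, and `squared_diff` is a hypothetical callback. The point is that `clear()` drops the old elements but keeps the allocation, so the buffer is allocated once per outer call rather than once per element.

```rust
// Hypothetical callback taking its arguments by slice.
fn squared_diff(args: &[i64]) -> i64 {
    let d = args[0] - args[1];
    d * d
}

// Broadcast `callback` over two tuples, reusing one argument buffer.
// Simplified stand-in for the real dispatch loop (assumption of this sketch).
fn dispatch_vec_call(left: &[i64], right: &[i64], callback: fn(&[i64]) -> i64) -> Vec<i64> {
    let mut out = Vec::with_capacity(left.len());
    let mut elem_args: Vec<i64> = Vec::with_capacity(2); // allocated once
    let buffer_start = elem_args.as_ptr();
    for (l, r) in left.iter().zip(right) {
        elem_args.clear(); // length goes to 0, capacity is untouched
        elem_args.push(*l);
        elem_args.push(*r);
        // Same backing allocation on every iteration: no reallocation happened.
        debug_assert_eq!(elem_args.as_ptr(), buffer_start);
        out.push(callback(&elem_args));
    }
    out
}

fn main() {
    let left = [1, 2, 3];
    let right = [4, 6, 8];
    let result = dispatch_vec_call(&left, &right, squared_diff);
    assert_eq!(result, vec![9, 16, 25]);
}
```

The earlier `std::mem::replace` pattern instead moved the filled `Vec` out and left a newly allocated one behind on each iteration, which is where the per-element allocation came from.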

timfennis · 2026-05-24T13:05:29Z

advent-of-brian comparison (23 programs)

Aggregate, 5 full-suite runs

	run 1	run 2	run 3	run 4	run 5	mean
baseline	4355	4547	4367	4314	4294	4375 ms
patched	4339	4408	4330	4340	4291	4342 ms

0.77% delta on the aggregate — within noise. Most programs are short enough that process startup dominates.
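
The aggregate figure can be rechecked from the five runs in the table. The sketch below is only arithmetic on the published numbers; the helper names `mean`, `delta_pct`, and `spread` are illustrative and not part of the project.

```rust
// Arithmetic mean of a set of run times in milliseconds.
fn mean(runs: &[f64]) -> f64 {
    runs.iter().sum::<f64>() / runs.len() as f64
}

// Relative improvement of `patched` over `baseline`, in percent.
fn delta_pct(baseline: f64, patched: f64) -> f64 {
    (baseline - patched) / baseline * 100.0
}

// Difference between the slowest and fastest run, a crude noise estimate.
fn spread(runs: &[f64]) -> f64 {
    let max = runs.iter().cloned().fold(f64::MIN, f64::max);
    let min = runs.iter().cloned().fold(f64::MAX, f64::min);
    max - min
}

fn main() {
    let baseline = [4355.0, 4547.0, 4367.0, 4314.0, 4294.0];
    let patched = [4339.0, 4408.0, 4330.0, 4340.0, 4291.0];
    let (b, p) = (mean(&baseline), mean(&patched));
    println!("baseline mean {:.1} ms, patched mean {:.1} ms", b, p);
    println!("delta {:.2}%", delta_pct(b, p));
    println!("baseline spread {:.0} ms, patched spread {:.0} ms", spread(&baseline), spread(&patched));
}
```

The run-to-run spread of the baseline alone (253 ms between its fastest and slowest run) is several times the roughly 34 ms gap between the two means, which is consistent with reading the aggregate delta as noise.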

Per-program (hyperfine, 8 runs each, the 4 longest)

Program	Baseline	Patched	Ratio
`2025/08/part1.ndc`	530.8 ± 5.5 ms	495.0 ± 5.5 ms	1.07× faster
`2025/08/part2.ndc`	803.1 ± 14.5 ms	764.5 ± 7.4 ms	1.05× faster
`2025/04/part2.ndc`	362.1 ± 3.4 ms	364.3 ± 5.8 ms	flat (±1%)
`2025/09/part2.ndc`	2042 ± 11 ms	2055 ± 9 ms	flat (-1%)

Day-08 wins are real (5-7%, well outside σ) and reproducible. Both programs lean heavily on vec dispatch ((left - right) * (left - right) on int tuples) chained with HOFs (.map, .enumerate, .combinations, .sum, .product, .sort) — exactly the paths this PR targets.

Day-04 and day-09 are flat, which is what we'd expect for programs that don't lean on the affected paths.
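
The per-program ratios can be rechecked the same way. The sketch below recomputes the percentage change for each row and, as a rough yardstick only, expresses the gap between means in units of the larger reported standard deviation. That yardstick is an assumption of this sketch, not a formal significance test and not something hyperfine reports.

```rust
// Relative change in percent; positive means the patched build is faster.
fn percent_faster(baseline_ms: f64, patched_ms: f64) -> f64 {
    (baseline_ms - patched_ms) / baseline_ms * 100.0
}

// Gap between the two means in units of the larger reported σ.
// Inputs are (mean, σ) pairs in milliseconds.
fn gap_in_sigmas(baseline: (f64, f64), patched: (f64, f64)) -> f64 {
    let sigma = baseline.1.max(patched.1);
    (baseline.0 - patched.0).abs() / sigma
}

fn main() {
    // (program, (baseline mean, σ), (patched mean, σ)), copied from the table above.
    let rows = [
        ("2025/08/part1.ndc", (530.8, 5.5), (495.0, 5.5)),
        ("2025/08/part2.ndc", (803.1, 14.5), (764.5, 7.4)),
        ("2025/04/part2.ndc", (362.1, 3.4), (364.3, 5.8)),
        ("2025/09/part2.ndc", (2042.0, 11.0), (2055.0, 9.0)),
    ];
    for (name, b, p) in rows.iter() {
        println!(
            "{}: {:+.1}% faster, gap of {:.1} sigma",
            name,
            percent_faster(b.0, p.0),
            gap_in_sigmas(*b, *p)
        );
    }
}
```

For the two day-08 programs the gap is several σ wide, while for day-04 and day-09 the gap is on the order of a single σ, matching the "flat" reading.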

timfennis merged commit 7c2a345 into master May 24, 2026
1 check passed

timfennis deleted the perf/callback-slice-args branch May 24, 2026 13:04

timfennis merged 1 commit into master from perf/callback-slice-args
