Skip to content

perf(vm): defer GetIterator type stringification to error path 🦥#147

Merged
timfennis merged 1 commit into
masterfrom
bugfix/getiterator-eager-static-type
May 24, 2026
Merged

perf(vm): defer GetIterator type stringification to error path 🦥#147
timfennis merged 1 commit into
masterfrom
bugfix/getiterator-eager-static-type

Conversation

@timfennis
Copy link
Copy Markdown
Owner

Summary

OpCode::GetIterator was eagerly building the "X is not iterable" error message before checking whether the value was iterable. For nested containers (List<Tuple<Int, List<Tuple<Int,Int>>>>-shaped values), val.static_type() recursively walks every element of every container and allocates a fresh StaticType tree — which was then thrown away in the common case because the value was iterable.

Moving the static_type() call into the actual error branch turns a per-for-loop O(n) deep walk into a check that only runs when iteration genuinely fails.

How I found it

A user-reported script (AoB 2025 day 9 part 2) ran slower on the VM than on the pre-VM tree-walk interpreter, while every other script in that project was 3-5× faster on the VM. perf showed ~50% of CPU time in StaticType allocation / cloning / dropping / equality.

I instrumented Object::static_type for List/Tuple/Map/Deque with atomic counters and added a backtrace on calls against any List with >100 elements. A reduced repro of the script's hot loop logged 25.9 million Tuple static_type calls — all traced back to vm.rs:391, the format!("{}", val.static_type()) inside OpCode::GetIterator.

matches_param fallback (the other obvious suspect) fired 0 times. The eager error-message construction was the entire problem.

Benchmark suite

Hyperfine on benches/programs/*.ndc plus two scripts that exercise the regression (part2_aob is the originally-reported case; nested is a reduced repro):

Benchmark Before (ms) After (ms) Speedup
ackermann 132.1 ± 4.0 132.9 ± 4.2 0.99×
bigint 6.9 ± 1.9 7.5 ± 1.7 0.92×
closures 72.6 ± 3.4 73.8 ± 3.3 0.98×
enumerate_find 155.4 ± 3.5 156.7 ± 1.8 0.99×
enumerate_for_loop 108.0 ± 3.7 107.0 ± 4.5 1.01×
enumerate_take_small 30.2 ± 3.7 30.5 ± 3.2 0.99×
enumerate_to_list 65.9 ± 2.8 64.9 ± 2.8 1.01×
fibonacci 68.2 ± 2.8 68.7 ± 4.1 0.99×
fibonacci_typed 58.3 ± 3.9 59.5 ± 3.8 0.98×
hof_pipeline 34.2 ± 3.4 35.6 ± 2.7 0.96×
map_ops 25.9 ± 3.1 24.7 ± 2.8 1.05×
matrix_mul 57.0 ± 4.0 54.3 ± 3.7 1.05×
nested (repro) 1552.4 ± 9.6 122.5 ± 2.4 12.68×
part1_aob 83.3 ± 3.4 75.5 ± 3.3 1.10×
part2_aob 5638.2 ± 81.8 2042.2 ± 11.9 2.76×
perlin 64.9 ± 4.1 65.2 ± 4.6 1.00×
pi_approx 31.9 ± 3.9 31.5 ± 3.4 1.01×
print_heavy 7.2 ± 1.5 7.1 ± 1.1 1.02×
quicksort 72.7 ± 3.3 71.6 ± 3.9 1.02×
sieve 107.9 ± 2.9 107.7 ± 3.3 1.00×
string_concat 14.6 ± 1.2 14.8 ± 3.5 0.98×
vec_hot_loop 43.7 ± 3.8 41.6 ± 3.1 1.05×

Most of the curated suite shows no measurable change — expected, since the bug only fires when for ... in <expr> evaluates over deeply-nested containers. The two benches that hit the bug improve by ~3× and ~13×.

hyperfine --warmup 2 --min-runs 5 --max-runs 15 for the curated suite; longer-running scripts used fewer runs. Stats are mean ± stddev of wall time.

Test plan

  • cargo test — all 380+ tests across crates pass
  • cargo fmt --check — clean
  • cargo clippy — no new warnings (pre-existing ones remain)
  • Output of part2 still matches 1529011204
  • Benchmarked against master binary with hyperfine

🤖 Generated with Claude Code

`OpCode::GetIterator` was eagerly building the "X is not iterable"
error message before checking whether the value was iterable. For
deeply nested containers (e.g. `List<Tuple<Int, List<Tuple<Int,Int>>>>`),
`val.static_type()` recursively walks every element and allocates a
fresh `StaticType` tree — which was then thrown away in the common
case because the value *was* iterable.

Moving the call into the actual error branch turns a per-iter O(n)
deep walk into a check that only runs when iteration actually fails.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@timfennis timfennis merged commit 87e2e1b into master May 24, 2026
1 check passed
@timfennis timfennis deleted the bugfix/getiterator-eager-static-type branch May 24, 2026 10:53
timfennis added a commit that referenced this pull request May 24, 2026
…155)

## Summary

`dispatch_vec_call` and `dispatch_vec_call_dynamic` both eagerly built
an `Option<String>` for the callee's name on every call, then only read
it from rarely-taken error branches (the overload-not-found `Err` and
the `call_callback` `map_err` closure). The success path threw the
`String` away. Same shape of bug as the GetIterator fix in #147.

## Changes

- `dispatch_vec_call`: borrow `&str` directly from
`scalars.first().and_then(|f| f.name())`. The slice is a caller-owned
parameter, so the borrow lifetime is independent of `&mut self` and the
`map_err` closure can capture it freely.
- `dispatch_vec_call_dynamic`: resolve the first vec candidate once into
a held `Rc<Object>`, then borrow `&str` out of it. The `resolve_var`
call already happened inside the old `callee_name()`; the `.to_string()`
is what's gone.
- `Vm::callee_name()` itself is kept — it's still used from the regular
`Call` opcode's "no function found" error path, where the allocation is
fine because we're already on an error path.

## Caveat — perf impact is barely measurable

`vec_hot_loop` (200k–2M `(int,int) + (int,int)` calls):

| Iters | Baseline | This PR |
|---|---|---|
| 200k | 39.0 ± 3.4 ms | 38.6 ± 3.1 ms |
| 2M | 336.2 ± 3.8 ms | 334.5 ± 4.1 ms |

≈1.01× — within noise. `perf` confirms ~13% of total time goes to
malloc/free, but the eliminated allocation is one small `String`
(operator name like `"+"`) per outer vec call, dwarfed by
`Function::clone`, the per-call `Vec` allocations for
`arg_values`/`elem_args`/`results`, and the final
`Rc::new(Object::Tuple(...))`. Unlike the GetIterator case, there's no
deep recursive walk being saved here.

So this is more of a code-cleanliness/correctness fix (no wasted
allocation on the hot path; `&str` reads more naturally than
`Option<String>`) than a real perf win. Happy to drop it if you'd rather
not carry the churn.

🤖 PR description generated by Claude.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant