perf(vm): defer GetIterator type stringification to error path 🦥#147
Merged
`OpCode::GetIterator` was eagerly building the "X is not iterable" error message before checking whether the value was iterable. For deeply nested containers (e.g. `List<Tuple<Int, List<Tuple<Int,Int>>>>`), `val.static_type()` recursively walks every element and allocates a fresh `StaticType` tree, which was then thrown away in the common case because the value *was* iterable. Moving the call into the actual error branch turns a per-iter O(n) deep walk into a check that only runs when iteration actually fails.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
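The difference between the eager and the deferred error path can be sketched roughly as follows. This is a simplified illustration, not the VM's actual code: `Value`, `StaticType`, `get_iterator_eager`, `get_iterator_lazy`, and the `STATIC_TYPE_CALLS` counter are hypothetical stand-ins, and the empty-list fallback to `Int` is an assumption made only to keep the sketch short.

```rust
use std::fmt;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Counts `static_type` calls so the cost of each variant is observable.
/// (Hypothetical instrumentation, not part of the real VM.)
pub static STATIC_TYPE_CALLS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical, simplified type tree; the real `StaticType` is richer.
#[derive(Debug, Clone, PartialEq)]
pub enum StaticType {
    Int,
    List(Box<StaticType>),
    Tuple(Vec<StaticType>),
}

impl fmt::Display for StaticType {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            StaticType::Int => write!(f, "Int"),
            StaticType::List(elem) => write!(f, "List<{elem}>"),
            StaticType::Tuple(elems) => {
                write!(f, "Tuple<")?;
                for (i, e) in elems.iter().enumerate() {
                    if i > 0 {
                        write!(f, ", ")?;
                    }
                    write!(f, "{e}")?;
                }
                write!(f, ">")
            }
        }
    }
}

/// Hypothetical runtime value.
pub enum Value {
    Int(i64),
    List(Vec<Value>),
    Tuple(Vec<Value>),
}

impl Value {
    /// Recursively visits every element and allocates a fresh type tree.
    pub fn static_type(&self) -> StaticType {
        STATIC_TYPE_CALLS.fetch_add(1, Ordering::Relaxed);
        match self {
            Value::Int(_) => StaticType::Int,
            Value::List(items) => {
                // Every element is visited; the first element's type is used
                // for the list (assumption: empty lists fall back to Int).
                let elem_types: Vec<StaticType> =
                    items.iter().map(Value::static_type).collect();
                let elem = elem_types.into_iter().next().unwrap_or(StaticType::Int);
                StaticType::List(Box::new(elem))
            }
            Value::Tuple(items) => {
                StaticType::Tuple(items.iter().map(Value::static_type).collect())
            }
        }
    }

    fn is_iterable(&self) -> bool {
        matches!(self, Value::List(_) | Value::Tuple(_))
    }
}

/// Before: the message is built up front, even when it is never used.
pub fn get_iterator_eager(val: &Value) -> Result<(), String> {
    let msg = format!("{} is not iterable", val.static_type());
    if val.is_iterable() { Ok(()) } else { Err(msg) }
}

/// After: `static_type()` runs only on the error branch.
pub fn get_iterator_lazy(val: &Value) -> Result<(), String> {
    if val.is_iterable() {
        Ok(())
    } else {
        Err(format!("{} is not iterable", val.static_type()))
    }
}
```

In this sketch, calling `get_iterator_lazy` on a list of three `(Int, Int)` tuples leaves the counter untouched, while `get_iterator_eager` on the same value performs ten `static_type` calls (one for the list, three for the tuples, six for the integers) and discards the result.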
timfennis added a commit that referenced this pull request May 24, 2026
…155)

## Summary

`dispatch_vec_call` and `dispatch_vec_call_dynamic` both eagerly built an `Option<String>` for the callee's name on every call, then only read it from rarely-taken error branches (the overload-not-found `Err` and the `call_callback` `map_err` closure). The success path threw the `String` away. Same shape of bug as the GetIterator fix in #147.

## Changes

- `dispatch_vec_call`: borrow `&str` directly from `scalars.first().and_then(|f| f.name())`. The slice is a caller-owned parameter, so the borrow lifetime is independent of `&mut self` and the `map_err` closure can capture it freely.
- `dispatch_vec_call_dynamic`: resolve the first vec candidate once into a held `Rc<Object>`, then borrow `&str` out of it. The `resolve_var` call already happened inside the old `callee_name()`; the `.to_string()` is what's gone.
- `Vm::callee_name()` itself is kept. It's still used from the regular `Call` opcode's "no function found" error path, where the allocation is fine because we're already on an error path.

## Caveat: perf impact is barely measurable

`vec_hot_loop` (200k–2M `(int,int) + (int,int)` calls):

| Iters | Baseline | This PR |
|---|---|---|
| 200k | 39.0 ± 3.4 ms | 38.6 ± 3.1 ms |
| 2M | 336.2 ± 3.8 ms | 334.5 ± 4.1 ms |

≈1.01×, within noise. `perf` confirms ~13% of total time goes to malloc/free, but the eliminated allocation is one small `String` (operator name like `"+"`) per outer vec call, dwarfed by `Function::clone`, the per-call `Vec` allocations for `arg_values`/`elem_args`/`results`, and the final `Rc::new(Object::Tuple(...))`. Unlike the GetIterator case, there's no deep recursive walk being saved here.

So this is more of a code-cleanliness/correctness fix (no wasted allocation on the hot path; `&str` reads more naturally than `Option<String>`) than a real perf win. Happy to drop it if you'd rather not carry the churn.

🤖 PR description generated by Claude.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
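The borrow-instead-of-allocate change for `dispatch_vec_call` might look roughly like the sketch below. It is an illustration under assumptions, not the project's code: `Function`, `Vm`, `call_callback`, `dispatch_eager`, and `dispatch_borrowed` are hypothetical simplifications, and arguments are plain `i64` values.

```rust
/// Hypothetical callable with an optional name.
pub struct Function {
    name: Option<String>,
}

impl Function {
    pub fn new(name: Option<&str>) -> Self {
        Function { name: name.map(str::to_string) }
    }

    /// Borrows the name without allocating.
    pub fn name(&self) -> Option<&str> {
        self.name.as_deref()
    }
}

/// Hypothetical VM; `calls` only exists so the methods need `&mut self`.
#[derive(Default)]
pub struct Vm {
    pub calls: usize,
}

impl Vm {
    /// Stand-in for the real callback: adds two integers.
    fn call_callback(&mut self, _f: &Function, args: &[i64]) -> Result<i64, String> {
        self.calls += 1;
        if args.len() == 2 {
            Ok(args[0] + args[1])
        } else {
            Err("expected 2 arguments".to_string())
        }
    }

    /// Before: allocates a `String` for the name on every call.
    pub fn dispatch_eager(&mut self, scalars: &[Function], args: &[i64]) -> Result<i64, String> {
        let name: Option<String> = scalars.first().and_then(|f| f.name()).map(str::to_string);
        let f = scalars.first().ok_or_else(|| "no overload found".to_string())?;
        self.call_callback(f, args)
            .map_err(|e| format!("{}: {e}", name.as_deref().unwrap_or("<anonymous>")))
    }

    /// After: borrows `&str` from the caller-owned slice. The borrow is tied
    /// to `scalars`, not to `self`, so the `&mut self` call is still allowed
    /// and the `map_err` closure can capture `name` freely.
    pub fn dispatch_borrowed(&mut self, scalars: &[Function], args: &[i64]) -> Result<i64, String> {
        let name: Option<&str> = scalars.first().and_then(|f| f.name());
        let f = scalars.first().ok_or_else(|| "no overload found".to_string())?;
        self.call_callback(f, args)
            .map_err(|e| format!("{}: {e}", name.unwrap_or("<anonymous>")))
    }
}
```

Both variants return the same results; the borrowed one simply does no allocation for the name on the success path, and formats the message only inside the error closure.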
Summary
`OpCode::GetIterator` was eagerly building the "X is not iterable" error message before checking whether the value was iterable. For nested containers (`List<Tuple<Int, List<Tuple<Int,Int>>>>`-shaped values), `val.static_type()` recursively walks every element of every container and allocates a fresh `StaticType` tree, which was then thrown away in the common case because the value was iterable.

Moving the `static_type()` call into the actual error branch turns a per-for-loop O(n) deep walk into a check that only runs when iteration genuinely fails.

How I found it
A user-reported script (AoB 2025 day 9 part 2) ran slower on the VM than on the pre-VM tree-walk interpreter, while every other script in that project was 3-5× faster on the VM.
`perf` showed ~50% of CPU time in `StaticType` allocation / cloning / dropping / equality.

I instrumented `Object::static_type` for `List`/`Tuple`/`Map`/`Deque` with atomic counters and added a backtrace on calls against any `List` with >100 elements. A reduced repro of the script's hot loop logged 25.9 million Tuple `static_type` calls, all traced back to `vm.rs:391`, the `format!("{}", val.static_type())` inside `OpCode::GetIterator`. The `matches_param` fallback (the other obvious suspect) fired 0 times. The eager error-message construction was the entire problem.

Benchmark suite
Hyperfine on `benches/programs/*.ndc` plus two scripts that exercise the regression (`part2_aob` is the originally-reported case; `nested` is a reduced repro):

`ackermann`, `bigint`, `closures`, `enumerate_find`, `enumerate_for_loop`, `enumerate_take_small`, `enumerate_to_list`, `fibonacci`, `fibonacci_typed`, `hof_pipeline`, `map_ops`, `matrix_mul`, `nested` (repro), `part1_aob`, `part2_aob`, `perlin`, `pi_approx`, `print_heavy`, `quicksort`, `sieve`, `string_concat`, `vec_hot_loop`

Most of the curated suite shows no measurable change, which is expected, since the bug only fires when `for ... in <expr>` evaluates over deeply-nested containers. The two benches that hit the bug improve by ~3× and ~13×.

`hyperfine --warmup 2 --min-runs 5 --max-runs 15` for the curated suite; longer-running scripts used fewer runs. Stats are mean ± stddev of wall time.

Test plan
- `cargo test`: all 380+ tests across crates pass
- `cargo fmt --check`: clean
- `cargo clippy`: no new warnings (pre-existing ones remain)
- 1529011204 `master` binary with `hyperfine`

🤖 Generated with Claude Code