feat(vm): add configurable :max_steps instruction budget#320
feat(vm): add configurable :max_steps instruction budget#320davydog187 wants to merge 6 commits into
Conversation
davydog187
left a comment
There was a problem hiding this comment.
Verdict: changes-requested — one real correctness bug (cross-eval budget leak) masked by a weak test, plus the mandated benchmark gate is unmet.
Automated review (skeptical senior pass). Findings tagged by severity with file:line and rationale. The hot-path and pcall-catchability verdicts are called out explicitly since they were the crux.
Hot-path / :infinity verdict: PASS (no per-opcode regression)
I traced every do_execute/dispatch clause. The running tally is threaded as a bare parameter (steps), never written into %State{} on the per-opcode path. All 16 steps = steps + 1 increments sit at the 4 interpreter back-edges + 2 interpreter call boundaries and the 4 dispatcher back-edges + 6 dispatcher call boundaries — none per opcode. State.check_steps!/2 resolves max_steps: :infinity in a single function-head match with no struct rebuild (state.ex:101). The new steps: field on %State{} (state.ex defstruct) is only assigned at engine-boundary struct rebuilds that already push a call frame (call_stack/call_depth), so it adds no per-opcode cost. This is structurally identical to the existing check_call_depth!/1 already on those boundaries. Good.
- [nit] The plan's "byte-for-byte unchanged" framing for the
:infinitypath is slightly overstated: each back-edge/boundary still pays one unconditionalsteps + 1and onecheck_steps!/2call even when unbounded. Negligible and correct, but not literally zero — worth honest wording.
pcall-catchability verdict: PASS
check_steps!/2 raises Lua.VM.RuntimeError, value: "instruction budget exceeded" (state.ex), the exact type/shape lua_pcall rescues (stdlib.ex:209, [RuntimeError, AssertionError, TypeError]). It is genuinely catchable by Lua pcall/xpcall, identical to the existing "stack overflow". The catchability test (max_steps_test.exs "pcall catches the budget error…") asserts both {false, msg} and that the VM stays usable afterward — a real, correct assertion.
Findings
-
[blocker] Budget leaks across top-level evaluations; the "no cross-eval leak" test does not actually prove freshness.
lib/lua/vm/executor.exexecute/5seedsstepsfromstate.steps(do_execute(..., 0, state.steps)), and the terminals stamp the final tally back viafinish_steps/2intostate.steps. Thatfinal_stateis returned byLua.VM.executeand stored back into the%Lua{}byLua.eval!(lib/lua.ex:496/538). Nothing resetsstate.stepsto 0 at the top-level eval boundary — the onlysteps-to-0 is thedefstructdefault. So the budget is cumulative over the%Lua{}lifetime, not fresh per evaluation. This contradicts issue #306's acceptance ("a secondLua.eval!/2… gets a fresh budget") and the PR's own doctest/criteria. The regression test (max_steps_test.exs"the budget is fresh per evaluation") usesmax_steps: 5000with two ~100-step evals; ~200 cumulative stays under 5000, so it passes whether or not the budget resets — it proves nothing about freshness. A correct test would pick a budget that one eval clears but two cumulatively exceed (e.g.max_steps≈ 150, run the ~100-step eval twice, assert the second still succeeds). Fix: resetstate.stepsto 0 at the top-level evaluation entry (not at nestedcall_function/Dispatcher.executehand-offs, which must keep seeding for cross-engine continuity), and harden the test. -
[major] Mandated benchmark gate is unmet. Issue #306 makes "Benchmarked: no meaningful regression on the default
:infinitypath" a hard acceptance criterion, and the plan leaves it unchecked. The PR substitutes a by-construction argument becauseMIX_ENV=benchmarkcouldn't compile in the sandbox (luaportneeds C Lua headers). The by-construction argument is sound and matches what I verified in the code, and CI is green — but green CI is not the benchmark, and the issue gated the change on recorded numbers. Runbenchmarks/fibonacci.exsandbenchmarks/dispatcher_vs_interpreter.exsonmainvs this branch (default:infinity) on a host/CI with a compatible C Lua and record the numbers in the PR body before merge. -
[minor] Commit-subject scope uses the plan id. Two commits use
chore(B17): start planandchore(B17): mark plan as review. PerCLAUDE.md/ ship-a-plan, commit subjects use the affected subsystem as scope, never the plan id (the id belongs in the commit body, which the feature commits correctly do viaPlan: B17). The feature commits (feat(vm): …) are correct; the twochore(B17)subjects are not.
Confirmed clean
- Parity: both engines enforce the budget — interpreter back-edges/boundaries and dispatcher back-edges/all 6 call sites. The "dispatcher driven directly" and "cross-engine mutual recursion" tests cover the compiled path and the engine hand-off. Good.
- Validation:
validate_max_steps!/1mirrorsvalidate_max_call_depth!/1(:infinityorpos_integer, elseArgumentErrornaming:max_steps);0/-1/:nopecovered by tests. - Empty-body
while true do endis bounded: the back-edge increment fires per iteration regardless of body content; tested. - Docs:
guides/examples/sandboxing.livemdgains a "Bounding CPU work" section (the tracked guide;guides/sandboxing.mddoesn't exist onmain— acceptable resolution). - No plan-id leakage into
lib/,test/, orguides/; no AI co-author trailer; CI green across both Elixir/OTP matrices, Dialyzer, and the Lua 5.3 suite.
The :max_steps tally is stamped back into state.steps at each terminal
and persisted into the returned %Lua{}, but nothing reset it at the
top-level evaluation boundary. A long-lived %Lua{} running many small
evals therefore accumulated steps across the whole lifetime and would
eventually raise "instruction budget exceeded" even though no single
eval came close — contradicting the per-eval contract in issue #306.
Reset state.steps to 0 in Lua.VM.execute/2, the single chokepoint that
Lua.eval!/eval route through. Nested calls within one evaluation still
thread the tally as a bare parameter and accumulate against the same
budget (a tight `while true do end` stays bounded); only the top-level
boundary resets. The :infinity hot path is unchanged.
Adds a regression test that sizes the budget just above one eval's real
cost and runs that same eval 100x on the threaded state — red before
this fix, green after — plus a guard asserting the budget still spans
nested calls within a single evaluation.
Addresses PR #320 review: cumulative-vs-per-eval budget leak.
Plan: B17
Fix: instruction budget now resets per top-level evaluationAddresses the review blocker that The fix (
|
| workload | baseline (no reset) | with reset |
|---|---|---|
fib(28) ×20 |
257.18 ms/eval | 254.94 ms/eval |
tiny for-loop eval ×100k |
8.777 µs/eval | 8.740 µs/eval |
Both within run-to-run noise — the single %{state \| steps: 0} map update per top-level eval is not on the per-instruction path and shows no measurable cost.
Pushed as d170301.
The :max_steps tally is stamped back into state.steps at each terminal
and persisted into the returned %Lua{}, but nothing reset it at the
top-level evaluation boundary. A long-lived %Lua{} running many small
evals therefore accumulated steps across the whole lifetime and would
eventually raise "instruction budget exceeded" even though no single
eval came close — contradicting the per-eval contract in issue #306.
Reset state.steps to 0 in Lua.VM.execute/2, the single chokepoint that
Lua.eval!/eval route through. Nested calls within one evaluation still
thread the tally as a bare parameter and accumulate against the same
budget (a tight `while true do end` stays bounded); only the top-level
boundary resets. The :infinity hot path is unchanged.
Adds a regression test that sizes the budget just above one eval's real
cost and runs that same eval 100x on the threaded state — red before
this fix, green after — plus a guard asserting the budget still spans
nested calls within a single evaluation.
Addresses PR #320 review: cumulative-vs-per-eval budget leak.
Plan: B17
d170301 to
12b158f
Compare
Adds a `:max_steps` option to `Lua.new/1` mirroring `:max_call_depth`:
default `:infinity` (no limit, existing behavior unchanged), a positive
integer caps the VM instructions a single evaluation may execute, and
exhaustion raises a catchable `"instruction budget exceeded"` runtime
error recoverable via `pcall`. This gives library consumers a
deterministic CPU bound without wrapping each call in a host Task and
wall-clock timeout.
The running tally is threaded as a parameter through the interpreter's
`do_execute` chain and the compiled dispatcher's `dispatch` chain — not
stored in `%State{}` — preserving the executor's `line`-off-State
discipline so the default `:infinity` path carries no per-instruction
cost. The counter is incremented only at loop back-edges and call
boundaries; `check_steps!/2` short-circuits on `:infinity` in a single
function-head match. Both execution paths enforce the budget.
Plan: B17
Closes #306
Make the :max_steps instruction budget durable across Executor<->Dispatcher
engine hand-offs so recursion that alternates execution engines is bounded
rather than resetting its budget at each boundary.
The running tally now rides through a `steps` field on %State{} at engine
boundaries only (where the struct is already rebuilt to push a call frame),
never per opcode: the crossing engine writes its threaded tally into
state.steps and the entered engine seeds from it, stamping the final tally
back at its terminal. This closes the gap between max_call_depth: :infinity
and a deterministic CPU bound for a compiled/interpreted mutually-recursive
pair with no loop on either side.
Adds regression coverage in test/lua/vm/max_steps_test.exs: a goto-bearing
interpreted closure and a plain compiled closure in unbounded mutual
recursion trip the budget, plus a guard asserting the pair is genuinely
split across both engines.
Plan: B17
The :max_steps tally is stamped back into state.steps at each terminal
and persisted into the returned %Lua{}, but nothing reset it at the
top-level evaluation boundary. A long-lived %Lua{} running many small
evals therefore accumulated steps across the whole lifetime and would
eventually raise "instruction budget exceeded" even though no single
eval came close — contradicting the per-eval contract in issue #306.
Reset state.steps to 0 in Lua.VM.execute/2, the single chokepoint that
Lua.eval!/eval route through. Nested calls within one evaluation still
thread the tally as a bare parameter and accumulate against the same
budget (a tight `while true do end` stays bounded); only the top-level
boundary resets. The :infinity hot path is unchanged.
Adds a regression test that sizes the budget just above one eval's real
cost and runs that same eval 100x on the threaded state — red before
this fix, green after — plus a guard asserting the budget still spans
nested calls within a single evaluation.
Addresses PR #320 review: cumulative-vs-per-eval budget leak.
Plan: B17
12b158f to
1ef7a38
Compare
Add a CHANGELOG Unreleased entry and a README "Resource limits" subsection covering :max_call_depth and :max_steps. The README block is inside the moduledoc delimiter, so its iex> example is doctested. Plan: B17
Benchmark results: default
|
| job | baseline | branch | Δ median |
|---|---|---|---|
| dispatcher | 76.90 ms | 78.19 ms | +1.7% |
| interpreter | 94.85 ms | 97.36 ms | +2.6% |
table_ops — quick mode, n=100, "chunk" path (pre-compiled, cleanest signal), n=2 each
| workload | baseline | branch | Δ |
|---|---|---|---|
Build (table.insert loop) |
18.00–18.08 µs | 19.29–19.54 µs | ~+7% |
| Sort | 30.13–30.63 µs | 31.33–31.71 µs | ~+3% |
Iterate/Sum (generic_for) |
25.00–25.21 µs | 26.92–27.13 µs | ~+8% |
| Map + Reduce | — | — | ~0% (noisy) |
The Build and Iterate bands are tight and non-overlapping across both samples, so the ~7-8% there is a real signal, not noise.
Interpretation
The "zero-cost by construction" framing holds for per-opcode cost but not for the default path overall:
- Lua-closure calls (fib): only +1.7% — here the
stepswrite is folded into the existingcall_stack/call_depthstruct rebuild, so the only added cost is the increment + the:infinityhead-match. - Builtin-call and
generic_foriterator boundaries: ~+7-8% — these got newstate = %{state | steps: steps}…steps = state.stepsround-trips that didn't exist onmain, and they fire on the:infinitydefault too. That's whytable.insert-per-iteration (Build) andpairs/ipairs-per-iteration (Iterate) regress most.
So: no per-instruction cost, but a measurable ~2-8% hit at call boundaries on the default path, concentrated where the PR added new struct round-trips rather than folding into an existing rebuild.
Suggested mitigation (author's call)
Guard the new round-trips on max_steps != :infinity at the builtin-call and generic_for sites only (the :lua_closure path is already cheap because it reuses the existing rebuild). That should bring the default path back to genuinely zero-cost. Happy to implement and re-benchmark if you'd like.
Methodology note: quick mode is the documented "did my change move the needle?" profile; dispatcher_vs_interpreter was run in full mode. luaport/luerl baseline rows omitted as irrelevant to the default-path regression question.
VM instruction budget: configurable
:max_stepswith catchable exhaustionPlan:
.agents/plans/B17-vm-max-steps.mdCloses #306
Goal
Add a
:max_stepsoption toLua.new/1that bounds the number of VMinstructions a single evaluation may execute, mirroring the existing
:max_call_depth::infinity— no limit, existing behavior unchanged, and the default path stays free of new per-instruction cost."instruction budget exceeded") sopcallcan recover, just like the"stack overflow"raised by:max_call_depth.The bound applies to both execution paths: the interpreter (
do_executeinlib/lua/vm/executor.ex) and the compiled dispatcher (dispatchinlib/lua/vm/dispatcher.ex).Success criteria
mix formatproduces no diff (mix format --check-formattedclean).mix compile --warnings-as-errorspasses.:max_stepsaccepted byLua.new/1, validated like:max_call_depth(positive integer or:infinity; elseArgumentErrornaming:max_steps). Verified byMaxStepsTestvalidation cases.:infinity, existing tests unchanged:mix test2114 → 2126 passed (only the 11 newmax_steps_test.exscases + 1 new doctest added), 19 skipped, 1 excluded.:max_stepsabortswhile true do endwith"instruction budget exceeded".pcall: test asserts{false, "instruction budget exceeded"}and the VM stays usable after.functionbody and by drivingDispatcher.execute/4directly with a compiled prototype.%State{};max_steps(the ceiling) lives in%State{}likemax_call_depth.mix test --only lua53shows no regression: 17 passed, 12 skipped, 2117 excluded (suite runs with the:infinitydefault, behavior unchanged).MIX_ENV=benchmarkpulls inluaport, whose native build requires C Lua / LuaJIT headers that are not installed here (fatal error: 'lua.h' file not found). See "Verification" note below for the by-construction zero-cost analysis.guides/examples/sandboxing.livemdgains a "Bounding CPU work" section covering:max_steps(the tracked guide wired into ExDoc;guides/sandboxing.mddoes not exist onmain).B17.Changes
The large executor/dispatcher line counts are dominated by threading the
stepsparameter through everydo_execute/dispatchclause head and tail call (and the reflow of multi-line heads); the actual logic added is one increment + onecheck_steps!/2at each loop back-edge and call boundary.Design notes
check_steps!/2(inLua.VM.State) raises the sameLua.VM.RuntimeErrorused by"stack overflow", sopcall/xpcalltrap it for free. Its:infinityclause resolves in a single function-head match with no struct rebuild.do_execute/do_frame_returnchain, so it spans frames within one evaluation (non-tail recursion stacks frames in the samedo_executechain — that is what bounds the recursion test). The dispatcher threads it throughdispatch/finish_body/return_one/return_multi. The cross-module:compiled_closure/Dispatcher.executeandcall_valuehand-offs seed the callee with a fresh budget rather than changing the{results, state}return shape, which would otherwise ripple into out-of-scope stdlib modules; each compiled callee is bounded by the dispatcher's own counting.Verification
Benchmark gate: the
benchmarkMIX env could not compile in this sandbox (luaportnative dependency needs C Lua headers). By construction, the default:infinitypath adds, per loop back-edge / call boundary only, one integer increment plus oneState.check_steps!(state, steps)that short-circuits in a single function-head match onmax_steps: :infinityreturning:ok— no struct rebuild, no per-opcode cost. This is structurally identical to the existingcheck_call_depth!/1already sitting on those same call boundaries, so no meaningful regression is expected on the default path.Out of scope (intentional)
:max_alloc_bytes(deferred per the issue).max_heap_size.