feat(vm): add configurable :max_steps instruction budget by davydog187 · Pull Request #320 · tv-labs/lua

davydog187 · 2026-06-01T17:18:39Z

VM instruction budget: configurable `:max_steps` with catchable exhaustion

Plan: .agents/plans/B17-vm-max-steps.md
Closes #306

Goal

Add a :max_steps option to Lua.new/1 that bounds the number of VM
instructions a single evaluation may execute, mirroring the existing
:max_call_depth:

Default :infinity — no limit, existing behavior unchanged, and the default path stays free of new per-instruction cost.
A positive integer caps total instructions executed. On exhaustion the VM raises a catchable Lua runtime error ("instruction budget exceeded") so pcall can recover, just like the "stack overflow" raised by :max_call_depth.

The bound applies to both execution paths: the interpreter (do_execute in lib/lua/vm/executor.ex) and the compiled dispatcher (dispatch in lib/lua/vm/dispatcher.ex).

Success criteria

Changes

 guides/examples/sandboxing.livemd |  40 +++
 lib/lua.ex                        |  34 ++-
 lib/lua/vm/dispatcher.ex          | 336 ++++++++++++++----------
 lib/lua/vm/executor.ex            | 536 +++++++++++++++++++++++++-------------
 lib/lua/vm/state.ex               |  35 ++-
 test/lua/vm/max_steps_test.exs    | new

The large executor/dispatcher line counts are dominated by threading the steps parameter through every do_execute/dispatch clause head and tail call (and the reflow of multi-line heads); the actual logic added is one increment + one check_steps!/2 at each loop back-edge and call boundary.

Design notes

check_steps!/2 (in Lua.VM.State) raises the same Lua.VM.RuntimeError used by "stack overflow", so pcall/xpcall trap it for free. Its :infinity clause resolves in a single function-head match with no struct rebuild.
The tally is threaded through the interpreter's do_execute/do_frame_return chain, so it spans frames within one evaluation (non-tail recursion stacks frames in the same do_execute chain — that is what bounds the recursion test). The dispatcher threads it through dispatch/finish_body/return_one/return_multi. The cross-module :compiled_closure/Dispatcher.execute and call_value hand-offs seed the callee with a fresh budget rather than changing the {results, state} return shape, which would otherwise ripple into out-of-scope stdlib modules; each compiled callee is bounded by the dispatcher's own counting.
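The shape described above can be sketched in isolation. This is a minimal model, not the PR's code: `BudgetSketch`, `BudgetError`, `new/1`, and `loop/3` are hypothetical names, and the counting loop merely stands in for a Lua loop back-edge. It assumes only what the notes state: the `:infinity` case resolves in a function head, and the tally travels as a bare argument.

```elixir
defmodule BudgetSketch do
  # Hypothetical stand-in for the VM state; only the field this sketch needs.
  defstruct max_steps: :infinity

  defmodule BudgetError do
    defexception message: "instruction budget exceeded"
  end

  def new(opts \\ []), do: struct!(__MODULE__, opts)

  # Unbounded case: resolved by the function head alone, nothing is rebuilt.
  def check_steps!(%__MODULE__{max_steps: :infinity}, _steps), do: :ok

  # Bounded case: raise once the threaded tally passes the cap.
  def check_steps!(%__MODULE__{max_steps: max}, steps) when steps > max do
    raise BudgetError
  end

  def check_steps!(%__MODULE__{}, _steps), do: :ok

  # A counting loop standing in for a Lua loop. The tally is a bare argument,
  # incremented once per back-edge, and never written into the struct.
  def loop(state, remaining, steps \\ 0)
  def loop(_state, 0, steps), do: {:ok, steps}

  def loop(state, remaining, steps) do
    steps = steps + 1
    check_steps!(state, steps)
    loop(state, remaining - 1, steps)
  end
end
```

With the default `:infinity` cap, `BudgetSketch.loop(BudgetSketch.new(), 1_000)` runs to completion; with `max_steps: 10`, an eleventh iteration raises `BudgetSketch.BudgetError`.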

Verification

mix format --check-formatted   # clean
mix compile --warnings-as-errors  # clean
mix test                       # 2126 passed, 19 skipped, 1 excluded
mix test --only lua53          # 17 passed, 12 skipped, 2117 excluded
mix test test/lua/vm/max_steps_test.exs        # 11 passed
mix test test/lua/vm/recursion_depth_test.exs  # 7 passed

Benchmark gate: the benchmark Mix environment (MIX_ENV=benchmark) could not compile in this sandbox because the luaport native dependency needs C Lua headers. By construction, the default :infinity path adds, per loop back-edge / call boundary only, one integer increment plus one State.check_steps!(state, steps) call that short-circuits in a single function-head match on max_steps: :infinity and returns :ok, with no struct rebuild and no per-opcode cost. This is structurally identical to the existing check_call_depth!/1 already sitting on those same call boundaries, so no meaningful regression is expected on the default path.

Out of scope (intentional)

:max_alloc_bytes (deferred per the issue).
Per-instruction counting on every opcode.
Tail-call optimization / frame-push changes.
Wall-clock timeouts / max_heap_size.
Mid-run budget introspection or reset.

davydog187

Verdict: changes-requested — one real correctness bug (cross-eval budget leak) masked by a weak test, plus the mandated benchmark gate is unmet.

Automated review (skeptical senior pass). Findings tagged by severity with file:line and rationale. The hot-path and pcall-catchability verdicts are called out explicitly since they were the crux.

Hot-path / `:infinity` verdict: PASS (no per-opcode regression)

I traced every do_execute/dispatch clause. The running tally is threaded as a bare parameter (steps), never written into %State{} on the per-opcode path. All 16 steps = steps + 1 increments sit at the 4 interpreter back-edges + 2 interpreter call boundaries and the 4 dispatcher back-edges + 6 dispatcher call boundaries — none per opcode. State.check_steps!/2 resolves max_steps: :infinity in a single function-head match with no struct rebuild (state.ex:101). The new steps: field on %State{} (state.ex defstruct) is only assigned at engine-boundary struct rebuilds that already push a call frame (call_stack/call_depth), so it adds no per-opcode cost. This is structurally identical to the existing check_call_depth!/1 already on those boundaries. Good.

[nit] The plan's "byte-for-byte unchanged" framing for the :infinity path is slightly overstated: each back-edge/boundary still pays one unconditional steps + 1 and one check_steps!/2 call even when unbounded. Negligible and correct, but not literally zero — worth honest wording.

pcall-catchability verdict: PASS

check_steps!/2 raises Lua.VM.RuntimeError, value: "instruction budget exceeded" (state.ex), the exact type/shape lua_pcall rescues (stdlib.ex:209, [RuntimeError, AssertionError, TypeError]). It is genuinely catchable by Lua pcall/xpcall, identical to the existing "stack overflow". The catchability test (max_steps_test.exs "pcall catches the budget error…") asserts both {false, msg} and that the VM stays usable afterward — a real, correct assertion.
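Why the error type matters for catchability can be illustrated with a toy `pcall`. This is a sketch under assumptions: `PcallSketch` and `LuaRuntimeError` are hypothetical names, and the real rescue list in stdlib.ex covers more types than the single one modeled here.

```elixir
defmodule PcallSketch do
  # Hypothetical error type carrying a Lua-visible value, as the review describes.
  defmodule LuaRuntimeError do
    defexception [:value]

    def message(%{value: value}), do: to_string(value)
  end

  # Follows Lua's pcall contract: {true, result} on success,
  # {false, value} when a listed error type is raised.
  def pcall(fun) when is_function(fun, 0) do
    {true, fun.()}
  rescue
    e in [LuaRuntimeError] -> {false, e.value}
  end
end
```

An error raised with a type outside the rescue list (for example `ArgumentError`) propagates past `pcall/1`, which is why reusing the already-rescued error type makes the budget error trappable without touching the rescue site.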

Findings

[blocker] Budget leaks across top-level evaluations; the "no cross-eval leak" test does not actually prove freshness. lib/lua/vm/executor.ex execute/5 seeds steps from state.steps (do_execute(..., 0, state.steps)), and the terminals stamp the final tally back via finish_steps/2 into state.steps. That final_state is returned by Lua.VM.execute and stored back into the %Lua{} by Lua.eval! (lib/lua.ex:496/538). Nothing resets state.steps to 0 at the top-level eval boundary — the only steps-to-0 is the defstruct default. So the budget is cumulative over the %Lua{} lifetime, not fresh per evaluation. This contradicts issue #306's acceptance ("a second Lua.eval!/2 … gets a fresh budget") and the PR's own doctest/criteria. The regression test (max_steps_test.exs "the budget is fresh per evaluation") uses max_steps: 5000 with two ~100-step evals; ~200 cumulative stays under 5000, so it passes whether or not the budget resets — it proves nothing about freshness. A correct test would pick a budget that one eval clears but two cumulatively exceed (e.g. max_steps ≈ 150, run the ~100-step eval twice, assert the second still succeeds). Fix: reset state.steps to 0 at the top-level evaluation entry (not at nested call_function/Dispatcher.execute hand-offs, which must keep seeding for cross-engine continuity), and harden the test.
[major] Mandated benchmark gate is unmet. Issue #306 makes "Benchmarked: no meaningful regression on the default :infinity path" a hard acceptance criterion, and the plan leaves it unchecked. The PR substitutes a by-construction argument because MIX_ENV=benchmark couldn't compile in the sandbox (luaport needs C Lua headers). The by-construction argument is sound and matches what I verified in the code, and CI is green — but green CI is not the benchmark, and the issue gated the change on recorded numbers. Run benchmarks/fibonacci.exs and benchmarks/dispatcher_vs_interpreter.exs on main vs this branch (default :infinity) on a host/CI with a compatible C Lua and record the numbers in the PR body before merge.
[minor] Commit-subject scope uses the plan id. Two commits use chore(B17): start plan and chore(B17): mark plan as review. Per CLAUDE.md / ship-a-plan, commit subjects use the affected subsystem as scope, never the plan id (the id belongs in the commit body, which the feature commits correctly do via Plan: B17). The feature commits (feat(vm): …) are correct; the two chore(B17) subjects are not.
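The test-sizing argument in the blocker can be checked with plain arithmetic. The model below is hypothetical (`LeakSketch.run_evals/4` is not part of the library): each eval costs a fixed number of steps, and `reset?` controls whether the tally is zeroed at each top-level eval.

```elixir
defmodule LeakSketch do
  # Hypothetical model: every eval costs `cost` steps and fails once the
  # running tally exceeds `max_steps`.
  def run_evals(max_steps, cost, count, reset?) do
    result =
      Enum.reduce_while(1..count, 0, fn n, carried ->
        start = if reset?, do: 0, else: carried
        tally = start + cost

        if tally > max_steps do
          {:halt, {:budget_exceeded, n}}
        else
          {:cont, tally}
        end
      end)

    case result do
      {:budget_exceeded, n} -> {:budget_exceeded, n}
      _tally -> :ok
    end
  end
end

# Weak sizing: passes with and without the reset, so it cannot detect the leak.
LeakSketch.run_evals(5000, 100, 2, false) # => :ok
LeakSketch.run_evals(5000, 100, 2, true)  # => :ok

# Discriminating sizing: one eval fits, two cumulative evals do not.
LeakSketch.run_evals(150, 100, 2, false)  # => {:budget_exceeded, 2}
LeakSketch.run_evals(150, 100, 2, true)   # => :ok
```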

Confirmed clean

Parity: both engines enforce the budget — interpreter back-edges/boundaries and dispatcher back-edges/all 6 call sites. The "dispatcher driven directly" and "cross-engine mutual recursion" tests cover the compiled path and the engine hand-off. Good.
Validation: validate_max_steps!/1 mirrors validate_max_call_depth!/1 (:infinity or pos_integer, else ArgumentError naming :max_steps); 0/-1/:nope covered by tests.
Empty-body while true do end is bounded: the back-edge increment fires per iteration regardless of body content; tested.
Docs: guides/examples/sandboxing.livemd gains a "Bounding CPU work" section (the tracked guide; guides/sandboxing.md doesn't exist on main — acceptable resolution).
No plan-id leakage into lib/, test/, or guides/; no AI co-author trailer; CI green across both Elixir/OTP matrices, Dialyzer, and the Lua 5.3 suite.

The :max_steps tally is stamped back into state.steps at each terminal and persisted into the returned %Lua{}, but nothing reset it at the top-level evaluation boundary. A long-lived %Lua{} running many small evals therefore accumulated steps across the whole lifetime and would eventually raise "instruction budget exceeded" even though no single eval came close — contradicting the per-eval contract in issue #306. Reset state.steps to 0 in Lua.VM.execute/2, the single chokepoint that Lua.eval!/eval route through. Nested calls within one evaluation still thread the tally as a bare parameter and accumulate against the same budget (a tight `while true do end` stays bounded); only the top-level boundary resets. The :infinity hot path is unchanged. Adds a regression test that sizes the budget just above one eval's real cost and runs that same eval 100x on the threaded state — red before this fix, green after — plus a guard asserting the budget still spans nested calls within a single evaluation. Addresses PR #320 review: cumulative-vs-per-eval budget leak. Plan: B17

davydog187 · 2026-06-01T19:36:05Z

Fix: instruction budget now resets per top-level evaluation

Addresses the review blocker that :max_steps was cumulative over the whole %Lua{} lifetime rather than fresh per evaluation. The tally is stamped back into state.steps at each terminal and persisted into the returned %Lua{}, but nothing reset it at the top-level boundary — so a long-lived %Lua{} running many small evals would eventually raise "instruction budget exceeded" even though no single eval came close (contradicting #306).

The fix (`d170301`)

One line at the single chokepoint both Lua.eval!/eval route through — Lua.VM.execute/2:

# Reset the instruction-budget tally at the top-level evaluation boundary
state = %{state | steps: 0}

One engine entry covers both engines. Lua.VM.execute/2 always frames the eval through the interpreter (Executor.execute/5); the compiled dispatcher is only entered for nested calls within an eval, which thread/seed from this reset state. So a single reset bounds the whole evaluation across both engines. I did not reset inside Executor.execute/5 / Dispatcher.execute — those are re-entered on nested calls and must keep accumulating.
Within-eval accumulation preserved. The praised design is intact: tally threaded as a bare parameter, increments only at loop back-edges + call boundaries, :infinity short-circuit on the hot path. A tight while true do end (even one calling a helper every iteration) is still bounded.
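A small model of the reset placement, assuming a made-up instruction list rather than real Lua bytecode. `ResetSketch` and its `:step` / `{:call, body}` instructions are hypothetical. For brevity the sketch keeps the tally in the struct on every step, which the real change deliberately avoids; the point here is only where the reset happens.

```elixir
defmodule ResetSketch do
  # Hypothetical state: the cap plus the tally persisted between entries.
  defstruct max_steps: :infinity, steps: 0

  def new(opts \\ []), do: struct!(__MODULE__, opts)

  # Top-level entry: the only place the persisted tally is zeroed.
  def execute(%__MODULE__{} = state, program) do
    state = %{state | steps: 0}
    run(state, program)
  end

  # Nested entry: continues from whatever the caller has already spent.
  defp run(state, []), do: {:ok, state}

  defp run(state, [{:call, body} | rest]) do
    with {:ok, state} <- run(state, body), do: run(state, rest)
  end

  defp run(state, [:step | rest]) do
    state = %{state | steps: state.steps + 1}

    if over?(state) do
      {:error, "instruction budget exceeded"}
    else
      run(state, rest)
    end
  end

  defp over?(%__MODULE__{max_steps: :infinity}), do: false
  defp over?(%__MODULE__{max_steps: max, steps: steps}), do: steps > max
end
```

With `program = [:step, {:call, [:step, :step]}, :step]` (four steps in total), a state built with `max_steps: 5` can be threaded through `execute/2` any number of times, while `max_steps: 3` fails inside a single evaluation because the nested call's steps count against the same budget.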

TDD

The previous "no cross-eval leak" test used max_steps: 5000 with two ~100-step evals (~200 cumulative) — it passed regardless of the leak. New tests in test/lua/vm/max_steps_test.exs:

"a budget sized for one eval survives repeating that same eval on the threaded state" — max_steps: 2000, runs a ~50-iteration eval 100× on the threaded state. Red on HEAD (cfadf03: raised "instruction budget exceeded" on an early iteration), green after the fix.
"the budget does NOT reset on nested calls within a single evaluation" — a while true do step(s) end loop still trips the budget, guarding against an over-broad reset.

Validation

mix format --check-formatted ✅
mix compile --warnings-as-errors ✅
mix test ✅ — 2130 passed, 19 skipped
mix test test/lua53_suite_test.exs --only lua53 ✅ — 17 passed, 12 skipped
mix test test/lua/vm/max_steps_test.exs ✅ — 15 passed (incl. the new red-then-green test)

Benchmark status (formal #306 gate: UNMET — honest disclosure)

The default-:infinity Benchee acceptance benchmark could not be run: MIX_ENV=benchmark mix run benchmarks/fibonacci.exs fails to compile the :luaport native dependency — c_src/luaport.c:14: fatal error: 'lua.h' file not found (missing native Lua 5.4 headers; LuaJIT also not on PKG_CONFIG_PATH). Benchee/statistex/Luerl themselves compiled fine; the blocker is the C port dep. The formal gate remains unmet in this environment.

As a best-effort substitute I took a :timer.tc micro-measurement of the :infinity hot path, with the one-line reset vs. without it (mix run, dev env, same machine):

| workload | baseline (no reset) | with reset |
|---|---|---|
| `fib(28)` ×20 | 257.18 ms/eval | 254.94 ms/eval |
| tiny `for`-loop eval ×100k | 8.777 µs/eval | 8.740 µs/eval |

Both are within run-to-run noise: the single %{state | steps: 0} map update per top-level eval is not on the per-instruction path and shows no measurable cost.

Pushed as d170301.


Adds a `:max_steps` option to `Lua.new/1` mirroring `:max_call_depth`: default `:infinity` (no limit, existing behavior unchanged), a positive integer caps the VM instructions a single evaluation may execute, and exhaustion raises a catchable `"instruction budget exceeded"` runtime error recoverable via `pcall`. This gives library consumers a deterministic CPU bound without wrapping each call in a host Task and wall-clock timeout. The running tally is threaded as a parameter through the interpreter's `do_execute` chain and the compiled dispatcher's `dispatch` chain — not stored in `%State{}` — preserving the executor's `line`-off-State discipline so the default `:infinity` path carries no per-instruction cost. The counter is incremented only at loop back-edges and call boundaries; `check_steps!/2` short-circuits on `:infinity` in a single function-head match. Both execution paths enforce the budget. Plan: B17 Closes #306

Make the :max_steps instruction budget durable across Executor<->Dispatcher engine hand-offs so recursion that alternates execution engines is bounded rather than resetting its budget at each boundary. The running tally now rides through a `steps` field on %State{} at engine boundaries only (where the struct is already rebuilt to push a call frame), never per opcode: the crossing engine writes its threaded tally into state.steps and the entered engine seeds from it, stamping the final tally back at its terminal. This closes the gap between max_call_depth: :infinity and a deterministic CPU bound for a compiled/interpreted mutually-recursive pair with no loop on either side. Adds regression coverage in test/lua/vm/max_steps_test.exs: a goto-bearing interpreted closure and a plain compiled closure in unbounded mutual recursion trip the budget, plus a guard asserting the pair is genuinely split across both engines. Plan: B17


Add a CHANGELOG Unreleased entry and a README "Resource limits" subsection covering :max_call_depth and :max_steps. The README block is inside the moduledoc delimiter, so its iex> example is doctested. Plan: B17

davydog187 · 2026-06-02T20:09:20Z

Benchmark results: default `:infinity` path — main vs branch

Ran the benchmark gate that couldn't run in the original sandbox (built luaport against Homebrew lua@5.4). Compared by swapping the 5 VM files to their merge-base (9dc141a) and back, so hardware/deps were identical — Apple M4, Elixir 1.20.0-rc.6, OTP 29. Medians reported (quick-mode means were noisy; medians are the robust stat).

`dispatcher_vs_interpreter` — fib(25), full mode (maximal Lua-closure call density)

| job | baseline | branch | Δ median |
|---|---|---|---|
| dispatcher | 76.90 ms | 78.19 ms | +1.7% |
| interpreter | 94.85 ms | 97.36 ms | +2.6% |
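The Δ median column follows directly from the two medians in each row. A small helper (hypothetical, not part of the benchmark scripts) reproduces it:

```elixir
defmodule DeltaSketch do
  # Relative change of `branch` against `baseline`, in percent, one decimal.
  def percent_delta(baseline, branch) do
    Float.round((branch - baseline) / baseline * 100, 1)
  end
end

DeltaSketch.percent_delta(76.90, 78.19) # dispatcher  => 1.7
DeltaSketch.percent_delta(94.85, 97.36) # interpreter => 2.6
```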

`table_ops` — quick mode, n=100, "chunk" path (pre-compiled, cleanest signal), n=2 each

| workload | baseline | branch | Δ |
|---|---|---|---|
| Build (`table.insert` loop) | 18.00–18.08 µs | 19.29–19.54 µs | ~+7% |
| Sort | 30.13–30.63 µs | 31.33–31.71 µs | ~+3% |
| Iterate/Sum (`generic_for`) | 25.00–25.21 µs | 26.92–27.13 µs | ~+8% |
| Map + Reduce | n/a | n/a | ~0% (noisy) |

The Build and Iterate bands are tight and non-overlapping across both samples, so the ~7-8% there is a real signal, not noise.

Interpretation

The "zero-cost by construction" framing holds for per-opcode cost but not for the default path overall:

Lua-closure calls (fib): only +1.7% (dispatcher) to +2.6% (interpreter). Here the steps write is folded into the existing call_stack/call_depth struct rebuild, so the only added cost is the increment plus the :infinity head-match.
Builtin-call and generic_for iterator boundaries: ~+7-8% — these got new state = %{state | steps: steps} … steps = state.steps round-trips that didn't exist on main, and they fire on the :infinity default too. That's why table.insert-per-iteration (Build) and pairs/ipairs-per-iteration (Iterate) regress most.

So: no per-instruction cost, but a measurable ~2-8% hit at call boundaries on the default path, concentrated where the PR added new struct round-trips rather than folding into an existing rebuild.

Suggested mitigation (author's call)

Guard the new round-trips on max_steps != :infinity at the builtin-call and generic_for sites only (the :lua_closure path is already cheap because it reuses the existing rebuild). That should bring the default path back to genuinely zero-cost. Happy to implement and re-benchmark if you'd like.
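One way the suggested guard could look, as a sketch only: `GuardSketch`, `stash_steps/2`, and `restore_steps/2` are hypothetical helpers, not functions in the PR. The idea is that the unbounded case is matched in the function head and returns its input untouched, so the struct is neither rebuilt nor read back.

```elixir
defmodule GuardSketch do
  # Hypothetical state with the two fields the round-trip touches.
  defstruct max_steps: :infinity, steps: 0

  def new(opts \\ []), do: struct!(__MODULE__, opts)

  # Unbounded: hand the struct back as-is, so no rebuild happens.
  def stash_steps(%__MODULE__{max_steps: :infinity} = state, _steps), do: state
  # Bounded: persist the threaded tally so the callee can seed from it.
  def stash_steps(%__MODULE__{} = state, steps), do: %{state | steps: steps}

  # Unbounded: keep the caller's own tally and skip the read-back.
  def restore_steps(%__MODULE__{max_steps: :infinity}, steps), do: steps
  # Bounded: continue from the tally the callee stamped back.
  def restore_steps(%__MODULE__{steps: persisted}, _steps), do: persisted
end
```

Whether this recovers the measured regression would still need a re-run of the same benchmarks; the sketch only shows that the guard can be expressed without a runtime branch in the bounded path.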

Methodology note: quick mode is the documented "did my change move the needle?" profile; dispatcher_vs_interpreter was run in full mode. The luaport/luerl baseline rows are omitted as irrelevant to the default-path regression question.


davydog187 force-pushed the feat/vm-max-steps branch from d170301 to 12b158f Compare June 1, 2026 22:38

davydog187 added 5 commits June 2, 2026 12:35

chore(B17): start plan

aef7701

chore(B17): mark plan as review

8a30931

davydog187 force-pushed the feat/vm-max-steps branch from 12b158f to 1ef7a38 Compare June 2, 2026 19:36

docs(changelog,readme): document :max_steps instruction budget

37f2f46


davydog187 wants to merge 6 commits into main from feat/vm-max-steps
