Severity: Medium (performance)
Trace event accumulation is O(steps²) on the generic sync and stale-async execution paths, because each step copies the entire event tuple.
Locations — src/variopt/study/execution.py:646, src/variopt/study/stale_async.py:394
trace = trace.append(event) # Trace.append copies the full tuple each call
Scenario
A 10k-step, batch_size=1 run performs ~5·10^7 tuple-element copies purely for tracing. Profiling of cheap objectives on the generic path shows trace append dominating wall time.
Notably, the recently added cheap-scalar-sequential fast path (execution.py:217,301) already avoids this by buffering trace_events in a plain list and constructing the Trace once at report assembly — the general and stale-async paths still pay the quadratic cost this fast path was specifically built to avoid.
Fix direction
Accumulate TraceEvents in a list during the run loop and construct Trace once at report-assembly time, as the fast path already does.
Severity: Medium (performance)
Trace event accumulation is O(steps²) on the generic sync and stale-async execution paths, because each step copies the entire event tuple.
Locations —
src/variopt/study/execution.py:646,src/variopt/study/stale_async.py:394Scenario
A 10k-step,
batch_size=1run performs ~5·10^7 tuple-element copies purely for tracing. Profiling of cheap objectives on the generic path shows trace append dominating wall time.Notably, the recently added cheap-scalar-sequential fast path (
execution.py:217,301) already avoids this by bufferingtrace_eventsin a plain list and constructing theTraceonce at report assembly — the general and stale-async paths still pay the quadratic cost this fast path was specifically built to avoid.Fix direction
Accumulate
TraceEvents in a list during the run loop and constructTraceonce at report-assembly time, as the fast path already does.