Severity: Medium
A single failing evaluation aborts the entire run and discards every already-completed record, because RunReport/RunResult is only assembled on the happy path.
Locations: src/variopt/study/execution.py:654, src/variopt/study/stale_async.py:420, src/variopt/evaluators/joblib/asynchronous.py:423-440
Any exception raised through SequentialEvaluator/JoblibEvaluator (including a BatchExecutionFailed originating in user objective code) propagates all the way to the caller, with no partial results or final run-method state attached to it or exposed anywhere else.
Scenario
999 of 1000 expensive evaluations complete over several hours of wall-clock time, then evaluation 1000 raises. The caller receives only the exception: every assimilated record and the advanced run-method state are lost, leaving no way to resume or salvage the completed work.
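The failure mode can be reduced to a toy model. This is not variopt code: RunResult below is a hypothetical stand-in, and run_fail_fast and flaky_objective are illustrative names. It shows why the work vanishes: completed records live in a local list whose only exit is the final return statement.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence, Tuple


@dataclass
class RunResult:
    # Hypothetical stand-in for variopt's RunResult; only built on success.
    records: List[Tuple[float, float]] = field(default_factory=list)


def run_fail_fast(objective: Callable[[float], float], points: Sequence[float]) -> RunResult:
    records: List[Tuple[float, float]] = []  # local only; never escapes on error
    for x in points:
        y = objective(x)  # any exception here propagates immediately
        records.append((x, y))
    return RunResult(records)  # the happy path is the only way results leave


def flaky_objective(x: float) -> float:
    if x == 999.0:  # the 1000th evaluation fails
        raise RuntimeError("evaluation 1000 failed")
    return x * x


try:
    run_fail_fast(flaky_objective, [float(i) for i in range(1000)])
except RuntimeError as exc:
    # The caller sees only the exception; the 999 completed records are gone.
    caught = exc
```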
Fix direction
Carry the partial records, trace, and run-method state on the raised exception (e.g. as attributes on BatchExecutionFailed), or offer a failure-recording policy that lets callers opt into "record failures and continue" instead of unconditional fail-fast. Either option would also integrate naturally with the existing checkpoint/resume surface.
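A minimal sketch of both fix directions, assuming a simplified sequential loop. PartialRunFailed, FailurePolicy, EvalRecord, RunResult and the method_state dict are hypothetical names for illustration, not existing variopt APIs; in variopt the partial state would presumably be attached to BatchExecutionFailed itself.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Callable, Dict, List, Optional, Sequence


class FailurePolicy(Enum):
    # Hypothetical policy switch; not an existing variopt option.
    FAIL_FAST = "fail_fast"
    RECORD_AND_CONTINUE = "record_and_continue"


@dataclass
class EvalRecord:
    x: float
    y: Optional[float]                     # None when the evaluation raised
    error: Optional[BaseException] = None  # the captured exception, if any


@dataclass
class RunResult:
    # Hypothetical stand-in for variopt's RunResult.
    records: List[EvalRecord] = field(default_factory=list)
    method_state: Dict[str, Any] = field(default_factory=dict)


class PartialRunFailed(Exception):
    # Hypothetical exception carrying everything assimilated before the failure,
    # mirroring the idea of adding attributes to BatchExecutionFailed.
    def __init__(self, message: str, partial: RunResult) -> None:
        super().__init__(message)
        self.partial = partial


def run(
    objective: Callable[[float], float],
    points: Sequence[float],
    policy: FailurePolicy = FailurePolicy.FAIL_FAST,
) -> RunResult:
    result = RunResult()
    for i, x in enumerate(points):
        try:
            y = objective(x)
        except Exception as exc:
            if policy is FailurePolicy.FAIL_FAST:
                # Still fail fast, but hand back the partial result and chain the root error.
                raise PartialRunFailed(f"evaluation {i + 1} failed", result) from exc
            # Opt-in policy: record the failure and keep going.
            result.records.append(EvalRecord(x, None, exc))
        else:
            result.records.append(EvalRecord(x, y))
        # Stand-in for advancing the run-method state after each assimilated evaluation.
        result.method_state["n_assimilated"] = len(result.records)
    return result


def flaky_objective(x: float) -> float:
    if x == 999.0:  # the 1000th evaluation fails
        raise RuntimeError("objective blew up")
    return x * x


points = [float(i) for i in range(1000)]

try:
    run(flaky_objective, points)
except PartialRunFailed as err:
    salvaged = err.partial       # the 999 completed records survive the failure
    root_cause = err.__cause__   # the original RuntimeError
    failure_message = str(err)

continued = run(flaky_objective, points, FailurePolicy.RECORD_AND_CONTINUE)
```

Chaining with raise ... from exc keeps the original objective traceback reachable through __cause__, so attaching partial state does not hide the root error, and the default stays fail-fast unless the caller opts into recording failures.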