[Medium] A single failing evaluation discards all completed records for the whole run #30

Description

@gratus907

Severity: Medium

A single failing evaluation aborts the entire run and discards every record that had already completed, because RunReport/RunResult is only assembled on the happy path.

Locations: src/variopt/study/execution.py:654, src/variopt/study/stale_async.py:420, src/variopt/evaluators/joblib/asynchronous.py:423-440

Any exception from SequentialEvaluator/JoblibEvaluator (including a BatchExecutionFailed from user objective code) propagates all the way to the caller with no partial results or final run-method state attached anywhere.
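The failure mode can be reduced to a small sketch. The names run_study, RunResult and BatchExecutionFailed below are hypothetical stand-ins written for illustration, not variopt's actual code. The point is only that a result object built after the evaluation loop never comes into existence when the loop raises, so completed records held in local state are lost.

```python
from dataclasses import dataclass, field


class BatchExecutionFailed(Exception):
    """Hypothetical stand-in for variopt's exception of the same name."""


@dataclass
class RunResult:
    """Hypothetical stand-in: a result object assembled only on the happy path."""
    records: list = field(default_factory=list)


def run_study(objective, candidates):
    records = []  # completed records live only in this local variable
    for x in candidates:
        try:
            records.append((x, objective(x)))
        except Exception as exc:
            # Fail-fast: the exception leaves the function and `records` is dropped.
            raise BatchExecutionFailed(f"evaluation of {x!r} failed") from exc
    # Only reached when every evaluation succeeded.
    return RunResult(records=records)


def objective(x):
    # Hypothetical user objective that fails on the last candidate.
    if x == 4:
        raise ValueError("objective blew up")
    return x * x


try:
    run_study(objective, range(5))
except BatchExecutionFailed as exc:
    # Only the exception reaches the caller; the four finished records are gone.
    caught = exc
```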

Scenario

999 of 1000 expensive evaluations complete over several hours of wall-clock time; evaluation 1000 raises. The caller receives only the exception — every assimilated record and the advanced run-method state are unrecoverable, with no way to resume or salvage the completed work.

Fix direction

Carry partial records/trace/run-method-state on the raised exception (e.g. attach them as attributes on BatchExecutionFailed), or offer a failure-recording policy that lets callers opt into "record failures and continue" instead of unconditional fail-fast. This would also integrate naturally with the existing checkpoint/resume surface.
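Both fix directions can be sketched together under stated assumptions. In the sketch below, the `partial` attribute on the exception, the `on_failure` parameter with its "raise"/"record" values, and the FailureRecord type are hypothetical names chosen for illustration; they are not part of variopt's current API. RunResult and BatchExecutionFailed are simplified stand-ins, and a real fix would also need to carry the trace and run-method state, which this sketch omits.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Iterable, Literal


@dataclass
class FailureRecord:
    """Hypothetical record for an evaluation that raised under on_failure="record"."""
    candidate: Any
    error: BaseException


@dataclass
class RunResult:
    """Simplified stand-in; the real type would also hold trace and run-method state."""
    records: list = field(default_factory=list)
    failures: list = field(default_factory=list)


class BatchExecutionFailed(Exception):
    """Hypothetical variant that carries the work completed before the failure."""

    def __init__(self, message: str, partial: RunResult):
        super().__init__(message)
        self.partial = partial  # everything completed before the failing evaluation


def run_study(
    objective: Callable[[Any], Any],
    candidates: Iterable[Any],
    on_failure: Literal["raise", "record"] = "raise",
) -> RunResult:
    result = RunResult()
    for x in candidates:
        try:
            result.records.append((x, objective(x)))
        except Exception as exc:
            if on_failure == "record":
                # Opt-in policy: remember the failure and keep evaluating.
                result.failures.append(FailureRecord(x, exc))
                continue
            # Default fail-fast, but the partial result travels with the exception.
            raise BatchExecutionFailed(f"evaluation of {x!r} failed", result) from exc
    return result


def objective(x):
    # Hypothetical user objective that fails on the last candidate.
    if x == 4:
        raise ValueError("objective blew up")
    return x * x


try:
    run_study(objective, range(5))
except BatchExecutionFailed as exc:
    salvaged = exc.partial.records  # [(0, 0), (1, 1), (2, 4), (3, 9)]

tolerant = run_study(objective, range(5), on_failure="record")
```

Keeping "raise" as the default preserves today's fail-fast behavior for existing callers, while `exc.partial` gives them something to salvage and could, in principle, be handed to the existing checkpoint/resume surface.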

Metadata

Assignees

No one assigned

    Labels

    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests