
[High] Stale-async execution overshoots the evaluation budget by up to batch_size - 1 #26

Description

@gratus907

Severity: High

Stale-async execution systematically overshoots the evaluation budget, by up to batch_size - 1 evaluations.

Location

src/variopt/study/stale_async.py:340-423

Three behaviors combine to cause the overshoot. First, the budget (remaining) is debited only when a completion group finishes (lines 390-393). Second, refill sizing (min(len(group_records), remaining), line 403) ignores requests already in flight in other sessions. Third, once remaining <= 0, the outer loop (while active_sessions or (remaining > 0 and ...), line 337) still keeps draining and assimilating every session already in flight.

Scenario

Run Study.run(max_evaluations=10, batch_size=10, execution_model=STALE_ASYNC_EXECUTION_MODEL) against a joblib async evaluator with single-outcome completion groups. The 10 initial requests plus 9 refills are all evaluated and recorded, so RunReport.evaluation_count == 19 and 19 records are returned, versus exactly 10 on the sync execution path. In general the overshoot is up to batch_size - 1, so it grows with batch_size.
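The arithmetic can be reproduced with a toy model of the accounting. This is a minimal sketch, not the code in stale_async.py: the function name simulate_completion_debit is hypothetical, every completion group is assumed to carry exactly one outcome, and the initial wave is assumed to open min(batch_size, max_evaluations) requests.

```python
def simulate_completion_debit(max_evaluations: int, batch_size: int) -> int:
    """Toy model: the budget is debited only when a completion group finishes.

    Hypothetical helper for illustration; assumes single-outcome groups.
    Returns the number of evaluations that end up recorded.
    """
    remaining = max_evaluations
    # Assumption: the initial wave opens up to batch_size requests.
    in_flight = min(batch_size, max_evaluations)
    recorded = 0

    # The loop keeps draining while anything is still in flight,
    # even after the budget has been exhausted.
    while in_flight > 0:
        in_flight -= 1           # one single-outcome group completes
        group_size = 1
        recorded += group_size
        remaining -= group_size  # debit happens only here, at completion

        # Refill sizing looks at the group and the budget,
        # but not at requests already in flight elsewhere.
        refill = max(0, min(group_size, remaining))
        in_flight += refill

    return recorded


print(simulate_completion_debit(max_evaluations=10, batch_size=10))  # 19
print(simulate_completion_debit(max_evaluations=10, batch_size=4))   # 13
```

Under these assumptions the model records max_evaluations + batch_size - 1 evaluations, which matches the 19 observed in the scenario.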

Fix direction

Debit the budget at request-issue time (track the number of issued-but-not-yet-completed requests when opening or refilling sessions) rather than at completion. Alternatively, actively cancel outstanding sessions as soon as remaining <= 0 instead of assimilating them to completion.
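A rough sketch of the first option, debiting at issue time, in a toy model with single-outcome completion groups. The function name simulate_issue_debit is hypothetical and the loop is a simplification for illustration, not a patch against stale_async.py.

```python
def simulate_issue_debit(max_evaluations: int, batch_size: int) -> int:
    """Toy model: the budget is debited when a request is issued.

    Hypothetical helper for illustration; assumes single-outcome groups.
    Returns the number of evaluations that end up recorded.
    """
    remaining = max_evaluations

    # Opening the initial wave already consumes budget.
    initial = min(batch_size, remaining)
    remaining -= initial
    in_flight = initial
    recorded = 0

    while in_flight > 0:
        in_flight -= 1  # one single-outcome group completes
        group_size = 1
        recorded += group_size

        # Refill only with budget that has not been issued yet,
        # and debit it immediately.
        refill = min(group_size, remaining)
        remaining -= refill
        in_flight += refill

    return recorded


print(simulate_issue_debit(max_evaluations=10, batch_size=10))  # 10
print(simulate_issue_debit(max_evaluations=10, batch_size=4))   # 10
```

Because remaining here counts only unissued budget, the number of requests ever issued cannot exceed max_evaluations, so draining in-flight sessions to completion no longer overshoots. The cancellation alternative would need evaluator support for aborting outstanding work, which this sketch does not model.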
