execution_graph: preserve unrun dirty work on dispatch error by no-materials · Pull Request #91 · forest-rs/execution

no-materials · 2026-06-03T12:21:50Z

plan_all / plan_within_dependencies_of drain (and clear) the entire in-scope dirty set before any node runs. InlineDispatcher is fail-fast, so when a node traps or errors mid-pass, the dirty marks of every node scheduled after it are already gone and never re-instated. The caller fixes the fault, retries, and gets Ok(executed_nodes: 0): unrelated pending work is silently lost.

Fix

On error, the dispatcher re-marks the failed node and the unrun tail dirty via remark_scheduled_dirty, so a subsequent run re-attempts them. Still fail-fast, only the recovery of dropped work changes. Covers both run_all (global scope) and run_node (dependency-closure scope).

Tests

Three regression tests (verified to fail without the fix):

independent node survives a sibling's trap and recovers on re-run
nodes that already executed before the trap are not re-marked
run_node closure sibling survives a trap within the closure

Planning drains and clears the entire in-scope dirty set up front to build the schedule, but InlineDispatcher executes fail-fast: when a node trapped or errored, every node scheduled after it lost its (already drained) dirty mark. A retry then planned from an empty set and reported "nothing to do" while the pending work was silently dropped. On error, re-mark the failed node and the unrun tail dirty (remark_scheduled_dirty) so the work is recoverable on the next run. Applies to both run_all (global) and run_node (closure) paths. Add three regression tests and document the fail-fast recovery contract on run_all/run_node.

waywardmonkeys · 2026-06-03T13:50:27Z

+            if let Err(e) = graph.execute_scheduled_node(node) {
+                // Fail-fast: this node errored and `to_run[i + 1..]` never ran. Re-mark them so
+                // their drained dirty state is not silently lost (see `dispatch`).
+                graph.remark_scheduled_dirty(&to_run[i..]);


But the error gets returned here, the report that has been accumulated so far is dropped ... so the Report handling here could use improvement (but maybe as a follow up?)

Agreed that this should be taken as a follow-up. Surfacing the partial report on error is an API decision (richer return, or an error variant carrying the partial RunDetailReport, etc).

Also it predates this PR: the old ?-based loop dropped it too.

For now I've left the behavior and added an inline note at the drop site.

waywardmonkeys · 2026-06-03T14:24:09Z

+                // were cleared when the plan was drained, so re-mark them to keep that pending
+                // work recoverable on the next run instead of silently dropping it.
+                graph.remark_scheduled_dirty(&to_run[i..]);
+                return Err(e);


Also, because of this error path, we aren't restoring the buffer for to_run back into the scratch workspace. Might not be that serious a thing, but maybe this is a sign we need a better API or design here overall.

True! I made it so the dispatcher now hands the buffer back to scratch itself via a reclaim_schedule_buffer hook on every exit path, rather than returning it for the caller to stash on the Ok arm only. Dispatcher is pub(crate), so no public API change - clean on that end.

The InlineDispatcher returned the drained schedule buffer for the caller to stash back into scratch, but only on the Ok arm — on the fail-fast error path the buffer was dropped, so the next planning pass reallocated. Have the dispatcher reclaim the buffer itself via a new reclaim_schedule_buffer hook on every exit path (success and error). dispatch now returns Result<(), GraphError> and dispatch_with_report returns Result<RunDetailReport, GraphError> — the buffer no longer rides on the return type and can't be lost on error. Dispatcher is pub(crate), so no public API change.

waywardmonkeys

Rebase it and you can land it after that (via squash).

no-materials requested a review from waywardmonkeys June 3, 2026 12:21

waywardmonkeys reviewed Jun 3, 2026

View reviewed changes

no-materials added 2 commits June 3, 2026 16:50

fmt

7cdb29e

waywardmonkeys approved these changes Jun 3, 2026

View reviewed changes

no-materials merged commit 88e434e into forest-rs:main Jun 3, 2026
15 checks passed

no-materials deleted the graph/fix/preserve-unrun-dirty-on-dispatch-error branch June 4, 2026 06:05

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

execution_graph: preserve unrun dirty work on dispatch error#91

execution_graph: preserve unrun dirty work on dispatch error#91
no-materials merged 3 commits into
forest-rs:mainfrom
no-materials:graph/fix/preserve-unrun-dirty-on-dispatch-error

no-materials commented Jun 3, 2026

Uh oh!

waywardmonkeys Jun 3, 2026

Uh oh!

no-materials Jun 3, 2026

Uh oh!

waywardmonkeys Jun 3, 2026

Uh oh!

no-materials Jun 3, 2026

Uh oh!

waywardmonkeys left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

no-materials commented Jun 3, 2026

Fix

Tests

Uh oh!

waywardmonkeys Jun 3, 2026

Choose a reason for hiding this comment

Uh oh!

no-materials Jun 3, 2026

Choose a reason for hiding this comment

Uh oh!

waywardmonkeys Jun 3, 2026

Choose a reason for hiding this comment

Uh oh!

no-materials Jun 3, 2026

Choose a reason for hiding this comment

Uh oh!

waywardmonkeys left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants