sd-future: rework cancellation, add slot callbacks and future groups#15
sd-future: rework cancellation, add slot callbacks and future groups#15daandemeyer wants to merge 1 commit into
Conversation
Cancellation propagation: - sd_fiber_resume() now stashes the resume value even when the target fiber isn't suspended yet (INITIAL or running/READY); the next fiber_swap() consumes it without yielding to the event loop. -ECANCELED is sticky once queued, so a concurrent async wakeup can't override a pending cancellation. - sd_future_cancel_wait_unref() remembers the last non-zero return from its internal sd_fiber_await() across loop iterations and, once the future has finally resolved, re-queues it via sd_fiber_resume(self, ...) so the calling fiber's next sd_fiber_suspend() / sd_fiber_yield() observes it. Previously those values were discarded. Together these let sd_future_cancel_wait_unref() forward interruptions (cancellations / outer SD_FIBER_TIMEOUT firings) instead of silently swallowing them. FIBER_STATE_CANCELLED is dropped in the process: fiber_cancel() now queues -ECANCELED through the same sd_fiber_resume() path (with an idempotency check so fiber_on_exit()'s two-pass arm-then-dispatch cycle still works), and fiber_swap() delivers it via the standard result_pending branch instead of a dedicated state-machine check. Net effect: one enum value gone, the cancel path goes through exactly the same plumbing as any other async wakeup. Decouple unreferencing a future from resolving it: futures must now be resolved by the time they're unreffed. To make this easier we introduce sd_future_cancel_unref() (plus _unrefp / array variants), which cancels and then unrefs. We use it everywhere where we can't assume we're on a fiber and thus can't wait for the future to resolve after cancelling. All future kinds we support today except fibers cancel synchronously, so this isn't a problem in practice. Even fibers cancel synchronously after creation until they're scheduled once by the event loop, after which they cancel asynchronously. Slot-based callbacks: Replaces the single-callback-per-future model (sd_future_set_callback, sd_future_get_userdata, and the wait_future indirection in sd_fiber_await) with explicit, awaiter-owned slots: - sd_future_add_callback() returns a refcounted sd_future_slot the caller owns via _cleanup_(sd_future_slot_unrefp). A future can have many slots; dropping a slot deregisters that one callback cleanly without resolving or otherwise touching the future. NULL ret_slot makes the slot 'floating' — its lifetime is bound to the future. - Callbacks never run inline. Each slot owns a one-shot defer event source and a one-shot exit event source on the future's event loop; the one applicable to the loop's current phase is armed when the future resolves — or at add time when the slot is attached to an already-RESOLVED future. Slots inherit the calling fiber's priority when added from inside one. Deferring unconditionally keeps the resolver free of reentrancy concerns (slot-set mutation during iteration, callbacks dropping the future's last ref) and frees callers from having to reason about whether a callback might fire before sd_future_add_callback() returns. - sd_fiber_await() registers a slot on the target directly and drops it on scope exit — no more wait_future allocation, no auto-set callback on sd_future_new(). sd_fiber_timeout() and sd_fiber_sleep() likewise install their resume callback explicitly via sd_future_add_callback(). - sd_future_new_wait() / sd_future_set_callback() / sd_future_get_userdata() are gone; existing callers in sd-bus, qmp-client, event-future, and fiber-io migrate to the slot API. sd_future_resume_callback() is exposed as a shared helper for the common "resume my fiber when this future settles" wiring. sd_future_new() now takes the sd_event the future belongs to, and sd_future_get_event() exposes it. This lets the rest of the API (slot dispatch, group child tracking, defer futures) avoid plumbing the event through separately. Future groups and defer futures: Adds sd_future_group, modelled on asyncio.TaskGroup: aggregate N child futures with a policy (default = wait-all-fail-fast; SD_FUTURE_GROUP_WAIT_ANY resolves on the first settled child; SD_FUTURE_GROUP_IGNORE_ERRORS collects all results without cancelling siblings on error). Policy is configured up-front via sd_future_group_set_policy() before any child is added — once children are in flight, the resolution mechanics are locked in. On group failure, the parent fiber that created the group is cancelled so it observes the failure even if it hasn't started awaiting yet — and not cancelled when it's already inside sd_future_group_await(), nor when the parent itself is the fiber driving the cancel. Finalize-time draining ensures the group only resolves once every child has actually settled, even if siblings need a round-trip through the event loop to honor a cancel. Also adds sd_future_new_defer() for one-shot 'resolve on next event loop iteration' futures. Tests cover the queued-while-running behavior of sd_fiber_resume(), the propagation guarantees of sd_future_cancel_wait_unref() for both timeouts and external cancellations (from main and from a peer fiber), and a new test-future-group.c with coverage for each policy, parent-cancel propagation (including the parent-drives-cancel skip path), child-drain-on-resolve, add-rejected-during-finalize, slot lifecycle, and the already-resolved add_callback path.
Claude review of PR #15 (4924a75)Suggestions
|
| if (r < 0) | ||
| return r; | ||
|
|
||
| return sd_future_result(target); |
There was a problem hiding this comment.
Claude: suggestion: After sd_fiber_suspend() returns a non-negative value, sd_fiber_await() now returns sd_future_result(target). If the fiber was woken by something other than target resolving (e.g. an unrelated sd_fiber_resume(self, 0) wired up by the caller), target is still PENDING and sd_future_result() will log an assert_return failure and return -EBUSY. The previous sd_fiber_suspend() path returned the resume value directly without this trap. This also feeds back into sd_future_cancel_wait_unref(), whose loop calls sd_fiber_await(f): a spurious non-negative wakeup turns into q = -EBUSY, which then gets re-queued onto the calling fiber as a bogus interruption. Consider guarding on sd_future_state(target) == SD_FUTURE_RESOLVED before returning its result, or returning the raw suspend value when the target hasn't resolved.
Cancellation propagation:
sd_fiber_resume() now stashes the resume value even when the target fiber isn't suspended yet (INITIAL or running/READY); the next fiber_swap() consumes it without yielding to the event loop. -ECANCELED is sticky once queued, so a concurrent async wakeup can't override a pending cancellation.
sd_future_cancel_wait_unref() remembers the last non-zero return from its internal sd_fiber_await() across loop iterations and, once the future has finally resolved, re-queues it via sd_fiber_resume(self, ...) so the calling fiber's next sd_fiber_suspend() / sd_fiber_yield() observes it. Previously those values were discarded.
Together these let sd_future_cancel_wait_unref() forward interruptions (cancellations / outer SD_FIBER_TIMEOUT firings) instead of silently swallowing them.
FIBER_STATE_CANCELLED is dropped in the process: fiber_cancel() now queues -ECANCELED through the same sd_fiber_resume() path (with an idempotency check so fiber_on_exit()'s two-pass arm-then-dispatch cycle still works), and fiber_swap() delivers it via the standard result_pending branch instead of a dedicated state-machine check. Net effect: one enum value gone, the cancel path goes through exactly the same plumbing as any other async wakeup.
Decouple unreferencing a future from resolving it: futures must now be resolved by the time they're unreffed. To make this easier we introduce sd_future_cancel_unref() (plus _unrefp / array variants), which cancels and then unrefs. We use it everywhere where we can't assume we're on a fiber and thus can't wait for the future to resolve after cancelling. All future kinds we support today except fibers cancel synchronously, so this isn't a problem in practice. Even fibers cancel synchronously after creation until they're scheduled once by the event loop, after which they cancel asynchronously.
Slot-based callbacks:
Replaces the single-callback-per-future model (sd_future_set_callback, sd_future_get_userdata, and the wait_future indirection in sd_fiber_await) with explicit, awaiter-owned slots:
sd_future_new() now takes the sd_event the future belongs to, and sd_future_get_event() exposes it. This lets the rest of the API (slot dispatch, group child tracking, defer futures) avoid plumbing the event through separately.
Future groups and defer futures:
Adds sd_future_group, modelled on asyncio.TaskGroup: aggregate N child futures with a policy (default = wait-all-fail-fast; SD_FUTURE_GROUP_WAIT_ANY resolves on the first settled child; SD_FUTURE_GROUP_IGNORE_ERRORS collects all results without cancelling siblings on error). Policy is configured up-front via sd_future_group_set_policy() before any child is added — once children are in flight, the resolution mechanics are locked in. On group failure, the parent fiber that created the group is cancelled so it observes the failure even if it hasn't started awaiting yet — and not cancelled when it's already inside sd_future_group_await(), nor when the parent itself is the fiber driving the cancel. Finalize-time draining ensures the group only resolves once every child has actually settled, even if siblings need a round-trip through the event loop to honor a cancel.
Also adds sd_future_new_defer() for one-shot 'resolve on next event loop iteration' futures.
Tests cover the queued-while-running behavior of sd_fiber_resume(), the propagation guarantees of sd_future_cancel_wait_unref() for both timeouts and external cancellations (from main and from a peer fiber), and a new test-future-group.c with coverage for each policy, parent-cancel propagation (including the parent-drives-cancel skip path), child-drain-on-resolve, add-rejected-during-finalize, slot lifecycle, and the already-resolved add_callback path.