Skip to content

sd-future: rework cancellation, add slot callbacks and future groups#15

Open
daandemeyer wants to merge 1 commit into
mainfrom
push-tpznxutwnzzt
Open

sd-future: rework cancellation, add slot callbacks and future groups#15
daandemeyer wants to merge 1 commit into
mainfrom
push-tpznxutwnzzt

Conversation

@daandemeyer

Copy link
Copy Markdown
Owner

Cancellation propagation:

  • sd_fiber_resume() now stashes the resume value even when the target fiber isn't suspended yet (INITIAL or running/READY); the next fiber_swap() consumes it without yielding to the event loop. -ECANCELED is sticky once queued, so a concurrent async wakeup can't override a pending cancellation.

  • sd_future_cancel_wait_unref() remembers the last non-zero return from its internal sd_fiber_await() across loop iterations and, once the future has finally resolved, re-queues it via sd_fiber_resume(self, ...) so the calling fiber's next sd_fiber_suspend() / sd_fiber_yield() observes it. Previously those values were discarded.

Together these let sd_future_cancel_wait_unref() forward interruptions (cancellations / outer SD_FIBER_TIMEOUT firings) instead of silently swallowing them.

FIBER_STATE_CANCELLED is dropped in the process: fiber_cancel() now queues -ECANCELED through the same sd_fiber_resume() path (with an idempotency check so fiber_on_exit()'s two-pass arm-then-dispatch cycle still works), and fiber_swap() delivers it via the standard result_pending branch instead of a dedicated state-machine check. Net effect: one enum value gone, the cancel path goes through exactly the same plumbing as any other async wakeup.

Decouple unreferencing a future from resolving it: futures must now be resolved by the time they're unreffed. To make this easier we introduce sd_future_cancel_unref() (plus _unrefp / array variants), which cancels and then unrefs. We use it everywhere where we can't assume we're on a fiber and thus can't wait for the future to resolve after cancelling. All future kinds we support today except fibers cancel synchronously, so this isn't a problem in practice. Even fibers cancel synchronously after creation until they're scheduled once by the event loop, after which they cancel asynchronously.

Slot-based callbacks:

Replaces the single-callback-per-future model (sd_future_set_callback, sd_future_get_userdata, and the wait_future indirection in sd_fiber_await) with explicit, awaiter-owned slots:

  • sd_future_add_callback() returns a refcounted sd_future_slot the caller owns via cleanup(sd_future_slot_unrefp). A future can have many slots; dropping a slot deregisters that one callback cleanly without resolving or otherwise touching the future. NULL ret_slot makes the slot 'floating' — its lifetime is bound to the future.
  • Callbacks never run inline. Each slot owns a one-shot defer event source and a one-shot exit event source on the future's event loop; the one applicable to the loop's current phase is armed when the future resolves — or at add time when the slot is attached to an already-RESOLVED future. Slots inherit the calling fiber's priority when added from inside one. Deferring unconditionally keeps the resolver free of reentrancy concerns (slot-set mutation during iteration, callbacks dropping the future's last ref) and frees callers from having to reason about whether a callback might fire before sd_future_add_callback() returns.
  • sd_fiber_await() registers a slot on the target directly and drops it on scope exit — no more wait_future allocation, no auto-set callback on sd_future_new(). sd_fiber_timeout() and sd_fiber_sleep() likewise install their resume callback explicitly via sd_future_add_callback().
  • sd_future_new_wait() / sd_future_set_callback() / sd_future_get_userdata() are gone; existing callers in sd-bus, qmp-client, event-future, and fiber-io migrate to the slot API. sd_future_resume_callback() is exposed as a shared helper for the common "resume my fiber when this future settles" wiring.

sd_future_new() now takes the sd_event the future belongs to, and sd_future_get_event() exposes it. This lets the rest of the API (slot dispatch, group child tracking, defer futures) avoid plumbing the event through separately.

Future groups and defer futures:

Adds sd_future_group, modelled on asyncio.TaskGroup: aggregate N child futures with a policy (default = wait-all-fail-fast; SD_FUTURE_GROUP_WAIT_ANY resolves on the first settled child; SD_FUTURE_GROUP_IGNORE_ERRORS collects all results without cancelling siblings on error). Policy is configured up-front via sd_future_group_set_policy() before any child is added — once children are in flight, the resolution mechanics are locked in. On group failure, the parent fiber that created the group is cancelled so it observes the failure even if it hasn't started awaiting yet — and not cancelled when it's already inside sd_future_group_await(), nor when the parent itself is the fiber driving the cancel. Finalize-time draining ensures the group only resolves once every child has actually settled, even if siblings need a round-trip through the event loop to honor a cancel.

Also adds sd_future_new_defer() for one-shot 'resolve on next event loop iteration' futures.

Tests cover the queued-while-running behavior of sd_fiber_resume(), the propagation guarantees of sd_future_cancel_wait_unref() for both timeouts and external cancellations (from main and from a peer fiber), and a new test-future-group.c with coverage for each policy, parent-cancel propagation (including the parent-drives-cancel skip path), child-drain-on-resolve, add-rejected-during-finalize, slot lifecycle, and the already-resolved add_callback path.

Cancellation propagation:

- sd_fiber_resume() now stashes the resume value even when the target fiber
  isn't suspended yet (INITIAL or running/READY); the next fiber_swap() consumes
  it without yielding to the event loop. -ECANCELED is sticky once queued, so a
  concurrent async wakeup can't override a pending cancellation.

- sd_future_cancel_wait_unref() remembers the last non-zero return from its
  internal sd_fiber_await() across loop iterations and, once the future has
  finally resolved, re-queues it via sd_fiber_resume(self, ...) so the calling
  fiber's next sd_fiber_suspend() / sd_fiber_yield() observes it. Previously
  those values were discarded.

Together these let sd_future_cancel_wait_unref() forward interruptions
(cancellations / outer SD_FIBER_TIMEOUT firings) instead of silently swallowing
them.

FIBER_STATE_CANCELLED is dropped in the process: fiber_cancel() now queues
-ECANCELED through the same sd_fiber_resume() path (with an idempotency check
so fiber_on_exit()'s two-pass arm-then-dispatch cycle still works), and
fiber_swap() delivers it via the standard result_pending branch instead of a
dedicated state-machine check. Net effect: one enum value gone, the cancel path
goes through exactly the same plumbing as any other async wakeup.

Decouple unreferencing a future from resolving it: futures must now be resolved
by the time they're unreffed. To make this easier we introduce
sd_future_cancel_unref() (plus _unrefp / array variants), which cancels and
then unrefs. We use it everywhere where we can't assume we're on a fiber and
thus can't wait for the future to resolve after cancelling. All future kinds
we support today except fibers cancel synchronously, so this isn't a problem
in practice. Even fibers cancel synchronously after creation until they're
scheduled once by the event loop, after which they cancel asynchronously.

Slot-based callbacks:

Replaces the single-callback-per-future model (sd_future_set_callback,
sd_future_get_userdata, and the wait_future indirection in sd_fiber_await)
with explicit, awaiter-owned slots:

- sd_future_add_callback() returns a refcounted sd_future_slot the caller
  owns via _cleanup_(sd_future_slot_unrefp). A future can have many slots;
  dropping a slot deregisters that one callback cleanly without resolving or
  otherwise touching the future. NULL ret_slot makes the slot 'floating' —
  its lifetime is bound to the future.
- Callbacks never run inline. Each slot owns a one-shot defer event source and
  a one-shot exit event source on the future's event loop; the one applicable
  to the loop's current phase is armed when the future resolves — or at add
  time when the slot is attached to an already-RESOLVED future. Slots inherit
  the calling fiber's priority when added from inside one. Deferring
  unconditionally keeps the resolver free of reentrancy concerns (slot-set
  mutation during iteration, callbacks dropping the future's last ref) and
  frees callers from having to reason about whether a callback might fire
  before sd_future_add_callback() returns.
- sd_fiber_await() registers a slot on the target directly and drops it on
  scope exit — no more wait_future allocation, no auto-set callback on
  sd_future_new(). sd_fiber_timeout() and sd_fiber_sleep() likewise install
  their resume callback explicitly via sd_future_add_callback().
- sd_future_new_wait() / sd_future_set_callback() / sd_future_get_userdata()
  are gone; existing callers in sd-bus, qmp-client, event-future, and fiber-io
  migrate to the slot API. sd_future_resume_callback() is exposed as a shared
  helper for the common "resume my fiber when this future settles" wiring.

sd_future_new() now takes the sd_event the future belongs to, and
sd_future_get_event() exposes it. This lets the rest of the API (slot dispatch,
group child tracking, defer futures) avoid plumbing the event through
separately.

Future groups and defer futures:

Adds sd_future_group, modelled on asyncio.TaskGroup: aggregate N child futures
with a policy (default = wait-all-fail-fast; SD_FUTURE_GROUP_WAIT_ANY resolves
on the first settled child; SD_FUTURE_GROUP_IGNORE_ERRORS collects all results
without cancelling siblings on error). Policy is configured up-front via
sd_future_group_set_policy() before any child is added — once children are in
flight, the resolution mechanics are locked in. On group failure, the parent
fiber that created the group is cancelled so it observes the failure even if
it hasn't started awaiting yet — and not cancelled when it's already inside
sd_future_group_await(), nor when the parent itself is the fiber driving the
cancel. Finalize-time draining ensures the group only resolves once every
child has actually settled, even if siblings need a round-trip through the
event loop to honor a cancel.

Also adds sd_future_new_defer() for one-shot 'resolve on next event loop
iteration' futures.

Tests cover the queued-while-running behavior of sd_fiber_resume(), the
propagation guarantees of sd_future_cancel_wait_unref() for both timeouts and
external cancellations (from main and from a peer fiber), and a new
test-future-group.c with coverage for each policy, parent-cancel propagation
(including the parent-drives-cancel skip path), child-drain-on-resolve,
add-rejected-during-finalize, slot lifecycle, and the already-resolved
add_callback path.
@github-actions

github-actions Bot commented Jun 2, 2026

Copy link
Copy Markdown

Claude review of PR #15 (4924a75)

Suggestions

  • sd_fiber_await returns target result on spurious wakeupsrc/libsystemd/sd-future/fiber.c:822 — After a non-negative sd_fiber_suspend() return, sd_fiber_await() returns sd_future_result(target). If the fiber was woken by something other than target resolving, target is still PENDING and sd_future_result() hits an assert_return and returns -EBUSY, which sd_future_cancel_wait_unref() then re-queues as a bogus interruption. Consider guarding on SD_FUTURE_RESOLVED before returning the result.

Workflow run

if (r < 0)
return r;

return sd_future_result(target);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude: suggestion: After sd_fiber_suspend() returns a non-negative value, sd_fiber_await() now returns sd_future_result(target). If the fiber was woken by something other than target resolving (e.g. an unrelated sd_fiber_resume(self, 0) wired up by the caller), target is still PENDING and sd_future_result() will log an assert_return failure and return -EBUSY. The previous sd_fiber_suspend() path returned the resume value directly without this trap. This also feeds back into sd_future_cancel_wait_unref(), whose loop calls sd_fiber_await(f): a spurious non-negative wakeup turns into q = -EBUSY, which then gets re-queued onto the calling fiber as a bogus interruption. Consider guarding on sd_future_state(target) == SD_FUTURE_RESOLVED before returning its result, or returning the raw suspend value when the target hasn't resolved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant