diff --git a/Cargo.lock b/Cargo.lock
index 8c32543..aba9cfe 100644
--- a/Cargo.lock
+++ b/Cargo.lock
@@ -381,6 +381,7 @@ name = "cano-macros"
 version = "0.14.0"
 dependencies = [
  "cano",
+ "futures-util",
  "proc-macro2",
  "quote",
  "serde",
diff --git a/README.md b/README.md
index 6be4388..32e5d97 100644
--- a/README.md
+++ b/README.md
@@ -30,7 +30,7 @@ The engine is built on three core concepts: **Tasks** for logic, **Workflows** f
 ## Features
 
 - **Type-Safe State Machines**: Enum-driven transitions with compile-time guarantees.
-- **Multiple Processing Models**: `Task` for general-purpose work, plus `RouterTask`, `PollTask`, `TimerTask`, `BatchTask`, and `SteppedTask` for specialized shapes — mixed freely in one workflow.
+- **Multiple Processing Models**: `Task` for general-purpose work, plus `RouterTask`, `PollTask`, `TimerTask`, `BatchTask`, `SteppedTask`, and `StreamTask` for specialized shapes — mixed freely in one workflow.
 - **Resource Dependency Injection**: Typed, lifecycle-managed `Resources` dictionary with `setup`/`teardown`/`health` hooks, looked up by key and type, plus `#[derive(FromResources)]` for ergonomic wiring.
 - **Parallel Execution (Split/Join)**: Run tasks concurrently and join results with strategies like `All`, `Any`, `Quorum`, or `PartialResults`, with an optional bulkhead to cap concurrency.
 - **Robust Retry Logic**: Configurable strategies including exponential backoff with jitter and per-attempt timeouts.
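The `StreamTask` model this change introduces consumes items continuously, flushes them per window, and records a resumable cursor. As a rough illustration only, here is a std-only sketch of count-based windowing with a resume cursor. `run_count_windows` and `Flushed` are hypothetical names, not part of cano's API; the sketch assumes cursors increase monotonically and that a trailing partial window is flushed when the source is exhausted, which this diff does not specify for cano.

```rust
// Hypothetical sketch, not cano's API: count-based windowing with a resume cursor.
// Assumptions: each item carries a monotonically increasing cursor, and a trailing
// partial window is flushed once the source is exhausted.

#[derive(Debug, PartialEq)]
struct Flushed {
    outputs: Vec<i32>,
    /// Cursor of the last item in the window; a durable runner would persist this.
    cursor: u64,
}

/// Groups `(value, cursor)` items into windows of `size`, skipping any item whose
/// cursor is at or before `resume_after` (i.e. already covered by a persisted cursor).
fn run_count_windows<I>(items: I, size: usize, resume_after: Option<u64>) -> Vec<Flushed>
where
    I: IntoIterator<Item = (i32, u64)>,
{
    assert!(size > 0, "window size must be positive");
    let mut flushed = Vec::new();
    let mut buf: Vec<i32> = Vec::with_capacity(size);
    let mut last_cursor = 0;
    for (value, cursor) in items {
        // Resume: skip items already covered by a previously flushed window.
        if matches!(resume_after, Some(done) if cursor <= done) {
            continue;
        }
        buf.push(value);
        last_cursor = cursor;
        if buf.len() == size {
            // Window full: hand the outputs off and record how far we got.
            flushed.push(Flushed { outputs: std::mem::take(&mut buf), cursor });
        }
    }
    // Source exhausted: flush whatever is left (assumed behaviour).
    if !buf.is_empty() {
        flushed.push(Flushed { outputs: buf, cursor: last_cursor });
    }
    flushed
}
```

Skipping every item at or before `resume_after` is what lets a restart continue after the last flushed window; in this sketch, anything buffered but not yet flushed when a crash happens would simply be re-read.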
diff --git a/cano-macros/Cargo.toml b/cano-macros/Cargo.toml
index cc11692..90ee435 100644
--- a/cano-macros/Cargo.toml
+++ b/cano-macros/Cargo.toml
@@ -20,5 +20,6 @@ syn = { version = "2", features = ["full", "visit-mut"] }
 
 [dev-dependencies]
 cano = { path = "../cano" }
+futures-util = "0.3"
 serde = { version = "1", features = ["derive"] }
 tokio = { version = "1", features = ["rt", "rt-multi-thread", "macros", "time", "test-util"] }
diff --git a/cano-macros/src/lib.rs b/cano-macros/src/lib.rs
index 1947ec0..4b1d328 100644
--- a/cano-macros/src/lib.rs
+++ b/cano-macros/src/lib.rs
@@ -17,6 +17,7 @@
 //! - `#[cano::task::timer]` — for `impl TimerTask` and the `TimerTask` trait
 //! - `#[cano::task::batch]` — for `impl BatchTask` and the `BatchTask` trait
 //! - `#[cano::task::stepped]` — for `impl SteppedTask` and the `SteppedTask` trait
+//! - `#[cano::task::stream]` — for `impl StreamTask` and the `StreamTask` trait
 //! - `#[cano::saga::task]` — for `impl CompensatableTask`
 //! - `#[cano::resource]` — for `impl Resource` and the `Resource` trait
 //! - `#[cano::checkpoint_store]` — for `impl CheckpointStore` and the `CheckpointStore` trait
@@ -38,6 +39,7 @@ mod poll_task_impl;
 mod resource_derive;
 mod router_task_impl;
 mod stepped_task_impl;
+mod stream_task_impl;
 mod task_impl;
 mod timer_task_impl;
 
@@ -354,6 +356,34 @@ pub fn stepped_task(attr: TokenStream, item: TokenStream) -> TokenStream {
         .into()
 }
 
+/// Apply to the `StreamTask` trait definition, an `impl StreamTask for T`
+/// block, or an inherent `impl T { ... }` block.
+///
+/// Use as `#[cano::task::stream]`. `StreamTask` is a genuine stream-processing model:
+/// consume an `impl Stream` continuously, flush per [`StreamWindow`] window, run until
+/// the workflow's `CancellationToken` fires, and persist a resumable cursor (via
+/// [`Workflow::register_stream`]). Per-item errors are governed by [`StreamErrorPolicy`].
+///
+/// Two surface forms on impl blocks:
+///
+/// 1. **Trait-impl form:** `#[task::stream] impl StreamTask for T { type Item = ..; .. }`.
+/// 2. **Inherent-impl form:** `#[task::stream(state = S [, key = K])] impl T { async fn open(..) .. }` —
+///    the macro infers `type Item` from `process_item`'s owned `item` parameter and
+///    `type Output` / `type Cursor` from the `Ok` 2-tuple of `process_item`'s return,
+///    requires `open` / `process_item` / `flush_window` / `on_close`, and emits a
+///    companion `impl Task for T` whose `run` forwards to `StreamTask::run_in_memory`.
+///
+/// On a trait definition the macro just performs the async-fn-in-trait rewrite.
+///
+/// The default `config()` injected by the inherent form is [`TaskConfig::minimal()`]
+/// (no outer retry — like `PollTask`; an outer retry would re-invoke `open()`).
+#[proc_macro_attribute]
+pub fn stream_task(attr: TokenStream, item: TokenStream) -> TokenStream {
+    stream_task_impl::expand(attr.into(), item.into())
+        .unwrap_or_else(syn::Error::into_compile_error)
+        .into()
+}
+
 /// Derive an empty `cano::Resource` impl (uses the trait's default no-op
 /// `setup` / `teardown`).
 ///
diff --git a/cano-macros/src/stream_task_impl.rs b/cano-macros/src/stream_task_impl.rs
new file mode 100644
index 0000000..fd84130
--- /dev/null
+++ b/cano-macros/src/stream_task_impl.rs
@@ -0,0 +1,547 @@
+//! Boilerplate-filling pass for `#[cano::task::stream]` on `impl StreamTask for T`
+//! blocks and on inherent `impl T { ... }` blocks.
+//!
+//! ## Why a sibling `impl Task` is emitted
+//!
+//! Like the other specialized task traits, each use of `#[stream_task]` on an **impl**
+//! block synthesises a concrete `impl Task for T` alongside the `StreamTask`
+//! impl. `Task::run` forwards to the provided method `StreamTask::run_in_memory`, which
+//! drives the in-memory windowed loop (no cursor persistence, no cancellation). This gives
+//! the "register a `StreamTask` directly with `Workflow::register`" ergonomics without
+//! touching coherence. The durable / cancellable path is `Workflow::register_stream`.
+//!
+//! Forwarding to a single provided method (rather than inlining the loop per module
+//! prefix, as `#[task::poll]`/`#[task::stepped]` do) keeps exactly one loop body and
+//! sidesteps the HRTB "implementation is not general enough" error, because the companion
+//! `Task::run` is a hand-desugared `fn` (no `async move` binder) that simply returns the
+//! pinned future produced by `run_in_memory`.
+//!
+//! ## Type inference (inherent form)
+//!
+//! - `type Item` ← the owned third parameter of `process_item` (`item: T` → `T`).
+//! - `type Output` ← the first element of the `Ok` tuple of `process_item`'s
+//!   `Result<(Output, Cursor), _>` return.
+//! - `type Cursor` ← the second element of that tuple.
+//!
+//! ## Surface forms
+//!
+//! 1. **Trait-definition form:** `#[stream_task] pub trait StreamTask<...> { ... }` —
+//!    the macro only async-rewrites the trait.
+//! 2. **Trait-impl form:** `#[stream_task] impl StreamTask for T { type Item = ..; ... }`.
+//! 3. **Inherent-impl form:** `#[stream_task(state = S [, key = K])] impl T { async fn open(..) .. }`.
+
+use proc_macro2::TokenStream;
+use quote::quote;
+use syn::{
+    AngleBracketedGenericArguments, FnArg, GenericArgument, ImplItem, ImplItemFn, ItemImpl,
+    ItemTrait, Path, PathArguments, PathSegment, ReturnType, Type, parse2, spanned::Spanned,
+};
+
+use crate::async_rewrite;
+use crate::attr_args::{AttrArgs, combine_errors};
+use crate::path_prefix::{ModulePrefix, derive_module_prefix};
+
+/// Entry point — dispatches based on what `item` parses to.
+pub(crate) fn expand(attr: TokenStream, item: TokenStream) -> syn::Result<TokenStream> {
+    if let Ok(trait_def) = parse2::<ItemTrait>(item.clone()) {
+        if !attr.is_empty() {
+            return Err(syn::Error::new(
+                trait_def.span(),
+                "#[cano::task::stream]: no attribute args are accepted on a trait definition",
+            ));
+        }
+        let rewritten = async_rewrite::rewrite_trait_def(trait_def);
+        return Ok(quote! { #rewritten });
+    }
+
+    if let Ok(item_impl) = parse2::<ItemImpl>(item.clone()) {
+        let args = AttrArgs::parse(attr)?;
+
+        if item_impl.trait_.is_none() {
+            let state_ty = args.state.ok_or_else(|| {
+                syn::Error::new(
+                    item_impl.span(),
+                    "#[cano::task::stream] on an inherent `impl T { ... }` block requires \
+                     `state = T` (e.g. `#[task::stream(state = MyState)]`)",
+                )
+            })?;
+            return expand_inherent_impl(item_impl, state_ty, args.key);
+        } else {
+            if args.state.is_some() || args.key.is_some() {
+                return Err(syn::Error::new(
+                    item_impl.span(),
+                    "#[cano::task::stream]: `state` / `key` args only apply to inherent \
+                     `impl T { ... }` blocks; when writing `impl StreamTask<...> for T` the \
+                     trait header already specifies them",
+                ));
+            }
+            return expand_trait_impl(item_impl);
+        }
+    }
+
+    Err(syn::Error::new(
+        proc_macro2::Span::call_site(),
+        "#[cano::task::stream]: expected a trait definition or impl block",
+    ))
+}
+
+// ---------------------------------------------------------------------------
+// Trait-impl form: `#[stream_task] impl StreamTask for T { ... }`
+// ---------------------------------------------------------------------------
+
+fn expand_trait_impl(item_impl: ItemImpl) -> syn::Result<TokenStream> {
+    let (state_ty, key_ty, task_trait_path, module_prefix) =
+        extract_state_key_task_path_and_prefix(&item_impl)?;
+
+    let stream_impl = async_rewrite::rewrite_impl_block(item_impl.clone());
+
+    let task_impl = synthesise_task_impl(
+        &item_impl,
+        &state_ty,
+        key_ty.as_ref(),
+        task_trait_path,
+        &module_prefix,
+    )?;
+
+    Ok(quote! {
+        #stream_impl
+        #task_impl
+    })
+}
+
+// ---------------------------------------------------------------------------
+// Inherent-impl form: `#[stream_task(state = S [, key = K])] impl T { ... }`
+// ---------------------------------------------------------------------------
+
+fn expand_inherent_impl(
+    item_impl: ItemImpl,
+    state_ty: Type,
+    key_ty: Option<Type>,
+) -> syn::Result<TokenStream> {
+    let mut process_item_fn: Option<&ImplItemFn> = None;
+    let mut has_open = false;
+    let mut has_flush_window = false;
+    let mut has_on_close = false;
+    let mut has_window = false;
+    let mut has_on_item_error = false;
+    let mut has_config = false;
+    let mut has_name = false;
+    let mut has_item_ty = false;
+    let mut has_output_ty = false;
+    let mut has_cursor_ty = false;
+    let mut errors: Vec<syn::Error> = Vec::new();
+
+    for it in &item_impl.items {
+        match it {
+            ImplItem::Fn(f) => match f.sig.ident.to_string().as_str() {
+                "open" => has_open = true,
+                "process_item" => process_item_fn = Some(f),
+                "flush_window" => has_flush_window = true,
+                "on_close" => has_on_close = true,
+                "window" => has_window = true,
+                "on_item_error" => has_on_item_error = true,
+                "config" => has_config = true,
+                "name" => has_name = true,
+                other => {
+                    errors.push(syn::Error::new_spanned(
+                        &f.sig.ident,
+                        format!(
+                            "#[cano::task::stream]: unexpected method `{other}` in inherent impl; \
+                             allowed: `open`, `process_item`, `flush_window`, `on_close`, \
+                             `window`, `on_item_error`, `config`, `name`"
+                        ),
+                    ));
+                }
+            },
+            ImplItem::Type(t) => match t.ident.to_string().as_str() {
+                "Item" => has_item_ty = true,
+                "Output" => has_output_ty = true,
+                "Cursor" => has_cursor_ty = true,
+                other => {
+                    errors.push(syn::Error::new_spanned(
+                        &t.ident,
+                        format!(
+                            "#[cano::task::stream]: unexpected associated type `{other}`; \
+                             only `Item`, `Output`, `Cursor` are recognised (and can be inferred)"
+                        ),
+                    ));
+                }
+            },
+            _ => {}
+        }
+    }
+
+    if !has_open {
+        errors.push(syn::Error::new(
+            item_impl.span(),
+            "#[cano::task::stream] requires an `async fn open(&self, res: &Resources<_>, \
+             cursor: Option<_>) -> Result<Pin<Box<dyn Stream<Item = _> + Send>>, CanoError>` method",
+        ));
+    }
+    if process_item_fn.is_none() {
+        errors.push(syn::Error::new(
+            item_impl.span(),
+            "#[cano::task::stream] requires an `async fn process_item(&self, res: &Resources<_>, \
+             item: T) -> Result<(Output, Cursor), CanoError>` method",
+        ));
+    }
+    if !has_flush_window {
+        errors.push(syn::Error::new(
+            item_impl.span(),
+            "#[cano::task::stream] requires an `async fn flush_window(&self, res: &Resources<_>, \
+             outputs: Vec<_>) -> Result<WindowSignal<_>, CanoError>` method",
+        ));
+    }
+    if !has_on_close {
+        errors.push(syn::Error::new(
+            item_impl.span(),
+            "#[cano::task::stream] requires an `async fn on_close(&self, res: &Resources<_>, \
+             reason: CloseReason) -> Result<TaskResult<_>, CanoError>` method",
+        ));
+    }
+
+    if !errors.is_empty() {
+        return Err(combine_errors(errors));
+    }
+
+    let process_item = process_item_fn.unwrap();
+
+    // Infer Item / Output / Cursor from `process_item`.
+    let item_inj = if has_item_ty {
+        None
+    } else {
+        let inferred = peel_owned_param(process_item, "process_item")?;
+        Some(quote!(type Item = #inferred;))
+    };
+    let (output_inj, cursor_inj) = if has_output_ty && has_cursor_ty {
+        (None, None)
+    } else {
+        let (out_ty, cur_ty) = peel_result_tuple_return(process_item, "process_item")?;
+        let out = if has_output_ty {
+            None
+        } else {
+            Some(quote!(type Output = #out_ty;))
+        };
+        let cur = if has_cursor_ty {
+            None
+        } else {
+            Some(quote!(type Cursor = #cur_ty;))
+        };
+        (out, cur)
+    };
+
+    let stream_trait_ref: syn::Path = match &key_ty {
+        Some(k) => syn::parse_quote!(::cano::StreamTask<#state_ty, #k>),
+        None => syn::parse_quote!(::cano::StreamTask<#state_ty>),
+    };
+
+    let task_trait_path: syn::Path = match &key_ty {
+        Some(k) => syn::parse_quote!(::cano::Task<#state_ty, #k>),
+        None => syn::parse_quote!(::cano::Task<#state_ty>),
+    };
+
+    let attrs = &item_impl.attrs;
+    let unsafety = &item_impl.unsafety;
+    let generics = &item_impl.generics;
+    let where_clause = &item_impl.generics.where_clause;
+    let self_ty = &item_impl.self_ty;
+    let user_items = &item_impl.items;
+
+    let window_default = (!has_window).then(|| {
+        quote! {
+            fn window(&self) -> ::cano::StreamWindow {
+                ::cano::StreamWindow::Count(1)
+            }
+        }
+    });
+    let on_item_error_default = (!has_on_item_error).then(|| {
+        quote! {
+            fn on_item_error(&self) -> ::cano::StreamErrorPolicy {
+                ::cano::StreamErrorPolicy::FailFast
+            }
+        }
+    });
+    // Streams default to no outer retry (like PollTask): an outer retry would
+    // re-invoke `open()` and re-consume the stream.
+    let config_default = (!has_config).then(|| {
+        quote! {
+            fn config(&self) -> ::cano::TaskConfig {
+                ::cano::TaskConfig::minimal()
+            }
+        }
+    });
+    let name_default = (!has_name).then(|| {
+        quote! {
+            fn name(&self) -> ::std::borrow::Cow<'static, str> {
+                ::std::borrow::Cow::Borrowed(::std::any::type_name::<Self>())
+            }
+        }
+    });
+
+    let synth = quote! {
+        #(#attrs)*
+        #unsafety impl #generics #stream_trait_ref for #self_ty #where_clause {
+            #item_inj
+            #output_inj
+            #cursor_inj
+            #window_default
+            #on_item_error_default
+            #config_default
+            #name_default
+            #(#user_items)*
+        }
+    };
+
+    let synth_impl: ItemImpl = parse2(synth)?;
+    let stream_impl = async_rewrite::rewrite_impl_block(synth_impl.clone());
+
+    let module_prefix = ModulePrefix::Cano;
+    let task_impl = synthesise_task_impl(
+        &synth_impl,
+        &state_ty,
+        key_ty.as_ref(),
+        task_trait_path,
+        &module_prefix,
+    )?;
+
+    Ok(quote! {
+        #stream_impl
+        #task_impl
+    })
+}
+
+// ---------------------------------------------------------------------------
+// Helpers
+// ---------------------------------------------------------------------------
+
+fn extract_state_key_task_path_and_prefix(
+    item_impl: &ItemImpl,
+) -> syn::Result<(Type, Option<Type>, Path, ModulePrefix)> {
+    let (_, trait_path, _) = item_impl
+        .trait_
+        .as_ref()
+        .ok_or_else(|| syn::Error::new(item_impl.span(), "expected a trait impl block"))?;
+
+    let last_seg = trait_path
+        .segments
+        .last()
+        .ok_or_else(|| syn::Error::new(item_impl.span(), "cannot read trait path segments"))?;
+
+    let args = match &last_seg.arguments {
+        syn::PathArguments::AngleBracketed(a) => a,
+        _ => {
+            return Err(syn::Error::new(
+                item_impl.span(),
+                "StreamTask impl must have angle-bracketed type arguments \
+                 (e.g. `StreamTask<MyState>`)",
+            ));
+        }
+    };
+
+    let type_args: Vec<&Type> = args
+        .args
+        .iter()
+        .filter_map(|a| {
+            if let syn::GenericArgument::Type(t) = a {
+                Some(t)
+            } else {
+                None
+            }
+        })
+        .collect();
+
+    if type_args.is_empty() {
+        return Err(syn::Error::new(
+            item_impl.span(),
+            "StreamTask impl requires at least one type argument (the state type)",
+        ));
+    }
+
+    let state_ty = type_args[0].clone();
+    let key_ty = type_args.get(1).map(|t| (*t).clone());
+
+    let module_prefix = derive_module_prefix(trait_path);
+    let task_path = derive_task_path_from_stream_path(trait_path, &state_ty, key_ty.as_ref())?;
+
+    Ok((state_ty, key_ty, task_path, module_prefix))
+}
+
+fn derive_task_path_from_stream_path(
+    stream_path: &Path,
+    state_ty: &Type,
+    key_ty: Option<&Type>,
+) -> syn::Result<Path> {
+    let mut task_path = stream_path.clone();
+
+    let angle_args: AngleBracketedGenericArguments = match key_ty {
+        Some(k) => syn::parse_quote!(<#state_ty, #k>),
+        None => syn::parse_quote!(<#state_ty>),
+    };
+    let task_args = PathArguments::AngleBracketed(angle_args);
+
+    if let Some(last) = task_path.segments.last_mut() {
+        *last = PathSegment {
+            ident: syn::Ident::new("Task", last.ident.span()),
+            arguments: task_args,
+        };
+    }
+
+    Ok(task_path)
+}
+
+/// Synthesise `impl Task for T` whose `Task::run` forwards to
+/// `StreamTask::run_in_memory`, and whose `config`/`name` forward to
+/// `StreamTask::config`/`StreamTask::name` via UFCS.
+///
+/// `Task::run` is written as a hand-desugared `fn` (not `async fn`) so that no
+/// `for<'async_trait>` binder is introduced; it simply returns the already-pinned
+/// future produced by `run_in_memory`. This is uniform across all module prefixes.
+fn synthesise_task_impl(
+    stream_impl: &ItemImpl,
+    state_ty: &Type,
+    key_ty: Option<&Type>,
+    task_trait_path: Path,
+    module_prefix: &ModulePrefix,
+) -> syn::Result<TokenStream> {
+    let attrs = &stream_impl.attrs;
+    let generics = &stream_impl.generics;
+    let where_clause = &stream_impl.generics.where_clause;
+    let self_ty = &stream_impl.self_ty;
+
+    let (_, stream_trait_path, _) = stream_impl.trait_.as_ref().ok_or_else(|| {
+        syn::Error::new(stream_impl.span(), "expected a StreamTask trait impl block")
+    })?;
+
+    let task_config_ty = module_prefix.qualify("TaskConfig");
+    let resources_ty = module_prefix.qualify("Resources");
+    let task_result_ty = module_prefix.qualify("TaskResult");
+    let cano_error_ty = module_prefix.qualify("CanoError");
+
+    let key_ty_tok: TokenStream = match key_ty {
+        Some(k) => quote! { #k },
+        None => quote! { ::std::borrow::Cow<'static, str> },
+    };
+
+    let synth = quote! {
+        #(#attrs)*
+        impl #generics #task_trait_path for #self_ty #where_clause {
+            fn config(&self) -> #task_config_ty {
+                <Self as #stream_trait_path>::config(self)
+            }
+            fn name(&self) -> ::std::borrow::Cow<'static, str> {
+                <Self as #stream_trait_path>::name(self)
+            }
+            fn run<'life0, 'life1, 'async_trait>(
+                &'life0 self,
+                res: &'life1 #resources_ty<#key_ty_tok>,
+            ) -> ::core::pin::Pin<::std::boxed::Box<
+                dyn ::core::future::Future<
+                    Output = ::std::result::Result<#task_result_ty<#state_ty>, #cano_error_ty>
+                > + ::core::marker::Send + 'async_trait
+            >>
+            where
+                'life0: 'async_trait,
+                'life1: 'async_trait,
+                Self: ::core::marker::Sync + 'async_trait,
+            {
+                <Self as #stream_trait_path>::run_in_memory(self, res)
+            }
+        }
+    };
+
+    parse2(synth).map(|impl_block: ItemImpl| quote! { #impl_block })
+}
+
+// ---------------------------------------------------------------------------
+// Type-inference helpers
+// ---------------------------------------------------------------------------
+
+/// Return the type of the owned third parameter of `process_item`
+/// (`&self`, `res`, then `item: T` → `T`). The type is returned verbatim — no
+/// reference is peeled (stream items are owned, unlike `BatchTask::process_item`).
+fn peel_owned_param(f: &ImplItemFn, fn_name: &str) -> syn::Result<Type> {
+    let third = f.sig.inputs.iter().nth(2).ok_or_else(|| {
+        syn::Error::new_spanned(
+            &f.sig,
+            format!(
+                "#[cano::task::stream]: `{fn_name}` must have a third parameter \
+                 `item: T` (an owned type) from which `type Item` can be inferred"
+            ),
+        )
+    })?;
+
+    match third {
+        FnArg::Typed(pt) => Ok((*pt.ty).clone()),
+        FnArg::Receiver(_) => Err(syn::Error::new_spanned(
+            third,
+            format!(
+                "#[cano::task::stream]: `{fn_name}` third parameter must be a typed argument, not `self`"
+            ),
+        )),
+    }
+}
+
+/// Extract `(Output, Cursor)` from the `Ok` 2-tuple of `process_item`'s
+/// `Result<(Output, Cursor), _>` return type.
+fn peel_result_tuple_return(f: &ImplItemFn, fn_name: &str) -> syn::Result<(Type, Type)> {
+    let ret_ty = match &f.sig.output {
+        ReturnType::Type(_, t) => &**t,
+        ReturnType::Default => {
+            return Err(syn::Error::new_spanned(
+                &f.sig,
+                format!(
+                    "#[cano::task::stream]: `{fn_name}` must return \
+                     `Result<(Output, Cursor), CanoError>` so `type Output` / `type Cursor` \
+                     can be inferred"
+                ),
+            ));
+        }
+    };
+
+    let Type::Path(tp) = ret_ty else {
+        return Err(tuple_return_error(fn_name, ret_ty));
+    };
+    let last = tp
+        .path
+        .segments
+        .last()
+        .ok_or_else(|| tuple_return_error(fn_name, ret_ty))?;
+    if last.ident != "Result" && last.ident != "CanoResult" {
+        return Err(tuple_return_error(fn_name, ret_ty));
+    }
+    let PathArguments::AngleBracketed(args) = &last.arguments else {
+        return Err(tuple_return_error(fn_name, ret_ty));
+    };
+    let ok_ty = args
+        .args
+        .iter()
+        .find_map(|a| match a {
+            GenericArgument::Type(t) => Some(t),
+            _ => None,
+        })
+        .ok_or_else(|| tuple_return_error(fn_name, ret_ty))?;
+
+    let Type::Tuple(tuple) = ok_ty else {
+        return Err(tuple_return_error(fn_name, ret_ty));
+    };
+    if tuple.elems.len() != 2 {
+        return Err(tuple_return_error(fn_name, ret_ty));
+    }
+    let mut it = tuple.elems.iter();
+    let output_ty = it.next().unwrap().clone();
+    let cursor_ty = it.next().unwrap().clone();
+    Ok((output_ty, cursor_ty))
+}
+
+fn tuple_return_error(fn_name: &str, ret_ty: &Type) -> syn::Error {
+    syn::Error::new_spanned(
+        ret_ty,
+        format!(
+            "#[cano::task::stream]: `{fn_name}` must return \
+             `Result<(Output, Cursor), CanoError>` (a 2-tuple `Ok` type) so that \
+             `type Output` / `type Cursor` can be inferred; found `{}`. Write explicit \
+             `type Output = ..;` / `type Cursor = ..;` lines if the return shape differs.",
+            quote! { #ret_ty }
+        ),
+    )
+}
diff --git a/cano-macros/tests/stream_task_impl.rs b/cano-macros/tests/stream_task_impl.rs
new file mode 100644
index 0000000..06a6e4d
--- /dev/null
+++ b/cano-macros/tests/stream_task_impl.rs
@@ -0,0 +1,219 @@
+//! Integration tests for `#[cano::task::stream]` — the inherent form (which emits
+//! `::cano::` paths, so it only resolves *outside* the `cano` crate), the `key =` form,
+//! and the explicit trait-impl form. Exercises the companion `impl Task` and the
+//! engine-driven `register_stream` path.
+
+use cano::prelude::*;
+use futures_util::{Stream, stream};
+use std::pin::Pin;
+use std::sync::Arc;
+
+#[derive(Debug, Clone, PartialEq, Eq, Hash)]
+enum Step {
+    Consume,
+    Done,
+}
+
+#[derive(Debug, Clone, PartialEq, Eq, Hash)]
+enum Key {
+    Store,
+}
+
+// 1. Inherent form with inferred Item / Output / Cursor and default window / config / name.
+struct InherentInferred;
+
+#[task::stream(state = Step)]
+impl InherentInferred {
+    async fn open(
+        &self,
+        _res: &Resources,
+        _cursor: Option<u64>,
+    ) -> Result<Pin<Box<dyn Stream<Item = u32> + Send>>, CanoError> {
+        Ok(Box::pin(stream::iter(vec![1u32, 2, 3])) as Pin<Box<dyn Stream<Item = u32> + Send>>)
+    }
+
+    async fn process_item(&self, _res: &Resources, item: u32) -> Result<(String, u64), CanoError> {
+        Ok((format!("v={item}"), item as u64))
+    }
+
+    async fn flush_window(
+        &self,
+        _res: &Resources,
+        outputs: Vec<String>,
+    ) -> Result<WindowSignal<Step>, CanoError> {
+        // Default window is Count(1): one item per window.
+        assert_eq!(outputs.len(), 1);
+        Ok(WindowSignal::Continue)
+    }
+
+    async fn on_close(
+        &self,
+        _res: &Resources,
+        reason: CloseReason,
+    ) -> Result<TaskResult<Step>, CanoError> {
+        assert_eq!(reason, CloseReason::Exhausted);
+        Ok(TaskResult::Single(Step::Done))
+    }
+}
+
+#[test]
+fn inherent_default_window_is_count_one() {
+    assert_eq!(
+        StreamTask::<Step>::window(&InherentInferred),
+        StreamWindow::Count(1)
+    );
+}
+
+#[test]
+fn inherent_default_error_policy_is_fail_fast() {
+    assert_eq!(
+        StreamTask::<Step>::on_item_error(&InherentInferred),
+        StreamErrorPolicy::FailFast
+    );
+}
+
+#[test]
+fn inherent_default_config_is_minimal() {
+    // TaskConfig::minimal() → 1 attempt (no retries).
+    assert_eq!(Task::config(&InherentInferred).retry_mode.max_attempts(), 1);
+}
+
+#[test]
+fn inherent_default_name_contains_type_name() {
+    assert!(StreamTask::<Step>::name(&InherentInferred).contains("InherentInferred"));
+}
+
+#[tokio::test]
+async fn inherent_runs_via_register_stream() {
+    let workflow = Workflow::bare()
+        .register_stream(Step::Consume, InherentInferred)
+        .add_exit_state(Step::Done);
+    let result = workflow
+        .orchestrate(Step::Consume, CancellationToken::disabled())
+        .await
+        .unwrap();
+    assert_eq!(result, Step::Done);
+}
+
+#[tokio::test]
+async fn inherent_runs_via_register_in_memory() {
+    let res = Resources::new();
+    let result = Task::run(&InherentInferred, &res).await.unwrap();
+    assert_eq!(result, TaskResult::Single(Step::Done));
+}
+
+// 2. Inherent form with `key = Key` and an overridden window.
+struct InherentWithKey;
+
+#[task::stream(state = Step, key = Key)]
+impl InherentWithKey {
+    fn window(&self) -> StreamWindow {
+        StreamWindow::Count(2)
+    }
+
+    async fn open(
+        &self,
+        res: &Resources<Key>,
+        _cursor: Option<u64>,
+    ) -> Result<Pin<Box<dyn Stream<Item = u32> + Send>>, CanoError> {
+        // Exercise the enum-keyed resource lookup (constructs `Key::Store`).
+        let _ = res.get::(&Key::Store);
+        Ok(Box::pin(stream::iter(vec![10u32, 20, 30, 40]))
+            as Pin<Box<dyn Stream<Item = u32> + Send>>)
+    }
+
+    async fn process_item(
+        &self,
+        _res: &Resources<Key>,
+        item: u32,
+    ) -> Result<(u32, u64), CanoError> {
+        Ok((item * 10, item as u64))
+    }
+
+    async fn flush_window(
+        &self,
+        _res: &Resources<Key>,
+        outputs: Vec<u32>,
+    ) -> Result<WindowSignal<Step>, CanoError> {
+        assert_eq!(outputs.len(), 2);
+        Ok(WindowSignal::Continue)
+    }
+
+    async fn on_close(
+        &self,
+        _res: &Resources<Key>,
+        _reason: CloseReason,
+    ) -> Result<TaskResult<Step>, CanoError> {
+        Ok(TaskResult::Single(Step::Done))
+    }
+}
+
+#[tokio::test]
+async fn inherent_with_key_runs() {
+    let resources = Resources::<Key>::new();
+    let result = Task::run(&InherentWithKey, &resources).await.unwrap();
+    assert_eq!(result, TaskResult::Single(Step::Done));
+}
+
+// 3. Trait-impl form with explicit associated types + an explicit `on_item_error`.
+struct TraitStream;
+
+#[task::stream]
+impl StreamTask<Step> for TraitStream {
+    type Item = u32;
+    type Output = u32;
+    type Cursor = u64;
+
+    fn on_item_error(&self) -> StreamErrorPolicy {
+        StreamErrorPolicy::SkipAndContinue
+    }
+
+    async fn open(
+        &self,
+        _res: &Resources,
+        _cursor: Option<Self::Cursor>,
+    ) -> Result<Pin<Box<dyn Stream<Item = Self::Item> + Send>>, CanoError> {
+        Ok(Box::pin(stream::iter(vec![5u32])) as Pin<Box<dyn Stream<Item = u32> + Send>>)
+    }
+
+    async fn process_item(
+        &self,
+        _res: &Resources,
+        item: Self::Item,
+    ) -> Result<(Self::Output, Self::Cursor), CanoError> {
+        Ok((item + 1, item as u64))
+    }
+
+    async fn flush_window(
+        &self,
+        _res: &Resources,
+        outputs: Vec<Self::Output>,
+    ) -> Result<WindowSignal<Step>, CanoError> {
+        assert_eq!(outputs, vec![6u32]);
+        Ok(WindowSignal::Continue)
+    }
+
+    async fn on_close(
+        &self,
+        _res: &Resources,
+        _reason: CloseReason,
+    ) -> Result<TaskResult<Step>, CanoError> {
+        Ok(TaskResult::Single(Step::Done))
+    }
+}
+
+#[test]
+fn trait_form_error_policy_overridden() {
+    assert_eq!(
+        StreamTask::<Step>::on_item_error(&TraitStream),
+        StreamErrorPolicy::SkipAndContinue
+    );
+}
+
+#[tokio::test]
+async fn trait_form_runs_and_is_dyn_task() {
+    let task: Arc<dyn Task<Step>> = Arc::new(TraitStream);
+    let res = Resources::new();
+    let result = Task::run(task.as_ref(), &res).await.unwrap();
+    assert_eq!(result, TaskResult::Single(Step::Done));
+}
diff --git a/cano/examples/stream_task.rs b/cano/examples/stream_task.rs
new file mode 100644
index 0000000..c50bb92
--- /dev/null
+++ b/cano/examples/stream_task.rs
@@ -0,0 +1,204 @@
+//! # StreamTask — Flipping Between Processing Modes Mid-Stream
+//!
+//! Run with: `cargo run --example stream_task`
+//!
+//! A more involved [`StreamTask`] demo: one continuous source of temperature
+//! readings is consumed by a workflow that **flips between two processing modes**
+//! depending on the data, processing readings differently in each:
+//!
+//! - **`Calm`** — batches readings into windows of 3 and reports the window
+//!   *average* (cheap, aggregate processing). If any reading in a window is too hot
+//!   (`>= HIGH`), it returns [`WindowSignal::Stop`] to flip the FSM to `Alert`.
+//! - **`Alert`** — processes readings *one at a time* (`StreamWindow::Count(1)`),
+//!   reacting to each individually. Once a reading cools off (`<= COOL`, hysteresis),
+//!   it flips back to `Calm`.
+//!
+//! The two modes are separate [`StreamTask`]s registered at different FSM states,
+//! but they pull from the **same shared source** held in [`Resources`]: each mode's
+//! `open()` builds a lazy stream that pops one reading at a time, so flipping modes
+//! (which drops the current stream) leaves the unconsumed readings in place for the
+//! next mode to continue from. The run ends when the source is exhausted.
+//!
+//! ```text
+//! Calm ⇄ Alert ⇄ Calm … ──(source exhausted)──► Done
+//! ```
+//!
+//! Registering these with [`Workflow::register_stream`] + a checkpoint store would
+//! additionally persist each window's cursor for crash-resume (omitted here to keep
+//! the example focused on mode-flipping and runnable under default features).
+
+use cano::prelude::*;
+use futures_util::{Stream, stream};
+use std::collections::VecDeque;
+use std::pin::Pin;
+use std::sync::Mutex;
+
+/// Flip to `Alert` when a calm window contains a reading at or above this.
+const HIGH: i32 = 80;
+/// Flip back to `Calm` when an alert reading drops to or below this (hysteresis).
+const COOL: i32 = 70;
+
+#[derive(Debug, Clone, PartialEq, Eq, Hash)]
+enum Mode {
+    Calm,
+    Alert,
+    Done,
+}
+
+#[derive(Debug, Clone)]
+struct Reading {
+    seq: u64,
+    temp: i32,
+}
+
+/// The shared event source — both modes pop from this same queue, so consumption
+/// continues seamlessly across mode flips.
+struct Source {
+    queue: Mutex<VecDeque<Reading>>,
+}
+
+#[resource]
+impl Resource for Source {}
+
+/// Build a lazy stream that pops one reading at a time from the shared `Source`.
+/// Dropping this stream (when the FSM flips modes) leaves the rest in the queue.
+fn open_source(source: std::sync::Arc<Source>) -> Pin<Box<dyn Stream<Item = Reading> + Send>> {
+    Box::pin(stream::unfold(source, |source| async move {
+        let next = source.queue.lock().unwrap().pop_front();
+        next.map(|reading| (reading, source))
+    }))
+}
+
+// ---------------------------------------------------------------------------
+// Calm mode — windowed averaging; flips to Alert on a hot window.
+// ---------------------------------------------------------------------------
+
+struct CalmMode;
+
+#[task::stream(state = Mode)]
+impl CalmMode {
+    fn window(&self) -> StreamWindow {
+        StreamWindow::Count(3)
+    }
+
+    async fn open(
+        &self,
+        res: &Resources,
+        _cursor: Option<u64>,
+    ) -> Result<Pin<Box<dyn Stream<Item = Reading> + Send>>, CanoError> {
+        Ok(open_source(res.get::("source")?))
+    }
+
+    async fn process_item(&self, _res: &Resources, item: Reading) -> Result<(i32, u64), CanoError> {
+        Ok((item.temp, item.seq))
+    }
+
+    async fn flush_window(
+        &self,
+        _res: &Resources,
+        temps: Vec<i32>,
+    ) -> Result<WindowSignal<Mode>, CanoError> {
+        let avg = temps.iter().sum::<i32>() as f64 / temps.len() as f64;
+        let hottest = *temps.iter().max().unwrap();
+        println!("calm : window {temps:?} → avg {avg:.1}°");
+        if hottest >= HIGH {
+            println!("calm : {hottest}° ≥ {HIGH}° — flipping to ALERT");
+            return Ok(WindowSignal::Stop(TaskResult::Single(Mode::Alert)));
+        }
+        Ok(WindowSignal::Continue)
+    }
+
+    async fn on_close(
+        &self,
+        _res: &Resources,
+        reason: CloseReason,
+    ) -> Result<TaskResult<Mode>, CanoError> {
+        println!("calm : source closed ({reason:?}) → Done");
+        Ok(TaskResult::Single(Mode::Done))
+    }
+}
+
+// ---------------------------------------------------------------------------
+// Alert mode — per-reading handling; flips back to Calm once it cools off.
+// ---------------------------------------------------------------------------
+
+struct AlertMode;
+
+#[task::stream(state = Mode)]
+impl AlertMode {
+    fn window(&self) -> StreamWindow {
+        // One reading per window — react to each individually.
+        StreamWindow::Count(1)
+    }
+
+    async fn open(
+        &self,
+        res: &Resources,
+        _cursor: Option<u64>,
+    ) -> Result<Pin<Box<dyn Stream<Item = Reading> + Send>>, CanoError> {
+        Ok(open_source(res.get::("source")?))
+    }
+
+    async fn process_item(&self, _res: &Resources, item: Reading) -> Result<(i32, u64), CanoError> {
+        Ok((item.temp, item.seq))
+    }
+
+    async fn flush_window(
+        &self,
+        _res: &Resources,
+        temps: Vec<i32>,
+    ) -> Result<WindowSignal<Mode>, CanoError> {
+        let temp = temps[0];
+        let over = temp - HIGH;
+        println!("alert : reading {temp}° ({over:+}° vs HIGH) — handling individually");
+        if temp <= COOL {
+            println!("alert : {temp}° ≤ {COOL}° — cooled off, flipping back to CALM");
+            return Ok(WindowSignal::Stop(TaskResult::Single(Mode::Calm)));
+        }
+        Ok(WindowSignal::Continue)
+    }
+
+    async fn on_close(
+        &self,
+        _res: &Resources,
+        reason: CloseReason,
+    ) -> Result<TaskResult<Mode>, CanoError> {
+        println!("alert : source closed ({reason:?}) → Done");
+        Ok(TaskResult::Single(Mode::Done))
+    }
+}
+
+#[tokio::main]
+async fn main() -> CanoResult<()> {
+    println!("=== stream_task example: flipping between processing modes ===\n");
+
+    // A synthetic temperature stream that heats up and cools off twice.
+    let temps = [70, 71, 73, 85, 88, 76, 71, 90, 68, 73, 74, 95, 88, 60];
+    let readings: VecDeque<Reading> = temps
+        .iter()
+        .enumerate()
+        .map(|(i, &temp)| Reading {
+            seq: i as u64,
+            temp,
+        })
+        .collect();
+
+    let resources = Resources::new().insert(
+        "source",
+        Source {
+            queue: Mutex::new(readings),
+        },
+    );
+
+    let workflow = Workflow::new(resources)
+        .register_stream(Mode::Calm, CalmMode)
+        .register_stream(Mode::Alert, AlertMode)
+        .add_exit_state(Mode::Done);
+
+    let result = workflow
+        .orchestrate(Mode::Calm, CancellationToken::disabled())
+        .await?;
+    assert_eq!(result, Mode::Done);
+    println!("\ncompleted at {result:?}");
+    Ok(())
+}
diff --git a/cano/src/lib.rs b/cano/src/lib.rs
index 3fbc78b..acf2184 100644
--- a/cano/src/lib.rs
+++ b/cano/src/lib.rs
@@ -125,10 +125,17 @@
 //!   [`Workflow::register`]. A resumed run re-runs `wait()`, so the delay restarts after a crash.
 //! - [`BatchTask`] trait: fan-out over data items via `load`/`process_item`/`finish`.
 //! - [`SteppedTask`] trait: resumable iterative work via `step()` with a serializable cursor.
-//!
-//! Every [`RouterTask`], [`PollTask`], [`TimerTask`], [`BatchTask`], and [`SteppedTask`]
-//! automatically implements [`Task`] via companion impls emitted by the `#[task::router]`,
-//! `#[task::poll]`, `#[task::timer]`, `#[task::batch]`, and `#[task::stepped]` macros respectively.
+//! - [`StreamTask`] trait: a genuine stream-processing model — consume an `impl Stream`
+//!   continuously via `open`/`process_item`/`flush_window`/`on_close`, flush per
+//!   [`StreamWindow`] window, run until the [`CancellationToken`] fires, and resume from a
+//!   persisted cursor. Registered with [`Workflow::register_stream`]; per-item errors are
+//!   governed by [`StreamErrorPolicy`]. (Plain [`Workflow::register`] runs an in-memory,
+//!   non-durable, non-cancellable companion loop.)
+//!
+//! Every [`RouterTask`], [`PollTask`], [`TimerTask`], [`BatchTask`], [`SteppedTask`], and
+//! [`StreamTask`] automatically implements [`Task`] via companion impls emitted by the
+//! `#[task::router]`, `#[task::poll]`, `#[task::timer]`, `#[task::batch]`, `#[task::stepped]`,
+//! and `#[task::stream]` macros respectively.
 //!
 //! ### Parallel Execution (Split/Join)
 //!
@@ -227,6 +234,7 @@
 //! - [`task::timer`]: The [`TimerTask`] trait — wait-then-transition via `wait()`/`after_wait()`; registered with [`Workflow::register`]
 //! - [`task::batch`]: The [`BatchTask`] trait — fan-out over data items via `load`/`process_item`/`finish`; registered with [`Workflow::register`]
 //! - [`task::stepped`]: The [`SteppedTask`] trait — resumable iterative work via `step()` with a serializable cursor; registered with [`Workflow::register_stepped`] (persists the cursor when a checkpoint store is attached)
+//! - [`task::stream`]: The [`StreamTask`] trait — a genuine stream-processing model: consume an `impl Stream` continuously, flush per [`StreamWindow`] window, run until the [`CancellationToken`] fires, and resume from a persisted cursor; registered with [`Workflow::register_stream`]
 //! - [`cancel`]: [`CancellationToken`] / [`CancellationHandle`] — cooperative cancellation for [`orchestrate`](Workflow::orchestrate)
 //! - [`workflow`]: [`Workflow`] — FSM orchestration with Split/Join support
 //! - `scheduler` (requires `scheduler` feature): `Scheduler` (builder) and `RunningScheduler` (live handle) — cron and interval scheduling
@@ -296,6 +304,7 @@ pub use task::router::{DynRouterTask, RouterTask, RouterTaskObject};
 pub use task::stepped::{
     DefaultStepCursor, DynSteppedTask, StepOutcome, SteppedTask, SteppedTaskObject, run_stepped,
 };
+pub use task::stream::{CloseReason, StreamErrorPolicy, StreamTask, StreamWindow, WindowSignal};
 pub use task::timer::{DynTimerTask, TimerOutcome, TimerTask, TimerTaskObject, run_timer};
 
 #[cfg(feature = "recovery")]
@@ -326,6 +335,7 @@ pub use scheduler::{BackoffPolicy, FlowInfo, RunningScheduler, Schedule, Schedul
 /// - `#[cano::task::timer]` — for `impl TimerTask` blocks
 /// - `#[cano::task::batch]` — for `impl BatchTask` blocks
 /// - `#[cano::task::stepped]` — for `impl SteppedTask` blocks
+/// - `#[cano::task::stream]` — for `impl StreamTask` blocks
 /// - `#[cano::saga::task]` — for `impl CompensatableTask` blocks
 ///
 /// [`cano-macros`]: https://docs.rs/cano-macros
@@ -369,13 +379,14 @@ pub mod prelude {
 
     pub use crate::{
         BatchTask, CancellationHandle, CancellationToken, CanoError, CanoResult, CheckpointRow,
-        CheckpointStore, CircuitBreaker, CircuitPermit, CircuitPolicy, CircuitState,
+        CheckpointStore, CircuitBreaker, CircuitPermit, CircuitPolicy, CircuitState, CloseReason,
         CompensatableTask, HealthStatus, JoinConfig, JoinStrategy, MemoryStore, Meter,
         MeterStatus, MultiPermit, MultiRateLimiter, PollErrorPolicy, PollOutcome,
PollTask, RateLimiter, RateLimiterPermit, RateLimiterPolicy, Reservation, Resource, Resources, RetryMode, RouterTask, RowKind, SplitResult, SplitTaskResult, StateEntry, StepOutcome, SteppedTask, - Task, TaskConfig, TaskObject, TaskResult, Tier, TimerOutcome, TimerTask, WindowPermit, - WindowPolicy, WindowedRateLimiter, Workflow, WorkflowObserver, run_stepped, + StreamErrorPolicy, StreamTask, StreamWindow, Task, TaskConfig, TaskObject, TaskResult, + Tier, TimerOutcome, TimerTask, WindowPermit, WindowPolicy, WindowSignal, + WindowedRateLimiter, Workflow, WorkflowObserver, run_stepped, }; #[cfg(feature = "scheduler")] diff --git a/cano/src/metrics.rs b/cano/src/metrics.rs index 60a1b1c..16f21ab 100644 --- a/cano/src/metrics.rs +++ b/cano/src/metrics.rs @@ -118,6 +118,9 @@ pub const CIRCUIT_OUTCOMES_TOTAL: &str = "cano_circuit_outcomes_total"; pub const POLL_ITERATIONS_TOTAL: &str = "cano_poll_iterations_total"; pub const BATCH_RUNS_TOTAL: &str = "cano_batch_runs_total"; pub const BATCH_ITEMS_TOTAL: &str = "cano_batch_items_total"; +pub const STREAM_RUNS_TOTAL: &str = "cano_stream_runs_total"; +pub const STREAM_WINDOWS_TOTAL: &str = "cano_stream_windows_total"; +pub const STREAM_ITEMS_TOTAL: &str = "cano_stream_items_total"; pub const STEP_ITERATIONS_TOTAL: &str = "cano_step_iterations_total"; pub const CHECKPOINT_APPENDS_TOTAL: &str = "cano_checkpoint_appends_total"; pub const CHECKPOINT_CLEARS_TOTAL: &str = "cano_checkpoint_clears_total"; @@ -266,6 +269,21 @@ pub fn describe() { Unit::Count, "BatchTask items processed, by result (ok|err)" ); + describe_counter!( + STREAM_RUNS_TOTAL, + Unit::Count, + "StreamTask runs, by outcome (completed|cancelled|failed)" + ); + describe_counter!( + STREAM_WINDOWS_TOTAL, + Unit::Count, + "StreamTask windows flushed" + ); + describe_counter!( + STREAM_ITEMS_TOTAL, + Unit::Count, + "StreamTask items processed, by result (ok|err)" + ); describe_counter!( STEP_ITERATIONS_TOTAL, Unit::Count, @@ -478,6 +496,21 @@ pub(crate) fn 
batch_items(ok: usize, err: usize) { pub(crate) fn step_iteration(done: bool) { counter!(STEP_ITERATIONS_TOTAL, "outcome" => if done { "done" } else { "more" }).increment(1); } +pub(crate) fn stream_run(outcome: &'static str) { + // `outcome` is one of "completed" | "cancelled" | "failed". + counter!(STREAM_RUNS_TOTAL, "outcome" => outcome).increment(1); +} +pub(crate) fn stream_window() { + counter!(STREAM_WINDOWS_TOTAL).increment(1); +} +pub(crate) fn stream_items(ok: usize, err: usize) { + if ok > 0 { + counter!(STREAM_ITEMS_TOTAL, "result" => "ok").increment(ok as u64); + } + if err > 0 { + counter!(STREAM_ITEMS_TOTAL, "result" => "err").increment(err as u64); + } +} // ----- recovery / saga ----- diff --git a/cano/src/task.rs b/cano/src/task.rs index e2e7388..9837c3e 100644 --- a/cano/src/task.rs +++ b/cano/src/task.rs @@ -88,6 +88,7 @@ pub mod poll; mod retry; pub mod router; pub mod stepped; +pub mod stream; pub mod timer; pub use batch::{ @@ -101,17 +102,19 @@ pub use router::{DynRouterTask, RouterTask, RouterTaskObject}; pub use stepped::{ DefaultStepCursor, DynSteppedTask, StepOutcome, SteppedTask, SteppedTaskObject, run_stepped, }; +pub use stream::{CloseReason, StreamErrorPolicy, StreamTask, StreamWindow, WindowSignal}; pub use timer::{DynTimerTask, TimerOutcome, TimerTask, TimerTaskObject, run_timer}; // Attribute macros namespaced under `cano::task::` so that -// `#[task::router]`, `#[task::poll]`, `#[task::timer]`, `#[task::batch]`, and -// `#[task::stepped]` all resolve as path-qualified attribute macros. +// `#[task::router]`, `#[task::poll]`, `#[task::timer]`, `#[task::batch]`, +// `#[task::stepped]`, and `#[task::stream]` all resolve as path-qualified attribute macros. // (Modules and macros occupy different namespaces, so these coexist with the -// `router`, `poll`, `timer`, `batch`, and `stepped` submodules above.) +// `router`, `poll`, `timer`, `batch`, `stepped`, and `stream` submodules above.) 
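The comment above relies on Rust keeping modules and macros in separate namespaces; the standard library does the same with the `std::vec` module and the `vec!` macro. A minimal std-only sketch of the idea, using a `macro_rules!` macro as a stand-in for cano's re-exported attribute proc-macro (the `stream` module and macro here are illustrative only):

```rust
// Type namespace: a module named `stream`.
mod stream {
    pub fn kind() -> &'static str {
        "module"
    }
}

// Macro namespace: a macro with the same name. It does not clash with the
// module, just as `#[task::stream]` coexists with the `task::stream` module.
macro_rules! stream {
    () => {
        "macro"
    };
}
```

Both `stream::kind()` and `stream!()` resolve unambiguously, because path resolution picks the namespace from the syntactic position.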
pub use cano_macros::batch_task as batch; pub use cano_macros::poll_task as poll; pub use cano_macros::router_task as router; pub use cano_macros::stepped_task as stepped; +pub use cano_macros::stream_task as stream; pub use cano_macros::timer_task as timer; /// Result type for task execution that supports both single and split transitions diff --git a/cano/src/task/stream.rs b/cano/src/task/stream.rs new file mode 100644 index 0000000..a0d48f3 --- /dev/null +++ b/cano/src/task/stream.rs @@ -0,0 +1,3068 @@ +//! # StreamTask — A Genuine Stream-Processing Model +//! +//! A [`StreamTask`] consumes an `impl Stream` **continuously**, processes each item, and +//! flushes per-[`StreamWindow`] window — so memory stays bounded and downstream sees +//! progress before the source ends. It terminates in one of three ways: +//! +//! - **Exhausted** — the source returns `None`: the partial window is flushed and +//! [`on_close`](StreamTask::on_close)`(Exhausted)` chooses the next state. +//! - **Stop** — [`flush_window`](StreamTask::flush_window) returns [`WindowSignal::Stop`]: +//! transition to that result. +//! - **Cancelled** — the workflow's [`CancellationToken`](crate::cancel::CancellationToken) +//! fires: cooperative drain — the in-flight window is flushed, its cursor is committed, +//! `on_close(Cancelled)` runs for cleanup (its returned state is *ignored*), and the run +//! ends as [`CanoError::Cancelled`](crate::error::CanoError::Cancelled) so a later +//! [`resume_from`](crate::workflow::Workflow::resume_from) continues from the committed +//! cursor. Cancel means "stop cleanly + resumable", not "transition onward". +//! +//! ## Batch vs. stream +//! +//! This is **not** [`BatchTask`](crate::task::batch::BatchTask). Batch loads a *bounded* +//! `Vec`, processes all of it, and aggregates **once** at the end — O(N) memory, one +//! emission, requires the data to end. `StreamTask` is for *unbounded* / continuous +//! 
sources (Kafka, SSE, file-tail, WebSocket): incremental per-window emission, bounded +//! memory, runs until stopped, and **resumable** from a persisted cursor. +//! +//! ## Cursor persistence & resume +//! +//! Register with [`Workflow::register_stream`](crate::workflow::Workflow::register_stream) +//! and attach a [`CheckpointStore`](crate::recovery::CheckpointStore) + a workflow id: the +//! engine persists the cursor returned by the **last item of each flushed window** (as a +//! [`RowKind::StepCursor`](crate::recovery::RowKind::StepCursor) row), and a resumed run +//! re-opens the source from that position. Registering via plain +//! [`Workflow::register`](crate::workflow::Workflow::register) runs the in-memory loop +//! with **no** persistence and **no** cancellation — the companion `Task` path is for +//! convenience / tests only. +//! +//! ## Idempotency (at-least-once) +//! +//! The FSM writes the state-entry checkpoint *before* running the task, so a resumed run +//! re-enters the state and calls [`open`](StreamTask::open) again from the last committed +//! cursor. The window *after* that cursor may be partially processed then replayed — +//! [`open`](StreamTask::open), [`process_item`](StreamTask::process_item), and +//! [`on_close`](StreamTask::on_close) **must be idempotent**. `config` defaults to +//! [`TaskConfig::minimal()`] (no outer retry) because an outer retry would re-invoke +//! `open()` and re-consume the stream; only [`attempt_timeout`](crate::task::TaskConfig) +//! is honored — as a per-[`process_item`](StreamTask::process_item) bound. 
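The at-least-once contract described above can be simulated without an async runtime. This is a minimal std-only sketch, not cano's driver: `Store`, `run`, and the `crash_after` switch are hypothetical, and the cursor is assumed to be the index just past the last flushed item. A first run commits after window `[1, 2]`, then crashes while `3` is buffered; the resumed run re-opens from the committed cursor, so `3` is processed twice.

```rust
/// Hypothetical stand-in for the persisted cursor row: the position just past
/// the last item of the most recently flushed window.
struct Store {
    committed: Option<usize>,
}

/// Drive `source` in tumbling windows of `window` items, starting from the
/// committed cursor. `crash_after` simulates a crash once that many items have
/// been processed in this run. Every processed item is appended to `processed`
/// so replays are visible. Returns `false` if the run "crashed".
fn run(
    source: &[u32],
    window: usize,
    store: &mut Store,
    crash_after: Option<usize>,
    processed: &mut Vec<u32>,
) -> bool {
    let start = store.committed.unwrap_or(0); // resume point (0 on a fresh run)
    let mut buf: Vec<u32> = Vec::new();
    let mut seen_this_run = 0;
    for idx in start..source.len() {
        if crash_after == Some(seen_this_run) {
            return false; // crash: the buffered window is lost, its cursor never committed
        }
        processed.push(source[idx]); // `process_item` side effect
        seen_this_run += 1;
        buf.push(source[idx]);
        if buf.len() == window {
            buf.clear(); // `flush_window` would commit side effects here...
            store.committed = Some(idx + 1); // ...and only then is the cursor persisted
        }
    }
    if !buf.is_empty() {
        // Source exhausted: flush the partial window, then commit its cursor.
        store.committed = Some(source.len());
    }
    true
}
```

Because the crash lands after item `3` is processed but before its window flushes, any side effect of processing `3` happens twice, which is why `open`, `process_item`, and `on_close` must tolerate replay.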
+ +use crate::cancel::CancellationToken; +use crate::error::CanoError; +use crate::resource::Resources; +use crate::task::{TaskConfig, TaskResult}; +use futures_util::Stream; +use serde::Serialize; +use serde::de::DeserializeOwned; +use std::borrow::Cow; +use std::fmt; +use std::future::Future; +use std::hash::Hash; +use std::pin::Pin; +use std::sync::Arc; + +// --------------------------------------------------------------------------- +// Value types +// --------------------------------------------------------------------------- + +/// Controls how the per-item windowed loop responds when +/// [`process_item`](StreamTask::process_item) returns an [`Err`]. Modelled on +/// [`PollErrorPolicy`](crate::task::poll::PollErrorPolicy), with an extra +/// [`SkipAndContinue`](StreamErrorPolicy::SkipAndContinue) for poison-message handling. +#[derive(Debug, Clone, PartialEq, Eq, Default)] +pub enum StreamErrorPolicy { + /// Propagate the first item error — the loop stops and the run fails. + #[default] + FailFast, + /// Drop the bad item and keep consuming (with the `metrics` feature it is counted as + /// an `err` item). The skipped item's cursor is not committed (the next good item + /// advances it). + SkipAndContinue, + /// Tolerate up to `max_errors` **consecutive** item errors before failing. Failing + /// items are dropped, not re-processed; the counter resets on every successfully + /// processed item. + RetryOnError { + /// Maximum number of consecutive item errors tolerated; one more fails the loop. + max_errors: u32, + }, +} + +/// Tumbling-window trigger: how often [`flush_window`](StreamTask::flush_window) fires and +/// how much the driver buffers. The default [`StreamTask::window`] is per-item +/// ([`Count(1)`](StreamWindow::Count)); larger windows amortise flush + checkpoint cost. +#[derive(Debug, Clone, PartialEq, Eq)] +pub enum StreamWindow { + /// Flush after this many successfully processed items (clamped to a minimum of 1). + Count(usize), + /// Flush after this much wall-clock elapses, tumbling.
Empty windows are skipped + /// (no [`flush_window`](StreamTask::flush_window) call) so an idle source does not + /// emit spurious empty flushes. + Duration(std::time::Duration), +} + +/// The result of one [`flush_window`](StreamTask::flush_window) call. +#[derive(Debug)] +pub enum WindowSignal { + /// Keep consuming the stream. + Continue, + /// Stop and transition the FSM to this result. The driver commits the window's + /// cursor first. + Stop(TaskResult), +} + +/// Why the consume loop is ending — passed to [`on_close`](StreamTask::on_close). +#[derive(Debug, Clone, Copy, PartialEq, Eq)] +pub enum CloseReason { + /// The source stream returned `None`. + Exhausted, + /// The workflow's [`CancellationToken`](crate::cancel::CancellationToken) fired + /// (cooperative shutdown). The in-flight partial window was flushed first. + Cancelled, +} + +// --------------------------------------------------------------------------- +// StreamTask trait +// --------------------------------------------------------------------------- + +/// A genuine stream-processing model: consume an `impl Stream` continuously, flush per +/// window, run until cancelled/exhausted, and resume from a persisted cursor. +/// +/// # Generic Types +/// +/// - **`TState`**: The workflow state enum (`Clone + Debug + Send + Sync`). +/// - **`TResourceKey`**: The resource-lookup key type (defaults to [`Cow<'static, str>`]). +/// +/// # Associated Types +/// +/// - **`Item`**: one element pulled from the source stream. +/// - **`Output`**: the per-item result accumulated into a window. +/// - **`Cursor`**: the resumable position; `Serialize + DeserializeOwned + Send + Sync + 'static`. +/// +/// Prefer the inherent `#[task::stream(state = S)]` form, which infers `Item` from +/// `process_item`'s owned `item` parameter and `Output` / `Cursor` from the `Ok` tuple of +/// its return type. 
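The tumbling count-window semantics documented above (clamp to at least 1, partial window flushed on exhaustion, `Stop` honored on any flush) fit in a short synchronous sketch. `Signal` and `count_windows` are hypothetical simplifications of `WindowSignal` and the driver loop; the doubling stands in for `process_item`, mirroring the crate's `Collector` test.

```rust
/// Hypothetical mirror of `WindowSignal`, minus the `TaskResult` payload.
#[derive(Debug, PartialEq)]
enum Signal {
    Continue,
    Stop,
}

/// Tumbling count windows over an already-materialised source. `flush` sees
/// each window's outputs and may stop early. Returns every flushed window and
/// whether the run ended by `Stop` (true) or by exhaustion (false).
fn count_windows<F>(items: &[u32], n: usize, mut flush: F) -> (Vec<Vec<u32>>, bool)
where
    F: FnMut(&[u32]) -> Signal,
{
    let limit = n.max(1); // Count(0) is clamped to 1
    let mut flushed = Vec::new();
    let mut buf = Vec::new();
    for &item in items {
        buf.push(item * 2); // stand-in for `process_item`
        if buf.len() >= limit {
            let window = std::mem::take(&mut buf);
            let signal = flush(window.as_slice());
            flushed.push(window);
            if signal == Signal::Stop {
                return (flushed, true);
            }
        }
    }
    // Exhausted: the partial window is still flushed, and a Stop there is honored.
    if !buf.is_empty() {
        let signal = flush(buf.as_slice());
        flushed.push(buf);
        if signal == Signal::Stop {
            return (flushed, true);
        }
    }
    (flushed, false)
}
```

With `Count(2)` over `[10, 20, 30, 40, 50]` this yields `[20, 40]`, `[60, 80]`, and the partial `[100]`, the same windows the in-crate test expects.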
+#[crate::task::stream] +pub trait StreamTask>: Send + Sync +where + TState: Clone + fmt::Debug + Send + Sync + 'static, + TResourceKey: Hash + Eq + Send + Sync + 'static, +{ + /// One element pulled from the source stream. + type Item: Send + 'static; + /// The per-item result accumulated into a window. + type Output: Send + 'static; + /// The resumable position, persisted as a cursor for crash-resume. + type Cursor: Serialize + DeserializeOwned + Send + Sync + 'static; + + /// Windowing policy. Defaults to [`StreamWindow::Count(1)`] (flush per item). + fn window(&self) -> StreamWindow { + StreamWindow::Count(1) + } + + /// Per-item error policy. Defaults to [`StreamErrorPolicy::FailFast`]. + fn on_item_error(&self) -> StreamErrorPolicy { + StreamErrorPolicy::FailFast + } + + /// Task configuration. Defaults to [`TaskConfig::minimal()`]. + /// + /// Only [`attempt_timeout`](crate::task::TaskConfig) is applied — as a bound on each + /// [`process_item`](StreamTask::process_item) call (a timeout becomes an item error + /// governed by [`on_item_error`](StreamTask::on_item_error)). **Outer retry + /// (`max_attempts`) is intentionally not applied**: it would re-invoke + /// [`open`](StreamTask::open) and re-consume the stream. The per-item error policy, the + /// `CancellationToken`, and the window loop are the resilience surface. + fn config(&self) -> TaskConfig { + TaskConfig::minimal() + } + + /// Human-readable identifier, reported to + /// [`WorkflowObserver`](crate::observer::WorkflowObserver) hooks. + fn name(&self) -> Cow<'static, str> { + Cow::Borrowed(std::any::type_name::()) + } + + /// Open (or resume) the source stream. `cursor` is the last committed position, or + /// `None` on a fresh run. Must be idempotent (see the module docs). 
+ async fn open( + &self, + res: &Resources, + cursor: Option, + ) -> Result + Send>>, CanoError>; + + /// Process one item; return its output and the cursor reached by consuming it (the + /// position to commit once this item's window flushes). + async fn process_item( + &self, + res: &Resources, + item: Self::Item, + ) -> Result<(Self::Output, Self::Cursor), CanoError>; + + /// Flush one full window: commit side effects, then decide whether to continue or + /// stop. The driver persists the window's cursor after this returns. + async fn flush_window( + &self, + res: &Resources, + outputs: Vec, + ) -> Result, CanoError>; + + /// Close hook, called after the in-flight partial window has been flushed. + /// + /// - [`CloseReason::Exhausted`]: the returned [`TaskResult`] is the **next state**. + /// - [`CloseReason::Cancelled`]: a **cleanup** hook — the returned `TaskResult` is + /// **ignored** and the run ends as + /// [`CanoError::Cancelled`](crate::error::CanoError::Cancelled) (an `Err` returned + /// here *is* propagated). Use it to release resources / commit final offsets. + /// + /// **At-least-once:** like [`open`](StreamTask::open) / [`process_item`](StreamTask::process_item), + /// `on_close` runs once per run but may be **re-invoked on crash-resume** (a crash + /// between `on_close` and the cursor commit replays the boundary window). It **must be + /// idempotent** — e.g. committing final offsets here must tolerate a repeat. + async fn on_close( + &self, + res: &Resources, + reason: CloseReason, + ) -> Result, CanoError>; + + /// Drive the in-memory windowed loop (no cursor persistence, no cancellation). Used by + /// the macro-synthesised `impl Task::run` so a `StreamTask` can be + /// [`register`](crate::workflow::Workflow::register)ed like any task. The durable, + /// cancellable path is [`Workflow::register_stream`](crate::workflow::Workflow::register_stream). 
+ /// + /// Written as a hand-desugared `fn` (not `async fn`) so no `for<'async_trait>` binder + /// is introduced; it returns the future produced by the crate-private driver. + #[doc(hidden)] + fn run_in_memory<'life0, 'life1, 'async_trait>( + &'life0 self, + res: &'life1 Resources, + ) -> Pin, CanoError>> + Send + 'async_trait>> + where + 'life0: 'async_trait, + 'life1: 'async_trait, + Self: Sync + 'async_trait + Sized, + { + Box::pin(run_stream_in_memory(self, res)) + } +} + +// --------------------------------------------------------------------------- +// drive_window — the single per-window loop body (shared by both drivers) +// --------------------------------------------------------------------------- + +/// One window of consumption: returned per [`flush_window`](StreamTask::flush_window) or +/// per terminal close. The cursor is the concrete `Cursor` of the last item in the window. +pub(crate) enum WindowStep { + /// A full window was flushed and the task asked to continue. Commit `cursor`. + Window { cursor: TCursor }, + /// Natural termination: the stream ended or a window returned `Stop`. Commit + /// `final_cursor` (if any), then transition to `result`. + Done { + final_cursor: Option, + result: TaskResult, + }, + /// The run was cancelled: the in-flight window was flushed and `on_close` ran for + /// cleanup. Commit `final_cursor` (if any), then end as + /// [`CanoError::Cancelled`](crate::error::CanoError::Cancelled) so + /// [`resume_from`](crate::workflow::Workflow::resume_from) continues from this position. + Cancelled { final_cursor: Option }, +} + +/// Pull and process items until one window flushes (or the loop terminates). Shared by the +/// in-memory companion and the engine-driven session — there is exactly one loop body. 
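`drive_window` races cancellation, the duration tick, and the next item in a `biased` `tokio::select!`, so a fired token wins over a ready item. The synchronous sketch here models only the count-window and cancellation paths: a hypothetical `cancelled` closure stands in for the `CancellationToken`, each item doubles as its own cursor, and `log` records the hook calls. It shows the cooperative drain: the partial window is flushed, `on_close(Cancelled)` runs, and the last cursor is returned for resume.

```rust
/// Hypothetical mirror of the three ways one window of consumption can end.
#[derive(Debug, PartialEq)]
enum Step {
    Window { cursor: u64 },
    Done { final_cursor: Option<u64> },
    Cancelled { final_cursor: Option<u64> },
}

/// One window of consumption over a synchronous source. `cancelled` is polled
/// before every pull, so cancellation takes priority over the next item.
fn drive_window<F: FnMut() -> bool>(
    source: &mut std::vec::IntoIter<u64>,
    limit: usize,
    cancelled: &mut F,
    log: &mut Vec<String>,
) -> Step {
    let limit = limit.max(1);
    let mut buf: Vec<u64> = Vec::new();
    let mut last_cursor: Option<u64> = None;
    loop {
        if buf.len() >= limit {
            // Full window: flush and hand the last item's cursor back for commit.
            log.push(format!("flush {buf:?}"));
            return Step::Window { cursor: last_cursor.expect("non-empty window has a cursor") };
        }
        if cancelled() {
            // Cooperative drain: flush the partial window (its signal is ignored),
            // run the cleanup hook, and report the last cursor so a resume starts there.
            if !buf.is_empty() {
                log.push(format!("flush {buf:?}"));
            }
            log.push("on_close(Cancelled)".to_string());
            return Step::Cancelled { final_cursor: last_cursor };
        }
        match source.next() {
            Some(item) => {
                buf.push(item);
                last_cursor = Some(item); // the item doubles as its own cursor here
            }
            None => {
                // Exhausted: flush the partial window, then close.
                if !buf.is_empty() {
                    log.push(format!("flush {buf:?}"));
                }
                log.push("on_close(Exhausted)".to_string());
                return Step::Done { final_cursor: last_cursor };
            }
        }
    }
}
```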
+#[allow(clippy::too_many_arguments)] +async fn drive_window( + task: &T, + res: &Resources, + stream: &mut Pin + Send>>, + consecutive_errors: &mut u32, + window: &StreamWindow, + policy: &StreamErrorPolicy, + attempt_timeout: Option, + token: &CancellationToken, +) -> Result, CanoError> +where + T: StreamTask + ?Sized, + S: Clone + fmt::Debug + Send + Sync + 'static, + K: Hash + Eq + Send + Sync + 'static, +{ + use futures_util::StreamExt as _; + + let count_limit = match window { + StreamWindow::Count(n) => Some((*n).max(1)), + StreamWindow::Duration(_) => None, + }; + let mut deadline: Option = match window { + StreamWindow::Duration(d) => Some(tokio::time::Instant::now() + *d), + StreamWindow::Count(_) => None, + }; + + let mut buf: Vec = Vec::new(); + let mut last_cursor: Option = None; + + loop { + // Count-window flush. + if let Some(limit) = count_limit + && buf.len() >= limit + { + #[cfg(feature = "metrics")] + crate::metrics::stream_window(); + let outputs = std::mem::take(&mut buf); + return Ok(match task.flush_window(res, outputs).await? { + WindowSignal::Continue => WindowStep::Window { + cursor: last_cursor.expect("a non-empty window always has a cursor"), + }, + WindowSignal::Stop(result) => WindowStep::Done { + final_cursor: last_cursor, + result, + }, + }); + } + + // Resolves at the duration-window deadline, or never (count windows). + let tick = async { + match deadline { + Some(d) => tokio::time::sleep_until(d).await, + None => std::future::pending::<()>().await, + } + }; + + tokio::select! { + biased; + _ = token.cancelled() => { + if !buf.is_empty() { + #[cfg(feature = "metrics")] + crate::metrics::stream_window(); + // On cancel, `flush_window` runs only to commit the partial window's + // side effects; its `WindowSignal` is ignored (the run ends as + // `Cancelled` regardless — honoring `Stop` here would contradict that). 
+ let _ = task.flush_window(res, std::mem::take(&mut buf)).await?; + } + // `on_close(Cancelled)` is a cleanup hook; its returned state is ignored — + // a cancelled run ends as `CanoError::Cancelled` (an `Err` it returns IS + // propagated). Resume continues from the committed `final_cursor`. + let _ = task.on_close(res, CloseReason::Cancelled).await?; + return Ok(WindowStep::Cancelled { final_cursor: last_cursor }); + } + _ = tick => { + // Duration window elapsed. + if buf.is_empty() { + // Empty tumbling window: advance the deadline and keep waiting. + if let (Some(d), StreamWindow::Duration(dur)) = (deadline.as_mut(), window) { + *d = tokio::time::Instant::now() + *dur; + } + continue; + } + #[cfg(feature = "metrics")] + crate::metrics::stream_window(); + let outputs = std::mem::take(&mut buf); + return Ok(match task.flush_window(res, outputs).await? { + WindowSignal::Continue => WindowStep::Window { + cursor: last_cursor.expect("a non-empty window always has a cursor"), + }, + WindowSignal::Stop(result) => WindowStep::Done { + final_cursor: last_cursor, + result, + }, + }); + } + item = stream.next() => { + match item { + Some(item) => { + // Bound a single `process_item` by `config().attempt_timeout` when set + // (a hung source item is the realistic failure mode). A timeout becomes + // an ordinary item error governed by `on_item_error()` below. Outer + // retry (`max_attempts`) is intentionally NOT applied — the per-item + // policy + the loop are the resilience surface. 
+ let processed = match attempt_timeout { + Some(d) => match tokio::time::timeout(d, task.process_item(res, item)).await { + Ok(inner) => inner, + Err(_elapsed) => Err(CanoError::timeout( + "stream process_item exceeded attempt_timeout", + )), + }, + None => task.process_item(res, item).await, + }; + match processed { + Ok((out, cursor)) => { + *consecutive_errors = 0; + buf.push(out); + last_cursor = Some(cursor); + #[cfg(feature = "metrics")] + crate::metrics::stream_items(1, 0); + } + Err(e) => { + #[cfg(feature = "metrics")] + crate::metrics::stream_items(0, 1); + match policy { + StreamErrorPolicy::FailFast => return Err(e), + StreamErrorPolicy::SkipAndContinue => {} + StreamErrorPolicy::RetryOnError { max_errors } => { + *consecutive_errors += 1; + if *consecutive_errors > *max_errors { + return Err(e); + } + } + } + } + } + } + None => { + // Source exhausted: flush the final partial window. Honor a `Stop` + // here (transition to it) just like a full window; on `Continue` + // fall through to `on_close(Exhausted)` for the terminal transition. + if !buf.is_empty() { + #[cfg(feature = "metrics")] + crate::metrics::stream_window(); + match task.flush_window(res, std::mem::take(&mut buf)).await? { + WindowSignal::Stop(result) => { + return Ok(WindowStep::Done { + final_cursor: last_cursor, + result, + }); + } + WindowSignal::Continue => {} + } + } + let result = task.on_close(res, CloseReason::Exhausted).await?; + return Ok(WindowStep::Done { final_cursor: last_cursor, result }); + } + } + } + } + } +} + +/// In-memory companion loop: drive windows with a disabled token (no cancellation), no +/// cursor persistence. Backs [`StreamTask::run_in_memory`]. 
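The per-item error handling inside the loop above reduces to a small state machine over one consecutive-error counter. This std-only sketch (`Policy` and `apply_policy` are hypothetical mirrors of `StreamErrorPolicy` and the `Err(e)` arm) shows that `RetryOnError` does not re-run the failing item: it drops it, like `SkipAndContinue`, and fails only once the streak exceeds `max_errors`.

```rust
/// Hypothetical mirror of `StreamErrorPolicy`.
#[derive(Debug, Clone, PartialEq)]
enum Policy {
    FailFast,
    SkipAndContinue,
    RetryOnError { max_errors: u32 },
}

/// Apply `policy` to a sequence of per-item outcomes the way the windowed loop
/// does. Returns the surviving outputs, or the error that stopped the run.
fn apply_policy(
    outcomes: &[Result<u32, &'static str>],
    policy: &Policy,
) -> Result<Vec<u32>, &'static str> {
    let mut consecutive_errors: u32 = 0;
    let mut kept = Vec::new();
    for outcome in outcomes {
        match outcome {
            Ok(out) => {
                consecutive_errors = 0; // any success resets the streak
                kept.push(*out);
            }
            Err(e) => match policy {
                Policy::FailFast => return Err(*e),
                Policy::SkipAndContinue => {} // drop the poison item, keep consuming
                Policy::RetryOnError { max_errors } => {
                    consecutive_errors += 1; // the item is dropped, not re-processed
                    if consecutive_errors > *max_errors {
                        return Err(*e);
                    }
                }
            },
        }
    }
    Ok(kept)
}
```

A `process_item` call that exceeds `attempt_timeout` enters the same arm as an ordinary error, so the chosen policy governs hung items too.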
+async fn run_stream_in_memory( + task: &T, + res: &Resources, +) -> Result, CanoError> +where + T: StreamTask + ?Sized, + S: Clone + fmt::Debug + Send + Sync + 'static, + K: Hash + Eq + Send + Sync + 'static, +{ + let token = CancellationToken::disabled(); + let window = task.window(); + let policy = task.on_item_error(); + let attempt_timeout = task.config().attempt_timeout; + let mut consecutive_errors: u32 = 0; + + let result: Result, CanoError> = async { + let mut stream = task.open(res, None).await?; + loop { + match drive_window( + task, + res, + &mut stream, + &mut consecutive_errors, + &window, + &policy, + attempt_timeout, + &token, + ) + .await? + { + WindowStep::Window { .. } => continue, + WindowStep::Done { result, .. } => return Ok(result), + // Unreachable: the in-memory companion drives with a disabled token. + WindowStep::Cancelled { .. } => return Err(CanoError::cancelled()), + } + } + } + .await; + + // The in-memory companion uses a disabled token, so it never cancels. + #[cfg(feature = "metrics")] + crate::metrics::stream_run(if result.is_ok() { + "completed" + } else { + "failed" + }); + result +} + +// --------------------------------------------------------------------------- +// Type-erased infrastructure (for StateEntry::Stream / register_stream) +// --------------------------------------------------------------------------- + +/// One erased window step: serialized cursor bytes in place of the concrete `Cursor`. +pub enum ErasedWindowStep { + /// A full window flushed; persist `cursor` and continue. + Window { cursor: Vec }, + /// Natural termination: persist `final_cursor` (if any) then transition to `result`. + Done { + final_cursor: Option>, + result: TaskResult, + }, + /// Cancelled: persist `final_cursor` (if any) then end as `CanoError::Cancelled`. + Cancelled { final_cursor: Option> }, +} + +/// Future returned by [`ErasedStreamSession::next_window`]. 
+pub type WindowFuture<'a, TState> = + Pin, CanoError>> + Send + 'a>>; + +/// Object-safe view of an opened stream session. The engine advances it one window at a +/// time, persisting the returned cursor between windows. +pub trait ErasedStreamSession: Send +where + TState: Clone + Send + Sync + 'static, + TResourceKey: Hash + Eq + Send + Sync + 'static, +{ + /// Consume until one window flushes (or the loop terminates). + fn next_window<'a>( + &'a mut self, + res: &'a Resources, + token: &'a CancellationToken, + ) -> WindowFuture<'a, TState>; +} + +/// Future returned by [`ErasedStreamTask::open_session`]. +pub type OpenSessionFuture<'a, TState, TResourceKey> = Pin< + Box< + dyn Future>, CanoError>> + + Send + + 'a, + >, +>; + +/// Object-safe, type-erased view of a [`StreamTask`] for the engine's +/// [`StateEntry::Stream`](crate::workflow::execution::StateEntry) path. +pub trait ErasedStreamTask: Send + Sync +where + TState: Clone + Send + Sync + 'static, + TResourceKey: Hash + Eq + Send + Sync + 'static, +{ + fn name(&self) -> Cow<'static, str>; + /// Open (or resume) the source from `cursor_bytes`, returning a driven session. + /// `attempt_timeout` (from the registered `config()`) bounds each `process_item`. + fn open_session<'a>( + &'a self, + res: &'a Resources, + cursor_bytes: Option>, + attempt_timeout: Option, + ) -> OpenSessionFuture<'a, TState, TResourceKey>; +} + +/// An opened, concretely-typed stream session: owns the task handle + the stream + the +/// per-stream error counter. Holds the single windowed loop body. 
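`ErasedStreamSession` lets the engine advance a session one window at a time and persist each returned cursor before pulling more. A minimal sketch of that engine-side loop, assuming a `Vec<Vec<u8>>` in place of the checkpoint store and a pre-scripted session; `Erased`, `Session`, `drive_session`, and `Scripted` are hypothetical simplifications, not cano's types:

```rust
/// Hypothetical, simplified mirror of `ErasedWindowStep`: cursors are opaque bytes.
enum Erased {
    Window { cursor: Vec<u8> },
    Done { final_cursor: Option<Vec<u8>>, next_state: &'static str },
    Cancelled { final_cursor: Option<Vec<u8>> },
}

/// Object-safe session the engine advances one window at a time.
trait Session {
    fn next_window(&mut self) -> Erased;
}

/// Engine-side loop: persist every returned cursor before asking for the next
/// window, so a crash loses at most the in-flight window.
fn drive_session(session: &mut dyn Session, rows: &mut Vec<Vec<u8>>) -> Result<&'static str, String> {
    loop {
        match session.next_window() {
            Erased::Window { cursor } => rows.push(cursor),
            Erased::Done { final_cursor, next_state } => {
                rows.extend(final_cursor); // commit the boundary cursor, if any
                return Ok(next_state);
            }
            Erased::Cancelled { final_cursor } => {
                rows.extend(final_cursor); // commit, then end as a resumable error
                return Err("cancelled".to_string());
            }
        }
    }
}

/// Toy session: pre-baked window steps, replayed in order.
struct Scripted(std::vec::IntoIter<Erased>);

impl Session for Scripted {
    fn next_window(&mut self) -> Erased {
        self.0.next().expect("script ended without a terminal step")
    }
}
```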
+struct StreamSession +where + T: StreamTask + 'static, + S: Clone + fmt::Debug + Send + Sync + 'static, + K: Hash + Eq + Send + Sync + 'static, +{ + task: Arc, + stream: Pin + Send>>, + window: StreamWindow, + policy: StreamErrorPolicy, + attempt_timeout: Option, + consecutive_errors: u32, +} + +impl ErasedStreamSession for StreamSession +where + T: StreamTask + 'static, + S: Clone + fmt::Debug + Send + Sync + 'static, + K: Hash + Eq + Send + Sync + 'static, +{ + fn next_window<'a>( + &'a mut self, + res: &'a Resources, + token: &'a CancellationToken, + ) -> WindowFuture<'a, S> { + Box::pin(async move { + let task = Arc::clone(&self.task); + let step = drive_window( + &*task, + res, + &mut self.stream, + &mut self.consecutive_errors, + &self.window, + &self.policy, + self.attempt_timeout, + token, + ) + .await?; + Ok(match step { + WindowStep::Window { cursor } => ErasedWindowStep::Window { + cursor: encode_cursor(&cursor, &self.task.name())?, + }, + WindowStep::Done { + final_cursor, + result, + } => ErasedWindowStep::Done { + final_cursor: match final_cursor { + Some(c) => Some(encode_cursor(&c, &self.task.name())?), + None => None, + }, + result, + }, + WindowStep::Cancelled { final_cursor } => ErasedWindowStep::Cancelled { + final_cursor: match final_cursor { + Some(c) => Some(encode_cursor(&c, &self.task.name())?), + None => None, + }, + }, + }) + }) + } +} + +/// Bridges a concrete [`StreamTask`] to the object-safe [`ErasedStreamTask`]. Handles +/// `serde_json` cursor (de)serialization so the engine only sees `Vec`. 
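`StreamAdapter` exists so the engine only handles cursor bytes while each task keeps its own typed `Cursor`. The sketch below reproduces that type-erasure pattern with std only: the hypothetical `CursorCodec` trait stands in for `Serialize + DeserializeOwned` (for a `u64`, `serde_json` also emits the bare decimal digits), and `TypedTask`, `ErasedTask`, `Adapter`, and `Numbers` are illustrative names.

```rust
/// Hypothetical stand-in for `Serialize + DeserializeOwned`.
trait CursorCodec: Sized {
    fn encode(&self) -> Vec<u8>;
    fn decode(bytes: &[u8]) -> Result<Self, String>;
}

impl CursorCodec for u64 {
    fn encode(&self) -> Vec<u8> {
        self.to_string().into_bytes() // decimal digits, as JSON would encode a u64
    }
    fn decode(bytes: &[u8]) -> Result<Self, String> {
        std::str::from_utf8(bytes)
            .map_err(|e| e.to_string())?
            .parse::<u64>()
            .map_err(|e| format!("deserialize stream cursor: {e}"))
    }
}

/// Typed task side: knows its concrete cursor type.
trait TypedTask {
    type Cursor: CursorCodec;
    fn open(&self, cursor: Option<Self::Cursor>) -> Vec<u32>;
}

/// Erased side: the engine only ever hands over bytes.
trait ErasedTask {
    fn open_erased(&self, cursor_bytes: Option<&[u8]>) -> Result<Vec<u32>, String>;
}

/// Adapter: decode the bytes into the concrete cursor before calling `open`.
struct Adapter<T>(T);

impl<T: TypedTask> ErasedTask for Adapter<T> {
    fn open_erased(&self, cursor_bytes: Option<&[u8]>) -> Result<Vec<u32>, String> {
        let cursor = match cursor_bytes {
            None => None,
            Some(b) => Some(<T::Cursor as CursorCodec>::decode(b)?),
        };
        Ok(self.0.open(cursor))
    }
}

/// Toy source: items 0..10, resumed from the offset stored in the cursor.
struct Numbers;

impl TypedTask for Numbers {
    type Cursor = u64;
    fn open(&self, cursor: Option<u64>) -> Vec<u32> {
        let start = cursor.unwrap_or(0) as u32;
        (start..10).collect()
    }
}
```

A malformed cursor surfaces as an error from `open_erased` rather than a panic, matching the `deserialize stream cursor` error path in `open_session`.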
+pub(crate) struct StreamAdapter(pub Arc); + +impl ErasedStreamTask for StreamAdapter +where + TState: Clone + fmt::Debug + Send + Sync + 'static, + TResourceKey: Hash + Eq + Send + Sync + 'static, + T: StreamTask + 'static, +{ + fn name(&self) -> Cow<'static, str> { + self.0.name() + } + fn open_session<'a>( + &'a self, + res: &'a Resources, + cursor_bytes: Option>, + attempt_timeout: Option, + ) -> OpenSessionFuture<'a, TState, TResourceKey> { + Box::pin(async move { + let cursor: Option = match cursor_bytes { + None => None, + Some(ref b) => Some(serde_json::from_slice(b).map_err(|e| { + CanoError::task_execution(format!( + "deserialize stream cursor for `{}`: {e}", + self.0.name() + )) + })?), + }; + let stream = self.0.open(res, cursor).await?; + let session = StreamSession { + task: Arc::clone(&self.0), + stream, + window: self.0.window(), + policy: self.0.on_item_error(), + attempt_timeout, + consecutive_errors: 0, + }; + Ok(Box::new(session) as Box>) + }) + } +} + +fn encode_cursor(cursor: &C, task_name: &str) -> Result, CanoError> { + serde_json::to_vec(cursor).map_err(|e| { + CanoError::task_execution(format!("serialize stream cursor for `{task_name}`: {e}")) + }) +} + +// --------------------------------------------------------------------------- +// Tests +// --------------------------------------------------------------------------- + +#[cfg(test)] +mod tests { + use super::*; + use crate::task; + use crate::task::Task; + use futures_util::stream; + use std::sync::Mutex; + + #[derive(Debug, Clone, PartialEq, Eq, Hash)] + enum Step { + Consume, + Done, + } + + #[test] + fn value_type_defaults() { + assert_eq!(StreamErrorPolicy::default(), StreamErrorPolicy::FailFast); + let _ = StreamWindow::Count(8); + let _ = StreamWindow::Duration(std::time::Duration::from_millis(5)); + assert_eq!(CloseReason::Exhausted, CloseReason::Exhausted); + } + + // Note: in-crate impls use the trait-impl form (`impl StreamTask for T`); the + // inherent form emits `::cano::` 
paths that don't resolve inside this crate. The + // inherent form is exercised in `cano-macros/tests/stream_task_impl.rs`. + + #[derive(Default)] + struct Collector { + seen: Mutex>, + windows: Mutex>>, + } + + #[task::stream] + impl StreamTask for Collector { + type Item = u32; + type Output = u32; + type Cursor = u64; + + fn window(&self) -> StreamWindow { + StreamWindow::Count(2) + } + + async fn open( + &self, + _res: &Resources, + _cursor: Option, + ) -> Result + Send>>, CanoError> { + Ok(Box::pin(stream::iter(vec![10u32, 20, 30, 40, 50])) + as Pin + Send>>) + } + + async fn process_item(&self, _res: &Resources, item: u32) -> Result<(u32, u64), CanoError> { + self.seen.lock().unwrap().push(item); + Ok((item * 2, item as u64)) + } + + async fn flush_window( + &self, + _res: &Resources, + outputs: Vec, + ) -> Result, CanoError> { + self.windows.lock().unwrap().push(outputs); + Ok(WindowSignal::Continue) + } + + async fn on_close( + &self, + _res: &Resources, + _reason: CloseReason, + ) -> Result, CanoError> { + Ok(TaskResult::Single(Step::Done)) + } + } + + #[tokio::test] + async fn in_memory_windows_and_order() { + let task = Collector::default(); + let res = Resources::new(); + let result = Task::run(&task, &res).await.unwrap(); + assert_eq!(result, TaskResult::Single(Step::Done)); + assert_eq!(*task.seen.lock().unwrap(), vec![10, 20, 30, 40, 50]); + // Count(2): windows [20,40], [60,80], then the partial [100] flushed on close. 
+ assert_eq!( + *task.windows.lock().unwrap(), + vec![vec![20u32, 40], vec![60, 80], vec![100]] + ); + } + + struct FailOnSecond { + policy: StreamErrorPolicy, + flushed: Mutex>, + } + + #[task::stream] + impl StreamTask for FailOnSecond { + type Item = u32; + type Output = u32; + type Cursor = u64; + + fn on_item_error(&self) -> StreamErrorPolicy { + self.policy.clone() + } + + async fn open( + &self, + _res: &Resources, + _cursor: Option, + ) -> Result + Send>>, CanoError> { + Ok(Box::pin(stream::iter(vec![1u32, 2, 3])) as Pin + Send>>) + } + + async fn process_item(&self, _res: &Resources, item: u32) -> Result<(u32, u64), CanoError> { + if item == 2 { + Err(CanoError::task_execution("item 2 failed")) + } else { + Ok((item, item as u64)) + } + } + + async fn flush_window( + &self, + _res: &Resources, + outputs: Vec, + ) -> Result, CanoError> { + self.flushed.lock().unwrap().extend(outputs); + Ok(WindowSignal::Continue) + } + + async fn on_close( + &self, + _res: &Resources, + _reason: CloseReason, + ) -> Result, CanoError> { + Ok(TaskResult::Single(Step::Done)) + } + } + + #[tokio::test] + async fn fail_fast_propagates() { + let task = FailOnSecond { + policy: StreamErrorPolicy::FailFast, + flushed: Mutex::new(Vec::new()), + }; + let res = Resources::new(); + let err = Task::run(&task, &res).await.unwrap_err(); + assert!(matches!(err, CanoError::TaskExecution(_))); + } + + #[tokio::test] + async fn skip_and_continue_drops_bad_item() { + let task = FailOnSecond { + policy: StreamErrorPolicy::SkipAndContinue, + flushed: Mutex::new(Vec::new()), + }; + let res = Resources::new(); + let result = Task::run(&task, &res).await.unwrap(); + assert_eq!(result, TaskResult::Single(Step::Done)); + // item 2 dropped; 1 and 3 survive. 
+ assert_eq!(*task.flushed.lock().unwrap(), vec![1u32, 3]); + } + + struct StopAfterFirst; + + #[task::stream] + impl StreamTask for StopAfterFirst { + type Item = u32; + type Output = u32; + type Cursor = u64; + + async fn open( + &self, + _res: &Resources, + _cursor: Option, + ) -> Result + Send>>, CanoError> { + Ok(Box::pin(stream::iter(vec![1u32, 2, 3, 4])) + as Pin + Send>>) + } + + async fn process_item(&self, _res: &Resources, item: u32) -> Result<(u32, u64), CanoError> { + Ok((item, item as u64)) + } + + async fn flush_window( + &self, + _res: &Resources, + _outputs: Vec, + ) -> Result, CanoError> { + // Window is Count(1); stop after the very first window. + Ok(WindowSignal::Stop(TaskResult::Single(Step::Done))) + } + + async fn on_close( + &self, + _res: &Resources, + _reason: CloseReason, + ) -> Result, CanoError> { + panic!("on_close must not run when a window returns Stop"); + } + } + + #[tokio::test] + async fn window_stop_short_circuits() { + let res = Resources::new(); + let result = Task::run(&StopAfterFirst, &res).await.unwrap(); + assert_eq!(result, TaskResult::Single(Step::Done)); + } + + #[tokio::test] + async fn integrates_with_workflow_via_register() { + use crate::cancel::CancellationToken; + use crate::workflow::Workflow; + + let workflow = Workflow::bare() + .register(Step::Consume, Collector::default()) + .add_exit_state(Step::Done); + let result = workflow + .orchestrate(Step::Consume, CancellationToken::disabled()) + .await + .unwrap(); + assert_eq!(result, Step::Done); + } + + // ----------------------------------------------------------------------- + // Engine-path tests: register_stream + cancellation + cursor persistence + // ----------------------------------------------------------------------- + + use std::collections::HashMap; + use std::sync::atomic::{AtomicBool, AtomicU32, Ordering}; + + /// Minimal in-memory `CheckpointStore` for the resume test. 
`committed` records every + /// appended `StepCursor` blob at append time, so cursor assertions survive the log + /// `clear` that a *successfully completed* run performs. + #[derive(Default)] + struct InMemoryStore { + rows: Mutex>>, + committed: Mutex>>, + } + + #[crate::checkpoint_store] + impl crate::recovery::CheckpointStore for InMemoryStore { + async fn append( + &self, + workflow_id: &str, + row: crate::recovery::CheckpointRow, + ) -> Result<(), CanoError> { + if row.kind == crate::recovery::RowKind::StepCursor + && let Some(blob) = &row.output_blob + { + self.committed.lock().unwrap().push(blob.clone()); + } + let mut g = self.rows.lock().unwrap(); + let v = g.entry(workflow_id.to_string()).or_default(); + if v.iter().any(|r| r.sequence == row.sequence) { + return Err(CanoError::checkpoint_store("duplicate sequence")); + } + v.push(row); + Ok(()) + } + async fn load_run( + &self, + workflow_id: &str, + ) -> Result, CanoError> { + let g = self.rows.lock().unwrap(); + let mut v = g.get(workflow_id).cloned().unwrap_or_default(); + v.sort_by_key(|r| r.sequence); + Ok(v) + } + async fn clear(&self, workflow_id: &str) -> Result<(), CanoError> { + self.rows.lock().unwrap().remove(workflow_id); + Ok(()) + } + } + + struct Forever { + closed_cancelled: Arc, + flushed_windows: Arc, + } + + #[task::stream] + impl StreamTask for Forever { + type Item = u64; + type Output = u64; + type Cursor = u64; + + fn window(&self) -> StreamWindow { + StreamWindow::Count(2) + } + + async fn open( + &self, + _res: &Resources, + _cursor: Option, + ) -> Result + Send>>, CanoError> { + // Effectively infinite source. 
+ Ok(Box::pin(stream::iter(0u64..)) as Pin + Send>>) + } + + async fn process_item(&self, _res: &Resources, item: u64) -> Result<(u64, u64), CanoError> { + tokio::time::sleep(std::time::Duration::from_millis(2)).await; + Ok((item, item)) + } + + async fn flush_window( + &self, + _res: &Resources, + _outputs: Vec, + ) -> Result, CanoError> { + self.flushed_windows.fetch_add(1, Ordering::SeqCst); + Ok(WindowSignal::Continue) + } + + async fn on_close( + &self, + _res: &Resources, + reason: CloseReason, + ) -> Result, CanoError> { + if reason == CloseReason::Cancelled { + self.closed_cancelled.store(true, Ordering::SeqCst); + } + Ok(TaskResult::Single(Step::Done)) + } + } + + #[tokio::test] + async fn cancel_drains_and_surfaces_cancelled() { + use crate::cancel::CancellationToken; + use crate::workflow::Workflow; + + let closed = Arc::new(AtomicBool::new(false)); + let task = Forever { + closed_cancelled: Arc::clone(&closed), + flushed_windows: Arc::new(AtomicU32::new(0)), + }; + let (handle, token) = CancellationToken::new(); + let workflow = Workflow::bare() + .register_stream(Step::Consume, task) + .add_exit_state(Step::Done); + + tokio::spawn(async move { + tokio::time::sleep(std::time::Duration::from_millis(25)).await; + handle.cancel(); + }); + + let result = workflow.orchestrate(Step::Consume, token).await; + assert!( + matches!(&result, Err(e) if e.category() == "cancelled"), + "a cancelled stream must surface as cancelled, got {result:?}" + ); + assert!( + closed.load(Ordering::SeqCst), + "on_close(Cancelled) must run (cooperative drain reached the close hook)" + ); + } + + struct Resumable { + opened: Arc>>>, + processed: Arc>>, + fail_third: Arc, + } + + #[task::stream] + impl StreamTask for Resumable { + type Item = u64; + type Output = u64; + type Cursor = u64; + + fn window(&self) -> StreamWindow { + StreamWindow::Count(2) + } + + async fn open( + &self, + _res: &Resources, + cursor: Option, + ) -> Result + Send>>, CanoError> { + 
self.opened.lock().unwrap().push(cursor); + let start = cursor.map(|c| c + 1).unwrap_or(1); + let items: Vec = (start..=6).collect(); + Ok(Box::pin(stream::iter(items)) as Pin + Send>>) + } + + async fn process_item(&self, _res: &Resources, item: u64) -> Result<(u64, u64), CanoError> { + self.processed.lock().unwrap().push(item); + Ok((item, item)) // cursor == item id + } + + async fn flush_window( + &self, + _res: &Resources, + outputs: Vec, + ) -> Result, CanoError> { + // Simulate a crash flushing the [5,6] window — only on the first run. + if outputs == vec![5u64, 6] && self.fail_third.swap(false, Ordering::SeqCst) { + return Err(CanoError::task_execution("simulated crash in window [5,6]")); + } + Ok(WindowSignal::Continue) + } + + async fn on_close( + &self, + _res: &Resources, + _reason: CloseReason, + ) -> Result, CanoError> { + Ok(TaskResult::Single(Step::Done)) + } + } + + #[tokio::test] + async fn persists_cursor_and_resumes() { + use crate::cancel::CancellationToken; + use crate::recovery::{CheckpointStore, RowKind}; + use crate::workflow::Workflow; + + let opened = Arc::new(Mutex::new(Vec::new())); + let processed = Arc::new(Mutex::new(Vec::new())); + let task = Resumable { + opened: Arc::clone(&opened), + processed: Arc::clone(&processed), + fail_third: Arc::new(AtomicBool::new(true)), + }; + let store = Arc::new(InMemoryStore::default()); + let workflow = Workflow::bare() + .register_stream(Step::Consume, task) + .add_exit_state(Step::Done) + .with_checkpoint_store(store.clone()) + .with_workflow_id("resume-test"); + + // Run 1: fails flushing window [5,6]. + let r1 = workflow + .orchestrate(Step::Consume, CancellationToken::disabled()) + .await; + assert!(r1.is_err(), "run 1 should fail mid-stream: {r1:?}"); + + // Windows [1,2] (cursor 2) and [3,4] (cursor 4) committed; [5,6] failed. 
+ let rows = store.load_run("resume-test").await.unwrap(); + let cursors: Vec = rows + .iter() + .filter(|r| r.kind == RowKind::StepCursor) + .map(|r| serde_json::from_slice::(r.output_blob.as_ref().unwrap()).unwrap()) + .collect(); + assert_eq!( + cursors, + vec![2, 4], + "only fully-flushed windows commit a cursor" + ); + + // Resume: re-open at cursor 4 and finish [5,6]. + let r2 = workflow + .resume_from("resume-test", CancellationToken::disabled()) + .await + .unwrap(); + assert_eq!(r2, Step::Done); + + assert_eq!(*opened.lock().unwrap(), vec![None, Some(4)]); + assert_eq!( + *processed.lock().unwrap(), + vec![1u64, 2, 3, 4, 5, 6, 5, 6], + "resume reprocesses only the items after the committed cursor" + ); + } + + // ----------------------------------------------------------------------- + // Fix 1: WindowSignal::Stop from the terminal (exhaustion) partial flush is honored. + // ----------------------------------------------------------------------- + + #[derive(Debug, Clone, PartialEq, Eq, Hash)] + enum S3 { + Consume, + ViaStop, + ViaClose, + } + + struct StopOnFinalWindow; + + #[task::stream] + impl StreamTask for StopOnFinalWindow { + type Item = u32; + type Output = u32; + type Cursor = u64; + + fn window(&self) -> StreamWindow { + StreamWindow::Count(3) // never fills for a 2-item stream → terminal partial flush + } + + async fn open( + &self, + _res: &Resources, + _cursor: Option, + ) -> Result + Send>>, CanoError> { + Ok(Box::pin(stream::iter(vec![1u32, 2])) as Pin + Send>>) + } + + async fn process_item(&self, _res: &Resources, item: u32) -> Result<(u32, u64), CanoError> { + Ok((item, item as u64)) + } + + async fn flush_window( + &self, + _res: &Resources, + _outputs: Vec, + ) -> Result, CanoError> { + Ok(WindowSignal::Stop(TaskResult::Single(S3::ViaStop))) + } + + async fn on_close( + &self, + _res: &Resources, + _reason: CloseReason, + ) -> Result, CanoError> { + // Must NOT run: the terminal partial flush returned Stop. 
+ Ok(TaskResult::Single(S3::ViaClose)) + } + } + + #[tokio::test] + async fn terminal_flush_stop_is_honored() { + use crate::cancel::CancellationToken; + use crate::workflow::Workflow; + + let workflow = Workflow::bare() + .register_stream(S3::Consume, StopOnFinalWindow) + .add_exit_states([S3::ViaStop, S3::ViaClose]); + let result = workflow + .orchestrate(S3::Consume, CancellationToken::disabled()) + .await + .unwrap(); + assert_eq!( + result, + S3::ViaStop, + "a Stop from the final partial flush must win over on_close(Exhausted)" + ); + } + + // ----------------------------------------------------------------------- + // Fix 2: cooperative cancel fires on_cancelled exactly once. + // ----------------------------------------------------------------------- + + #[derive(Default)] + struct CancelCounter { + cancels: AtomicU32, + } + + impl crate::observer::WorkflowObserver for CancelCounter { + fn on_cancelled(&self, _state: &str) { + self.cancels.fetch_add(1, Ordering::SeqCst); + } + } + + #[tokio::test] + async fn cancel_fires_on_cancelled_once() { + use crate::cancel::CancellationToken; + use crate::workflow::Workflow; + + let task = Forever { + closed_cancelled: Arc::new(AtomicBool::new(false)), + flushed_windows: Arc::new(AtomicU32::new(0)), + }; + let counter = Arc::new(CancelCounter::default()); + let (handle, token) = CancellationToken::new(); + let workflow = Workflow::bare() + .register_stream(Step::Consume, task) + .add_exit_state(Step::Done) + .with_observer(counter.clone()); + + tokio::spawn(async move { + tokio::time::sleep(std::time::Duration::from_millis(25)).await; + handle.cancel(); + }); + + let result = workflow.orchestrate(Step::Consume, token).await; + assert!(matches!(&result, Err(e) if e.category() == "cancelled")); + assert_eq!( + counter.cancels.load(Ordering::SeqCst), + 1, + "on_cancelled must fire exactly once on a stream cancel" + ); + } + + // ----------------------------------------------------------------------- + // Fix 4: 
config().attempt_timeout bounds each process_item. + // ----------------------------------------------------------------------- + + struct SlowItem; + + #[task::stream] + impl StreamTask for SlowItem { + type Item = u32; + type Output = u32; + type Cursor = u64; + + fn config(&self) -> TaskConfig { + TaskConfig::minimal().with_attempt_timeout(std::time::Duration::from_millis(10)) + } + + async fn open( + &self, + _res: &Resources, + _cursor: Option, + ) -> Result + Send>>, CanoError> { + Ok(Box::pin(stream::iter(vec![1u32])) as Pin + Send>>) + } + + async fn process_item(&self, _res: &Resources, item: u32) -> Result<(u32, u64), CanoError> { + // Far longer than the 10ms attempt_timeout. + tokio::time::sleep(std::time::Duration::from_secs(60)).await; + Ok((item, item as u64)) + } + + async fn flush_window( + &self, + _res: &Resources, + _outputs: Vec, + ) -> Result, CanoError> { + Ok(WindowSignal::Continue) + } + + async fn on_close( + &self, + _res: &Resources, + _reason: CloseReason, + ) -> Result, CanoError> { + Ok(TaskResult::Single(Step::Done)) + } + } + + #[tokio::test] + async fn attempt_timeout_bounds_process_item() { + use crate::cancel::CancellationToken; + use crate::workflow::Workflow; + + let workflow = Workflow::bare() + .register_stream(Step::Consume, SlowItem) + .add_exit_state(Step::Done); + // FailFast (default) → the timed-out item fails the run promptly. + let result = tokio::time::timeout( + std::time::Duration::from_secs(5), + workflow.orchestrate(Step::Consume, CancellationToken::disabled()), + ) + .await + .expect("attempt_timeout must bound the hung process_item well under 5s"); + assert!( + matches!(&result, Err(e) if e.category() == "timeout"), + "a process_item exceeding attempt_timeout must surface a timeout error, got {result:?}" + ); + } + + // ======================================================================= + // Edge-case coverage (audited against drive_window + execute_stream_task). 
+ // Shared helpers first, then one section per behavioural dimension. + // ======================================================================= + + /// A drop-safe channel-backed source. Polling `next()` borrows the receiver, so the + /// driver's `select!` dropping the in-flight `next()` future (which happens every time a + /// duration deadline wins the race) never loses a queued item — unlike + /// `stream::unfold(rx, …)`, whose future *owns* the receiver and would close the channel + /// on drop. This lets the duration-window tests feed items at controlled virtual times. + struct RecvStream(tokio::sync::mpsc::UnboundedReceiver); + + impl Stream for RecvStream { + type Item = u64; + fn poll_next( + self: Pin<&mut Self>, + cx: &mut std::task::Context<'_>, + ) -> std::task::Poll> { + self.get_mut().0.poll_recv(cx) + } + } + + /// The cursors (decoded as `u64`) committed during a run, in append order — captured at + /// append time so they survive the log `clear` a completed run performs. + fn step_cursors(store: &InMemoryStore) -> Vec { + store + .committed + .lock() + .unwrap() + .iter() + .map(|blob| serde_json::from_slice::(blob).unwrap()) + .collect() + } + + // ----------------------------------------------------------------------- + // Duration windowing — the entire tumbling-time path was undriven by tests. + // ----------------------------------------------------------------------- + + /// A `Duration`-windowed source fed from a channel; records the contents of each flush. 
+ struct DurationSource { + rx: Mutex>>, + windows: Arc>>>, + } + + #[task::stream] + impl StreamTask for DurationSource { + type Item = u64; + type Output = u64; + type Cursor = u64; + + fn window(&self) -> StreamWindow { + StreamWindow::Duration(std::time::Duration::from_millis(50)) + } + + async fn open( + &self, + _res: &Resources, + _cursor: Option, + ) -> Result + Send>>, CanoError> { + let rx = self.rx.lock().unwrap().take().expect("open called once"); + Ok(Box::pin(RecvStream(rx)) as Pin + Send>>) + } + + async fn process_item(&self, _res: &Resources, item: u64) -> Result<(u64, u64), CanoError> { + Ok((item, item)) + } + + async fn flush_window( + &self, + _res: &Resources, + outputs: Vec, + ) -> Result, CanoError> { + self.windows.lock().unwrap().push(outputs); + Ok(WindowSignal::Continue) + } + + async fn on_close( + &self, + _res: &Resources, + _reason: CloseReason, + ) -> Result, CanoError> { + Ok(TaskResult::Single(Step::Done)) + } + } + + #[tokio::test(start_paused = true)] + async fn duration_window_flushes_on_deadline_and_rearms() { + use crate::cancel::CancellationToken; + use crate::workflow::Workflow; + + // 50ms windows; items arrive at +30ms each (1@30, 2@60, 3@90, 4@120) under paused + // time. Deadlines tumble at 50/100/150ms → windows [1] (t=50), [2,3] (t=100), then + // the channel closes at t=120 so [4] flushes as the terminal partial on exhaustion. 
+ let (tx, rx) = tokio::sync::mpsc::unbounded_channel::(); + let windows = Arc::new(Mutex::new(Vec::new())); + let store = Arc::new(InMemoryStore::default()); + let task = DurationSource { + rx: Mutex::new(Some(rx)), + windows: Arc::clone(&windows), + }; + let workflow = Workflow::bare() + .register_stream(Step::Consume, task) + .add_exit_state(Step::Done) + .with_checkpoint_store(store.clone()) + .with_workflow_id("dur-rearm"); + + tokio::spawn(async move { + for v in 1u64..=4 { + tokio::time::sleep(std::time::Duration::from_millis(30)).await; + let _ = tx.send(v); + } + }); + + let result = workflow + .orchestrate(Step::Consume, CancellationToken::disabled()) + .await + .unwrap(); + assert_eq!(result, Step::Done); + assert_eq!( + *windows.lock().unwrap(), + vec![vec![1u64], vec![2, 3], vec![4]], + "tumbling duration windows re-arm after each Continue flush" + ); + assert_eq!( + step_cursors(&store), + vec![1u64, 3, 4], + "each duration flush commits its last item's cursor" + ); + } + + #[tokio::test(start_paused = true)] + async fn duration_window_skips_empty_intervals() { + use crate::cancel::CancellationToken; + use crate::workflow::Workflow; + + // 50ms windows but the source is idle until +120ms. The deadlines at 50/100ms fire + // with an empty buffer and must NOT emit a spurious flush — they re-arm and wait. + let (tx, rx) = tokio::sync::mpsc::unbounded_channel::(); + let windows = Arc::new(Mutex::new(Vec::new())); + let task = DurationSource { + rx: Mutex::new(Some(rx)), + windows: Arc::clone(&windows), + }; + let workflow = Workflow::bare() + .register_stream(Step::Consume, task) + .add_exit_state(Step::Done); + + tokio::spawn(async move { + tokio::time::sleep(std::time::Duration::from_millis(120)).await; + let _ = tx.send(1); + // tx dropped here → channel closes, run exhausts. 
+ }); + + let result = workflow + .orchestrate(Step::Consume, CancellationToken::disabled()) + .await + .unwrap(); + assert_eq!(result, Step::Done); + let w = windows.lock().unwrap(); + assert!( + w.iter().all(|win| !win.is_empty()), + "an idle duration window must never flush an empty buffer: {w:?}" + ); + assert_eq!( + *w, + vec![vec![1u64]], + "exactly one real window despite two elapsed-but-empty deadlines" + ); + } + + struct DurationStop { + rx: Mutex>>, + on_close_ran: Arc, + } + + #[task::stream] + impl StreamTask for DurationStop { + type Item = u64; + type Output = u64; + type Cursor = u64; + + fn window(&self) -> StreamWindow { + StreamWindow::Duration(std::time::Duration::from_millis(50)) + } + + async fn open( + &self, + _res: &Resources, + _cursor: Option, + ) -> Result + Send>>, CanoError> { + let rx = self.rx.lock().unwrap().take().expect("open called once"); + Ok(Box::pin(RecvStream(rx)) as Pin + Send>>) + } + + async fn process_item(&self, _res: &Resources, item: u64) -> Result<(u64, u64), CanoError> { + Ok((item, item)) + } + + async fn flush_window( + &self, + _res: &Resources, + _outputs: Vec, + ) -> Result, CanoError> { + Ok(WindowSignal::Stop(TaskResult::Single(S3::ViaStop))) + } + + async fn on_close( + &self, + _res: &Resources, + _reason: CloseReason, + ) -> Result, CanoError> { + self.on_close_ran.store(true, Ordering::SeqCst); + Ok(TaskResult::Single(S3::ViaClose)) + } + } + + #[tokio::test(start_paused = true)] + async fn duration_window_stop_transitions_without_close() { + use crate::cancel::CancellationToken; + use crate::workflow::Workflow; + + // One item buffered, then the 50ms deadline fires and flush_window returns Stop: + // the FSM must transition to that result, NOT fall through to on_close. 
+ let (tx, rx) = tokio::sync::mpsc::unbounded_channel::(); + let on_close_ran = Arc::new(AtomicBool::new(false)); + let task = DurationStop { + rx: Mutex::new(Some(rx)), + on_close_ran: Arc::clone(&on_close_ran), + }; + let workflow = Workflow::bare() + .register_stream(S3::Consume, task) + .add_exit_states([S3::ViaStop, S3::ViaClose]); + + tokio::spawn(async move { + tokio::time::sleep(std::time::Duration::from_millis(30)).await; + let _ = tx.send(1); + // Hold the sender open past the deadline so the *duration* tick (not exhaustion) + // drives the flush. + tokio::time::sleep(std::time::Duration::from_secs(3600)).await; + drop(tx); + }); + + let result = workflow + .orchestrate(S3::Consume, CancellationToken::disabled()) + .await + .unwrap(); + assert_eq!( + result, + S3::ViaStop, + "a duration-window Stop wins over on_close" + ); + assert!( + !on_close_ran.load(Ordering::SeqCst), + "on_close must not run when a duration window returns Stop" + ); + } + + // ----------------------------------------------------------------------- + // Per-item error policy: RetryOnError (previously untested) + timeout×policy. + // ----------------------------------------------------------------------- + + /// Yields `1..=len`; `process_item` fails for any id in `fail`. Records flushed outputs. 
+ struct ScriptedErrors { + len: u64, + fail: Vec, + policy: StreamErrorPolicy, + flushed: Arc>>, + } + + #[task::stream] + impl StreamTask for ScriptedErrors { + type Item = u64; + type Output = u64; + type Cursor = u64; + + fn on_item_error(&self) -> StreamErrorPolicy { + self.policy.clone() + } + + async fn open( + &self, + _res: &Resources, + _cursor: Option, + ) -> Result + Send>>, CanoError> { + let items: Vec = (1..=self.len).collect(); + Ok(Box::pin(stream::iter(items)) as Pin + Send>>) + } + + async fn process_item(&self, _res: &Resources, item: u64) -> Result<(u64, u64), CanoError> { + if self.fail.contains(&item) { + Err(CanoError::task_execution(format!("item {item} failed"))) + } else { + Ok((item, item)) + } + } + + async fn flush_window( + &self, + _res: &Resources, + outputs: Vec, + ) -> Result, CanoError> { + self.flushed.lock().unwrap().extend(outputs); + Ok(WindowSignal::Continue) + } + + async fn on_close( + &self, + _res: &Resources, + _reason: CloseReason, + ) -> Result, CanoError> { + Ok(TaskResult::Single(Step::Done)) + } + } + + #[tokio::test] + async fn retry_on_error_tolerates_consecutive_within_max() { + // max_errors=2; items 2 and 3 fail consecutively (count reaches 2, == max) then 4 + // succeeds → the run survives and the good items are flushed. + let flushed = Arc::new(Mutex::new(Vec::new())); + let task = ScriptedErrors { + len: 4, + fail: vec![2, 3], + policy: StreamErrorPolicy::RetryOnError { max_errors: 2 }, + flushed: Arc::clone(&flushed), + }; + let res = Resources::new(); + let result = Task::run(&task, &res).await.unwrap(); + assert_eq!(result, TaskResult::Single(Step::Done)); + assert_eq!( + *flushed.lock().unwrap(), + vec![1u64, 4], + "only ok items flush" + ); + } + + #[tokio::test] + async fn retry_on_error_fails_past_max() { + // max_errors=1; two consecutive failures (count 1 then 2 > 1) fails the run. 
+ let task = ScriptedErrors { + len: 3, + fail: vec![2, 3], + policy: StreamErrorPolicy::RetryOnError { max_errors: 1 }, + flushed: Arc::new(Mutex::new(Vec::new())), + }; + let res = Resources::new(); + let err = Task::run(&task, &res).await.unwrap_err(); + assert_eq!(err.category(), "task_execution"); + } + + #[tokio::test] + async fn retry_on_error_counter_resets_on_success() { + // max_errors=2; pattern ok,err,err,ok,err,err. Without the reset-on-success the 5th + // item would push the count to 3 (>2) and fail; because item 4 resets it to 0, the + // run completes — proving the tolerance is on *consecutive* errors only. + let task = ScriptedErrors { + len: 6, + fail: vec![2, 3, 5, 6], + policy: StreamErrorPolicy::RetryOnError { max_errors: 2 }, + flushed: Arc::new(Mutex::new(Vec::new())), + }; + let res = Resources::new(); + let result = Task::run(&task, &res).await.unwrap(); + assert_eq!(result, TaskResult::Single(Step::Done)); + } + + #[tokio::test] + async fn retry_on_error_max_zero_fails_on_first() { + // max_errors=0 behaves like FailFast: the first error (count 1 > 0) fails the run. + let task = ScriptedErrors { + len: 3, + fail: vec![2], + policy: StreamErrorPolicy::RetryOnError { max_errors: 0 }, + flushed: Arc::new(Mutex::new(Vec::new())), + }; + let res = Resources::new(); + let err = Task::run(&task, &res).await.unwrap_err(); + assert_eq!(err.category(), "task_execution"); + } + + /// `process_item` hangs on item 1; an `attempt_timeout` turns that hang into a + /// timeout item error that the policy then governs.
+ struct SlowUnderPolicy { + policy: StreamErrorPolicy, + flushed: Arc>>, + } + + #[task::stream] + impl StreamTask for SlowUnderPolicy { + type Item = u64; + type Output = u64; + type Cursor = u64; + + fn config(&self) -> TaskConfig { + TaskConfig::minimal().with_attempt_timeout(std::time::Duration::from_millis(10)) + } + + fn on_item_error(&self) -> StreamErrorPolicy { + self.policy.clone() + } + + async fn open( + &self, + _res: &Resources, + _cursor: Option, + ) -> Result + Send>>, CanoError> { + Ok(Box::pin(stream::iter(vec![1u64, 2, 3])) as Pin + Send>>) + } + + async fn process_item(&self, _res: &Resources, item: u64) -> Result<(u64, u64), CanoError> { + if item == 1 { + // Far longer than the 10ms attempt_timeout → a timeout item error. + tokio::time::sleep(std::time::Duration::from_secs(60)).await; + } + Ok((item, item)) + } + + async fn flush_window( + &self, + _res: &Resources, + outputs: Vec, + ) -> Result, CanoError> { + self.flushed.lock().unwrap().extend(outputs); + Ok(WindowSignal::Continue) + } + + async fn on_close( + &self, + _res: &Resources, + _reason: CloseReason, + ) -> Result, CanoError> { + Ok(TaskResult::Single(Step::Done)) + } + } + + #[tokio::test] + async fn timeout_item_skipped_under_skip_and_continue() { + // The timed-out item 1 is treated as an ordinary item error and dropped; 2 and 3 + // process normally and the run completes. + let flushed = Arc::new(Mutex::new(Vec::new())); + let task = SlowUnderPolicy { + policy: StreamErrorPolicy::SkipAndContinue, + flushed: Arc::clone(&flushed), + }; + let res = Resources::new(); + let result = Task::run(&task, &res).await.unwrap(); + assert_eq!(result, TaskResult::Single(Step::Done)); + assert_eq!( + *flushed.lock().unwrap(), + vec![2u64, 3], + "the timed-out item is skipped, not fatal" + ); + } + + #[tokio::test] + async fn timeout_item_counts_under_retry_on_error() { + // With max_errors=0 the timeout item-error fails the run, surfacing as a timeout. 
+ let task = SlowUnderPolicy { + policy: StreamErrorPolicy::RetryOnError { max_errors: 0 }, + flushed: Arc::new(Mutex::new(Vec::new())), + }; + let res = Resources::new(); + let err = Task::run(&task, &res).await.unwrap_err(); + assert_eq!(err.category(), "timeout"); + } + + // ----------------------------------------------------------------------- + // SkipAndContinue cursor + Count(0) clamp. + // ----------------------------------------------------------------------- + + #[tokio::test] + async fn skip_does_not_commit_bad_item_cursor() { + use crate::cancel::CancellationToken; + use crate::workflow::Workflow; + + // Count(1) over [1,2,3] with item 2 failing under SkipAndContinue: cursors 1 and 3 + // commit but 2 never does — a skipped item does not advance the persisted cursor. + let task = ScriptedErrors { + len: 3, + fail: vec![2], + policy: StreamErrorPolicy::SkipAndContinue, + flushed: Arc::new(Mutex::new(Vec::new())), + }; + let store = Arc::new(InMemoryStore::default()); + let workflow = Workflow::bare() + .register_stream(Step::Consume, task) + .add_exit_state(Step::Done) + .with_checkpoint_store(store.clone()) + .with_workflow_id("skip-cursor"); + let result = workflow + .orchestrate(Step::Consume, CancellationToken::disabled()) + .await + .unwrap(); + assert_eq!(result, Step::Done); + assert_eq!( + step_cursors(&store), + vec![1u64, 3], + "the skipped item's cursor (2) is never committed" + ); + } + + struct CountWindowSource { + window: StreamWindow, + windows: Arc>>>, + } + + #[task::stream] + impl StreamTask for CountWindowSource { + type Item = u64; + type Output = u64; + type Cursor = u64; + + fn window(&self) -> StreamWindow { + self.window.clone() + } + + async fn open( + &self, + _res: &Resources, + _cursor: Option, + ) -> Result + Send>>, CanoError> { + Ok(Box::pin(stream::iter(vec![1u64, 2, 3])) as Pin + Send>>) + } + + async fn process_item(&self, _res: &Resources, item: u64) -> Result<(u64, u64), CanoError> { + Ok((item, item)) + } + + async 
fn flush_window( + &self, + _res: &Resources, + outputs: Vec, + ) -> Result, CanoError> { + self.windows.lock().unwrap().push(outputs); + Ok(WindowSignal::Continue) + } + + async fn on_close( + &self, + _res: &Resources, + _reason: CloseReason, + ) -> Result, CanoError> { + Ok(TaskResult::Single(Step::Done)) + } + } + + #[tokio::test] + async fn count_zero_window_clamps_to_per_item() { + // Count(0) is clamped to a minimum of 1, so it flushes one item per window. + let windows = Arc::new(Mutex::new(Vec::new())); + let task = CountWindowSource { + window: StreamWindow::Count(0), + windows: Arc::clone(&windows), + }; + let res = Resources::new(); + let result = Task::run(&task, &res).await.unwrap(); + assert_eq!(result, TaskResult::Single(Step::Done)); + assert_eq!( + *windows.lock().unwrap(), + vec![vec![1u64], vec![2], vec![3]], + "Count(0) behaves like Count(1)" + ); + } + + // ----------------------------------------------------------------------- + // Natural termination & cursor commit (engine path). + // ----------------------------------------------------------------------- + + /// Yields `1..=len`; cursor == item. Counts flushes so a missing/extra flush is visible. 
+ struct CountingFlush { + len: u64, + window: StreamWindow, + flushes: Arc>>>, + } + + #[task::stream] + impl StreamTask for CountingFlush { + type Item = u64; + type Output = u64; + type Cursor = u64; + + fn window(&self) -> StreamWindow { + self.window.clone() + } + + async fn open( + &self, + _res: &Resources, + _cursor: Option, + ) -> Result + Send>>, CanoError> { + let items: Vec = (1..=self.len).collect(); + Ok(Box::pin(stream::iter(items)) as Pin + Send>>) + } + + async fn process_item(&self, _res: &Resources, item: u64) -> Result<(u64, u64), CanoError> { + Ok((item, item)) + } + + async fn flush_window( + &self, + _res: &Resources, + outputs: Vec, + ) -> Result, CanoError> { + self.flushes.lock().unwrap().push(outputs); + Ok(WindowSignal::Continue) + } + + async fn on_close( + &self, + _res: &Resources, + _reason: CloseReason, + ) -> Result, CanoError> { + Ok(TaskResult::Single(Step::Done)) + } + } + + #[tokio::test] + async fn empty_stream_closes_without_flush_or_cursor() { + use crate::cancel::CancellationToken; + use crate::workflow::Workflow; + + // open() yields nothing → on_close(Exhausted) runs, but no window flushes and no + // cursor commits. 
+ let flushes = Arc::new(Mutex::new(Vec::new())); + let task = CountingFlush { + len: 0, + window: StreamWindow::Count(2), + flushes: Arc::clone(&flushes), + }; + let store = Arc::new(InMemoryStore::default()); + let workflow = Workflow::bare() + .register_stream(Step::Consume, task) + .add_exit_state(Step::Done) + .with_checkpoint_store(store.clone()) + .with_workflow_id("empty"); + let result = workflow + .orchestrate(Step::Consume, CancellationToken::disabled()) + .await + .unwrap(); + assert_eq!(result, Step::Done); + assert!( + flushes.lock().unwrap().is_empty(), + "no flush for an empty source" + ); + assert!( + step_cursors(&store).is_empty(), + "no cursor committed when nothing is processed" + ); + } + + #[tokio::test] + async fn exhaust_exact_divide_commits_only_full_window_cursors() { + use crate::cancel::CancellationToken; + use crate::workflow::Workflow; + + // Count(2) over exactly 4 items: windows [1,2] and [3,4] flush; exhaustion finds an + // empty buffer so on_close runs without a third flush and no extra cursor commits. 
+ let flushes = Arc::new(Mutex::new(Vec::new())); + let task = CountingFlush { + len: 4, + window: StreamWindow::Count(2), + flushes: Arc::clone(&flushes), + }; + let store = Arc::new(InMemoryStore::default()); + let workflow = Workflow::bare() + .register_stream(Step::Consume, task) + .add_exit_state(Step::Done) + .with_checkpoint_store(store.clone()) + .with_workflow_id("exact"); + let result = workflow + .orchestrate(Step::Consume, CancellationToken::disabled()) + .await + .unwrap(); + assert_eq!(result, Step::Done); + assert_eq!(*flushes.lock().unwrap(), vec![vec![1u64, 2], vec![3, 4]]); + assert_eq!(step_cursors(&store), vec![2u64, 4]); + } + + #[tokio::test] + async fn exhaust_partial_window_commits_its_cursor() { + use crate::cancel::CancellationToken; + use crate::workflow::Workflow; + + // Count(2) over 5 items: [1,2], [3,4], then the terminal partial [5] flushes on + // exhaustion and its cursor (5) commits before on_close transitions. + let flushes = Arc::new(Mutex::new(Vec::new())); + let task = CountingFlush { + len: 5, + window: StreamWindow::Count(2), + flushes: Arc::clone(&flushes), + }; + let store = Arc::new(InMemoryStore::default()); + let workflow = Workflow::bare() + .register_stream(Step::Consume, task) + .add_exit_state(Step::Done) + .with_checkpoint_store(store.clone()) + .with_workflow_id("partial"); + let result = workflow + .orchestrate(Step::Consume, CancellationToken::disabled()) + .await + .unwrap(); + assert_eq!(result, Step::Done); + assert_eq!( + *flushes.lock().unwrap(), + vec![vec![1u64, 2], vec![3, 4], vec![5]] + ); + assert_eq!(step_cursors(&store), vec![2u64, 4, 5]); + } + + // ----------------------------------------------------------------------- + // Cancellation drain semantics. + // ----------------------------------------------------------------------- + + /// Self-cancels from `process_item` once `cancel_after` items have been seen, so a + /// partial (sub-window) buffer is in flight when the token fires. 
Records flushes/close. + struct CancelMidWindow { + handle: crate::cancel::CancellationHandle, + cancel_after: u32, + seen: AtomicU32, + window: StreamWindow, + stop_on_flush: bool, + flushed: Arc>>>, + close_reason: Arc>>, + } + + #[task::stream] + impl StreamTask for CancelMidWindow { + type Item = u64; + type Output = u64; + type Cursor = u64; + + fn window(&self) -> StreamWindow { + self.window.clone() + } + + async fn open( + &self, + _res: &Resources, + _cursor: Option, + ) -> Result + Send>>, CanoError> { + Ok(Box::pin(stream::iter(0u64..)) as Pin + Send>>) + } + + async fn process_item(&self, _res: &Resources, item: u64) -> Result<(u64, u64), CanoError> { + let n = self.seen.fetch_add(1, Ordering::SeqCst) + 1; + if n == self.cancel_after { + self.handle.cancel(); + } + Ok((item, item)) + } + + async fn flush_window( + &self, + _res: &Resources, + outputs: Vec, + ) -> Result, CanoError> { + self.flushed.lock().unwrap().push(outputs); + if self.stop_on_flush { + Ok(WindowSignal::Stop(TaskResult::Single(S3::ViaStop))) + } else { + Ok(WindowSignal::Continue) + } + } + + async fn on_close( + &self, + _res: &Resources, + reason: CloseReason, + ) -> Result, CanoError> { + *self.close_reason.lock().unwrap() = Some(reason); + Ok(TaskResult::Single(S3::ViaClose)) + } + } + + #[tokio::test] + async fn cancel_flushes_partial_in_flight_window() { + use crate::cancel::CancellationToken; + use crate::workflow::Workflow; + + // Count(3) but cancel fires after the 2nd item: the in-flight partial [0,1] is + // flushed, on_close(Cancelled) runs, and the run ends as cancelled. The biased + // select also proves cancel wins over the ready 3rd stream item. 
+ let flushed = Arc::new(Mutex::new(Vec::new())); + let close_reason = Arc::new(Mutex::new(None)); + let (handle, token) = CancellationToken::new(); + let task = CancelMidWindow { + handle, + cancel_after: 2, + seen: AtomicU32::new(0), + window: StreamWindow::Count(3), + stop_on_flush: false, + flushed: Arc::clone(&flushed), + close_reason: Arc::clone(&close_reason), + }; + let workflow = Workflow::bare() + .register_stream(S3::Consume, task) + .add_exit_states([S3::ViaStop, S3::ViaClose]); + let result = workflow.orchestrate(S3::Consume, token).await; + assert!( + matches!(&result, Err(e) if e.category() == "cancelled"), + "got {result:?}" + ); + assert_eq!( + *flushed.lock().unwrap(), + vec![vec![0u64, 1]], + "the partial window flushes once on cancel" + ); + assert_eq!(*close_reason.lock().unwrap(), Some(CloseReason::Cancelled)); + } + + #[tokio::test] + async fn cancel_ignores_stop_from_partial_flush() { + use crate::cancel::CancellationToken; + use crate::workflow::Workflow; + + // The cancel-drain flush returns Stop(ViaStop); it must be ignored — the run still + // ends cancelled rather than transitioning to ViaStop. + let (handle, token) = CancellationToken::new(); + let task = CancelMidWindow { + handle, + cancel_after: 2, + seen: AtomicU32::new(0), + window: StreamWindow::Count(3), + stop_on_flush: true, + flushed: Arc::new(Mutex::new(Vec::new())), + close_reason: Arc::new(Mutex::new(None)), + }; + let workflow = Workflow::bare() + .register_stream(S3::Consume, task) + .add_exit_states([S3::ViaStop, S3::ViaClose]); + let result = workflow.orchestrate(S3::Consume, token).await; + assert!( + matches!(&result, Err(e) if e.category() == "cancelled"), + "Stop from the cancel-drain flush must not transition, got {result:?}" + ); + } + + /// Self-cancels from `flush_window` (after a full window), so the next loop observes the + /// cancel with an *empty* buffer. `on_close` may return an error to test propagation. 
+ struct CancelAfterWindow { + handle: crate::cancel::CancellationHandle, + flushes: AtomicU32, + close_errors: bool, + closed_cancelled: Arc, + } + + #[task::stream] + impl StreamTask for CancelAfterWindow { + type Item = u64; + type Output = u64; + type Cursor = u64; + + async fn open( + &self, + _res: &Resources, + _cursor: Option, + ) -> Result + Send>>, CanoError> { + Ok(Box::pin(stream::iter(0u64..)) as Pin + Send>>) + } + + async fn process_item(&self, _res: &Resources, item: u64) -> Result<(u64, u64), CanoError> { + Ok((item, item)) + } + + async fn flush_window( + &self, + _res: &Resources, + _outputs: Vec, + ) -> Result, CanoError> { + // Default Count(1): fire cancel after the first full window so the next iteration + // drains with an empty buffer. + self.flushes.fetch_add(1, Ordering::SeqCst); + self.handle.cancel(); + Ok(WindowSignal::Continue) + } + + async fn on_close( + &self, + _res: &Resources, + reason: CloseReason, + ) -> Result, CanoError> { + if reason == CloseReason::Cancelled { + self.closed_cancelled.store(true, Ordering::SeqCst); + if self.close_errors { + return Err(CanoError::task_execution("cleanup failed")); + } + } + Ok(TaskResult::Single(Step::Done)) + } + } + + #[tokio::test] + async fn cancel_with_empty_buffer_skips_flush_but_runs_close() { + use crate::cancel::CancellationToken; + use crate::workflow::Workflow; + + let (handle, token) = CancellationToken::new(); + let closed = Arc::new(AtomicBool::new(false)); + let task = CancelAfterWindow { + handle, + flushes: AtomicU32::new(0), + close_errors: false, + closed_cancelled: Arc::clone(&closed), + }; + let workflow = Workflow::bare() + .register_stream(Step::Consume, task) + .add_exit_state(Step::Done); + let result = workflow.orchestrate(Step::Consume, token).await; + assert!( + matches!(&result, Err(e) if e.category() == "cancelled"), + "got {result:?}" + ); + assert!( + closed.load(Ordering::SeqCst), + "on_close(Cancelled) still runs with an empty buffer" + ); + } + + 
#[tokio::test] + async fn cancel_propagates_on_close_error() { + use crate::cancel::CancellationToken; + use crate::workflow::Workflow; + + // An Err from on_close(Cancelled) surfaces as that error, not as a generic cancel. + let (handle, token) = CancellationToken::new(); + let task = CancelAfterWindow { + handle, + flushes: AtomicU32::new(0), + close_errors: true, + closed_cancelled: Arc::new(AtomicBool::new(false)), + }; + let workflow = Workflow::bare() + .register_stream(Step::Consume, task) + .add_exit_state(Step::Done); + let err = workflow + .orchestrate(Step::Consume, token) + .await + .unwrap_err(); + assert_eq!(err.category(), "task_execution"); + assert!(err.to_string().contains("cleanup failed"), "got {err}"); + } + + /// Cancels mid-window with a committed-cursor source so the cancel commits a final + /// cursor; a fresh-cursor `open` lets resume continue from it. + struct CancelThenResume { + handle: crate::cancel::CancellationHandle, + seen: AtomicU32, + opened_cursors: Arc>>>, + } + + #[task::stream] + impl StreamTask for CancelThenResume { + type Item = u64; + type Output = u64; + type Cursor = u64; + + fn window(&self) -> StreamWindow { + StreamWindow::Count(3) + } + + async fn open( + &self, + _res: &Resources, + cursor: Option, + ) -> Result + Send>>, CanoError> { + self.opened_cursors.lock().unwrap().push(cursor); + // First run: long source so cancel lands mid-window. Resume: empty tail → clean + // exhaustion (the run is already past the cancel point). 
+ let items: Vec = match cursor { + None => (0u64..1000).collect(), + Some(_) => Vec::new(), + }; + Ok(Box::pin(stream::iter(items)) as Pin + Send>>) + } + + async fn process_item(&self, _res: &Resources, item: u64) -> Result<(u64, u64), CanoError> { + let n = self.seen.fetch_add(1, Ordering::SeqCst) + 1; + if n == 2 { + self.handle.cancel(); + } + Ok((item, item)) + } + + async fn flush_window( + &self, + _res: &Resources, + _outputs: Vec, + ) -> Result, CanoError> { + Ok(WindowSignal::Continue) + } + + async fn on_close( + &self, + _res: &Resources, + _reason: CloseReason, + ) -> Result, CanoError> { + Ok(TaskResult::Single(Step::Done)) + } + } + + #[tokio::test] + async fn cancel_commits_partial_cursor_and_resumes() { + use crate::cancel::CancellationToken; + use crate::workflow::Workflow; + + let (handle, token) = CancellationToken::new(); + let opened = Arc::new(Mutex::new(Vec::new())); + let task = CancelThenResume { + handle, + seen: AtomicU32::new(0), + opened_cursors: Arc::clone(&opened), + }; + let store = Arc::new(InMemoryStore::default()); + let workflow = Workflow::bare() + .register_stream(Step::Consume, task) + .add_exit_state(Step::Done) + .with_checkpoint_store(store.clone()) + .with_workflow_id("cancel-resume"); + + // Run 1: cancel after the 2nd item; the partial window [0,1] flushes and commits + // cursor 1. + let r1 = workflow.orchestrate(Step::Consume, token).await; + assert!( + matches!(&r1, Err(e) if e.category() == "cancelled"), + "got {r1:?}" + ); + assert_eq!( + step_cursors(&store), + vec![1u64], + "the cancelled run commits the in-flight window's final cursor" + ); + + // Resume: re-open at cursor 1, exhaust the empty tail, finish. 
+ let r2 = workflow + .resume_from("cancel-resume", CancellationToken::disabled()) + .await + .unwrap(); + assert_eq!(r2, Step::Done); + assert_eq!(*opened.lock().unwrap(), vec![None, Some(1)]); + } + + // ----------------------------------------------------------------------- + // Engine-arm error paths: Split rejection, corrupt cursor, append failure, panic. + // ----------------------------------------------------------------------- + + struct SplitOnClose; + + #[task::stream] + impl StreamTask for SplitOnClose { + type Item = u64; + type Output = u64; + type Cursor = u64; + + async fn open( + &self, + _res: &Resources, + _cursor: Option, + ) -> Result + Send>>, CanoError> { + Ok(Box::pin(stream::iter(vec![1u64])) as Pin + Send>>) + } + + async fn process_item(&self, _res: &Resources, item: u64) -> Result<(u64, u64), CanoError> { + Ok((item, item)) + } + + async fn flush_window( + &self, + _res: &Resources, + _outputs: Vec, + ) -> Result, CanoError> { + Ok(WindowSignal::Continue) + } + + async fn on_close( + &self, + _res: &Resources, + _reason: CloseReason, + ) -> Result, CanoError> { + Ok(TaskResult::Split(vec![Step::Done, Step::Done])) + } + } + + #[tokio::test] + async fn stream_split_result_is_rejected() { + use crate::cancel::CancellationToken; + use crate::workflow::Workflow; + + let workflow = Workflow::bare() + .register_stream(Step::Consume, SplitOnClose) + .add_exit_state(Step::Done); + let err = workflow + .orchestrate(Step::Consume, CancellationToken::disabled()) + .await + .unwrap_err(); + assert_eq!(err.category(), "workflow"); + assert!(err.to_string().contains("split"), "got {err}"); + } + + /// Count(1) over [1,2]; the [2] window flush fails so run 1 crashes after committing + /// cursor 1 — giving a StepCursor row to corrupt before resume. 
+ struct CrashAfterFirstWindow; + + #[task::stream] + impl StreamTask for CrashAfterFirstWindow { + type Item = u64; + type Output = u64; + type Cursor = u64; + + async fn open( + &self, + _res: &Resources, + _cursor: Option, + ) -> Result + Send>>, CanoError> { + Ok(Box::pin(stream::iter(vec![1u64, 2])) as Pin + Send>>) + } + + async fn process_item(&self, _res: &Resources, item: u64) -> Result<(u64, u64), CanoError> { + Ok((item, item)) + } + + async fn flush_window( + &self, + _res: &Resources, + outputs: Vec, + ) -> Result, CanoError> { + if outputs == vec![2u64] { + return Err(CanoError::task_execution("crash")); + } + Ok(WindowSignal::Continue) + } + + async fn on_close( + &self, + _res: &Resources, + _reason: CloseReason, + ) -> Result, CanoError> { + Ok(TaskResult::Single(Step::Done)) + } + } + + #[tokio::test] + async fn corrupt_cursor_fails_to_deserialize_on_resume() { + use crate::cancel::CancellationToken; + use crate::workflow::Workflow; + + let store = Arc::new(InMemoryStore::default()); + let workflow = Workflow::bare() + .register_stream(Step::Consume, CrashAfterFirstWindow) + .add_exit_state(Step::Done) + .with_checkpoint_store(store.clone()) + .with_workflow_id("corrupt"); + + // Run 1 commits cursor 1 then crashes flushing [2]; the log is kept for resume. + let r1 = workflow + .orchestrate(Step::Consume, CancellationToken::disabled()) + .await; + assert!(r1.is_err(), "run 1 should crash: {r1:?}"); + + // Corrupt the committed cursor blob to invalid JSON. 
+ { + let mut g = store.rows.lock().unwrap(); + for row in g.get_mut("corrupt").unwrap().iter_mut() { + if row.kind == crate::recovery::RowKind::StepCursor { + row.output_blob = Some(b"not-json".to_vec()); + } + } + } + + let err = workflow + .resume_from("corrupt", CancellationToken::disabled()) + .await + .unwrap_err(); + assert_eq!(err.category(), "task_execution"); + assert!( + err.to_string().contains("deserialize stream cursor"), + "got {err}" + ); + } + + /// A store that accepts `StateEntry` rows but rejects every `StepCursor` append. + #[derive(Default)] + struct CursorAppendFails; + + #[crate::checkpoint_store] + impl crate::recovery::CheckpointStore for CursorAppendFails { + async fn append( + &self, + _workflow_id: &str, + row: crate::recovery::CheckpointRow, + ) -> Result<(), CanoError> { + if row.kind == crate::recovery::RowKind::StepCursor { + Err(CanoError::checkpoint_store("disk full")) + } else { + Ok(()) + } + } + async fn load_run( + &self, + _workflow_id: &str, + ) -> Result, CanoError> { + Ok(Vec::new()) + } + async fn clear(&self, _workflow_id: &str) -> Result<(), CanoError> { + Ok(()) + } + } + + #[tokio::test] + async fn checkpoint_append_failure_surfaces() { + use crate::cancel::CancellationToken; + use crate::workflow::Workflow; + + let task = CountingFlush { + len: 3, + window: StreamWindow::Count(1), + flushes: Arc::new(Mutex::new(Vec::new())), + }; + let workflow = Workflow::bare() + .register_stream(Step::Consume, task) + .add_exit_state(Step::Done) + .with_checkpoint_store(Arc::new(CursorAppendFails)) + .with_workflow_id("append-fail"); + let err = workflow + .orchestrate(Step::Consume, CancellationToken::disabled()) + .await + .unwrap_err(); + assert_eq!(err.category(), "checkpoint_store"); + assert!( + err.to_string().contains("append stream cursor checkpoint"), + "got {err}" + ); + } + + struct PanicInFlush; + + #[task::stream] + impl StreamTask for PanicInFlush { + type Item = u64; + type Output = u64; + type Cursor = u64; + + 
async fn open( + &self, + _res: &Resources, + _cursor: Option, + ) -> Result + Send>>, CanoError> { + Ok(Box::pin(stream::iter(vec![1u64])) as Pin + Send>>) + } + + async fn process_item(&self, _res: &Resources, item: u64) -> Result<(u64, u64), CanoError> { + Ok((item, item)) + } + + async fn flush_window( + &self, + _res: &Resources, + _outputs: Vec, + ) -> Result, CanoError> { + panic!("boom in flush_window"); + } + + async fn on_close( + &self, + _res: &Resources, + _reason: CloseReason, + ) -> Result, CanoError> { + Ok(TaskResult::Single(Step::Done)) + } + } + + #[tokio::test] + async fn panic_in_callback_becomes_error() { + use crate::cancel::CancellationToken; + use crate::workflow::Workflow; + + // The engine wraps the session in catch_panic_to_error: a panic becomes a CanoError + // (so resource teardown runs) instead of unwinding past the FSM. + let workflow = Workflow::bare() + .register_stream(Step::Consume, PanicInFlush) + .add_exit_state(Step::Done); + let err = workflow + .orchestrate(Step::Consume, CancellationToken::disabled()) + .await + .unwrap_err(); + assert_eq!(err.category(), "task_execution"); + assert!(err.to_string().contains("panic"), "got {err}"); + } + + // ----------------------------------------------------------------------- + // Config / observer surface. + // ----------------------------------------------------------------------- + + /// Fails every item under FailFast, counting how many times `open` is invoked. + struct AlwaysFails { + opened: Arc, + } + + #[task::stream] + impl StreamTask for AlwaysFails { + type Item = u64; + type Output = u64; + type Cursor = u64; + + fn config(&self) -> TaskConfig { + // max_attempts = 3 — must NOT be applied to a stream (no re-open / re-consume). 
+ TaskConfig::minimal().with_fixed_retry(2, std::time::Duration::from_millis(1)) + } + + async fn open( + &self, + _res: &Resources, + _cursor: Option, + ) -> Result + Send>>, CanoError> { + self.opened.fetch_add(1, Ordering::SeqCst); + Ok(Box::pin(stream::iter(vec![1u64])) as Pin + Send>>) + } + + async fn process_item( + &self, + _res: &Resources, + _item: u64, + ) -> Result<(u64, u64), CanoError> { + Err(CanoError::task_execution("always fails")) + } + + async fn flush_window( + &self, + _res: &Resources, + _outputs: Vec, + ) -> Result, CanoError> { + Ok(WindowSignal::Continue) + } + + async fn on_close( + &self, + _res: &Resources, + _reason: CloseReason, + ) -> Result, CanoError> { + Ok(TaskResult::Single(Step::Done)) + } + } + + #[tokio::test] + async fn outer_retry_not_applied_open_called_once() { + use crate::cancel::CancellationToken; + use crate::workflow::Workflow; + + let opened = Arc::new(AtomicU32::new(0)); + let task = AlwaysFails { + opened: Arc::clone(&opened), + }; + let workflow = Workflow::bare() + .register_stream(Step::Consume, task) + .add_exit_state(Step::Done); + let result = workflow + .orchestrate(Step::Consume, CancellationToken::disabled()) + .await; + assert!(result.is_err(), "FailFast item error fails the run"); + assert_eq!( + opened.load(Ordering::SeqCst), + 1, + "config().max_attempts must not re-open/re-consume the stream" + ); + } + + /// Records the task id passed to each observer hook, in order. 
+ #[derive(Default)] + struct EventLog { + events: Mutex>, + } + + impl crate::observer::WorkflowObserver for EventLog { + fn on_task_start(&self, task_id: &str) { + self.events.lock().unwrap().push(format!("start:{task_id}")); + } + fn on_task_success(&self, task_id: &str) { + self.events + .lock() + .unwrap() + .push(format!("success:{task_id}")); + } + fn on_task_failure(&self, _task_id: &str, _err: &CanoError) { + self.events.lock().unwrap().push("failure".to_string()); + } + fn on_cancelled(&self, _state: &str) { + self.events.lock().unwrap().push("cancelled".to_string()); + } + } + + struct NamedExhaust; + + #[task::stream] + impl StreamTask for NamedExhaust { + type Item = u64; + type Output = u64; + type Cursor = u64; + + fn name(&self) -> Cow<'static, str> { + Cow::Borrowed("my-custom-stream") + } + + async fn open( + &self, + _res: &Resources, + _cursor: Option, + ) -> Result + Send>>, CanoError> { + Ok(Box::pin(stream::iter(vec![1u64])) as Pin + Send>>) + } + + async fn process_item(&self, _res: &Resources, item: u64) -> Result<(u64, u64), CanoError> { + Ok((item, item)) + } + + async fn flush_window( + &self, + _res: &Resources, + _outputs: Vec, + ) -> Result, CanoError> { + Ok(WindowSignal::Continue) + } + + async fn on_close( + &self, + _res: &Resources, + _reason: CloseReason, + ) -> Result, CanoError> { + Ok(TaskResult::Single(Step::Done)) + } + } + + #[tokio::test] + async fn name_override_forwarded_to_observer() { + use crate::cancel::CancellationToken; + use crate::workflow::Workflow; + + let log = Arc::new(EventLog::default()); + let workflow = Workflow::bare() + .register_stream(Step::Consume, NamedExhaust) + .add_exit_state(Step::Done) + .with_observer(log.clone()); + workflow + .orchestrate(Step::Consume, CancellationToken::disabled()) + .await + .unwrap(); + let events = log.events.lock().unwrap(); + assert_eq!( + *events, + vec![ + "start:my-custom-stream".to_string(), + "success:my-custom-stream".to_string() + ], + "the StreamTask name() 
override reaches observer hooks" + ); + } + + #[tokio::test] + async fn cancel_fires_full_observer_sequence() { + use crate::cancel::CancellationToken; + use crate::workflow::Workflow; + + // A cancelled stream fires on_task_start, then (since cancel surfaces as Err) + // on_task_failure, then on_cancelled — in that order, exactly once each. + let (handle, token) = CancellationToken::new(); + let task = CancelAfterWindow { + handle, + flushes: AtomicU32::new(0), + close_errors: false, + closed_cancelled: Arc::new(AtomicBool::new(false)), + }; + let log = Arc::new(EventLog::default()); + let workflow = Workflow::bare() + .register_stream(Step::Consume, task) + .add_exit_state(Step::Done) + .with_observer(log.clone()); + let result = workflow.orchestrate(Step::Consume, token).await; + assert!(matches!(&result, Err(e) if e.category() == "cancelled")); + let events = log.events.lock().unwrap(); + assert_eq!(events.len(), 3, "exactly three hooks fire, got {events:?}"); + assert!( + events[0].starts_with("start:") && events[0].contains("CancelAfterWindow"), + "first hook is on_task_start, got {events:?}" + ); + assert_eq!( + &events[1..], + &["failure".to_string(), "cancelled".to_string()], + "cancel fires start → failure → cancelled, got {events:?}" + ); + } + + #[tokio::test] + async fn register_stream_without_store_completes() { + use crate::cancel::CancellationToken; + use crate::workflow::Workflow; + + // register_stream with neither a checkpoint store nor a workflow id: cursor + // persistence is simply skipped and the run completes normally. 
+ let task = CountingFlush { + len: 3, + window: StreamWindow::Count(2), + flushes: Arc::new(Mutex::new(Vec::new())), + }; + let workflow = Workflow::bare() + .register_stream(Step::Consume, task) + .add_exit_state(Step::Done); + let result = workflow + .orchestrate(Step::Consume, CancellationToken::disabled()) + .await + .unwrap(); + assert_eq!(result, Step::Done); + } +} + +#[cfg(all(test, feature = "metrics"))] +mod metrics_tests { + use super::*; + use crate::cancel::CancellationToken; + use crate::metrics::test_support::*; + use crate::task; + use crate::task::Task; + use crate::workflow::Workflow; + use futures_util::stream; + + #[derive(Debug, Clone, PartialEq, Eq, Hash)] + enum St { + Consume, + Done, + } + + struct FiveItems; + + #[task::stream] + impl StreamTask for FiveItems { + type Item = u32; + type Output = u32; + type Cursor = u64; + + fn window(&self) -> StreamWindow { + StreamWindow::Count(2) + } + + async fn open( + &self, + _res: &Resources, + _cursor: Option, + ) -> Result + Send>>, CanoError> { + Ok(Box::pin(stream::iter(vec![1u32, 2, 3, 4, 5])) + as Pin + Send>>) + } + + async fn process_item(&self, _res: &Resources, item: u32) -> Result<(u32, u64), CanoError> { + Ok((item, item as u64)) + } + + async fn flush_window( + &self, + _res: &Resources, + _outputs: Vec, + ) -> Result, CanoError> { + Ok(WindowSignal::Continue) + } + + async fn on_close( + &self, + _res: &Resources, + _reason: CloseReason, + ) -> Result, CanoError> { + Ok(TaskResult::Single(St::Done)) + } + } + + #[test] + fn stream_metrics_counted_correctly() { + let (result, rows) = run_with_recorder(|| async { + let workflow = Workflow::bare() + .register_stream(St::Consume, FiveItems) + .add_exit_state(St::Done); + workflow + .orchestrate(St::Consume, CancellationToken::disabled()) + .await + }); + assert!(result.is_ok(), "workflow should succeed: {result:?}"); + assert_eq!( + counter(&rows, "cano_stream_runs_total", &[("outcome", "completed")]), + 1, + "one completed stream run" 
+ ); + // Count(2) over 5 items → windows [1,2], [3,4], then partial [5] on close. + assert_eq!( + counter(&rows, "cano_stream_windows_total", &[]), + 3, + "three windows flushed" + ); + assert_eq!( + counter(&rows, "cano_stream_items_total", &[("result", "ok")]), + 5, + "five ok items" + ); + } + + /// Cancels itself after the first window (deterministic — no spawn/sleep). + struct SelfCancel { + handle: crate::cancel::CancellationHandle, + } + + #[task::stream] + impl StreamTask for SelfCancel { + type Item = u32; + type Output = u32; + type Cursor = u64; + + async fn open( + &self, + _res: &Resources, + _cursor: Option, + ) -> Result + Send>>, CanoError> { + Ok(Box::pin(stream::iter(0u32..)) as Pin + Send>>) + } + + async fn process_item(&self, _res: &Resources, item: u32) -> Result<(u32, u64), CanoError> { + Ok((item, item as u64)) + } + + async fn flush_window( + &self, + _res: &Resources, + _outputs: Vec, + ) -> Result, CanoError> { + // Default window is Count(1); fire cancel after the first window — the next + // loop iteration observes it and drains cooperatively. + self.handle.cancel(); + Ok(WindowSignal::Continue) + } + + async fn on_close( + &self, + _res: &Resources, + _reason: CloseReason, + ) -> Result, CanoError> { + Ok(TaskResult::Single(St::Done)) + } + } + + #[test] + fn cancelled_stream_records_cancelled_outcome() { + let (handle, token) = CancellationToken::new(); + let (result, rows) = run_with_recorder(|| async move { + let workflow = Workflow::bare() + .register_stream(St::Consume, SelfCancel { handle }) + .add_exit_state(St::Done); + workflow.orchestrate(St::Consume, token).await + }); + assert!(result.is_err(), "a cancelled run is Err: {result:?}"); + assert_eq!( + counter(&rows, "cano_stream_runs_total", &[("outcome", "cancelled")]), + 1, + "a cooperative cancel is recorded as cancelled, not failed" + ); + } + + /// FailFast over [1,2] with item 2 failing — one ok item, one err item, then a failed run. 
+ struct FailSecond; + + #[task::stream] + impl StreamTask for FailSecond { + type Item = u32; + type Output = u32; + type Cursor = u64; + + async fn open( + &self, + _res: &Resources, + _cursor: Option, + ) -> Result + Send>>, CanoError> { + Ok(Box::pin(stream::iter(vec![1u32, 2])) as Pin + Send>>) + } + + async fn process_item(&self, _res: &Resources, item: u32) -> Result<(u32, u64), CanoError> { + if item == 2 { + Err(CanoError::task_execution("boom")) + } else { + Ok((item, item as u64)) + } + } + + async fn flush_window( + &self, + _res: &Resources, + _outputs: Vec, + ) -> Result, CanoError> { + Ok(WindowSignal::Continue) + } + + async fn on_close( + &self, + _res: &Resources, + _reason: CloseReason, + ) -> Result, CanoError> { + Ok(TaskResult::Single(St::Done)) + } + } + + #[test] + fn failed_stream_records_failed_outcome_and_err_item() { + let (result, rows) = run_with_recorder(|| async { + let workflow = Workflow::bare() + .register_stream(St::Consume, FailSecond) + .add_exit_state(St::Done); + workflow + .orchestrate(St::Consume, CancellationToken::disabled()) + .await + }); + assert!( + result.is_err(), + "FailFast item error fails the run: {result:?}" + ); + assert_eq!( + counter(&rows, "cano_stream_runs_total", &[("outcome", "failed")]), + 1, + "a genuine error is recorded as failed" + ); + assert_eq!( + counter(&rows, "cano_stream_items_total", &[("result", "ok")]), + 1, + "item 1 processed ok" + ); + assert_eq!( + counter(&rows, "cano_stream_items_total", &[("result", "err")]), + 1, + "item 2 recorded as an err item" + ); + } + + #[test] + fn inmemory_completed_records_completed_outcome() { + // The companion Task::run path (Workflow::register) also emits the run outcome. 
+ let (result, rows) = run_with_recorder(|| async { + let res = Resources::new(); + Task::run(&FiveItems, &res).await + }); + assert!(result.is_ok(), "{result:?}"); + assert_eq!( + counter(&rows, "cano_stream_runs_total", &[("outcome", "completed")]), + 1, + "the in-memory companion records a completed run" + ); + } + + #[test] + fn inmemory_failed_records_failed_outcome() { + let (result, rows) = run_with_recorder(|| async { + let res = Resources::new(); + Task::run(&FailSecond, &res).await + }); + assert!(result.is_err(), "{result:?}"); + assert_eq!( + counter(&rows, "cano_stream_runs_total", &[("outcome", "failed")]), + 1, + "the in-memory companion records a failed run (never cancelled)" + ); + } +} diff --git a/cano/src/workflow.rs b/cano/src/workflow.rs index 277f71f..240b07b 100644 --- a/cano/src/workflow.rs +++ b/cano/src/workflow.rs @@ -84,6 +84,7 @@ use crate::recovery::CheckpointStore; use crate::resource::Resources; use crate::saga::{CompensatableTask, ErasedCompensatable}; use crate::task::stepped::{ErasedSteppedTask, SteppedAdapter, SteppedTask}; +use crate::task::stream::{ErasedStreamTask, StreamAdapter, StreamTask}; use crate::task::{RouterTask, Task}; #[cfg(feature = "tracing")] @@ -542,6 +543,36 @@ where self } + /// Register a [`StreamTask`](crate::task::stream::StreamTask) for `state`, with + /// engine-driven windowing, cooperative cancellation, and cursor persistence. + /// + /// Unlike `register` (which runs a `StreamTask`'s in-memory companion loop with no + /// persistence and no cancellation), this drives the stream through the FSM engine: it + /// observes the run's [`CancellationToken`](crate::cancel::CancellationToken), and — + /// when a [`checkpoint store`](Self::with_checkpoint_store) + a + /// [`workflow id`](Self::with_workflow_id) are attached — persists the cursor of each + /// flushed window so a crashed or cancelled run resumes from the last committed + /// position via [`resume_from`](Self::resume_from). 
+ /// + /// Replaces any handler previously registered for `state`. Infallible. + pub fn register_stream(mut self, state: TState, task: T) -> Self + where + T: StreamTask + 'static, + { + self.forget_compensator_for(&state); + let config = Arc::new(StreamTask::config(&task)); + let erased: Arc> = + Arc::new(StreamAdapter(Arc::new(task))); + self.states.insert( + state, + Arc::new(StateEntry::Stream { + task: erased, + config, + }), + ); + self + } + /// Add exit state pub fn add_exit_state(mut self, state: TState) -> Self { if !self.exit_states.contains(&state) { diff --git a/cano/src/workflow/execution.rs b/cano/src/workflow/execution.rs index 366ca3d..c5ef27e 100644 --- a/cano/src/workflow/execution.rs +++ b/cano/src/workflow/execution.rs @@ -19,6 +19,7 @@ use crate::error::CanoError; use crate::recovery::CheckpointRow; use crate::saga::{CompensationEntry, ErasedCompensatable}; use crate::task::stepped::{ErasedStep, ErasedSteppedTask}; +use crate::task::stream::{ErasedStreamTask, ErasedWindowStep}; use crate::task::{Task, TaskResult, run_with_retries}; use super::compensation::resolve_compensation_deadline; @@ -91,6 +92,21 @@ where /// Task config captured at registration time (same rationale as `Single::config`). config: Arc, }, + /// A [`StreamTask`](crate::task::stream::StreamTask) registered via + /// [`Workflow::register_stream`](crate::workflow::Workflow::register_stream). + /// + /// The engine drives the windowed consume loop, persisting the cursor of each flushed + /// window's last item as a [`RowKind::StepCursor`](crate::recovery::RowKind::StepCursor) + /// row. Unlike every other variant the driver observes the `CancellationToken` itself + /// (a cooperative drain: flush the in-flight window, commit its cursor, run `on_close` + /// for cleanup, then end as `Cancelled` so `resume_from` continues) rather than being + /// dropped mid-window by `dispatch_with_budget`. + Stream { + /// Type-erased stream task — exposes `name`, `config`, and `open_session`. 
+ task: Arc>, + /// Task config captured at registration time (same rationale as `Single::config`). + config: Arc, + }, } impl Clone for StateEntry @@ -125,6 +141,10 @@ where task: Arc::clone(task), config: Arc::clone(config), }, + StateEntry::Stream { task, config } => StateEntry::Stream { + task: Arc::clone(task), + config: Arc::clone(config), + }, } } } @@ -354,6 +374,7 @@ where Some(StateEntry::Single { task, .. }) => task.name().into_owned(), Some(StateEntry::CompensatableSingle { task, .. }) => task.name().into_owned(), Some(StateEntry::Stepped { task, .. }) => task.name().into_owned(), + Some(StateEntry::Stream { task, .. }) => task.name().into_owned(), // Router is unreachable here (is_router guard above), Split has no single task_id. _ => String::new(), }; @@ -599,6 +620,27 @@ where ) .await } + StateEntry::Stream { task, config } => { + // A stream runs until cancelled/exhausted and must FLUSH the in-flight + // window + transition cleanly on cancel. So — like CompensatableSingle — + // it is NOT wrapped in `dispatch_with_budget`'s drop-on-cancel `select!` + // (which would orphan the partial window). Instead the token is threaded + // INTO `execute_stream_task`, which observes it cooperatively at the next + // item boundary, flushes, and returns a clean next state. + let resume_cursor = state_label + .as_deref() + .and_then(|label| resume_cursors.remove(label)); + self.execute_stream_task( + task.clone(), + Arc::clone(config), + &workflow_id, + state_label.as_deref().unwrap_or_default(), + &mut sequence, + resume_cursor, + &token, + ) + .await + } }; // CompensatableSingle records its own duration earlier (before the @@ -610,6 +652,7 @@ where StateEntry::Split { .. } => Some("split"), StateEntry::CompensatableSingle { .. } => None, StateEntry::Stepped { .. } => Some("stepped"), + StateEntry::Stream { .. 
} => Some("stream"), } { crate::metrics::task_dispatch_duration( _state_label, @@ -988,6 +1031,129 @@ where outcome } + /// Drive the windowed consume loop for a `Stream` state. + /// + /// Opens (or resumes from `resume_cursor`) the source, then advances the session one + /// window at a time, persisting each flushed window's cursor as a + /// [`RowKind::StepCursor`] row before continuing (crash-safe ordering, identical to + /// `execute_stepped_task`). The `token` is observed cooperatively inside the session so + /// a cancel flushes the in-flight window, commits its cursor, runs `on_close` for + /// cleanup, and returns `Err(Cancelled)` (resumable) rather than dropping the future. + /// Returns the next `TState` on a terminal window; rejects a `Split` result + /// (single-next-state only, like `Stepped`). + #[allow(clippy::too_many_arguments)] + async fn execute_stream_task( + &self, + task: Arc>, + config: Arc, + workflow_id: &Option>, + state_label: &str, + sequence: &mut u64, + resume_cursor: Option>, + token: &CancellationToken, + ) -> Result { + let observers = self.observer_slice(); + let task_name = task.name(); + if let Some(ref slice) = observers { + notify_observers(slice, |o| o.on_task_start(task_name.as_ref())); + } + + // Set in the `ErasedWindowStep::Cancelled` arm so the post-loop observer + metrics + // can distinguish a cooperative cancel from a genuine error. `AtomicBool` (not + // `Cell`) so the spawned workflow future stays `Send`. + let was_cancelled = std::sync::atomic::AtomicBool::new(false); + let outcome: Result = async { + let mut session = task + .open_session(&self.resources, resume_cursor, config.attempt_timeout) + .await?; + loop { + // Catch a panic inside `process_item`/`flush_window`/`on_close` so it + // becomes a `CanoError` instead of unwinding past the FSM and skipping + // resource teardown — same discipline as the other drivers. 
+ let step = + super::catch_panic_to_error(session.next_window(&self.resources, token), "Stream task") + .await?; + + // `terminal == Some(_)` ends the loop with that result after the cursor is + // persisted. A cancelled stream commits its final cursor, then surfaces + // `Cancelled` (so the log is NOT cleared and `resume_from` continues). + let (cursor_to_commit, terminal): (Option>, Option>) = + match step { + ErasedWindowStep::Window { cursor } => (Some(cursor), None), + ErasedWindowStep::Done { + final_cursor, + result, + } => { + let next = match result { + TaskResult::Single(next_state) => Ok(next_state), + TaskResult::Split(_) => Err(CanoError::workflow( + "Stream task returned split result — split is not supported for Stream states", + )), + }; + (final_cursor, Some(next)) + } + ErasedWindowStep::Cancelled { final_cursor } => { + was_cancelled.store(true, std::sync::atomic::Ordering::Relaxed); + (final_cursor, Some(Err(CanoError::cancelled()))) + } + }; + + // Persist the window cursor before advancing so a crash/cancel resumes + // from this exact position. Gated on a store + workflow id both present. 
+ if let Some(bytes) = cursor_to_commit + && let (Some(store), Some(wf_id)) = + (&self.checkpoint_store, workflow_id.as_deref()) + { + let row = CheckpointRow::new(*sequence, state_label, task_name.as_ref()) + .with_cursor(bytes) + .with_workflow_version(self.workflow_version); + let append_result = store.append(wf_id, row).await; + #[cfg(feature = "metrics")] + crate::metrics::checkpoint_append(append_result.is_ok()); + if let Err(e) = append_result { + return Err(CanoError::checkpoint_store(format!( + "append stream cursor checkpoint: {e}" + ))); + } + notify_observers(&self.observers, |o| o.on_checkpoint(wf_id, *sequence)); + *sequence += 1; + } + + if let Some(result) = terminal { + return result; + } + } + } + .await; + + let cancelled = was_cancelled.load(std::sync::atomic::Ordering::Relaxed); + + #[cfg(feature = "metrics")] + crate::metrics::stream_run(if outcome.is_ok() { + "completed" + } else if cancelled { + "cancelled" + } else { + "failed" + }); + + if let Some(ref slice) = observers { + match &outcome { + Ok(_) => notify_observers(slice, |o| o.on_task_success(task_name.as_ref())), + Err(e) => { + notify_observers(slice, |o| o.on_task_failure(task_name.as_ref(), e)); + // The stream arm bypasses `dispatch_with_budget`, so it fires + // `on_cancelled` itself — exactly once per cancelled run, mirroring the + // fan-out + `on_cancelled` pairing dispatch does for other task types. + if cancelled { + notify_observers(slice, |o| o.on_cancelled(state_label)); + } + } + } + } + outcome + } + async fn execute_split_join( &self, tasks: Vec + Send + Sync>>, diff --git a/docs/content/_index.md b/docs/content/_index.md index 41fbfe7..90554af 100644 --- a/docs/content/_index.md +++ b/docs/content/_index.md @@ -45,7 +45,7 @@ It excels at managing complex lifecycles where state transitions matter:

Processing Models

-

A whole Task family: plain Task, side-effect-free RouterTask, wait-until PollTask, wait-then-go TimerTask, fan-out BatchTask, resumable SteppedTask — mixed freely in one workflow.

+

A whole Task family: plain Task, side-effect-free RouterTask, wait-until PollTask, wait-then-go TimerTask, fan-out BatchTask, resumable SteppedTask, continuous StreamTask — mixed freely in one workflow.

@@ -218,7 +218,7 @@ async fn main() -> Result<(), CanoError> {
  1. Workflows — defining states, the builder, validation, and how a run executes.
  2. Resources — typed, lifecycle-managed dependency injection (every task receives a &Resources).
  3. -
  4. Task — the default processing unit, then the rest of the Task family (RouterTask, PollTask, TimerTask, BatchTask, SteppedTask) as you hit a shape that fits.
  5. +
  6. Task — the default processing unit, then the rest of the Task family (RouterTask, PollTask, TimerTask, BatchTask, SteppedTask, StreamTask) as you hit a shape that fits.
  7. Split & Join and Scheduler — parallelism within a workflow, and time-driven execution of workflows.
  8. Resilience & recovery: Resilience, Recovery, Saga.
  9. Observability: Tracing, Metrics, Observers.
  10. diff --git a/docs/content/batch-task/_index.md b/docs/content/batch-task/_index.md index d909b82..c7f24a7 100644 --- a/docs/content/batch-task/_index.md +++ b/docs/content/batch-task/_index.md @@ -14,8 +14,8 @@ concurrency (each item independently retryable), collects the per-item results < order, and decides the next state from the aggregate — all within one workflow state. It is one of the Task family of processing models, alongside RouterTask, -PollTask, TimerTask, and -SteppedTask, and it reads +PollTask, TimerTask, +SteppedTask, and StreamTask, and it reads typed dependencies from Resources like the rest. New to Cano? Read Workflows and Resources first.

    diff --git a/docs/content/poll-task/_index.md b/docs/content/poll-task/_index.md index eabd453..e18d5c8 100644 --- a/docs/content/poll-task/_index.md +++ b/docs/content/poll-task/_index.md @@ -14,7 +14,8 @@ is ready. Each call returns either "ready, here's the next state" or "not yet, t n milliseconds" — an async sleep, not a blocked thread. It is one of the Task family of processing models, alongside RouterTask, TimerTask, -BatchTask, and SteppedTask. A +BatchTask, SteppedTask, and +StreamTask. A PollTask reads typed dependencies from Resources the same way every other model does. New to Cano? Read Workflows and Resources first. diff --git a/docs/content/router-task/_index.md b/docs/content/router-task/_index.md index fa17e69..099dfca 100644 --- a/docs/content/router-task/_index.md +++ b/docs/content/router-task/_index.md @@ -16,7 +16,8 @@ is free, the workflow engine records no checkpoint row for it Crash Recovery for what that means. It is one of the Task family of processing models, alongside PollTask, TimerTask, -BatchTask, and SteppedTask. +BatchTask, SteppedTask, and +StreamTask.

    diff --git a/docs/content/stepped-task/_index.md b/docs/content/stepped-task/_index.md index 74a5b32..5576f42 100644 --- a/docs/content/stepped-task/_index.md +++ b/docs/content/stepped-task/_index.md @@ -15,7 +15,8 @@ returns either "more work, here's the new cursor" or "done, here's the next stat step — so a crash mid-loop resumes from where it left off, not from step zero. It is one of the Task family of processing models, alongside RouterTask, PollTask, -TimerTask, and BatchTask, and it reads +TimerTask, BatchTask, and +StreamTask, and it reads typed dependencies from Resources like the rest. New to Cano? Read Workflows and Resources first; for the diff --git a/docs/content/stream-task/_index.md b/docs/content/stream-task/_index.md new file mode 100644 index 0000000..15b3b5f --- /dev/null +++ b/docs/content/stream-task/_index.md @@ -0,0 +1,567 @@ ++++ +title = "StreamTask" +description = "StreamTask in Cano - consume an unbounded stream continuously, flush per tumbling window, run until cancelled or exhausted, and resume from a persisted cursor." +template = "section.html" ++++ + +
    +

    StreamTask

    +

    Consume an unbounded stream — flush per window, run until cancelled or exhausted, resume from a cursor.

    + +

    +A StreamTask consumes an impl Stream continuously: it pulls +one item at a time, processes each into an output, and flushes per tumbling +window — so memory stays bounded and downstream sees progress before the +source ends. It runs until the stream is exhausted, until a window asks to stop, or until the run is +cancelled; each flushed window commits a cursor so a +crashed or cancelled run resumes where it left off. It is one of the +Task family of processing models, alongside +RouterTask, PollTask, +TimerTask, BatchTask, and +SteppedTask, and it reads typed dependencies from +Resources like the rest. New to Cano? Read +Workflows and Resources first; for the +cursor-persistence half, Recovery. +

    + +
+At a glance: open a stream, process_item per item, flush_window per window
+
+```rust
+use cano::prelude::*;
+use futures_util::{Stream, stream};
+use std::pin::Pin;
+
+#[derive(Debug, Clone, PartialEq, Eq, Hash)]
+enum Step { Consume, Done }
+
+struct ConsumeEvents;
+
+#[task::stream(state = Step)]
+impl ConsumeEvents {
+    fn window(&self) -> StreamWindow {
+        StreamWindow::Count(3) // flush every 3 processed items
+    }
+
+    async fn open(&self, _res: &Resources, cursor: Option<u64>)
+        -> Result<Pin<Box<dyn Stream<Item = u64> + Send>>, CanoError>
+    {
+        // On a fresh run `cursor` is `None`; on resume it's the last committed offset.
+        let start = cursor.map(|c| c + 1).unwrap_or(0);
+        Ok(Box::pin(stream::iter(start..10)) as Pin<Box<dyn Stream<Item = u64> + Send>>)
+    }
+
+    async fn process_item(&self, _res: &Resources, item: u64)
+        -> Result<(u64, u64), CanoError>
+    {
+        Ok((item * 2, item)) // (output, cursor reached by consuming this item)
+    }
+
+    async fn flush_window(&self, _res: &Resources, outputs: Vec<u64>)
+        -> Result<WindowSignal<Step>, CanoError>
+    {
+        // Commit side effects / offsets for this window here.
+        println!("flushed window of {} outputs", outputs.len());
+        Ok(WindowSignal::Continue)
+    }
+
+    async fn on_close(&self, _res: &Resources, reason: CloseReason)
+        -> Result<TaskResult<Step>, CanoError>
+    {
+        println!("stream ended ({reason:?})");
+        Ok(TaskResult::Single(Step::Done))
+    }
+}
+
+let workflow = Workflow::bare()
+    .register_stream(Step::Consume, ConsumeEvents)
+    .add_exit_state(Step::Done);
+```
    + +
    +
    Key concept
    +

    +A StreamTask is for sources that don't end on their own — Kafka, SSE, a +file-tail, a WebSocket. Instead of buffering everything and aggregating once, it emits +per window and keeps memory bounded, runs until you stop it, and resumes from a +persisted cursor after a crash. If your data is a bounded Vec you want to map +over and aggregate once, reach for a BatchTask instead. +

    +
    + + + + + +
    +

    BatchTask vs StreamTask

    +

    +The two look similar — both fan a sub-operation over many items — but they solve opposite problems. A +BatchTask loads a bounded Vec, processes +all of it, and aggregates once at the end: O(N) memory, one emission, and it +requires the data to end. A StreamTask is for unbounded / +continuous sources: it emits incrementally per window, keeps memory bounded, runs until +cancelled/exhausted, and resumes from a cursor. +

    + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    BatchTaskStreamTask
    Databounded — a Vec loaded up frontunbounded — a continuous impl Stream
    Terminationwhen the loaded items are exhaustedexhausted, a window Stop, or cancelled
    Emissionone aggregate, in finishper-window, in flush_window
    MemoryO(N) — the whole batch is heldbounded — one window at a time
    Recoveryre-run the whole state on resumeresume from the last committed cursor
    + + +
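The Memory row can be made concrete with a small std-only Rust sketch. It does not use Cano's types: `batch_peak` and `window_peak` are hypothetical helpers that only count how many outputs each shape holds at its high-water mark, with a plain sum standing in for the aggregate and for `flush_window`.

```rust
/// Outputs held at the peak by a batch-style pass: everything is collected
/// first, then aggregated once at the end.
fn batch_peak(items: impl Iterator<Item = u64>) -> usize {
    let outputs: Vec<u64> = items.map(|x| x * 2).collect();
    let peak = outputs.len();
    let _aggregate: u64 = outputs.iter().sum(); // the single end-of-batch emission
    peak
}

/// Outputs held at the peak by a window-style pass: the buffer is emptied
/// every `window` items, so it never grows past `window`.
fn window_peak(items: impl Iterator<Item = u64>, window: usize) -> usize {
    let window = window.max(1);
    let mut buffer: Vec<u64> = Vec::with_capacity(window);
    let mut peak = 0;
    for item in items {
        buffer.push(item * 2);
        peak = peak.max(buffer.len());
        if buffer.len() == window {
            let _window_sum: u64 = buffer.drain(..).sum(); // stand-in for flush_window
        }
    }
    peak
}
```

Over 10,000 items the batch shape peaks at 10,000 buffered outputs, while a window of 100 never holds more than 100, however long the source runs.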
    +

    How the Stream Loop Works

    + +
    +

    open → process_item (×window) → flush_window → commit cursor → … → on_close

    +
+graph LR
+A["open(cursor)"] --> B[process_item]
+B -->|"buffer until window full"| B
+B -->|"window full"| F[flush_window]
+F -->|"Continue"| P[commit cursor]
+P --> B
+F -->|"Stop(result)"| D[Next State]
+B -->|"source exhausted (partial window flushed first)"| C[on_close]
+C --> D
    +
    + +

    +A StreamTask has three associated types — type Item (one element pulled from +the source), type Output (the per-item result accumulated into a window), and +type Cursor (the resumable position; Serialize + DeserializeOwned + Send + Sync + +'static) — and four required methods: +

    + + + + + + + + + + + + + + + + + + + + + + + + + + +
    MethodRole
    open(res, cursor)Open (or resume) the source stream. cursor is None on a fresh run, or the last committed position on resume.
    process_item(res, item)Process one item; return (Output, Cursor) — the cursor is the position reached by consuming this item.
    flush_window(res, outputs)Flush one window (a full one, or the in-flight partial window when the source is exhausted or the run is cancelled): commit side effects, then return WindowSignal::Continue or WindowSignal::Stop(result). The window's cursor is committed after this returns.
    on_close(res, reason)Terminal transition when the stream is Exhausted or the run is Cancelled. The in-flight partial window has already been flushed.
    + +

    +Optional methods carry defaults: window() (defaults to +StreamWindow::Count(1) — flush per item), on_item_error() (defaults to +StreamErrorPolicy::FailFast), config() (defaults to +TaskConfig::minimal()no outer retry, because an outer retry would +re-invoke open() and re-consume the stream), and name() (defaults to the +type name). +
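To see the loop end to end, here is a deliberately simplified, synchronous sketch in plain Rust. `Signal`, `Ended`, and `run_windowed` are hypothetical stand-ins, not Cano's API: the real driver is async, reads a `Stream`, applies the error policy, observes cancellation, and persists cursors through the checkpoint store. What the sketch does show is the ordering described above: buffer outputs, flush a full window, record the cursor of that window's last item only after the flush returns, and flush the trailing partial window when the source runs dry.

```rust
/// Hypothetical mirror of the flush decision (not Cano's WindowSignal).
#[derive(Debug, PartialEq)]
enum Signal {
    Continue,
    Stop,
}

/// Why the sketch loop ended.
#[derive(Debug, PartialEq)]
enum Ended {
    Exhausted,
    Stopped,
}

/// Pull one item at a time, buffer its output, flush every `window` items,
/// and return the cursors committed after each flush.
fn run_windowed<I, O>(
    source: I,
    window: usize,
    mut process_item: impl FnMut(u64) -> (O, u64),
    mut flush_window: impl FnMut(Vec<O>) -> Signal,
) -> (Ended, Vec<u64>)
where
    I: Iterator<Item = u64>,
{
    let window = window.max(1); // Count(0) is treated like Count(1)
    let mut buffer: Vec<O> = Vec::with_capacity(window);
    let mut last_cursor: Option<u64> = None;
    let mut committed = Vec::new();
    for item in source {
        let (output, cursor) = process_item(item);
        buffer.push(output);
        last_cursor = Some(cursor);
        if buffer.len() == window {
            let signal = flush_window(std::mem::take(&mut buffer));
            // Commit only after flush_window has returned.
            committed.extend(last_cursor.take());
            if signal == Signal::Stop {
                return (Ended::Stopped, committed);
            }
        }
    }
    // Source exhausted: flush the trailing partial window before closing.
    if !buffer.is_empty() {
        flush_window(buffer);
        committed.extend(last_cursor);
    }
    (Ended::Exhausted, committed)
}
```

With a window of 3 over offsets `0..10`, this flushes windows of 3, 3, 3 and 1 outputs and commits cursors 2, 5, 8 and 9; if the first flush returns `Signal::Stop`, only cursor 2 is committed.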

    + + +
    +

    Quick Start with #[task::stream]

    +

    +Attach #[task::stream(state = MyState)] to an inherent impl block. The macro infers Item from process_item's owned item parameter, and Output / Cursor from the Ok tuple of its return type. It injects default window / on_item_error / config / name methods if they are absent, synthesises the impl StreamTask<MyState> for ConsumeEvents header, and emits a companion impl Task<MyState> for ConsumeEvents whose run drives the in-memory loop (used when you register the task with plain register and don't need persistence). +

    + +
+Inference form: #[task::stream(state = ...)] on an inherent impl
+
+```rust
+use cano::prelude::*;
+use futures_util::{Stream, stream};
+use std::pin::Pin;
+
+#[derive(Debug, Clone, PartialEq, Eq, Hash)]
+enum Step { Consume, Done }
+
+#[derive(Debug, Clone)]
+struct Event { offset: u64, payload: String }
+
+#[derive(Debug)]
+struct Processed { offset: u64, bytes: usize }
+
+struct ConsumeEvents;
+
+#[task::stream(state = Step)]
+impl ConsumeEvents {
+    fn window(&self) -> StreamWindow {
+        StreamWindow::Count(3)
+    }
+
+    async fn open(&self, _res: &Resources, cursor: Option<u64>)
+        -> Result<Pin<Box<dyn Stream<Item = Event> + Send>>, CanoError>
+    {
+        let start = cursor.map(|c| c + 1).unwrap_or(0);
+        let events: Vec<Event> = (start..10)
+            .map(|offset| Event { offset, payload: format!("payload-{offset}") })
+            .collect();
+        Ok(Box::pin(stream::iter(events)) as Pin<Box<dyn Stream<Item = Event> + Send>>)
+    }
+
+    async fn process_item(&self, _res: &Resources, item: Event)
+        -> Result<(Processed, u64), CanoError>
+    {
+        Ok((
+            Processed { offset: item.offset, bytes: item.payload.len() },
+            item.offset, // the cursor reached by consuming this item
+        ))
+    }
+
+    async fn flush_window(&self, _res: &Resources, outputs: Vec<Processed>)
+        -> Result<WindowSignal<Step>, CanoError>
+    {
+        let bytes: usize = outputs.iter().map(|p| p.bytes).sum();
+        println!("flushed {} events ({bytes} bytes)", outputs.len());
+        Ok(WindowSignal::Continue)
+    }
+
+    async fn on_close(&self, _res: &Resources, reason: CloseReason)
+        -> Result<TaskResult<Step>, CanoError>
+    {
+        println!("stream ended ({reason:?})");
+        Ok(TaskResult::Single(Step::Done))
+    }
+}
+```
    + + +
    +

    Windowing: StreamWindow

    +

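    +The window() method returns a tumbling-window trigger that controls how often flush_window fires and how much the driver buffers. Larger windows amortise the per-flush and checkpoint cost, at the price of holding more outputs in memory and, after a crash, replaying more items (only a fully flushed window commits its cursor). +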

    + +
    +
    +

    StreamWindow::Count(n)

    +

    Flush after every n successfully processed items (clamped to a minimum of 1). The +default is Count(1) — flush per item.

    +
    +
    +

    StreamWindow::Duration(d)

    +

    Flush every d of wall-clock time, tumbling. Empty windows are skipped +— an idle source emits no spurious empty flushes; the deadline simply advances.

    +
    +
    + + +
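Both triggers can be sketched offline in plain Rust, assuming each item already carries an arrival timestamp and items arrive in timestamp order. That is a simplification: a live driver would flush when the window's timer fires, not when the next item shows up. `tumbling_by_count` and `tumbling_by_duration` are hypothetical helpers; the count version clamps `n` to at least 1, and the duration version advances its deadline across idle windows without emitting empty flushes.

```rust
use std::time::Duration;

/// Count trigger: tumbling windows of `n` items, with `n` clamped to at least 1.
/// The last chunk may be partial; a running stream flushes it only on
/// exhaustion or cancellation.
fn tumbling_by_count(items: &[u32], n: usize) -> Vec<Vec<u32>> {
    items.chunks(n.max(1)).map(|chunk| chunk.to_vec()).collect()
}

/// Duration trigger: tumbling windows of length `d` starting at `start`.
/// `items` must be sorted by timestamp. Idle windows are skipped, so no
/// empty window is ever emitted.
fn tumbling_by_duration(items: &[(Duration, u32)], start: Duration, d: Duration) -> Vec<Vec<u32>> {
    assert!(!d.is_zero(), "window length must be positive");
    let mut windows = Vec::new();
    let mut current = Vec::new();
    let mut deadline = start + d;
    for &(at, value) in items {
        // Close every window that ended before this item arrived.
        while at >= deadline {
            if !current.is_empty() {
                windows.push(std::mem::take(&mut current));
            }
            deadline += d; // the deadline advances even when nothing is emitted
        }
        current.push(value);
    }
    if !current.is_empty() {
        windows.push(current);
    }
    windows
}
```

With 100 ms windows, items at 0 ms and 50 ms share the first window, the item at 120 ms gets its own, and the two idle windows before an item at 450 ms produce no flush at all.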
    +

    Per-Item Error Policy

    +

    +on_item_error() returns a StreamErrorPolicy deciding what the windowed loop +does when process_item returns an Err: +

    + + + + + + + + + + + + + + + + + + + + + + +
    StreamErrorPolicy variantEffect
    FailFast (default)Propagate the first item error — the loop stops and the run fails.
    SkipAndContinueDrop the bad item and keep consuming (poison-message handling). The skipped item's cursor is not committed — the next good item advances it.
    RetryOnError { max_errors }Tolerate up to max_errors consecutive item errors before failing. The counter resets on every successfully processed item.
    + + +
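A plain-Rust sketch of the three policies, under stated assumptions: `ErrorPolicy` and `consume` are hypothetical names rather than Cano's types, the consecutive-error limit is read as "fail on the (max_errors + 1)-th consecutive error", and `RetryOnError` is modelled as tolerate-and-skip because the table above does not say whether the failing item is re-processed.

```rust
/// Hypothetical mirror of the per-item error policies (not Cano's enum).
#[derive(Clone, Copy, Debug)]
enum ErrorPolicy {
    FailFast,
    SkipAndContinue,
    RetryOnError { max_errors: u32 },
}

/// Run `process_item` over `items`, applying `policy` to each per-item error.
/// Returns the successful outputs, or the error that ended the run.
fn consume<T, O, E>(
    items: impl IntoIterator<Item = T>,
    policy: ErrorPolicy,
    mut process_item: impl FnMut(T) -> Result<O, E>,
) -> Result<Vec<O>, E> {
    let mut outputs = Vec::new();
    let mut consecutive_errors: u32 = 0;
    for item in items {
        match process_item(item) {
            Ok(output) => {
                consecutive_errors = 0; // any success resets the streak
                outputs.push(output);
            }
            Err(err) => match policy {
                ErrorPolicy::FailFast => return Err(err),
                // Poison item: drop it; its cursor is simply never committed.
                ErrorPolicy::SkipAndContinue => {}
                ErrorPolicy::RetryOnError { max_errors } => {
                    consecutive_errors += 1;
                    if consecutive_errors > max_errors {
                        return Err(err);
                    }
                }
            },
        }
    }
    Ok(outputs)
}
```

With `max_errors: 1`, the input `"1", "x", "2", "y", "3"` succeeds because the two bad items are never adjacent, while `"1", "x", "y", "3"` fails on `"y"`.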
    +

    Cursor Persistence & Resume

    +

    +Register a stream with Workflow::register_stream(state, task) — the durable, cancellable +engine path. When a CheckpointStore plus a workflow id are +attached (with_checkpoint_store + with_workflow_id), the engine persists the +cursor returned by the last item of each flushed window as a +CheckpointRow whose kind is RowKind::StepCursor (the cursor is +serde_json-encoded). On +Workflow::resume_from(workflow_id, token), the latest persisted cursor is rehydrated and +passed to open(Some(cursor)) — so a crashed or cancelled run re-opens the source from the +last committed position instead of starting over. Only fully flushed windows commit +a cursor; a window that errors mid-flush commits nothing. +

    + +
    + Wiring a durable, resumable stream into a checkpointed workflow + +```rust +use cano::prelude::*; +use cano::RedbCheckpointStore; // requires the `recovery` feature +use std::sync::Arc; + +let checkpoint_store = RedbCheckpointStore::new("/var/lib/myapp/checkpoints.redb")?; +let workflow = Workflow::new(resources) + .with_checkpoint_store(Arc::new(checkpoint_store)) + .with_workflow_id("event-consumer") + .register_stream(Step::Consume, ConsumeEvents) // cursor persisted per flushed window + .add_exit_state(Step::Done); + +// A crashed or cancelled run resumes from the last committed window: +// let result = workflow.resume_from("event-consumer", token).await?; +// open() is called with Some(cursor) — the position of the last fully-flushed window. +``` +
    + + +
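The commit-after-flush ordering is what makes resume safe. The sketch below models it with std only: `CursorLog` is a hypothetical in-memory stand-in for a CheckpointStore (the real engine appends serde_json-encoded StepCursor rows under the workflow id), and `run_once` re-opens the source just past the latest committed cursor, the way `open(Some(cursor))` would.

```rust
/// Hypothetical in-memory stand-in for a checkpoint store: an append-only
/// list of (sequence, cursor) rows, one per successfully flushed window.
#[derive(Default)]
struct CursorLog {
    rows: Vec<(u64, u64)>,
}

impl CursorLog {
    fn append(&mut self, cursor: u64) {
        let sequence = self.rows.len() as u64;
        self.rows.push((sequence, cursor));
    }

    /// The cursor a resumed run would pass to `open(Some(cursor))`.
    fn latest(&self) -> Option<u64> {
        self.rows.last().map(|&(_, cursor)| cursor)
    }
}

/// Consume `source` (each offset equals its index) from just past the latest
/// committed cursor, flushing every `window` items. A failed flush aborts the
/// run before its cursor is appended, so that window is replayed on resume.
fn run_once(
    source: &[u64],
    window: usize,
    log: &mut CursorLog,
    mut flush: impl FnMut(&[u64]) -> Result<(), String>,
) -> Result<(), String> {
    let start = log.latest().map_or(0, |cursor| cursor as usize + 1);
    let remaining = source.get(start..).unwrap_or(&[]);
    for chunk in remaining.chunks(window.max(1)) {
        flush(chunk)?; // nothing is committed if the flush fails
        log.append(chunk[chunk.len() - 1]); // cursor of the window's last item
    }
    Ok(())
}
```

If the third window (offsets 6 to 8) fails to flush, the log still ends at cursor 5, and the next run starts at offset 6 instead of 0.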
    +

    The Cancellation Contract

    +

    +Because an unbounded stream never ends on its own, cancellation is how you stop it cleanly. When the +run's CancellationToken fires, the engine-driven driver performs a cooperative +drain: +

    +
      +
    1. it flushes the in-flight (partial) window via flush_window;
    2. +
    3. it commits that window's cursor;
    4. +
    5. it calls on_close(CloseReason::Cancelled) for cleanup — its returned state is +ignored;
    6. +
    7. the run ends as CanoError::Cancelled (category "cancelled").
    8. +
    + +
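The drain order can be sketched synchronously with an `AtomicBool` standing in for the CancellationToken. `run_until_cancelled`, `Report`, and `Cancelled` are hypothetical; the sketch checks the flag after each item, which approximates "the next item boundary", and records the `on_close` reason as a string rather than calling a real hook.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical error for a cooperatively cancelled run.
#[derive(Debug, PartialEq)]
struct Cancelled;

/// What the sketch driver did, for inspection.
#[derive(Debug, Default)]
struct Report {
    flushed: Vec<Vec<u64>>,            // windows handed to flush_window
    committed: Vec<u64>,               // cursors persisted after each flush
    closed_with: Option<&'static str>, // reason that on_close saw
}

/// Flush `buffer` as one window, then commit the cursor of its last item.
fn flush_and_commit(buffer: &mut Vec<u64>, report: &mut Report) {
    if let Some(&last) = buffer.last() {
        report.flushed.push(std::mem::take(buffer));
        report.committed.push(last);
    }
}

fn run_until_cancelled(
    source: impl Iterator<Item = u64>,
    window: usize,
    token: &AtomicBool,
    mut after_item: impl FnMut(u64),
) -> (Result<(), Cancelled>, Report) {
    let window = window.max(1);
    let mut report = Report::default();
    let mut buffer = Vec::new();
    for item in source {
        buffer.push(item); // process_item; each item is its own cursor here
        after_item(item);
        if buffer.len() == window {
            flush_and_commit(&mut buffer, &mut report);
        }
        // Item boundary: observe the token before pulling the next item.
        if token.load(Ordering::SeqCst) {
            flush_and_commit(&mut buffer, &mut report); // steps 1 and 2
            report.closed_with = Some("Cancelled"); // step 3: cleanup only
            return (Err(Cancelled), report); // step 4: resumable error
        }
    }
    flush_and_commit(&mut buffer, &mut report); // exhausted: flush what is left
    report.closed_with = Some("Exhausted");
    (Ok(()), report)
}
```

Cancelling after offset 4 with a window of 3 flushes `[0, 1, 2]` normally, then drains the partial `[3, 4]`, commits cursor 4, and ends as `Err(Cancelled)`; a later resume would open at offset 5.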
    +Cancel means "stop cleanly + resumable", not "transition onward" +

    +A cancelled stream does not gracefully transition to another state — it surfaces as +CanoError::Cancelled. But because the committed cursor survives, a later +resume_from continues from the last committed window. So cancellation is a clean, +resumable stop, not a hand-off. (An Err returned by on_close during the +drain is propagated.) +

    +
    + + +
    +

    The Idempotency Contract

    + +
    +Important — at-least-once +

    open and process_item must be idempotent. The FSM writes the state-entry checkpoint before running the task, so a resumed run re-enters the state and calls open(Some(cursor)) with the last committed cursor. Items after that cursor may already have been processed before the crash, and they are processed again on resume, so make process_item safe to re-apply (upserts, dedupe keys, "if not already processed" guards, conditional writes). This is also why config() defaults to TaskConfig::minimal() (no outer retry): an outer retry would re-invoke open() and re-consume the stream.

    +
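One way to meet the contract is to key every write by the source offset, so a replay overwrites instead of duplicating. The sketch below is a hypothetical, std-only illustration (`OffsetKeyedSink`, `append_only`, and `process_with_replay` are not Cano APIs): replaying a window against the keyed sink leaves it unchanged, while the append-only log double-counts.

```rust
use std::collections::HashMap;

/// Hypothetical sink keyed by source offset: writing an offset twice
/// overwrites the earlier row, so replaying a window is harmless.
#[derive(Default)]
struct OffsetKeyedSink {
    rows: HashMap<u64, String>,
}

impl OffsetKeyedSink {
    fn upsert(&mut self, offset: u64, payload: &str) {
        self.rows.insert(offset, payload.to_string());
    }
}

/// Non-idempotent counterpart for contrast: every call appends a new row.
fn append_only(log: &mut Vec<(u64, String)>, offset: u64, payload: &str) {
    log.push((offset, payload.to_string()));
}

/// Process offsets 6..=8 once, as a run that crashed before committing their
/// cursor would have, then replay them as the resumed run does.
fn process_with_replay(sink: &mut OffsetKeyedSink, log: &mut Vec<(u64, String)>) {
    for _attempt in 0..2 {
        for offset in 6..=8u64 {
            let payload = format!("payload-{offset}");
            sink.upsert(offset, &payload);
            append_only(log, offset, &payload);
        }
    }
}
```

After the replay the keyed sink still holds three rows, while the append-only log holds six.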
    + +
    +
    Limitation — the cursor log only clears at an exit state
    +

+The checkpoint log is append-only and is cleared only when the run reaches an exit state, so a never-terminating stream accumulates one StepCursor row per flushed window indefinitely. For bounded streams, or runs that reach an exit state via on_close or WindowSignal::Stop, this is fine. For genuinely endless sources, until a log-compaction step lands, prefer bounded runs (for example, returning WindowSignal::Stop after a fixed amount of work) plus periodic restarts, since each clean exit clears the log. +

    +
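To size the problem, a back-of-the-envelope helper is enough. Assuming a steady input rate and `StreamWindow::Count(n)`, a running stream writes one row per full window; `cursor_rows` is a hypothetical helper and the rates used below are illustrative, not measurements.

```rust
/// Rows a never-exiting stream appends to the checkpoint log in `secs` seconds,
/// assuming a steady `items_per_sec` rate and `StreamWindow::Count(n)`: one
/// StepCursor row per full window (the trailing partial window is not flushed
/// while the stream keeps running).
fn cursor_rows(items_per_sec: u64, n: u64, secs: u64) -> u64 {
    items_per_sec * secs / n.max(1)
}
```

At 1,000 items per second, `Count(100)` adds 864,000 rows per day, while `Count(10_000)` adds 8,640.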
    + + +
    +

    register vs register_stream

    +

    +A StreamTask can be registered two ways, and the difference is durability: +

    + + + + + + + + + + + + + + + + + + + + + + + + +
    RegistrationCursor persistenceCancellationUse for
    register_streamyes (with a checkpoint store + id)yes — cooperative drainreal, durable stream consumption
    registernonoconvenience / tests only
    + +

    +Plain register runs the macro-generated companion Task, which drives an +in-memory windowed loop with no cursor persistence and +no cancellation — it always runs to exhaustion. Reach for register_stream +for anything real; it is the path that observes the CancellationToken and persists +cursors. +

    + + +
    +

    Explicit Trait-Impl Form

    +

    +Prefer writing the trait header yourself — e.g. to name the associated types, or for a generic impl? +Put a bare #[task::stream] on an impl StreamTask<...> for ... block and +declare the three type lines (Item / Output / Cursor) +yourself. The companion impl Task is still emitted. +

    + +
+Explicit form: #[task::stream] on a trait impl
+
+```rust
+use cano::prelude::*;
+use futures_util::{Stream, stream};
+use std::pin::Pin;
+
+#[derive(Debug, Clone, PartialEq, Eq, Hash)]
+enum Step { Consume, Done }
+
+struct Collector;
+
+#[task::stream]
+impl StreamTask<Step> for Collector {
+    type Item = u32;
+    type Output = u32;
+    type Cursor = u64;
+
+    fn window(&self) -> StreamWindow {
+        StreamWindow::Count(2)
+    }
+
+    async fn open(&self, _res: &Resources, _cursor: Option<u64>)
+        -> Result<Pin<Box<dyn Stream<Item = u32> + Send>>, CanoError>
+    {
+        Ok(Box::pin(stream::iter(vec![10u32, 20, 30, 40, 50]))
+            as Pin<Box<dyn Stream<Item = u32> + Send>>)
+    }
+
+    async fn process_item(&self, _res: &Resources, item: u32)
+        -> Result<(u32, u64), CanoError>
+    {
+        Ok((item * 2, item as u64))
+    }
+
+    async fn flush_window(&self, _res: &Resources, _outputs: Vec<u32>)
+        -> Result<WindowSignal<Step>, CanoError>
+    {
+        Ok(WindowSignal::Continue)
+    }
+
+    async fn on_close(&self, _res: &Resources, _reason: CloseReason)
+        -> Result<TaskResult<Step>, CanoError>
+    {
+        Ok(TaskResult::Single(Step::Done))
+    }
+}
+```
    + + +
    +

    When to Use StreamTask

    +

    Reach for a StreamTask when:

    +
      +
    • your source is unbounded or continuous — a Kafka topic, an SSE feed, a tailed +file, a WebSocket — and never produces a final Vec to aggregate;
    • +
    • you want incremental, per-window emission with bounded memory rather than one +end-of-batch aggregate;
    • +
    • you need the consumer to resume from a committed offset after a crash or a +cancellation, not start over.
    • +
    +

    +If your data is a bounded collection you want to map over and aggregate once, a +BatchTask is simpler. If you have a long iterative job over a finite +range that you want to crash-resume mid-loop, a SteppedTask fits +better. +

    + +
    +
    Runnable example
    +

    +The crate ships a complete example — run it with cargo run --example stream_task. +

    +
    +
    diff --git a/docs/content/task/_index.md b/docs/content/task/_index.md index 3299607..86b5a1c 100644 --- a/docs/content/task/_index.md +++ b/docs/content/task/_index.md @@ -11,11 +11,11 @@ template = "section.html"

    A Task is the fundamental building block of a Cano workflow: a single run method that decides the next state. Start hereTask is the default -choice for every processing unit. The other five processing models +choice for every processing unit. The other six processing models (RouterTask, PollTask, TimerTask, BatchTask, -SteppedTask) are specialisations you reach for only when a task +SteppedTask, StreamTask) are specialisations you reach for only when a task has a shape that one of them fits better — see The Task Family below for the decision matrix. Tasks receive a &Resources reference at dispatch time — see Resources for how to register and retrieve typed dependencies. @@ -38,7 +38,7 @@ example on this page wires a task into a Workflow and pulls depende

  11. Resource-Free Tasks
  12. Configuring Tasks
  13. Real-World Task Patterns
  14. -
  15. The Task Family: Four More Processing Models
  16. +
  17. The Task Family: Six More Processing Models
  18. Choosing a Processing Model
@@ -328,10 +328,10 @@ impl AggregatorTask {
-

The Task Family: Four More Processing Models

+

The Task Family: Six More Processing Models

Beyond the plain Task, Cano ships -four more Task-derived processing models. Each is a specialised shape — +six more Task-derived processing models. Each is a specialised shape — they all ultimately dispatch as a Task, so you mix them freely in one workflow — and each has its own page with the full reference.

@@ -367,13 +367,20 @@ re-joined in one state.

step, so a crash resumes mid-loop. Registered with register_stepped.

Reach for it when: long page-by-page scans, chunked migrations — crash-resume finer than per-state.

+
+

StreamTask

+

Continuous stream consumption: open → process_item → flush_window per tumbling window, +running until cancelled/exhausted, resumable from a committed cursor. Registered with +register_stream.

+

Reach for it when: unbounded sources — Kafka, SSE, file-tail — with per-window emission and resume-from-cursor.

+

Choosing a Processing Model

-All six models dispatch as a Task, so you can mix them in one workflow. Start from +All seven models dispatch as a Task, so you can mix them in one workflow. Start from Task and move to a specialised model only when your work has its shape:

@@ -416,6 +423,11 @@ All six models dispatch as a Task, so you can mix them in one workf You have a long iterative job (page-by-page scan, chunked migration) you want to crash-resume mid-loop, finer than per-state. register_stepped + +StreamTask +You're consuming an unbounded / continuous source (Kafka, SSE, file-tail) — per-window emission, bounded memory, runs until cancelled/exhausted, resumable from a cursor. +register_stream + diff --git a/docs/content/timer-task/_index.md b/docs/content/timer-task/_index.md index 3324a1e..5a4e622 100644 --- a/docs/content/timer-task/_index.md +++ b/docs/content/timer-task/_index.md @@ -14,7 +14,8 @@ the task decides where to go next. There is no loop and no condition re timer schedules one tokio::time::sleep and wakes a single time. It is one of the Task family of processing models, alongside RouterTask, PollTask, -BatchTask, and SteppedTask. A +BatchTask, SteppedTask, and +StreamTask. A TimerTask reads typed dependencies from Resources the same way every other model does. New to Cano? Read Workflows and Resources first. diff --git a/docs/templates/base.html b/docs/templates/base.html index 8436884..6235fe3 100644 --- a/docs/templates/base.html +++ b/docs/templates/base.html @@ -74,6 +74,7 @@
  • TimerTask
  • BatchTask
  • SteppedTask
  • +
  • StreamTask