You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Issue #313 (one-time scheduled tasks delayed/duplicated after a restart) was the spark, but
the real subject is bigger: CraftBot is already a hand-rolled durable-execution engine, and
this doc names its primitives and closes the gaps where durability is partial or emergent.
We explored Temporal and parked it: we want the paradigm (durable execution — work that
survives crashes without loss or duplication), not the operational tax (running a Temporal
cluster). Everything here is Temporal-shaped on the SQLite we already run
(app/usage/session_storage.py → sessions.db, WAL, idempotent upserts). No new infra.
Four pains, all in scope:
Restart durability — produced-but-not-yet-executed triggers (scheduled, proactive,
living-ui, resume) live only in an in-memory heap and are lost on restart.
Too many ad-hoc producers — ~13 sites build Triggers with stringly-typed payload["type"] (several with none). No single model or entry API.
Cost/latency of routing — TriggerQueue.put() makes a synchronous LLM call for
session routing on the hot path of every non-system trigger.
Emergent-but-holey action idempotency — irreversible side effects (send email) are
protected from re-execution only by the LLM re-reading the event stream, with a real
crash-window hole (§4.2). This is the most agent-critical gap and the truest Temporal
alignment.
2. The frame: durable execution as three primitives
Temporal's value isn't "a durable queue." It's three things made crash-proof together:
Temporal concept
CraftBot equivalent
Durable today?
Target
Signals / timers
Triggers
❌ volatile heap
durable store (§6.2)
Workflows
Tasks (+ event stream)
⚠️ partial (persist/restore exists)
formalize contract (§6.3)
Activities (side effects)
Irreversible actions
⚠️ emergent only
idempotency keys (§6.4)
The divergence principle (state it explicitly)
Temporal reconstructs in-flight state by deterministically replaying workflow code against
an event history. CraftBot's "workflow body" is an LLM — non-deterministic by nature, so
true replay is impossible. CraftBot's durable-execution model is therefore deliberately
different:
Checkpoint observable state + make irreversible side effects idempotent — not
deterministic replay.
This is the correct model for an agent, and it sets the ceiling: we can guarantee "didn't lose
it" and "didn't do the irreversible thing twice," but not "reconstructed the exact reasoning."
3. Current architecture (as built)
3.1 The Trigger
agent_core/core/trigger.py — @dataclass(order=True): fire_at, priority, next_action_description, payload (untyped, compare=False), session_id, waiting_for_reply. No id, no source type, no dedup key, no attempt count.
3.2 The queue
agent_core/core/impl/trigger/queue.py — in-memory heapq ordered by (fire_at, priority). put() optionally calls the LLM to route to a session, then merges/replaces same-session
triggers ("prefer newest"). Volatile. Built in app/agent_base.py:242.
3.3 Producers (inventory)
~13 sites; representative:
Source
payload["type"]
Priority
Durable today?
User / integration message
(none)
3
message not persisted as a trigger
Scheduled (recurring + one-time)
scheduled
20–40
schedule persists; trigger doesn't
Memory processing
memory_processing
50–60
no
Proactive heartbeat / planner
proactive_*
50
no
Restart notice / resume
restart_notice / (none)
1 / 5–7
reconstructed from Tasks on boot
Task continuation
(none)
5/7
implied by Task
Living UI dev / crash / import
living_ui_*
30/50
no
Skill workflow / onboarding
(none)
60 / 1
no
3.4 Consumer (single loop)
app/ui_layer/controller/ui_controller.py:_consume_triggers(): trigger = await agent.triggers.get(); await agent.react(trigger). react() classifies and
turns the trigger into Task creation/continuation.
3.5 Existing durability we build on
SQLite+WAL at {APP_DATA_PATH}/.usage/sessions.db; tables active_tasks, event_streams, event_records, conversation_history; idempotent INSERT … ON CONFLICT DO UPDATE. Task
persist hooks (on_task_persist / on_task_remove_persist); boot restore
(_restore_sessions + _schedule_restored_task_triggers), filtering terminal and >24h tasks.
4. Failure modes today
4.1 Trigger layer
Lost triggers — anything produced but not yet a Task vanishes on restart.
Delay/re-anchoring — fixed for scheduled tasks (Phase 0) but only locally.
Untyped fan-in — react()/put() branch on payload["type"] strings across files.
LLM on the hot path — routing cost/latency per message trigger.
4.2 Action layer — emergent idempotency and its hole
Idempotency for side effects is not explicit (no dedup key, verified). It is emergent
from event-stream replay, and it works for the common case:
Agent runs send_email → side effect happens.
action_end event recorded → event stream → persisted to sessions.db
(agent_core/core/impl/action/manager.py:385, then on_task_persist).
On resume, react() reloads the stream; the LLM reads "send_email completed" and picks
the next action instead of resending.
The ordering, exactly:
1. side effect runs ← irreversible (email actually sent)
2. action_end recorded + persisted to sessions.db
The hole: a crash between 1 and 2 leaves an email sent with no durable record of it. On
resume the LLM sees no completion → resends → duplicate. Two things make it sharper than a
normal queue:
The evidence is read by an LLM, not a checker. If action_start persisted but action_end didn't, the model sees "Running send_email…" with no result and must guess
whether it already sent — a coin-flip on an irreversible action.
Emergent ≠ guaranteed. It depends on the model correctly inferring done-ness from prose.
Window is milliseconds and harmless for reversible actions (read_file, web_search), but
real and high-consequence for irreversible external actions (email, payment, post, outbound
message). This is the gap the activity layer (§6.4) closes.
5. Goals / non-goals
Goals
A trigger that is accepted is durably recorded before it can run, and is exactly-once
w.r.t. execution (at-least-once delivery + idempotent claim).
One typed trigger model and one producer API; remove scattered payload["type"].
Crash/restart rehydrates pending + in-flight triggers; no loss, no dupes.
Routing (incl. the LLM call) becomes pluggable and off the hot path.
Irreversible side effects are exactly-once via idempotency keys carried to the provider.
Clear layering: when (triggers) · what/state (tasks) · did-it-happen (activities).
Non-goals
No external broker / Temporal / Redis. Reuse sessions.db.
No change to the LLM reasoning loop or action-selection logic.
Not distributed/multi-process — single consumer stays.
No deterministic replay — explicitly out (§2): the workflow body is an LLM.
TriggerService is the single front door; producers call service.emit(spec) instead of
touching TriggerQueue. The queue stays as the in-memory ordering primitive but is fed
from the store, not from producers.
Typed record (stored form):
id uuid — identity
source enum: MESSAGE|SCHEDULED|IMMEDIATE|MEMORY|PROACTIVE|RESUME|LIVING_UI|
SKILL_WORKFLOW|ONBOARDING|SYSTEM (replaces payload["type"])
dedup_key idempotency, UNIQUE when present (nullable for one-off user messages)
fire_at when due
not_before retry backoff floor
priority tiebreak
session_id routing target (may be null → router decides)
description next_action_description
payload source-specific (still flexible; typed sub-objects over time)
status PENDING | CLAIMED | DONE | FAILED | DEAD
attempts, lease_until, created_at, updated_at
Idempotency keys (the keystone; INSERT … ON CONFLICT(dedup_key) DO NOTHING):
Single consumer ⇒ leases guard crash recovery, not concurrency. A boot-time reclaim
scan (CLAIMED → PENDING for orphans) is the core; a periodic sweep is optional polish.
Routing off the hot path:emit() returns after the durable INSERT, no LLM call.
Session routing becomes a Router strategy invoked by the queue feeder, only when needed
(multiple live sessions + routable source), and after the durable write so a crash mid-route
simply re-routes on retry. Decouples TriggerQueue from llm/prompt templates.
Overdue handling generalized: Phase 0's agent-judged catch-up note (fire it, but tell the
agent how late it is and to use judgment) moves to the store layer so any rehydrated
overdue trigger — not just scheduled — gets the same treatment.
6.3 Primitive B — Tasks-as-workflows (mostly formalizing what exists)
Tasks already persist (active_tasks) and restore. This primitive is naming + a contract,
not a rewrite:
A Task is the durable workflow: its state = (todos, status, event stream), already in sessions.db.
Define the resume contract explicitly: restore reads task + event stream, emits a RESUME
trigger (now via emit() with dedup_key=resume:{task_id} — so double-boot can't
double-resume), the LLM advances from the event history.
No deterministic replay (§2) — resume = "reload checkpoint + let the LLM read the stream."
6.4 Primitive C — Idempotent activities (the Temporal headline; scoped, not universal)
Closes §4.2. Applies only to irreversible external actions (email, payment, post, outbound
message) — flagged on the action definition (e.g. irreversible=True). Reversible actions are
untouched (wrapping read_file would be gold-plating).
For a flagged action:
Mint a stable idempotency key (activity:{task_id}:{action_seq} or content-hash) and
record intent durably to the event stream/ledger before execution.
Carry the key to the action, and where the provider supports it, to the provider
(email Message-ID / idempotency header, payment idempotency key) so a re-send is deduped at the destination — the only place exactly-once is actually enforceable.
Record result durably after.
On resume, the key answers "did this already run?" — a hard check, not LLM prose-reading.
This mirrors Temporal honestly: Temporal doesn't give free exactly-once either; it gives
at-least-once + an idempotency key you push to the external system. Same here.
7. Data model (additions to sessions.db)
CREATETABLEIF NOT EXISTS triggers (
id TEXTPRIMARY KEY, source TEXTNOT NULL, dedup_key TEXT UNIQUE,
session_id TEXT, fire_at REALNOT NULL, not_before REAL, priority INTEGERNOT NULL,
description TEXT, payload_json TEXTNOT NULL, status TEXTNOT NULL,
attempts INTEGERNOT NULL DEFAULT 0, lease_until REAL,
created_at REALNOT NULL, updated_at REALNOT NULL
);
CREATEINDEXIF NOT EXISTS idx_triggers_due ON triggers(status, fire_at);
-- Activity ledger for irreversible actions (§6.4)CREATETABLEIF NOT EXISTS activity_log (
idem_key TEXTPRIMARY KEY, -- the stable key, also pushed to provider when possible
task_id TEXTNOT NULL, action TEXTNOT NULL,
status TEXTNOT NULL, -- INTENT | DONE | FAILED
provider_ref TEXT, -- e.g. email Message-ID returned by the provider
created_at REALNOT NULL, updated_at REALNOT NULL
);
Both reuse the existing WAL connection, upsert idioms, and stale-GC TTL (DONE/INTENT rows GC'd
on the same 24h pattern — don't keep message-content payloads indefinitely).
8. Boot / restore integration
Extend _restore_sessions():
Reclaim CLAIMED triggers past lease_until → PENDING; reclaim INTENT activities
(decide per-action: re-attempt with same key, or surface for confirmation).
Load PENDING triggers → feed the queue (respecting fire_at/not_before); apply the
overdue catch-up note (§6.2).
Task resume becomes a RESUME-trigger producer via emit() (dedup'd).
Dedup-key & activity-key granularity is load-bearing — wrong bucket either lets dupes
through or suppresses legitimate re-fires. Needs a reviewed per-source / per-action key table.
Merge semantics — today put() "prefers newest" by dropping queued triggers. With a
store, "drop" must mean mark superseded DONE/DEAD (not silent delete) so they don't
rehydrate.
Which actions are "irreversible"? Mislabel → either gold-plated or unprotected. Start
conservative (email/payment/post/outbound message) and require explicit opt-in per action.
Provider dedup support varies — not every side effect has an idempotency primitive; where
absent, the ledger reduces (not eliminates) the window, and reclaim should prefer confirm-with-user over blind re-attempt.
Write amplification — every trigger + irreversible action now hits SQLite; low volume,
WAL is cheap, but measure boot-rehydration + heartbeat cadence.
Retention vs privacy — payloads/ledger may hold message content; reuse the existing 24h
TTL, consider stripping bodies on DONE.
11. TL;DR
CraftBot is already a durable-execution engine; this names its three primitives and closes the
gaps — on the SQLite we already run, no Temporal cluster.
Tasks = workflows → formalize the persist/restore that already exists into an explicit
resume contract.
Irreversible actions = activities → today they're protected only by the LLM re-reading
the event stream (emergent, with a real crash-window hole); add an idempotency key recorded
before the side effect and pushed to the provider, so "did it already run?" is a hard check.
Principle, stated once: checkpoint state + idempotent activities, not deterministic replay —
because the workflow body is an LLM.
1. Why
Issue #313 (one-time scheduled tasks delayed/duplicated after a restart) was the spark, but
the real subject is bigger: CraftBot is already a hand-rolled durable-execution engine, and
this doc names its primitives and closes the gaps where durability is partial or emergent.
We explored Temporal and parked it: we want the paradigm (durable execution — work that
survives crashes without loss or duplication), not the operational tax (running a Temporal
cluster). Everything here is Temporal-shaped on the SQLite we already run
(
app/usage/session_storage.py→sessions.db, WAL, idempotent upserts). No new infra.Four pains, all in scope:
living-ui, resume) live only in an in-memory heap and are lost on restart.
Triggers with stringly-typedpayload["type"](several with none). No single model or entry API.TriggerQueue.put()makes a synchronous LLM call forsession routing on the hot path of every non-system trigger.
protected from re-execution only by the LLM re-reading the event stream, with a real
crash-window hole (§4.2). This is the most agent-critical gap and the truest Temporal
alignment.
2. The frame: durable execution as three primitives
Temporal's value isn't "a durable queue." It's three things made crash-proof together:
The divergence principle (state it explicitly)
Temporal reconstructs in-flight state by deterministically replaying workflow code against
an event history. CraftBot's "workflow body" is an LLM — non-deterministic by nature, so
true replay is impossible. CraftBot's durable-execution model is therefore deliberately
different:
This is the correct model for an agent, and it sets the ceiling: we can guarantee "didn't lose
it" and "didn't do the irreversible thing twice," but not "reconstructed the exact reasoning."
3. Current architecture (as built)
3.1 The Trigger
agent_core/core/trigger.py—@dataclass(order=True):fire_at,priority,next_action_description,payload(untyped,compare=False),session_id,waiting_for_reply. No id, no source type, no dedup key, no attempt count.3.2 The queue
agent_core/core/impl/trigger/queue.py— in-memoryheapqordered by(fire_at, priority).put()optionally calls the LLM to route to a session, then merges/replaces same-sessiontriggers ("prefer newest"). Volatile. Built in
app/agent_base.py:242.3.3 Producers (inventory)
~13 sites; representative:
payload["type"]scheduledmemory_processingproactive_*restart_notice/ (none)living_ui_*3.4 Consumer (single loop)
app/ui_layer/controller/ui_controller.py:_consume_triggers():trigger = await agent.triggers.get(); await agent.react(trigger).react()classifies andturns the trigger into Task creation/continuation.
3.5 Existing durability we build on
SQLite+WAL at
{APP_DATA_PATH}/.usage/sessions.db; tablesactive_tasks,event_streams,event_records,conversation_history; idempotentINSERT … ON CONFLICT DO UPDATE. Taskpersist hooks (
on_task_persist/on_task_remove_persist); boot restore(
_restore_sessions+_schedule_restored_task_triggers), filtering terminal and >24h tasks.4. Failure modes today
4.1 Trigger layer
react()/put()branch onpayload["type"]strings across files.4.2 Action layer — emergent idempotency and its hole
Idempotency for side effects is not explicit (no dedup key, verified). It is emergent
from event-stream replay, and it works for the common case:
send_email→ side effect happens.action_endevent recorded → event stream → persisted tosessions.db(
agent_core/core/impl/action/manager.py:385, thenon_task_persist).react()reloads the stream; the LLM reads "send_email completed" and picksthe next action instead of resending.
The ordering, exactly:
The hole: a crash between 1 and 2 leaves an email sent with no durable record of it. On
resume the LLM sees no completion → resends → duplicate. Two things make it sharper than a
normal queue:
action_startpersisted butaction_enddidn't, the model sees "Running send_email…" with no result and must guesswhether it already sent — a coin-flip on an irreversible action.
Window is milliseconds and harmless for reversible actions (
read_file,web_search), butreal and high-consequence for irreversible external actions (email, payment, post, outbound
message). This is the gap the activity layer (§6.4) closes.
5. Goals / non-goals
Goals
w.r.t. execution (at-least-once delivery + idempotent claim).
payload["type"].Non-goals
sessions.db.6. Proposed design — three primitives
6.1 Layering
6.2 Primitive A — Durable triggers (signals/timers) (Phase 1, fixes #313 class)
TriggerService is the single front door; producers call
service.emit(spec)instead oftouching
TriggerQueue. The queue stays as the in-memory ordering primitive but is fedfrom the store, not from producers.
Typed record (stored form):
Idempotency keys (the keystone;
INSERT … ON CONFLICT(dedup_key) DO NOTHING):scheduled:{schedule_id}:{fire_at_bucket}scheduled-once:{schedule_id}resume:{task_id}· proactive:proactive:{frequency}:{slot}living_ui:{project_id}:{job_kind}This retires the trigger-duplication class structurally; Phase 0's
run_count>0skip becomesone instance of it.
Lifecycle (at-least-once + idempotent claim):
Single consumer ⇒ leases guard crash recovery, not concurrency. A boot-time reclaim
scan (
CLAIMED → PENDINGfor orphans) is the core; a periodic sweep is optional polish.Routing off the hot path:
emit()returns after the durable INSERT, no LLM call.Session routing becomes a
Routerstrategy invoked by the queue feeder, only when needed(multiple live sessions + routable source), and after the durable write so a crash mid-route
simply re-routes on retry. Decouples
TriggerQueuefromllm/prompt templates.Overdue handling generalized: Phase 0's agent-judged catch-up note (fire it, but tell the
agent how late it is and to use judgment) moves to the store layer so any rehydrated
overdue trigger — not just scheduled — gets the same treatment.
6.3 Primitive B — Tasks-as-workflows (mostly formalizing what exists)
Tasks already persist (
active_tasks) and restore. This primitive is naming + a contract,not a rewrite:
(todos, status, event stream), already insessions.db.RESUMEtrigger (now via
emit()withdedup_key=resume:{task_id}— so double-boot can'tdouble-resume), the LLM advances from the event history.
6.4 Primitive C — Idempotent activities (the Temporal headline; scoped, not universal)
Closes §4.2. Applies only to irreversible external actions (email, payment, post, outbound
message) — flagged on the action definition (e.g.
irreversible=True). Reversible actions areuntouched (wrapping
read_filewould be gold-plating).For a flagged action:
activity:{task_id}:{action_seq}or content-hash) andrecord intent durably to the event stream/ledger before execution.
(email
Message-ID/ idempotency header, payment idempotency key) so a re-send is dedupedat the destination — the only place exactly-once is actually enforceable.
This mirrors Temporal honestly: Temporal doesn't give free exactly-once either; it gives
at-least-once + an idempotency key you push to the external system. Same here.
7. Data model (additions to sessions.db)
Both reuse the existing WAL connection, upsert idioms, and stale-GC TTL (DONE/INTENT rows GC'd
on the same 24h pattern — don't keep message-content payloads indefinitely).
8. Boot / restore integration
Extend
_restore_sessions():CLAIMEDtriggers pastlease_until→PENDING; reclaimINTENTactivities(decide per-action: re-attempt with same key, or surface for confirmation).
PENDINGtriggers → feed the queue (respectingfire_at/not_before); apply theoverdue catch-up note (§6.2).
RESUME-trigger producer viaemit()(dedup'd).9. Phasing
fire_at, skip already-firedone-time tasks, remove-before-enqueue, agent-judged catch-up.
triggerstable +TriggerStore(claim/ack/reclaim) +
TriggerService.emit()+ boot rehydration. Migrate scheduled andresume first (natural dedup keys). Retires the [V.1.3.3] Scheduled tasks are delayed and duplicated after restart of CraftBot #313 class for all producers.
TriggerSourceenum; migrate remainingproducers to
emit(); delete scatteredpayload["type"]branching.Routerbehind the queue feeder; remove the LLM call fromput(); routing post-durable-write and retry-safe.key before, result after; provider-level dedup; key-based resume check. Independent of
Phases 1–3 — can be pulled forward if the double-send risk bites first.
Each phase ships and is observable on its own.
10. Risks / open questions
through or suppresses legitimate re-fires. Needs a reviewed per-source / per-action key table.
put()"prefers newest" by dropping queued triggers. With astore, "drop" must mean mark superseded
DONE/DEAD(not silent delete) so they don'trehydrate.
conservative (email/payment/post/outbound message) and require explicit opt-in per action.
absent, the ledger reduces (not eliminates) the window, and reclaim should prefer
confirm-with-user over blind re-attempt.
WAL is cheap, but measure boot-rehydration + heartbeat cadence.
optional.
TTL, consider stripping bodies on DONE.
11. TL;DR
CraftBot is already a durable-execution engine; this names its three primitives and closes the
gaps — on the SQLite we already run, no Temporal cluster.
emit()API and routing off the hot path. Phase 1 alone retires the [V.1.3.3] Scheduled tasks are delayed and duplicated after restart of CraftBot #313 failure class forevery producer.
resume contract.
the event stream (emergent, with a real crash-window hole); add an idempotency key recorded
before the side effect and pushed to the provider, so "did it already run?" is a hard check.
Principle, stated once: checkpoint state + idempotent activities, not deterministic replay —
because the workflow body is an LLM.