Problem
Repeated failed launches are not tracked or quarantined, so a bad recipe/backend can be launched repeatedly with no per-recipe failure budget or clear crash_loop state.
Evidence
- controller/src/modules/engines/engine-coordinator.ts:18-20 has no failure counter/quarantine state.
- controller/src/modules/engines/engine-coordinator.ts:66-87 emits launch-progress error but does not record/count failures.
- controller/src/modules/engines/routes.ts:142-202 calls setActiveRecipe once per request and has no budget check.
- controller/src/modules/system/event-manager.ts:132-151 can already carry optional metadata on launch_progress.
- frontend/src/lib/types/recipes/recipes.ts:152-153 already accepts recipe status: "error".
Proposed direction
Add an in-memory per-recipe crash-loop budget in the controller. After N launch failures in a time window, mark the recipe/runtime as crash-looped, stop additional launch attempts until manual retry/config change, and expose optional state via existing status/recipe/SSE payloads.
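A minimal sketch of what such a budget could look like, to make the proposal concrete. Everything here is an assumption for illustration: the `CrashLoopBudget` class, its method names, the `CrashLoopState` shape, and the defaults (3 failures in 5 minutes) do not exist in the controller today. The sketch also assumes quarantine is sticky, meaning it does not expire with the window and is only lifted by an explicit reset.

```typescript
// Hypothetical sketch; none of these names exist in the controller today.
type CrashLoopState = {
  failures: number;     // failures still inside the time window
  crashLooped: boolean; // true once the budget has been exhausted
  lastError?: string;   // most recent failure message, if one was given
};

class CrashLoopBudget {
  // recipeId -> timestamps (ms) of recent launch failures
  private readonly failures = new Map<string, number[]>();
  private readonly lastErrors = new Map<string, string>();
  private readonly quarantined = new Set<string>();
  private readonly maxFailures: number;
  private readonly windowMs: number;
  private readonly now: () => number;

  constructor(
    maxFailures: number = 3,           // assumed default for "N"
    windowMs: number = 5 * 60_000,     // assumed default window
    now: () => number = () => Date.now(), // injectable clock for tests
  ) {
    this.maxFailures = maxFailures;
    this.windowMs = windowMs;
    this.now = now;
  }

  // Failures for a recipe that are still inside the window.
  private recent(recipeId: string): number[] {
    const cutoff = this.now() - this.windowMs;
    return (this.failures.get(recipeId) ?? []).filter((t) => t > cutoff);
  }

  // Record one failed launch; quarantine the recipe once the budget is spent.
  recordFailure(recipeId: string, error?: string): CrashLoopState {
    const recent = this.recent(recipeId);
    recent.push(this.now());
    this.failures.set(recipeId, recent);
    if (error !== undefined) this.lastErrors.set(recipeId, error);
    if (recent.length >= this.maxFailures) this.quarantined.add(recipeId);
    return this.state(recipeId);
  }

  // Budget check a launch route could run before starting a launch.
  canLaunch(recipeId: string): boolean {
    return !this.quarantined.has(recipeId);
  }

  // Manual retry or config change: forget all state for the recipe.
  reset(recipeId: string): void {
    this.failures.delete(recipeId);
    this.lastErrors.delete(recipeId);
    this.quarantined.delete(recipeId);
  }

  // Snapshot suitable for attaching to a status or event payload.
  state(recipeId: string): CrashLoopState {
    return {
      failures: this.recent(recipeId).length,
      crashLooped: this.quarantined.has(recipeId),
      lastError: this.lastErrors.get(recipeId),
    };
  }
}
```

Under these assumptions, the coordinator would call `recordFailure` wherever it currently emits the launch-progress error, and the launch route would check `canLaunch` before calling `setActiveRecipe`. Because the state lives only in memory, it is lost on controller restart, which matches the non-goal of no persistent crash-loop state.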
Acceptance criteria
- /recipes, /status, or launch_progress exposes optional crash-loop state without breaking existing clients.
Non-goals
- No auto-retry scheduler.
- No persistent SQLite crash-loop state.
- No process-manager launch rewrite.
- No frontend redesign in the first slice.
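For the acceptance criterion on optional crash-loop state, one backward-compatible approach is to add a single optional field to the payload and omit it when there is nothing to report. The sketch below is illustrative only: the real `launch_progress` payload is not shown in this issue, so `LaunchProgressEvent`, `CrashLoopInfo`, `withCrashLoop`, `describeLaunch`, and every field name are hypothetical.

```typescript
// Hypothetical shapes; the actual launch_progress payload may differ.
interface CrashLoopInfo {
  failures: number;    // failures counted inside the window
  maxFailures: number; // budget size ("N")
  windowMs: number;    // window length in milliseconds
}

interface LaunchProgressEvent {
  recipeId: string;
  stage: string;             // assumed field, e.g. "error"
  message?: string;
  crashLoop?: CrashLoopInfo; // optional, so older clients can ignore it
}

// Attach crash-loop info only when present. For healthy recipes the key is
// omitted entirely, so the serialized payload is unchanged.
function withCrashLoop(
  event: LaunchProgressEvent,
  info?: CrashLoopInfo,
): LaunchProgressEvent {
  return info ? { ...event, crashLoop: info } : event;
}

// Client-side read that also tolerates payloads without the new field.
function describeLaunch(event: LaunchProgressEvent): string {
  if (event.crashLoop) {
    const { failures, maxFailures } = event.crashLoop;
    return `${event.recipeId}: crash loop (${failures}/${maxFailures} failures)`;
  }
  return `${event.recipeId}: ${event.stage}`;
}
```

Keeping the field optional means a client that has never heard of `crashLoop` sees the same payload it always did, while an updated client can branch on its presence.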