Add per-recipe crash-loop launch budget #153

Description

@OnlyTerp

Problem

Repeated failed launches are not tracked or quarantined, so a bad recipe/backend can be launched repeatedly with no per-recipe failure budget or clear crash_loop state.

Evidence

  • controller/src/modules/engines/engine-coordinator.ts:18-20 has no failure counter/quarantine state.
  • controller/src/modules/engines/engine-coordinator.ts:66-87 emits a launch_progress error event but does not record or count failures.
  • controller/src/modules/engines/routes.ts:142-202 calls setActiveRecipe once per request and has no budget check.
  • controller/src/modules/system/event-manager.ts:132-151 can already carry optional metadata on launch_progress.
  • frontend/src/lib/types/recipes/recipes.ts:152-153 already accepts recipe status: "error".

Proposed direction

Add an in-memory per-recipe crash-loop budget in the controller. After N launch failures in a time window, mark the recipe/runtime as crash-looped, stop additional launch attempts until manual retry/config change, and expose optional state via existing status/recipe/SSE payloads.
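A minimal sketch of what such a budget could look like, to make the direction concrete. Everything here is hypothetical: `CrashLoopBudget`, `recordFailure`, `canLaunch`, `reset`, and the `maxFailures`/`windowMs` options are illustrative names, not existing controller code. The clock is injectable so tests can avoid real timers.

```typescript
// Hypothetical in-memory crash-loop budget; none of these names exist in the controller today.
type BudgetOptions = { maxFailures: number; windowMs: number; now?: () => number };
type BudgetState = { crashLooped: boolean; failures: number };

class CrashLoopBudget {
  private readonly failures = new Map<string, number[]>();
  private readonly quarantined = new Set<string>();
  private readonly maxFailures: number;
  private readonly windowMs: number;
  private readonly now: () => number;

  constructor(opts: BudgetOptions) {
    this.maxFailures = opts.maxFailures;
    this.windowMs = opts.windowMs;
    this.now = opts.now ?? (() => Date.now());
  }

  // Drop failure timestamps that have fallen out of the sliding window.
  private prune(recipeId: string): number[] {
    const cutoff = this.now() - this.windowMs;
    const recent = (this.failures.get(recipeId) ?? []).filter((t) => t > cutoff);
    this.failures.set(recipeId, recent);
    return recent;
  }

  // Record one failed launch; quarantine the recipe once the threshold is reached.
  recordFailure(recipeId: string): BudgetState {
    const recent = this.prune(recipeId);
    recent.push(this.now());
    if (recent.length >= this.maxFailures) this.quarantined.add(recipeId);
    return this.state(recipeId);
  }

  // Checked before spawning: a quarantined recipe stays blocked until reset() is called.
  canLaunch(recipeId: string): boolean {
    return !this.quarantined.has(recipeId);
  }

  // Call on successful readiness, manual retry, or recipe config change.
  reset(recipeId: string): void {
    this.failures.delete(recipeId);
    this.quarantined.delete(recipeId);
  }

  state(recipeId: string): BudgetState {
    return { crashLooped: this.quarantined.has(recipeId), failures: this.prune(recipeId).length };
  }
}

// Usage with a fake clock: three failures inside a 60 s window trip the budget.
let fakeNow = 0;
const budget = new CrashLoopBudget({ maxFailures: 3, windowMs: 60_000, now: () => fakeNow });
budget.recordFailure("recipe-a");
budget.recordFailure("recipe-a");
fakeNow = 10_000;
const third = budget.recordFailure("recipe-a");
console.log(third.crashLooped, budget.canLaunch("recipe-a"));
```

Quarantine is sticky in this sketch: a recipe does not unblock when the window elapses, only on an explicit `reset`, which matches "until manual retry/config change" above. Whether the window should also unblock automatically is left open.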

Acceptance criteria

  • Controller tracks repeated launch failures per recipe over a bounded time window.
  • After threshold, launch returns a clear crash-loop response without spawning a new process.
  • Successful readiness or recipe config change resets the budget.
  • /recipes, /status, or launch_progress exposes optional crash-loop state without breaking existing clients.
  • Focused tests simulate repeated immediate-exit/failed launches without starting real model servers.
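For the "optional crash-loop state without breaking existing clients" criterion, one possible shape is an optional field added to the event payload. This is only a sketch: `LaunchProgressPayload`, `CrashLoopInfo`, `withCrashLoop`, and the `recipeId`/`stage` fields are assumptions for illustration, since the real launch_progress payload is not shown in this issue.

```typescript
// Hypothetical payload shapes; the actual launch_progress type lives in the controller.
interface CrashLoopInfo {
  failures: number; // failures counted inside the window
  windowMs: number; // size of the window in milliseconds
}

interface LaunchProgressPayload {
  recipeId: string;
  stage: string;
  message?: string;
  crashLoop?: CrashLoopInfo; // optional, so older clients can ignore it
}

// Attach crash-loop state only when there is something to report.
function withCrashLoop(
  payload: LaunchProgressPayload,
  info?: CrashLoopInfo,
): LaunchProgressPayload {
  return info ? { ...payload, crashLoop: info } : payload;
}

// Stand-in for an existing client that reads only the fields it already knows.
function legacyRender(raw: string): string {
  const { recipeId, stage } = JSON.parse(raw) as { recipeId: string; stage: string };
  return `${recipeId}: ${stage}`;
}

const plain = withCrashLoop({ recipeId: "recipe-a", stage: "error" });
const flagged = withCrashLoop(
  { recipeId: "recipe-a", stage: "error" },
  { failures: 3, windowMs: 60_000 },
);
console.log(legacyRender(JSON.stringify(flagged)));
```

Because the field is absent unless a budget has tripped, payloads for healthy recipes stay byte-for-byte what clients receive today.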

Non-goals

  • No auto-retry scheduler.
  • No persistent SQLite crash-loop state.
  • No process-manager launch rewrite.
  • No frontend redesign in the first slice.
