fix(orch): don't let infra-failed managers exhaust the respawn budget #64

Merged
nathanwhit merged 1 commit into main from manager-respawn-budget on Jun 24, 2026
@nathanwhit

Owner

A manager that dies at workspace-prepare without ever running is an infra failure (full disk, unreachable target), not a stuck objective. Until now, though, it counted the same as a genuine failure against the 4-manager respawn cap. A transient host outage could therefore burn the whole budget and permanently park the objective even after the infra healed (this is what stranded a8641186 behind three disk-full respawns).

managerRespawnExhausted now counts only genuine attempts against maxManagerSessions: a manager that reached running, plus any still in flight (StartedAt is stamped only on the transition to running). A separate hard ceiling (maxManagerSpawns) on total spawns still bounds an objective whose target is persistently unplaceable, so it escalates instead of respawning forever.

Tests: TestManagerRespawn_InfraDeathsDontExhaustBudget, TestManagerRespawn_HardCeilingBoundsInfraLoop.

@nathanwhit nathanwhit merged commit e3f7bb1 into main Jun 24, 2026
1 check passed