Skip to content

fix(engine): plan errors fail fast; service caps consecutive failures #201

Open
gmoigneu wants to merge 1 commit into main from fix/134-plan-errors-service-backoff


@gmoigneu (Contributor) commented May 3, 2026

Closes #134. Two safety nets: (1) sync now checks plan.HasErrors before RunOnce and exits with code 2 if the plan already knows the apply will fail. (2) RunService caps consecutive failures at 5 (the counter resets on success) and then returns the last error, so a process supervisor can intervene.

Two related orchestration safety nets called out in #134.

1. cmd/sync.go now checks plan.HasErrors after Plan and before
   RunOnce. If the plan has already recorded an error (e.g.
   "unsupported VCS type"), sync exits with
   ExitCodeError{Code: 2, Cause: ...} instead of re-discovering the
   same error for every resource during the apply.

2. engine.RunService now bounds consecutive failures at
   maxConsecutiveServiceFailures (5). A successful sync resets the
   counter, so intermittent transient errors are absorbed. Once the cap
   is hit, RunService returns the wrapped last error so a process
   supervisor (systemd, kubelet, …) can intervene. Previously, a
   misconfigured service retried forever, wasting CPU and network
   bandwidth.

Closes #134.
@Theosakamg self-requested a review May 3, 2026 11:51

@Theosakamg (Contributor) left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@gmoigneu (Contributor, Author) commented May 6, 2026

This needs a strict: true/false option: either accept partial updates, or stop before anything is applied if the plan failed.

@Theosakamg (Contributor)

Use atomic updates/removals per skill/MCP, but integrate a Commit concept.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

engine: continue past per-class errors and propagate plan errors before apply

2 participants