
fix(cron): treat non-budget session error as cron failure #19

Merged
nyem69 merged 1 commit into main from fix/cron-runner-silent-spawn-fail on Jun 29, 2026

Conversation

@nyem69 nyem69 commented Jun 29, 2026

Owner

Why

A claude-engine cron outage went undetected for ~2.8 days (Jun 26→29). The native-installer migration moved the claude binary out of the pinned nvm path, so every spawn failed with ENOENT, yet the run-logs still showed clean status:"success".

Root cause of the invisibility: on a spawn-fail (or any non-budget engine error) the session manager catches the rejection and sets session.status="error", but route() still resolves with a sessionId, so the runner takes the success path.

What

  • runner.ts: after route() resolves, if finalSession?.status === "error" and it is not a budget stop → log status:"error" + fire opsAlert.
  • runner.test.ts: 2 regression tests + a default getSession mock.
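
A minimal sketch of the post-route() check described above, not the actual runner.ts code. The `Session` shape, the `stopReason` field used to detect a budget stop, the `RunnerDeps` interface, and `logRun` are all hypothetical names introduced for illustration; only `route()`, `getSession`, `finalSession?.status === "error"`, and `opsAlert` come from the PR text.

```typescript
type RunStatus = "success" | "error";

// Hypothetical session shape; the real type is not shown in this PR.
interface Session {
  id: string;
  status: "idle" | "running" | "error";
  // Assumed marker for a budget stop; the PR does not say how one is detected.
  stopReason?: "budget" | "engine";
}

// Hypothetical dependency bundle so the sketch is self-contained.
interface RunnerDeps {
  route(jobId: string): Promise<{ sessionId: string }>;
  getSession(sessionId: string): Session | undefined;
  logRun(entry: { jobId: string; status: RunStatus }): void;
  opsAlert(message: string): void;
}

// A session that ended in "error" counts as a cron failure unless it was a
// budget stop. Everything else keeps the pre-existing success path.
function classifyRun(finalSession: Session | undefined): RunStatus {
  const isBudgetStop = finalSession?.stopReason === "budget";
  return finalSession?.status === "error" && !isBudgetStop ? "error" : "success";
}

async function runCronJob(jobId: string, deps: RunnerDeps): Promise<RunStatus> {
  // route() resolves with a sessionId even when the spawn failed, so the
  // resolved promise alone says nothing about the outcome.
  const { sessionId } = await deps.route(jobId);
  const status = classifyRun(deps.getSession(sessionId));
  deps.logRun({ jobId, status });
  if (status === "error") {
    deps.opsAlert(`cron ${jobId}: session ${sessionId} ended in error`);
  }
  return status;
}
```

Keeping the decision in a small pure function such as `classifyRun` is one way to make the two regression cases (non-budget error, budget stop) testable without spawning anything.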

Pairing

This is the signal source the t1a-health-check.sh cron heartbeat depends on (~/.jinn, committed separately). The heartbeat alerts on 0 successful cron runs in 8h; it only works because spawn-fails now record error rather than success. Reverting this change blinds the heartbeat.
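
To illustrate why the heartbeat depends on this fix, here is a rough TypeScript sketch of the "0 successful runs in 8h" rule. It is not the t1a-health-check.sh script, which is a shell script committed elsewhere; the `RunRecord` shape and `shouldAlert` function are hypothetical.

```typescript
// Hypothetical run-log record; the real run-log format is not shown in this PR.
interface RunRecord {
  status: "success" | "error";
  finishedAt: number; // epoch milliseconds
}

const WINDOW_MS = 8 * 60 * 60 * 1000; // the 8h window from the PR description

// Alert when no run inside the window was recorded as a success.
function shouldAlert(records: RunRecord[], now: number): boolean {
  const recentSuccesses = records.filter(
    (r) =>
      r.status === "success" &&
      r.finishedAt <= now &&
      now - r.finishedAt <= WINDOW_MS,
  );
  return recentSuccesses.length === 0;
}
```

If spawn-fails are logged as "success", as they were before this change, `shouldAlert` never fires during an outage; once they are logged as "error", the same rule catches it.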

Verification

pnpm build clean; full suite 792/792 pass; gateway restarted on the built dist and confirmed claude spawns from ~/.local/bin/claude.

🤖 Generated with Claude Code

Pre-fix, a spawn-fail (or any non-budget engine error) left the session
manager catching the reject and setting session.status="error", but
route() still resolved with a sessionId -- so the runner took the success
path and logged status:"success". This hid a ~2.8-day outage where every
claude-engine cron silently failed (binary path went stale after the
native-installer migration) yet run-logs showed clean successes.

- runner.ts: after route() resolves, if finalSession?.status === "error"
  and it is not a budget stop, log status:"error" + fire opsAlert
- runner.test.ts: 2 regression tests + default getSession mock

This is the signal source the t1a-health-check heartbeat depends on.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01EcjhZ9Nj1KmHyBBRMzBJ93
@nyem69 nyem69 merged commit 1e3f7a9 into main Jun 29, 2026
2 checks passed
@nyem69 nyem69 deleted the fix/cron-runner-silent-spawn-fail branch June 29, 2026 09:20