
feat(cron): precheck gate — skip the LLM session when a shell check finds no work#18

Merged
nyem69 merged 2 commits into main from feat/cron-precheck-gate on Jun 7, 2026

@nyem69 (Owner) commented Jun 7, 2026

What

Adds optional CronJob.precheck — a cheap shell command the scheduler runs before creating the (expensive) LLM session, skipping the session entirely when there's nothing to do.

Why

The fleet's dominant cost pattern is high-frequency watcher/queue jobs that boot a full LLM session every fire just to run a deterministic prefilter that usually finds no work:

| Job | ~$/mo | Schedule | Shape |
|---|---|---|---|
| jinn-group-watcher | ~$210 (≈25% of spend) | hourly 8–23 | shell prefilter run inside a session every fire |
| plan-url-queue-processor | ~$95 | every 30m | empty-queue most fires |
| sitrep-regen-watch | ~$57 | every 15m 1–7am | "needs regen?" usually no |
| social-alert | ~$54 | 11×/day | |
| top-4 | ~$520 / 41% | | mostly no-op LLM boots |

jinn-group-watcher already has jinn-watcher-prefilter.sh, but the session boots, reads context, runs the prefilter, finds nothing, and exits — every hour. This moves the gate to the scheduler so no-op fires never spawn Claude.

Contract

exit 0                 -> proceed (spawn as today)
exit ∈ skipExitCodes   -> gated-skip   (no session, no alert, runlog only)
any other non-zero     -> precheck_error (no session, ops alert)
timeout                -> precheck_error

skipExitCodes must be set for any skip to happen: with none set, every non-zero exit is an error. This is deliberate, because it stops a real outage from being silently skipped. The first adopter's prefilter shows why a blanket "non-zero = skip" rule is wrong. Its exit codes are 0 = URLs to process, 10 = no-op, 20 = lock contention, and 21–24 = wacli/JSON/state/watermark failures. So the job sets skipExitCodes: [10,20], and 21–24 surface as precheck_error.
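
The contract above can be read as a small pure function. The following is a minimal sketch, not the code from cron/precheck.ts: the names `PrecheckRaw`, `PrecheckOutcome`, and `classifyPrecheck` are hypothetical, and only the mapping from exit code to outcome is taken from the contract.

```typescript
// Hypothetical names; only the exit-code contract itself comes from the PR text.
type PrecheckOutcome = "proceed" | "gated-skip" | "precheck_error";

interface PrecheckRaw {
  exitCode: number | null; // null when the process was killed, e.g. on timeout
  timedOut: boolean;
}

function classifyPrecheck(
  raw: PrecheckRaw,
  skipExitCodes: readonly number[] = [],
): PrecheckOutcome {
  // A timeout (or a process that never produced an exit code) is always an error.
  if (raw.timedOut || raw.exitCode === null) return "precheck_error";
  // Exit 0 means there is work: spawn the session as before.
  if (raw.exitCode === 0) return "proceed";
  // Only explicitly listed codes count as "no work"; everything else is an error.
  const code = raw.exitCode;
  return skipExitCodes.some((c) => c === code) ? "gated-skip" : "precheck_error";
}

// With the first adopter's codes: 10 and 20 skip, 21 alerts.
const skipCodes = [10, 20];
const noWork = classifyPrecheck({ exitCode: 10, timedOut: false }, skipCodes);     // "gated-skip"
const wacliDown = classifyPrecheck({ exitCode: 21, timedOut: false }, skipCodes);  // "precheck_error"
const noSkipList = classifyPrecheck({ exitCode: 10, timedOut: false });            // "precheck_error"
```

The empty default for `skipExitCodes` is what makes the gate fail loudly: a job that forgets to list its no-work codes gets ops alerts rather than silent skips.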

Implementation

  • shared/types.ts: CronJob.precheck { command, timeoutMs?, skipExitCodes? }, optional/back-compat.
  • cron/precheck.ts: runPrecheck() runs /bin/bash -c in JINN_HOME with a timeout (default 60s) and captured output; never rejects.
  • cron/runner.ts: gate before session creation; new runlog statuses gated-skip / precheck_error; ops-alert only on error, never on a no-work skip. Gated-skip still writes a runlog entry + lastStartedAt, so catch-up dedup treats the slot as serviced (no double-fire).
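
For readers who want a feel for the precheck runner, here is a rough sketch under stated assumptions. It is not the code in cron/precheck.ts. The PR says runPrecheck() "never rejects", which suggests the real function is async. This version uses Node's synchronous `spawnSync` to stay short, and the names `runPrecheckSync` and `PrecheckResult` are made up for illustration.

```typescript
import { spawnSync } from "node:child_process";

// Field names follow the PR's CronJob.precheck shape.
interface PrecheckConfig {
  command: string;
  timeoutMs?: number;
  skipExitCodes?: number[];
}

// Hypothetical result shape for this sketch.
interface PrecheckResult {
  exitCode: number | null; // null if the process was killed or never started
  timedOut: boolean;
  output: string; // captured stdout followed by stderr
}

const DEFAULT_TIMEOUT_MS = 60_000; // the PR states a 60s default

// Synchronous simplification: it blocks the caller, which a real scheduler would avoid.
function runPrecheckSync(cfg: PrecheckConfig, cwd: string): PrecheckResult {
  try {
    const res = spawnSync("/bin/bash", ["-c", cfg.command], {
      cwd, // the PR runs the command in JINN_HOME
      timeout: cfg.timeoutMs ?? DEFAULT_TIMEOUT_MS,
      encoding: "utf8",
    });
    // spawnSync reports a timeout through res.error with code ETIMEDOUT.
    const errCode = (res.error as { code?: string } | undefined)?.code;
    return {
      exitCode: res.status,
      timedOut: errCode === "ETIMEDOUT",
      output: `${res.stdout ?? ""}${res.stderr ?? ""}`,
    };
  } catch (e) {
    // Mirror "never rejects": turn unexpected failures into an error-shaped result.
    return { exitCode: null, timedOut: false, output: String(e) };
  }
}
```

A caller would pass the JINN_HOME directory as `cwd` and feed the result into whatever classifies exit codes against skipExitCodes.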

Tests (790 pass, tsc clean)

  • precheck.test.ts (real bash): proceed / skip / 2nd-skip-code / non-listed-nonzero=error / no-skip-codes-never-skips-blind / timeout / invalid-config / stdout.
  • runner.test.ts: proceed spawns; skip+error+timeout don't; skip silent; error ops-alerts; no-precheck jobs bypass (back-compat).
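
The branching that runner.test.ts exercises can be sketched with injected dependencies. This is an illustration only: `fireJob`, `Deps`, and the `"ok"` status are hypothetical names, and the real runner also records lastStartedAt and writes its run-log when the session completes, which this sketch leaves out.

```typescript
type Outcome = "proceed" | "gated-skip" | "precheck_error";
type RunStatus = "ok" | "gated-skip" | "precheck_error"; // "ok" is an assumed name

interface Job {
  id: string;
  precheck?: { command: string };
}

// Everything with a side effect is injected so the branching can be tested with fakes.
interface Deps {
  runPrecheck: (job: Job) => Outcome; // stands in for running and classifying the precheck
  spawnSession: (job: Job) => void;
  opsAlert: (job: Job, reason: string) => void;
  writeRunlog: (job: Job, status: RunStatus) => void;
}

function fireJob(job: Job, deps: Deps): RunStatus {
  // Jobs without a precheck bypass the gate entirely (back-compat).
  const outcome: Outcome = job.precheck ? deps.runPrecheck(job) : "proceed";

  if (outcome === "proceed") {
    deps.spawnSession(job);
    deps.writeRunlog(job, "ok");
    return "ok";
  }

  // No session from here on. Alert only on errors, never on a no-work skip.
  if (outcome === "precheck_error") {
    deps.opsAlert(job, "precheck failed");
  }
  deps.writeRunlog(job, outcome);
  return outcome;
}
```

Keeping the run-log write on the skip path is the part that matters for catch-up: a skipped slot still looks serviced.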

Rollout (staged)

Migrate only jinn-group-watcher first (jinn-data repo, after this deploys): prefilter moves from in-session Step 1 to precheck, prompt collapses to the process-URLs branch. Observe live skip rate + cost delta, then apply to plan-url-queue-processor / sitrep-regen-watch / social-alert.
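
As a rough picture of what the migrated job entry might look like, here is a sketch. The `precheck` field names and the skip codes [10, 20] come from this PR. Everything else is assumed for illustration: the `id`, `schedule`, and `prompt` fields, the cron expression, the script path, and the `validatePrecheck` helper. The actual invalid-config rules in cron/precheck.ts are not described here, so the checks below are guesses.

```typescript
interface PrecheckConfig {
  command: string;
  timeoutMs?: number;
  skipExitCodes?: number[];
}

// Assumed job shape; only `precheck` is documented by this PR.
interface CronJobSketch {
  id: string;
  schedule: string;
  prompt: string;
  precheck?: PrecheckConfig;
}

const jinnGroupWatcher: CronJobSketch = {
  id: "jinn-group-watcher",
  schedule: "0 8-23 * * *", // "hourly 8–23" from the cost table, assumed cron syntax
  prompt: "Process the URLs reported by the prefilter.", // placeholder text
  precheck: {
    command: "./jinn-watcher-prefilter.sh", // hypothetical path relative to JINN_HOME
    timeoutMs: 60_000,
    skipExitCodes: [10, 20], // 10 = no-op, 20 = lock contention; 21–24 stay errors
  },
};

// Guessed validation rules: non-empty command, positive timeout, and skip codes
// that are integers in 1..255 (0 already means "proceed").
function validatePrecheck(cfg: PrecheckConfig): string[] {
  const problems: string[] = [];
  if (cfg.command.trim() === "") problems.push("command must be non-empty");
  if (cfg.timeoutMs !== undefined && !(cfg.timeoutMs > 0)) {
    problems.push("timeoutMs must be a positive number");
  }
  for (const code of cfg.skipExitCodes ?? []) {
    if (Math.floor(code) !== code || code < 1 || code > 255) {
      problems.push(`skipExitCodes entry ${code} must be an integer in 1..255`);
    }
  }
  return problems;
}
```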

🤖 Generated with Claude Code

nyem69 and others added 2 commits June 7, 2026 03:49
The catch-up sweep dedups against the on-disk run-log, but runner.ts only
writes that log when the (often multi-minute) LLM session COMPLETES. So when
the ~5-min catch-up sweep ran while an on-time fire was still executing, it saw
a stale run-log and replayed the slot — every cron whose runtime outlasts the
gap to the next sweep double-fired (collectors, all sitreps, etc.; NS produced
a genuine duplicate draft row on 2026-06-07).

Track each job's start time in-memory, set synchronously before the await, and
have catch-up dedup consult max(disk-last-run, in-memory-last-start). In-memory
is deliberate: lost on restart, which is correct — a fresh process must fall
back to the on-disk log so genuinely slept-through/crashed fires still replay.

- runner.ts: lastStartedAt map + lastStartedAtMs() getter, set at invocation
- catchup.ts: pure mostRecentRun(disk, started) helper
- scheduler.ts: catch-up lastRunAt merges disk log with in-memory start
- +5 unit tests (34/34 cron tests pass, tsc clean)
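
A minimal sketch of the merge described above, assuming timestamps are epoch milliseconds and that catch-up replays a slot only when the most recent run predates it. `mostRecentRun` is named in this commit, but its signature here is a guess, and `shouldReplay` is a hypothetical helper.

```typescript
// Merge the on-disk last-run time with the in-memory last-start time.
// Either may be missing: no log entry yet, or a freshly restarted process.
function mostRecentRun(
  diskLastRunMs: number | undefined,
  startedAtMs: number | undefined,
): number | undefined {
  if (diskLastRunMs === undefined) return startedAtMs;
  if (startedAtMs === undefined) return diskLastRunMs;
  return Math.max(diskLastRunMs, startedAtMs);
}

// Assumed dedup rule: replay a slot only if nothing ran at or after it.
function shouldReplay(slotMs: number, lastRunMs: number | undefined): boolean {
  return lastRunMs === undefined || lastRunMs < slotMs;
}

// An on-time fire started at t=250 for the slot at t=200 and is still running,
// so the disk log still shows the previous run at t=100.
const duringRun = shouldReplay(200, mostRecentRun(100, 250));        // false: no double-fire
// After a restart the in-memory start is gone, so the disk log decides.
const afterRestart = shouldReplay(200, mostRecentRun(100, undefined)); // true: slot replays
```

The second case is why the start time lives only in memory: a crashed or slept-through fire leaves no completed entry on disk, so the fresh process replays it.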

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…inds no work

Adds optional CronJob.precheck: a cheap shell command the scheduler runs BEFORE
creating the (expensive) LLM session, skipping the session entirely when there's
nothing to do. Targets the dominant cost pattern in the fleet: high-frequency
watcher/queue jobs that boot a full session every fire just to run a deterministic
prefilter that usually finds no work (jinn-group-watcher alone ~$210/mo, ~25% of
spend; the top 4 watchers ~$520/mo / 41%).

Contract (CronJob.precheck):
  exit 0                  -> proceed (spawn session as today)
  exit in skipExitCodes   -> gated-skip (no session, no alert, runlog only)
  any other non-zero      -> precheck_error (no session, ops alert)
  timeout                 -> precheck_error
skipExitCodes is required to get skip behaviour; with none set, every non-zero is
an error. This is deliberate: it stops a real dependency outage (e.g. the
jinn-watcher prefilter's exit 21 = wacli down) from being silently skipped.

- shared/types.ts: CronJob.precheck { command, timeoutMs?, skipExitCodes? }, optional/back-compat.
- cron/precheck.ts: runPrecheck() — /bin/bash -c, cwd JINN_HOME, timeout, never rejects.
- cron/runner.ts: gate runs before session creation; gated-skip / precheck_error runlog
  statuses; ops-alert ONLY on error (never on a normal no-work skip).
- tests: precheck unit (proceed/skip/2nd-skip/error/no-skip-codes/timeout/invalid/stdout)
  + runner branching (proceed spawns, skip/error/timeout don't, back-compat). 790 pass, tsc clean.

First adopter jinn-group-watcher is migrated separately (jinn-data repo, after this
deploys) so the prefilter moves from in-session Step 1 to precheck; observe skip
rate + cost delta before applying to plan-url-queue-processor / sitrep-regen-watch /
social-alert.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@nyem69 nyem69 merged commit d84b616 into main Jun 7, 2026
3 checks passed
@nyem69 nyem69 deleted the feat/cron-precheck-gate branch June 7, 2026 12:55