feat(cron): precheck gate - skip the LLM session when a shell check finds no work #18
Merged
The catch-up sweep dedups against the on-disk run-log, but runner.ts only writes that log when the (often multi-minute) LLM session COMPLETES. So when the ~5-min catch-up sweep ran while an on-time fire was still executing, it saw a stale run-log and replayed the slot: every cron whose runtime outlasts the gap to the next sweep double-fired (collectors, all sitreps, etc.; NS produced a genuine duplicate draft row on 2026-06-07).
Track each job's start time in memory, set synchronously before the await, and have catch-up dedup consult max(disk-last-run, in-memory-last-start). In-memory is deliberate: it is lost on restart, which is correct, because a fresh process must fall back to the on-disk log so genuinely slept-through/crashed fires still replay.
- runner.ts: lastStartedAt map + lastStartedAtMs() getter, set at invocation
- catchup.ts: pure mostRecentRun(disk, started) helper
- scheduler.ts: catch-up lastRunAt merges disk log with in-memory start
- +5 unit tests (34/34 cron tests pass, tsc clean)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
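A minimal TypeScript sketch of this dedup, assuming epoch-millisecond timestamps and `undefined` for "no record". The lastStartedAt map, lastStartedAtMs() and mostRecentRun(disk, started) are named above, but the signatures here, plus the markStarted() and shouldReplay() helpers, are hypothetical illustrations rather than the code in runner.ts, catchup.ts or scheduler.ts.

```typescript
// Sketch only: timestamps are epoch ms; `undefined` means "no record".

// In-memory start times keyed by job id. Deliberately not persisted, so a
// fresh process falls back to the on-disk run-log.
const lastStartedAt = new Map<string, number>();

// Hypothetical: call synchronously at invocation, before the first await,
// so a sweep that runs mid-session already sees the start.
function markStarted(jobId: string, nowMs: number = Date.now()): void {
  lastStartedAt.set(jobId, nowMs);
}

function lastStartedAtMs(jobId: string): number | undefined {
  return lastStartedAt.get(jobId);
}

// Pure helper: max(disk-last-run, in-memory-last-start), tolerating gaps.
function mostRecentRun(
  disk: number | undefined,
  started: number | undefined,
): number | undefined {
  if (disk === undefined) return started;
  if (started === undefined) return disk;
  return Math.max(disk, started);
}

// Hypothetical catch-up check: replay a slot only if neither the run-log
// nor an in-memory start shows activity at or after the slot time.
function shouldReplay(
  jobId: string,
  slotMs: number,
  diskLastRunMs: number | undefined,
): boolean {
  const last = mostRecentRun(diskLastRunMs, lastStartedAtMs(jobId));
  return last === undefined || last < slotMs;
}
```

For example, if an hourly job started at its 10:00 slot and is still running, the run-log still shows 09:00, but the in-memory start makes shouldReplay() return false. After a restart the map is empty, so only the run-log decides and a genuinely missed slot replays.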
Adds optional CronJob.precheck: a cheap shell command the scheduler runs BEFORE
creating the (expensive) LLM session, skipping the session entirely when there's
nothing to do. Targets the dominant cost pattern in the fleet: high-frequency
watcher/queue jobs that boot a full session every fire just to run a deterministic
prefilter that usually finds no work (jinn-group-watcher alone ~$210/mo, ~25% of
spend; the top 4 watchers ~$520/mo / 41%).
Contract (CronJob.precheck):
exit 0 -> proceed (spawn session as today)
exit in skipExitCodes -> gated-skip (no session, no alert, runlog only)
any other non-zero -> precheck_error (no session, ops alert)
timeout -> precheck_error
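The contract table reduces to a small pure function. This is a sketch assuming a hypothetical PrecheckResult shape (the real runPrecheck() result type is not shown here); treating "the command could not be launched at all" as an error is also an assumption beyond the four listed cases.

```typescript
// Hypothetical result shape for a finished precheck.
type PrecheckResult =
  | { kind: "exited"; exitCode: number }
  | { kind: "timeout" }
  | { kind: "spawn_error"; message: string };

type GateDecision = "proceed" | "gated-skip" | "precheck_error";

function decideGate(
  result: PrecheckResult,
  skipExitCodes: readonly number[] = [],
): GateDecision {
  // Timeouts (and, by assumption, launch failures) never count as "no work".
  if (result.kind !== "exited") return "precheck_error";
  if (result.exitCode === 0) return "proceed";
  // Only explicitly listed codes are a normal skip; with no list,
  // every non-zero exit is an error.
  return skipExitCodes.includes(result.exitCode) ? "gated-skip" : "precheck_error";
}
```

Keeping the decision pure means the runner only has to act on three outcomes (spawn the session, write a skip runlog entry, or write an error entry and alert), and the mapping can be unit-tested without spawning processes.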
skipExitCodes is required to get skip behaviour; with none set, every non-zero is
an error. This is deliberate: it stops a real dependency outage (e.g. the
jinn-watcher prefilter's exit 21 = wacli down) from being silently skipped.
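As an illustration of opting in, this is how the jinn-group-watcher gate might be declared. It is a sketch: the interface mirrors the CronJob.precheck shape listed below, while the script path and passing timeoutMs explicitly are assumptions; the real job definition lives in the jinn-data repo and may use a different format.

```typescript
// Mirrors CronJob.precheck { command, timeoutMs?, skipExitCodes? }.
interface CronPrecheck {
  command: string; // run via /bin/bash -c with cwd = JINN_HOME
  timeoutMs?: number; // optional; the default is 60s
  skipExitCodes?: number[];
}

// Hypothetical declaration for the first adopter.
const jinnGroupWatcherPrecheck: CronPrecheck = {
  command: "./jinn-watcher-prefilter.sh", // hypothetical path relative to JINN_HOME
  timeoutMs: 60_000,
  // 10 = no-op, 20 = lock contention: normal "nothing to do" outcomes.
  // 21-24 (wacli/JSON/state/watermark failures) are deliberately absent, so
  // exit 21 (wacli down) surfaces as precheck_error and alerts ops.
  skipExitCodes: [10, 20],
};
```

Leaving 21-24 out of the list is the point of the explicit allowlist: a dependency outage stays loud instead of looking like an idle hour.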
- shared/types.ts: CronJob.precheck { command, timeoutMs?, skipExitCodes? }, optional/back-compat.
- cron/precheck.ts: runPrecheck() — /bin/bash -c, cwd JINN_HOME, timeout, never rejects.
- cron/runner.ts: gate runs before session creation; gated-skip / precheck_error runlog
statuses; ops-alert ONLY on error (never on a normal no-work skip).
- tests: precheck unit (proceed/skip/2nd-skip/error/no-skip-codes/timeout/invalid/stdout)
+ runner branching (proceed spawns, skip/error/timeout don't, back-compat). 790 pass, tsc clean.
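A minimal sketch of what runPrecheck() could look like on Node, assuming the `child_process.spawn` API and a hypothetical PrecheckOutcome result type; the real cron/precheck.ts may differ. It follows the properties stated above: /bin/bash -c, a caller-supplied cwd (JINN_HOME in production), a timeout defaulting to 60s, captured stdout/stderr, and a promise that always resolves rather than rejects.

```typescript
import { spawn, ChildProcessWithoutNullStreams } from "node:child_process";

// Hypothetical outcome type; the gate maps "exited" through skipExitCodes
// and treats every other kind as precheck_error.
type PrecheckOutcome =
  | { kind: "exited"; exitCode: number; stdout: string; stderr: string }
  | { kind: "timeout"; stdout: string; stderr: string }
  | { kind: "spawn_error"; message: string };

function runPrecheck(
  command: string,
  cwd: string,
  timeoutMs = 60_000,
): Promise<PrecheckOutcome> {
  return new Promise((resolve) => {
    let stdout = "";
    let stderr = "";
    let settled = false;

    let child: ChildProcessWithoutNullStreams;
    try {
      child = spawn("/bin/bash", ["-c", command], { cwd });
    } catch (err) {
      // Synchronous spawn failures (e.g. invalid options) also resolve.
      resolve({ kind: "spawn_error", message: String(err) });
      return;
    }
    child.stdout.setEncoding("utf8");
    child.stderr.setEncoding("utf8");
    child.stdout.on("data", (chunk: string) => { stdout += chunk; });
    child.stderr.on("data", (chunk: string) => { stderr += chunk; });

    // Resolve exactly once, whichever event arrives first.
    const settle = (outcome: PrecheckOutcome): void => {
      if (settled) return;
      settled = true;
      clearTimeout(timer);
      resolve(outcome);
    };

    // Kills only the bash process; a production version may need to kill
    // the whole process group so grandchildren do not linger.
    const timer = setTimeout(() => {
      child.kill("SIGKILL");
      settle({ kind: "timeout", stdout, stderr });
    }, timeoutMs);

    // e.g. /bin/bash missing or cwd does not exist: resolve, never reject.
    child.on("error", (err) => settle({ kind: "spawn_error", message: err.message }));

    child.on("close", (code, signal) => {
      settle(
        code === null
          ? { kind: "spawn_error", message: `terminated by signal ${signal}` }
          : { kind: "exited", exitCode: code, stdout, stderr },
      );
    });
  });
}
```

A caller would invoke it as, for example, `runPrecheck(job.precheck.command, process.env.JINN_HOME ?? process.cwd())` (assuming JINN_HOME is exposed as an environment variable) and pass only an "exited" outcome's exit code through skipExitCodes.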
First adopter jinn-group-watcher is migrated separately (jinn-data repo, after this
deploys) so the prefilter moves from in-session Step 1 to precheck; observe skip
rate + cost delta before applying to plan-url-queue-processor / sitrep-regen-watch /
social-alert.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
What
Adds optional CronJob.precheck: a cheap shell command the scheduler runs before creating the (expensive) LLM session, skipping the session entirely when there's nothing to do.
Why
The fleet's dominant cost pattern is high-frequency watcher/queue jobs that boot a full LLM session every fire just to run a deterministic prefilter that usually finds no work.
For example, jinn-group-watcher already has jinn-watcher-prefilter.sh, but the session boots, reads context, runs the prefilter, finds nothing, and exits, every hour. This moves the gate to the scheduler so no-op fires never spawn Claude.
Contract
skipExitCodes is required for any skip: with none set, every non-zero exit is an error. This is deliberate, so that a real outage is never silently skipped. The first adopter's prefilter shows why a blanket "non-zero = skip" rule is wrong: 0 = URLs to process, 10 = no-op, 20 = lock contention, 21–24 = wacli/JSON/state/watermark failures. So it sets skipExitCodes: [10, 20], and 21–24 surface as precheck_error.
Implementation
- CronJob.precheck { command, timeoutMs?, skipExitCodes? }, optional/back-compat.
- runPrecheck(): /bin/bash -c in JINN_HOME, timeout (default 60s), captured output; never rejects.
- Runner: gated-skip / precheck_error runlog statuses; ops-alert only on error, never on a no-work skip. Gated-skip still writes a runlog entry + lastStartedAt, so catch-up dedup treats the slot as serviced (no double-fire).
Tests (790 pass, tsc clean)
Rollout (staged)
Migrate only jinn-group-watcher first (jinn-data repo, after this deploys): the prefilter moves from in-session Step 1 to precheck, and the prompt collapses to the process-URLs branch. Observe the live skip rate + cost delta, then apply to plan-url-queue-processor / sitrep-regen-watch / social-alert.
🤖 Generated with Claude Code