Add system cron job observability debug endpoint (AXON-692) #12

Merged
Binlogo merged 1 commit into Pyiner:main from Binlogo:feat/schedule_followup_observability_3decc4
Jun 1, 2026

Binlogo (Collaborator) commented May 29, 2026

Summary

Adds GET /api/debug/system-cron-jobs — debug observability for system-managed
cron jobs (AXON-692). schedule_followup jobs are created with system: true,
which keeps them out of the user-facing GET /api/cron/jobs. During incidents
("the agent promised a followup and it never came back") there was no way to
inspect them. This endpoint lists the system cron jobs with each job's recent
RunRecord history, and adds a system-only manual fire.

Scope

  • GET /api/debug/system-cron-jobs — lists system == true jobs (reuses
    CronService::list_all, which list() filters), each with recent
    RunRecords (reuses list_runs_for_job). Read-only; never repairs state.
    • thread_id query — exact match on the job's thread (empty/blank ignored).
    • since query — unix-seconds or RFC3339 lower bound on created_at; an
      unparseable value returns 400 invalid_since (never a silent full list).
    • runs_limit query — recent runs per job (default 20).
  • POST /api/debug/system-cron-jobs/{id}/run — system-only wrapper around
    CronService::run_now. Missing or non-system job → 404; disabled /
    already-running → 409. The debug channel never fires user automations.
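
The query semantics listed above can be sketched in isolation. This is a minimal illustration, not the code from this PR: `Job`, `QueryError`, `parse_since`, `list_system_jobs`, and `effective_runs_limit` are hypothetical names, only the unix-seconds form of `since` is handled (the RFC3339 form is omitted to keep the sketch dependency-free), and the inclusive lower bound and the rejection of a blank `since` are assumptions.

```rust
/// Hypothetical, simplified stand-in for a cron job record.
#[derive(Debug, Clone, PartialEq)]
struct Job {
    id: String,
    system: bool,
    thread_id: String,
    created_at: i64, // unix seconds
}

/// Maps to the 400 invalid_since response described above.
#[derive(Debug, PartialEq)]
enum QueryError {
    InvalidSince,
}

/// Default number of recent runs returned per job.
const DEFAULT_RUNS_LIMIT: usize = 20;

fn effective_runs_limit(raw: Option<usize>) -> usize {
    raw.unwrap_or(DEFAULT_RUNS_LIMIT)
}

/// Parse the `since` lower bound. An absent value means "no bound";
/// any present value that fails to parse is an error, so a bad filter
/// never degrades into a silent full listing.
/// Assumption: only unix seconds are parsed in this sketch.
fn parse_since(raw: Option<&str>) -> Result<Option<i64>, QueryError> {
    match raw {
        None => Ok(None),
        Some(s) => s
            .trim()
            .parse::<i64>()
            .map(Some)
            .map_err(|_| QueryError::InvalidSince),
    }
}

/// List system jobs only, applying the optional filters.
fn list_system_jobs(
    all: &[Job],
    thread_id: Option<&str>,
    since: Option<&str>,
) -> Result<Vec<Job>, QueryError> {
    let since = parse_since(since)?;
    // An empty or blank thread_id is ignored rather than matched.
    let thread = thread_id.filter(|t| !t.trim().is_empty());
    Ok(all
        .iter()
        .filter(|j| j.system)
        .filter(|j| thread.map_or(true, |t| j.thread_id == t))
        // Assumption: the lower bound is inclusive.
        .filter(|j| since.map_or(true, |s| j.created_at >= s))
        .cloned()
        .collect())
}
```

Validating `since` before any filtering runs means a malformed request is rejected without touching the job list.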

Auth

Routes are registered under the protected router, so enforce_gateway_auth
gates them: loopback passes, everything else needs a valid gateway token. Reuses
the existing gateway token rather than introducing a separate debug-token
config surface. Never exposed unauthenticated to non-loopback callers.
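
The gating rule described above (loopback passes, everything else needs a valid gateway token) can be illustrated with a small standalone sketch. The real `enforce_gateway_auth` signature and internals are not shown in this PR, so `gate_debug_request` and `AuthDecision` are hypothetical names; the plain string comparison is a simplification, and a real gate would likely want a constant-time comparison.

```rust
use std::net::IpAddr;

#[derive(Debug, PartialEq)]
enum AuthDecision {
    Allow,
    Unauthorized,
}

/// Hypothetical gate: loopback callers pass, all other callers must
/// present a token equal to the configured gateway token.
fn gate_debug_request(
    peer: IpAddr,
    presented_token: Option<&str>,
    gateway_token: &str,
) -> AuthDecision {
    if peer.is_loopback() {
        return AuthDecision::Allow;
    }
    match presented_token {
        // Assumption: an empty configured token never authorizes anyone.
        Some(t) if !gateway_token.is_empty() && t == gateway_token => AuthDecision::Allow,
        _ => AuthDecision::Unauthorized,
    }
}
```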

Invariants honored

  • GET /api/cron/jobs and GET /api/cron/runs behavior unchanged (additive).
  • Read route does not repair/mutate cron state (per repository-contracts).
  • run_now wrapper strictly system-only.
  • No new dependencies; no CronService signature changes.
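
The system-only rule for the manual fire can be sketched as a guard evaluated before `run_now` is called. This is an illustration under stated assumptions rather than the PR's code: `JobState` and `guard_manual_run` are hypothetical names, and checking `system` before the disabled/running state (so a non-system job always yields 404) is an assumed ordering.

```rust
/// Hypothetical minimal view of a job's state for the guard.
struct JobState {
    system: bool,
    enabled: bool,
    running: bool,
}

/// Returns Ok(()) when the job may be fired, or the HTTP status code
/// to reject with: 404 for missing or non-system jobs, 409 for jobs
/// that are disabled or already running.
fn guard_manual_run(job: Option<&JobState>) -> Result<(), u16> {
    match job {
        None => Err(404),
        // Non-system jobs look "missing" to the debug channel, so it
        // can never fire a user automation.
        Some(j) if !j.system => Err(404),
        Some(j) if !j.enabled || j.running => Err(409),
        Some(_) => Ok(()),
    }
}
```

Returning 404 for non-system jobs, rather than a distinct status, also keeps the debug route from revealing which user automation ids exist.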

Tests

cargo test -p garyx-gateway --lib — 513 pass (7 new: no-service, system-only
listing, thread filter, invalid-since 400, since-unix filter, run 404 missing,
run 404 non-system). cargo clippy clean on the changed files.

Docs

New docs/schedule-followup-observability.md, cross-linked from
docs/schedule-followup.md.

Refs: AXON-692 (parent AXON-659; depends on AXON-687, merged in #10).

Local verify

Internal-tier local verification (garyx has no SCM/BOE pipeline; cargo-validated, delivered via GitHub PR).

  • cargo test -p garyx-gateway --lib: 513 passed; 0 failed (7 new debug-endpoint tests: no-service, system-only listing, thread_id filter, invalid-since → 400, since unix filter, run 404 missing, run 404 non-system).
  • cargo clippy -p garyx-gateway --lib --tests: clean on the changed files (api.rs / route_graph.rs / api/tests.rs). The repo-wide -D warnings run surfaces pre-existing lints in garyx-models / claude-agent-sdk (unchanged source, newly flagged by clippy 1.95); these are out of scope for this PR.

Reviewed via ag-dev internal single-role CR: risk MEDIUM (new additive HTTP surface), 0 blockers / 0 warnings.

Retrospective

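  • Total time: ~14m (P1 → P6) | CR rounds: 0 | Self-heals: 0 | Autonomous decisions: 5 | Deferred: 1
  • Clean first pass: implemented an additive endpoint in a single crate with 0 blockers/warnings. Reuses the existing cron_jobs/cron_runs pattern plus enforce_gateway_auth; no new abstractions.
  • One point of friction: the repo-wide cargo clippy -- -D warnings run is blocked by pre-existing lints in garyx-models/claude-agent-sdk (new clippy 1.95 rules doc_lazy_continuation / collapsible_if / sort_by_key), which is unrelated to this PR. Suggest a separate tech-debt cleanup (follow-up).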

GET /api/debug/system-cron-jobs lists schedule_followup-created system
cron jobs (filtered out of the user-facing /api/cron/jobs) with each
job's recent RunRecord history. Supports thread_id and since filters and
a configurable runs_limit. POST .../{id}/run is a system-only wrapper
around CronService::run_now (non-system or missing jobs return 404).

Routes live under the protected router so enforce_gateway_auth gates
them: loopback passes, otherwise a valid gateway token is required. No
separate debug-token surface is introduced. Existing /api/cron/jobs and
/api/cron/runs behavior is unchanged (additive, read-only).

Docs: new docs/schedule-followup-observability.md, cross-linked from
schedule-followup.md.
@Binlogo Binlogo merged commit fe19dde into Pyiner:main Jun 1, 2026
1 check passed