Milestone 7: long-running outbox worker process #12

Merged
rodrigobnogueira merged 1 commit into main from milestone/07-outbox-worker
May 25, 2026

@rodrigobnogueira
Contributor

Summary

  • Brief §12 milestone 7: wraps milestone-6's OutboxClaimer in a polling loop with graceful shutdown so the post-commit side-effect path runs as a separate process from the API.
  • Manual smoke + 3 new integration tests; 45/45 total tests pass; npm run ci exits 0 locally.

Changes

  • scripts/start-worker.ts:
    • Boots a headless Nest application context, resolves OutboxClaimer from DI.
    • Polls on OUTBOX_POLL_MS (default 2_000ms); each iteration ticks once then waits.
    • Graceful shutdown: SIGTERM/SIGINT abort an AbortController, which interrupts the in-flight await delay(..., { signal }). The worker then lets the current tick drain, closes the Nest container, and exits cleanly (verified manually: signal → "received SIGTERM, draining current tick…" → "outbox worker stopped cleanly" → exit 0).
    • Per-tick errors are logged and the loop continues. Per-event retry/backoff/failure is the claimer's responsibility (from milestone 6).
    • The loop body is exported as runWorkerLoop(claimer, { pollIntervalMs, claimer, signal }) so tests can drive it with an AbortController without spawning a child process.
  • src/config/env.ts: typed outbox block — pollIntervalMs, batchSize, stuckTimeoutMs, workerInstanceId — read from OUTBOX_POLL_MS, OUTBOX_BATCH_SIZE, OUTBOX_STUCK_TIMEOUT_MS, OUTBOX_WORKER_ID. Defaults match brief §8.
  • npm run start:worker script entry point.
  • test/integration/outbox-worker.spec.ts — 3 tests:
    • drain: a freshly-invited user's user.invited outbox row flips to completed, claimed by the configured worker id; FakeEmailTransport records exactly one matching email.
    • prompt abort: with a 30s poll interval, AbortController.abort() mid-wait returns the loop within 5 seconds.
    • tick failure resilience: a synthetic throw from the first tick() is logged; subsequent iterations run normally.
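
The loop behaviour listed above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `Ticker` interface, the `onError` option, and the returned tick count are assumptions made for the example, and the real `OutboxClaimer` and `runWorkerLoop` signatures may differ.

```typescript
import { setTimeout as delay } from "node:timers/promises";

// Hypothetical stand-in for the claimer: anything with an async tick().
interface Ticker {
  tick(): Promise<unknown>;
}

// Hypothetical option shape; onError is an assumption for this sketch.
interface WorkerLoopOptions {
  pollIntervalMs: number;
  signal: AbortSignal;
  onError?: (err: unknown) => void;
}

// Tick once, then wait; stop as soon as the signal is aborted.
// Returns the number of completed iterations (illustrative only).
export async function runWorkerLoop(
  claimer: Ticker,
  { pollIntervalMs, signal, onError = console.error }: WorkerLoopOptions,
): Promise<number> {
  let ticks = 0;
  while (!signal.aborted) {
    try {
      await claimer.tick();
    } catch (err) {
      // A failed tick is reported and the loop keeps going.
      onError(err);
    }
    ticks += 1;
    try {
      // Rejects with an AbortError if the signal fires during the wait.
      await delay(pollIntervalMs, undefined, { signal });
    } catch (err) {
      if (signal.aborted) break;
      throw err;
    }
  }
  return ticks;
}
```

Because the wait takes the `AbortSignal`, an abort during a long poll interval ends the loop without waiting for the interval to elapse, and an in-progress `tick()` is always awaited before the loop returns.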

Modules Touched

  • organizations / users / memberships
  • projects
  • audit-log
  • outbox (worker process wrapper around the existing claimer)
  • auth / context
  • trpc
  • database
  • Tooling / CI (npm run start:worker, OUTBOX_* env)

Public Surface (libraries)

  • No use of library internals introduced.
  • Worker uses only NestFactory.createApplicationContext + DI resolution — no nest-trpc-native / nest-drizzle-native surface is touched here.

Security Review

  • Auth bypass risk — the worker is a server-side process, not exposed via any HTTP/tRPC surface. It reads from the DB, invokes registered handlers, and writes status updates. No new auth surface.
  • Input validation — env values (OUTBOX_POLL_MS, OUTBOX_BATCH_SIZE, OUTBOX_STUCK_TIMEOUT_MS) are parsed via Number.parseInt with rejection for NaN/non-positive values (Invalid <NAME> thrown by loadEnv()).
  • Injection / path traversal / unsafe dynamic execution / unsafe deserialization — n/a. Loop is a polling controller; the actual DB queries are Drizzle-parameterized from milestone 6.
  • Secret leakage — worker logs only structural metadata (db=<path>, poll=<ms>, batch, stuck, claim counts). No payloads, tokens, or PII.
  • Signal handling — registered handlers are removed in finally; double-signal is idempotent (if (shuttingDown) return). No risk of partial shutdown loop.
  • Exactly-once delivery — already covered by the claimer's atomic claim + status state machine and the worker crash recovery test from milestone 6. The worker-loop tests in this PR additionally exercise the live polling path.
  • No unresolved high-risk security finding remains.
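
For illustration, the signal-handling properties described above (idempotent double-signal, handlers removed afterwards) could look something like the sketch below. `installShutdownHandlers` and its `log` parameter are hypothetical names chosen for this example; only the `shuttingDown` guard and the log wording are taken from the description.

```typescript
// Hypothetical helper: wires SIGTERM/SIGINT to an AbortController and
// returns a function that unregisters the handlers again.
export function installShutdownHandlers(
  controller: AbortController,
  log: (msg: string) => void = console.log,
): () => void {
  let shuttingDown = false;

  const onSignal = (signal: NodeJS.Signals): void => {
    if (shuttingDown) return; // a second signal is a no-op
    shuttingDown = true;
    log(`received ${signal}, draining current tick…`);
    controller.abort();
  };

  process.on("SIGTERM", onSignal);
  process.on("SIGINT", onSignal);

  return () => {
    process.off("SIGTERM", onSignal);
    process.off("SIGINT", onSignal);
  };
}

// Intended usage (sketch):
//   const controller = new AbortController();
//   const removeHandlers = installShutdownHandlers(controller);
//   try { /* run the loop with controller.signal */ }
//   finally { removeHandlers(); }
```

Returning the cleanup function keeps registration and removal paired, so a `finally` block can restore the process to its previous listener state whether the loop ends normally or throws.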

Dependency Review

  • No dependency or lockfile changes — runWorkerLoop uses node:timers/promises' setTimeout and AbortSignal, both Node built-ins.

Migrations

  • No schema changes.

Validation

  • npm run typecheck
  • npm run lint
  • npm run complexity:check
  • npm run test:cov — 45/45 tests; 91.43% statements / 95.55% functions
  • npm run security:audit — exits 0 (4 moderate dev-only findings unchanged)
  • npm run build
  • npm run smoke
  • Manual smoke: OUTBOX_POLL_MS=300 OUTBOX_WORKER_ID=manual-smoke npm run start:worker → boots, ticks, sends SIGTERM via timeout, drains, logs "stopped cleanly".

Release Notes

  • Release impact: CHANGELOG.md updated under [Unreleased] with milestone-7 entry.

Wraps the OutboxClaimer (milestone 6) in a polling loop with graceful
shutdown so the post-commit side-effect path can run as a separate
process from the API.

scripts/start-worker.ts boots a headless Nest application context,
resolves OutboxClaimer from DI, and ticks on a configurable interval
(default 2s). The loop is signal-cancellable: SIGTERM/SIGINT abort an
AbortController that interrupts the in-flight wait, lets the current
tick drain, closes the Nest container, and exits cleanly. Per-tick
errors are logged and the loop continues — per-event retry/backoff/
failure is already handled by the claimer.

The loop body is exported as runWorkerLoop(claimer, { pollIntervalMs,
claimer, signal }) so tests drive it with an AbortController without
spawning a child process.

Worker config flows through src/config/env.ts as a typed `outbox`
block: OUTBOX_POLL_MS (default 2_000), OUTBOX_BATCH_SIZE (32),
OUTBOX_STUCK_TIMEOUT_MS (60_000), OUTBOX_WORKER_ID (host-pid by
default in the claimer). All match brief §8.
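
As a rough sketch of how such a typed block might be parsed: the variable names, defaults, and the "Invalid <NAME>" error come from this PR, while `parsePositiveInt`, `loadOutboxConfig`, and the treatment of an empty string as "unset" are assumptions made for the example rather than the actual contents of src/config/env.ts.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical helper: parse a positive integer or fall back to a default.
function parsePositiveInt(
  name: string,
  raw: string | undefined,
  fallback: number,
): number {
  // Assumption: unset and empty values both mean "use the default".
  if (raw === undefined || raw === "") return fallback;
  const value = Number.parseInt(raw, 10);
  if (Number.isNaN(value) || value <= 0) {
    throw new Error(`Invalid ${name}`);
  }
  return value;
}

export interface OutboxConfig {
  pollIntervalMs: number;
  batchSize: number;
  stuckTimeoutMs: number;
  // Left undefined here; the text says the claimer derives a default.
  workerInstanceId: string | undefined;
}

// Hypothetical loader for the outbox block only.
export function loadOutboxConfig(env: Env): OutboxConfig {
  return {
    pollIntervalMs: parsePositiveInt("OUTBOX_POLL_MS", env.OUTBOX_POLL_MS, 2_000),
    batchSize: parsePositiveInt("OUTBOX_BATCH_SIZE", env.OUTBOX_BATCH_SIZE, 32),
    stuckTimeoutMs: parsePositiveInt(
      "OUTBOX_STUCK_TIMEOUT_MS",
      env.OUTBOX_STUCK_TIMEOUT_MS,
      60_000,
    ),
    workerInstanceId: env.OUTBOX_WORKER_ID,
  };
}
```

One caveat with this approach: `Number.parseInt` accepts a numeric prefix, so a value such as `300ms` parses as 300 rather than being rejected. A stricter check would need an extra pattern test on the raw string.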

Tests cover three properties:
  - drain: pending event flips to completed via the loop, FakeEmailTransport
    records exactly one matching email
  - prompt abort: with a 30s poll interval, AbortController.abort()
    mid-wait returns within 5 seconds
  - tick failure resilience: a synthetic throw on the first tick is
    logged; subsequent ticks run normally
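
The prompt-abort property can be checked without the real worker by timing a single abortable wait. The sketch below is only an illustration of that idea: `waitOrAbort` and `measureAbortLatency` are hypothetical helpers standing in for one wait of the loop, not functions from the test file.

```typescript
import { setTimeout as delay } from "node:timers/promises";

type WaitOutcome = "elapsed" | "aborted";

// Stand-in for one wait of the worker loop: reports whether the full
// interval elapsed or the signal cut the wait short.
async function waitOrAbort(ms: number, signal: AbortSignal): Promise<WaitOutcome> {
  try {
    await delay(ms, undefined, { signal });
    return "elapsed";
  } catch (err) {
    if (signal.aborted) return "aborted";
    throw err;
  }
}

// Starts a wait of pollIntervalMs, aborts it after abortAfterMs, and
// reports how long the wait actually took.
export async function measureAbortLatency(
  pollIntervalMs: number,
  abortAfterMs: number,
): Promise<{ outcome: WaitOutcome; elapsedMs: number }> {
  const controller = new AbortController();
  const startedAt = Date.now();
  const abortTimer = setTimeout(() => controller.abort(), abortAfterMs);
  try {
    const outcome = await waitOrAbort(pollIntervalMs, controller.signal);
    return { outcome, elapsedMs: Date.now() - startedAt };
  } finally {
    clearTimeout(abortTimer);
  }
}
```

Calling `measureAbortLatency(30_000, 50)` should report an `"aborted"` outcome with an elapsed time far below the 30-second interval, which is the shape of assertion the prompt-abort test describes.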

Manual smoke confirmed via `npm run start:worker`: boots, ticks,
SIGTERM → drain → "stopped cleanly" → exit 0.

Local npm run ci passes: 45/45 tests, 91.43% statements, 95.55% functions.
@rodrigobnogueira merged commit 2810573 into main May 25, 2026
2 checks passed
rodrigobnogueira added a commit that referenced this pull request May 25, 2026
Milestone 7: long-running outbox worker process
@rodrigobnogueira deleted the milestone/07-outbox-worker branch May 25, 2026 11:49