feat(orchestration): #307 D6 — slow-start batching primitive for fan-out spawns#479

Open
windoliver wants to merge 13 commits into main from feat/307-slow-start-batching

Conversation

@windoliver

Owner

Closes #307.

Summary

  • Standalone, pure slowStartBatch primitive (k8s replica_set.go slowStartBatch style): geometric batches 1 → 2 → 4 → 8, halt the moment a batch sees a failure (later batches never dispatch).
  • Failure taxonomy from the issue follow-up: task/admission → halted (report, wait for user; admission surfaces an actionable reason); backpressure (RuntimeUnavailableError) → throttled + bounded retryAfterMs for the controller to requeue. Terminal wins on mixed batches. Classifier is injectable (defaultFailureClassifier recognizes the existing AdmissionRejectError).
  • BatchStrategy config — the canonical home for spec.batchStrategy — with normalizeBatchStrategy validation and a deterministic, configurable-multiplier computeSlowStartBackoffMs (distinct from the jittered claim-retry backoff.ts:computeBackoffMs).
  • grove_spawn_batch_size metric surfaced via an onSpawnBatch observer callback (mirrors TaskController's onTransition/onError; #337 wires a real exporter later).
  • All in one cohesive module src/core/slow-start-batch.ts, re-exported from the core barrel.
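
For reviewers who want the control flow in one place, here is a minimal sketch of the geometric batching described above. It is not the code in src/core/slow-start-batch.ts: the name slowStartSketch, the SpawnFn and SlowStartResult shapes, and the onBatch parameter are hypothetical, and inputs are assumed to be already validated (initialBatchSize >= 1, multiplier >= 1).

```typescript
// Hypothetical shapes for illustration; the real module's types may differ.
type SpawnFn = (index: number) => Promise<void>;

interface SlowStartResult {
  attempted: number; // spawns dispatched, including the ones that failed
  succeeded: number; // spawns that resolved
  errors: unknown[]; // rejections from the batch that halted the run
}

async function slowStartSketch(
  count: number,
  spawn: SpawnFn,
  onBatch?: (batchSize: number) => void, // stand-in for an observer callback
  initialBatchSize = 1,
  multiplier = 2,
): Promise<SlowStartResult> {
  let attempted = 0;
  let succeeded = 0;
  let batchSize = Math.min(initialBatchSize, count);

  while (attempted < count) {
    const start = attempted;
    if (onBatch) onBatch(batchSize);

    // Dispatch the whole batch concurrently and capture rejections
    // instead of letting the first one throw.
    const outcomes = await Promise.all(
      Array.from({ length: batchSize }, (_, i) =>
        spawn(start + i).then(
          () => ({ ok: true as const }),
          (error: unknown) => ({ ok: false as const, error }),
        ),
      ),
    );
    attempted += batchSize;

    const errors: unknown[] = [];
    for (const outcome of outcomes) {
      if (outcome.ok) succeeded += 1;
      else errors.push(outcome.error);
    }

    // Any failure in this batch halts the run: later batches never dispatch.
    if (errors.length > 0) return { attempted, succeeded, errors };

    // Grow geometrically, but never past the work that remains.
    const grown = Math.max(1, Math.floor(batchSize * multiplier));
    batchSize = Math.min(grown, count - attempted);
  }

  return { attempted, succeeded, errors: [] };
}
```

With 15 spawns that all succeed, this sketch dispatches batches of 1, 2, 4 and 8. If the very first spawn rejects, it returns with attempted equal to 1 and nothing else is dispatched.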

Scope (deliberate)

No TaskGroup entity, no OwnerKind:"task", no live spawn-path wiring, no metrics exporter — there is no fan-out/TaskGroup component yet (TaskController reconciles single AgentTasks). This primitive is what #306 / #336 / #337 / #358 will call. Design + plan: docs/superpowers/specs/2026-06-08-307-slow-start-batching-design.md.

Acceptance mapping

  • Inject failure in first batch → subsequent batches don't fire — task failure → halted, attempted === 1
  • RuntimeUnavailable in batch-1 → no batch-2, recoverable throttled state — throttled + retryAfterMs
  • Admission reject (terminal) → halt + actionable reason — halted, reason surfaced ✅
  • Metric grove_spawn_batch_size per TaskGroup — onSpawnBatch({ taskGroupId, batchSize, ... })
  • Configurable via spec.batchStrategy → BatchStrategy + normalizeBatchStrategy
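
The failure rows in the mapping above can be illustrated with a small classifier sketch. The two error classes below are local stand-ins for the project's AdmissionRejectError and RuntimeUnavailableError, matching on error.name is an assumption made to keep the example self-contained, and classifyFailureSketch and resolveOutcomeSketch are hypothetical names rather than the module's exports.

```typescript
// Local stand-ins so the sketch runs on its own; the real classes live elsewhere.
class AdmissionRejectError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "AdmissionRejectError";
  }
}
class RuntimeUnavailableError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "RuntimeUnavailableError";
  }
}

type FailureKind = "task" | "backpressure" | "admission";

type BatchOutcome =
  | { status: "halted"; reason: string }
  | { status: "throttled"; retryAfterMs: number };

// Assumption: classification by error name; anything unrecognized is a task failure.
function classifyFailureSketch(error: unknown): FailureKind {
  const name = error instanceof Error ? error.name : "";
  if (name === "AdmissionRejectError") return "admission";
  if (name === "RuntimeUnavailableError") return "backpressure";
  return "task";
}

function describe(error: unknown): string {
  return error instanceof Error ? error.message : String(error);
}

// Expects the non-empty list of errors from the batch that failed.
function resolveOutcomeSketch(errors: unknown[], retryAfterMs: number): BatchOutcome {
  // Terminal wins on mixed batches: any admission or task failure halts the run.
  // Admission is checked first here so its reason is the one surfaced.
  const admission = errors.find((e) => classifyFailureSketch(e) === "admission");
  if (admission !== undefined) return { status: "halted", reason: describe(admission) };

  const task = errors.find((e) => classifyFailureSketch(e) === "task");
  if (task !== undefined) return { status: "halted", reason: describe(task) };

  // Only backpressure remains: recoverable, so hand the controller a retry delay.
  return { status: "throttled", retryAfterMs };
}
```

Under these assumptions, a batch containing both a RuntimeUnavailableError and an AdmissionRejectError resolves to halted with the admission message as the reason, while a batch containing only RuntimeUnavailableError resolves to throttled.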

Test Plan

  • bun test src/core/slow-start-batch.test.ts → 23 pass / 0 fail, 100% line+function coverage of the module
  • npx tsc --noEmit → clean (whole project)
  • pre-push typecheck + build gates green

windoliver added 13 commits June 8, 2026 15:32
Standalone pure slowStartBatch primitive (k8s replica_set.go style):
geometric batches 1->2->4->8, halt on first batch failure. Failure
taxonomy task|backpressure|admission -> halted/throttled. BatchStrategy
config (spec.batchStrategy home) + bounded computeBackoffMs. Metric
grove_spawn_batch_size via onSpawnBatch observer. No live wiring -
consumed by #306/#336/#337/#358.
6 TDD tasks: BatchStrategy+validation, computeBackoffMs, failure
classification, slowStartBatch runner+metric, failure-path coverage,
barrel exports. All in src/core/slow-start-batch.ts (pure, standalone).
- DEFAULT_MAX_BATCH_SIZE named constant (consistency)
- backoff baseMs/maxMs positive-int rejection coverage
- doc multiplier semantics; lock multiplier=1 acceptance with a test
  (fixed-size batches / constant backoff are intentionally valid)
- drop redundant '- offset' in initial batch size (offset is 0 here)
- comment-fill no-op spawn callbacks (biome noEmptyBlockStatements)
- add multiplier=1 termination + non-power-of-two final-batch tests
  through the public slowStartBatch API (lock termination + sizing)
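
The sizing cases named in this commit (multiplier=1 and a non-power-of-two final batch) can be checked against a small planning sketch. planBatchSizes is a hypothetical helper, not part of the module, and the cap of 8 used in the examples is an assumed value, since the value of DEFAULT_MAX_BATCH_SIZE is not stated here. Inputs are assumed to be validated (initialBatchSize >= 1, multiplier >= 1).

```typescript
// Hypothetical helper: lists the batch sizes a slow-start run would use
// if every batch succeeds.
function planBatchSizes(
  count: number,
  initialBatchSize: number,
  multiplier: number,
  maxBatchSize: number,
): number[] {
  const sizes: number[] = [];
  let remaining = count;
  let next = Math.min(initialBatchSize, maxBatchSize);

  while (remaining > 0) {
    // The final batch shrinks to whatever work is left.
    const size = Math.min(next, remaining);
    sizes.push(size);
    remaining -= size;

    // Grow by the multiplier, capped at maxBatchSize. With multiplier = 1
    // the size never grows, and the loop still ends because every
    // iteration dispatches at least one item.
    next = Math.min(Math.max(1, Math.floor(next * multiplier)), maxBatchSize);
  }
  return sizes;
}

// Non-power-of-two total: the last batch is truncated to the remainder.
const eleven = planBatchSizes(11, 1, 2, 8); // [1, 2, 4, 4]

// multiplier = 1: fixed-size batches, still terminates.
const fixed = planBatchSizes(3, 1, 1, 8); // [1, 1, 1]

// Cap reached: growth stops at maxBatchSize.
const capped = planBatchSizes(20, 1, 2, 8); // [1, 2, 4, 8, 5]
```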
…ackoff.ts collision)

src/core/backoff.ts already exports computeBackoffMs (full-jitter,
randomized, hardcoded x2 — for claim-retry thundering-herd). Ours is
deterministic + configurable-multiplier for predictable controller
requeue. Rename at the source so the core barrel exports it directly
with no alias; whole-project tsc --noEmit clean.
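
To make the distinction behind the rename concrete, here is a hedged sketch of the two backoff styles. Neither function is the project's code: the names, the BackoffConfig shape, and the attempt parameter are assumptions, and the jittered version is the generic full-jitter pattern rather than a copy of src/core/backoff.ts.

```typescript
// Hypothetical config shape; field names follow the baseMs/maxMs/multiplier
// terms used in this PR, but the real type may differ.
interface BackoffConfig {
  baseMs: number;
  maxMs: number;
  multiplier: number;
}

// Deterministic: the same attempt always yields the same delay, so a
// controller can predict when a throttled run will be requeued.
function deterministicBackoffMs(attempt: number, cfg: BackoffConfig): number {
  const raw = cfg.baseMs * Math.pow(cfg.multiplier, attempt);
  return Math.min(cfg.maxMs, Math.round(raw));
}

// Full jitter: a random delay in [0, cap), with the cap doubling per attempt.
// Randomness spreads out competing retries at the cost of predictability.
function fullJitterBackoffMs(
  attempt: number,
  baseMs: number,
  maxMs: number,
  random: () => number = Math.random,
): number {
  const cap = Math.min(maxMs, baseMs * Math.pow(2, attempt));
  return Math.floor(random() * cap);
}

const cfg: BackoffConfig = { baseMs: 1000, maxMs: 30000, multiplier: 2 };
const third = deterministicBackoffMs(3, cfg); // 8000
const bounded = deterministicBackoffMs(10, cfg); // 30000 (clamped to maxMs)

// multiplier = 1 gives a constant backoff.
const constant = deterministicBackoffMs(5, { baseMs: 1000, maxMs: 30000, multiplier: 1 }); // 1000
```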
Development

Successfully merging this pull request may close these issues.

D6: Slow-start batching for fan-out spawns

1 participant