
[draft/foundation] Executor-drain settle(): non-starvable quiescence (coupled to expect follow-up) #23

Closed
mansbernhardt wants to merge 2 commits into main from claude/settle-drain

@mansbernhardt commented Jun 16, 2026

Collaborator

STATUS: DRAFT / FOUNDATION, not independently mergeable. This PR lands the executor-drain infrastructure and makes `settle()` resolve on it, opt-in via `SWIFT_MODEL_EXPERIMENTAL_DRAIN=1` and inert by default. It is not a standalone fix that can simply be turned on; see "Coupling" below. It becomes the on-by-default fix once `expect`/`waitUntil` are migrated (a follow-up within the same effort).

What

`settle()` (and the settle phase of `expect { … }`) resolves on the model's executor-drain fixpoint, a non-starvable quiescence signal, instead of the `.deferential`/`DispatchQueue.global(qos: .background)` quiet-check that macOS starves under parallel load. The settle mechanism is validated standalone: with the flag on, the 72 false `settle() timed out` failures disappear; with the flag off, the suite is unchanged.

Coupling — why it can't be on by default yet

Making settle use the drive requires the per-test executor to be on, which routes all model tasks through one shared GCD queue. Under parallel load that queue is slightly slower than the cooperative pool, enough to trip expect's wall-clock budget in latency-sensitive clock tests. Verified: turning the executor on by default regresses childTasksCompleteBeforeTeardown and testImmediateClock (both green on main at normal parallelism). Enabling the drive is therefore coupled to making expect/waitUntil drive-primary as well, so that they no longer depend on a wall-clock budget. Until that lands, this stays opt-in.

How (settle slice)

  • _DrainTestExecutor: one shared concurrent GCD queue backs every per-test executor; each keeps its own outstanding-job count + event-driven waitUntilIdleOrDeadline.
  • Model task bodies adopt it via executorPreference under .modelTesting; the trait installs a per-test executor box.
  • _driveToStableFixpoint: quiescent = executor idle + per-test bg-idle + no pending-start task, held for a short non-starvable grace period (which debounces a clock-parked task's resume). mainCall is excluded (process-global).
  • waitUntilSettled resolves on the fixpoint; a generous watchdog catches a true deadlock.
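To make the job-count signal concrete, here is a minimal sketch, assuming plain closures stand in for Swift concurrency jobs. `DrainCounter`, `sharedQueue`, and this `waitUntilIdleOrDeadline(_:)` signature are hypothetical names, not the PR's actual `_DrainTestExecutor`. It shows one shared concurrent queue, a per-instance outstanding-job count, and a wait that blocks on a condition variable rather than polling a `.background` queue, so nothing in the wait path depends on low-QoS scheduling.

```swift
import Foundation

/// Hypothetical sketch of a per-test job counter. Closures stand in for
/// executor jobs; this is not the PR's `_DrainTestExecutor`.
final class DrainCounter: @unchecked Sendable {
    /// One concurrent queue shared by every per-test instance, so parallel
    /// tests do not each spin up their own threads.
    static let sharedQueue = DispatchQueue(label: "drain.shared", attributes: .concurrent)

    private let condition = NSCondition()
    private var outstanding = 0  // jobs enqueued but not yet finished

    /// Counts the job *before* scheduling it, so a waiter can never observe
    /// "idle" in the window between enqueue and run.
    func enqueue(_ job: @escaping @Sendable () -> Void) {
        condition.lock()
        outstanding += 1
        condition.unlock()

        DrainCounter.sharedQueue.async {
            job()
            self.condition.lock()
            self.outstanding -= 1
            if self.outstanding == 0 { self.condition.broadcast() }  // wake waiters
            self.condition.unlock()
        }
    }

    /// Event-driven wait: sleeps on the condition until the count hits zero
    /// or the deadline passes. Returns true if the counter is idle.
    func waitUntilIdleOrDeadline(_ deadline: Date) -> Bool {
        condition.lock()
        defer { condition.unlock() }
        while outstanding > 0 {
            // `wait(until:)` returns false on timeout; the loop absorbs
            // spurious wakeups.
            if !condition.wait(until: deadline) { return outstanding == 0 }
        }
        return true
    }
}
```

A real `TaskExecutor` would presumably do the same bookkeeping inside its `enqueue(_:)`, incrementing before handing the job to the queue so an in-flight job is never mistaken for idleness.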

Validation

  • Flag ON: settle suites green incl. a 60-iteration load-stressed test; full-parallel = 0 settle timeouts.
  • Flag OFF (default): broad regression green; CI green ×3 (inert path = main).

Follow-up (the coupled remainder)

Make expect/waitUntil drive-primary. See docs/test-determinism-executor-drain.md (Updates 7–10) for the open problems: scaling, the fast-fail-vs-delayed-resume race, and the long tail. Once that is done, the executor flips on by default and this becomes the real fix.

CHANGELOG: deliberately not added until the on-by-default fix is complete.

🤖 Generated with Claude Code

`settle()` (and the settle phase of `expect { … }`) resolves on the model's
executor-drain FIXPOINT instead of a `.deferential`/`.background`-QoS quiet-check.
Under heavy parallel load macOS starves `.background` indefinitely, so the
quiet-check never fired and settle reported a false `settle() timed out: model
still has active tasks` (with an empty task list) at ANY budget. This is the
years-old flake that the serial-CI fallback and SWIFT_MODEL_TIMEOUT_SCALE were
working around. The drain
signal is non-starvable (a job-count + GTS, never `.background`) and
dependency-free, so settle waits as long as necessary under load and resolves the
instant the model is genuinely quiescent.

How it works:
- `_DrainTestExecutor`: one shared concurrent GCD queue backs every per-test
  executor (avoids per-test thread-pool explosion); each keeps its own
  outstanding-job count + event-driven `waitUntilIdleOrDeadline`.
- Model task bodies (`node.task`/`forEach`) adopt it via `executorPreference`
  under `.modelTesting`; the trait installs a per-test executor box.
- `_driveToStableFixpoint`: quiescent = executor idle + per-test bg-idle + no
  pending-start task, persisted for a short NON-STARVABLE grace that debounces
  against ALL activity (every `_noteActivity` + executor enqueue) so a clock-
  parked task's resume resets it. `mainCall` excluded (process-global).
- `waitUntilSettled` resolves on the fixpoint; a generous watchdog only catches
  a true deadlock.
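The grace-period debounce can be sketched as follows, under stated assumptions: `FixpointDetector`, `noteActivity()`, and the `isQuiescent` closure (standing in for executor idle + per-test bg-idle + no pending-start task) are hypothetical names, not the PR's `_driveToStableFixpoint`. Quiescence counts only if it holds and no activity was recorded across a whole grace window; any activity inside the window, such as a clock-parked task resuming, restarts the check. Sleeping happens on the caller's thread and timing uses the monotonic `ProcessInfo.systemUptime`, so neither depends on `.background` scheduling.

```swift
import Foundation

/// Hypothetical sketch of a debounced quiescence fixpoint.
final class FixpointDetector: @unchecked Sendable {
    private let lock = NSLock()
    private var activityGeneration: UInt64 = 0

    /// Stand-in for `_noteActivity` / executor enqueue: bumps a generation
    /// counter so an in-progress grace window can tell it was disturbed.
    func noteActivity() {
        lock.lock(); activityGeneration &+= 1; lock.unlock()
    }

    private func generation() -> UInt64 {
        lock.lock(); defer { lock.unlock() }
        return activityGeneration
    }

    /// Returns true once `isQuiescent()` has held with no recorded activity
    /// for a full `grace` window; returns false if the `watchdog` budget
    /// (seconds) runs out first, which the caller treats as a true deadlock.
    func waitForStableFixpoint(
        grace: TimeInterval,
        watchdog: TimeInterval,
        isQuiescent: () -> Bool
    ) -> Bool {
        let deadline = ProcessInfo.processInfo.systemUptime + watchdog
        while ProcessInfo.processInfo.systemUptime < deadline {
            let before = generation()
            if isQuiescent() {
                Thread.sleep(forTimeInterval: grace)
                // Settled only if nothing happened during the window.
                if generation() == before && isQuiescent() { return true }
            } else {
                // Simplification: poll while busy. An event-driven version
                // would block on the executor's idle signal instead.
                Thread.sleep(forTimeInterval: grace / 4)
            }
        }
        return false
    }
}
```

The generation counter is what makes the grace non-starvable yet still sensitive to late resumes: a resume that lands mid-window changes the generation, so the window simply starts over.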

OPT-IN via SWIFT_MODEL_EXPERIMENTAL_DRAIN=1 — inert by default (executor box is
nil → every wait keeps its current path), so the suite is unchanged unless
enabled. Validated: flag ON, settle suites green incl. a 60-iteration
load-stressed child-task settle test; flag OFF, broad regression green
(unchanged). Custom task executors need the Swift 6 runtime (macOS 15+); older
OS versions and WASM stay on the existing path.
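The opt-in gate reduces to a factory that returns nil unless enabled. A minimal sketch, assuming a hypothetical `DrainExecutorBox` type and `makeDrainExecutorBoxIfEnabled(environment:)` helper; only the `SWIFT_MODEL_EXPERIMENTAL_DRAIN=1` variable, the macOS 15 requirement, and the WASM exclusion come from this PR. Callers that receive `nil` keep their current wait path, which is what keeps the flag-off suite identical to main.

```swift
import Foundation

/// Hypothetical stand-in for the per-test executor box.
final class DrainExecutorBox: Sendable {}

/// Returns nil (inert: every wait keeps its current path) unless the env var
/// is exactly "1" and the runtime supports custom task executors.
func makeDrainExecutorBoxIfEnabled(
    environment: [String: String] = ProcessInfo.processInfo.environment
) -> DrainExecutorBox? {
    guard environment["SWIFT_MODEL_EXPERIMENTAL_DRAIN"] == "1" else { return nil }
    #if os(WASI)
    return nil  // Per the PR, WASM stays on the existing path.
    #else
    // macOS 15 is from the PR; the other Apple OS versions are assumed to be
    // the matching Swift 6 runtime releases.
    if #available(macOS 15, iOS 18, tvOS 18, watchOS 11, *) {
        return DrainExecutorBox()
    }
    return nil  // Older OS: keep the existing path.
    #endif
}
```

Taking the environment as a parameter (defaulting to the process environment) lets the gate be exercised without mutating global process state.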

`expect`/`waitUntil` drive-primary migration is deliberately NOT in this PR (the
fixpoint-as-fail judgment has open scaling/race work) — follow-up. Full design
arc in docs/test-determinism-executor-drain.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ve-primary (executor routing regresses expect+clock tests under load)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@mansbernhardt marked this pull request as draft June 17, 2026 09:35
@mansbernhardt changed the title from "Executor-drain settle(): load-independent quiescence (opt-in)" to "[draft/foundation] Executor-drain settle(): non-starvable quiescence (coupled to expect follow-up)" Jun 17, 2026
@mansbernhardt
Collaborator Author

Superseded by #24, which contains this settle work plus the expect/waitUntil drive-primary follow-up as a single combined PR (retargeted to main). Closing in favor of #24.
