Summary
Killing bun run dev routinely leaves orphaned processes behind: the in-process HTTP server's antfly child keeps 127.0.0.1:3738 bound, and (when the process is wedged) the dev process itself survives SIGTERM holding :3737. Multiple half-dead generations then overlap and produce phantom UI behavior. This predates the antfly-zig migration — the migration just made it much easier to hit.
Diagnosed live on a machine that had accumulated several generations overnight (2026-06-05/06).
Architecture (for context)
bun run dev → scripts/dev.ts, which imports server.ts in-process (there is no separate server child). Children of the one dev process: the tailwind watcher and (via the antfly adapter) the antfly swarm server. So "the server child holding 3737" is actually the dev process itself.
Three interacting defects
1. dev.ts's signal handlers preempt the lifecycle shutdown
scripts/dev.ts registerShutdown() registers SIGINT/SIGTERM handlers before await import('../server'):
process.on('SIGINT', () => { cleanup(); process.exit(0) }) // cleanup() only kills tailwind
process.on('SIGTERM', () => { cleanup(); process.exit(0) })
src/core/lifecycle.ts registerShutdownHandlers() later registers the real async shutdown (plugins → dispatch/watchdog/doctor → watcher → search.shutdown() → stopAntflyServer()) on the same signals. Node runs signal listeners in registration order: dev.ts's handler runs first and calls process.exit(0) synchronously, so the lifecycle handler never executes. The antfly child is spawned with detached: false, but a non-detached child is still not killed automatically when its parent exits, so it is orphaned on 3738.
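The ordering problem can be reproduced in isolation. The following is a minimal sketch, not code from the repo: it uses a plain EventEmitter as a stand-in for process so the example can run without killing itself, and the function name simulateSignal, the exited flag, and the log strings are all illustrative assumptions.

```typescript
import { EventEmitter } from 'node:events'

// Stand-in for `process`. A real process.exit(0) would terminate the
// program; here an `exited` flag models "the process is already gone".
function simulateSignal(devHandlerExits: boolean): string[] {
  const proc = new EventEmitter()
  const log: string[] = []
  let exited = false

  // Registered first, like registerShutdown() in scripts/dev.ts.
  proc.on('SIGTERM', () => {
    log.push('dev: kill tailwind')
    if (devHandlerExits) {
      exited = true
      log.push('dev: process.exit(0)')
    }
  })

  // Registered later, like registerShutdownHandlers() in lifecycle.ts.
  proc.on('SIGTERM', () => {
    if (exited) return // never reached in a real process after exit
    log.push('lifecycle: full shutdown')
  })

  // Listeners fire synchronously, in registration order.
  proc.emit('SIGTERM')
  return log
}
```

With devHandlerExits set to true the lifecycle entry never appears in the log, which mirrors defect 1; with false, both handlers run.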
2. Unhandled EADDRINUSE crashes after full boot, skipping cleanup
server.ts calls server.listen(port, ...) with no 'error' listener, at the end of main() — after the watcher is started, dispatch state is written, and antfly is spawned/adopted. If another generation holds 3737, the EADDRINUSE 'error' event throws as an uncaught exception (verified under Bun 1.3.13: prints error: Failed to start server. Is port 34737 in use? and exits 1). An uncaught exception does not run the signal-based lifecycle shutdown → the freshly spawned/adopted antfly is orphaned again. Every retry against a squatted port mints another orphan.
3. When the process is deadlocked, no JS handler can run at all
The antfly-zig migration put the private instance's --data-dir at ~/.bakin/antfly/ — inside the chokidar-watched content dir. antfly's segment/WAL churn floods the watcher and deadlocks Bun natively (sampled: main thread + all 12 Bun Pool threads + the File Watcher thread parked on os_unfair_lock/__ulock_wait2, 0% CPU, every HTTP request hangs). A deadlocked event loop can't run SIGINT/SIGTERM JS handlers → operator escalates to kill -9 → antfly orphaned. (Observed directly: dev.ts survived two SIGTERMs, needed SIGKILL.)
The watcher fix (shouldIgnoreContentWatcherPath now ignores antfly/) ships on the migration branch (PR #457), as does a belt-and-braces sync process.on('exit') hook in the adapter that kills the antfly child on any JS-level exit. This issue tracks the dev-loop/server defects (1) and (2), which are independent of the migration.
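For context, a synchronous exit hook of the kind described above might look roughly like this. It is a sketch under assumptions, not the adapter code from PR #457: KillableChild and makeExitHook are hypothetical names, and SIGTERM is an assumed choice of signal.

```typescript
// Hypothetical minimal view of a spawned child; a node:child_process
// ChildProcess satisfies this shape.
type KillableChild = {
  exitCode: number | null
  kill(signal?: string): boolean
}

// 'exit' listeners must be synchronous. kill() only sends a signal and
// returns immediately, so it is safe to call from one.
function makeExitHook(child: KillableChild): () => void {
  return () => {
    // exitCode stays null while the child is still running.
    if (child.exitCode === null) child.kill('SIGTERM')
  }
}

// Assumed wiring in the adapter (antflyChild is a placeholder):
//   process.on('exit', makeExitHook(antflyChild))
```

A hook like this only covers JS-level exits. It cannot run when the process is killed with SIGKILL or is deadlocked natively, which is why defect 3 needs the watcher fix as well.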
Proposed fix (small, branch off main)
scripts/dev.ts: the dev shutdown handler should only own tailwind + fall-through exit while the server's lifecycle handlers aren't registered yet; once they are, it must NOT call process.exit(0) (kill tailwind, let the lifecycle listener — registered later on the same signal — run the full shutdown and exit).
server.ts: attach server.on('error', ...): on EADDRINUSE, log a clear message naming the port and the lsof -nP -iTCP:3737 -sTCP:LISTEN remediation, then run the same graceful-shutdown path (so the antfly child is stopped) and exit non-zero. Optionally pre-flight the bind early in main() to fail before any side effects.
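The two changes above could be sketched as follows. This is illustrative only: makeDevSignalHandler, DevShutdownDeps, describeListenError, listenWithCleanup, and the injected gracefulShutdown and exit callbacks are hypothetical names, and the sketch assumes server.ts uses a node:http Server.

```typescript
import type { Server } from 'node:http'

// Hypothetical dependencies for the dev.ts handler, injected for testability.
type DevShutdownDeps = {
  killTailwind: () => void
  lifecycleRegistered: () => boolean // true once lifecycle.ts owns the signal
  exit: (code: number) => void
}

// dev.ts side: always kill tailwind, but only exit while no lifecycle
// listener exists. Afterwards the later listener runs the full shutdown.
function makeDevSignalHandler(deps: DevShutdownDeps): () => void {
  return () => {
    deps.killTailwind()
    if (!deps.lifecycleRegistered()) deps.exit(0)
  }
}

type ListenError = Error & { code?: string }

// server.ts side: a message naming the port and the lsof remediation.
function describeListenError(err: ListenError, port: number): string {
  if (err.code === 'EADDRINUSE') {
    return `Port ${port} is already in use. Find the holder with: ` +
      `lsof -nP -iTCP:${port} -sTCP:LISTEN`
  }
  return `Server failed to listen on port ${port}: ${err.message}`
}

// Attach the 'error' listener before listen() so a bind failure goes
// through graceful shutdown (stopping antfly) instead of crashing.
function listenWithCleanup(
  server: Server,
  port: number,
  gracefulShutdown: () => Promise<void>,
  exit: (code: number) => void = (code) => process.exit(code),
): void {
  server.once('error', (err: Error) => {
    console.error(describeListenError(err as ListenError, port))
    gracefulShutdown()
      .catch((e) => console.error('shutdown failed:', e))
      .finally(() => exit(1)) // non-zero exit even if shutdown itself fails
  })
  server.listen(port)
}

// Assumed wiring (names are placeholders):
//   process.on('SIGTERM', makeDevSignalHandler({ ... }))
//   listenWithCleanup(server, 3737, shutdownEverything)
```

Injecting exit and gracefulShutdown keeps both pieces testable without terminating the test process. The optional pre-flight bind is not shown here.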
Repro
1. bun run dev, wait for ready.
2. kill -TERM <dev pid> → dev exits, but lsof -nP -iTCP:3738 -sTCP:LISTEN still shows antfly (defect 1).
3. bun run dev again (the 3738 squatter is adopted by the probe). Now squat 3737 with anything and start a third: full boot, then an uncaught EADDRINUSE crash, and antfly stays (defect 2).