fix: Windows desktop app (daemon refuses to start; ships wrong-OS binary)#257
Open
harshitsinghbhandari wants to merge 2 commits into
Open
fix: Windows desktop app (daemon refuses to start; ships wrong-OS binary)#257harshitsinghbhandari wants to merge 2 commits into
harshitsinghbhandari wants to merge 2 commits into
Conversation
On Windows the desktop supervisor can only TerminateProcess the daemon (no POSIX signal reaches a detached child), so the daemon's graceful shutdown never runs and ~/.ao/running.json is never removed. The leaked file survives into the next launch, and because Windows reuses PIDs aggressively the recorded PID usually belongs to an unrelated process. The startup pre-flight trusted PID liveness alone (runfile.CheckStale -> processalive.Alive), so it concluded a daemon was "already running" and exited with "refusing to start" on every restart. A dead daemon then makes the renderer's loopback REST calls (e.g. Spawn Orchestrator) fail silently. Verify the recorded port is actually served by an AO daemon with the recorded PID (a /healthz probe matching service + pid, the same ground truth inspectDaemon already uses) before refusing. A run-file left by a crashed, hard-killed, or reused-PID predecessor is treated as stale and overwritten, so startup is robust to a leaked run-file from any cause. Fixes #256
build-daemon.mjs compiles the bundled `ao` daemon with the build host's
GOOS and names it off the host platform (ao.exe only when the builder is
Windows). The release workflow ran only on macos-latest, so a Windows
package would ship a macOS binary named `ao` with no `ao.exe`, and the
app could not launch a valid Windows daemon ("This program cannot be run
in DOS mode" / binary not found).
Run the release as a per-OS matrix (macOS + Windows) so host == target
and each installer bundles a daemon compiled for its own platform, and
pin the Go toolchain with setup-go since build-daemon needs it on every
runner.
Fixes #235
Contributor
There was a problem hiding this comment.
Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes the two reported Windows failures. They are distinct root causes at different layers, both surfacing as "AO doesn't work on Windows," so they're fixed together.
#256 — daemon "refusing to start"; Spawn Orchestrator does nothing
Root cause. The Electron supervisor launches
ao daemon, whose startup pre-flight (runfile.CheckStale→processalive.Alive) decided "a daemon is already running" from PID liveness alone. On Windows the supervisor can onlyTerminateProcessthe daemon (a negative-PID process-groupSIGTERMis unsupported, andchild.kill("SIGTERM")maps to a hard kill that delivers no signal to the Go process), so the daemon'ssignal.NotifyContextgraceful shutdown never runs and~/.ao/running.jsonis never removed. The leaked file survives into the next launch, and because Windows reuses PIDs aggressively the recorded PID usually belongs to an unrelated live process — soAlivereturns true, the daemon prints…refusing to start, and exits 1. With the daemon dead, the renderer's loopback REST stays pointed at an unbound port, so the Spawn Orchestrator POST rejects and is swallowed by the click handler'scatch→ "nothing happens."This is genuinely Windows-specific: on Unix the same kill shuts the daemon down gracefully and removes the run-file, so the stale-file window never opens.
Fix. Before refusing, verify the recorded port is actually served by an AO daemon with the recorded PID (a
/healthzprobe matchingservice+pid— the same ground truthinspectDaemonalready uses forao status/ao stop). A run-file left by a crashed, hard-killed, or reused-PID predecessor is treated as stale and overwritten, so startup self-heals regardless of how the file leaked.backend/internal/daemon/stale.go—runFileOwnerServingownership probe (+ table tests instale_test.go)backend/internal/daemon/daemon.go— gate the "refusing to start" path on the probe#235 — "This program cannot be run in DOS mode"
Root cause.
frontend/scripts/build-daemon.mjscompiles the bundled daemon with the build host'sGOOSand names it off the host platform (ao.exeonly when the builder is Windows). The release workflow ran only onmacos-latestandforge.config.tsbundlesdaemon/into every target, so a Windows package would ship a macOS binary namedaowith no nativeao.exe— the app cannot launch a valid Windows daemon.Fix. Run the desktop release as a per-OS matrix (macOS + Windows) so host == target and each installer bundles a daemon compiled for its own platform; pin the Go toolchain with
setup-gosincebuild-daemonruns on every runner. (electron-forge already gates each maker to its own platform — squirrel→win32, zip→darwin — so each job emits the right artifact.).github/workflows/frontend-release.ymlTesting
go build ./...andgo test -race ./...— 1368 passed (68 packages), incl. newrunFileOwnerServingcases (matching owner, reused PID, foreign service, non-2xx, bad body, dead port, nil/zero port).desktop-v*tags / manual dispatch, so it cannot be exercised by PR CI; the matrix +setup-gochange is validated by review.Noted follow-up (not in this PR)
The run-file still leaks on Windows because the supervisor hard-kills the daemon; this PR makes that leak harmless (reaped on next start). A cleaner follow-up is to have the Electron supervisor stop the daemon via the existing
POST /shutdownloopback endpoint (graceful, cross-platform) instead ofSIGTERM, and to reconsiderdetached: trueon Windows. Kept out here to stay focused and because it's a separately-testable main-process change.🤖 Generated with Claude Code