Skip to content

fix: Windows desktop app (daemon refuses to start; ships wrong-OS binary)#257

Open
harshitsinghbhandari wants to merge 2 commits into
mainfrom
fix/windows-daemon-startup-and-packaging
Open

fix: Windows desktop app (daemon refuses to start; ships wrong-OS binary)#257
harshitsinghbhandari wants to merge 2 commits into
mainfrom
fix/windows-daemon-startup-and-packaging

Conversation

@harshitsinghbhandari

Copy link
Copy Markdown
Collaborator

Fixes the two reported Windows failures. They are distinct root causes at different layers, both surfacing as "AO doesn't work on Windows," so they're fixed together.

#256 — daemon "refusing to start"; Spawn Orchestrator does nothing

Root cause. The Electron supervisor launches ao daemon, whose startup pre-flight (runfile.CheckStaleprocessalive.Alive) decided "a daemon is already running" from PID liveness alone. On Windows the supervisor can only TerminateProcess the daemon (a negative-PID process-group SIGTERM is unsupported, and child.kill("SIGTERM") maps to a hard kill that delivers no signal to the Go process), so the daemon's signal.NotifyContext graceful shutdown never runs and ~/.ao/running.json is never removed. The leaked file survives into the next launch, and because Windows reuses PIDs aggressively the recorded PID usually belongs to an unrelated live process — so Alive returns true, the daemon prints …refusing to start, and exits 1. With the daemon dead, the renderer's loopback REST stays pointed at an unbound port, so the Spawn Orchestrator POST rejects and is swallowed by the click handler's catch → "nothing happens."

This is genuinely Windows-specific: on Unix the same kill shuts the daemon down gracefully and removes the run-file, so the stale-file window never opens.

Fix. Before refusing, verify the recorded port is actually served by an AO daemon with the recorded PID (a /healthz probe matching service + pid — the same ground truth inspectDaemon already uses for ao status/ao stop). A run-file left by a crashed, hard-killed, or reused-PID predecessor is treated as stale and overwritten, so startup self-heals regardless of how the file leaked.

  • backend/internal/daemon/stale.gorunFileOwnerServing ownership probe (+ table tests in stale_test.go)
  • backend/internal/daemon/daemon.go — gate the "refusing to start" path on the probe

#235 — "This program cannot be run in DOS mode"

Root cause. frontend/scripts/build-daemon.mjs compiles the bundled daemon with the build host's GOOS and names it off the host platform (ao.exe only when the builder is Windows). The release workflow ran only on macos-latest and forge.config.ts bundles daemon/ into every target, so a Windows package would ship a macOS binary named ao with no native ao.exe — the app cannot launch a valid Windows daemon.

Fix. Run the desktop release as a per-OS matrix (macOS + Windows) so host == target and each installer bundles a daemon compiled for its own platform; pin the Go toolchain with setup-go since build-daemon runs on every runner. (electron-forge already gates each maker to its own platform — squirrel→win32, zip→darwin — so each job emits the right artifact.)

  • .github/workflows/frontend-release.yml

Testing

  • go build ./... and go test -race ./... — 1368 passed (68 packages), incl. new runFileOwnerServing cases (matching owner, reused PID, foreign service, non-2xx, bad body, dead port, nil/zero port).
  • The release workflow only runs on desktop-v* tags / manual dispatch, so it cannot be exercised by PR CI; the matrix + setup-go change is validated by review.

Noted follow-up (not in this PR)

The run-file still leaks on Windows because the supervisor hard-kills the daemon; this PR makes that leak harmless (reaped on next start). A cleaner follow-up is to have the Electron supervisor stop the daemon via the existing POST /shutdown loopback endpoint (graceful, cross-platform) instead of SIGTERM, and to reconsider detached: true on Windows. Kept out here to stay focused and because it's a separately-testable main-process change.

🤖 Generated with Claude Code

On Windows the desktop supervisor can only TerminateProcess the daemon
(no POSIX signal reaches a detached child), so the daemon's graceful
shutdown never runs and ~/.ao/running.json is never removed. The leaked
file survives into the next launch, and because Windows reuses PIDs
aggressively the recorded PID usually belongs to an unrelated process.
The startup pre-flight trusted PID liveness alone (runfile.CheckStale ->
processalive.Alive), so it concluded a daemon was "already running" and
exited with "refusing to start" on every restart. A dead daemon then
makes the renderer's loopback REST calls (e.g. Spawn Orchestrator) fail
silently.

Verify the recorded port is actually served by an AO daemon with the
recorded PID (a /healthz probe matching service + pid, the same ground
truth inspectDaemon already uses) before refusing. A run-file left by a
crashed, hard-killed, or reused-PID predecessor is treated as stale and
overwritten, so startup is robust to a leaked run-file from any cause.

Fixes #256
build-daemon.mjs compiles the bundled `ao` daemon with the build host's
GOOS and names it off the host platform (ao.exe only when the builder is
Windows). The release workflow ran only on macos-latest, so a Windows
package would ship a macOS binary named `ao` with no `ao.exe`, and the
app could not launch a valid Windows daemon ("This program cannot be run
in DOS mode" / binary not found).

Run the release as a per-OS matrix (macOS + Windows) so host == target
and each installer bundles a daemon compiled for its own platform, and
pin the Go toolchain with setup-go since build-daemon needs it on every
runner.

Fixes #235

@greptile-apps greptile-apps Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant