grabber: release seize across sleep (fixes keyboard-dead-on-wake, same-id case)#49
Merged
Merged
Conversation
Root cause of the "keyboard dead on wake, same registry id" recurrence: we held the IOHIDManager seize across sleep. When the keyboard powers down mid-sleep while seized, the seize goes stale — on wake the manager still reports the device (same id, matched_count=1) but the event pipe is dead, and re-seizing in place can't revive it (only a full device re-enumeration does, which is why a process restart worked). Fix (Karabiner's pattern — devices are "ungrabbable while sleeping"): - New PowerNotify hook: on kIOMessageSystemWillSleep release the seize (keyboard falls back to the normal HID path), mark `sleeping`, then ack the sleep after a 1s delay so IOHIDManagerClose propagates before the device powers down. On kIOMessageSystemHasPoweredOn clear `sleeping` and re-acquire a fresh seize on the now-healthy device. - applyLatestRules and onDeviceChange respect `sleeping` (keep the seize torn down until wake), so nothing re-seizes mid-sleep. - This is the inverse of the old PowerNotify (which re-seized on wake, holding across sleep) — the trigger was right, the action was wrong. Observability (the gap that forced inference last time): - Log each seized device's IORegistry entry id, so a recurrence shows which device the seize holds vs the live keyboard (ioreg). - Prefix every grabber log line with a local [HH:MM:SS] so events correlate directly with `pmset -g log` sleep/wake times. Known minor noise (newly visible): DeviceNotify also matches our own vhidd VirtualHIDKeyboard, causing one spurious idempotent re-seize when it enumerates. Not a loop; left as a follow-up.
skhd-grabber --version printed a hardcoded "skhd-grabber (D1 skeleton)" because the grabber module never got the generated VERSION embed. Wire addVersionImport into grabber_mod and print the real build string (matching skhd's "vX.Y-dev-<hash>-<mode>" format). --status now prints both daemons' versions. The grabber's is queried LIVE over IPC: it returns its build version in the hello-ok reply, and the agent client captures it, so --status shows the actually-running grabber (not the on-disk binary). skhd's is this binary's own version, which equals the running agent (same install path; the install flow can't swap a running binary's inode without stopping it first). Backward-compatible: grabber_version is an extra json field on the hello-ok; older agents ignore it, and an older grabber that omits it shows as "not running" in the version line.
[HH:MM:SS] alone is ambiguous when the daemon sleeps for days. Use the full local [YYYY-MM-DD HH:MM:SS] (matches pmset -g log) so a sleep span across days reads unambiguously.
…ded) A forever-running daemon's ReleaseFast log (warn+) must only grow on anomalies, not routine operation. 'keyboard matched/terminated' and 'keyboard enumeration changed — re-seizing' fire on every wake / USB plug / vhidd reconnect — routine, not errors. Demote to info (compiled out of ReleaseFast, still visible in a ReleaseSafe diagnostic build). The FAILURE paths (rebuild failed, post-wake re-seize failed) stay warn.
main.zig logged the post-.path PATH at warn unconditionally on every start — a routine startup diagnostic, not an error (its sibling, the inherited-PATH line, is already info). Demote so the agent's ReleaseFast log doesn't carry a verbose PATH line per restart. Audit of the agent's warn/err logging otherwise: event-tap handlers are all catch-only (no per-keystroke noise); remaining warn/err are genuine anomalies or the once-per-restart skhd startup banner (kept at warn as a deliberate restart marker).
The '=== skhd vX started at … (PID) ===' session marker was at warn only so it'd show in release logs — the same info-kept-at-warn-for-diagnosis pattern as the PATH line. It's informational, not a problem, so it's info now. Version/start info is still discoverable via skhd --version and --status (which now reports both running daemons' versions).
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
Even after PR #48 (DeviceNotify re-seizes when the keyboard re-enumerates), the keyboard still died on wake in a distinct second failure mode: the builtin keyboard came back with its IORegistry id unchanged (
seized device entry_ididentical before and after sleep), so nothing re-enumerated, DeviceNotify correctly fired nothing, and an in-place re-seize couldn't revive it — only a full process restart did.Root cause
We held the IOHIDManager seize across the sleep power transition. When the keyboard powers down mid-sleep while seized, the seized connection goes stale — on wake the manager still reports the device (same id,
matched_count=1) but the event pipe is dead, and re-opening seize in place doesn't revive it.Karabiner-Elements avoids this in its grabber: devices are "ungrabbable while
system_sleeping" (make_grabbable_state) — it releases the grab on will-sleep and re-grabs on wake; it does not otherwise re-grab on wake. We adopt the same lifecycle.Fix
New
PowerNotifyhook (the inverse of the one removed in PR #48 — right trigger, correct action this time):kIOMessageSystemWillSleep→ tear down the seize (keyboard falls back to the normal HID path), setdaemon.sleeping = true, then ack the sleep after a 1s delay soIOHIDManagerClosepropagates before the device loses power (matches Karabiner's delayed ack).kIOMessageSystemHasPoweredOn→ clearsleeping, re-acquire a fresh seize on the now-healthy device.applyLatestRules/onDeviceChangerespectsleeping(no re-seize mid-sleep).Observability (the gap that forced inference last time)
[YYYY-MM-DD HH:MM:SS](matchespmset -g log), so a multi-day sleep span is unambiguous.entry_idis logged, so a recurrence shows which device the seize holds vs the live keyboard.Also in this PR
skhd-grabber --versionnow reports the real build (was hardcoded"D1 skeleton"— the grabber module never got the VERSION embed).skhd --statusprints both daemons' versions; the grabber's is queried live over IPC (returned in the hello-ok reply), so it reflects the actually-running daemon — surfaces the "agent updated, grabber not restarted" mismatch at a glance.keyboard matched/terminated,enumeration changed — re-seizing) and informational startup lines (PATH-applied, skhd start banner) demoted from warn → info, so a ReleaseFast release log grows only on real anomalies. Audit confirmed remaining warn/err are genuine failures.Validation — 11-day soak, one long-lived process
Deployed ReleaseSafe; the grabber held one process (PID 1150) for 11 days with no restart/crash, across multiple real sleep cycles:
0 post-wake failures, 0 device-rebuild failures. Every
releasing seize before sleeppaired with a cleanre-acquiringon wake; keyboard worked throughout (no SSH-to-restart). The pre-fix bug would have killed the keyboard within any one of these cycles.Tests
zig build testgreen.Follow-ups (deferred)
newsyslogat--install-grabbertime (bound/var/log/skhd-grabber.logfor a daemon that runs for years).