Skip to content

grabber: release seize across sleep (fixes keyboard-dead-on-wake, same-id case)#49

Merged
jackielii merged 6 commits into
mainfrom
grabber-release-seize-on-sleep
Jun 24, 2026
Merged

grabber: release seize across sleep (fixes keyboard-dead-on-wake, same-id case)#49
jackielii merged 6 commits into
mainfrom
grabber-release-seize-on-sleep

Conversation

@jackielii

Copy link
Copy Markdown
Owner

Problem

Even after PR #48 (DeviceNotify re-seizes when the keyboard re-enumerates), the keyboard still died on wake in a distinct second failure mode: the builtin keyboard came back with its IORegistry id unchanged (seized device entry_id identical before and after sleep), so nothing re-enumerated, DeviceNotify correctly fired nothing, and an in-place re-seize couldn't revive it — only a full process restart did.

Root cause

We held the IOHIDManager seize across the sleep power transition. When the keyboard powers down mid-sleep while seized, the seized connection goes stale — on wake the manager still reports the device (same id, matched_count=1) but the event pipe is dead, and re-opening seize in place doesn't revive it.

Karabiner-Elements avoids this in its grabber: devices are "ungrabbable while system_sleeping" (make_grabbable_state) — it releases the grab on will-sleep and re-grabs on wake; it does not otherwise re-grab on wake. We adopt the same lifecycle.

Fix

New PowerNotify hook (the inverse of the one removed in PR #48 — right trigger, correct action this time):

  • kIOMessageSystemWillSleep → tear down the seize (keyboard falls back to the normal HID path), set daemon.sleeping = true, then ack the sleep after a 1s delay so IOHIDManagerClose propagates before the device loses power (matches Karabiner's delayed ack).
  • kIOMessageSystemHasPoweredOn → clear sleeping, re-acquire a fresh seize on the now-healthy device.
  • applyLatestRules / onDeviceChange respect sleeping (no re-seize mid-sleep).

Observability (the gap that forced inference last time)

  • Every grabber log line is prefixed with local [YYYY-MM-DD HH:MM:SS] (matches pmset -g log), so a multi-day sleep span is unambiguous.
  • Each seized device's IORegistry entry_id is logged, so a recurrence shows which device the seize holds vs the live keyboard.

Also in this PR

  • skhd-grabber --version now reports the real build (was hardcoded "D1 skeleton" — the grabber module never got the VERSION embed).
  • skhd --status prints both daemons' versions; the grabber's is queried live over IPC (returned in the hello-ok reply), so it reflects the actually-running daemon — surfaces the "agent updated, grabber not restarted" mismatch at a glance.
  • Log-level hygiene for a forever-running daemon: per-wake routine events (keyboard matched/terminated, enumeration changed — re-seizing) and informational startup lines (PATH-applied, skhd start banner) demoted from warn → info, so a ReleaseFast release log grows only on real anomalies. Audit confirmed remaining warn/err are genuine failures.

Validation — 11-day soak, one long-lived process

Deployed ReleaseSafe; the grabber held one process (PID 1150) for 11 days with no restart/crash, across multiple real sleep cycles:

Cycle (same process) released → re-acquired span
06-13 13:24 → 19:45 ✓ → ✓ 6.3 h
06-14 10:35 → 06-16 09:40 ✓ → ✓ ~2 days
06-16 10:02 → 06-24 13:37 ✓ → ✓ ~8 days

0 post-wake failures, 0 device-rebuild failures. Every releasing seize before sleep paired with a clean re-acquiring on wake; keyboard worked throughout (no SSH-to-restart). The pre-fix bug would have killed the keyboard within any one of these cycles.

Tests

zig build test green.

Follow-ups (deferred)

  • Log rotation via newsyslog at --install-grabber time (bound /var/log/skhd-grabber.log for a daemon that runs for years).
  • DeviceNotify also matches our own vhidd VirtualHIDKeyboard → one idempotent spurious re-seize at startup/wake (harmless, not a loop).

Root cause of the "keyboard dead on wake, same registry id" recurrence:
we held the IOHIDManager seize across sleep. When the keyboard powers
down mid-sleep while seized, the seize goes stale — on wake the manager
still reports the device (same id, matched_count=1) but the event pipe
is dead, and re-seizing in place can't revive it (only a full device
re-enumeration does, which is why a process restart worked).

Fix (Karabiner's pattern — devices are "ungrabbable while sleeping"):
- New PowerNotify hook: on kIOMessageSystemWillSleep release the seize
  (keyboard falls back to the normal HID path), mark `sleeping`, then ack
  the sleep after a 1s delay so IOHIDManagerClose propagates before the
  device powers down. On kIOMessageSystemHasPoweredOn clear `sleeping`
  and re-acquire a fresh seize on the now-healthy device.
- applyLatestRules and onDeviceChange respect `sleeping` (keep the seize
  torn down until wake), so nothing re-seizes mid-sleep.
- This is the inverse of the old PowerNotify (which re-seized on wake,
  holding across sleep) — the trigger was right, the action was wrong.

Observability (the gap that forced inference last time):
- Log each seized device's IORegistry entry id, so a recurrence shows
  which device the seize holds vs the live keyboard (ioreg).
- Prefix every grabber log line with a local [HH:MM:SS] so events
  correlate directly with `pmset -g log` sleep/wake times.

Known minor noise (newly visible): DeviceNotify also matches our own
vhidd VirtualHIDKeyboard, causing one spurious idempotent re-seize when
it enumerates. Not a loop; left as a follow-up.
skhd-grabber --version printed a hardcoded "skhd-grabber (D1 skeleton)"
because the grabber module never got the generated VERSION embed. Wire
addVersionImport into grabber_mod and print the real build string
(matching skhd's "vX.Y-dev-<hash>-<mode>" format).

--status now prints both daemons' versions. The grabber's is queried
LIVE over IPC: it returns its build version in the hello-ok reply, and
the agent client captures it, so --status shows the actually-running
grabber (not the on-disk binary). skhd's is this binary's own version,
which equals the running agent (same install path; the install flow
can't swap a running binary's inode without stopping it first).

Backward-compatible: grabber_version is an extra json field on the
hello-ok; older agents ignore it, and an older grabber that omits it
shows as "not running" in the version line.
[HH:MM:SS] alone is ambiguous when the daemon sleeps for days. Use the
full local [YYYY-MM-DD HH:MM:SS] (matches pmset -g log) so a sleep span
across days reads unambiguously.
…ded)

A forever-running daemon's ReleaseFast log (warn+) must only grow on
anomalies, not routine operation. 'keyboard matched/terminated' and
'keyboard enumeration changed — re-seizing' fire on every wake / USB
plug / vhidd reconnect — routine, not errors. Demote to info (compiled
out of ReleaseFast, still visible in a ReleaseSafe diagnostic build).
The FAILURE paths (rebuild failed, post-wake re-seize failed) stay warn.
main.zig logged the post-.path PATH at warn unconditionally on every
start — a routine startup diagnostic, not an error (its sibling, the
inherited-PATH line, is already info). Demote so the agent's ReleaseFast
log doesn't carry a verbose PATH line per restart.

Audit of the agent's warn/err logging otherwise: event-tap handlers are
all catch-only (no per-keystroke noise); remaining warn/err are genuine
anomalies or the once-per-restart skhd startup banner (kept at warn as a
deliberate restart marker).
The '=== skhd vX started at … (PID) ===' session marker was at warn only
so it'd show in release logs — the same info-kept-at-warn-for-diagnosis
pattern as the PATH line. It's informational, not a problem, so it's
info now. Version/start info is still discoverable via skhd --version
and --status (which now reports both running daemons' versions).
@jackielii jackielii merged commit 6819863 into main Jun 24, 2026
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant