grabber: event-driven watchdog to recover a dead keyboard after DarkWake/hibernate #48

Merged: jackielii merged 5 commits into main from grabber-seize-liveness-watchdog on Jun 9, 2026

jackielii (Owner) commented Jun 9, 2026

Problem

The grabber's IOHIDManager seize can silently die while the process, IPC, and vhidd stay healthy. The built-in keyboard re-enumerates during a DarkWake from Deep Idle or a hibernate wake (the old IORegistry entry terminates and a new one appears), but that wake never delivers kIOMessageSystemHasPoweredOn, so wake-driven recovery never fires and the seize keeps holding the dead device. Result: the keyboard is unresponsive until the daemon is restarted (the user had to SSH in).

Every prior fix in this family (vhidd-disconnect recovery, a PowerNotify re-seize on wake, matching tweaks) was trigger-based, and each listened for a trigger this failure never raises: a DarkWake sends no power-on message.

Fix: event-driven re-seize on device re-enumeration

New DeviceNotify.zig subscribes to the IOKit registry directly — IOServiceAddMatchingNotification for kIOFirstMatch + kIOTerminated on a keyboard matching dict (IOHIDDevice, PrimaryUsagePage=GenericDesktop, PrimaryUsage=Keyboard). The kernel fires the callback exactly when a keyboard (re-)enumerates, and the daemon re-seizes via applyLatestRules. It's trigger-independent (works for DarkWake, hibernate, USB replug) and uses no polling — zero steady-state overhead on a 24/7 daemon.
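As a rough illustration of the subscription described above, here is a minimal C sketch against the IOKit C API. It is not the code in DeviceNotify.zig: `watch_keyboards`, `drain`, `reseize`, and the enum names are hypothetical, and `reseize` only prints where the daemon would call applyLatestRules. One IOKit detail the sketch relies on: a matching notification is armed only once its iterator has been emptied, so the initial iterators are drained silently at setup and again inside every callback. The IOKit part is guarded so the file still compiles on other platforms.

```c
#include <stdint.h>
#include <stdio.h>

/* HID usage values for a keyboard (USB HID usage tables). */
enum { USAGE_PAGE_GENERIC_DESKTOP = 0x01, USAGE_KEYBOARD = 0x06 };

#ifdef __APPLE__ /* IOKit exists only on macOS */
#include <CoreFoundation/CoreFoundation.h>
#include <IOKit/IOKitLib.h>

/* Hypothetical hook: the daemon would rebuild its seize here. */
static void reseize(void) { puts("keyboard enumeration changed, re-seizing"); }

/* Empty the iterator, logging entry IDs when `what` is non-NULL.
 * Draining is what (re-)arms the notification. */
static void drain(io_iterator_t it, const char *what) {
    io_object_t svc;
    while ((svc = IOIteratorNext(it)) != IO_OBJECT_NULL) {
        uint64_t id = 0;
        IORegistryEntryGetRegistryEntryID(svc, &id);
        if (what)
            printf("keyboard %s: entry_id=%llu\n", what, (unsigned long long)id);
        IOObjectRelease(svc);
    }
}

static void on_matched(void *ctx, io_iterator_t it) {
    (void)ctx; drain(it, "matched"); reseize();
}
static void on_terminated(void *ctx, io_iterator_t it) {
    (void)ctx; drain(it, "terminated"); reseize();
}

/* IOHIDDevice services whose primary usage is GenericDesktop/Keyboard. */
static CFMutableDictionaryRef keyboard_matching(void) {
    CFMutableDictionaryRef d = IOServiceMatching("IOHIDDevice");
    if (!d) return NULL;
    int page = USAGE_PAGE_GENERIC_DESKTOP, usage = USAGE_KEYBOARD;
    CFNumberRef p = CFNumberCreate(kCFAllocatorDefault, kCFNumberIntType, &page);
    CFNumberRef u = CFNumberCreate(kCFAllocatorDefault, kCFNumberIntType, &usage);
    CFDictionarySetValue(d, CFSTR("PrimaryUsagePage"), p);
    CFDictionarySetValue(d, CFSTR("PrimaryUsage"), u);
    CFRelease(p);
    CFRelease(u);
    return d;
}

/* Subscribe, arm, then block in the run loop. Returns -1 on setup failure. */
int watch_keyboards(void) {
    /* kIOMainPortDefault needs the macOS 12+ SDK
     * (older SDKs name it kIOMasterPortDefault). */
    IONotificationPortRef port = IONotificationPortCreate(kIOMainPortDefault);
    if (!port) return -1;
    CFRunLoopAddSource(CFRunLoopGetCurrent(),
                       IONotificationPortGetRunLoopSource(port),
                       kCFRunLoopDefaultMode);

    io_iterator_t matched = IO_OBJECT_NULL, terminated = IO_OBJECT_NULL;
    /* Each call consumes one reference to its matching dictionary,
     * so a fresh dictionary is built per notification. */
    if (IOServiceAddMatchingNotification(port, kIOFirstMatchNotification,
            keyboard_matching(), on_matched, NULL, &matched) != KERN_SUCCESS ||
        IOServiceAddMatchingNotification(port, kIOTerminatedNotification,
            keyboard_matching(), on_terminated, NULL, &terminated) != KERN_SUCCESS) {
        if (matched != IO_OBJECT_NULL) IOObjectRelease(matched);
        IONotificationPortDestroy(port);
        return -1;
    }

    drain(matched, NULL);    /* keyboards already present at startup */
    drain(terminated, NULL); /* normally empty; draining arms it */
    CFRunLoopRun();
    return 0;
}
#endif
```

On macOS this would be called from a `main` and linked with `-framework IOKit -framework CoreFoundation`; nothing runs between events, which is the "no polling" property the description claims.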

This is the mechanism Karabiner-Elements' iokit_service_monitor is built on. Karabiner's grabber also keys devices by registry_entry_id and, notably, its power-management monitor does not re-grab on wake — it only tracks a sleeping flag. So device enumeration, not power state, is the correct axis.

What this branch removes

  • The earlier 5s liveness poll (first cut): worked and was proven against the repro, but paid a small cost forever to catch a rare event. Replaced by the event itself.
  • PowerNotify: introduced earlier to recover on kIOMessageSystemHasPoweredOn, but it literally can't see a DarkWake. Once DeviceNotify exists, PowerNotify was only a wake backstop for two unobserved cases, duplicating the re-seize on every normal wake. Removed (matching Karabiner). Easy revert if the dropped-notification case ever bites.

Commit history kept for the reasoning trail (poll → events → remove PowerNotify → review cleanup).

Reproduction (deterministic)

sudo pmset schedule wake "<~8 min out>"   # an RTC wake comes up as DarkWake, not a full Wake
sudo pmset sleepnow                        # leave it asleep past the scheduled wake

A plain lid sleep does not reproduce it — that delivers a full power-on.

Validation (clean-slate grabber logs)

Poll build, reproduced DarkWake: the seized entry id vanished and the seize self-healed (4294969998 → 4295340270).

Event-driven build, real clamshell→hibernate→wake — the device notifications fired and re-seized:

warning(device_notify): keyboard terminated: entry_id=4295340270
warning(grabber): keyboard enumeration changed — re-seizing
warning(device_notify): keyboard matched: entry_id=4295341490
info(hid_seize): seized matching devices (matched_count=1)

Confirms the keyboard gets a new registry entry id on the failure, and the IOKit notifications are delivered across a hibernate cycle. No re-seize feedback loop (re-seizing doesn't re-enumerate the device; verified one re-seize then stable).

Tests

zig build test green. Per-keystroke emit:/tap-hold logs lowered to .debug so a ReleaseSafe diagnostic build doesn't balloon; recovery logs at warn (survives ReleaseFast).

jackielii added 4 commits June 9, 2026 09:18
The grabber can keep its process + IPC + vhidd alive while its
IOHIDManager seize silently goes dead: a built-in keyboard re-enumerates
across sleep/DarkWake, the manager holds a stale device ref and never
re-matches, and no keystrokes flow. PowerNotify only fires on full
kIOMessageSystemHasPoweredOn (not DarkWake/Deep-Idle), and the
device-matched/removed callbacks are logging-only, so nothing detected
or recovered this state — the user had to SSH in and restart the daemon.

Add a 5s CFRunLoopTimer watchdog. At seize time capture each seized
device's IORegistry entry ID (IOHIDDeviceGetService +
IORegistryEntryGetRegistryEntryID); each tick re-resolve every ID via
IORegistryEntryIDMatching + IOServiceGetMatchingService. A vanished ID
means the device re-enumerated under us, so applyLatestRules rebuilds
the seize. The probe queries the IORegistry directly and never opens a
device — a throwaway second IOHIDManager opened with
kIOHIDOptionsTypeNone is rejected with kIOReturnExclusiveAccess because
our own seize already holds the device.

Recovery logs at warn (survives ReleaseFast); the per-tick
seized-vs-alive dump is info, compiled out of the release build, so it
serves as gateable forensics in a ReleaseSafe build without adding noise
or overhead to users' daemons.

Open question pending a real recurrence: whether a dead keyboard gets a
fresh entry ID (recovered here) or keeps the same ID with only the mach
connection broken (would need a different signal). The info forensics
capture which case it is.
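
The per-tick decision in this first-cut watchdog can be sketched in portable C. This is an illustration under assumptions, not the removed HidSeize code: `count_vanished`, `entry_alive_fn`, and the fake registry are hypothetical names, and the liveness probe is injected as a callback where the real build resolved each ID with IORegistryEntryIDMatching + IOServiceGetMatchingService.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Liveness probe: returns true if the registry entry ID still resolves.
 * Injected so the decision logic can be exercised without IOKit. */
typedef bool (*entry_alive_fn)(uint64_t entry_id, void *ctx);

/* One watchdog tick: count the seized entry IDs that no longer resolve.
 * Any non-zero result means a device re-enumerated and the seize
 * should be rebuilt. */
size_t count_vanished(const uint64_t *seized, size_t n,
                      entry_alive_fn alive, void *ctx) {
    size_t gone = 0;
    for (size_t i = 0; i < n; i++) {
        if (!alive(seized[i], ctx)) gone++;
    }
    return gone;
}

/* Stand-in for the IORegistry: a flat list of currently live entry IDs. */
struct fake_registry {
    const uint64_t *ids;
    size_t n;
};

/* Probe implementation backed by the fake registry (linear search). */
bool fake_alive(uint64_t id, void *ctx) {
    const struct fake_registry *r = ctx;
    for (size_t i = 0; i < r->n; i++) {
        if (r->ids[i] == id) return true;
    }
    return false;
}
```

With the seized ID 4294969998 and a registry that now lists only 4295340270 (the two IDs from the validation notes in this PR), `count_vanished` returns 1 and the tick would trigger a re-seize. A same-ID failure, the open question above, would return 0 and go undetected by this probe.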
…e watch

emit:/taphold commit/pass-through/buffer/flush logged at info, so a
ReleaseSafe diagnostic build writes a line on every keystroke and the
log balloons over a multi-day wait for a recurrence. Drop them to debug
(visible only in a full Debug build); liveness/power/seize stay at info
so the dead-keyboard forensics remain visible in ReleaseSafe.

The 5s liveness poll worked but pays a (small) cost forever on a 24/7
daemon to catch a rare event. Replace it with the event itself:
IOServiceAddMatchingNotification (kIOFirstMatch + kIOTerminated) on a
keyboard matching dict, so the kernel tells us exactly when a keyboard
(re-)enumerates and we re-seize only then — zero steady-state overhead.

This is the mechanism Karabiner-Elements' iokit_service_monitor is built
on. The reproduced failure (built-in keyboard re-enumerates during a
DarkWake: old IORegistry entry terminates, new one appears) maps directly
onto a terminated+matched notification pair.

New DeviceNotify.zig owns the IONotificationPort on the run loop, arms
both notifications by draining their initial iterators silently (the
already-seized startup set), and on any later keyboard match/terminate
calls back into the daemon, which re-seizes via applyLatestRules.
Re-seizing does not re-enumerate the device, so there is no feedback loop
(verified: exactly one re-seize, then stable).

Removes the poll machinery from HidSeize (seized_entry_ids,
captureSeizedEntryIds, collectEntryIds, pollLiveness, liveSeizedIds,
registryEntryAlive, entryIdSetsMatch + test) and the liveness_timer from
the daemon. Recovery still logs at warn (survives ReleaseFast); the
device match/terminate lines log the entry ids for forensics.

- Fix stale buildMatchDicts comment (referenced the removed pollLiveness).
- Document PowerNotify's new role: a wake backstop to DeviceNotify (the
  primary, event-driven recovery), covering dropped notifications / a
  stale-without-reenumeration wake. Clarify in onSystemWake that the
  overlapping re-seizes are idempotent and double as settle retries.
- Add errdefers in DeviceNotify.init so a mid-init failure doesn't leak
  the notification iterators.

No behavior change.

jackielii changed the title from "grabber: seize-liveness watchdog to recover dead keyboard after DarkWake" to "grabber: event-driven watchdog to recover a dead keyboard after DarkWake/hibernate" on Jun 9, 2026

PowerNotify was the original attempt to recover a dead keyboard after
sleep/wake by re-seizing on kIOMessageSystemHasPoweredOn. But the real
failure happens on a DarkWake that never sends that message, so
PowerNotify couldn't see it — DeviceNotify (re-seize on the keyboard's
IOService re-enumeration) is what actually fixes it, and it's
trigger-independent.

That left PowerNotify as only a wake backstop for two unobserved cases
(stale-without-reenumeration, or a dropped device notification),
duplicating DeviceNotify's re-seize on every normal wake. Karabiner's
grabber takes the same position: its power monitor only tracks a
sleeping flag + acks sleep — it does NOT re-grab on wake; device
(re-)grab is driven entirely by IOService match/terminate notifications.

Remove PowerNotify.zig, its wiring, onSystemWake, and the now-orphaned
power-notification C bindings (IORegisterForSystemPower,
IOAllowPowerChange, kIOMessageSystem*). Keep IONotificationPort* (used by
DeviceNotify). If the dropped-notification case ever bites, this is a
clean revert.
jackielii merged commit 7540933 into main on Jun 9, 2026
2 checks passed
jackielii deleted the grabber-seize-liveness-watchdog branch on June 9, 2026 15:56