grabber: event-driven watchdog to recover a dead keyboard after DarkWake/hibernate #48
Merged
The grabber can keep its process, IPC, and vhidd alive while its IOHIDManager seize silently goes dead: a built-in keyboard re-enumerates across sleep/DarkWake, the manager holds a stale device ref and never re-matches, and no keystrokes flow. PowerNotify only fires on a full kIOMessageSystemHasPoweredOn (not DarkWake/Deep-Idle), and the device-matched/removed callbacks are logging-only, so nothing detected or recovered this state; the user had to SSH in and restart the daemon.

Add a 5s CFRunLoopTimer watchdog. At seize time, capture each seized device's IORegistry entry ID (IOHIDDeviceGetService + IORegistryEntryGetRegistryEntryID); on each tick, re-resolve every ID via IORegistryEntryIDMatching + IOServiceGetMatchingService. A vanished ID means the device re-enumerated under us, so applyLatestRules rebuilds the seize.

The probe queries the IORegistry directly and never opens a device: a throwaway second IOHIDManager opened with kIOHIDOptionsTypeNone is rejected with kIOReturnExclusiveAccess because our own seize already holds the device.

Recovery logs at warn (survives ReleaseFast); the per-tick seized-vs-alive dump is info, compiled out of the release build, so it serves as gateable forensics in a ReleaseSafe build without adding noise or overhead to users' daemons.

Open question pending a real recurrence: whether a dead keyboard gets a fresh entry ID (recovered here) or keeps the same ID with only the mach connection broken (which would need a different signal). The info forensics capture which case it is.
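The watchdog tick described in this commit reduces to a set check, which can be modeled outside IOKit. The following is a minimal Python sketch, not the daemon's Zig code: `is_alive` is a hypothetical stand-in for the IORegistryEntryIDMatching + IOServiceGetMatchingService lookup, `reseize` stands in for applyLatestRules, and the function names are invented for illustration.

```python
from typing import Callable, Iterable, Set


def vanished_entry_ids(seized: Iterable[int],
                       is_alive: Callable[[int], bool]) -> Set[int]:
    """Return the seized registry entry IDs that no longer resolve."""
    return {entry_id for entry_id in seized if not is_alive(entry_id)}


def watchdog_tick(seized: Iterable[int],
                  is_alive: Callable[[int], bool],
                  reseize: Callable[[], None]) -> bool:
    """One timer tick: re-seize only if a seized ID vanished.

    Returns True when a re-seize was triggered.
    """
    if vanished_entry_ids(seized, is_alive):
        reseize()  # stand-in for applyLatestRules (assumption)
        return True
    return False


# The ID pair reported in the validation section: the old entry is gone,
# a new one exists, so the tick triggers exactly one re-seize.
live_registry = {4295340270}
calls = []
triggered = watchdog_tick({4294969998},
                          live_registry.__contains__,
                          lambda: calls.append("reseize"))
```

Note that this check only catches the "fresh entry ID" case from the open question; a device that keeps its ID with a broken connection would still look alive to it.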
…e watch emit:/taphold commit/pass-through/buffer/flush logged at info, so a ReleaseSafe diagnostic build writes a line on every keystroke and the log balloons over a multi-day wait for a recurrence. Drop them to debug (visible only in a full Debug build); liveness/power/seize stay at info so the dead-keyboard forensics remain visible in ReleaseSafe.
The 5s liveness poll worked but pays a (small) cost forever on a 24/7 daemon to catch a rare event. Replace it with the event itself: IOServiceAddMatchingNotification (kIOFirstMatch + kIOTerminated) on a keyboard matching dict, so the kernel tells us exactly when a keyboard (re-)enumerates and we re-seize only then, with zero steady-state overhead. This is the mechanism Karabiner-Elements' iokit_service_monitor is built on. The reproduced failure (built-in keyboard re-enumerates during a DarkWake: old IORegistry entry terminates, new one appears) maps directly onto a terminated+matched notification pair.

New DeviceNotify.zig owns the IONotificationPort on the run loop, arms both notifications by draining their initial iterators silently (the already-seized startup set), and on any later keyboard match/terminate calls back into the daemon, which re-seizes via applyLatestRules. Re-seizing does not re-enumerate the device, so there is no feedback loop (verified: exactly one re-seize, then stable).

Removes the poll machinery from HidSeize (seized_entry_ids, captureSeizedEntryIds, collectEntryIds, pollLiveness, liveSeizedIds, registryEntryAlive, entryIdSetsMatch + test) and the liveness_timer from the daemon. Recovery still logs at warn (survives ReleaseFast); the device match/terminate lines log the entry ids for forensics.
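The arm-by-draining step is easy to get wrong, so here is a toy Python model of it. It relies on one documented IOKit behavior: a matching notification only fires after its iterator has been emptied, and must be emptied again inside each callback. Everything else is an assumption for illustration: the `Notification` and `Daemon` classes are hypothetical, the re-seize is a list append standing in for applyLatestRules, and the model re-seizes once per callback without attempting to reproduce how the real daemon handles a terminate+match pair.

```python
from collections import deque


class Notification:
    """Toy model of one IOKit matching notification and its iterator."""

    def __init__(self, callback):
        self._pending = deque()
        self._armed = False
        self._callback = callback

    def post(self, entry_id):
        """Kernel side: queue an event and fire the callback if armed."""
        self._pending.append(entry_id)
        if self._armed:
            self._armed = False  # disarmed until the iterator is drained again
            self._callback(self)

    def drain(self):
        """Consume every queued entry; emptying the iterator (re-)arms it."""
        ids = list(self._pending)
        self._pending.clear()
        self._armed = True
        return ids


class Daemon:
    def __init__(self):
        self.reseizes = []  # entry IDs seen by each re-seize
        self.matched = Notification(self._on_event)
        self.terminated = Notification(self._on_event)

    def start(self, startup_ids):
        # Keyboards present at startup appear in the initial "matched" iterator.
        for entry_id in startup_ids:
            self.matched.post(entry_id)
        # Drain both silently: these devices are already seized.
        self.matched.drain()
        self.terminated.drain()

    def _on_event(self, notification):
        ids = notification.drain()  # must drain, or no further callbacks arrive
        self.reseizes.append(ids)   # stand-in for applyLatestRules (assumption)


daemon = Daemon()
daemon.start([4294969998])            # startup set: no re-seize
daemon.terminated.post(4294969998)    # old registry entry terminates
daemon.matched.post(4295340270)       # new registry entry appears
```

In this model a re-seize posts no events of its own, which mirrors the commit's observation that re-seizing does not re-enumerate the device and therefore cannot loop.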
- Fix stale buildMatchDicts comment (referenced the removed pollLiveness).
- Document PowerNotify's new role: a wake backstop to DeviceNotify (the primary, event-driven recovery), covering dropped notifications / a stale-without-reenumeration wake. Clarify in onSystemWake that the overlapping re-seizes are idempotent and double as settle retries.
- Add errdefers in DeviceNotify.init so a mid-init failure doesn't leak the notification iterators.

No behavior change.
PowerNotify was the original attempt to recover a dead keyboard after sleep/wake by re-seizing on kIOMessageSystemHasPoweredOn. But the real failure happens on a DarkWake that never sends that message, so PowerNotify couldn't see it. DeviceNotify (re-seize on the keyboard's IOService re-enumeration) is what actually fixes it, and it's trigger-independent. That left PowerNotify as only a wake backstop for two unobserved cases (stale-without-reenumeration, or a dropped device notification), duplicating DeviceNotify's re-seize on every normal wake.

Karabiner's grabber takes the same position: its power monitor only tracks a sleeping flag and acks sleep; it does NOT re-grab on wake. Device (re-)grab is driven entirely by IOService match/terminate notifications.

Remove PowerNotify.zig, its wiring, onSystemWake, and the now-orphaned power-notification C bindings (IORegisterForSystemPower, IOAllowPowerChange, kIOMessageSystem*). Keep IONotificationPort* (used by DeviceNotify). If the dropped-notification case ever bites, this is a clean revert.
Problem
The grabber's IOHIDManager seize can silently die while the process, IPC, and vhidd stay healthy: the built-in keyboard re-enumerates during a DarkWake from Deep Idle/hibernate wake (old IORegistry entry terminates, a new one appears), but that wake never delivers kIOMessageSystemHasPoweredOn, so any wake-driven recovery never fires and the seize keeps holding the dead device. Result: keyboard unresponsive until the daemon is restarted (the user had to SSH in).

Every prior fix in this family (vhidd-disconnect recovery, a PowerNotify re-seize on wake, matching tweaks) was trigger-based and listened on triggers this failure doesn't raise: a DarkWake sends no power-on message.

Fix: event-driven re-seize on device re-enumeration

New DeviceNotify.zig subscribes to the IOKit registry directly: IOServiceAddMatchingNotification for kIOFirstMatch + kIOTerminated on a keyboard matching dict (IOHIDDevice, PrimaryUsagePage=GenericDesktop, PrimaryUsage=Keyboard). The kernel fires the callback exactly when a keyboard (re-)enumerates, and the daemon re-seizes via applyLatestRules. It's trigger-independent (works for DarkWake, hibernate, USB replug) and uses no polling, so there is zero steady-state overhead on a 24/7 daemon.

This is the mechanism Karabiner-Elements' iokit_service_monitor is built on. Karabiner's grabber also keys devices by registry_entry_id and, notably, its power-management monitor does not re-grab on wake; it only tracks a sleeping flag. So device enumeration, not power state, is the correct axis.

What this branch removes

PowerNotify: introduced earlier to recover on kIOMessageSystemHasPoweredOn, but it literally can't see a DarkWake. Once DeviceNotify exists, PowerNotify was only a wake backstop for two unobserved cases, duplicating the re-seize on every normal wake. Removed (matching Karabiner). Easy revert if the dropped-notification case ever bites.

Commit history kept for the reasoning trail (poll → events → remove PowerNotify → review cleanup).
Reproduction (deterministic)
A plain lid sleep does not reproduce it — that delivers a full power-on.
Validation (clean-slate grabber logs)
Poll build, reproduced DarkWake: the seized entry id vanished and self-healed (4294969998→4295340270).

Event-driven build, real clamshell→hibernate→wake, where the device notifications fired and re-seized:
Confirms the keyboard gets a new registry entry id on the failure, and the IOKit notifications are delivered across a hibernate cycle. No re-seize feedback loop (re-seizing doesn't re-enumerate the device; verified one re-seize then stable).
Tests
zig build test green. Per-keystroke emit:/tap-hold logs lowered to .debug so a ReleaseSafe diagnostic build doesn't balloon; recovery logs at warn (survives ReleaseFast).