fix: revive pane-watcher / no-handoff watchdog (zellij 0.45 dump-screen) #579
Merged
… (drop sidecar/spawn-capture)
…ane-watch-1781349207187-11308-0 fix: revive pane watcher no-handoff watchdog
…ane-reuse-1781352144177-11308-0 refactor: reuse pane resolver for watcher
…ane-audit-fix-1781353854343-11308-0 Fix pane watcher audit regressions
Summary
Revives the no-handoff watchdog, which had been silently dead. The watchdog is supposed to detect an agent that finished but never called `notify_parent` (or stalled) and escalate its pane tail to the parent. It never fired because the pane-content feed relied on `zellij subscribe pane-viewport`, an invocation broken on zellij 0.45.0 (the `pane-viewport` positional was removed, and `--pane-id` requires a real pane-id, not a tab name). The supervisor died on spawn, the `/tmp/choir-viewport-*` snapshot was never written, `observe_idle` always returned `None`, and the watchdog looped over nothing, which is why silent workers produced no escalation.

This is bead choir-8unx / choir-jckm.
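
To make the failure mode concrete, here is a minimal Python sketch of an idle watchdog of the kind described above. It is not the project's code (which is MoonBit): `IdleWatchdog`, `PaneState`, `Capture`, `dump_screen_argv`, and the `idle_sec`/`tail_lines` parameters are hypothetical names chosen for illustration. Only the `dump-screen` argument list is taken from this PR's text. The capture function is passed in, so the sketch never shells out by itself.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# pane_id -> full screen text, or None when the capture failed.
# Hypothetical type alias; the real adapter lives elsewhere.
Capture = Callable[[str], Optional[str]]


def dump_screen_argv(pane_id: str) -> List[str]:
    """Argument list for the invocation named in this PR (not executed here)."""
    return ["zellij", "action", "dump-screen", "--pane-id", pane_id, "--full"]


@dataclass
class PaneState:
    last_text: Optional[str] = None
    last_change: float = 0.0
    escalated: bool = False


class IdleWatchdog:
    """Escalates a pane's tail once its content has been unchanged for idle_sec."""

    def __init__(self, capture: Capture, idle_sec: float,
                 escalate: Callable[[str, str], None], tail_lines: int = 20):
        self.capture = capture
        self.idle_sec = idle_sec
        self.escalate = escalate
        self.tail_lines = tail_lines
        self.panes: Dict[str, PaneState] = {}

    def watch(self, pane_id: str, now: float) -> None:
        self.panes[pane_id] = PaneState(last_change=now)

    def observe_idle(self, pane_id: str, now: float) -> Optional[float]:
        state = self.panes[pane_id]
        text = self.capture(pane_id)
        if text is None:
            # No feed means nothing to measure. If every capture fails,
            # this branch is taken forever and the watchdog never fires.
            return None
        if text != state.last_text:
            # Content changed: restart the idle clock and re-arm escalation.
            state.last_text = text
            state.last_change = now
            state.escalated = False
        return now - state.last_change

    def tick(self, now: float) -> None:
        """Called from the periodic poller; escalates each idle pane once."""
        for pane_id, state in self.panes.items():
            idle = self.observe_idle(pane_id, now)
            if idle is None or state.escalated:
                continue
            if idle >= self.idle_sec:
                lines = (state.last_text or "").splitlines()
                self.escalate(pane_id, "\n".join(lines[-self.tail_lines:]))
                state.escalated = True
```

With a capture function that always returns `None`, `tick` can run indefinitely without ever calling `escalate`, which mirrors the silent behavior described in the summary. Passing the capture in as a callable also lets the logic be tested without a running zellij session.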
What changed
- Pane content now comes from `zellij action dump-screen --pane-id <id> --full` on the existing ~30s poller tick: focus-free (verified on host), no long-running supervisor, no `/tmp` snapshot files.
- Pane ids are resolved by reusing `@tools.resolve_zellij_pane_id` against a cached `list-panes --json --tab` snapshot (injected `list_panes_snapshot_capture` capability): no duplicate resolver, no sidecar, no spawn-flow changes.
- Recovery re-watches panes (`apply_recovered_agent` → `watch_pane`, guarded by `alive && terminal_target`), so the watchdog survives the exec-in-place reload.
- The watchdog no longer sends `LocalTerminalInput` into agent panes to demand a handoff (that text triggered codex apology-loops). It escalates the pane tail straight to the parent at a single `worker_no_handoff_idle_sec` threshold. The `Prod` enum branch and `prodded_at` state are deleted.
- Removed `choir_spawn_zellij_pane_viewport_subscribe` + SIGTERM helpers, the FFI binding, io_stub, and the snapshot-file path.
- `src/runtime` stays I/O-free: the native dump/list-panes adapters are injected at the server/bin boundary; `PaneWatcher::new` is hermetic.

Review trail
This branch took four leaf PRs and one audit round, disclosed in full:
- … `resolve_zellij_pane_id` instead of reusing it.
- … f2358fc8, 5 findings): recovery didn't re-watch; the prod was only half-removed; `src/runtime` called `@exec` directly; the spec-required live-zellij host check was missing.
- … `automerge=false`, merged via `merge_pr` after CI green).

The `spawn_diagnostics` subsystem that briefly appeared is pre-existing main infra; #577 reverted #576's edits to it, so this surface carries 0 lines of it.

Verification
- `moon test --target native`: 2009/2009 green on the integrated branch.
- … `choir_lint`): clean.
- `src/runtime` free of `@exec`.
- … `live_zellij_dump_screen_test.mbt`): when a zellij session exists, creates a scratch tab, asserts `dump-screen --pane-id` returns non-empty text and leaves focus unchanged, then closes the tab. This is the exact observable check that would have caught the original 0.45 breakage. Self-skips (and passes) in CI where no session exists.

Follow-ups (filed, not in this PR)
- … `notify_parent` from pane text on done-without-handoff (this PR forwards the raw pane tail).
- … `merge_pr`/automerge (today's 220-branch cleanup was manual).
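
For reference, the self-skipping live host check listed under Verification can be sketched roughly as follows. This is a Python illustration, not the contents of `live_zellij_dump_screen_test.mbt`: the function name and all five injected callables are hypothetical, and how the real test detects a session or reads focus is not stated in this PR.

```python
from typing import Callable


def live_dump_screen_check(
    session_exists: Callable[[], bool],
    open_scratch_tab: Callable[[], str],   # returns the scratch pane's id
    dump_screen: Callable[[str], str],     # pane id -> captured screen text
    focused_pane: Callable[[], str],       # id of the currently focused pane
    close_scratch_tab: Callable[[], None],
) -> str:
    """Returns "skipped" when no session exists, "passed" otherwise.

    Raises AssertionError if the capture is empty or focus moved.
    """
    if not session_exists():
        # No live session (for example in CI): skip instead of failing.
        return "skipped"
    pane_id = open_scratch_tab()
    try:
        focus_before = focused_pane()
        text = dump_screen(pane_id)
        assert text.strip(), "dump-screen returned empty text"
        assert focused_pane() == focus_before, "dump-screen moved focus"
    finally:
        # Always clean up the scratch tab, even when an assertion fails.
        close_scratch_tab()
    return "passed"
```

The `try`/`finally` is a design choice of this sketch so that a failing assertion does not leave a stray tab behind. Focus is sampled after the scratch tab is opened, so the comparison isolates the effect of the capture call itself.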