feat: kill agents via owned process-group handles, delete /proc scans#575
Merged
Conversation
kill_agent now reads the recorded pgid from .choir/pids/<sanitized-agent-id> (written at spawn by the setsid wrapper), SIGTERMs the group, pauses ~300ms, SIGKILLs it, then deletes the pidfile. Recovery liveness probes the same recorded pgid with kill(pgid, 0), keyed by agent id instead of workspace. Deleted outright: list_pids_with_cwd_prefix (+ C helper and hard cap), list_orphan_choir_mcp_stdio_pids, list_choir_mcp_stdio_pids_for_agent, read_proc_process_table_entries, process_table_ppid_from_proc_stat, ProcessTableEntry and friends, init_reap_orphan_mcp_stdio_bridges (PDEATHSIG makes orphan shims impossible), the kill_agent /proc scan gate, the RemoveWorktree cwd-scan reap, and agent_process_alive_for_workspace.
The hermeticity lint bans @exec helpers in optional-parameter defaults in src/tools/. Move pidfile parsing to pure @workspace.parse_agent_pgid (next to agent_pid_file_path), default read_agent_pgid to the permitted read-only @sys.read_file at the dispatch seam, and remove_pid_file to @sys.delete_file_sync — matching the sanctioned @sys-default pattern. @exec.read_agent_pgid (recovery liveness) reuses the same parser.
…proc-kill-1781304502863-3356972-0 Replace /proc process-table scanning with owned process-group handles in kill_agent
…audit-fixes2-1781309936825-3356972-0 Fix owned process handle audit regressions
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Replaces all
/procprocess-table scanning inkill_agentand the startup reaper with the owned process-group handle that spawn already records.kill_agentwas finding victims by forensics —/proc/*/cwdmatched against the worktree path,/proc/*/cmdlinegrepped forchoir mcp-stdio— which had false positives (any process thatcds into a worktree got SIGKILLed; a guard existed solely to stop an empty workspace from nuking the host) and false negatives (an agent thatcds out escaped).The owned handles already existed: spawn wraps every agent in
setsid bashand writes the pgid to.choir/pids/<id>(src/workspace/launch.mbt), and the mcp-stdio shim setsPR_SET_PDEATHSIG. This wireskill_agentto read that pgid and do a TERM→pause→KILL on the process group, then deletes the pidfile — and removes the entire scanning subsystem.Net: −494 lines of
/procforensics deleted (list_pids_with_cwd_prefix+ C helper, both mcp-stdio cmdline scans, the process-table parser, the orphan-shim startup reaper).Spec
.choir/context/owned-process-handles-spec.mdWhat changed
kill_pgid_sequence_best_effort(TERM → ~300ms → KILL) andpgid_is_alive(kill(-pgid, 0)) insrc/sys.interpret_kill_agent: pgid read from pidfile → pgroup kill → pidfile delete, via dispatch-seam-injected capabilities (lint-clean, no@sysmutation defaults insrc/tools)./proccwd probe topgid_is_aliveon the recorded group./proctable scanners, C bodies, stubs, and the orphan reaper.Review trail
kill(pid,0), which only proves the setsid leader is alive — a live descendant after the leader exits was misread as dead. Fixed withpgid_is_alive(kill(-pgid,0)); plus the dispatch-seam injection and a leader-exited/descendant-alive test.Verification
moon test --target native: 2008/2008 green on the integrated branch.src/.Known follow-ups (not in this PR, filed on choir-upk0 / spec)
.choir/pids.🤖 Generated with Claude Code