feat: kill agents via owned process-group handles, delete /proc scans #575

Merged: brickfrog merged 8 commits into main from feature/owned-process-handles on Jun 13, 2026

@brickfrog (Owner)

Summary

Replaces all /proc process-table scanning in kill_agent and the startup reaper with the owned process-group handle that spawn already records. kill_agent was finding victims by forensics — /proc/*/cwd matched against the worktree path, /proc/*/cmdline grepped for choir mcp-stdio — which had false positives (any process that cds into a worktree got SIGKILLed; a guard existed solely to stop an empty workspace from nuking the host) and false negatives (an agent that cds out escaped).

The owned handles already existed: spawn wraps every agent in setsid bash and writes the pgid to .choir/pids/<id> (src/workspace/launch.mbt), and the mcp-stdio shim sets PR_SET_PDEATHSIG. This wires kill_agent to read that pgid and do a TERM→pause→KILL on the process group, then deletes the pidfile — and removes the entire scanning subsystem.
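
As a rough illustration of the spawn-side handle described above, here is a Python sketch (not the project's MoonBit code). It starts a child as the leader of its own session and records the group id in a pidfile. The name `spawn_with_pidfile` and the directory layout are hypothetical; `start_new_session=True` stands in for the `setsid bash` wrapper, and agent-id sanitization is omitted.

```python
import subprocess
from pathlib import Path


def spawn_with_pidfile(argv, pid_dir, agent_id):
    """Start argv in a new session and record its process-group id.

    Hypothetical helper for illustration only.
    """
    # start_new_session=True makes the child call setsid() before exec,
    # so the child becomes a group leader and its pid equals its pgid.
    proc = subprocess.Popen(argv, start_new_session=True)

    pid_dir = Path(pid_dir)
    pid_dir.mkdir(parents=True, exist_ok=True)

    # One pidfile per agent; the content is the pgid as decimal text.
    pidfile = pid_dir / agent_id
    pidfile.write_text(f"{proc.pid}\n")
    return proc, pidfile
```

Because every descendant inherits the group unless it deliberately leaves it, the recorded number identifies the whole agent tree without inspecting any process's cwd or command line.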

Net: −494 lines of /proc forensics deleted (list_pids_with_cwd_prefix + C helper, both mcp-stdio cmdline scans, the process-table parser, the orphan-shim startup reaper).

Spec

.choir/context/owned-process-handles-spec.md

What changed

  • New kill_pgid_sequence_best_effort (TERM → ~300ms → KILL) and pgid_is_alive (kill(-pgid, 0)) in src/sys.
  • interpret_kill_agent: pgid read from pidfile → pgroup kill → pidfile delete, via dispatch-seam-injected capabilities (lint-clean, no @sys mutation defaults in src/tools).
  • Recovery liveness re-keyed from a /proc cwd probe to pgid_is_alive on the recorded group.
  • Deleted: all /proc table scanners, C bodies, stubs, and the orphan reaper.
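
The two new primitives listed above can be sketched in Python as follows. This is an assumption-laden illustration rather than the code in src/sys: the function names are borrowed from the PR, but the signatures, the `pgid <= 1` guard, and the error handling are guesses made for the example.

```python
import os
import signal
import time


def pgid_is_alive(pgid):
    """Probe a process group with signal 0; no signal is delivered."""
    try:
        os.killpg(pgid, 0)  # equivalent to kill(-pgid, 0)
    except ProcessLookupError:
        return False  # ESRCH: no process remains in the group
    except PermissionError:
        return True  # EPERM: the group exists but is not ours to signal
    return True


def kill_pgid_sequence_best_effort(pgid, pause_s=0.3):
    """SIGTERM the group, pause, then SIGKILL whatever is left."""
    # Guard added for this sketch: 0 would target our own group and
    # 1 would target init's group, so refuse both.
    if pgid <= 1:
        raise ValueError(f"refusing to signal pgid {pgid}")
    try:
        os.killpg(pgid, signal.SIGTERM)
        time.sleep(pause_s)  # give the group a chance to exit cleanly
        os.killpg(pgid, signal.SIGKILL)
    except (ProcessLookupError, PermissionError):
        pass  # best effort: group already gone or not signalable
```

One caveat when using such a probe: a child that has exited but has not yet been reaped by its parent still counts as a member of the group, so `pgid_is_alive` can report `True` until the parent waits on it.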

Review trail

Verification

  • moon test --target native: 2008/2008 green on the integrated branch.
  • grep gates pass: no scanner symbols remain in src/.

Known follow-ups (not in this PR, filed on choir-upk0 / spec)

  • PID-reuse hardening (start-time stamp alongside pgid).
  • Startup sweep of stale .choir/pids.

🤖 Generated with Claude Code

kill_agent now reads the recorded pgid from .choir/pids/<sanitized-agent-id>
(written at spawn by the setsid wrapper), SIGTERMs the group, pauses ~300ms,
SIGKILLs it, then deletes the pidfile. Recovery liveness probes the same
recorded pgid with kill(-pgid, 0), keyed by agent id instead of workspace.

Deleted outright: list_pids_with_cwd_prefix (+ C helper and hard cap),
list_orphan_choir_mcp_stdio_pids, list_choir_mcp_stdio_pids_for_agent,
read_proc_process_table_entries, process_table_ppid_from_proc_stat,
ProcessTableEntry and friends, init_reap_orphan_mcp_stdio_bridges (PDEATHSIG
makes orphan shims impossible), the kill_agent /proc scan gate, the
RemoveWorktree cwd-scan reap, and agent_process_alive_for_workspace.

The hermeticity lint bans @exec helpers in optional-parameter defaults in
src/tools/. Move pidfile parsing to pure @workspace.parse_agent_pgid (next to
agent_pid_file_path), default read_agent_pgid to the permitted read-only
@sys.read_file at the dispatch seam, and remove_pid_file to
@sys.delete_file_sync — matching the sanctioned @sys-default pattern.
@exec.read_agent_pgid (recovery liveness) reuses the same parser.
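
The split described in this commit (a pure parser, read-only defaults, and a kill capability injected by the caller) might look roughly like the Python sketch below. It is a hypothetical analogue, not the MoonBit implementation: the signatures, the rejection of pgids of 1 or less, and the choice to leave an unparseable pidfile in place are all assumptions made for the example.

```python
from pathlib import Path
from typing import Callable, Optional


def parse_agent_pgid(text: str) -> Optional[int]:
    """Pure parser: pidfile text to pgid, or None if unusable."""
    stripped = text.strip()
    if not (stripped.isascii() and stripped.isdigit()):
        return None  # empty, negative, or non-numeric content
    pgid = int(stripped)
    return pgid if pgid > 1 else None  # never hand back 0 or 1


def default_read_file(path: Path) -> Optional[str]:
    """Read-only default, standing in for the permitted file read."""
    try:
        return path.read_text()
    except OSError:
        return None


def default_remove_file(path: Path) -> None:
    """Default pidfile removal; a missing file is not an error."""
    path.unlink(missing_ok=True)


def kill_agent(
    agent_id: str,
    pid_dir: Path,
    kill_group: Callable[[int], None],  # injected, no default on purpose
    read_file: Callable[[Path], Optional[str]] = default_read_file,
    remove_file: Callable[[Path], None] = default_remove_file,
) -> bool:
    """Read the recorded pgid, kill the group, delete the pidfile."""
    pidfile = Path(pid_dir) / agent_id
    text = read_file(pidfile)
    pgid = parse_agent_pgid(text) if text is not None else None
    if pgid is None:
        return False  # nothing usable recorded, so nothing is signaled
    kill_group(pgid)
    remove_file(pidfile)
    return True
```

Keeping the process-killing capability as a required argument means a test can pass a recording fake such as `killed.append` and assert on the pgids it received, without any real process being signaled.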
…proc-kill-1781304502863-3356972-0

Replace /proc process-table scanning with owned process-group handles in kill_agent
…audit-fixes2-1781309936825-3356972-0

Fix owned process handle audit regressions
@brickfrog brickfrog merged commit 67c7cf9 into main Jun 13, 2026
1 check passed
@brickfrog brickfrog deleted the feature/owned-process-handles branch June 13, 2026 10:38