Replace /proc process-table scanning with owned process-group handles in kill_agent #573

Merged
brickfrog merged 3 commits into feature/owned-process-handles from feature/owned-process-handles.proc-kill-1781304502863-3356972-0
Jun 12, 2026
@brickfrog

Owner

Implements .choir/context/owned-process-handles-spec.md (choir-upk0).

What

  • kill_agent now terminates agents via the owned handle: it reads the recorded pgid from .choir/pids/<sanitized-agent-id> (written at spawn by the setsid wrapper), sends SIGTERM to the process group, pauses ~300ms, sends SIGKILL, then deletes the pidfile. Ordering is preserved: pane close → worktree remove → pgroup kill → pidfile delete → registry Failed → beads mirror. New injected capabilities: read_agent_pgid?, kill_pgid?, remove_pid_file? (defaults come from @exec/@sys; src/tools performs no direct mutations).
  • New sys capability kill_pgid_sequence_best_effort (C helper choir_kill_pgid_sequence; mirrors choir_init_kill_server_pid_sequence, signaling -pgid so the whole group is targeted; rejects pgid <= 1).
  • Recovery liveness re-keyed from workspace-cwd scanning to the owned handle: @exec.agent_recorded_process_alive(project_dir, agent_id) = pidfile pgid + kill(pgid, 0).
  • Warning text distinguishes pane-close-failed+pgroup-killed from pane-close-failed+no-pidfile (process may survive).
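The SIGTERM, pause, SIGKILL sequence described above can be sketched in C roughly as follows. This is an illustration under stated assumptions, not the PR's choir_kill_pgid_sequence: the name kill_pgid_sequence_sketch, the return convention, and the early exit on ESRCH are all hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical stand-in for the PR's C helper; not the actual implementation.
 * Returns 0 if the sequence ran (or the group was already gone),
 * -1 if the pgid was rejected without signaling anything. */
int kill_pgid_sequence_sketch(pid_t pgid) {
    /* kill(-pgid, ...) is only safe for pgid > 1:
     *   pgid == 0  -> kill(0, ...) signals the caller's own group,
     *   pgid == 1  -> kill(-1, ...) broadcasts to every process we may signal,
     *   pgid <  0  -> -pgid is positive and would target a single process. */
    if (pgid <= 1) {
        return -1;
    }

    /* A negative pid argument asks kill() to signal the whole process group. */
    if (kill(-pgid, SIGTERM) == -1 && errno == ESRCH) {
        return 0; /* no such group: nothing left to kill */
    }

    /* Roughly 300ms of grace so well-behaved processes can exit on SIGTERM.
     * An interrupted sleep is not retried in this sketch. */
    struct timespec grace = { .tv_sec = 0, .tv_nsec = 300L * 1000L * 1000L };
    nanosleep(&grace, NULL);

    /* Best effort: ESRCH here just means SIGTERM was enough. */
    (void)kill(-pgid, SIGKILL);
    return 0;
}
```

Because the guard runs before any kill() call, a corrupt or missing pidfile value of 0, 1, or a negative number results in no signal being sent at all.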

Deleted (no /proc table iteration remains)

list_pids_with_cwd_prefix (+ choir_list_pids_with_cwd_prefix C body, hard cap, stubs), list_orphan_choir_mcp_stdio_pids, list_choir_mcp_stdio_pids_for_agent, read_proc_process_table_entries, process_table_ppid_from_proc_stat, ProcessTableEntry + helpers, proc_name_pid, init_reap_orphan_mcp_stdio_bridges (PDEATHSIG makes orphan shims impossible), the kill_agent /proc scan + mcp-stdio cmdline grep, the handler_disconnect RemoveWorktree cwd-scan reap (tracked-pgroup kill already covers it; pgroup-kill logic untouched), and agent_process_alive_for_workspace/process_liveness_scan_prefix. Net −863 lines.

Tests (TDD: red gate 194e457, 9 failing → green f130e33)

  • Hermetic native test: setsid'd /tmp sleeper dies from the new pgroup sequence; pgid 0/1/negative signal nothing.
  • kill_agent: kill order, pidfile delete after kill, default wiring reads+deletes a real pidfile in /tmp, non-worktree workspaces killable via pidfile, no-pidfile path kills nothing with the right warning, supervisor refusal has zero side effects.
  • exec: pidfile parse/reject (<=1, garbage, missing) and recorded-liveness probe.
  • Recovery/disconnect fixtures re-keyed to agent-id liveness.
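For illustration, the pidfile reject cases and the recorded-liveness probe covered by the exec tests might look like the C sketch below. The function names, the whitespace tolerance, and the treatment of EPERM as "alive" are assumptions; the PR's real parser is written against its own codebase and is not shown on this page. Note that kill(pgid, 0) with a positive argument probes the process whose PID equals the recorded pgid (the group leader); probing the group itself would use kill(-pgid, 0).

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <limits.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical parser: returns the recorded pgid, or -1 when the pidfile
 * content is missing, not a number, followed by garbage, or <= 1. */
pid_t parse_pgid_sketch(const char *text) {
    if (text == NULL) {
        return -1; /* missing pidfile */
    }
    errno = 0;
    char *end = NULL;
    long value = strtol(text, &end, 10);
    if (end == text || errno == ERANGE) {
        return -1; /* no digits at all, or out of range for long */
    }
    /* Tolerate trailing whitespace such as the newline a writer may append. */
    while (*end == ' ' || *end == '\t' || *end == '\r' || *end == '\n') {
        end++;
    }
    if (*end != '\0') {
        return -1; /* trailing garbage */
    }
    if (value <= 1 || value > INT_MAX) {
        return -1; /* 0, 1, and negatives are never valid targets */
    }
    return (pid_t)value;
}

/* Hypothetical liveness probe: 1 if the recorded id still exists, else 0.
 * Signal 0 performs the existence and permission checks without
 * delivering anything to the target. */
int recorded_alive_sketch(const char *pidfile_text) {
    pid_t pgid = parse_pgid_sketch(pidfile_text);
    if (pgid <= 1) {
        return 0; /* unusable pidfile: report not alive, signal nothing */
    }
    /* EPERM means the target exists but belongs to another user. */
    return (kill(pgid, 0) == 0 || errno == EPERM) ? 1 : 0;
}
```

Keeping the parser free of I/O, as in this sketch, means the same function can back both the kill path and the recovery probe, with the file read supplied by the caller.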

moon test --target native: 2006/2006 green. All spec grep gates pass.

Verification

Generated by Choir from commands executed in the leaf workspace.

  • moon test --target native
    • exit: 0
    • head: f130e33
    • output tail:
Total tests: 2006, passed: 2006, failed: 0.
  • ! grep -rn "list_pids_with_cwd_prefix" src/

  • ! grep -rn "mcp_stdio_pids" src/

  • ! grep -rn "read_proc_process_table_entries\|process_table_ppid_from_proc_stat" src/

  • ! grep -rn "choir_list_pids_with_cwd_prefix" src/sys/stub.c

kill_agent now reads the recorded pgid from .choir/pids/<sanitized-agent-id>
(written at spawn by the setsid wrapper), SIGTERMs the group, pauses ~300ms,
SIGKILLs it, then deletes the pidfile. Recovery liveness probes the same
recorded pgid with kill(pgid, 0), keyed by agent id instead of workspace.

Deleted outright: list_pids_with_cwd_prefix (+ C helper and hard cap),
list_orphan_choir_mcp_stdio_pids, list_choir_mcp_stdio_pids_for_agent,
read_proc_process_table_entries, process_table_ppid_from_proc_stat,
ProcessTableEntry and friends, init_reap_orphan_mcp_stdio_bridges (PDEATHSIG
makes orphan shims impossible), the kill_agent /proc scan gate, the
RemoveWorktree cwd-scan reap, and agent_process_alive_for_workspace.
@brickfrog

Owner Author

Choir: CI checks failing — routed to feature/owned-process-handles.proc-kill-1781304502863-3356972-0 for fixes.

The hermeticity lint bans @exec helpers in optional-parameter defaults in
src/tools/. Move pidfile parsing to pure @workspace.parse_agent_pgid (next to
agent_pid_file_path), default read_agent_pgid to the permitted read-only
@sys.read_file at the dispatch seam, and remove_pid_file to
@sys.delete_file_sync — matching the sanctioned @sys-default pattern.
@exec.read_agent_pgid (recovery liveness) reuses the same parser.
brickfrog merged commit 7335f7e into feature/owned-process-handles on Jun 12, 2026
1 check passed
brickfrog deleted the feature/owned-process-handles.proc-kill-1781304502863-3356972-0 branch on June 13, 2026 at 10:45