fix(jobs): kill/reaper lifecycle correctness + tests by sebyx07 · Pull Request #2 · developerz-ai/mcp-ssh

sebyx07 · 2026-06-26T22:15:22Z

Summary

- Spawn each job in its own process group so kills reach descendants
- Kill the process group with TERM→KILL escalation; return accurate bool status
- Reaper kills still-Running jobs before eviction (was leaking processes)
- Tests: kill_terminates_child_process, kill_unknown_id_returns_false, list_reports_all_jobs (+ earlier: kill_reaches_descendants, kill_escalates_to_sigkill_when_term_ignored, reaper_kills_running_job_before_eviction)

Tests

19 pass, 0 fail

Summary by CodeRabbit

Bug Fixes
- Improved job termination to stop descendant processes via process-group handling, with TERM→SIGKILL escalation when a job doesn’t exit in time.
- Refined job state transitions during kill/reaper flows to improve reliability and correctness.
New Features
- Added an hourly job reaper that evicts jobs older than 24 hours, terminates their process groups, and removes associated logs.
Documentation
- Updated internal guidance to reflect the split job engine and reaper structure.
Tests
- Expanded Unix-gated test coverage for process-group killing, escalation behavior, and reaper eviction outcomes.
Chores
- Updated automated review and linting/security configuration to enforce stricter standards.

Each job's `sh` now leads its own process group (`process_group(0)` on unix), and kill signals the negative pgid via `kill -- -<pgid>`. Without this, killing only the shell's pid orphaned any children it spawned (e.g. `cmd &`), leaving them running after a kill.

Test `kill_reaches_descendants` proves a backgrounded grandchild is reaped by the kill; verified it fails without the process-group change.

Co-Authored-By: Claude <noreply@anthropic.com>
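A minimal, std-only sketch of the process-group idea described in this commit. The helper names `spawn_in_own_group` and `signal_group` as written here are illustrative, and running `kill` through `sh` is an assumption; the actual code in `src/jobs/` may be structured differently.

```rust
use std::io;
use std::os::unix::process::CommandExt; // provides `process_group` (Unix only)
use std::process::{Child, Command, Stdio};

/// Spawn `sh -c <script>` as the leader of a brand-new process group.
/// With `process_group(0)` the new group's id equals the shell's pid.
fn spawn_in_own_group(script: &str) -> io::Result<Child> {
    Command::new("sh")
        .arg("-c")
        .arg(script)
        .stdin(Stdio::null())
        .stdout(Stdio::null())
        .stderr(Stdio::null())
        .process_group(0)
        .spawn()
}

/// Signal every process in group `pgid`. The negative pid after `--`
/// addresses the whole group rather than a single process.
/// Assumption: `kill` is run through `sh`; the PR may invoke it differently.
fn signal_group(pgid: u32, signal: &str) -> io::Result<bool> {
    let status = Command::new("sh")
        .arg("-c")
        .arg(format!("kill -s {signal} -- -{pgid}"))
        .stderr(Stdio::null())
        .status()?;
    Ok(status.success())
}
```

Because the shell and anything it backgrounds share one group id, a single `signal_group(child.id(), "TERM")` reaches both, which is the property `kill_reaches_descendants` checks.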

- kill() now SIGTERMs the process group, then SIGKILLs after a grace period if it hasn't exited (escalation via signal_group helper).
- Return false when the job is not Running (unknown id or already finished), since there is nothing to signal; previously always returned true.
- Wait out the grace period in an event-driven way, on the job's completion watch flag (exited_within), instead of polling.
- Tests: unknown id, finished job, and TERM-ignored escalation to KILL.

Co-Authored-By: Claude <noreply@anthropic.com>
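The escalation flow in this commit can be sketched with the standard library alone. This is an approximation under stated assumptions: the PR waits on an async completion flag, whereas this sketch substitutes a `Mutex` plus `Condvar` set by a waiter thread. The `Done` and `Job` types and the `start` function are hypothetical; only the names `kill_job`, `signal_group`, and `exited_within` come from the PR.

```rust
use std::io;
use std::os::unix::process::CommandExt;
use std::process::{Command, Stdio};
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

/// Hypothetical completion flag, set once the job's shell has exited.
#[derive(Default)]
struct Done {
    flag: Mutex<bool>,
    cond: Condvar,
}

impl Done {
    fn mark(&self) {
        *self.flag.lock().unwrap() = true;
        self.cond.notify_all();
    }

    /// Sleep until the flag is set or `grace` elapses (no polling loop).
    fn exited_within(&self, grace: Duration) -> bool {
        let guard = self.flag.lock().unwrap();
        let (guard, _) = self
            .cond
            .wait_timeout_while(guard, grace, |done| !*done)
            .unwrap();
        *guard
    }
}

/// Hypothetical job handle: group id plus completion flag.
struct Job {
    pgid: u32,
    done: Arc<Done>,
}

fn start(script: &str) -> io::Result<Job> {
    let mut child = Command::new("sh")
        .arg("-c")
        .arg(script)
        .stdin(Stdio::null())
        .stdout(Stdio::null())
        .stderr(Stdio::null())
        .process_group(0)
        .spawn()?;
    let pgid = child.id();
    let done = Arc::new(Done::default());
    let waiter = Arc::clone(&done);
    // The waiter thread reaps the shell and flips the flag on exit.
    thread::spawn(move || {
        let _ = child.wait();
        waiter.mark();
    });
    Ok(Job { pgid, done })
}

/// True only if the signal was delivered to the group.
fn signal_group(pgid: u32, signal: &str) -> bool {
    Command::new("sh")
        .arg("-c")
        .arg(format!("kill -s {signal} -- -{pgid}"))
        .stderr(Stdio::null())
        .status()
        .map(|s| s.success())
        .unwrap_or(false)
}

/// TERM first, KILL only if the job outlives the grace period.
fn kill_job(job: &Job, grace: Duration) -> bool {
    if *job.done.flag.lock().unwrap() {
        return false; // already finished: nothing to signal
    }
    if !signal_group(job.pgid, "TERM") {
        return false;
    }
    if job.done.exited_within(grace) {
        return true;
    }
    signal_group(job.pgid, "KILL")
}
```

A job started with `trap '' TERM; sleep 30` ignores the first signal, so `kill_job` falls through to KILL after the grace period; a second `kill_job` on the same job then returns false because the flag is already set.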

The hourly reaper evicted aged-out jobs (map.remove + log delete) but never signalled a still-`Running` job's process group, orphaning the whole process tree. Now it kills the group (TERM->KILL escalation) before eviction, reusing the same `kill_job` helper as `job(action="kill")`.

mod.rs crossed 300 LOC, so per the module convention the reaper + signal helpers move to `src/jobs/reaper.rs`. `reap_once(jobs, retention)` is extracted with an injectable retention so the new test drives a real eviction with zero retention and asserts the descendant is reaped.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

- kill_terminates_child_process: bg sleep 1000, kill, assert state leaves Running
- list_reports_all_jobs: two jobs, assert both appear sorted in list()
- kill_unknown_id_returns_false already existed; no duplicate added

Co-Authored-By: Claude <noreply@anthropic.com>

Adapted from the developerz-ai house template (db-mcp-gateway), scoped to mcp-ssh's modules and CLAUDE.md NEVER/Conventions rules. Beta features on (early_access), concise path instructions.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-06-26T22:15:43Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 4d212621-e8b3-4e10-b6a8-77dcf5e31a49

📥 Commits

Reviewing files that changed from the base of the PR and between 2c1ab88 and 525e009.

📒 Files selected for processing (1)

src/jobs/mod.rs

Walkthrough

The PR adds CodeRabbit repository rules and updates job handling to use process groups, a separate reaper module, and completion signaling. Unix tests expand around kill behavior, stale-job reaping, and job listing/polling.

Changes

Repository guidance

Layer / File(s)	Summary
CodeRabbit configuration `.coderabbit.yaml`	Top-level review settings, tool selection, path instructions, path filters, chat behavior, and knowledge-base settings are added.
CLAUDE references `CLAUDE.md`	The job engine module map and lookup guidance now reference `src/jobs/mod.rs` and `src/jobs/reaper.rs`.

Job process-group lifecycle

Layer / File(s)	Summary
Job wiring and state `src/jobs/mod.rs`	The job store uses the reaper submodule, tracks process-group ids and completion state, and starts shells in their own process groups.
Kill and reaper logic `src/jobs/mod.rs`, `src/jobs/reaper.rs`	`kill` delegates to `reaper::kill_job`, and the new reaper module removes stale jobs, kills their process groups, and deletes log files.
Job tests `src/jobs/mod.rs`	Job behavior tests cover inline completion, backgrounding, descendant termination, TERM-to-KILL escalation, reaper eviction, retention on kill failure, list ordering, and poll pagination.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly matches the main job kill/reaper lifecycle fix and mentions the added tests.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

coderabbitai

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.coderabbit.yaml:
- Around line 132-136: The path_filters in the .coderabbit.yaml config are
excluding lockfiles, including Cargo.lock, from review. Remove the lockfile
ignore entries from the path_filters list so dependency changes in the Rust
binary are visible during review, while keeping the other target/** exclusions
intact.

In `@src/jobs/mod.rs`:
- Around line 32-38: Replace the bare `Option<u32>` process-group identifier in
`Job` with a private `ProcessGroupId` newtype and thread that type through the
job lifecycle instead of using the raw primitive. Update `Job::run`,
`Job::kill`, and the `reaper` integration so they construct, store, and pass
`ProcessGroupId` consistently, while keeping the underlying value encapsulated.
Make sure any matching logic or signaling in `src/jobs/reaper.rs` uses the
newtype rather than `u32`, and preserve the existing semantics for the “no pid”
case with `Option<ProcessGroupId>`.
- Around line 466-480: The list test currently claims to cover a backgrounded
job but both store.run calls in the job listing test use background=false, so
the background execution path is not exercised. Update the test around store.run
and store.list in src/jobs/mod.rs so one job is launched with background=true
and the assertions verify that list includes both the inline and backgrounded
commands, ensuring the new public behavior is actually covered.

In `@src/jobs/reaper.rs`:
- Around line 20-32: The kill_job flow currently reports success even when
signal_group fails to actually send TERM or KILL to the process group. Update
kill_job to check and propagate signaling failures from signal_group before
returning true, and only continue to the KILL escalation path if TERM was
successfully delivered; use the existing kill_job, signal_group, and
exited_within helpers to keep the TERM->KILL behavior while ensuring false is
returned on any signal error.
- Around line 70-87: In reap_once in reaper.rs, stale jobs are being removed
from the jobs map before kill_job() runs, which leaves still-running process
groups untracked during TERM grace/escalation. Change the ordering so each stale
Arc<Job> is passed to kill_job() first, then remove its entry from the
Mutex<HashMap<String, Arc<Job>>> afterward, and only then delete the log file;
keep the stale collection logic intact while ensuring running jobs remain
pollable/killable until termination completes.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: ef757399-409e-427e-b0d2-90bd3e5ef6c6

📥 Commits

Reviewing files that changed from the base of the PR and between 0dda8fc and 8d30efc.

📒 Files selected for processing (4)

.coderabbit.yaml
CLAUDE.md
src/jobs/mod.rs
src/jobs/reaper.rs

- reaper: signal_group returns delivery status; kill_job reports false when TERM/KILL never reach the group
- reaper: reap_once kills stale jobs before evicting them, so a still-Running group is never orphaned mid-grace
- jobs: wrap pgid in a private ProcessGroupId newtype
- test: list_reports_all_jobs now exercises the backgrounded path
- coderabbit: stop filtering Cargo.lock/lockfiles out of review

Co-Authored-By: Claude <noreply@anthropic.com>
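For readers unfamiliar with the newtype pattern named above, here is a hedged sketch of what a private `ProcessGroupId` could look like. The constructor, the `kill_target` method, and the guard against pids 0 and 1 are assumptions added for illustration; the PR only states that the raw `u32` was wrapped.

```rust
/// Wrapper around a raw process-group id so it cannot be confused with
/// other `u32` values (plain pids, counters, exit codes).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct ProcessGroupId(u32);

impl ProcessGroupId {
    /// Assumed guard: refuse 0 and 1, since `kill -- -0` and `kill -- -1`
    /// address the caller's own group and every process respectively.
    fn new(pid: u32) -> Option<Self> {
        if pid > 1 {
            Some(ProcessGroupId(pid))
        } else {
            None
        }
    }

    /// Argument for `kill -- <target>`: the negated id selects the group.
    fn kill_target(self) -> String {
        format!("-{}", self.0)
    }
}
```

With this shape, the "no pid" case stays an `Option<ProcessGroupId>`, and signalling code can only ever receive a value that went through the constructor.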

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/jobs/reaper.rs`:
- Around line 95-106: The reaper currently removes every stale job from the jobs
map even when kill delivery fails in `kill_job`, which causes running jobs to be
evicted prematurely. Update `reap_once` in `src/jobs/reaper.rs` to inspect the
result of `kill_job(job).await` and only remove entries from `jobs.lock().await`
when the job is finished or when the running job’s group was successfully
signalled; keep failed running jobs tracked for a later reap attempt. Use the
existing `kill_job`, `reap_once`, and `jobs` map logic to separate successful
kills from failed ones before calling `remove`.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 376058fd-c1c4-42af-86a5-3a9c241c2548

📥 Commits

Reviewing files that changed from the base of the PR and between 8d30efc and 09dcbfe.

📒 Files selected for processing (3)

.coderabbit.yaml
src/jobs/mod.rs
src/jobs/reaper.rs

💤 Files with no reviewable changes (1)

.coderabbit.yaml

reap_once removed every stale job and deleted its log even when kill_job failed on a still-Running job, orphaning the live process group and deleting its log out from under it. Only evict jobs that finished or whose group was successfully signalled; keep failed running jobs tracked for a later reap pass.

Co-Authored-By: Claude <noreply@anthropic.com>
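The eviction rule described in this commit can be sketched synchronously as follows. This is a simplified model, not the PR's code: the real `reap_once(jobs, retention)` is async and works on `Arc<Job>` values, while here the `now` and `kill` parameters are hypothetical additions that make the pass deterministic, and the lock is held across the kill call purely for brevity.

```rust
use std::collections::HashMap;
use std::sync::Mutex;
use std::time::{Duration, Instant};

#[derive(Clone, Copy, Debug, PartialEq)]
enum State {
    Running,
    Finished,
}

struct Job {
    started: Instant,
    state: State,
}

/// One reaper pass. `kill` stands in for `kill_job` and returns true when
/// the job's process group was successfully signalled.
/// Returns the ids that were evicted, sorted for stable output.
fn reap_once<F>(
    jobs: &Mutex<HashMap<String, Job>>,
    retention: Duration,
    now: Instant,
    kill: F,
) -> Vec<String>
where
    F: Fn(&str, &Job) -> bool,
{
    let mut map = jobs.lock().unwrap();

    // Collect ids first so the map is not mutated while iterating.
    let stale: Vec<String> = map
        .iter()
        .filter(|(_, job)| now.duration_since(job.started) > retention)
        .map(|(id, _)| id.clone())
        .collect();

    let mut evicted = Vec::new();
    for id in stale {
        let job = &map[&id];
        // Finished jobs are always safe to evict; running jobs only after
        // their group was actually signalled.
        let safe = job.state == State::Finished || kill(&id, job);
        if safe {
            map.remove(&id);
            evicted.push(id);
        }
        // Otherwise the job stays tracked for a later reap pass.
    }
    evicted.sort();
    evicted
}
```

In the real reaper the log file would be deleted only for ids that end up in the evicted set, so a job whose kill failed keeps both its map entry and its log.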

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/jobs/mod.rs`:
- Around line 453-458: The test around reaper::reap_once is currently using
tokio::time::Instant::now(), which can skip the stale path when retention is
Duration::ZERO instead of deterministically exercising kill-failure handling.
Backdate the job’s started timestamp in the test setup within src/jobs/mod.rs so
duration_since(started) is definitely greater than the retention, and the reaper
branch that attempts to kill a stale job is always hit. Use the existing job
insertion and reaper::reap_once test case to locate the change.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 6e2df70c-4af1-445f-9452-b5557bd74ecb

📥 Commits

Reviewing files that changed from the base of the PR and between 09dcbfe and 2c1ab88.

📒 Files selected for processing (2)

src/jobs/mod.rs
src/jobs/reaper.rs

- reaper_keeps_running_job_when_kill_fails used Instant::now(), letting the test pass without exercising the kill-failure path under ZERO retention
- Backdate started by 1s so duration_since > retention is guaranteed

Co-Authored-By: Claude <noreply@anthropic.com>
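Why backdating matters can be shown in a few lines. The strict `>` comparison is an assumption inferred from the review comment's wording ("definitely greater than the retention"), and the helper names `is_stale` and `backdated` are hypothetical. The sketch uses `std::time::Instant`; the test in the PR uses `tokio::time::Instant`.

```rust
use std::time::{Duration, Instant};

/// Assumed staleness rule: strictly older than the retention window.
fn is_stale(started: Instant, now: Instant, retention: Duration) -> bool {
    now.duration_since(started) > retention
}

/// Move a timestamp into the past; falls back to `now` if the clock
/// cannot represent a point that far back.
fn backdated(now: Instant, by: Duration) -> Instant {
    now.checked_sub(by).unwrap_or(now)
}
```

With zero retention, a job stamped at the same instant the reaper runs has an elapsed time of zero, and `0 > 0` is false, so the stale branch is skipped. Backdating `started` by one second makes the comparison true regardless of how fast the test runs.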

sebyx07 and others added 5 commits June 26, 2026 16:59

sebyx07 added the claudetm Created by Claude Task Master label Jun 26, 2026

coderabbitai Bot requested changes Jun 26, 2026

Comment thread .coderabbit.yaml Outdated

Comment thread src/jobs/mod.rs

Comment thread src/jobs/mod.rs Outdated

Comment thread src/jobs/reaper.rs

Comment thread src/jobs/reaper.rs

coderabbitai Bot requested changes Jun 26, 2026

Comment thread src/jobs/reaper.rs Outdated

coderabbitai Bot requested changes Jun 26, 2026

Comment thread src/jobs/mod.rs Outdated

coderabbitai Bot approved these changes Jun 26, 2026

sebyx07 merged commit a7621dd into main Jun 26, 2026
6 checks passed

This was referenced Jun 27, 2026

feat: human-readable job ids (title-HH:MM format) #9

Merged

feat: opt-in interactive bash + one-shot installer (v1.1.0) #11

Merged

sebyx07 merged 8 commits into main from fix/job-kill-reaper-correctness
