Skip to content

[shell-operator] fix: track child PIDs to avoid zombie reaper race#886

Merged
ldmonster merged 9 commits into
mainfrom
fix/sigchld-handler-race-condition
May 28, 2026
Merged

[shell-operator] fix: track child PIDs to avoid zombie reaper race#886
ldmonster merged 9 commits into
mainfrom
fix/sigchld-handler-race-condition

Conversation

@fuldaxxx
Copy link
Copy Markdown
Member

@fuldaxxx fuldaxxx commented May 22, 2026

Overview

Adds a process registry to the executor package so that the PID-1 zombie reaper can distinguish between children it owns and children spawned by the executor (hooks, scripts). This prevents a race condition where SIGCHLD handling reaps a child that cmd.Wait still expects to collect, causing "signal: child exited already" errors and lost exit codes.

What this PR does / why we need it

When shell-operator runs as PID 1 in a container, its zombie reaper loop calls waitpid(-1, ...) on every SIGCHLD. This can steal a child that was just started by the executor - cmd.Start() returns, the child exits, SIGCHLD fires, and the reaper reaps the process before cmd.Wait() gets to it. The result is a spurious exec: signal: child exited already or missing exit-code propagation.

This PR introduces:

  • processRegistry (registry.go) — a thread-safe, package-internal map that tracks PIDs of processes started by the executor. Exposes a read-only ProcessTracker interface via Tracker() for external consumers (zombie reaper) and package-private registerPID/unregisterPID helpers for the executor methods.
  • Modified Run, Output, and RunAndLogLines in executor.go — split cmd.Run() / cmd.Output() into explicit StartregisterPIDWaitunregisterPID sequences so the child PID is
    tracked for the entire lifetime between fork and reap.
  • Comprehensive tests (executor_test.go) — covers registry basics, double-unregister safety, concurrent access, and integration with Output / RunAndLogLines (including failed-start scenarios).

The zombie reaper can now call executor.Tracker().IsActive(pid) to skip PIDs that the executor will reap itself.

Signed-off-by: Ruslan Gorbunov <ruslan.gorbunov@flant.com>
@fuldaxxx fuldaxxx requested review from ipaqsa and ldmonster May 22, 2026 12:05
@fuldaxxx fuldaxxx self-assigned this May 22, 2026
@fuldaxxx fuldaxxx added the go Pull requests that update Go code label May 22, 2026
fuldaxxx added 2 commits May 23, 2026 07:07
Signed-off-by: Ruslan Gorbunov <ruslan.gorbunov@flant.com>
Signed-off-by: Ruslan Gorbunov <ruslan.gorbunov@flant.com>
@fuldaxxx fuldaxxx added the bug Something isn't working label May 23, 2026
fuldaxxx added 3 commits May 24, 2026 13:07
Signed-off-by: Ruslan Gorbunov <ruslan.gorbunov@flant.com>
Signed-off-by: Ruslan Gorbunov <ruslan.gorbunov@flant.com>
Signed-off-by: Ruslan Gorbunov <ruslan.gorbunov@flant.com>
@fuldaxxx fuldaxxx changed the title fix: add ProcessRegistry to prevent zombie reaper race [shell-operator] fix(executor): track child PIDs to avoid zombie reaper race May 25, 2026
@fuldaxxx fuldaxxx changed the title [shell-operator] fix(executor): track child PIDs to avoid zombie reaper race [shell-operator] fix: track child PIDs to avoid zombie reaper race May 25, 2026
@ldmonster ldmonster requested a review from Copilot May 25, 2026 07:18
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds executor-owned PID tracking so a PID-1 zombie reaper can avoid reaping processes that the executor will Wait() on, preventing “child exited already” / lost-exit-code races when running as PID 1.

Changes:

  • Introduced a package-internal, thread-safe PID registry with a read-only ProcessTracker interface (Tracker()).
  • Updated command execution paths (Run, Output, RunAndLogLines) to use Start → registerPID → Wait → unregisterPID to keep child PIDs tracked during their lifetime.
  • Added unit/integration-style tests around the registry and executor integration.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

File Description
pkg/executor/registry.go Adds the PID registry + ProcessTracker interface and global tracker access.
pkg/executor/executor.go Registers/unregisters child PIDs around Start/Wait for Run/Output/RunAndLogLines.
pkg/executor/executor_test.go Adds registry tests and new executor/registry integration tests.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread pkg/executor/executor_test.go Outdated
Comment thread pkg/executor/executor_test.go Outdated
Comment thread pkg/executor/executor_test.go Outdated
Comment thread pkg/executor/executor_test.go Outdated
Signed-off-by: Ruslan Gorbunov <ruslan.gorbunov@flant.com>
@fuldaxxx fuldaxxx marked this pull request as ready for review May 25, 2026 08:34
Comment thread pkg/executor/executor.go Outdated
Comment thread pkg/executor/registry.go Outdated
Signed-off-by: Ruslan Gorbunov <ruslan.gorbunov@flant.com>
@ldmonster ldmonster merged commit ea6245d into main May 28, 2026
9 checks passed
@ldmonster ldmonster deleted the fix/sigchld-handler-race-condition branch May 28, 2026 09:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working go Pull requests that update Go code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants