CI failure is pre-existing and unrelated to this PR
The background-job-lifecycle integration test failure is caused by a race condition in src/eca/features/background_tasks.clj — nothing touched by this PR.
Root cause: kill-job! calls kill-job-impl (sends SIGTERM) and then atomically sets :status :killed. The monitor-process-exit! daemon thread can race ahead: it dereferences the now-exited process, gets exit code 143, and calls update-job-on-exit before the kill status has been committed — overwriting it with :failed instead.
Fix: swap the order in kill-job! — set :status :killed in the registry before calling kill-job-impl, so the monitor thread always sees :killed and the guard in update-job-on-exit correctly skips the overwrite:
(defn kill-job! [job-id]
(if-let [job (get-job job-id)]
(if (= :running (:status job))
(do
;; Commit :killed status before destroying the process so that
;; monitor-process-exit! never overwrites it with :failed (exit 143).
(swap! registry* (fn [reg]
(-> reg
(assoc-in [:jobs job-id :status] :killed)
(assoc-in [:jobs job-id :ended-at] (Instant/now)))))
(kill-job-impl job)
(logger/info logger-tag "Killed background job" {:id job-id})
true)
...)))
The existing unit test "does not overwrite :killed status" in background_tasks_test.clj already covers this scenario.
Originally posted by @itkonen in #508 (comment)
CI failure is pre-existing and unrelated to this PR
The
background-job-lifecycleintegration test failure is caused by a race condition insrc/eca/features/background_tasks.clj— nothing touched by this PR.Root cause:
kill-job!callskill-job-impl(sends SIGTERM) and then atomically sets:status :killed. Themonitor-process-exit!daemon thread can race ahead: it dereferences the now-exited process, gets exit code 143, and callsupdate-job-on-exitbefore the kill status has been committed — overwriting it with:failedinstead.Fix: swap the order in
kill-job!— set:status :killedin the registry before callingkill-job-impl, so the monitor thread always sees:killedand the guard inupdate-job-on-exitcorrectly skips the overwrite:The existing unit test
"does not overwrite :killed status"inbackground_tasks_test.cljalready covers this scenario.Originally posted by @itkonen in #508 (comment)