plugin-native: teardown race in NativeNodeWrapper::run_source between detached worker thread and dlclose

## Summary

`NativeNodeWrapper::run_source` returns from its pre-Start exit branches (Shutdown / control-channel-close / pre-Start UpdateParams handshake) **without joining or signalling the per-instance worker thread**. The detached worker continues to hold an `Arc<InstanceState>` and ultimately runs `InstanceState::drop`, which calls back into the loaded `.so` via `destroy_instance`. When the test (or process) scope ends quickly after `run()` returns, the worker can still be inside the `.so` at the moment dlclose runs, causing a SIGSEGV during process teardown.

## Repro

Run the new fixture-based source tests under coverage instrumentation:

```
CARGO_LLVM_COV_TARGET_DIR=target/coverage \
  cargo llvm-cov --no-report nextest -p streamkit-plugin-native --test source_plugin
```

Without the test-side workaround introduced in the accompanying PR, three tests intermittently abort with signal 11 _after_ they have already asserted and reported `test result: ok`:

- `source_plugin_shutdown_before_start_exits_cleanly`
- `source_plugin_control_channel_close_before_start_exits_cleanly`
- `source_plugin_update_params_before_start_takes_effect`

(occasionally also the running-then-shutdown variants, which hit the same teardown path).

## Why coverage exposes it

`cargo-llvm-cov` instruments the .so with first-touch profile setup, which makes the very first FFI call into a freshly loaded library noticeably slower. The pre-Start exit branches return without ever forcing the worker through an FFI call (no Tick / Process / Flush ever runs), so the first FFI hit is the eventual `destroy_instance` triggered from worker drop — exactly when dlclose is most likely to race it.

## Pinned by

The new tests in `crates/plugin-native/tests/source_plugin.rs` call a small `drain_detached_worker().await` helper (200 ms sleep) before returning so the detached worker can complete `destroy_instance` before the subprocess starts unloading. The helper carries a doc comment that links back to this issue. The race still exists in production code — only the test reproducer is masked.

- File: `crates/plugin-native/src/wrapper.rs` (function `run_source`, ~lines 1740-1789)
- Tests: `crates/plugin-native/tests/source_plugin.rs` (helper `drain_detached_worker` and the pre-Start tests)

## Suggested fix

Make `run_source` deterministic with respect to its worker on the pre-Start exit branches. Two viable approaches:

1. Drop the locally-held `InstanceWorker` before returning from each early-exit branch and explicitly wait on a worker-shutdown signal before `run()` returns; or
2. Drive the worker through an explicit no-op FFI "warm" call during `new()` so the first FFI hit can never coincide with teardown.

The current `InstanceWorker::Drop` doc-comment already calls out that joining synchronously is unacceptable; the fix needs to route through the async path.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

plugin-native: teardown race in NativeNodeWrapper::run_source between detached worker thread and dlclose #481

Summary

Repro

Why coverage exposes it

Pinned by

Suggested fix

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

plugin-native: teardown race in NativeNodeWrapper::run_source between detached worker thread and dlclose #481

Description

Summary

Repro

Why coverage exposes it

Pinned by

Suggested fix

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions