
feat(remote): boot RC from Tracer#trace so non-Rack workloads get remote configuration #5765

Draft
bm1549 wants to merge 8 commits into master from brian.marks/rc-bootstrap-decouple


bm1549 commented May 15, 2026

What does this PR do?
Calls Datadog::Core::Remote::Tie.boot from Tracer#trace so the RC worker starts on the first span in any Ruby workload — Sidekiq, gRPC, Karafka, plain scripts — without requiring the Rack middleware.

Motivation:
RC was previously only bootstrapped from the Rack request middleware, so non-Rack workloads never received remote configuration (APM sampling rules, ASM WAF updates, DI probes). This PR makes RC bootstrap unconditional for all tracing-enabled workloads.

Change log entry
None.

Additional Notes:
Key design decisions:

  • Tie.boot is idempotent, keyed on (Process.pid, remote_component.object_id) — only the first call per (process, RC component) pair blocks on barrier(:once). All subsequent calls return PASS via a lock-free fast path (single atomic @boot_key reference, TTAS pattern).
  • Fork safety: Component#after_fork now calls Worker#reset_after_fork! to clear the dead thread state, then restarts the worker only if it was running in the parent.
  • Tie.boot is placed after the tracer enabled guard so disabled tracers stay inert.
  • Tie::Tracing.tag now captures active_remote once per call, which eliminates a TOCTOU window and redundant lock acquisitions.
  • Tests that stub Datadog.send(:components) with a strict double now include remote: nil so Tie.boot returns early (active.nil? guard).
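The idempotent boot described above can be sketched as follows. This is a minimal illustration, not the code in this PR: BootSketch, LOCK, and the block passed to boot are hypothetical stand-ins, and the block takes the place of the blocking barrier(:once) wait. Only PASS, @boot_key, the (Process.pid, object_id) key, and the nil-component early return come from the description.

```ruby
# Hypothetical stand-in for Tie.boot; not the real dd-trace-rb implementation.
module BootSketch
  PASS = :pass
  LOCK = Mutex.new
  @boot_key = nil

  class << self
    # Runs the block at most once per (pid, component) pair.
    # Every other call for the same pair returns PASS.
    def boot(component)
      return nil if component.nil? # mirrors the active.nil? guard

      key = [Process.pid, component.object_id].freeze

      # First test: read a single reference without taking the lock.
      return PASS if @boot_key == key

      LOCK.synchronize do
        # Second test under the lock: another thread may have booted meanwhile.
        return PASS if @boot_key == key

        result = yield component # stands in for the barrier(:once) wait
        @boot_key = key
        result
      end
    end
  end
end

component = Object.new
first  = BootSketch.boot(component) { :booted } # slow path, runs the block
second = BootSketch.boot(component) { :booted } # fast path, returns PASS
```

A forked child sees a different Process.pid and a reconfigured process sees a different object_id, so both cases miss the fast path and boot again.
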

How to test the change?
New specs cover all added behavior:

  • spec/datadog/core/remote/tie_spec.rb: unit tests for nil remote, first boot, PASS on repeat, idempotency, pid-change reboot, component-change reboot
  • spec/datadog/core/remote/worker_spec.rb: #reset_after_fork! state transitions
  • spec/datadog/core/remote/component_forking_spec.rb: worker restarts in forked child
  • spec/datadog/core/remote/rc_non_rack_spec.rb: integration test, RC starts on first span and stays quiescent when no span is created

bundle exec rspec spec/datadog/core/remote/tie_spec.rb \
  spec/datadog/core/remote/rc_non_rack_spec.rb \
  spec/datadog/core/remote/worker_spec.rb \
  spec/datadog/core/remote/component_forking_spec.rb

bm1549 and others added 7 commits May 15, 2026 13:09
Tie.boot is about to be called on every Tracer#trace invocation, so it
must be a no-op after the first boot in each process. Guards with a pid
keyed cache so it re-boots in forked children.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Ruby fork only preserves the calling thread, so the RC worker thread is
gone in the child while @started == true and @thr points at a dead
thread. Adds Worker#reset_after_fork! and wires it into
Component#after_fork along with a Barrier reset and a conditional start.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
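
A rough sketch of the fork handling described in this commit, under stated assumptions: WorkerSketch and ComponentSketch are hypothetical classes, reset_after_fork! returning the previous state is an assumption made for brevity, and the Barrier reset is not modeled. The example calls after_fork directly instead of forking, so it only shows the state transitions.

```ruby
# Hypothetical worker; only reset_after_fork!, @started and @thr are named in the PR.
class WorkerSketch
  def initialize(&block)
    @mutex = Mutex.new
    @block = block
    @thr = nil
    @started = false
  end

  def start
    @mutex.synchronize do
      return if @started

      @thr = Thread.new(&@block)
      @started = true
    end
  end

  def started?
    @started
  end

  # After fork only the forking thread exists in the child, so @thr refers to
  # a dead thread while @started still says the worker is running.
  # Clears both and reports whether the parent had started the worker.
  def reset_after_fork!
    was_started = @started
    @thr = nil
    @started = false
    was_started
  end
end

# Hypothetical component wiring the reset into an after_fork hook.
class ComponentSketch
  def initialize(worker)
    @worker = worker
  end

  def after_fork
    was_running = @worker.reset_after_fork!
    @worker.start if was_running # restart only if the parent had it running
  end
end

worker = WorkerSketch.new { :polled }
ComponentSketch.new(worker).after_fork # parent never started it: stays stopped
```

Without the reset, start would return early in the child because @started is still true, and no polling thread would ever run there.
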
Today RC only starts when a Rack request comes in, so Sidekiq, gRPC,
Karafka, and plain-Ruby workloads never get adaptive sampling or other
RC-driven features. Hooks Tie.boot into Tracer#trace, the universal
entrypoint every contrib funnels through. Tie.boot is pid-guarded so
this costs one mutex compare per trace after the first call.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
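
To illustrate where the hook sits, here is a minimal sketch assuming a hypothetical TracerSketch with an injected boot callable. It follows the ordering stated in the PR description (boot runs only after the enabled guard) and reports boot failures as warnings instead of raising; Kernel#warn stands in for the tracer's logger, and none of these names are the real Tracer internals.

```ruby
# Hypothetical tracer; the real Tracer#trace does far more than this.
class TracerSketch
  def initialize(enabled:, boot:)
    @enabled = enabled
    @boot = boot # callable standing in for Tie.boot
  end

  def trace(name)
    # Disabled tracers return before any remote configuration work happens.
    return yield(nil) unless @enabled

    begin
      @boot.call
    rescue StandardError => e
      # A boot failure must not break the traced code path.
      warn "remote configuration boot failed: #{e.class}: #{e.message}"
    end

    yield(name) # a real tracer would yield a span object here
  end
end

boots = 0
tracer = TracerSketch.new(enabled: true, boot: -> { boots += 1 })
tracer.trace('sidekiq.job') { |span| span }
```

Because every integration goes through the same entry point, one call site is enough to cover Sidekiq, gRPC, Karafka, and plain scripts.
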
End-to-end regression covering the original adaptive-sampling complaint:
calling Datadog::Tracing.trace from a non-Rack context now starts the
RC worker, while a process that never creates a span still leaves it
quiescent.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Tie.boot is called on every Tracer#trace invocation, so it must be a
no-op after the first boot in each process. Guards with a cache keyed on
both Process.pid and the active remote component's object_id:

- Same process, same component (common path): returns PASS immediately,
  preserving :pass semantics for Tie::Tracing.tag so it skips per-request
  boot metrics on spans after the first request.
- New pid (fork): pid mismatch triggers a fresh boot in the child.
- New remote component (Datadog.configure): object_id mismatch triggers
  a fresh boot on the replacement component so it is not left idle.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Collapse @booted_in_pid + @booted_remote_id into a single @boot_key
  reference so the TTAS fast path reads one atomic value instead of two
- Fix Tie::Tracing.tag to capture active_remote once, eliminating three
  lock acquisitions and a TOCTOU window per tagged request
- Move Tie.boot below the tracer enabled guard so disabled tracers stay
  inert and RC startup never blocks a skip_trace path
- Upgrade Tie.boot rescue to logger.warn so RC startup failures surface
  at a visible log level in non-Rack workloads
- Fix RBS Boot#barrier type from Component::Barrier to Symbol
- Remove sleep from rc_non_rack_spec (project convention: no sleep in tests)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
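
The capture-once fix for Tie::Tracing.tag can be illustrated with a small sketch. Everything here is hypothetical: RegistrySketch stands in for a locked reader such as active_remote, RemoteSketch for the remote component, and the tag names are made up. The racy variant is only an illustration of the general shape of such a bug, not the code that was replaced.

```ruby
# Hypothetical registry whose reader takes a lock on every call.
class RegistrySketch
  attr_reader :reads

  def initialize(remote)
    @lock = Mutex.new
    @remote = remote
    @reads = 0
  end

  def active_remote
    @lock.synchronize do
      @reads += 1
      @remote
    end
  end
end

RemoteSketch = Struct.new(:client_id, :healthy)

# Racy shape: each read takes the lock again, and the component could be
# swapped (or become nil) between the nil check and the later reads.
def tag_racy(registry, tags)
  return tags if registry.active_remote.nil?

  tags['sketch.client_id'] = registry.active_remote.client_id
  tags['sketch.status'] = registry.active_remote.healthy ? 'ready' : 'disconnected'
  tags
end

# Fixed shape: read once, then use the captured reference throughout.
def tag_once(registry, tags)
  remote = registry.active_remote
  return tags if remote.nil?

  tags['sketch.client_id'] = remote.client_id
  tags['sketch.status'] = remote.healthy ? 'ready' : 'disconnected'
  tags
end

registry = RegistrySketch.new(RemoteSketch.new('abc123', true))
tags = tag_once(registry, {})
```

Both variants produce the same tags when nothing changes concurrently; the difference is that tag_once takes the lock once and always tags from a single consistent component.
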
bm1549 added the "AI Generated" label (largely based on code generated by an AI or LLM; this label is the same across all dd-trace-* repos) May 15, 2026
dd-octo-sts Bot added the "core" (involves Datadog core libraries) and "tracing" labels May 15, 2026
Tracer#trace now calls Tie.boot on every span, which calls
Datadog::Core::Remote.active_remote -> components.remote. Tests that
stub Datadog.send(:components) with a strict double omitting :remote
started failing with 'received unexpected message :remote'. Add
remote: nil to those doubles so Tie.boot returns immediately (active.nil?).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

datadog-official Bot commented May 15, 2026

Tests

🎉 All green!

❄️ No new flaky tests detected
🧪 All tests passed

🎯 Code Coverage (details)
Patch Coverage: 96.00%
Overall Coverage: 97.15% (-0.01%)

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: e4ee9f3
