feat(remote): boot RC from Tracer#trace so non-Rack workloads get remote configuration #5765
Draft
bm1549 wants to merge 8 commits into
Tie.boot is about to be called on every Tracer#trace invocation, so it must be a no-op after the first boot in each process. Guards with a pid-keyed cache so that it re-boots in forked children. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Ruby fork only preserves the calling thread, so the RC worker thread is gone in the child while @started == true and @thr points at a dead thread. Adds Worker#reset_after_fork! and wires it into Component#after_fork along with a Barrier reset and a conditional start. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
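The stale state described in this commit can be reproduced with a small stand-in. The sketch below is an illustration only: SketchWorker and child_state_after_fork are hypothetical names, the method bodies are assumptions rather than the library's code, and it needs a platform where Kernel#fork is available.

```ruby
# Hypothetical stand-in for the RC worker. Names echo the commit message,
# but the bodies are an illustration, not the library's implementation.
class SketchWorker
  attr_reader :started

  def initialize(&block)
    @mutex = Mutex.new
    @block = block
    @thr = nil
    @started = false
  end

  # Starts the background thread once; later calls are no-ops.
  def start
    @mutex.synchronize do
      return if @started

      @thr = Thread.new { loop { @block.call; sleep 0.05 } }
      @started = true
    end
  end

  def thread_alive?
    !@thr.nil? && @thr.alive?
  end

  # Meant to be called in the child after fork: drops the reference to the
  # thread that did not survive and clears the started flag so #start works.
  def reset_after_fork!
    @thr = nil
    @started = false
  end
end

# Forks, inspects the inherited worker in the child, resets and conditionally
# restarts it, and reports three flags back to the parent over a pipe:
# was the worker marked started, was its thread dead, is it alive after restart.
def child_state_after_fork(worker)
  reader, writer = IO.pipe
  pid = fork do
    reader.close
    was_running = worker.started          # flag copied from the parent
    stale = !worker.thread_alive?         # the thread itself was not copied
    worker.reset_after_fork!
    worker.start if was_running           # restart only if the parent ran it
    writer.puts [was_running, stale, worker.thread_alive?].join(",")
    writer.close
    exit!(0)
  end
  writer.close
  line = reader.gets
  reader.close
  Process.wait(pid)
  line.to_s.chomp.split(",")
end
```

Calling child_state_after_fork on a started worker shows the child inheriting the started flag together with a dead thread, and a live thread again after the reset and conditional start.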
Today RC only starts when a Rack request comes in, so Sidekiq, gRPC, Karafka, and plain-Ruby workloads never get adaptive sampling or other RC-driven features. Hooks Tie.boot into Tracer#trace, the universal entrypoint every contrib funnels through. Tie.boot is pid-guarded so this costs one mutex compare per trace after the first call. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
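As a rough picture of the hook placement, the sketch below uses a toy tracer rather than the real Tracer class. SketchTracer and its boot: and logger: parameters are hypothetical. The enabled guard and the warn-level rescue follow what a later commit in this PR describes, but the code itself is an assumption, not the actual implementation.

```ruby
# Toy tracer illustrating where a boot hook could sit in the trace path.
# All names here are hypothetical; only the ordering is the point.
class SketchTracer
  # boot:   a callable standing in for the idempotent boot step
  # logger: any object that responds to #warn
  def initialize(enabled:, logger:, boot:)
    @enabled = enabled
    @logger = logger
    @boot = boot
  end

  def trace(name)
    # A disabled tracer stays inert: no boot and no span.
    return yield(nil) unless @enabled

    begin
      @boot.call
    rescue StandardError => e
      # A boot failure is reported but must not break the traced code.
      @logger.warn("remote configuration boot failed: #{e.class}: #{e.message}")
    end

    yield(name)
  end
end
```

Because every integration funnels through the trace entry point, a hook placed here is reached by background jobs and plain scripts as well as web requests, provided the boot step is cheap after its first run.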
End-to-end regression covering the original adaptive-sampling complaint: calling Datadog::Tracing.trace from a non-Rack context now starts the RC worker, while a process that never creates a span still leaves it quiescent. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Tie.boot is called on every Tracer#trace invocation, so it must be a no-op after the first boot in each process. Guards with a cache keyed on both Process.pid and the active remote component's object_id:
- Same process, same component (common path): returns PASS immediately, preserving :pass semantics for Tie::Tracing.tag so it skips per-request boot metrics on spans after the first request.
- New pid (fork): pid mismatch triggers a fresh boot in the child.
- New remote component (Datadog.configure): object_id mismatch triggers a fresh boot on the replacement component so it is not left idle.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Collapse @booted_in_pid + @booted_remote_id into a single @boot_key reference so the TTAS fast path reads one atomic value instead of two
- Fix Tie::Tracing.tag to capture active_remote once, eliminating three lock acquisitions and a TOCTOU window per tagged request
- Move Tie.boot below the tracer enabled guard so disabled tracers stay inert and RC startup never blocks a skip_trace path
- Upgrade Tie.boot rescue to logger.warn so RC startup failures surface at a visible log level in non-Rack workloads
- Fix RBS Boot#barrier type from Component::Barrier to Symbol
- Remove sleep from rc_non_rack_spec (project convention: no sleep in tests)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
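The single-key guard from the first item can be sketched as a test-and-test-and-set check. This is a minimal illustration under assumptions: SketchTie, its BOOTED return value, and the block-based boot step are hypothetical, and the treatment of an instance-variable read as a single atomic reference read is an assumption about CRuby, not something this sketch proves.

```ruby
# Hypothetical boot guard keyed on (pid, component identity).
# Not the library's Tie module; it only mirrors the idea in the commit message.
module SketchTie
  PASS = :pass       # already booted for this process and component
  BOOTED = :booted   # hypothetical marker for "this call performed the boot"

  @mutex = Mutex.new
  @boot_key = nil

  class << self
    # remote: any object standing in for the active remote component, or nil.
    # The block performs the actual (possibly blocking) boot work.
    def boot(remote)
      return nil if remote.nil?

      key = [Process.pid, remote.object_id].freeze

      # First test, without the lock: one read of @boot_key on the hot path.
      return PASS if @boot_key == key

      @mutex.synchronize do
        # Second test, under the lock: another thread may have booted meanwhile.
        return PASS if @boot_key == key

        yield remote
        # Publish the key only after the boot work has finished.
        @boot_key = key
      end
      BOOTED
    end
  end
end
```

A changed pid (after fork) or a replacement component produces a different key, so the next call falls through to the locked path and boots again. Storing the pair as one frozen array means the fast path compares against a single reference instead of two separately updated fields.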
Tracer#trace now calls Tie.boot on every span, which calls Datadog::Core::Remote.active_remote -> components.remote. Tests that stub Datadog.send(:components) with a strict double omitting :remote started failing with 'received unexpected message :remote'. Add remote: nil to those doubles so Tie.boot returns immediately (active.nil?). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
🎉 All green! ❄️ No new flaky tests detected. 🔗 Commit SHA: e4ee9f3
What does this PR do?
Calls Datadog::Core::Remote::Tie.boot from Tracer#trace so the RC worker starts on the first span in any Ruby workload (Sidekiq, gRPC, Karafka, plain scripts) without requiring the Rack middleware.
Motivation:
RC was previously only bootstrapped from the Rack request middleware, so non-Rack workloads never received remote configuration (APM sampling rules, ASM WAF updates, DI probes). This PR makes RC bootstrap unconditional for all tracing-enabled workloads.
Change log entry
None.
Additional Notes:
Key design decisions:
- Tie.boot is idempotent, keyed on (Process.pid, remote_component.object_id): only the first call per (process, RC component) pair blocks on barrier(:once). All subsequent calls return PASS via a lock-free fast path (single atomic @boot_key reference, TTAS pattern).
- Component#after_fork now calls Worker#reset_after_fork! to clear the dead thread state, then restarts the worker only if it was running in the parent.
- Tie.boot is placed after the tracer enabled guard so disabled tracers stay inert.
- Tie::Tracing.tag is fixed to capture active_remote once per call (eliminates the TOCTOU window and redundant lock acquisitions).
- Tests that stub Datadog.send(:components) with a strict double now include remote: nil so Tie.boot returns early (active.nil? guard).
How to test the change?
New specs cover all added behavior:
- spec/datadog/core/remote/tie_spec.rb: unit tests for nil remote, first boot, PASS on repeat, idempotency, pid-change reboot, component-change reboot
- spec/datadog/core/remote/worker_spec.rb: #reset_after_fork! state transitions
- spec/datadog/core/remote/component_forking_spec.rb: worker restarts in forked child
- spec/datadog/core/remote/rc_non_rack_spec.rb: integration test, RC starts on first span and stays quiescent when no span is created