impl(o11y): record telemetry attributes on LRO span by haphungw · Pull Request #5695 · googleapis/google-cloud-rust

haphungw · 2026-05-19T23:08:09Z

In our previous iteration (#5694), we introduced an operation_name(&self) query method on the public Poller trait.

To avoid polluting the public API, this PR instead relies on thread-local span propagation: because the LRO span is active in the thread context, the internal pollers (PollerImpl and DiscoveryPoller) record the LRO operation name and destination resource ID directly onto the currently active span using tracing::Span::current().record().

gemini-code-assist

Code Review

This pull request enhances tracing for Long Running Operations (LROs) by recording operation names and resource IDs in the aip151 and discovery modules, and updating the 'LRO Wait' span with additional metadata. Feedback was provided regarding the removal of instrumentation from the into_stream method in src/lro/src/internal/tracing.rs, which would lead to fragmented traces and a loss of parent span context for operations executed within the stream.

codecov · 2026-05-19T23:30:44Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 97.89%. Comparing base (c24629c) to head (35fe855).
⚠️ Report is 11 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #5695      +/-   ##
==========================================
- Coverage   97.89%   97.89%   -0.01%     
==========================================
  Files         226      226              
  Lines       55471    55485      +14     
==========================================
+ Hits        54304    54316      +12     
- Misses       1167     1169       +2

dbolduc · 2026-05-20T00:25:06Z

            query,
        );
-        let p0 = poller.poll().await;
+        let p0 = poller.poll().instrument(test_span()).await;


Shouldn't we verify the span contents?

Aside: consider writing new tests just for tracing. Often it is quickest to use existing tests to test a new feature... but when you do that enough times the tests get complicated and lose their focus.

ack. I added unit tests for both aip151 and discovery to verify the 2 specific attributes we populate for LROs.

westarle · 2026-05-20T14:58:52Z

+                span.record("gcp.longrunning.operation_name", name);
+                span.record("gcp.resource.destination.id", name);


are these ever different?

in the final design we decided that they're always the same. we expected that downstream tools (e.g., AppHub) will use Operation ID as the default.

westarle · 2026-05-20T15:17:48Z

    async fn poll(&mut self) -> Option<PollingResult<ResponseType, MetadataType>> {
        if let Some(start) = self.start.take() {
            let result = start().await;
            let (op, poll) = crate::details::handle_start(result);


Is it possible that the operation is marked done inside the start()? If so we might want to extract the name from the result above handle_start?

That's a good point. I fixed the loops to extract the name directly from result (if Ok) before calling handle_start. Now we can make sure we don't miss these attributes.

I added a test for this case.
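
A minimal, std-only sketch of the fix discussed in this thread: read the operation name from the start result before the handling step consumes it, so an operation that is already done still yields a name to record. The `Operation` type, the simplified `handle_start` signature, and `name_before_handling` are hypothetical stand-ins for illustration, not the crate's real definitions.

```rust
/// Hypothetical stand-in for the operation returned by `start()`.
#[derive(Debug, Clone)]
struct Operation {
    name: String,
    done: bool,
}

/// Hypothetical, simplified `handle_start`: returns the name to keep polling,
/// or `None` when the operation already finished or failed to start.
fn handle_start(result: Result<Operation, String>) -> Option<String> {
    match result {
        Ok(op) if !op.done => Some(op.name),
        _ => None,
    }
}

/// Reads the name directly from the start result (if `Ok`), without
/// consuming it, so it is available even for an immediately-done operation.
fn name_before_handling(result: &Result<Operation, String>) -> Option<String> {
    result.as_ref().ok().map(|op| op.name.clone())
}
```

In this model, relying only on the value returned by `handle_start` would lose the name of an operation that completes inside `start()`, which is the edge case raised above.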

westarle · 2026-05-20T15:19:41Z

            let (op, poll) = crate::details::handle_start(result);
+            #[cfg(google_cloud_unstable_tracing)]
+            if let Some(ref name) = op {
+                let span = tracing::Span::current();


if tracing is disabled, this might interfere with the customers' spans. Can we attach the span directly to the PollerImpl?

Yeah you're right. I considered a few approaches for context propagation:

1. Adding operation_name() as a trait method (impl(o11y): add operation name support to Poller interface #5694): this pollutes our public API and intermediate decorators.
2. Direct span mutation (what you were looking at): this might interfere with customers' spans.
3. tokio::task_local! (the newer commits).

We also used task-local back then for gaxi RequestRecorder, so I guess it's okay to adopt this.

Regarding attaching the span directly to PollerImpl: I wanted to avoid adding a tracing field to PollerImpl because that forces our polling struct to actively manage and carry o11y variables even when tracing is disabled.
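
To illustrate why a scoped context avoids touching customers' spans, here is a rough std-only model. It uses `thread_local!` as a stand-in for `tokio::task_local!` (the standard library has no task-local, and the two behave differently once a task migrates between threads), and `FakeSpan`, `with_lro_scope`, and this `record_destination_id` are hypothetical names, not the PR's code. The point is only that recording becomes a no-op unless the library itself installed a span.

```rust
use std::cell::RefCell;
use std::collections::BTreeMap;

/// Hypothetical stand-in for a tracing span: a bag of recorded attributes.
type FakeSpan = BTreeMap<&'static str, String>;

thread_local! {
    // `None` means no LRO scope is active, so nothing gets recorded.
    // In async code this slot would be a task-local, not a thread-local.
    static LRO_SPAN: RefCell<Option<FakeSpan>> = RefCell::new(None);
}

/// Runs `f` with a library-owned span installed, then returns the result of
/// `f` together with the attributes recorded during the scope.
/// Nested scopes and panics inside `f` are not handled in this sketch.
fn with_lro_scope<T>(f: impl FnOnce() -> T) -> (T, FakeSpan) {
    LRO_SPAN.with(|slot| *slot.borrow_mut() = Some(FakeSpan::new()));
    let out = f();
    let span = LRO_SPAN
        .with(|slot| slot.borrow_mut().take())
        .unwrap_or_default();
    (out, span)
}

/// Records the destination ID only when a library-owned span is installed.
/// Returns whether anything was recorded.
fn record_destination_id(name: &str) -> bool {
    LRO_SPAN.with(|slot| match slot.borrow_mut().as_mut() {
        Some(span) => {
            span.insert("gcp.resource.destination.id", name.to_string());
            true
        }
        None => false,
    })
}
```

Unlike writing to whatever span happens to be current, a call made outside `with_lro_scope` records nothing, so an unrelated caller-owned span is never mutated.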

westarle · 2026-05-21T13:20:35Z

    async fn poll(&mut self) -> Option<PollingResult<ResponseType, MetadataType>> {
        if let Some(start) = self.start.take() {
            let result = start().await;
            let (op, poll) = crate::details::handle_start(result);


westarle · 2026-05-22T13:08:03Z

+                0 // Initial triggers record nothing
+            };
+
+            LRO_SPAN


would it make sense to combine into a single "LRO Recorder" that captures the LRO-level span and poll-attempt count? You might be able to make it a struct so that it encapsulates the use of task-locals.

Only consider if it simplifies the code.

I think it does make the code look neater.

westarle · 2026-05-22T15:58:08Z

+#[cfg(google_cloud_unstable_tracing)]
+use crate::POLL_ATTEMPT_COUNT;
+
+#[cfg(google_cloud_unstable_tracing)]
+tokio::task_local! {
+    pub(crate) static LRO_SPAN: Span;
+}


can these be unexported and use the LroRecorder API instead of LRO_SPAN directly?

you're right, on it right now
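
One possible shape for such a recorder, sketched with std only: the storage is private to a module and callers go through a small API that covers both the destination ID and the poll-attempt count. Everything here is an assumption made for illustration (the `State` fields, the method names other than `record_destination_id`, and the use of `thread_local!` in place of `tokio::task_local!`); it is not the LroRecorder merged in this PR.

```rust
mod recorder {
    use std::cell::RefCell;

    /// Hypothetical per-operation state; the real code stores a tracing span.
    #[derive(Default)]
    struct State {
        destination_id: Option<String>,
        poll_attempts: u32,
    }

    thread_local! {
        // Private to this module: callers cannot reach the slot directly.
        static STATE: RefCell<Option<State>> = RefCell::new(None);
    }

    /// Stateless handle; every method is a no-op outside `scope`.
    pub struct LroRecorder;

    impl LroRecorder {
        /// Runs `f` with fresh state installed and clears it afterwards.
        pub fn scope<T>(f: impl FnOnce() -> T) -> T {
            STATE.with(|s| *s.borrow_mut() = Some(State::default()));
            let out = f();
            STATE.with(|s| *s.borrow_mut() = None);
            out
        }

        /// Stores the destination ID if a scope is active.
        pub fn record_destination_id(name: &str) {
            STATE.with(|s| {
                if let Some(state) = s.borrow_mut().as_mut() {
                    state.destination_id = Some(name.to_string());
                }
            })
        }

        /// Increments and returns the attempt count, or `None` outside a scope.
        pub fn next_attempt() -> Option<u32> {
            STATE.with(|s| {
                s.borrow_mut().as_mut().map(|state| {
                    state.poll_attempts += 1;
                    state.poll_attempts
                })
            })
        }

        /// Returns the recorded destination ID, if any.
        pub fn destination_id() -> Option<String> {
            STATE.with(|s| s.borrow().as_ref().and_then(|st| st.destination_id.clone()))
        }
    }
}
```

Keeping the slot unexported means the rest of the crate depends only on the recorder's methods, so the storage mechanism can change without touching the pollers.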

gemini-code-assist Bot reviewed May 19, 2026

Comment thread src/lro/src/internal/tracing.rs

haphungw mentioned this pull request May 19, 2026

impl(o11y): add operation name support to Poller interface #5694

Closed

haphungw force-pushed the stacked-pr-3-rust-declare-and-record-telemetry branch 2 times, most recently from 1471b6c to 5978295 Compare May 19, 2026 23:19

haphungw force-pushed the stacked-pr-3-rust-declare-and-record-telemetry branch from 5978295 to 7355448 Compare May 19, 2026 23:42

haphungw marked this pull request as ready for review May 20, 2026 00:13

haphungw requested a review from a team as a code owner May 20, 2026 00:13

dbolduc reviewed May 20, 2026

haphungw force-pushed the stacked-pr-3-rust-declare-and-record-telemetry branch from 7355448 to a983c67 Compare May 20, 2026 00:37

haphungw requested review from coryan and westarle May 20, 2026 13:45

westarle reviewed May 20, 2026

haphungw force-pushed the stacked-pr-3-rust-declare-and-record-telemetry branch from 3b62b4a to 5b5d0da Compare May 20, 2026 20:09

haphungw closed this May 20, 2026

haphungw force-pushed the stacked-pr-3-rust-declare-and-record-telemetry branch from 5b5d0da to 27f2c17 Compare May 20, 2026 20:17

haphungw added 4 commits May 20, 2026 20:31

impl(o11y): declare telemetry attributes on LRO span and record opera…

40c804f

…tion IDs internally

add tracing unit tests to verify telemetry attributes

4052cc3

use task-local LRO_SPAN context to prevent customer span pollution

585d9ab

fix immediate-done LRO edge case and add decorator error unit tests

aa25b16

haphungw reopened this May 20, 2026

haphungw requested a review from westarle May 20, 2026 20:56

haphungw added 4 commits May 21, 2026 20:53

encapsulate LRO attempt counting privately inside Tracing decorator

45ad64b

pre-declare LRO metrics fields in client request signals macro

177b6f0

remove redundant non-standard gcp.longrunning.operation_name attribute

99ded01

declare POLL_ATTEMPT_COUNT static tokio task-local count context

e8ac197

haphungw force-pushed the stacked-pr-3-rust-declare-and-record-telemetry branch from 58f49ef to e8ac197 Compare May 21, 2026 20:55

format

da20176

westarle reviewed May 22, 2026

haphungw force-pushed the stacked-pr-3-rust-declare-and-record-telemetry branch from b48ba0f to 6da39a2 Compare May 22, 2026 15:45

haphungw added 3 commits May 22, 2026 15:47

remove redundant comment

3fa1020

reuse global testlayer

ebaf8d9

introduce LroRecorder to refactor task local

87ae245

haphungw force-pushed the stacked-pr-3-rust-declare-and-record-telemetry branch from 6da39a2 to 87ae245 Compare May 22, 2026 15:48

haphungw requested a review from westarle May 22, 2026 15:57

westarle reviewed May 22, 2026

haphungw added 3 commits May 22, 2026 16:25

make LRO_SPAN private and expose record_destination_id helper

db9448e

expose record_error on LroRecorder to clean up until_done

a43b28e

add LroRecorder unit tests

35fe855

3 participants

codecov Bot commented May 19, 2026 •

edited

Loading