Skip to content

osdtrace: queue_lat underflow with value 18446743547040292 #107

@taodd

Description

@taodd

Summary

osdtrace occasionally emits an op_w row whose queue_lat is an unsigned-underflow value (~UINT64_MAX − N) far larger than the row's total op_lat. The pattern reproduces on ops written to a single-replica pool — both peer slots are the (-1, 0) sentinel, the per-stage bluestore_lat / prepare / aio_wait / seq_wait / kv_commit / osd_lat values are all 0, but queue_lat is the underflow.

Sample row

Captured during a cephadm v18.2.7 (reef) cluster's test workload (.mgr pool, id 0):

osd 1 pg 0.0 op_w size 27 client 14233 tid 3126
   throttle_lat 0 recv_lat 7 dispatch_lat 6
   queue_lat 18446743547040292 osd_lat 0
   peers [(-1, 0), (-1, 0)]
   bluestore_lat 0 (prepare 0 aio_wait 0 (aio_size 0) seq_wait 0 kv_commit 0)
   op_lat 2895
  • queue_lat = 18446743547040292 (UINT64_MAX − ~9·10⁸) → ~unsigned underflow of ~9·10⁸ µs
  • op_lat = 2895 µs (sane)
  • peers = [(-1, 0), (-1, 0)] → pool size = 1, no replication path taken
  • every other per-stage latency = 0

How it was caught

The new strict verify_osdtrace_output added in PR #106 (tests/lib/verify-trace-output.sh) enforces each-sub-latency <= op_lat per row. This invariant flagged the row with:

ERROR: Sub-latency queue_lat=18446743547040292 µs > op_lat=2895 µs (op=op_w osd=1 pool=0 tid=3126)

Reproduction

Suggested investigation

src/osdtrace.bpf.c — look at the code path that handles op_w for pools with size == 1 (no replicas). The queue-start timestamp is probably set behind a code path that's only entered when replication runs; on the fast path the per-CPU queue_start_ts map entry is stale (zero / leftover from the prior op), so the end-time subtraction underflows.

Workaround in test infrastructure (PR #106)

PR #106 has the test verifier downstream of this bug. Short-term, the verifier may skip the strict sub-latency check for op_w rows where both peer slots are (-1, …) so CI stays useful while this bug is fixed. Long-term fix belongs in the BPF prog.

Metadata

Metadata

Assignees

Labels

No labels
No labels

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions