[Trino] Restore metadata-table read-amplification coverage in TestHudi*FileOperations lost to the span-leak de-flake #19037

## Task Description

**What needs to be done:**

PR #19004 de-flaked the three Trino-plugin file-operation tests (`TestHudiNoCacheFileOperations`, `TestHudiMemoryCacheFileOperations`, `TestHudiAlluxioCacheFileOperations`) by dropping all `METADATA_TABLE` operations from `getFileOperations` (and all `Alluxio.*` operations in the Alluxio class) before asserting the per-query multiset of filesystem-access spans. That removed the per-query flakiness but also removed the assertions' ability to detect metadata-table read amplification: a future change that, for example, doubles the number of metadata-table reads per query would now pass silently because no test counts those reads anymore.
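The following sketch shows why the filtered multiset can no longer see amplification. It uses a hypothetical model (`FileType`, `FileOperation`, and `filteredMultiset` are illustrative names, not the real test classes): once `METADATA_TABLE` entries are dropped before counting, a baseline and a run with doubled metadata-table reads collapse to the same multiset.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical model: each recorded span is tagged with a file type and an operation.
// These names are illustrative only; they are not the actual Trino/Hudi test types.
public class FilteredMultisetDemo {
    enum FileType { DATA, METADATA_TABLE }

    record FileOperation(FileType type, String operation) {}

    // Mirrors the post-#19004 approach: drop metadata-table operations, then count the rest.
    static Map<FileOperation, Long> filteredMultiset(List<FileOperation> spans) {
        return spans.stream()
                .filter(op -> op.type() != FileType.METADATA_TABLE)
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    public static void main(String[] args) {
        FileOperation dataRead = new FileOperation(FileType.DATA, "read");
        FileOperation mdtRead = new FileOperation(FileType.METADATA_TABLE, "read");

        List<FileOperation> baseline = List.of(dataRead, mdtRead, mdtRead);
        // A regression that doubles the metadata-table reads of the same query.
        List<FileOperation> amplified = List.of(dataRead, mdtRead, mdtRead, mdtRead, mdtRead);

        // Both collapse to {dataRead=1}, so an equality assertion cannot tell them apart.
        System.out.println(filteredMultiset(baseline).equals(filteredMultiset(amplified))); // true
    }
}
```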

Find a way to restore a regression signal on metadata-table read volume for these tests without re-introducing the span-leak flakiness that #19004 (and the earlier #18766 / #18995) fought.

**Why this task is needed:**

The main thing these `FileOperations` tests pinned down was the metadata-table read count: how many low-level reads each query issues against the metadata table. After #19004 the metadata-table dimension is no longer asserted at all, so read-amplification regressions on the Trino read path are now invisible to CI. (The Alluxio cache-hit dimension is separately re-covered by the count-independent `testReadsServedFromAlluxioCache` added in the same PR, so only the metadata-table dimension is uncovered.)

## Background: why the obvious fixes do not work

Trino resets the OpenTelemetry span exporter at the start of each `executeWithPlan`, so any span emitted by a Hudi background thread (the shared split-loader / split-manager / `ForkJoinPool.commonPool` pools that read the metadata table) after the synchronous query returns lands in the *next* measurement window. The result is a symmetric off-by-N: one query is counted long and the paired query short by almost the same amount.
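A small hypothetical model makes the off-by-N concrete. It assumes each query's metadata-table spans are either "on time" (inside its own window) or "late" (emitted by a background pool after the exporter has been reset for the next query); the class and method names are illustrative, not part of the test harness.

```java
import java.util.Arrays;

// Hypothetical model of the span leak: late spans of query i are attributed to window i + 1.
public class SpanLeakModel {
    // onTime[i]: spans of query i that land inside query i's window.
    // late[i]:   spans of query i that land in window i + 1 instead.
    // Returns the per-window counts the harness would observe.
    static int[] observedPerWindow(int[] onTime, int[] late) {
        // One extra trailing window catches the last query's late spans.
        int[] observed = new int[onTime.length + 1];
        for (int i = 0; i < onTime.length; i++) {
            observed[i] += onTime[i];
            observed[i + 1] += late[i];
        }
        return observed;
    }

    public static void main(String[] args) {
        // Two paired queries that each really issue 10 metadata-table reads.
        // Query 0 leaks 3 of its reads into query 1's window; query 1 leaks none.
        int[] observed = observedPerWindow(new int[] {7, 10}, new int[] {3, 0});
        System.out.println(Arrays.toString(observed)); // [7, 13, 0]
    }
}
```

Query 0 is counted short by 3 and query 1 long by 3, while the total across windows stays 20.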

- An **exact-count** assertion on metadata-table spans flakes; this is the original failure.
- A **tolerance / lower-bound** assertion on metadata-table spans still flakes, because the leak is bidirectional: a query can be counted short (its own spans leaked out) as well as long, and a lower bound is violated by the short case. This is the key difference from the Alluxio cache-hit check, where leaked spans only ever *add* hits (monotonic), so a lower bound there is safe.
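The asymmetry between the two cases can be sketched with a hypothetical tolerance check (`passesLowerBound` is an illustrative helper, not an existing test utility). A leak that removes spans from a query's window violates the bound whenever it exceeds the tolerance, while a leak that can only add to the count never does.

```java
// Hypothetical illustration of why a lower bound is safe for a monotonic signal
// (Alluxio cache hits, per the description above) but not for a signal whose
// spans can leak out of the window as well as into it.
public class LowerBoundDemo {
    // Tolerance-style check: pass if the observed count is at least expected - tolerance.
    static boolean passesLowerBound(int observed, int expected, int tolerance) {
        return observed >= expected - tolerance;
    }

    public static void main(String[] args) {
        int expected = 10;
        int tolerance = 2;
        int leaked = 3; // spans that crossed a window boundary

        // Bidirectional metadata-table case: the query whose own spans leaked out is counted short.
        int countedShort = expected - leaked;
        // Monotonic cache-hit case: leaked spans can only add to the observed count.
        int countedLong = expected + leaked;

        System.out.println(passesLowerBound(countedShort, expected, tolerance)); // false
        System.out.println(passesLowerBound(countedLong, expected, tolerance));  // true
    }
}
```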

## Candidate directions (to validate, not decided)

These are hypotheses for the follow-up, not a committed design:

1. **Aggregate / conservation assertion.** The leak shifts spans between adjacent windows but does not create or destroy them, so the *total* metadata-table read count across the paired measurements (or across the whole test class) should be conserved even though the per-query split is not. Asserting that aggregate would still catch a 2x amplification (which doubles the total) while tolerating the attribution jitter. Needs validation that nothing leaks past the chosen aggregation boundary (for example the last query's late spans).
2. **Deterministic drain / quiesce** of the background metadata-table reader pools before the measurement window closes, so the metadata-table spans are captured inside the synchronous query window and exact counts become deterministic again. The obstacle is that those pools are shared / global with no clean await hook exposed to the Trino test harness.
3. (Recorded as rejected) A span-stability poll, the #18766 approach, did not bound the race and is not worth revisiting.
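Candidate direction 1 could look roughly like the following sketch. It is a hypothetical helper, not a proposed final API: it assumes the test can collect a per-window metadata-table count for every query inside the chosen aggregation boundary, and it only holds if no late span escapes that boundary, which is exactly the open validation question above.

```java
import java.util.List;

// Hypothetical sketch of candidate direction 1 (aggregate / conservation assertion).
public class ConservationAssertion {
    // Sums per-window metadata-table read counts inside the aggregation boundary
    // and compares the total against an expected baseline.
    static void assertMetadataTableTotal(List<Integer> perWindowCounts, int expectedTotal) {
        int total = perWindowCounts.stream().mapToInt(Integer::intValue).sum();
        if (total != expectedTotal) {
            throw new AssertionError("metadata-table reads: expected total " + expectedTotal + " but was " + total);
        }
    }

    public static void main(String[] args) {
        // Jittered attribution (7 + 13) still sums to the baseline of 20, so this passes.
        assertMetadataTableTotal(List.of(7, 13), 20);

        // A 2x amplification doubles the total (16 + 24 = 40), so this fails.
        try {
            assertMetadataTableTotal(List.of(16, 24), 20);
            System.out.println("amplification missed");
        } catch (AssertionError expected) {
            System.out.println("amplification caught");
        }
    }
}
```

The per-query split is deliberately ignored, so this trades attribution detail for stability; a regression that moves reads from one query to another without changing the total would not be caught.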

## Task Type

Test enhancement

## Related Issues

- Originating PR: #19004 (prior attempts: #18766, #18995)
- Reviewer call-out: https://github.com/apache/hudi/pull/19004#pullrequestreview-4522612971
