Task Description
What needs to be done:
PR #19004 de-flaked the three Trino-plugin file-operation tests (TestHudiNoCacheFileOperations, TestHudiMemoryCacheFileOperations, TestHudiAlluxioCacheFileOperations) by dropping all METADATA_TABLE operations from getFileOperations (and, in the Alluxio test class, all Alluxio.* operations) before asserting the per-query multiset of filesystem-access spans. That removed the per-query flakiness, but it also removed the assertions' ability to detect metadata-table read amplification: a future change that, for example, doubles the number of metadata-table reads per query would now pass silently, because no test counts those reads anymore.
Find a way to restore a regression signal on metadata-table read volume for these tests without re-introducing the span-leak flakiness that #19004 (and the earlier #18766 / #18995) fought.
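A minimal sketch of why the filtered multiset is blind to amplification. The span tuples, the "read" operation label, and the file_operations helper are hypothetical stand-ins for the span records and getFileOperations in the Trino tests; only the METADATA_TABLE category name comes from the text above.

```python
from collections import Counter

# Hypothetical span records: (file_type, operation). Labels are illustrative only.
def file_operations(spans, drop_metadata_table):
    """Build the per-query multiset of file operations, optionally dropping
    METADATA_TABLE entries the way the de-flaked tests now do."""
    return Counter(
        span for span in spans
        if not (drop_metadata_table and span[0] == "METADATA_TABLE")
    )

baseline = [("DATA", "read")] * 2 + [("METADATA_TABLE", "read")] * 3
# A regression that doubles metadata-table reads per query.
amplified = [("DATA", "read")] * 2 + [("METADATA_TABLE", "read")] * 6

# With METADATA_TABLE dropped, the two runs are indistinguishable.
print(file_operations(baseline, True) == file_operations(amplified, True))    # True
# Keeping METADATA_TABLE exposes the amplification.
print(file_operations(baseline, False) == file_operations(amplified, False))  # False
```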
Why this task is needed:
The metadata-table read counts were the main thing these FileOperations tests pinned down: how many low-level reads each query issues against the metadata table. After #19004 the metadata-table dimension is no longer asserted at all, so read-amplification regressions on the Trino read path are now invisible to CI. (The Alluxio cache-hit dimension is separately re-covered by the count-independent testReadsServedFromAlluxioCache added in the same PR, so only the metadata-table dimension is uncovered.)
Background: why the obvious fixes do not work
Trino resets the OpenTelemetry span exporter at the start of each executeWithPlan call, so any span emitted by a Hudi background thread (the shared split-loader / split-manager / ForkJoinPool.commonPool pools that read the metadata table) after the synchronous query returns lands in the next measurement window. The result is a symmetric off-by-N: one query is counted long, and the paired query is counted short by almost the same amount.
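A toy model of the windowing effect described above, assuming each query emits a fixed number of metadata-table spans and some of them are emitted by background threads after the query returns. The function name and the per-query counts are hypothetical; the model only captures "late spans land in the next window".

```python
def attribute_to_windows(true_counts, late):
    """Simulate per-window span counts when the exporter is reset at the start
    of each query window.

    true_counts[i] is the number of metadata-table spans query i really emits;
    late[i] is how many of those are emitted after query i returns and therefore
    land in window i + 1. Late spans from the last query are returned separately.
    """
    windows = [0] * len(true_counts)
    escaped = 0
    for i, (total, k) in enumerate(zip(true_counts, late)):
        windows[i] += total - k          # spans emitted while query i runs
        if i + 1 < len(windows):
            windows[i + 1] += k          # late spans are counted against the next query
        else:
            escaped += k                 # late spans of the last query are never measured
    return windows, escaped

windows, escaped = attribute_to_windows([10, 10, 10], [0, 3, 1])
print(windows, escaped)  # [10, 7, 12] 1
```

Query 1 is counted 3 short and query 2 is counted 2 long (3 inherited, 1 of its own leaked out), which is the "almost the same amount" asymmetry the text describes.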
- An exact-count assertion on metadata-table spans flakes: this is the original failure.
- A tolerance / lower-bound assertion on metadata-table spans still flakes, because the leak is bidirectional: a query can be counted short (its own spans leaked out) as well as long, and a lower bound is violated by the short case. This is the key difference from the Alluxio cache-hit check, where leaked spans only ever add hits (monotonic), so a lower bound there is safe.
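A small illustration of why both assertion styles fail under a bidirectional leak but a lower bound survives a monotonic one. The expected counts and the leak size of 3 are made-up numbers chosen for the example.

```python
EXPECTED = 10

# Two paired queries that each really issue 10 metadata-table reads;
# 3 spans from the first query leak into the second window.
observed = [EXPECTED - 3, EXPECTED + 3]

def exact_ok(count):
    return count == EXPECTED

def lower_bound_ok(count):
    return count >= EXPECTED

print([exact_ok(c) for c in observed])        # [False, False]
print([lower_bound_ok(c) for c in observed])  # [False, True]  (short case fails)

# Cache hits, per the text, only ever gain leaked extras, so a lower bound holds.
EXPECTED_HITS = 4
observed_hits = [EXPECTED_HITS + extra for extra in (0, 2, 5)]
print(all(h >= EXPECTED_HITS for h in observed_hits))  # True
```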
Candidate directions (to validate, not decided)
These are hypotheses for the follow-up, not a committed design:
- Aggregate / conservation assertion. The leak shifts spans between adjacent windows but does not create or destroy them, so the total metadata-table read count across the paired measurements (or across the whole test class) should be conserved even though the per-query split is not. Asserting that aggregate would still catch a 2x amplification (which doubles the total) while tolerating the attribution jitter. Needs validation that nothing leaks past the chosen aggregation boundary (for example the last query's late spans).
- Deterministic drain / quiesce of the background metadata-table reader pools before the measurement window closes, so the metadata-table spans are captured inside the synchronous query window and exact counts become deterministic again. The obstacle is that those pools are shared / global with no clean await hook exposed to the Trino test harness.
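A sketch of the aggregate / conservation idea, using the same simplified leak model as the background section (late spans from query i land in window i + 1). The helper names and counts are hypothetical. It shows the three cases the candidate direction has to handle: jitter is tolerated, a 2x amplification is caught, and late spans that escape the last window break conservation, which is the boundary caveat noted above.

```python
def measured_windows(true_counts, late):
    """Per-window counts when late spans from query i land in window i + 1.
    Late spans from the last query fall outside every window."""
    windows = [t - k for t, k in zip(true_counts, late)]
    for i, k in enumerate(late[:-1]):
        windows[i + 1] += k
    return windows

def aggregate_ok(windows, expected_per_query):
    """Conservation check: assert only the total across the paired windows."""
    return sum(windows) == sum(expected_per_query)

expected = [10, 10]

# Attribution jitter between the pair is tolerated...
print(aggregate_ok(measured_windows([10, 10], [3, 0]), expected))  # True
# ...but a 2x amplification doubles the total and is caught.
print(aggregate_ok(measured_windows([20, 20], [3, 0]), expected))  # False
# Boundary caveat: late spans from the last query escape the aggregate.
print(aggregate_ok(measured_windows([10, 10], [3, 2]), expected))  # False
```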
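A sketch of what a drain / quiesce hook could look like if the background work were routed through an executor the test harness can see. QuiescableExecutor and await_quiescence are hypothetical names, not an existing Trino or Hudi API; the sketch wraps a Python ThreadPoolExecutor and counts in-flight tasks so the caller can block until they finish before reading the span exporter.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

class QuiescableExecutor:
    """Hypothetical wrapper that tracks in-flight tasks so a test can wait for
    background work (and the spans it emits) before closing the window."""

    def __init__(self, delegate):
        self._delegate = delegate
        self._in_flight = 0
        self._cond = threading.Condition()

    def submit(self, fn, *args):
        with self._cond:
            self._in_flight += 1          # counted before the task can start

        def run():
            try:
                return fn(*args)
            finally:
                with self._cond:
                    self._in_flight -= 1
                    if self._in_flight == 0:
                        self._cond.notify_all()

        return self._delegate.submit(run)

    def await_quiescence(self, timeout):
        """Block until no tracked task is running; return False on timeout."""
        with self._cond:
            return self._cond.wait_for(lambda: self._in_flight == 0, timeout)

spans = []
spans_lock = threading.Lock()

def background_metadata_read(i):
    time.sleep(0.01)                      # finishes after the "query" has returned
    with spans_lock:
        spans.append(("METADATA_TABLE", i))

with ThreadPoolExecutor(max_workers=4) as pool:
    executor = QuiescableExecutor(pool)
    for i in range(5):
        executor.submit(background_metadata_read, i)
    # The synchronous query would return here; drain before reading spans.
    drained = executor.await_quiescence(timeout=5.0)
    count = len(spans)

print(drained, count)  # True 5
```

The wrapper only helps for work that is submitted through it. Tasks running on a process-wide pool such as ForkJoinPool.commonPool bypass it, which is exactly the obstacle the candidate direction names.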
Task Type
Test enhancement
Related Issues