
Flaky test report: committed-code failures on 2026-05-26 #277

@andrross


Summary

18 test failures were recorded against committed code (Timer/Post Merge Action builds on main) in the past 24 hours, across 8 distinct builds. The report covers 10 distinct tests (capped at 10, as requested).

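None of the failures that could be run locally reproduced with the original seed. Item 1 was skipped because it needs a BWC cluster, and item 10's test is not in the local checkout. This is consistent with timing-dependent flakiness: seeds control Random streams, but not thread scheduling, GC pauses, or network timing.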
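The following self-contained sketch shows why a seed alone cannot replay a timing-dependent failure. It uses plain `java.util.Random` rather than the randomized-testing framework the suite actually uses. The same seed always yields the same values. By contrast, the result of an unsynchronized read-modify-write race between two threads depends on how the scheduler interleaves them, and the seed does not capture that. The class and method names are hypothetical.

```java
import java.util.Random;

/**
 * Hypothetical demo: a seed pins down the values drawn from Random,
 * but not how threads are interleaved at run time.
 */
final class SeedVsScheduling {

    // Shared, deliberately unsynchronized counter used by racyIncrement.
    private static int counter;

    /** Draws n ints in [0, 1000) from a Random seeded with `seed`; the same seed gives the same array. */
    static int[] seededDraws(long seed, int n) {
        Random random = new Random(seed);
        int[] out = new int[n];
        for (int i = 0; i < n; i++) {
            out[i] = random.nextInt(1000);
        }
        return out;
    }

    /**
     * Two threads each increment the shared counter `perThread` times without
     * synchronization. Lost updates can leave the result below 2 * perThread,
     * and how many are lost depends on thread scheduling, which no seed controls.
     */
    static int racyIncrement(int perThread) {
        counter = 0;
        Runnable task = () -> {
            for (int i = 0; i < perThread; i++) {
                counter++; // read-modify-write: not atomic
            }
        };
        Thread a = new Thread(task);
        Thread b = new Thread(task);
        a.start();
        b.start();
        joinQuietly(a);
        joinQuietly(b);
        return counter; // join() makes both threads' writes visible here
    }

    private static void joinQuietly(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while joining " + t.getName(), e);
        }
    }
}
```

On a multi-core machine, repeated calls to `racyIncrement(100_000)` will usually return different totals. `seededDraws(42L, 5)` returns the same array on every call. This is the split described above: rerunning with the failing seed replays the randomized parameters, but not the interleaving that triggered the failure.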

Summary Table (sorted by total unique builds affected)

| # | Test | Builds Affected (all-time) | First Failure | Recent Build | Reproduced? | Pattern |
|---|------|---|---|---|---|---|
| 1 | MixedClusterClientYamlTestSuiteIT (310_match_bool_prefix) | 482 | 2024-03-25 | 78259 | Skipped (requires BWC cluster) | Chronic, stable |
| 2 | ConcurrentSeqNoVersioningIT.testSeqNoCASLinearizability | 124 | 2024-10-03 | 78284 | No | Chronic, worsening since Apr 2026 |
| 3 | ClusterShardLimitIT.testOpenIndexOverLimit | 53 | 2025-10-15 | 78218 | No | Stable, ~7/month |
| 4 | RareClusterStateIT.testDisassociateNodesWhileShardInit | 44 | 2024-11-04 | 78270 | No | Worsening since Apr 2026 |
| 5 | DataFormatAwareEngineTests.testApplyMergeChangesUpdatesCatalogAndNotifiesListeners | 15 | 2026-05-08 | 78265 | No | New, worsening (15 builds in May) |
| 6 | WarmIndexBasicIT.testLocalDirectoryFilesAfterRefresh | 14 | 2026-04-29 | 78271 | No | New, worsening |
| 7 | WarmIndexSegmentReplicationIT.testReplicationAfterForceMergeOnPrimaryShardsOnly | 13 | 2025-03-17 | 78284 | No | Chronic, low rate |
| 8 | FlightOutboundHandlerContextPropagationTests.testThreadContextPropagatedThroughStreamResponseBatch | 12 | 2026-04-14 | 78275 | No | New, worsening |
| 9 | MergedSegmentWarmerIT.testCleanupRedundantPendingMergeFile | 8 | 2025-07-31 | 78246 | No | Chronic, low rate |
| 10 | OsProbeTests.testGetProcessNativeMemoryBytes_returnsDifferenceWhenRssAnonExceedsCommitted | 1 | 2026-05-25 | 78260 | N/A (test not in current checkout) | Brand new |
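The ranking above can be derived from raw failure records roughly as shown below. This is a sketch built on an assumption: the `Failure` record (test name plus build number) is a hypothetical schema, not the format of the data source behind this report. The code groups records by test, counts distinct builds so that repeated failures in one build count once, sorts in descending order (ties broken by name), and caps the output at a limit such as 10.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

/** Hypothetical aggregation: rank tests by the number of distinct builds in which they failed. */
final class FlakyRanking {

    /** One recorded failure of one test in one CI build (hypothetical schema). */
    record Failure(String test, int build) {}

    /** Returns test -> distinct-build count, highest first (ties by name), capped at `limit` entries. */
    static Map<String, Integer> topByDistinctBuilds(List<Failure> failures, int limit) {
        // Collect the set of build numbers per test so repeated failures in one build count once.
        Map<String, Set<Integer>> buildsPerTest = new HashMap<>();
        for (Failure f : failures) {
            buildsPerTest.computeIfAbsent(f.test(), k -> new HashSet<>()).add(f.build());
        }
        Comparator<Map.Entry<String, Set<Integer>>> order =
                Comparator.comparingInt((Map.Entry<String, Set<Integer>> e) -> e.getValue().size())
                        .reversed()
                        .thenComparing(Map.Entry.<String, Set<Integer>>comparingByKey());
        return buildsPerTest.entrySet().stream()
                .sorted(order)
                .limit(limit)
                .collect(Collectors.toMap(
                        Map.Entry::getKey,
                        e -> e.getValue().size(),
                        (a, b) -> a, // keys are already unique; the merge function is never used
                        LinkedHashMap::new)); // preserve the sorted order
    }
}
```

The "Builds Affected (all-time)" column counts each test's full history. Feeding in only the last 24 hours of records would give much smaller counts and possibly a different ordering.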

Detailed Findings

1. MixedClusterClientYamlTestSuiteIT (310_match_bool_prefix)

  • Build: 78259
  • Seed: 5F29C2578D2BCC55
  • Reproduction: Skipped — requires multi-version BWC cluster infrastructure not available locally
  • History: 482 unique builds affected since March 2024. This is a chronically flaky BWC test with periodic spikes (185 builds in Sep 2024, 26 in May 2026). Outside those spikes the failure rate is stable, and the flakiness has persisted for over 2 years.
  • Pattern: Chronic, stable. Multiple test parameterizations fail together (both "complete term" and "partial term" variants).

2. ConcurrentSeqNoVersioningIT.testSeqNoCASLinearizability

  • Build: 78284
  • Seed: 3B342D4DE6E6AEC5
  • Reproduction: Did not reproduce with seed
  • History: 124 unique builds since Oct 2024. Notable spike: 35 builds in Apr 2026, 24 in May 2026 (up from 3-6/month previously).
  • Pattern: Chronic, worsening significantly since April 2026. The timing aligns with the m7a.8xlarge runner migration. Faster CPUs plausibly shift thread timing enough to expose a race in this CAS linearizability test. The failure surfaced as a cluster health timeout.
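The report does not say how the "worsening" label is assigned. One hypothetical way to make that call mechanical is to compare a month's count against the median of the months immediately before it. The window length, the 3x factor, and the `TrendCheck` name below are illustrative assumptions, not values or tools used for this report.

```java
import java.util.Arrays;

/** Hypothetical heuristic for flagging a month whose failure count jumps above its recent baseline. */
final class TrendCheck {

    /** Median of a non-empty array (mean of the two middle values for even lengths). */
    static double median(int[] values) {
        int[] sorted = values.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    /**
     * True if monthlyCounts[month] exceeds `factor` times the median of the
     * `window` months immediately before it.
     */
    static boolean isWorsening(int[] monthlyCounts, int month, int window, double factor) {
        if (month < window) {
            throw new IllegalArgumentException("not enough history before month " + month);
        }
        int[] baseline = Arrays.copyOfRange(monthlyCounts, month - window, month);
        // Floor the baseline at 1 so that, with factor >= 1, a single failure after
        // a run of zero-failure months is not flagged.
        double base = Math.max(1.0, median(baseline));
        return monthlyCounts[month] > factor * base;
    }
}
```

As a usage example, take the illustrative series `{3, 5, 6, 4, 35, 24}`. Its first four values are made up to sit inside the "3-6/month" range quoted above; only 35 (Apr) and 24 (May) come from this report. With a 4-month window and factor 3.0, April is flagged because 35 > 3 × 4.5.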

3. ClusterShardLimitIT.testOpenIndexOverLimit

  • Build: 78218
  • Seed: 1692B8C36D0111B8
  • Reproduction: Did not reproduce with seed
  • History: 53 unique builds since Oct 2025. Steady rate of 6-12 builds/month.
  • Pattern: Stable flake. The parameterized variant with writable_warm_index.enabled=true is the one failing.

4. RareClusterStateIT.testDisassociateNodesWhileShardInit

  • Build: 78270
  • Seed: CF25BD78D2BF7182
  • Reproduction: Did not reproduce with seed
  • History: 44 unique builds since Nov 2024. Spike from 3-4/month to 12 in Apr 2026 and 14 in May 2026.
  • Pattern: Chronic, worsening since April 2026. Another candidate for CPU-speed amplification on the new runners.

5. DataFormatAwareEngineTests.testApplyMergeChangesUpdatesCatalogAndNotifiesListeners

  • Build: 78265 (also 78249)
  • Seed: 8757F4613DE85FFF (build 78265), DFE8C39BE1E377C (build 78249)
  • Reproduction: Did not reproduce with either seed
  • History: 15 unique builds, all in May 2026 (first failure May 8).
  • Pattern: Brand new and worsening rapidly. Error: "afterRefresh must fire exactly once for the merge — Expected: <1> but: was <0>". This suggests a race between merge completion and refresh listener notification.
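If the race hypothesis holds, the failing shape may resemble the sketch below: a check that reads a listener counter before the asynchronous notification has run. `MergeNotifier` and its methods are hypothetical stand-ins, not the real `DataFormatAwareEngine` API. The sketch shows why an immediate read can observe 0, and how a bounded wait on a `CountDownLatch` makes the "exactly once" check independent of scheduling.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical stand-in (not the real engine API) for a component that applies a
 * merge on a background thread and then notifies an afterRefresh-style listener.
 */
final class MergeNotifier {
    private final ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "merge-notifier");
        t.setDaemon(true); // do not keep the JVM alive if a caller forgets shutdown()
        return t;
    });
    private final AtomicInteger afterRefreshCalls = new AtomicInteger();
    private final CountDownLatch afterRefreshFired = new CountDownLatch(1);

    /** Returns immediately; the listener runs later on the executor thread. */
    void applyMergeAsync() {
        executor.execute(() -> {
            // ... merge work would happen here ...
            afterRefreshCalls.incrementAndGet();
            afterRefreshFired.countDown();
        });
    }

    /** Racy check: may return 0 if the executor has not run the listener yet. */
    int afterRefreshCountNow() {
        return afterRefreshCalls.get();
    }

    /** Race-free check: waits up to the timeout for the listener, then returns the count. */
    int awaitAfterRefreshCount(long timeout, TimeUnit unit) {
        try {
            afterRefreshFired.await(timeout, unit); // false on timeout; the count then shows what happened
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return afterRefreshCalls.get();
    }

    void shutdown() {
        executor.shutdown();
    }
}
```

An assertion like "Expected: <1> but: was <0>" is what `afterRefreshCountNow()` can produce when the check wins the race. `awaitAfterRefreshCount(...)` returns 0 only when the listener truly never fired within the timeout.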

6. WarmIndexBasicIT.testLocalDirectoryFilesAfterRefresh

  • Build: 78271
  • Seed: DE8A4CBCF51653EE
  • Reproduction: Did not reproduce with seed
  • History: 14 unique builds since Apr 29, 2026 (4 in Apr, 10 in May).
  • Pattern: New and worsening. Appeared shortly after the runner migration.

7. WarmIndexSegmentReplicationIT.testReplicationAfterForceMergeOnPrimaryShardsOnly

  • Build: 78284
  • Seed: 3B342D4DE6E6AEC5
  • Reproduction: Did not reproduce with seed
  • History: 13 unique builds since Mar 2025. Low, sporadic rate (0-2/month).
  • Pattern: Chronic, low rate. Error: "Expected: a value equal to or greater than <2L> but: <0L> was less than <2L>".

8. FlightOutboundHandlerContextPropagationTests.testThreadContextPropagatedThroughStreamResponseBatch

  • Build: 78275
  • Seed: 40CB66C49A747F9F
  • Reproduction: Did not reproduce with seed
  • History: 12 unique builds since Apr 14, 2026 (4 in Apr, 8 in May).
  • Pattern: New and worsening. First appeared around the runner migration date. Thread context propagation in async streaming paths is timing-sensitive.
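Context held in a thread-local is not visible on the thread that later does the async work unless it is captured when the work is submitted and restored on the worker. OpenSearch's own `ThreadContext` has its own mechanism for this. The sketch below uses a plain `ThreadLocal` and a hypothetical `ContextPropagation.wrap` helper only to illustrate the capture/restore step that a propagation test of this kind presumably checks.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical illustration of capturing and restoring a thread-local context across an executor hop. */
final class ContextPropagation {
    /** Stand-in for per-request context (e.g. headers); not OpenSearch's ThreadContext. */
    static final ThreadLocal<String> CONTEXT = new ThreadLocal<>();

    /** Captures the caller's context now and installs it on whichever thread runs the task. */
    static Runnable wrap(Runnable task) {
        final String captured = CONTEXT.get();
        return () -> {
            String previous = CONTEXT.get();
            CONTEXT.set(captured);
            try {
                task.run();
            } finally {
                // Restore so pooled worker threads do not leak context into later tasks.
                if (previous == null) {
                    CONTEXT.remove();
                } else {
                    CONTEXT.set(previous);
                }
            }
        };
    }

    /** Runs a task that reads CONTEXT on a worker thread, optionally wrapped; returns what it saw. */
    static String observedOnWorker(String callerValue, boolean propagate) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        String[] seen = new String[1];
        CONTEXT.set(callerValue);
        try {
            Runnable read = () -> seen[0] = CONTEXT.get();
            Future<?> done = pool.submit(propagate ? wrap(read) : read);
            done.get(); // Future.get() makes the worker's write to seen[0] visible here
            return seen[0];
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            CONTEXT.remove();
            pool.shutdown();
        }
    }
}
```

Without `wrap`, the worker sees `null`, because a plain `ThreadLocal` is not inherited by pool threads. With `wrap`, it sees the caller's value. If the capture happens at the wrong moment or on the wrong thread, a test of this kind can pass or fail depending on which thread ends up running the batch.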

9. MergedSegmentWarmerIT.testCleanupRedundantPendingMergeFile

  • Build: 78246
  • Seed: DEE521945FF2C973
  • Reproduction: Did not reproduce with seed
  • History: 8 unique builds since Jul 2025. Very low rate (0-2/month).
  • Pattern: Chronic, low rate, stable.

10. OsProbeTests.testGetProcessNativeMemoryBytes_returnsDifferenceWhenRssAnonExceedsCommitted

  • Build: 78260
  • Seed: E4AF6E84948102C1
  • Reproduction: N/A — test method does not exist in the current local checkout (likely introduced in a recent commit not yet pulled)
  • History: 1 unique build (first and only failure: May 25, 2026).
  • Pattern: Brand new, single occurrence. This may be a genuine bug in newly merged code rather than a flake.

Observations

  1. Runner migration impact: Tests #2 (ConcurrentSeqNoVersioningIT), #4 (RareClusterStateIT), #6 (WarmIndexBasicIT), and #8 (FlightOutboundHandlerContextPropagationTests) all show significant worsening starting in April 2026, coinciding with the m5.8xlarge → m7a.8xlarge CI runner migration. Faster CPUs plausibly change thread timing enough to expose latent races in concurrent tests.

  2. New tests with high flake rates: Tests #5 (DataFormatAwareEngineTests) and #6 (WarmIndexBasicIT) are both new (first failures in late April/May 2026) and are already accumulating failures rapidly. These likely need immediate attention.

  3. No seed-based reproduction: None of the 8 tests that could be run locally reproduced with their original seeds. This is expected for timing-dependent races, because the seeds control randomized parameters but not thread interleaving.

  4. Chronic offenders: MixedClusterClientYamlTestSuiteIT has been flaky for over 2 years with 482 affected builds. ConcurrentSeqNoVersioningIT has been flaky since Oct 2024 with 124 affected builds.
