Flaky test report: committed-code failures on 2026-05-25 #276

@andrross

Tests that failed against committed code (Timer/Post Merge runs on main) in the past 24 hours, with historical context.

Summary Table

| Test | Builds Affected | First Failure | Pattern | Reproduced Locally |
|---|---|---|---|---|
| SearchWeightedRoutingIT.testStrictWeightedRoutingWithCustomString_FailOpenEnabled | 85 | 2024-06-04 | Chronic, stable | No |
| LangPainlessClientYamlTestSuiteIT (derived_fields/30) | 37 | 2024-06-11 | Chronic, worsening | No |
| WarmIndexBasicIT.testLocalDirectoryFilesAfterRefresh | 13 | 2026-04-29 | New, worsening | No |

1. SearchWeightedRoutingIT.testStrictWeightedRoutingWithCustomString_FailOpenEnabled

Recent build: #78171
Seed: BC1A22DE8640D04A:F0F173AACDEE76E5
Error: java.lang.AssertionError: expected:<0> but was:<35> — assertion at assertNoSearchInAZ (line 851)
Local reproduction: Passed with seed (not deterministically reproducible)

History: Chronic flaky test active since June 2024. 85 unique builds affected across 24 months. Failure rate is low but persistent: most months see 1-7 affected builds, with a peak of 12 in September 2024 and three months with none. No clear trend of improvement or worsening. The test exercises weighted routing with fail-open enabled and appears to have a race condition where searches still reach a zone that should be excluded.

Monthly failure counts (unique builds):

  • 2024: Jun(7), Jul(5), Aug(5), Sep(12), Oct(6), Nov(5), Dec(3)
  • 2025: Jan(1), Mar(2), Apr(4), May(1), Jun(2), Jul(6), Aug(3), Sep(1), Oct(6), Nov(6), Dec(1)
  • 2026: Jan(6), Mar(2), May(1)

2. LangPainlessClientYamlTestSuiteIT.test {yaml=painless/derived_fields/30_derived_field_search_definition/Test derived_field supported type using search definition}

Recent build: #78144
Seed: 77E9A2B8A52F2B50:FFBD9D620BD346A8
Error: hits.total didn't match expected value: expected Integer [4] but was Integer [3]
Local reproduction: Passed with seed (not deterministically reproducible)

History: Chronic flaky test active since June 2024. 37 unique builds affected. Had a notable spike in December 2024 (12 builds) and is currently worsening in May 2026 (5 builds so far). The test indexes documents and searches using derived fields; the missing hit suggests a timing issue where not all documents are visible at search time.

Monthly failure counts (unique builds):

  • 2024: Jun(5), Jul(1), Aug(1), Dec(12)
  • 2025: Mar(3), Apr(1), May(1), Jul(1), Sep(1), Nov(1), Dec(2)
  • 2026: Jan(1), Apr(2), May(5)

3. WarmIndexBasicIT.testLocalDirectoryFilesAfterRefresh

Recent build: #78171
Seed: BC1A22DE8640D04A:7E4F7959DE9E799A
Error: java.lang.AssertionError (assertTrue at line 187)
Local reproduction: Passed with seed (not deterministically reproducible)

History: New flaky test, first seen on 2026-04-29 (build 75374). 13 unique builds affected in less than a month. Rapidly worsening: 4 builds in April, 9 builds in May so far. The first failure came about two weeks after the CI runner migration from m5.8xlarge to m7a.8xlarge (mid-April 2026), which suggests, but does not establish, CPU-speed sensitivity in the test's assumptions about file state after refresh.

Monthly failure counts (unique builds):

  • 2026: Apr(4), May(9)
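
The monthly counts listed for the three tests can be cross-checked against the "Builds Affected" totals with a little arithmetic. The sketch below is only an illustration of that check: the helper names `sum_counts` and `check_total` are hypothetical, and the numbers are copied from the monthly lists in this report.

```shell
#!/bin/sh
# sum_counts COUNT...: add up monthly build counts passed as arguments.
sum_counts() {
    total=0
    for n in "$@"; do
        total=$((total + n))
    done
    echo "$total"
}

# check_total NAME REPORTED COUNT...: compare the summed monthly counts
# with the total reported in the summary table.
check_total() {
    name=$1
    reported=$2
    shift 2
    actual=$(sum_counts "$@")
    if [ "$actual" -eq "$reported" ]; then
        echo "$name: OK ($actual)"
    else
        echo "$name: MISMATCH (sum=$actual, reported=$reported)"
        return 1
    fi
}

# Counts below are grouped by year: 2024, 2025, 2026.
check_total SearchWeightedRoutingIT 85 \
    7 5 5 12 6 5 3 \
    1 2 4 1 2 6 3 1 6 6 1 \
    6 2 1
check_total LangPainlessClientYamlTestSuiteIT 37 \
    5 1 1 12 \
    3 1 1 1 1 1 2 \
    1 2 5
check_total WarmIndexBasicIT 13 \
    4 9
```

Each call prints an `OK` line when the monthly counts add up to the reported total, which is the case for all three tests here.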

Reproduction Methodology

Each test was run locally with the exact seed from the failing CI build:

./gradlew :server:internalClusterTest --tests "<class.method>" -Dtests.seed=<SEED>
./gradlew :modules:lang-painless:yamlRestTest --tests "<class.method>" -Dtests.seed=<SEED>

All three tests passed locally, which indicates that these failures are non-deterministic: the seed alone does not control the relevant scheduling and timing factors.
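
Because a single passing run says little about a low-rate flake, one possible follow-up is to repeat the same command many times and count failures. The following is a minimal sketch of such a loop, not part of the methodology used for this report; the `repeat_run` helper and the run count of 20 are assumptions made for illustration.

```shell
#!/bin/sh
# repeat_run N CMD [ARGS...]: run CMD N times and report how many runs failed.
repeat_run() {
    runs=$1
    shift
    failures=0
    i=1
    while [ "$i" -le "$runs" ]; do
        # Discard the command's output; only its exit status is counted.
        if ! "$@" >/dev/null 2>&1; then
            failures=$((failures + 1))
        fi
        i=$((i + 1))
    done
    echo "$failures/$runs runs failed"
}

# Hypothetical usage, with the same placeholders as the commands above:
# repeat_run 20 ./gradlew :server:internalClusterTest --tests "<class.method>" -Dtests.seed=<SEED>
```

Depending on the build configuration, Gradle may treat an unchanged test task as up to date and skip it on later iterations; in that case a flag such as `--rerun-tasks` would be needed for each run to actually execute the test.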

Notes

  • None of these failures are deterministically reproducible with their seeds, which is consistent with race conditions or timing-dependent behavior.
  • The WarmIndexBasicIT test is the most actionable: it is new, worsening, and possibly related to the April 2026 CI runner change.
  • The SearchWeightedRoutingIT test is a long-standing chronic flake with no sign of resolution.
  • The LangPainlessClientYamlTestSuiteIT test shows episodic spikes and may be worsening again.
