Flaky Test Report: Committed-Code Failures on 2026-05-25
Tests that failed against committed code (Timer/Post Merge runs on main) in the past 24 hours, with historical context.
Summary Table
| Test | Builds Affected | First Failure | Pattern | Reproduced Locally |
|---|---|---|---|---|
| SearchWeightedRoutingIT.testStrictWeightedRoutingWithCustomString_FailOpenEnabled | 85 | 2024-06-04 | Chronic, stable | No |
| LangPainlessClientYamlTestSuiteIT (derived_fields/30) | 37 | 2024-06-11 | Chronic, worsening | No |
| WarmIndexBasicIT.testLocalDirectoryFilesAfterRefresh | 13 | 2026-04-29 | New, worsening | No |
1. SearchWeightedRoutingIT.testStrictWeightedRoutingWithCustomString_FailOpenEnabled
Recent build: #78171
Seed: BC1A22DE8640D04A:F0F173AACDEE76E5
Error: java.lang.AssertionError: expected:<0> but was:<35> — assertion at assertNoSearchInAZ (line 851)
Local reproduction: Passed with seed (not deterministically reproducible)
History: Chronic flaky test active since June 2024. 85 unique builds affected across 24 months. Failure rate is low but persistent (typically 1-6 builds/month). No clear trend of improvement or worsening. The test exercises weighted routing with fail-open enabled and appears to have a race condition where searches still reach a zone that should be excluded.
Monthly failure counts (unique builds):
- 2024: Jun(7), Jul(5), Aug(5), Sep(12), Oct(6), Nov(5), Dec(3)
- 2025: Jan(1), Mar(2), Apr(4), May(1), Jun(2), Jul(6), Aug(3), Sep(1), Oct(6), Nov(6), Dec(1)
- 2026: Jan(6), Mar(2), May(1)
2. LangPainlessClientYamlTestSuiteIT.test {yaml=painless/derived_fields/30_derived_field_search_definition/Test derived_field supported type using search definition}
Recent build: #78144
Seed: 77E9A2B8A52F2B50:FFBD9D620BD346A8
Error: hits.total didn't match expected value: expected Integer [4] but was Integer [3]
Local reproduction: Passed with seed (not deterministically reproducible)
History: Chronic flaky test active since June 2024. 37 unique builds affected. It spiked in December 2024 (12 builds) and is worsening again in May 2026 (5 builds so far, its highest monthly count since that spike). The test indexes documents and then searches them using derived fields; the missing hit suggests a timing issue in which not all indexed documents are visible at search time.
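The suspected mechanism can be illustrated with a toy model of near-real-time search, where indexed documents only become searchable after a refresh. This is a simplified sketch, not OpenSearch code: the class `NrtIndex` and its methods are hypothetical, and it only shows how a search issued between indexing and refresh returns one hit fewer than expected.

```java
import java.util.ArrayList;
import java.util.List;

public class NrtIndex {
    // Documents accepted by index() but not yet visible to searches.
    private final List<String> buffer = new ArrayList<>();
    // Documents made visible by the most recent refresh().
    private final List<String> searchable = new ArrayList<>();

    synchronized void index(String doc) {
        buffer.add(doc);
    }

    /** Moves every buffered document into the searchable view. */
    synchronized void refresh() {
        searchable.addAll(buffer);
        buffer.clear();
    }

    /** Number of hits a search would see right now. */
    synchronized int hitsTotal() {
        return searchable.size();
    }

    public static void main(String[] args) {
        NrtIndex index = new NrtIndex();
        for (int i = 1; i <= 3; i++) {
            index.index("doc" + i);
        }
        index.refresh();
        index.index("doc4");                   // indexed, but no refresh has run yet
        System.out.println(index.hitsTotal()); // 3: the fourth document is not yet visible
        index.refresh();
        System.out.println(index.hitsTotal()); // 4 once a refresh has run
    }
}
```

If this is the cause, the usual remedy is to make sure a refresh completes before the search (for example via the `refresh` parameter on the indexing request); whether the `30_derived_field_search_definition` test already does this has not been checked here.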
Monthly failure counts (unique builds):
- 2024: Jun(5), Jul(1), Aug(1), Dec(12)
- 2025: Mar(3), Apr(1), May(1), Jul(1), Sep(1), Nov(1), Dec(2)
- 2026: Jan(1), Apr(2), May(5)
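The per-month counts listed for the first two tests can be cross-checked against their stated totals (85 and 37 builds) with a small tally. A minimal Java sketch, assuming the `Mon(n)` entry format used in the lists above; the class and method names are hypothetical.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FailureTally {
    // Matches entries such as "Jun(7)": a three-letter month followed by a count in parentheses.
    private static final Pattern ENTRY = Pattern.compile("([A-Z][a-z]{2})\\((\\d+)\\)");

    /** Sums every "Mon(n)" entry found in the given year lines. */
    static int total(List<String> yearLines) {
        int sum = 0;
        for (String line : yearLines) {
            Matcher m = ENTRY.matcher(line);
            while (m.find()) {
                sum += Integer.parseInt(m.group(2));
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        List<String> weightedRouting = List.of(
            "2024: Jun(7), Jul(5), Aug(5), Sep(12), Oct(6), Nov(5), Dec(3)",
            "2025: Jan(1), Mar(2), Apr(4), May(1), Jun(2), Jul(6), Aug(3), Sep(1), Oct(6), Nov(6), Dec(1)",
            "2026: Jan(6), Mar(2), May(1)");
        List<String> derivedFields = List.of(
            "2024: Jun(5), Jul(1), Aug(1), Dec(12)",
            "2025: Mar(3), Apr(1), May(1), Jul(1), Sep(1), Nov(1), Dec(2)",
            "2026: Jan(1), Apr(2), May(5)");
        System.out.println(total(weightedRouting)); // 85
        System.out.println(total(derivedFields));   // 37
    }
}
```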
3. WarmIndexBasicIT.testLocalDirectoryFilesAfterRefresh
Recent build: #78171
Seed: BC1A22DE8640D04A:7E4F7959DE9E799A
Error: java.lang.AssertionError (assertTrue at line 187)
Local reproduction: Passed with seed (not deterministically reproducible)
History: New flaky test, first seen on 2026-04-29 (build 75374). 13 unique builds affected in less than a month, and rapidly worsening: 4 builds in the last two days of April and 9 builds so far in May. Its first appearance falls shortly after the CI runner migration from m5.8xlarge to m7a.8xlarge (mid-April 2026), which suggests possible CPU-speed sensitivity in the test's assumptions about file state after refresh.
Monthly failure counts (unique builds):
- 2026: Apr(4), May(9)
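If the `assertTrue` at line 187 checks file state that is produced asynchronously after the refresh, a one-shot assertion races with that work, and faster hardware can change which side wins. A common mitigation in OpenSearch tests is `assertBusy`, which retries an assertion until it passes or a timeout expires. The self-contained sketch below shows the same bounded-polling idea; `BoundedPoll`, `pollUntil`, and `filesPresent` are hypothetical, and it has not been confirmed that line 187 actually checks asynchronous state.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

public final class BoundedPoll {
    private BoundedPoll() {}

    /**
     * Re-evaluates the condition every {@code interval}. Returns true as soon as it holds,
     * or false once {@code timeout} has elapsed without it holding.
     */
    static boolean pollUntil(BooleanSupplier condition, Duration timeout, Duration interval)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out; the caller can then fail with a clear message
            }
            Thread.sleep(interval.toMillis());
        }
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.nanoTime();
        // Hypothetical stand-in for "the expected files are now in the local directory":
        // it only becomes true about 200 ms after start, like work finishing asynchronously.
        BooleanSupplier filesPresent =
            () -> System.nanoTime() - start > Duration.ofMillis(200).toNanos();

        // A single immediate check would most likely see false; polling waits for the state.
        boolean ok = pollUntil(filesPresent, Duration.ofSeconds(5), Duration.ofMillis(20));
        System.out.println(ok); // true
    }
}
```

Polling is only appropriate if the asserted state is genuinely eventually consistent; if the refresh is supposed to leave the files in place synchronously, retrying would hide a real bug rather than fix a flaky test.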
Reproduction Methodology
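Each test was run locally with the exact seed from the failing CI build, using the first command for the internal-cluster tests and the second for the Painless YAML REST suite: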
./gradlew :server:internalClusterTest --tests "<class.method>" -Dtests.seed=<SEED>
./gradlew :modules:lang-painless:yamlRestTest --tests "<class.method>" -Dtests.seed=<SEED>
All three tests passed locally, which indicates these are non-deterministic failures: the seed fixes the test's random choices but not the thread scheduling and timing factors that appear to drive them.
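The distinction between what a seed controls and what it does not can be seen in a small Java sketch. It uses `java.util.Random` purely as an illustration (the randomized-testing framework derives its per-test randomness differently), reads the master half of the seed from build #78171 as a 64-bit hex value, and uses hypothetical class and method names.

```java
import java.util.Arrays;
import java.util.Random;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;

public class SeedVsScheduling {
    /** Values drawn from a seeded generator: identical on every run with the same seed. */
    static int[] seededDraws(long seed, int n) {
        Random random = new Random(seed);
        int[] values = new int[n];
        for (int i = 0; i < n; i++) {
            values[i] = random.nextInt(100);
        }
        return values;
    }

    /** Which of two threads records itself first is decided by the OS scheduler, not by any seed. */
    static String raceWinner() throws InterruptedException {
        AtomicReference<String> first = new AtomicReference<>();
        CountDownLatch go = new CountDownLatch(1);
        Thread[] threads = new Thread[2];
        for (int i = 0; i < 2; i++) {
            String name = (i == 0) ? "A" : "B";
            threads[i] = new Thread(() -> {
                try {
                    go.await(); // both threads start racing at the same moment
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                first.compareAndSet(null, name); // only the first thread to arrive succeeds
            });
            threads[i].start();
        }
        go.countDown();
        for (Thread t : threads) {
            t.join();
        }
        return first.get();
    }

    public static void main(String[] args) throws InterruptedException {
        long seed = Long.parseUnsignedLong("BC1A22DE8640D04A", 16);
        System.out.println(Arrays.equals(seededDraws(seed, 5), seededDraws(seed, 5))); // true
        System.out.println(raceWinner()); // "A" or "B"; may differ between runs
    }
}
```

The first printed line is always `true`, while the second can change from run to run even though nothing seeded is involved, which mirrors how a CI failure can vanish when rerun locally with the same seed.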
Notes
- None of these failures are deterministically reproducible with their seeds, which is consistent with race conditions or timing-dependent behavior.
- The WarmIndexBasicIT test is the most actionable: it is new, worsening, and possibly related to the April 2026 CI runner change, which shortly preceded its first failure.
- The SearchWeightedRoutingIT test is a long-standing chronic flake with no sign of resolution.
- The LangPainlessClientYamlTestSuiteIT test shows episodic spikes and may be worsening again.