Spark: Bound runaway serializable-isolation and concurrent-refresh tests #16562
Open
wombatu-kun wants to merge 1 commit into
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes #16359
Summary
Several `spark-extensions` concurrency tests ran two worker threads in a barrier-synchronized loop bounded by `Integer.MAX_VALUE`, while the main thread blocked on `assertThatThrownBy(<op>Future::get)` with no timeout, expecting a conflict exception. When that exception was never thrown, nothing bounded either thread: the append thread kept committing data files until CI hit its wall-clock limit or ran out of disk (as in #16303, where the runaway filled the GitHub Actions disk and was retriggered several times before the cause was found). This change makes those tests fail fast instead of relying on external limits.

What changed
- Bounded the loop with a `MAX_OPERATIONS = 20` constant (the same value the sibling `*WithSnapshotIsolation` tests already use with this identical harness), so it can no longer run unbounded.
- Replaced the untimed wait with `Future.get(OPERATION_TIMEOUT_MINUTES, TimeUnit.MINUTES)` (5 minutes) and cancelled the operation future in `finally`, so a stuck operation is interrupted and the wait cannot block forever.
- Moved `MAX_OPERATIONS` and `OPERATION_TIMEOUT_MINUTES` into `SparkRowLevelOperationsTestBase` so these concurrency tests share one bound (the snapshot-isolation siblings now reference the same constant).

Affected methods (across Spark v3.5/v4.0/v4.1 where present): `testMergeWithSerializableIsolation`, `testDeleteWithSerializableIsolation`, `testUpdateWithSerializableIsolation`, and the copy-on-write `testMergeWithConcurrentTableRefresh`, `testDeleteWithConcurrentTableRefresh`, `testUpdateWithConcurrentTableRefresh`.

In a passing run the conflict still fires within the first couple of iterations, so behavior is unchanged. In a regression, the bounded loop plus the timeout make the test fail fast with a clear assertion instead of exhausting CI resources.
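The bounded-loop-plus-timed-wait shape described above can be illustrated with plain `java.util.concurrent`. This is a minimal, simplified sketch, not the actual test code: the class `BoundedConflictHarness`, the `run` helper, the `Outcome` enum and the `conflictAt`/`stallAt` knobs are hypothetical; an `IllegalStateException` stands in for Iceberg's `ValidationException`; a simple polling loop stands in for the real tests' barrier wait; and the timeouts are far shorter than the PR's 5 minutes.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

public class BoundedConflictHarness {
  // Same bound as the PR; in the real tests it lives in SparkRowLevelOperationsTestBase.
  static final int MAX_OPERATIONS = 20;

  enum Outcome { CONFLICT, NO_CONFLICT, TIMEOUT }

  // Polls until the shared counter reaches target; returns false if keepWaiting turns false.
  private static boolean awaitBarrier(AtomicInteger barrier, int target, BooleanSupplier keepWaiting)
      throws InterruptedException {
    while (barrier.get() < target) {
      if (!keepWaiting.getAsBoolean()) {
        return false;
      }
      Thread.sleep(1); // interruptible, so cancel(true) can stop a waiting thread
    }
    return true;
  }

  // Hypothetical model of the race. conflictAt: iteration where the operation
  // "detects a conflict" (>= MAX_OPERATIONS means never). stallAt: iteration
  // where the operation hangs (negative means never).
  static Outcome run(int conflictAt, int stallAt, long timeout, TimeUnit unit) {
    ExecutorService executor = Executors.newFixedThreadPool(2);
    AtomicInteger barrier = new AtomicInteger(0);
    AtomicBoolean shouldAppend = new AtomicBoolean(true);

    // Operation thread (stand-in for MERGE/DELETE/UPDATE).
    Future<Void> opFuture = executor.submit(() -> {
      for (int i = 0; i < MAX_OPERATIONS; i++) { // bounded, not Integer.MAX_VALUE
        awaitBarrier(barrier, i * 2, () -> true);
        if (i == stallAt) {
          Thread.sleep(Long.MAX_VALUE); // simulated hang; only an interrupt ends it
        }
        if (i == conflictAt) {
          throw new IllegalStateException("conflict"); // stand-in for ValidationException
        }
        barrier.incrementAndGet();
      }
      return null; // loop exhausted without a conflict
    });

    // Append thread (stand-in for the concurrent data-file commits).
    Future<Void> appendFuture = executor.submit(() -> {
      for (int i = 0; i < MAX_OPERATIONS; i++) {
        if (!awaitBarrier(barrier, i * 2, shouldAppend::get)) {
          return null; // main thread asked it to stop
        }
        barrier.incrementAndGet();
      }
      return null;
    });

    try {
      opFuture.get(timeout, unit); // timed wait instead of an unbounded get()
      return Outcome.NO_CONFLICT;
    } catch (ExecutionException e) {
      if (e.getCause() instanceof IllegalStateException) {
        return Outcome.CONFLICT;
      }
      throw new RuntimeException(e.getCause());
    } catch (TimeoutException e) {
      return Outcome.TIMEOUT;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new RuntimeException(e);
    } finally {
      shouldAppend.set(false);
      opFuture.cancel(true); // interrupts a stuck operation
      appendFuture.cancel(true);
      executor.shutdownNow();
    }
  }

  public static void main(String[] args) {
    System.out.println(run(1, -1, 5, TimeUnit.SECONDS)); // CONFLICT
    System.out.println(run(MAX_OPERATIONS, -1, 5, TimeUnit.SECONDS)); // NO_CONFLICT
    System.out.println(run(-1, 3, 200, TimeUnit.MILLISECONDS)); // TIMEOUT
  }
}
```

The three calls in `main` model a passing run (conflict on the second iteration), a regression in which no conflict is ever raised (the bounded loop ends, so the test can fail immediately instead of committing forever), and a hung operation (the timed `get` returns control and `cancel(true)` interrupts the worker). Unlike this sketch, the real barrier waits may carry their own per-wait limits.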
🤖 Generated with Claude Code