test(tables): stale-base concurrent-commit regression tests#620
Open
mkuchenbecker wants to merge 14 commits into
Open
test(tables): stale-base concurrent-commit regression tests#620mkuchenbecker wants to merge 14 commits into
mkuchenbecker wants to merge 14 commits into
Conversation
…cident-12185) Single-JVM, no-threads, no-docker reproduction of the BaseTransaction.applyUpdates silent-rebase lost-update. A stale writer stages its L1 snapshot view + COMMIT_KEY=L1 in a held transaction; a racing writer advances the catalog to L2; committing the stale transaction rebases the stale payload onto L2 so doCommit would subtract the racing snapshot. Asserts the racing snapshot survives. Passes on main (with the linkedin#612 stale-base abort CAS); reproduces the silent drop on the pre-fix tree. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Factor the staged-conflict concurrency into a shared core and add a second test where the table already holds committed data history (two snapshots) before the race begins. Both assert the racing snapshot survives. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Both contending writers now append data: the stale writer commits its own data snapshot at base L1 (payload omitting the racing snapshot), modeling a genuine concurrent-insert race rather than a property-only update. Also assert the stale writer's conflicting insert is rejected wholesale, not merged onto the racing commit. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Covers the race on a freshly created table with no committed data snapshots: the stale writer's first insert races a concurrent first insert. Confirms the racing snapshot survives even when the base has no prior history. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Models the production maintenance shape: an expire-snapshots commit (keep-subset payload computed at base L1) racing a concurrent data insert. The stale expire's subset omits the racing snapshot, so the subtractive merge would expire the fresh data commit. Asserts the data commit survives and the stale expire is rejected wholesale. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
count=0 (plain create) already reproduces the insert race; the single-snapshot case added no coverage. Keep: zero-prior-data insert, two-snapshot prior-data insert, and the expire-vs-insert race (which requires base snapshots).
mkuchenbecker
commented
May 29, 2026
Co-authored-by: Mike Kuchenbecker <mike@michaelkuchenbecker.com>
Validated by running pre-fix (d4fc9fe): both tests FAIL (racing snapshot silently dropped); post-fix (linkedin#612): both PASS. Three layered guards must all be bypassed for the lost update: 1. HTS optimistic-version CAS -> bypassed by applyUpdates' silent rebase (held txn) 2. failIfRetryUpdate per-JVM cache -> bypassed cross-replica (cleared in the test) 3. Iceberg snapshot sequence-number validation -> only bypassed by a SUBTRACTIVE stale commit (adds no new snapshot). A stale writer adding its own snapshot on a multi-snapshot base is rejected by guard #3, so it cannot lose data -- which is why incident-12185 was an expire/optimizer drop, not an insert race. Tests (both subtractive): expire racing a data insert (prior history); two concurrent first-inserts on a fresh table.
…scribe behavior only Third case: two writers insert into a table that already holds data; the catalog rejects the stale commit and the concurrent insert remains. Comments rewritten to describe observable behavior without external references.
mkuchenbecker
commented
May 29, 2026
mkuchenbecker
commented
May 29, 2026
mkuchenbecker
commented
May 29, 2026
mkuchenbecker
commented
May 29, 2026
mkuchenbecker
commented
May 29, 2026
mkuchenbecker
commented
May 29, 2026
Inline commitThroughRepository, snapshotJson, idOf, and refs at their call sites. Assert the table holds exactly the prior snapshots plus the concurrently committed snapshot, regardless of whether the stale commit throws.
Prepare each commit's snapshot list explicitly at the call site so the base + new snapshot is visible in the flow.
mkuchenbecker
commented
May 29, 2026
Use assertThrows(CommitFailedException) for the rejection, then assert the table holds exactly the prior snapshots plus the concurrent commit.
…ate case Assert the stale commit is rejected (any exception) rather than a specific type, so the contract is rejection + final table state. Add a case where the stale commit changes metadata without adding a snapshot.
Set an actual table property (plus the base snapshot view, no new snapshot) so the case is a genuine property-changing commit rather than internal bookkeeping only.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Concurrency regression tests (
StaleBaseLostUpdateTest, H2-backed, single-JVM) for committing a snapshot set against a stale base version: a writer that still declares an older base must not silently drop a snapshot another writer committed concurrently.Scenarios
Each test loads a table at version L1, lets a second writer commit a new snapshot (advancing the catalog to L2), then commits a writer that still declares L1 as its base with a snapshot set computed at L1 — which omits the second writer's snapshot. The concurrently added snapshot must survive, and the stale commit must be rejected.
testStaleInsertDropsConcurrentDataCommitOnFreshTable— two writers race the first insert into a fresh table.testExpireSnapshotsDropsConcurrentDataCommit— an expiration (keep only the current head) races a concurrent insert.testStaleMetadataUpdateDropsConcurrentDataCommit— a metadata change that adds no snapshot races a concurrent insert.testConcurrentInsertOnPopulatedTableIsRejected— two writers insert into a table that already holds data.Results across commits
ff9a40c114fe4603d4fc9fe7Why the results differ
Before #406, the catalog rejected a stale commit that deletes the latest snapshot only when the commit adds no snapshot of its own (
SnapshotInspector.validateSnapshotsUpdate, which exempts snapshot-adding commits). So:The single check that covers all of these is #612's base-divergence abort: it rejects any commit whose declared base no longer matches the catalog.
Testing
All four pass with #612. Each test asserts the stale commit is rejected (throws) and that the table holds exactly the prior snapshots plus the concurrently committed snapshot. Test-only; no production changes.
🤖 Generated with Claude Code