fix(llm-access-store): serialize duckdb tiered append against retention#18
Conversation
The tiered usage repo's append and retention paths run on two separate OS threads of the usage worker, both holding `Arc::clone` of the same repo. `append_usage_events_to_tiered` takes the active `PersistentUsageWriter` out of the state, drops the `std::sync::Mutex` (can't hold its guard across `.await`), and awaits the insert. During that window a retention cycle can lock the state, and `rollover_expired_active_segment` / `discard_expired_ active_segment` can checkpoint + delete / roll the active segment and advance `active_path`. The append then restores its now-stale writer, so `active_path` and `active_writer` diverge: subsequent current (non-expired) usage events are written through the stale writer into a deleted/rolled segment and silently lost (the divergence persists until a config change forces a writer reopen). Verified the enabling fact with a probe: two RW `Connection::open` to the same DuckDB file in one process both succeed (shared DatabaseInstance), so retention's checkpoint is NOT lock-rejected while an append is in flight — the race is real, not self-healing. Precondition is narrow (the active segment's newest committed row must be older than the retention cutoff while it still receives appends — i.e. old-timestamp backfill / journal replay, or a very low-traffic node whose active segment aged out), but the data lost is current, billing-relevant usage, so it's worth fixing. Fix: a per-tiered-repo async write gate (`tokio::sync::Mutex`) serializing the append write path against retention's active-segment rollover/discard. Append holds the gate for its whole body; retention holds it only around `rollover_expired_active_segment` (the sole active-segment mutator) so the archived/detail cleanup never stalls appends. Lock order is gate-before-std in both paths (the brief clone-lock is released before awaiting the gate), so no inversion/deadlock; reads/queries take only the std mutex and stay concurrent. Test: a deterministic `#[cfg(test)]` seam parks an in-flight append while it holds the gate; the test asserts a concurrent retention `timeout`s (blocked), then proceeds once released. Verified it has teeth (fails on exactly the "retention must block" assertion when the gate is removed). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
/gemini review |
|
@codex review |
There was a problem hiding this comment.
Code Review
This pull request introduces a write_gate mutex to serialize the append write path against active-segment rollover and discard during retention cycles in the DuckDB usage store. This prevents in-flight writers from being orphaned onto deleted or rolled segments. A test-only hook (append_seam) and a corresponding integration test have been added to verify this serialization behavior. There are no review comments, and I have no feedback to provide.
There was a problem hiding this comment.
Code Review
This pull request introduces a synchronization mechanism using a write_gate (an Arc<tokio::sync::Mutex<()>>) to serialize the append write path against active-segment rollover and discard during retention cycles. This prevents in-flight writers from being orphaned onto deleted or rolled segments, avoiding data loss. Additionally, a test-only append_seam hook and a corresponding integration test have been added to verify this serialization behavior. I have no feedback to provide as there are no review comments.
|
Codex Review: Didn't find any major issues. Swish! ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
What
Fixes the cross-thread write race in the DuckDB tiered usage store surfaced during PR #16's review (gemini flagged the shape; this is the verified, corrected mechanism). Follow-up to the duckdb split (#16), branched from clean master so it edits the just-moved
append.rs/retention.rswithout conflict.The bug
The tiered repo's append and retention paths run on two separate OS threads of the usage worker (
llm-access-usage-workerspawns the import loop and the maintenance loop as independentcurrent_threadruntimes), both holdingArc::cloneof the sameDuckDbUsageRepository.append_usage_events_to_tieredtakes the activePersistentUsageWriterout of the state, drops thestd::sync::Mutex(its guard can't be held across.await), and awaits the insert. During that window a retention cycle can lock the state, androllover_expired_active_segment/discard_expired_active_segmentcheckpoint + delete/roll the active segment and advanceactive_path. The append then restores its now-stale writer →active_pathandactive_writerdiverge: subsequent current (non-expired) usage events are written through the stale writer into a deleted/rolled segment and silently lost (the divergence persists until a config change forces a writer reopen).Verified the enabling fact with a probe (not assumed): two RW
Connection::opento the same DuckDB file in one process both succeed (sharedDatabaseInstancekeyed by canonical path), so retention's checkpoint is not lock-rejected while an append is in flight — the race is real, not self-healing.Severity (calibrated honestly): the precondition is narrow — the active segment's newest committed row must be older than the retention cutoff while it still receives appends (old-timestamp backfill / journal replay, or a very low-traffic node whose active segment aged out). But the data lost is current, billing-relevant usage, so it's worth fixing rather than leaving latent.
The fix
A per-tiered-repo async write gate (
tokio::sync::Mutex<()>onTieredDuckDbUsageState) serializing the append write path against retention's active-segment rollover/discard:rollover_expired_active_segment(the sole active-segment mutator) — archived/detail cleanup stays ungated so it never stalls appends.Lock order is gate-before-std in both paths (the brief clone-lock is released before awaiting the gate → gate never awaited while holding std), so no inversion/deadlock. Reads/queries take only the std mutex and stay fully concurrent. Production cost: one
Arcclone + an uncontended async-mutex acquire per append; in normal operation there's a single import loop so append-vs-append never contends, and append-vs-retention only contends during the hourly rollover window.Considered and rejected a lighter "generation-check" (don't restore the writer if
active_pathchanged): it fixes only the persistent divergence, not the in-flight loss of a current event being inserted into a segment retention discards mid-await(the segment's committedend_mscan be fully expired while a current row is in flight). Only serialization closes that.Test
A deterministic
#[cfg(test)]seam parks an in-flight append while it holds the gate; the test asserts a concurrent retentiontimeouts (blocked by the gate), then proceeds once released (gate freed, no deadlock). Verified it has teeth — with the gate removed it fails on exactly theretention must block on the write gateassertion.Verification
cargo clippy -p llm-access-store --all-targets -- -D warnings→ cleancargo test -p llm-access-store→ 68 passed (67 prior + the new seam test)cargo build -p llm-access(reverse dep) → okrustfmton the 5 changed files only;deps/lance/deps/lancedbuntouched🤖 Generated with Claude Code