Skip to content

fix(llm-access-store): serialize duckdb tiered append against retention#18

Merged
acking-you merged 1 commit into
masterfrom
fix/duckdb-tiered-write-race
May 31, 2026
Merged

fix(llm-access-store): serialize duckdb tiered append against retention#18
acking-you merged 1 commit into
masterfrom
fix/duckdb-tiered-write-race

Conversation

@acking-you
Copy link
Copy Markdown
Owner

What

Fixes the cross-thread write race in the DuckDB tiered usage store surfaced during PR #16's review (gemini flagged the shape; this is the verified, corrected mechanism). Follow-up to the duckdb split (#16), branched from clean master so it edits the just-moved append.rs/retention.rs without conflict.

The bug

The tiered repo's append and retention paths run on two separate OS threads of the usage worker (llm-access-usage-worker spawns the import loop and the maintenance loop as independent current_thread runtimes), both holding Arc::clone of the same DuckDbUsageRepository.

append_usage_events_to_tiered takes the active PersistentUsageWriter out of the state, drops the std::sync::Mutex (its guard can't be held across .await), and awaits the insert. During that window a retention cycle can lock the state, and rollover_expired_active_segment / discard_expired_active_segment checkpoint + delete/roll the active segment and advance active_path. The append then restores its now-stale writer → active_path and active_writer diverge: subsequent current (non-expired) usage events are written through the stale writer into a deleted/rolled segment and silently lost (the divergence persists until a config change forces a writer reopen).

Verified the enabling fact with a probe (not assumed): two RW Connection::open to the same DuckDB file in one process both succeed (shared DatabaseInstance keyed by canonical path), so retention's checkpoint is not lock-rejected while an append is in flight — the race is real, not self-healing.

Severity (calibrated honestly): the precondition is narrow — the active segment's newest committed row must be older than the retention cutoff while it still receives appends (old-timestamp backfill / journal replay, or a very low-traffic node whose active segment aged out). But the data lost is current, billing-relevant usage, so it's worth fixing rather than leaving latent.

The fix

A per-tiered-repo async write gate (tokio::sync::Mutex<()> on TieredDuckDbUsageState) serializing the append write path against retention's active-segment rollover/discard:

  • append holds the gate for its whole body;
  • retention holds it only around rollover_expired_active_segment (the sole active-segment mutator) — archived/detail cleanup stays ungated so it never stalls appends.

Lock order is gate-before-std in both paths (the brief clone-lock is released before awaiting the gate → gate never awaited while holding std), so no inversion/deadlock. Reads/queries take only the std mutex and stay fully concurrent. Production cost: one Arc clone + an uncontended async-mutex acquire per append; in normal operation there's a single import loop so append-vs-append never contends, and append-vs-retention only contends during the hourly rollover window.

Considered and rejected a lighter "generation-check" (don't restore the writer if active_path changed): it fixes only the persistent divergence, not the in-flight loss of a current event being inserted into a segment retention discards mid-await (the segment's committed end_ms can be fully expired while a current row is in flight). Only serialization closes that.

Test

A deterministic #[cfg(test)] seam parks an in-flight append while it holds the gate; the test asserts a concurrent retention timeouts (blocked by the gate), then proceeds once released (gate freed, no deadlock). Verified it has teeth — with the gate removed it fails on exactly the retention must block on the write gate assertion.

Verification

  • cargo clippy -p llm-access-store --all-targets -- -D warnings → clean
  • cargo test -p llm-access-store → 68 passed (67 prior + the new seam test)
  • cargo build -p llm-access (reverse dep) → ok
  • rustfmt on the 5 changed files only; deps/lance/deps/lancedb untouched

🤖 Generated with Claude Code

The tiered usage repo's append and retention paths run on two separate OS
threads of the usage worker, both holding `Arc::clone` of the same repo.
`append_usage_events_to_tiered` takes the active `PersistentUsageWriter` out
of the state, drops the `std::sync::Mutex` (can't hold its guard across
`.await`), and awaits the insert. During that window a retention cycle can
lock the state, and `rollover_expired_active_segment` / `discard_expired_
active_segment` can checkpoint + delete / roll the active segment and advance
`active_path`. The append then restores its now-stale writer, so `active_path`
and `active_writer` diverge: subsequent current (non-expired) usage events are
written through the stale writer into a deleted/rolled segment and silently
lost (the divergence persists until a config change forces a writer reopen).

Verified the enabling fact with a probe: two RW `Connection::open` to the same
DuckDB file in one process both succeed (shared DatabaseInstance), so
retention's checkpoint is NOT lock-rejected while an append is in flight — the
race is real, not self-healing. Precondition is narrow (the active segment's
newest committed row must be older than the retention cutoff while it still
receives appends — i.e. old-timestamp backfill / journal replay, or a very
low-traffic node whose active segment aged out), but the data lost is current,
billing-relevant usage, so it's worth fixing.

Fix: a per-tiered-repo async write gate (`tokio::sync::Mutex`) serializing the
append write path against retention's active-segment rollover/discard. Append
holds the gate for its whole body; retention holds it only around
`rollover_expired_active_segment` (the sole active-segment mutator) so the
archived/detail cleanup never stalls appends. Lock order is gate-before-std
in both paths (the brief clone-lock is released before awaiting the gate), so
no inversion/deadlock; reads/queries take only the std mutex and stay
concurrent.

Test: a deterministic `#[cfg(test)]` seam parks an in-flight append while it
holds the gate; the test asserts a concurrent retention `timeout`s (blocked),
then proceeds once released. Verified it has teeth (fails on exactly the
"retention must block" assertion when the gate is removed).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@acking-you
Copy link
Copy Markdown
Owner Author

/gemini review

@acking-you
Copy link
Copy Markdown
Owner Author

@codex review

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a write_gate mutex to serialize the append write path against active-segment rollover and discard during retention cycles in the DuckDB usage store. This prevents in-flight writers from being orphaned onto deleted or rolled segments. A test-only hook (append_seam) and a corresponding integration test have been added to verify this serialization behavior. There are no review comments, and I have no feedback to provide.

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a synchronization mechanism using a write_gate (an Arc<tokio::sync::Mutex<()>>) to serialize the append write path against active-segment rollover and discard during retention cycles. This prevents in-flight writers from being orphaned onto deleted or rolled segments, avoiding data loss. Additionally, a test-only append_seam hook and a corresponding integration test have been added to verify this serialization behavior. I have no feedback to provide as there are no review comments.

@chatgpt-codex-connector
Copy link
Copy Markdown

Codex Review: Didn't find any major issues. Swish!

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@acking-you acking-you merged commit 5a10c52 into master May 31, 2026
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant