perf(driver): cache OpenLedger result to eliminate per-request DB queries#1355
perf(driver): cache OpenLedger result to eliminate per-request DB queries#1355ariel-formance wants to merge 16 commits into
Conversation
…ries Every HTTP request previously ran GetLedger + CountLedgersInBucket before any business logic, consuming connection slots during burst load. Add a TTL-based in-memory cache (default 60s) on Driver with explicit eviction on UpdateLedgerMetadata and DeleteLedgerMetadata. A singleflight group collapses concurrent cache-miss requests for the same ledger into a single DB call, preventing thundering-herd fan-out on cold-cache bursts. aloneInBucket is already shared per-bucket via the ledgerstore Factory atomic, so cache hits can skip CountLedgersInBucket entirely. Scope-risk: low — cache is write-through evicted on all mutation paths; TTL bounds cross-pod staleness to 60s for user-defined ledger metadata only. Not-tested: integration tests require Docker (go:build it tag).
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
WalkthroughDriver now caches loaded ledgers in-memory with TTL expiry, coalesces concurrent cache-miss loads per ledger using singleflight, and evicts entries after metadata/bucket update or delete; OpenLedger serves cached stores when valid and reloads from systemStore on miss or TTL expiry. ChangesLedger Caching Implementation
Sequence Diagram — OpenLedger cache flow sequenceDiagram
participant Client
participant Driver
participant Cache as Driver.ledgerCache
participant SF as singleflight.Group
participant SystemStore
Client->>Driver: OpenLedger(ctx, name)
Driver->>Cache: getCachedLedger(name)
alt cache hit (not expired)
Cache-->>Driver: cached ledger
Driver-->>Client: return ledger
else cache miss
Driver->>SF: Do(name, loader)
SF->>SystemStore: GetLedger(name) [loader ctx ~30s]
SystemStore-->>SF: ledger + metadata
SF->>Driver: loader result
Driver->>Cache: setCachedLedgerGen(name) (if gen unchanged)
Driver-->>Client: return ledger (respect caller ctx)
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 golangci-lint (2.12.2)level=error msg="[linters_context] typechecking error: pattern ./...: directory prefix . does not contain main module or its selected dependencies" Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
🧹 Nitpick comments (1)
internal/storage/driver/driver_test.go (1)
34-36: ⚡ Quick winKeep the counting hook transparent to the caller context.
Returning
context.Background()here drops cancellation, deadlines, and any values carried bylogging.TestingContext(). This helper should return the incoming context unchanged so it only counts queries and does not alter behavior under test.♻️ Suggested fix
-func (h *queryCounter) BeforeQuery(_ context.Context, _ *bun.QueryEvent) context.Context { - return context.Background() +func (h *queryCounter) BeforeQuery(ctx context.Context, _ *bun.QueryEvent) context.Context { + return ctx }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/storage/driver/driver_test.go` around lines 34 - 36, The BeforeQuery hook in queryCounter currently returns context.Background(), which discards cancellation, deadlines, and any values (e.g., logging.TestingContext()); change queryCounter.BeforeQuery to return the incoming context parameter unchanged so the hook remains transparent to the caller context and only performs counting.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@internal/storage/driver/driver_test.go`:
- Around line 334-338: The test's assertion on DB query coalescing is too loose;
replace the current require.Less on counter.n.Load() (used after the concurrent
OpenLedger calls) with a tight fixed upper bound reflecting the singleflight
contract (e.g., assert counter.n.Load() <= 2 to allow only the single cold-miss
load plus an optional count), so update the assertion that references
counter.n.Load(), goroutines and the concurrent OpenLedger calls to require a
small fixed maximum (2) instead of `< goroutines`.
In `@internal/storage/driver/driver.go`:
- Around line 107-126: getCachedLedger and setCachedLedger need to remove
expired entries and treat non‑positive TTL as "disabled": change setCachedLedger
(check d.cacheTTL <= 0) to return early so you never store entries when TTL is
zero or negative, and change getCachedLedger to delete expired entries from
d.ledgerCache instead of just returning a miss — when you detect entry.expiresAt
is before time.Now(), drop the map key under a write lock (e.g. RUnlock(),
Lock(), check again, delete, Unlock()) and then return (ledger.Ledger{}, false);
keep using the cachedLedger type and the ledgerCache map and preserve proper
locking around reads/writes.
- Around line 260-267: The current implementation in Driver.UpdateLedgerMetadata
and Driver.DeleteLedgerMetadata calls d.evictCachedLedger(name) before
performing the DB mutation, which can race and leave the cache stale; change
both methods to call the store update/delete first via
d.systemStoreFactory.Create(d.db).UpdateLedgerMetadata(...) and
.DeleteLedgerMetadata(...), check for a nil error, and only then call
d.evictCachedLedger(name) so eviction happens after a successful write (do not
evict if the write returned an error).
- Around line 149-174: The singleflight call currently uses d.group.Do(name,
func...) which binds the shared DB work to the caller's ctx and can cancel all
waiters; change this to use d.group.DoChan(name) and run the DB lookup inside
the singleflight winner with an internal context (e.g., ctxDB :=
context.WithTimeout(context.Background(), <reasonableTimeout>)) when calling
systemStore.GetLedger and systemStore.CountLedgersInBucket so the shared work
isn't tied to a single request's ctx; concurrently, have each waiter select on
their own ctx.Done() vs the DoChan result so they can return early on
cancellation, and still call d.setCachedLedger, create the store via
d.ledgerStoreFactory.Create/d.bucketFactory.Create and return openLedgerResult
as before when the winner completes. Ensure you keep existing behavior like
calling store.SetAloneInBucket(count == 1) and caching via
d.setCachedLedger(*ret).
---
Nitpick comments:
In `@internal/storage/driver/driver_test.go`:
- Around line 34-36: The BeforeQuery hook in queryCounter currently returns
context.Background(), which discards cancellation, deadlines, and any values
(e.g., logging.TestingContext()); change queryCounter.BeforeQuery to return the
incoming context parameter unchanged so the hook remains transparent to the
caller context and only performs counting.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: aa9a83db-e252-423a-a2e6-e426c2900227
📒 Files selected for processing (3)
internal/storage/driver/driver.gointernal/storage/driver/driver_cache_test.gointernal/storage/driver/driver_test.go
- getCachedLedger: delete expired entries from the map under a write lock (double-checked) instead of returning a silent miss; prevents unbounded growth of stale entries - setCachedLedger: guard on cacheTTL <= 0 (not just == 0) to treat any non-positive TTL as disabled - UpdateLedgerMetadata / DeleteLedgerMetadata: evict after a successful DB write, not before — evicting first lets a concurrent OpenLedger re-populate the cache with the old value before the mutation commits - TestOpenLedgerConcurrentColdCacheCoalesces: tighten assertion to counter.n.Load() <= 2 (GetLedger + CountLedgersInBucket) instead of the loose < goroutines bound that would pass even with broken singleflight
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
internal/storage/driver/driver.go (1)
168-180:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy liftAvoid re-caching a snapshot loaded before a metadata write committed.
A cold
OpenLedgercan read the old row, thenUpdateLedgerMetadata/DeleteLedgerMetadatacan succeed and evict the key, and finally this path still executesd.setCachedLedger(*ret). That restores stale metadata for the full TTL. You need a per-ledger invalidation/version check, or equivalent coordination, before writing back intoledgerCache.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/storage/driver/driver.go` around lines 168 - 180, The code is re-caching a potentially-stale ledger snapshot after a concurrent metadata write/delete; before calling d.setCachedLedger(*ret) in OpenLedger (after systemStore.GetLedger/CountLedgersInBucket), check a per-ledger version/invalidation token in ledgerCache (or add one) and only write back if the cached entry is absent or its version equals the version returned by ret; alternatively implement an atomic compare-and-set on the ledgerCache keyed by ledger name (use a version/updatedAt/etag field on the Ledger struct returned by systemStore.GetLedger) so concurrent UpdateLedgerMetadata/DeleteLedgerMetadata that evict/increment version prevents overwriting with the old snapshot.internal/storage/driver/driver_test.go (1)
372-372:⚠️ Potential issue | 🔴 Critical | ⚡ Quick winRemove the extra closing brace at line 372.
internal/storage/driver/driver_test.gohas two consecutive}right after the finalrequire.Equal(...); the extra one leaves invalid Go syntax and will stop this test package from compiling.🛠️ Proposed fix
-}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/storage/driver/driver_test.go` at line 372, There is an extra closing brace after the final require.Equal(...) in internal/storage/driver/driver_test.go which breaks Go syntax; remove the redundant `}` so that the test function and enclosing block (the function containing the require.Equal call) have balanced braces and the file compiles, verifying the only remaining `}` closes the function or test scope referenced by the require.Equal assertion.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@internal/storage/driver/driver_test.go`:
- Line 372: There is an extra closing brace after the final require.Equal(...)
in internal/storage/driver/driver_test.go which breaks Go syntax; remove the
redundant `}` so that the test function and enclosing block (the function
containing the require.Equal call) have balanced braces and the file compiles,
verifying the only remaining `}` closes the function or test scope referenced by
the require.Equal assertion.
In `@internal/storage/driver/driver.go`:
- Around line 168-180: The code is re-caching a potentially-stale ledger
snapshot after a concurrent metadata write/delete; before calling
d.setCachedLedger(*ret) in OpenLedger (after
systemStore.GetLedger/CountLedgersInBucket), check a per-ledger
version/invalidation token in ledgerCache (or add one) and only write back if
the cached entry is absent or its version equals the version returned by ret;
alternatively implement an atomic compare-and-set on the ledgerCache keyed by
ledger name (use a version/updatedAt/etag field on the Ledger struct returned by
systemStore.GetLedger) so concurrent UpdateLedgerMetadata/DeleteLedgerMetadata
that evict/increment version prevents overwriting with the old snapshot.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: f46c0d36-1aa5-4853-bf31-f163c38e2805
📒 Files selected for processing (2)
internal/storage/driver/driver.gointernal/storage/driver/driver_test.go
…Chan Switch OpenLedger from group.Do to group.DoChan so the shared DB queries run under an independent context (30s timeout). A single request cancellation can no longer abort the in-flight lookup for all other waiters sharing the same singleflight key.
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
internal/storage/driver/driver.go (2)
164-186:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy liftPrevent stale misses from repopulating the cache after a metadata write.
A miss can read the old ledger, then
UpdateLedgerMetadataorDeleteLedgerMetadatasucceeds and evicts the key, and then Line 186 writes the stale row back intoledgerCache. That makes the post-write eviction ineffective under concurrency and can serve old metadata for the full TTL. You need a write barrier here — e.g. a per-ledger invalidation generation/version captured before the load and checked again beforesetCachedLedger.Also applies to: 283-295
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/storage/driver/driver.go` around lines 164 - 186, The open-path read can race with metadata writes so a stale read may be written back into the ledger cache after a concurrent UpdateLedgerMetadata/DeleteLedgerMetadata evicted it; to fix, introduce a per-ledger invalidation generation/version (or reuse an existing ledgerCache generation) and capture its value before the DB load inside d.group.DoChan, then after retrieving ret and before calling d.setCachedLedger(*ret) compare the current generation for that ledger name to the captured one and skip setCachedLedger if they differ; apply the same fix at the other occurrence referenced (the block around the 283-295 region) and update getCachedLedger/setCachedLedger logic to maintain and check the generation on evictions/updates.
164-202:⚠️ Potential issue | 🟠 Major | ⚡ Quick winReturn fresh objects to each coalesced caller.
Line 164 builds a single
*openLedgerResult, and Line 201 hands that same*ledgerstore.Store/*ledger.Ledgerto every waiter on a cold miss. Warm-cache hits return fresh instances, so this introduces cross-request aliasing only on misses. Please have the singleflight closure return immutable data and construct the store per waiter after the receive.Proposed fix
type openLedgerResult struct { - store *ledgerstore.Store - ledger *ledger.Ledger + ledger ledger.Ledger + aloneInBucket bool } @@ ch := d.group.DoChan(name, func() (any, error) { @@ if l, ok := d.getCachedLedger(name); ok { - store := d.ledgerStoreFactory.Create(d.bucketFactory.Create(l.Bucket), l) - return &openLedgerResult{store: store, ledger: &l}, nil + return openLedgerResult{ledger: l}, nil } @@ d.setCachedLedger(*ret) - - store := d.ledgerStoreFactory.Create(d.bucketFactory.Create(ret.Bucket), *ret) - store.SetAloneInBucket(count == 1) - - return &openLedgerResult{store: store, ledger: ret}, nil + return openLedgerResult{ + ledger: *ret, + aloneInBucket: count == 1, + }, nil }) @@ case result := <-ch: if result.Err != nil { return nil, nil, result.Err } - res := result.Val.(*openLedgerResult) - return res.store, res.ledger, nil + res := result.Val.(openLedgerResult) + store := d.ledgerStoreFactory.Create(d.bucketFactory.Create(res.ledger.Bucket), res.ledger) + store.SetAloneInBucket(res.aloneInBucket) + l := res.ledger + return store, &l, nil } }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/storage/driver/driver.go` around lines 164 - 202, The singleflight closure passed to d.group.DoChan currently returns a shared *openLedgerResult containing a store and ledger pointer which is then handed to every waiter; change the closure to return only immutable ledger metadata (e.g. the Ledger value or a small struct with Bucket and other immutable fields) and any derived flags (like count==1) but not a ledgerstore.Store or pointer to ledger; after receiving the result in the select branch (where result := <-ch and res := result.Val.(*openLedgerResult)), create a fresh store for that specific waiter using d.ledgerStoreFactory.Create(d.bucketFactory.Create(...)) and a fresh ledger value (copy the returned metadata or call d.getCachedLedger as needed), call store.SetAloneInBucket(...) with the derived flag, and then return that freshly constructed store and ledger so each caller gets its own instances; keep setCachedLedger as-is for caching the metadata only.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@internal/storage/driver/driver.go`:
- Around line 164-186: The open-path read can race with metadata writes so a
stale read may be written back into the ledger cache after a concurrent
UpdateLedgerMetadata/DeleteLedgerMetadata evicted it; to fix, introduce a
per-ledger invalidation generation/version (or reuse an existing ledgerCache
generation) and capture its value before the DB load inside d.group.DoChan, then
after retrieving ret and before calling d.setCachedLedger(*ret) compare the
current generation for that ledger name to the captured one and skip
setCachedLedger if they differ; apply the same fix at the other occurrence
referenced (the block around the 283-295 region) and update
getCachedLedger/setCachedLedger logic to maintain and check the generation on
evictions/updates.
- Around line 164-202: The singleflight closure passed to d.group.DoChan
currently returns a shared *openLedgerResult containing a store and ledger
pointer which is then handed to every waiter; change the closure to return only
immutable ledger metadata (e.g. the Ledger value or a small struct with Bucket
and other immutable fields) and any derived flags (like count==1) but not a
ledgerstore.Store or pointer to ledger; after receiving the result in the select
branch (where result := <-ch and res := result.Val.(*openLedgerResult)), create
a fresh store for that specific waiter using
d.ledgerStoreFactory.Create(d.bucketFactory.Create(...)) and a fresh ledger
value (copy the returned metadata or call d.getCachedLedger as needed), call
store.SetAloneInBucket(...) with the derived flag, and then return that freshly
constructed store and ledger so each caller gets its own instances; keep
setCachedLedger as-is for caching the metadata only.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: cfa1fe2c-21d8-4fee-bc59-cbbf23eb0e33
📒 Files selected for processing (1)
internal/storage/driver/driver.go
Track a monotonic generation counter per ledger name. evictCachedLedger bumps it; OpenLedger snapshots it before the DB queries and skips the write-back if the generation advanced (i.e. a metadata mutation evicted the entry while the query was in-flight), preventing stale snapshots from re-populating the cache for up to the TTL period.
The singleflight closure previously embedded a *ledgerstore.Store in its return value, which was then shared with every waiter. Change the result type to carry only the immutable ledger.Ledger value (plus an optional aloneInBucket flag from the count query). Each waiter now constructs its own fresh Store after receiving the result, eliminating the shared pointer and making the intent explicit for future Store mutations.
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
internal/storage/driver/driver_cache_test.go (2)
121-133: 🛠️ Refactor suggestion | 🟠 Major | 🏗️ Heavy liftMissing test coverage for generation counter behavior.
The commit message describes a monotonic generation counter that prevents stale cache writes: evictions bump the generation, and singleflight write-backs skip populating the cache if the generation has advanced. This is a critical correctness mechanism to prevent races where a slow DB load overwrites a fresher eviction. However, there are no unit tests verifying:
- That eviction increments the generation counter for that ledger
- That a cache write is rejected when the generation has advanced
While
TestCacheUpdateEntryverifies overwriting an existing entry, it doesn't test the stale-write prevention logic.🧪 Suggested test outline
func TestCacheGenerationPreventsStaleWrites(t *testing.T) { d := newTestDriver(time.Minute) l := ledger.MustNewWithDefault("test-ledger") // Simulate: set entry (gen=0), evict (gen=1), attempt stale set d.setCachedLedger(l) gen0 := d.cacheGens[l.Name] // snapshot generation d.evictCachedLedger(l.Name) gen1 := d.cacheGens[l.Name] require.Greater(t, gen1, gen0, "eviction should bump generation") // Test that a write with old generation is rejected // (may require exposing generation-aware set method or // testing through OpenLedger integration test instead) }Note: If
cacheGensis not exported, consider testing the observable behavior through integration tests where concurrent OpenLedger calls race with evictions.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/storage/driver/driver_cache_test.go` around lines 121 - 133, Add a unit test that asserts eviction increments the per-ledger generation and that a write with an older generation is rejected: create TestCacheGenerationPreventsStaleWrites which uses newTestDriver/new ledger, call setCachedLedger(l), snapshot the generation for l.Name from the driver (cacheGens), call evictCachedLedger(l.Name) and require the generation increased, then attempt a stale write and assert it does not populate the cache (using getCachedLedger to confirm absence). If cacheGens or a generation-aware set is not accessible, add a small test-only accessor or expose a generation-aware set method used by the singleflight writeback path so the test can simulate an older-generation write and verify it’s ignored.
35-39:⚠️ Potential issue | 🟠 Major | ⚡ Quick winFlaky test: nanosecond-level timing is unreliable.
The test uses a 1-nanosecond TTL and sleeps for 2 nanoseconds. OS scheduling granularity, GC pauses, and timing precision can make this test flaky in CI environments, even with the 2× margin.
⏱️ Proposed fix using millisecond-scale timing
func TestCacheMissAfterTTL(t *testing.T) { - d := newTestDriver(time.Nanosecond) + d := newTestDriver(10 * time.Millisecond) l := ledger.MustNewWithDefault("test-ledger") d.setCachedLedger(l) - time.Sleep(2 * time.Nanosecond) + time.Sleep(20 * time.Millisecond) _, ok := d.getCachedLedger(l.Name) require.False(t, ok)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/storage/driver/driver_cache_test.go` around lines 35 - 39, The test is flaky due to nanosecond TTL/sleep; change the test to use millisecond-scale timing by calling newTestDriver with time.Millisecond (or another ms-level TTL) instead of time.Nanosecond and replace the Sleep(2 * time.Nanosecond) with Sleep(2 * time.Millisecond) (or an equivalent >TTL margin) so that setCachedLedger expiry behavior is tested reliably; update newTestDriver and the Sleep call in the test that references setCachedLedger and d to use these millisecond values.internal/storage/driver/driver.go (1)
166-170:⚠️ Potential issue | 🟠 Major | ⚡ Quick winReturn fresh
StoreandLedgerobjects to each waiter.
singleflight.DoChanfan-outs the sameValto every waiter, so this cold-miss path now gives concurrent callers one shared*ledgerstore.Storeand*ledger.Ledger. Cache hits still return fresh objects per call, so coalesced misses now have different ownership semantics and can race if either value is mutated downstream. Have the singleflight closure return immutable load data only, then build the store and take the ledger address after each caller receives the result.Suggested refactor
- if l, ok := d.getCachedLedger(name); ok { - store := d.ledgerStoreFactory.Create(d.bucketFactory.Create(l.Bucket), l) - return &openLedgerResult{store: store, ledger: &l}, nil - } + if l, ok := d.getCachedLedger(name); ok { + return &openLedgerResult{ledger: l}, nil + } @@ - store := d.ledgerStoreFactory.Create(d.bucketFactory.Create(ret.Bucket), *ret) - store.SetAloneInBucket(count == 1) - - return &openLedgerResult{store: store, ledger: ret}, nil + aloneInBucket := count == 1 + return &openLedgerResult{ + ledger: *ret, + aloneInBucket: &aloneInBucket, + }, nil }) @@ - res := result.Val.(*openLedgerResult) - return res.store, res.ledger, nil + res := result.Val.(*openLedgerResult) + store := d.ledgerStoreFactory.Create(d.bucketFactory.Create(res.ledger.Bucket), res.ledger) + if res.aloneInBucket != nil { + store.SetAloneInBucket(*res.aloneInBucket) + } + l := res.ledger + return store, &l, nil } }Also applies to: 207-221
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/storage/driver/driver.go` around lines 166 - 170, The singleflight closure used in d.group.DoChan returns shared *ledgerstore.Store and *ledger.Ledger (via openLedgerResult), causing coalesced waiters to receive the same mutable objects; instead have the closure return only immutable load data (e.g., a ledger value copy or identifier) and an error, and then inside each waiter build its own store and take its own ledger address by calling ledgerStoreFactory.Create and bucketFactory.Create after receiving the DoChan result; update the closure around d.group.DoChan, the getCachedLedger path, and the similar coalescing block later (the other DoChan block) so openLedgerResult holds only immutable data and each caller constructs its own Store and *Ledger locally.
🧹 Nitpick comments (1)
internal/storage/driver/driver_cache_test.go (1)
73-94: ⚡ Quick winConsider adding final state assertions.
The test validates thread-safety (no panics/deadlocks), which is valuable. However, it could also verify correctness properties such as checking the final cache state or that concurrent operations don't corrupt data.
📊 Example: add final state check
} wg.Wait() + + // Verify cache state is consistent (not corrupted) + if cached, ok := d.getCachedLedger(l.Name); ok { + require.Equal(t, l.Name, cached.Name, "cached ledger should have correct name") + } }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/storage/driver/driver_cache_test.go` around lines 73 - 94, The test TestCacheConcurrentAccess currently only ensures no panics; update it to assert final cache correctness by calling d.getCachedLedger(l.Name) after wg.Wait() and asserting the returned value is either nil (if evict should win) or a valid ledger equal to ledger.MustNewWithDefault("test-ledger"), and additionally verify any internal flag/state on the driver (via newTestDriver helpers) that indicates cache emptiness or size; use the existing methods setCachedLedger, getCachedLedger, and evictCachedLedger to perform the check and fail the test if the cache is corrupted or inconsistent.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@internal/storage/driver/driver_cache_test.go`:
- Around line 121-133: Add a unit test that asserts eviction increments the
per-ledger generation and that a write with an older generation is rejected:
create TestCacheGenerationPreventsStaleWrites which uses newTestDriver/new
ledger, call setCachedLedger(l), snapshot the generation for l.Name from the
driver (cacheGens), call evictCachedLedger(l.Name) and require the generation
increased, then attempt a stale write and assert it does not populate the cache
(using getCachedLedger to confirm absence). If cacheGens or a generation-aware
set is not accessible, add a small test-only accessor or expose a
generation-aware set method used by the singleflight writeback path so the test
can simulate an older-generation write and verify it’s ignored.
- Around line 35-39: The test is flaky due to nanosecond TTL/sleep; change the
test to use millisecond-scale timing by calling newTestDriver with
time.Millisecond (or another ms-level TTL) instead of time.Nanosecond and
replace the Sleep(2 * time.Nanosecond) with Sleep(2 * time.Millisecond) (or an
equivalent >TTL margin) so that setCachedLedger expiry behavior is tested
reliably; update newTestDriver and the Sleep call in the test that references
setCachedLedger and d to use these millisecond values.
In `@internal/storage/driver/driver.go`:
- Around line 166-170: The singleflight closure used in d.group.DoChan returns
shared *ledgerstore.Store and *ledger.Ledger (via openLedgerResult), causing
coalesced waiters to receive the same mutable objects; instead have the closure
return only immutable load data (e.g., a ledger value copy or identifier) and an
error, and then inside each waiter build its own store and take its own ledger
address by calling ledgerStoreFactory.Create and bucketFactory.Create after
receiving the DoChan result; update the closure around d.group.DoChan, the
getCachedLedger path, and the similar coalescing block later (the other DoChan
block) so openLedgerResult holds only immutable data and each caller constructs
its own Store and *Ledger locally.
---
Nitpick comments:
In `@internal/storage/driver/driver_cache_test.go`:
- Around line 73-94: The test TestCacheConcurrentAccess currently only ensures
no panics; update it to assert final cache correctness by calling
d.getCachedLedger(l.Name) after wg.Wait() and asserting the returned value is
either nil (if evict should win) or a valid ledger equal to
ledger.MustNewWithDefault("test-ledger"), and additionally verify any internal
flag/state on the driver (via newTestDriver helpers) that indicates cache
emptiness or size; use the existing methods setCachedLedger, getCachedLedger,
and evictCachedLedger to perform the check and fail the test if the cache is
corrupted or inconsistent.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 31fd6c44-f96c-4ef4-8850-b997aa7e9fa4
📒 Files selected for processing (2)
internal/storage/driver/driver.gointernal/storage/driver/driver_cache_test.go
- Extract inline singleflight write-back into setCachedLedgerGen so the generation check path can be exercised directly in unit tests - Add TestCacheGenerationPreventsStaleWrites: sets cache, snapshots generation, evicts, verifies generation bumped, then confirms a setCachedLedgerGen call with the old generation is silently rejected - Fix TestCacheMissAfterTTL: nanosecond TTL/sleep was unreliable on loaded CI; switch to millisecond scale - Extend TestCacheConcurrentAccess: assert the cache holds a valid ledger or nothing after concurrent set/get/evict (was panic-only before)
|
This PR adds an in-memory cache around The performance intent makes sense, but this introduces a new invalidation contract: every mutation of There is also an operational impact in multi-instance deployments: this cache is local to each process. If one Ledger instance updates metadata, deletes/restores a bucket, or otherwise changes |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
internal/storage/driver/driver.go (1)
160-165:⚠️ Potential issue | 🟠 Major | ⚡ Quick winBreak the in-flight singleflight entry when you evict.
setCachedLedgerGenonly prevents writing stale data back intoledgerCache(cacheGens[name]check), butOpenLedgerstill returns the DB snapshot (ret) to the caller even if eviction advanced the generation mid-flight. BecauseUpdateLedgerMetadata/DeleteLedgerMetadatacallevictCachedLedgerwithout also clearing thesingleflightin-flight entry, a laterOpenLedgerfor the samenamecan join the pre-evictionDoChanand observe old ledger/metadata until the flight completes.Suggested fix
func (d *Driver) evictCachedLedger(name string) { - d.mu.Lock() - defer d.mu.Unlock() + d.mu.Lock() delete(d.ledgerCache, name) d.cacheGens[name]++ + d.mu.Unlock() + + d.group.Forget(name) }Applies to
internal/storage/driver/driver.goaround160-165and theOpenLedgercache-miss/singleflight path around184-216.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/storage/driver/driver.go` around lines 160 - 165, When evicting a ledger you must also break any in-flight singleflight entry so concurrent OpenLedger callers can't join a pre-eviction flight and receive stale snapshots; update evictCachedLedger to not only delete d.ledgerCache[name] and bump d.cacheGens[name] but also call the singleflight "forget" for that key (e.g. d.sf.Forget(name) or d.singleflight.Forget(name) depending on the group's field name) so the OpenLedger cache-miss/singleflight path (around OpenLedger and setCachedLedgerGen) will start a fresh flight after eviction; ensure the call is no-op if the group field is nil.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@internal/storage/driver/driver.go`:
- Around line 160-165: When evicting a ledger you must also break any in-flight
singleflight entry so concurrent OpenLedger callers can't join a pre-eviction
flight and receive stale snapshots; update evictCachedLedger to not only delete
d.ledgerCache[name] and bump d.cacheGens[name] but also call the singleflight
"forget" for that key (e.g. d.sf.Forget(name) or d.singleflight.Forget(name)
depending on the group's field name) so the OpenLedger cache-miss/singleflight
path (around OpenLedger and setCachedLedgerGen) will start a fresh flight after
eviction; ensure the call is no-op if the group field is nil.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 24d01ba2-be59-4314-a137-5d471fb1e4e7
📒 Files selected for processing (3)
internal/storage/driver/driver.gointernal/storage/driver/driver_cache_test.gointernal/storage/driver/driver_test.go
💤 Files with no reviewable changes (1)
- internal/storage/driver/driver_test.go
…tations UpdateLedgerMetadata and DeleteLedgerMetadata in the system controller were calling ctrl.driver.GetSystemStore().xxx() directly, bypassing the Driver methods that call evictCachedLedger — making the eviction code effectively dead in the production path. DeleteBucket and RestoreBucket had no eviction path at all: they soft- delete/restore every ledger in a bucket via a single SQL UPDATE, but nothing touched the per-ledger cache entries. Fixes: - Add evictCachedLedgersByBucket to Driver: scans ledgerCache under lock and bumps cacheGens for every entry whose bucket matches, ensuring any in-flight singleflight write-backs for those ledgers are also rejected. - Add DeleteBucket and RestoreBucket to driver.Driver, each delegating to the system store and then calling evictCachedLedgersByBucket. - Add UpdateLedgerMetadata, DeleteLedgerMetadata, DeleteBucket, and RestoreBucket to the controller Driver interface so the system controller can call them directly instead of going through GetSystemStore. - Route all four mutation methods in DefaultController through ctrl.driver.xxx so eviction is always triggered.
After evicting a cache entry, call d.group.Forget(name) so any callers that arrive after the eviction start a fresh singleflight instead of joining a pre-eviction in-flight query that will return stale data. Forget is called after releasing d.mu in both evictCachedLedger and evictCachedLedgersByBucket to avoid a lock-ordering dependency with the singleflight closure, which acquires d.mu.RLock while executing.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1355 +/- ##
==========================================
+ Coverage 80.36% 81.15% +0.78%
==========================================
Files 205 205
Lines 11104 11195 +91
==========================================
+ Hits 8924 9085 +161
+ Misses 1585 1570 -15
+ Partials 595 540 -55 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
internal/controller/system/store.go (1)
14-30: 🛠️ Refactor suggestion | 🟠 Major | 🏗️ Heavy liftDon't keep cache-bypassing mutators on
GetSystemStore().
Drivernow has cache-aware mutation methods, butGetSystemStore()still hands out aStorewithUpdateLedgerMetadata,DeleteLedgerMetadata,DeleteBucket, andRestoreBucket. Any caller that uses that older path can still skip eviction and reintroduce stale reads, so this boundary should be narrowed to a read-only store surface.♻️ Suggested API direction
type Store interface { GetLedger(ctx context.Context, name string) (*ledger.Ledger, error) Ledgers() common.PaginatedResource[ledger.Ledger, system.ListLedgersQueryPayload] - UpdateLedgerMetadata(ctx context.Context, name string, m metadata.Metadata) error - DeleteLedgerMetadata(ctx context.Context, param string, key string) error - DeleteBucket(ctx context.Context, bucket string) error - RestoreBucket(ctx context.Context, bucket string) error }Callers that need mutations should go through the cache-aware
Drivermethods instead.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/controller/system/store.go` around lines 14 - 30, The Store interface currently exposes mutators (UpdateLedgerMetadata, DeleteLedgerMetadata, DeleteBucket, RestoreBucket) which allows callers using GetSystemStore() to bypass cache-aware logic on Driver; remove these mutation methods from the Store interface so Store becomes read-only (keep GetLedger and Ledgers only), update GetSystemStore() to return this read-only Store type, and ensure all mutation callers use the cache-aware Driver methods (CreateLedger, UpdateLedgerMetadata, DeleteLedgerMetadata, DeleteBucket, RestoreBucket) instead; also update any concrete implementations to implement the reduced Store interface and adjust call sites to call Driver mutation methods where needed.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@internal/storage/driver/driver.go`:
- Around line 331-345: evictCachedLedgersByBucket only bumps generations for
entries already in ledgerCache, allowing concurrent cold OpenLedger to
repopulate a stale snapshot; fix by introducing a bucket-level generation
counter (e.g., cacheBucketGens map[string]uint64) guarded by d.mu, increment
cacheBucketGens[bucket] inside evictCachedLedgersByBucket (in addition to
deleting entries and calling d.group.Forget), and update setCachedLedgerGen and
the OpenLedger hot/cold-paths to consult the bucket generation (compare
ledger-specific gen + bucket gen) and refuse to install a cache entry if the
bucket gen changed. Ensure all accesses to cacheBucketGens, ledgerCache and
cacheGens use d.mu to avoid races.
---
Outside diff comments:
In `@internal/controller/system/store.go`:
- Around line 14-30: The Store interface currently exposes mutators
(UpdateLedgerMetadata, DeleteLedgerMetadata, DeleteBucket, RestoreBucket) which
allows callers using GetSystemStore() to bypass cache-aware logic on Driver;
remove these mutation methods from the Store interface so Store becomes
read-only (keep GetLedger and Ledgers only), update GetSystemStore() to return
this read-only Store type, and ensure all mutation callers use the cache-aware
Driver methods (CreateLedger, UpdateLedgerMetadata, DeleteLedgerMetadata,
DeleteBucket, RestoreBucket) instead; also update any concrete implementations
to implement the reduced Store interface and adjust call sites to call Driver
mutation methods where needed.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: d83ddd2a-d7e9-447d-80a6-59e661341946
📒 Files selected for processing (5)
internal/controller/system/controller.gointernal/controller/system/store.gointernal/storage/driver/driver.gointernal/storage/driver/driver_cache_test.gointernal/storage/driver/driver_test.go
💤 Files with no reviewable changes (1)
- internal/storage/driver/driver_test.go
Unit tests (no DB): - TestSetCachedLedgerGenValidGeneration: happy path — matching gen writes entry - TestSetCachedLedgerGenDisabledWhenTTLZero: TTL=0 skips write - TestEvictCachedLedgersByBucketEvictsMatchingEntries: all ledgers in bucket removed - TestEvictCachedLedgersByBucketPreservesOtherBuckets: other buckets unaffected - TestEvictCachedLedgersByBucketBumpsGenerations: per-ledger generation incremented - TestEvictCachedLedgersByBucketGenerationPreventsStaleWrites: stale gen rejected - TestEvictCachedLedgersByBucketEmptyBucketIsNoOp: no match is a safe no-op Integration tests (//go:build it): - TestDeleteBucketEvictsCache: after DeleteBucket, OpenLedger re-queries DB and gets not-found (proving cache was evicted, not a stale cache hit) - TestRestoreBucketEvictsCache: after a cross-driver delete+restore, the local cache is evicted and OpenLedger returns fresh data Coverage for new cache methods: getCachedLedger, setCachedLedger, setCachedLedgerGen, evictCachedLedger, evictCachedLedgersByBucket all 100%.
Remove UpdateLedgerMetadata, DeleteLedgerMetadata, DeleteBucket, and RestoreBucket from the Store interface returned by GetSystemStore(). These mutations now live exclusively on the cache-aware Driver methods, so exposing them on Store created a path to bypass cache eviction. The only GetSystemStore() call sites in the controller already use only GetLedger and Ledgers — no callers are affected.
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (1)
internal/storage/driver/driver.go (1)
331-345:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy liftBucket eviction still misses in-flight cold loads.
evictCachedLedgersByBucketonly bumpscacheGensfor names already present inledgerCache. A concurrent coldOpenLedgerfor another ledger in this bucket snapshots an unchanged generation at Line 196 and can still passsetCachedLedgerGenat Line 215 afterDeleteBucketorRestoreBucket, repopulating pre-mutation state until TTL expiry. This still needs bucket-scoped invalidation, or another way to advance generations for all ledgers in the bucket, not just cached ones.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@internal/storage/driver/driver_test.go`:
- Around line 413-445: The test currently can pass via a stale in-memory cache;
fix by asserting the post-restore OpenLedger actually hits the DB: create a
small test spy around ledgerstore.Factory (or wrap ledgerstore.NewFactory(db))
that increments a counter whenever the ledger load path performs a DB read,
construct d using that spy factory, warm the cache, perform DeleteBucket with
the other driver, call d.RestoreBucket(ctx, bucketName) and then call
d.OpenLedger(ctx, l.Name) and assert the spy's DB-read counter increased (i.e.
>0) to prove eviction triggered a re-read from the DB; reference symbols:
TestRestoreBucketEvictsCache, d.OpenLedger, driver.RestoreBucket,
ledgerstore.NewFactory.
In `@internal/storage/driver/driver.go`:
- Around line 160-167: The eviction helpers (e.g., evictCachedLedger) are
incrementing d.cacheGens even when caching is disabled (d.cacheTTL <= 0); add an
early-return to skip generation bookkeeping when cache is off: inside
evictCachedLedger (and the similar helper around lines 331-345), acquire the
mutex, perform any necessary delete from d.ledgerCache/d.bucketCache, then if
d.cacheTTL <= 0 release the mutex and call d.group.Forget(name) (or release and
return if group.Forget should be skipped) but do not touch d.cacheGens; in
short, guard updates to d.cacheGens with a d.cacheTTL > 0 check so names are not
accumulated when caching is disabled.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 1a436e4d-f3ca-4f7d-9a0d-be852adcc8b7
📒 Files selected for processing (5)
internal/controller/system/controller.gointernal/controller/system/store.gointernal/storage/driver/driver.gointernal/storage/driver/driver_cache_test.gointernal/storage/driver/driver_test.go
Add driver_errors_test.go with a stub systemstore.Store so error-return branches in UpdateLedgerMetadata, DeleteLedgerMetadata, DeleteBucket, RestoreBucket, and OpenLedger (GetLedger / CountLedgersInBucket failures, ctx.Done cancellation, cold/warm cache paths) are exercised without a DB. Raises non-integration unit coverage from 17.6% to ~40%. Guard cacheGens increments in evictCachedLedger and evictCachedLedgersByBucket with cacheTTL > 0 so the map does not accumulate entries when caching is disabled. Strengthen TestRestoreBucketEvictsCache to use a query-counting bun.DB and assert counter > before after the post-restore OpenLedger, proving the cache was evicted and a real DB read occurred rather than a stale cache hit returning the same ledger name.
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
internal/storage/driver/driver_cache_test.go (1)
34-43: ⚡ Quick winAvoid fixed millisecond sleep for TTL expiry checks.
Line 39 uses a very tight sleep window (
2msfor a1msTTL), which is prone to CI flakiness due to scheduler jitter. Prefer polling with a bounded timeout.Proposed change
func TestCacheMissAfterTTL(t *testing.T) { d := newTestDriver(time.Millisecond) l := ledger.MustNewWithDefault("test-ledger") d.setCachedLedger(l) - time.Sleep(2 * time.Millisecond) - - _, ok := d.getCachedLedger(l.Name) - require.False(t, ok) + require.Eventually(t, func() bool { + _, ok := d.getCachedLedger(l.Name) + return !ok + }, 200*time.Millisecond, 5*time.Millisecond) }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/storage/driver/driver_cache_test.go` around lines 34 - 43, TestCacheMissAfterTTL uses a fixed Sleep(2ms) against a 1ms TTL which can flake; replace the hard sleep with a polling loop that repeatedly calls d.getCachedLedger(l.Name) until it returns missing or a bounded timeout elapses. Update the TestCacheMissAfterTTL test to create d via newTestDriver(...), set the cache with d.setCachedLedger(l), then poll (e.g., loop with short sleeps) checking d.getCachedLedger(l.Name) and fail if the ledger is still present after the timeout to make the test robust against scheduler jitter.internal/storage/driver/driver_errors_test.go (1)
173-201: ⚡ Quick winWarm-cache behavior is not actually asserted.
The second call currently validates only success/output. It doesn’t verify that
GetLedgerwas skipped on warm hit, so cache regressions can still pass this test.Proposed change
func TestOpenLedgerColdCacheSuccess(t *testing.T) { ctrl := gomock.NewController(t) l := ledger.MustNewWithDefault("test-ledger") + getLedgerCalls := 0 mockBucket := NewMockBucket(ctrl) bucketFact := NewBucketFactory(ctrl) bucketFact.EXPECT().Create(l.Bucket).Return(mockBucket).AnyTimes() ledgerFact := NewLedgerStoreFactory(ctrl) ledgerFact.EXPECT().Create(gomock.Any(), gomock.Any()).Return(&ledgerstore.Store{}).AnyTimes() stub := &stubSystemStore{ - getLedgerFn: func() (*ledger.Ledger, error) { return &l, nil }, + getLedgerFn: func() (*ledger.Ledger, error) { + getLedgerCalls++ + return &l, nil + }, countLedgersResult: 1, } @@ _, got2, err := d.OpenLedger(context.Background(), l.Name) require.NoError(t, err) require.Equal(t, l.Name, got2.Name) + require.Equal(t, 1, getLedgerCalls, "warm cache hit should not call GetLedger again") }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/storage/driver/driver_errors_test.go` around lines 173 - 201, The test TestOpenLedgerColdCacheSuccess currently only checks outputs and not that the warm-cache path skipped backing-store lookups; modify the test to assert GetLedger was not called on the second OpenLedger invocation by instrumenting the stubSystemStore/getLedgerFn (or replacing it before the second call) so any invocation fails the test (e.g., set getLedgerFn = func() (*ledger.Ledger, error) { t.Fatalf("GetLedger called on warm cache"); return nil, nil }) or by recording a call counter on stubSystemStore and asserting it did not increase after the warm call; keep all other assertions (OpenLedger success and ledger.Name checks) intact and refer to TestOpenLedgerColdCacheSuccess, d.OpenLedger and stubSystemStore.getLedgerFn when making the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@internal/storage/driver/driver_cache_test.go`:
- Around line 131-145: The test TestCacheEvictionDoesNotBumpGenWhenDisabled
should assert the cache key is absent rather than conditionally checking its
value: after calling d.evictCachedLedger("test-ledger") twice, acquire the lock
(d.mu.RLock/d.mu.RUnlock) and assert that "test-ledger" does not exist in
d.cacheGens (i.e., existence is false) to ensure evictCachedLedger did not
create an entry when caching is disabled; update the assertion to explicitly
require non-existence of the key instead of only validating gen when exists is
true.
---
Nitpick comments:
In `@internal/storage/driver/driver_cache_test.go`:
- Around line 34-43: TestCacheMissAfterTTL uses a fixed Sleep(2ms) against a 1ms
TTL which can flake; replace the hard sleep with a polling loop that repeatedly
calls d.getCachedLedger(l.Name) until it returns missing or a bounded timeout
elapses. Update the TestCacheMissAfterTTL test to create d via
newTestDriver(...), set the cache with d.setCachedLedger(l), then poll (e.g.,
loop with short sleeps) checking d.getCachedLedger(l.Name) and fail if the
ledger is still present after the timeout to make the test robust against
scheduler jitter.
In `@internal/storage/driver/driver_errors_test.go`:
- Around line 173-201: The test TestOpenLedgerColdCacheSuccess currently only
checks outputs and not that the warm-cache path skipped backing-store lookups;
modify the test to assert GetLedger was not called on the second OpenLedger
invocation by instrumenting the stubSystemStore/getLedgerFn (or replacing it
before the second call) so any invocation fails the test (e.g., set getLedgerFn
= func() (*ledger.Ledger, error) { t.Fatalf("GetLedger called on warm cache");
return nil, nil }) or by recording a call counter on stubSystemStore and
asserting it did not increase after the warm call; keep all other assertions
(OpenLedger success and ledger.Name checks) intact and refer to
TestOpenLedgerColdCacheSuccess, d.OpenLedger and stubSystemStore.getLedgerFn
when making the change.
🪄 Autofix (Beta)
✅ Autofix completed
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: da1b7543-c505-4e32-9fab-952a3e7c3736
📒 Files selected for processing (4)
internal/storage/driver/driver.gointernal/storage/driver/driver_cache_test.gointernal/storage/driver/driver_errors_test.gointernal/storage/driver/driver_test.go
…assertion Replace conditional existence check with an unconditional require.False so the test fails if evictCachedLedger creates a cacheGens entry when caching is disabled, rather than silently passing when the key is absent.
|
Note Autofix is a beta feature. Expect some limitations and changes as we gather feedback and continue to improve it. Fixes Applied SuccessfullyFixed 1 file(s) based on 1 unresolved review comment. Files modified:
Commit: The changes have been pushed to the Time taken: |
Fixed 1 file(s) based on 1 unresolved review comment. Co-authored-by: CodeRabbit <noreply@coderabbit.ai>
Replace polling loop in TestCacheMissAfterTTL with require.Eventually for idiomatic testify-style flake resistance. Replace fatal-sentinel swap in TestOpenLedgerColdCacheSuccess with a getLedgerCalls counter asserted to equal 1 after both calls, giving a diagnostic count on failure rather than an abrupt t.Fatal.
Summary
GetLedger+CountLedgersInBucket) insideOpenLedgerbefore any business logic executed, consuming connection pool slots on every request// todo: keep the ledger in cache somewherecomment indriver.goacknowledging thisWhat this PR adds:
TTL cache (60s default) —
Drivernow holds a mutex-protectedmap[string]cachedLedger. Cache hits skip both DB queries entirely. Explicit eviction onUpdateLedgerMetadataandDeleteLedgerMetadatamakes single-pod behaviour correct; the TTL is a cross-pod safety net.Singleflight —
golang.org/x/sync/singleflight(already in go.mod) collapses concurrent cache-miss requests for the same ledger name into a single DB call. Without this, a post-TTL burst of N concurrent requests would still fan out to N×2 DB queries. There is a double-check inside the singleflight function to handle the case where a previous waiter has already populated the cache.aloneInBucketcorrectness —CountLedgersInBucketwas the second query per request. It's safe to skip on cache hits becauseledgerstore.DefaultFactoryalready maintains a shared*atomic.Boolper bucket; anyCreateLedgercall updates it immediately for all stores in the same bucket.Test plan
driver_cache_test.go, no build tag, no DB) — hit before TTL, miss after TTL (1ns TTL), unknown key, eviction, TTL=0 disables cache, concurrent access with-race, isolation between two ledger names, evict non-existent key, update existing entryTestOpenLedgerCacheHitNoDBQueries(integration) — bun query hook on a freshbun.DBinstance proves zero DB queries on warm-cache hitTestOpenLedgerConcurrentColdCacheCoalesces(integration) — 50 concurrent goroutines on cold cache, all succeed, query count well below 50TestOpenLedgerCacheTTLExpiry(integration) — 50ms TTL, metadata mutated via a second driver instance (bypassing cache), sleep, verify stale entry re-reads from DBTestOpenLedgerCacheEvictionOnMetadataUpdate/...OnMetadataDelete(integration) — eviction wired correctly on both write pathsgo:build it)Related
Part of connection pool saturation fix for prod-us-east-1-deriv. Companion PR: #1354 (retry jitter).