Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
17 changes: 17 additions & 0 deletions packages/semantic-cache/CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -9,13 +9,30 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### Added

- **Per-entry hit analytics** — every entry now tracks `hit_count` and
`last_accessed_at`, incremented atomically on each cache hit (batched into the
existing TTL-refresh pipeline — no extra round trip). New `entryAnalytics()`
method reports total / never-hit / cold entry counts and the hottest entries.
When the index includes usage fields, counts use server-side `FT.SEARCH` with
`LIMIT 0 0` (exact, no materialization); `topEntries` uses `SORTBY hit_count
DESC` with `LIMIT 0 topN`. Older indexes fall back to `SCAN` + pipelined
`HMGET` (5 fields only) over a sample of up to 10,000 entries. New
`entry_analytics` capability on the discovery marker. `EntryAnalyticsOptions`,
`EntryAnalyticsResult`, and `EntrySummary` exported from the package root.
- **LLM-as-judge for borderline hits** — `CacheCheckOptions.judge` accepts a `judgeFn` that adjudicates hits whose cosine distance lands in the uncertainty band (`threshold - uncertaintyBand < score <= threshold`). The judge promotes accepted hits to `confidence: 'high'` and demotes rejected hits to a miss with `nearestMiss` populated. Configurable `timeoutMs` (default 2000) and `onError` (default `'accept'`, fail-open). Direct response to user feedback that single-threshold matching produced too many uncertain borderline returns on chat.betterdb.com.
- New Prometheus metrics `{prefix}_judge_decisions_total{decision}` and `{prefix}_judge_duration_seconds{decision}` with decision labels `accept | reject | error_accept | error_reject | timeout_accept | timeout_reject`.
- New OTel span attributes `cache.judge.invoked`, `cache.judge.decision`, `cache.judge.latency_ms`.
- `JudgeOptions` type exported from the package root.

### Changed

- Cache hits now incur one Valkey round trip for usage tracking even when
`defaultTtl` is not configured. Previously a hit was a pure read.
- The FT index schema gains `hit_count NUMERIC SORTABLE` and `last_accessed_at
NUMERIC SORTABLE`. Existing indexes keep working; run `flush()` +
`initialize()` to rebuild the schema and enable the fast analytics path.
`HINCRBY` auto-creates the counter, so pre-existing entries begin tracking
correctly on their first hit after upgrade with no migration.
- `nearestMiss.deltaToThreshold` may be `<= 0` when a miss originates from a judge rejection (the score did clear the threshold but the judge said no). Existing miss paths still produce `> 0`. Documented on the type.
- `checkBatch()` throws `SemanticCacheUsageError` when `judge` is supplied, matching the existing handling of `rerank` and `staleAfterModelChange`.

Expand Down
23 changes: 23 additions & 0 deletions packages/semantic-cache/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -291,6 +291,29 @@ Stops the analytics client, cancels the stats snapshot timer, and disposes the d

Graceful shutdown of the discovery layer for in-process caches without destroying data. Stops the discovery heartbeat and deletes the heartbeat key; does not touch the index or entries.

### `cache.entryAnalytics(options?)`

Per-entry usage analytics: how many stored entries have ever been returned as a hit, which entries are hottest, and how many are cold (never hit, or not accessed within `coldAfterDays`). Use this to find dead-weight entries when sizing TTLs and keeping the HNSW index lean.

```typescript
const a = await cache.entryAnalytics({ topN: 5, coldAfterDays: 14 });
console.log(`${a.neverHitCount} of ${a.totalEntries} entries have never been hit`);
console.log('hottest:', a.topEntries.map((e) => `${e.key} (${e.hitCount})`));
```

Returns `totalEntries`, `neverHitCount`, `hitAtLeastOnceCount`, `coldEntryCount`,
`topEntries` (sorted by `hitCount` descending, capped at `topN`), and the
`coldAfterDays` value applied.

When the FT index was created with the `hit_count` / `last_accessed_at` sortable
fields, counts use server-side `FT.SEARCH` with `LIMIT 0 0` (exact, no
materialization) and `topEntries` uses `SORTBY hit_count DESC` with `LIMIT 0 topN`.

Older indexes fall back to `SCAN` + pipelined `HMGET` (fetching only the 5
analytics fields) over a sample of up to 10,000 entries in implementation-defined
order (`totalEntries` is the sample size). Run `flush()` + `initialize()` to
rebuild the index and enable exact counts.

### `cache.thresholdEffectiveness(options?)`

Analyzes the rolling similarity score window (last 10,000 entries, up to 7 days) and returns:
Expand Down
Loading
Loading