Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion CLAUDE.md
Original file line number Diff line number Diff line change
Expand Up @@ -116,7 +116,7 @@ poetry run ruff check slayer/ tests/

**Embedding storage**: `SQLiteStorage` writes embeddings into the main `.db`; `YAMLStorage` uses a sidecar `<base_dir>/embeddings.db` so the YAML store stays git-diffable. Both go through `slayer/storage/sidecar_embedding_store.py`. Cascade-delete on a `canonical_id` matches exactly or as a strict dotted-path descendant — never as a character prefix.

**Sample-value snapshots** are cached on `Column.sampled` (text), `Column.sampled_values` (structured top-50 list for categorical columns, DEV-1480), and `Column.distinct_count` (true cardinality for categorical columns, DEV-1480). Refreshed on `slayer ingest` (table-backed models only), on `slayer search refresh-samples`, on `edit_model` (column edits → that column; model-level changes to `filters` / `sql` / `source_queries` → every column), and lazily on `inspect_model` cache miss (best-effort write-back). Categorical columns are ordered by count desc with alphabetical tie-break; the structured list is the consumer-facing way to compare predicate literals against actual stored values (text-split on `sampled` is ambiguous for values containing commas, e.g. `"R$ 1,000–3,000"`). Cache validity for categorical columns requires `sampled_values is not None` (v6 → v7 upgrades re-profile on next `inspect_model`). sql-mode and query-backed models do not yet have sample-value coverage.
**Sample-value snapshots** are cached on `Column.sampled` (text), `Column.sampled_values` (structured top-50 list for categorical columns, DEV-1480), and `Column.distinct_count` (true cardinality for categorical columns, DEV-1480). Refreshed on `slayer ingest` (table-backed models only), on `slayer search refresh-samples`, on `edit_model` (column edits → that column; model-level changes to `filters` / `sql` / `source_queries` → every column), lazily on `inspect_model` cache miss (best-effort write-back), and — DEV-1516 — lazily inside `search()` for any column hit whose persisted `sampled_values` is stale (pre-DEV-1480 legacy data or count_distinct-failed retry). Categorical columns are ordered by count desc with alphabetical tie-break; the structured list is the consumer-facing way to compare predicate literals against actual stored values (text-split on `sampled` is ambiguous for values containing commas, e.g. `"R$ 1,000–3,000"`). Cache validity for categorical columns requires `sampled_values is not None` (v6 → v7 upgrades re-profile on next `inspect_model` or next `search()` column hit). sql-mode and query-backed models do not yet have sample-value coverage. **DEV-1516**: the shared refresh helper `slayer/engine/profiling.py::ensure_column_sample_fresh(model, column, engine, storage)` is the single source of truth for the categorical cache-miss + persist pattern — both `inspect_model` (categorical loop) and `SearchService` (post-fusion column-hit hook) delegate to it. `SearchService.__init__` accepts an optional `engine: Optional[SlayerQueryEngine] = None`; when supplied (MCP `create_mcp_server` + REST `create_app` both wire it), the post-fusion hook walks column hits, groups by `(data_source, model_name)` (serialised within group, parallelised across via `asyncio.gather`), refreshes via the helper, and re-renders `EntityHit.text` via `render_column_text`. When `engine=None` the hook is a silent no-op (storage-only test contexts keep working). **DEV-1516 rendering**: `render_column_text` surfaces the **full** `sampled_values` list (typically 50) when populated — emitted as a JSON-encoded array (`["paid", "refunded", ...]`) so values containing commas (e.g. `"R$ 1,000–3,000"`) survive intact to the consumer, plus a `Distinct count: N` follow-up line when `distinct_count > len(sampled_values)`; falls back to the persisted `sampled` text only when `sampled_values is None`. Empty list (`[]`) is authoritative — skips the line entirely (no fallback to stale `sampled`). `inspect_model` markdown table continues to show the 20-truncated `sampled` text per column (readability on wide models). Known limitation: ranking (BM25 + tantivy + embeddings) still operates on stale corpus text — only the returned `EntityHit.text` is refreshed; sample-value-token search recall past position 20 stays bounded by indexed text until next ingest pass content-hashes the column and re-embeds it.

`inspect_model` auto-renders a `Learnings` section showing only learning-only memories (`query is None`); query-bearing memories surface only via `search` in the `example_queries` bucket.

Expand Down
59 changes: 56 additions & 3 deletions docs/concepts/search.md
Original file line number Diff line number Diff line change
Expand Up @@ -317,12 +317,65 @@ All three are populated:
- on `edit_model` (column edits → that column; model-level filter / sql /
source-query body change → every column);
- lazily on `inspect_model` when the cached value is missing (write-back
best-effort). Cache validity for categorical columns requires
`sampled_values is not None` — v6 (legacy `sampled` only) models
re-profile on next call so the structured field gets populated.
best-effort);
- lazily inside `search()` itself for any column hit whose persisted
`sampled_values` is stale (DEV-1516). The post-fusion column-hit hook
groups hits by `(data_source, model_name)` — refreshes within a model
serialise (the storage write is a model-level read-modify-write);
refreshes across different models run concurrently via
`asyncio.gather`. When `search()` is constructed without an engine
(storage-only contexts), the hook is a silent no-op.

Cache validity for categorical columns requires `sampled_values is not None` —
v6 (legacy `sampled` only) models re-profile on the next `inspect_model`
or `search()` column hit so the structured field gets populated.

sql-mode and query-backed models are silently skipped in v1.

### How sample values surface in search results

The per-column doc rendered by `slayer/search/render.py:render_column_text`
prefers the structured `sampled_values` list (full top-50) over the
20-truncated `sampled` text. When `sampled_values` is populated:

```text
Column: warehouse.orders.status
Type: TEXT
Description: Order status.
Sample values: ["paid", "refunded", "cancelled", "pending", …] ← JSON-encoded, all 50
Distinct count: 12345 ← only when distinct_count > len(sampled_values)
```

The list is rendered as a JSON array (not comma-joined) so values that
themselves contain commas — `"R$ 1,000–3,000"`, locale-formatted numbers,
multi-clause labels — survive unambiguously to the consumer. This is why
DEV-1480 introduced the structured `sampled_values` field in the first
place; comma-joining it back to a flat string would re-introduce the
exact ambiguity it was meant to solve.

When `sampled_values` is `None` (numeric / temporal columns, or legacy
v6 data, or rare overflow-with-failed-count_distinct rows), the renderer
falls back to the persisted `sampled` text — which already carries the
`... (N distinct)` suffix for the legacy overflow case, so no extra
`Distinct count` line is emitted. An empty `sampled_values=[]` list is
authoritative-empty: the line is skipped entirely (no fallback to stale
`sampled`).

This same text feeds both the per-column search index doc AND
`EntityHit.text` returned by `search()` — single renderer, single
source of truth. `inspect_model`'s markdown `## Columns` table is the
**all-columns-at-once** surface and continues to show the 20-truncated
`sampled` text per column for readability on wide models. JSON
`inspect_model` output already carries the full `sampled_values` list.

**Known limitation.** The refresh hook runs **after** RRF fusion, on the
top-K hits being returned. Ranking (BM25 / tantivy / embeddings) still
operates on whatever the corpus held at index-build time. A query whose
only match against a column is a newly-revealed value in positions 21-50
may still fail to surface that column. The text the agent sees IS
refreshed; tantivy / embeddings will catch up on the next
`slayer ingest` content-hash pass.

## Index design notes

- The tantivy index is built **fresh on every search call** in v1 (no
Expand Down
4 changes: 3 additions & 1 deletion slayer/api/server.py
Original file line number Diff line number Diff line change
Expand Up @@ -650,7 +650,9 @@ async def delete_memory(memory_id: str) -> Dict[str, Any]:

# ---------- DEV-1375: semantic search -----------------------------

search_service = SearchService(storage=storage)
# DEV-1516: pass the engine so the search service's post-fusion
# column-hit hook can auto-refresh stale categorical columns.
search_service = SearchService(storage=storage, engine=engine)

@app.post(
"/search",
Expand Down
122 changes: 120 additions & 2 deletions slayer/engine/profiling.py
Original file line number Diff line number Diff line change
Expand Up @@ -27,8 +27,13 @@
- Categorical query orders by per-value count desc (alphabetical tie-break
in SQL) so the persisted top-N is "most common values first".
- New ``Column.sampled_values: Optional[List[str]]`` carries the top-50
list verbatim (no ambiguous text split). Stays ``None`` for overflow >50
and for numeric/temporal columns.
list verbatim (no ambiguous text split). For categorical columns it is
populated on ≤50 distinct AND on overflow when the secondary
``count_distinct`` query succeeds. It stays ``None`` only for
numeric/temporal columns AND for the rare overflow branch where the
secondary ``count_distinct`` query fails (intentional cache-miss retry
signal — ``_is_sample_cached`` flags it stale so the next
``inspect_model`` / ``refresh-samples`` call retries).
- New ``Column.distinct_count: Optional[int]`` carries the true total
cardinality; the overflow branch fires a second ``count_distinct`` query
via a transient ``ModelExtension`` (bypassing ``Column.allowed_aggregations``
Expand All @@ -40,10 +45,18 @@
``values=None, distinct_count=None`` to signal "data omitted from the
legacy entry". The richer DEV-1480 data only lives on ``ColumnSample``
produced by ``profile_column``.

DEV-1516 additions:
- :func:`ensure_column_sample_fresh` — shared cache-aware refresh helper
used by both ``inspect_model``'s categorical loop and the search
service's post-fusion column-hit hook. Returns the input column on cache
hit / non-categorical / failure, and an in-memory refreshed copy on
success (after persisting via storage).
"""

from __future__ import annotations

import logging
from typing import Any, Dict, List, NamedTuple, Optional, Set, Tuple

from slayer.core.enums import DataType
Expand All @@ -53,6 +66,9 @@
from slayer.storage.base import StorageBackend


logger = logging.getLogger(__name__)


# ---------------------------------------------------------------------------
# DEV-1480: categorical cap and public-ish result type
# ---------------------------------------------------------------------------
Expand Down Expand Up @@ -629,3 +645,105 @@ async def handle_edit_refresh(
)
)
return warnings


# ---------------------------------------------------------------------------
# DEV-1516: shared cache-aware refresh helper
# ---------------------------------------------------------------------------


async def ensure_column_sample_fresh(
*,
model: SlayerModel,
column: Column,
engine: SlayerQueryEngine,
storage: StorageBackend,
) -> Column:
"""Best-effort refresh of a stale categorical column's persisted sample.

Used by both :func:`slayer.mcp.server.inspect_model` (categorical cache
miss path) and :class:`slayer.search.service.SearchService` (post-fusion
column-hit hook) so DEV-1516's "stale columns auto-refresh on the spot"
contract has a single source of truth.

Returns the **input column unchanged** when:

- ``_is_sample_cached(column)`` is True (cache hit; includes hidden /
primary-key columns by convention),
- the column is not categorical (numeric / temporal are handled by
``inspect_model``'s batched min/max path; the search hook never
refreshes them since their ``sampled`` text has no 20-vs-50 issue),
- :func:`profile_column` returns ``None`` (e.g. transient query failure
or no rows),
- :func:`profile_column` raises (logged + swallowed),
- ``storage.update_column_sampled`` raises (logged + swallowed; the
in-memory refresh is still returned so the caller can render fresh
data this call).

Returns a Pydantic ``model_copy``'d column with refreshed
``sampled`` / ``sampled_values`` / ``distinct_count`` fields on
success (after persisting via storage).

Logs ``WARNING`` on profile + persist failures with
``(data_source, model_name, column_name)`` context so observability
matches the pre-DEV-1516 inline implementation in ``inspect_model``.
"""
if _is_sample_cached(column):
return column
if column.type not in _CATEGORICAL_TYPES:
# Numeric / temporal: inspect_model handles them via the batched
# min/max query. The search refresh hook intentionally skips them
# because their ``sampled`` text is a min/max range, not a value
# list — the 20-vs-50 distinction does not apply.
return column
try:
sample = await profile_column(
model=model, column=column, engine=engine,
)
except Exception as exc: # NOSONAR(S112) — best-effort: see module docstring
logger.warning(
"ensure_column_sample_fresh: failed to profile %s.%s.%s: %s",
model.data_source, model.name, column.name, exc,
)
return column
if sample is None:
# No data to persist (e.g. PK / hidden / no rows). Helper short-
# circuits without writing — keeps cache predicate from flipping.
return column
if (
sample.sampled_values is None
and sample.distinct_count is None
and column.sampled
):
# Overflow-retry path failed to recover structured data: the
# ``ColumnSample`` carries only the generic ``"> 50 distinct"``
# marker. The column already has a richer ``sampled`` text
# (e.g. v6 legacy ``"a, b, c ... (1234 distinct)"`` or a
# previous successful-overflow run). Skip the persist + return
# the input so the rich text survives — cache predicate still
# flags the column stale so the next call retries.
return column
try:
await storage.update_column_sampled(
data_source=model.data_source,
model_name=model.name,
column_name=column.name,
sampled=sample.sampled,
sampled_values=sample.sampled_values,
distinct_count=sample.distinct_count,
)
except Exception as exc: # NOSONAR(S112) — best-effort: see module docstring
logger.warning(
"ensure_column_sample_fresh: failed to persist sample for "
"%s.%s.%s via update_column_sampled: %s",
model.data_source, model.name, column.name, exc,
)
# Fall through: surface the in-memory refresh so the caller can
# still render fresh data this call. Next call will retry — the
# cache predicate still flags the column stale because the persist
# never landed.
return column.model_copy(update={
"sampled": sample.sampled,
"sampled_values": sample.sampled_values,
"distinct_count": sample.distinct_count,
})
Comment thread
coderabbitai[bot] marked this conversation as resolved.
48 changes: 24 additions & 24 deletions slayer/mcp/server.py
Original file line number Diff line number Diff line change
Expand Up @@ -25,8 +25,8 @@
from slayer.engine.profiling import (
_is_sample_cached,
_profile_numeric_temporal_columns,
ensure_column_sample_fresh,
handle_edit_refresh,
profile_column,
)
from slayer.engine.query_engine import SlayerQueryEngine, SlayerResponse
from slayer.help import TOPIC_SUMMARY_LINE, render_help
Expand Down Expand Up @@ -1395,30 +1395,28 @@ async def _persist_sample(
)

# Categorical: one top-values query per column (+ optional
# count_distinct on overflow).
# count_distinct on overflow). DEV-1516: delegates to the
# shared ``ensure_column_sample_fresh`` helper so the
# cache-miss + persist + render-dict-population pattern is
# owned by exactly one place (also used by the search
# service's post-fusion column-hit hook).
for col in cat_uncached:
try:
sample = await profile_column(
model=model, column=col, engine=engine,
)
except Exception as exc:
logger.warning(
"inspect_model: failed to profile %s.%s.%s: %s",
model.data_source, model.name, col.name, exc,
)
sample = None
if sample is None:
continue
if sample.sampled is not None:
profile_by_name[col.name] = sample.sampled
profile_values_by_name[col.name] = sample.sampled_values
distinct_count_by_name[col.name] = sample.distinct_count
await _persist_sample(
col_name=col.name,
sampled=sample.sampled,
sampled_values=sample.sampled_values,
distinct_count=sample.distinct_count,
refreshed = await ensure_column_sample_fresh(
model=model, column=col,
engine=engine, storage=storage,
)
# On any failure (profile raise / None / persist raise)
# the helper returns the INPUT column. Legacy ``sampled``
# text on the input still feeds the markdown cell — the
# pre-pass at lines 1347-1354 has already populated
# ``profile_by_name[col.name]`` from ``col.sampled``,
# so we only overwrite when we actually have something
# fresher (avoids clobbering the legacy fallback with
# ``None`` and producing an empty cell).
if refreshed.sampled is not None:
profile_by_name[col.name] = refreshed.sampled
profile_values_by_name[col.name] = refreshed.sampled_values
distinct_count_by_name[col.name] = refreshed.distinct_count

# Numeric/temporal: one batched min/max query for all of
# them at once (restores the pre-DEV-1480 batching for
Expand Down Expand Up @@ -2825,7 +2823,9 @@ async def forget_memory(id: Any) -> str: # noqa: A002 — MCP arg name

# ---------- DEV-1375: semantic search -----------------------------

search_service = SearchService(storage=storage)
# DEV-1516: pass the engine so the search service's post-fusion
# column-hit hook can auto-refresh stale categorical columns.
search_service = SearchService(storage=storage, engine=engine)

@mcp.tool()
async def search(
Expand Down
Loading
Loading