MotleyAI · ZmeiGorynych · Jun 1, 2026 · Jun 1, 2026 · Jun 1, 2026 · Jun 1, 2026
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -116,7 +116,7 @@ poetry run ruff check slayer/ tests/
 
   **Embedding storage**: `SQLiteStorage` writes embeddings into the main `.db`; `YAMLStorage` uses a sidecar `<base_dir>/embeddings.db` so the YAML store stays git-diffable. Both go through `slayer/storage/sidecar_embedding_store.py`. Cascade-delete on a `canonical_id` matches exactly or as a strict dotted-path descendant — never as a character prefix.
 
-  **Sample-value snapshots** are cached on `Column.sampled` (text), `Column.sampled_values` (structured top-50 list for categorical columns, DEV-1480), and `Column.distinct_count` (true cardinality for categorical columns, DEV-1480). Refreshed on `slayer ingest` (table-backed models only), on `slayer search refresh-samples`, on `edit_model` (column edits → that column; model-level changes to `filters` / `sql` / `source_queries` → every column), and lazily on `inspect_model` cache miss (best-effort write-back). Categorical columns are ordered by count desc with alphabetical tie-break; the structured list is the consumer-facing way to compare predicate literals against actual stored values (text-split on `sampled` is ambiguous for values containing commas, e.g. `"R$ 1,000–3,000"`). Cache validity for categorical columns requires `sampled_values is not None` (v6 → v7 upgrades re-profile on next `inspect_model`). sql-mode and query-backed models do not yet have sample-value coverage.
+  **Sample-value snapshots** are cached on `Column.sampled` (text), `Column.sampled_values` (structured top-50 list for categorical columns, DEV-1480), and `Column.distinct_count` (true cardinality for categorical columns, DEV-1480). Refreshed on `slayer ingest` (table-backed models only), on `slayer search refresh-samples`, on `edit_model` (column edits → that column; model-level changes to `filters` / `sql` / `source_queries` → every column), lazily on `inspect_model` cache miss (best-effort write-back), and — DEV-1516 — lazily inside `search()` for any column hit whose persisted `sampled_values` is stale (pre-DEV-1480 legacy data or count_distinct-failed retry). Categorical columns are ordered by count desc with alphabetical tie-break; the structured list is the consumer-facing way to compare predicate literals against actual stored values (text-split on `sampled` is ambiguous for values containing commas, e.g. `"R$ 1,000–3,000"`). Cache validity for categorical columns requires `sampled_values is not None` (v6 → v7 upgrades re-profile on next `inspect_model` or next `search()` column hit). sql-mode and query-backed models do not yet have sample-value coverage. **DEV-1516**: the shared refresh helper `slayer/engine/profiling.py::ensure_column_sample_fresh(model, column, engine, storage)` is the single source of truth for the categorical cache-miss + persist pattern — both `inspect_model` (categorical loop) and `SearchService` (post-fusion column-hit hook) delegate to it. `SearchService.__init__` accepts an optional `engine: Optional[SlayerQueryEngine] = None`; when supplied (MCP `create_mcp_server` + REST `create_app` both wire it), the post-fusion hook walks column hits, groups by `(data_source, model_name)` (serialised within group, parallelised across via `asyncio.gather`), refreshes via the helper, and re-renders `EntityHit.text` via `render_column_text`. When `engine=None` the hook is a silent no-op (storage-only test contexts keep working). **DEV-1516 rendering**: `render_column_text` surfaces the **full** `sampled_values` list (typically 50) when populated — emitted as a JSON-encoded array (`["paid", "refunded", ...]`) so values containing commas (e.g. `"R$ 1,000–3,000"`) survive intact to the consumer, plus a `Distinct count: N` follow-up line when `distinct_count > len(sampled_values)`; falls back to the persisted `sampled` text only when `sampled_values is None`. Empty list (`[]`) is authoritative — skips the line entirely (no fallback to stale `sampled`). `inspect_model` markdown table continues to show the 20-truncated `sampled` text per column (readability on wide models). Known limitation: ranking (BM25 + tantivy + embeddings) still operates on stale corpus text — only the returned `EntityHit.text` is refreshed; sample-value-token search recall past position 20 stays bounded by indexed text until next ingest pass content-hashes the column and re-embeds it.
 
   `inspect_model` auto-renders a `Learnings` section showing only learning-only memories (`query is None`); query-bearing memories surface only via `search` in the `example_queries` bucket.
 

diff --git a/docs/concepts/search.md b/docs/concepts/search.md
@@ -317,12 +317,65 @@ All three are populated:
 - on `edit_model` (column edits → that column; model-level filter / sql /
   source-query body change → every column);
 - lazily on `inspect_model` when the cached value is missing (write-back
-  best-effort). Cache validity for categorical columns requires
-  `sampled_values is not None` — v6 (legacy `sampled` only) models
-  re-profile on next call so the structured field gets populated.
+  best-effort);
+- lazily inside `search()` itself for any column hit whose persisted
+  `sampled_values` is stale (DEV-1516). The post-fusion column-hit hook
+  groups hits by `(data_source, model_name)` — refreshes within a model
+  serialise (the storage write is a model-level read-modify-write);
+  refreshes across different models run concurrently via
+  `asyncio.gather`. When `search()` is constructed without an engine
+  (storage-only contexts), the hook is a silent no-op.
+
+Cache validity for categorical columns requires `sampled_values is not None` —
+v6 (legacy `sampled` only) models re-profile on the next `inspect_model`
+or `search()` column hit so the structured field gets populated.
 
 sql-mode and query-backed models are silently skipped in v1.
 
+### How sample values surface in search results
+
+The per-column doc rendered by `slayer/search/render.py:render_column_text`
+prefers the structured `sampled_values` list (full top-50) over the
+20-truncated `sampled` text. When `sampled_values` is populated:
+
+```text
+Column: warehouse.orders.status
+Type: TEXT
+Description: Order status.
+Sample values: ["paid", "refunded", "cancelled", "pending", …]  ← JSON-encoded, all 50
+Distinct count: 12345        ← only when distinct_count > len(sampled_values)
+```
+
+The list is rendered as a JSON array (not comma-joined) so values that
+themselves contain commas — `"R$ 1,000–3,000"`, locale-formatted numbers,
+multi-clause labels — survive unambiguously to the consumer. This is why
+DEV-1480 introduced the structured `sampled_values` field in the first
+place; comma-joining it back to a flat string would re-introduce the
+exact ambiguity it was meant to solve.
+
+When `sampled_values` is `None` (numeric / temporal columns, or legacy
+v6 data, or rare overflow-with-failed-count_distinct rows), the renderer
+falls back to the persisted `sampled` text — which already carries the
+`... (N distinct)` suffix for the legacy overflow case, so no extra
+`Distinct count` line is emitted. An empty `sampled_values=[]` list is
+authoritative-empty: the line is skipped entirely (no fallback to stale
+`sampled`).
+
+This same text feeds both the per-column search index doc AND
+`EntityHit.text` returned by `search()` — single renderer, single
+source of truth. `inspect_model`'s markdown `## Columns` table is the
+**all-columns-at-once** surface and continues to show the 20-truncated
+`sampled` text per column for readability on wide models. JSON
+`inspect_model` output already carries the full `sampled_values` list.
+
+**Known limitation.** The refresh hook runs **after** RRF fusion, on the
+top-K hits being returned. Ranking (BM25 / tantivy / embeddings) still
+operates on whatever the corpus held at index-build time. A query whose
+only match against a column is a newly-revealed value in positions 21-50
+may still fail to surface that column. The text the agent sees IS
+refreshed; tantivy / embeddings will catch up on the next
+`slayer ingest` content-hash pass.
+
 ## Index design notes
 
 - The tantivy index is built **fresh on every search call** in v1 (no

diff --git a/slayer/api/server.py b/slayer/api/server.py
@@ -650,7 +650,9 @@ async def delete_memory(memory_id: str) -> Dict[str, Any]:
 
     # ---------- DEV-1375: semantic search -----------------------------
 
-    search_service = SearchService(storage=storage)
+    # DEV-1516: pass the engine so the search service's post-fusion
+    # column-hit hook can auto-refresh stale categorical columns.
+    search_service = SearchService(storage=storage, engine=engine)
 
     @app.post(
         "/search",

diff --git a/slayer/engine/profiling.py b/slayer/engine/profiling.py
@@ -27,8 +27,13 @@
 - Categorical query orders by per-value count desc (alphabetical tie-break
   in SQL) so the persisted top-N is "most common values first".
 - New ``Column.sampled_values: Optional[List[str]]`` carries the top-50
-  list verbatim (no ambiguous text split). Stays ``None`` for overflow >50
-  and for numeric/temporal columns.
+  list verbatim (no ambiguous text split). For categorical columns it is
+  populated on ≤50 distinct AND on overflow when the secondary
+  ``count_distinct`` query succeeds. It stays ``None`` only for
+  numeric/temporal columns AND for the rare overflow branch where the
+  secondary ``count_distinct`` query fails (intentional cache-miss retry
+  signal — ``_is_sample_cached`` flags it stale so the next
+  ``inspect_model`` / ``refresh-samples`` call retries).
 - New ``Column.distinct_count: Optional[int]`` carries the true total
   cardinality; the overflow branch fires a second ``count_distinct`` query
   via a transient ``ModelExtension`` (bypassing ``Column.allowed_aggregations``
@@ -40,10 +45,18 @@
   ``values=None, distinct_count=None`` to signal "data omitted from the
   legacy entry". The richer DEV-1480 data only lives on ``ColumnSample``
   produced by ``profile_column``.
+
+DEV-1516 additions:
+- :func:`ensure_column_sample_fresh` — shared cache-aware refresh helper
+  used by both ``inspect_model``'s categorical loop and the search
+  service's post-fusion column-hit hook. Returns the input column on cache
+  hit / non-categorical / failure, and an in-memory refreshed copy on
+  success (after persisting via storage).
 """
 
 from __future__ import annotations
 
+import logging
 from typing import Any, Dict, List, NamedTuple, Optional, Set, Tuple
 
 from slayer.core.enums import DataType
@@ -53,6 +66,9 @@
 from slayer.storage.base import StorageBackend
 
 
+logger = logging.getLogger(__name__)
+
+
 # ---------------------------------------------------------------------------
 # DEV-1480: categorical cap and public-ish result type
 # ---------------------------------------------------------------------------
@@ -629,3 +645,105 @@ async def handle_edit_refresh(
             )
         )
     return warnings
+
+
+# ---------------------------------------------------------------------------
+# DEV-1516: shared cache-aware refresh helper
+# ---------------------------------------------------------------------------
+
+
+async def ensure_column_sample_fresh(
+    *,
+    model: SlayerModel,
+    column: Column,
+    engine: SlayerQueryEngine,
+    storage: StorageBackend,
+) -> Column:
+    """Best-effort refresh of a stale categorical column's persisted sample.
+
+    Used by both :func:`slayer.mcp.server.inspect_model` (categorical cache
+    miss path) and :class:`slayer.search.service.SearchService` (post-fusion
+    column-hit hook) so DEV-1516's "stale columns auto-refresh on the spot"
+    contract has a single source of truth.
+
+    Returns the **input column unchanged** when:
+
+    - ``_is_sample_cached(column)`` is True (cache hit; includes hidden /
+      primary-key columns by convention),
+    - the column is not categorical (numeric / temporal are handled by
+      ``inspect_model``'s batched min/max path; the search hook never
+      refreshes them since their ``sampled`` text has no 20-vs-50 issue),
+    - :func:`profile_column` returns ``None`` (e.g. transient query failure
+      or no rows),
+    - :func:`profile_column` raises (logged + swallowed),
+    - ``storage.update_column_sampled`` raises (logged + swallowed; the
+      in-memory refresh is still returned so the caller can render fresh
+      data this call).
+
+    Returns a Pydantic ``model_copy``'d column with refreshed
+    ``sampled`` / ``sampled_values`` / ``distinct_count`` fields on
+    success (after persisting via storage).
+
+    Logs ``WARNING`` on profile + persist failures with
+    ``(data_source, model_name, column_name)`` context so observability
+    matches the pre-DEV-1516 inline implementation in ``inspect_model``.
+    """
+    if _is_sample_cached(column):
+        return column
+    if column.type not in _CATEGORICAL_TYPES:
+        # Numeric / temporal: inspect_model handles them via the batched
+        # min/max query. The search refresh hook intentionally skips them
+        # because their ``sampled`` text is a min/max range, not a value
+        # list — the 20-vs-50 distinction does not apply.
+        return column
+    try:
+        sample = await profile_column(
+            model=model, column=column, engine=engine,
+        )
+    except Exception as exc:  # NOSONAR(S112) — best-effort: see module docstring
+        logger.warning(
+            "ensure_column_sample_fresh: failed to profile %s.%s.%s: %s",
+            model.data_source, model.name, column.name, exc,
+        )
+        return column
+    if sample is None:
+        # No data to persist (e.g. PK / hidden / no rows). Helper short-
+        # circuits without writing — keeps cache predicate from flipping.
+        return column
+    if (
+        sample.sampled_values is None
+        and sample.distinct_count is None
+        and column.sampled
+    ):
+        # Overflow-retry path failed to recover structured data: the
+        # ``ColumnSample`` carries only the generic ``"> 50 distinct"``
+        # marker. The column already has a richer ``sampled`` text
+        # (e.g. v6 legacy ``"a, b, c ... (1234 distinct)"`` or a
+        # previous successful-overflow run). Skip the persist + return
+        # the input so the rich text survives — cache predicate still
+        # flags the column stale so the next call retries.
+        return column
+    try:
+        await storage.update_column_sampled(
+            data_source=model.data_source,
+            model_name=model.name,
+            column_name=column.name,
+            sampled=sample.sampled,
+            sampled_values=sample.sampled_values,
+            distinct_count=sample.distinct_count,
+        )
+    except Exception as exc:  # NOSONAR(S112) — best-effort: see module docstring
+        logger.warning(
+            "ensure_column_sample_fresh: failed to persist sample for "
+            "%s.%s.%s via update_column_sampled: %s",
+            model.data_source, model.name, column.name, exc,
+        )
+        # Fall through: surface the in-memory refresh so the caller can
+        # still render fresh data this call. Next call will retry — the
+        # cache predicate still flags the column stale because the persist
+        # never landed.
+    return column.model_copy(update={
+        "sampled": sample.sampled,
+        "sampled_values": sample.sampled_values,
+        "distinct_count": sample.distinct_count,
+    })
diff --git a/slayer/mcp/server.py b/slayer/mcp/server.py
@@ -25,8 +25,8 @@
 from slayer.engine.profiling import (
     _is_sample_cached,
     _profile_numeric_temporal_columns,
+    ensure_column_sample_fresh,
     handle_edit_refresh,
-    profile_column,
 )
 from slayer.engine.query_engine import SlayerQueryEngine, SlayerResponse
 from slayer.help import TOPIC_SUMMARY_LINE, render_help
@@ -1395,30 +1395,28 @@ async def _persist_sample(
                         )
 
                 # Categorical: one top-values query per column (+ optional
-                # count_distinct on overflow).
+                # count_distinct on overflow). DEV-1516: delegates to the
+                # shared ``ensure_column_sample_fresh`` helper so the
+                # cache-miss + persist + render-dict-population pattern is
+                # owned by exactly one place (also used by the search
+                # service's post-fusion column-hit hook).
                 for col in cat_uncached:
-                    try:
-                        sample = await profile_column(
-                            model=model, column=col, engine=engine,
-                        )
-                    except Exception as exc:
-                        logger.warning(
-                            "inspect_model: failed to profile %s.%s.%s: %s",
-                            model.data_source, model.name, col.name, exc,
-                        )
-                        sample = None
-                    if sample is None:
-                        continue
-                    if sample.sampled is not None:
-                        profile_by_name[col.name] = sample.sampled
-                    profile_values_by_name[col.name] = sample.sampled_values
-                    distinct_count_by_name[col.name] = sample.distinct_count
-                    await _persist_sample(
-                        col_name=col.name,
-                        sampled=sample.sampled,
-                        sampled_values=sample.sampled_values,
-                        distinct_count=sample.distinct_count,
+                    refreshed = await ensure_column_sample_fresh(
+                        model=model, column=col,
+                        engine=engine, storage=storage,
                     )
+                    # On any failure (profile raise / None / persist raise)
+                    # the helper returns the INPUT column. Legacy ``sampled``
+                    # text on the input still feeds the markdown cell — the
+                    # pre-pass at lines 1347-1354 has already populated
+                    # ``profile_by_name[col.name]`` from ``col.sampled``,
+                    # so we only overwrite when we actually have something
+                    # fresher (avoids clobbering the legacy fallback with
+                    # ``None`` and producing an empty cell).
+                    if refreshed.sampled is not None:
+                        profile_by_name[col.name] = refreshed.sampled
+                    profile_values_by_name[col.name] = refreshed.sampled_values
+                    distinct_count_by_name[col.name] = refreshed.distinct_count
 
                 # Numeric/temporal: one batched min/max query for all of
                 # them at once (restores the pre-DEV-1480 batching for
@@ -2825,7 +2823,9 @@ async def forget_memory(id: Any) -> str:  # noqa: A002 — MCP arg name
 
     # ---------- DEV-1375: semantic search -----------------------------
 
-    search_service = SearchService(storage=storage)
+    # DEV-1516: pass the engine so the search service's post-fusion
+    # column-hit hook can auto-refresh stale categorical columns.
+    search_service = SearchService(storage=storage, engine=engine)
 
     @mcp.tool()
     async def search(