DEV-1452: Delete legacy enrichment pipeline + migrate pre-existing tests#144
Conversation
…hedMeasure DEV-1452 prerequisite #3 — first of four staged commits. Net-neutral refactor of slayer/sql/generator.py's aggregation-emission codebase so the legacy `EnrichedMeasure` type can be deleted (Stage D) without forking dialect SQL emission. Existing test_sql_generator.py + unit suite (3688 passed, 0 regressions) act as the byte-identical SQL parity gate. What changed - New frozen Pydantic `AggRenderSpec` in slayer/sql/generator.py with exactly 11 fields (locked decision #4 in the approved plan): {sql, name, model_name, aggregation, aggregation_def, agg_kwargs, alias, filter_sql, time_column, type, column_type}. Drops the ticket's illustrative `agg_args` / `source_measure_name` / `distinct` — the dialect helpers don't read any of those; count_distinct dispatches by agg name, and first/last's positional time arg is pre-resolved into `time_column` at spec-build time. - 7 dialect helpers refactored to consume AggRenderSpec: `_build_agg`, `_build_formula_agg`, `_build_percentile`, `_build_stat_agg`, `_build_ranked_subquery_from_planned`, `_resolve_value_sql`, `_resolve_agg_param`. `_build_last_ranked_from` is deliberately NOT refactored — it's the legacy ranked builder that dies wholesale in Stage D. - `_synthesize_enriched_measure_from_planned` (8 call sites in generator.py) renamed to `_build_agg_render_spec_from_planned` and now returns `AggRenderSpec` directly — no more synthetic-EnrichedMeasure adapter on the typed-pipeline path. - Transitional shim `_agg_render_spec_from_enriched(em)` — a pure field-mapping function the legacy `SQLGenerator.generate(enriched=…)` path uses to feed the refactored helpers. Five legacy call sites in generator.py (cross-model, ranked-from, having-rewrite, base-select, formula-agg-wrapper) and twelve direct-call sites in tests/test_sql_generator.py wrap via this shim. Deleted in Stage D along with the rest of the legacy stack. - `AggregateKey.args` schema loosened from `Tuple[Scalar, ...]` to `Tuple[_AggregateArgValue, ...]` where `_AggregateArgValue = Union[ColumnKey, ColumnSqlKey, Decimal, str, bool, None]`. This was authorized by the user during Stage A after a bug was uncovered: `_bind_agg_arg` (binding.py:818-819) returns `ColumnKey` for identifier positional args, but the old strict-scalar schema rejected them — so `amount:last(created_at)` failed at AggregateKey validation. End-to-end repro on a fresh fixture confirmed the breakage. The legacy `_synthesize_enriched_measure_from_planned`'s `for a in key.args: if isinstance(a, ColumnKey)` branch was actually correct and forward-looking but had been unreachable. The schema fix aligns AggregateKey with the binder's actual output and matches the existing `_AggregateKwargValue` union shape used for kwargs. Tests - tests/test_agg_render_spec.py (new, 666 lines, 48 tests) — the AggRenderSpec field surface + frozen contract + every _build_agg_render_spec_from_planned call-site shape: StarKey count and rejection branches, ColumnKey bare, ColumnKey with column_filter_key, ColumnSqlKey derived, first/last with explicit time arg, custom aggregation_def threading, percentile parametric, stat agg with other=, unknown aggregation, kwarg ColumnKey path mismatch, column-not-found, slot=None HAVING branch, slot.type vs column_type independence. - tests/test_agg_render_spec_shim.py (new, 252 lines, 33 tests) — per-field shim mapping + 9 parametrized SQL byte-identical parity tests (postgres dialect) against legacy outputs captured pre-refactor (basic sum, count*, filtered sum, filtered count*, percentile, stat corr with other=, non-aggregation, sql=None bare-column sum, median). Verification - `poetry run pytest -m "not integration" -q` — 3688 passed, 2 skipped, 251 deselected. No regressions. - `poetry run ruff check slayer/ tests/` — clean. Codex pre-impl review (Stage A test plan): 6 findings, all folded in. Codex post-impl review (Stage A diff): 3 findings. - LOW (test_sql_generator.py:2235 direct _build_percentile calls): fixed in this commit (14 sites wrapped via the shim). - 2 MEDIUM findings on cross-model reroot path (cross_model_planner.py _local_agg_formula, generator.py _maybe_reroot_cross_model_plan) — tracked as follow-up alongside a third related gap (_build_first_last_base_select doesn't honor spec.time_column from args). All three are end-to-end gaps for `first/last(time_col)` that the legacy _query_as_model path currently masks; must be addressed in Stage B (the query-backed expansion migration) before Stage D deletes legacy. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughRefactors SQL aggregation rendering to use a frozen AggRenderSpec model, expands AggregateKey argument identity to accept column identifiers, migrates builder and planned-query paths to spec-driven emission, and adds Stage B engine/bundle/planner changes plus tests and parity checks. ChangesAggregation rendering refactor
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
slayer/sql/generator.py (2)
7215-7230:⚠️ Potential issue | 🟠 Major | ⚡ Quick winPreserve built-in
Aggregationoverrides on the planned path.
agg_defis only populated for non-built-ins here, so planned queries lose model-defined overrides/default params for built-in aggregations likeweighted_avgandpercentile. That breaks_build_formula_agg/_resolve_agg_paramfallback behavior and can turn a previously valid model-level default into a missing-param error.Suggested fix
- agg_def = None - if key.agg not in _BUILTIN_BAREARG_AGGS_LOCAL_SLICE: - agg_def = next( - (a for a in (source_model.aggregations or []) if a.name == key.agg), - None, - ) - if agg_def is None: - raise AggregationNotAllowedError( - column=src_leaf, - agg=key.agg, - reason=( - f"unknown aggregation {key.agg!r} — not a built-in " - f"and not defined in {source_model.name!r}." - f"aggregations." - ), - ) + agg_def = next( + (a for a in (source_model.aggregations or []) if a.name == key.agg), + None, + ) + if agg_def is None and key.agg not in _BUILTIN_BAREARG_AGGS_LOCAL_SLICE: + raise AggregationNotAllowedError( + column=src_leaf, + agg=key.agg, + reason=( + f"unknown aggregation {key.agg!r} — not a built-in " + f"and not defined in {source_model.name!r}." + f"aggregations." + ), + )🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@slayer/sql/generator.py` around lines 7215 - 7230, The current logic skips looking up model-defined Aggregation overrides for built-in aggs, so planned queries lose model defaults; change the code in the block around agg_def and key.agg so you always attempt to find a matching aggregation in source_model.aggregations (assign agg_def = next(...)) regardless of whether key.agg is in _BUILTIN_BAREARG_AGGS_LOCAL_SLICE, and only raise AggregationNotAllowedError if key.agg is not a builtin and no agg_def was found; this preserves model-defined overrides used by _build_formula_agg and _resolve_agg_param while keeping the existing disallow rule for unknown non-builtins.
7200-7250:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy liftComplete the new
ColumnSqlKeysupport in this adapter.After widening aggregate args/kwargs to accept derived-column identifiers, this branch still special-cases only
ColumnKey. That leaves two correctness gaps:first/lastwith a derived time arg never setstime_columnand silently falls back to the default ranking column, and cross-model kwarg path validation can be bypassed with aColumnSqlKey, letting the kwarg render in the wrong scope.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@slayer/sql/generator.py` around lines 7200 - 7250, The branch that handles time-column extraction and kwarg path validation only treats ColumnKey but must also treat derived-column identifiers (ColumnSqlKey); update the first/last arg loop (the block that sets explicit_time_col) to treat instances of ColumnSqlKey the same as ColumnKey (inspect their .path and leaf-style identifier to build explicit_time_col or fall back to source_relation) and break once set, and update the kwarg validation loop (the for kname, kval in key.kwargs block) to include ColumnSqlKey in the isinstance check so that kval with type ColumnSqlKey and a mismatched .path triggers the same AggregationNotAllowedError; keep referencing key.agg, explicit_time_col, source_relation, source.path, and AggregationNotAllowedError to locate the spots to change.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@slayer/sql/generator.py`:
- Around line 7215-7230: The current logic skips looking up model-defined
Aggregation overrides for built-in aggs, so planned queries lose model defaults;
change the code in the block around agg_def and key.agg so you always attempt to
find a matching aggregation in source_model.aggregations (assign agg_def =
next(...)) regardless of whether key.agg is in
_BUILTIN_BAREARG_AGGS_LOCAL_SLICE, and only raise AggregationNotAllowedError if
key.agg is not a builtin and no agg_def was found; this preserves model-defined
overrides used by _build_formula_agg and _resolve_agg_param while keeping the
existing disallow rule for unknown non-builtins.
- Around line 7200-7250: The branch that handles time-column extraction and
kwarg path validation only treats ColumnKey but must also treat derived-column
identifiers (ColumnSqlKey); update the first/last arg loop (the block that sets
explicit_time_col) to treat instances of ColumnSqlKey the same as ColumnKey
(inspect their .path and leaf-style identifier to build explicit_time_col or
fall back to source_relation) and break once set, and update the kwarg
validation loop (the for kname, kval in key.kwargs block) to include
ColumnSqlKey in the isinstance check so that kval with type ColumnSqlKey and a
mismatched .path triggers the same AggregationNotAllowedError; keep referencing
key.agg, explicit_time_col, source_relation, source.path, and
AggregationNotAllowedError to locate the spots to change.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: ec79ae24-98a2-4b2c-b77d-92090b54c3ee
📒 Files selected for processing (5)
slayer/core/keys.pyslayer/sql/generator.pytests/test_agg_render_spec.pytests/test_agg_render_spec_shim.pytests/test_sql_generator.py
…n-of-syntax' into egor/dev-1452-delete-legacy-enrichment-pipeline-migrate-pre-existing-tests
Two findings collapse to one method (slayer/sql/generator.py:7143 _build_agg_render_spec_from_planned): - Codex MAJOR on PR #144: the explicit-time-arg loop only matched ColumnKey, silently dropping ColumnSqlKey args. With the Stage A schema loosen, the binder returns ColumnSqlKey for derived columns; amount:last(derived_time) would fall back to the query's default ranking column instead of ordering by the explicit derived expression — incorrect first/last results with no error. - Sonar python:S3776 on the same method: cognitive complexity 43 > 15. Extracts a new _resolve_explicit_time_col helper that handles both ColumnKey and ColumnSqlKey time args. Bare-identifier derived columns expand to their underlying SQL qualified under source_relation; non-trivial derived expressions are emitted as-is and their inner bare refs resolve against the ranked-subquery's FROM clause. Cross- model paths on ColumnSqlKey args raise NotImplementedError rather than silently emitting against the wrong relation alias — that case is tracked alongside the Stage B cross-model reroot bug (DEV-1452 latent bug (c) in the Linear comment). The extraction drops the caller's complexity below 15 as a side effect. End-to-end caveat: first/last with an explicit time arg renders correctly only when a time dimension is also present in the query (default_time_col_sql comes from _resolve_ranking_time_column_from_planned). With no TD, deferred bug (b) — _build_first_last_base_select raises before any spec is rendered — still bites. Stage B fixes that boundary as planned. Tests in tests/test_agg_render_spec.py: - test_last_with_derived_bare_time_column_local (bare-identifier derived column expands to underlying SQL) - test_first_with_derived_expression_time_column_local (non-trivial derived expression round-trips through sqlglot, qualified inside the ranked subquery's FROM) - test_cross_model_derived_time_column_raises (NotImplementedError) - test_unknown_derived_time_column_raises (ValueError mirroring the source-column lookup-miss path) Unit suite: 3758 passed, 7 skipped, 0 failed (flight tests deselected — pyarrow not installed in this env). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…anned Sonar python:S3776 on slayer/sql/generator.py:3822 — cognitive complexity 33 > 15 on _build_ranked_subquery_from_planned. Extracts the two sub-passes into named helpers, leaving the outer method as setup → unfiltered pass → filtered pass → assembly: - _build_unfiltered_rn_columns: one ROW_NUMBER projection per distinct effective time column for unfiltered first/last specs, with stable per-time-column suffixes (first sorted gets "", then "_2", ...). - _build_filtered_rn_columns: one dedicated (ROW_NUMBER + match-flag) pair per distinct (filter, time, agg) triple for filtered first/last, cached so specs sharing a triple reuse the same pair. Pure structural refactor — no behaviour change. The dedup semantics, stable suffix assignment, ORDER BY direction, and CASE WHEN match-flag emission are byte-identical to the pre-split implementation. Unit suite: 3758 passed, 7 skipped, 0 failed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sonar python:S8396 on slayer/sql/generator.py:60 — `sql: Optional[str]` without a default flags as a Pydantic anti-pattern. The intent here is "required field that can hold None" (`*:count` needs to pass `sql=None` explicitly). Per the rule's documented exception, `Type | None` is treated as explicit nullability and does not trigger the warning. Switch `sql: Optional[str]` to `sql: str | None` — preserves the required-but-may-be-None semantics, dodges the false-positive, and adds a docstring note pinning the intent so future readers don't "fix" it back to `Optional[str]`. Also wires the new DEV-1476 ticket reference into the group 1 helper docstring + error message + test comment (DEV-1476 is the consolidated 4-bug Stage B package opened today for the explicit-time-arg gaps on first/last; the cross-model derived-time NotImplementedError in `_resolve_explicit_time_col` is bug (c) of that package). Unit suite: 3758 passed, 7 skipped, 0 failed. Ruff: clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two outside-diff CodeRabbit findings on the spec-build adapter, both verified against current code and fixed minimally. Finding A — slayer/sql/generator.py:~7350 (agg_def lookup for built-ins) The lookup for source_model.aggregations was gated on 'key.agg not in _BUILTIN_BAREARG_AGGS_LOCAL_SLICE'. That set includes weighted_avg / percentile / corr / etc., so a user model that overrode the default params of a built-in (e.g. supplied a default weight=quantity for weighted_avg) had the override silently dropped on the planned path — _resolve_agg_param then raised 'Aggregation \'weighted_avg\' requires parameter \'weight\''. Real regression vs legacy; the pre-existing comment above the set even admits this as a deferred follow-up. Fix: always look up the model-level Aggregation; only raise when the lookup misses AND key.agg is not a built-in. Built-ins keep their default renderer when no override exists. Updated the comment above _BUILTIN_BAREARG_AGGS_LOCAL_SLICE to retire the obsolete TODO. Finding B — slayer/sql/generator.py:~7375 (kwarg path validation gap) The for-kname-kval-in-kwargs loop only matched ColumnKey, so a ColumnSqlKey kwarg with a non-empty path slipped past the host-vs-target-path check and would have rendered against the wrong relation. Bug surfaced by the Stage A AggregateKey.args / kwarg-value schema loosen (which now permits ColumnSqlKey as a kwarg value). Fix: extend the isinstance check to (ColumnKey, ColumnSqlKey); use type(kval).__name__ in the error reason so the message reflects the actual rejected kind. Both Key types carry .path, so the equality check works unchanged. CodeRabbit's third sub-finding on the same comment (derived time arg for first/last silently dropping ColumnSqlKey) was already addressed in group 1 via _resolve_explicit_time_col + tests; verified no residual gap. Tests added in tests/test_agg_render_spec.py: - test_builtin_aggregation_model_override_threaded: model-level weighted_avg override surfaces in spec.aggregation_def with default weight=quantity. - test_builtin_aggregation_no_override_leaves_def_none: a vanilla built-in (sum) with no override keeps aggregation_def=None so the built-in renderer dispatches. - test_kwarg_columnsqlkey_path_mismatch_raises: a ColumnSqlKey kwarg with a non-empty path against a local source raises the same AggregationNotAllowedError as the ColumnKey case. - _orders_model() fixture grew a weighted_avg model-level override so the built-in-override tests don't need a parallel fixture. Unit suite: 3933 passed, 7 skipped, 0 failed. Ruff: clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sonar python:S3776 re-flagged both functions after groups 1-4 splits:
- _build_ranked_subquery_from_planned cognitive complexity 21 > 15
(was 33 pre-Group-2 split)
- _build_agg_render_spec_from_planned cognitive complexity 26 > 15
(was 43 pre-Group-1 extract)
Hybrid approach (option c on the plan):
1) Extract two more cohesive helpers from _build_agg_render_spec_from_planned:
- _resolve_aggregation_def: model-level Aggregation lookup + raise on
unknown non-built-in. Encapsulates the lookup-runs-for-built-ins-too
semantics (CodeRabbit fold-in on PR #144 group 4) so a future reader
can see the contract in one place.
- _validate_aggregate_kwarg_paths: cross-model kwarg path validation
for both ColumnKey and ColumnSqlKey. Tightens the dispatch in the
main method down to a single line per concern.
2) NOSONAR(S3776) both functions with documented rationale:
- The spec-build adapter is fundamentally a sequential isinstance
dispatch over StarKey / ColumnKey / ColumnSqlKey; each branch has
its own contract. Further splitting would scatter the per-source-
kind contract across helpers and obscure flow.
- The ranked subquery builder, after Group 2 factored
_build_unfiltered_rn_columns / _build_filtered_rn_columns, is left
with exp.Select / from / joins / where assembly that has to live in
one place.
Pure refactor — no behaviour change. Existing tests pin the integrated
behaviour through the public method.
Unit suite: 3933 passed, 7 skipped, 0 failed.
Ruff: clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three legacy-stack call sites (_expand_query_backed_model, _validate_and_populate_cache, get_column_types) now route through build_resolved_source_bundle → plan_stages → generate_planned_stages instead of the legacy _query_as_model / _enrich / SQLGenerator.generate chain. Folds in DEV-1476 bugs (b), (c), (d-cross) so query-backed models using first / last with an explicit positional time arg keep working end-to-end after Stage D deletes the legacy fallback. Helpers extracted (decisions B + F): * slayer/sql/stage_wrapper.py::build_flat_rename_wrapper — shared with generate_planned_stages' CTE chaining + the virtual-model wrap. * slayer/engine/stage_ordering.py::topologically_order_stages — Kahn topo-sort, called by both the runtime-list execute path and the stored source_queries save path. _extract_sibling_refs now recursively walks inline SlayerModel.source_queries (decision E). * slayer/engine/source_bundle.py::expand_query_backed_models_in_bundle — source + referenced + stage_source expansion + inline_extensions re-apply in one place; recursion guard via _resolving handles a query-backed join target that references its parent. Typed primitives extended (decisions #2, #8): * StageColumn.format retyped Optional[NumberFormat]. * ValueSlot.format / .description added. * DeclaredMeasure.type / .format / .description carry through ProjectionPlanner intern → ValueSlot → StageColumn → virtual Column. * Slot-type inference: *:count → INT, count/avg/median → built-ins, sum/min/max inherit source column type. Replaces legacy blanket DOUBLE for virtual-model columns (decision #2). DEV-1476 fixes (B.6): * (b) _build_first_last_base_select now only raises when an explicit time arg is also missing from every first/last spec. * (c) _render_cross_model_cte reroots positional ColumnKey / ColumnSqlKey args symmetric to kwargs, and wraps the FROM in a ROW_NUMBER-ranked subquery for first/last aggregates. * (d-cross) _build_agg_render_spec_from_planned threads ``bundle`` so derived ColumnSqlKey aggregate sources get expand_derived_refs_ sync (their inner bare refs qualify under source_relation). * tests/test_agg_render_spec.py::TestBuilderFirstLast flipped from _raises to _resolves_after_reroot. Migrated-path acceptance (decision D): * SourceModelOrigin / agg_column_names dropped from the virtual model. Outer dotted-ref re-aggregation against canonical aliases uses the flat ``__``-joined name (P4 closure); a dotted ref that matched the legacy lineage walk now surfaces a clear binder error. * The two ContextVars (_join_target_resolving_var, _forbidden_sibling_refs_var) are NOT touched by the migrated paths. They remain alive for _query_as_model until Stage D. Test additions / migrations: * tests/test_query_backed_typed_expansion.py — 32 new tests covering topo-sort acceptance / cycle / root-as-sink / self-ref / duplicate / inline-nested decision E, virtual-model column types / format / description / default_time_dimension / public-only slots, P4 closure pin, ContextVar non-touch, save path with placeholder fill + data_source refresh + save=False, nested query-backed expansion (qb-A joins qb-B / overlay-over-qb-base / runtime stage source), cross-DS rejection, wrapper alias shapes (local agg / cross-model agg / cross-model parametric agg / joined dim). * tests/test_dev1476_first_last_explicit_time.py — 6 tests (3 helper-level _local_agg_formula args reroot + 3 end-to-end against seeded SQLite for bugs b / c / d-cross). * tests/test_topologically_order_stages.py — 12 tests pinning the extracted helper + decision E recursion (typed / dict / nested ModelExtension / inline-nested cycle). * tests/test_stage_wrapper_helper.py — 4 tests pinning the extracted build_flat_rename_wrapper. * tests/test_query_backed_models.py — loosened the inner-stage-rename regex to accept the quoted alias form (DEV-1452 Stage B emits ``AS "rev"``; pre-Stage-B legacy emitted ``AS rev``). * tests/test_projection_trim.py — migrated test_get_column_types_uses_outer_render_mode → _does_not_call_ legacy_generate (the typed pipeline has no render_mode contract). Verification: * poetry run pytest -m "not integration" -q — 3985 passed, 0 failed. * poetry run ruff check slayer/ tests/ — clean. * SQLite + DuckDB integration smoke — 131 passed. * Query-backed model tests — 46 passed. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@slayer/engine/planning.py`:
- Around line 119-120: When promoting a hidden slot to declared, backfill its
missing metadata (type/label/format/description) so declaration-order doesn't
matter: update the _merge_into_existing(...) path to, when visibility changes
from hidden to declared (or when existing metadata fields are None/empty), copy
over non-null fields from the incoming slot (type, label, format, description)
into the existing slot; ensure the same logic is applied in the "existing-key"
branch that currently drops these fields (the code around the existing-key
promotion at lines ~511–513). Use the slot object and its visibility
attribute/name (e.g., Slot.visibility, Slot.type, Slot.label, format,
description and the _merge_into_existing function) as anchors to find and modify
the logic so promoted slots receive the full metadata from the declaring input.
In `@slayer/engine/query_engine.py`:
- Around line 1188-1193: The persisted model can contain non-topological
source_queries which later breaks _execute_by_name and
_scope_named_queries_to_prior due to insertion-order sibling visibility; to fix,
ensure source_queries are normalized to topological order either at save-time
(call topologically_order_stages on model.source_queries and persist that order)
or immediately before execution (rebuild/stabilize named_q by running
topologically_order_stages and using its output to set named_q and final_stage
prior to _execute_by_name); update the code paths that construct named_q (the
block using stages = topologically_order_stages(...), final_stage, named_q) so
the execution path always uses the topo-sorted list rather than the original
model.source_queries.
- Around line 1218-1225: The call to expand_query_backed_models_in_bundle is
passing final_stage.variables and thus loses the merged defaults/overrides
stored in bundle.query_variables; change the call to pass the merged root
variables (bundle.query_variables) instead of final_stage.variables so nested
query-backed join targets and stage sources receive the fully merged variables;
ensure you keep the same runtime_kwarg, dry_run_placeholders,
expander=self._expand_query_backed_model and the _resolving union ({model.name})
unchanged.
In `@slayer/engine/stage_ordering.py`:
- Around line 10-12: Update the module docstring in
slayer/engine/stage_ordering.py to correct the contract: replace the clause that
says "forward references all raise ``ValueError``" with wording that forward
references are reordered/handled (not rejected) by the helper (e.g., mention
that the helper will reorder forward references rather than raising), and keep
the rest of the documented failure cases (cycles, self-references, duplicate
names, root referenced by another stage) as raising ValueError; reference the
reordering behavior provided by the helper function (reorder logic used by
functions like reorder_stages or the stage ordering helper) to make intent
explicit.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: e18c62e4-a6b8-476b-85b8-488a89023e74
📒 Files selected for processing (16)
slayer/core/scope.pyslayer/engine/planned.pyslayer/engine/planning.pyslayer/engine/query_engine.pyslayer/engine/source_bundle.pyslayer/engine/stage_ordering.pyslayer/engine/stage_planner.pyslayer/sql/generator.pyslayer/sql/stage_wrapper.pytests/test_agg_render_spec.pytests/test_dev1476_first_last_explicit_time.pytests/test_projection_trim.pytests/test_query_backed_models.pytests/test_query_backed_typed_expansion.pytests/test_stage_wrapper_helper.pytests/test_topologically_order_stages.py
🚧 Files skipped from review as they are similar to previous changes (2)
- tests/test_agg_render_spec.py
- slayer/sql/generator.py
…adata Two MAJOR Codex findings around typed-metadata propagation through the planner. Both surface as wrong virtual-column ``type`` on query-backed models after the Stage B migration. * _declared_measures_from_query (stage_planner.py:~1005) ignored a user-supplied ``ModelMeasure.type`` on a query measure — only ``_type_for_measure_formula`` ran. Now ``m.type or _type_for_ measure_formula(...)``: explicit override wins, inference is the fallback. Mirrors legacy ``EnrichedMeasure.type``'s "honor explicit type before inference" contract. * ValueRegistry._merge_into_existing (planning.py:~221) dropped ``type`` / ``format`` / ``label`` / ``description`` on hidden→public promotion. Repro: declare ``rank(*:count)`` first (hoists ``*:count`` hidden) then declare ``*:count`` public — the promoted slot kept ``type=None`` and the migrated virtual column regressed to DOUBLE fallback. Now the merge fills any None-on-existing metadata field from the public re-intern; first-intern wins for already-set fields. Tests: * test_query_measure_type_override_wins_over_inference — explicit ``type=DOUBLE`` on ``*:count`` surfaces as DOUBLE despite the INT inference default. * test_promoted_hidden_slot_keeps_metadata — ``rank(*:count)`` + ``*:count`` ordering yields INT on the public ``_count`` column. Full non-integration suite green (3987 passed). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* S1172 stage_planner.py:840 — drop the unused ``formula`` param from ``_format_description_for_measure_formula``. The function takes ``scope`` + ``bound`` already; the formula string was a leftover from an earlier draft. * S100 test_query_backed_typed_expansion.py:698 — rename ``test_qb_A_joins_qb_B_save_path`` → ``test_qb_a_joins_qb_b_save_path`` to match ``^[a-z_][a-z0-9_]*$``. Full non-integration suite green (3987 passed). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Six CRITICAL S3776 cognitive-complexity hits on Stage B surface plus one pre-existing Stage A leftover. Each NOSONAR'd with a documented rationale on the def line. * generator.py:3832 _resolve_explicit_time_col (complexity 21, pre-existing from Stage A b67eecb) — sequential isinstance dispatch over ColumnKey + ColumnSqlKey; splitting fragments the per-shape contract. * generator.py:4068 _build_first_last_base_select (complexity 42) — dim/td/derived-dim classification pass + agg-spec synth + ranked-subquery wrap + outer SELECT/GROUP BY assembly; shared mutable state (partition_exprs / extra_projections / outer_ref_ by_sid / synth_by_sid) keeps it as one conceptual unit. * generator.py:5113 _render_cross_model_cte (complexity 56) — shared-grain projection + agg reroot (source / args / kwargs) + first/last ranked-subquery wrap + target-model-filter qualify + WHERE/HAVING routing; each block is interdependent CTE state. * query_engine.py:788 get_column_types (complexity 22) — linear probe pipeline: query-backed prelude → bundle → expand-nested → plan → render → execute → result-key map-back. * source_bundle.py:411 expand_query_backed_models_in_bundle (complexity 25) — three sequential expansion blocks (source + referenced + stage_source) mutating bundle in series; recursion guard is the only shared state. * stage_ordering.py:47 _walk_spec (complexity 49) — recursive walker over five ``source_model`` shapes plus nested ``source_queries`` recursion. The recursion + isinstance dispatch IS the function. Full non-integration suite green (3987 passed); ruff clean. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
… cross-model first/last fallback
Two MAJOR Codex findings on the round-1 fixes:
* query_engine.py:_expand_query_backed_model — passed
``outer_vars=final_stage.variables`` to
``expand_query_backed_models_in_bundle``, dropping the outer model's
``query_variables`` layer for nested expansions / sibling stages.
Substitution then diverged between execute (pre-merged path) and
save-time dry-run (placeholder ``0``). Now passes the bundle's
merged ``bundle.query_variables`` (which already includes
``{model.query_variables, outer_vars, stage, runtime}`` per the
layered precedence) so nested layers see the full variable context.
* generator.py:_render_cross_model_cte — cross-model ``first`` /
``last`` without an explicit positional time arg emitted the
``MAX(CASE WHEN _last_rn = 1 ...)`` expression but only built the
ROW_NUMBER-ranked subquery when ``synth.time_column`` was truthy.
Result: agg_expr referenced ``_first_rn`` / ``_last_rn`` against a
bare ``FROM target`` that didn't supply it. Now falls back to
``target_model.default_time_dimension`` (qualified under the
target relation); raises a clear ``first/last requires a ranking
time column`` error when even that is unset (mirrors the local
first/last path's behaviour).
Tests:
* test_cross_model_first_last_uses_target_default_time_dimension —
``customers.amount:last`` (no explicit time arg) against a
customers model with ``default_time_dimension="signup_at"``
executes end-to-end against seeded SQLite.
* test_cross_model_first_last_with_no_time_at_all_raises — same
query against a customers model WITHOUT default_time_dimension
raises the typed error.
Full non-integration suite green (3989 passed).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…act metadata-fill helper The round-1 metadata-carry fix added 4 parallel if-blocks to ``ValueRegistry._merge_into_existing``, bumping cognitive complexity to 18 (over the 15 cap). Extract a module-level ``_fill_missing_metadata`` that walks the four fields in a single loop and mutates the ``updates`` dict in place. The ``slot.field is None and new_value is not None`` rule is identical across the four fields, so the extraction simplifies rather than obscures. Full non-integration suite green (3989 passed); ruff clean. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
tests/test_dev1476_first_last_explicit_time.py (1)
242-245: ⚡ Quick winMove function-local imports to module scope.
These imports duplicate existing top-level imports and break the repo’s import-placement rule.
Proposed cleanup
- import os - import sqlite3 - import tempfile @@ - import os - import sqlite3 - import tempfileAs per coding guidelines "Place imports at the top of files".
Also applies to: 325-327
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_dev1476_first_last_explicit_time.py` around lines 242 - 245, The test contains function-local imports for os, sqlite3, and tempfile that duplicate top-level imports and violate the import-placement rule; move these imports (the occurrences around the two blocks that currently import "os", "sqlite3", "tempfile") to the module scope at the top of tests/test_dev1476_first_last_explicit_time.py, remove the duplicated local import statements inside the functions, and ensure any functions reference the now-top-level names (no other code changes needed).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tests/test_dev1476_first_last_explicit_time.py`:
- Around line 312-318: The test currently only asserts resp.data exists after
running engine.execute(SlayerQuery(...)) with the measure formula
"customers.amount:last", which allows regressions where the :last measure stops
using customers.default_time_dimension; update the test to explicitly assert
that the generated SQL or response metadata includes the expected default time
dimension (customers.default_time_dimension) and that the time filter or ORDER
BY in resp.sql references that dimension; locate the SlayerQuery call and add
assertions against resp.sql (and/or resp.data structure) to confirm the SQL
contains the default time dimension name and/or an appropriate time-based ORDER
BY or WHERE using that dimension so the fallback behavior is verified.
---
Nitpick comments:
In `@tests/test_dev1476_first_last_explicit_time.py`:
- Around line 242-245: The test contains function-local imports for os, sqlite3,
and tempfile that duplicate top-level imports and violate the import-placement
rule; move these imports (the occurrences around the two blocks that currently
import "os", "sqlite3", "tempfile") to the module scope at the top of
tests/test_dev1476_first_last_explicit_time.py, remove the duplicated local
import statements inside the functions, and ensure any functions reference the
now-top-level names (no other code changes needed).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 52582d13-3dda-47d9-a01b-b44797220286
📒 Files selected for processing (4)
slayer/engine/planning.pyslayer/engine/query_engine.pyslayer/sql/generator.pytests/test_dev1476_first_last_explicit_time.py
🚧 Files skipped from review as they are similar to previous changes (3)
- slayer/engine/planning.py
- slayer/engine/query_engine.py
- slayer/sql/generator.py
… call sites Codex MAJOR finding (generator.py:4229) flagged the local first/last ranked-subquery call site for skipping ``_expand_derived_column_sql`` on derived ``ColumnSqlKey`` aggregate sources because ``bundle`` was not passed to ``_build_agg_render_spec_from_planned``. Audit revealed the same gap at 4 sibling call sites: * generator.py:4229 _build_first_last_base_select (ranked-subquery path; Codex flagged). * generator.py:4340 _render_aggregate_composite_expr (single-agg- inside-arithmetic). * generator.py:5579 _render_filter_value_key_in_target_scope HAVING agg. * generator.py:6743 transform input agg. * generator.py:7851 _render_value_key_for_filter (filter ref). Two enclosing functions did not have ``bundle`` in scope and were extended to accept it: * _render_aggregate_composite_expr (4306) — gained ``bundle=None`` kwarg; its 5 recursive / external callers (3691, 4287, 4352, 4365, 8166) now thread ``bundle`` through. * _apply_order_limit_from_planned (8135) — gained ``bundle=None``; the one caller at 2901 threads it through. Test: * test_local_first_last_over_derived_column_expands_inner_refs — ``net_amount:last(created_at)`` where ``net_amount.sql="amount * 0.9"`` executes end-to-end against seeded SQLite; without the expansion the bare ``amount`` inside ``MAX(CASE WHEN _last_rn=1 ...)`` would render unqualified and SQLite would error. Full non-integration suite green (3990 passed); ruff clean. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Codex MAJOR (round 4) — sibling-stage variable substitution in the
migrated ``_expand_query_backed_model`` used ``root_vars =
final_stage.variables`` as a base layer, but
``apply_variables_to_query`` only substitutes into filters and never
updates ``.variables`` — so ``root_vars`` was just the stage-level
dict, dropping ``model.query_variables`` and the outer-query layer.
Repro: a query-backed model with ``query_variables={"threshold":
"100"}`` and a NAMED non-final sibling whose filter references
``{threshold}``. Pre-fix, save-time dry-run silently substituted
the placeholder ``"0"``. Post-fix the sibling sees the outer
default and substitutes ``100``.
Fix: swap ``root_vars`` for ``bundle.query_variables`` as the
base layer. ``bundle.query_variables`` is the merged
``{runtime > final_stage.variables > model.query_variables >
outer_vars > source_model_defaults}`` set so sibling substitution
sees the full outer context.
Test:
* test_sibling_stage_inherits_outer_model_query_variables — pins
the rendered SQL contains ``> 100`` (the outer model default),
not ``> 0`` (the placeholder).
Also dismissed Codex round-4 finding "_get_column_types rejects
__ columns" as a false positive — direct repro returns the
flattened ``customers__region`` key correctly; the binder's
literal-column carve-out for ``Column.name`` containing ``__``
works on the migrated path.
Full non-integration suite green (3991 passed); ruff clean.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…re.type Two Codex findings (1 MAJOR + 1 MINOR): * MAJOR query_engine.py:_execute_by_name — built ``named_queries`` from persisted ``source_queries`` order before ``_execute_pipeline``, but ``plan_stages._topo_sort`` only handles ``source_model`` deps. A model that saved successfully with out-of-order ``joins[].target_ model`` sibling stages (the save path goes through ``_expand_query_ backed_model`` which DOES topo-sort) would fail when run directly by name. Fix: call ``topologically_order_stages`` at the top of ``_execute_by_name`` so run-by-name matches save-time semantics. * MINOR stage_planner.py:_declared_measures_from_query — a query formula that is a bare reference to a saved ``ModelMeasure(formula =..., type=DataType.X)`` on the source model lost the explicit ``type=X``. ``expand_model_measures`` rewrites the AST but drops the measure's type metadata; the planner then fell back to inference, which only happened to match the explicit type by coincidence on the pre-existing test. Fix: extend the type- priority chain to ``m.type or _saved_model_measure_type(...) or _type_for_measure_formula(...)``. New helper ``_saved_model_measure_type`` re-looks-up the saved measure when the query formula is a bare identifier matching a ``ModelMeasure.name`` on the source model. Tests: * test_run_by_name_accepts_nonordered_join_target_stages — saves a model with out-of-order ``joins.target_model`` sibling, then runs it by name; pre-fix this would fail at ``plan_stages``. * test_saved_model_measure_type_wins_over_inference — ``ModelMeasure(formula="amount:sum", type=DataType.INT)`` is referenced from a query; pre-fix the resulting column had ``type=DOUBLE`` (inference), now ``type=INT`` (explicit). Full non-integration suite green (3993 passed); ruff clean. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…/github.com/MotleyAI/slayer into egor/dev-1452-delete-legacy-enrichment-pipeline-migrate-pre-existing-tests
- Codex major (generator.py): cross-model first/last computed ROW_NUMBER over the full target table, then applied target_model_filters + routed WHERE filters on the outer CTE. A filtered-out row could still win _last_rn = 1, NULL-ing out the MAX(CASE WHEN _last_rn = 1 ...) aggregate for that group. Push WHERE into the ranked subquery via the helper's existing where_clause param so RN is computed over filtered rows. Regression test pins last(amount) = 100.0 when the newer row is soft-deleted (instead of NULL). - CodeRabbit thread (test_dev1476): strengthen the default_time_dimension fallback test from "data exists" to an exact expected-value pin (NA last = 50.0 by signup_at). - CodeRabbit nitpick (test_dev1476): drop function-local import os/sqlite3/tempfile in two test bodies; they duplicate the module-top imports and break the imports-at-top rule. - CodeRabbit thread (stage_ordering.py): docstring claimed forward references raise ValueError, but the topo-sort reorders them — that is the entire fault-tolerance contract. Rewrite to list the actual raise cases (cycles, self-refs, dupes, root referenced by other) and call out forward-reference reordering explicitly. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…s-model first/last _render_cross_model_cte built the first/last aggregate via _build_agg BEFORE constructing the ROW_NUMBER-ranked subquery, passing a bare rn_suffix_map and discarding the subquery's filtered_rn_map / filtered_match_map. For a cross-model first/last over a target column carrying a Column.filter, the ranked subquery emits _last_rn_f0 / _match_f0 (skipping the bare _last_rn for filtered specs), so the aggregate referenced a _last_rn column the subquery never projects — "no such column: _last_rn". Reorder to build the ranked subquery first (mirroring the local first/last path), then thread its rn_suffix_map + filtered_rn_map + filtered_match_map into _build_agg. Also drops the previously-wrong tuple-keyed rn_suffix_map in favor of the subquery's correctly time-col-keyed map. Regression test: cross-model customers.active_amount:last where active_amount has filter="status = 'active'" returns the newest ACTIVE row's value (100.0), skipping newer inactive rows. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…qualify; dedup dev1476 tests Three review fixes from the process-reviews triage on PR #144: - stage_planner._type_for_dimension now walks the join chain to the terminal column's type for dotted (joined) dimensions instead of returning None. A None was coerced to DataType.DOUBLE in _query_as_model, mistyping joined string/temporal dims on the persisted query-backed virtual model (Codex #1). - generator._resolve_explicit_time_col routes a derived (ColumnSqlKey) first/last time arg through _expand_derived_column_sql so inner bare refs qualify to the source relation. An unqualified ref in a complex expression (e.g. date(created_at)) could bind to the wrong table when the ranked subquery includes a join with a same-named column (Codex #2). - Finish the in-progress dedup of test_dev1476_first_last_explicit_time: define the _engine_from_sql / _orders_model helpers the file already called, route the fixture through them, and drop the per-test SQLite boilerplate. Clears the SonarCloud new-code duplication gate and the CodeRabbit function-local-import nitpick. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
8b2239c
into
egor/dev-1450-principled-redesign-of-syntax



DEV-1452 — Delete legacy enrichment pipeline + migrate pre-existing tests
Stacked on:
egor/dev-1450-principled-redesign-of-syntax(PR #139).Linear: DEV-1452.
Status
Staged into four commits. Stage A landed; B / C / D remaining.
AggRenderSpec— decouple SQL generator fromEnrichedMeasure(prerequisite #3)_expand_query_backed_model+_validate_and_populate_cache+get_column_typesonto the typed pipeline; topo-sort storedsource_queries; extendStageColumn/ValueSlotwithformat/descriptiontyped primitives (prerequisite #1)test_aggregation_gating.py,test_cross_model_derived_columns.py, three direct_query_as_modelunit calls intest_query_backed_models.py)enrichment.py/enriched.py/MixedArithmeticField/parse_formula/ legacySQLGenerator.generate/_query_as_model/ both ContextVars + the legacy resolution machinery. Doc + comment scrubs.Stage A summary (the first commit)
AggRenderSpec(11 fields the dialect helpers empirically consume — drops the ticket's illustrativeagg_args/source_measure_name/distinct).AggRenderSpecinstead ofEnrichedMeasure:_build_agg,_build_formula_agg,_build_percentile,_build_stat_agg,_build_ranked_subquery_from_planned,_resolve_value_sql,_resolve_agg_param._build_last_ranked_fromdeliberately NOT refactored — it dies wholesale in Stage D._synthesize_enriched_measure_from_planned(8 call sites) →_build_agg_render_spec_from_planned._agg_render_spec_from_enriched(em)shim — legacySQLGenerator.generate(enriched=...)path keeps emitting byte-identical SQL through Stages B–C; shim deleted in Stage D.AggregateKey.argsschema loosened fromTuple[Scalar, ...]toTuple[_AggregateArgValue, ...](Union withColumnKey | ColumnSqlKey | Scalar) — see "Latent bugs" below.ruff check slayer/ tests/clean. Both Codex reviews passed (test plan + impl diff); 9 findings total, all folded in or tracked.Latent bugs surfaced (Stage A) — blocks Stage D
While writing the Stage A tests I uncovered three latent DEV-1450 bugs the legacy
_query_as_modelpath currently masks. They are load-bearing for DEV-1452: once Stage D deletes legacy, these break end-to-end for query-backed models usingfirst/last(time_col).(a)
AggregateKey.argsschema rejectedColumnKey—amount:last(created_at)failed at_bind_agg(binding.py:759). The legacy synth'sfor a in key.args: if isinstance(a, ColumnKey)branch was forward-looking and correct but unreachable. Fixed in this PR via the schema loosen (with user authorization).(b)
_build_first_last_base_selectdoesn't honorspec.time_columnfromkey.args(generator.py:~3971-3984). Soamount:last(created_at)binds (post fix) but render still fails with "no ranking time column resolvable" when there's no TD.(c) Cross-model reroot path (
cross_model_planner._local_agg_formula,generator._maybe_reroot_cross_model_plan) strips path from kwargs but leaveskey.argsuntouched. With positionalColumnKeyargs now valid, cross-modelcustomers.amount:last(created_at)keeps the host join path when rendered in target scope, and_local_agg_formularenders positionalColumnKeyas garbage formula text.(b) + (c) are tracked together; Stage B is the natural place to address them since the query-backed migration is what makes them reachable end-to-end.
Plan being executed
The full approved plan, verbatim:
DEV-1452 — Delete legacy enrichment pipeline + migrate pre-existing tests
Context
DEV-1450 cut the engine over from the legacy
_enrich/EnrichedQuery/legacy
SQLGeneratorpipeline to the typed pipeline(
build_resolved_source_bundle → normalize → plan_stages → generate_planned_stages). To keep that PR's size tractable the legacy stackstayed in tree — and in production: every query-backed model's backing SQL still
renders via
_query_as_model → enrich_query → SQLGenerator.generate(
_expand_query_backed_modelexecute path +_validate_and_populate_cachesave path), the new generator adapts back to
EnrichedMeasurevia_synthesize_enriched_measure_from_planned, and a third legacy consumer(
engine.get_column_types) builds its probe SQL through_enrich+SQLGenerator.generate.DEV-1452 is the follow-up that decouples the new generator from
EnrichedMeasure, migrates the three load-bearing legacy consumers onto thetyped pipeline, migrates pre-existing tests off the legacy modules, and deletes
the legacy stack. Net result: large LOC deletion, two net-neutral prerequisite
refactors first. No public API / MCP / REST / CLI / schema-version changes.
Locked decisions (with user)
source_queriestopo-sort: migrate toplan_stages. Kahntopo-sort accepts non-canonical order as a fault-tolerance contract. The
last stage stays root/sink; cycles, self-refs, duplicate names, and "root
referenced by another stage" still raise. Docs (CLAUDE.md,
docs/concepts/models.md) recommend writing stages top-to-bottom as thecanonical convention and document topo-sort as the planner's
fault-tolerance backstop. The two ContextVars
(
_forbidden_sibling_refs_var,_join_target_resolving_var) and_scope_named_queries_to_priorbecome deletable.Codex finding Welcome to SLayer Discussions! #1:
plan_stages._topo_sortis NOT a drop-in — it onlyconsiders
source_modelsibling refs, ignoresjoins.target_model, doesnot enforce root-as-sink, and does not reject "root referenced by another
stage".
SlayerQueryEngine._topologically_order_queries(
query_engine.py:387) already has the full semantics. Reuse it: extractthe helper to a module-level function (e.g.
slayer/engine/stage_ordering.py::topologically_order_stages) keyed by(source_model, joins.target_model, ModelExtension.source_name)references,then call it from BOTH the migrated
_expand_query_backed_model/_validate_and_populate_cache(beforeplan_stages) AND theruntime-list
_execute_pipelinepath (replacing the currentself._topologically_order_queriescall). Stage A is unaffected;Stage B extracts and reuses this helper before invoking
plan_stages.(
StageColumn.type/ValueSlot.type); accept the more accurate persistedtypes (
*:count→INT, declaredModelMeasure.type, source column type forsum/min/max).
Hoisted hidden slots are NOT exposed; they're planner-synthesized
intermediates the user never declared, and exposing them lets downstream
queries synthesize new aggregates from canonical-looking bare names. The
pre-existing
tests/test_projection_trim.py:903test_query_as_model_wraps_with_full_projectionstill passes via itssubstring OR-fallback (the alias appears inside the rendered SQL text);
no test contract change required.
AggRenderSpecfrozen Pydantic with the 11 fields the dialect helpersempirically consume:
{sql, name, model_name, aggregation, aggregation_def, agg_kwargs, alias, filter_sql, time_column, type, column_type}. Drops the ticket'sagg_args/source_measure_name/distinct(count_distinct dispatches byagg name; first/last positional time arg pre-resolved into
time_column)._rewrite_funcstyle_aggregationsRETAINED as a pure core string utility— live callers in
slayer/core/query.py's ORDER BY validators(
_coerce_order_column,OrderItem._capture_raw_formula) feed the newpipeline's ORDER BY resolution; the slack layer doesn't cover
query.order.parse_formula+MixedArithmeticFieldare deleted. Acceptance grep doesnot name
_rewrite_funcstyle_aggregations. Out-of-scope follow-up filed:fold ORDER BY funcstyle into the slack layer.
test_filter_renamed_measure.py: delete the 2 skipped DEV-1445/1446legacy-path tests (covered by
test_dev1445_*/test_dev1446_*); migratethe rest off
enrich_query.TestFuncStyleRewrite(31 tests): audit each againsttests/test_slack_normalization.py; delete the class and backfill anyuncovered cases as
FUNC_STYLE_AGGwarning tests intest_slack_normalization.py.StageColumn.formatretyped fromOptional[str]toOptional[NumberFormat];ValueSlotgetsformat: Optional[NumberFormat]anddescription: Optional[str]. The planner populates them at slot creationfrom the source
ModelMeasure/Columnand_infer_aggregated_format.Staged commits
Per
feedback_tdd_style: each stage's failing behavior tests land first, thenimplementation, then the full non-integration suite green
(
poetry run pytest -m "not integration" -q) before the next stage. Perfeedback_intermediate_commits_no_ask: each stage commits at its end withoutconfirmation.
Stage A —
AggRenderSpec: decouple the generator fromEnrichedMeasureFiles:
slayer/sql/generator.py.AggRenderSpecwith the 11 fields (decision Add image #4)._build_agg,_build_formula_agg,_build_percentile,_build_stat_agg,_build_ranked_subquery_from_planned,_resolve_value_sql,_resolve_agg_param,_wrap_cast_for_typeto consumeAggRenderSpec._synthesize_enriched_measure_from_planned(8 call sites atgenerator.py:3614, 3974, 4084, 5039, 5259, 6410, 7471, def at 7033) with_build_agg_render_spec_from_planned. Carry over verbatim: StarKeynon-count/args rejection, first/last
time_columnderivation fromkey.args, customaggregation_deflookup + unknown-agg error, kwargColumnKeypath validation,column_filter_key→ qualifiedfilter_sql._agg_render_spec_from_enriched(em: EnrichedMeasure)so thestill-live legacy
SQLGenerator.generatekeeps emitting byte-identical SQLthrough Stages B–C. Deleted in Stage D.
EnrichedMeasureimport; the shim'sEnrichedMeasurereference stays local until the shim dies._build_last_ranked_from(legacy ranked builder; readsEnrichedQuery.last_agg_time_column) is NOT refactored — it dies inStage D with the legacy generate path.
Failing tests written first:
tests/test_agg_render_spec.py— one_build_agg_render_spec_from_plannedunit per call-site shape: StarKey (count*) rejection branch; ColumnKey bare;
ColumnKey filtered; ColumnSqlKey derived; first/last with explicit time arg
other=;kwarg
ColumnKeypath validation; cross-model rerooted.tests/test_agg_render_spec_shim.py— roundtripEnrichedMeasure → AggRenderSpec → SQLmatches legacyEnrichedMeasure → SQLbyte-for-byte across the existing parametric SQLfixtures in
tests/test_sql_generator.py.Stage B — migrate query-backed expansion +
get_column_typesto typed pipelineFiles:
slayer/engine/query_engine.py,slayer/core/scope.py,slayer/engine/planning.py,slayer/engine/stage_planner.py.StageColumn.format: Optional[NumberFormat](retype fromOptional[str]).ValueSlot.format: Optional[NumberFormat],ValueSlot.description: Optional[str]._infer_aggregated_format(currently inslayer/engine/response_meta.py) into a shared helper so the planneruses it without depending on response_meta.
source
ModelMeasure/Columnand_infer_aggregated_format._expand_query_backed_model(query_engine.py:1341) body:stages = topologically_order_stages(list(model.source_queries))(theextracted helper from decision Welcome to SLayer Discussions! #1; enforces root-as-sink + cycle/self/
duplicate-name guards including
joins.target_modelawareness).for the final stage AND every named non-final stage, run
_normalize_stage(query=stage, bundle=…, sibling_names=…)andapply_variables_to_query(query=stage, variables=…, dry_run_placeholders=<as-flagged>)before planning. Mirror_execute_pipeline:657-674so the migrated path collects warnings andhonors variable precedence (runtime > stage > outer > model defaults).
build_resolved_source_bundle(query=stages[-1], named_queries={named non-final}, data_source=…, runtime_variables=…).plan_stages(queries=stages, …)→ rootPlannedQuery.generate_planned_stages(...)→ rendered backing SQL.fully-qualified result aliases like
"source_relation.public_alias"(e.g.
"orders.customers.revenue_sum"), NOT barepublic_alias. So theflat-rename wrapper is
SELECT "<root.source_relation>.<sc.public_alias>" AS "<sc.name>", … FROM (<rendered>) AS _inner,driven by
stage_schema.columns(filter-by-non-hidden publicValueSlots whenstage_schema is Nonefor single-stage queries).The
root.source_relationcomes from the rootPlannedQuery(thebundle's
source_model.namepost-expansion). Alternative: re-parse therendered SQL via sqlglot to read
named_selectsand mirror_stage_rename_wrapper— preferred if the prefix construction breakson cross-model parametric aliases.
columns = [Column(name=sc.name, sql=sc.name, type=sc.type, label=sc.label, format=sc.format, description=sc.description) for sc in <public_slots>];data_source= resolved base;default_time_dimensioncarried from theinner source model.
_apply_extension_overlayre-application loop in_execute_pipeline(handles rootModelExtensionoverlay preservation)._validate_and_populate_cache(query_engine.py:1657)similarly; thread
dry_run_placeholders=Trueviaapply_variables_to_query.engine.get_column_types(query_engine.py:1012): keep theexisting
_build_type_probe_query; bundle + plan + generate; execute viaclient.get_column_types(sql); map each result key back to bare columnnames by walking the root
PlannedQuery's aggregate slots. Codex findingAdd image #4: probe sources can be either
ColumnKey(uses.leaf) orColumnSqlKey(uses.column_name) — for derived columns. Read the barename via
getattr(slot.key.source, "leaf", None) or getattr(slot.key.source, "column_name", None), then mapbare → slot.public_alias→ result key.Assert every probeable column maps back; an unmapped column raises rather
than silently drops. No
_enrich/SQLGenerator.generatecalls remain onthis path.
_execute_pipelinecall sites atquery_engine.py:581/621/640retain their existing call shape; only theircallee body changes.
Failing tests written first (
tests/test_query_backed_typed_expansion.py):test_stored_source_queries_topo_sort_accepts_nonordered_save_path— save amodel whose stored
source_queriesare in non-topological order; execute byname; expect well-formed SQL.
test_stored_source_queries_cycle_raises_clearly.test_stored_source_queries_root_referenced_raises.test_stored_source_queries_self_reference_raises.test_query_backed_columns_inherit_slot_types—*:count→INT, declaredModelMeasure.typehonored, source column type for sum.test_query_backed_columns_inherit_format— percent-formatted sourcemeasure → virtual column carries
NumberFormat(type=PERCENT).test_query_backed_columns_inherit_description— source-column descriptionpropagates.
test_query_backed_columns_carry_default_time_dimension.test_query_backed_columns_exclude_hidden_hoisted_slots— a model withrank(expensenet:sum)andname="rev"exposes only the user-declaredcolumns; the hoisted
expensenet_sumis NOT inmodel.columns.test_get_column_types_uses_typed_pipeline— patchSQLGenerator.generateto detect calls; assert zero calls and correcttype map.
test_expand_query_backed_model_does_not_touch_contextvars— monkeypatchthe two ContextVars to detect set/get; assert no touch.
test_save_path_dry_run_placeholders— model with{var}filters savescleanly with no caller-supplied variables (placeholder fill).
Stage C — test migrations
Files (the 7 in the ticket + 3 surfaced by validation):
tests/test_formula.py— rewrite the 8MixedArithmeticFieldisinstanceassertions (lines 188, 225, 840, 1045, 1067, 1079, 1091, 1101) to
ScalarCallKey/ArithmeticKeyshapes viaslayer/engine/syntax.py::parse_expr; audit + deleteTestFuncStyleRewrite(31 tests starting line 632) and backfill any uncovered case as
FUNC_STYLE_AGGtests intests/test_slack_normalization.py; replace_collect_needed_paths/extract_filter_transformsimports withwalk_parsed_refstraversals or relocate.tests/test_filter_renamed_measure.py— delete the 2 skipped tests(lines ~466, ~499; DEV-1445/1446 covered by
test_dev1445_*/test_dev1446_*); migrate the rest offenrich_queryontoengine.execute(...)+response.sqlassertions.tests/test_help.py— replaceextract_filter_transformsuse with awalk_parsed_refs-based equivalent (docs-snippet structural assertions).tests/test_named_measures.py— migrateenrich_querycalls toengine.execute+ PlannedQuery/SQL assertions.tests/test_reference_semantics.py— migrate offenrich_query.tests/test_projection_trim.py— migrate offenriched/enrich_query;rewrite the
_query_as_modelwrap test andtest_get_column_types_uses_outer_render_mode(the latter asserts a deletedlegacy contract — rewrite to assert the typed-pipeline equivalent: probe
path no longer touches
SQLGenerator.generate). Codex finding Rename PyPI package to semantic-slayer #7: thecanonical_expression_key(parse_formula(...))block at lines 48 + 1501depends on
slayer.engine.enrichmentandparse_formula, both deleted inStage D. Port each assertion to
slayer/engine/syntax.py::parse_exprplusstructural equality on the resulting typed key (
ValueKeyfamily —ColumnKey/AggregateKey/TransformKey/ArithmeticKey/ScalarCallKey). Drop any case that was strictly an enrichment-internalcontract.
tests/test_sql_generator.py— migrate everyawait enrich_query(...)+SQLGenerator(...).generate(enriched=...)call toengine.execute(...)+response.sqlassertions; migrateTestAutoMoveFields*(3642+) toMISPLACED_MEASUREslack tests (or delete if covered); keep the_build_type_probe_querytest (pipeline-agnostic). Codex finding Filtering on computed columns #6:also migrate the direct
_resolve_dimension_via_joinstest (line 3897) andthe direct
_resolve_cross_model_measuretest (line 3909) — both privatehelpers are deleted in Stage D. Rewrite each as
parse_expr+bind_expr(ModelScope(source_model=…))assertions on the resultingBoundExpr/ValueSlot, or as end-to-endengine.execute(...)checks.tests/test_aggregation_gating.py— migrate the 5+engine._enrichSQLGenerator.generate(enriched=...)call sites (lines 64-65 andrelated) onto
engine.execute(...)against the same query.tests/test_cross_model_derived_columns.py— migrate the 11+legacy call sites (lines 49, 984, 988, 1024, 1074, 1094, 1126, 1162, 1195,
1229, 1269); the
_join_aliases(enriched)helper rewrites to inspectPlannedQuery.join_planor the rendered SQL.tests/test_query_backed_models.py— migrate the 3 directengine._query_as_modelunit tests (lines 1446, 1465, 1484) onto_expand_query_backed_model(post Stage B) or end-to-endengine.executeassertions. Codex finding Move starts next to license etc #5: also migrate (or delete)the direct
_get_join_target_resolvingtest at line ~890 — Stage Dremoves that engine helper. Replace with an end-to-end
concurrency / recursion-guard assertion through
engine.executeof aquery-backed model whose join target is itself query-backed.
tests/test_filter_renamed_measure.py:502,513,tests/test_memories_resolver_typed.py,tests/test_schema_drift_typed.py,tests/test_response_meta.py,tests/test_generator2_multistage.py:5,tests/integration/test_integration.py:1318,1340,1447.Stage D — deletions + cleanup
Delete whole files:
slayer/engine/enrichment.py.slayer/engine/enriched.py.Delete symbols in
slayer/core/formula.py:MixedArithmeticField(line 139).parse_formula(line ~482) + helpers used exclusively by it (verify bygrep): likely
_expand_named_measures,_preprocess_agg_refs._rewrite_funcstyle_aggregationsandparse_filter.Delete symbols in
slayer/sql/generator.py:SQLGenerator.generate(enriched=...)method body (def at 488) andits exclusively-legacy helpers:
_has_cross_model_filter(239),_is_windowed_measure(262),_build_last_ranked_from(1981),_resolve_order_column(enriched=...)legacy branch (verify), the_agg_render_spec_from_enrichedshim (added Stage A).EnrichedMeasure/EnrichedQuery/public_projection_aliasesimport.
Delete symbols in
slayer/engine/query_engine.py(each gated on grep provingzero non-legacy callers + suite green; the new pipeline does not call any of
these):
_query_as_model(1862),_enrich(1685), thefrom slayer.engine.enrichment import enrich_queryimport (29),_build_rerooted_enriched(2337),_resolve_cross_model_measure(2182),_render_query_backed_join_target(1816),_scope_named_queries_to_prior(214),_get_join_target_resolving(203),_join_target_resolving_var(61),_forbidden_sibling_refs_var(72),_auto_move_fields_to_dimensions(2107),_resolve_model(1458),_resolve_model_inner(1492),_resolve_query_model(1371),_resolve_dimension_via_joins(2044),_resolve_dimension_with_terminal(2060),_walk_join_chainengine-method shim (2085).
slayer/engine/path_resolution.py::walk_join_chain(separatemodule with its own tests; survives the shim deletion).
Doc-comment scrub for the acceptance regex (no behavior change):
slayer/core/refs.py,slayer/core/models.py,slayer/core/query.py,slayer/engine/response_meta.py,slayer/mcp/server.py,slayer/sql/sql_expr.py,slayer/engine/normalization.py:17,280,slayer/engine/schema_drift.py:520,slayer/engine/syntax.py:276.Docs:
CLAUDE.md: update the stored-source_queries note to "write top-to-bottomis the canonical convention; planner topo-sorts as fault tolerance"; drop
_query_as_modeland ContextVar references; add anAggRenderSpeclineunder SQL emission; tighten the
__-carve-out wording (legacy persistedquery-backed columns remain reachable only as passive
Column.namematchesin
ModelScope).docs/architecture/index.md: resolve deviations Welcome to SLayer Discussions! #1 (legacy stack deleted)and Add image and link to starts #3 (new generator no longer adapts back to
EnrichedMeasure); note thenew out-of-scope follow-up "fold ORDER BY funcstyle into the slack layer
(P0 completion)".
docs/architecture/engine-orchestration.md: drop the "where legacy runs"deviation section; update the
_execute_pipelinemermaid to remove the(LEGACY path)qualifier on_expand_query_backed_model.docs/architecture/sql-generation.md: drop the synthetic-EnrichedMeasureadapter deviation section; add a brief
AggRenderSpecparagraph.docs/architecture/scopes-and-bundle.md: update theStageColumnfieldtable to reflect typed
formatand planner-populateddescription.mkdocs build --strictmust pass.Critical files
slayer/engine/query_engine.py— primary surgery site (expansion,save-path, type-probe, deletions).
slayer/sql/generator.py— AggRenderSpec, dialect-helper refactor, legacygenerate deletion.
slayer/engine/stage_planner.py— populateStageColumn.format/description.slayer/engine/planning.py— populateValueSlot.format/description;consume the extracted
_infer_aggregated_format.slayer/core/scope.py— retypeStageColumn.formattoOptional[NumberFormat].slayer/engine/response_meta.py— extract_infer_aggregated_formatto ashared helper.
slayer/engine/enrichment.py,slayer/engine/enriched.py— deleted.slayer/core/formula.py—MixedArithmeticField,parse_formuladeletion (keep
_rewrite_funcstyle_aggregations+parse_filter).Reusable existing utilities
slayer/engine/source_bundle.py::build_resolved_source_bundle— bundleentry; eager P11 resolution.
slayer/engine/source_bundle.py::synthetic_model_from_stage_schema/stage_bundle_with_siblings— sibling-stage synthetic models.slayer/engine/stage_planner.py::plan_stages/plan_query— typedplanning; topo-sort built in.
slayer/sql/generator.py::generate_planned_stages/generate_from_planned— typed rendering.slayer/engine/variables.py::merge_query_variables/apply_variables_to_query— variable precedence + dry-run placeholders.slayer/engine/syntax.py::parse_expr/walk_parsed_refs— replacement forparse_formula/_collect_needed_paths/extract_filter_transformscallers.
slayer/engine/path_resolution.py::walk_join_chain— canonical join-chainwalker; survives the engine-method shim deletion.
slayer/engine/normalization.py::FUNC_STYLE_AGG/MISPLACED_MEASURE—slack-rule equivalents for legacy
_rewrite_funcstyle_aggregations(inquery.measures/filters scope) +
_auto_move_fields_to_dimensions.Verification
End-to-end gates:
Out-of-scope follow-ups
Requires de-coercing
OrderItemat construction so_rewrite_funcstyle_aggregationscan finally retire.slayer/engine/path_resolution.walk_join_chainretains any new-pipelinecall site; if not, consider deletion in a separate cleanup PR.
Implementation notes
feedback_unit_tests_only: each stage runspoetry run pytest -m "not integration" -q; full integration smoke oncebefore final PR.
feedback_no_inline_imports: all imports at module top(
AggRenderSpec, extracted_infer_aggregated_format).CLAUDE.md: keyword args for >1-param functions; ruff check ateach stage; per-stage commit with named-file
git add(nogit add -A).~28.9k insertions vs main). Before starting Stage A, scan DEV-1450's open
review comments — any further fixes touching
_synthesize_enriched_measure_from_planned,_query_as_model,_expand_query_backed_model,cross_model_planner.plan, orgenerate_planned_stageswill need absorbing into the AggRenderSpec /expansion design before this stage lands.
🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Improvements
Tests