DEV-1452: Delete legacy enrichment pipeline + migrate pre-existing tests by ZmeiGorynych · Pull Request #144 · MotleyAI/slayer

ZmeiGorynych · 2026-05-26T10:59:33Z

DEV-1452 — Delete legacy enrichment pipeline + migrate pre-existing tests

Stacked on: egor/dev-1450-principled-redesign-of-syntax (PR #139).
Linear: DEV-1452.

Status

Staged into four commits. Stage A landed; B / C / D remaining.

Stage	What	Status
A	`AggRenderSpec` — decouple SQL generator from `EnrichedMeasure` (prerequisite #3)	✅ landed (this PR's first commit)
B	Migrate `_expand_query_backed_model` + `_validate_and_populate_cache` + `get_column_types` onto the typed pipeline; topo-sort stored `source_queries`; extend `StageColumn`/`ValueSlot` with `format`/`description` typed primitives (prerequisite #1)	pending
C	Migrate the 7 pre-existing test files named in the ticket plus 3 surfaced during validation (`test_aggregation_gating.py`, `test_cross_model_derived_columns.py`, three direct `_query_as_model` unit calls in `test_query_backed_models.py`)	pending
D	Delete `enrichment.py` / `enriched.py` / `MixedArithmeticField` / `parse_formula` / legacy `SQLGenerator.generate` / `_query_as_model` / both ContextVars + the legacy resolution machinery. Doc + comment scrubs.	pending

Stage A summary (the first commit)

New frozen Pydantic AggRenderSpec (11 fields the dialect helpers empirically consume — drops the ticket's illustrative agg_args/source_measure_name/distinct).
7 dialect helpers refactored to consume AggRenderSpec instead of EnrichedMeasure: _build_agg, _build_formula_agg, _build_percentile, _build_stat_agg, _build_ranked_subquery_from_planned, _resolve_value_sql, _resolve_agg_param. _build_last_ranked_from deliberately NOT refactored — it dies wholesale in Stage D.
_synthesize_enriched_measure_from_planned (8 call sites) → _build_agg_render_spec_from_planned.
Transitional _agg_render_spec_from_enriched(em) shim — legacy SQLGenerator.generate(enriched=...) path keeps emitting byte-identical SQL through Stages B–C; shim deleted in Stage D.
AggregateKey.args schema loosened from Tuple[Scalar, ...] to Tuple[_AggregateArgValue, ...] (Union with ColumnKey | ColumnSqlKey | Scalar) — see "Latent bugs" below.
3688 unit tests pass, 0 regressions. ruff check slayer/ tests/ clean. Both Codex reviews passed (test plan + impl diff); 9 findings total, all folded in or tracked.

Latent bugs surfaced (Stage A) — blocks Stage D

While writing the Stage A tests I uncovered three latent DEV-1450 bugs the legacy _query_as_model path currently masks. They are load-bearing for DEV-1452: once Stage D deletes legacy, these break end-to-end for query-backed models using first/last(time_col).

(a) AggregateKey.args schema rejected ColumnKey — amount:last(created_at) failed at _bind_agg (binding.py:759). The legacy synth's for a in key.args: if isinstance(a, ColumnKey) branch was forward-looking and correct but unreachable. Fixed in this PR via the schema loosen (with user authorization).

(b) _build_first_last_base_select doesn't honor spec.time_column from key.args (generator.py:~3971-3984). So amount:last(created_at) binds (post fix) but render still fails with "no ranking time column resolvable" when there's no TD.

(c) Cross-model reroot path (cross_model_planner._local_agg_formula, generator._maybe_reroot_cross_model_plan) strips path from kwargs but leaves key.args untouched. With positional ColumnKey args now valid, cross-model customers.amount:last(created_at) keeps the host join path when rendered in target scope, and _local_agg_formula renders positional ColumnKey as garbage formula text.

(b) + (c) are tracked together; Stage B is the natural place to address them since the query-backed migration is what makes them reachable end-to-end.

Plan being executed

The full approved plan, verbatim:

DEV-1452 — Delete legacy enrichment pipeline + migrate pre-existing tests

Context

DEV-1450 cut the engine over from the legacy _enrich / EnrichedQuery /
legacy SQLGenerator pipeline to the typed pipeline
(build_resolved_source_bundle → normalize → plan_stages → generate_planned_stages). To keep that PR's size tractable the legacy stack
stayed in tree — and in production: every query-backed model's backing SQL still
renders via _query_as_model → enrich_query → SQLGenerator.generate
(_expand_query_backed_model execute path + _validate_and_populate_cache
save path), the new generator adapts back to EnrichedMeasure via
_synthesize_enriched_measure_from_planned, and a third legacy consumer
(engine.get_column_types) builds its probe SQL through _enrich +
SQLGenerator.generate.

DEV-1452 is the follow-up that decouples the new generator from
EnrichedMeasure, migrates the three load-bearing legacy consumers onto the
typed pipeline, migrates pre-existing tests off the legacy modules, and deletes
the legacy stack. Net result: large LOC deletion, two net-neutral prerequisite
refactors first. No public API / MCP / REST / CLI / schema-version changes.

Locked decisions (with user)

Stored source_queries topo-sort: migrate to plan_stages. Kahn
topo-sort accepts non-canonical order as a fault-tolerance contract. The
last stage stays root/sink; cycles, self-refs, duplicate names, and "root
referenced by another stage" still raise. Docs (CLAUDE.md,
docs/concepts/models.md) recommend writing stages top-to-bottom as the
canonical convention and document topo-sort as the planner's
fault-tolerance backstop. The two ContextVars
(_forbidden_sibling_refs_var, _join_target_resolving_var) and
_scope_named_queries_to_prior become deletable.
Codex finding Welcome to SLayer Discussions! #1: plan_stages._topo_sort is NOT a drop-in — it only
considers source_model sibling refs, ignores joins.target_model, does
not enforce root-as-sink, and does not reject "root referenced by another
stage". SlayerQueryEngine._topologically_order_queries
(query_engine.py:387) already has the full semantics. Reuse it: extract
the helper to a module-level function (e.g.
slayer/engine/stage_ordering.py::topologically_order_stages) keyed by
(source_model, joins.target_model, ModelExtension.source_name) references,
then call it from BOTH the migrated _expand_query_backed_model /
_validate_and_populate_cache (before plan_stages) AND the
runtime-list _execute_pipeline path (replacing the current
self._topologically_order_queries call). Stage A is unaffected;
Stage B extracts and reuses this helper before invoking plan_stages.
Virtual-model column TYPES come from slot types
(StageColumn.type / ValueSlot.type); accept the more accurate persisted
types (*:count→INT, declared ModelMeasure.type, source column type for
sum/min/max).
Virtual-model COLUMNS source = public slots only (P4 alignment).
Hoisted hidden slots are NOT exposed; they're planner-synthesized
intermediates the user never declared, and exposing them lets downstream
queries synthesize new aggregates from canonical-looking bare names. The
pre-existing tests/test_projection_trim.py:903
test_query_as_model_wraps_with_full_projection still passes via its
substring OR-fallback (the alias appears inside the rendered SQL text);
no test contract change required.
AggRenderSpec frozen Pydantic with the 11 fields the dialect helpers
empirically consume:
{sql, name, model_name, aggregation, aggregation_def, agg_kwargs, alias, filter_sql, time_column, type, column_type}. Drops the ticket's
agg_args/source_measure_name/distinct (count_distinct dispatches by
agg name; first/last positional time arg pre-resolved into time_column).
_rewrite_funcstyle_aggregations RETAINED as a pure core string utility
— live callers in slayer/core/query.py's ORDER BY validators
(_coerce_order_column, OrderItem._capture_raw_formula) feed the new
pipeline's ORDER BY resolution; the slack layer doesn't cover query.order.
parse_formula + MixedArithmeticField are deleted. Acceptance grep does
not name _rewrite_funcstyle_aggregations. Out-of-scope follow-up filed:
fold ORDER BY funcstyle into the slack layer.
test_filter_renamed_measure.py: delete the 2 skipped DEV-1445/1446
legacy-path tests (covered by test_dev1445_* / test_dev1446_*); migrate
the rest off enrich_query.
TestFuncStyleRewrite (31 tests): audit each against
tests/test_slack_normalization.py; delete the class and backfill any
uncovered cases as FUNC_STYLE_AGG warning tests in
test_slack_normalization.py.
Format/description propagation: extend typed primitives.
StageColumn.format retyped from Optional[str] to Optional[NumberFormat];
ValueSlot gets format: Optional[NumberFormat] and
description: Optional[str]. The planner populates them at slot creation
from the source ModelMeasure / Column and
_infer_aggregated_format.

Staged commits

Per feedback_tdd_style: each stage's failing behavior tests land first, then
implementation, then the full non-integration suite green
(poetry run pytest -m "not integration" -q) before the next stage. Per
feedback_intermediate_commits_no_ask: each stage commits at its end without
confirmation.

Stage A — `AggRenderSpec`: decouple the generator from `EnrichedMeasure`

Files: slayer/sql/generator.py.

New frozen Pydantic AggRenderSpec with the 11 fields (decision Add image #4).
Refactor _build_agg, _build_formula_agg, _build_percentile,
_build_stat_agg, _build_ranked_subquery_from_planned, _resolve_value_sql,
_resolve_agg_param, _wrap_cast_for_type to consume AggRenderSpec.
Replace _synthesize_enriched_measure_from_planned (8 call sites at
generator.py:3614, 3974, 4084, 5039, 5259, 6410, 7471, def at 7033) with
_build_agg_render_spec_from_planned. Carry over verbatim: StarKey
non-count/args rejection, first/last time_column derivation from
key.args, custom aggregation_def lookup + unknown-agg error, kwarg
ColumnKey path validation, column_filter_key → qualified filter_sql.
Temporary shim _agg_render_spec_from_enriched(em: EnrichedMeasure) so the
still-live legacy SQLGenerator.generate keeps emitting byte-identical SQL
through Stages B–C. Deleted in Stage D.
Drop the top-of-module EnrichedMeasure import; the shim's
EnrichedMeasure reference stays local until the shim dies.
_build_last_ranked_from (legacy ranked builder; reads
EnrichedQuery.last_agg_time_column) is NOT refactored — it dies in
Stage D with the legacy generate path.

Failing tests written first:

tests/test_agg_render_spec.py — one _build_agg_render_spec_from_planned
unit per call-site shape: StarKey (count*) rejection branch; ColumnKey bare;
ColumnKey filtered; ColumnSqlKey derived; first/last with explicit time arg
- filter; custom aggregation; percentile parametric; stat agg with other=;
  kwarg ColumnKey path validation; cross-model rerooted.
tests/test_agg_render_spec_shim.py — roundtrip
EnrichedMeasure → AggRenderSpec → SQL matches legacy
EnrichedMeasure → SQL byte-for-byte across the existing parametric SQL
fixtures in tests/test_sql_generator.py.

Stage B — migrate query-backed expansion + `get_column_types` to typed pipeline

Files: slayer/engine/query_engine.py, slayer/core/scope.py,
slayer/engine/planning.py, slayer/engine/stage_planner.py.

Extend typed primitives (decision Performance testing #8):
- StageColumn.format: Optional[NumberFormat] (retype from
  Optional[str]).
- ValueSlot.format: Optional[NumberFormat],
  ValueSlot.description: Optional[str].
- Extract _infer_aggregated_format (currently in
  slayer/engine/response_meta.py) into a shared helper so the planner
  uses it without depending on response_meta.
- The planner populates the new slot fields at projection time, from the
  source ModelMeasure/Column and _infer_aggregated_format.
Rewrite _expand_query_backed_model (query_engine.py:1341) body:
- stages = topologically_order_stages(list(model.source_queries)) (the
  extracted helper from decision Welcome to SLayer Discussions! #1; enforces root-as-sink + cycle/self/
  duplicate-name guards including joins.target_model awareness).
- Per-stage normalize + variable substitution (Codex finding Add image and link to starts #3):
  for the final stage AND every named non-final stage, run
  _normalize_stage(query=stage, bundle=…, sibling_names=…) and
  apply_variables_to_query(query=stage, variables=…, dry_run_placeholders=<as-flagged>) before planning. Mirror
  _execute_pipeline:657-674 so the migrated path collects warnings and
  honors variable precedence (runtime > stage > outer > model defaults).
- build_resolved_source_bundle(query=stages[-1], named_queries={named non-final}, data_source=…, runtime_variables=…).
- plan_stages(queries=stages, …) → root PlannedQuery.
- generate_planned_stages(...) → rendered backing SQL.
- Wrapper SQL (Codex finding Rewrite README intro with value proposition and roadmap #2): the rendered planned SQL exposes
  fully-qualified result aliases like "source_relation.public_alias"
  (e.g. "orders.customers.revenue_sum"), NOT bare public_alias. So the
  flat-rename wrapper is
  SELECT "<root.source_relation>.<sc.public_alias>" AS "<sc.name>", … FROM (<rendered>) AS _inner,
  driven by stage_schema.columns (filter-by-non-hidden public
  ValueSlots when stage_schema is None for single-stage queries).
  The root.source_relation comes from the root PlannedQuery (the
  bundle's source_model.name post-expansion). Alternative: re-parse the
  rendered SQL via sqlglot to read named_selects and mirror
  _stage_rename_wrapper — preferred if the prefix construction breaks
  on cross-model parametric aliases.
- Virtual model:
  columns = [Column(name=sc.name, sql=sc.name, type=sc.type, label=sc.label, format=sc.format, description=sc.description) for sc in <public_slots>];
  data_source = resolved base; default_time_dimension carried from the
  inner source model.
- Keep the _apply_extension_overlay re-application loop in
  _execute_pipeline (handles root ModelExtension overlay preservation).
Rewrite _validate_and_populate_cache (query_engine.py:1657)
similarly; thread dry_run_placeholders=True via
apply_variables_to_query.
Rewrite engine.get_column_types (query_engine.py:1012): keep the
existing _build_type_probe_query; bundle + plan + generate; execute via
client.get_column_types(sql); map each result key back to bare column
names by walking the root PlannedQuery's aggregate slots. Codex finding
Add image #4: probe sources can be either ColumnKey (uses .leaf) or
ColumnSqlKey (uses .column_name) — for derived columns. Read the bare
name via
getattr(slot.key.source, "leaf", None) or getattr(slot.key.source, "column_name", None), then map bare → slot.public_alias → result key.
Assert every probeable column maps back; an unmapped column raises rather
than silently drops. No _enrich / SQLGenerator.generate calls remain on
this path.
The three _execute_pipeline call sites at
query_engine.py:581/621/640 retain their existing call shape; only their
callee body changes.

Failing tests written first (tests/test_query_backed_typed_expansion.py):

test_stored_source_queries_topo_sort_accepts_nonordered_save_path — save a
model whose stored source_queries are in non-topological order; execute by
name; expect well-formed SQL.
test_stored_source_queries_cycle_raises_clearly.
test_stored_source_queries_root_referenced_raises.
test_stored_source_queries_self_reference_raises.
test_query_backed_columns_inherit_slot_types — *:count→INT, declared
ModelMeasure.type honored, source column type for sum.
test_query_backed_columns_inherit_format — percent-formatted source
measure → virtual column carries NumberFormat(type=PERCENT).
test_query_backed_columns_inherit_description — source-column description
propagates.
test_query_backed_columns_carry_default_time_dimension.
test_query_backed_columns_exclude_hidden_hoisted_slots — a model with
rank(expensenet:sum) and name="rev" exposes only the user-declared
columns; the hoisted expensenet_sum is NOT in model.columns.
test_get_column_types_uses_typed_pipeline — patch
SQLGenerator.generate to detect calls; assert zero calls and correct
type map.
test_expand_query_backed_model_does_not_touch_contextvars — monkeypatch
the two ContextVars to detect set/get; assert no touch.
test_save_path_dry_run_placeholders — model with {var} filters saves
cleanly with no caller-supplied variables (placeholder fill).

Stage C — test migrations

Files (the 7 in the ticket + 3 surfaced by validation):

tests/test_formula.py — rewrite the 8 MixedArithmeticField isinstance
assertions (lines 188, 225, 840, 1045, 1067, 1079, 1091, 1101) to
ScalarCallKey/ArithmeticKey shapes via
slayer/engine/syntax.py::parse_expr; audit + delete TestFuncStyleRewrite
(31 tests starting line 632) and backfill any uncovered case as
FUNC_STYLE_AGG tests in tests/test_slack_normalization.py; replace
_collect_needed_paths / extract_filter_transforms imports with
walk_parsed_refs traversals or relocate.
tests/test_filter_renamed_measure.py — delete the 2 skipped tests
(lines ~466, ~499; DEV-1445/1446 covered by test_dev1445_* /
test_dev1446_*); migrate the rest off enrich_query onto
engine.execute(...) + response.sql assertions.
tests/test_help.py — replace extract_filter_transforms use with a
walk_parsed_refs-based equivalent (docs-snippet structural assertions).
tests/test_named_measures.py — migrate enrich_query calls to
engine.execute + PlannedQuery/SQL assertions.
tests/test_reference_semantics.py — migrate off enrich_query.
tests/test_projection_trim.py — migrate off enriched/enrich_query;
rewrite the _query_as_model wrap test and
test_get_column_types_uses_outer_render_mode (the latter asserts a deleted
legacy contract — rewrite to assert the typed-pipeline equivalent: probe
path no longer touches SQLGenerator.generate). Codex finding Rename PyPI package to semantic-slayer #7: the
canonical_expression_key(parse_formula(...)) block at lines 48 + 1501
depends on slayer.engine.enrichment and parse_formula, both deleted in
Stage D. Port each assertion to slayer/engine/syntax.py::parse_expr plus
structural equality on the resulting typed key (ValueKey family —
ColumnKey / AggregateKey / TransformKey / ArithmeticKey /
ScalarCallKey). Drop any case that was strictly an enrichment-internal
contract.
tests/test_sql_generator.py — migrate every await enrich_query(...) +
SQLGenerator(...).generate(enriched=...) call to engine.execute(...) +
response.sql assertions; migrate TestAutoMoveFields* (3642+) to
MISPLACED_MEASURE slack tests (or delete if covered); keep the
_build_type_probe_query test (pipeline-agnostic). Codex finding Filtering on computed columns #6:
also migrate the direct _resolve_dimension_via_joins test (line 3897) and
the direct _resolve_cross_model_measure test (line 3909) — both private
helpers are deleted in Stage D. Rewrite each as parse_expr +
bind_expr(ModelScope(source_model=…)) assertions on the resulting
BoundExpr / ValueSlot, or as end-to-end engine.execute(...) checks.
Added tests/test_aggregation_gating.py — migrate the 5+ engine._enrich
- SQLGenerator.generate(enriched=...) call sites (lines 64-65 and
  related) onto engine.execute(...) against the same query.
Added tests/test_cross_model_derived_columns.py — migrate the 11+
legacy call sites (lines 49, 984, 988, 1024, 1074, 1094, 1126, 1162, 1195,
1229, 1269); the _join_aliases(enriched) helper rewrites to inspect
PlannedQuery.join_plan or the rendered SQL.
Added tests/test_query_backed_models.py — migrate the 3 direct
engine._query_as_model unit tests (lines 1446, 1465, 1484) onto
_expand_query_backed_model (post Stage B) or end-to-end
engine.execute assertions. Codex finding Move starts next to license etc #5: also migrate (or delete)
the direct _get_join_target_resolving test at line ~890 — Stage D
removes that engine helper. Replace with an end-to-end
concurrency / recursion-guard assertion through engine.execute of a
query-backed model whose join target is itself query-backed.
Doc-comment scrub (no code change, comment-only) for the acceptance grep:
tests/test_filter_renamed_measure.py:502,513,
tests/test_memories_resolver_typed.py, tests/test_schema_drift_typed.py,
tests/test_response_meta.py, tests/test_generator2_multistage.py:5,
tests/integration/test_integration.py:1318,1340,1447.

Stage D — deletions + cleanup

Delete whole files:

slayer/engine/enrichment.py.
slayer/engine/enriched.py.

Delete symbols in slayer/core/formula.py:

MixedArithmeticField (line 139).
parse_formula (line ~482) + helpers used exclusively by it (verify by
grep): likely _expand_named_measures, _preprocess_agg_refs.
Keep _rewrite_funcstyle_aggregations and parse_filter.

Delete symbols in slayer/sql/generator.py:

Legacy SQLGenerator.generate(enriched=...) method body (def at 488) and
its exclusively-legacy helpers: _has_cross_model_filter (239),
_is_windowed_measure (262), _build_last_ranked_from (1981),
_resolve_order_column(enriched=...) legacy branch (verify), the
_agg_render_spec_from_enriched shim (added Stage A).
The top-of-module EnrichedMeasure/EnrichedQuery/public_projection_aliases
import.

Delete symbols in slayer/engine/query_engine.py (each gated on grep proving
zero non-legacy callers + suite green; the new pipeline does not call any of
these):

_query_as_model (1862), _enrich (1685), the
from slayer.engine.enrichment import enrich_query import (29),
_build_rerooted_enriched (2337), _resolve_cross_model_measure (2182),
_render_query_backed_join_target (1816),
_scope_named_queries_to_prior (214), _get_join_target_resolving (203),
_join_target_resolving_var (61), _forbidden_sibling_refs_var (72),
_auto_move_fields_to_dimensions (2107), _resolve_model (1458),
_resolve_model_inner (1492), _resolve_query_model (1371),
_resolve_dimension_via_joins (2044),
_resolve_dimension_with_terminal (2060), _walk_join_chain
engine-method shim (2085).
Keep slayer/engine/path_resolution.py::walk_join_chain (separate
module with its own tests; survives the shim deletion).

Doc-comment scrub for the acceptance regex (no behavior change):
slayer/core/refs.py, slayer/core/models.py, slayer/core/query.py,
slayer/engine/response_meta.py, slayer/mcp/server.py,
slayer/sql/sql_expr.py, slayer/engine/normalization.py:17,280,
slayer/engine/schema_drift.py:520, slayer/engine/syntax.py:276.

Docs:

CLAUDE.md: update the stored-source_queries note to "write top-to-bottom
is the canonical convention; planner topo-sorts as fault tolerance"; drop
_query_as_model and ContextVar references; add an AggRenderSpec line
under SQL emission; tighten the __-carve-out wording (legacy persisted
query-backed columns remain reachable only as passive Column.name matches
in ModelScope).
docs/architecture/index.md: resolve deviations Welcome to SLayer Discussions! #1 (legacy stack deleted)
and Add image and link to starts #3 (new generator no longer adapts back to EnrichedMeasure); note the
new out-of-scope follow-up "fold ORDER BY funcstyle into the slack layer
(P0 completion)".
docs/architecture/engine-orchestration.md: drop the "where legacy runs"
deviation section; update the _execute_pipeline mermaid to remove the
(LEGACY path) qualifier on _expand_query_backed_model.
docs/architecture/sql-generation.md: drop the synthetic-EnrichedMeasure
adapter deviation section; add a brief AggRenderSpec paragraph.
docs/architecture/scopes-and-bundle.md: update the StageColumn field
table to reflect typed format and planner-populated description.
mkdocs build --strict must pass.

Critical files

slayer/engine/query_engine.py — primary surgery site (expansion,
save-path, type-probe, deletions).
slayer/sql/generator.py — AggRenderSpec, dialect-helper refactor, legacy
generate deletion.
slayer/engine/stage_planner.py — populate StageColumn.format /
description.
slayer/engine/planning.py — populate ValueSlot.format / description;
consume the extracted _infer_aggregated_format.
slayer/core/scope.py — retype StageColumn.format to
Optional[NumberFormat].
slayer/engine/response_meta.py — extract _infer_aggregated_format to a
shared helper.
slayer/engine/enrichment.py, slayer/engine/enriched.py — deleted.
slayer/core/formula.py — MixedArithmeticField, parse_formula
deletion (keep _rewrite_funcstyle_aggregations + parse_filter).

Reusable existing utilities

slayer/engine/source_bundle.py::build_resolved_source_bundle — bundle
entry; eager P11 resolution.
slayer/engine/source_bundle.py::synthetic_model_from_stage_schema /
stage_bundle_with_siblings — sibling-stage synthetic models.
slayer/engine/stage_planner.py::plan_stages / plan_query — typed
planning; topo-sort built in.
slayer/sql/generator.py::generate_planned_stages /
generate_from_planned — typed rendering.
slayer/engine/variables.py::merge_query_variables /
apply_variables_to_query — variable precedence + dry-run placeholders.
slayer/engine/syntax.py::parse_expr / walk_parsed_refs — replacement for
parse_formula / _collect_needed_paths / extract_filter_transforms
callers.
slayer/engine/path_resolution.py::walk_join_chain — canonical join-chain
walker; survives the engine-method shim deletion.
slayer/engine/normalization.py::FUNC_STYLE_AGG / MISPLACED_MEASURE —
slack-rule equivalents for legacy _rewrite_funcstyle_aggregations (in
query.measures/filters scope) + _auto_move_fields_to_dimensions.

Verification

End-to-end gates:

# Acceptance: forbidden imports / symbol-use gone (Codex finding #8 — code-
# oriented pattern, distinguishes real refs from test/doc comments).
git grep -nE '^(from slayer\.engine\.(enriched|enrichment) import|import slayer\.engine\.(enriched|enrichment))|\b(EnrichedQuery|EnrichedMeasure|MixedArithmeticField)\s*[=:,(]|\b_query_as_model\s*\(' -- slayer tests
# expected: empty.
# Historical-doc context in docs/architecture/index.md is allowed (the
# regex above doesn't match prose mentions).

# Unit suite
poetry run pytest -m "not integration" -q
# expected: green, zero @pytest.mark.skip related to DEV-1450/1452

# Lint + docs
poetry run ruff check slayer/ tests/
poetry run mkdocs build --strict

# Integration smoke (esp. query-backed paths)
poetry run pytest tests/integration/test_integration.py tests/integration/test_integration_duckdb.py -m integration -q
poetry run pytest tests/test_query_backed_models.py -q

Out-of-scope follow-ups

Fold ORDER BY funcstyle normalization into the slack layer (P0 completion).
Requires de-coercing OrderItem at construction so
_rewrite_funcstyle_aggregations can finally retire.
After Stage D, audit whether
slayer/engine/path_resolution.walk_join_chain retains any new-pipeline
call site; if not, consider deletion in a separate cleanup PR.

Implementation notes

Per feedback_unit_tests_only: each stage runs
poetry run pytest -m "not integration" -q; full integration smoke once
before final PR.
Per feedback_no_inline_imports: all imports at module top
(AggRenderSpec, extracted _infer_aggregated_format).
Per repo CLAUDE.md: keyword args for >1-param functions; ruff check at
each stage; per-stage commit with named-file git add (no git add -A).
Rebase risk: this branch is stacked on unmerged DEV-1450 (PR DEV-1450 principled redesign of syntax (WIP) #139,
~28.9k insertions vs main). Before starting Stage A, scan DEV-1450's open
review comments — any further fixes touching
_synthesize_enriched_measure_from_planned, _query_as_model,
_expand_query_backed_model, cross_model_planner.plan, or
generate_planned_stages will need absorbing into the AggRenderSpec /
expansion design before this stage lands.

🤖 Generated with Claude Code

Summary by CodeRabbit

New Features
- Aggregations can take column identifiers as positional args and kwargs (e.g., last(created_at), weighted_avg(weight=qty)).
- Deterministic stage ordering and a wrapper to emit flattened, predictable column aliases for query-backed stages.
Improvements
- Spec-driven aggregation rendering with explicit-time resolution, validation, and preserved SQL parity.
- Planner and expansion now propagate type/format/description metadata into staged/virtual models.
Tests
- Many new and updated suites for spec contracts, SQL parity, first/last time behavior, stage ordering, and typed expansion.

…hedMeasure DEV-1452 prerequisite #3 — first of four staged commits. Net-neutral refactor of slayer/sql/generator.py's aggregation-emission codebase so the legacy `EnrichedMeasure` type can be deleted (Stage D) without forking dialect SQL emission. Existing test_sql_generator.py + unit suite (3688 passed, 0 regressions) act as the byte-identical SQL parity gate. What changed - New frozen Pydantic `AggRenderSpec` in slayer/sql/generator.py with exactly 11 fields (locked decision #4 in the approved plan): {sql, name, model_name, aggregation, aggregation_def, agg_kwargs, alias, filter_sql, time_column, type, column_type}. Drops the ticket's illustrative `agg_args` / `source_measure_name` / `distinct` — the dialect helpers don't read any of those; count_distinct dispatches by agg name, and first/last's positional time arg is pre-resolved into `time_column` at spec-build time. - 7 dialect helpers refactored to consume AggRenderSpec: `_build_agg`, `_build_formula_agg`, `_build_percentile`, `_build_stat_agg`, `_build_ranked_subquery_from_planned`, `_resolve_value_sql`, `_resolve_agg_param`. `_build_last_ranked_from` is deliberately NOT refactored — it's the legacy ranked builder that dies wholesale in Stage D. - `_synthesize_enriched_measure_from_planned` (8 call sites in generator.py) renamed to `_build_agg_render_spec_from_planned` and now returns `AggRenderSpec` directly — no more synthetic-EnrichedMeasure adapter on the typed-pipeline path. - Transitional shim `_agg_render_spec_from_enriched(em)` — a pure field-mapping function the legacy `SQLGenerator.generate(enriched=…)` path uses to feed the refactored helpers. Five legacy call sites in generator.py (cross-model, ranked-from, having-rewrite, base-select, formula-agg-wrapper) and twelve direct-call sites in tests/test_sql_generator.py wrap via this shim. Deleted in Stage D along with the rest of the legacy stack. - `AggregateKey.args` schema loosened from `Tuple[Scalar, ...]` to `Tuple[_AggregateArgValue, ...]` where `_AggregateArgValue = Union[ColumnKey, ColumnSqlKey, Decimal, str, bool, None]`. This was authorized by the user during Stage A after a bug was uncovered: `_bind_agg_arg` (binding.py:818-819) returns `ColumnKey` for identifier positional args, but the old strict-scalar schema rejected them — so `amount:last(created_at)` failed at AggregateKey validation. End-to-end repro on a fresh fixture confirmed the breakage. The legacy `_synthesize_enriched_measure_from_planned`'s `for a in key.args: if isinstance(a, ColumnKey)` branch was actually correct and forward-looking but had been unreachable. The schema fix aligns AggregateKey with the binder's actual output and matches the existing `_AggregateKwargValue` union shape used for kwargs. Tests - tests/test_agg_render_spec.py (new, 666 lines, 48 tests) — the AggRenderSpec field surface + frozen contract + every _build_agg_render_spec_from_planned call-site shape: StarKey count and rejection branches, ColumnKey bare, ColumnKey with column_filter_key, ColumnSqlKey derived, first/last with explicit time arg, custom aggregation_def threading, percentile parametric, stat agg with other=, unknown aggregation, kwarg ColumnKey path mismatch, column-not-found, slot=None HAVING branch, slot.type vs column_type independence. - tests/test_agg_render_spec_shim.py (new, 252 lines, 33 tests) — per-field shim mapping + 9 parametrized SQL byte-identical parity tests (postgres dialect) against legacy outputs captured pre-refactor (basic sum, count*, filtered sum, filtered count*, percentile, stat corr with other=, non-aggregation, sql=None bare-column sum, median). Verification - `poetry run pytest -m "not integration" -q` — 3688 passed, 2 skipped, 251 deselected. No regressions. - `poetry run ruff check slayer/ tests/` — clean. Codex pre-impl review (Stage A test plan): 6 findings, all folded in. Codex post-impl review (Stage A diff): 3 findings. - LOW (test_sql_generator.py:2235 direct _build_percentile calls): fixed in this commit (14 sites wrapped via the shim). - 2 MEDIUM findings on cross-model reroot path (cross_model_planner.py _local_agg_formula, generator.py _maybe_reroot_cross_model_plan) — tracked as follow-up alongside a third related gap (_build_first_last_base_select doesn't honor spec.time_column from args). All three are end-to-end gaps for `first/last(time_col)` that the legacy _query_as_model path currently masks; must be addressed in Stage B (the query-backed expansion migration) before Stage D deletes legacy. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

linear · 2026-05-26T10:59:37Z

DEV-1452

DEV-1450

coderabbitai · 2026-05-26T11:02:46Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

📝 Walkthrough

Walkthrough

Refactors SQL aggregation rendering to use a frozen AggRenderSpec model, expands AggregateKey argument identity to accept column identifiers, migrates builder and planned-query paths to spec-driven emission, and adds Stage B engine/bundle/planner changes plus tests and parity checks.

Changes

Aggregation rendering refactor

Layer / File(s)	Summary
Aggregate argument expansion `slayer/core/keys.py`	`AggregateKey.args` now accepts `_AggregateArgValue` union (scalar literals or `ColumnKey`/`ColumnSqlKey`) instead of scalars-only, changing what participates in key equality/hash.
AggRenderSpec model and shim `slayer/sql/generator.py`	Introduces frozen `AggRenderSpec` (Pydantic) and `_agg_render_spec_from_enriched()` adapter to decouple aggregation rendering from `EnrichedMeasure`.
Aggregation builders refactored to spec-driven `slayer/sql/generator.py`	`_build_agg`, `_build_formula_agg`, `_build_percentile`, `_build_stat_agg`, `_resolve_value_sql`, and `_resolve_agg_param` now accept `AggRenderSpec`, with spec-based dispatch, param resolution/validation, and CASE WHEN filter wrapping.
First/last ranked RN split `slayer/sql/generator.py`	Adds `_resolve_explicit_time_col`, `_build_unfiltered_rn_columns`, and `_build_filtered_rn_columns`; first/last ranking planning and validation now use synth_specs and deduplicated RN projections.
Planned query path refactored `slayer/sql/generator.py`	Adds `_build_agg_render_spec_from_planned` to synthesize specs for planned aggregate slots; migrates ranked-subquery, base/outer select, composite aggregates, cross-model CTE, HAVING, and time_shift rendering to use `synth_specs`.
Cross-model CTE rerooting `slayer/sql/generator.py`	Spec-based aggregation propagates through cross-model CTEs with symmetric positional-arg rerooting and spec-driven emission (including ranked-subquery wrapping).
Aggregation def & kwarg-path validation `slayer/sql/generator.py`	Adds `_resolve_aggregation_def` and `_validate_aggregate_kwarg_paths` used by planned→spec builder to enforce star/count rules, kwarg path checks, derived-column expansion, and agg_kwarg canonicalization.
Stage SQL rename wrapper `slayer/sql/stage_wrapper.py`	Adds `build_flat_rename_wrapper()` to parse stage SQL and emit a flattened renaming SELECT that strips `<source_relation>.` prefixes and replaces `.` with `__`; existing wrapper delegates to this helper.
Stage B engine & bundle migration `slayer/engine/query_engine.py`, `slayer/engine/source_bundle.py`, `slayer/engine/stage_ordering.py`	Introduce deterministic Kahn topological ordering for stages, `expand_query_backed_models_in_bundle` to expand query-backed models in ResolvedSourceBundle, wire the engine to delegate expansion and use the typed pipeline for `_expand_query_backed_model`, `get_column_types`, and save-time cache population.
Planner metadata propagation `slayer/engine/stage_planner.py`, `slayer/engine/planned.py`, `slayer/engine/planning.py`	Propagate `type`, `format` (NumberFormat), and `description` through DeclaredMeasure/ValueSlot/StageColumn via inference helpers and planner wiring.
AggRenderSpec contract tests `tests/test_agg_render_spec.py`	New Stage-A tests validating AggRenderSpec's 11-field surface, immutability, defaulting, and builder behavior across StarKey, ColumnKey/ColumnSqlKey, first/last, custom/parametric/stat aggs, cross-model kwarg rules, and type propagation.
Shim field mapping and SQL parity `tests/test_agg_render_spec_shim.py`	Tests field-by-field mapping, dropped-field semantics, immutability, and byte-identical Postgres SQL parity for legacy `EnrichedMeasure` inputs passed through `_agg_render_spec_from_enriched` and the refactored builders.
Existing test migration `tests/test_sql_generator.py`	Update percentile and statistical aggregation tests to import and use `_agg_render_spec_from_enriched`, routing builder calls through the new spec-based path while preserving assertions.
Integration & regression tests `tests/test_dev1476_first_last_explicit_time.py`, `tests/test_query_backed_typed_expansion.py`, others	Add acceptance and integration tests covering explicit time args for first/last, typed query-backed expansion, save-time validation, alias flattening, cross-datasource rejection, and P4 closure behavior.
Stage wrapper unit tests `tests/test_stage_wrapper_helper.py`	Unit tests for `build_flat_rename_wrapper` verifying prefix stripping, dot-flattening, expected-column validation, and alias preservation.
Topo-ordering unit tests `tests/test_topologically_order_stages.py`	Unit tests for deterministic stage ordering, invariant validation, and nested inline sibling-edge extraction.
Projection-trim & minor test updates `tests/test_projection_trim.py`, `tests/test_query_backed_models.py`	Adjust tests to assert zero legacy enriched generate calls and relax inner-stage alias assertion to accept quoted/unquoted forms.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

MotleyAI/slayer#82: Related statistical-aggregation SQL generation changes intersecting with this spec refactor.
MotleyAI/slayer#22: Overlaps on aggregation rendering and first/last construction logic.
MotleyAI/slayer#125: Related stage DAG/topological ordering and engine delegation changes.

Suggested reviewers

AivanF

Poem

🐰 I hopped in with a spec so neat,

Columns and scalars now share a seat.
Builders hum, CTEs reroot true,
Tests check parity through and through.
🥕 A rabbit's tidy refactor stew.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 63.16% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately and clearly describes the main objective of the PR: deleting the legacy enrichment pipeline and migrating pre-existing tests to the new typed pipeline, which are the primary changes across the changeset.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch egor/dev-1452-delete-legacy-enrichment-pipeline-migrate-pre-existing-tests

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

slayer/sql/generator.py (2)

7215-7230: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Preserve built-in Aggregation overrides on the planned path.

agg_def is only populated for non-built-ins here, so planned queries lose model-defined overrides/default params for built-in aggregations like weighted_avg and percentile. That breaks _build_formula_agg / _resolve_agg_param fallback behavior and can turn a previously valid model-level default into a missing-param error.

Suggested fix

-            agg_def = None
-            if key.agg not in _BUILTIN_BAREARG_AGGS_LOCAL_SLICE:
-                agg_def = next(
-                    (a for a in (source_model.aggregations or []) if a.name == key.agg),
-                    None,
-                )
-                if agg_def is None:
-                    raise AggregationNotAllowedError(
-                        column=src_leaf,
-                        agg=key.agg,
-                        reason=(
-                            f"unknown aggregation {key.agg!r} — not a built-in "
-                            f"and not defined in {source_model.name!r}."
-                            f"aggregations."
-                        ),
-                    )
+            agg_def = next(
+                (a for a in (source_model.aggregations or []) if a.name == key.agg),
+                None,
+            )
+            if agg_def is None and key.agg not in _BUILTIN_BAREARG_AGGS_LOCAL_SLICE:
+                raise AggregationNotAllowedError(
+                    column=src_leaf,
+                    agg=key.agg,
+                    reason=(
+                        f"unknown aggregation {key.agg!r} — not a built-in "
+                        f"and not defined in {source_model.name!r}."
+                        f"aggregations."
+                    ),
+                )

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@slayer/sql/generator.py` around lines 7215 - 7230, The current logic skips
looking up model-defined Aggregation overrides for built-in aggs, so planned
queries lose model defaults; change the code in the block around agg_def and
key.agg so you always attempt to find a matching aggregation in
source_model.aggregations (assign agg_def = next(...)) regardless of whether
key.agg is in _BUILTIN_BAREARG_AGGS_LOCAL_SLICE, and only raise
AggregationNotAllowedError if key.agg is not a builtin and no agg_def was found;
this preserves model-defined overrides used by _build_formula_agg and
_resolve_agg_param while keeping the existing disallow rule for unknown
non-builtins.

7200-7250: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Complete the new ColumnSqlKey support in this adapter.

After widening aggregate args/kwargs to accept derived-column identifiers, this branch still special-cases only ColumnKey. That leaves two correctness gaps: first/last with a derived time arg never sets time_column and silently falls back to the default ranking column, and cross-model kwarg path validation can be bypassed with a ColumnSqlKey, letting the kwarg render in the wrong scope.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@slayer/sql/generator.py` around lines 7200 - 7250, The branch that handles
time-column extraction and kwarg path validation only treats ColumnKey but must
also treat derived-column identifiers (ColumnSqlKey); update the first/last arg
loop (the block that sets explicit_time_col) to treat instances of ColumnSqlKey
the same as ColumnKey (inspect their .path and leaf-style identifier to build
explicit_time_col or fall back to source_relation) and break once set, and
update the kwarg validation loop (the for kname, kval in key.kwargs block) to
include ColumnSqlKey in the isinstance check so that kval with type ColumnSqlKey
and a mismatched .path triggers the same AggregationNotAllowedError; keep
referencing key.agg, explicit_time_col, source_relation, source.path, and
AggregationNotAllowedError to locate the spots to change.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@slayer/sql/generator.py`:
- Around line 7215-7230: The current logic skips looking up model-defined
Aggregation overrides for built-in aggs, so planned queries lose model defaults;
change the code in the block around agg_def and key.agg so you always attempt to
find a matching aggregation in source_model.aggregations (assign agg_def =
next(...)) regardless of whether key.agg is in
_BUILTIN_BAREARG_AGGS_LOCAL_SLICE, and only raise AggregationNotAllowedError if
key.agg is not a builtin and no agg_def was found; this preserves model-defined
overrides used by _build_formula_agg and _resolve_agg_param while keeping the
existing disallow rule for unknown non-builtins.
- Around line 7200-7250: The branch that handles time-column extraction and
kwarg path validation only treats ColumnKey but must also treat derived-column
identifiers (ColumnSqlKey); update the first/last arg loop (the block that sets
explicit_time_col) to treat instances of ColumnSqlKey the same as ColumnKey
(inspect their .path and leaf-style identifier to build explicit_time_col or
fall back to source_relation) and break once set, and update the kwarg
validation loop (the for kname, kval in key.kwargs block) to include
ColumnSqlKey in the isinstance check so that kval with type ColumnSqlKey and a
mismatched .path triggers the same AggregationNotAllowedError; keep referencing
key.agg, explicit_time_col, source_relation, source.path, and
AggregationNotAllowedError to locate the spots to change.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: ec79ae24-98a2-4b2c-b77d-92090b54c3ee

📥 Commits

Reviewing files that changed from the base of the PR and between 91d7ca2 and 9b36e71.

📒 Files selected for processing (5)

slayer/core/keys.py
slayer/sql/generator.py
tests/test_agg_render_spec.py
tests/test_agg_render_spec_shim.py
tests/test_sql_generator.py

…n-of-syntax' into egor/dev-1452-delete-legacy-enrichment-pipeline-migrate-pre-existing-tests

Two findings collapse to one method (slayer/sql/generator.py:7143 _build_agg_render_spec_from_planned): - Codex MAJOR on PR #144: the explicit-time-arg loop only matched ColumnKey, silently dropping ColumnSqlKey args. With the Stage A schema loosen, the binder returns ColumnSqlKey for derived columns; amount:last(derived_time) would fall back to the query's default ranking column instead of ordering by the explicit derived expression — incorrect first/last results with no error. - Sonar python:S3776 on the same method: cognitive complexity 43 > 15. Extracts a new _resolve_explicit_time_col helper that handles both ColumnKey and ColumnSqlKey time args. Bare-identifier derived columns expand to their underlying SQL qualified under source_relation; non-trivial derived expressions are emitted as-is and their inner bare refs resolve against the ranked-subquery's FROM clause. Cross- model paths on ColumnSqlKey args raise NotImplementedError rather than silently emitting against the wrong relation alias — that case is tracked alongside the Stage B cross-model reroot bug (DEV-1452 latent bug (c) in the Linear comment). The extraction drops the caller's complexity below 15 as a side effect. End-to-end caveat: first/last with an explicit time arg renders correctly only when a time dimension is also present in the query (default_time_col_sql comes from _resolve_ranking_time_column_from_planned). With no TD, deferred bug (b) — _build_first_last_base_select raises before any spec is rendered — still bites. Stage B fixes that boundary as planned. Tests in tests/test_agg_render_spec.py: - test_last_with_derived_bare_time_column_local (bare-identifier derived column expands to underlying SQL) - test_first_with_derived_expression_time_column_local (non-trivial derived expression round-trips through sqlglot, qualified inside the ranked subquery's FROM) - test_cross_model_derived_time_column_raises (NotImplementedError) - test_unknown_derived_time_column_raises (ValueError mirroring the source-column lookup-miss path) Unit suite: 3758 passed, 7 skipped, 0 failed (flight tests deselected — pyarrow not installed in this env). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…anned Sonar python:S3776 on slayer/sql/generator.py:3822 — cognitive complexity 33 > 15 on _build_ranked_subquery_from_planned. Extracts the two sub-passes into named helpers, leaving the outer method as setup → unfiltered pass → filtered pass → assembly: - _build_unfiltered_rn_columns: one ROW_NUMBER projection per distinct effective time column for unfiltered first/last specs, with stable per-time-column suffixes (first sorted gets "", then "_2", ...). - _build_filtered_rn_columns: one dedicated (ROW_NUMBER + match-flag) pair per distinct (filter, time, agg) triple for filtered first/last, cached so specs sharing a triple reuse the same pair. Pure structural refactor — no behaviour change. The dedup semantics, stable suffix assignment, ORDER BY direction, and CASE WHEN match-flag emission are byte-identical to the pre-split implementation. Unit suite: 3758 passed, 7 skipped, 0 failed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Sonar python:S8396 on slayer/sql/generator.py:60 — `sql: Optional[str]` without a default flags as a Pydantic anti-pattern. The intent here is "required field that can hold None" (`*:count` needs to pass `sql=None` explicitly). Per the rule's documented exception, `Type | None` is treated as explicit nullability and does not trigger the warning. Switch `sql: Optional[str]` to `sql: str | None` — preserves the required-but-may-be-None semantics, dodges the false-positive, and adds a docstring note pinning the intent so future readers don't "fix" it back to `Optional[str]`. Also wires the new DEV-1476 ticket reference into the group 1 helper docstring + error message + test comment (DEV-1476 is the consolidated 4-bug Stage B package opened today for the explicit-time-arg gaps on first/last; the cross-model derived-time NotImplementedError in `_resolve_explicit_time_col` is bug (c) of that package). Unit suite: 3758 passed, 7 skipped, 0 failed. Ruff: clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Two outside-diff CodeRabbit findings on the spec-build adapter, both verified against current code and fixed minimally. Finding A — slayer/sql/generator.py:~7350 (agg_def lookup for built-ins) The lookup for source_model.aggregations was gated on 'key.agg not in _BUILTIN_BAREARG_AGGS_LOCAL_SLICE'. That set includes weighted_avg / percentile / corr / etc., so a user model that overrode the default params of a built-in (e.g. supplied a default weight=quantity for weighted_avg) had the override silently dropped on the planned path — _resolve_agg_param then raised 'Aggregation \'weighted_avg\' requires parameter \'weight\''. Real regression vs legacy; the pre-existing comment above the set even admits this as a deferred follow-up. Fix: always look up the model-level Aggregation; only raise when the lookup misses AND key.agg is not a built-in. Built-ins keep their default renderer when no override exists. Updated the comment above _BUILTIN_BAREARG_AGGS_LOCAL_SLICE to retire the obsolete TODO. Finding B — slayer/sql/generator.py:~7375 (kwarg path validation gap) The for-kname-kval-in-kwargs loop only matched ColumnKey, so a ColumnSqlKey kwarg with a non-empty path slipped past the host-vs-target-path check and would have rendered against the wrong relation. Bug surfaced by the Stage A AggregateKey.args / kwarg-value schema loosen (which now permits ColumnSqlKey as a kwarg value). Fix: extend the isinstance check to (ColumnKey, ColumnSqlKey); use type(kval).__name__ in the error reason so the message reflects the actual rejected kind. Both Key types carry .path, so the equality check works unchanged. CodeRabbit's third sub-finding on the same comment (derived time arg for first/last silently dropping ColumnSqlKey) was already addressed in group 1 via _resolve_explicit_time_col + tests; verified no residual gap. Tests added in tests/test_agg_render_spec.py: - test_builtin_aggregation_model_override_threaded: model-level weighted_avg override surfaces in spec.aggregation_def with default weight=quantity. - test_builtin_aggregation_no_override_leaves_def_none: a vanilla built-in (sum) with no override keeps aggregation_def=None so the built-in renderer dispatches. - test_kwarg_columnsqlkey_path_mismatch_raises: a ColumnSqlKey kwarg with a non-empty path against a local source raises the same AggregationNotAllowedError as the ColumnKey case. - _orders_model() fixture grew a weighted_avg model-level override so the built-in-override tests don't need a parallel fixture. Unit suite: 3933 passed, 7 skipped, 0 failed. Ruff: clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Sonar python:S3776 re-flagged both functions after groups 1-4 splits: - _build_ranked_subquery_from_planned cognitive complexity 21 > 15 (was 33 pre-Group-2 split) - _build_agg_render_spec_from_planned cognitive complexity 26 > 15 (was 43 pre-Group-1 extract) Hybrid approach (option c on the plan): 1) Extract two more cohesive helpers from _build_agg_render_spec_from_planned: - _resolve_aggregation_def: model-level Aggregation lookup + raise on unknown non-built-in. Encapsulates the lookup-runs-for-built-ins-too semantics (CodeRabbit fold-in on PR #144 group 4) so a future reader can see the contract in one place. - _validate_aggregate_kwarg_paths: cross-model kwarg path validation for both ColumnKey and ColumnSqlKey. Tightens the dispatch in the main method down to a single line per concern. 2) NOSONAR(S3776) both functions with documented rationale: - The spec-build adapter is fundamentally a sequential isinstance dispatch over StarKey / ColumnKey / ColumnSqlKey; each branch has its own contract. Further splitting would scatter the per-source- kind contract across helpers and obscure flow. - The ranked subquery builder, after Group 2 factored _build_unfiltered_rn_columns / _build_filtered_rn_columns, is left with exp.Select / from / joins / where assembly that has to live in one place. Pure refactor — no behaviour change. Existing tests pin the integrated behaviour through the public method. Unit suite: 3933 passed, 7 skipped, 0 failed. Ruff: clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Three legacy-stack call sites (_expand_query_backed_model, _validate_and_populate_cache, get_column_types) now route through build_resolved_source_bundle → plan_stages → generate_planned_stages instead of the legacy _query_as_model / _enrich / SQLGenerator.generate chain. Folds in DEV-1476 bugs (b), (c), (d-cross) so query-backed models using first / last with an explicit positional time arg keep working end-to-end after Stage D deletes the legacy fallback. Helpers extracted (decisions B + F): * slayer/sql/stage_wrapper.py::build_flat_rename_wrapper — shared with generate_planned_stages' CTE chaining + the virtual-model wrap. * slayer/engine/stage_ordering.py::topologically_order_stages — Kahn topo-sort, called by both the runtime-list execute path and the stored source_queries save path. _extract_sibling_refs now recursively walks inline SlayerModel.source_queries (decision E). * slayer/engine/source_bundle.py::expand_query_backed_models_in_bundle — source + referenced + stage_source expansion + inline_extensions re-apply in one place; recursion guard via _resolving handles a query-backed join target that references its parent. Typed primitives extended (decisions #2, #8): * StageColumn.format retyped Optional[NumberFormat]. * ValueSlot.format / .description added. * DeclaredMeasure.type / .format / .description carry through ProjectionPlanner intern → ValueSlot → StageColumn → virtual Column. * Slot-type inference: *:count → INT, count/avg/median → built-ins, sum/min/max inherit source column type. Replaces legacy blanket DOUBLE for virtual-model columns (decision #2). DEV-1476 fixes (B.6): * (b) _build_first_last_base_select now only raises when an explicit time arg is also missing from every first/last spec. * (c) _render_cross_model_cte reroots positional ColumnKey / ColumnSqlKey args symmetric to kwargs, and wraps the FROM in a ROW_NUMBER-ranked subquery for first/last aggregates. * (d-cross) _build_agg_render_spec_from_planned threads ``bundle`` so derived ColumnSqlKey aggregate sources get expand_derived_refs_ sync (their inner bare refs qualify under source_relation). * tests/test_agg_render_spec.py::TestBuilderFirstLast flipped from _raises to _resolves_after_reroot. Migrated-path acceptance (decision D): * SourceModelOrigin / agg_column_names dropped from the virtual model. Outer dotted-ref re-aggregation against canonical aliases uses the flat ``__``-joined name (P4 closure); a dotted ref that matched the legacy lineage walk now surfaces a clear binder error. * The two ContextVars (_join_target_resolving_var, _forbidden_sibling_refs_var) are NOT touched by the migrated paths. They remain alive for _query_as_model until Stage D. Test additions / migrations: * tests/test_query_backed_typed_expansion.py — 32 new tests covering topo-sort acceptance / cycle / root-as-sink / self-ref / duplicate / inline-nested decision E, virtual-model column types / format / description / default_time_dimension / public-only slots, P4 closure pin, ContextVar non-touch, save path with placeholder fill + data_source refresh + save=False, nested query-backed expansion (qb-A joins qb-B / overlay-over-qb-base / runtime stage source), cross-DS rejection, wrapper alias shapes (local agg / cross-model agg / cross-model parametric agg / joined dim). * tests/test_dev1476_first_last_explicit_time.py — 6 tests (3 helper-level _local_agg_formula args reroot + 3 end-to-end against seeded SQLite for bugs b / c / d-cross). * tests/test_topologically_order_stages.py — 12 tests pinning the extracted helper + decision E recursion (typed / dict / nested ModelExtension / inline-nested cycle). * tests/test_stage_wrapper_helper.py — 4 tests pinning the extracted build_flat_rename_wrapper. * tests/test_query_backed_models.py — loosened the inner-stage-rename regex to accept the quoted alias form (DEV-1452 Stage B emits ``AS "rev"``; pre-Stage-B legacy emitted ``AS rev``). * tests/test_projection_trim.py — migrated test_get_column_types_uses_outer_render_mode → _does_not_call_ legacy_generate (the typed pipeline has no render_mode contract). Verification: * poetry run pytest -m "not integration" -q — 3985 passed, 0 failed. * poetry run ruff check slayer/ tests/ — clean. * SQLite + DuckDB integration smoke — 131 passed. * Query-backed model tests — 46 passed. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@slayer/engine/planning.py`:
- Around line 119-120: When promoting a hidden slot to declared, backfill its
missing metadata (type/label/format/description) so declaration-order doesn't
matter: update the _merge_into_existing(...) path to, when visibility changes
from hidden to declared (or when existing metadata fields are None/empty), copy
over non-null fields from the incoming slot (type, label, format, description)
into the existing slot; ensure the same logic is applied in the "existing-key"
branch that currently drops these fields (the code around the existing-key
promotion at lines ~511–513). Use the slot object and its visibility
attribute/name (e.g., Slot.visibility, Slot.type, Slot.label, format,
description and the _merge_into_existing function) as anchors to find and modify
the logic so promoted slots receive the full metadata from the declaring input.

In `@slayer/engine/query_engine.py`:
- Around line 1188-1193: The persisted model can contain non-topological
source_queries which later breaks _execute_by_name and
_scope_named_queries_to_prior due to insertion-order sibling visibility; to fix,
ensure source_queries are normalized to topological order either at save-time
(call topologically_order_stages on model.source_queries and persist that order)
or immediately before execution (rebuild/stabilize named_q by running
topologically_order_stages and using its output to set named_q and final_stage
prior to _execute_by_name); update the code paths that construct named_q (the
block using stages = topologically_order_stages(...), final_stage, named_q) so
the execution path always uses the topo-sorted list rather than the original
model.source_queries.
- Around line 1218-1225: The call to expand_query_backed_models_in_bundle is
passing final_stage.variables and thus loses the merged defaults/overrides
stored in bundle.query_variables; change the call to pass the merged root
variables (bundle.query_variables) instead of final_stage.variables so nested
query-backed join targets and stage sources receive the fully merged variables;
ensure you keep the same runtime_kwarg, dry_run_placeholders,
expander=self._expand_query_backed_model and the _resolving union ({model.name})
unchanged.

In `@slayer/engine/stage_ordering.py`:
- Around line 10-12: Update the module docstring in
slayer/engine/stage_ordering.py to correct the contract: replace the clause that
says "forward references all raise ``ValueError``" with wording that forward
references are reordered/handled (not rejected) by the helper (e.g., mention
that the helper will reorder forward references rather than raising), and keep
the rest of the documented failure cases (cycles, self-references, duplicate
names, root referenced by another stage) as raising ValueError; reference the
reordering behavior provided by the helper function (reorder logic used by
functions like reorder_stages or the stage ordering helper) to make intent
explicit.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: e18c62e4-a6b8-476b-85b8-488a89023e74

📥 Commits

Reviewing files that changed from the base of the PR and between 9b36e71 and 09ab5bc.

📒 Files selected for processing (16)

slayer/core/scope.py
slayer/engine/planned.py
slayer/engine/planning.py
slayer/engine/query_engine.py
slayer/engine/source_bundle.py
slayer/engine/stage_ordering.py
slayer/engine/stage_planner.py
slayer/sql/generator.py
slayer/sql/stage_wrapper.py
tests/test_agg_render_spec.py
tests/test_dev1476_first_last_explicit_time.py
tests/test_projection_trim.py
tests/test_query_backed_models.py
tests/test_query_backed_typed_expansion.py
tests/test_stage_wrapper_helper.py
tests/test_topologically_order_stages.py

🚧 Files skipped from review as they are similar to previous changes (2)

tests/test_agg_render_spec.py
slayer/sql/generator.py

…adata Two MAJOR Codex findings around typed-metadata propagation through the planner. Both surface as wrong virtual-column ``type`` on query-backed models after the Stage B migration. * _declared_measures_from_query (stage_planner.py:~1005) ignored a user-supplied ``ModelMeasure.type`` on a query measure — only ``_type_for_measure_formula`` ran. Now ``m.type or _type_for_ measure_formula(...)``: explicit override wins, inference is the fallback. Mirrors legacy ``EnrichedMeasure.type``'s "honor explicit type before inference" contract. * ValueRegistry._merge_into_existing (planning.py:~221) dropped ``type`` / ``format`` / ``label`` / ``description`` on hidden→public promotion. Repro: declare ``rank(*:count)`` first (hoists ``*:count`` hidden) then declare ``*:count`` public — the promoted slot kept ``type=None`` and the migrated virtual column regressed to DOUBLE fallback. Now the merge fills any None-on-existing metadata field from the public re-intern; first-intern wins for already-set fields. Tests: * test_query_measure_type_override_wins_over_inference — explicit ``type=DOUBLE`` on ``*:count`` surfaces as DOUBLE despite the INT inference default. * test_promoted_hidden_slot_keeps_metadata — ``rank(*:count)`` + ``*:count`` ordering yields INT on the public ``_count`` column. Full non-integration suite green (3987 passed). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* S1172 stage_planner.py:840 — drop the unused ``formula`` param from ``_format_description_for_measure_formula``. The function takes ``scope`` + ``bound`` already; the formula string was a leftover from an earlier draft. * S100 test_query_backed_typed_expansion.py:698 — rename ``test_qb_A_joins_qb_B_save_path`` → ``test_qb_a_joins_qb_b_save_path`` to match ``^[a-z_][a-z0-9_]*$``. Full non-integration suite green (3987 passed). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Six CRITICAL S3776 cognitive-complexity hits on Stage B surface plus one pre-existing Stage A leftover. Each NOSONAR'd with a documented rationale on the def line. * generator.py:3832 _resolve_explicit_time_col (complexity 21, pre-existing from Stage A b67eecb) — sequential isinstance dispatch over ColumnKey + ColumnSqlKey; splitting fragments the per-shape contract. * generator.py:4068 _build_first_last_base_select (complexity 42) — dim/td/derived-dim classification pass + agg-spec synth + ranked-subquery wrap + outer SELECT/GROUP BY assembly; shared mutable state (partition_exprs / extra_projections / outer_ref_ by_sid / synth_by_sid) keeps it as one conceptual unit. * generator.py:5113 _render_cross_model_cte (complexity 56) — shared-grain projection + agg reroot (source / args / kwargs) + first/last ranked-subquery wrap + target-model-filter qualify + WHERE/HAVING routing; each block is interdependent CTE state. * query_engine.py:788 get_column_types (complexity 22) — linear probe pipeline: query-backed prelude → bundle → expand-nested → plan → render → execute → result-key map-back. * source_bundle.py:411 expand_query_backed_models_in_bundle (complexity 25) — three sequential expansion blocks (source + referenced + stage_source) mutating bundle in series; recursion guard is the only shared state. * stage_ordering.py:47 _walk_spec (complexity 49) — recursive walker over five ``source_model`` shapes plus nested ``source_queries`` recursion. The recursion + isinstance dispatch IS the function. Full non-integration suite green (3987 passed); ruff clean. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

… cross-model first/last fallback Two MAJOR Codex findings on the round-1 fixes: * query_engine.py:_expand_query_backed_model — passed ``outer_vars=final_stage.variables`` to ``expand_query_backed_models_in_bundle``, dropping the outer model's ``query_variables`` layer for nested expansions / sibling stages. Substitution then diverged between execute (pre-merged path) and save-time dry-run (placeholder ``0``). Now passes the bundle's merged ``bundle.query_variables`` (which already includes ``{model.query_variables, outer_vars, stage, runtime}`` per the layered precedence) so nested layers see the full variable context. * generator.py:_render_cross_model_cte — cross-model ``first`` / ``last`` without an explicit positional time arg emitted the ``MAX(CASE WHEN _last_rn = 1 ...)`` expression but only built the ROW_NUMBER-ranked subquery when ``synth.time_column`` was truthy. Result: agg_expr referenced ``_first_rn`` / ``_last_rn`` against a bare ``FROM target`` that didn't supply it. Now falls back to ``target_model.default_time_dimension`` (qualified under the target relation); raises a clear ``first/last requires a ranking time column`` error when even that is unset (mirrors the local first/last path's behaviour). Tests: * test_cross_model_first_last_uses_target_default_time_dimension — ``customers.amount:last`` (no explicit time arg) against a customers model with ``default_time_dimension="signup_at"`` executes end-to-end against seeded SQLite. * test_cross_model_first_last_with_no_time_at_all_raises — same query against a customers model WITHOUT default_time_dimension raises the typed error. Full non-integration suite green (3989 passed). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…act metadata-fill helper The round-1 metadata-carry fix added 4 parallel if-blocks to ``ValueRegistry._merge_into_existing``, bumping cognitive complexity to 18 (over the 15 cap). Extract a module-level ``_fill_missing_metadata`` that walks the four fields in a single loop and mutates the ``updates`` dict in place. The ``slot.field is None and new_value is not None`` rule is identical across the four fields, so the extraction simplifies rather than obscures. Full non-integration suite green (3989 passed); ruff clean. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

tests/test_dev1476_first_last_explicit_time.py (1)

242-245: ⚡ Quick win

Move function-local imports to module scope.

These imports duplicate existing top-level imports and break the repo’s import-placement rule.

Proposed cleanup

-    import os
-    import sqlite3
-    import tempfile
@@
-    import os
-    import sqlite3
-    import tempfile

As per coding guidelines "Place imports at the top of files".

Also applies to: 325-327

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_dev1476_first_last_explicit_time.py` around lines 242 - 245, The
test contains function-local imports for os, sqlite3, and tempfile that
duplicate top-level imports and violate the import-placement rule; move these
imports (the occurrences around the two blocks that currently import "os",
"sqlite3", "tempfile") to the module scope at the top of
tests/test_dev1476_first_last_explicit_time.py, remove the duplicated local
import statements inside the functions, and ensure any functions reference the
now-top-level names (no other code changes needed).

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/test_dev1476_first_last_explicit_time.py`:
- Around line 312-318: The test currently only asserts resp.data exists after
running engine.execute(SlayerQuery(...)) with the measure formula
"customers.amount:last", which allows regressions where the :last measure stops
using customers.default_time_dimension; update the test to explicitly assert
that the generated SQL or response metadata includes the expected default time
dimension (customers.default_time_dimension) and that the time filter or ORDER
BY in resp.sql references that dimension; locate the SlayerQuery call and add
assertions against resp.sql (and/or resp.data structure) to confirm the SQL
contains the default time dimension name and/or an appropriate time-based ORDER
BY or WHERE using that dimension so the fallback behavior is verified.

---

Nitpick comments:
In `@tests/test_dev1476_first_last_explicit_time.py`:
- Around line 242-245: The test contains function-local imports for os, sqlite3,
and tempfile that duplicate top-level imports and violate the import-placement
rule; move these imports (the occurrences around the two blocks that currently
import "os", "sqlite3", "tempfile") to the module scope at the top of
tests/test_dev1476_first_last_explicit_time.py, remove the duplicated local
import statements inside the functions, and ensure any functions reference the
now-top-level names (no other code changes needed).

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 52582d13-3dda-47d9-a01b-b44797220286

📥 Commits

Reviewing files that changed from the base of the PR and between feb76af and 6ca71e9.

📒 Files selected for processing (4)

slayer/engine/planning.py
slayer/engine/query_engine.py
slayer/sql/generator.py
tests/test_dev1476_first_last_explicit_time.py

🚧 Files skipped from review as they are similar to previous changes (3)

slayer/engine/planning.py
slayer/engine/query_engine.py
slayer/sql/generator.py

… call sites Codex MAJOR finding (generator.py:4229) flagged the local first/last ranked-subquery call site for skipping ``_expand_derived_column_sql`` on derived ``ColumnSqlKey`` aggregate sources because ``bundle`` was not passed to ``_build_agg_render_spec_from_planned``. Audit revealed the same gap at 4 sibling call sites: * generator.py:4229 _build_first_last_base_select (ranked-subquery path; Codex flagged). * generator.py:4340 _render_aggregate_composite_expr (single-agg- inside-arithmetic). * generator.py:5579 _render_filter_value_key_in_target_scope HAVING agg. * generator.py:6743 transform input agg. * generator.py:7851 _render_value_key_for_filter (filter ref). Two enclosing functions did not have ``bundle`` in scope and were extended to accept it: * _render_aggregate_composite_expr (4306) — gained ``bundle=None`` kwarg; its 5 recursive / external callers (3691, 4287, 4352, 4365, 8166) now thread ``bundle`` through. * _apply_order_limit_from_planned (8135) — gained ``bundle=None``; the one caller at 2901 threads it through. Test: * test_local_first_last_over_derived_column_expands_inner_refs — ``net_amount:last(created_at)`` where ``net_amount.sql="amount * 0.9"`` executes end-to-end against seeded SQLite; without the expansion the bare ``amount`` inside ``MAX(CASE WHEN _last_rn=1 ...)`` would render unqualified and SQLite would error. Full non-integration suite green (3990 passed); ruff clean. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Codex MAJOR (round 4) — sibling-stage variable substitution in the migrated ``_expand_query_backed_model`` used ``root_vars = final_stage.variables`` as a base layer, but ``apply_variables_to_query`` only substitutes into filters and never updates ``.variables`` — so ``root_vars`` was just the stage-level dict, dropping ``model.query_variables`` and the outer-query layer. Repro: a query-backed model with ``query_variables={"threshold": "100"}`` and a NAMED non-final sibling whose filter references ``{threshold}``. Pre-fix, save-time dry-run silently substituted the placeholder ``"0"``. Post-fix the sibling sees the outer default and substitutes ``100``. Fix: swap ``root_vars`` for ``bundle.query_variables`` as the base layer. ``bundle.query_variables`` is the merged ``{runtime > final_stage.variables > model.query_variables > outer_vars > source_model_defaults}`` set so sibling substitution sees the full outer context. Test: * test_sibling_stage_inherits_outer_model_query_variables — pins the rendered SQL contains ``> 100`` (the outer model default), not ``> 0`` (the placeholder). Also dismissed Codex round-4 finding "_get_column_types rejects __ columns" as a false positive — direct repro returns the flattened ``customers__region`` key correctly; the binder's literal-column carve-out for ``Column.name`` containing ``__`` works on the migrated path. Full non-integration suite green (3991 passed); ruff clean. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…re.type Two Codex findings (1 MAJOR + 1 MINOR): * MAJOR query_engine.py:_execute_by_name — built ``named_queries`` from persisted ``source_queries`` order before ``_execute_pipeline``, but ``plan_stages._topo_sort`` only handles ``source_model`` deps. A model that saved successfully with out-of-order ``joins[].target_ model`` sibling stages (the save path goes through ``_expand_query_ backed_model`` which DOES topo-sort) would fail when run directly by name. Fix: call ``topologically_order_stages`` at the top of ``_execute_by_name`` so run-by-name matches save-time semantics. * MINOR stage_planner.py:_declared_measures_from_query — a query formula that is a bare reference to a saved ``ModelMeasure(formula =..., type=DataType.X)`` on the source model lost the explicit ``type=X``. ``expand_model_measures`` rewrites the AST but drops the measure's type metadata; the planner then fell back to inference, which only happened to match the explicit type by coincidence on the pre-existing test. Fix: extend the type- priority chain to ``m.type or _saved_model_measure_type(...) or _type_for_measure_formula(...)``. New helper ``_saved_model_measure_type`` re-looks-up the saved measure when the query formula is a bare identifier matching a ``ModelMeasure.name`` on the source model. Tests: * test_run_by_name_accepts_nonordered_join_target_stages — saves a model with out-of-order ``joins.target_model`` sibling, then runs it by name; pre-fix this would fail at ``plan_stages``. * test_saved_model_measure_type_wins_over_inference — ``ModelMeasure(formula="amount:sum", type=DataType.INT)`` is referenced from a query; pre-fix the resulting column had ``type=DOUBLE`` (inference), now ``type=INT`` (explicit). Full non-integration suite green (3993 passed); ruff clean. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…/github.com/MotleyAI/slayer into egor/dev-1452-delete-legacy-enrichment-pipeline-migrate-pre-existing-tests

- Codex major (generator.py): cross-model first/last computed ROW_NUMBER over the full target table, then applied target_model_filters + routed WHERE filters on the outer CTE. A filtered-out row could still win _last_rn = 1, NULL-ing out the MAX(CASE WHEN _last_rn = 1 ...) aggregate for that group. Push WHERE into the ranked subquery via the helper's existing where_clause param so RN is computed over filtered rows. Regression test pins last(amount) = 100.0 when the newer row is soft-deleted (instead of NULL). - CodeRabbit thread (test_dev1476): strengthen the default_time_dimension fallback test from "data exists" to an exact expected-value pin (NA last = 50.0 by signup_at). - CodeRabbit nitpick (test_dev1476): drop function-local import os/sqlite3/tempfile in two test bodies; they duplicate the module-top imports and break the imports-at-top rule. - CodeRabbit thread (stage_ordering.py): docstring claimed forward references raise ValueError, but the topo-sort reorders them — that is the entire fault-tolerance contract. Rewrite to list the actual raise cases (cycles, self-refs, dupes, root referenced by other) and call out forward-reference reordering explicitly. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…s-model first/last _render_cross_model_cte built the first/last aggregate via _build_agg BEFORE constructing the ROW_NUMBER-ranked subquery, passing a bare rn_suffix_map and discarding the subquery's filtered_rn_map / filtered_match_map. For a cross-model first/last over a target column carrying a Column.filter, the ranked subquery emits _last_rn_f0 / _match_f0 (skipping the bare _last_rn for filtered specs), so the aggregate referenced a _last_rn column the subquery never projects — "no such column: _last_rn". Reorder to build the ranked subquery first (mirroring the local first/last path), then thread its rn_suffix_map + filtered_rn_map + filtered_match_map into _build_agg. Also drops the previously-wrong tuple-keyed rn_suffix_map in favor of the subquery's correctly time-col-keyed map. Regression test: cross-model customers.active_amount:last where active_amount has filter="status = 'active'" returns the newest ACTIVE row's value (100.0), skipping newer inactive rows. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…qualify; dedup dev1476 tests Three review fixes from the process-reviews triage on PR #144: - stage_planner._type_for_dimension now walks the join chain to the terminal column's type for dotted (joined) dimensions instead of returning None. A None was coerced to DataType.DOUBLE in _query_as_model, mistyping joined string/temporal dims on the persisted query-backed virtual model (Codex #1). - generator._resolve_explicit_time_col routes a derived (ColumnSqlKey) first/last time arg through _expand_derived_column_sql so inner bare refs qualify to the source relation. An unqualified ref in a complex expression (e.g. date(created_at)) could bind to the wrong table when the ranked subquery includes a join with a same-named column (Codex #2). - Finish the in-progress dedup of test_dev1476_first_last_explicit_time: define the _engine_from_sql / _orders_model helpers the file already called, route the fixture through them, and drop the per-test SQLite boilerplate. Clears the SonarCloud new-code duplication gate and the CodeRabbit function-local-import nitpick. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

sonarqubecloud · 2026-05-28T15:39:54Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.8% Duplication on New Code

See analysis details on SonarQube Cloud

coderabbitai Bot reviewed May 26, 2026

View reviewed changes

ZmeiGorynych and others added 7 commits May 26, 2026 13:15

Merge remote-tracking branch 'origin/egor/dev-1450-principled-redesig…

ee18009

…n-of-syntax' into egor/dev-1452-delete-legacy-enrichment-pipeline-migrate-pre-existing-tests

coderabbitai Bot reviewed May 27, 2026

View reviewed changes

Comment thread slayer/engine/planning.py

Comment thread slayer/engine/query_engine.py

Comment thread slayer/engine/query_engine.py

Comment thread slayer/engine/stage_ordering.py Outdated

ZmeiGorynych and others added 5 commits May 27, 2026 12:41

coderabbitai Bot reviewed May 27, 2026

View reviewed changes

Comment thread tests/test_dev1476_first_last_explicit_time.py

ZmeiGorynych and others added 7 commits May 27, 2026 13:48

Merge branch 'egor/dev-1450-principled-redesign-of-syntax' of https:/…

e67a3de

…/github.com/MotleyAI/slayer into egor/dev-1452-delete-legacy-enrichment-pipeline-migrate-pre-existing-tests

ZmeiGorynych merged commit 8b2239c into egor/dev-1450-principled-redesign-of-syntax May 28, 2026
2 checks passed

ZmeiGorynych mentioned this pull request May 28, 2026

DEV-1484 Stage C: migrate pre-existing tests off legacy enrichment #153

Open

4 tasks

coderabbitai Bot mentioned this pull request May 31, 2026

DEV-1500: recognize joined-model custom aggregations in FUNC_STYLE_AGG slack rule #156

Merged

5 tasks

Conversation

ZmeiGorynych commented May 26, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

DEV-1452 — Delete legacy enrichment pipeline + migrate pre-existing tests

Status

Stage A summary (the first commit)

Latent bugs surfaced (Stage A) — blocks Stage D

Plan being executed

DEV-1452 — Delete legacy enrichment pipeline + migrate pre-existing tests

Context

Locked decisions (with user)

Staged commits

Stage A — AggRenderSpec: decouple the generator from EnrichedMeasure

Stage B — migrate query-backed expansion + get_column_types to typed pipeline

Stage C — test migrations

Stage D — deletions + cleanup

Critical files

Reusable existing utilities

Verification

Out-of-scope follow-ups

Implementation notes

Summary by CodeRabbit

Uh oh!

linear Bot commented May 26, 2026

Uh oh!

coderabbitai Bot commented May 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

sonarqubecloud Bot commented May 28, 2026

Quality Gate passed

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

ZmeiGorynych commented May 26, 2026 •

edited by coderabbitai Bot

Loading

Stage A — `AggRenderSpec`: decouple the generator from `EnrichedMeasure`

Stage B — migrate query-backed expansion + `get_column_types` to typed pipeline

coderabbitai Bot commented May 26, 2026 •

edited

Loading