DEV-1484 Stage C: migrate pre-existing tests off legacy enrichment by ZmeiGorynych · Pull Request #153 · MotleyAI/slayer

ZmeiGorynych · 2026-05-28T18:17:00Z

Summary

Stage C of DEV-1452 (delete the legacy enrichment pipeline): migrate every
pre-existing test file off slayer.engine.enrichment / slayer.engine.enriched
/ MixedArithmeticField so Stage D (DEV-1485) can delete the legacy modules.
Test-only changes — no production behaviour changed in Stage C.

The acceptance grep over tests/ is clean except test_agg_render_spec_shim.py
(the shim that dies in Stage D). This branch also merges in the latest
egor/dev-1450-principled-redesign-of-syntax (PR #144 / Stage A+B + Stage B
rounds 6–8); the merge was conflict-free.

Migrated test_sql_generator.py, test_reference_semantics.py,
test_formula.py, test_projection_trim.py, test_cross_model_derived_columns.py,
test_query_backed_models.py, test_named_measures.py, test_help.py,
test_aggregation_gating.py, test_filter_renamed_measure.py,
test_nested_dag_cross_stage_refs.py off the legacy stack — engine._enrich
/ enrich_query / raw EnrichedQuery sites now run through
engine.execute(dry_run=True) or AggRenderSpec + the dialect helpers, with
SQL-based assertions.
Deleted legacy-only test classes (parse_formula / named-measure expansion /
extract-filter / placeholder / TestAutoMoveDimensions / stage-origin
resolver) after auditing coverage; backfilled the unique cases additively
into test_value_registry.py, test_syntax.py, test_measure_expansion.py,
test_agg_render_spec.py, and test_slack_normalization.py.

Deferred typed-pipeline divergences (strict-xfail + tracking issue)

Genuine behaviour gaps the migration surfaced — each pinned by a
@pytest.mark.xfail(strict=True) test so it auto-promotes when fixed:

DEV-1502 — measure-source Column.sql __-path alias doesn't trigger join inference.
DEV-1494 — column-level Column.filter cross-table ref doesn't trigger join discovery (derived + direct-dotted-ref flavours).
DEV-1503 — typed pipeline lacks isolated-CTE handling for cross-model-FILTERED local measures (legacy _fm_ feature). Stage-D blocker — 13 tracking tests.

(Earlier Stage A/B sessions raised DEV-1492 / 1495 / 1496 / 1497 / 1499 / 1500 / 1501 the same way.)

Test plan

Acceptance grep clean over tests/ (only test_agg_render_spec_shim.py remains)
poetry run pytest -m "not integration" — 3902 passed, 6 skipped, 42 xfailed, 0 failed
poetry run ruff check slayer/ tests/ — clean
Integration smoke (SQLite + DuckDB) — 133 passed

🤖 Generated with Claude Code

Summary by CodeRabbit

Documentation
- Expanded DSL and SQL-generation docs: lowercase like allowed; like(value, pattern) → SQL LIKE (NOT LIKE via not(...)); || → concat(...); detailed notes on CTEs, predicate join-paths, and filtered-local aggregate isolation.
New Features
- DSL recognizes 2-arg like(...) and concat via ||; SQL generator emits explicit LIKE/concat; planner supports filtered-local aggregate isolation to avoid join-intersection issues.
Bug Fixes
- change_pct divide-by-zero guarded; validation errors now map to client validation errors; improved join-path/derived-column handling to prevent incorrect SQL.
Tests
- Large suite updates and many new regression/integration tests covering DSL, SQL emission, planner isolation, join discovery, and normalization.

The typed-pipeline migration (Stage A/B) introduced dedicated SlayerError subclasses for binding-time failures that the legacy enrichment pipeline raised as bare ValueError. The REST /query endpoint maps ValueError -> HTTP 400, so these would have surfaced as 500s instead of user-correctable 400s. Add ValueError as a second base to the 7 affected classes (AggregationNotAllowedError, UnknownFunctionError, MeasureRecursionLimitError, MeasureCycleError, DuplicateMeasureNameError, MeasureNameCollidesWithColumnError, CanonicalAliasShadowsColumnError) so they match the 6 sibling resolution-time errors that already extend (SlayerError, ValueError). New tests/test_errors_hierarchy.py pins the contract. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…richment Migrate test_filter_renamed_measure, test_named_measures, test_help, and test_aggregation_gating off slayer.engine.enrichment / SQLGenerator.generate onto the typed pipeline via engine.execute(dry_run=True). New tests/_engine_helpers.py provides the shared Bucket-1 helpers: _engine_generate (ephemeral YAMLStorage + engine + dry_run), _assert_valid_sql (lifted from the legacy _validating_generate wrapper), and _where_text / _having_text (sqlglot clause extraction) for filter-classification assertions that previously read enriched.filters[*].is_having. test_help switches parse_formula -> parse_expr and extract_filter_transforms chain -> parse_filter_expr. Deletes the 2 skipped DEV-1445/1446 guard tests in TestDeferredScopeGuards (coverage lives in test_dev1445_* / test_dev1446_*). Error-message regexes updated to the typed-pipeline error text. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Migrate tests/test_reference_semantics.py off enrich_query onto engine.execute(dry_run=True) + sqlglot WHERE/HAVING clause extraction for filter-classification assertions. Delete the two drop_unreachable_filters tests (legacy enrich_query internal flag; coverage in test_cross_model_planner). tests/test_formula.py: delete the 7 classes that tested the removed parse_formula parser + legacy AST node types; keep the 5 classes covering the retained parse_filter / _rewrite_funcstyle_aggregations helpers. Backfills for the genuine gaps the deletions left: - tests/test_measure_expansion.py (new): negative eligibility cases for expand_model_measures (AggCall.source / DottedRef not expanded; reserved transform name rejected at construction). - tests/test_value_registry.py: cross-model vs local AggregateKey, ntile n-distinct, time_shift kwargs order-independence, repeated-intern stability (backfilled from the deleted TestCanonicalExpressionKey). - tests/test_syntax.py: first/last transform parse shape; plus a strict xfail pinning DEV-1492 (parse_filter_expr mangles transform kwargs in filters) so it auto-promotes when fixed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…e _query_as_model test_nested_dag_cross_stage_refs.py: delete the direct-unit tests of the legacy stage-origin resolver (TestChainedDAGCandidates, TestCandidatePrecedence, TestNonVirtualModel's resolver-None test, and the intercept-dim unit test). They pinned the OLD ancestor-strip resolution of the *dotted* downstream form; the typed pipeline intentionally uses a flat-only downstream contract (dotted refs rejected), pinned by test_binding.py::TestStageSchemaScope and the file's remaining flat end-to-end engine.execute tests. Drop the resolver import (keep SourceModelOrigin, still used by serialization-roundtrip tests). test_query_backed_models.py: migrate the 3 direct _query_as_model unit tests to a _expand_stage_as_model helper over engine._expand_query_backed_model (the Stage B replacement); assertions on virtual columns + wrapped alias SQL carry over. Kept test_join_target_resolving_set_is_per_context — its ContextVar is still live in production until Stage D. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…s + fix derived-dim gap Migrate tests/test_cross_model_derived_columns.py off engine._enrich / SQLGenerator.generate onto engine.execute(dry_run=True); rewrite the resolved_joins assertions to a sqlglot-based _join_aliases(sql) that walks emitted JOINs. _gen_sql upserts the source model + datasource so inline-built models resolve from storage. Stage B fix-up (slayer/sql/generator.py): the typed pipeline could not select a cross-model DERIVED column (Column.sql set) as a dimension — the ColumnSqlKey branch fell back to a source-model-only lookup and raised "Column not found". Make the branch path-aware (mirror the time-column arm): expand the derived sql rooted at the owning joined model's __-path alias and pull its joins into the FROM. _expand_derived_column_sql gains an is_root flag so a further-joined ref inside a cross-model derived column prefixes to the full path (B reaching C -> B__C). The migrated dimension tests are the coverage. The column-level Column.filter derived-ref join-discovery gap (DEV-1494) and the cosmetic cycle-chain entry-point change are handled as a strict xfail and an updated assertion respectively. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

… enrichment Migrate tests/test_projection_trim.py off enrich_query / SQLGenerator.generate onto engine.execute(dry_run=True). The legacy render_mode generator parameter is gone (typed pipeline emits the outer trim directly), so the two tests that exercised it directly are removed; the staged-trim behaviour is covered end-to-end via engine.execute / query-backed wrap. Rewrite the provenance/user_declared tests as outer-SELECT membership assertions (the typed-pipeline equivalent of "user-declared = in the public projection"); delete the legacy public_projection_aliases / user_projection / canonical_expression_key tests (the latter's coverage was backfilled into test_value_registry in commit 2). Rewrite the same-canonical-measures tests from "rejected" to the typed multi-alias output (P4/C13: one slot, both aliases project). The get_column_types-doesn't-call-legacy-generate contract pin stays. Two pre-existing typed-pipeline cross-model projection bugs surfaced and are strict-xfailed against DEV-1495: a cross-model dimension key flattening to ``__`` instead of dotted, and an order-by-only cross-model aggregate leaking into the outer projection with a malformed alias. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…t comments Reword docstring / comment / assert-message references to the dying legacy symbols (enrichment AST node types, the free-function formula parser, the query-backed wrap renderer, the public-projection helper) across the test suite so the acceptance grep is byte-clean except for test_sql_generator.py (Stage C commit 5) and test_agg_render_spec_shim.py (deleted in Stage D). Renames two integration test functions off the legacy token. Comment-only; no behavioural change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…partial) + Mode-B ||/LIKE + change_pct guard C.6 partial: migrate tests/test_sql_generator.py off slayer.engine.enrichment — local _generate now delegates to engine.execute(dry_run=True); Bucket-2 sites use AggRenderSpec directly; ~75% of the engine._enrich/generator.generate blocks rewritten to engine.execute (datasource + extra_models registered as needed). 23 strict-xfails track genuine typed-pipeline divergences surfaced by the migration (DEV-1496 windowed measures [Stage-D blocker], DEV-1497 partition_by grain, DEV-1499 cross-model agg in composite, DEV-1500 funcstyle custom agg on joined models, DEV-1501 parametric last time-col collapse, plus existing DEV-1445 / DEV-1494). Production fixes surfaced during migration (full non-integration suite green, 3918 passed / 27 xfailed): - Mode-B query filters now accept the SQL || concat operator (desugared to concat via a |/BitOr rewrite) and a like(value, pattern) scalar emitting the SQL LIKE operator — closing the gap docs/concepts/references.md already promised but the typed pipeline rejected. - change_pct desugar guards the divisor with NULLIF(prev, 0). - Nested self-join transforms now raise ValueError (HTTP 400) instead of NotImplementedError (HTTP 500). Remaining for next session: Phase-2 tail (TestIsolatedFilteredMeasureCTEs and the other enrich_query-helper / standalone-_enrich classes), Phase 4 (raw EnrichedQuery constructions), Phase 5 (delete TestAutoMoveDimensions + backfill into test_slack_normalization), then the acceptance grep + ruff + integration smoke gates and dropping the top-of-file legacy imports. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…f legacy Finishes the C.6 Phase-2 tail so no test file imports the legacy enrichment stack (acceptance grep clean except the Stage-D-only agg_render_spec shim): - Migrate the remaining engine._enrich / enrich_query / raw EnrichedQuery sites to engine.execute(dry_run=True) and AggRenderSpec + dialect helpers; add a SQL-based _join_aliases helper; drop the legacy imports and the dead _validating_generate wrapper. - Delete TestAutoMoveDimensions (legacy _auto_move_fields_to_dimensions); backfill its 7 cases into test_slack_normalization.py (TestMisplacedMeasure + an end-to-end TestEngineWiring case). - Relocate the two source-alias filter tests to a unit test in test_agg_render_spec.py. Three genuine typed-pipeline divergences surfaced and tracked with strict xfails (never silently weakened): - DEV-1502: measure-source Column.sql __-path alias doesn't trigger joins. - DEV-1494: column-level Column.filter direct dotted cross-table ref doesn't trigger join discovery (2nd tracking test added). - DEV-1503 (Stage-D blocker): typed pipeline lacks isolated-CTE handling for cross-model-FILTERED local measures (legacy _fm_ feature); 13 tests xfailed. The cross-model-MEASURE _cm_ tests migrated to engine.execute and pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…n-of-syntax' into egor/dev-1484-dev-1452-stage-c-migrate-pre-existing-tests-off-legacy

linear · 2026-05-28T18:17:05Z

DEV-1484

DEV-1452

coderabbitai · 2026-05-28T18:18:38Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 6f19329c-9a42-43b5-9318-2c720bf41317

📥 Commits

Reviewing files that changed from the base of the PR and between e381429 and 1c69c78.

📒 Files selected for processing (4)

docs/architecture/sql-generation.md
slayer/sql/generator.py
tests/integration/test_integration.py
tests/test_sql_generator.py

📝 Walkthrough

Walkthrough

Adds Mode‑B like and ||→concat parsing and binder arity checks, records column-filter join-paths on SqlExprKey, implements filtered-local isolation (DEV‑1503) in cross-model planning, extends derived-ref expansion and join-threading in SQL generation, adjusts Aggregate canonical naming and change_pct lowering, updates several error classes to inherit ValueError, and migrates many tests to engine-driven SQL helpers.

Changes

DSL scalar functions and operator support

Layer / File(s)	Summary
**Documentation for `like` and `
Parser and binder support `slayer/engine/syntax.py`, `slayer/engine/binding.py`, `slayer/core/keys.py`	Mode‑B preprocessing rewrites `
SQL generation for `like` `slayer/sql/generator.py`	Typed SQL rendering now emits `exp.Like(...)` for `ScalarCallKey(name="like")` across aggregate composite, post-filter WHERE/HAVING, and value-key rendering paths.

Error hierarchy and planning guard

Layer / File(s)	Summary
Binding-time errors inherit ValueError `slayer/core/errors.py`, `tests/test_errors_hierarchy.py`	Seven validation error classes now multi-inherit from `ValueError` alongside `SlayerError`; unit tests assert the inheritance contract.
Guard change_pct divisor against zero `slayer/engine/planning.py`, `tests/test_transform_lowerer.py`	`desugar_change_pct` now divides by `nullif(shifted, 0)` (ScalarCallKey `nullif`) to avoid divide-by-zero; tests assert `nullif` shape and reuse of the same `time_shift` instance.

Derived-column SQL qualification and cross-model path support

Layer / File(s)	Summary
Derived-column expansion & predicate inlining `slayer/sql/generator.py`, `slayer/engine/column_expansion.py`	Adds detection of derived refs in predicates (`_predicate_references_derived`), `_render_mode_a_predicate` inline-expansion via expand_derived_refs_sync, `_expand_degenerate_derived_root`, and `collect_root_scope_joined_paths` to discover join-path prefixes.
Column-filter join-path discovery (planner) `slayer/engine/column_filter_paths.py`, `slayer/engine/binding.py`	Planner-side `compute_column_filter_join_paths` parses/expands filter SQL across dialect fallbacks, walks root-scope `<alias>.<col>` refs, and returns ordered join-path prefixes; binder populates `SqlExprKey.referenced_join_paths` using it.
Cross-model CTE join-threading and filtered-local isolation `slayer/sql/generator.py`, `slayer/engine/cross_model_planner.py`, `slayer/engine/planned.py`, `slayer/engine/stage_planner.py`	Generator threads filter-driven join paths into CTE FROMs and ranked subqueries; planner implements DEV‑1503 filtered-local isolation by rerooting host-rooted sub-plans and returning `CrossModelAggregatePlan.cte_root_model` to signal host-rooted CTE rendering; `plan_query` gains `disable_dev1503_isolation` to suppress recursion.
Aggregate canonicalization `slayer/engine/planning.py`	Hidden-slot canonical names for AggregateKey now include canonicalized args/kwargs (via canonical_agg_name and agg_kwarg canonical strings).

Normalization, reachable custom aggs, and resolver

Layer / File(s)	Summary
Model normalization with custom_agg_names `slayer/engine/normalization.py`, `slayer/engine/query_engine.py`, `slayer/engine/source_bundle.py`, `slayer/engine/schema_drift.py`, `slayer/memories/resolver.py`	`normalize_model` accepts `custom_agg_names`; `ResolvedSourceBundle.reachable_aggregation_names` and query_engine/save_model use join-graph BFS to compute reachable aggregation names for FUNC_STYLE_AGG recognition; schema-drift and resolver paths updated accordingly.

SQL generator structural and rendering changes

Layer / File(s)	Summary
Derived-aware expansion and filter join merging `slayer/sql/generator.py`	Pre-expands derived ROW/TIME dimensions with is_root semantics, merges join paths from Mode‑A inline-expanded predicates and typed ValueKey trees, builds `_expand_column_filter_sql` and `_render_mode_a_predicate` helpers, and replaces some generic func emission with specific `exp.Like` rendering.
Nested-transform rejection message and handling `slayer/sql/generator.py`	Nested-transform rejections now raise `ValueError` with guidance text to precompute inner transforms in multi-stage models.

Tests, helpers, and migrations

Layer / File(s)	Summary
Shared engine-driven SQL helpers `tests/_engine_helpers.py`	Adds `_assert_valid_sql`, `_engine_generate` (ephemeral YAMLStorage + engine.execute dry_run), `_outer_select`, `_where_text`, `_having_text` for SQL-shape assertions.
Test migrations to engine dry-run SQL many `tests/*` (e.g., `tests/test_cross_model_derived_columns.py`, `tests/test_filter_renamed_measure.py`, `tests/test_projection_trim.py`, `tests/test_reference_semantics.py`, `tests/test_projection_planner.py`)	Multiple tests switched from enrichment/SQLGenerator internals to engine dry-run SQL assertions; added join-alias extraction, dialect-threading, and new regression tests for DEV‑1494/DEV‑1501/DEV‑1503.
New/refocused tests `tests/test_filtered_local_isolation.py`, `tests/test_measure_expansion.py`, `tests/test_slack_normalization.py`, `tests/test_value_registry.py`, `tests/test_agg_render_spec.py`, `tests/test_entity_resolution.py`, `tests/test_source_bundle.py`, `tests/test_validate_models.py`, `tests/test_errors_hierarchy.py`	Adds planner-focused DEV‑1503 coverage, typed measure-expansion negatives, slack-normalization coverage (DEV‑1484/DEV‑1500), TransformKey interning tests, agg-render-spec qualification test, entity extraction for joined custom aggs, reachable-aggregation-name unit tests, cascade-rule regression, and error-hierarchy assertions.

Sequence Diagram

sequenceDiagram
  participant Test as Test
  participant Engine as SlayerQueryEngine
  participant Binder as Binding
  participant Planner as StagePlanner/CrossModelPlanner
  participant Generator as SQLGenerator
  Test->>Engine: engine.execute(query, dry_run=True)
  Engine->>Binder: bind query, compute SqlExprKey (canonical_sql + referenced_join_paths)
  Binder->>Planner: plan_query(..., disable_dev1503_isolation=False)
  Planner->>Planner: detect filtered-local trigger -> _plan_filtered_local (DEV-1503)
  Planner->>Generator: build SQL (expand derived refs, collect filter join-paths)
  Generator->>Engine: return response.sql
  Engine->>Test: response.sql (validated)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

MotleyAI/slayer#158: Overlaps Mode‑B = rewrite/tokenizer logic and related syntax changes.
MotleyAI/slayer#163: Overlaps DEV‑1503 filtered-local isolation and related planner/binder generator plumbing.
MotleyAI/slayer#159: Related changes to aggregate canonical naming for parametric aggregates.

Suggested reviewers

AivanF

Poem

"I hop where pipes become concat and LIKE,
I guard the shifted time from divide-by-strife.
I stitch join-paths into CTEs, neat and small,
Tests now ask the engine, and SQL shows all.
A carrot for the reviewer, and a happy hop for all."

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch egor/dev-1484-dev-1452-stage-c-migrate-pre-existing-tests-off-legacy

coderabbitai

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

slayer/sql/generator.py (1)
3554-3600: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Propagate is_root=False to the other joined-ColumnSqlKey call sites.

This fixes base-row derived dimensions, but joined derived time dimensions and joined derived filter refs still call _expand_derived_column_sql() with the default True (_raw_time_col_expr_for_planned, _collect_filter_join_paths, _render_value_key_for_filter). A multi-hop derived ref on a joined model will still qualify to the bare child alias instead of the full __ path, so those queries keep generating invalid SQL outside the base-select path.
🛠️ Suggested follow-up
-                expanded_sql = self._expand_derived_column_sql(
-                    source_model=joined_model,
-                    source_relation="__".join(time_column.path),
-                    column_name=time_column.column_name,
-                    bundle=bundle,
-                )
+                expanded_sql = self._expand_derived_column_sql(
+                    source_model=joined_model,
+                    source_relation="__".join(time_column.path),
+                    column_name=time_column.column_name,
+                    bundle=bundle,
+                    is_root=False,
+                )
Apply the same change to the joined-model branches in _collect_filter_join_paths() and _render_value_key_for_filter().
As per coding guidelines, "Joined tables use __-delimited path aliases in SQL to disambiguate diamond joins."

Also applies to: 7315-7350
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@slayer/sql/generator.py` around lines 3554 - 3600, The joined-model call
sites that build derived-column SQL must pass the correct is_root flag into
_expand_derived_column_sql so multi-hop derived refs on joined models get full
__-delimited path aliases; update the joined-ColumnSqlKey branches in
_raw_time_col_expr_for_planned, _collect_filter_join_paths, and
_render_value_key_for_filter (the same pattern as the shown generator.py block)
to pass is_root=False (or is_root=not key.path where key is a ColumnSqlKey) when
key.path is non-empty so the expansion roots at the joined owner rather than the
base model.

🧹 Nitpick comments (4)

tests/test_transform_lowerer.py (1)

97-97: ⚡ Quick win

Pin canonical zero type in the denominator assertion.

At Line 97, assert zero == 0 also passes for non-canonical values and won’t catch normalization regressions. Assert Decimal(0) (or normalize_scalar(0)) explicitly.

Proposed test hardening

+from decimal import Decimal
+
 from slayer.core.keys import (
     AggregateKey,
     ArithmeticKey,
@@
-        assert zero == 0
+        assert zero == Decimal(0)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_transform_lowerer.py` at line 97, The assertion currently checks
`zero == 0`, which can pass for non-canonical representations; update the test
to assert a canonical zero type by comparing `zero` to `Decimal(0)` (or to the
result of `normalize_scalar(0)` if that helper exists) so the test fails on
normalization regressions—replace the `assert zero == 0` with an assertion that
`zero` equals `Decimal(0)` (or `normalize_scalar(0)`), referencing the `zero`
variable and `normalize_scalar` helper as needed.

tests/test_errors_hierarchy.py (1)

51-51: ⚡ Quick win

Use keyword arguments in both pytest.mark.parametrize decorators.

Switch to argnames= and argvalues= to follow the repo-wide Python call style rule.

Proposed change

-@pytest.mark.parametrize("cls", _VALUE_ERROR_BINDING_ERRORS)
+@pytest.mark.parametrize(argnames="cls", argvalues=_VALUE_ERROR_BINDING_ERRORS)
 def test_binding_error_is_value_error(cls: type) -> None:
@@
-@pytest.mark.parametrize("cls", _VALUE_ERROR_BINDING_ERRORS)
+@pytest.mark.parametrize(argnames="cls", argvalues=_VALUE_ERROR_BINDING_ERRORS)
 def test_binding_error_is_slayer_error(cls: type) -> None:

As per coding guidelines: `**/*.py`: Use keyword arguments for functions with more than 1 parameter.

Also applies to: 59-59

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_errors_hierarchy.py` at line 51, Update the two
pytest.parametrized decorators to use keyword args: replace the positional call
pytest.mark.parametrize("cls", _VALUE_ERROR_BINDING_ERRORS) (and the other
instance at the same file) with pytest.mark.parametrize(argnames="cls",
argvalues=_VALUE_ERROR_BINDING_ERRORS) so the test decorators follow the repo
style for functions with multiple parameters; locate the decorators in
tests/test_errors_hierarchy.py by the pytest.mark.parametrize usage and change
to argnames= and argvalues= accordingly.

tests/test_aggregation_gating.py (1)

52-73: ⚡ Quick win

Use shared _engine_generate instead of duplicating engine setup.

_generate_sql duplicates the same storage/bootstrap/validate flow that now lives in tests/_engine_helpers.py. Switching to the shared helper will keep test harness behavior consistent and reduce future divergence.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_aggregation_gating.py` around lines 52 - 73, The _generate_sql
function duplicates test engine setup/teardown; replace its body with a call to
the shared helper _engine_generate to reuse the centralized
storage/bootstrap/validation logic: import _engine_generate at the top of the
test module (if not already), change _generate_sql to delegate to await
_engine_generate(orders=orders, customers=customers, measures=measures) and
return the SQL string it yields, and remove the duplicated
YAMLStorage/SlugQueryEngine/bootstrap/assert logic (keep any local assertions
only if they’re uniquely required for this test).

tests/_engine_helpers.py (1)

57-64: 🏗️ Heavy lift

Make _engine_generate keyword-only for query and model.

This helper currently encourages positional calls with multiple arguments, which is drifting from the repo’s calling convention. Please make query and model keyword-only and update call sites accordingly.

Suggested change

 async def _engine_generate(
-    query: SlayerQuery,
-    model: SlayerModel,
-    *,
+    *,
+    query: SlayerQuery,
+    model: SlayerModel,
     dialect: str = "postgres",
     extra_models: Optional[list] = None,
     validate: bool = True,
 ) -> str:

As per coding guidelines, **/*.py: “Use keyword arguments for functions with more than 1 parameter”.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/_engine_helpers.py` around lines 57 - 64, Change the signature of
_engine_generate to make query and model keyword-only (e.g., async def
_engine_generate(*, query: SlayerQuery, model: SlayerModel, dialect: str =
"postgres", extra_models: Optional[list] = None, validate: bool = True) ) and
then update every call site to pass those two parameters by name (use query=...
and model=... instead of positional args); search for usages of _engine_generate
in tests/_engine_helpers.py and across the test suite to update invocations
accordingly so the function enforces keyword-only for query and model.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@slayer/sql/generator.py`:
- Around line 4436-4437: Routed filter expressions like like(...) are being
turned into string literals because _collect_routed_filters calls
_render_filter_value_key_in_target_scope which lacks a branch for ScalarCallKey;
add handling for ScalarCallKey in _render_filter_value_key_in_target_scope (and
the analogous places referenced) to detect a ScalarCallKey whose name == "like"
and return an exp.Like AST (or otherwise defer to the existing renderer that
emits exp.Like) instead of falling through to _literal_key_to_exp; update any
identical routines used for cross-model CTE routing so ScalarCallKey predicates
stay predicates rather than literals (refer to exp.Like,
_collect_routed_filters, _render_filter_value_key_in_target_scope,
ScalarCallKey, and _literal_key_to_exp).

In `@tests/test_agg_render_spec.py`:
- Around line 403-408: The call to _slot currently passes key positionally;
update it to use a keyword argument (e.g., key=key) so all parameters are passed
by name; modify the instantiation assigned to variable slot that calls
_slot(declared_name=..., public_name=..., slot_type=...) to include key=key to
follow the keyword-arg style for multi-parameter functions.

In `@tests/test_query_backed_models.py`:
- Around line 52-58: The helper function _expand_stage_as_model currently
accepts multiple positional parameters (engine, stage, *, name: str,
data_source: str) but is being called positionally; change its signature to
enforce keyword-only usage (e.g., make engine and stage positional but require
name and data_source as keyword-only or make all parameters keyword-only) and
update all call sites (including the calls referenced around the other
occurrences) to pass arguments by keyword (e.g., name=<...>, data_source=<...>)
so the function is invoked using keyword-based calls; ensure callers of
_expand_stage_as_model use the exact parameter names (name, data_source) in
their calls.

In `@tests/test_response_meta.py`:
- Line 4: The docstring in tests/test_response_meta.py contains a duplicated
word "legacy legacy" in the phrase "legacy legacy enriched-query path"; update
the docstring to remove the duplicate so it reads "legacy enriched-query path"
(search for the string "legacy legacy enriched-query path" and replace it with
"legacy enriched-query path" in that file).

---

Outside diff comments:
In `@slayer/sql/generator.py`:
- Around line 3554-3600: The joined-model call sites that build derived-column
SQL must pass the correct is_root flag into _expand_derived_column_sql so
multi-hop derived refs on joined models get full __-delimited path aliases;
update the joined-ColumnSqlKey branches in _raw_time_col_expr_for_planned,
_collect_filter_join_paths, and _render_value_key_for_filter (the same pattern
as the shown generator.py block) to pass is_root=False (or is_root=not key.path
where key is a ColumnSqlKey) when key.path is non-empty so the expansion roots
at the joined owner rather than the base model.

---

Nitpick comments:
In `@tests/_engine_helpers.py`:
- Around line 57-64: Change the signature of _engine_generate to make query and
model keyword-only (e.g., async def _engine_generate(*, query: SlayerQuery,
model: SlayerModel, dialect: str = "postgres", extra_models: Optional[list] =
None, validate: bool = True) ) and then update every call site to pass those two
parameters by name (use query=... and model=... instead of positional args);
search for usages of _engine_generate in tests/_engine_helpers.py and across the
test suite to update invocations accordingly so the function enforces
keyword-only for query and model.

In `@tests/test_aggregation_gating.py`:
- Around line 52-73: The _generate_sql function duplicates test engine
setup/teardown; replace its body with a call to the shared helper
_engine_generate to reuse the centralized storage/bootstrap/validation logic:
import _engine_generate at the top of the test module (if not already), change
_generate_sql to delegate to await _engine_generate(orders=orders,
customers=customers, measures=measures) and return the SQL string it yields, and
remove the duplicated YAMLStorage/SlugQueryEngine/bootstrap/assert logic (keep
any local assertions only if they’re uniquely required for this test).

In `@tests/test_errors_hierarchy.py`:
- Line 51: Update the two pytest.parametrized decorators to use keyword args:
replace the positional call pytest.mark.parametrize("cls",
_VALUE_ERROR_BINDING_ERRORS) (and the other instance at the same file) with
pytest.mark.parametrize(argnames="cls", argvalues=_VALUE_ERROR_BINDING_ERRORS)
so the test decorators follow the repo style for functions with multiple
parameters; locate the decorators in tests/test_errors_hierarchy.py by the
pytest.mark.parametrize usage and change to argnames= and argvalues=
accordingly.

In `@tests/test_transform_lowerer.py`:
- Line 97: The assertion currently checks `zero == 0`, which can pass for
non-canonical representations; update the test to assert a canonical zero type
by comparing `zero` to `Decimal(0)` (or to the result of `normalize_scalar(0)`
if that helper exists) so the test fails on normalization regressions—replace
the `assert zero == 0` with an assertion that `zero` equals `Decimal(0)` (or
`normalize_scalar(0)`), referencing the `zero` variable and `normalize_scalar`
helper as needed.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 64624493-4eca-4a6e-bead-2108082f137a

📥 Commits

Reviewing files that changed from the base of the PR and between 8b2239c and 0065024.

📒 Files selected for processing (34)

CLAUDE.md
docs/concepts/references.md
slayer/core/errors.py
slayer/core/keys.py
slayer/engine/binding.py
slayer/engine/planning.py
slayer/engine/syntax.py
slayer/sql/generator.py
tests/_engine_helpers.py
tests/integration/test_integration.py
tests/test_agg_render_spec.py
tests/test_aggregation_gating.py
tests/test_cross_model_derived_columns.py
tests/test_cross_model_rename_dev1448.py
tests/test_errors_hierarchy.py
tests/test_filter_renamed_measure.py
tests/test_formula.py
tests/test_generator2_multistage.py
tests/test_help.py
tests/test_measure_expansion.py
tests/test_memories_resolver_typed.py
tests/test_named_measures.py
tests/test_nested_dag_cross_stage_refs.py
tests/test_projection_trim.py
tests/test_query_backed_models.py
tests/test_query_backed_typed_expansion.py
tests/test_reference_semantics.py
tests/test_response_meta.py
tests/test_schema_drift_typed.py
tests/test_slack_normalization.py
tests/test_sql_generator.py
tests/test_syntax.py
tests/test_transform_lowerer.py
tests/test_value_registry.py

coderabbitai · 2026-05-28T18:30:06Z

+        slot = _slot(
+            key,
+            declared_name="active_revenue_sum",
+            public_name="active_revenue_sum",
+            slot_type=DataType.DOUBLE,
+        )


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Pass key as a keyword argument in _slot(...).

This new call should follow keyword-arg style for multi-parameter functions.

Suggested patch

- slot = _slot( - key, + slot = _slot( + key=key, declared_name="active_revenue_sum", public_name="active_revenue_sum", slot_type=DataType.DOUBLE, )

As per coding guidelines **/*.py: Use keyword arguments for functions with more than 1 parameter.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

slot = _slot(

key,

declared_name="active_revenue_sum",

public_name="active_revenue_sum",

slot_type=DataType.DOUBLE,

)

slot = _slot(

key=key,

declared_name="active_revenue_sum",

public_name="active_revenue_sum",

slot_type=DataType.DOUBLE,

)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_agg_render_spec.py` around lines 403 - 408, The call to _slot currently passes key positionally; update it to use a keyword argument (e.g., key=key) so all parameters are passed by name; modify the instantiation assigned to variable slot that calls _slot(declared_name=..., public_name=..., slot_type=...) to include key=key to follow the keyword-arg style for multi-parameter functions.

coderabbitai · 2026-05-28T18:30:06Z

+async def _expand_stage_as_model(
+    engine: SlayerQueryEngine,
+    stage: SlayerQuery,
+    *,
+    name: str,
+    data_source: str = "ds",
+) -> SlayerModel:


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Use keyword-only parameters and keyword-based calls for the new helper.

The new helper and its callsites use positional arguments even though the function has multiple parameters.

Suggested patch

async def _expand_stage_as_model( - engine: SlayerQueryEngine, - stage: SlayerQuery, - *, + *, + engine: SlayerQueryEngine, + stage: SlayerQuery, name: str, data_source: str = "ds", ) -> SlayerModel: @@ - virtual = await _expand_stage_as_model(engine, stage, name="qb_default") + virtual = await _expand_stage_as_model( + engine=engine, + stage=stage, + name="qb_default", + ) @@ - virtual = await _expand_stage_as_model(engine, stage, name="qb_collide") + virtual = await _expand_stage_as_model( + engine=engine, + stage=stage, + name="qb_collide", + ) @@ - virtual = await _expand_stage_as_model(engine, stage, name="qb_alias") + virtual = await _expand_stage_as_model( + engine=engine, + stage=stage, + name="qb_alias", + )

As per coding guidelines **/*.py: Use keyword arguments for functions with more than 1 parameter.

Also applies to: 1470-1470, 1489-1489, 1509-1509

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_query_backed_models.py` around lines 52 - 58, The helper function _expand_stage_as_model currently accepts multiple positional parameters (engine, stage, *, name: str, data_source: str) but is being called positionally; change its signature to enforce keyword-only usage (e.g., make engine and stage positional but require name and data_source as keyword-only or make all parameters keyword-only) and update all call sites (including the calls referenced around the other occurrences) to pass arguments by keyword (e.g., name=<...>, data_source=<...>) so the function is invoked using keyword-based calls; ensure callers of _expand_stage_as_model use the exact parameter names (name, data_source) in their calls.

coderabbitai · 2026-05-28T18:30:06Z


 Asserts the typed-plan-derived ``attributes`` + ``expected_columns`` match
-what the legacy EnrichedQuery path produced: result keys read straight from
+what the legacy legacy enriched-query path produced: result keys read straight from


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix duplicated "legacy" in docstring.

The phrase "legacy legacy enriched-query path" appears to have a duplicated word.

📝 Proposed fix

-what the legacy legacy enriched-query path produced: result keys read straight from +what the legacy enriched-query path produced: result keys read straight from

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

what the legacy legacy enriched-query path produced: result keys read straight from

what the legacy enriched-query path produced: result keys read straight from

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_response_meta.py` at line 4, The docstring in tests/test_response_meta.py contains a duplicated word "legacy legacy" in the phrase "legacy legacy enriched-query path"; update the docstring to remove the duplicate so it reads "legacy enriched-query path" (search for the string "legacy legacy enriched-query path" and replace it with "legacy enriched-query path" in that file).

… their joins A column-level Column.filter on an aggregated measure renders as SUM(CASE WHEN <filter> THEN col END). When the filter referenced a joined table via a derived column (bare is_eu, or dotted loss_payment.has_flag whose own sql is derived), the typed pipeline left the derived ref un-inlined (emitting a non-physical <alias>.<derived_col>) and/or failed to pull the crossed join into the FROM. Fix (all in slayer/sql/generator.py): extract one shared Mode-A predicate renderer (_render_mode_a_predicate) that inline-expands derived refs — bare AND dotted-to-a-derived-column-on-a-joined-model — else cheap-qualifies, matching the already-correct query-level filter path so no dangling ref is emitted. Join discovery (_filter_join_paths) unions the un-inlined predicate's paths (so the dbt placeholder-join idiom has_flag sql="1" keeps its alias even when it inlines to a constant) with the inline-expanded predicate's paths (so a derived ref's crossed joins surface). Applied to both Column.filter and SlayerModel.filters; the cross-model _cm_ CTE target-filter path keeps dotted-derived inlining off (no deeper-join mechanism — that is DEV-1503). Tests: un-xfail + strengthen the two tracking tests (now assert the ref is inlined inside the CASE-WHEN, not just that a join exists); promote the filtered-last carries-join test; keep the DEV-1503 isolation xfails with refreshed reasons; add companions for dotted-base, dotted-derived crossing a further join, model-filter inlining, a mixed filtered/unfiltered sibling, the root-scope-only EXISTS limitation, a sqlite constant case, and multi-hop input. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

CodeRabbit: (1) recurse into composite AGGREGATE keys (ArithmeticKey / ScalarCallKey) when collecting Column.filter joins, so a filtered aggregate nested in `a:sum + b:sum` discovers its crossed join; (2) thread source_relation into the dotted-derived detection so it resolves consistently with the rest of the Mode-A path; (3) use _norm(sql) in the mixed-measure assertion. Codex: a cross-model aggregate whose TARGET column carries a Column.filter crossing one of the target's own joins emitted a dangling alias in the _cm_* CTE. _render_cross_model_cte now collects the target measure's Column.filter + target-model-filter join paths and adds them to the CTE's FROM (the CTE is an isolated per-(target,grain) computation, so the join only affects this measure). Target-model-filter dotted-derived inlining is enabled there as a result. Sonar gate (was red): dedupe the new companion fixtures behind shared helpers (new_duplicated_lines_density 14% -> under threshold); rewrite the two ReDoS .*X.* assertion regexes to non-backtracking [^"]* (S5852 hotspots). Reduce cognitive complexity by extracting helpers: _column_ref_is_derived, _expand_degenerate_derived_root (from _render_mode_a_predicate), _collect_column_filter_join_paths + _expand_derived_row_dims (from _build_base_select_for_planned), and _value_key_join_paths (from _collect_filter_join_paths). New tests: composite-measure filtered aggregate, cross-model CTE target-filter join. Full non-integration suite: 3914 passed / 0 failed / 39 xfailed; ruff clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Security hotspots (python:S5852, gate new_security_hotspots_reviewed): the two flagged assertion regexes still tripped the rule after the [^"]* rewrite — S5852 flags ANY regex with two-plus unbounded quantifiers, so [^"]*X[^"]*Y is still ReDoS-shaped. Reduce to a single bounded [^"]* segment plus a plain substring check for the THEN clause; no hotspot is raised. Cognitive complexity (python:S3776, non-gate-blocking CRITICAL): the helper extraction reduced the worst function 87 -> 43 and eliminated two, but the residual is irreducible per-slot/per-key dispatch + recursion that can't reach 15 without fragmenting cohesive logic. NOSONAR(S3776) with per-function rationale on the four (matches the existing _render_cross_model_cte precedent). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…oss-model-derived-column-refs-in-filters DEV-1494: inline cross-model derived refs in Column.filter + discover their joins

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)

slayer/sql/generator.py (3)
5404-5412: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Routed derived filters need to contribute to cte_join_paths.

_collect_routed_filters() can expand a routed target-local ColumnSqlKey into SQL that references deeper __ aliases, but those paths are never added to cte_join_paths before cte_where is attached. A routed filter like where is_eu = true on the target model will therefore generate a predicate against joins this _cm_* CTE never added.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@slayer/sql/generator.py` around lines 5404 - 5412, _cte_where returned by
_collect_routed_filters may reference deeper join aliases but its join path(s)
are not merged into cte_join_paths, so routed filters produce predicates against
joins the _cm_* CTE never added; update the call site after cte_where is
computed so that when cte_where is not None you also extract and merge the
routed join path information returned by _collect_routed_filters (or call its
companion accessor) into cte_join_paths before appending cte_where, referencing
the symbols _collect_routed_filters, cte_where and cte_join_paths (and the
ColumnSqlKey/_cm_* path names) so the CTE includes the necessary join paths for
the generated predicates.
3525-3556: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Collect joins crossed by derived aggregate sources too.

This prepass now covers row dims, row filters, and Column.filter, but it still skips joins introduced by a local aggregate’s ColumnSqlKey source. _build_agg_render_spec_from_planned() expands that SQL later, so a measure like sum(net_amount) can still emit customers__... refs against a FROM that never joined customers unless some unrelated clause happens to pull the same path in.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@slayer/sql/generator.py` around lines 3525 - 3556, The prepass misses joins
introduced by a local aggregate's ColumnSqlKey source so add a pass that
collects those join paths and appends them to needed_join_paths before
rendering; call a new or existing collector (e.g. implement and call
self._collect_agg_source_join_paths(planned_query=planned_query,
base_render_order=base_render_order, slots_by_id=slots_by_id,
source_relation=source_relation, source_model=source_model, bundle=bundle) or
similar) right alongside the other collectors shown, and for each returned path
do "if p not in needed_join_paths: needed_join_paths.append(p)"; ensure this
complements _build_agg_render_spec_from_planned() so aggregate ColumnSqlKey SQL
won't reference unjoined relations.
5361-5438: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

The _cm_* join collector still misses the measure source SQL.

cte_join_paths only looks at the measure’s column_filter_key and text target-model filters. If the cross-model measure source itself is a target-local derived ColumnSqlKey, _build_agg_render_spec_from_planned() will expand it to __-qualified refs, but target_from is still built without those joins, so the aggregate body can reference aliases that never entered the CTE’s FROM.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@slayer/sql/generator.py` around lines 5361 - 5438, cte_join_paths is not
being seeded with the measure's source ColumnSqlKey, so when
_build_agg_render_spec_from_planned expands a target-local derived measure the
CTE's FROM misses required joins; update the block that seeds cte_join_paths
(around local_agg_key and _add_cte_join_paths) to also detect the measure source
ColumnSqlKey on local_agg_key (the same source used by
_build_agg_render_spec_from_planned) and call _add_cte_join_paths with its
canonical_sql (e.g. local_agg_key.<source_column_key>.canonical_sql or whatever
field holds the measure source ColumnSqlKey) so the generated
target_from/cte_base_joins include those join paths.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@slayer/sql/generator.py`:
- Around line 5404-5412: _cte_where returned by _collect_routed_filters may
reference deeper join aliases but its join path(s) are not merged into
cte_join_paths, so routed filters produce predicates against joins the _cm_* CTE
never added; update the call site after cte_where is computed so that when
cte_where is not None you also extract and merge the routed join path
information returned by _collect_routed_filters (or call its companion accessor)
into cte_join_paths before appending cte_where, referencing the symbols
_collect_routed_filters, cte_where and cte_join_paths (and the
ColumnSqlKey/_cm_* path names) so the CTE includes the necessary join paths for
the generated predicates.
- Around line 3525-3556: The prepass misses joins introduced by a local
aggregate's ColumnSqlKey source so add a pass that collects those join paths and
appends them to needed_join_paths before rendering; call a new or existing
collector (e.g. implement and call
self._collect_agg_source_join_paths(planned_query=planned_query,
base_render_order=base_render_order, slots_by_id=slots_by_id,
source_relation=source_relation, source_model=source_model, bundle=bundle) or
similar) right alongside the other collectors shown, and for each returned path
do "if p not in needed_join_paths: needed_join_paths.append(p)"; ensure this
complements _build_agg_render_spec_from_planned() so aggregate ColumnSqlKey SQL
won't reference unjoined relations.
- Around line 5361-5438: cte_join_paths is not being seeded with the measure's
source ColumnSqlKey, so when _build_agg_render_spec_from_planned expands a
target-local derived measure the CTE's FROM misses required joins; update the
block that seeds cte_join_paths (around local_agg_key and _add_cte_join_paths)
to also detect the measure source ColumnSqlKey on local_agg_key (the same source
used by _build_agg_render_spec_from_planned) and call _add_cte_join_paths with
its canonical_sql (e.g. local_agg_key.<source_column_key>.canonical_sql or
whatever field holds the measure source ColumnSqlKey) so the generated
target_from/cte_base_joins include those join paths.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 07aeef3a-a52b-4433-b671-9031b96395a8

📥 Commits

Reviewing files that changed from the base of the PR and between 0065024 and 74ed95a.

📒 Files selected for processing (5)

docs/architecture/index.md
docs/architecture/sql-generation.md
slayer/sql/generator.py
tests/test_cross_model_derived_columns.py
tests/test_sql_generator.py

✅ Files skipped from review due to trivial changes (2)

docs/architecture/index.md
docs/architecture/sql-generation.md

…G slack rule The slack FUNC_STYLE_AGG normalizer previously sourced custom aggregation names from the source model only, so `rolling_avg(customers.score)` reached the Mode-B parser unrewritten and raised UnknownFunctionError. Both the query path (`_normalize_stage`) and the model-save path (`normalize_model` via `save_model`) now collect aggregations reachable through the join graph: the query path uses a sync `ResolvedSourceBundle.reachable_aggregation_names` walk over the pre-resolved bundle (scoped per stage), and the save path reuses `_collect_reachable_agg_names` with a best-effort storage closure that swallows AmbiguousModelError and other lookup errors. `normalize_model` gains a `custom_agg_names` param mirroring `normalize_query`, with None=>fallback-to-own-aggs and frozenset()=>suppress-fallback. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

`parse_filter_expr` rewrote every standalone `=` to `==` so SQL equality (`status = 'paid'`) parses. The blanket rewrite also rewrote keyword- argument `=` inside transform/aggregation calls, so `ntile(amount:sum, n=4) <= 1` became `n == 4` (a positional `Cmp`) and the binder rejected it as "exactly one positional argument". The documented top-N filter patterns (`ntile(<m>, n=<N>) <= 1`, `rank(<m>, partition_by=<col>) <= N`) were broken end-to-end whenever a kwarg was present. Replace the per-part `=`→`==` regex with a literal-aware, paren-kind- tracking single scan that preserves `=` as a kwarg iff the innermost open paren is a CALL paren whose callee is NOT in `SCALAR_FUNCTIONS`, and the `=` is preceded by `IDENT` preceded by `(` or `,` (Python's kwarg grammar). Lowercase keywords (`and`/`or`/`not`/`in`/`is`) do not classify a following `(` as a call paren. The scanner runs as the last step of `_normalize_sql_filter_operators` on the rejoined string. The `SCALAR_FUNCTIONS` narrowing mirrors the architecture's existing no-kwargs-on-scalars rule (docs/architecture/parsing.md), so `coalesce(status = 'paid', False)` still normalizes to a `ScalarCall` with a `Cmp` arg as before. Same code path fixes the colon-agg form `revenue:percentile(p=0.5) > 100`. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…outer trim Hidden first/last aggregates referenced only by ORDER BY or HAVING used to render inline against a dangling `_last_rn`, collapsing distinct explicit-time-column specs into a single broken term. The architecture's documented design (planning.md:89-93,155-159) already mandates the fix: materialise hidden order/filter slots in the base CTE, trim them via an outer SELECT. The transform path already did this; this commit extends the same pattern to the no-transform path. Five coordinated changes: * planning.py `_canonical_name` routes AggregateKey through `canonical_agg_name` so parametric aggregates (`revenue:last(created_at)`, `revenue:percentile(p=0.5)`) carry their arg/kwarg signature in the declared name — distinct hidden slots get distinct base-CTE aliases. * generator.py `_collect_base_aux_slot_ids` learns `aggregates_only=True` and walks AGGREGATE-phase filter deps too. The no-transform path uses it to pull hidden AggregateKey order/filter operands into base_render_order without dragging ROW leaves into GROUP BY. * generator.py `_build_outer_trim_wrap_sql` wraps the base SELECT when any hidden materialised slot is present, projecting exactly the public projection (mirroring the transform path's outer wrap, cycling aliases_by_slot_id for C13 duplicates) and moving ORDER BY / LIMIT / OFFSET to the outer level. `_apply_order_limit_from_planned` resolves hidden AggregateKey order targets via the materialised aliases. * generator.py `_build_first_last_base_select` collects full aliases per sid (multi-alias list) and pass-2 cycles per visit — pre-existing C13 bug exposed by the wrap. * generator.py `_build_agg` suffix guard no longer requires `default_time_col`; FirstLastRenderState threads the rn maps from the base builder into `_build_where_having_from_planned` so HAVING aggregates reference the same `_first_rn` / `_last_rn{suffix}` columns the base projects. Tests: 27 new in test_sql_generator (TestDev1501HiddenFirstLastRender + TestDev1501BroadTriggerAndGuards + multi-dialect parametric) and test_projection_planner (TestDev1501CanonicalNameWithAggArgs) plus one SQLite integration test against real data. test_projection_trim's `test_declared_projection_order_and_hidden_exclusion` second assertion relaxed to "no quantity_sum in OUTER projection" to match the materialise +trim shape per docs. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…TYLE_AGG callers The query/save normalization paths walked the join graph for custom aggregation names in beb6924, but the two quiet callers of ``func_style_agg_to_colon`` still passed source-model-only aggs: - ``slayer.memories.resolver.extract_entities_from_query`` — a saved memory whose query references a joined-model custom agg in funcstyle (e.g. ``rolling_avg(customers.score)``) lost the ``customers.score`` entity tag because the rewrite did not fire. - ``slayer.engine.schema_drift._cascade_measures`` — a ModelMeasure on ``orders`` using the same funcstyle did not cascade-drop when the joined column was removed from ``customers``. Both paths now collect custom aggregation names reachable through the join graph: the resolver reuses the async ``_collect_reachable_agg_names`` with a best-effort storage closure (matching ``save_model``), and the cascade does a sync BFS over ``_CascadeState.models_by_name``. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…hygiene Codex + CodeRabbit + SonarQube review of PR #159 surfaced two genuine HAVING render bugs and a defensive POST-walk gap, all rooted in incomplete plumbing of the ranked-subquery state into the HAVING render paths added by the original Change 5. Group A — HAVING ranked-state threading (2 real bugs): * Host HAVING: ``_render_value_key_for_filter``'s AggregateKey branch built the synth with placeholder ``full_alias="__having_ref__"``, so ``_build_agg``'s ``filtered_rn_map[alias]`` / ``filtered_match_map [alias]`` lookups missed and the term fell back to unfiltered ``_last_rn`` + raw ``filter_sql`` for filtered first/last. Thread ``aliases_by_slot_id`` into ``_build_where_having_from_planned`` → ``_render_value_key_for_filter`` and look up the materialised slot's full alias as the synth alias. * Cross-model HAVING: ``_collect_routed_filters`` → ``_render_filter_value_key_in_target_scope`` rebuilt the aggregate via ``_build_agg(synth)`` with no rn-state context, so a cross-model filtered first/last in HAVING bound to bare ``_last_rn`` instead of the CTE's ``_last_rn_f0``. Extend ``FirstLastRenderState`` with ``agg_synth_alias`` (single-aggregate cross-model CTE case), ``_render_cross_model_cte`` builds the state from its existing rn maps + the projected aggregate's ``full_agg_alias``, threaded through ``_collect_routed_filters`` / ``_render_filter_value_key_in_target_scope`` recursion. Group B — defensive guard: * ``_collect_base_aux_slot_ids`` no longer walks POST-phase filter deps unless ``transform_layers`` exist. POST-phase requires TransformKey operands, so POST in no-transform queries is planner-unreachable; walking the deps without applying the filter would silently materialise helper slots. Group C — Sonar S3776 NOSONAR markers on 5 functions over the complexity threshold (``_canonical_name`` planning.py:631; ``generate_from_planned``, ``_collect_base_aux_slot_ids``, ``_render_value_key_for_filter``, ``_apply_order_limit_from_planned`` generator.py), each with a rationale matching the existing codebase pattern. Pre-existing test helper at tests/test_sql_generator.py:88 left untouched. Group D — two small Sonar test fixes: relax the reluctant-quantifier HAVING regex from ``r"HAVING(.*?)(ORDER BY|$)"`` to ``r"HAVING(.*)$"`` (inner SELECT has no ORDER BY after outer wrap); drop ``async`` from ``test_hidden_composite_order_rejected_at_input_validation`` (does no ``await`` — just an OrderItem input-validation check). Tests: 2 new regression tests for the HAVING bugs (``test_filtered_first_last_in_having_uses_filtered_rn``, ``test_cross_model_filtered_last_in_having``); all 32 DEV-1501 tests pass; full non-integration suite 3931 pass; SQLite + DuckDB integration suites pass; ruff clean. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

The 4-hop save and 4-hop query tests each defined their own local _chain_model helper and inlined the same a -> b -> c -> d -> e save sequence. Lift both into module-level helpers so the tests don't duplicate the chain setup — fixes the SonarCloud duplication block the gate flagged. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

… + NOSONAR Codex round-2 review caught a residual gap from the round-1 fix: ``_render_value_key_for_filter``'s recursive calls for ``ScalarCallKey`` / ``BetweenKey`` / ``InKey`` threaded ``first_last_state`` but DROPPED ``aliases_by_slot_id``. So a filtered ``paid_amount:last(created_at)`` nested under ``coalesce(...)``, ``IN (...)``, or ``BETWEEN`` rebuilt the synth with placeholder ``__having_ref__`` → ``filtered_rn_map[alias]`` missed → fell back to unfiltered ``_last_rn`` + raw ``filter_sql``. The original round-1 regression test only exercised the ``>`` (ArithmeticKey) path. Fix: pass ``aliases_by_slot_id`` through the three missing recursion sites in ``_render_value_key_for_filter``. New regression test pins a ``coalesce(paid_amount:last(created_at), 0) > 0`` HAVING shape to prevent re-occurrence. Sonar S3776 NOSONAR marker on ``_render_filter_value_key_in_target_scope`` (the function grew from the round-1 Group A.3 edits; complexity 26). Also: reply on the two stale CodeRabbit threads (POST-drop guard + cross-model HAVING rn-state) that were addressed in commit dc7a775 but not auto-resolved. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…n-state Codex round-3 review caught the last rn-state plumbing gap from the DEV-1501 series: composite aggregate slots (ArithmeticKey / ScalarCallKey of aggregates) projected on the first/last-ranked path render via ``_render_aggregate_composite_expr`` which previously did NOT receive ``rn_suffix_map`` / ``default_time_col_sql`` / ``filtered_rn_map`` / ``filtered_match_map``. A composite like ``revenue:last(created_at) + revenue:last(updated_at)`` therefore emitted ``MAX(CASE WHEN _last_rn = 1 …) + MAX(CASE WHEN _last_rn = 1 …)`` — both operands collapsed to bare ``_last_rn`` regardless of their explicit time arg. Two coordinated fixes: * ``_render_aggregate_composite_expr`` now accepts the four rn-state params (defaults to ``None`` so non-first/last call sites are unaffected) and passes them to ``_build_agg`` in the AggregateKey branch + threads through ArithmeticKey / ScalarCallKey recursion. * ``_build_first_last_base_select`` gathers composite-operand first/last AggregateKey leaves (via the new module-level ``_iter_first_last_leaves`` walker) and includes their synth specs in the ranked-subquery synth list — so each operand's distinct time column contributes its own ``_first_rn`` / ``_last_rn{suffix}`` column. The operands are NOT projected as base SELECT columns (they're inlined inside the composite render); they only participate in the rn-column set. New regression test: ``test_composite_first_last_in_projection_uses_ correct_suffixes`` exercises a query with both a direct projected ``last(created_at)`` AND a composite ``last(created_at) + last(updated_at)``, asserting the composite renders each operand with a DIFFERENT ``_last_rn{suffix}``. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Two new Sonar OPEN issues on PR 156 were attributed to my DEV-1500 edits: - normalize_model (normalization.py:571) — cognitive complexity 48 > 15. Split into three pure helpers — _normalize_model_measures (FUNC_STYLE on ModelMeasure.formula), _normalize_column_dot_paths (DOT_PATH on Column sql/filter), _normalize_model_filter_dot_paths (DOT_PATH on SlayerModel.filters). normalize_model now just sequences them, so the outer function's complexity drops sharply. - save_model (query_engine.py:1582) — cognitive complexity 29 > 15. The nested _resolve_join_target_for_save closure plus the gating try/except inflated the score. Lift the whole thing into a private async method _reachable_aggs_for_save(model); save_model is back to a one-liner await. Pure refactor, no behavior change. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…agg-normalization-doesnt-recognize-custom DEV-1500: recognize joined-model custom aggregations in FUNC_STYLE_AGG slack rule

…anked subquery Codex round-4 review caught the entry-gate gap from round 3's composite fix: ``_has_first_last_aggregate`` only walked top-level ``AggregateKey`` slots. A query whose ONLY first/last reference is inside a composite slot (no direct first/last sibling) — e.g. ``measures=[ModelMeasure("revenue:last(created_at) + revenue:last(updated_at)")]`` — took the regular base path, the composite render emitted ``MAX(CASE WHEN _last_rn = 1 …)`` referencing a column the bare-FROM never projected. Fix: extend ``_has_first_last_aggregate`` to also walk composite slot keys via ``_iter_first_last_leaves``. Combined with round 3's composite-operand synth gathering and rn-state threading, composite-only queries now correctly build the ranked subquery and render each operand with its own ``_first_rn`` / ``_last_rn{suffix}``. New regression test: ``test_composite_only_first_last_triggers_ranked_subquery``. Also: Sonar S3776 NOSONAR(S3776) on ``_iter_first_last_leaves`` (the walker is the per-type recursion contract — extracting per-type helpers would scatter it); Sonar S3626 redundant ``return`` removed at the end of the InKey branch. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…y gate The DEV-1492 scanner had cognitive complexity 56 (cap 15) — Sonar `python:S3776` and CodeRabbit Thread 8 on PR #158. Dispatch on `m.lastgroup` (each token-class group in `_SCAN_TOKEN_RE` is named and the arms are mutually exclusive). Extract one helper per token class and pull the two non-trivial sub-decisions into pure helpers `_classify_paren` and `_is_kwarg_equals` — both testable in isolation. Top-level body drops to a 4-line dispatcher; each handler and helper is ≤ ~5 lines. Behavior unchanged. 18 targeted tests + full non-integration suite still pass. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

tests/test_validate_models.py (1)
453-453: ⚡ Quick win

Move Aggregation to the module import block.

This function-scoped import breaks the repo's import-placement rule and makes the test dependency easy to miss. As per coding guidelines, "Place imports at the top of files."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_validate_models.py` at line 453, The test currently does a
function-scoped import of Aggregation (from slayer.core.models import
Aggregation); move that import into the module-level import block at the top of
tests/test_validate_models.py so all imports are grouped together. Locate the
current inline import of Aggregation, remove it from inside the function, and
add a single module-level import line for Aggregation alongside the other test
imports so the test follows the project's import-placement rule.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@slayer/engine/query_engine.py`:
- Around line 1591-1592: The early-return guard only checks model.measures and
thus skips query-backed models whose formulas live in
SlayerModel.source_queries; update the check in the function containing this
guard to treat a model as empty only if both model.measures and
model.source_queries are empty/None (i.e., proceed when either has content).
Ensure reachable custom-aggregation discovery logic considers source_queries the
same way it considers measures so joined-model funcstyle aggregations normalized
by save_model() are also discovered at execute time.

---

Nitpick comments:
In `@tests/test_validate_models.py`:
- Line 453: The test currently does a function-scoped import of Aggregation
(from slayer.core.models import Aggregation); move that import into the
module-level import block at the top of tests/test_validate_models.py so all
imports are grouped together. Locate the current inline import of Aggregation,
remove it from inside the function, and add a single module-level import line
for Aggregation alongside the other test imports so the test follows the
project's import-placement rule.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 26e96109-fda4-46e8-8006-1a3d9e83d2ff

📥 Commits

Reviewing files that changed from the base of the PR and between 74ed95a and 93bc7cd.

📒 Files selected for processing (10)

slayer/engine/normalization.py
slayer/engine/query_engine.py
slayer/engine/schema_drift.py
slayer/engine/source_bundle.py
slayer/memories/resolver.py
tests/test_entity_resolution.py
tests/test_slack_normalization.py
tests/test_source_bundle.py
tests/test_sql_generator.py
tests/test_validate_models.py

coderabbitai · 2026-05-31T17:15:19Z

+        if not model.measures:
+            return None


⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Handle query-backed models here too.

Query-backed models keep their formulas under SlayerModel.source_queries, not model.measures, so this guard skips reachable custom-aggregation discovery for exactly the models that save_model() also normalizes. Joined-model funcstyle custom aggregations in stored stages will therefore stay unnormalized on save even though execute-time normalization now recognizes them.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@slayer/engine/query_engine.py` around lines 1591 - 1592, The early-return guard only checks model.measures and thus skips query-backed models whose formulas live in SlayerModel.source_queries; update the check in the function containing this guard to treat a model as empty only if both model.measures and model.source_queries are empty/None (i.e., proceed when either has content). Ensure reachable custom-aggregation discovery logic considers source_queries the same way it considers measures so joined-model funcstyle aggregations normalized by save_model() are also discovered at execute time.

…ined-path discovery Codex round-5 review caught two more composite-aware code paths the round 3/4 fixes missed: 1. ``_build_first_last_base_select``'s "no-default ranking time column" validation only walked top-level ``AggregateKey`` slots — a composite like ``revenue:last + 1`` (no explicit time arg, no model default_time_dimension) bypassed the error and then ``_build_unfiltered_rn_columns`` emitted ``ORDER BY None``. Extend the validation to also walk composite slots via ``_iter_first_last_leaves``. 2. ``_collect_joined_paths_for_base`` walked explicit time args only on top-level ``AggregateKey`` slots. A composite operand ``revenue:last(customers.signed_up_at) + 1`` orders the ranked subquery by ``customers.signed_up_at`` but the ``customers`` join was never pulled into the FROM, producing invalid SQL. Extend path discovery to walk composite-slot first/last leaves. Two new regression tests: * ``test_composite_first_last_without_default_time_dim_raises`` * ``test_composite_first_last_with_joined_time_arg_adds_join`` Also: CodeRabbit minor — the round-2 cross-model HAVING regression test used ``filter="active = 1"`` on a BOOLEAN column, which Postgres rejects (no implicit bool↔int cast). Switched to ``= TRUE``. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Codex review on PR #158 caught that the kwarg scanner mis-classified the documented `consecutive_periods(<predicate>)` form when the predicate's LHS is a bare ref and its operator is SQL `=`: consecutive_periods(status = 'paid') >= 2 was normalized to `consecutive_periods(status='paid') >= 2`, parsed as a kwarg, and rejected by the binder. The `:` in colon-agg forms (`revenue:sum > 0`) accidentally avoided the regression because the intervening `:` token broke the LPAREN/IDENT pattern; bare refs had nothing to break it. Distinguish callee classes in `_is_kwarg_equals`: for transforms (in `ALL_TRANSFORMS`) the first positional is always the value to transform, so a kwarg `=` can only appear AFTER a `,`. Aggregations keep the existing `LPAREN` or `COMMA` rule because parametric forms (`weighted_avg(weight=qty)`, `percentile(p=0.5)`) legitimately put a kwarg first. Scalars keep their existing no-kwarg exclusion. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

… filter Codex round-6 review caught two more edge cases: 1. Filtered composite operands rebuilt their synth with the placeholder ``full_alias="__op__"`` in ``_render_aggregate_composite_expr``, but ``_build_first_last_base_select`` keyed ``filtered_rn_map`` by the per-leaf alias it minted at synth time (``_composite_op_N``). The lookup missed → bare ``_last_rn`` + raw filter. Thread a ``composite_alias_by_key`` map so the operand synth's alias matches the rn-map key. 2. Cross-model queries with a LOCAL first/last HAVING filter (``filters=["revenue:last(created_at) > 100"]``) had the ``_render_with_cross_model_plans`` ``base_render_order`` construction skip AGGREGATE-phase filter deps. The host ``_base`` CTE wasn't building the ranked subquery → HAVING emitted dangling ``_last_rn``. Add an unconditional ``_collect_base_aux_slot_ids(aggregates_only= True, include_order=False)`` pass to the cross-model base-order construction. The fix surfaces the existing ``NotImplementedError`` for the combination of local first/last + cross-model aggregate (the in-place guard was already there), which is a strict improvement over silently dangling-``_last_rn`` SQL. NOSONAR(S3776) on ``_collect_joined_paths_for_base`` (round 5 extension bumped its complexity to 30). Two regression tests: * ``test_filtered_composite_first_last_uses_filtered_rn`` * ``test_cross_model_query_with_local_first_last_having_filter`` Also: reply on CodeRabbit's POST-drop re-flag thread referencing the ``has_transforms`` guard in dc7a775. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…+ NOSONARs Three follow-ups from round-7 review: 1. Codex MAJOR: ``_render_aggregate_composite_expr``'s ``ScalarCallKey`` branch dropped ``composite_alias_by_key`` when recursing into operands (the ``ArithmeticKey`` branch forwarded it; only the ``ScalarCallKey`` recursion missed it). A filtered first/last operand inside ``coalesce(...)`` would then fall back to the placeholder ``__op__`` alias and miss the ``filtered_rn_map`` lookup. Forward ``composite_alias_by_key`` consistently in all three recursion branches. 2. Sonar quality-gate ERROR on ``new_duplicated_lines_density`` (4.0% > 3.0%): the round-6 cross-model AGG-phase filter materialization block was near-identical to the round-5 transform-deps block. Factor both into a single ``_add_local_aux_slots(include_order=, aggregates_only=)`` inner helper. Same control flow, one body. 3. Sonar S3776 NOSONAR markers on ``_render_aggregate_composite_expr`` (complexity 37 — sequential ValueKey dispatch with rn-state + composite-alias-by-key threading) and ``_render_with_cross_model_plans`` (complexity 80 — orchestration of host ``_base`` CTE + per-plan ``_cm_*`` CTEs + combined SELECT + transform chain + outer wrap). Also: reply on the CodeRabbit POST-drop re-flag thread (the third time this round) — the ``has_transforms`` guard from dc7a775 remains sufficient; POST in no-transform is planner-unreachable. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Codex review on PR #158 caught that the kwarg scanner classifies a call paren by callee name alone, missing the colon-aggregation context. The names `first` and `last` sit in both ALL_TRANSFORMS (FIRST_VALUE window) and the built-in aggregation set (`_AMBIGUOUS_AGG_TRANSFORMS` in `slayer/core/formula.py`), and after a `:` they are always aggregations — never transforms. So `revenue:first(order=created_at) > 0` was landing in the transform branch and the `order=` kwarg was being rewritten to `order==`. Detect colon-agg context in `_classify_paren` by checking whether `hist[-2]` is the `:` token. When so, drop the callee to `None` so `_is_kwarg_equals` takes the aggregation/unknown branch (kwargs after `(` or `,`). The `hist` cap of 2 retains exactly the prev-prev token needed; longer prefixes like `customers.revenue:first(...)` roll off without affecting the detection. Built-in `first`/`last` take their order column positionally today (`revenue:last(ordered_at)`), so the user-visible impact is on custom user-defined aggregations that override these names; the test pins the architectural rule with a concrete kwarg-shape input. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Three follow-ups from round-8 review: 1. Codex MAJOR (slayer/sql/generator.py:6304+): a DERIVED first/last time arg (``ColumnSqlKey``) skipped join discovery. ``_resolve_explicit_time_col`` expands ``net_signed_at.sql = "customers.signed_up_at"`` so the ranked subquery's ORDER BY emits ``customers.signed_up_at`` — but ``_collect_joined_paths_for_base`` only walked ``ColumnKey`` args, so the ``customers`` join was not added to ``_base``. Extend the helper to accept ``source_model`` / ``source_relation`` / ``bundle`` and walk ``ColumnSqlKey`` args via the existing ``_expand_derived_column_sql`` + ``_joined_paths_in_sql`` pair (same machinery the ROW-derived TimeTruncKey path already uses). + regression test ``TestDev1501HiddenFirstLastRender.test_derived_time_arg_pulls_in_referenced_join``. 2. Sonar S3776 NOSONAR on ``_projection_rn_by_alias`` test helper (complexity 21 > 15) — sequential isinstance dispatch over outermost-SELECT projection layers (Alias / Cast / Max / Case / EQ / Column unwrap). Each layer is a structural-shape predicate whose failure short-circuits to the next projection; extracting per-layer helpers would scatter the predicate. 3. Sonar duplication gate (4.0% > 3.0% threshold, test_sql_generator.py 5.1% per-file). Extract ``_persist_and_engine(*models)`` async context-manager helper plus a ``_orders_with_paid_amount_model`` fixture; apply across the DEV-1501 test classes (TestDev1501HiddenFirstLastRender, TestDev1501BroadTriggerAndGuards, TestParameterizedAggCanonicalDistinct, TestTransformAmbiguousTimeDimension). One-line per call site (vs the prior 5-line ``tempfile + YAMLStorage + save_datasource + save_model + SlayerQueryEngine`` block) — drops ~140 net lines of new-line duplication. 3939 unit tests pass; ruff clean. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…-mangles-transform-kwargs-in-filters-ntilex DEV-1492: preserve transform kwargs in filter normalization

Codex MAJOR (slayer/sql/generator.py:6031+): the cross-model HAVING ``AggregateKey`` reconstruction in ``_render_filter_value_key_in_target_scope`` rerooted ``source`` to ``path=()`` but copied ``args`` / ``kwargs`` unchanged. The projection path (lines ~5685-5730) already handles all three symmetrically via ``_reroot_kwarg``. Mirror that logic here so a multi-hop routed filter (``customers.regions.amount:last(customers.regions.opened_at) > 0``) qualifies its time arg under the local CTE relation instead of the host-rooted ``__``-path alias that doesn't exist inside the target CTE. Inert for single-hop cases — both pre-fix and post-fix render ``customers.signed_up_at`` because ``_resolve_explicit_time_col``'s ``"__".join(path)`` happens to collapse to the same string when the target relation has the same single hop. Real fix lands for multi-hop shapes (``customers.regions.opened_at``) that the existing test suite doesn't yet exercise — kept the fix mechanical (same shape as the projection-path reroot) rather than introducing a multi-model test fixture that exercises an unrelated routing concern. 3939 unit tests pass; ruff clean. Sonar gate is fully green (duplication 0.9% < 3% threshold, 0 OPEN issues, 0 hotspots, no CI failures) after round 8's ``_persist_and_engine`` refactor. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…rst-with-different-explicit-time-columns DEV-1501: parametric first/last in ORDER BY / HAVING — materialise + outer trim

…apper A host aggregate whose Column.filter references a joined table (loss_payment_amt:sum where loss_payment_amt has filter="loss_payment.has_flag = 1") now hoists into its own _cm_ CTE host-rooted variant of the cross-model aggregate strategy. Without isolation, two such measures whose filter joins are different INNER joins on different tables would intersect in the host base to only the rows present in BOTH targets, silently corrupting both aggregates. Planner: trigger predicate extended to fire on column_filter_key.referenced_join_paths non-empty (not just source.path non-empty). _plan_filtered_local builds a host-rooted nested PlannedQuery and attaches via rerooted_plan / rerooted_grain_pairs / rerooted_agg_slot_id with cte_root_model = host.name. AGGREGATE-phase host filters referencing the isolated aggregate are skipped (not passed to the sub-plan) so the generator routes them to outer WHERE instead of HAVING-into-CTE (which would surface host rows as NULL via LEFT JOIN instead of dropping them). Generator: _render_with_cross_model_plans now identifies AGGREGATE filters walking an isolated AggregateKey, adds their ids to routed_ids so _base skips them, and renders each via _render_filter_for_outer_ wrapper as plain WHERE on the combined non-aggregating SELECT, substituting isolated AggregateKey refs with <_cm_>."<col>" and other slot refs with _base."<alias>". Hidden operand promotion for mixed filters (loss_payment_amt:sum > 1000 AND total_amount:sum > 10) flows through the existing _add_local_aux_slots(aggregates_only=True) pass. Identity: SqlExprKey gains referenced_join_paths so the planner's filtered-local trigger reads a typed field rather than re-parsing the column filter SQL. compute_column_filter_join_paths (new slayer/engine/column_filter_paths.py) is the planner-side discovery helper, swallowing internal sqlglot scope-analyser failures so the generator's parse-time security gate stays authoritative on malicious payloads. Tests: TestIsolatedFilteredMeasureCTEs xfails rewritten to assert _cm_/_base/outer-WHERE structure; new tests/test_filtered_local_ isolation.py covers SqlExprKey.referenced_join_paths, trigger dispatch (incl. cross-model-with-target-filter regression), recursion suppression, host model filter interactions, first/last sub-plan cleanliness. Docs: Strategy 3 section in docs/architecture/cross-model-aggregates.md; CLAUDE.md key-conventions bullet. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…ardening Correctness: - Planner: fix date_range off-by-one in user-filter routing for the filtered-local sub-plan. host_filter_routings is ordered [date_range_routings..., user_filter_routings...]; slice the user portion as host_filters[-len(host_query.filters):] instead of looking up by f"f{i}" (off by n_date_range when a date_range time_dimension is present; Codex review). - Generator: outer-WHERE wrapper now maps EVERY cross-model aggregate plan to its _cm_ CTE column (not just filtered-local with cte_root_model set). A mixed predicate like loss_payment_amt:sum + customers.revenue:sum > 100 routed to the wrapper via a filtered-local operand can now resolve the forward cross-model operand at the outer scope instead of raising NotImplementedError (CodeRabbit thread 2). Test boundary assertions: - POST-routing tests now assert layer boundaries: AGGREGATE predicate inside the combined base CTE WHERE, POST predicate inside the post-transform _filtered wrap, neither leaks into the other layer (CodeRabbit thread 4). - TestIsolatedFilteredMeasureCTEs + the customers-JOIN-inside-_cm_ test switched from sql.index("\n)", start) manual slicing to the balanced-paren _extract_cte_body helper so nested ranked subqueries don't truncate the CTE body (CodeRabbit thread 3). Hardening: - SqlExprKey.referenced_join_paths gains a before-validator that canonicalises to a sorted, de-duplicated tuple of tuples — identity is now order-independent, callers can pass any iterable shape (CodeRabbit nitpick). - column_filter_paths: self-qualified anchor-local derived refs (filter="orders.is_eu = 1" where orders is the anchor) now trigger the derived-expansion gate identically to the bare form (filter="is_eu = 1") so the cross-model path the expansion surfaces is recorded on the SqlExprKey (CodeRabbit thread 1). Refactors: - Shared root-scope joined-path walker hoisted into column_expansion.py as collect_root_scope_joined_paths; column_filter_paths and generator both call it instead of duplicating the alias-walk-over-joins loop (Sonar duplication clearance). - compute_column_filter_join_paths splits its derived-expansion preamble into _expand_filter_sql_if_anchor_derived (S3776 cognitive complexity). - IsolatedCteCrossModelPlanner.plan extracts the filtered-local trigger precondition + dispatch into _dispatch_filtered_local (S3776). - _plan_filtered_local extracts the host-query-filter classification loop into the module-level _classify_subplan_filters helper (S3776). - Test fixture: orders+customers+is_eu+eu_amount setup extracted to a shared _orders_with_derived_eu_filter helper across three derived-ref tests (Sonar duplication clearance). - stage_planner.plan_query: NOSONAR(S3776) with reason — function's pre-existing complexity is owned by multi-stage scope/bundle/projection/ filter-routing wiring tracked as a separate refactor. Tests: full unit suite 4036 passed (was 4030). Ruff clean. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Correctness: - Composite projections over isolated filtered-local aggregates now render at the combined SELECT scope, not inline in _base. A measure like loss_payment_amt:sum + loss_reserve_amt:sum would otherwise pull BOTH filter-target INNER joins back into _base and intersect to rows in BOTH targets — silently corrupting both aggregates, the exact bug DEV-1503 set out to eliminate (Codex round 2 #1). Identification walks aggregate_slots + combined_expression_slots for ArithmeticKey / ScalarCallKey projections whose tree references any cross-model agg slot; rendering reuses _render_filter_for_outer_wrapper to substitute isolated AggregateKey → _cm_X."col" and other operands → _base."alias". Non-isolated agg operands are auto-promoted to hidden aux slots so _base materialises them. - No-dim aggregate-only isolated queries no longer emit SELECT 1 AS _placeholder FROM <host> (N rows) cross-joined with a scalar _cm_* CTE (1 row) — N × 1 duplicated the aggregate by host row count. The placeholder now drops the host FROM, yielding the expected single-row scalar (Codex round 2 #2). Tests: - test_formula_over_isolated_measures strengthened: asserts _base body contains NEITHER Loss_Payment NOR Loss_Reserve joins, and the composite alias appears in the outer combined SELECT referencing both _cm_* CTE columns. - test_base_not_empty_when_no_dims_all_measures_skipped strengthened: asserts the _base body does not reference the host table and carries the _placeholder literal. - test_cross_model_dimension_count_distinct_in_formula and test_rerooted_cross_model_in_formula promoted from xfail(strict=True) to PASS — the DEV-1499 cross-model-in-composite case is the same shape the DEV-1503 fix now handles. Sonar: - collect_root_scope_joined_paths (column_expansion.py) extracts the per-column alias-walk into _resolve_alias_to_join_segments (S3776 cognitive complexity). - _plan_filtered_local (cross_model_planner.py) extracts grain-pair matching, sub_agg slot finding, and cte_schema build into module-level helpers (_match_filtered_local_grain_pairs, _find_filtered_local_sub_agg_slot, _build_filtered_local_cte_schema) (S3776). - test_multi_hop_derived_filter_expands_inside_cm_cte converted from _re.search + manual sql.index("\n)") to _extract_cte_body — removes the remaining python:S5852 ReDoS hotspot. Tests: full unit suite 4038 passed (was 4036). Ruff clean. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Generator (slayer/sql/generator.py:_render_with_cross_model_plans): - outer_composite_slot_ids scan now also walks slots referenced by planned_query.order, not just planned_query.projection. A hidden ORDER BY composite over isolated filtered-local aggregates was previously falling through to the order-only local path and rendering inline in _base, re-pulling filter-target joins into the host spine (Codex round 3 #1). - Outer-routed composite slots projected multiple times (C13 multi-alias same-key consolidation) cycle through cslot.public_aliases per emission and APPEND into combined_aliases_by_slot_id, rather than always emitting public_aliases[0] and overwriting the dict slot (CodeRabbit thread on alias cycling). - Order-only outer composites materialise as hidden combined-SELECT columns under the slot's declared_name so ORDER BY can resolve via the synthesised alias. _build_combined_order_by_sql: - New ``outer_composite_aliases`` kwarg maps each outer-routed composite slot id to the alias the combined SELECT projects it under. ORDER BY references those aliases bare (the projection belongs to the combined SELECT, not _base) — previously they would emit ``_base."<alias>"`` against a column that doesn't exist there (Codex round 3 #2). Tests: - test_order_by_projected_composite_over_isolated_resolves_at_combined pins the ORDER-BY-alias-at-combined-scope invariant: the composite must not leak into _base AND the ORDER BY must not reference _base. - test_outer_composite_with_multiple_user_aliases pins C13 alias cycling: the same composite formula declared under two user names surfaces both aliases in the outer SELECT. Tests: 4040 passed (was 4038). Ruff clean. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Correctness (Codex round 4 / CodeRabbit): - empty_base placeholder now re-introduces the host FROM + WHERE + LIMIT 1 when ANY host-local ROW filter exists (filter id not in routed_ids and phase=ROW). The round-2 fix that dropped the host FROM to avoid N-row CROSS JOIN duplication also bypassed _build_where_having_from_planned, silently ignoring host-local ROW filters on no-dim queries with forward cross-model aggregates (e.g. customers.revenue:sum with filters=["status = 'active'"]). LIMIT 1 preserves the 1-row cardinality fix from round 2; WHERE applies the filter so the combined query returns 0 rows when no host row matches (correct semantics) and 1 row otherwise. When NO host-local filter exists, the placeholder stays bare (no FROM, no WHERE, no LIMIT) — matching round 2 for the unfiltered fast path. Sonar: - _build_combined_order_by_sql (generator.py:6528) extracts per-entry alias resolution into _resolve_combined_order_term — S3776 cognitive complexity 20 → 15. Tests: - test_no_dim_query_with_host_row_filter_applies_in_base pins the WHERE + LIMIT 1 invariant in the empty_base placeholder for a no-dim query with a host ROW filter. Tests: 4041 passed (was 4040). Ruff clean. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Correctness (Codex round 5): - _build_first_last_base_select now accepts skip_filter_ids and forwards it to _build_where_having_from_planned. The cross-model path passes routed_ids to _build_base_select_for_planned, but the first/last branch was calling the inner helper without forwarding the set — a filter routed to a per-plan _cm_ CTE was double-applied: once inside the host ranked subquery's WHERE, once inside the _cm_ CTE. For a filter referencing a join path not in _base's FROM, the host-side rendering would emit invalid SQL. Test: - test_local_first_last_with_routed_cross_model_filter pins the routing: a query mixing total_amount:last (local) + loss_payment.has_flag:sum (forward cross-model) + a filter routed via PROPAGATE_HAVING asserts the filter literal appears only inside the _cm_ CTE body, never in the host _base ranked subquery. Tests: 4042 passed (was 4041). Ruff clean. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Correctness (Codex round 6): - The round-4 empty_base placeholder fix added FROM <host> + WHERE + LIMIT 1 when a host-local ROW filter exists, but bypassed _collect_filter_join_paths / _build_from_and_joins. A filter referencing a joined alias (e.g. filters=["claim.claim_number = '12345'"]) therefore emitted WHERE claim.claim_number = ... against a _base that had no Claim join — invalid SQL ("missing FROM-clause entry"). The empty_base branch now walks non-routed filters via _collect_filter_join_paths, threads the resulting paths through _build_from_and_joins, and attaches the LEFT JOINs to the placeholder before the WHERE. Test: - test_no_dim_host_filter_referencing_joined_column_pulls_join pins the join-discovery: a no-dim query with a host ROW filter on a joined column asserts the Claim join appears in _base alongside the filter literal AND LIMIT 1. Tests: 4043 passed (was 4042). Ruff clean. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Correctness (Codex round 7): - column_filter_paths.compute_column_filter_join_paths parsed every Column.filter SQL with hard-coded dialect="postgres" and returned () on parse failure. For a non-Postgres datasource using dialect-specific syntax (MySQL backticks, T-SQL square brackets, ClickHouse-specific functions), the Postgres parse could fail silently — the planner then saw no referenced join paths, the DEV-1503 trigger missed, and the generator (dialect-aware) rendered the filter inline in _base, pulling the cross-model join back into the host rowset and silently corrupting the aggregate. The helper now walks a small dialect fallback chain (Postgres → sqlglot default → MySQL → BigQuery → T-SQL) and only returns () when every dialect rejects the input. The injection security gate (UNION SELECT, etc.) is unchanged because malformed payloads still fail every dialect. Test: - test_dialect_fallback_chain_recovers_mysql_backtick_filter pins the recovery: a backtick-quoted joined alias ref ("loss_payment"."has_flag" = 1) surfaces ("loss_payment",) in referenced_join_paths. Tests: 4044 passed (was 4043). Ruff clean. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Correctness: - _render_with_cross_model_plans now honours where_consumed from the first/last base-rendering path: when the ranked subquery consumed WHERE, we no longer re-apply base_where on the outer _base SELECT. Double-application changed first/last semantics (the outer WHERE filtered AFTER ranking) and could dangle joined-column aliases on the outer scope (CodeRabbit thread). - Order-only outer composites (composite slots referenced by ORDER BY but absent from planned_query.projection) now render INLINE inside the combined ORDER BY rather than materialising as a hidden combined-SELECT column. The old shape leaked the synthetic alias as an extra user-visible result column on the no-transform path (Codex round 8) AND lost the slot in the cross-model transform chain's combined_aliases_by_slot_id carry-forward dict, breaking ORDER BY resolution on the chain output (CodeRabbit thread). A new outer_composite_order_expressions map carries the rendered SQL for each order-only outer composite; _build_combined_order_by_sql emits <expr> <direction> bare for those slots. Test: - test_first_last_with_host_filter_not_reapplied_outside_ranked_subquery pins the where_consumed gate: a query mixing total_amount:last (local first/last) + loss_payment.has_flag:sum (forward cross-model) + filters=["status = 'active'"] asserts the filter literal appears exactly once in _base — inside the ranked subquery. Tests: 4045 passed (was 4044). Ruff clean. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Correctness (Codex round 9): - The round-7 dialect fallback chain covered the outer sqlglot.parse_one call and the re-parse after derived expansion, but expand_derived_refs_sync itself was still pinned to Postgres. A derived column whose Column.sql uses dialect-specific syntax (MySQL backticks, BigQuery struct literals, etc.) would fail the inner parse silently, dropping the join path the DEV-1503 trigger needs. A new _expand_derived_refs_any_dialect helper walks the same chain (excluding the None sentinel since expand_derived_refs_sync requires a dialect string) and returns the first non-None expansion. Test: - test_dialect_fallback_chain_recovers_derived_ref_with_backticks pins the recovery: a derived column whose sql uses ``CASE WHEN `customers`.`region` = 'EU'`` (MySQL backticks) referenced by a Column.filter still surfaces the (customers,) join path. Without the wrapper the helper falls back to Postgres only and returns (). Tests: 4046 passed (was 4045). Ruff clean. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…cks-isolated-cte-handling-for-cross-model DEV-1503: filtered-local isolation on typed pipeline + outer-WHERE wrapper

coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@slayer/engine/column_filter_paths.py`:
- Around line 57-63: The dialect chain _PLANNER_PARSE_DIALECT_CHAIN is missing
the "clickhouse" dialect which causes compute_column_filter_join_paths (and
Column.filter parsing) to fail to detect ClickHouse-specific predicates; add
"clickhouse" into the tuple (e.g., include "clickhouse" among "postgres", None,
"mysql", "bigquery", "tsql") so SQLGlot will try ClickHouse parsing and preserve
ClickHouse-only join-path semantics. Ensure the ordering keeps permissive None
in place and run unit tests that exerciseClickHouse-specific filters to verify
parsing now returns expected join paths.

In `@slayer/engine/cross_model_planner.py`:
- Around line 518-528: The current code slices host_query.filters and assumes
alignment with host_filters, which breaks when stage_planner.plan_query()
dedupes filters; instead, stop slicing host_query.filters and build
sub_filter_texts from the host_filters list itself using each
HostFilterRouting's text field: iterate over host_filters (the list of
HostFilterRouting), skip routings whose phase is Phase.POST or Phase.AGGREGATE,
and append routing.text (or routing.text if named differently) to
sub_filter_texts only when that text is set; also ensure
stage_planner.plan_query()/stage planning code populates HostFilterRouting.text
for the deduped/surviving user-filter routings so cross_model_planner.consume
the same deduped texts.

In `@tests/integration/test_integration.py`:
- Around line 3539-3548: Test fixture's tie-break is weak because natural status
order could match the expected result even if the secondary ORDER BY is ignored;
update the "rows" fixture (variable rows in the failing test) so the natural
alphabetical order of status would be opposite the intended tie-break (e.g.,
swap the created_at/updated_at values so status "B" would come before "A" if no
secondary key is applied), or add an explicit dry-run assertion in the same test
that verifies two distinct sort keys were produced for the groups (i.e., assert
that the secondary sort key values differ for the tied groups) to make the
secondary-sort failure impossible to miss; adjust the rows data and/or add the
assertion near the rows definition used later in the test to exercise the
secondary ORDER BY.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: c4fd0735-6d19-40f1-b9ce-511f7178e7f3

📥 Commits

Reviewing files that changed from the base of the PR and between 93bc7cd and e381429.

📒 Files selected for processing (21)

CLAUDE.md
docs/architecture/cross-model-aggregates.md
docs/architecture/planning.md
docs/architecture/sql-generation.md
slayer/core/keys.py
slayer/engine/binding.py
slayer/engine/column_expansion.py
slayer/engine/column_filter_paths.py
slayer/engine/cross_model_planner.py
slayer/engine/planned.py
slayer/engine/planning.py
slayer/engine/stage_planner.py
slayer/engine/syntax.py
slayer/sql/generator.py
tests/integration/test_integration.py
tests/test_filtered_local_isolation.py
tests/test_keys.py
tests/test_projection_planner.py
tests/test_projection_trim.py
tests/test_sql_generator.py
tests/test_syntax.py

✅ Files skipped from review due to trivial changes (2)

CLAUDE.md
docs/architecture/sql-generation.md

coderabbitai · 2026-06-03T15:55:30Z

+_PLANNER_PARSE_DIALECT_CHAIN: Tuple[Optional[str], ...] = (
+    "postgres",
+    None,        # sqlglot's permissive default — accepts ANSI SQL broadly
+    "mysql",
+    "bigquery",
+    "tsql",
+)


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🌐 Web query:

Does sqlglot 30.0.0 support a ClickHouse dialect named clickhouse, and does ClickHouse-specific SQL require selecting that dialect instead of relying on postgres, default, mysql, bigquery, or tsql parsing?

💡 Result:

Yes, SQLGlot 30.0.0 includes official, built-in support for a ClickHouse dialect [1][2][3]. To correctly parse or generate ClickHouse-specific SQL, you must specify clickhouse as the dialect (e.g., read="clickhouse" for parsing or dialect="clickhouse" for generation) [4][5]. Relying on other dialects like postgres, mysql, bigquery, or tsql will likely result in parsing errors or incorrect SQL generation because SQLGlot uses these specific dialect configurations to handle ClickHouse's unique features, such as aggregate combinators (e.g., countIf), specific join types, non-nullable types, and distinct identifier escaping rules [6]. If no dialect is specified, SQLGlot defaults to its own internal "SQLGlot dialect," which acts as a superset of supported dialects and may not correctly interpret or output valid syntax for ClickHouse-specific constructs [5]. Using the official clickhouse dialect ensures that the parser and generator are configured with the correct metadata, tokenization rules, and syntax handling required for ClickHouse [7][6].

Citations:

1: https://pypi.org/project/sqlglot/30.0.0/

2: https://pypi.org/project/sqlglot/30.0.3/

3: https://pypi.org/project/sqlglot/30.0.1/

4: ClickHouse: fail to parse "WITH <expression> AS <identifier>" tobymao/sqlglot#3175

5: https://github.com/tobymao/sqlglot/blob/main/README.md

6: https://deepwiki.com/tobymao/sqlglot/4.8-clickhouse-and-mysql-dialects

7: https://sqlglot.com/sqlglot/dialects/clickhouse.html

🏁 Script executed:

#!/bin/bash set -euo pipefail # Inspect the file around the reported lines sed -n '1,200p' slayer/engine/column_filter_paths.py | cat -n # Locate and inspect compute_column_filter_join_paths usage/definition rg -n "compute_column_filter_join_paths" slayer/engine/column_filter_paths.py rg -n "_PLANNER_PARSE_DIALECT_CHAIN" slayer/engine/column_filter_paths.py # Print the function body if present python3 - <<'PY' import re, itertools, sys, pathlib path = pathlib.Path("slayer/engine/column_filter_paths.py") txt = path.read_text() m = re.search(r"def\s+compute_column_filter_join_paths\s*$[^)]*$\s*:", txt) print("found_def:", bool(m)) PY # If function exists, show a narrower excerpt around it line=$(rg -n "def compute_column_filter_join_paths" slayer/engine/column_filter_paths.py | head -n1 | cut -d: -f1 || true) if [ -n "${line:-}" ]; then start=$((line-40)) end=$((line+160)) sed -n "${start},${end}p" slayer/engine/column_filter_paths.py | cat -n fi

Repository: MotleyAI/slayer

Length of output: 14511

Add ClickHouse to _PLANNER_PARSE_DIALECT_CHAIN

slayer/engine/column_filter_paths.py (lines 57-63) omits the clickhouse dialect even though the file explicitly calls out ClickHouse-specific Column.filter syntax; compute_column_filter_join_paths() uses this chain for dialect-aware parsing, so ClickHouse-only predicates that don’t parse under postgres/mysql/bigquery/tsql (or the permissive None default) drop to () and skip DEV-1503 join-path isolation/semantics on a supported ClickHouse backend. SQLGlot’s clickhouse dialect should be included to parse ClickHouse constructs correctly.

Suggested change

_PLANNER_PARSE_DIALECT_CHAIN: Tuple[Optional[str], ...] = ( "postgres", None, # sqlglot's permissive default — accepts ANSI SQL broadly "mysql", + "clickhouse", "bigquery", "tsql", )

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@slayer/engine/column_filter_paths.py` around lines 57 - 63, The dialect chain _PLANNER_PARSE_DIALECT_CHAIN is missing the "clickhouse" dialect which causes compute_column_filter_join_paths (and Column.filter parsing) to fail to detect ClickHouse-specific predicates; add "clickhouse" into the tuple (e.g., include "clickhouse" among "postgres", None, "mysql", "bigquery", "tsql") so SQLGlot will try ClickHouse parsing and preserve ClickHouse-only join-path semantics. Ensure the ordering keeps permissive None in place and run unit tests that exerciseClickHouse-specific filters to verify parsing now returns expected join paths.

coderabbitai · 2026-06-03T15:55:31Z

+    user_query_filters = host_query.filters or []
+    if not user_query_filters:
+        return None
+    user_routings = list(host_filters[-len(user_query_filters):])
+    sub_filter_texts: List[str] = []
+    for routing, filt in zip(user_routings, user_query_filters):
+        if routing.phase in (Phase.POST, Phase.AGGREGATE):
+            continue
+        # ROW phase — propagate.
+        sub_filter_texts.append(filt)
+    return sub_filter_texts or None


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don't recover sub-plan filter text by slicing host_query.filters.

stage_planner.plan_query() dedupes user filters before building host_filters, so len(host_query.filters) is no longer guaranteed to match the routed entries here. With a date_range plus duplicate/alias-equivalent user filters, this slice can mis-pair phases and texts, which lets an AGGREGATE filter be propagated into the DEV-1503 sub-plan as if it were ROW-phase.

Suggested fix

- user_query_filters = host_query.filters or [] - if not user_query_filters: + user_routings = [ + routing for routing in host_filters + if routing.text is not None + ] + if not user_routings: return None - user_routings = list(host_filters[-len(user_query_filters):]) sub_filter_texts: List[str] = [] - for routing, filt in zip(user_routings, user_query_filters): + for routing in user_routings: if routing.phase in (Phase.POST, Phase.AGGREGATE): continue - # ROW phase — propagate. - sub_filter_texts.append(filt) + sub_filter_texts.append(routing.text) return sub_filter_texts or None

And in slayer/engine/stage_planner.py, populate HostFilterRouting.text for the surviving user-filter routings so this function consumes the same deduped list the planner actually kept.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@slayer/engine/cross_model_planner.py` around lines 518 - 528, The current code slices host_query.filters and assumes alignment with host_filters, which breaks when stage_planner.plan_query() dedupes filters; instead, stop slicing host_query.filters and build sub_filter_texts from the host_filters list itself using each HostFilterRouting's text field: iterate over host_filters (the list of HostFilterRouting), skip routings whose phase is Phase.POST or Phase.AGGREGATE, and append routing.text (or routing.text if named differently) to sub_filter_texts only when that text is set; also ensure stage_planner.plan_query()/stage planning code populates HostFilterRouting.text for the deduped/surviving user-filter routings so cross_model_planner.consume the same deduped texts.

coderabbitai · 2026-06-03T15:55:31Z

+    # Per-group last(created_at) is tied at 50 across A and B; the
+    # secondary ORDER BY last(updated_at) ASC must break the tie with
+    # A (10) before B (99).
+    rows = [
+        # Status A: last(created_at)=50, last(updated_at)=10.
+        (1, "A", 50.0, "2025-03-01", "2025-01-01"),
+        (2, "A", 10.0, "2025-02-01", "2025-04-01"),
+        # Status B: last(created_at)=50 (tied), last(updated_at)=99.
+        (3, "B", 50.0, "2025-03-01", "2025-02-01"),
+        (4, "B", 99.0, "2025-01-15", "2025-04-01"),


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Make the secondary sort failure impossible to miss.

This fixture can still pass if the second ORDER BY key is ignored, because tied groups may naturally come back as "A", then "B". That makes the DEV-1501 regression weakly covered here. Flip the data so the expected result contradicts status ordering, or add a dry-run assertion that two distinct sort keys were generated.

Suggested way to make the tie-break observable

- # Per-group last(created_at) is tied at 50 across A and B; the - # secondary ORDER BY last(updated_at) ASC must break the tie with - # A (10) before B (99). + # Per-group last(created_at) is tied at 50 across A and B; the + # secondary ORDER BY last(updated_at) ASC must break the tie with + # B (10) before A (99), so the expected order now contradicts the + # natural status ordering. rows = [ - # Status A: last(created_at)=50, last(updated_at)=10. + # Status A: last(created_at)=50, last(updated_at)=99. (1, "A", 50.0, "2025-03-01", "2025-01-01"), - (2, "A", 10.0, "2025-02-01", "2025-04-01"), - # Status B: last(created_at)=50 (tied), last(updated_at)=99. + (2, "A", 99.0, "2025-02-01", "2025-04-01"), + # Status B: last(created_at)=50 (tied), last(updated_at)=10. (3, "B", 50.0, "2025-03-01", "2025-02-01"), - (4, "B", 99.0, "2025-01-15", "2025-04-01"), + (4, "B", 10.0, "2025-01-15", "2025-04-01"), ] @@ - # Two rows, both count 2; A before B because primary lc is tied (50) - # and secondary lu ASC puts A (lu=10) before B (lu=99). + # Two rows, both count 2; B before A because primary lc is tied (50) + # and secondary lu ASC puts B (lu=10) before A (lu=99). assert len(response.data) == 2, response.data - assert response.data[0]["orders.status"] == "A", response.data - assert response.data[1]["orders.status"] == "B", response.data + assert response.data[0]["orders.status"] == "B", response.data + assert response.data[1]["orders.status"] == "A", response.data

Also applies to: 3590-3596

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/integration/test_integration.py` around lines 3539 - 3548, Test fixture's tie-break is weak because natural status order could match the expected result even if the secondary ORDER BY is ignored; update the "rows" fixture (variable rows in the failing test) so the natural alphabetical order of status would be opposite the intended tie-break (e.g., swap the created_at/updated_at values so status "B" would come before "A" if no secondary key is applied), or add an explicit dry-run assertion in the same test that verifies two distinct sort keys were produced for the groups (i.e., assert that the secondary sort key values differ for the tied groups) to make the secondary-sort failure impossible to miss; adjust the rows data and/or add the assertion near the rows definition used later in the test to exercise the secondary ORDER BY.

In the typed pipeline, an AGGREGATE slot whose source is a derived (ColumnSqlKey) column whose Column.sql contains a __-delimited join-path alias (e.g. customers__regions.population) never had the implied LEFT JOINs added to the host base FROM — the aggregate body emitted SUM(customers__regions.population) referencing an undefined table alias. The dimension / time-dimension cases got this right under DEV-1484; the Column.filter case got it right under DEV-1494; the agg-source case was the symmetric remaining gap. Fix: add _collect_aggregate_source_join_paths next to _collect_column_filter_join_paths. Walks AGGREGATE-phase slots, recursing through ArithmeticKey / ScalarCallKey composite keys. For each AggregateKey with a ColumnSqlKey source and source.path == (), expand the column's sql via _expand_derived_column_sql and scan through _joined_paths_in_sql, appending discovered paths to needed_join_paths. Wire into _build_base_select_for_planned next to the existing filter-side collector — single wire-up covers regular AGGREGATE and local first/last (the ranked-subquery builder inherits from_clause / base_joins from the same call). Cross-model agg sources (source.path \!= ()) are skipped; their _cm_* CTE owns its own discovery (gap tracked as DEV-1526; xfail in place). The render-time agg-body expansion in _build_agg_render_spec_from_planned already produced correct SUM(<expanded>) SQL — only the join-discovery side was missing. Tests: un-xfail the existing tracking test + strengthen its body assertion; new TestMeasureSourceSqlJoinInference class covers single-hop / multi-hop, ArithmeticKey + ScalarCallKey composites, Mode-A function wrappers around the path alias, sibling derived chains, local last with derived source (ranked-subquery shape asserted), cumsum-wrapped path-aliased measure (POST-phase aux materialization), multiple agg sources sharing one join (dedupe), dim+measure sharing one join (dedupe), filter+source crossing different joins (DEV-1494/1502 coexistence), no-path-no-extra-join sanity, defensive cross-model skip with positive _cm_ assertion. Two strict xfails pin the follow-up gaps (DEV-1526 cross-model CTE source-sql gap, DEV-1527 parametric agg derived-kwarg gap). One SQLite integration test seeds orders → customers → regions and verifies region_pop:sum returns 350.0 end-to-end. Docs: paragraph in sql-generation.md listing the three symmetric host-base discovery sources (DEV-1484 dim, DEV-1494 filter, DEV-1502 agg-source). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

SQLite returns REAL aggregates as floats; strict == 350.0 trips python:S1244 and is the right call to relax (the gate is currently red on new_reliability_rating=3 because of this one finding). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

CodeRabbit caught that `"regions" not in _join_aliases(sql)` does an exact-match on the alias set — but the realistic leak shape from a misbehaving host-side collector walking through customers would be the path alias `customers__regions`, which the bare-name check silently misses. Assert both alias shapes so the defensive test actually pins the regression it was meant to guard. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

The new host-base discovery section claimed cross-model aggregates own their own join discovery in the _cm_* CTE without qualification. True for the Column.filter side (DEV-1494 / DEV-1503), but the symmetric source-Column.sql case is the open DEV-1526 gap. Codex flagged the unqualified statement as misleading future readers. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Codex MAJOR caught that the test asserts dry-run SQL substrings but the emitted SQL fails at RUNTIME — the inner ranked subquery has the LEFT JOINs (DEV-1502 pulled them in), but the outer SELECT's MAX(CASE WHEN _last_rn = 1 THEN <expr> END) still references the cross-table- qualified expression directly, which is out of scope outside the subquery (only the source_relation alias is). Confirmed via SQLite execution: `no such column: customers__regions. payment_amount`. Pre-existing bug for ANY derived first/last source crossing a join (dotted form too, DEV-1410 territory); DEV-1502 just unmasked it for the __ alias case by getting the joins emitted at all. Pre-DEV-1502 this failed at "no JOINs"; post-DEV-1502 it fails at "outer scope missing". Convert the test to a strict xfail tracking DEV-1531; the assertion now pins the END STATE (the outer SELECT must not reference the cross-table ref directly) rather than just substring presence. Auto-promotes when DEV-1531's materialisation fix lands. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…-path-alias-in-a-measures-columnsql-doesnt DEV-1502: pull joins crossed by an aggregate source's Column.sql

sonarqubecloud · 2026-06-05T06:45:30Z

Quality Gate passed

Issues
2 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.6% Duplication on New Code

See analysis details on SonarQube Cloud

ZmeiGorynych and others added 10 commits May 28, 2026 10:52

Merge remote-tracking branch 'origin/egor/dev-1450-principled-redesig…

0065024

…n-of-syntax' into egor/dev-1484-dev-1452-stage-c-migrate-pre-existing-tests-off-legacy

coderabbitai Bot reviewed May 28, 2026

View reviewed changes

ZmeiGorynych and others added 4 commits May 29, 2026 12:50

Merge pull request #155 from MotleyAI/egor/dev-1494-typed-pipeline-cr…

74ed95a

…oss-model-derived-column-refs-in-filters DEV-1494: inline cross-model derived refs in Column.filter + discover their joins

coderabbitai Bot reviewed May 30, 2026

View reviewed changes

ZmeiGorynych and others added 12 commits May 30, 2026 14:17

Merge pull request #156 from MotleyAI/egor/dev-1500-slack-func_style_…

93bc7cd

…agg-normalization-doesnt-recognize-custom DEV-1500: recognize joined-model custom aggregations in FUNC_STYLE_AGG slack rule

coderabbitai Bot reviewed May 31, 2026

View reviewed changes

ZmeiGorynych and others added 20 commits June 1, 2026 10:30

Merge pull request #158 from MotleyAI/egor/dev-1492-parse_filter_expr…

b1b8b09

…-mangles-transform-kwargs-in-filters-ntilex DEV-1492: preserve transform kwargs in filter normalization

Merge pull request #159 from MotleyAI/egor/dev-1501-parametric-lastfi…

ffd85bc

…rst-with-different-explicit-time-columns DEV-1501: parametric first/last in ORDER BY / HAVING — materialise + outer trim

Merge pull request #163 from MotleyAI/egor/dev-1503-typed-pipeline-la…

e381429

…cks-isolated-cte-handling-for-cross-model DEV-1503: filtered-local isolation on typed pipeline + outer-WHERE wrapper

coderabbitai Bot reviewed Jun 3, 2026

View reviewed changes

ZmeiGorynych and others added 6 commits June 3, 2026 19:58

Merge pull request #165 from MotleyAI/egor/dev-1502-typed-pipeline-__…

1c69c78

…-path-alias-in-a-measures-columnsql-doesnt DEV-1502: pull joins crossed by an aggregate source's Column.sql

	what the legacy legacy enriched-query path produced: result keys read straight from
	what the legacy enriched-query path produced: result keys read straight from

Conversation

ZmeiGorynych commented May 28, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Deferred typed-pipeline divergences (strict-xfail + tracking issue)

Test plan

Summary by CodeRabbit

Uh oh!

linear Bot commented May 28, 2026

Uh oh!

coderabbitai Bot commented May 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot May 28, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 28, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 28, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 31, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 3, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 3, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 3, 2026

Choose a reason for hiding this comment

Uh oh!

sonarqubecloud Bot commented Jun 5, 2026

Quality Gate passed

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

ZmeiGorynych commented May 28, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 28, 2026 •

edited

Loading