Skip to content

DEV-1484 Stage C: migrate pre-existing tests off legacy enrichment#153

Open
ZmeiGorynych wants to merge 52 commits into
egor/dev-1450-principled-redesign-of-syntaxfrom
egor/dev-1484-dev-1452-stage-c-migrate-pre-existing-tests-off-legacy
Open

DEV-1484 Stage C: migrate pre-existing tests off legacy enrichment#153
ZmeiGorynych wants to merge 52 commits into
egor/dev-1450-principled-redesign-of-syntaxfrom
egor/dev-1484-dev-1452-stage-c-migrate-pre-existing-tests-off-legacy

Conversation

@ZmeiGorynych
Copy link
Copy Markdown
Member

@ZmeiGorynych ZmeiGorynych commented May 28, 2026

Summary

Stage C of DEV-1452 (delete the legacy enrichment pipeline): migrate every
pre-existing test file off slayer.engine.enrichment / slayer.engine.enriched
/ MixedArithmeticField so Stage D (DEV-1485) can delete the legacy modules.
Test-only changes — no production behaviour changed in Stage C.

The acceptance grep over tests/ is clean except test_agg_render_spec_shim.py
(the shim that dies in Stage D). This branch also merges in the latest
egor/dev-1450-principled-redesign-of-syntax (PR #144 / Stage A+B + Stage B
rounds 6–8); the merge was conflict-free.

  • Migrated test_sql_generator.py, test_reference_semantics.py,
    test_formula.py, test_projection_trim.py, test_cross_model_derived_columns.py,
    test_query_backed_models.py, test_named_measures.py, test_help.py,
    test_aggregation_gating.py, test_filter_renamed_measure.py,
    test_nested_dag_cross_stage_refs.py off the legacy stack — engine._enrich
    / enrich_query / raw EnrichedQuery sites now run through
    engine.execute(dry_run=True) or AggRenderSpec + the dialect helpers, with
    SQL-based assertions.
  • Deleted legacy-only test classes (parse_formula / named-measure expansion /
    extract-filter / placeholder / TestAutoMoveDimensions / stage-origin
    resolver) after auditing coverage; backfilled the unique cases additively
    into test_value_registry.py, test_syntax.py, test_measure_expansion.py,
    test_agg_render_spec.py, and test_slack_normalization.py.

Deferred typed-pipeline divergences (strict-xfail + tracking issue)

Genuine behaviour gaps the migration surfaced — each pinned by a
@pytest.mark.xfail(strict=True) test so it auto-promotes when fixed:

  • DEV-1502 — measure-source Column.sql __-path alias doesn't trigger join inference.
  • DEV-1494 — column-level Column.filter cross-table ref doesn't trigger join discovery (derived + direct-dotted-ref flavours).
  • DEV-1503 — typed pipeline lacks isolated-CTE handling for cross-model-FILTERED local measures (legacy _fm_ feature). Stage-D blocker — 13 tracking tests.

(Earlier Stage A/B sessions raised DEV-1492 / 1495 / 1496 / 1497 / 1499 / 1500 / 1501 the same way.)

Test plan

  • Acceptance grep clean over tests/ (only test_agg_render_spec_shim.py remains)
  • poetry run pytest -m "not integration" — 3902 passed, 6 skipped, 42 xfailed, 0 failed
  • poetry run ruff check slayer/ tests/ — clean
  • Integration smoke (SQLite + DuckDB) — 133 passed

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Documentation

    • Expanded DSL and SQL-generation docs: lowercase like allowed; like(value, pattern) → SQL LIKE (NOT LIKE via not(...)); || → concat(...); detailed notes on CTEs, predicate join-paths, and filtered-local aggregate isolation.
  • New Features

    • DSL recognizes 2-arg like(...) and concat via ||; SQL generator emits explicit LIKE/concat; planner supports filtered-local aggregate isolation to avoid join-intersection issues.
  • Bug Fixes

    • change_pct divide-by-zero guarded; validation errors now map to client validation errors; improved join-path/derived-column handling to prevent incorrect SQL.
  • Tests

    • Large suite updates and many new regression/integration tests covering DSL, SQL emission, planner isolation, join discovery, and normalization.

ZmeiGorynych and others added 10 commits May 28, 2026 10:52
The typed-pipeline migration (Stage A/B) introduced dedicated SlayerError
subclasses for binding-time failures that the legacy enrichment pipeline
raised as bare ValueError. The REST /query endpoint maps ValueError -> HTTP
400, so these would have surfaced as 500s instead of user-correctable 400s.

Add ValueError as a second base to the 7 affected classes
(AggregationNotAllowedError, UnknownFunctionError, MeasureRecursionLimitError,
MeasureCycleError, DuplicateMeasureNameError, MeasureNameCollidesWithColumnError,
CanonicalAliasShadowsColumnError) so they match the 6 sibling resolution-time
errors that already extend (SlayerError, ValueError). New
tests/test_errors_hierarchy.py pins the contract.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…richment

Migrate test_filter_renamed_measure, test_named_measures, test_help, and
test_aggregation_gating off slayer.engine.enrichment / SQLGenerator.generate
onto the typed pipeline via engine.execute(dry_run=True).

New tests/_engine_helpers.py provides the shared Bucket-1 helpers:
_engine_generate (ephemeral YAMLStorage + engine + dry_run), _assert_valid_sql
(lifted from the legacy _validating_generate wrapper), and _where_text /
_having_text (sqlglot clause extraction) for filter-classification assertions
that previously read enriched.filters[*].is_having.

test_help switches parse_formula -> parse_expr and extract_filter_transforms
chain -> parse_filter_expr. Deletes the 2 skipped DEV-1445/1446 guard tests
in TestDeferredScopeGuards (coverage lives in test_dev1445_* / test_dev1446_*).
Error-message regexes updated to the typed-pipeline error text.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Migrate tests/test_reference_semantics.py off enrich_query onto
engine.execute(dry_run=True) + sqlglot WHERE/HAVING clause extraction for
filter-classification assertions. Delete the two drop_unreachable_filters
tests (legacy enrich_query internal flag; coverage in test_cross_model_planner).

tests/test_formula.py: delete the 7 classes that tested the removed
parse_formula parser + legacy AST node types; keep the 5 classes covering
the retained parse_filter / _rewrite_funcstyle_aggregations helpers.

Backfills for the genuine gaps the deletions left:
- tests/test_measure_expansion.py (new): negative eligibility cases for
  expand_model_measures (AggCall.source / DottedRef not expanded; reserved
  transform name rejected at construction).
- tests/test_value_registry.py: cross-model vs local AggregateKey, ntile
  n-distinct, time_shift kwargs order-independence, repeated-intern stability
  (backfilled from the deleted TestCanonicalExpressionKey).
- tests/test_syntax.py: first/last transform parse shape; plus a strict
  xfail pinning DEV-1492 (parse_filter_expr mangles transform kwargs in
  filters) so it auto-promotes when fixed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…e _query_as_model

test_nested_dag_cross_stage_refs.py: delete the direct-unit tests of the
legacy stage-origin resolver (TestChainedDAGCandidates, TestCandidatePrecedence,
TestNonVirtualModel's resolver-None test, and the intercept-dim unit test).
They pinned the OLD ancestor-strip resolution of the *dotted* downstream form;
the typed pipeline intentionally uses a flat-only downstream contract
(dotted refs rejected), pinned by test_binding.py::TestStageSchemaScope and
the file's remaining flat end-to-end engine.execute tests. Drop the resolver
import (keep SourceModelOrigin, still used by serialization-roundtrip tests).

test_query_backed_models.py: migrate the 3 direct _query_as_model unit tests
to a _expand_stage_as_model helper over engine._expand_query_backed_model
(the Stage B replacement); assertions on virtual columns + wrapped alias SQL
carry over. Kept test_join_target_resolving_set_is_per_context — its ContextVar
is still live in production until Stage D.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…s + fix derived-dim gap

Migrate tests/test_cross_model_derived_columns.py off engine._enrich /
SQLGenerator.generate onto engine.execute(dry_run=True); rewrite the
resolved_joins assertions to a sqlglot-based _join_aliases(sql) that walks
emitted JOINs. _gen_sql upserts the source model + datasource so inline-built
models resolve from storage.

Stage B fix-up (slayer/sql/generator.py): the typed pipeline could not select
a cross-model DERIVED column (Column.sql set) as a dimension — the
ColumnSqlKey branch fell back to a source-model-only lookup and raised
"Column not found". Make the branch path-aware (mirror the time-column arm):
expand the derived sql rooted at the owning joined model's __-path alias and
pull its joins into the FROM. _expand_derived_column_sql gains an is_root flag
so a further-joined ref inside a cross-model derived column prefixes to the
full path (B reaching C -> B__C). The migrated dimension tests are the
coverage.

The column-level Column.filter derived-ref join-discovery gap (DEV-1494) and
the cosmetic cycle-chain entry-point change are handled as a strict xfail and
an updated assertion respectively.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… enrichment

Migrate tests/test_projection_trim.py off enrich_query / SQLGenerator.generate
onto engine.execute(dry_run=True). The legacy render_mode generator parameter
is gone (typed pipeline emits the outer trim directly), so the two tests that
exercised it directly are removed; the staged-trim behaviour is covered
end-to-end via engine.execute / query-backed wrap.

Rewrite the provenance/user_declared tests as outer-SELECT membership
assertions (the typed-pipeline equivalent of "user-declared = in the public
projection"); delete the legacy public_projection_aliases / user_projection /
canonical_expression_key tests (the latter's coverage was backfilled into
test_value_registry in commit 2). Rewrite the same-canonical-measures tests
from "rejected" to the typed multi-alias output (P4/C13: one slot, both
aliases project). The get_column_types-doesn't-call-legacy-generate contract
pin stays.

Two pre-existing typed-pipeline cross-model projection bugs surfaced and are
strict-xfailed against DEV-1495: a cross-model dimension key flattening to
``__`` instead of dotted, and an order-by-only cross-model aggregate leaking
into the outer projection with a malformed alias.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…t comments

Reword docstring / comment / assert-message references to the dying legacy
symbols (enrichment AST node types, the free-function formula parser, the
query-backed wrap renderer, the public-projection helper) across the test
suite so the acceptance grep is byte-clean except for test_sql_generator.py
(Stage C commit 5) and test_agg_render_spec_shim.py (deleted in Stage D).
Renames two integration test functions off the legacy token. Comment-only;
no behavioural change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…partial) + Mode-B ||/LIKE + change_pct guard

C.6 partial: migrate tests/test_sql_generator.py off slayer.engine.enrichment —
local _generate now delegates to engine.execute(dry_run=True); Bucket-2 sites
use AggRenderSpec directly; ~75% of the engine._enrich/generator.generate blocks
rewritten to engine.execute (datasource + extra_models registered as needed).
23 strict-xfails track genuine typed-pipeline divergences surfaced by the
migration (DEV-1496 windowed measures [Stage-D blocker], DEV-1497 partition_by
grain, DEV-1499 cross-model agg in composite, DEV-1500 funcstyle custom agg on
joined models, DEV-1501 parametric last time-col collapse, plus existing
DEV-1445 / DEV-1494).

Production fixes surfaced during migration (full non-integration suite green,
3918 passed / 27 xfailed):
- Mode-B query filters now accept the SQL || concat operator (desugared to
  concat via a |/BitOr rewrite) and a like(value, pattern) scalar emitting the
  SQL LIKE operator — closing the gap docs/concepts/references.md already
  promised but the typed pipeline rejected.
- change_pct desugar guards the divisor with NULLIF(prev, 0).
- Nested self-join transforms now raise ValueError (HTTP 400) instead of
  NotImplementedError (HTTP 500).

Remaining for next session: Phase-2 tail (TestIsolatedFilteredMeasureCTEs and
the other enrich_query-helper / standalone-_enrich classes), Phase 4 (raw
EnrichedQuery constructions), Phase 5 (delete TestAutoMoveDimensions + backfill
into test_slack_normalization), then the acceptance grep + ruff + integration
smoke gates and dropping the top-of-file legacy imports.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…f legacy

Finishes the C.6 Phase-2 tail so no test file imports the legacy enrichment
stack (acceptance grep clean except the Stage-D-only agg_render_spec shim):

- Migrate the remaining engine._enrich / enrich_query / raw EnrichedQuery
  sites to engine.execute(dry_run=True) and AggRenderSpec + dialect helpers;
  add a SQL-based _join_aliases helper; drop the legacy imports and the dead
  _validating_generate wrapper.
- Delete TestAutoMoveDimensions (legacy _auto_move_fields_to_dimensions);
  backfill its 7 cases into test_slack_normalization.py (TestMisplacedMeasure
  + an end-to-end TestEngineWiring case).
- Relocate the two source-alias filter tests to a unit test in
  test_agg_render_spec.py.

Three genuine typed-pipeline divergences surfaced and tracked with strict
xfails (never silently weakened):
- DEV-1502: measure-source Column.sql __-path alias doesn't trigger joins.
- DEV-1494: column-level Column.filter direct dotted cross-table ref doesn't
  trigger join discovery (2nd tracking test added).
- DEV-1503 (Stage-D blocker): typed pipeline lacks isolated-CTE handling for
  cross-model-FILTERED local measures (legacy _fm_ feature); 13 tests xfailed.
  The cross-model-MEASURE _cm_ tests migrated to engine.execute and pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…n-of-syntax' into egor/dev-1484-dev-1452-stage-c-migrate-pre-existing-tests-off-legacy
@linear
Copy link
Copy Markdown

linear Bot commented May 28, 2026

DEV-1484

DEV-1452

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 28, 2026

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 6f19329c-9a42-43b5-9318-2c720bf41317

📥 Commits

Reviewing files that changed from the base of the PR and between e381429 and 1c69c78.

📒 Files selected for processing (4)
  • docs/architecture/sql-generation.md
  • slayer/sql/generator.py
  • tests/integration/test_integration.py
  • tests/test_sql_generator.py

📝 Walkthrough

Walkthrough

Adds Mode‑B like and ||concat parsing and binder arity checks, records column-filter join-paths on SqlExprKey, implements filtered-local isolation (DEV‑1503) in cross-model planning, extends derived-ref expansion and join-threading in SQL generation, adjusts Aggregate canonical naming and change_pct lowering, updates several error classes to inherit ValueError, and migrates many tests to engine-driven SQL helpers.

Changes

DSL scalar functions and operator support

Layer / File(s) Summary
**Documentation for like and `
Parser and binder support
slayer/engine/syntax.py, slayer/engine/binding.py, slayer/core/keys.py
Mode‑B preprocessing rewrites `
SQL generation for like
slayer/sql/generator.py
Typed SQL rendering now emits exp.Like(...) for ScalarCallKey(name="like") across aggregate composite, post-filter WHERE/HAVING, and value-key rendering paths.

Error hierarchy and planning guard

Layer / File(s) Summary
Binding-time errors inherit ValueError
slayer/core/errors.py, tests/test_errors_hierarchy.py
Seven validation error classes now multi-inherit from ValueError alongside SlayerError; unit tests assert the inheritance contract.
Guard change_pct divisor against zero
slayer/engine/planning.py, tests/test_transform_lowerer.py
desugar_change_pct now divides by nullif(shifted, 0) (ScalarCallKey nullif) to avoid divide-by-zero; tests assert nullif shape and reuse of the same time_shift instance.

Derived-column SQL qualification and cross-model path support

Layer / File(s) Summary
Derived-column expansion & predicate inlining
slayer/sql/generator.py, slayer/engine/column_expansion.py
Adds detection of derived refs in predicates (_predicate_references_derived), _render_mode_a_predicate inline-expansion via expand_derived_refs_sync, _expand_degenerate_derived_root, and collect_root_scope_joined_paths to discover join-path prefixes.
Column-filter join-path discovery (planner)
slayer/engine/column_filter_paths.py, slayer/engine/binding.py
Planner-side compute_column_filter_join_paths parses/expands filter SQL across dialect fallbacks, walks root-scope <alias>.<col> refs, and returns ordered join-path prefixes; binder populates SqlExprKey.referenced_join_paths using it.
Cross-model CTE join-threading and filtered-local isolation
slayer/sql/generator.py, slayer/engine/cross_model_planner.py, slayer/engine/planned.py, slayer/engine/stage_planner.py
Generator threads filter-driven join paths into CTE FROMs and ranked subqueries; planner implements DEV‑1503 filtered-local isolation by rerooting host-rooted sub-plans and returning CrossModelAggregatePlan.cte_root_model to signal host-rooted CTE rendering; plan_query gains disable_dev1503_isolation to suppress recursion.
Aggregate canonicalization
slayer/engine/planning.py
Hidden-slot canonical names for AggregateKey now include canonicalized args/kwargs (via canonical_agg_name and agg_kwarg canonical strings).

Normalization, reachable custom aggs, and resolver

Layer / File(s) Summary
Model normalization with custom_agg_names
slayer/engine/normalization.py, slayer/engine/query_engine.py, slayer/engine/source_bundle.py, slayer/engine/schema_drift.py, slayer/memories/resolver.py
normalize_model accepts custom_agg_names; ResolvedSourceBundle.reachable_aggregation_names and query_engine/save_model use join-graph BFS to compute reachable aggregation names for FUNC_STYLE_AGG recognition; schema-drift and resolver paths updated accordingly.

SQL generator structural and rendering changes

Layer / File(s) Summary
Derived-aware expansion and filter join merging
slayer/sql/generator.py
Pre-expands derived ROW/TIME dimensions with is_root semantics, merges join paths from Mode‑A inline-expanded predicates and typed ValueKey trees, builds _expand_column_filter_sql and _render_mode_a_predicate helpers, and replaces some generic func emission with specific exp.Like rendering.
Nested-transform rejection message and handling
slayer/sql/generator.py
Nested-transform rejections now raise ValueError with guidance text to precompute inner transforms in multi-stage models.

Tests, helpers, and migrations

Layer / File(s) Summary
Shared engine-driven SQL helpers
tests/_engine_helpers.py
Adds _assert_valid_sql, _engine_generate (ephemeral YAMLStorage + engine.execute dry_run), _outer_select, _where_text, _having_text for SQL-shape assertions.
Test migrations to engine dry-run SQL
many tests/* (e.g., tests/test_cross_model_derived_columns.py, tests/test_filter_renamed_measure.py, tests/test_projection_trim.py, tests/test_reference_semantics.py, tests/test_projection_planner.py)
Multiple tests switched from enrichment/SQLGenerator internals to engine dry-run SQL assertions; added join-alias extraction, dialect-threading, and new regression tests for DEV‑1494/DEV‑1501/DEV‑1503.
New/refocused tests
tests/test_filtered_local_isolation.py, tests/test_measure_expansion.py, tests/test_slack_normalization.py, tests/test_value_registry.py, tests/test_agg_render_spec.py, tests/test_entity_resolution.py, tests/test_source_bundle.py, tests/test_validate_models.py, tests/test_errors_hierarchy.py
Adds planner-focused DEV‑1503 coverage, typed measure-expansion negatives, slack-normalization coverage (DEV‑1484/DEV‑1500), TransformKey interning tests, agg-render-spec qualification test, entity extraction for joined custom aggs, reachable-aggregation-name unit tests, cascade-rule regression, and error-hierarchy assertions.

Sequence Diagram

sequenceDiagram
  participant Test as Test
  participant Engine as SlayerQueryEngine
  participant Binder as Binding
  participant Planner as StagePlanner/CrossModelPlanner
  participant Generator as SQLGenerator
  Test->>Engine: engine.execute(query, dry_run=True)
  Engine->>Binder: bind query, compute SqlExprKey (canonical_sql + referenced_join_paths)
  Binder->>Planner: plan_query(..., disable_dev1503_isolation=False)
  Planner->>Planner: detect filtered-local trigger -> _plan_filtered_local (DEV-1503)
  Planner->>Generator: build SQL (expand derived refs, collect filter join-paths)
  Generator->>Engine: return response.sql
  Engine->>Test: response.sql (validated)
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • MotleyAI/slayer#158: Overlaps Mode‑B = rewrite/tokenizer logic and related syntax changes.
  • MotleyAI/slayer#163: Overlaps DEV‑1503 filtered-local isolation and related planner/binder generator plumbing.
  • MotleyAI/slayer#159: Related changes to aggregate canonical naming for parametric aggregates.

Suggested reviewers

  • AivanF

Poem

"I hop where pipes become concat and LIKE,
I guard the shifted time from divide-by-strife.
I stitch join-paths into CTEs, neat and small,
Tests now ask the engine, and SQL shows all.
A carrot for the reviewer, and a happy hop for all."

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch egor/dev-1484-dev-1452-stage-c-migrate-pre-existing-tests-off-legacy

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
slayer/sql/generator.py (1)

3554-3600: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Propagate is_root=False to the other joined-ColumnSqlKey call sites.

This fixes base-row derived dimensions, but joined derived time dimensions and joined derived filter refs still call _expand_derived_column_sql() with the default True (_raw_time_col_expr_for_planned, _collect_filter_join_paths, _render_value_key_for_filter). A multi-hop derived ref on a joined model will still qualify to the bare child alias instead of the full __ path, so those queries keep generating invalid SQL outside the base-select path.

🛠️ Suggested follow-up
-                expanded_sql = self._expand_derived_column_sql(
-                    source_model=joined_model,
-                    source_relation="__".join(time_column.path),
-                    column_name=time_column.column_name,
-                    bundle=bundle,
-                )
+                expanded_sql = self._expand_derived_column_sql(
+                    source_model=joined_model,
+                    source_relation="__".join(time_column.path),
+                    column_name=time_column.column_name,
+                    bundle=bundle,
+                    is_root=False,
+                )

Apply the same change to the joined-model branches in _collect_filter_join_paths() and _render_value_key_for_filter().

As per coding guidelines, "Joined tables use __-delimited path aliases in SQL to disambiguate diamond joins."

Also applies to: 7315-7350

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@slayer/sql/generator.py` around lines 3554 - 3600, The joined-model call
sites that build derived-column SQL must pass the correct is_root flag into
_expand_derived_column_sql so multi-hop derived refs on joined models get full
__-delimited path aliases; update the joined-ColumnSqlKey branches in
_raw_time_col_expr_for_planned, _collect_filter_join_paths, and
_render_value_key_for_filter (the same pattern as the shown generator.py block)
to pass is_root=False (or is_root=not key.path where key is a ColumnSqlKey) when
key.path is non-empty so the expansion roots at the joined owner rather than the
base model.
🧹 Nitpick comments (4)
tests/test_transform_lowerer.py (1)

97-97: ⚡ Quick win

Pin canonical zero type in the denominator assertion.

At Line 97, assert zero == 0 also passes for non-canonical values and won’t catch normalization regressions. Assert Decimal(0) (or normalize_scalar(0)) explicitly.

Proposed test hardening
+from decimal import Decimal
+
 from slayer.core.keys import (
     AggregateKey,
     ArithmeticKey,
@@
-        assert zero == 0
+        assert zero == Decimal(0)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_transform_lowerer.py` at line 97, The assertion currently checks
`zero == 0`, which can pass for non-canonical representations; update the test
to assert a canonical zero type by comparing `zero` to `Decimal(0)` (or to the
result of `normalize_scalar(0)` if that helper exists) so the test fails on
normalization regressions—replace the `assert zero == 0` with an assertion that
`zero` equals `Decimal(0)` (or `normalize_scalar(0)`), referencing the `zero`
variable and `normalize_scalar` helper as needed.
tests/test_errors_hierarchy.py (1)

51-51: ⚡ Quick win

Use keyword arguments in both pytest.mark.parametrize decorators.

Switch to argnames= and argvalues= to follow the repo-wide Python call style rule.

Proposed change
-@pytest.mark.parametrize("cls", _VALUE_ERROR_BINDING_ERRORS)
+@pytest.mark.parametrize(argnames="cls", argvalues=_VALUE_ERROR_BINDING_ERRORS)
 def test_binding_error_is_value_error(cls: type) -> None:
@@
-@pytest.mark.parametrize("cls", _VALUE_ERROR_BINDING_ERRORS)
+@pytest.mark.parametrize(argnames="cls", argvalues=_VALUE_ERROR_BINDING_ERRORS)
 def test_binding_error_is_slayer_error(cls: type) -> None:
As per coding guidelines: `**/*.py`: Use keyword arguments for functions with more than 1 parameter.

Also applies to: 59-59

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_errors_hierarchy.py` at line 51, Update the two
pytest.parametrized decorators to use keyword args: replace the positional call
pytest.mark.parametrize("cls", _VALUE_ERROR_BINDING_ERRORS) (and the other
instance at the same file) with pytest.mark.parametrize(argnames="cls",
argvalues=_VALUE_ERROR_BINDING_ERRORS) so the test decorators follow the repo
style for functions with multiple parameters; locate the decorators in
tests/test_errors_hierarchy.py by the pytest.mark.parametrize usage and change
to argnames= and argvalues= accordingly.
tests/test_aggregation_gating.py (1)

52-73: ⚡ Quick win

Use shared _engine_generate instead of duplicating engine setup.

_generate_sql duplicates the same storage/bootstrap/validate flow that now lives in tests/_engine_helpers.py. Switching to the shared helper will keep test harness behavior consistent and reduce future divergence.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_aggregation_gating.py` around lines 52 - 73, The _generate_sql
function duplicates test engine setup/teardown; replace its body with a call to
the shared helper _engine_generate to reuse the centralized
storage/bootstrap/validation logic: import _engine_generate at the top of the
test module (if not already), change _generate_sql to delegate to await
_engine_generate(orders=orders, customers=customers, measures=measures) and
return the SQL string it yields, and remove the duplicated
YAMLStorage/SlugQueryEngine/bootstrap/assert logic (keep any local assertions
only if they’re uniquely required for this test).
tests/_engine_helpers.py (1)

57-64: 🏗️ Heavy lift

Make _engine_generate keyword-only for query and model.

This helper currently encourages positional calls with multiple arguments, which is drifting from the repo’s calling convention. Please make query and model keyword-only and update call sites accordingly.

Suggested change
 async def _engine_generate(
-    query: SlayerQuery,
-    model: SlayerModel,
-    *,
+    *,
+    query: SlayerQuery,
+    model: SlayerModel,
     dialect: str = "postgres",
     extra_models: Optional[list] = None,
     validate: bool = True,
 ) -> str:

As per coding guidelines, **/*.py: “Use keyword arguments for functions with more than 1 parameter”.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/_engine_helpers.py` around lines 57 - 64, Change the signature of
_engine_generate to make query and model keyword-only (e.g., async def
_engine_generate(*, query: SlayerQuery, model: SlayerModel, dialect: str =
"postgres", extra_models: Optional[list] = None, validate: bool = True) ) and
then update every call site to pass those two parameters by name (use query=...
and model=... instead of positional args); search for usages of _engine_generate
in tests/_engine_helpers.py and across the test suite to update invocations
accordingly so the function enforces keyword-only for query and model.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@slayer/sql/generator.py`:
- Around line 4436-4437: Routed filter expressions like like(...) are being
turned into string literals because _collect_routed_filters calls
_render_filter_value_key_in_target_scope which lacks a branch for ScalarCallKey;
add handling for ScalarCallKey in _render_filter_value_key_in_target_scope (and
the analogous places referenced) to detect a ScalarCallKey whose name == "like"
and return an exp.Like AST (or otherwise defer to the existing renderer that
emits exp.Like) instead of falling through to _literal_key_to_exp; update any
identical routines used for cross-model CTE routing so ScalarCallKey predicates
stay predicates rather than literals (refer to exp.Like,
_collect_routed_filters, _render_filter_value_key_in_target_scope,
ScalarCallKey, and _literal_key_to_exp).

In `@tests/test_agg_render_spec.py`:
- Around line 403-408: The call to _slot currently passes key positionally;
update it to use a keyword argument (e.g., key=key) so all parameters are passed
by name; modify the instantiation assigned to variable slot that calls
_slot(declared_name=..., public_name=..., slot_type=...) to include key=key to
follow the keyword-arg style for multi-parameter functions.

In `@tests/test_query_backed_models.py`:
- Around line 52-58: The helper function _expand_stage_as_model currently
accepts multiple positional parameters (engine, stage, *, name: str,
data_source: str) but is being called positionally; change its signature to
enforce keyword-only usage (e.g., make engine and stage positional but require
name and data_source as keyword-only or make all parameters keyword-only) and
update all call sites (including the calls referenced around the other
occurrences) to pass arguments by keyword (e.g., name=<...>, data_source=<...>)
so the function is invoked using keyword-based calls; ensure callers of
_expand_stage_as_model use the exact parameter names (name, data_source) in
their calls.

In `@tests/test_response_meta.py`:
- Line 4: The docstring in tests/test_response_meta.py contains a duplicated
word "legacy legacy" in the phrase "legacy legacy enriched-query path"; update
the docstring to remove the duplicate so it reads "legacy enriched-query path"
(search for the string "legacy legacy enriched-query path" and replace it with
"legacy enriched-query path" in that file).

---

Outside diff comments:
In `@slayer/sql/generator.py`:
- Around line 3554-3600: The joined-model call sites that build derived-column
SQL must pass the correct is_root flag into _expand_derived_column_sql so
multi-hop derived refs on joined models get full __-delimited path aliases;
update the joined-ColumnSqlKey branches in _raw_time_col_expr_for_planned,
_collect_filter_join_paths, and _render_value_key_for_filter (the same pattern
as the shown generator.py block) to pass is_root=False (or is_root=not key.path
where key is a ColumnSqlKey) when key.path is non-empty so the expansion roots
at the joined owner rather than the base model.

---

Nitpick comments:
In `@tests/_engine_helpers.py`:
- Around line 57-64: Change the signature of _engine_generate to make query and
model keyword-only (e.g., async def _engine_generate(*, query: SlayerQuery,
model: SlayerModel, dialect: str = "postgres", extra_models: Optional[list] =
None, validate: bool = True) ) and then update every call site to pass those two
parameters by name (use query=... and model=... instead of positional args);
search for usages of _engine_generate in tests/_engine_helpers.py and across the
test suite to update invocations accordingly so the function enforces
keyword-only for query and model.

In `@tests/test_aggregation_gating.py`:
- Around line 52-73: The _generate_sql function duplicates test engine
setup/teardown; replace its body with a call to the shared helper
_engine_generate to reuse the centralized storage/bootstrap/validation logic:
import _engine_generate at the top of the test module (if not already), change
_generate_sql to delegate to await _engine_generate(orders=orders,
customers=customers, measures=measures) and return the SQL string it yields, and
remove the duplicated YAMLStorage/SlugQueryEngine/bootstrap/assert logic (keep
any local assertions only if they’re uniquely required for this test).

In `@tests/test_errors_hierarchy.py`:
- Line 51: Update the two pytest.parametrized decorators to use keyword args:
replace the positional call pytest.mark.parametrize("cls",
_VALUE_ERROR_BINDING_ERRORS) (and the other instance at the same file) with
pytest.mark.parametrize(argnames="cls", argvalues=_VALUE_ERROR_BINDING_ERRORS)
so the test decorators follow the repo style for functions with multiple
parameters; locate the decorators in tests/test_errors_hierarchy.py by the
pytest.mark.parametrize usage and change to argnames= and argvalues=
accordingly.

In `@tests/test_transform_lowerer.py`:
- Line 97: The assertion currently checks `zero == 0`, which can pass for
non-canonical representations; update the test to assert a canonical zero type
by comparing `zero` to `Decimal(0)` (or to the result of `normalize_scalar(0)`
if that helper exists) so the test fails on normalization regressions—replace
the `assert zero == 0` with an assertion that `zero` equals `Decimal(0)` (or
`normalize_scalar(0)`), referencing the `zero` variable and `normalize_scalar`
helper as needed.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 64624493-4eca-4a6e-bead-2108082f137a

📥 Commits

Reviewing files that changed from the base of the PR and between 8b2239c and 0065024.

📒 Files selected for processing (34)
  • CLAUDE.md
  • docs/concepts/references.md
  • slayer/core/errors.py
  • slayer/core/keys.py
  • slayer/engine/binding.py
  • slayer/engine/planning.py
  • slayer/engine/syntax.py
  • slayer/sql/generator.py
  • tests/_engine_helpers.py
  • tests/integration/test_integration.py
  • tests/test_agg_render_spec.py
  • tests/test_aggregation_gating.py
  • tests/test_cross_model_derived_columns.py
  • tests/test_cross_model_rename_dev1448.py
  • tests/test_errors_hierarchy.py
  • tests/test_filter_renamed_measure.py
  • tests/test_formula.py
  • tests/test_generator2_multistage.py
  • tests/test_help.py
  • tests/test_measure_expansion.py
  • tests/test_memories_resolver_typed.py
  • tests/test_named_measures.py
  • tests/test_nested_dag_cross_stage_refs.py
  • tests/test_projection_trim.py
  • tests/test_query_backed_models.py
  • tests/test_query_backed_typed_expansion.py
  • tests/test_reference_semantics.py
  • tests/test_response_meta.py
  • tests/test_schema_drift_typed.py
  • tests/test_slack_normalization.py
  • tests/test_sql_generator.py
  • tests/test_syntax.py
  • tests/test_transform_lowerer.py
  • tests/test_value_registry.py

Comment thread slayer/sql/generator.py
Comment on lines +403 to +408
slot = _slot(
key,
declared_name="active_revenue_sum",
public_name="active_revenue_sum",
slot_type=DataType.DOUBLE,
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Pass key as a keyword argument in _slot(...).

This new call should follow keyword-arg style for multi-parameter functions.

Suggested patch
-        slot = _slot(
-            key,
+        slot = _slot(
+            key=key,
             declared_name="active_revenue_sum",
             public_name="active_revenue_sum",
             slot_type=DataType.DOUBLE,
         )

As per coding guidelines **/*.py: Use keyword arguments for functions with more than 1 parameter.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
slot = _slot(
key,
declared_name="active_revenue_sum",
public_name="active_revenue_sum",
slot_type=DataType.DOUBLE,
)
slot = _slot(
key=key,
declared_name="active_revenue_sum",
public_name="active_revenue_sum",
slot_type=DataType.DOUBLE,
)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_agg_render_spec.py` around lines 403 - 408, The call to _slot
currently passes key positionally; update it to use a keyword argument (e.g.,
key=key) so all parameters are passed by name; modify the instantiation assigned
to variable slot that calls _slot(declared_name=..., public_name=...,
slot_type=...) to include key=key to follow the keyword-arg style for
multi-parameter functions.

Comment on lines +52 to +58
async def _expand_stage_as_model(
engine: SlayerQueryEngine,
stage: SlayerQuery,
*,
name: str,
data_source: str = "ds",
) -> SlayerModel:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Use keyword-only parameters and keyword-based calls for the new helper.

The new helper and its callsites use positional arguments even though the function has multiple parameters.

Suggested patch
 async def _expand_stage_as_model(
-    engine: SlayerQueryEngine,
-    stage: SlayerQuery,
-    *,
+    *,
+    engine: SlayerQueryEngine,
+    stage: SlayerQuery,
     name: str,
     data_source: str = "ds",
 ) -> SlayerModel:
@@
-            virtual = await _expand_stage_as_model(engine, stage, name="qb_default")
+            virtual = await _expand_stage_as_model(
+                engine=engine,
+                stage=stage,
+                name="qb_default",
+            )
@@
-            virtual = await _expand_stage_as_model(engine, stage, name="qb_collide")
+            virtual = await _expand_stage_as_model(
+                engine=engine,
+                stage=stage,
+                name="qb_collide",
+            )
@@
-            virtual = await _expand_stage_as_model(engine, stage, name="qb_alias")
+            virtual = await _expand_stage_as_model(
+                engine=engine,
+                stage=stage,
+                name="qb_alias",
+            )

As per coding guidelines **/*.py: Use keyword arguments for functions with more than 1 parameter.

Also applies to: 1470-1470, 1489-1489, 1509-1509

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_query_backed_models.py` around lines 52 - 58, The helper function
_expand_stage_as_model currently accepts multiple positional parameters (engine,
stage, *, name: str, data_source: str) but is being called positionally; change
its signature to enforce keyword-only usage (e.g., make engine and stage
positional but require name and data_source as keyword-only or make all
parameters keyword-only) and update all call sites (including the calls
referenced around the other occurrences) to pass arguments by keyword (e.g.,
name=<...>, data_source=<...>) so the function is invoked using keyword-based
calls; ensure callers of _expand_stage_as_model use the exact parameter names
(name, data_source) in their calls.


Asserts the typed-plan-derived ``attributes`` + ``expected_columns`` match
what the legacy EnrichedQuery path produced: result keys read straight from
what the legacy legacy enriched-query path produced: result keys read straight from
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix duplicated "legacy" in docstring.

The phrase "legacy legacy enriched-query path" appears to have a duplicated word.

📝 Proposed fix
-what the legacy legacy enriched-query path produced: result keys read straight from
+what the legacy enriched-query path produced: result keys read straight from
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
what the legacy legacy enriched-query path produced: result keys read straight from
what the legacy enriched-query path produced: result keys read straight from
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_response_meta.py` at line 4, The docstring in
tests/test_response_meta.py contains a duplicated word "legacy legacy" in the
phrase "legacy legacy enriched-query path"; update the docstring to remove the
duplicate so it reads "legacy enriched-query path" (search for the string
"legacy legacy enriched-query path" and replace it with "legacy enriched-query
path" in that file).

ZmeiGorynych and others added 4 commits May 29, 2026 12:50
… their joins

A column-level Column.filter on an aggregated measure renders as
SUM(CASE WHEN <filter> THEN col END). When the filter referenced a joined
table via a derived column (bare is_eu, or dotted loss_payment.has_flag whose
own sql is derived), the typed pipeline left the derived ref un-inlined
(emitting a non-physical <alias>.<derived_col>) and/or failed to pull the
crossed join into the FROM.

Fix (all in slayer/sql/generator.py): extract one shared Mode-A predicate
renderer (_render_mode_a_predicate) that inline-expands derived refs — bare
AND dotted-to-a-derived-column-on-a-joined-model — else cheap-qualifies,
matching the already-correct query-level filter path so no dangling ref is
emitted. Join discovery (_filter_join_paths) unions the un-inlined predicate's
paths (so the dbt placeholder-join idiom has_flag sql="1" keeps its alias even
when it inlines to a constant) with the inline-expanded predicate's paths (so a
derived ref's crossed joins surface). Applied to both Column.filter and
SlayerModel.filters; the cross-model _cm_ CTE target-filter path keeps
dotted-derived inlining off (no deeper-join mechanism — that is DEV-1503).

Tests: un-xfail + strengthen the two tracking tests (now assert the ref is
inlined inside the CASE-WHEN, not just that a join exists); promote the
filtered-last carries-join test; keep the DEV-1503 isolation xfails with
refreshed reasons; add companions for dotted-base, dotted-derived crossing a
further join, model-filter inlining, a mixed filtered/unfiltered sibling, the
root-scope-only EXISTS limitation, a sqlite constant case, and multi-hop input.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CodeRabbit: (1) recurse into composite AGGREGATE keys (ArithmeticKey /
ScalarCallKey) when collecting Column.filter joins, so a filtered aggregate
nested in `a:sum + b:sum` discovers its crossed join; (2) thread source_relation
into the dotted-derived detection so it resolves consistently with the rest of
the Mode-A path; (3) use _norm(sql) in the mixed-measure assertion.

Codex: a cross-model aggregate whose TARGET column carries a Column.filter
crossing one of the target's own joins emitted a dangling alias in the _cm_*
CTE. _render_cross_model_cte now collects the target measure's Column.filter +
target-model-filter join paths and adds them to the CTE's FROM (the CTE is an
isolated per-(target,grain) computation, so the join only affects this measure).
Target-model-filter dotted-derived inlining is enabled there as a result.

Sonar gate (was red): dedupe the new companion fixtures behind shared helpers
(new_duplicated_lines_density 14% -> under threshold); rewrite the two ReDoS
.*X.* assertion regexes to non-backtracking [^"]* (S5852 hotspots). Reduce
cognitive complexity by extracting helpers: _column_ref_is_derived,
_expand_degenerate_derived_root (from _render_mode_a_predicate),
_collect_column_filter_join_paths + _expand_derived_row_dims (from
_build_base_select_for_planned), and _value_key_join_paths (from
_collect_filter_join_paths).

New tests: composite-measure filtered aggregate, cross-model CTE target-filter
join. Full non-integration suite: 3914 passed / 0 failed / 39 xfailed; ruff clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Security hotspots (python:S5852, gate new_security_hotspots_reviewed): the two
flagged assertion regexes still tripped the rule after the [^"]* rewrite —
S5852 flags ANY regex with two-plus unbounded quantifiers, so [^"]*X[^"]*Y is
still ReDoS-shaped. Reduce to a single bounded [^"]* segment plus a plain
substring check for the THEN clause; no hotspot is raised.

Cognitive complexity (python:S3776, non-gate-blocking CRITICAL): the helper
extraction reduced the worst function 87 -> 43 and eliminated two, but the
residual is irreducible per-slot/per-key dispatch + recursion that can't reach
15 without fragmenting cohesive logic. NOSONAR(S3776) with per-function
rationale on the four (matches the existing _render_cross_model_cte precedent).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…oss-model-derived-column-refs-in-filters

DEV-1494: inline cross-model derived refs in Column.filter + discover their joins
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
slayer/sql/generator.py (3)

5404-5412: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Routed derived filters need to contribute to cte_join_paths.

_collect_routed_filters() can expand a routed target-local ColumnSqlKey into SQL that references deeper __ aliases, but those paths are never added to cte_join_paths before cte_where is attached. A routed filter like where is_eu = true on the target model will therefore generate a predicate against joins this _cm_* CTE never added.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@slayer/sql/generator.py` around lines 5404 - 5412, _cte_where returned by
_collect_routed_filters may reference deeper join aliases but its join path(s)
are not merged into cte_join_paths, so routed filters produce predicates against
joins the _cm_* CTE never added; update the call site after cte_where is
computed so that when cte_where is not None you also extract and merge the
routed join path information returned by _collect_routed_filters (or call its
companion accessor) into cte_join_paths before appending cte_where, referencing
the symbols _collect_routed_filters, cte_where and cte_join_paths (and the
ColumnSqlKey/_cm_* path names) so the CTE includes the necessary join paths for
the generated predicates.

3525-3556: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Collect joins crossed by derived aggregate sources too.

This prepass now covers row dims, row filters, and Column.filter, but it still skips joins introduced by a local aggregate’s ColumnSqlKey source. _build_agg_render_spec_from_planned() expands that SQL later, so a measure like sum(net_amount) can still emit customers__... refs against a FROM that never joined customers unless some unrelated clause happens to pull the same path in.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@slayer/sql/generator.py` around lines 3525 - 3556, The prepass misses joins
introduced by a local aggregate's ColumnSqlKey source so add a pass that
collects those join paths and appends them to needed_join_paths before
rendering; call a new or existing collector (e.g. implement and call
self._collect_agg_source_join_paths(planned_query=planned_query,
base_render_order=base_render_order, slots_by_id=slots_by_id,
source_relation=source_relation, source_model=source_model, bundle=bundle) or
similar) right alongside the other collectors shown, and for each returned path
do "if p not in needed_join_paths: needed_join_paths.append(p)"; ensure this
complements _build_agg_render_spec_from_planned() so aggregate ColumnSqlKey SQL
won't reference unjoined relations.

5361-5438: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

The _cm_* join collector still misses the measure source SQL.

cte_join_paths only looks at the measure’s column_filter_key and text target-model filters. If the cross-model measure source itself is a target-local derived ColumnSqlKey, _build_agg_render_spec_from_planned() will expand it to __-qualified refs, but target_from is still built without those joins, so the aggregate body can reference aliases that never entered the CTE’s FROM.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@slayer/sql/generator.py` around lines 5361 - 5438, cte_join_paths is not
being seeded with the measure's source ColumnSqlKey, so when
_build_agg_render_spec_from_planned expands a target-local derived measure the
CTE's FROM misses required joins; update the block that seeds cte_join_paths
(around local_agg_key and _add_cte_join_paths) to also detect the measure source
ColumnSqlKey on local_agg_key (the same source used by
_build_agg_render_spec_from_planned) and call _add_cte_join_paths with its
canonical_sql (e.g. local_agg_key.<source_column_key>.canonical_sql or whatever
field holds the measure source ColumnSqlKey) so the generated
target_from/cte_base_joins include those join paths.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@slayer/sql/generator.py`:
- Around line 5404-5412: _cte_where returned by _collect_routed_filters may
reference deeper join aliases but its join path(s) are not merged into
cte_join_paths, so routed filters produce predicates against joins the _cm_* CTE
never added; update the call site after cte_where is computed so that when
cte_where is not None you also extract and merge the routed join path
information returned by _collect_routed_filters (or call its companion accessor)
into cte_join_paths before appending cte_where, referencing the symbols
_collect_routed_filters, cte_where and cte_join_paths (and the
ColumnSqlKey/_cm_* path names) so the CTE includes the necessary join paths for
the generated predicates.
- Around line 3525-3556: The prepass misses joins introduced by a local
aggregate's ColumnSqlKey source so add a pass that collects those join paths and
appends them to needed_join_paths before rendering; call a new or existing
collector (e.g. implement and call
self._collect_agg_source_join_paths(planned_query=planned_query,
base_render_order=base_render_order, slots_by_id=slots_by_id,
source_relation=source_relation, source_model=source_model, bundle=bundle) or
similar) right alongside the other collectors shown, and for each returned path
do "if p not in needed_join_paths: needed_join_paths.append(p)"; ensure this
complements _build_agg_render_spec_from_planned() so aggregate ColumnSqlKey SQL
won't reference unjoined relations.
- Around line 5361-5438: cte_join_paths is not being seeded with the measure's
source ColumnSqlKey, so when _build_agg_render_spec_from_planned expands a
target-local derived measure the CTE's FROM misses required joins; update the
block that seeds cte_join_paths (around local_agg_key and _add_cte_join_paths)
to also detect the measure source ColumnSqlKey on local_agg_key (the same source
used by _build_agg_render_spec_from_planned) and call _add_cte_join_paths with
its canonical_sql (e.g. local_agg_key.<source_column_key>.canonical_sql or
whatever field holds the measure source ColumnSqlKey) so the generated
target_from/cte_base_joins include those join paths.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 07aeef3a-a52b-4433-b671-9031b96395a8

📥 Commits

Reviewing files that changed from the base of the PR and between 0065024 and 74ed95a.

📒 Files selected for processing (5)
  • docs/architecture/index.md
  • docs/architecture/sql-generation.md
  • slayer/sql/generator.py
  • tests/test_cross_model_derived_columns.py
  • tests/test_sql_generator.py
✅ Files skipped from review due to trivial changes (2)
  • docs/architecture/index.md
  • docs/architecture/sql-generation.md

ZmeiGorynych and others added 12 commits May 30, 2026 14:17
…G slack rule

The slack FUNC_STYLE_AGG normalizer previously sourced custom aggregation
names from the source model only, so `rolling_avg(customers.score)` reached
the Mode-B parser unrewritten and raised UnknownFunctionError. Both the
query path (`_normalize_stage`) and the model-save path (`normalize_model`
via `save_model`) now collect aggregations reachable through the join graph:
the query path uses a sync `ResolvedSourceBundle.reachable_aggregation_names`
walk over the pre-resolved bundle (scoped per stage), and the save path
reuses `_collect_reachable_agg_names` with a best-effort storage closure
that swallows AmbiguousModelError and other lookup errors. `normalize_model`
gains a `custom_agg_names` param mirroring `normalize_query`, with
None=>fallback-to-own-aggs and frozenset()=>suppress-fallback.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
`parse_filter_expr` rewrote every standalone `=` to `==` so SQL equality
(`status = 'paid'`) parses. The blanket rewrite also rewrote keyword-
argument `=` inside transform/aggregation calls, so
`ntile(amount:sum, n=4) <= 1` became `n == 4` (a positional `Cmp`) and
the binder rejected it as "exactly one positional argument". The
documented top-N filter patterns (`ntile(<m>, n=<N>) <= 1`,
`rank(<m>, partition_by=<col>) <= N`) were broken end-to-end whenever
a kwarg was present.

Replace the per-part `=`→`==` regex with a literal-aware, paren-kind-
tracking single scan that preserves `=` as a kwarg iff the innermost
open paren is a CALL paren whose callee is NOT in `SCALAR_FUNCTIONS`,
and the `=` is preceded by `IDENT` preceded by `(` or `,` (Python's
kwarg grammar). Lowercase keywords (`and`/`or`/`not`/`in`/`is`) do not
classify a following `(` as a call paren. The scanner runs as the last
step of `_normalize_sql_filter_operators` on the rejoined string.

The `SCALAR_FUNCTIONS` narrowing mirrors the architecture's existing
no-kwargs-on-scalars rule (docs/architecture/parsing.md), so
`coalesce(status = 'paid', False)` still normalizes to a `ScalarCall`
with a `Cmp` arg as before. Same code path fixes the colon-agg form
`revenue:percentile(p=0.5) > 100`.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…outer trim

Hidden first/last aggregates referenced only by ORDER BY or HAVING used to
render inline against a dangling `_last_rn`, collapsing distinct
explicit-time-column specs into a single broken term. The architecture's
documented design (planning.md:89-93,155-159) already mandates the fix:
materialise hidden order/filter slots in the base CTE, trim them via an
outer SELECT. The transform path already did this; this commit extends the
same pattern to the no-transform path.

Five coordinated changes:

* planning.py `_canonical_name` routes AggregateKey through
  `canonical_agg_name` so parametric aggregates (`revenue:last(created_at)`,
  `revenue:percentile(p=0.5)`) carry their arg/kwarg signature in the
  declared name — distinct hidden slots get distinct base-CTE aliases.
* generator.py `_collect_base_aux_slot_ids` learns `aggregates_only=True`
  and walks AGGREGATE-phase filter deps too. The no-transform path uses it
  to pull hidden AggregateKey order/filter operands into base_render_order
  without dragging ROW leaves into GROUP BY.
* generator.py `_build_outer_trim_wrap_sql` wraps the base SELECT when any
  hidden materialised slot is present, projecting exactly the public
  projection (mirroring the transform path's outer wrap, cycling
  aliases_by_slot_id for C13 duplicates) and moving ORDER BY / LIMIT /
  OFFSET to the outer level. `_apply_order_limit_from_planned` resolves
  hidden AggregateKey order targets via the materialised aliases.
* generator.py `_build_first_last_base_select` collects full aliases per
  sid (multi-alias list) and pass-2 cycles per visit — pre-existing C13
  bug exposed by the wrap.
* generator.py `_build_agg` suffix guard no longer requires
  `default_time_col`; FirstLastRenderState threads the rn maps from the
  base builder into `_build_where_having_from_planned` so HAVING
  aggregates reference the same `_first_rn` / `_last_rn{suffix}` columns
  the base projects.

Tests: 27 new in test_sql_generator (TestDev1501HiddenFirstLastRender +
TestDev1501BroadTriggerAndGuards + multi-dialect parametric) and
test_projection_planner (TestDev1501CanonicalNameWithAggArgs) plus one
SQLite integration test against real data. test_projection_trim's
`test_declared_projection_order_and_hidden_exclusion` second assertion
relaxed to "no quantity_sum in OUTER projection" to match the materialise
+trim shape per docs.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…TYLE_AGG callers

The query/save normalization paths walked the join graph for custom
aggregation names in beb6924, but the two quiet callers of
``func_style_agg_to_colon`` still passed source-model-only aggs:

- ``slayer.memories.resolver.extract_entities_from_query`` — a saved
  memory whose query references a joined-model custom agg in funcstyle
  (e.g. ``rolling_avg(customers.score)``) lost the ``customers.score``
  entity tag because the rewrite did not fire.
- ``slayer.engine.schema_drift._cascade_measures`` — a ModelMeasure on
  ``orders`` using the same funcstyle did not cascade-drop when the
  joined column was removed from ``customers``.

Both paths now collect custom aggregation names reachable through the
join graph: the resolver reuses the async ``_collect_reachable_agg_names``
with a best-effort storage closure (matching ``save_model``), and the
cascade does a sync BFS over ``_CascadeState.models_by_name``.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…hygiene

Codex + CodeRabbit + SonarQube review of PR #159 surfaced two genuine
HAVING render bugs and a defensive POST-walk gap, all rooted in
incomplete plumbing of the ranked-subquery state into the HAVING render
paths added by the original Change 5.

Group A — HAVING ranked-state threading (2 real bugs):

* Host HAVING: ``_render_value_key_for_filter``'s AggregateKey branch
  built the synth with placeholder ``full_alias="__having_ref__"``, so
  ``_build_agg``'s ``filtered_rn_map[alias]`` / ``filtered_match_map
  [alias]`` lookups missed and the term fell back to unfiltered
  ``_last_rn`` + raw ``filter_sql`` for filtered first/last. Thread
  ``aliases_by_slot_id`` into ``_build_where_having_from_planned`` →
  ``_render_value_key_for_filter`` and look up the materialised slot's
  full alias as the synth alias.

* Cross-model HAVING: ``_collect_routed_filters`` →
  ``_render_filter_value_key_in_target_scope`` rebuilt the aggregate
  via ``_build_agg(synth)`` with no rn-state context, so a cross-model
  filtered first/last in HAVING bound to bare ``_last_rn`` instead of
  the CTE's ``_last_rn_f0``. Extend ``FirstLastRenderState`` with
  ``agg_synth_alias`` (single-aggregate cross-model CTE case),
  ``_render_cross_model_cte`` builds the state from its existing rn
  maps + the projected aggregate's ``full_agg_alias``, threaded
  through ``_collect_routed_filters`` /
  ``_render_filter_value_key_in_target_scope`` recursion.

Group B — defensive guard:

* ``_collect_base_aux_slot_ids`` no longer walks POST-phase filter deps
  unless ``transform_layers`` exist. POST-phase requires TransformKey
  operands, so POST in no-transform queries is planner-unreachable;
  walking the deps without applying the filter would silently
  materialise helper slots.

Group C — Sonar S3776 NOSONAR markers on 5 functions over the
complexity threshold (``_canonical_name`` planning.py:631;
``generate_from_planned``, ``_collect_base_aux_slot_ids``,
``_render_value_key_for_filter``, ``_apply_order_limit_from_planned``
generator.py), each with a rationale matching the existing codebase
pattern. Pre-existing test helper at tests/test_sql_generator.py:88
left untouched.

Group D — two small Sonar test fixes: relax the reluctant-quantifier
HAVING regex from ``r"HAVING(.*?)(ORDER BY|$)"`` to ``r"HAVING(.*)$"``
(inner SELECT has no ORDER BY after outer wrap); drop ``async`` from
``test_hidden_composite_order_rejected_at_input_validation`` (does no
``await`` — just an OrderItem input-validation check).

Tests: 2 new regression tests for the HAVING bugs
(``test_filtered_first_last_in_having_uses_filtered_rn``,
``test_cross_model_filtered_last_in_having``); all 32 DEV-1501 tests
pass; full non-integration suite 3931 pass; SQLite + DuckDB
integration suites pass; ruff clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The 4-hop save and 4-hop query tests each defined their own local
_chain_model helper and inlined the same a -> b -> c -> d -> e save
sequence. Lift both into module-level helpers so the tests don't
duplicate the chain setup — fixes the SonarCloud duplication block
the gate flagged.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
… + NOSONAR

Codex round-2 review caught a residual gap from the round-1 fix:
``_render_value_key_for_filter``'s recursive calls for
``ScalarCallKey`` / ``BetweenKey`` / ``InKey`` threaded
``first_last_state`` but DROPPED ``aliases_by_slot_id``. So a filtered
``paid_amount:last(created_at)`` nested under ``coalesce(...)``, ``IN
(...)``, or ``BETWEEN`` rebuilt the synth with placeholder
``__having_ref__`` → ``filtered_rn_map[alias]`` missed → fell back to
unfiltered ``_last_rn`` + raw ``filter_sql``. The original round-1
regression test only exercised the ``>`` (ArithmeticKey) path.

Fix: pass ``aliases_by_slot_id`` through the three missing recursion
sites in ``_render_value_key_for_filter``. New regression test pins a
``coalesce(paid_amount:last(created_at), 0) > 0`` HAVING shape to
prevent re-occurrence.

Sonar S3776 NOSONAR marker on
``_render_filter_value_key_in_target_scope`` (the function grew from
the round-1 Group A.3 edits; complexity 26).

Also: reply on the two stale CodeRabbit threads (POST-drop guard +
cross-model HAVING rn-state) that were addressed in commit dc7a775
but not auto-resolved.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…n-state

Codex round-3 review caught the last rn-state plumbing gap from the
DEV-1501 series: composite aggregate slots (ArithmeticKey /
ScalarCallKey of aggregates) projected on the first/last-ranked path
render via ``_render_aggregate_composite_expr`` which previously did
NOT receive ``rn_suffix_map`` / ``default_time_col_sql`` /
``filtered_rn_map`` / ``filtered_match_map``. A composite like
``revenue:last(created_at) + revenue:last(updated_at)`` therefore
emitted ``MAX(CASE WHEN _last_rn = 1 …) + MAX(CASE WHEN _last_rn = 1
…)`` — both operands collapsed to bare ``_last_rn`` regardless of
their explicit time arg.

Two coordinated fixes:

* ``_render_aggregate_composite_expr`` now accepts the four rn-state
  params (defaults to ``None`` so non-first/last call sites are
  unaffected) and passes them to ``_build_agg`` in the AggregateKey
  branch + threads through ArithmeticKey / ScalarCallKey recursion.
* ``_build_first_last_base_select`` gathers composite-operand
  first/last AggregateKey leaves (via the new module-level
  ``_iter_first_last_leaves`` walker) and includes their synth specs
  in the ranked-subquery synth list — so each operand's distinct time
  column contributes its own ``_first_rn`` / ``_last_rn{suffix}``
  column. The operands are NOT projected as base SELECT columns
  (they're inlined inside the composite render); they only participate
  in the rn-column set.

New regression test: ``test_composite_first_last_in_projection_uses_
correct_suffixes`` exercises a query with both a direct projected
``last(created_at)`` AND a composite ``last(created_at) +
last(updated_at)``, asserting the composite renders each operand with
a DIFFERENT ``_last_rn{suffix}``.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Two new Sonar OPEN issues on PR 156 were attributed to my DEV-1500 edits:

- normalize_model (normalization.py:571) — cognitive complexity 48 > 15.
  Split into three pure helpers — _normalize_model_measures (FUNC_STYLE on
  ModelMeasure.formula), _normalize_column_dot_paths (DOT_PATH on Column
  sql/filter), _normalize_model_filter_dot_paths (DOT_PATH on
  SlayerModel.filters). normalize_model now just sequences them, so the
  outer function's complexity drops sharply.

- save_model (query_engine.py:1582) — cognitive complexity 29 > 15. The
  nested _resolve_join_target_for_save closure plus the gating try/except
  inflated the score. Lift the whole thing into a private async method
  _reachable_aggs_for_save(model); save_model is back to a one-liner await.

Pure refactor, no behavior change.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…agg-normalization-doesnt-recognize-custom

DEV-1500: recognize joined-model custom aggregations in FUNC_STYLE_AGG slack rule
…anked subquery

Codex round-4 review caught the entry-gate gap from round 3's composite
fix: ``_has_first_last_aggregate`` only walked top-level ``AggregateKey``
slots. A query whose ONLY first/last reference is inside a composite
slot (no direct first/last sibling) — e.g.
``measures=[ModelMeasure("revenue:last(created_at) + revenue:last(updated_at)")]``
— took the regular base path, the composite render emitted ``MAX(CASE
WHEN _last_rn = 1 …)`` referencing a column the bare-FROM never
projected.

Fix: extend ``_has_first_last_aggregate`` to also walk composite slot
keys via ``_iter_first_last_leaves``. Combined with round 3's
composite-operand synth gathering and rn-state threading, composite-only
queries now correctly build the ranked subquery and render each operand
with its own ``_first_rn`` / ``_last_rn{suffix}``.

New regression test:
``test_composite_only_first_last_triggers_ranked_subquery``.

Also: Sonar S3776 NOSONAR(S3776) on ``_iter_first_last_leaves`` (the
walker is the per-type recursion contract — extracting per-type helpers
would scatter it); Sonar S3626 redundant ``return`` removed at the end
of the InKey branch.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…y gate

The DEV-1492 scanner had cognitive complexity 56 (cap 15) — Sonar
`python:S3776` and CodeRabbit Thread 8 on PR #158.

Dispatch on `m.lastgroup` (each token-class group in `_SCAN_TOKEN_RE`
is named and the arms are mutually exclusive). Extract one helper per
token class and pull the two non-trivial sub-decisions into pure
helpers `_classify_paren` and `_is_kwarg_equals` — both testable in
isolation. Top-level body drops to a 4-line dispatcher; each handler
and helper is ≤ ~5 lines.

Behavior unchanged. 18 targeted tests + full non-integration suite
still pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
tests/test_validate_models.py (1)

453-453: ⚡ Quick win

Move Aggregation to the module import block.

This function-scoped import breaks the repo's import-placement rule and makes the test dependency easy to miss. As per coding guidelines, "Place imports at the top of files."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_validate_models.py` at line 453, The test currently does a
function-scoped import of Aggregation (from slayer.core.models import
Aggregation); move that import into the module-level import block at the top of
tests/test_validate_models.py so all imports are grouped together. Locate the
current inline import of Aggregation, remove it from inside the function, and
add a single module-level import line for Aggregation alongside the other test
imports so the test follows the project's import-placement rule.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@slayer/engine/query_engine.py`:
- Around line 1591-1592: The early-return guard only checks model.measures and
thus skips query-backed models whose formulas live in
SlayerModel.source_queries; update the check in the function containing this
guard to treat a model as empty only if both model.measures and
model.source_queries are empty/None (i.e., proceed when either has content).
Ensure reachable custom-aggregation discovery logic considers source_queries the
same way it considers measures so joined-model funcstyle aggregations normalized
by save_model() are also discovered at execute time.

---

Nitpick comments:
In `@tests/test_validate_models.py`:
- Line 453: The test currently does a function-scoped import of Aggregation
(from slayer.core.models import Aggregation); move that import into the
module-level import block at the top of tests/test_validate_models.py so all
imports are grouped together. Locate the current inline import of Aggregation,
remove it from inside the function, and add a single module-level import line
for Aggregation alongside the other test imports so the test follows the
project's import-placement rule.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 26e96109-fda4-46e8-8006-1a3d9e83d2ff

📥 Commits

Reviewing files that changed from the base of the PR and between 74ed95a and 93bc7cd.

📒 Files selected for processing (10)
  • slayer/engine/normalization.py
  • slayer/engine/query_engine.py
  • slayer/engine/schema_drift.py
  • slayer/engine/source_bundle.py
  • slayer/memories/resolver.py
  • tests/test_entity_resolution.py
  • tests/test_slack_normalization.py
  • tests/test_source_bundle.py
  • tests/test_sql_generator.py
  • tests/test_validate_models.py

Comment on lines +1591 to +1592
if not model.measures:
return None
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Handle query-backed models here too.

Query-backed models keep their formulas under SlayerModel.source_queries, not model.measures, so this guard skips reachable custom-aggregation discovery for exactly the models that save_model() also normalizes. Joined-model funcstyle custom aggregations in stored stages will therefore stay unnormalized on save even though execute-time normalization now recognizes them.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@slayer/engine/query_engine.py` around lines 1591 - 1592, The early-return
guard only checks model.measures and thus skips query-backed models whose
formulas live in SlayerModel.source_queries; update the check in the function
containing this guard to treat a model as empty only if both model.measures and
model.source_queries are empty/None (i.e., proceed when either has content).
Ensure reachable custom-aggregation discovery logic considers source_queries the
same way it considers measures so joined-model funcstyle aggregations normalized
by save_model() are also discovered at execute time.

ZmeiGorynych and others added 20 commits June 1, 2026 10:30
…ined-path discovery

Codex round-5 review caught two more composite-aware code paths the
round 3/4 fixes missed:

1. ``_build_first_last_base_select``'s "no-default ranking time
   column" validation only walked top-level ``AggregateKey`` slots —
   a composite like ``revenue:last + 1`` (no explicit time arg, no
   model default_time_dimension) bypassed the error and then
   ``_build_unfiltered_rn_columns`` emitted ``ORDER BY None``. Extend
   the validation to also walk composite slots via
   ``_iter_first_last_leaves``.

2. ``_collect_joined_paths_for_base`` walked explicit time args only
   on top-level ``AggregateKey`` slots. A composite operand
   ``revenue:last(customers.signed_up_at) + 1`` orders the ranked
   subquery by ``customers.signed_up_at`` but the ``customers`` join
   was never pulled into the FROM, producing invalid SQL. Extend
   path discovery to walk composite-slot first/last leaves.

Two new regression tests:
* ``test_composite_first_last_without_default_time_dim_raises``
* ``test_composite_first_last_with_joined_time_arg_adds_join``

Also: CodeRabbit minor — the round-2 cross-model HAVING regression
test used ``filter="active = 1"`` on a BOOLEAN column, which Postgres
rejects (no implicit bool↔int cast). Switched to ``= TRUE``.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Codex review on PR #158 caught that the kwarg scanner mis-classified
the documented `consecutive_periods(<predicate>)` form when the
predicate's LHS is a bare ref and its operator is SQL `=`:

    consecutive_periods(status = 'paid') >= 2

was normalized to `consecutive_periods(status='paid') >= 2`, parsed as
a kwarg, and rejected by the binder. The `:` in colon-agg forms
(`revenue:sum > 0`) accidentally avoided the regression because the
intervening `:` token broke the LPAREN/IDENT pattern; bare refs had
nothing to break it.

Distinguish callee classes in `_is_kwarg_equals`: for transforms (in
`ALL_TRANSFORMS`) the first positional is always the value to
transform, so a kwarg `=` can only appear AFTER a `,`. Aggregations
keep the existing `LPAREN` or `COMMA` rule because parametric forms
(`weighted_avg(weight=qty)`, `percentile(p=0.5)`) legitimately put a
kwarg first. Scalars keep their existing no-kwarg exclusion.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
… filter

Codex round-6 review caught two more edge cases:

1. Filtered composite operands rebuilt their synth with the placeholder
   ``full_alias="__op__"`` in ``_render_aggregate_composite_expr``, but
   ``_build_first_last_base_select`` keyed ``filtered_rn_map`` by the
   per-leaf alias it minted at synth time (``_composite_op_N``). The
   lookup missed → bare ``_last_rn`` + raw filter. Thread a
   ``composite_alias_by_key`` map so the operand synth's alias matches
   the rn-map key.

2. Cross-model queries with a LOCAL first/last HAVING filter
   (``filters=["revenue:last(created_at) > 100"]``) had the
   ``_render_with_cross_model_plans`` ``base_render_order`` construction
   skip AGGREGATE-phase filter deps. The host ``_base`` CTE wasn't
   building the ranked subquery → HAVING emitted dangling ``_last_rn``.
   Add an unconditional ``_collect_base_aux_slot_ids(aggregates_only=
   True, include_order=False)`` pass to the cross-model base-order
   construction. The fix surfaces the existing
   ``NotImplementedError`` for the combination of local first/last +
   cross-model aggregate (the in-place guard was already there), which
   is a strict improvement over silently dangling-``_last_rn`` SQL.

NOSONAR(S3776) on ``_collect_joined_paths_for_base`` (round 5
extension bumped its complexity to 30).

Two regression tests:
* ``test_filtered_composite_first_last_uses_filtered_rn``
* ``test_cross_model_query_with_local_first_last_having_filter``

Also: reply on CodeRabbit's POST-drop re-flag thread referencing the
``has_transforms`` guard in dc7a775.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…+ NOSONARs

Three follow-ups from round-7 review:

1. Codex MAJOR: ``_render_aggregate_composite_expr``'s ``ScalarCallKey``
   branch dropped ``composite_alias_by_key`` when recursing into
   operands (the ``ArithmeticKey`` branch forwarded it; only the
   ``ScalarCallKey`` recursion missed it). A filtered first/last
   operand inside ``coalesce(...)`` would then fall back to the
   placeholder ``__op__`` alias and miss the ``filtered_rn_map``
   lookup. Forward ``composite_alias_by_key`` consistently in all
   three recursion branches.

2. Sonar quality-gate ERROR on ``new_duplicated_lines_density`` (4.0%
   > 3.0%): the round-6 cross-model AGG-phase filter materialization
   block was near-identical to the round-5 transform-deps block.
   Factor both into a single ``_add_local_aux_slots(include_order=,
   aggregates_only=)`` inner helper. Same control flow, one body.

3. Sonar S3776 NOSONAR markers on
   ``_render_aggregate_composite_expr`` (complexity 37 — sequential
   ValueKey dispatch with rn-state + composite-alias-by-key threading)
   and ``_render_with_cross_model_plans`` (complexity 80 —
   orchestration of host ``_base`` CTE + per-plan ``_cm_*`` CTEs +
   combined SELECT + transform chain + outer wrap).

Also: reply on the CodeRabbit POST-drop re-flag thread (the third
time this round) — the ``has_transforms`` guard from dc7a775 remains
sufficient; POST in no-transform is planner-unreachable.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Codex review on PR #158 caught that the kwarg scanner classifies a call
paren by callee name alone, missing the colon-aggregation context. The
names `first` and `last` sit in both ALL_TRANSFORMS (FIRST_VALUE window)
and the built-in aggregation set (`_AMBIGUOUS_AGG_TRANSFORMS` in
`slayer/core/formula.py`), and after a `:` they are always aggregations
— never transforms. So `revenue:first(order=created_at) > 0` was
landing in the transform branch and the `order=` kwarg was being
rewritten to `order==`.

Detect colon-agg context in `_classify_paren` by checking whether
`hist[-2]` is the `:` token. When so, drop the callee to `None` so
`_is_kwarg_equals` takes the aggregation/unknown branch (kwargs after
`(` or `,`). The `hist` cap of 2 retains exactly the prev-prev token
needed; longer prefixes like `customers.revenue:first(...)` roll off
without affecting the detection.

Built-in `first`/`last` take their order column positionally today
(`revenue:last(ordered_at)`), so the user-visible impact is on custom
user-defined aggregations that override these names; the test pins
the architectural rule with a concrete kwarg-shape input.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Three follow-ups from round-8 review:

1. Codex MAJOR (slayer/sql/generator.py:6304+): a DERIVED first/last
   time arg (``ColumnSqlKey``) skipped join discovery.
   ``_resolve_explicit_time_col`` expands ``net_signed_at.sql =
   "customers.signed_up_at"`` so the ranked subquery's ORDER BY emits
   ``customers.signed_up_at`` — but ``_collect_joined_paths_for_base``
   only walked ``ColumnKey`` args, so the ``customers`` join was not
   added to ``_base``. Extend the helper to accept ``source_model`` /
   ``source_relation`` / ``bundle`` and walk ``ColumnSqlKey`` args via
   the existing ``_expand_derived_column_sql`` + ``_joined_paths_in_sql``
   pair (same machinery the ROW-derived TimeTruncKey path already uses).
   + regression test
   ``TestDev1501HiddenFirstLastRender.test_derived_time_arg_pulls_in_referenced_join``.

2. Sonar S3776 NOSONAR on ``_projection_rn_by_alias`` test helper
   (complexity 21 > 15) — sequential isinstance dispatch over
   outermost-SELECT projection layers (Alias / Cast / Max / Case / EQ
   / Column unwrap). Each layer is a structural-shape predicate whose
   failure short-circuits to the next projection; extracting per-layer
   helpers would scatter the predicate.

3. Sonar duplication gate (4.0% > 3.0% threshold,
   test_sql_generator.py 5.1% per-file). Extract
   ``_persist_and_engine(*models)`` async context-manager helper plus
   a ``_orders_with_paid_amount_model`` fixture; apply across the
   DEV-1501 test classes (TestDev1501HiddenFirstLastRender,
   TestDev1501BroadTriggerAndGuards, TestParameterizedAggCanonicalDistinct,
   TestTransformAmbiguousTimeDimension). One-line per call site (vs
   the prior 5-line ``tempfile +  YAMLStorage + save_datasource +
   save_model + SlayerQueryEngine`` block) — drops ~140 net lines of
   new-line duplication.

3939 unit tests pass; ruff clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…-mangles-transform-kwargs-in-filters-ntilex

DEV-1492: preserve transform kwargs in filter normalization
Codex MAJOR (slayer/sql/generator.py:6031+): the cross-model HAVING
``AggregateKey`` reconstruction in ``_render_filter_value_key_in_target_scope``
rerooted ``source`` to ``path=()`` but copied ``args`` / ``kwargs``
unchanged. The projection path (lines ~5685-5730) already handles all
three symmetrically via ``_reroot_kwarg``. Mirror that logic here so a
multi-hop routed filter
(``customers.regions.amount:last(customers.regions.opened_at) > 0``)
qualifies its time arg under the local CTE relation instead of the
host-rooted ``__``-path alias that doesn't exist inside the target CTE.

Inert for single-hop cases — both pre-fix and post-fix render
``customers.signed_up_at`` because ``_resolve_explicit_time_col``'s
``"__".join(path)`` happens to collapse to the same string when the
target relation has the same single hop. Real fix lands for multi-hop
shapes (``customers.regions.opened_at``) that the existing test suite
doesn't yet exercise — kept the fix mechanical (same shape as the
projection-path reroot) rather than introducing a multi-model test
fixture that exercises an unrelated routing concern.

3939 unit tests pass; ruff clean. Sonar gate is fully green
(duplication 0.9% < 3% threshold, 0 OPEN issues, 0 hotspots, no CI
failures) after round 8's ``_persist_and_engine`` refactor.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…rst-with-different-explicit-time-columns

DEV-1501: parametric first/last in ORDER BY / HAVING — materialise + outer trim
…apper

A host aggregate whose Column.filter references a joined table
(loss_payment_amt:sum where loss_payment_amt has
filter="loss_payment.has_flag = 1") now hoists into its own _cm_ CTE
host-rooted variant of the cross-model aggregate strategy. Without
isolation, two such measures whose filter joins are different INNER
joins on different tables would intersect in the host base to only the
rows present in BOTH targets, silently corrupting both aggregates.

Planner: trigger predicate extended to fire on
column_filter_key.referenced_join_paths non-empty (not just
source.path non-empty). _plan_filtered_local builds a host-rooted
nested PlannedQuery and attaches via rerooted_plan /
rerooted_grain_pairs / rerooted_agg_slot_id with
cte_root_model = host.name. AGGREGATE-phase host filters referencing
the isolated aggregate are skipped (not passed to the sub-plan) so the
generator routes them to outer WHERE instead of HAVING-into-CTE (which
would surface host rows as NULL via LEFT JOIN instead of dropping
them).

Generator: _render_with_cross_model_plans now identifies AGGREGATE
filters walking an isolated AggregateKey, adds their ids to routed_ids
so _base skips them, and renders each via _render_filter_for_outer_
wrapper as plain WHERE on the combined non-aggregating SELECT,
substituting isolated AggregateKey refs with <_cm_>."<col>" and other
slot refs with _base."<alias>". Hidden operand promotion for mixed
filters (loss_payment_amt:sum > 1000 AND total_amount:sum > 10)
flows through the existing _add_local_aux_slots(aggregates_only=True)
pass.

Identity: SqlExprKey gains referenced_join_paths so the planner's
filtered-local trigger reads a typed field rather than re-parsing the
column filter SQL. compute_column_filter_join_paths (new
slayer/engine/column_filter_paths.py) is the planner-side discovery
helper, swallowing internal sqlglot scope-analyser failures so the
generator's parse-time security gate stays authoritative on
malicious payloads.

Tests: TestIsolatedFilteredMeasureCTEs xfails rewritten to assert
_cm_/_base/outer-WHERE structure; new tests/test_filtered_local_
isolation.py covers SqlExprKey.referenced_join_paths, trigger
dispatch (incl. cross-model-with-target-filter regression),
recursion suppression, host model filter interactions, first/last
sub-plan cleanliness.

Docs: Strategy 3 section in docs/architecture/cross-model-aggregates.md;
CLAUDE.md key-conventions bullet.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ardening

Correctness:
- Planner: fix date_range off-by-one in user-filter routing for the
  filtered-local sub-plan. host_filter_routings is ordered
  [date_range_routings..., user_filter_routings...]; slice the user
  portion as host_filters[-len(host_query.filters):] instead of looking
  up by f"f{i}" (off by n_date_range when a date_range time_dimension is
  present; Codex review).
- Generator: outer-WHERE wrapper now maps EVERY cross-model aggregate
  plan to its _cm_ CTE column (not just filtered-local with
  cte_root_model set). A mixed predicate like loss_payment_amt:sum +
  customers.revenue:sum > 100 routed to the wrapper via a filtered-local
  operand can now resolve the forward cross-model operand at the outer
  scope instead of raising NotImplementedError (CodeRabbit thread 2).

Test boundary assertions:
- POST-routing tests now assert layer boundaries: AGGREGATE predicate
  inside the combined base CTE WHERE, POST predicate inside the
  post-transform _filtered wrap, neither leaks into the other layer
  (CodeRabbit thread 4).
- TestIsolatedFilteredMeasureCTEs + the customers-JOIN-inside-_cm_ test
  switched from sql.index("\n)", start) manual slicing to the
  balanced-paren _extract_cte_body helper so nested ranked subqueries
  don't truncate the CTE body (CodeRabbit thread 3).

Hardening:
- SqlExprKey.referenced_join_paths gains a before-validator that
  canonicalises to a sorted, de-duplicated tuple of tuples — identity is
  now order-independent, callers can pass any iterable shape
  (CodeRabbit nitpick).
- column_filter_paths: self-qualified anchor-local derived refs
  (filter="orders.is_eu = 1" where orders is the anchor) now trigger
  the derived-expansion gate identically to the bare form
  (filter="is_eu = 1") so the cross-model path the expansion surfaces
  is recorded on the SqlExprKey (CodeRabbit thread 1).

Refactors:
- Shared root-scope joined-path walker hoisted into column_expansion.py
  as collect_root_scope_joined_paths; column_filter_paths and
  generator both call it instead of duplicating the alias-walk-over-joins
  loop (Sonar duplication clearance).
- compute_column_filter_join_paths splits its derived-expansion
  preamble into _expand_filter_sql_if_anchor_derived (S3776 cognitive
  complexity).
- IsolatedCteCrossModelPlanner.plan extracts the filtered-local trigger
  precondition + dispatch into _dispatch_filtered_local (S3776).
- _plan_filtered_local extracts the host-query-filter classification
  loop into the module-level _classify_subplan_filters helper (S3776).
- Test fixture: orders+customers+is_eu+eu_amount setup extracted to a
  shared _orders_with_derived_eu_filter helper across three derived-ref
  tests (Sonar duplication clearance).
- stage_planner.plan_query: NOSONAR(S3776) with reason — function's
  pre-existing complexity is owned by multi-stage scope/bundle/projection/
  filter-routing wiring tracked as a separate refactor.

Tests: full unit suite 4036 passed (was 4030). Ruff clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Correctness:
- Composite projections over isolated filtered-local aggregates now
  render at the combined SELECT scope, not inline in _base. A measure
  like loss_payment_amt:sum + loss_reserve_amt:sum would otherwise pull
  BOTH filter-target INNER joins back into _base and intersect to rows
  in BOTH targets — silently corrupting both aggregates, the exact bug
  DEV-1503 set out to eliminate (Codex round 2 #1). Identification walks
  aggregate_slots + combined_expression_slots for ArithmeticKey /
  ScalarCallKey projections whose tree references any cross-model agg
  slot; rendering reuses _render_filter_for_outer_wrapper to substitute
  isolated AggregateKey → _cm_X."col" and other operands → _base."alias".
  Non-isolated agg operands are auto-promoted to hidden aux slots so
  _base materialises them.
- No-dim aggregate-only isolated queries no longer emit
  SELECT 1 AS _placeholder FROM <host> (N rows) cross-joined with a
  scalar _cm_* CTE (1 row) — N × 1 duplicated the aggregate by host row
  count. The placeholder now drops the host FROM, yielding the expected
  single-row scalar (Codex round 2 #2).

Tests:
- test_formula_over_isolated_measures strengthened: asserts _base body
  contains NEITHER Loss_Payment NOR Loss_Reserve joins, and the
  composite alias appears in the outer combined SELECT referencing both
  _cm_* CTE columns.
- test_base_not_empty_when_no_dims_all_measures_skipped strengthened:
  asserts the _base body does not reference the host table and carries
  the _placeholder literal.
- test_cross_model_dimension_count_distinct_in_formula and
  test_rerooted_cross_model_in_formula promoted from xfail(strict=True)
  to PASS — the DEV-1499 cross-model-in-composite case is the same shape
  the DEV-1503 fix now handles.

Sonar:
- collect_root_scope_joined_paths (column_expansion.py) extracts the
  per-column alias-walk into _resolve_alias_to_join_segments (S3776
  cognitive complexity).
- _plan_filtered_local (cross_model_planner.py) extracts grain-pair
  matching, sub_agg slot finding, and cte_schema build into module-level
  helpers (_match_filtered_local_grain_pairs,
  _find_filtered_local_sub_agg_slot, _build_filtered_local_cte_schema)
  (S3776).
- test_multi_hop_derived_filter_expands_inside_cm_cte converted from
  _re.search + manual sql.index("\n)") to _extract_cte_body — removes
  the remaining python:S5852 ReDoS hotspot.

Tests: full unit suite 4038 passed (was 4036). Ruff clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Generator (slayer/sql/generator.py:_render_with_cross_model_plans):
- outer_composite_slot_ids scan now also walks slots referenced by
  planned_query.order, not just planned_query.projection. A hidden
  ORDER BY composite over isolated filtered-local aggregates was
  previously falling through to the order-only local path and
  rendering inline in _base, re-pulling filter-target joins into the
  host spine (Codex round 3 #1).
- Outer-routed composite slots projected multiple times (C13
  multi-alias same-key consolidation) cycle through cslot.public_aliases
  per emission and APPEND into combined_aliases_by_slot_id, rather than
  always emitting public_aliases[0] and overwriting the dict slot
  (CodeRabbit thread on alias cycling).
- Order-only outer composites materialise as hidden combined-SELECT
  columns under the slot's declared_name so ORDER BY can resolve via
  the synthesised alias.

_build_combined_order_by_sql:
- New ``outer_composite_aliases`` kwarg maps each outer-routed
  composite slot id to the alias the combined SELECT projects it
  under. ORDER BY references those aliases bare (the projection
  belongs to the combined SELECT, not _base) — previously they would
  emit ``_base."<alias>"`` against a column that doesn't exist there
  (Codex round 3 #2).

Tests:
- test_order_by_projected_composite_over_isolated_resolves_at_combined
  pins the ORDER-BY-alias-at-combined-scope invariant: the composite
  must not leak into _base AND the ORDER BY must not reference _base.
- test_outer_composite_with_multiple_user_aliases pins C13 alias
  cycling: the same composite formula declared under two user names
  surfaces both aliases in the outer SELECT.

Tests: 4040 passed (was 4038). Ruff clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Correctness (Codex round 4 / CodeRabbit):
- empty_base placeholder now re-introduces the host FROM + WHERE +
  LIMIT 1 when ANY host-local ROW filter exists (filter id not in
  routed_ids and phase=ROW). The round-2 fix that dropped the host
  FROM to avoid N-row CROSS JOIN duplication also bypassed
  _build_where_having_from_planned, silently ignoring host-local ROW
  filters on no-dim queries with forward cross-model aggregates (e.g.
  customers.revenue:sum with filters=["status = 'active'"]).
  LIMIT 1 preserves the 1-row cardinality fix from round 2; WHERE
  applies the filter so the combined query returns 0 rows when no
  host row matches (correct semantics) and 1 row otherwise. When NO
  host-local filter exists, the placeholder stays bare (no FROM, no
  WHERE, no LIMIT) — matching round 2 for the unfiltered fast path.

Sonar:
- _build_combined_order_by_sql (generator.py:6528) extracts per-entry
  alias resolution into _resolve_combined_order_term — S3776 cognitive
  complexity 20 → 15.

Tests:
- test_no_dim_query_with_host_row_filter_applies_in_base pins the
  WHERE + LIMIT 1 invariant in the empty_base placeholder for a no-dim
  query with a host ROW filter.

Tests: 4041 passed (was 4040). Ruff clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Correctness (Codex round 5):
- _build_first_last_base_select now accepts skip_filter_ids and forwards
  it to _build_where_having_from_planned. The cross-model path passes
  routed_ids to _build_base_select_for_planned, but the first/last branch
  was calling the inner helper without forwarding the set — a filter
  routed to a per-plan _cm_ CTE was double-applied: once inside the host
  ranked subquery's WHERE, once inside the _cm_ CTE. For a filter
  referencing a join path not in _base's FROM, the host-side rendering
  would emit invalid SQL.

Test:
- test_local_first_last_with_routed_cross_model_filter pins the routing:
  a query mixing total_amount:last (local) + loss_payment.has_flag:sum
  (forward cross-model) + a filter routed via PROPAGATE_HAVING asserts
  the filter literal appears only inside the _cm_ CTE body, never in
  the host _base ranked subquery.

Tests: 4042 passed (was 4041). Ruff clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Correctness (Codex round 6):
- The round-4 empty_base placeholder fix added FROM <host> + WHERE +
  LIMIT 1 when a host-local ROW filter exists, but bypassed
  _collect_filter_join_paths / _build_from_and_joins. A filter
  referencing a joined alias (e.g. filters=["claim.claim_number =
  '12345'"]) therefore emitted WHERE claim.claim_number = ... against a
  _base that had no Claim join — invalid SQL ("missing FROM-clause
  entry"). The empty_base branch now walks non-routed filters via
  _collect_filter_join_paths, threads the resulting paths through
  _build_from_and_joins, and attaches the LEFT JOINs to the placeholder
  before the WHERE.

Test:
- test_no_dim_host_filter_referencing_joined_column_pulls_join pins
  the join-discovery: a no-dim query with a host ROW filter on a
  joined column asserts the Claim join appears in _base alongside the
  filter literal AND LIMIT 1.

Tests: 4043 passed (was 4042). Ruff clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Correctness (Codex round 7):
- column_filter_paths.compute_column_filter_join_paths parsed every
  Column.filter SQL with hard-coded dialect="postgres" and returned ()
  on parse failure. For a non-Postgres datasource using dialect-specific
  syntax (MySQL backticks, T-SQL square brackets, ClickHouse-specific
  functions), the Postgres parse could fail silently — the planner then
  saw no referenced join paths, the DEV-1503 trigger missed, and the
  generator (dialect-aware) rendered the filter inline in _base, pulling
  the cross-model join back into the host rowset and silently corrupting
  the aggregate. The helper now walks a small dialect fallback chain
  (Postgres → sqlglot default → MySQL → BigQuery → T-SQL) and only
  returns () when every dialect rejects the input. The injection security
  gate (UNION SELECT, etc.) is unchanged because malformed payloads still
  fail every dialect.

Test:
- test_dialect_fallback_chain_recovers_mysql_backtick_filter pins the
  recovery: a backtick-quoted joined alias ref ("loss_payment"."has_flag"
  = 1) surfaces ("loss_payment",) in referenced_join_paths.

Tests: 4044 passed (was 4043). Ruff clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Correctness:
- _render_with_cross_model_plans now honours where_consumed from the
  first/last base-rendering path: when the ranked subquery consumed
  WHERE, we no longer re-apply base_where on the outer _base SELECT.
  Double-application changed first/last semantics (the outer WHERE
  filtered AFTER ranking) and could dangle joined-column aliases on the
  outer scope (CodeRabbit thread).
- Order-only outer composites (composite slots referenced by ORDER BY
  but absent from planned_query.projection) now render INLINE inside
  the combined ORDER BY rather than materialising as a hidden
  combined-SELECT column. The old shape leaked the synthetic alias as
  an extra user-visible result column on the no-transform path (Codex
  round 8) AND lost the slot in the cross-model transform chain's
  combined_aliases_by_slot_id carry-forward dict, breaking ORDER BY
  resolution on the chain output (CodeRabbit thread). A new
  outer_composite_order_expressions map carries the rendered SQL for
  each order-only outer composite; _build_combined_order_by_sql emits
  <expr> <direction> bare for those slots.

Test:
- test_first_last_with_host_filter_not_reapplied_outside_ranked_subquery
  pins the where_consumed gate: a query mixing total_amount:last
  (local first/last) + loss_payment.has_flag:sum (forward cross-model)
  + filters=["status = 'active'"] asserts the filter literal appears
  exactly once in _base — inside the ranked subquery.

Tests: 4045 passed (was 4044). Ruff clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Correctness (Codex round 9):
- The round-7 dialect fallback chain covered the outer
  sqlglot.parse_one call and the re-parse after derived expansion, but
  expand_derived_refs_sync itself was still pinned to Postgres. A
  derived column whose Column.sql uses dialect-specific syntax
  (MySQL backticks, BigQuery struct literals, etc.) would fail the
  inner parse silently, dropping the join path the DEV-1503 trigger
  needs. A new _expand_derived_refs_any_dialect helper walks the same
  chain (excluding the None sentinel since expand_derived_refs_sync
  requires a dialect string) and returns the first non-None expansion.

Test:
- test_dialect_fallback_chain_recovers_derived_ref_with_backticks pins
  the recovery: a derived column whose sql uses
  ``CASE WHEN `customers`.`region` = 'EU'`` (MySQL backticks)
  referenced by a Column.filter still surfaces the (customers,) join
  path. Without the wrapper the helper falls back to Postgres only
  and returns ().

Tests: 4046 passed (was 4045). Ruff clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…cks-isolated-cte-handling-for-cross-model

DEV-1503: filtered-local isolation on typed pipeline + outer-WHERE wrapper
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@slayer/engine/column_filter_paths.py`:
- Around line 57-63: The dialect chain _PLANNER_PARSE_DIALECT_CHAIN is missing
the "clickhouse" dialect which causes compute_column_filter_join_paths (and
Column.filter parsing) to fail to detect ClickHouse-specific predicates; add
"clickhouse" into the tuple (e.g., include "clickhouse" among "postgres", None,
"mysql", "bigquery", "tsql") so SQLGlot will try ClickHouse parsing and preserve
ClickHouse-only join-path semantics. Ensure the ordering keeps permissive None
in place and run unit tests that exerciseClickHouse-specific filters to verify
parsing now returns expected join paths.

In `@slayer/engine/cross_model_planner.py`:
- Around line 518-528: The current code slices host_query.filters and assumes
alignment with host_filters, which breaks when stage_planner.plan_query()
dedupes filters; instead, stop slicing host_query.filters and build
sub_filter_texts from the host_filters list itself using each
HostFilterRouting's text field: iterate over host_filters (the list of
HostFilterRouting), skip routings whose phase is Phase.POST or Phase.AGGREGATE,
and append routing.text (or routing.text if named differently) to
sub_filter_texts only when that text is set; also ensure
stage_planner.plan_query()/stage planning code populates HostFilterRouting.text
for the deduped/surviving user-filter routings so cross_model_planner.consume
the same deduped texts.

In `@tests/integration/test_integration.py`:
- Around line 3539-3548: Test fixture's tie-break is weak because natural status
order could match the expected result even if the secondary ORDER BY is ignored;
update the "rows" fixture (variable rows in the failing test) so the natural
alphabetical order of status would be opposite the intended tie-break (e.g.,
swap the created_at/updated_at values so status "B" would come before "A" if no
secondary key is applied), or add an explicit dry-run assertion in the same test
that verifies two distinct sort keys were produced for the groups (i.e., assert
that the secondary sort key values differ for the tied groups) to make the
secondary-sort failure impossible to miss; adjust the rows data and/or add the
assertion near the rows definition used later in the test to exercise the
secondary ORDER BY.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: c4fd0735-6d19-40f1-b9ce-511f7178e7f3

📥 Commits

Reviewing files that changed from the base of the PR and between 93bc7cd and e381429.

📒 Files selected for processing (21)
  • CLAUDE.md
  • docs/architecture/cross-model-aggregates.md
  • docs/architecture/planning.md
  • docs/architecture/sql-generation.md
  • slayer/core/keys.py
  • slayer/engine/binding.py
  • slayer/engine/column_expansion.py
  • slayer/engine/column_filter_paths.py
  • slayer/engine/cross_model_planner.py
  • slayer/engine/planned.py
  • slayer/engine/planning.py
  • slayer/engine/stage_planner.py
  • slayer/engine/syntax.py
  • slayer/sql/generator.py
  • tests/integration/test_integration.py
  • tests/test_filtered_local_isolation.py
  • tests/test_keys.py
  • tests/test_projection_planner.py
  • tests/test_projection_trim.py
  • tests/test_sql_generator.py
  • tests/test_syntax.py
✅ Files skipped from review due to trivial changes (2)
  • CLAUDE.md
  • docs/architecture/sql-generation.md

Comment on lines +57 to +63
_PLANNER_PARSE_DIALECT_CHAIN: Tuple[Optional[str], ...] = (
"postgres",
None, # sqlglot's permissive default — accepts ANSI SQL broadly
"mysql",
"bigquery",
"tsql",
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🌐 Web query:

Does sqlglot 30.0.0 support a ClickHouse dialect named clickhouse, and does ClickHouse-specific SQL require selecting that dialect instead of relying on postgres, default, mysql, bigquery, or tsql parsing?

💡 Result:

Yes, SQLGlot 30.0.0 includes official, built-in support for a ClickHouse dialect [1][2][3]. To correctly parse or generate ClickHouse-specific SQL, you must specify clickhouse as the dialect (e.g., read="clickhouse" for parsing or dialect="clickhouse" for generation) [4][5]. Relying on other dialects like postgres, mysql, bigquery, or tsql will likely result in parsing errors or incorrect SQL generation because SQLGlot uses these specific dialect configurations to handle ClickHouse's unique features, such as aggregate combinators (e.g., countIf), specific join types, non-nullable types, and distinct identifier escaping rules [6]. If no dialect is specified, SQLGlot defaults to its own internal "SQLGlot dialect," which acts as a superset of supported dialects and may not correctly interpret or output valid syntax for ClickHouse-specific constructs [5]. Using the official clickhouse dialect ensures that the parser and generator are configured with the correct metadata, tokenization rules, and syntax handling required for ClickHouse [7][6].

Citations:


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Inspect the file around the reported lines
sed -n '1,200p' slayer/engine/column_filter_paths.py | cat -n

# Locate and inspect compute_column_filter_join_paths usage/definition
rg -n "compute_column_filter_join_paths" slayer/engine/column_filter_paths.py
rg -n "_PLANNER_PARSE_DIALECT_CHAIN" slayer/engine/column_filter_paths.py

# Print the function body if present
python3 - <<'PY'
import re, itertools, sys, pathlib
path = pathlib.Path("slayer/engine/column_filter_paths.py")
txt = path.read_text()
m = re.search(r"def\s+compute_column_filter_join_paths\s*\([^)]*\)\s*:", txt)
print("found_def:", bool(m))
PY

# If function exists, show a narrower excerpt around it
line=$(rg -n "def compute_column_filter_join_paths" slayer/engine/column_filter_paths.py | head -n1 | cut -d: -f1 || true)
if [ -n "${line:-}" ]; then
  start=$((line-40))
  end=$((line+160))
  sed -n "${start},${end}p" slayer/engine/column_filter_paths.py | cat -n
fi

Repository: MotleyAI/slayer

Length of output: 14511


Add ClickHouse to _PLANNER_PARSE_DIALECT_CHAIN

slayer/engine/column_filter_paths.py (lines 57-63) omits the clickhouse dialect even though the file explicitly calls out ClickHouse-specific Column.filter syntax; compute_column_filter_join_paths() uses this chain for dialect-aware parsing, so ClickHouse-only predicates that don’t parse under postgres/mysql/bigquery/tsql (or the permissive None default) drop to () and skip DEV-1503 join-path isolation/semantics on a supported ClickHouse backend. SQLGlot’s clickhouse dialect should be included to parse ClickHouse constructs correctly.

Suggested change
 _PLANNER_PARSE_DIALECT_CHAIN: Tuple[Optional[str], ...] = (
     "postgres",
     None,        # sqlglot's permissive default — accepts ANSI SQL broadly
     "mysql",
+    "clickhouse",
     "bigquery",
     "tsql",
 )
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@slayer/engine/column_filter_paths.py` around lines 57 - 63, The dialect chain
_PLANNER_PARSE_DIALECT_CHAIN is missing the "clickhouse" dialect which causes
compute_column_filter_join_paths (and Column.filter parsing) to fail to detect
ClickHouse-specific predicates; add "clickhouse" into the tuple (e.g., include
"clickhouse" among "postgres", None, "mysql", "bigquery", "tsql") so SQLGlot
will try ClickHouse parsing and preserve ClickHouse-only join-path semantics.
Ensure the ordering keeps permissive None in place and run unit tests that
exerciseClickHouse-specific filters to verify parsing now returns expected join
paths.

Comment on lines +518 to +528
user_query_filters = host_query.filters or []
if not user_query_filters:
return None
user_routings = list(host_filters[-len(user_query_filters):])
sub_filter_texts: List[str] = []
for routing, filt in zip(user_routings, user_query_filters):
if routing.phase in (Phase.POST, Phase.AGGREGATE):
continue
# ROW phase — propagate.
sub_filter_texts.append(filt)
return sub_filter_texts or None
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don't recover sub-plan filter text by slicing host_query.filters.

stage_planner.plan_query() dedupes user filters before building host_filters, so len(host_query.filters) is no longer guaranteed to match the routed entries here. With a date_range plus duplicate/alias-equivalent user filters, this slice can mis-pair phases and texts, which lets an AGGREGATE filter be propagated into the DEV-1503 sub-plan as if it were ROW-phase.

Suggested fix
-    user_query_filters = host_query.filters or []
-    if not user_query_filters:
+    user_routings = [
+        routing for routing in host_filters
+        if routing.text is not None
+    ]
+    if not user_routings:
         return None
-    user_routings = list(host_filters[-len(user_query_filters):])
     sub_filter_texts: List[str] = []
-    for routing, filt in zip(user_routings, user_query_filters):
+    for routing in user_routings:
         if routing.phase in (Phase.POST, Phase.AGGREGATE):
             continue
-        # ROW phase — propagate.
-        sub_filter_texts.append(filt)
+        sub_filter_texts.append(routing.text)
     return sub_filter_texts or None

And in slayer/engine/stage_planner.py, populate HostFilterRouting.text for the surviving user-filter routings so this function consumes the same deduped list the planner actually kept.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@slayer/engine/cross_model_planner.py` around lines 518 - 528, The current
code slices host_query.filters and assumes alignment with host_filters, which
breaks when stage_planner.plan_query() dedupes filters; instead, stop slicing
host_query.filters and build sub_filter_texts from the host_filters list itself
using each HostFilterRouting's text field: iterate over host_filters (the list
of HostFilterRouting), skip routings whose phase is Phase.POST or
Phase.AGGREGATE, and append routing.text (or routing.text if named differently)
to sub_filter_texts only when that text is set; also ensure
stage_planner.plan_query()/stage planning code populates HostFilterRouting.text
for the deduped/surviving user-filter routings so cross_model_planner.consume
the same deduped texts.

Comment on lines +3539 to +3548
# Per-group last(created_at) is tied at 50 across A and B; the
# secondary ORDER BY last(updated_at) ASC must break the tie with
# A (10) before B (99).
rows = [
# Status A: last(created_at)=50, last(updated_at)=10.
(1, "A", 50.0, "2025-03-01", "2025-01-01"),
(2, "A", 10.0, "2025-02-01", "2025-04-01"),
# Status B: last(created_at)=50 (tied), last(updated_at)=99.
(3, "B", 50.0, "2025-03-01", "2025-02-01"),
(4, "B", 99.0, "2025-01-15", "2025-04-01"),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Make the secondary sort failure impossible to miss.

This fixture can still pass if the second ORDER BY key is ignored, because tied groups may naturally come back as "A", then "B". That makes the DEV-1501 regression weakly covered here. Flip the data so the expected result contradicts status ordering, or add a dry-run assertion that two distinct sort keys were generated.

Suggested way to make the tie-break observable
-    # Per-group last(created_at) is tied at 50 across A and B; the
-    # secondary ORDER BY last(updated_at) ASC must break the tie with
-    # A (10) before B (99).
+    # Per-group last(created_at) is tied at 50 across A and B; the
+    # secondary ORDER BY last(updated_at) ASC must break the tie with
+    # B (10) before A (99), so the expected order now contradicts the
+    # natural status ordering.
     rows = [
-        # Status A: last(created_at)=50, last(updated_at)=10.
+        # Status A: last(created_at)=50, last(updated_at)=99.
         (1, "A", 50.0, "2025-03-01", "2025-01-01"),
-        (2, "A", 10.0, "2025-02-01", "2025-04-01"),
-        # Status B: last(created_at)=50 (tied), last(updated_at)=99.
+        (2, "A", 99.0, "2025-02-01", "2025-04-01"),
+        # Status B: last(created_at)=50 (tied), last(updated_at)=10.
         (3, "B", 50.0, "2025-03-01", "2025-02-01"),
-        (4, "B", 99.0, "2025-01-15", "2025-04-01"),
+        (4, "B", 10.0, "2025-01-15", "2025-04-01"),
     ]
@@
-    # Two rows, both count 2; A before B because primary lc is tied (50)
-    # and secondary lu ASC puts A (lu=10) before B (lu=99).
+    # Two rows, both count 2; B before A because primary lc is tied (50)
+    # and secondary lu ASC puts B (lu=10) before A (lu=99).
     assert len(response.data) == 2, response.data
-    assert response.data[0]["orders.status"] == "A", response.data
-    assert response.data[1]["orders.status"] == "B", response.data
+    assert response.data[0]["orders.status"] == "B", response.data
+    assert response.data[1]["orders.status"] == "A", response.data

Also applies to: 3590-3596

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/integration/test_integration.py` around lines 3539 - 3548, Test
fixture's tie-break is weak because natural status order could match the
expected result even if the secondary ORDER BY is ignored; update the "rows"
fixture (variable rows in the failing test) so the natural alphabetical order of
status would be opposite the intended tie-break (e.g., swap the
created_at/updated_at values so status "B" would come before "A" if no secondary
key is applied), or add an explicit dry-run assertion in the same test that
verifies two distinct sort keys were produced for the groups (i.e., assert that
the secondary sort key values differ for the tied groups) to make the
secondary-sort failure impossible to miss; adjust the rows data and/or add the
assertion near the rows definition used later in the test to exercise the
secondary ORDER BY.

ZmeiGorynych and others added 6 commits June 3, 2026 19:58
In the typed pipeline, an AGGREGATE slot whose source is a derived
(ColumnSqlKey) column whose Column.sql contains a __-delimited
join-path alias (e.g. customers__regions.population) never had the
implied LEFT JOINs added to the host base FROM — the aggregate body
emitted SUM(customers__regions.population) referencing an undefined
table alias. The dimension / time-dimension cases got this right
under DEV-1484; the Column.filter case got it right under DEV-1494;
the agg-source case was the symmetric remaining gap.

Fix: add _collect_aggregate_source_join_paths next to
_collect_column_filter_join_paths. Walks AGGREGATE-phase slots,
recursing through ArithmeticKey / ScalarCallKey composite keys.
For each AggregateKey with a ColumnSqlKey source and source.path == (),
expand the column's sql via _expand_derived_column_sql and scan
through _joined_paths_in_sql, appending discovered paths to
needed_join_paths. Wire into _build_base_select_for_planned next to
the existing filter-side collector — single wire-up covers regular
AGGREGATE and local first/last (the ranked-subquery builder inherits
from_clause / base_joins from the same call). Cross-model agg
sources (source.path \!= ()) are skipped; their _cm_* CTE owns its
own discovery (gap tracked as DEV-1526; xfail in place).

The render-time agg-body expansion in
_build_agg_render_spec_from_planned already produced correct
SUM(<expanded>) SQL — only the join-discovery side was missing.

Tests: un-xfail the existing tracking test + strengthen its body
assertion; new TestMeasureSourceSqlJoinInference class covers
single-hop / multi-hop, ArithmeticKey + ScalarCallKey composites,
Mode-A function wrappers around the path alias, sibling derived
chains, local last with derived source (ranked-subquery shape
asserted), cumsum-wrapped path-aliased measure (POST-phase aux
materialization), multiple agg sources sharing one join (dedupe),
dim+measure sharing one join (dedupe), filter+source crossing
different joins (DEV-1494/1502 coexistence), no-path-no-extra-join
sanity, defensive cross-model skip with positive _cm_ assertion.
Two strict xfails pin the follow-up gaps (DEV-1526 cross-model CTE
source-sql gap, DEV-1527 parametric agg derived-kwarg gap). One
SQLite integration test seeds orders → customers → regions and
verifies region_pop:sum returns 350.0 end-to-end.

Docs: paragraph in sql-generation.md listing the three symmetric
host-base discovery sources (DEV-1484 dim, DEV-1494 filter, DEV-1502
agg-source).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
SQLite returns REAL aggregates as floats; strict == 350.0 trips
python:S1244 and is the right call to relax (the gate is currently
red on new_reliability_rating=3 because of this one finding).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
CodeRabbit caught that `"regions" not in _join_aliases(sql)` does an
exact-match on the alias set — but the realistic leak shape from a
misbehaving host-side collector walking through customers would be
the path alias `customers__regions`, which the bare-name check
silently misses. Assert both alias shapes so the defensive test
actually pins the regression it was meant to guard.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The new host-base discovery section claimed cross-model aggregates
own their own join discovery in the _cm_* CTE without qualification.
True for the Column.filter side (DEV-1494 / DEV-1503), but the
symmetric source-Column.sql case is the open DEV-1526 gap. Codex
flagged the unqualified statement as misleading future readers.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Codex MAJOR caught that the test asserts dry-run SQL substrings but
the emitted SQL fails at RUNTIME — the inner ranked subquery has the
LEFT JOINs (DEV-1502 pulled them in), but the outer SELECT's MAX(CASE
WHEN _last_rn = 1 THEN <expr> END) still references the cross-table-
qualified expression directly, which is out of scope outside the
subquery (only the source_relation alias is).

Confirmed via SQLite execution: `no such column: customers__regions.
payment_amount`. Pre-existing bug for ANY derived first/last source
crossing a join (dotted form too, DEV-1410 territory); DEV-1502 just
unmasked it for the __ alias case by getting the joins emitted at
all. Pre-DEV-1502 this failed at "no JOINs"; post-DEV-1502 it fails
at "outer scope missing".

Convert the test to a strict xfail tracking DEV-1531; the assertion
now pins the END STATE (the outer SELECT must not reference the
cross-table ref directly) rather than just substring presence.
Auto-promotes when DEV-1531's materialisation fix lands.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…-path-alias-in-a-measures-columnsql-doesnt

DEV-1502: pull joins crossed by an aggregate source's Column.sql
@sonarqubecloud
Copy link
Copy Markdown

sonarqubecloud Bot commented Jun 5, 2026

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant