Skip to content

Quote model-name identifiers so reserved-keyword model names work#146

Open
whimo wants to merge 1 commit into
mainfrom
whimo/fix-reserved-keyword-model-names
Open

Quote model-name identifiers so reserved-keyword model names work#146
whimo wants to merge 1 commit into
mainfrom
whimo/fix-reserved-keyword-model-names

Conversation

@whimo
Copy link
Copy Markdown
Member

@whimo whimo commented May 26, 2026

Problem

A model whose name is a reserved SQL keyword (a customer source table Grant → model grant; also order, user, select) was emitted as an unquoted identifier — both as the FROM alias and as every column qualifier. sqlglot's default generator does not quote reserved keywords, so the database rejected the query:

PostgresSyntaxError: syntax error at or near "grant"
[SQL: SELECT grant ."idempotencyKey" AS "grant.idempotencyKey" FROM "Grant" AS grant ...]

Two failure surfaces:

  • A (fatal render): unquoted grant alias/qualifier → DB syntax error.
  • B (parse corruption): a model-qualified string like grant."col" re-parsed mid-pipeline makes sqlglot.parse_one fall back to a Command statement parse, leaking the tell-tale grant . spacing into the emitted SQL.

Found via Storyline shadow-compare ingestion (a Grant table). Tracking doc: BUG-reserved-keyword-identifiers.md.

Fix

Quote model-name identifiers unconditionally at construction time — fixes A directly and B transitively ("grant"."col" re-parses cleanly). Every model-name identifier routes through one helper, _model_identifier (exp.to_identifier(name, quoted=True)), so alias definitions and qualifier references quote identically (matters on case-folding dialects like Snowflake). .sql(identify=True) was rejected — it doesn't fix surface B and over-quotes.

Sites covered: _build_from_clause, _resolve_sql, measure qualifiers, the cross-model CTE, all join-target aliases, the ranked subquery (incl. SELECT <model>.*), column_expansion, and the filter/join-condition builders in enrichment.py.

Cross-dialect: raw-SQL-string sites that get re-parsed (join conditions, ranked-subquery .* /JOIN, WHERE qualifier) can't hardcode "MySQL/Spark/Databricks quote with backticks, where "grant" is a string literal. Those use quote_ident_for(name, dialect). The rerooted-filter qualifier (query_engine.py) stays bare because it's re-parsed by the Python-AST parse_filter (where "x" is a string literal); it's quoted later in resolve_filter_columns. The join-path-discovery regex (_TABLE_COL_RE) is now quote-tolerant.

Tests

  • New tests/test_reserved_keyword_quoting.py: grant/order/user/select × 6 dialects (incl. MySQL backticks) — asserts quoted form, clean parse, and no Command fallback warning (captured on the sqlglot logger).
  • DuckDB integration (in-process, real execution): reserved-keyword table with a mixed-case column, filter, join, and cross-model measure.
  • Postgres integration: parallel cases (run in CI).
  • Existing SQL-snapshot assertions updated for the quoted output (orders.status"orders".status; backticks for MySQL).

Full unit suite: no new failures vs. baseline. ruff clean. CLAUDE.md documents the convention.

Summary by CodeRabbit

  • Bug Fixes

    • Models named with reserved SQL keywords (e.g., "grant", "order") now generate valid, dialect-quoted identifiers and execute without parse/SQL errors.
  • Documentation

    • Added guidance describing the reserved-keyword identifier quoting convention and expected behavior.
  • Tests

    • Added and extended regression tests across dialects (DuckDB, PostgreSQL, others) to verify quoting, parsing, and absence of parsing-fallback warnings.

Review Change Stack

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 26, 2026

📝 Walkthrough

<review_stack_artifact_start >
<stack_title>Reserved-keyword identifier quoting fix</stack_title>
<stack_summary>SLayer now consistently quotes model-derived SQL identifiers (aliases and column qualifiers) to avoid syntax errors and sqlglot parsing fallbacks for reserved-keyword model names.</stack_summary>

Fixes unquoted identifier emission for reserved SQL keyword model names across SQL generation, enrichment, column expansion, and tests. Bug report and convention doc describing failure surfaces, root cause, reproduction, and the quoting convention for model-derived identifiers. range_0b187e5b2cec range_656bed667b37 range_20bdc53d9f46 range_d7296ef6b4bf range_211f350b1bb3 range_1e7d6e6b33ed range_b0e2074ccb90 Updates enrichment (`_TABLE_COL_RE`, JOIN predicate formatting, filter column resolution) to quote alias prefixes; column-expansion rewrites canonical aliases using the model-identifier helper; query-engine comment clarifies bare remapping for later reparse/quoting. range_7f34d46b57ba range_8d607f187df9 range_31755f668977 range_4abee2ce33ca range_1d31103d40a4 range_d0da089c939b range_e09120fb3482 range_5f0d7fae5e26 Adds centralized quoting helpers: `_model_identifier()`, `quote_ident_for()`, and `_qualify_in_string()` to produce always-quoted sqlglot identifiers and dialect-aware qualifier strings for raw-SQL embedding. range_7aacd8604d56 range_01193533796b Replaces unquoted alias/qualification with `_model_identifier` and `_qualify_in_string` across CTEs, window joins, FROM/JOIN assembly, ranked-subquery aliasing, aggregate resolution, and WHERE/HAVING qualification. range_b73a08da0606 range_3310f09a8930 range_6b3ee74d0fb4 range_3232719ba75d range_aa91b662ae86 range_e73235f48dfe range_0926f74d205a range_63eeaf03e301 range_40ba99d288b1 range_2392cad79db8 range_67112a3c03c1 Adds DuckDB and PostgreSQL fixtures/models using quoted reserved table names and mixed-case columns; verifies queries, joins, and cross-model measures execute with correct quoting. range_35205b0d6d82 range_94aebefc4c65 range_9c819159703e range_1a4d2adc47e1 range_b2c5d5ee1140 range_ca2685b56748 New regression suite verifying dialect-quoted reserved-model identifiers, sqlglot parseability, and absence of the sqlglot `Command` fallback warning across dimensions, filters, joins, and ranked-subqueries. range_3fd203a94a9e range_f0ec2fbf01be range_f45c5d922db2 range_ca7b2fa2dfe3 range_1ce04a4245c4 range_bc0963418716 range_9cfa09050c02 range_2a7c17a5986f Updates test expectations to reflect dialect-appropriate quoted identifiers in aggregates, percentiles, statistical functions, filter wrapping, string translations, and derived-column inlining. range_1e12e7459310 range_cbd7b0696aeb range_e859e189d345 range_4105f86f7373 range_9c3874353b24 range_2829e126d715 range_78cf3b78793c range_57aed421cf1d range_3e7245f7b79a range_c33aeedfcd29 range_a36053c09814 range_3dc02ee259ce range_ffae39382afc range_5eff6bbb819f range_074b66d6b8e6 range_240b57767d44 range_46b594bc4972 range_324f3f6ca2f1 range_a0884b4dc25a range_a67d77a210b3 range_b559ed2ad10c range_04bd7bfc01bd range_0bd20d76c52d range_63bcac6f65b8 range_c0c58b322333 range_79e30db898f4 range_a20fbe281660 range_19372866907b range_e5ef856e088f range_66a2ebe5857a range_7e04c8472603 range_d3e2986d2817 range_1869c02e902f range_90c47352f126 range_66b843c2714c range_55b4361794f5 range_d1f0ccd5077d range_e446e65f688c range_5d7bd5d6980f range_201d5c51b672 range_a32b7b31e5f4 range_5fb5008eb892 range_8f6328574dff range_0ec125565b2f range_5272f10f677b range_74dc7c13b58b range_255f0a4796db range_a03eae0c9ebc range_c4bd796455c7 range_313fa810e634 range_71a58cb65fca range_fe52e116ed58 range_f4852609d3b7 range_3b1c2980ca22 range_4e7ba99dab4f range_eeb3f2a1e61f range_fbecf53883d3
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 50.94% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically describes the main change: adding quote handling for model-name identifiers to support reserved SQL keywords.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch whimo/fix-reserved-keyword-model-names

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
slayer/engine/enrichment.py (1)

2471-2473: ⚡ Quick win

Use keyword arguments for multi-parameter calls to quote_ident_for.

These newly added calls pass 2 parameters positionally; repo policy requires keyword args for functions with more than one parameter.

As per coding guidelines: "**/*.py: Use keyword arguments for functions with more than 1 parameter".

Also applies to: 2722-2722, 2813-2813, 2893-2893

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@slayer/engine/enrichment.py` around lines 2471 - 2473, The calls to
quote_ident_for use positional arguments; change them to use keyword arguments
(e.g., quote_ident_for(identifier=..., dialect=...)) wherever quote_ident_for is
called with more than one parameter (specifically the occurrences using
current_alias/hop_alias and dialect at the locations referencing current_alias
and hop_alias, and the other noted occurrences around lines 2722, 2813, 2893).
Update each call to pass identifier=<alias> and dialect=<dialect> (or whatever
the parameter names are) so all multi-parameter invocations follow the repo
policy.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@BUG-reserved-keyword-identifiers.md`:
- Around line 28-39: The fenced code block in
BUG-reserved-keyword-identifiers.md lacks a language specifier and triggers
MD040; update the opening fence for the block that starts with
'grant."idempotencyKey"' to include a language token such as text or log (e.g.,
replace ``` with ```text) so markdownlint passes; locate the block by searching
for the exact snippet 'grant."idempotencyKey"' and modify only the opening
fence.

In `@slayer/engine/enrichment.py`:
- Around line 56-61: The _TABLE_COL_RE regex only allows optional double-quote
quoting and therefore misses identifiers quoted with backticks or square
brackets; update the pattern used by _TABLE_COL_RE (referenced by
_collect_needed_paths) to accept optional " or ` or [...] quoting around both
table and column identifiers and still capture the bare identifier names (so
groups remain unquoted), e.g. by allowing an optional prefix/suffix of " or ` or
\[...\] and extracting the inner identifier for both captured groups; ensure the
pattern still enforces identifier rules (letters/underscore followed by word
chars).

---

Nitpick comments:
In `@slayer/engine/enrichment.py`:
- Around line 2471-2473: The calls to quote_ident_for use positional arguments;
change them to use keyword arguments (e.g., quote_ident_for(identifier=...,
dialect=...)) wherever quote_ident_for is called with more than one parameter
(specifically the occurrences using current_alias/hop_alias and dialect at the
locations referencing current_alias and hop_alias, and the other noted
occurrences around lines 2722, 2813, 2893). Update each call to pass
identifier=<alias> and dialect=<dialect> (or whatever the parameter names are)
so all multi-parameter invocations follow the repo policy.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: a00d907a-f7f1-4194-bf0d-b9ec2970c8e8

📥 Commits

Reviewing files that changed from the base of the PR and between 9d1ff63 and 537f9d3.

📒 Files selected for processing (12)
  • BUG-reserved-keyword-identifiers.md
  • CLAUDE.md
  • slayer/engine/column_expansion.py
  • slayer/engine/enrichment.py
  • slayer/engine/query_engine.py
  • slayer/sql/generator.py
  • tests/integration/test_integration_duckdb.py
  • tests/integration/test_integration_postgres.py
  • tests/test_aggregation_gating.py
  • tests/test_cross_model_derived_columns.py
  • tests/test_reserved_keyword_quoting.py
  • tests/test_sql_generator.py

Comment on lines +28 to +39
```
'grant."idempotencyKey"' contains unsupported syntax. Falling back to parsing as a 'Command'.
shadow_compare: load_data shadow backend raised ProgrammingError:
(sqlalchemy.dialects.postgresql.asyncpg.ProgrammingError)
<class 'asyncpg.exceptions.PostgresSyntaxError'>: syntax error at or near "grant"
[SQL: SELECT
grant ."idempotencyKey" AS "grant.idempotencyKey"
FROM "Grant" AS grant
GROUP BY
grant ."idempotencyKey"
LIMIT 21]
```
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add a language to this fenced code block to satisfy markdownlint.

This block currently triggers MD040 (fenced-code-language). Add text (or log) after the opening fence.

Suggested patch
-```
+```text
 'grant."idempotencyKey"' contains unsupported syntax. Falling back to parsing as a 'Command'.
 shadow_compare: load_data shadow backend raised ProgrammingError:
 (sqlalchemy.dialects.postgresql.asyncpg.ProgrammingError)
 <class 'asyncpg.exceptions.PostgresSyntaxError'>: syntax error at or near "grant"
 [SQL: SELECT
   grant ."idempotencyKey" AS "grant.idempotencyKey"
 FROM "Grant" AS grant
 GROUP BY
   grant ."idempotencyKey"
 LIMIT 21]
</details>

<details>
<summary>🧰 Tools</summary>

<details>
<summary>🪛 markdownlint-cli2 (0.22.1)</summary>

[warning] 28-28: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

</details>

</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @BUG-reserved-keyword-identifiers.md around lines 28 - 39, The fenced code
block in BUG-reserved-keyword-identifiers.md lacks a language specifier and
triggers MD040; update the opening fence for the block that starts with
'grant."idempotencyKey"' to include a language token such as text or log (e.g.,
replace withtext) so markdownlint passes; locate the block by searching
for the exact snippet 'grant."idempotencyKey"' and modify only the opening
fence.


</details>

<!-- fingerprinting:phantom:poseidon:hawk -->

<!-- This is an auto-generated comment by CodeRabbit -->

Comment thread slayer/engine/enrichment.py Outdated
Comment on lines +56 to +61
# Matches a ``<table>.<col>`` reference used for join-path discovery. Each
# identifier may be wrapped in double quotes (expanded ``Column.sql`` qualifies
# its table alias via ``_model_identifier`` → ``"customers__regions".population``
# so reserved-keyword model names survive); the optional ``"?`` consumes those
# quotes without capturing them, so the captured groups stay bare.
_TABLE_COL_RE = re.compile(r'"?\b([a-zA-Z_]\w*)\b"?\."?\b([a-zA-Z_]\w*)\b"?')
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

_TABLE_COL_RE misses non-double-quoted identifiers, which can hide required join paths.

This regex only handles optional " quoting. In dialects emitting backticks/brackets, table-qualified refs in expanded SQL/filters won’t be detected, so _collect_needed_paths can miss joins and generate invalid SQL.

💡 Suggested fix
-_TABLE_COL_RE = re.compile(r'"?\b([a-zA-Z_]\w*)\b"?\."?\b([a-zA-Z_]\w*)\b"?')
+_TABLE_COL_RE = re.compile(
+    r'(?:`|"|\[)?\b([a-zA-Z_]\w*)\b(?:`|"|\])?'
+    r'\.'
+    r'(?:`|"|\[)?\b([a-zA-Z_]\w*)\b(?:`|"|\])?'
+)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@slayer/engine/enrichment.py` around lines 56 - 61, The _TABLE_COL_RE regex
only allows optional double-quote quoting and therefore misses identifiers
quoted with backticks or square brackets; update the pattern used by
_TABLE_COL_RE (referenced by _collect_needed_paths) to accept optional " or ` or
[...] quoting around both table and column identifiers and still capture the
bare identifier names (so groups remain unquoted), e.g. by allowing an optional
prefix/suffix of " or ` or \[...\] and extracting the inner identifier for both
captured groups; ensure the pattern still enforces identifier rules
(letters/underscore followed by word chars).

A model whose name is a reserved SQL keyword (a source table `Grant`
becomes model `grant`; also `order`/`user`/`select`) was emitted as an
UNQUOTED identifier — both as the FROM alias and as every column
qualifier. sqlglot's default generator does not quote reserved keywords,
so the database rejected the statement with a syntax error. A second
surface: a model-qualified string like `grant."col"` re-parsed mid
pipeline made `sqlglot.parse_one` fall back to a `Command` statement
parse, leaking a `grant .` spacing into the emitted SQL.

Route every model-name identifier through a single `_model_identifier`
helper (`exp.to_identifier(name, quoted=True)`), so alias definitions and
qualifier references quote identically (matters on case-folding dialects
like Snowflake). The handful of raw-SQL-string sites that get re-parsed
(join conditions, the ranked-subquery `SELECT <model>.*` and JOIN, the
WHERE qualifier) quote the alias with the dialect's real quote char via
`quote_ident_for` — a hardcoded `"` is a string literal on MySQL/Spark.
The rerooted-filter qualifier stays bare (it is re-parsed by the
Python-AST `parse_filter`, where `"x"` is a string literal) and is quoted
later in `resolve_filter_columns`. The join-path-discovery regex is now
quote-tolerant.

Tests: new tests/test_reserved_keyword_quoting.py (reserved keywords x 6
dialects, incl. MySQL backticks — quoted form, clean parse, no Command
fallback warning); DuckDB + Postgres integration cases (reserved-keyword
table with mixed-case column, filter, join, cross-model measure);
existing SQL-snapshot assertions updated for the quoted output.
@whimo whimo force-pushed the whimo/fix-reserved-keyword-model-names branch from 537f9d3 to 759e680 Compare May 26, 2026 15:35
@sonarqubecloud
Copy link
Copy Markdown

Quality Gate Failed Quality Gate failed

Failed conditions
5.8% Duplication on New Code (required ≤ 3%)

See analysis details on SonarQube Cloud

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (1)
BUG-reserved-keyword-identifiers.md (1)

28-39: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add language specifier to fenced code block.

The fenced code block lacks a language identifier, triggering markdownlint MD040.

📝 Proposed fix
-```
+```text
 'grant."idempotencyKey"' contains unsupported syntax. Falling back to parsing as a 'Command'.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@BUG-reserved-keyword-identifiers.md` around lines 28 - 39, The fenced code
block showing the SQL error (the block starting with 'grant."idempotencyKey"'
and the following SQL/stack trace) is missing a language identifier which
triggers markdownlint MD040; update the fence opening from ``` to ```text (or
another appropriate language like ```sql if you prefer) so the block begins with
```text and keep the existing block contents unchanged to satisfy the linter and
preserve the example.
🧹 Nitpick comments (2)
slayer/engine/enrichment.py (1)

2463-2464: ⚡ Quick win

Use keyword arguments for quote_ident_for(...) calls.

These multi-parameter calls are currently positional in production code; switch to keyword form for repo consistency.

Suggested change
-                src_q = quote_ident_for(current_alias, dialect)
-                tgt_q = quote_ident_for(hop_alias, dialect)
+                src_q = quote_ident_for(name=current_alias, dialect=dialect)
+                tgt_q = quote_ident_for(name=hop_alias, dialect=dialect)
...
-                        repl_sql = f"{quote_ident_for(model_name, dialect)}.{sql_expr}"
+                        repl_sql = f"{quote_ident_for(name=model_name, dialect=dialect)}.{sql_expr}"
...
-                            repl_sql = f"{quote_ident_for(table_alias, dialect)}.{sql_expr}"
+                            repl_sql = f"{quote_ident_for(name=table_alias, dialect=dialect)}.{sql_expr}"
...
-                            repl_sql = f"{quote_ident_for(model.name, dialect)}.{stage_col.name}"
+                            repl_sql = f"{quote_ident_for(name=model.name, dialect=dialect)}.{stage_col.name}"

As per coding guidelines **/*.py: Use keyword arguments for functions with more than 1 parameter.

Also applies to: 2710-2710, 2801-2801, 2881-2881

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@slayer/engine/enrichment.py` around lines 2463 - 2464, The calls to
quote_ident_for are using positional arguments; update them to use keyword
arguments (e.g., quote_ident_for(identifier=current_alias, dialect=dialect)) for
the instances in this file (specifically where current_alias/hop_alias and
dialect are passed to quote_ident_for), and apply the same change at the other
occurrences mentioned (near the uses involving current_alias, hop_alias at the
other locations). Ensure you only rename the call signature to use identifier=
and dialect= keywords without changing variable names or behavior.
slayer/sql/generator.py (1)

72-79: ⚡ Quick win

Use keyword arguments for multi-parameter helper calls in production code.

There are new positional calls to helpers with multiple parameters; switch these to keyword style for consistency with repo rules.

♻️ Suggested change
-    return f"{quote_ident_for(alias, dialect)}.{col}"
+    return f"{quote_ident_for(name=alias, dialect=dialect)}.{col}"
@@
-        parts = [f"{quote_ident_for(model, self.dialect)}.*"]
+        parts = [f"{quote_ident_for(name=model, dialect=self.dialect)}.*"]
@@
-                            _qualify_in_string(model, col_name, self.dialect),
+                            _qualify_in_string(alias=model, col=col_name, dialect=self.dialect),

As per coding guidelines, "**/*.py: Use keyword arguments for functions with more than 1 parameter."

Also applies to: 1980-1980, 2596-2596

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@slayer/sql/generator.py` around lines 72 - 79, The _qualify_in_string helper
currently calls quote_ident_for(alias, dialect) using positional arguments;
change this to use keyword arguments to follow the repo rule for multi-parameter
helpers—call quote_ident_for(name=alias, dialect=dialect) (or the exact
parameter names of quote_ident_for) and similarly update any other positional
uses of quote_ident_for (and other multi-parameter helpers referenced at
locations like the ones called around lines 1980 and 2596) to use explicit
keyword arguments so the intention is clear and consistent.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@slayer/sql/generator.py`:
- Around line 2082-2084: The current code builds JOIN fragments by concatenating
strings from enriched.resolved_joins and quote_ident_for, which violates the
AST-first rule; replace that logic to construct sqlglot AST nodes instead: for
each (target_table, target_alias, join_cond, jtype) in enriched.resolved_joins
create a sqlglot.exp.Table (or sqlglot.exp.Identifier) for target_table, wrap
with sqlglot.exp.Alias for target_alias, build the join condition as a sqlglot
expression (parse or compose from existing expression nodes) and assemble a
sqlglot.exp.Join with the appropriate join type (jtype.upper()) and
.on(join_cond_expr); then attach these Join nodes to the main query AST rather
than emitting raw strings and stop using quote_ident_for for JOIN clauses so
dialect/escaping is handled by sqlglot.
- Around line 64-69: The function quote_ident_for currently builds an identifier
directly via exp.to_identifier instead of reusing the canonical
_model_identifier helper; change quote_ident_for to call _model_identifier(name)
and then return its SQL text with the requested dialect (e.g.
_model_identifier(name).sql(dialect=dialect or "postgres")) so all
model/alias/column quoting follows the single _model_identifier contract used
elsewhere.

---

Duplicate comments:
In `@BUG-reserved-keyword-identifiers.md`:
- Around line 28-39: The fenced code block showing the SQL error (the block
starting with 'grant."idempotencyKey"' and the following SQL/stack trace) is
missing a language identifier which triggers markdownlint MD040; update the
fence opening from ``` to ```text (or another appropriate language like ```sql
if you prefer) so the block begins with ```text and keep the existing block
contents unchanged to satisfy the linter and preserve the example.

---

Nitpick comments:
In `@slayer/engine/enrichment.py`:
- Around line 2463-2464: The calls to quote_ident_for are using positional
arguments; update them to use keyword arguments (e.g.,
quote_ident_for(identifier=current_alias, dialect=dialect)) for the instances in
this file (specifically where current_alias/hop_alias and dialect are passed to
quote_ident_for), and apply the same change at the other occurrences mentioned
(near the uses involving current_alias, hop_alias at the other locations).
Ensure you only rename the call signature to use identifier= and dialect=
keywords without changing variable names or behavior.

In `@slayer/sql/generator.py`:
- Around line 72-79: The _qualify_in_string helper currently calls
quote_ident_for(alias, dialect) using positional arguments; change this to use
keyword arguments to follow the repo rule for multi-parameter helpers—call
quote_ident_for(name=alias, dialect=dialect) (or the exact parameter names of
quote_ident_for) and similarly update any other positional uses of
quote_ident_for (and other multi-parameter helpers referenced at locations like
the ones called around lines 1980 and 2596) to use explicit keyword arguments so
the intention is clear and consistent.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: ef0d9be3-7b84-412b-a5a4-d9bac8fb9cb2

📥 Commits

Reviewing files that changed from the base of the PR and between 537f9d3 and 759e680.

📒 Files selected for processing (12)
  • BUG-reserved-keyword-identifiers.md
  • CLAUDE.md
  • slayer/engine/column_expansion.py
  • slayer/engine/enrichment.py
  • slayer/engine/query_engine.py
  • slayer/sql/generator.py
  • tests/integration/test_integration_duckdb.py
  • tests/integration/test_integration_postgres.py
  • tests/test_aggregation_gating.py
  • tests/test_cross_model_derived_columns.py
  • tests/test_reserved_keyword_quoting.py
  • tests/test_sql_generator.py
✅ Files skipped from review due to trivial changes (2)
  • slayer/engine/query_engine.py
  • CLAUDE.md

Comment thread slayer/sql/generator.py
Comment on lines +64 to +69
def quote_ident_for(name: str, dialect: Optional[str]) -> str:
"""Quoted-identifier text for ``name`` in ``dialect`` (double quotes / MySQL
backticks / T-SQL brackets). For raw-string sites re-parsed with the same
dialect — a hardcoded ``"`` is a string literal on backtick dialects.
"""
return exp.to_identifier(name, quoted=True).sql(dialect=dialect or "postgres")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Use _model_identifier() inside quote_ident_for() to keep one quoting contract.

This helper currently reimplements identifier creation instead of routing through _model_identifier, which can drift behavior over time.

♻️ Suggested change
 def quote_ident_for(name: str, dialect: Optional[str]) -> str:
@@
-    return exp.to_identifier(name, quoted=True).sql(dialect=dialect or "postgres")
+    return _model_identifier(name).sql(dialect=dialect or "postgres")

As per coding guidelines, "Every SLayer model name / FROM alias / column qualifier must be emitted as a quoted identifier using the _model_identifier(name) helper."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@slayer/sql/generator.py` around lines 64 - 69, The function quote_ident_for
currently builds an identifier directly via exp.to_identifier instead of reusing
the canonical _model_identifier helper; change quote_ident_for to call
_model_identifier(name) and then return its SQL text with the requested dialect
(e.g. _model_identifier(name).sql(dialect=dialect or "postgres")) so all
model/alias/column quoting follows the single _model_identifier contract used
elsewhere.

Comment thread slayer/sql/generator.py
Comment on lines +2082 to 2084
f"{jtype.upper()} JOIN {target_table} AS "
f"{quote_ident_for(target_alias, self.dialect)} ON {join_cond}"
for target_table, target_alias, join_cond, jtype in enriched.resolved_joins
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major | 🏗️ Heavy lift

Avoid raw JOIN string assembly in SQL generation.

Building JOIN fragments with string concatenation here bypasses the AST-first contract and can reintroduce dialect/escaping edge cases.

♻️ Suggested direction
-            join_sql_parts = [
-                f"{jtype.upper()} JOIN {target_table} AS "
-                f"{quote_ident_for(target_alias, self.dialect)} ON {join_cond}"
-                for target_table, target_alias, join_cond, jtype in enriched.resolved_joins
-            ]
-            ranked_sql += " " + " ".join(join_sql_parts)
+            parsed_ranked = self._parse(ranked_sql)
+            if isinstance(parsed_ranked, exp.Select):
+                for target_table, target_alias, join_cond, jtype in enriched.resolved_joins:
+                    join_target = exp.to_table(target_table, alias=_model_identifier(target_alias))
+                    parsed_ranked = parsed_ranked.join(
+                        join_target,
+                        on=self._parse_predicate(join_cond),
+                        join_type=jtype.upper(),
+                    )
+                ranked_sql = parsed_ranked.sql(dialect=self.dialect)

As per coding guidelines, "slayer/sql/**/*.py: Use sqlglot AST building (not string concatenation) for SQL generation."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@slayer/sql/generator.py` around lines 2082 - 2084, The current code builds
JOIN fragments by concatenating strings from enriched.resolved_joins and
quote_ident_for, which violates the AST-first rule; replace that logic to
construct sqlglot AST nodes instead: for each (target_table, target_alias,
join_cond, jtype) in enriched.resolved_joins create a sqlglot.exp.Table (or
sqlglot.exp.Identifier) for target_table, wrap with sqlglot.exp.Alias for
target_alias, build the join condition as a sqlglot expression (parse or compose
from existing expression nodes) and assemble a sqlglot.exp.Join with the
appropriate join type (jtype.upper()) and .on(join_cond_expr); then attach these
Join nodes to the main query AST rather than emitting raw strings and stop using
quote_ident_for for JOIN clauses so dialect/escaping is handled by sqlglot.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant