fix(valuediff): normalise column case before adapter.quote() for Snowflake (DRC-3464) by iamcxa · Pull Request #1388 · DataRecce/recce

iamcxa · 2026-05-21T09:45:59Z

Summary

value_diff was failing on Snowflake with SQL compilation error: invalid identifier '"customer_id"'. The check params carry lowercase column names (from the dbt manifest convention), but Snowflake folds unquoted DDL identifiers to uppercase (CUSTOMER_ID). Calling adapter.quote("customer_id") produced the case-sensitive '"customer_id"' identifier, which Snowflake cannot resolve against its physical uppercase column.

Root cause

Two failure sites in recce/tasks/valuediff.py, both invoked from execute():

_verify_primary_key composite path (when primary_key is a list) — builds inline Jinja that calls adapter.quote(col) over the raw user-supplied list.
_query_value_diff — calls adapter.quote(field) / adapter.quote(column_to_compare) on the surrogate-key and column-comparison clauses.

The scalar primary_key path of _verify_primary_key uses adapter.dispatch('test_unique', 'dbt') which renders {{ column_name }} verbatim (unquoted) and so passes Snowflake case-insensitively. Same pattern in ValueDiffDetailTask. Other check tasks (profile_diff, histogram_diff, top_k_diff, row_count_diff, query_diff, schema_diff) are unaffected — profile_diff accidentally avoids the bug by sourcing column names from get_columns() (uppercase on Snowflake) rather than from check params.

Fix

Add a ValueDiffMixin._build_column_case_lookup() helper that calls dbt_adapter.get_columns() (base + current relations) and returns a {lower(physical_name): physical_name} lookup. _normalise_identifier() maps a user-supplied lowercase identifier to its physical catalog case, falling back to the original string when the column is unknown (preserves current behaviour for unknown columns instead of masking errors).

In ValueDiffTask.execute() and ValueDiffDetailTask.execute(), the lookup is built FIRST and the primary_key (and columns list when present) is normalised BEFORE _verify_primary_key() is called — so both _verify_primary_key's composite path AND _query_value_diff's adapter.quote() calls receive the physical catalog case. Mirrors the get_columns() pattern already used by profile_diff (recce/tasks/profile.py:222).

Tests

recce/tests/tasks/test_valuediff.py:

TestBuildColumnCaseLookup — 4 unit tests for the new helper (case-insensitive lookup, base+current relation merge, unknown-identifier passthrough, exception fallback).
test_value_diff_snowflake_uppercase_columns + test_value_diff_detail_snowflake_uppercase_columns — scalar primary_key on Snowflake: mock get_columns() returns uppercase columns; capture executed SQL; assert "CUSTOMER_ID" present, "customer_id" absent.
test_value_diff_snowflake_composite_primary_key + test_value_diff_detail_snowflake_composite_primary_key — composite primary_key on Snowflake; assert the SQL executed by _verify_primary_key's composite path contains uppercase quoted identifiers.

Pre-fix rigor (rebuild verified independently):

Revert valuediff.py to before either fix commit → all 4 Snowflake regression tests FAIL with verbatim AssertionError: '"CUSTOMER_ID"' in 'with a_query as (select "customer_id", ...)'.
Restore the fix → all 21 valuediff tests PASS in 4.64s.

Full tasks suite (pytest tests/tasks/ -q): 116 passed, 1 pre-existing portalocker warning (unrelated).

Audit trail

Investigated through DataRecce's internal debug-flywheel workflow as case 019. Stages: reproduce → discover → hypothesize → verify → fix-and-harden (cycle 1 after codex review caught the composite path gap on the first fix attempt).

Linear: DRC-3464

…flake On Snowflake with default quoting config, physical columns are stored uppercase (e.g. CUSTOMER_ID). user-supplied check params follow the dbt manifest convention (lowercase, e.g. customer_id). Passing them directly to adapter.quote() produces '"customer_id"' — a case-sensitive identifier that Snowflake cannot resolve, raising: SQL compilation error: invalid identifier '"customer_id"' Fix: add ValueDiffMixin._build_column_case_lookup() that calls get_columns() and builds a {lower(physical_name): physical_name} lookup. Before any adapter.quote() call in ValueDiffTask._query_value_diff and ValueDiffDetailTask._query_value_diff, each user-supplied identifier is mapped through the lookup so adapter.quote() receives the physical name ("CUSTOMER_ID") and produces a valid quoted identifier. Pass-through for identifiers not found in the catalog preserves existing behaviour for unknown columns rather than masking errors. Mirrors the pattern profile_diff already uses via get_columns(). Regression tests: test_value_diff_snowflake_uppercase_columns and test_value_diff_detail_snowflake_uppercase_columns patch get_columns() to return uppercase names (simulating Snowflake) and assert the generated SQL contains '"CUSTOMER_ID"' and NOT '"customer_id"'. Closes DRC-3464. Signed-off-by: Kent <kentchen@reccehq.com> Signed-off-by: Kent <iamcxa@gmail.com>

…y_key (DRC-3464 cycle 1) When primary_key is a list, execute() now builds the column-case lookup and normalises keys BEFORE calling _verify_primary_key, so the composite path's adapter.quote() calls receive physical catalog names (e.g. CUSTOMER_ID on Snowflake) instead of user-supplied lowercase identifiers. Also adds regression tests test_value_diff_snowflake_composite_primary_key and test_value_diff_detail_snowflake_composite_primary_key that patch dbt_adapter.adapter.execute (the low-level adapter used by _verify_primary_key) to capture and assert the generated SQL uses uppercase quoted identifiers. PRE-FIX: both tests FAIL (sql contains lowercase quoted identifiers) POST-FIX: both tests PASS (sql contains uppercase quoted identifiers) Signed-off-by: Kent <iamcxa@gmail.com>

Copilot

Pull request overview

Fixes Snowflake value_diff failures caused by quoting lowercase identifiers (from dbt manifest conventions) against physically uppercase columns, by normalizing user-supplied column identifiers to catalog/physical case before adapter.quote() is applied.

Changes:

Added a catalog-driven {lower(physical): physical} lookup helper and identifier normalization in ValueDiffTask and ValueDiffDetailTask.
Normalized primary_key before _verify_primary_key() to fix the composite primary-key quoting path.
Added Snowflake-focused regression tests that patch get_columns() to return uppercase columns and assert generated SQL uses uppercase quoted identifiers.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File	Description
`recce/tasks/valuediff.py`	Adds catalog case lookup + normalizes PK/columns before quoting to avoid Snowflake invalid identifier errors.
`tests/tasks/test_valuediff.py`	Adds unit tests for the lookup/normalization helpers and Snowflake regression tests capturing executed SQL.

Comments suppressed due to low confidence (1)

recce/tasks/valuediff.py:431

As in ValueDiffTask._query_value_diff, case_lookup is built unconditionally (two get_columns() macro executions) even when columns is None and the function already calls get_columns() again to compute common columns. Since value_diff can run frequently and get_columns() may hit INFORMATION_SCHEMA, consider avoiding redundant catalog calls by building the lookup only when normalising user-supplied identifiers, or reusing the already-fetched column lists to construct the lookup.

        # Normalise user-supplied identifiers to physical catalog names before quoting.
        # On Snowflake (default quoting config), physical columns are stored uppercase;
        # adapter.quote("customer_id") → '"customer_id"' which Snowflake cannot resolve.
        case_lookup = self._build_column_case_lookup(dbt_adapter, model)

        if columns is None or len(columns) == 0:
            base_columns = [column.column for column in dbt_adapter.get_columns(model, base=True)]
            curr_columns = [column.column for column in dbt_adapter.get_columns(model, base=False)]
            columns = [column for column in base_columns if column in curr_columns]
        else:
            columns = [self._normalise_identifier(c, case_lookup) for c in columns]

iamcxa · 2026-05-21T14:04:10Z

+        # Build a {lower(physical_name): physical_name} lookup from the catalog so that
+        # user-supplied lowercase identifiers (e.g. "customer_id") are mapped to the
+        # physical name the database actually stores (e.g. "CUSTOMER_ID" on Snowflake).
+        # This must happen before get_columns() is called for the columns-list path so
+        # that the lookup covers both branches uniformly.
+        case_lookup = self._build_column_case_lookup(dbt_adapter, model)
+
        if columns is None or len(columns) == 0:
            base_columns = [column.column for column in dbt_adapter.get_columns(model, base=True)]
            curr_columns = [column.column for column in dbt_adapter.get_columns(model, base=False)]
            columns = [column for column in base_columns if column in curr_columns]
+        else:


Fixed in bc1fcb0: added optional case_lookup parameter to both _query_value_diff() methods; execute() now builds the lookup once and passes it down, reducing get_columns() calls from 4× to 2× per value_diff run.

iamcxa · 2026-05-21T14:04:16Z

+            # _query_value_diff rebuilds case_lookup internally; passing the already-
+            # normalised primary_key here is a defensive duplicate that keeps both
+            # paths consistent and avoids a second get_columns() round-trip on the
+            # happy path (the internal rebuild still runs for the columns list).


Fixed in bc1fcb0: the misleading comment at line 346 has been removed. The comment claimed the internal rebuild was still running; now that case_lookup is passed through from execute(), the comment is no longer needed and was deleted.

codecov · 2026-05-21T09:53:31Z

Codecov Report

❌ Patch coverage is 98.51852% with 2 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
recce/tasks/valuediff.py	93.93%	2 Missing ⚠️

Files with missing lines	Coverage Δ
tests/tasks/test_valuediff.py	`99.64% <100.00%> (+0.19%)`	⬆️
recce/tasks/valuediff.py	`81.93% <93.93%> (+1.73%)`	⬆️

... and 5 files with indirect coverage changes

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

…umns() (DRC-3464 PR #1388 review) Signed-off-by: Kent <iamcxa@gmail.com>

iamcxa added 2 commits May 20, 2026 14:44

iamcxa self-assigned this May 21, 2026

iamcxa requested a review from Copilot May 21, 2026 09:48

Copilot started reviewing on behalf of iamcxa May 21, 2026 09:48 View session

Copilot AI reviewed May 21, 2026

View reviewed changes

refactor(valuediff): pass case_lookup down to avoid redundant get_col…

bc1fcb0

…umns() (DRC-3464 PR #1388 review) Signed-off-by: Kent <iamcxa@gmail.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(valuediff): normalise column case before adapter.quote() for Snowflake (DRC-3464)#1388

fix(valuediff): normalise column case before adapter.quote() for Snowflake (DRC-3464)#1388
iamcxa wants to merge 3 commits into
mainfrom
fix/drc-3464-value-diff-snowflake-quoting

iamcxa commented May 21, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

iamcxa May 21, 2026

Uh oh!

iamcxa May 21, 2026

Uh oh!

codecov Bot commented May 21, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

iamcxa commented May 21, 2026

Summary

Root cause

Fix

Tests

Audit trail

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

iamcxa May 21, 2026

Choose a reason for hiding this comment

Uh oh!

iamcxa May 21, 2026

Choose a reason for hiding this comment

Uh oh!

codecov Bot commented May 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

codecov Bot commented May 21, 2026 •

edited

Loading