fix(tabular): raise SQLSyntaxError on unparseable SQL by DinaLaptii · Pull Request #2111 · NVIDIA/NeMo-Retriever

DinaLaptii · 2026-05-25T12:23:39Z

Previously extract_tables_and_columns swallowed sqlglot parse/token errors and returned an empty ExtractionResult, making truly invalid SQL indistinguishable from valid SQL that referenced no known tables. Both reached callers as the same silent "no tables found" signal, so the SQL-validation agent reported success on syntactically broken SQL.

Introduce a dedicated SQLSyntaxError(ValueError) and raise it when sqlglot rejects the input. Valid-but-unresolved SQL still returns the empty extraction as before, so callers can classify the two outcomes.
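A minimal sketch of how a caller might tell the two outcomes apart once the exception exists. The `extract_tables_and_columns` below is a toy stand-in (it treats empty input or unbalanced parentheses as a syntax error), not the real sqlglot-backed function. `KNOWN_TABLES`, the `tables` field, and `classify` are assumed names used only for illustration.

```python
from dataclasses import dataclass, field
import re


class SQLSyntaxError(ValueError):
    """Stand-in for the exception introduced by this PR."""


@dataclass
class ExtractionResult:
    # Hypothetical shape; the real ExtractionResult may carry more fields.
    tables: list[str] = field(default_factory=list)


KNOWN_TABLES = {"orders", "customers"}  # hypothetical schema


def extract_tables_and_columns(sql: str) -> ExtractionResult:
    """Toy stand-in: the real function delegates parsing to sqlglot."""
    if not sql.strip() or sql.count("(") != sql.count(")"):
        raise SQLSyntaxError(f"cannot parse: {sql!r}")
    referenced = re.findall(r"\b(?:FROM|JOIN)\s+(\w+)", sql, flags=re.IGNORECASE)
    return ExtractionResult([t for t in referenced if t.lower() in KNOWN_TABLES])


def classify(sql: str) -> str:
    """Map the three outcomes to distinct labels."""
    try:
        result = extract_tables_and_columns(sql)
    except SQLSyntaxError:
        return "syntax_error"
    return "ok" if result.tables else "no_known_tables"
```

Before this change, the `syntax_error` and `no_known_tables` branches would have been indistinguishable, because both arrived as an empty result.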

Description

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.
If adjusting docker-compose.yaml environment variables, have you ensured those are mimicked in the Helm values.yaml file?

greptile-apps · 2026-05-25T12:27:05Z

Greptile Summary

This PR fixes a silent-failure bug in extract_tables_and_columns: previously, sqlglot parse/token errors were swallowed and returned as an empty ExtractionResult, making invalid SQL indistinguishable from valid SQL that references no known tables — causing the SQL-validation agent to report success on broken input. A new SQLSyntaxError(ValueError) is now raised on parse failure, allowing callers (notably _sql_parse_validation) to correctly classify the two outcomes.

sqlglot_extractor.py: narrows the broad except Exception to except (ParseError, TokenError) and re-raises as SQLSyntaxError, a dedicated ValueError subclass with a clear docstring.
test_sqlglot_extractor.py: converts two returns-empty assertions to pytest.raises(SQLSyntaxError); imports pytest and SQLSyntaxError; one new join-section test is a duplicate of an existing table-section test.

Confidence Score: 5/5

Safe to merge; the change is narrow, well-tested, and the new exception propagates correctly through all existing callers.

The implementation is correct: ParseError and TokenError are the right sqlglot exception types to catch, SQLSyntaxError extends ValueError so existing broad except Exception boundaries in _sql_parse_validation and parse_queries_df still handle it, and valid-but-unresolved SQL continues to return an empty ExtractionResult without raising. The only finding is a duplicate test (the new test_empty_sql_raises_for_join_extraction duplicates test_empty_sql_raises_syntax_error), which does not affect correctness.
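The effect of narrowing the handler can be illustrated with a small sketch. `ParseError` and `TokenError` here are local stand-ins for the sqlglot exception types named above, and `parse_or_raise`, `broken_parser`, `buggy_parser`, and `outcome` are hypothetical helpers, not code from the PR.

```python
class ParseError(Exception):
    """Stand-in for sqlglot's ParseError."""


class TokenError(Exception):
    """Stand-in for sqlglot's TokenError."""


class SQLSyntaxError(ValueError):
    """Stand-in for the PR's new exception type."""


def parse_or_raise(parse, sql):
    """Mirror the narrowed handler: translate only parser errors."""
    try:
        return parse(sql)
    except (ParseError, TokenError) as err:
        # `from err` keeps the original error reachable via __cause__.
        raise SQLSyntaxError(str(err)) from err


def broken_parser(sql):
    raise ParseError("unexpected token")


def buggy_parser(sql):
    raise TypeError("bug unrelated to SQL syntax")


def outcome(parse):
    """Report which handler ends up catching the failure."""
    try:
        parse_or_raise(parse, "SELECT FROM")
    except ValueError as err:  # SQLSyntaxError is a ValueError
        return f"ValueError caught, cause={type(err.__cause__).__name__}"
    except Exception as err:  # anything else is no longer masked
        return f"other error: {type(err).__name__}"
    return "parsed"
```

In this sketch, a parser failure surfaces as a `ValueError` subclass with its cause attached, while an unrelated bug propagates under its own type instead of being converted into an empty result.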

No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| nemo_retriever/src/nemo_retriever/tabular_data/ingestion/parsers/sqlglot_extractor.py | Introduces `SQLSyntaxError(ValueError)` and narrows the parse-failure catch from `except Exception` to `except (ParseError, TokenError)`, re-raising as the new type; valid-but-empty extraction still returns normally. |
| nemo_retriever/tests/test_sqlglot_extractor.py | Converts two previously empty-return assertions to `pytest.raises(SQLSyntaxError)`, imports `SQLSyntaxError` and `pytest`; one new join-section test is an exact duplicate of a test in the table section. |

Sequence Diagram

sequenceDiagram
    participant Agent as SQLValidationAgent
    participant PQS as parse_query_single
    participant PQL as parse_query_slim
    participant ETC as extract_tables_and_columns
    participant SG as sqlglot.parse_one

    Agent->>PQS: parse_query_single(sql, dialect, schemas)
    PQS->>PQL: parse_query_slim(sql, query_obj, ...)
    PQL->>ETC: extract_tables_and_columns(sql, dialect, ...)
    ETC->>SG: "parse_one(sql, dialect=dialect)"

    alt Valid SQL
        SG-->>ETC: Expression
        ETC-->>PQL: ExtractionResult
        PQL-->>PQS: True (tables found) / False (no tables)
        PQS-->>Agent: Query or None
        Agent-->>Agent: "result success = True"
    else Syntactically broken SQL
        SG-->>ETC: raises ParseError / TokenError
        ETC-->>PQL: raises SQLSyntaxError (NEW)
        PQL-->>PQS: raises SQLSyntaxError
        PQS-->>Agent: raises SQLSyntaxError
        Agent-->>Agent: "result error = str(err) was silently success before"
    end
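
The propagation shown in the diagram can be sketched in plain Python. The function names follow the diagram's participants, but the signatures are simplified (the real ones also take a dialect and schemas), the parse rule is a toy, and `ValidationResult` and `validate` are assumed names for the agent boundary.

```python
from dataclasses import dataclass
from typing import Optional


class SQLSyntaxError(ValueError):
    """Stand-in for the PR's exception type."""


@dataclass
class ValidationResult:
    # Hypothetical result shape for the agent boundary.
    success: bool
    error: Optional[str] = None


def extract_tables_and_columns(sql: str) -> list[str]:
    # Toy rule standing in for the sqlglot parse step.
    if not sql.strip():
        raise SQLSyntaxError("empty SQL")
    return ["orders"] if "orders" in sql.lower() else []


def parse_query_slim(sql: str) -> bool:
    # No try/except here: a syntax error passes straight through.
    return bool(extract_tables_and_columns(sql))


def parse_query_single(sql: str) -> Optional[str]:
    # Returns a query object (here just the SQL text) or None.
    return sql if parse_query_slim(sql) else None


def validate(sql: str) -> ValidationResult:
    # Agent boundary: the only layer that converts exceptions to a result.
    try:
        parse_query_single(sql)
    except Exception as err:
        return ValidationResult(success=False, error=str(err))
    return ValidationResult(success=True)
```

Because the intermediate layers do not catch anything, the only place the error is turned into a result is the outermost boundary, which matches the "Syntactically broken SQL" branch of the diagram.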

Prompt To Fix All With AI

Fix the following 1 code review issue. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 1
nemo_retriever/tests/test_sqlglot_extractor.py:535-538
**Duplicate test adds no coverage**

`test_empty_sql_raises_for_join_extraction` calls `extract_tables_and_columns("", ...)` and asserts `SQLSyntaxError` — exactly what `test_empty_sql_raises_syntax_error` (line 229) already tests. The original `test_empty_sql_returns_no_joins` was valuable because it checked join-specific output; replacing it with a duplicate of an existing test leaves the join section with no positive coverage for the no-join happy path (e.g. valid SQL with no `JOIN` clause should still return `result.joins == []`).

Reviews (2): Last reviewed commit: "test(tabular): assert SQLSyntaxError on ..."

greptile-apps · 2026-05-25T12:27:12Z

+class SQLSyntaxError(ValueError):
+    """Raised when ``sqlglot`` cannot parse the SQL for the given dialect.
+
+    Distinguishes pure syntax/tokenization failures from later resolution
+    issues (e.g. unknown tables, schema mismatches), so callers can classify
+    validation outcomes precisely instead of treating an unparseable query
+    the same as a parseable query that references no known tables.
+    """


SQLSyntaxError not re-exported for callers

The PR's stated goal is to let callers "classify the two outcomes" — but SQLSyntaxError is only defined here in the implementation module. The parsers __init__.py is empty, and neither queries.py nor sql_parse_validation.py imports or re-exports it. Any caller that wants to catch SQLSyntaxError by name (rather than the broad except Exception used today) must know to import from nemo_retriever.tabular_data.ingestion.parsers.sqlglot_extractor directly, making the new type effectively invisible in the public surface. Adding it to the parsers __init__.py would make it discoverable.

Rule Used: Every public class and function in nemo_retriever ... (source)
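
One way the suggested re-export could look is sketched below. To keep the example runnable on its own, it builds a throwaway package on disk; `demo_parsers` is a hypothetical name standing in for the real parsers package, and the two generated files only imitate the layout described in the comment.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway package mirroring the layout in the comment:
# demo_parsers/__init__.py re-exports from demo_parsers/sqlglot_extractor.py.
root = Path(tempfile.mkdtemp())
pkg = root / "demo_parsers"  # hypothetical package name
pkg.mkdir()
(pkg / "sqlglot_extractor.py").write_text(
    "class SQLSyntaxError(ValueError):\n"
    "    pass\n"
)
(pkg / "__init__.py").write_text(
    "from .sqlglot_extractor import SQLSyntaxError\n"
    "\n"
    "__all__ = ['SQLSyntaxError']\n"
)

sys.path.insert(0, str(root))
importlib.invalidate_caches()
demo_parsers = importlib.import_module("demo_parsers")
extractor = importlib.import_module("demo_parsers.sqlglot_extractor")

# Both import paths resolve to the same class object, so an
# `except demo_parsers.SQLSyntaxError` also catches errors raised inside
# the implementation module.
same_class = demo_parsers.SQLSyntaxError is extractor.SQLSyntaxError
```

With a re-export like this, callers could import the exception from the package rather than from the implementation module.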

Path: nemo_retriever/src/nemo_retriever/tabular_data/ingestion/parsers/sqlglot_extractor.py Line: 35-42

greptile-apps · 2026-05-25T12:27:13Z

    try:
        statement = sqlglot.parse_one(sql, dialect=dialect)
-    except Exception:
-        return ExtractionResult()
+    except (ParseError, TokenError) as err:
+        raise SQLSyntaxError(str(err)) from err


No tests for the new exception path

test_sqlglot_extractor.py tests happy-path extraction but has no test covering the new SQLSyntaxError raise. Per the test-coverage-new-code rule, new error paths must have corresponding unit tests — at a minimum one test that passes broken SQL (e.g. "SELECT FROM") and asserts pytest.raises(SQLSyntaxError), and one that confirms valid SQL with no matching tables still returns an empty ExtractionResult without raising.

Rule Used: New functionality must include corresponding unit ... (source)
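
The two tests requested in this comment might look roughly like the sketch below. It uses `unittest` to stay within the standard library, whereas the PR's own tests use `pytest.raises`. The extractor is a toy stand-in whose parse rule and `known_tables` parameter are assumptions made for the example.

```python
import unittest


class SQLSyntaxError(ValueError):
    """Stand-in for the PR's exception type."""


def extract_tables_and_columns(sql, known_tables=()):
    """Toy stand-in; only models the two behaviours under test."""
    tokens = sql.split()
    # "SELECT FROM" style input: nothing between SELECT and FROM.
    if not tokens or [t.upper() for t in tokens[:2]] == ["SELECT", "FROM"]:
        raise SQLSyntaxError(f"cannot parse: {sql!r}")
    return [t for t in tokens if t.lower() in known_tables]


class ExtractorContractTest(unittest.TestCase):
    def test_broken_sql_raises(self):
        with self.assertRaises(SQLSyntaxError):
            extract_tables_and_columns("SELECT FROM")

    def test_empty_sql_raises(self):
        with self.assertRaises(SQLSyntaxError):
            extract_tables_and_columns("")

    def test_valid_sql_without_known_tables_is_empty(self):
        result = extract_tables_and_columns(
            "SELECT * FROM mystery", known_tables={"orders"}
        )
        self.assertEqual(result, [])


def run_suite():
    """Run the three tests and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ExtractorContractTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Pairing a raising case with a valid-but-unresolved case pins down both halves of the new contract, so a regression back to the silent empty result would fail the first two tests.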

Path: nemo_retriever/src/nemo_retriever/tabular_data/ingestion/parsers/sqlglot_extractor.py Line: 518-521

Update sqlglot_extractor tests to match the new contract introduced in the previous commit: empty or unparseable SQL now raises SQLSyntaxError instead of silently returning an empty ExtractionResult, so the two "silent empty result" assertions become pytest.raises checks.

jioffe502

per liav approval

DinaLaptii requested review from a team as code owners May 25, 2026 12:23

DinaLaptii requested a review from drobison00 May 25, 2026 12:23

greptile-apps Bot reviewed May 25, 2026

liavnave approved these changes May 27, 2026

jioffe502 approved these changes May 29, 2026

DinaLaptii wants to merge 2 commits into NVIDIA:main from ftatiana-nv:fix/add-exceptions-to-sql-validation


3 participants
