fix(tabular): raise SQLSyntaxError on unparseable SQL #2111

Open
DinaLaptii wants to merge 2 commits into NVIDIA:main from ftatiana-nv:fix/add-exceptions-to-sql-validation

Conversation

@DinaLaptii
Contributor

Previously extract_tables_and_columns swallowed sqlglot parse/token errors and returned an empty ExtractionResult, making truly invalid SQL indistinguishable from valid SQL that referenced no known tables. Both reached callers as the same silent "no tables found" signal, so the SQL-validation agent reported success on syntactically broken SQL.

Introduce a dedicated SQLSyntaxError(ValueError) and raise it when sqlglot rejects the input. Valid-but-unresolved SQL still returns the empty extraction as before, so callers can classify the two outcomes.
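
The behavior described above can be sketched in a few lines. This is a minimal, self-contained illustration and not the PR's actual code: `ParseError` and `TokenError` are local stand-ins for the sqlglot exceptions of the same names, `toy_parse_one` is a hypothetical placeholder for `sqlglot.parse_one`, `extract_tables_sketch` uses a simplified signature, and the `ExtractionResult` fields are assumed for the example.

```python
from dataclasses import dataclass, field


class ParseError(Exception):
    """Stand-in for sqlglot.errors.ParseError (assumption for this sketch)."""


class TokenError(Exception):
    """Stand-in for sqlglot.errors.TokenError (assumption for this sketch)."""


class SQLSyntaxError(ValueError):
    """Raised when the SQL cannot be parsed at all."""


@dataclass
class ExtractionResult:
    # Field names are illustrative, not the real class layout.
    tables: list = field(default_factory=list)
    joins: list = field(default_factory=list)


def toy_parse_one(sql: str) -> list:
    """Hypothetical parser: accepts only 'SELECT <cols> FROM <name>'."""
    tokens = sql.split()
    if len(tokens) < 4 or tokens[0].upper() != "SELECT" or tokens[-2].upper() != "FROM":
        raise ParseError(f"cannot parse: {sql!r}")
    return tokens


def extract_tables_sketch(sql: str, known_tables: set) -> ExtractionResult:
    try:
        tokens = toy_parse_one(sql)
    except (ParseError, TokenError) as err:
        # Narrow catch: only parser failures become SQLSyntaxError,
        # and the original error stays attached as __cause__.
        raise SQLSyntaxError(str(err)) from err
    table = tokens[-1]
    # Valid SQL that references no known table still returns an empty result.
    return ExtractionResult(tables=[table] if table in known_tables else [])
```

With `known_tables={"orders"}`, the query `SELECT id FROM nowhere` yields an empty `tables` list, while `SELECT FROM` raises `SQLSyntaxError`, so the two outcomes no longer look alike to the caller.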

Description

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.
  • If adjusting docker-compose.yaml environment variables, have you ensured those are mirrored in the Helm values.yaml file?

DinaLaptii requested review from a team as code owners May 25, 2026 12:23
DinaLaptii requested a review from drobison00 May 25, 2026 12:23
@greptile-apps
Contributor

greptile-apps Bot commented May 25, 2026

Greptile Summary

This PR fixes a silent-failure bug in extract_tables_and_columns: previously, sqlglot parse/token errors were swallowed and returned as an empty ExtractionResult, making invalid SQL indistinguishable from valid SQL that references no known tables — causing the SQL-validation agent to report success on broken input. A new SQLSyntaxError(ValueError) is now raised on parse failure, allowing callers (notably _sql_parse_validation) to correctly classify the two outcomes.

  • sqlglot_extractor.py: narrows the broad except Exception to except (ParseError, TokenError) and re-raises as SQLSyntaxError, a dedicated ValueError subclass with a clear docstring.
  • test_sqlglot_extractor.py: converts two returns-empty assertions to pytest.raises(SQLSyntaxError); imports pytest and SQLSyntaxError; one new join-section test is a duplicate of an existing table-section test.

Confidence Score: 5/5

Safe to merge; the change is narrow, well-tested, and the new exception propagates correctly through all existing callers.

The implementation is correct: ParseError and TokenError are the right sqlglot exception types to catch, SQLSyntaxError extends ValueError so existing broad except Exception boundaries in _sql_parse_validation and parse_queries_df still handle it, and valid-but-unresolved SQL continues to return an empty ExtractionResult without raising. The only finding is a duplicate test (the new test_empty_sql_raises_for_join_extraction duplicates test_empty_sql_raises_syntax_error), which does not affect correctness.
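
As a rough illustration of why subclassing `ValueError` keeps broad handlers working while still allowing a narrower one, consider the sketch below. `fake_extract`, `legacy_caller`, and `classifying_caller` are hypothetical names introduced only for this example; they do not reproduce the internals of `_sql_parse_validation` or `parse_queries_df`, which are not shown in this PR.

```python
class SQLSyntaxError(ValueError):
    """Same base class as the exception added in this PR."""


def fake_extract(sql: str) -> list:
    """Hypothetical extractor: raises on blank input, else returns known tables."""
    if not sql.strip():
        raise SQLSyntaxError("empty SQL")
    return ["orders"] if "orders" in sql else []


def legacy_caller(sql: str) -> str:
    # A pre-existing broad handler still catches the new exception type.
    try:
        fake_extract(sql)
    except Exception as err:
        return f"error: {err}"
    return "ok"


def classifying_caller(sql: str) -> str:
    # A caller that wants the distinction catches the specific type.
    try:
        tables = fake_extract(sql)
    except SQLSyntaxError:
        return "syntax_error"
    return "resolved" if tables else "no_known_tables"
```

The second caller separates three outcomes (unparseable, parseable but unresolved, resolved) that the old empty-result behavior collapsed into two.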

No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| nemo_retriever/src/nemo_retriever/tabular_data/ingestion/parsers/sqlglot_extractor.py | Introduces SQLSyntaxError(ValueError) and narrows the parse-failure catch from except Exception to except (ParseError, TokenError), re-raising as the new type; valid-but-empty extraction still returns normally. |
| nemo_retriever/tests/test_sqlglot_extractor.py | Converts two previously empty-return assertions to pytest.raises(SQLSyntaxError), imports SQLSyntaxError and pytest; one new join-section test is an exact duplicate of a test in the table section. |

Sequence Diagram

sequenceDiagram
    participant Agent as SQLValidationAgent
    participant PQS as parse_query_single
    participant PQL as parse_query_slim
    participant ETC as extract_tables_and_columns
    participant SG as sqlglot.parse_one

    Agent->>PQS: parse_query_single(sql, dialect, schemas)
    PQS->>PQL: parse_query_slim(sql, query_obj, ...)
    PQL->>ETC: extract_tables_and_columns(sql, dialect, ...)
    ETC->>SG: "parse_one(sql, dialect=dialect)"

    alt Valid SQL
        SG-->>ETC: Expression
        ETC-->>PQL: ExtractionResult
        PQL-->>PQS: True (tables found) / False (no tables)
        PQS-->>Agent: Query or None
        Agent-->>Agent: "result success = True"
    else Syntactically broken SQL
        SG-->>ETC: raises ParseError / TokenError
        ETC-->>PQL: raises SQLSyntaxError (NEW)
        PQL-->>PQS: raises SQLSyntaxError
        PQS-->>Agent: raises SQLSyntaxError
        Agent-->>Agent: "result error = str(err) was silently success before"
    end
Prompt To Fix All With AI
Fix the following 1 code review issue. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 1
nemo_retriever/tests/test_sqlglot_extractor.py:535-538
**Duplicate test adds no coverage**

`test_empty_sql_raises_for_join_extraction` calls `extract_tables_and_columns("", ...)` and asserts `SQLSyntaxError` — exactly what `test_empty_sql_raises_syntax_error` (line 229) already tests. The original `test_empty_sql_returns_no_joins` was valuable because it checked join-specific output; replacing it with a duplicate of an existing test leaves the join section with no positive coverage for the no-join happy path (e.g. valid SQL with no `JOIN` clause should still return `result.joins == []`).

Reviews (2): Last reviewed commit: "test(tabular): assert SQLSyntaxError on ..."
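
One possible shape for the join-section coverage that the issue above asks for is sketched here. It is only an illustration: `stub_extract` is a hypothetical stand-in for `extract_tables_and_columns` (whose real signature also takes a dialect and other arguments), and the tests use plain `try`/`except` rather than `pytest.raises` so the block runs without third-party packages.

```python
from types import SimpleNamespace


class SQLSyntaxError(ValueError):
    """Stand-in with the same base class as the PR's exception."""


def stub_extract(sql: str) -> SimpleNamespace:
    """Hypothetical stand-in for extract_tables_and_columns."""
    if not sql.strip():
        raise SQLSyntaxError("empty SQL")
    words = sql.upper().split()
    # Record one entry per JOIN keyword; enough to exercise the tests below.
    return SimpleNamespace(joins=[i for i, w in enumerate(words) if w == "JOIN"])


def test_empty_sql_raises_syntax_error():
    try:
        stub_extract("")
    except SQLSyntaxError:
        return
    raise AssertionError("expected SQLSyntaxError")


def test_valid_sql_without_join_returns_no_joins():
    # The no-join happy path the review wants kept in the join section.
    assert stub_extract("SELECT id FROM orders").joins == []


def test_valid_sql_with_join_reports_it():
    result = stub_extract("SELECT * FROM a JOIN b ON a.id = b.id")
    assert len(result.joins) == 1
```

Keeping the empty-input check in one place and giving the join section its own positive assertion avoids the duplicate while preserving coverage of `result.joins == []`.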

Comment on lines +35 to +42
class SQLSyntaxError(ValueError):
    """Raised when ``sqlglot`` cannot parse the SQL for the given dialect.

    Distinguishes pure syntax/tokenization failures from later resolution
    issues (e.g. unknown tables, schema mismatches), so callers can classify
    validation outcomes precisely instead of treating an unparseable query
    the same as a parseable query that references no known tables.
    """

P2 SQLSyntaxError not re-exported for callers

The PR's stated goal is to let callers "classify the two outcomes" — but SQLSyntaxError is only defined here in the implementation module. The parsers __init__.py is empty, and neither queries.py nor sql_parse_validation.py imports or re-exports it. Any caller that wants to catch SQLSyntaxError by name (rather than the broad except Exception used today) must know to import from nemo_retriever.tabular_data.ingestion.parsers.sqlglot_extractor directly, making the new type effectively invisible in the public surface. Adding it to the parsers __init__.py would make it discoverable.

Rule Used: Every public class and function in nemo_retriever ... (source: https://app.greptile.com/review/custom-context?memory=public-api-contract)

Comment on lines 518 to +521
  try:
      statement = sqlglot.parse_one(sql, dialect=dialect)
- except Exception:
-     return ExtractionResult()
+ except (ParseError, TokenError) as err:
+     raise SQLSyntaxError(str(err)) from err

P2 No tests for the new exception path

test_sqlglot_extractor.py tests happy-path extraction but has no test covering the new SQLSyntaxError raise. Per the test-coverage-new-code rule, new error paths must have corresponding unit tests — at a minimum one test that passes broken SQL (e.g. "SELECT FROM") and asserts pytest.raises(SQLSyntaxError), and one that confirms valid SQL with no matching tables still returns an empty ExtractionResult without raising.

Rule Used: New functionality must include corresponding unit ... (source: https://app.greptile.com/review/custom-context?memory=test-coverage-new-code)

Update sqlglot_extractor tests to match the new contract introduced in
the previous commit: empty or unparseable SQL now raises SQLSyntaxError
instead of silently returning an empty ExtractionResult, so the two
"silent empty result" assertions become pytest.raises checks.
Collaborator

@jioffe502 jioffe502 left a comment

per liav approval

3 participants