fix(tabular): raise SQLSyntaxError on unparseable SQL#2111
Conversation
Previously `extract_tables_and_columns` swallowed sqlglot parse/token errors and returned an empty `ExtractionResult`, making truly invalid SQL indistinguishable from valid SQL that referenced no known tables. Both reached callers as the same silent "no tables found" signal, so the SQL-validation agent reported success on syntactically broken SQL. Introduce a dedicated `SQLSyntaxError(ValueError)` and raise it when sqlglot rejects the input. Valid-but-unresolved SQL still returns the empty extraction as before, so callers can classify the two outcomes.
Greptile SummaryThis PR fixes a silent-failure bug in
|
| Filename | Overview |
|---|---|
| nemo_retriever/src/nemo_retriever/tabular_data/ingestion/parsers/sqlglot_extractor.py | Introduces SQLSyntaxError(ValueError) and narrows the parse-failure catch from except Exception to except (ParseError, TokenError), re-raising as the new type; valid-but-empty extraction still returns normally. |
| nemo_retriever/tests/test_sqlglot_extractor.py | Converts two previously empty-return assertions to pytest.raises(SQLSyntaxError), imports SQLSyntaxError and pytest; one new join-section test is an exact duplicate of a test in the table section. |
Sequence Diagram
sequenceDiagram
participant Agent as SQLValidationAgent
participant PQS as parse_query_single
participant PQL as parse_query_slim
participant ETC as extract_tables_and_columns
participant SG as sqlglot.parse_one
Agent->>PQS: parse_query_single(sql, dialect, schemas)
PQS->>PQL: parse_query_slim(sql, query_obj, ...)
PQL->>ETC: extract_tables_and_columns(sql, dialect, ...)
ETC->>SG: "parse_one(sql, dialect=dialect)"
alt Valid SQL
SG-->>ETC: Expression
ETC-->>PQL: ExtractionResult
PQL-->>PQS: True (tables found) / False (no tables)
PQS-->>Agent: Query or None
Agent-->>Agent: "result success = True"
else Syntactically broken SQL
SG-->>ETC: raises ParseError / TokenError
ETC-->>PQL: raises SQLSyntaxError (NEW)
PQL-->>PQS: raises SQLSyntaxError
PQS-->>Agent: raises SQLSyntaxError
Agent-->>Agent: "result error = str(err) was silently success before"
end
Prompt To Fix All With AI
Fix the following 1 code review issue. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 1
nemo_retriever/tests/test_sqlglot_extractor.py:535-538
**Duplicate test adds no coverage**
`test_empty_sql_raises_for_join_extraction` calls `extract_tables_and_columns("", ...)` and asserts `SQLSyntaxError` — exactly what `test_empty_sql_raises_syntax_error` (line 229) already tests. The original `test_empty_sql_returns_no_joins` was valuable because it checked join-specific output; replacing it with a duplicate of an existing test leaves the join section with no positive coverage for the no-join happy path (e.g. valid SQL with no `JOIN` clause should still return `result.joins == []`).
Reviews (2): Last reviewed commit: "test(tabular): assert SQLSyntaxError on ..." | Re-trigger Greptile
| class SQLSyntaxError(ValueError): | ||
| """Raised when ``sqlglot`` cannot parse the SQL for the given dialect. | ||
|
|
||
| Distinguishes pure syntax/tokenization failures from later resolution | ||
| issues (e.g. unknown tables, schema mismatches), so callers can classify | ||
| validation outcomes precisely instead of treating an unparseable query | ||
| the same as a parseable query that references no known tables. | ||
| """ |
There was a problem hiding this comment.
SQLSyntaxError not re-exported for callers
The PR's stated goal is to let callers "classify the two outcomes" — but SQLSyntaxError is only defined here in the implementation module. The parsers __init__.py is empty, and neither queries.py nor sql_parse_validation.py imports or re-exports it. Any caller that wants to catch SQLSyntaxError by name (rather than the broad except Exception used today) must know to import from nemo_retriever.tabular_data.ingestion.parsers.sqlglot_extractor directly, making the new type effectively invisible in the public surface. Adding it to the parsers __init__.py would make it discoverable.
Rule Used: Every public class and function in nemo_retriever ... (source)
Prompt To Fix With AI
This is a comment left during a code review.
Path: nemo_retriever/src/nemo_retriever/tabular_data/ingestion/parsers/sqlglot_extractor.py
Line: 35-42
Comment:
**`SQLSyntaxError` not re-exported for callers**
The PR's stated goal is to let callers "classify the two outcomes" — but `SQLSyntaxError` is only defined here in the implementation module. The parsers `__init__.py` is empty, and neither `queries.py` nor `sql_parse_validation.py` imports or re-exports it. Any caller that wants to catch `SQLSyntaxError` by name (rather than the broad `except Exception` used today) must know to import from `nemo_retriever.tabular_data.ingestion.parsers.sqlglot_extractor` directly, making the new type effectively invisible in the public surface. Adding it to the parsers `__init__.py` would make it discoverable.
**Rule Used:** Every public class and function in nemo_retriever ... ([source](https://app.greptile.com/review/custom-context?memory=public-api-contract))
How can I resolve this? If you propose a fix, please make it concise.| try: | ||
| statement = sqlglot.parse_one(sql, dialect=dialect) | ||
| except Exception: | ||
| return ExtractionResult() | ||
| except (ParseError, TokenError) as err: | ||
| raise SQLSyntaxError(str(err)) from err |
There was a problem hiding this comment.
No tests for the new exception path
test_sqlglot_extractor.py tests happy-path extraction but has no test covering the new SQLSyntaxError raise. Per the test-coverage-new-code rule, new error paths must have corresponding unit tests — at a minimum one test that passes broken SQL (e.g. "SELECT FROM") and asserts pytest.raises(SQLSyntaxError), and one that confirms valid SQL with no matching tables still returns an empty ExtractionResult without raising.
Rule Used: New functionality must include corresponding unit ... (source)
Prompt To Fix With AI
This is a comment left during a code review.
Path: nemo_retriever/src/nemo_retriever/tabular_data/ingestion/parsers/sqlglot_extractor.py
Line: 518-521
Comment:
**No tests for the new exception path**
`test_sqlglot_extractor.py` tests happy-path extraction but has no test covering the new `SQLSyntaxError` raise. Per the `test-coverage-new-code` rule, new error paths must have corresponding unit tests — at a minimum one test that passes broken SQL (e.g. `"SELECT FROM"`) and asserts `pytest.raises(SQLSyntaxError)`, and one that confirms valid SQL with no matching tables still returns an empty `ExtractionResult` without raising.
**Rule Used:** New functionality must include corresponding unit ... ([source](https://app.greptile.com/review/custom-context?memory=test-coverage-new-code))
How can I resolve this? If you propose a fix, please make it concise.Update sqlglot_extractor tests to match the new contract introduced in the previous commit: empty or unparseable SQL now raises SQLSyntaxError instead of silently returning an empty ExtractionResult, so the two "silent empty result" assertions become pytest.raises checks.
Previously
extract_tables_and_columnsswallowed sqlglot parse/token errors and returned an emptyExtractionResult, making truly invalid SQL indistinguishable from valid SQL that referenced no known tables. Both reached callers as the same silent "no tables found" signal, so the SQL-validation agent reported success on syntactically broken SQL.Introduce a dedicated
SQLSyntaxError(ValueError)and raise it when sqlglot rejects the input. Valid-but-unresolved SQL still returns the empty extraction as before, so callers can classify the two outcomes.Description
Checklist