Strip all whitespace before commas in reindented identifier lists #850
Open
sarathfrancis90 wants to merge 1 commit into
When reindenting an identifier list, only the single whitespace token immediately preceding a comma was removed. If a comma was preceded by more than one whitespace token (for example multiple spaces, tabs, or a newline followed by a space, as in "a , b"), one whitespace token was left in place. StripWhitespaceFilter._stripws_default then collapsed it to a single space, producing output such as "a ," with a stray space before the comma. That stray space was removed on a subsequent format pass, so formatting was not idempotent: format(format(sql)) != format(sql) for any identifier list containing extra whitespace before a comma. Track the full run of consecutive whitespace tokens before a comma and remove all of them, so the comma always hugs the preceding token and the output is stable when the formatter is applied to its own output. This extends the fix for issue140, which only handled a single whitespace token before a comma.
Summary
Reindenting an identifier list is not idempotent when a comma is preceded by
more than one whitespace token. Re-formatting the formatter's own output
changes it, which it should not:
So format(format(sql)) != format(sql). The single-whitespace case
(select a , b) already hugs the comma (a,) and is idempotent, which shows
the stable output the multi-whitespace case should also produce.
Root cause
StripWhitespaceFilter._stripws_identifierlist (sqlparse/filters/others.py)
removes whitespace before commas (the original fix for issue140), but it
tracked only the single most recent whitespace token:
When a comma is preceded by several whitespace tokens (multiple spaces, tabs,
or a newline followed by a space), only the last one is removed. The remaining
whitespace token is then collapsed to a single space by
_stripws_default, leaving the stray a ,.
Fix
Track the full run of consecutive whitespace tokens before a comma and remove
all of them, so the comma always hugs the preceding token:
Only whitespace before a comma is removed, so comma_first spacing
(whitespace after a comma) is unaffected.
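The difference between the old and new behaviour can be sketched on a toy token list. This is only an illustration under stated assumptions: the (kind, text) tuples and the function names strip_last_ws_only and strip_ws_run are hypothetical and are not sqlparse's token API or the actual patch.

```python
# Toy model: tokens are (kind, text) pairs. NOT sqlparse's real token classes.
WS, OTHER, COMMA = "ws", "other", "comma"


def strip_last_ws_only(tokens):
    """Old behaviour as described above: drop only the single whitespace
    token immediately before a comma."""
    out = []
    for kind, text in tokens:
        if kind == COMMA and out and out[-1][0] == WS:
            out.pop()  # removes one token, even if more whitespace precedes it
        out.append((kind, text))
    return out


def strip_ws_run(tokens):
    """Fixed behaviour: drop the whole run of whitespace tokens before a comma."""
    out = []
    for kind, text in tokens:
        if kind == COMMA:
            while out and out[-1][0] == WS:
                out.pop()  # keep popping until the preceding token is not whitespace
        out.append((kind, text))
    return out


def render(tokens):
    """Join token texts back into a string."""
    return "".join(text for _, text in tokens)


# "a  , b": two whitespace tokens before the comma, one after it.
tokens = [(OTHER, "a"), (WS, " "), (WS, " "), (COMMA, ","), (WS, " "), (OTHER, "b")]
old = render(strip_last_ws_only(tokens))
new = render(strip_ws_run(tokens))
```

In this model the old pass leaves one whitespace token before the comma, while the new pass removes the whole run and leaves the whitespace after the comma untouched.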
As a beneficial side effect this also resolves the extra-space output reported
in #644 (comma_first/reindent_aligned producing select * ,).
Test evidence
Added TestFormatReindent.test_identifier_list_whitespace_before_comma,
covering multiple spaces, several spaces, tabs, and newline+space before a
comma, and asserting both the expected output and idempotency. The new test
fails on the current code and passes with the fix.
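The shape of such a check (expected output plus idempotency over several whitespace variants) might look like the sketch below. It is not the test added in this PR: stand_in_format is a hypothetical regex-based stand-in rather than sqlparse, and the inputs and expected string are illustrative assumptions that apply to the stand-in only.

```python
import re


def stand_in_format(sql):
    """Hypothetical stand-in formatter (not sqlparse): removes any run of
    whitespace that directly precedes a comma."""
    return re.sub(r"\s+,", ",", sql)


# Illustrative inputs, one per whitespace variant named above.
CASES = {
    "multiple spaces": "select a  , b from t",
    "several spaces": "select a     , b from t",
    "tabs": "select a\t\t, b from t",
    "newline + space": "select a\n , b from t",
}


def check(fmt, cases, expected):
    """Return a list of (case name, problem, output) for every failing case."""
    failures = []
    for name, sql in cases.items():
        once = fmt(sql)
        if once != expected:
            failures.append((name, "unexpected output", once))
        elif fmt(once) != once:
            failures.append((name, "not idempotent", once))
    return failures


failures = check(stand_in_format, CASES, "select a, b from t")
```

To run the same check against the real formatter, one could pass something like lambda s: sqlparse.format(s, reindent=True) as fmt, with the expected string replaced by the reindented output.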
Checklist
pytest
ruff