Skip to content

Strip all whitespace before commas in reindented identifier lists#850

Open
sarathfrancis90 wants to merge 1 commit into
andialbrecht:masterfrom
sarathfrancis90:fix-reindent-whitespace-before-comma
Open

Strip all whitespace before commas in reindented identifier lists#850
sarathfrancis90 wants to merge 1 commit into
andialbrecht:masterfrom
sarathfrancis90:fix-reindent-whitespace-before-comma

Conversation

@sarathfrancis90
Copy link
Copy Markdown

Summary

Reindenting an identifier list is not idempotent when a comma is preceded by
more than one whitespace token. Re-formatting the formatter's own output
changes it, which it should not:

>>> import sqlparse
>>> a = sqlparse.format("select a  ,  b from t", reindent=True)
>>> a
'select a ,\n       b\nfrom t'          # note the stray space before the comma
>>> sqlparse.format(a, reindent=True)
'select a,\n       b\nfrom t'          # the space is gone on a second pass

So format(format(sql)) != format(sql). The single-whitespace case
(select a , b) already hugs the comma (a,) and is idempotent, which shows
the stable output the multi-whitespace case should also produce.

Root cause

StripWhitespaceFilter._stripws_identifierlist (sqlparse/filters/others.py)
removes whitespace before commas (the original fix for issue140), but only
tracked the single most recent whitespace token:

last_nl = None
for token in list(tlist.tokens):
    if last_nl and token.ttype is T.Punctuation and token.value == ',':
        tlist.tokens.remove(last_nl)
    last_nl = token if token.is_whitespace else None

When a comma is preceded by several whitespace tokens (multiple spaces, tabs,
or a newline followed by a space), only the last one is removed. The remaining
whitespace token is then collapsed to a single space by
_stripws_default, leaving the stray a ,.

Fix

Track the full run of consecutive whitespace tokens before a comma and remove
all of them, so the comma always hugs the preceding token:

last_ws = []
for token in list(tlist.tokens):
    if last_ws and token.ttype is T.Punctuation and token.value == ',':
        for ws in last_ws:
            tlist.tokens.remove(ws)
    last_ws = last_ws + [token] if token.is_whitespace else []

Only whitespace before a comma is removed, so comma_first spacing
(whitespace after a comma) is unaffected.

As a beneficial side effect this also resolves the extra-space output reported
in #644 (comma_first / reindent_aligned producing select * ,).

Test evidence

Added TestFormatReindent.test_identifier_list_whitespace_before_comma
covering multiple spaces, several spaces, tabs, and newline+space before a
comma, asserting both the expected output and idempotency. The new test fails
on the current code and passes with the fix.

$ pytest -q
486 passed, 2 xfailed, 1 xpassed
$ ruff check sqlparse/
All checks passed!

Checklist

  • ran the tests (pytest)
  • all style issues addressed (ruff)
  • your changes are covered by tests
  • your changes are documented, if needed (CHANGELOG entry added)

When reindenting an identifier list, only the single whitespace token
immediately preceding a comma was removed. If a comma was preceded by
more than one whitespace token (for example multiple spaces, tabs, or a
newline followed by a space, as in "a  ,  b"), one whitespace token was
left in place. StripWhitespaceFilter._stripws_default then collapsed it
to a single space, producing output such as "a ," with a stray space
before the comma.

That stray space was removed on a subsequent format pass, so formatting
was not idempotent: format(format(sql)) != format(sql) for any
identifier list containing extra whitespace before a comma.

Track the full run of consecutive whitespace tokens before a comma and
remove all of them, so the comma always hugs the preceding token and the
output is stable when the formatter is applied to its own output.

This extends the fix for issue140, which only handled a single
whitespace token before a comma.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant