gh-128110: Fix rfc2047 handling in email parser address headers #130749

Open
medmunds wants to merge 1 commit into python:main from medmunds:fix-issue-128110

@medmunds (Contributor) commented Mar 1, 2025

RFC 2047 Section 6.2 requires that "any 'linear-white-space' that separates a pair of adjacent 'encoded-word's is ignored." The modern header value parser correctly implements that for unstructured headers, but missed a case in structured headers. This could cause a parsed address header to include extraneous spaces in a display-name.
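A minimal sketch of how the difference can be observed through the public API, assuming a made-up message with the same pair of adjacent encoded-words in an unstructured header (Subject) and in an address header (From). The sample names and address are illustrative only, not taken from the issue.

```python
from email import message_from_string
from email.policy import default

# Hypothetical sample message: the same two adjacent encoded-words
# appear in an unstructured header and in an address display-name.
raw = (
    "From: =?utf-8?q?Hello?= =?utf-8?q?World?= <user@example.com>\n"
    "Subject: =?utf-8?q?Hello?= =?utf-8?q?World?=\n"
    "\n"
    "body\n"
)

msg = message_from_string(raw, policy=default)

# Unstructured header: the whitespace between the encoded-words is dropped.
subject = str(msg["Subject"])

# Address header: on affected versions the display name may keep the space.
address = msg["From"].addresses[0]
display_name = address.display_name

print(repr(subject))
print(repr(display_name))
print(address.addr_spec)
```

Comparing the two printed values on a given interpreter shows whether the structured-header case is handled the same way as the unstructured one.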

Fixed in get_atom() by converting a trailing CFWSList token after an encoded-word to an EWWhiteSpaceTerminal if another encoded-word follows.

Deliberately left similar code in get_dotatom() unmodified. A dotatom can only appear within an addr-spec. RFC 2047 Section 5 prohibits use of an encoded-word in any portion of an addr-spec, so its appearance in a dotatom is invalid. Adding (and testing) special white-space handling in an invalid dotatom seems an unnecessary complication.

Fixes gh-128110

Suggest label: topic-email

@github-actions

This PR is stale because it has been open for 30 days with no activity.

github-actions bot added the "stale" label (Stale PR or inactive for long period of time) on Apr 22, 2026
@bitdancer (Member) left a comment

The fact that you put the tests in get_phrase suggests that get_phrase is really the locus of the bug: that's where the encoded-words (ews) can end up next to each other. Here is a fix to get_phrase that passes all your tests. The if condition is complex, but that's because the circumstances in which this situation arises are very specific.

@@ -1473,6 +1473,16 @@ def get_phrase(value):
         else:
             try:
                 token, value = get_word(value)
+                if (token[0].token_type == 'encoded-word'
+                        and phrase
+                        and phrase[-1].token_type == 'atom'
+                        and len(phrase[-1]) > 1
+                        and phrase[-1][-2].token_type == 'encoded-word'
+                        and phrase[-1][-1].token_type == 'cfws'
+                        and not phrase[-1][-1].comments
+                    ):
+                    # linear ws between ews needs special handling...
+                    phrase[-1][-1] = EWWhiteSpaceTerminal(phrase[-1], 'fws')
             except errors.HeaderParseError:
                 if value[0] in CFWS_LEADER:
                     token, value = get_cfws(value)

This depends on the fact that "subsequent" atoms will never have leading whitespace, because that has already been consumed. I don't think it's worth adding extra code for the possibility of leading whitespace, because the parser won't produce it. It's a bit of parser fragility in the face of code changes, but I think that's a minor concern given the parser design (which is that it consumes whitespace greedily).
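For readers following the indexing in that condition (phrase[-1][-2], phrase[-1][-1]), here is a rough sketch that dumps the token tree get_phrase produces for two adjacent encoded-words. It assumes the private email._header_value_parser module, which is not a stable API, and describe is a hypothetical helper written for this illustration. The token type printed for the whitespace between the encoded-words depends on whether a fix is applied.

```python
# Private module: names and behavior may change between Python versions.
from email._header_value_parser import get_phrase


def describe(token, depth=0):
    """Return (depth, token_type, text) rows for a parse tree (hypothetical helper)."""
    rows = [(depth, token.token_type, str(token))]
    # Token lists subclass list; terminals subclass str and have no children.
    if isinstance(token, list):
        for child in token:
            rows.extend(describe(child, depth + 1))
    return rows


phrase, rest = get_phrase("=?utf-8?q?Hello?= =?utf-8?q?World?=")
rows = describe(phrase)

for depth, token_type, text in rows:
    print("  " * depth + f"{token_type}: {text!r}")
```

The first atom holds the first encoded-word followed by its trailing whitespace token, which is the element the proposed condition inspects before the second encoded-word is appended.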

@bedevere-app bot commented May 5, 2026

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase "I have made the requested changes; please review again". I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

Labels: awaiting changes, stale, topic-email

Development

Successfully merging this pull request may close these issues.

email.parser can insert extraneous spaces when parsing rfc2047 headers with policy.default