Gmail sender content not being fully removed #23

@bittercoder

Description

Hi!

Thanks to the lever team for building (and sharing) this really useful library!

I've recently found an issue when processing Gmail messages in which the sender's email address is a hyperlink rather than plain text. In that case a small part of the quoted message's sender line is left behind in the output: effectively the closing angle bracket that follows the <a></a> tag, plus any text after it up to the <br> within the same div.

Here are a couple of examples. The first one works correctly: it returns only the body content, "Example message - there should be quoted content below (gmail style)".

<html>
<body>

Example message - there should be quoted content below (gmail style)

<div class="gmail_quote">
    <div dir="ltr" class="gmail_attr">On Thu, Nov 23, 2023 at 1:03 PM Sam Sampson &lt;sam@testsampson.com&gt; wrote:<br></div>
    <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
        <div dir="ltr">
            Hi Gentlemen,<div><br></div>
            <div>Pleasure speaking with you all today. I''ve got your proposal running through finance as we speak 
            and should, hopefully, have a DocuSign out for you later today.
            </div>
        </div>
    </blockquote>
</div>
</body>
</html>

And then this is an example of one that does not work correctly:

<html>
<body>

Example message - there should be quoted content below (gmail style)


<div class="gmail_quote">
    <div dir="ltr" class="gmail_attr">On Thu, Nov 23, 2023 at 1:03 PM Sam Sampson &lt;<a 
 href="mailto:sam@testsampson.com">sam@testsampson.com</a>&gt; wrote:<br></div>
    <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
        <div dir="ltr">
            Hi Gentlemen,<div><br></div>
            <div>Pleasure speaking with you all today. I''ve got your proposal running through finance as we speak 
            and should, hopefully, have a DocuSign out for you later today.
            </div>
        </div>
    </blockquote>
</div>
</body>
</html>

This returns:

<html>
<body>

Example message - there should be quoted content below (gmail style)

<div class="gmail_quote">
  <div dir="ltr" class="gmail_attr">
    > wrote:</br>
  </div>
</div>
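
To help narrow this down, here is a minimal sketch of what differs between the two sender lines once they are parsed. It uses only Python's standard `html.parser` and does not call the library at all; the class `AttrTextNodes` and the function `attribution_text_nodes` are hypothetical names made up for this illustration. My guess (an assumption, not verified against the library's source) is that the "On ... wrote:" line is matched against a single text node, so the hyperlinked version, which splits into three text nodes, is only partly removed.

```python
from html.parser import HTMLParser


class AttrTextNodes(HTMLParser):
    """Collect the separate text nodes found inside div.gmail_attr elements."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.div_depth = 0  # > 0 while inside a gmail_attr div
        self.nodes = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return  # <a> and <br> only split the text, they do not change depth
        if self.div_depth:
            self.div_depth += 1
        elif "gmail_attr" in (dict(attrs).get("class") or "").split():
            self.div_depth = 1

    def handle_endtag(self, tag):
        if tag == "div" and self.div_depth:
            self.div_depth -= 1

    def handle_data(self, data):
        # Each call is one run of text between two tags, i.e. one text node.
        if self.div_depth and data.strip():
            self.nodes.append(data.strip())


def attribution_text_nodes(html):
    parser = AttrTextNodes()
    parser.feed(html)
    parser.close()
    return parser.nodes


# Sender line with a plain-text address (the case that works).
PLAIN = (
    '<div dir="ltr" class="gmail_attr">On Thu, Nov 23, 2023 at 1:03 PM '
    'Sam Sampson &lt;sam@testsampson.com&gt; wrote:<br></div>'
)

# Sender line with a hyperlinked address (the case that fails).
LINKED = (
    '<div dir="ltr" class="gmail_attr">On Thu, Nov 23, 2023 at 1:03 PM '
    'Sam Sampson &lt;<a href="mailto:sam@testsampson.com">'
    'sam@testsampson.com</a>&gt; wrote:<br></div>'
)

if __name__ == "__main__":
    print(attribution_text_nodes(PLAIN))
    print(attribution_text_nodes(LINKED))
```

With the plain-text address the whole sender line is one text node. With the hyperlink it becomes three, and the last of them is exactly the "> wrote:" fragment that is left in the output above.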

Let me know if you need any further details.
