potential fix for google_search() #6

Open
bowditch-c wants to merge 1 commit into chrispetrou:master from bowditch-c:google_search-fix

Conversation

@bowditch-c

A suggested fix for google_search()

@chrispetrou

Owner

I tried the pull request and it doesn't seem to work for me. Every email I tested is reported as not found in the Google search results, which is not the case!

@bowditch-c

Author

The major change is that the function now performs a Google search using quotes, e.g. “username@email.com”, so it searches for that email exactly as typed. It works for me! Any public-facing email addresses return results, whilst private emails don’t. If that’s not exactly the intended function, my apologies.

@chrispetrou

Owner

The function pretty much does what you described, but when I test the following script using your pull request:

import os, sys
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

os.environ['MOZ_HEADLESS'] = '1'
cap = DesiredCapabilities().FIREFOX
cap["marionette"] = True

def google_search(email):
    endpoint = 'https://google.com/search?q=%22{}%22'.format(email)
    try:
        with webdriver.Firefox(capabilities=cap) as d:
            d.get(endpoint)
            if "No results found" or "did not match any documents" in d.page_source:
                return False
            else:
                return True
    except Exception as error:
        raise(error)

try:
    email = sys.argv[1]
    breached = google_search(email)
    if breached:
        print("{} shows up on google search results".format(email))
    else:
        print("{} doesn't show up on google search results.".format(email))
except IndexError:
    sys.exit(0)

I get positive results (by positive I mean not showing up in Google search results) for every email I test. When I use your method manually it works, but through that script it doesn't for some reason. I've even tried it with very simple emails that have appeared in thousands of breaches, and it keeps reporting them as safe...
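
A side note on the condition in the script above, which may account for this symptom independently of anything else: in Python, `"No results found" or "did not match any documents" in d.page_source` is parsed as `"No results found" or (... in d.page_source)`. The left operand is a non-empty string, so the expression is always truthy and `google_search()` returns `False` for every input. The sketch below reproduces that behaviour without a browser and shows one possible way to test each marker separately. The helper names `original_check` and `has_no_results` are hypothetical, introduced only for illustration.

```python
# Marker strings taken from the script above.
NO_RESULT_MARKERS = ("No results found", "did not match any documents")


def original_check(page_source):
    # Mirrors the condition in the script: the left operand of `or` is a
    # non-empty string, so the result does not depend on page_source at all.
    return bool("No results found" or "did not match any documents" in page_source)


def has_no_results(page_source):
    # Hypothetical replacement: test each marker against the page separately.
    return any(marker in page_source for marker in NO_RESULT_MARKERS)


if __name__ == "__main__":
    sample = "<html><body>About 1,000 results</body></html>"
    print(original_check(sample))   # truthy even though neither marker is present
    print(has_no_results(sample))   # only true when a marker actually appears
```

With a check like `has_no_results`, the function would return `not has_no_results(d.page_source)`.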

@jsfan

jsfan commented Aug 29, 2019


The patch ignores a race condition. Google's search results are rendered via JavaScript, and the script does not wait for the DOM to be assembled before trying to read from it.

cf. https://selenium-python.readthedocs.io/waits.html

@bowditch-c

Author

Aha! Excellent catch. Thank you! I was stumped. I couldn’t recreate the issue on my end with my set of test emails. An explicit wait should resolve this issue.
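
For reference, a minimal sketch of what an explicit wait could look like here, using Selenium's `WebDriverWait` and `expected_conditions`. Several things are assumptions rather than anything confirmed in this thread: the element id `"search"` as the container Google renders results into, the 10-second timeout, the helper `build_endpoint`, and the use of Firefox `Options` for headless mode in place of `DesiredCapabilities`. It also checks each "no results" marker separately instead of combining the two strings with `or`.

```python
from urllib.parse import quote

# Marker strings taken from the script earlier in this thread.
NO_RESULT_MARKERS = ("No results found", "did not match any documents")
# Assumption: id of the element that holds Google's rendered results.
RESULTS_CONTAINER_ID = "search"


def build_endpoint(email):
    # Hypothetical helper: wrap the address in encoded double quotes so
    # Google performs an exact-match search.
    return "https://google.com/search?q=%22{}%22".format(quote(email))


def google_search(email, timeout=10):
    # Selenium is imported here so build_endpoint() can be used without it.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.firefox.options import Options
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    options.add_argument("-headless")  # run Firefox without a visible window

    with webdriver.Firefox(options=options) as driver:
        driver.get(build_endpoint(email))
        # Block until the assumed results container is in the DOM.
        # Raises selenium.common.exceptions.TimeoutException otherwise.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.ID, RESULTS_CONTAINER_ID))
        )
        source = driver.page_source
        # True means the address appears in the search results.
        return not any(marker in source for marker in NO_RESULT_MARKERS)
```

Letting the timeout exception propagate keeps "the page never rendered" distinct from "the email was not found"; whether that is the right behaviour for the tool is a design choice for the maintainers.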
