DbUrlList now honors recrawlInMs option. #43

Open. hjr3 wants to merge 1 commit into brendonboshell:master from hjr3:issue-49

Conversation

hjr3 commented Nov 9, 2019

Contributor

Fixes #40

Comment thread lib/DbUrlList.js
  // seconds. This ensures the order we crawl URLs is random; otherwise, if
  // we parse a sitemap, we could get stuck crawling one host for hours.
- delay = - Math.random() * YEAR_MS;
+ delay = - Math.random() * this._recrawlInMs;

Owner

I don't think this has the intended effect. Notice that `delay` is negative here: this line only randomizes the position of new URLs as they come onto the queue, so it does not set how long to wait before a crawled URL is visited again.
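
To illustrate the point in the review comment above, here is a minimal sketch of the two kinds of delay involved. It is not the code in lib/DbUrlList.js: the helper names (`initialDelayMs`, `recrawlDelayMs`, `nextRetryDate`) and the `DAY_MS` constant are hypothetical, and it assumes the queue is ordered by a next-retry timestamp.

```javascript
// Hypothetical helpers for illustration only; not the library's actual API.
const DAY_MS = 24 * 60 * 60 * 1000;

// New URL: a random NEGATIVE offset places the URL at a random point in the
// past. Assuming the queue is read in timestamp order, this shuffles newly
// discovered URLs so one host's sitemap does not monopolize the crawl.
function initialDelayMs(windowMs, random = Math.random) {
  return -random() * windowMs;
}

// Already-crawled URL: a POSITIVE offset pushes the next visit into the
// future. This is where a recrawl interval would take effect.
function recrawlDelayMs(recrawlInMs) {
  return recrawlInMs;
}

// Turn an offset in milliseconds into the timestamp the queue would sort by.
function nextRetryDate(now, delayMs) {
  return new Date(now.getTime() + delayMs);
}

const now = new Date();
const newUrlDate = nextRetryDate(now, initialDelayMs(DAY_MS));
const recrawlDate = nextRetryDate(now, recrawlDelayMs(DAY_MS));

console.log(newUrlDate <= now);  // new URLs are due immediately
console.log(recrawlDate > now);  // recrawls wait for the interval
```

Under these assumptions, changing the window passed to `initialDelayMs` only changes how widely new URLs are spread out in the past. They are still due immediately, so a recrawl interval would have to be applied on the positive-delay path instead.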

Development

Successfully merging this pull request may close these issues.

How to periodically crawl again

2 participants