Add lightpanda to webcrawlers filter. #6143

Open
rodolfoBee wants to merge 1 commit into getsentry:master from rodolfoBee:master

Conversation

@rodolfoBee

Member

Fixes #6142

@rodolfoBee rodolfoBee requested a review from a team as a code owner June 26, 2026 13:39

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit 2e32395. Configure here.

PerplexityBot| # Perplexity - see https://docs.perplexity.ai/guides/bots
Applebot| # Apple - see https://support.apple.com/en-us/119829
DuckDuckBot # DuckDuckGo - see https://duckduckgo.com/duckduckgo-help-pages/results/duckduckbot
Lightpanda # Lightpand - see https://lightpanda.io/

Missing regex alternation pipe

High Severity

The WEB_CRAWLERS pattern uses (?x), so DuckDuckBot and the new Lightpanda token are concatenated into one literal DuckDuckBotLightpanda because there is no | between them. Real DuckDuckBot and Lightpanda user agents no longer match their intended alternatives.
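To make the concatenation concrete, here is a minimal sketch of what verbose mode does to the two lines in question. It uses only the standard library; `effective_pattern` is a hypothetical helper written for this illustration, not part of `relay-filter` or the `regex` crate, and it is deliberately simplified (it ignores escaped `#`, escaped whitespace, and character classes, which real verbose mode handles).

```rust
/// Simplified model of the regex `x` (verbose) flag: drop everything from `#`
/// to the end of each line, then drop all remaining whitespace.
/// Hypothetical helper for illustration only.
fn effective_pattern(verbose: &str) -> String {
    verbose
        .lines()
        // Keep only the part of each line before a `#` comment.
        .map(|line| line.split('#').next().unwrap_or(""))
        .flat_map(|code| code.chars())
        // Whitespace, including the removed line breaks, is insignificant.
        .filter(|c| !c.is_whitespace())
        .collect()
}

fn main() {
    // The two lines as they appear in the diff (comments shortened).
    let broken = "DuckDuckBot   # DuckDuckGo\nLightpanda    # Lightpanda";
    // The same lines with a trailing `|` on the DuckDuckBot line.
    let fixed = "DuckDuckBot|  # DuckDuckGo\nLightpanda    # Lightpanda";

    println!("{}", effective_pattern(broken));
    println!("{}", effective_pattern(fixed));
}
```

Under these assumptions the first pattern collapses to the single literal `DuckDuckBotLightpanda`, while the second keeps `DuckDuckBot|Lightpanda` as two alternatives.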

Comment on lines 48 to +49
DuckDuckBot # DuckDuckGo - see https://duckduckgo.com/duckduckgo-help-pages/results/duckduckbot
Lightpanda # Lightpand - see https://lightpanda.io/

Bug: The missing pipe | separator after DuckDuckBot causes the regex to incorrectly look for DuckDuckBotLightpanda instead of DuckDuckBot or Lightpanda, breaking filtering for both.
Severity: MEDIUM

Suggested Fix

Add a pipe separator (|) to the end of the DuckDuckBot line in the regex pattern. This will correctly separate the two patterns, ensuring the regex engine treats them as alternatives (DuckDuckBot OR Lightpanda).

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's
not valid.

Location: relay-filter/src/web_crawlers.rs#L48-L49

Potential issue: Due to a missing pipe (`|`) separator on the `DuckDuckBot` line, the
verbose regex (`(?ix)`) will concatenate `DuckDuckBot` and the newly added `Lightpanda`.
This changes the pattern to match the literal string `DuckDuckBotLightpanda` instead of
matching either `DuckDuckBot` or `Lightpanda`. As a result, user agents for both
DuckDuckGo's and Lightpanda's crawlers will no longer be filtered out, allowing their
events to be processed when they should be dropped. This breaks the intended
functionality of the web crawler filter for these two bots.


Development

Successfully merging this pull request may close these issues.

Add lightpanda.io to webcrawlers filter.
