[Bug]: Tencent WAF returns HTTP 200 with JS challenge page — Fetcher inconsistent, StealthyFetcher not truly stealthy #265

@jdb110

Have you searched if there is an existing issue for this?

  • I have searched the existing issues

Python version (python --version)

python 3.14.0

Scrapling version (scrapling.__version__)

0.4.7

Dependencies version (pip3 freeze)

browserforge==1.2.4
certifi==2026.2.25
curl_cffi==0.15.0
idna==3.11
patchright==1.58.2
playwright==1.58.0
playwright-stealth==2.0.3
scrapling==0.4.7
urllib3==2.6.3

What's your operating system?

Windows 10

Are you using a separate virtual environment?

No

Expected behavior

Fetcher should consistently return the real page content (~121 KB HTML with car sales data), bypassing Tencent WAF on every request — just like a real browser does.

Actual behavior

The server returns HTTP 200 OK, but the response body is probabilistically intercepted by Tencent WAF:

  • ~70% of requests: Returns a ~1.7 KB CAPTCHA challenge page containing TencentCaptcha JS code, not the real content
  • ~30% of requests: Returns the legitimate ~121 KB HTML page with sales data

The intercepted response contains:

<script>
    var seqid = "..._captcha"
</script>
<script src="https://ssl.captcha.qq.com/TCaptcha.js"></script>
<script>
    var captcha = new TencentCaptcha('2017163193', function(res) { ... });
    captcha.show();
</script>

The site is protected by Tencent Cloud WAF (server header: Lego Server) running in CAPTCHA challenge mode, which specifically covers the /newcar/salesrank/ path. Despite Scrapling's stealth mechanisms, the WAF's risk score for these requests appears to fluctuate around the blocking threshold, so blocking is inconsistent and probabilistic rather than always passing or always failing.
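
Because the status code is 200 in both cases, the only way to tell the two outcomes apart is to inspect the body. A minimal sketch of that check, plus a tally over repeated requests, might look like the following. The names `CHALLENGE_MARKERS`, `classify_body`, and `tally_outcomes` are illustrative and not part of Scrapling; the markers are simply the strings visible in the intercepted response above.

```python
from collections import Counter
from typing import Callable

# Strings seen in the intercepted response above. Treating them as a reliable
# signature of the challenge page is an assumption, not a documented contract.
CHALLENGE_MARKERS = ("TencentCaptcha", "ssl.captcha.qq.com/TCaptcha.js")


def classify_body(body: str) -> str:
    """Return "challenge" if the body contains a CAPTCHA marker, else "content"."""
    if any(marker in body for marker in CHALLENGE_MARKERS):
        return "challenge"
    return "content"


def tally_outcomes(fetch_body: Callable[[], str], attempts: int = 10) -> Counter:
    """Call fetch_body() `attempts` times and count how each body is classified.

    `fetch_body` is a hypothetical zero-argument callable that performs one
    request and returns the decoded response body as a string.
    """
    return Counter(classify_body(fetch_body()) for _ in range(attempts))
```

With Scrapling, `fetch_body` could wrap the same call used in the reproduction script, for example `lambda: Fetcher().get(url, timeout=30).body.decode("utf-8", errors="replace")`. The resulting counts give a rough block rate to compare against the ~70% / ~30% split reported here.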

Additionally, StealthyFetcher cannot be used as a fallback because it fails to locate the Playwright browser:

BrowserType.launch_persistent_context: Executable doesn't exist at
...\.local-browsers\chromium-1208\chrome-win64\chrome.exe

This happens because StealthyFetcher uses patchright (a Playwright fork), which looks for browsers in its own .local-browsers directory rather than the standard Playwright installation path. As a result, a second browser download is required even when Playwright browsers are already installed.
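
To confirm which executable patchright expects, one option is to ask it directly and check whether that file exists. The sketch below assumes patchright exposes the same `sync_api` module and `BrowserType.executable_path` property as Playwright, which it is meant to mirror as a drop-in fork. The helper names `patchright_chromium_path` and `describe_browser` are illustrative only.

```python
from pathlib import Path
from typing import Optional


def patchright_chromium_path() -> Optional[str]:
    """Return the Chromium path patchright would launch, or None if patchright
    cannot be imported in this environment."""
    try:
        # Assumption: patchright mirrors Playwright's Python API layout.
        from patchright.sync_api import sync_playwright
    except ImportError:
        return None
    with sync_playwright() as p:
        # executable_path reports where the bundled browser is expected,
        # whether or not it has actually been downloaded.
        return p.chromium.executable_path


def describe_browser(path: Optional[str]) -> str:
    """Summarize whether the expected browser executable is present on disk."""
    if path is None:
        return "patchright is not importable in this environment"
    if Path(path).is_file():
        return f"Chromium found at {path}"
    return f"Chromium missing at {path}"
```

Calling `print(describe_browser(patchright_chromium_path()))` shows the path that the error message refers to. If the file is missing, installing browsers through patchright itself (for example `patchright install chromium`) is the likely remedy. The reported location can also depend on the `PLAYWRIGHT_BROWSERS_PATH` environment variable, so it is worth checking whether that is set.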


Steps To Reproduce

```python
from scrapling import Fetcher

url = "https://car.yiche.com/newcar/salesrank/?saleType=0&date=2026-03-01&energy=6&page=1"

fetcher = Fetcher()
resp = fetcher.get(url, timeout=30)

body = resp.body.decode("utf-8", errors="replace")
print(f"Status: {resp.status}")
print(f"Body length: {len(body)}")

# Detect WAF challenge
if "TencentCaptcha" in body:
    print("INTERCEPTED by Tencent WAF")
else:
    print("PASSED — but this is inconsistent across runs")
```

Run the script 3-5 times in a loop. You will see a mix of:

  • ~1.7 KB responses with TencentCaptcha in the HTML (blocked)
  • ~121 KB responses with real car sales ranking data (passed)
