Have you searched if there is an existing issue for this?
Python version (python --version)
python 3.14.0
Scrapling version (scrapling.version)
0.4.7
Dependencies version (pip3 freeze)
browserforge==1.2.4
certifi==2026.2.25
curl_cffi==0.15.0
idna==3.11
patchright==1.58.2
playwright==1.58.0
playwright-stealth==2.0.3
scrapling==0.4.7
urllib3==2.6.3
What's your operating system?
Windows 10
Are you using a separate virtual environment?
No
Expected behavior
Fetcher should consistently return the real page content (~121 KB HTML with car sales data), bypassing Tencent WAF on every request — just like a real browser does.
Actual behavior
The server returns HTTP 200 OK, but the response body is probabilistically intercepted by Tencent WAF:
- ~70% of requests: Returns a ~1.7 KB CAPTCHA challenge page containing TencentCaptcha JS code, not the real content
- ~30% of requests: Returns the legitimate ~121 KB HTML page with sales data
The intercepted response contains:
<script>
var seqid = "..._captcha"
</script>
<script src="https://ssl.captcha.qq.com/TCaptcha.js"></script>
<script>
var captcha = new TencentCaptcha('2017163193', function(res) { ... });
captcha.show();
</script>
The site is protected by Tencent Cloud WAF (server header: Lego Server) in CAPTCHA challenge mode, which appears to apply specifically to the /newcar/salesrank/ path. Despite Scrapling's stealth mechanisms, the WAF's risk score for these requests seems to hover around its blocking threshold, so blocking is inconsistent and probabilistic rather than always passing or always failing.
Additionally, StealthyFetcher cannot be used as a fallback because it fails to locate the Playwright browser:
BrowserType.launch_persistent_context: Executable doesn't exist at
...\.local-browsers\chromium-1208\chrome-win64\chrome.exe
This happens because StealthyFetcher uses patchright (a Playwright fork) which looks for browsers in its own .local-browsers directory rather than the standard Playwright installation path — requiring a second browser download even when Playwright browsers are already installed.
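A possible workaround, sketched under the assumption that the `scrapling install` command from the Scrapling README is available in this environment: it should download the browsers that the browser-based fetchers look for. The helper name `install_command` is hypothetical and exists only for this illustration; I have not confirmed that this resolves the error above.

```python
import shutil
import subprocess


def install_command() -> list[str]:
    """Return the CLI command that asks Scrapling to download its browsers.

    Assumption: the `scrapling` console script is installed alongside the package.
    """
    return ["scrapling", "install"]


if __name__ == "__main__":
    cmd = install_command()
    exe = shutil.which(cmd[0])  # resolve the console script on PATH
    if exe is None:
        print("scrapling CLI not found on PATH; check the Python Scripts directory")
    else:
        # Run the installer and raise if it exits with a non-zero status.
        subprocess.run([exe, *cmd[1:]], check=True)
```

Even if this works, it still means a second browser download on a machine that already has Playwright browsers installed.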
### Steps To Reproduce
```python
from scrapling import Fetcher

url = "https://car.yiche.com/newcar/salesrank/?saleType=0&date=2026-03-01&energy=6&page=1"

fetcher = Fetcher()
resp = fetcher.get(url, timeout=30)
body = resp.body.decode("utf-8", errors="replace")

print(f"Status: {resp.status}")
print(f"Body length: {len(body)}")

# Detect WAF challenge: the CAPTCHA page embeds the TencentCaptcha script
if "TencentCaptcha" in body:
    print("INTERCEPTED by Tencent WAF")
else:
    print("PASSED, but this is inconsistent across runs")
```
Run the script 3-5 times in a loop. You will see a mix of:
- ~1.7 KB responses with TencentCaptcha in the HTML (blocked)
- ~121 KB responses with real car sales ranking data (passed)
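To make the block rate easier to measure, the repeated runs can be sketched as a small loop that tallies outcomes. This reuses the same `Fetcher().get(...)` call and the `TencentCaptcha` marker from the script above; the helper names (`classify`, `summarize`, `fetch_bodies`), the default of 5 runs, and the 2-second delay are assumptions made for this illustration.

```python
from collections import Counter

# Marker string found in the intercepted challenge page
CAPTCHA_MARKER = "TencentCaptcha"


def classify(body: str) -> str:
    """Label a response body as 'blocked' (challenge page) or 'passed'."""
    return "blocked" if CAPTCHA_MARKER in body else "passed"


def summarize(bodies) -> Counter:
    """Count how many response bodies were blocked versus passed."""
    return Counter(classify(b) for b in bodies)


def fetch_bodies(url: str, runs: int = 5, delay: float = 2.0):
    """Yield the decoded body of `runs` sequential requests to `url`."""
    import time
    from scrapling import Fetcher  # lazy import: helpers above work without Scrapling

    fetcher = Fetcher()
    for i in range(runs):
        resp = fetcher.get(url, timeout=30)
        yield resp.body.decode("utf-8", errors="replace")
        if i < runs - 1:
            time.sleep(delay)  # assumed pause between requests


if __name__ == "__main__":
    url = "https://car.yiche.com/newcar/salesrank/?saleType=0&date=2026-03-01&energy=6&page=1"
    counts = summarize(fetch_bodies(url))
    print(f"blocked={counts['blocked']} passed={counts['passed']}")
```

With only a handful of runs the ratio is noisy, so the ~70% / ~30% split above should be read as a rough observation rather than a measured rate.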