feat!: pluggable HTTP backend – httpx or curl-cffi (#269) #308
Pull request overview
This PR introduces a pluggable HTTP layer for twscrape, replacing direct httpx usage with a unified HttpClient/Response abstraction and adding an optional curl-cffi backend (auto-preferred when installed, or selectable via TWS_HTTP_BACKEND).
Changes:
- Added `twscrape.http` with `HttpxClient`/`CurlClient`, unified `Response`, and backend auto-detection + env override.
- Migrated internal call sites (API/raw paths, login, queue client, xclid, CLI) from `httpx` types to the new wrapper types.
- Updated packaging/docs/tests to support the new backend model (`twscrape[curl]`, new unit tests, removed `pytest-httpx`).
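
The selection rule summarized above (prefer curl-cffi when it is installed, otherwise use httpx, with `TWS_HTTP_BACKEND` as an override) might look roughly like the sketch below. `pick_backend` and its `curl_available` parameter are hypothetical names used for illustration, not the functions in `twscrape.http`. How the real code handles an invalid or unsatisfiable override is not stated in this PR, so raising `ValueError` here is an assumption.

```python
import importlib.util
import os


def pick_backend(env=None, curl_available=None):
    """Return "curl" or "httpx" (hypothetical helper, not twscrape's API)."""
    env = os.environ if env is None else env
    if curl_available is None:
        # Probe for the optional dependency without importing it.
        curl_available = importlib.util.find_spec("curl_cffi") is not None

    forced = env.get("TWS_HTTP_BACKEND", "").strip().lower()
    if forced:
        # Assumption: unknown or unsatisfiable overrides are rejected loudly.
        if forced not in ("httpx", "curl"):
            raise ValueError(f"unknown TWS_HTTP_BACKEND: {forced!r}")
        if forced == "curl" and not curl_available:
            raise ValueError("TWS_HTTP_BACKEND=curl but curl_cffi is not installed")
        return forced

    # No override: prefer curl when present, otherwise fall back to httpx.
    return "curl" if curl_available else "httpx"
```

Passing `env` and `curl_available` explicitly keeps the rule testable without touching the real environment or the installed packages.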
Reviewed changes
Copilot reviewed 18 out of 19 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| uv.lock | Adds curl-cffi (and deps), removes pytest-httpx, updates extras/dev deps accordingly. |
| pyproject.toml | Defines curl extra, updates dev dependency group, adjusts pyright include. |
| readme.md | Documents the new optional curl backend and TWS_HTTP_BACKEND. |
| twscrape/http.py | New HTTP abstraction layer + backend detection/selection. |
| twscrape/account.py | Switches account client creation to make_client() and unified headers/cookies setup. |
| twscrape/queue_client.py | Replaces httpx exceptions/types with twscrape.http equivalents. |
| twscrape/login.py | Switches login flow to use HttpClient/Response. |
| twscrape/models.py | Updates parsing helpers to accept twscrape.Response. |
| twscrape/xclid.py | Switches page/script fetches to HttpClient. |
| twscrape/api.py | Updates raw response type to twscrape.Response. |
| twscrape/accounts_pool.py | Updates status error handling to use HttpStatusError. |
| twscrape/cli.py | Updates --raw printing path to accept twscrape.Response. |
| scripts/update_gql_ops.py | Switches script downloader to make_client() for pluggable backend support. |
| tests/mock_http.py | Adds a small HttpClient mock for deterministic tests without pytest-httpx. |
| tests/conftest.py | Monkeypatches Account.make_client() to use MockClient, adjusts log level. |
| tests/test_http.py | Adds coverage for both backends + backend detection + wrapper behavior. |
| tests/test_queue_client.py | Reworks tests to use MockClient and adds many additional branch tests. |
| tests/test_pool.py | Adds coverage for pool maintenance behaviors and no-account error paths. |
| tests/test_utils.py | Adds coverage for get_env_bool and new-schema flattening via to_old_obj. |
@vladkens I would be happy to test this work in my project. Could you please rebase and squash your commits so I can pick them more easily? Thank you!
@Flaburgan Wow, thanks! I merged this PR into `main`. Could you pull the latest `main`?

If you use uv, you can install it from the main branch with: `uv add "twscrape @ git+https://github.com/vladkens/twscrape.git@main"`

Or with pip: `pip install -U "git+https://github.com/vladkens/twscrape.git@main"`

If you have any comments or want to discuss anything, you can also ping me on Telegram or Discord: @vladkens. Totally optional, just if it is convenient for you.
Summary
Introduces a pluggable HTTP backend layer (`twscrape/http.py`) so the library can use either `httpx` (existing default) or `curl-cffi` for requests. `curl-cffi` uses libcurl with browser-level TLS fingerprint spoofing, which helps bypass Cloudflare bot detection.

- `pip install twscrape`: works as before (httpx)
- `pip install twscrape[curl]`: enables curl-cffi, preferred automatically when present
- `TWS_HTTP_BACKEND=httpx|curl`: force a specific backend

Changes
- New `twscrape/http.py`, a unified `HttpClient` interface wrapping both backends:
  - `HttpxClient` / `CurlClient`: backend implementations
  - `Response`: thin wrapper normalising `httpx.Response` and `curl_cffi.Response` to one API
  - `_detect_backend()`: auto-selects curl if installed, falls back to httpx
  - `ConnectError`, `NetworkError`, `HttpStatusError`
- `account.py`: `make_client()` now returns `HttpClient` instead of `httpx.AsyncClient`
- `queue_client.py`: replaced direct `httpx` error types with `ConnectError` / `NetworkError` from the new abstraction

Breaking change
Raw API methods (e.g. `search_raw`) now return `twscrape.Response` instead of `httpx.Response`. The interface is compatible (`.status_code`, `.json()`, `.text`, `.headers`), but direct `isinstance(rep, httpx.Response)` checks will break.

Tests
- `tests/test_http.py` (new): full coverage of both backends, all error-mapping paths, `_detect_backend`, `make_client`
- `tests/mock_http.py` (new): `MockClient` for integration tests without a real HTTP server
- `test_queue_client.py`: added branches: error code 131, `_Missing`, `Authorization` passthrough, unknown error, unhandled 5xx, JSON decode fallback, 404 retry, unknown-exception retry
- `test_pool.py`: `delete_accounts`, `reset_locks`, `mark_inactive`, `next_available_at`, `accounts_info` sorting, `load_from_file`, `get_for_queue_or_wait` raise-on-no-account

`http.py`: 100% | `queue_client.py`: 69→83% | `accounts_pool.py`: 55→76%
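
For callers affected by the breaking change described above, one option is to stop gating on `isinstance(rep, httpx.Response)` and rely only on the attributes the PR lists as compatible (`.status_code`, `.json()`, `.text`, `.headers`). The sketch below is an illustration under that assumption: `FakeResponse` is a stand-in class and `extract_json` a hypothetical helper, neither of which is part of twscrape, and the real `twscrape.Response` may differ in detail.

```python
import json
from dataclasses import dataclass, field


@dataclass
class FakeResponse:
    """Stand-in exposing only the attributes the PR lists as compatible."""

    status_code: int
    text: str
    headers: dict = field(default_factory=dict)

    def json(self):
        # Parse the body on demand, mirroring the usual .json() accessor.
        return json.loads(self.text)


def extract_json(rep):
    """Hypothetical helper: duck-typed, no isinstance check on the response."""
    if rep.status_code != 200:
        return None
    return rep.json()


rep = FakeResponse(status_code=200, text='{"ok": true}')
data = extract_json(rep)
```

Because the helper touches only the shared attributes, it should behave the same whichever backend produced the response.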