Update dependency crawl4ai to v0.9.0 #258
Open
renovate[bot] wants to merge 1 commit into
This PR contains the following updates:
==0.8.6 → ==0.9.0
Release Notes
unclecode/crawl4ai (crawl4ai)
v0.9.0
0.9.0 is a major, secure-by-default release of the Crawl4AI Docker API server. The out-of-the-box deployment is now hardened with defense in depth: authentication is on by default, the server binds to loopback unless you give it a token, and the network request body is treated as untrusted input crossing a trust boundary. This release contains breaking changes for the self-hosted HTTP server only. The core pip library (SDK / in-process use) is unchanged.
What changed: the Docker server moved from an open, trust-the-caller posture to a closed, secure-by-default one. Defaults that used to be permissive (open bind, no auth, request-supplied browser internals, TLS verification off, Redis with no password) are now safe by default and gated behind explicit configuration.
What you must do: set `CRAWL4AI_API_TOKEN` and re-issue any tokens, then review whether you relied on any of the request fields or features that are now configured server-side. Most plain "crawl these URLs" users only need the two steps in the "Everyone" section of the migration guide. The full guide is at `deploy/docker/MIGRATION.md`.
Security
This release completes the secure-by-default hardening of the Docker API server begun in 0.8.7 and 0.8.8. It moves the worst remaining issues from mitigation to architecture: unauthenticated access and request-supplied code/config are eliminated by design rather than patched in place. Every change is hardening; users self-hosting the Docker server should upgrade and follow the migration guide.
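As a rough client-side sketch of the token requirement in this release, the snippet below builds an authenticated request with Python's standard library. Only the `CRAWL4AI_API_TOKEN` variable and the `Authorization: Bearer <token>` header come from the release notes; the base URL, the `{"urls": [...]}` payload shape, and the helper name `build_crawl_request` are assumptions made for illustration.

```python
import json
import os
import urllib.request

# Hypothetical address; adjust to wherever your server actually listens.
BASE_URL = "http://127.0.0.1:11235"


def build_crawl_request(urls, token, base_url=BASE_URL):
    """Build (but do not send) a POST /crawl request carrying a bearer token."""
    if not token:
        raise ValueError("a non-empty API token is required")
    # The payload shape is an assumption, not taken from the release notes.
    body = json.dumps({"urls": list(urls)}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/crawl",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    token = os.environ.get("CRAWL4AI_API_TOKEN", "")
    if token:
        req = build_crawl_request(["https://example.com"], token)
        print(req.get_method(), req.full_url)
```

Building the request separately from sending it keeps the header logic easy to check; passing the result to `urllib.request.urlopen` would perform the actual call.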
- `0.0.0.0`. With no token it binds `127.0.0.1` and prints a one-off local token; exposing it requires `CRAWL4AI_API_TOKEN` and `Authorization: Bearer <token>` on every request except `GET /health`.
- `O_NOFOLLOW`, closing a path-traversal-to-file-write class. Credit: Y4tacker.
- `/crawl/stream` and `/crawl` with `stream=true` now validate the destination and return HTTP 400 for disallowed targets, matching the non-streaming handlers. Credit: KOH Jun Sheng.
- `browser_config.extra_args` rejected (CWE-94): launch arguments can no longer be supplied over the network, closing a Chromium launch-arg injection class. Credit: Y4tacker, UDU_RisePho (hoanggxyuuki).
All reporters are credited in `SECURITY-CREDITS.md`. GitHub Security Advisories accompany this release.
Breaking Changes
These apply to the self-hosted Docker API server only. The pip library is unaffected. See `deploy/docker/MIGRATION.md` for the step-by-step migration and `deploy/docker/SECURITY-VERIFY.md` for the deployment checklist.
- `CRAWL4AI_API_TOKEN` and send `Authorization: Bearer <token>`. With no token the server binds loopback only.
- `0.0.0.0` without a token; put a TLS-terminating reverse proxy in front when you expose it.
- `POST /token`.
- `js_code`, `js_code_before_wait`, `c4a_script`, `proxy`/`proxy_config`, `extra_args`, `user_data_dir`, `cdp_url`, `cookies`, `headers`, `init_scripts`, `base_url`, `deep_crawl_strategy`, `simulate_user`, `magic`, `process_in_browser`, and nested LLM config objects are rejected with HTTP 400 when sent over the network. Configure them server-side or use the in-process SDK. Unknown fields are dropped; timeouts, viewport, and scroll counts are clamped.
- `hooks.code` is replaced by a fixed action set (`block_resources`, `add_cookies`, `set_headers`, `scroll_to_bottom`, `wait_for_timeout`). See `GET /hooks/info`.
- `output_path` removed, replaced by an artifact id: `/screenshot` and `/pdf` store the result and return `artifact_id` + URL; fetch via authenticated `GET /artifacts/{artifact_id}` (TTL and quota apply).
- `base_url` removed: `/md`, `/llm`, and `/llm/job` select a provider by name only; endpoint and key are configured server-side and constrained by `config.llm.allowed_providers`.
- `POST /monitor/actions/*` and `/monitor/stats/reset` need an admin-scope principal.
- `security.cors_allow_origins`.
- `CRAWL4AI_ALLOW_INSECURE_TLS=true`, `CRAWL4AI_ALLOW_INTERNAL_URLS=true`.
- `REDIS_PASSWORD`.
- `0` = unbounded).
- `{"error": "Internal server error", "correlation_id": "…"}`; match the id in the logs for detail.
Security Credits
Y4tacker, KOH Jun Sheng, and UDU_RisePho (hoanggxyuuki). See `SECURITY-CREDITS.md`.
v0.8.9
0.8.9 is a follow-up, backward-compatible security patch for the self-hosted Docker API server, closing an SSRF path that 0.8.8 did not cover. Upgrade in place; no configuration changes required.
Security
A security advisory accompanies this release.
- `/crawl`, `/crawl/stream`, or `/crawl/job` request could set `browser_config.proxy_config.server` (or the deprecated `browser_config.proxy`, or `crawler_config.proxy_config`, or a `--proxy-server` / `--host-resolver-rules` flag in `extra_args`) to an internal address and route the browser through it, reaching internal services and cloud-metadata endpoints. All proxy destinations are now validated with the same global-routability check before the browser is built, and proxy/DNS-redirecting flags are stripped from `extra_args`. A legitimate public proxy still works. Credit: Geo (geo-chen).
Backward compatible. Note: raw `--proxy-server` / `--host-resolver-rules` / `--proxy-bypass-list` / `--proxy-pac-url` flags passed via `extra_args` are now ignored; configure proxies through `proxy_config` (which is validated).
v0.8.8
0.8.8 is a focused, backward-compatible security patch for the self-hosted Docker API server. Upgrade in place; no configuration changes are required. If you run the Docker server, upgrade. If it is exposed to a network, also set `CRAWL4AI_API_TOKEN`.
Security
Security advisories accompany this release.
- `64:ff9b::/96`, 6to4 `2002::/16`, IPv4-mapped, and the unspecified `::`), which previously bypassed the explicit blocklist and could reach internal services and cloud-metadata endpoints. SSRF errors no longer echo the resolved address. Credit: internal security audit.
- `output_path` hardened (CWE-59/22): `/screenshot` and `/pdf` now resolve symlinks and re-check containment before writing, and write with `O_NOFOLLOW`, closing a symlink/TOCTOU bypass of the directory restriction. `output_path` behavior is unchanged for normal use. Credit: internal security audit.
- `/md`, `/llm`, `/llm/job`) ignore a request-supplied `base_url`, so the configured provider key can no longer be redirected to an attacker endpoint. `LLMConfig` additionally refuses to resolve protected environment variables via the `env:` token form. The `base_url` field is still accepted but no longer honored. Credit: Geo (geo-chen); the `env:` hardening from internal security audit.
All changes are backward compatible.
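To illustrate why the IPv6 forms named above matter for SSRF checks, here is a minimal sketch that unwraps an embedded IPv4 address before testing global routability. It uses only Python's standard `ipaddress` module. The function names are hypothetical and this is not the project's actual validator; a real check would also resolve hostnames and validate every resolved address.

```python
import ipaddress

# NAT64 well-known prefix; the embedded IPv4 address sits in the low 32 bits.
NAT64_PREFIX = ipaddress.ip_network("64:ff9b::/96")


def unwrap_embedded_ipv4(ip):
    """Return the IPv4 address embedded in an IPv6 address, or None."""
    if ip.version != 6:
        return None
    if ip.ipv4_mapped is not None:   # ::ffff:a.b.c.d
        return ip.ipv4_mapped
    if ip.sixtofour is not None:     # 2002::/16 (6to4)
        return ip.sixtofour
    if ip in NAT64_PREFIX:           # 64:ff9b::/96 (NAT64)
        return ipaddress.IPv4Address(int(ip) & 0xFFFFFFFF)
    return None


def is_globally_routable(literal):
    """True only if the IP literal (and any embedded IPv4) is globally routable."""
    ip = ipaddress.ip_address(literal.strip("[]"))
    inner = unwrap_embedded_ipv4(ip)
    if inner is not None:
        ip = inner
    return ip.is_global


if __name__ == "__main__":
    for candidate in ("8.8.8.8", "[::ffff:169.254.169.254]", "64:ff9b::a9fe:a9fe", "::"):
        print(candidate, is_globally_routable(candidate))
```

Checking the unwrapped address rather than the literal is the point: a blocklist that only compares the textual IPv6 form would let the link-local metadata address through in each of the wrapped cases.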
Coming next: secure-by-default Docker server (~1-2 weeks)
The next release is a larger, secure-by-default update for the self-hosted Docker API server, with intentional breaking changes. We are giving advance notice so you can prepare. If you run the Docker server, start planning now and test in staging before upgrading:
- `CRAWL4AI_API_TOKEN`) is configured.
- `/screenshot` and `/pdf` return an artifact id instead of a file path, and the LLM endpoint is selected by provider name.
A full migration guide will accompany the pre-announcement on Discord and X.
v0.8.7
0.8.7 is a security-hardening release. It bundles every responsibly-disclosed vulnerability patched since 0.8.6, plus the new DomainMapper feature and a batch of scraping, deep-crawl, and LLM fixes.
Security
This release fixes multiple critical vulnerabilities in the Docker API server. If you self-host the Docker API, upgrade immediately. Two GitHub Security Advisories accompany this release.
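One of the fixes in this release restricts `output_path` writes to a configured directory and rejects `..` traversal. A minimal sketch of that kind of containment check follows, assuming a POSIX system. The function name `open_output_file` is hypothetical, and the `O_NOFOLLOW` flag reflects the later 0.8.8 hardening rather than anything stated for 0.8.7; this is not the project's actual code.

```python
import os


def open_output_file(output_dir, requested_path):
    """Open requested_path for writing only if it resolves inside output_dir."""
    base = os.path.realpath(output_dir)
    # realpath collapses ".." segments and resolves symlinks before the check.
    candidate = os.path.realpath(os.path.join(base, requested_path))
    if os.path.commonpath([base, candidate]) != base:
        raise PermissionError(f"path escapes output directory: {requested_path!r}")
    os.makedirs(os.path.dirname(candidate), exist_ok=True)
    # O_NOFOLLOW refuses to open a final component that is a symlink, which
    # narrows the window between the check above and the open below.
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | getattr(os, "O_NOFOLLOW", 0)
    fd = os.open(candidate, flags, 0o600)
    return os.fdopen(fd, "wb")


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as out_dir:
        with open_output_file(out_dir, "shots/page.png") as handle:
            handle.write(b"example bytes")
        try:
            open_output_file(out_dir, "../outside.png")
        except PermissionError as exc:
            print("rejected:", exc)
```

Comparing resolved paths with `os.path.commonpath` avoids the prefix-matching mistake where `/data/out-evil` would pass a naive `startswith("/data/out")` test.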
- `gi_frame.f_back` frame-chain escape in the computed-field `eval()` path. Removed `eval()` from computed fields entirely and deleted `_safe_eval_expression`. Credit: Song Binglin (q1uf3ng).
- `asyncio`, `json`, `re`) carried a full `__builtins__`, bypassing the `__import__` block. Stripped injected builtins and removed dangerous allowlist entries. Credit: by111 (August829).
- `"mysecret"` allowed token forgery. Removed the default, reject weak/short secrets, and auto-generate an ephemeral key when JWT is enabled with no key set. Credit: by111 (August829).
- `output_path` (CVSS 9.1, CWE-22): `/screenshot` and `/pdf` wrote to any path. Restricted writes to `CRAWL4AI_OUTPUT_DIR` and reject `..` traversal. Credit: Jeongbean Jeon, wulonchia.
- `/crawl/job` and `/llm/job` could reach internal and cloud-metadata IPs. Added a blocklist and `follow_redirects=False`. Credit: Jeongbean Jeon.
- `/crawl`, `/md`, and `/llm` fetched arbitrary URLs, and IPv6-mapped IPv4 addresses (`[::ffff:169.254.169.254]`) bypassed naive checks. Added destination validation on all entry points and normalize IPv6-mapped IPv4 before the blocklist check. Credit: secsys_codex, Velayutham Selvaraj, IcySun.
- `/execute_js` (CVSS 8.1, CWE-94): disabled by default via `CRAWL4AI_EXECUTE_JS_ENABLED`, removed `--disable-web-security` from default browser args, and added an SSRF blocklist on the destination. Credit: by111 (August829).
- `/monitor/*` routes, including destructive actions, were unauthenticated. Added `token_dep` to the router and an explicit token check on the WebSocket endpoint. Credit: Jeongbean Jeon.
- `innerHTML` without escaping. Added server-side `html.escape()` and a client-side `escapeHtml()` wrapper. Credit: Jeongbean Jeon.
- `/config/dump`: replaced with JSON input validated by Pydantic.
- `markdown_generator` type in `CrawlerRunConfig` to reject malformed JSON (#1880).
Added
- `include_subdomains` flag and a per-source timeout.
Fixed
- `rowspan`/`colspan` in cleaned HTML (#1920).
- `tail` text when removing empty elements (#1938)
- `NlpSentenceChunking` (#1909)
- `set(False)` instead of `reset(token)` (#1917)
- `semaphore_count` into the auto-created `MemoryAdaptiveDispatcher` and default it to 10 (#1927)
- `LLMExtractionStrategy.extraction_type` to schema
- `LLMTableExtraction` to the Docker deserialization allowlist
- `success=True` for binary downloads and skip the block check when `downloaded_files` is set
- `<base href>` in prefetch `quick_extract_links` (#752)
- `AsyncLogger` output to stderr by default (#1968) and use `Console(width=200)` for non-TTY contexts
- `ensure_ascii=False` in the MCP bridge to preserve CJK characters (#1967)
- `browser_adapter` now uses the `Stealth` import, fixing a stealth import mismatch (#1960)
- `arun()` return type to `CrawlResultContainer` (#1898)
Docs
Security Credits
Song Binglin (q1uf3ng), by111 (August829), Jeongbean Jeon, wulonchia, secsys_codex, Velayutham Selvaraj, and IcySun. See `SECURITY-CREDITS.md`.
Configuration
📅 Schedule: (UTC)
🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.
♻ Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.
🔕 Ignore: Close this PR and you won't be reminded about this update again.
This PR was generated by Mend Renovate. View the repository job log.