fix(adagents): pin DNS to close rebinding TOCTOU on SSRF gate#920
Open
bokelley wants to merge 1 commit into
Open
fix(adagents): pin DNS to close rebinding TOCTOU on SSRF gate#920bokelley wants to merge 1 commit into
bokelley wants to merge 1 commit into
Conversation
The three publisher-controlled fetch paths in adagents.py ran the _dns_validate_host resolve-and-gate pre-check, then built a plain httpx.AsyncClient() that re-resolved the host at connect time — leaving the DNS-rebinding window the pre-check docstring admits. Thread build_async_ip_pinned_transport into the SDK-owned AsyncClient construction sites (ads.txt MANAGERDOMAIN fetch, adagents.json stream, AAO directory fetch) via a new _owned_pinned_client helper. The client now pins to the IP the SSRF gate validated, so pre-check and connect observe the same resolution. SSRFValidationError is mapped onto AdagentsValidationError. Injected-client branches keep _dns_validate_host as their only guard since the SDK does not own that transport. Closes #757 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Contributor
There was a problem hiding this comment.
Closes the rebinding TOCTOU the _dns_validate_host docstring openly admitted to. Right fix: the pre-check and the connect now observe the same resolution because the connect is pinned to the validated IP, instead of two independent getaddrinfo calls a hostile resolver can split.
Things I checked
_owned_pinned_client(src/adcp/adagents.py:272) resolves once viabuild_async_ip_pinned_transport→resolve_and_validate_host, mapsSSRFValidationError→AdagentsValidationError, and setstrust_env=Falseso anHTTPS_PROXYcan't route around the pinned backend.SSRFValidationErroris the only exceptionresolve_and_validate_hostraises, so the map is total.- No resource leak on the fail-closed path:
build_async_ip_pinned_transportruns beforehttpx.AsyncClient(...), so on the SSRF raise there is no client to close; on success the caller'sasync withowns it. - Single-host pin is compatible with all three sites — every one uses
follow_redirects=False(ads.txt path explicitly atadagents.py:716/721; the two stream paths via_stream_cappedatadagents.py:1166). No cross-host hop, so the backend's wrong-host fail-closed (ip_pinned_transport.py:186-191) never trips.authoritative_locationdelegation re-enters_fetch_adagents_urland builds a fresh pinned client per hop. - Fail-closed posture is correct and the asymmetry is deliberate: best-effort ads.txt swallows
AdagentsValidationError→[]("no MANAGERDOMAIN," same as its existingRequestErrorhandling); the two primary paths let it propagate.AdagentsValidationErroris already a documented raise offetch_adagents, so the public contract is unchanged. - Scope is right:
detect_publisher_properties_divergencereuses one shared client across many publisher hosts, so a single-host pin can't apply — correctly left on_dns_validate_host, matching the PR body. - IPv6 / mixed-A-record handled upstream —
resolve_and_validate_hostfails closed on any reserved address in the result set and unwraps IPv4-mapped IPv6 before range checks (jwks.py).allowed_ports=Nonematches pre-existing behavior, no port-filter regression. security-reviewer: sound, Low residual only.code-reviewer: clean, no findings. Both confirmed the pin closes the connect-time re-resolution window with no remaining gap on SDK-owned paths.- No public-API change — new export is a private
_-prefixed helper, the rest is internal client-construction swaps.fix(adagents):is the correct conventional-commit prefix for a non-breaking fix.
Follow-ups (non-blocking — file as issues)
- Injected-client branches stay on
_dns_validate_hostonly — the SDK can't pin a transport it didn't build, so this is the correct scoping and it's documented in the helper docstring. Worth exposingbuild_async_ip_pinned_transportin adopter docs so injected-client users can opt into pinning. Defense-in-depth, not required here. - The
_dns_validate_hostpre-check is now redundant for the three SDK-owned paths (the pin's resolution is the load-bearing one). Harmless, but a candidate for a future cleanup once the injected paths are addressed.
Two tests pin exactly the right invariant — the rebinding test asserts every connect targets the validated public IP and the loopback flip is never reached, which is the behavior the issue is about. Real fix, clean execution.
Safe to merge.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #757
Problem
The SSRF gate added in #760 wired
_dns_validate_host(a resolve-and-gate pre-check) into all three publisher-controlled fetch paths inadagents.py. But each path then built a plainhttpx.AsyncClient()that re-resolves the host at connect time. A hostile resolver can return a public IP during the pre-check and a private/loopback IP milliseconds later at connect — the DNS-rebinding window the pre-check docstring itself admits.Fix
A new
_owned_pinned_clienthelper builds anhttpx.AsyncClientbacked byAsyncIpPinnedTransport(the existing, tested primitive fromadcp.signing.ip_pinned_transport). It resolves the host once viaresolve_and_validate_hostand pins httpx's connect to that validated IP, so the pre-check and the connect observe the same resolution.SSRFValidationErroris mapped ontoAdagentsValidationError.trust_env=Falseprevents anHTTPS_PROXYfrom routing around the pinned backend.The helper is threaded into the three SDK-owned
AsyncClient()sites only:_fetch_ads_txt_managerdomains(ads.txt MANAGERDOMAIN fallback)_fetch_adagents_url(adagents.json stream)fetch_agent_authorizations_from_directory(AAO directory fetch)Injected-client branches are left unchanged — there the SDK does not own the transport, so
_dns_validate_hostremains the only available guard (this includesdetect_publisher_properties_divergence, whose shared client is reused across many publisher hosts and so cannot use a single-host pin).The best-effort ads.txt fallback swallows a late SSRF failure as "no MANAGERDOMAIN found" (consistent with its existing network-error handling); the two primary fetch paths let
AdagentsValidationErrorpropagate (fail closed).Tests
test_rebinding_after_precheck_connects_to_pinned_public_ip: resolver returns a public IP during validation then flips to loopback; asserts every TCP connect targets the pinned public IP and the loopback address is never reached.test_public_target_resolves_and_pins_then_serves_body: a legitimate public target still resolves, pins the public IP, and serves/parses the body end-to-end.All 204 tests in
tests/test_adagents.pypass;ruff,mypy, andblackclean on the changed files.🤖 Generated with Claude Code