
fix(adagents): pin DNS to close rebinding TOCTOU on SSRF gate #920

Open
bokelley wants to merge 1 commit into main from claude/issue-757-adagents-dns-pinning



bokelley commented Jun 6, 2026

Closes #757

Problem

The SSRF gate added in #760 wired _dns_validate_host (a resolve-and-gate pre-check) into all three publisher-controlled fetch paths in adagents.py. But each path then built a plain httpx.AsyncClient() that re-resolves the host at connect time. A hostile resolver can return a public IP during the pre-check and a private/loopback IP milliseconds later at connect — the DNS-rebinding window the pre-check docstring itself admits.

Fix

A new _owned_pinned_client helper builds an httpx.AsyncClient backed by AsyncIpPinnedTransport (the existing, tested primitive from adcp.signing.ip_pinned_transport). It resolves the host once via resolve_and_validate_host and pins httpx's connect to that validated IP, so the pre-check and the connect observe the same resolution. SSRFValidationError is mapped onto AdagentsValidationError. trust_env=False prevents an HTTPS_PROXY from routing around the pinned backend.
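A minimal, stdlib-only sketch of the ordering difference the fix makes, assuming the resolver and the connect step can be swapped for test doubles. FlippingResolver, naive_fetch, pinned_fetch and the local SSRFError are hypothetical names for illustration, not adcp APIs, and `ipaddress.is_global` stands in for whatever range policy resolve_and_validate_host actually applies. The naive path resolves twice, so the second answer decides where the connection goes; the pinned path resolves once and connects to the address it validated. In the real fix that pin lives inside httpx's connect via AsyncIpPinnedTransport; this sketch models only the ordering.

```python
import ipaddress
from typing import Callable, List


class SSRFError(Exception):
    """Hypothetical local stand-in for adcp's SSRFValidationError."""


class FlippingResolver:
    """Hypothetical hostile resolver: `first` on the first lookup, `later` afterwards."""

    def __init__(self, first: str, later: str) -> None:
        self.first = first
        self.later = later
        self.calls = 0

    def __call__(self, host: str) -> str:
        self.calls += 1
        return self.first if self.calls == 1 else self.later


def validate(ip: str) -> str:
    # Assumed policy: only globally routable addresses pass the gate.
    if not ipaddress.ip_address(ip).is_global:
        raise SSRFError(f"blocked non-public address {ip}")
    return ip


Resolver = Callable[[str], str]
Connector = Callable[[str], str]


def naive_fetch(host: str, resolve: Resolver, connect: Connector) -> str:
    validate(resolve(host))        # pre-check sees the first answer...
    return connect(resolve(host))  # ...but connect re-resolves: the TOCTOU window


def pinned_fetch(host: str, resolve: Resolver, connect: Connector) -> str:
    ip = validate(resolve(host))  # resolve exactly once
    return connect(ip)            # connect to the validated IP, never re-resolve


targets: List[str] = []


def record_connect(ip: str) -> str:
    # Test double for a TCP connect: remember where the client was sent.
    targets.append(ip)
    return ip


naive_target = naive_fetch(
    "publisher.example", FlippingResolver("93.184.216.34", "127.0.0.1"), record_connect
)
pinned_target = pinned_fetch(
    "publisher.example", FlippingResolver("93.184.216.34", "127.0.0.1"), record_connect
)
print(naive_target, pinned_target)  # 127.0.0.1 93.184.216.34
```

The naive path passes the gate and still lands on loopback; the pinned path can only ever connect to an address that `validate` accepted.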

The helper is threaded into the three SDK-owned AsyncClient() sites only:

  • _fetch_ads_txt_managerdomains (ads.txt MANAGERDOMAIN fallback)
  • _fetch_adagents_url (adagents.json stream)
  • fetch_agent_authorizations_from_directory (AAO directory fetch)

Injected-client branches are left unchanged — there the SDK does not own the transport, so _dns_validate_host remains the only available guard (this includes detect_publisher_properties_divergence, whose shared client is reused across many publisher hosts and so cannot use a single-host pin).
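Why a client shared across many publisher hosts cannot take a single-host pin can be sketched with a toy backend. The review on this PR notes that the real pinned backend fails closed when asked to reach a host other than the one it pinned (ip_pinned_transport.py:186-191); PinnedBackend and WrongHostError below are hypothetical stand-ins that model only that rule, not the adcp transport.

```python
from dataclasses import dataclass


class WrongHostError(Exception):
    """Hypothetical: a pinned backend was asked to reach a host it did not validate."""


@dataclass(frozen=True)
class PinnedBackend:
    host: str  # the single host whose resolution was validated
    ip: str    # the validated IP every connect is pinned to

    def connect(self, host: str) -> str:
        # Fail closed: this backend holds a validated IP for exactly one host.
        if host.lower() != self.host.lower():
            raise WrongHostError(f"pinned to {self.host!r}, refused {host!r}")
        return self.ip


backend = PinnedBackend("publisher-a.example", "93.184.216.34")
print(backend.connect("publisher-a.example"))  # 93.184.216.34
try:
    # A client shared across publishers would hit this on its second host.
    backend.connect("publisher-b.example")
except WrongHostError as exc:
    print(exc)
```

The SDK-owned paths avoid this by building a fresh pinned client per fetch (including per authoritative_location hop), which a caller-supplied shared client does not do.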

The best-effort ads.txt fallback swallows a late SSRF failure as "no MANAGERDOMAIN found" (consistent with its existing network-error handling); the two primary fetch paths let AdagentsValidationError propagate (fail closed).
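The asymmetry can be sketched with local stand-ins. The two exception classes below only share names with the SDK's; pin_or_raise, managerdomains_best_effort and adagents_primary are hypothetical simplifications of the helper's error mapping and of the two kinds of call site, and the `fetch` callables replace real HTTP. The real ads.txt fallback also swallows ordinary network errors, which is omitted here.

```python
from typing import Callable, List


class SSRFValidationError(Exception):
    """Local stand-in for adcp's SSRFValidationError (not imported)."""


class AdagentsValidationError(Exception):
    """Local stand-in for adcp's AdagentsValidationError (not imported)."""


def pin_or_raise(host: str, resolve_and_validate: Callable[[str], str]) -> str:
    # Mirror the helper's mapping: SSRF rejections surface as AdagentsValidationError.
    try:
        return resolve_and_validate(host)
    except SSRFValidationError as exc:
        raise AdagentsValidationError(f"{host}: {exc}") from exc


def managerdomains_best_effort(
    host: str,
    resolve_and_validate: Callable[[str], str],
    fetch: Callable[[str], List[str]],
) -> List[str]:
    # Best-effort fallback: a late SSRF rejection reads as "no MANAGERDOMAIN found".
    try:
        ip = pin_or_raise(host, resolve_and_validate)
    except AdagentsValidationError:
        return []
    return fetch(ip)


def adagents_primary(
    host: str,
    resolve_and_validate: Callable[[str], str],
    fetch: Callable[[str], dict],
) -> dict:
    # Primary path: no try/except, so the validation error propagates (fail closed).
    return fetch(pin_or_raise(host, resolve_and_validate))


def reject(host: str) -> str:
    # Hypothetical validator standing in for a resolver that returned loopback.
    raise SSRFValidationError("resolved to 127.0.0.1")


print(managerdomains_best_effort("evil.example", reject, lambda ip: ["unreachable"]))  # []
```

Calling `adagents_primary("evil.example", reject, ...)` instead raises AdagentsValidationError with the original SSRFValidationError as its `__cause__`.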

Tests

  • test_rebinding_after_precheck_connects_to_pinned_public_ip: resolver returns a public IP during validation then flips to loopback; asserts every TCP connect targets the pinned public IP and the loopback address is never reached.
  • test_public_target_resolves_and_pins_then_serves_body: a legitimate public target still resolves, pins the public IP, and serves/parses the body end-to-end.

All 204 tests in tests/test_adagents.py pass; ruff, mypy, and black clean on the changed files.

🤖 Generated with Claude Code

The three publisher-controlled fetch paths in adagents.py ran the
_dns_validate_host resolve-and-gate pre-check, then built a plain
httpx.AsyncClient() that re-resolved the host at connect time — leaving
the DNS-rebinding window the pre-check docstring admits.

Thread build_async_ip_pinned_transport into the SDK-owned AsyncClient
construction sites (ads.txt MANAGERDOMAIN fetch, adagents.json stream,
AAO directory fetch) via a new _owned_pinned_client helper. The client
now pins to the IP the SSRF gate validated, so pre-check and connect
observe the same resolution. SSRFValidationError is mapped onto
AdagentsValidationError. Injected-client branches keep _dns_validate_host
as their only guard since the SDK does not own that transport.

Closes #757

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
bokelley marked this pull request as ready for review June 6, 2026 19:12

aao-ipr-bot (bot) left a comment

Closes the rebinding TOCTOU the _dns_validate_host docstring openly admitted to. Right fix: the pre-check and the connect now observe the same resolution because the connect is pinned to the validated IP, instead of two independent getaddrinfo calls a hostile resolver can split.

Things I checked

  • _owned_pinned_client (src/adcp/adagents.py:272) resolves once via build_async_ip_pinned_transport → resolve_and_validate_host, maps SSRFValidationError → AdagentsValidationError, and sets trust_env=False so an HTTPS_PROXY can't route around the pinned backend. SSRFValidationError is the only exception resolve_and_validate_host raises, so the mapping is total.
  • No resource leak on the fail-closed path: build_async_ip_pinned_transport runs before httpx.AsyncClient(...), so on the SSRF raise there is no client to close; on success the caller's async with owns it.
  • Single-host pin is compatible with all three sites — every one uses follow_redirects=False (ads.txt path explicitly at adagents.py:716/721; the two stream paths via _stream_capped at adagents.py:1166). No cross-host hop, so the backend's wrong-host fail-closed (ip_pinned_transport.py:186-191) never trips. authoritative_location delegation re-enters _fetch_adagents_url and builds a fresh pinned client per hop.
  • Fail-closed posture is correct and the asymmetry is deliberate: best-effort ads.txt maps AdagentsValidationError → [] ("no MANAGERDOMAIN", same as its existing RequestError handling); the two primary paths let it propagate. AdagentsValidationError is already a documented raise of fetch_adagents, so the public contract is unchanged.
  • Scope is right: detect_publisher_properties_divergence reuses one shared client across many publisher hosts, so a single-host pin can't apply — correctly left on _dns_validate_host, matching the PR body.
  • IPv6 / mixed-A-record handled upstream — resolve_and_validate_host fails closed on any reserved address in the result set and unwraps IPv4-mapped IPv6 before range checks (jwks.py). allowed_ports=None matches pre-existing behavior, no port-filter regression.
  • security-reviewer: sound, Low residual only. code-reviewer: clean, no findings. Both confirmed the pin closes the connect-time re-resolution window with no remaining gap on SDK-owned paths.
  • No public-API change — new export is a private _-prefixed helper, the rest is internal client-construction swaps. fix(adagents): is the correct conventional-commit prefix for a non-breaking fix.
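The IPv6 / mixed-A-record point rests on two rules: reject the whole result set if any address is reserved, and judge an IPv4-mapped IPv6 address by the IPv4 address it embeds. A stdlib sketch of just those two rules follows, assuming `is_global` as the "not reserved" test and assuming the first address is the one pinned; effective_address and validate_result_set are hypothetical names, not the jwks.py implementation.

```python
import ipaddress
from typing import Iterable, List, Union

IPAddress = Union[ipaddress.IPv4Address, ipaddress.IPv6Address]


class SSRFValidationError(Exception):
    """Local stand-in; not adcp's class."""


def effective_address(addr: str) -> IPAddress:
    ip = ipaddress.ip_address(addr)
    # Judge ::ffff:127.0.0.1 as the IPv4 loopback it actually reaches.
    if isinstance(ip, ipaddress.IPv6Address) and ip.ipv4_mapped is not None:
        return ip.ipv4_mapped
    return ip


def validate_result_set(addresses: Iterable[str]) -> str:
    ips: List[IPAddress] = [effective_address(a) for a in addresses]
    if not ips:
        raise SSRFValidationError("resolver returned no addresses")
    # Fail closed if ANY record is non-public, not just the one we would pick.
    for ip in ips:
        if not ip.is_global:
            raise SSRFValidationError(f"reserved address in result set: {ip}")
    # Assumed: pin the first validated address.
    return str(ips[0])


print(validate_result_set(["93.184.216.34", "::ffff:93.184.216.34"]))  # 93.184.216.34
```

A result set such as `["93.184.216.34", "10.0.0.5"]` is rejected outright rather than filtered, which is what stops a hostile resolver from mixing one private record into an otherwise public answer.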

Follow-ups (non-blocking — file as issues)

  • Injected-client branches stay on _dns_validate_host only — the SDK can't pin a transport it didn't build, so this is the correct scoping and it's documented in the helper docstring. Worth exposing build_async_ip_pinned_transport in adopter docs so injected-client users can opt into pinning. Defense-in-depth, not required here.
  • The _dns_validate_host pre-check is now redundant for the three SDK-owned paths (the pin's resolution is the load-bearing one). Harmless, but a candidate for a future cleanup once the injected paths are addressed.

Two tests pin exactly the right invariant — the rebinding test asserts every connect targets the validated public IP and the loopback flip is never reached, which is the behavior the issue is about. Real fix, clean execution.

Safe to merge.


Development

Successfully merging this pull request may close these issues.

security(adagents): pin DNS to prevent rebinding on SSRF gate
