
fix(sender): scope Thunderbolt dials to the bridge interface and surface real connect errors #130

Open
jasontitus wants to merge 4 commits into
swellweb:3.1-maint from
jasontitus:fix/tb-interface-scoped-connect

jasontitus commented Jul 2, 2026

Stacked on #127 and #129 (includes their commits); review the last commit only.

Motivation — diagnosed on real hardware

Field setup: M2 MacBook Pro → Apple TB3→TB2 adapter → 2015 iMac 5K receiver. Thunderbolt link healthy, both bridges up with self-assigned link-local IPs (the macOS default: DHCP on Thunderbolt Bridge finds no server and self-assigns 169.254.x), receiver advertising over Bonjour. Every GUI connect attempt failed with "Connection failed: Connection timed out" and no further information.

Root cause chain, verified with route -n get and packet-level probes:

  1. connect() pins only requiredLocalEndpoint — the source address, not the egress interface.
  2. macOS keeps a single routing-table entry for all of 169.254.0.0/16 pointing at the primary interface (usually Wi-Fi). The dial to the receiver's bridge-only address left via Wi-Fi and black-holed.
  3. The connection sat in .waiting("No route to host") — the one state the handler didn't handle — so the true reason was discarded and the fixed-string 5s watchdog reported a bare timeout.

So the default Thunderbolt Bridge configuration fails even though everything is configured correctly, and the error hides the cause. (Manual workaround until now: static IPs on both bridges.)
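Step 3 is what hid the cause. The following minimal Swift sketch shows a state handler that remembers the last .waiting reason so the watchdog can report it instead of a bare timeout. It assumes a hypothetical ConnState enum standing in for NWConnection.State (whose .waiting and .failed cases carry an NWError), so it runs without Network.framework. DialTracker and its members are illustrative, not the PR's code.

```swift
// Hypothetical stand-in for NWConnection.State; the real .waiting/.failed carry an NWError.
enum ConnState {
    case setup, preparing, ready, cancelled
    case waiting(String)
    case failed(String)
}

// Illustrative tracker, not the PR's implementation.
final class DialTracker {
    private(set) var status = "Connecting"
    private(set) var lastWaitingReason: String?
    private var settled = false

    func handle(_ state: ConnState) {
        switch state {
        case .waiting(let reason):
            // The state the original handler ignored: keep its reason for later.
            lastWaitingReason = reason
        case .ready:
            settled = true
            status = "Connected"
        case .failed(let reason):
            settled = true
            status = "Connection failed: \(reason)"
        case .setup, .preparing, .cancelled:
            break
        }
    }

    // Called by a (hypothetical) 5 s watchdog; does nothing once the dial has settled.
    func watchdogFired() {
        guard !settled else { return }
        settled = true
        var message = "Connection failed: Connection timed out"
        if let reason = lastWaitingReason {
            message += " (last network state: waiting: \(reason))"
        }
        status = message
    }
}
```

Without the .waiting case, the same sequence (preparing, waiting, watchdog) would end in the bare "Connection failed: Connection timed out" seen in the field.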

What's in it

  • Interface-scoped dialing: link-local receiver addresses are dialed as "169.254.x.y%bridge0" (IPv4Address carries the interface, so Network.framework routes on the Thunderbolt link regardless of the routing table). Non-link-local dials unchanged; falls back to the unscoped host if the scoped form doesn't parse.
  • .waiting(error) is handled: logged and remembered, so timeouts/failures report it.
  • Context-rich failures: timeout/failed statuses now append dialed <ip>:<port> from <localIP> (<interface>) [<transport>] — last network state: … via a pure, unit-tested composer.
  • os.Logger subsystem com.targetbridge.sender (category connection) traces dial/ready/waiting/failed/timeout — field triage becomes log stream --predicate 'subsystem == "com.targetbridge.sender"'.
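The scoping rule can be sketched as a pure function. This is a minimal Swift sketch assuming dotted-quad IPv4 strings. isIPv4LinkLocal and scopedDialHost are hypothetical names, and the parses closure stands in for the Network.framework check (on macOS, something like IPv4Address(scoped) != nil, which per the testing notes yields nil for a bogus interface name) so the logic runs without the framework.

```swift
/// True if `ip` is a dotted-quad IPv4 address inside 169.254.0.0/16.
func isIPv4LinkLocal(_ ip: String) -> Bool {
    let parts = ip.split(separator: ".", omittingEmptySubsequences: false)
    guard parts.count == 4 else { return false }
    let octets = parts.compactMap { UInt8($0) }
    guard octets.count == 4 else { return false }
    return octets[0] == 169 && octets[1] == 254
}

/// Hypothetical helper: the host string to dial for a receiver.
/// `parses` stands in for the Network.framework check of the scoped string.
func scopedDialHost(receiverIP: String,
                    interfaceName: String?,
                    parses: (String) -> Bool) -> String {
    // Only link-local IPv4 needs scoping; everything else is dialed unchanged.
    guard isIPv4LinkLocal(receiverIP),
          let iface = interfaceName, !iface.isEmpty else {
        return receiverIP
    }
    let scoped = "\(receiverIP)%\(iface)"
    // Fall back to the unscoped host if the scoped form doesn't parse.
    return parses(scoped) ? scoped : receiverIP
}
```

Keeping the decision pure means the link-local, non-link-local, and fallback cases can be unit-tested without a Thunderbolt link.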

How tested

  • 9 new unit tests for the scoping + detail composition (65 total green).
  • Runtime-verified that IPv4Address("169.254.89.80%bridge0") parses with the interface attached and that a bogus interface name safely yields nil (→ unscoped fallback).
  • Live probe on the affected setup confirms .waiting carries the actionable reason ("No route to host" / "Network is down") that this PR stops discarding.

🤖 Generated with Claude Code

jasontitus and others added 4 commits July 1, 2026 23:11
…ion parsing

Adds a TBDisplaySenderTests unit bundle (hosted by the app for @testable
access) with 49 tests over the pure logic that has no hardware dependency:

- TBMonitorProtocol: BE32 primitives, packet layout, drainPacket handling of
  fragmented/contiguous/split feeds, JSON payload round-trips, and parity of
  the hand-rolled input-event encoder with JSONDecoder — guarding the
  invariant documented on makeInputEventPacket (PR swellweb#123).
- TBDiscoveredReceiver: per-transport ip(for:) selection (which IP the sender
  dials for Thunderbolt vs Network Link), id, shortHostName, displayText.
- TBSenderAutomation: parseTransport/parseMode/parsePreset aliases,
  receiver matching, and the resolveSessionIndex tri-state.

The automation parsing helpers move from private to internal so the test
bundle can exercise them; behavior is unchanged. project.pbxproj and the
shared TBDisplaySender scheme are regenerated via xcodegen so a fresh
checkout can run `xcodebuild test` directly.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
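The BE32 primitives these tests cover are 4-byte big-endian unsigned integers. Here is a minimal round-trip sketch in Swift. appendBE32 and readBE32 are hypothetical names, not TBMonitorProtocol's actual helpers, which aren't shown here.

```swift
/// Appends `value` as 4 big-endian bytes (most significant byte first).
func appendBE32(_ value: UInt32, to buffer: inout [UInt8]) {
    for shift in stride(from: 24, through: 0, by: -8) {
        buffer.append(UInt8(truncatingIfNeeded: value >> shift))
    }
}

/// Reads 4 big-endian bytes at `offset`; nil if fewer than 4 bytes remain.
func readBE32(_ buffer: [UInt8], at offset: Int) -> UInt32? {
    guard offset >= 0, buffer.count - offset >= 4 else { return nil }
    return buffer[offset..<offset + 4].reduce(UInt32(0)) { ($0 << 8) | UInt32($1) }
}
```

The nil return for a short buffer is what lets a drain loop tell "need more data" apart from a complete length prefix.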
- Executes the new TBDisplaySenderTests suite after the sender build.
- Triggers CI on 3.1-maint and 3.2-dev in addition to main, so the
  branches that actually receive PRs get build/test coverage.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…ket types

Two wire-protocol drain hardenings in TBMonitorProtocol.drainPacket, matching
the receiver parser's behavior (net.c caps packets at 64 MiB and treats a bad
length as fatal):

- A corrupt length prefix (zero, or above the new 64 MiB maxPacketLength) now
  throws instead of returning "need more data". Previously a corrupted length
  such as 0xFFFFFFFF made the sender buffer inbound bytes forever, waiting for
  a packet that could never complete — unbounded memory growth on a corrupt
  stream. The drain loop in TBDisplaySenderService now closes the connection
  and surfaces the reason in the session status.

- A packet with an unrecognized type byte (e.g. from a newer receiver) is now
  skipped and draining continues. Previously it consumed the packet but
  returned nil, which the caller treated as "buffer empty" — stalling every
  valid packet queued behind the unknown one until the next network read.

Covered by 7 new unit tests (zero/oversized/all-ones lengths, cap boundary,
corruption behind a valid packet, unknown-type skip and lone unknown-type).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
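The two drain rules can be sketched as follows. This Swift sketch assumes a hypothetical frame layout (a BE32 length prefix counting the bytes that follow, the first of which is the type byte) and a made-up set of known type bytes. The real TBMonitorProtocol layout, types, and error cases may differ.

```swift
// Hypothetical error; the real drainPacket's error type isn't shown.
enum DrainError: Error, Equatable {
    case badLength(UInt32)
}

enum WireSketch {
    /// 64 MiB cap, matching the receiver's limit described in the commit.
    static let maxPacketLength: UInt32 = 64 * 1024 * 1024
    /// Assumed set of type bytes this sender understands (illustrative values).
    static let knownTypes: Set<UInt8> = [0x01, 0x02]
}

struct Packet: Equatable {
    let type: UInt8
    let body: [UInt8]
}

/// Assumed frame: [BE32 length][type byte][body], where length counts type + body.
/// Returns the next known packet, or nil if the buffer holds no complete packet.
func drainPacket(_ buffer: inout [UInt8]) throws -> Packet? {
    while buffer.count >= 4 {
        let length = buffer[0..<4].reduce(UInt32(0)) { ($0 << 8) | UInt32($1) }
        // Zero or oversized lengths can never complete: fail instead of buffering forever.
        guard length > 0, length <= WireSketch.maxPacketLength else {
            throw DrainError.badLength(length)
        }
        let total = 4 + Int(length)
        guard buffer.count >= total else { return nil }  // genuinely need more data
        let packet = Packet(type: buffer[4], body: Array(buffer[5..<total]))
        buffer.removeFirst(total)
        if WireSketch.knownTypes.contains(packet.type) {
            return packet
        }
        // Unknown type (e.g. from a newer receiver): already consumed, keep draining.
    }
    return nil
}
```

Throwing on a bad length lets the caller close the connection, while a short buffer still returns nil as "need more data". Unknown types are consumed inside the loop, so they never look like an empty buffer.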
…ace real connect errors

Root cause, observed in the field: connect() pins only requiredLocalEndpoint
(the source ADDRESS), never the egress interface. macOS keeps a single
routing-table entry for 169.254.0.0/16 pointing at the primary interface
(usually Wi-Fi), so a dial to a Thunderbolt Bridge peer's self-assigned
link-local address leaves via the wrong link and black-holes — with both
Macs configured correctly and the TB link healthy. The connection then sits
in .waiting(...) carrying the true reason ("No route to host", "Network is
down"), which the state handler silently dropped, and the 5s watchdog
reported a bare fixed-string "Connection timed out".

Fixes, kept surgical:

- Link-local receiver addresses are now dialed scoped ("169.254.x.y%bridge0")
  to the interface that owns the session's local IP, so routing happens on
  the Thunderbolt link regardless of the routing table. Non-link-local dials
  are unchanged. Falls back to the unscoped host if the scoped form does not
  parse. This makes plain DHCP/link-local Thunderbolt Bridge setups (the
  macOS default) work without manual static-IP workarounds.

- The .waiting(error) state is now handled: logged, and remembered so the
  watchdog/failure paths can report it.

- Timeout and failure statuses now append where we dialed, from which local
  IP/interface, the transport, and the last network state, via the pure
  TBConnectionDiagnostics.failureDetail composer.

- New os.Logger subsystem com.targetbridge.sender (category "connection")
  traces dial/ready/waiting/failed/timeout, so field issues can be triaged
  with `log stream --predicate 'subsystem == "com.targetbridge.sender"'`.

Covered by 9 new unit tests for the scoping and detail-composition logic.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
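The failure-detail composition described in this commit can be sketched as a pure function, which is what makes it unit-testable without a live connection. The DialContext type, its field names, the failureDetailSketch name, the example port, and the exact separators are all assumptions. The real TBConnectionDiagnostics.failureDetail signature isn't shown in the PR.

```swift
/// Hypothetical description of one dial attempt (field names are assumptions).
struct DialContext {
    let remoteIP: String
    let port: UInt16
    let localIP: String?
    let interfaceName: String?
    let transport: String
}

/// Pure composer: the same inputs always give the same string.
func failureDetailSketch(base: String, context: DialContext, lastState: String?) -> String {
    var detail = "\(base): dialed \(context.remoteIP):\(context.port)"
    if let local = context.localIP {
        detail += " from \(local)"
        if let iface = context.interfaceName {
            detail += " (\(iface))"
        }
    }
    detail += " [\(context.transport)]"
    if let state = lastState {
        detail += "; last network state: \(state)"
    }
    return detail
}
```

Optional pieces are simply omitted, so a dial with no known local IP or no recorded network state still yields a readable message.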