Harden gateway connection and pairing flows #741
e9f84ac to e27427c
e27427c to 6b93d72
Consolidate the connection/pairing hardening work into one validated change set.

Connection and credential handling:
- keep gateway credentials registry-backed and preserve strict credential precedence: device token, shared gateway token, then bootstrap token
- force fresh setup-code bootstrap credentials for immediate QR/setup-code pairing while preserving shared gateway tokens for HTTP/dashboard paths
- dedupe loopback-equivalent gateway URLs so localhost and 127.0.0.1 records do not split pairing state
- validate replacement shared tokens before disconnecting or clearing durable device tokens
- clear stale bootstrap tokens only after required role tokens are durably readable
- recover stale operator device-token mismatches by falling back to bootstrap when recovery material is still present

Operator/node pairing and token lifecycle:
- keep operator clients in the operator role during bootstrap while preserving explicit node bootstrap behavior
- persist role-specific handoff tokens from hello-ok auth.deviceTokens[] for both operator and node roles
- forward WindowsNodeClient node-token receipt through NodeConnector so GatewayConnectionManager can complete bootstrap cleanup after the node token becomes durable
- request operator.pairing with normal shared-token operator connects so node trust approvals can be reached
- wait for node/device pair approval responses instead of treating a sent frame as success
- fall back from node.pair.approve to device.pair.approve only when admin authority is available
- guard node connection events by client generation so stale clients cannot mutate current state
- abort node handshake when pre-connect capability binding fails, preventing caps=0/cmds=0 registrations

Tray, MCP, and browser-control behavior:
- expose connection-control MCP tools only through local MCP, not the gateway node transport
- route MCP setup-code and shared-token connection tools through GatewayConnectionManager
- refresh gateway node state when local node connected/paired events arrive
- register browser.proxy only when a live gateway client and shared gateway token are available, and use the shared token for browser-control HTTP auth

Setup and reliability:
- add bounded retry for transient WSL startup timing when validating /etc/wsl.conf after WSL terminate/apply-config
- keep invalid wsl.conf content validation strict after the read succeeds
- preserve SSH tunnel behavior for operator and node connection paths

Maintainability simplifications:
- reuse setup-code gateway lookup state in GatewayConnectionManager
- centralize delayed reconnect scheduling with generation/disposal guards
- centralize response-aware pair approval RPC handling
- consolidate operator scope helper literals and checks

Validation:
- build.ps1 passed
- OpenClaw.Shared.Tests passed
- OpenClaw.Tray.Tests passed
- OpenClaw.Connection.Tests passed
- OpenClaw.SetupEngine.Tests passed
- full OpenClaw.E2ETests passed with OPENCLAW_RUN_E2E=1 and win-arm64 runtime
- targeted QR/setup-code E2E tests passed after audit follow-up
- Copilot autoreview passed with no accepted/actionable findings
- dual-model protocol audit completed; accepted multi-role handoff finding fixed

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
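Two of the credential-handling rules in the commit message above can be illustrated with a minimal sketch. It is written in Python for brevity and is not the project's code: `GatewayCredentials`, `select_credential`, and `gateway_record_key` are hypothetical names, and treating an empty token as absent is an assumption. Only the precedence order (device token, shared gateway token, bootstrap token) and the localhost/127.0.0.1 equivalence come from the commit message.

```python
from dataclasses import dataclass
from typing import Optional, Tuple
from urllib.parse import urlsplit


@dataclass
class GatewayCredentials:
    """Hypothetical record of the tokens stored for one gateway."""
    device_token: Optional[str] = None
    shared_gateway_token: Optional[str] = None
    bootstrap_token: Optional[str] = None


def select_credential(creds: GatewayCredentials) -> Optional[Tuple[str, str]]:
    """Return (kind, token) for the highest-precedence token that is present."""
    ordered = (
        ("device", creds.device_token),
        ("shared", creds.shared_gateway_token),
        ("bootstrap", creds.bootstrap_token),
    )
    for kind, token in ordered:
        if token:  # assumption: None and empty strings both count as absent
            return kind, token
    return None


# Only the two spellings named in the commit message are treated as equivalent.
LOOPBACK_HOSTS = {"localhost", "127.0.0.1"}


def gateway_record_key(url: str) -> str:
    """Build a lookup key under which loopback spellings collapse to one record."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host in LOOPBACK_HOSTS:
        host = "127.0.0.1"
    port = f":{parts.port}" if parts.port is not None else ""
    return f"{parts.scheme}://{host}{port}"
```

Keying stored records by `gateway_record_key(url)` rather than by the raw URL is one way to keep a pairing made against `localhost` visible to a later connect against `127.0.0.1`. The forced-fresh bootstrap behavior for QR/setup-code pairing is deliberately left out of this sketch.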
6b93d72 to 5d14408
Codex review: needs maintainer review before merge.
Reviewed June 12, 2026, 11:33 AM ET / 15:33 UTC.
Summary
Reproducibility: yes, for the central local setup and recovery failures. Current source paths and automated Windows/WSL E2E provide high-confidence reproduction coverage. Remote gateway behavior remains unproven.
Review metrics: 2 noteworthy metrics.
Merge readiness
Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.
Rank-up moves:
Proof guidance:
Risk before merge
Maintainer options:
Next step before merge
Security
Review details
Best possible solution: Retain the lifecycle separation, stale-client guards, response-aware approvals, validation-first replacement, and durable role-token handling, then merge only after inspectable live upgrade proof and explicit maintainer acceptance of the credential replacement semantics.
Do we have a high-confidence way to reproduce the issue? Yes, for the central local setup and recovery failures: current source paths and automated Windows/WSL E2E provide high-confidence reproduction coverage. Remote gateway behavior remains unproven.
Is this the best way to solve the issue? Mostly yes: separating operator/node lifecycles, rejecting stale clients, distinguishing approval kinds, validating replacement credentials before destructive changes, and requiring durable role tokens are appropriate. The forced-bootstrap and credential-replacement policy still requires explicit upgrade acceptance or narrower scoping.
AGENTS.md: found and applied where relevant.
Codex review notes: model internal, reasoning high; reviewed against 913ba4e8f504.
Label changes:
Label justifications:
Evidence reviewed
What I checked:
Likely related people:
What the crustacean ranks mean
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.
How this review workflow works
The removed test used an admin fixture to approve node pairing, so it did not represent the QR-only external scenario it claimed to cover and could leave pending node approvals that broke later setup/connect tests.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@clawsweeper re-review
Proof update:
Remote/external connection paths still need follow-up coverage and validation; this PR is intended to harden and prove the local setup/connect paths first.
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
Summary
Harden Windows companion gateway connection, setup-code pairing, node pairing, token recovery, and browser-control auth flows.
Key changes:
- `role=operator`
- `hello-ok.auth.deviceTokens[]`
- `WindowsNodeClient` through `NodeConnector` so bootstrap cleanup can complete after node reconnect
- `/etc/wsl.conf` read timing while keeping config validation strict

Validation
- `./build.ps1` passed
- `dotnet test ./tests/OpenClaw.Shared.Tests/OpenClaw.Shared.Tests.csproj --no-restore` passed
- `dotnet test ./tests/OpenClaw.Tray.Tests/OpenClaw.Tray.Tests.csproj --no-restore` passed
- `dotnet test ./tests/OpenClaw.Connection.Tests/OpenClaw.Connection.Tests.csproj --no-restore` passed
- `dotnet test ./tests/OpenClaw.SetupEngine.Tests/OpenClaw.SetupEngine.Tests.csproj --no-restore` passed
- `OPENCLAW_RUN_E2E=1 dotnet test ./tests/OpenClaw.E2ETests/OpenClaw.E2ETests.csproj -r win-arm64 --no-restore` passed: 17/17

Notes
Draft PR for review. The changes intentionally avoid broad WebSocket lifecycle rewrites or gateway-side assumptions; fixes are scoped to client-side behavior verified against `docs/CONNECTION_PROTOCOL_RESEARCH.md` and `docs/CONNECTION_ARCHITECTURE.md`.
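
The `/etc/wsl.conf` item pairs a tolerant read with a strict check. A minimal sketch of that split follows, in Python for brevity rather than the project's own code. All names, the attempt count, the delay, and the use of `OSError` as the "transient" signal are assumptions made for illustration. The point is only that retries wrap the read, while content validation runs once and fails immediately.

```python
import time
from typing import Callable


def read_with_bounded_retry(
    read: Callable[[], str],
    attempts: int = 5,            # hypothetical bound
    delay_seconds: float = 0.5,   # hypothetical fixed delay between attempts
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Retry transient read failures a fixed number of times, then re-raise."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return read()
        except OSError:  # assumption: transient failures surface as OSError/TimeoutError
            if attempt == attempts:
                raise
            sleep(delay_seconds)
    raise AssertionError("unreachable")


def validate_wsl_conf(text: str, required_line: str) -> None:
    """Strict check: once the read succeeds, bad content is never retried."""
    lines = [line.strip() for line in text.splitlines()]
    if required_line not in lines:
        raise ValueError(f"wsl.conf is missing required line: {required_line}")
```

A caller would run `validate_wsl_conf(read_with_bounded_retry(reader), expected)` so that a slow WSL start is absorbed by the retry loop while a wrong configuration still surfaces as an error on the first successful read.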