feat(ssh): per-host auth/sudo learning + compliance-scan sudo -S (full SSH matrix) #566
Merged
…lback

OpenWatch now supports the full SSH matrix on the compliance scan and remembers, per host, which combination works, so repeat connections avoid wasted or failed attempts.

Functional:
- The compliance scan can finally escalate via a sudo password. The kensa transport determines the host's sudo mode once per connection with a `true` sentinel, then runs every command in that mode: `sudo -n sh -c` (NOPASSWD) or `sudo -S -p '' sh -c` with the credential password on stdin (never argv). Previously it hardcoded `sudo -n`, so password-sudo hosts were inventoried but never scanned.
- The `sudo -S` fallback now DEFAULTS ON (`systemconfig.DefaultSecurity`), retained as a kill-switch, so the full matrix works on a fresh install.

Learning (system-connection-profile, migration 0035):
- A new `host_connection_profile` table + `internal/connprofile` store record the last-good SSH auth method and sudo mode per host. A hint, never a lock: paths still fall back and rewrite the record when the host changes, so a stale hint self-heals.
- The shared dial layer gains `DialOptions.PreferAuth` (lead with the known-good method, avoiding a doomed publickey attempt that trips fail2ban / MaxAuthTries) and `ObservedAuth` (report which method authenticated). The scan reads and records the profile through it.

Scope: this lands the substrate, the dial mechanism, and the scan (the path with the real functional gap and the highest command volume). Extending the same memo to the discovery / intelligence / liveness paths, which already support the matrix but re-probe each cycle, is a documented follow-up; their `SSHTransport.Dial` and the `sshprivilege` dialer need `hostID` threaded through first.

Specs: new system-connection-profile (AC-01..06); kensa-executor C-15 / AC-19 and ssh-connectivity C-09 updated for the scan `sudo -S` + default-on.
Verification (4-agent review of the prior commit) found that the compliance scan path bypassed two gates that the collector / liveness / discovery paths all enforce:
- HIGH: the scan ignored systemconfig `AllowCredentialSudoPassword`, so the kill-switch could not disable password-sudo on the busiest path (now default-on, so it was live by default).
- MEDIUM: the scan fed any non-empty `cred.Password` to `sudo -S` regardless of the credential's auth method.

Fix: thread a `SudoPasswordPolicy` seam through `TransportFactory` (wired in main.go/worker.go to systemconfig `LoadSecurity`, mirroring the collector's `SudoPolicyLoader`) and gate the scan's sudo password via `sudoPasswordFor`: it is allowed only when the kill-switch is on AND the auth method is password|both. When disallowed, the transport gets an empty sudo password, so `probeSudoMode` never attempts `sudo -S` and the connection degrades to `sudo -n`. This gates ONLY the sudo use of the password; SSH password AUTH stays independent (the dial already used the full credential).

Also from the review (LOW):
- Correct the `sshprivilege` package doc, which wrongly claimed the scan uses `ssh.RunSudo` with an identical retry shape.
- Add the single-factor caveat to the `ObservedAuth` doc (authObserver / dial.go): `Last()` is the authenticating method under OpenWatch's alternative-methods model; under true SSH MFA it is only the final factor. It is never persisted on failure.

Spec system-connection-profile gains C-05 + AC-07; the `sudoPasswordFor` unit test covers the gate matrix (kill-switch off, key-only, password, both).
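The gate above reduces to a small pure function. A minimal sketch, assuming the policy seam can be modeled as a `func() bool` that reports the `AllowCredentialSudoPassword` kill-switch; `AuthMethod`, `Credential`, and the exact signatures here are illustrative, not OpenWatch's actual types. This sketch also fails closed when no policy is wired, which is an assumption the commit does not state.

```go
package main

import "fmt"

// AuthMethod mirrors the credential's configured auth method (illustrative).
type AuthMethod string

const (
	AuthKey      AuthMethod = "key"
	AuthPassword AuthMethod = "password"
	AuthBoth     AuthMethod = "both"
)

// Credential is a pared-down stand-in for the stored host credential.
type Credential struct {
	Method   AuthMethod
	Password string
}

// SudoPasswordPolicy reports whether the kill-switch allows credential
// passwords for sudo. A real implementation would read systemconfig.
type SudoPasswordPolicy func() bool

// sudoPasswordFor returns the password the transport may feed to sudo -S,
// or "" when disallowed, so the sudo-mode probe never tries sudo -S.
// It does not affect SSH password authentication.
func sudoPasswordFor(allow SudoPasswordPolicy, cred Credential) string {
	if allow == nil || !allow() {
		return "" // kill-switch off (or no policy wired): fail closed
	}
	switch cred.Method {
	case AuthPassword, AuthBoth:
		return cred.Password
	default:
		return "" // key-only credential: never reuse a stray password for sudo
	}
}

func main() {
	allow := func() bool { return true } // stands in for the loaded security settings
	for _, m := range []AuthMethod{AuthKey, AuthPassword, AuthBoth} {
		got := sudoPasswordFor(allow, Credential{Method: m, Password: "pw"})
		fmt.Printf("%-8s -> sudo -S allowed: %v\n", m, got != "")
	}
}
```

Returning an empty string instead of an error keeps the transport's contract unchanged: "no sudo password" already means "probe `sudo -n` only".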
remyluslosius added a commit that referenced this pull request Jun 16, 2026
- Un-ignore SESSION_LOG.md (.gitignore listed it next to the already-tracked BACKLOG.md; both are the session-continuity docs that CLAUDE.md/BACKLOG reference for provenance) and add it with the 2026-06-16 handoff: SSH full-matrix + per-host learning (#566), packaging fresh-install + auto-upgrade (#564/#569), CI gate speedup (#567), settings/cleanup (#561/#562/#563/#568), and the Dependabot triage (9 merged / 6 skipped), plus next steps and gotchas.
- BACKLOG: drop the completed PKG-1/PKG-2 (shipped in #564); add the SSH learning follow-up (wire connprofile into discovery/intelligence/liveness) and a "Deferred Dependency Upgrades" section (MUI 7→9, eslint 10 blocked upstream, cosign-installer v4 signing migration). Bump date.
remyluslosius added a commit that referenced this pull request Jun 16, 2026
…ence/liveness (#575)

The connection-profile memo (PR #566) only led the dial with the host's known-good SSH auth method on the compliance-scan path. The other three paths that talk to a managed host (OS discovery, OS intelligence via the collector, and the liveness privilege probe) still dialed key-first every cycle, re-offering an unauthorized public key to password-only hosts on a loop. Each such failed publickey attempt counts against MaxAuthTries and can trip fail2ban.

Extend the shared connprofile store into those three paths:
- `connprofile.WithHostID` / `HostIDFrom`: context helpers so a transport that only receives host:port + cred can still look up and record the host's profile, without churning the `SSHTransport.Dial` signature (and its test stubs across discovery + collector).
- `discovery.SSHTransportProd` gains `WithProfiles` + a dial seam: when a store is wired and the ctx carries a host id, it sets `PreferAuth` from the recorded method and records `ObservedAuth` after a successful dial. This one transport is what BOTH discovery and the collector dial through (via `collectorSSHAdapter`), so both learn at once.
- discovery.go / collector.go wrap the ctx with the host id at the dial site (both hostFacts already carry HostID).
- `sshprivilege.Probe` gains `WithProfiles`: it leads the dial with the recorded method (reordering `buildAuthMethods` for AuthBoth) and records which method authenticated via a local single-factor observer.
- cmd/openwatch wires one shared `connprofile.NewStore(pool)` across all four paths (the scan now reuses it too).

Learning stays best-effort: a missing host id, absent profile row, or store error dials in the default order and never fails the connection (hint, not a lock: a stale hint self-heals on the next dial).

Scope: the SSH auth-method dimension. Sudo-mode (NOPASSWD vs password) learning for these three paths stays a follow-up; they already probe sudo mode correctly each cycle, and only the scan learns both today.
Spec system-connection-profile -> v1.1.0: C-06, AC-08 (discovery/collector transport), AC-09 (liveness probe).
Goal
Make every SSH capability support the full matrix: key or password auth, and NOPASSWD (`sudo -n`) or password sudo (`sudo -S`). Learn per host which combination works, so repeat connections stop retrying the doomed option.

Two real gaps closed
- The compliance scan could not use `sudo -S` at all. `internal/kensa/transport.go` hardcoded `sudo -n sh -c`, so a host needing a sudo password was discoverable and inventoried but never scanned, even though the scan is the one capability that most needs root. It now determines the host's sudo mode once per connection (with an innocuous `true` sentinel, because "sudo refused" cannot be inferred from a real check's non-zero exit) and runs every command in that mode: `sudo -n sh -c` or `sudo -S -p '' sh -c` with the credential password on stdin (never argv).
- The `sudo -S` policy was unreachable (default-off, no UI). `DefaultSecurity().AllowCredentialSudoPassword` now defaults ON, retained as a kill-switch, so the full matrix works on a fresh install.

Learning (`system-connection-profile`, migration 0035)
- The `host_connection_profile` table + `internal/connprofile` store record the last-good SSH auth method + sudo mode per host. A hint, never a lock: paths still fall back and rewrite the record when the host changes (key rotated, sudoers edited), so a stale hint self-heals.
- The shared dial layer (`internal/ssh`) gains `DialOptions.PreferAuth` (lead with the known-good method, avoiding a doomed publickey attempt that trips fail2ban / MaxAuthTries) and `ObservedAuth` (reports which method actually authenticated). The scan reads and records the profile through it.

Scope (deliberate)
Lands the substrate (table + store), the shared dial mechanism, and the scan: the path with the actual functional gap and the highest command volume (hundreds of commands per host, so the biggest learning payoff). Extending the same memo to the discovery / intelligence / liveness paths, which already support the matrix but re-probe each cycle, is a documented follow-up: their `SSHTransport.Dial` and the `sshprivilege` dialer need `hostID` threaded through first. Splitting there keeps this PR reviewable instead of churning that interface and all its stubs.

Tests / specs
- New `system-connection-profile` spec (AC-01..06).
- `kensa-executor` C-15/AC-19 and `ssh-connectivity` C-09 updated for the scan `sudo -S` + default-on (the scan is explicitly exempted from the three paths' identical inline-retry shape; it uses the per-connection probe instead).
- `connprofile` store integration tests (Get/partial-upsert/no-op), `ssh` auth ordering + observer tests, `kensa` wrap-by-mode (password on stdin, never argv), `systemconfig` default-on.
- `gofmt` / `go vet` / `go build ./...` / `specter check` (106 specs) all clean locally.

Follow-ups