feat(ssh): wire per-host auth-method learning into discovery/intelligence/liveness by remyluslosius · Pull Request #575 · Hanalyx/OpenWatch

remyluslosius · 2026-06-16T15:38:50Z

What

Extends the per-host SSH connection memo from PR #566 (which only learned on the compliance-scan path) into the three remaining paths that talk to a managed host:

OS discovery (internal/intelligence/discovery)
OS intelligence collection (internal/intelligence/collector)
liveness privilege probe (internal/sshprivilege)

Each now leads the SSH dial with the host's last known-good auth method and records which method actually authenticated, via the shared connprofile store.

Why

Without the memo, those paths dialed key-first every cycle. On a password-only host that re-offers an unauthorized public key on a loop (discovery on add, the collector every intelligence cycle, the liveness probe every ~5 min) — a failed publickey attempt that counts against MaxAuthTries and can trip fail2back. This was the user's original "try SSH key next time / try password next time" ask, applied to the paths #566 left out.

How

connprofile.WithHostID / HostIDFrom — context helpers so a transport that only receives host:port + cred can still look up + record the host's profile, without changing the SSHTransport.Dial signature (which would churn the discovery + collector interfaces and 8 test stubs). Best-effort enrichment: no host id on the ctx ⇒ no learning, default dial order.
discovery.SSHTransportProd.WithProfiles + a dial seam — reads the profile, sets DialOptions.PreferAuth, records ObservedAuth after a successful dial. This one prod transport is what both discovery and the collector dial through (collector via collectorSSHAdapter), so both learn from a single wiring.
discovery.go / collector.go wrap the ctx with the host id at the dial site (both hostFacts already carry HostID).
sshprivilege.Probe.WithProfiles — leads the dial with the recorded method (reordering buildAuthMethods for AuthBoth; both methods stay offered) and records what authenticated via a small local single-factor observer that mirrors internal/ssh's.
cmd/openwatch wires one shared connprofile.NewStore(pool) across all four paths (the scan reuses it now too).

Learning is a hint, not a lock: a missing host id, an absent profile row, or a store error dials in the default order and never fails the connection — a stale hint self-heals on the next dial.

Scope

This PR covers the SSH auth-method dimension. sudo-mode (NOPASSWD vs password) learning for these three paths stays a follow-up — they already probe sudo mode correctly each cycle; only the compliance scan learns both dimensions today. Documented in the spec's excludes.

Spec / tests

system-connection-profile → v1.1.0: adds C-06 and AC-08 (discovery/collector transport) + AC-09 (liveness probe).
New tests: TestSSHTransportProd_AuthLearning (4 cases: lead-with + record, no host id, no store, lookup-error non-fatal) and TestPrivilegeProbe_AuthLearning (lead-with + record / no-store). Existing TestBuildAuthMethods updated to the new signature.
gofmt/go vet/go build clean; touched-package suites green against the isolated test DB; specter check 0 errors; system-connection-profile 9/9 ACs have results, PASSES tier 2.

…ence/liveness The connection-profile memo (PR #566) only led the dial with the host's known-good SSH auth method on the compliance-scan path. The other three paths that talk to a managed host -- OS discovery, OS intelligence (collector), and the liveness privilege probe -- still dialed key-first every cycle, re-offering an unauthorized public key to password-only hosts (a failed publickey attempt that counts against MaxAuthTries and can trip fail2ban) on a loop. Extend the shared connprofile store into those three paths: - connprofile.WithHostID / HostIDFrom: context helpers so a transport that only receives host:port+cred can still look up + record the host's profile, without churning the SSHTransport.Dial signature (and its test stubs across discovery + collector). - discovery.SSHTransportProd gains WithProfiles + a dial seam: when a store is wired and the ctx carries a host id, it sets PreferAuth from the recorded method and records ObservedAuth after a successful dial. This one transport is what BOTH discovery and the collector dial through (via collectorSSHAdapter), so both learn at once. - discovery.go / collector.go wrap the ctx with the host id at the dial site (both hostFacts already carry HostID). - sshprivilege.Probe gains WithProfiles: it leads the dial with the recorded method (reordering buildAuthMethods for AuthBoth) and records which method authenticated via a local single-factor observer. - cmd/openwatch wires one shared connprofile.NewStore(pool) across all four paths (the scan now reuses it too). Learning stays best-effort: a missing host id, absent profile row, or store error dials in the default order and never fails the connection (hint, not a lock -- a stale hint self-heals on the next dial). Scope: the SSH auth-method dimension. sudo-mode (NOPASSWD vs password) learning for these three paths stays a follow-up -- they already probe sudo mode correctly each cycle; only the scan learns both today. Spec system-connection-profile -> v1.1.0: C-06, AC-08 (discovery/ collector transport), AC-09 (liveness probe).

…ce/liveness (#576) Follow-up to PR #575 (auth-method learning). The same three paths that talk to a managed host still probed sudo mode from scratch every cycle — running a doomed `sudo -n` on a password-sudo host before retrying `sudo -S`. Extend the connprofile memo to the SUDO dimension so each path leads with the host's recorded mode and records the mode confirmed to work. Division of labour: - The liveness privilege probe is the AUTHORITATIVE sudo-mode learner: it runs an innocuous `true` sentinel every ~5 min, so it reliably confirms the mode regardless of any real command's exit code. - OS discovery (firewall probe) and OS intelligence (collector) learn OPPORTUNISTICALLY from their existing real sudo commands — no extra round-trip. To avoid misrecording, a mode is recorded ONLY on a confirmed exit-0 of a given sudo form, never inferred from a command that failed for its own reasons. Mechanics: - ssh.RunSudo (collector's shared primitive) gains a prefer (in) + observed (out): when prefer=SudoPassword and the password gate is satisfied it leads with `sudo -S`, skipping the doomed `sudo -n`; observed reports the confirmed form. Plain-string tokens (SudoNopasswd/SudoPassword) keep the ssh package decoupled from connprofile, matching PreferKey/PreferPassword. - discovery.runSudoWithFallback + probeFirewall thread prefer + return the learned mode; the discovery Service reads it (cfg.prefer) and records it. - sshprivilege.Probe extracts probeSudo: leads with `sudo -S` on a known password host, records the confirmed mode (preserving the AC-18/19/21 error shapes). - collector threads the recorded mode across the cycle's sudo commands and records once at the end. - cmd/openwatch wires the shared connprofile store into the collector and discovery services (the probe already had it from #575). The sudo password GATE (kill-switch + auth-method) is unchanged: leading with `sudo -S` is allowed only when a password may already be fed. Learning stays best-effort — a store miss/error escalates in the default order and never fails the connection; a stale mode self-heals (sudo -S miss falls back to sudo -n). Spec system-connection-profile -> v1.2.0: C-07, AC-10 (RunSudo primitive), AC-11 (discovery firewall probe), AC-12 (liveness probe).

Closes the project's biggest test blind spot: the dial, auth-ordering, and sudo -n/-S paths were only unit-tested at the command-construction level (stubbed transport), never against a real box. A wired-up host could regress and every test stay green. internal/ssh/livehost_test.go drives the REAL ssh.Dial + ssh.RunSudo — the primitives every host-talking path (scan, discovery, collector, liveness) shares — against an operator-supplied inventory: OPENWATCH_LIVE_HOSTS=/path/to/test_hosts.csv (hostname,ip,username,credential) OPENWATCH_LIVE_KEY=/path/to/id_rsa With either unset the test t.Skip()s, so it never gates normal CI; the inventory + key stay on the operator's workstation, never in the repo. The fleet is heterogeneous, so the test DISCOVERS each host's capabilities rather than demanding every method everywhere. Per host it asserts the machinery for whatever the host supports: - key auth dials -> ObservedAuth == "key" (the value the memo records) - password auth dials -> ObservedAuth == "password" - sudo mode confirmed via the `true` sentinel (nopasswd | password) - the real `sudo -S -k -p '' true` password-on-stdin path executes A server-side auth rejection (key not authorized, or PasswordAuthentication off) is a tolerated host-config fact; an unreachable host is skipped; only an unexpected protocol-level error or a wrong ObservedAuth/sudo result fails the test. A host with no usable auth is skipped. Validated against the dev fleet: 5 key+NOPASSWD hosts pass (real key dial, sudo -n, and sudo -S all exercised), key-rejecting and unreachable hosts skip. The password-AUTH assertion is live-unverified only because the dev fleet runs PasswordAuthentication=no everywhere (noted in BACKLOG); it runs as soon as one password-enabled host is in the inventory. Also drops the completed "wire SSH auth/sudo learning" backlog entry (shipped in #575 + #576).

github-actions Bot added the size/XL label Jun 16, 2026

remyluslosius merged commit 0e2096c into main Jun 16, 2026
13 checks passed

remyluslosius deleted the feat/ssh-learning-allpaths branch June 16, 2026 15:51

remyluslosius mentioned this pull request Jun 16, 2026

feat(ssh): wire per-host sudo-mode learning into discovery/intelligence/liveness #576

Merged

remyluslosius mentioned this pull request Jun 16, 2026

test(ssh): add opt-in live-host SSH/sudo integration test #577

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(ssh): wire per-host auth-method learning into discovery/intelligence/liveness#575

feat(ssh): wire per-host auth-method learning into discovery/intelligence/liveness#575
remyluslosius merged 1 commit into
mainfrom
feat/ssh-learning-allpaths

remyluslosius commented Jun 16, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

remyluslosius commented Jun 16, 2026

What

Why

How

Scope

Spec / tests

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant