feat(ssh): per-host auth/sudo learning + compliance-scan sudo -S (full SSH matrix) #566

Merged: remyluslosius merged 2 commits into main from feat/ssh-auth-learning on Jun 16, 2026

@remyluslosius

Goal

Make every SSH capability support the full matrix — key or password auth, NOPASSWD (sudo -n) or password sudo (sudo -S) — and learn per host which combination works so repeat connections stop re-trying the doomed option.

Two real gaps closed

  1. The compliance scan couldn't do sudo -S at all. internal/kensa/transport.go hardcoded sudo -n sh -c, so a host needing a sudo password was discoverable and inventoried but never scanned — the one capability that most needs root. It now determines the host's sudo mode once per connection (an innocuous true sentinel — we can't infer "sudo refused" from a real check's non-zero exit) and runs every command in that mode: sudo -n sh -c or sudo -S -p '' sh -c with the credential password on stdin (never argv).
  2. The sudo -S policy was unreachable (default-off, no UI). DefaultSecurity().AllowCredentialSudoPassword now defaults ON, retained as a kill-switch — the full matrix works on a fresh install.
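
A minimal sketch of the probe-then-wrap flow described above, not the actual contents of internal/kensa/transport.go. The `SudoMode` type, `wrapSudo`, `shellQuote`, `runFunc`, and the signature given to `probeSudoMode` here are illustrative assumptions; only the two command shapes (`sudo -n sh -c` and `sudo -S -p '' sh -c`) and the password-on-stdin rule come from the description.

```go
package main

import (
	"fmt"
	"strings"
)

// SudoMode is a hypothetical enum for the per-connection sudo decision.
type SudoMode int

const (
	SudoNoPasswd SudoMode = iota // run as `sudo -n` (NOPASSWD)
	SudoPassword                 // run as `sudo -S` with the password on stdin
)

// shellQuote wraps s in single quotes so `sh -c` receives it unchanged.
func shellQuote(s string) string {
	return "'" + strings.ReplaceAll(s, "'", `'\''`) + "'"
}

// wrapSudo returns the command line to execute and the bytes to feed on
// stdin. The password is only ever placed in stdin, never in the line.
func wrapSudo(mode SudoMode, cmd, password string) (line, stdin string) {
	if mode == SudoPassword {
		return "sudo -S -p '' sh -c " + shellQuote(cmd), password + "\n"
	}
	return "sudo -n sh -c " + shellQuote(cmd), ""
}

// runFunc stands in for the SSH session exec and returns the exit code.
type runFunc func(line, stdin string) int

// probeSudoMode decides the mode once, using a harmless `true` sentinel so
// a non-zero exit can only mean "sudo refused". With no usable password it
// stays on sudo -n.
func probeSudoMode(run runFunc, password string) SudoMode {
	if line, in := wrapSudo(SudoNoPasswd, "true", ""); run(line, in) == 0 {
		return SudoNoPasswd
	}
	if password != "" {
		if line, in := wrapSudo(SudoPassword, "true", password); run(line, in) == 0 {
			return SudoPassword
		}
	}
	return SudoNoPasswd
}

func main() {
	// Fake host: NOPASSWD is refused, password sudo succeeds.
	fake := func(line, stdin string) int {
		if strings.HasPrefix(line, "sudo -S") && stdin == "s3cret\n" {
			return 0
		}
		return 1
	}
	mode := probeSudoMode(fake, "s3cret")
	line, _ := wrapSudo(mode, "cat /etc/shadow", "s3cret")
	fmt.Println(line)
}
```

Probing with `true` rather than a real check keeps "sudo refused" distinguishable from "the compliance check failed", which is the reason the description gives for the sentinel.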

Ownership note: the scan's sudo -S is sole-OpenWatch-owned (the sudo wrapping + session exec live in internal/kensa/transport.go; Kensa only sees Run(cmd) → CommandResult). Compliance evidence is byte-identical — sudo -S -p '' runs the same command as root with the prompt suppressed. No Kensa-team review needed.

Learning (system-connection-profile, migration 0035)

  • New host_connection_profile table + internal/connprofile store record the last-good SSH auth method + sudo mode per host. A hint, never a lock: paths still fall back and rewrite the record when the host changes (key rotated, sudoers edited), so a stale hint self-heals.
  • The shared dial layer (internal/ssh) gains DialOptions.PreferAuth (lead with the known-good method — avoids a doomed publickey attempt that trips fail2ban / MaxAuthTries) and ObservedAuth (reports which method actually authenticated). The scan reads + records the profile through it.
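
The "hint, never a lock" behaviour can be sketched as follows. This is an illustration under assumptions, not the internal/ssh or internal/connprofile code: the `Profile` struct, its field names, `orderAuth`, and `observe` are hypothetical, and auth methods are modelled as plain strings rather than real SSH auth method values.

```go
package main

import "fmt"

// Profile is a hypothetical in-memory view of one host_connection_profile row.
type Profile struct {
	AuthMethod string // last method that authenticated; empty means unknown
	SudoMode   string // last sudo mode that worked; empty means unknown
}

// orderAuth leads with the preferred method when it is available and keeps
// every other method as a fallback, so a stale hint cannot lock a host out.
func orderAuth(available []string, preferred string) []string {
	ordered := make([]string, 0, len(available))
	for _, m := range available {
		if m == preferred {
			ordered = append(ordered, m)
		}
	}
	for _, m := range available {
		if m != preferred {
			ordered = append(ordered, m)
		}
	}
	return ordered
}

// observe merges what a successful connection actually used. Empty arguments
// leave the stored value alone (partial upsert). It reports whether the
// record changed, so an unchanged observation can skip the write.
func (p *Profile) observe(authMethod, sudoMode string) bool {
	changed := false
	if authMethod != "" && p.AuthMethod != authMethod {
		p.AuthMethod = authMethod
		changed = true
	}
	if sudoMode != "" && p.SudoMode != sudoMode {
		p.SudoMode = sudoMode
		changed = true
	}
	return changed
}

func main() {
	p := Profile{AuthMethod: "password", SudoMode: "password"}
	fmt.Println(orderAuth([]string{"publickey", "password"}, p.AuthMethod)) // [password publickey]

	// The host was re-keyed: the fallback authenticated, so the hint is rewritten.
	fmt.Println(p.observe("publickey", "")) // true
	fmt.Println(p.observe("publickey", "")) // false
}
```

Keeping the non-preferred methods in the list is what lets a stale hint self-heal: the dial still succeeds through the fallback, and the observation rewrites the record.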

Scope (deliberate)

Lands the substrate (table + store), the shared dial mechanism, and the scan — the path with the actual functional gap and the highest command volume (hundreds per host, so the biggest learning payoff). Extending the same memo to the discovery / intelligence / liveness paths — which already support the matrix but re-probe each cycle — is a documented follow-up: their SSHTransport.Dial and the sshprivilege dialer need hostID threaded through first. Splitting there keeps this PR reviewable instead of churning that interface + all its stubs.

Tests / specs

  • New system-connection-profile spec (AC-01..06). kensa-executor C-15/AC-19 and ssh-connectivity C-09 updated for the scan sudo -S + default-on (the scan is explicitly exempted from the three paths' identical inline-retry shape — it uses the per-connection probe instead).
  • connprofile store integration tests (Get/partial-upsert/no-op), ssh auth ordering + observer tests, kensa wrap-by-mode (password on stdin, never argv), systemconfig default-on. gofmt, go vet, go build ./..., and specter check (106 specs) are all clean locally.

Follow-ups

  • Wire the memo into discovery / intelligence / liveness (reuses everything here).
  • Optional Settings → Security toggle for the kill-switch (currently DB-only).

…lback

OpenWatch now supports the full SSH matrix on the compliance scan and
remembers, per host, which combination works so repeat connections avoid
wasted or failed attempts.

Functional:
- The compliance scan can finally escalate via a sudo password. The kensa
  transport determines the host's sudo mode once per connection with a
  `true` sentinel, then runs every command in that mode: `sudo -n sh -c`
  (NOPASSWD) or `sudo -S -p '' sh -c` with the credential password on
  stdin (never argv). Previously it hardcoded `sudo -n`, so password-sudo
  hosts were inventoried but never scanned.
- The sudo -S fallback now DEFAULTS ON (systemconfig.DefaultSecurity),
  retained as a kill-switch, so the full matrix works on a fresh install.

Learning (system-connection-profile, migration 0035):
- New host_connection_profile table + internal/connprofile store record
  the last-good SSH auth method and sudo mode per host. A hint, never a
  lock: paths still fall back and rewrite the record when the host
  changes, so a stale hint self-heals.
- The shared dial layer gains DialOptions.PreferAuth (lead with the
  known-good method, avoiding a doomed publickey attempt that trips
  fail2ban / MaxAuthTries) and ObservedAuth (report which method
  authenticated). The scan reads + records the profile through it.

Scope: this lands the substrate, the dial mechanism, and the scan (the
path with the real functional gap and the highest command volume).
Extending the same memo to the discovery / intelligence / liveness paths
-- which already support the matrix but re-probe each cycle -- is a
documented follow-up; their SSHTransport.Dial and the sshprivilege dialer
need hostID threaded through first.

Specs: new system-connection-profile (AC-01..06); kensa-executor C-15 /
AC-19 and ssh-connectivity C-09 updated for the scan sudo -S + default-on.
Verification (4-agent review of the prior commit) found the compliance
scan path bypassed two gates the collector / liveness / discovery paths
all enforce:

- HIGH: the scan ignored systemconfig AllowCredentialSudoPassword, so
  the kill-switch could not disable password-sudo on the busiest path
  (now default-on, so it was live by default).
- MEDIUM: the scan fed any non-empty cred.Password to sudo -S regardless
  of the credential's auth method.

Fix: thread a SudoPasswordPolicy seam through TransportFactory (wired in
main.go/worker.go to systemconfig LoadSecurity, mirroring the collector's
SudoPolicyLoader) and gate the scan's sudo password via sudoPasswordFor —
allowed only when the kill-switch is on AND auth method is password|both.
When disallowed the transport gets an empty sudo password, so probeSudoMode
never attempts sudo -S and the connection degrades to sudo -n. This gates
ONLY the sudo use of the password; SSH password AUTH stays independent
(the dial already used the full credential).
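
The gate described above reduces to a small pure function. The sketch below assumes a signature for sudoPasswordFor and hypothetical `AuthMethod` constants; the real function may take different arguments, but the decision table (kill-switch on AND auth method password or both) is the one stated in this commit message.

```go
package main

import "fmt"

// AuthMethod models the credential's configured auth method (assumed names).
type AuthMethod string

const (
	AuthKey      AuthMethod = "key"
	AuthPassword AuthMethod = "password"
	AuthBoth     AuthMethod = "both"
)

// sudoPasswordFor returns the password the transport may feed to sudo -S,
// or "" when the policy forbids it. An empty result means the connection
// stays on sudo -n; SSH password auth is decided elsewhere.
func sudoPasswordFor(allowCredentialSudoPassword bool, method AuthMethod, password string) string {
	if !allowCredentialSudoPassword {
		return "" // kill-switch off
	}
	if method != AuthPassword && method != AuthBoth {
		return "" // key-only credential: any stored password is not for sudo
	}
	return password
}

func main() {
	fmt.Printf("%q\n", sudoPasswordFor(false, AuthPassword, "pw")) // ""
	fmt.Printf("%q\n", sudoPasswordFor(true, AuthKey, "pw"))       // ""
	fmt.Printf("%q\n", sudoPasswordFor(true, AuthPassword, "pw"))  // "pw"
	fmt.Printf("%q\n", sudoPasswordFor(true, AuthBoth, "pw"))      // "pw"
}
```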

Also from the review (LOW):
- Correct the sshprivilege package doc that wrongly claimed the scan uses
  ssh.RunSudo with an identical retry shape.
- Add the single-factor caveat to the ObservedAuth doc (authObserver /
  dial.go): Last() is the authenticating method under OpenWatch's
  alternative-methods model; under true SSH MFA it is only the final
  factor, never persisted on failure.

Spec system-connection-profile gains C-05 + AC-07; sudoPasswordFor unit
test covers the gate matrix (kill-switch off, key-only, password, both).
remyluslosius merged commit 45f2629 into main on Jun 16, 2026
13 checks passed
remyluslosius deleted the feat/ssh-auth-learning branch on June 16, 2026 00:59
remyluslosius added a commit that referenced this pull request Jun 16, 2026
- Un-ignore SESSION_LOG.md (.gitignore listed it next to the
  already-tracked BACKLOG.md; both are the session-continuity docs
  CLAUDE.md/BACKLOG reference for provenance) and add it with the
  2026-06-16 handoff: SSH full-matrix + per-host learning (#566),
  packaging fresh-install + auto-upgrade (#564/#569), CI gate speedup
  (#567), settings/cleanup (#561/#562/#563/#568), and the Dependabot
  triage (9 merged / 6 skipped), plus next-steps + gotchas.
- BACKLOG: drop the completed PKG-1/PKG-2 (shipped in #564); add the SSH
  learning follow-up (wire connprofile into discovery/intelligence/
  liveness) and a "Deferred Dependency Upgrades" section (MUI 7→9, eslint
  10 blocked-upstream, cosign-installer v4 signing migration). Bump date.
remyluslosius added a commit that referenced this pull request Jun 16, 2026
…ence/liveness (#575)

The connection-profile memo (PR #566) only led the dial with the host's
known-good SSH auth method on the compliance-scan path. The other three
paths that talk to a managed host -- OS discovery, OS intelligence
(collector), and the liveness privilege probe -- still dialed key-first
every cycle, re-offering an unauthorized public key to password-only
hosts (a failed publickey attempt that counts against MaxAuthTries and
can trip fail2ban) on a loop.

Extend the shared connprofile store into those three paths:

- connprofile.WithHostID / HostIDFrom: context helpers so a transport
  that only receives host:port+cred can still look up + record the
  host's profile, without churning the SSHTransport.Dial signature (and
  its test stubs across discovery + collector).
- discovery.SSHTransportProd gains WithProfiles + a dial seam: when a
  store is wired and the ctx carries a host id, it sets PreferAuth from
  the recorded method and records ObservedAuth after a successful dial.
  This one transport is what BOTH discovery and the collector dial
  through (via collectorSSHAdapter), so both learn at once.
- discovery.go / collector.go wrap the ctx with the host id at the dial
  site (both hostFacts already carry HostID).
- sshprivilege.Probe gains WithProfiles: it leads the dial with the
  recorded method (reordering buildAuthMethods for AuthBoth) and records
  which method authenticated via a local single-factor observer.
- cmd/openwatch wires one shared connprofile.NewStore(pool) across all
  four paths (the scan now reuses it too).

Learning stays best-effort: a missing host id, absent profile row, or
store error dials in the default order and never fails the connection
(hint, not a lock -- a stale hint self-heals on the next dial).
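
A sketch of the best-effort rule, assuming a hypothetical `lookupFunc` in place of the real store interface: every failure path collapses to "no preference", so the caller dials in the default order instead of returning an error.

```go
package main

import (
	"errors"
	"fmt"
)

// errStoreDown simulates a failing profile store in this example.
var errStoreDown = errors.New("profile store unavailable")

// lookupFunc is a stand-in for the store's read path (assumed shape).
type lookupFunc func(hostID string) (method string, found bool, err error)

// preferredAuth returns the recorded auth method, or "" when there is no
// usable hint. It never returns an error: learning must not fail a dial.
func preferredAuth(lookup lookupFunc, hostID string) string {
	if lookup == nil || hostID == "" {
		return "" // no store wired or no host id in the context
	}
	method, found, err := lookup(hostID)
	if err != nil || !found {
		return "" // store error or absent row: dial in the default order
	}
	return method
}

func main() {
	known := func(string) (string, bool, error) { return "password", true, nil }
	broken := func(string) (string, bool, error) { return "", false, errStoreDown }

	fmt.Printf("%q\n", preferredAuth(known, "host-42"))  // "password"
	fmt.Printf("%q\n", preferredAuth(broken, "host-42")) // ""
	fmt.Printf("%q\n", preferredAuth(known, ""))         // ""
}
```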

Scope: the SSH auth-method dimension. sudo-mode (NOPASSWD vs password)
learning for these three paths stays a follow-up -- they already probe
sudo mode correctly each cycle; only the scan learns both today.

Spec system-connection-profile -> v1.1.0: C-06, AC-08 (discovery/
collector transport), AC-09 (liveness probe).