test(ssh): add opt-in live-host SSH/sudo integration test #577

Merged
remyluslosius merged 1 commit into main from feat/livehost-ssh-test on Jun 16, 2026

@remyluslosius

Contributor

What

Adds internal/ssh/livehost_test.go — an opt-in, self-skipping integration test that drives the real ssh.Dial + ssh.RunSudo against actual hosts. Closes the project's biggest test blind spot: the dial, auth-ordering, and sudo -n/-S paths were only unit-tested at the command-construction level (stubbed transport), never against a live box, so a wired-up host could regress with every test still green.

How it runs

OPENWATCH_LIVE_HOSTS=/path/to/test_hosts.csv   # hostname,ip,username,credential[=password]
OPENWATCH_LIVE_KEY=/path/to/id_rsa             # an OpenSSH private key
go test ./internal/ssh/ -run TestLiveHost -v

If either env var is unset, the test calls t.Skip(), so it never gates normal CI. The inventory and key stay on the operator's workstation, never in the repo.
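A minimal sketch of the opt-in gate and inventory loading described above. It is not the code in internal/ssh/livehost_test.go: the `liveHost` type and the `liveConfigured` and `parseInventory` helpers are hypothetical names, and treating the fourth column as optional, skipping a leading `hostname` header row, and ignoring `#` comment lines are assumptions about the CSV, not documented behavior.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"os"
	"strings"
)

// liveHost is a hypothetical row type for this sketch.
type liveHost struct {
	Hostname, IP, Username, Credential string
}

// liveConfigured reports whether both opt-in variables are set.
// getenv is injected so the check can be exercised without touching the
// process environment; a real test would pass os.Getenv.
func liveConfigured(getenv func(string) string) bool {
	return getenv("OPENWATCH_LIVE_HOSTS") != "" && getenv("OPENWATCH_LIVE_KEY") != ""
}

// parseInventory reads hostname,ip,username[,credential] records.
func parseInventory(data string) ([]liveHost, error) {
	r := csv.NewReader(strings.NewReader(data))
	r.FieldsPerRecord = -1 // assumption: the credential column may be omitted
	r.Comment = '#'        // assumption: '#' lines are comments
	r.TrimLeadingSpace = true
	rows, err := r.ReadAll()
	if err != nil {
		return nil, err
	}
	var hosts []liveHost
	for i, row := range rows {
		if len(row) < 3 {
			return nil, fmt.Errorf("record %d: want at least 3 fields, got %d", i+1, len(row))
		}
		if i == 0 && row[0] == "hostname" {
			continue // assumption: an optional header row
		}
		h := liveHost{Hostname: row[0], IP: row[1], Username: row[2]}
		if len(row) > 3 {
			h.Credential = row[3]
		}
		hosts = append(hosts, h)
	}
	return hosts, nil
}

func main() {
	if !liveConfigured(os.Getenv) {
		fmt.Println("live-host variables unset; a test would call t.Skip here")
		return
	}
	data, err := os.ReadFile(os.Getenv("OPENWATCH_LIVE_HOSTS"))
	if err != nil {
		fmt.Println("cannot read inventory:", err)
		return
	}
	hosts, err := parseInventory(string(data))
	if err != nil {
		fmt.Println("bad inventory:", err)
		return
	}
	fmt.Printf("loaded %d host(s)\n", len(hosts))
}
```

In a test function the same gate would read `if !liveConfigured(os.Getenv) { t.Skip("live-host env vars unset") }`, which keeps the normal suite green on machines with no inventory.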

What it validates

The fleet is heterogeneous, so the test discovers each host's capabilities rather than demanding every method on every host. Per reachable host it asserts the machinery for whatever the host supports — exactly the observations the per-host connprofile memo records:

| Check | Assertion |
| --- | --- |
| key auth dial | ObservedAuth == "key" |
| password auth dial | ObservedAuth == "password" |
| sudo mode | confirmed via the innocuous true sentinel (nopasswd or password) |
| sudo -S | the real sudo -S -k -p '' true password-on-stdin path executes |
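The sudo rows can be illustrated with a small sketch of mode discovery: try the non-interactive sentinel first, then fall back to the password-on-stdin form. The `runner` type, `probeSudoMode`, and `fakeHost` are hypothetical names for illustration; in particular `runner` is not the signature of ssh.RunSudo, which this PR does not show. The two command strings are the ones quoted in the PR.

```go
package main

import (
	"errors"
	"fmt"
)

// runner executes cmd on a host and feeds stdin to it.
// Hypothetical stand-in; not the signature of ssh.RunSudo.
type runner func(cmd, stdin string) error

const (
	sudoNoPasswdProbe = "sudo -n true"          // -n: fail instead of prompting
	sudoStdinProbe    = "sudo -S -k -p '' true" // -S: password on stdin, -k: ignore cached creds, -p '': empty prompt
)

// probeSudoMode returns "nopasswd" or "password" depending on which
// harmless `true` invocation succeeds.
func probeSudoMode(run runner, password string) (string, error) {
	if err := run(sudoNoPasswdProbe, ""); err == nil {
		return "nopasswd", nil
	}
	if password == "" {
		return "", errors.New("sudo requires a password and none was supplied")
	}
	if err := run(sudoStdinProbe, password+"\n"); err != nil {
		return "", fmt.Errorf("sudo -S probe failed: %w", err)
	}
	return "password", nil
}

// fakeHost simulates a remote host so the sketch runs without SSH.
type fakeHost struct {
	noPasswd bool
	password string
}

func (h fakeHost) run(cmd, stdin string) error {
	switch cmd {
	case sudoNoPasswdProbe:
		if h.noPasswd {
			return nil
		}
		return errors.New("sudo: a password is required")
	case sudoStdinProbe:
		if stdin == h.password+"\n" {
			return nil
		}
		return errors.New("sudo: incorrect password")
	}
	return fmt.Errorf("unexpected command %q", cmd)
}

func main() {
	for _, h := range []fakeHost{{noPasswd: true}, {password: "hunter2"}} {
		mode, err := probeSudoMode(h.run, "hunter2")
		fmt.Println(mode, err)
	}
}
```

Using `true` as the payload means the probe changes nothing on the host; only sudo's own exit status is observed.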

Tolerance rules keep it meaningful but not brittle on a real fleet:

  • a server-side auth rejection (key not authorized / PasswordAuthentication no) is a host-config fact → logged, that method skipped;
  • an unreachable host → subtest skipped;
  • only an unexpected protocol-level error or a wrong ObservedAuth/sudo result fails the test;
  • a host with no usable auth → skipped.
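The tolerance rules above can be sketched as a classification step. This is an illustration under assumptions, not the test's actual logic: the `verdict` type, `errAuthRejected` sentinel, and `classifyDial` / `classifyHost` helpers are hypothetical, and treating a `*net.OpError` with `Op == "dial"` as "unreachable" is one plausible way to detect that case.

```go
package main

import (
	"errors"
	"fmt"
	"net"
)

type verdict int

const (
	verdictPass verdict = iota
	verdictSkipMethod // server rejected this auth method: a host-config fact
	verdictSkipHost   // host unreachable, or no usable auth method
	verdictFail
)

// errAuthRejected is a hypothetical sentinel for a server-side auth rejection.
var errAuthRejected = errors.New("server rejected auth method")

// classifyDial maps one dial attempt onto the tolerance rules.
func classifyDial(err error, wantAuth, observedAuth string) verdict {
	var opErr *net.OpError
	switch {
	case err == nil && observedAuth == wantAuth:
		return verdictPass
	case err == nil:
		return verdictFail // dial worked but ObservedAuth is wrong
	case errors.Is(err, errAuthRejected):
		return verdictSkipMethod
	case errors.As(err, &opErr) && opErr.Op == "dial":
		return verdictSkipHost
	default:
		return verdictFail // unexpected protocol-level error
	}
}

// classifyHost folds per-method verdicts into one per-host verdict.
func classifyHost(methods []verdict) verdict {
	result := verdictSkipHost // no usable auth unless some method passes
	for _, v := range methods {
		switch v {
		case verdictFail, verdictSkipHost:
			return v
		case verdictPass:
			result = verdictPass
		}
	}
	return result
}

// fakeDialError builds a dial-level network error for the demo.
func fakeDialError(msg string) error {
	return &net.OpError{Op: "dial", Net: "tcp", Err: errors.New(msg)}
}

func main() {
	fmt.Println(classifyDial(nil, "key", "key") == verdictPass)
	fmt.Println(classifyDial(fakeDialError("connection refused"), "key", "") == verdictSkipHost)
	fmt.Println(classifyHost([]verdict{verdictSkipMethod, verdictSkipMethod}) == verdictSkipHost)
}
```

Keeping "skip" and "fail" as distinct outcomes is what lets a heterogeneous fleet stay green while a genuine regression still turns the run red.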

Proven against the dev fleet

Ran it against the 9-host dev inventory: 5 key+NOPASSWD hosts pass (real key dial → ObservedAuth=="key", sudo -n true → nopasswd, and the sudo -S password-on-stdin path all exercised end-to-end); the remaining hosts, which either reject the key or are unreachable, skip. This is live proof of the three SSH-learning PRs just merged (#566 / #575 / #576).

Known gap (documented in BACKLOG): the password-AUTH branch (ObservedAuth=="password") is live-unverified because the dev fleet runs PasswordAuthentication=no everywhere. That branch will be exercised as soon as a password-enabled host is in the inventory.

Also

Drops the completed "wire SSH auth/sudo learning into discovery/intelligence/liveness" backlog entry (shipped in #575 + #576) and marks the live-host-test item mostly-done.

gofmt/go vet/go build ./... clean; specter check 0 errors; the test compiles and skips cleanly in the normal suite.

Closes the project's biggest test blind spot: the dial, auth-ordering,
and sudo -n/-S paths were only unit-tested at the command-construction
level (stubbed transport), never against a real box. A wired-up host
could regress and every test stay green.

internal/ssh/livehost_test.go drives the REAL ssh.Dial + ssh.RunSudo —
the primitives every host-talking path (scan, discovery, collector,
liveness) shares — against an operator-supplied inventory:

  OPENWATCH_LIVE_HOSTS=/path/to/test_hosts.csv  (hostname,ip,username,credential)
  OPENWATCH_LIVE_KEY=/path/to/id_rsa

With either unset the test t.Skip()s, so it never gates normal CI; the
inventory + key stay on the operator's workstation, never in the repo.

The fleet is heterogeneous, so the test DISCOVERS each host's
capabilities rather than demanding every method everywhere. Per host it
asserts the machinery for whatever the host supports:

  - key auth dials      -> ObservedAuth == "key"      (the value the memo records)
  - password auth dials -> ObservedAuth == "password"
  - sudo mode confirmed via the `true` sentinel (nopasswd | password)
  - the real `sudo -S -k -p '' true` password-on-stdin path executes

A server-side auth rejection (key not authorized, or PasswordAuthentication
off) is a tolerated host-config fact; an unreachable host is skipped; only
an unexpected protocol-level error or a wrong ObservedAuth/sudo result
fails the test. A host with no usable auth is skipped.

Validated against the dev fleet: 5 key+NOPASSWD hosts pass (real key dial,
sudo -n, and sudo -S all exercised), key-rejecting and unreachable hosts
skip. The password-AUTH assertion is live-unverified only because the dev
fleet runs PasswordAuthentication=no everywhere (noted in BACKLOG); it
runs as soon as one password-enabled host is in the inventory.

Also drops the completed "wire SSH auth/sudo learning" backlog entry
(shipped in #575 + #576).
@remyluslosius remyluslosius merged commit 299a472 into main Jun 16, 2026
13 checks passed