tools/stress: agent SSH + log parser #3780

Open
elitegreg wants to merge 1 commit into gm/stress-orchestrator-skeleton from gm/stress-orchestrator-ssh-agent

@elitegreg
Contributor

Summary

Completes the device-stress orchestrator (#3746) by replacing the no-op AgentRunner with the live SSH-driven runner and the log parser that turns agent diff/commit lines into pre_commit_log / applied runlog rows. Stacked on top of #3776 (part 2, orchestrator skeleton). Part 3 of #3746. Closes #3772.

  • pkg/agent/parser.go: Parser.Parse(line) []Event tracks two log lines from controlplane/agent/pkg/arista/eapi.go:
    • Committing config session due to diffs detected: <diff> → extracts every + interface Tunnel<ID> and emits one pre_commit_log event per ID; the diff's - lines (deprovisions) are ignored.
    • Configuration session finalized with command '... commit' → emits one applied event per pending tunnel; the ... abort variant clears the buffer without emitting.
  • pkg/agent/ssh.go: NewSSH(cfg) Runner dials --dut-ssh-host with --dut-ssh-key, execs doublezero-agent -verbose (appending -controller <addr> when --controller is set), and tees remote stdout/stderr into <working-dir>/orchestrator.agent.log while running every line through Parser. The session is closed on ctx cancel; the events channel closes after both stream readers exit so consumers never see a half-emitted event. Host-key verification uses ssh.InsecureIgnoreHostKey (documented; targets are ephemeral cEOS containers).
  • pkg/sweep — gained an agent-event consumer goroutine and a tunnelRegistry populated as users are created; consumer attributes each agent.Event back to a user_index via tunnel ID lookup and appends pre_commit_log / applied rows. Unknown tunnels are debug-logged and dropped. The agent is started under a derived context that the sweep cancels after deprovision, so a clean run still shuts the agent down rather than leaking goroutines.
  • pkg/exec.fetchTunnelID — now implemented: GetAccountInfo on the user PDA + DeserializeUser + return TunnelId. exec.Config gains an RPC field; the cmd binary passes the same *solanarpc.Client used by Client/Executor.
  • cmd/device-orchestrator — new flags --dut-ssh-user (default admin) and --no-agent (offline testing). SSH runner is the default when --dut-ssh-host and --dut-ssh-key are both set; with either missing, the cmd falls back to the no-op runner and warns.
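
For reviewers who want a concrete picture of the parser state machine described above, here is a minimal sketch. It is not the code in pkg/agent/parser.go: the Event fields, the Kind constants, the regular expression, the 16-bit bound used to skip oversized IDs, and the `commit'` substring check are all assumptions made for illustration. The two log markers are taken from the description.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// Event is a hypothetical stand-in for the PR's agent.Event.
type Event struct {
	Kind     string
	TunnelID int
}

const (
	KindPreCommitLog = "pre_commit_log"
	KindApplied      = "applied"

	diffMarker     = "Committing config session due to diffs detected:"
	finalizeMarker = "Configuration session finalized with command"
)

// addedTunnel matches only "+" lines; the greedy \d+ keeps Tunnel5000
// from being read as Tunnel500. "-" lines (deprovisions) never match.
var addedTunnel = regexp.MustCompile(`\+\s*interface Tunnel(\d+)`)

// Parser buffers tunnel IDs seen in a diff until the session is finalized.
type Parser struct {
	pending []int
}

// Parse returns the events produced by one log line (possibly none).
func (p *Parser) Parse(line string) []Event {
	if i := strings.Index(line, diffMarker); i >= 0 {
		diff := line[i+len(diffMarker):]
		var out []Event
		for _, m := range addedTunnel.FindAllStringSubmatch(diff, -1) {
			// Assumption: IDs that do not fit in 16 bits count as oversized.
			id, err := strconv.ParseUint(m[1], 10, 16)
			if err != nil {
				continue
			}
			p.pending = append(p.pending, int(id))
			out = append(out, Event{Kind: KindPreCommitLog, TunnelID: int(id)})
		}
		return out
	}
	if strings.Contains(line, finalizeMarker) {
		pending := p.pending
		p.pending = nil // commit and abort both end the cycle
		if !strings.Contains(line, "commit'") {
			return nil // abort variant: clear without emitting
		}
		out := make([]Event, 0, len(pending))
		for _, id := range pending {
			out = append(out, Event{Kind: KindApplied, TunnelID: id})
		}
		return out
	}
	return nil
}

func main() {
	p := &Parser{}
	lines := []string{
		diffMarker + " + interface Tunnel500\n- interface Tunnel501\n+ interface Tunnel5000",
		finalizeMarker + " '... commit'",
	}
	for _, l := range lines {
		for _, ev := range p.Parse(l) {
			fmt.Printf("%s tunnel=%d\n", ev.Kind, ev.TunnelID)
		}
	}
}
```

Run as written, this should print two pre_commit_log lines (tunnels 500 and 5000) followed by two applied lines. The sample diff text is made up and the real agent output may be laid out differently.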

Testing Verification

  • pkg/agent/parser_test.go: golden line fixtures for single-tunnel diff, multi-tunnel diff (mixed +/-), deprovision-only diff, commit-success after multi-tunnel diff, abort-clears-buffer, stray commit-with-no-pending, two consecutive provision cycles, oversized tunnel ID skipped, Tunnel5000 vs Tunnel500 boundary.
  • pkg/sweep/sweep_test.go: new scriptedAgent + a deleteGate on the fake executor lets the test emit agent events while deprovision is blocked; asserts pre_commit_log / applied rows are written for the two registered tunnels and the unregistered tunnel 999 is dropped. Tests pass under -race.
  • pkg/exec/exec_test.go: stub RPC returns a hand-encoded User body with TunnelId = 4242; fetchTunnelID reads it correctly. Missing-account path returns an error containing not found.
  • Smoke test: make build produces bin/device-orchestrator; --dry-run writes orchestrator-config.json containing dut_ssh_user, no_agent, and the rest of the flag set.
  • make go-build go-lint go-test all green.
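
The attribution behaviour that the sweep test exercises (registered tunnels produce rows, unknown tunnel 999 is dropped) can be sketched roughly as below. This is an illustrative sketch under assumptions, not the pkg/sweep implementation: the Row type, the registry methods, and the `consume` helper are hypothetical names, and the real consumer debug-logs dropped events rather than counting them.

```go
package main

import (
	"fmt"
	"sync"
)

// Event mirrors the shape assumed for agent.Event (hypothetical).
type Event struct {
	Kind     string
	TunnelID int
}

// Row is a hypothetical runlog row keyed by user_index.
type Row struct {
	UserIndex int
	Kind      string
}

// tunnelRegistry maps tunnel ID to user_index. It is guarded by a mutex
// because provisioning and the consumer run in different goroutines.
type tunnelRegistry struct {
	mu       sync.Mutex
	byTunnel map[int]int
}

func newTunnelRegistry() *tunnelRegistry {
	return &tunnelRegistry{byTunnel: make(map[int]int)}
}

func (r *tunnelRegistry) register(tunnelID, userIndex int) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.byTunnel[tunnelID] = userIndex
}

func (r *tunnelRegistry) lookup(tunnelID int) (int, bool) {
	r.mu.Lock()
	defer r.mu.Unlock()
	idx, ok := r.byTunnel[tunnelID]
	return idx, ok
}

// consume drains events until the channel is closed, attributing each
// event to a user_index and counting events for unknown tunnels.
func consume(events <-chan Event, reg *tunnelRegistry) (rows []Row, dropped int) {
	for ev := range events {
		idx, ok := reg.lookup(ev.TunnelID)
		if !ok {
			dropped++ // the PR debug-logs these; counted here for brevity
			continue
		}
		rows = append(rows, Row{UserIndex: idx, Kind: ev.Kind})
	}
	return rows, dropped
}

func main() {
	reg := newTunnelRegistry()
	reg.register(500, 0)
	reg.register(501, 1)

	events := make(chan Event, 4)
	events <- Event{Kind: "pre_commit_log", TunnelID: 500}
	events <- Event{Kind: "pre_commit_log", TunnelID: 999} // unregistered
	events <- Event{Kind: "applied", TunnelID: 500}
	events <- Event{Kind: "applied", TunnelID: 501}
	close(events)

	rows, dropped := consume(events, reg)
	fmt.Printf("rows=%d dropped=%d\n", len(rows), dropped)
}
```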

Out of scope

Completes the device-stress orchestrator (#3746) by replacing the no-op
AgentRunner with the live SSH-driven runner and the log parser that turns
agent diff/commit lines into pre_commit_log / applied events.

- pkg/agent/parser.go — Parser tracks two lines from
  controlplane/agent/pkg/arista/eapi.go: `Committing config session due to
  diffs detected: <diff>` (extracts every `+ interface Tunnel<ID>` and emits
  one pre_commit_log event per ID) and `Configuration session finalized with
  command '... commit'` (emits one applied event per pending tunnel; the
  abort variant clears the buffer without emitting).
- pkg/agent/ssh.go — Dials --dut-ssh-host with --dut-ssh-key, execs the
  configured doublezero-agent command (verbose, with optional --controller),
  and tees remote stdout/stderr into <working-dir>/orchestrator.agent.log
  while feeding lines through the parser. Host-key verification is
  InsecureIgnoreHostKey because targets are ephemeral cEOS containers.
- pkg/sweep — adds a consumer goroutine that reads Agent.Events() and writes
  pre_commit_log / applied rows by looking up each event's tunnel ID in a
  registry the provision goroutine populates as users are created. Unknown
  tunnels are debug-logged and dropped. The agent is started under a derived
  context so deprovision-then-clean-shutdown works without leaking the
  goroutine.
- pkg/exec.fetchTunnelID — implemented properly: GetAccountInfo on the user
  PDA, DeserializeUser, return User.TunnelId. Required adding an RPC field
  to exec.Config.
- cmd/device-orchestrator — new flags --dut-ssh-user (default `admin`) and
  --no-agent (offline testing); SSH runner becomes the default when
  --dut-ssh-host and --dut-ssh-key are both set.

Part 3 of #3746. Closes #3772.

@elitegreg requested a review from nikw9944 on May 27, 2026 23:00