tools/stress: agent SSH + log parser #3780
Open
elitegreg wants to merge 1 commit into
Completes the device-stress orchestrator (#3746) by replacing the no-op AgentRunner with the live SSH-driven runner and the log parser that turns agent diff/commit lines into pre_commit_log / applied events.

- pkg/agent/parser.go: Parser tracks two lines from controlplane/agent/pkg/arista/eapi.go: `Committing config session due to diffs detected: <diff>` (extracts every `+ interface Tunnel<ID>` and emits one pre_commit_log event per ID) and `Configuration session finalized with command '... commit'` (emits one applied event per pending tunnel; the abort variant clears the buffer without emitting).
- pkg/agent/ssh.go: Dials --dut-ssh-host with --dut-ssh-key, execs the configured doublezero-agent command (verbose, with optional --controller), and tees remote stdout/stderr into <working-dir>/orchestrator.agent.log while feeding lines through the parser. Host-key verification is InsecureIgnoreHostKey because targets are ephemeral cEOS containers.
- pkg/sweep: adds a consumer goroutine that reads Agent.Events() and writes pre_commit_log / applied rows by looking up each event's tunnel ID in a registry the provision goroutine populates as users are created. Unknown tunnels are debug-logged and dropped. The agent is started under a derived context so deprovision-then-clean-shutdown works without leaking the goroutine.
- pkg/exec.fetchTunnelID: implemented properly: GetAccountInfo on the user PDA, DeserializeUser, return User.TunnelId. Required adding an RPC field to exec.Config.
- cmd/device-orchestrator: new flags --dut-ssh-user (default `admin`) and --no-agent (offline testing); SSH runner becomes the default when --dut-ssh-host and --dut-ssh-key are both set.

Part 3 of #3746. Closes #3772.
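The two-line state machine described for the parser can be sketched roughly as follows. This is an illustration, not the code in this PR: the `Event` fields, the `uint16` bound used to reject oversized IDs, the assumption that the whole diff arrives inside the string passed to `Parse`, and the sample session command in `main` are all hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// EventKind names the two row types mentioned in the PR description.
type EventKind string

const (
	PreCommitLog EventKind = "pre_commit_log"
	Applied      EventKind = "applied"
)

// Event is a hypothetical shape; the real agent.Event may carry more fields.
type Event struct {
	Kind     EventKind
	TunnelID uint16 // assumption: tunnel IDs fit in 16 bits
}

const (
	diffMarker     = "Committing config session due to diffs detected:"
	finalizeMarker = "Configuration session finalized with command '"
)

// addedTunnel matches "+ interface Tunnel<ID>"; "-" lines never match.
var addedTunnel = regexp.MustCompile(`\+\s*interface Tunnel(\d+)`)

// Parser buffers tunnels seen in a diff until the session is finalized.
type Parser struct {
	pending []uint16
}

// Parse inspects one log line and returns any events it produces.
func (p *Parser) Parse(line string) []Event {
	if i := strings.Index(line, diffMarker); i >= 0 {
		var events []Event
		diff := line[i+len(diffMarker):]
		for _, m := range addedTunnel.FindAllStringSubmatch(diff, -1) {
			id, err := strconv.ParseUint(m[1], 10, 16)
			if err != nil {
				continue // oversized or malformed ID: skip it
			}
			p.pending = append(p.pending, uint16(id))
			events = append(events, Event{Kind: PreCommitLog, TunnelID: uint16(id)})
		}
		return events
	}
	if i := strings.Index(line, finalizeMarker); i >= 0 {
		rest := line[i+len(finalizeMarker):]
		end := strings.IndexByte(rest, '\'')
		if end < 0 {
			return nil // no closing quote: not a line we understand
		}
		cmd := strings.TrimSpace(rest[:end])
		switch {
		case strings.HasSuffix(cmd, "commit"):
			events := make([]Event, 0, len(p.pending))
			for _, id := range p.pending {
				events = append(events, Event{Kind: Applied, TunnelID: id})
			}
			p.pending = nil
			return events
		case strings.HasSuffix(cmd, "abort"):
			p.pending = nil // abort clears the buffer without emitting
		}
	}
	return nil
}

func main() {
	p := &Parser{}
	lines := []string{
		diffMarker + " + interface Tunnel500\n- interface Tunnel7\n+ interface Tunnel5000",
		finalizeMarker + "configure session s1 commit'", // made-up sample command
	}
	for _, l := range lines {
		for _, ev := range p.Parse(l) {
			fmt.Printf("%s tunnel=%d\n", ev.Kind, ev.TunnelID)
		}
	}
}
```

Because `\d+` is greedy, `Tunnel5000` is read as 5000 rather than 500, which is the boundary the PR's fixtures call out.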
Summary
Completes the device-stress orchestrator (#3746) by replacing the no-op `AgentRunner` with the live SSH-driven runner and the log parser that turns agent diff/commit lines into `pre_commit_log` / `applied` runlog rows. Stacked on top of #3776 (part 2, orchestrator skeleton). Part 3 of #3746. Closes #3772.

- `pkg/agent/parser.go`: `Parser.Parse(line) []Event` tracks two log lines from `controlplane/agent/pkg/arista/eapi.go`:
  - `Committing config session due to diffs detected: <diff>` → extracts every `+ interface Tunnel<ID>` and emits one `pre_commit_log` event per ID; the diff's `-` lines (deprovisions) are ignored.
  - `Configuration session finalized with command '... commit'` → emits one `applied` event per pending tunnel; the `... abort` variant clears the buffer without emitting.
- `pkg/agent/ssh.go`: `NewSSH(cfg) Runner` dials `--dut-ssh-host` with `--dut-ssh-key`, execs `doublezero-agent -verbose` (appending `-controller <addr>` when `--controller` is set), and tees remote stdout/stderr into `<working-dir>/orchestrator.agent.log` while running every line through `Parser`. The session is closed on ctx cancel; the events channel closes after both stream readers exit so consumers never see a half-emitted event. Host-key verification uses `ssh.InsecureIgnoreHostKey` (documented; targets are ephemeral cEOS containers).
- `pkg/sweep`: gained an agent-event consumer goroutine and a `tunnelRegistry` populated as users are created; the consumer attributes each `agent.Event` back to a `user_index` via tunnel ID lookup and appends `pre_commit_log` / `applied` rows. Unknown tunnels are debug-logged and dropped. The agent is started under a derived context that the sweep cancels after deprovision, so a clean run still shuts the agent down rather than leaking goroutines.
- `pkg/exec.fetchTunnelID`: now implemented: `GetAccountInfo` on the user PDA + `DeserializeUser` + return `TunnelId`. `exec.Config` gains an `RPC` field; the cmd binary passes the same `*solanarpc.Client` used by `Client`/`Executor`.
- `cmd/device-orchestrator`: new flags `--dut-ssh-user` (default `admin`) and `--no-agent` (offline testing). SSH runner is the default when `--dut-ssh-host` and `--dut-ssh-key` are both set; with either missing, the cmd falls back to the no-op runner and warns.

Testing Verification

- `pkg/agent/parser_test.go`: golden line fixtures for single-tunnel diff, multi-tunnel diff (mixed +/-), deprovision-only diff, commit-success after multi-tunnel diff, abort-clears-buffer, stray commit-with-no-pending, two consecutive provision cycles, oversized tunnel ID skipped, `Tunnel5000` vs `Tunnel500` boundary.
- `pkg/sweep/sweep_test.go`: new `scriptedAgent` + a `deleteGate` on the fake executor lets the test emit agent events while deprovision is blocked; asserts `pre_commit_log` / `applied` rows are written for the two registered tunnels and the unregistered tunnel 999 is dropped. Tests pass under `-race`.
- `pkg/exec/exec_test.go`: stub RPC returns a hand-encoded `User` body with `TunnelId = 4242`; `fetchTunnelID` reads it correctly. Missing-account path returns an error containing `not found`.
- `make build` produces `bin/device-orchestrator`; `--dry-run` writes `orchestrator-config.json` containing `dut_ssh_user`, `no_agent`, and the rest of the flag set.
- `make go-build go-lint go-test` all green.

Out of scope
golang.org/x/crypto/ssh; CI never opens an SSH session. Acceptance is via the manual devnet run per stress: implement tools/stress/device-orchestrator #3746.
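For readers who want a picture of the `pkg/sweep` consumer described in the Summary without a live SSH session, the sketch below shows one way a tunnel registry and an event-draining loop could fit together. It is a simplified illustration, not the PR's code: the `Event` and `Row` types, the method names `Register` and `Lookup`, and the `consume` function are hypothetical, and the real consumer writes runlog rows and debug-logs unknown tunnels rather than returning a slice.

```go
package main

import (
	"fmt"
	"sync"
)

// Event is a hypothetical stand-in for agent.Event.
type Event struct {
	Kind     string // "pre_commit_log" or "applied"
	TunnelID uint16 // assumption: tunnel IDs fit in 16 bits
}

// Row is a hypothetical stand-in for a runlog row.
type Row struct {
	UserIndex int
	Kind      string
}

// tunnelRegistry maps tunnel IDs to user indexes. The provision goroutine
// writes to it while the consumer reads, so access is guarded by a mutex.
type tunnelRegistry struct {
	mu   sync.RWMutex
	byID map[uint16]int
}

func newTunnelRegistry() *tunnelRegistry {
	return &tunnelRegistry{byID: make(map[uint16]int)}
}

// Register records which user owns a tunnel ID.
func (r *tunnelRegistry) Register(id uint16, userIndex int) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.byID[id] = userIndex
}

// Lookup returns the user index for a tunnel ID, if one was registered.
func (r *tunnelRegistry) Lookup(id uint16) (int, bool) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	idx, ok := r.byID[id]
	return idx, ok
}

// consume drains events until the channel is closed, attributing each one
// to a user index. Events for unregistered tunnels are dropped.
func consume(events <-chan Event, reg *tunnelRegistry) []Row {
	var rows []Row
	for ev := range events {
		idx, ok := reg.Lookup(ev.TunnelID)
		if !ok {
			continue // the real consumer debug-logs here
		}
		rows = append(rows, Row{UserIndex: idx, Kind: ev.Kind})
	}
	return rows
}

func main() {
	reg := newTunnelRegistry()
	reg.Register(500, 0)
	reg.Register(501, 1)

	events := make(chan Event)
	done := make(chan []Row)
	go func() { done <- consume(events, reg) }()

	for _, ev := range []Event{
		{Kind: "pre_commit_log", TunnelID: 500},
		{Kind: "pre_commit_log", TunnelID: 999}, // unregistered: dropped
		{Kind: "applied", TunnelID: 501},
	} {
		events <- ev
	}
	close(events) // stands in for the runner closing its events channel

	for _, r := range <-done {
		fmt.Printf("user_index=%d %s\n", r.UserIndex, r.Kind)
	}
}
```

Ranging over the channel means the consumer exits only once the producer closes it, which matches the description of the events channel closing after both stream readers have finished.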