feat(agents): AgentHarness base and CLI agents (Stage 2a) #5

Closed
pradeepvrd wants to merge 3 commits into integration/devops-bench-stage1 from feat/devops-bench-agents

@pradeepvrd

Owner

Restructures the legacy agent runners into the flat devops_bench/agents/ package.

  • New AgentHarness ABC + AGENTS registry (agents/base.py).
  • CLI agents cli/gemini.py (← pkg/agents/runner/gcli.py) and cli/openclaw.py (← pkg/agents/runner/openclaw.py), self-registered.
  • Model-agnostic: provider/model flow from AGENT_PROVIDER/AGENT_MODEL config; no hardcoded model ids. deepeval imported lazily.
  • Tests under tests/unit/agents/.
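
A minimal sketch of the self-registration pattern described in the list above. The actual interface in agents/base.py is not shown in this PR text, so the `run(goal, context)` signature, the `Registry` class, the `create` helper, the string argument to `register`, and the `EchoAgent` example are all assumptions for illustration only.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, Type


class AgentHarness(ABC):
    """Hypothetical minimal interface; the real ABC may expose more."""

    @abstractmethod
    def run(self, goal: str, context: str) -> dict:
        """Run the agent once and return a result dict."""


class Registry:
    """Maps a short agent name to an AgentHarness subclass."""

    def __init__(self) -> None:
        self._agents: Dict[str, Type[AgentHarness]] = {}

    def register(
        self, name: str
    ) -> Callable[[Type[AgentHarness]], Type[AgentHarness]]:
        """Class decorator that records the class under `name`."""

        def decorator(cls: Type[AgentHarness]) -> Type[AgentHarness]:
            if name in self._agents:
                raise ValueError(f"agent {name!r} is already registered")
            self._agents[name] = cls
            return cls

        return decorator

    def create(self, name: str, **kwargs) -> AgentHarness:
        """Instantiate a registered agent; raises KeyError if unknown."""
        return self._agents[name](**kwargs)


AGENTS = Registry()


@AGENTS.register("echo")
class EchoAgent(AgentHarness):
    """Toy agent: provider and model arrive from config, never hardcoded."""

    def __init__(self, provider: str = "", model: str = "") -> None:
        self.provider = provider
        self.model = model

    def run(self, goal: str, context: str) -> dict:
        return {"success": True, "output": goal, "model": self.model}
```

With this shape, importing a concrete module is enough to make its agent available, and a caller only needs the registry name plus the configured provider/model, for example `AGENTS.create("echo", model=configured_model)`.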

Stacked draft PR — part of the in-place Stage 2/3 restructure (see docs/migration/pr-plan.md). Base is the fork branch shown above; it will be retargeted to gke-labs/main once Stage 1 (gke-labs#89–92) merges. PRs are intended to be reviewed and merged in stage order.

Status: peer-reviewed by 2 teammates + senior sign-off on the full integration branch; full suite green (ruff + 374 unit tests). Do NOT mark ready until its stage is up for merge.

pradeepvrd force-pushed the feat/devops-bench-agents branch from e5d7e5a to 3901757 on June 18, 2026 07:57

Restructure the legacy agent runners into the new flat package
devops_bench/agents/, behind a model-agnostic AgentHarness interface and
an AGENTS registry, reusing the shared core foundation.

Modules moved/refactored:
- pkg/agents/runner/runner.py  -> devops_bench/agents/base.py
  (new AgentHarness ABC + AGENTS registry; legacy runner.py was a stale demo,
  only its result-dict shape is preserved)
- pkg/agents/runner/gcli.py    -> devops_bench/agents/cli/gemini.py
- pkg/agents/runner/openclaw.py -> devops_bench/agents/cli/openclaw.py

Bugs fixed vs legacy:
- none (this commit is the relocation + structural refactor; behavioral bug
  fixes land in the follow-up fix(agents) commits)

Improvements vs legacy:
- new AgentHarness ABC + module-level AGENTS registry for model-agnostic
  pluggability; concrete CLI agents self-register via @AGENTS.register
- model/provider selection flows from config (AGENT_PROVIDER/AGENT_MODEL),
  never hardcoded into the CLI invocation
- lazy heavy imports: deepeval is imported inside the concrete modules only;
  package __init__ files stay light (no deepeval/SDK on import)
- reuse the core foundation via public APIs: core.subprocess.run (list-args,
  no shell=True) and core.config (get_env/get_bool) instead of raw
  os.environ + subprocess
- structured logging via core.get_logger instead of print()
- Apache headers, Google-style docstrings, and __all__ on every module

Behavioral bug fixes to the relocated CLI agent runners, found in review.

Modules moved/refactored:
- see base move commit (refactor(agents): relocate legacy runners ...)

Bugs fixed vs legacy:
- CLI dispatch order: run_cli_agent checked the naive "oc" substring before
  "gemini", so a gemini path containing "oc" (e.g. /usr/local/bin/gemini, via
  "local") misrouted to OpenClaw. Now "gemini" is matched first, then
  openclaw/oc, restoring legacy gcli.py precedence.
- Generic "binary" agent lost its input: a target that is neither gemini nor
  openclaw/oc was run with no -p and no stdin, so the prompt never reached the
  agent. Restored the legacy stdin contract: the runner now feeds a JSON object
  with "goal" and "context" keys (built with json.dumps) on stdin.
- SSH shell-injection / breakage: prompt, agent_name, and the session_file path
  were interpolated into the remote shell string unquoted (-m '{prompt}',
  cat {session_file}), breaking or injecting on quotes/spaces. All are now
  shlex.quote'd.
- SSH session cleanup used a hardcoded "operator" dir instead of the actual
  agent_name; now rm -rf ~/.openclaw/agents/{quoted_agent}/sessions/*.
- Uncaught OSError/FileNotFoundError: a missing binary (gemini/oc) or missing
  ssh crashed the run because core.subprocess.run does not wrap OSError. Both
  CLI runners and run_cli_agent now catch (SubprocessError, OSError) (and the
  local runner catches OSError alongside CalledProcessError) and return the
  standard failed-result dict.
- _parse_openclaw_session TypeError: tool-only assistant turns can carry
  content: null; iterating it raised. Now coalesced via `or []`.
- Greedy JSON regex: parse_gemini_cli_output used ({.*}) with DOTALL, which
  spans across unrelated log lines containing braces and corrupts the capture
  (silently dropping tokens/tools/session_id). Replaced with a balanced-brace
  scan (_extract_json_object) that skips braces inside strings and returns the
  last span that parses to a dict.
- Unescaped glob id: the session-file glob interpolated short_id raw; now
  glob.escape(short_id) so metacharacters match literally.
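
A rough sketch of the balanced-brace scan described in the "Greedy JSON regex" item above, assuming only what that item states: braces inside JSON strings are skipped and the last span that parses to a dict wins. The function names and structure here are illustrative and are not the PR's actual _extract_json_object.

```python
import json
from typing import Optional


def _match_brace(text: str, start: int) -> Optional[int]:
    """Return the index of the '}' closing the '{' at `start`, or None."""
    depth = 0
    in_string = False
    escaped = False
    for pos in range(start, len(text)):
        ch = text[pos]
        if in_string:
            # Inside a JSON string: only track escapes and the closing quote.
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return pos
    return None


def extract_json_object(text: str) -> Optional[dict]:
    """Return the last balanced {...} span that parses to a dict."""
    result = None
    i = 0
    while i < len(text):
        if text[i] != "{":
            i += 1
            continue
        end = _match_brace(text, i)
        if end is None:
            i += 1
            continue
        try:
            parsed = json.loads(text[i : end + 1])
        except json.JSONDecodeError:
            i += 1  # Not JSON (e.g. a log line with braces); keep scanning.
            continue
        if isinstance(parsed, dict):
            result = parsed
        i = end + 1  # Skip past the span that was just consumed.
    return result
```

Unlike a greedy ({.*}) match, which would run from the first brace in the output to the last, this scan evaluates each candidate span on its own, so brace-bearing log lines before or after the payload are rejected individually instead of corrupting the capture.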

Improvements vs legacy:
- none (improvements land in the follow-up feat(agents) commit)
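
To illustrate the SSH quoting fix listed under the bugs above, here is a hedged sketch of building remote shell strings with shlex.quote. The `agent-cli` command and its `--agent` flag are placeholders, not the real OpenClaw CLI; only the `-m` prompt argument, the `cat` of the session file, and the cleanup path come from the commit message.

```python
import shlex


def build_run_command(agent_name: str, prompt: str) -> str:
    """Remote command string; 'agent-cli' and '--agent' are placeholders."""
    return f"agent-cli --agent {shlex.quote(agent_name)} -m {shlex.quote(prompt)}"


def build_read_command(session_file: str) -> str:
    """Quote the path so spaces or metacharacters cannot split or inject."""
    return f"cat {shlex.quote(session_file)}"


def build_cleanup_command(agent_name: str) -> str:
    """Quote only the agent name; '~' and '*' stay unquoted so the remote
    shell still expands them."""
    return f"rm -rf ~/.openclaw/agents/{shlex.quote(agent_name)}/sessions/*"
```

A prompt such as `it's down; rm -rf /` then reaches the remote command as a single argument rather than terminating the quoted string and starting a second command.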

Non-bug behavioral improvements over legacy, layered on the relocated runners.

Modules moved/refactored:
- see base move commit (refactor(agents): relocate legacy runners ...)

Bugs fixed vs legacy:
- none (bug fixes land in the preceding fix(agents) commit)

Improvements vs legacy:
- GeminiCliAgent resolves the binary path via
  first_env("AGENT_TARGET", "GEMINI_PATH", default="gemini"), giving an explicit,
  documented precedence instead of chained get_env calls.
- run_openclaw_agent no longer falls back to sandbox-specific defaults (an
  author's GCP project, a getpass-derived *_google_com SSH user, an internal
  VM host). OPENCLAW_SSH_USER and OPENCLAW_VM_HOST are now required via
  core.require_env, raising ConfigError when unset so the runner never silently
  targets some other host. (OPENCLAW_SSH_KEY keeps the standard gcloud default.)
- extract_trajectory_from_session gains a parent-folder fallback: a SKILL.md
  path with no "skills" directory (e.g. /plugin/my-skill/SKILL.md) now yields
  the skill name from the SKILL.md's parent folder.
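
The environment-resolution behavior in the first two improvements above might look roughly like the sketch below. These are stand-in reimplementations, not the project's core.config helpers: the `environ` parameter exists only to make the sketch testable, and treating an empty string as unset is an assumption.

```python
import os
from typing import Mapping, Optional


class ConfigError(RuntimeError):
    """Raised when a required setting is missing (stand-in for the real one)."""


def first_env(
    *names: str,
    default: Optional[str] = None,
    environ: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Return the first non-empty variable among `names`, else `default`."""
    env = os.environ if environ is None else environ
    for name in names:
        value = env.get(name)
        if value:  # Assumption: empty strings count as unset.
            return value
    return default


def require_env(name: str, environ: Optional[Mapping[str, str]] = None) -> str:
    """Return the variable's value or raise ConfigError when it is unset."""
    env = os.environ if environ is None else environ
    value = env.get(name)
    if not value:
        raise ConfigError(f"{name} must be set")
    return value
```

Under these assumptions, `first_env("AGENT_TARGET", "GEMINI_PATH", default="gemini")` prefers AGENT_TARGET, then GEMINI_PATH, then the bare binary name, while `require_env("OPENCLAW_VM_HOST")` fails loudly instead of falling back to a sandbox host.
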

pradeepvrd force-pushed the feat/devops-bench-agents branch from 3901757 to 99ba7ff on June 18, 2026 08:23

@pradeepvrd

Owner Author

Superseded by the reconciled cross-cutting refactor (see docs/refactor/e2e-refactor-sequencing-plan.md). Reworked into the layered devops_bench/ package on branch refactor/integration; replaced by the reworked component PRs and capstone #23. Closing as superseded.

pradeepvrd closed this on Jun 20, 2026