feat(agents): AgentHarness base and CLI agents (Stage 2a) #5
Closed
pradeepvrd wants to merge 3 commits into
e5d7e5a to 3901757
Restructure the legacy agent runners into the new flat package
devops_bench/agents/, behind a model-agnostic AgentHarness interface and an
AGENTS registry, reusing the shared core foundation.
Modules moved/refactored:
- pkg/agents/runner/runner.py -> devops_bench/agents/base.py (new AgentHarness
ABC + AGENTS registry; legacy runner.py was a stale demo, only its result-dict
shape is preserved)
- pkg/agents/runner/gcli.py -> devops_bench/agents/cli/gemini.py
- pkg/agents/runner/openclaw.py -> devops_bench/agents/cli/openclaw.py
Bugs fixed vs legacy:
- none (this commit is the relocation + structural refactor; behavioral bug
fixes land in the follow-up fix(agents) commits)
Improvements vs legacy:
- new AgentHarness ABC + module-level AGENTS registry for model-agnostic
pluggability; concrete CLI agents self-register via @AGENTS.register
- model/provider selection flows from config (AGENT_PROVIDER/AGENT_MODEL),
never hardcoded into the CLI invocation
- lazy heavy imports: deepeval is imported inside the concrete modules only;
package __init__ files stay light (no deepeval/SDK on import)
- reuse the core foundation via public APIs: core.subprocess.run (list-args,
no shell=True) and core.config (get_env/get_bool) instead of raw os.environ +
subprocess
- structured logging via core.get_logger instead of print()
- Apache headers, Google-style docstrings, and __all__ on every module
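The registry pattern described in this commit can be sketched roughly as follows. This is a minimal illustration, not the code in devops_bench/agents/base.py: the Registry class, the run() method name, the string argument to register(), the EchoAgent class, and the result-dict keys are all assumptions made for the example.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, Type


class AgentHarness(ABC):
    """Model-agnostic agent interface (method name `run` is an assumption)."""

    def __init__(self, provider: str, model: str) -> None:
        # Provider and model come from configuration, never from the agent code.
        self.provider = provider
        self.model = model

    @abstractmethod
    def run(self, goal: str, context: str) -> dict:
        """Execute the agent and return a result dict."""


class Registry:
    """Hypothetical name-to-class registry with a decorator entry point."""

    def __init__(self) -> None:
        self._agents: Dict[str, Type[AgentHarness]] = {}

    def register(self, name: str) -> Callable[[Type[AgentHarness]], Type[AgentHarness]]:
        def decorator(cls: Type[AgentHarness]) -> Type[AgentHarness]:
            if name in self._agents:
                raise ValueError(f"agent {name!r} is already registered")
            self._agents[name] = cls
            return cls

        return decorator

    def get(self, name: str) -> Type[AgentHarness]:
        return self._agents[name]


AGENTS = Registry()


@AGENTS.register("echo")
class EchoAgent(AgentHarness):
    """Toy agent used only to show self-registration."""

    def run(self, goal: str, context: str) -> dict:
        # Result keys are illustrative, not the preserved legacy shape.
        return {"success": True, "output": f"{goal} | {context}", "model": self.model}
```

A caller would look the class up by name and construct it with values read from config, for example AGENTS.get("echo")(provider="p", model="m").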
Behavioral bug fixes to the relocated CLI agent runners, found in review.
Modules moved/refactored:
- see base move commit (refactor(agents): relocate legacy runners ...)
Bugs fixed vs legacy:
- CLI dispatch order: run_cli_agent checked the naive "oc" substring before
"gemini", so a gemini path containing "oc" (e.g. /usr/local/bin/gemini, via
"local") misrouted to OpenClaw. Now "gemini" is matched first, then
openclaw/oc, restoring legacy gcli.py precedence.
- Generic "binary" agent lost its input: a target that is neither gemini nor
openclaw/oc was run with no -p and no stdin, so the prompt never reached the
agent. Restored the legacy stdin contract: feed
json.dumps({"goal", "context"}) on stdin.
- SSH shell-injection / breakage: prompt, agent_name, and the session_file path
were interpolated into the remote shell string unquoted (-m '{prompt}',
cat {session_file}), breaking or injecting on quotes/spaces. All are now
shlex.quote'd.
- SSH session cleanup used a hardcoded "operator" dir instead of the actual
agent_name; now rm -rf ~/.openclaw/agents/{quoted_agent}/sessions/*.
- Uncaught OSError/FileNotFoundError: a missing binary (gemini/oc) or missing
ssh crashed the run because core.subprocess.run does not wrap OSError. Both
CLI runners and run_cli_agent now catch (SubprocessError, OSError) (and the
local runner catches OSError alongside CalledProcessError) and return the
standard failed-result dict.
- _parse_openclaw_session TypeError: tool-only assistant turns can carry
content: null; iterating it raised. Now coalesced via `or []`.
- Greedy JSON regex: parse_gemini_cli_output used ({.*}) with DOTALL, which
spans across unrelated log lines containing braces and corrupts the capture
(silently dropping tokens/tools/session_id). Replaced with a balanced-brace
scan (_extract_json_object) that skips braces inside strings and returns the
last span that parses to a dict.
- Unescaped glob id: the session-file glob interpolated short_id raw; now
glob.escape(short_id) so metacharacters match literally.
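A balanced-brace scan of the kind described for _extract_json_object might look like the sketch below. It is an illustration under assumptions, not the PR's implementation: the function names extract_json_object and _match_brace and the exact scanning strategy are hypothetical.

```python
import json
from typing import Optional


def _match_brace(text: str, start: int) -> Optional[int]:
    """Return the index of the '}' closing the '{' at start, or None."""
    depth = 0
    in_string = False
    escaped = False
    for pos in range(start, len(text)):
        ch = text[pos]
        if in_string:
            # Braces inside a JSON string literal must not affect the depth.
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            continue
        if ch == '"':
            in_string = True
        elif ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return pos
    return None


def extract_json_object(text: str) -> Optional[dict]:
    """Return the last balanced {...} span in text that parses to a dict."""
    result = None
    i = 0
    while i < len(text):
        if text[i] != "{":
            i += 1
            continue
        end = _match_brace(text, i)
        if end is None:
            i += 1
            continue
        try:
            parsed = json.loads(text[i:end + 1])
        except json.JSONDecodeError:
            # Not JSON (for example a log line with braces); keep scanning.
            i += 1
            continue
        if isinstance(parsed, dict):
            result = parsed
            i = end + 1  # skip past the span we just accepted
        else:
            i += 1
    return result
```

Unlike a greedy ({.*}) match with DOTALL, which would capture from the first brace in the output to the last, this scan evaluates each candidate span on its own, so brace-bearing log lines before or after the payload do not corrupt it.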
Improvements vs legacy:
- none (improvements land in the follow-up feat(agents) commit)
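The SSH quoting fix in this commit can be illustrated with the standard library's shlex.quote. The helper names below are hypothetical and the fragments are only the pieces quoted in the commit message (-m, cat, and the rm -rf cleanup path); how the real runner assembles its remote command is not shown in the source.

```python
import shlex


def remote_message_args(prompt: str) -> str:
    """Build the -m argument with the prompt safely quoted for a POSIX shell."""
    return f"-m {shlex.quote(prompt)}"


def remote_cat(session_file: str) -> str:
    """Build a cat command that survives spaces and quotes in the path."""
    return f"cat {shlex.quote(session_file)}"


def remote_cleanup(agent_name: str) -> str:
    """Build the session cleanup command for the actual agent name.

    The leading ~ and trailing * stay unquoted so the remote shell still
    expands them; only the interpolated agent name is quoted.
    """
    quoted_agent = shlex.quote(agent_name)
    return f"rm -rf ~/.openclaw/agents/{quoted_agent}/sessions/*"
```

With the earlier unquoted form, a prompt such as it's done; rm -rf / would terminate the single-quoted string early and run the remainder as a separate command; after quoting, the whole prompt arrives as a single argument.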
Non-bug behavioral improvements over legacy, layered on the relocated runners.
Modules moved/refactored:
- see base move commit (refactor(agents): relocate legacy runners ...)
Bugs fixed vs legacy:
- none (bug fixes land in the preceding fix(agents) commit)
Improvements vs legacy:
- GeminiCliAgent resolves the binary path via
first_env("AGENT_TARGET", "GEMINI_PATH", default="gemini"), giving an explicit,
documented precedence instead of chained get_env calls.
- run_openclaw_agent no longer falls back to sandbox-specific defaults (an
author's GCP project, a getpass-derived *_google_com SSH user, an internal
VM host). OPENCLAW_SSH_USER and OPENCLAW_VM_HOST are now required via
core.require_env, raising ConfigError when unset so the runner never silently
targets some other host. (OPENCLAW_SSH_KEY keeps the standard gcloud default.)
- extract_trajectory_from_session gains a parent-folder fallback: a SKILL.md
path with no "skills" directory (e.g. /plugin/my-skill/SKILL.md) now yields
the skill name from the SKILL.md's parent folder.
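The parent-folder fallback for SKILL.md paths could be sketched as below. This is a guess at the shape of the logic, not the code in extract_trajectory_from_session: the function name skill_name_from_path is hypothetical, and the primary rule (take the path segment that follows a "skills" directory) is an assumption about the pre-existing behavior.

```python
from pathlib import PurePosixPath
from typing import Optional


def skill_name_from_path(path: str) -> Optional[str]:
    """Derive a skill name from a path ending in SKILL.md."""
    p = PurePosixPath(path)
    if p.name != "SKILL.md":
        return None
    parts = p.parts
    if "skills" in parts:
        idx = parts.index("skills")
        # Assumed primary rule: the folder right after "skills" names the skill.
        if idx + 1 < len(parts) - 1:
            return parts[idx + 1]
    # Fallback: no usable "skills" directory, so use the SKILL.md parent folder.
    return p.parent.name or None
```

For the example path in the commit message, /plugin/my-skill/SKILL.md, the fallback branch returns my-skill.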
3901757 to 99ba7ff
Superseded by the reconciled cross-cutting refactor (see docs/refactor/e2e-refactor-sequencing-plan.md). Reworked into the layered devops_bench/ package on branch refactor/integration; replaced by the reworked component PRs and capstone #23. Closing as superseded.
Restructures the legacy agent runners into the flat devops_bench/agents/ package.
- AgentHarness ABC + AGENTS registry (agents/base.py).
- cli/gemini.py (← pkg/agents/runner/gcli.py) and cli/openclaw.py (← pkg/agents/runner/openclaw.py), self-registered.
- AGENT_PROVIDER/AGENT_MODEL config; no hardcoded model ids.
- deepeval imported lazily.
- tests/unit/agents/.
Stacked draft PR: part of the in-place Stage 2/3 restructure (see docs/migration/pr-plan.md). Base is the fork branch shown above; it will be retargeted to gke-labs/main once Stage 1 (gke-labs#89–92) merges. PRs are intended to be reviewed and merged in stage order.
Status: peer-reviewed by 2 teammates + senior sign-off on the full integration branch; full suite green (ruff + 374 unit tests). Do NOT mark ready until its stage is up for merge.