feat(agents): base harness + CLI agents (gemini, openclaw) #29
Draft
pradeepvrd wants to merge 1 commit into
pradeepvrd commented Jun 20, 2026
pradeepvrd added a commit that referenced this pull request Jun 23, 2026
…nels

Deliver MCP + skills to the OpenClaw agent the same way the Gemini agent does, through oc's own per-run channels so nothing touches ~/.openclaw and concurrent runs never race:

- State: OPENCLAW_STATE_DIR -> <run>/state isolates sessions + the skills root; replaces the global ~/.openclaw/.../sessions wipe.
- MCP: each binding with a launch command becomes an mcp.servers entry in <run>/openclaw.json, selected via OPENCLAW_CONFIG_PATH (command[0]->command, rest->args; command-less bindings skipped).
- Skills: SKILL.md files discovered under the bound paths (reusing parse_skill_md) are materialized to <OPENCLAW_STATE_DIR>/skills/<name>/SKILL.md.
- Model auth: config.api_key threaded into the provider env var.

OpenClawAgent now assigns self.mcp_servers + self.skills, so it structurally satisfies SupportsMcp/SupportsSkills alongside SupportsRules. Default agent id is "main" (oc's built-in default, present in every config incl. the isolated one).

Builds on the events.jsonl trajectory parser from PR #29 (this branch is rebased on the fixed base): the per-run export-trajectory bundle is parsed via the same dotted tool.call/tool.result/model.completed schema.

Verified e2e on a real GKE secret-rotation run (rules+MCP+skills granted; ToolInvocation 1.0). Unit suite green.

Completes the "Openclaw needs equivalent wiring" follow-up from the Gemini change.
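The State and MCP bullets above could be sketched roughly as follows. This is an illustration only, not the code from the commit: `McpBinding`, `build_mcp_servers`, and `write_run_config` are hypothetical names, and the nesting of `mcp.servers` inside `openclaw.json` is assumed from the dotted path in the message. Only the two environment variable names and the `command[0]->command, rest->args` rule come from the commit text.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class McpBinding:
    """Hypothetical stand-in for a capability binding (not the PR's real type)."""
    name: str
    command: list[str] = field(default_factory=list)


def build_mcp_servers(bindings: list[McpBinding]) -> dict[str, dict]:
    """Map each binding that has a launch command to a server entry."""
    servers: dict[str, dict] = {}
    for binding in bindings:
        if not binding.command:
            # Command-less bindings are skipped, as the commit message describes.
            continue
        servers[binding.name] = {
            "command": binding.command[0],      # command[0] -> command
            "args": list(binding.command[1:]),  # rest -> args
        }
    return servers


def write_run_config(run_dir: Path, bindings: list[McpBinding]) -> dict[str, str]:
    """Write <run>/openclaw.json and return the per-run environment overrides."""
    state_dir = run_dir / "state"
    state_dir.mkdir(parents=True, exist_ok=True)
    config_path = run_dir / "openclaw.json"
    # Assumed layout: {"mcp": {"servers": {...}}}; the real schema may differ.
    config = {"mcp": {"servers": build_mcp_servers(bindings)}}
    config_path.write_text(json.dumps(config, indent=2))
    return {
        "OPENCLAW_STATE_DIR": str(state_dir),
        "OPENCLAW_CONFIG_PATH": str(config_path),
    }
```

Because both paths live under the run directory, two concurrent runs with different `run_dir` values would never write to the same file, which is the isolation property the message is after.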
The agent-execution layer used to live in `pkg/agents/runner/` (`gcli.py` dispatch + `openclaw.py`), driven by `evaluate.py`; this adds `devops_bench/agents/`: a typed `AgentHarness` base (`base.py`, `config.py`, `result.py`) plus the two CLI agents `cli/gemini.py` and `cli/openclaw.py`, built on the capability bindings.

**Behavior changes**

- Agent selection is a registry (`AGENTS.get`) keyed by a canonical name (with `cli`/`binary` aliases), replacing the substring match on `AGENT_TARGET`.
- Both agents return a typed `AgentResult` (output, canonical `ToolCall` trajectory, tokens, errors) instead of an ad-hoc dict.
- Gemini trajectory is parsed from the official `--output-format stream-json` event stream; it used to glob and scrape `~/.gemini/tmp` session files.
- OpenClaw runs `oc` locally (the SSH transport is gone) and reads its trajectory from the `oc sessions export-trajectory` bundle (`events.jsonl`); it used to scrape the `sessionFile=` path out of debug stdout.
- An unmatched `tool_result` is dropped from the trajectory and surfaced on `AgentResult.errors` rather than silently discarded, uniformly across both agents.
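The last behavior change, pairing results with calls and reporting orphans, might look something like the sketch below. It is not the PR's implementation: the event keys (`type`, `id`, `name`, `args`, `output`) and the `pair_events` helper are assumptions made for illustration, and this `ToolCall` is a simplified stand-in for the canonical type the description mentions. The `tool.call`/`tool.result` type strings follow the dotted schema named in the referenced commit.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class ToolCall:
    """Simplified stand-in for the canonical ToolCall (fields are assumed)."""
    name: str
    args: dict[str, Any] = field(default_factory=dict)
    result: Any = None


def pair_events(events: list[dict[str, Any]]) -> tuple[list[ToolCall], list[str]]:
    """Build a trajectory from call/result events, collecting orphaned results."""
    pending: dict[str, ToolCall] = {}  # calls still waiting for their result
    trajectory: list[ToolCall] = []
    errors: list[str] = []
    for event in events:
        kind = event.get("type")
        if kind == "tool.call":
            call = ToolCall(event["name"], event.get("args", {}))
            pending[event["id"]] = call
            trajectory.append(call)
        elif kind == "tool.result":
            call = pending.pop(event.get("id"), None)
            if call is None:
                # No matching call: keep it out of the trajectory,
                # but record it instead of discarding it silently.
                errors.append(f"unmatched tool result: id={event.get('id')!r}")
                continue
            call.result = event.get("output")
        # Other event types (e.g. model.completed) are ignored in this sketch.
    return trajectory, errors
```

Returning the errors next to the trajectory lets the caller attach them to a result object, so a malformed event stream is visible in the evaluation output rather than only in logs.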