feat(agents): API/MCP loop agent (Stage 2e) #9

Closed
pradeepvrd wants to merge 3 commits into feat/devops-bench-agents from feat/devops-bench-agents-api

@pradeepvrd (Owner)

Restructures the legacy API/MCP agent into devops_bench/agents/api/ (← pkg/agents/runner/api/{api,mcp_client}.py).

  • Adds loop.py (ApiAgent, an AgentHarness subclass registered as "api") and mcp.py (MCPClient).
  • Model-agnostic via devops_bench.models; the mcp SDK is imported lazily; the loop has a bounded turn cap (AGENT_MAX_TURNS, default 50). The legacy utils.filter_schema_for_gemini is dropped from the loop (Gemini schema filtering lives in the Gemini adapter).
  • Stacked on the 2a agents PR (needs agents/base).
  • Tests under tests/unit/agents/api/.

Stacked draft PR — part of the in-place Stage 2/3 restructure (see docs/migration/pr-plan.md). Base is the fork branch shown above; it will be retargeted to gke-labs/main once Stage 1 (gke-labs#89–92) merges. PRs are intended to be reviewed and merged in stage order.

Status: peer-reviewed by 2 teammates + senior sign-off on the full integration branch; full suite green (ruff + 374 unit tests). Do NOT mark ready until its stage is up for merge.

@pradeepvrd force-pushed the feat/devops-bench-agents branch from e5d7e5a to 3901757 on June 18, 2026 07:57
@pradeepvrd force-pushed the feat/devops-bench-agents-api branch from d9dbe47 to ba52e26 on June 18, 2026 07:57

Modules moved/refactored:
- pkg/agents/runner/api/api.py        -> devops_bench/agents/api/loop.py (ApiAgent, registered "api")
- pkg/agents/runner/api/mcp_client.py -> devops_bench/agents/api/mcp.py
- pkg/agents/runner/api/utils.py      -> dropped (filter_schema_for_gemini)
- pkg/agents/runner/api/llm_client.py + llm_adapters.py -> dropped (replaced by devops_bench.models)

Bugs fixed vs legacy:
- none in this commit; see the follow-up fix(agents) commit.

Improvements vs legacy:
- ApiAgent subclasses AgentHarness and self-registers via @AGENTS.register("api"),
  conforming to the standardized result contract and adding the previously-missing
  "skills" key.
- Model-/provider-agnostic: the loop drives the neutral models-layer LLMClient
  obtained from get_model(provider, model_name) (AGENT_PROVIDER/AGENT_MODEL) instead
  of the legacy provider-specific LLMClient/adapters; no provider SDK is imported.
- Lazy heavy imports: the `mcp` SDK and deepeval @observe are imported lazily inside
  functions, never at package import; __init__.py stays light.
- Dropped legacy utils.filter_schema_for_gemini: tool formatting goes through
  LLMClient.format_tools(...), and Gemini schema filtering lives in models.google,
  so a provider-specific filter does not belong in the model-agnostic loop.
- Bounded turn cap (AGENT_MAX_TURNS, default 50) replaces the legacy unbounded
  `while True`, guarding against a model that never stops requesting tools.
- All legacy bare imports package-qualified; prints replaced with the core logger.
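
A minimal sketch of the bounded turn cap described above, not the code from loop.py. Only the AGENT_MAX_TURNS variable and its default of 50 come from the commit message; the helper names `max_turns_from_env` and `run_bounded`, the fallback on invalid values, and the `step` callback are assumptions made for illustration.

```python
import os
from typing import Callable, Tuple


def max_turns_from_env(default: int = 50) -> int:
    """Read AGENT_MAX_TURNS; fall back to the default if unset or invalid (assumed policy)."""
    raw = os.environ.get("AGENT_MAX_TURNS", "")
    try:
        value = int(raw)
    except ValueError:
        return default
    return value if value > 0 else default


def run_bounded(step: Callable[[int], bool], max_turns: int) -> Tuple[int, bool]:
    """Call step(turn) until it reports completion or the cap is reached.

    `step` is a hypothetical stand-in for one model/tool round trip; it returns
    True when the model stops requesting tools. Returns (turns_used, finished).
    """
    for turn in range(1, max_turns + 1):
        if step(turn):
            return turn, True
    # The cap was hit: stop instead of looping forever as `while True` would.
    return max_turns, False
```

With this shape, a model that never stops requesting tools ends the run after `max_turns` iterations with `finished` set to False, so the caller can report the cap being hit rather than hang.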
…parsing (api loop)

Modules moved/refactored:
- pkg/agents/runner/api/api.py        -> devops_bench/agents/api/loop.py (ApiAgent, registered "api") (see base move commit)
- pkg/agents/runner/api/mcp_client.py -> devops_bench/agents/api/mcp.py (see base move commit)

Bugs fixed vs legacy:
- MCPClient.__aenter__ passed the whole server_path to
  StdioServerParameters(command=...), so a multi-word command (e.g. "uv run
  mcp-server") was treated as a single executable name and failed with
  FileNotFoundError (the stdio transport spawns without a shell). The command
  is now split with shlex.split into command + args, with a clear ValueError
  for an empty/whitespace-only server_path.
- process_query called mcp_client.call_tool(...) when mcp_client was None
  (bench_use_mcp=False) and the model still requested a tool, raising
  AttributeError. Added an explicit `elif mcp_client is None` guard that returns
  a clear error string instead.
- Tool-result extraction only read content[0], dropping later content blocks.
  Added _extract_tool_text to aggregate the text of all content blocks
  (newline-joined), falling back to str(result) when none carry text.
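
The server_path fix above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the code in mcp.py: the helper name `split_server_command` and the error message are hypothetical; only the use of shlex.split, the command + args split, and the ValueError for empty input come from the commit message.

```python
import shlex
from typing import List, Tuple


def split_server_command(server_path: str) -> Tuple[str, List[str]]:
    """Split a shell-style command line into an executable and its arguments."""
    parts = shlex.split(server_path)
    if not parts:
        # shlex.split returns [] for empty or whitespace-only input.
        raise ValueError("server_path must be a non-empty command")
    return parts[0], parts[1:]


# "uv run mcp-server" becomes command "uv" with args ["run", "mcp-server"],
# which is the shape a shell-less process spawn needs.
command, args = split_server_command("uv run mcp-server")
```

The two values would then be passed as the separate command and args parameters instead of the whole string as the command.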

Improvements vs legacy:
- none in this commit; see the follow-up feat(agents) commit.

Tests: None-client tool request, multi-block aggregation, _extract_tool_text
fallback, multi-word server_path split, and empty server_path ValueError.
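
The multi-block aggregation and fallback covered by these tests might look roughly like the sketch below. It is an assumption-laden illustration, not the actual _extract_tool_text: the function name `extract_tool_text`, the duck-typed check for a string `text` attribute, and the SimpleNamespace stand-ins for tool results are all hypothetical.

```python
from types import SimpleNamespace


def extract_tool_text(result) -> str:
    """Join the text of every content block; fall back to str(result) if none has text."""
    blocks = getattr(result, "content", None) or []
    texts = [
        block.text
        for block in blocks
        if isinstance(getattr(block, "text", None), str)
    ]
    return "\n".join(texts) if texts else str(result)


# Stand-in result objects (assumed shape: a `content` list of blocks).
multi = SimpleNamespace(
    content=[
        SimpleNamespace(text="first"),
        SimpleNamespace(data=b"\x00"),  # non-text block, skipped
        SimpleNamespace(text="second"),
    ]
)
empty = SimpleNamespace(content=[])
```

Here `extract_tool_text(multi)` yields the two text blocks joined by a newline, while `extract_tool_text(empty)` falls back to the string form of the result.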
… test (api loop)

Modules moved/refactored:
- pkg/agents/runner/api/api.py        -> devops_bench/agents/api/loop.py (ApiAgent, registered "api") (see base move commit)
- pkg/agents/runner/api/mcp_client.py -> devops_bench/agents/api/mcp.py (see base move commit)

Bugs fixed vs legacy:
- none in this commit; see the preceding fix(agents) commit.

Improvements vs legacy:
- The async loop no longer performs blocking filesystem I/O on the event loop.
  The skill-file read in process_query is extracted into _read_skill_file and
  run via asyncio.to_thread; the skill discovery in run_api_agent
  (os.path.exists + recursive glob.glob + parse_skill_md) is extracted into
  _discover_skill_tools and likewise run via asyncio.to_thread. Behavior is
  unchanged; only the threading boundary moves.
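
The threading boundary described above can be illustrated with a small sketch. It assumes Python 3.9+ for asyncio.to_thread; the names `read_skill_file`, `load_skill`, and `demo` are hypothetical stand-ins and do not reproduce _read_skill_file or _discover_skill_tools.

```python
import asyncio
import os
import tempfile


def read_skill_file(path: str) -> str:
    """Blocking file read; meant to run in a worker thread."""
    with open(path, encoding="utf-8") as handle:
        return handle.read()


async def load_skill(path: str) -> str:
    """Await the blocking read without stalling the event loop."""
    return await asyncio.to_thread(read_skill_file, path)


def demo() -> str:
    """Write a throwaway skill file and read it back through the async path."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "SKILL.md")
        with open(path, "w", encoding="utf-8") as handle:
            handle.write("# demo skill\n")
        return asyncio.run(load_skill(path))
```

The synchronous helper stays trivially testable on its own, and the coroutine only decides where it runs, which matches the claim that behavior is unchanged.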

Tests: a happy-path MCPClient lifecycle test that mocks StdioServerParameters,
stdio_client, and ClientSession and asserts connect (session set + initialize),
list_tools/call_tool, and a clean __aexit__ (session and transport both closed).

@pradeepvrd (Owner, Author)

Superseded by the reconciled cross-cutting refactor (see docs/refactor/e2e-refactor-sequencing-plan.md). Reworked into the layered devops_bench/ package on branch refactor/integration; replaced by the reworked component PRs and capstone #23. Closing as superseded.

@pradeepvrd closed this on Jun 20, 2026