feat(agents): API/MCP loop agent (Stage 2e) #9
Closed
pradeepvrd wants to merge 3 commits into
Modules moved/refactored:
- pkg/agents/runner/api/api.py -> devops_bench/agents/api/loop.py (ApiAgent, registered "api")
- pkg/agents/runner/api/mcp_client.py -> devops_bench/agents/api/mcp.py
- pkg/agents/runner/api/utils.py -> dropped (filter_schema_for_gemini)
- pkg/agents/runner/api/llm_client.py + llm_adapters.py -> dropped (replaced by devops_bench.models)
Bugs fixed vs legacy:
- none in this commit; see the follow-up fix(agents) commit.
Improvements vs legacy:
- ApiAgent subclasses AgentHarness and self-registers via @AGENTS.register("api"),
conforming to the standardized result contract and adding the previously-missing
"skills" key.
- Model-/provider-agnostic: the loop drives the neutral models-layer LLMClient
obtained from get_model(provider, model_name) (AGENT_PROVIDER/AGENT_MODEL) instead
of the legacy provider-specific LLMClient/adapters; no provider SDK is imported.
- Lazy heavy imports: the `mcp` SDK and deepeval @observe are imported lazily inside
functions, never at package import; __init__.py stays light.
- Dropped legacy utils.filter_schema_for_gemini: tool formatting goes through
LLMClient.format_tools(...), and Gemini schema filtering lives in models.google,
so a provider-specific filter does not belong in the model-agnostic loop.
- Bounded turn cap (AGENT_MAX_TURNS, default 50) replaces the legacy unbounded
`while True`, guarding against a model that never stops requesting tools.
- All legacy bare imports package-qualified; prints replaced with the core logger.
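The bounded turn cap described above can be sketched roughly as follows. This is a minimal illustration, not the actual ApiAgent loop: `run_bounded_loop` and its `step` callback are hypothetical names, and only the AGENT_MAX_TURNS variable and its default of 50 come from the commit message.

```python
import os


def run_bounded_loop(step, max_turns=None):
    """Call step(turn) until it returns a final answer or the cap is reached.

    `step` is a hypothetical callback standing in for one model/tool round trip;
    it returns None while the model keeps requesting tools.
    """
    if max_turns is None:
        # AGENT_MAX_TURNS and the default of 50 are taken from the commit message.
        max_turns = int(os.environ.get("AGENT_MAX_TURNS", "50"))
    for turn in range(max_turns):
        result = step(turn)
        if result is not None:
            return result, turn + 1
    # Cap reached without a final answer: stop instead of looping forever.
    return None, max_turns
```

Compared with `while True`, the `for` loop makes the worst case explicit: a model that never stops requesting tools costs at most `max_turns` round trips.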
…parsing (api loop)
Modules moved/refactored:
- pkg/agents/runner/api/api.py -> devops_bench/agents/api/loop.py (ApiAgent, registered "api") (see base move commit)
- pkg/agents/runner/api/mcp_client.py -> devops_bench/agents/api/mcp.py (see base move commit)
Bugs fixed vs legacy:
- MCPClient.__aenter__ passed the whole server_path to StdioServerParameters(command=...),
so a multi-word command (e.g. "uv run mcp-server") was treated as a single executable
name and failed with FileNotFoundError (the stdio transport spawns without a shell).
Now shlex.split the command into command + args, with a clear ValueError for an
empty/whitespace-only server_path.
- process_query called mcp_client.call_tool(...) even when mcp_client is None
(bench_use_mcp=False) but the model still requested a tool, raising AttributeError.
Added an explicit `elif mcp_client is None` guard that returns a clear error string instead.
- Tool-result extraction only read content[0], dropping later content blocks. Added
_extract_tool_text to aggregate the text of all content blocks (newline-joined),
falling back to str(result) when none carry text.
Improvements vs legacy:
- none in this commit; see the follow-up feat(agents) commit.
Tests: None-client tool request, multi-block aggregation, _extract_tool_text fallback,
multi-word server_path split, and empty server_path ValueError.
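Two of the fixes above (the server_path split and the multi-block text aggregation) can be illustrated with a short sketch. The function names `split_server_command` and `extract_tool_text` are hypothetical stand-ins, and the result objects are assumed to expose a `content` list whose blocks may carry a `text` attribute; the real MCPClient and _extract_tool_text may differ in detail.

```python
import shlex
from types import SimpleNamespace


def split_server_command(server_path):
    """Split a command string into (command, args) without invoking a shell."""
    parts = shlex.split(server_path)
    if not parts:
        # Empty or whitespace-only input has no executable to spawn.
        raise ValueError("server_path must not be empty")
    return parts[0], parts[1:]


def extract_tool_text(result):
    """Join the text of every content block; fall back to str(result)."""
    blocks = getattr(result, "content", None) or []
    texts = [b.text for b in blocks if isinstance(getattr(b, "text", None), str)]
    return "\n".join(texts) if texts else str(result)


# Usage: "uv run mcp-server" becomes command "uv" with args ["run", "mcp-server"].
command, args = split_server_command("uv run mcp-server")
demo = SimpleNamespace(content=[SimpleNamespace(text="a"), SimpleNamespace(text="b")])
joined = extract_tool_text(demo)  # both blocks are kept, newline-joined
```

The pair returned by `split_server_command` is the shape a stdio spawn without a shell needs: one executable name plus a separate argument list.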
… test (api loop)
Modules moved/refactored:
- pkg/agents/runner/api/api.py -> devops_bench/agents/api/loop.py (ApiAgent, registered "api") (see base move commit)
- pkg/agents/runner/api/mcp_client.py -> devops_bench/agents/api/mcp.py (see base move commit)
Bugs fixed vs legacy:
- none in this commit; see the preceding fix(agents) commit.
Improvements vs legacy:
- The async loop no longer performs blocking filesystem I/O on the event loop. The
skill-file read in process_query is extracted into _read_skill_file and run via
asyncio.to_thread; the skill discovery in run_api_agent (os.path.exists + recursive
glob.glob + parse_skill_md) is extracted into _discover_skill_tools and likewise run
via asyncio.to_thread. Behavior is unchanged; only the threading boundary moves.
Tests: a happy-path MCPClient lifecycle test that mocks StdioServerParameters,
stdio_client, and ClientSession and asserts connect (session set + initialize),
list_tools/call_tool, and a clean __aexit__ (session and transport both closed).
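The threading-boundary change can be sketched as below, assuming Python 3.9+ for asyncio.to_thread. `read_skill_file` and `load_skill` are illustrative names only; the commit's own helpers are _read_skill_file and _discover_skill_tools, whose signatures are not shown here.

```python
import asyncio
from pathlib import Path


def read_skill_file(path):
    """Plain blocking read; meant to run in a worker thread."""
    return Path(path).read_text(encoding="utf-8")


async def load_skill(path):
    # asyncio.to_thread runs the blocking function in a worker thread so the
    # event loop stays free to service other tasks while the read completes.
    return await asyncio.to_thread(read_skill_file, path)
```

The blocking function itself is unchanged; only the call site moves off the event loop, which matches the "behavior is unchanged" note in the commit.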
Superseded by the reconciled cross-cutting refactor (see docs/refactor/e2e-refactor-sequencing-plan.md). Reworked into the layered devops_bench/ package on branch refactor/integration; replaced by the reworked component PRs and capstone #23. Closing as superseded.
Restructures the legacy API/MCP agent into devops_bench/agents/api/ (← pkg/agents/runner/api/{api,mcp_client}.py).
- loop.py (ApiAgent(AgentHarness), registered "api") and mcp.py (MCPClient).
- … devops_bench.models; mcp SDK imported lazily; bounded turn cap (AGENT_MAX_TURNS, default 50). Legacy utils.filter_schema_for_gemini dropped (lives in the gemini adapter).
- … agents/base).
- … tests/unit/agents/api/.
Stacked draft PR, part of the in-place Stage 2/3 restructure (see docs/migration/pr-plan.md). Base is the fork branch shown above; it will be retargeted to gke-labs/main once Stage 1 (gke-labs#89–92) merges. PRs are intended to be reviewed and merged in stage order.
Status: peer-reviewed by 2 teammates + senior sign-off on the full integration branch; full suite green (ruff + 374 unit tests). Do NOT mark ready until its stage is up for merge.