feat(chaos): chaos injector with Trigger/Fault registries (Stage 2c) #7
Closed
pradeepvrd wants to merge 3 commits into
2ab9ff8 to 3819a8e
…BCs and registries (2c)
Modules moved/refactored:
- pkg/agents/chaos/chaos.py -> devops_bench/chaos/agent.py (ChaosAgent loop)
+ devops_bench/chaos/faults/generate_load.py (fault exec)
- new devops_bench/chaos/base.py (Fault/Trigger ABCs + FAULTS/TRIGGERS registries)
- new devops_bench/chaos/__init__.py + devops_bench/chaos/faults/__init__.py (light re-exports; no SDK imports)
- new tests/unit/chaos/test_chaos_agent.py + test_chaos_generate_load.py (legacy chaos_test.py ported to pytest)
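The Fault/Trigger ABCs and name-keyed registries in base.py can be pictured with a minimal sketch. Only Fault, Trigger, FAULTS, and TRIGGERS come from this PR; the inject/should_fire methods, the register_fault/register_trigger decorators, and the GenerateLoad/AfterDelay example classes are hypothetical, chosen to show the pattern rather than mirror the real base.py.

```python
from abc import ABC, abstractmethod
from typing import Dict, Type, TypeVar


class Fault(ABC):
    """Something the chaos agent can inject (method name is hypothetical)."""

    name: str = ""

    @abstractmethod
    def inject(self, command: str) -> str:
        """Run the fault and return a text result the model can read."""


class Trigger(ABC):
    """Decides when a fault should fire (method name is hypothetical)."""

    name: str = ""

    @abstractmethod
    def should_fire(self, elapsed_s: float) -> bool:
        """Return True once the fault should be injected."""


FAULTS: Dict[str, Type[Fault]] = {}
TRIGGERS: Dict[str, Type[Trigger]] = {}

C = TypeVar("C")


def _register(registry: Dict[str, C], cls: C) -> C:
    """Add a class to a registry under its `name`, rejecting blanks and duplicates."""
    name = getattr(cls, "name", "")
    if not name:
        raise ValueError(f"{cls!r} must set a non-empty name")
    if name in registry:
        raise ValueError(f"duplicate registry name: {name!r}")
    registry[name] = cls
    return cls


def register_fault(cls: Type[Fault]) -> Type[Fault]:
    return _register(FAULTS, cls)


def register_trigger(cls: Type[Trigger]) -> Type[Trigger]:
    return _register(TRIGGERS, cls)


@register_fault
class GenerateLoad(Fault):
    name = "generate_load"

    def inject(self, command: str) -> str:
        # Placeholder: a real fault would execute the command.
        return f"would run: {command}"


@register_trigger
class AfterDelay(Trigger):
    name = "after_delay"

    def __init__(self, delay_s: float = 30.0) -> None:
        self.delay_s = delay_s

    def should_fire(self, elapsed_s: float) -> bool:
        return elapsed_s >= self.delay_s
```

With this shape, dispatch is a dictionary lookup such as FAULTS["generate_load"]() instead of branching on an action's "type" string, and adding a fault means adding one decorated class.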
Bugs fixed vs legacy:
- none (pure structural move; behavioral fixes land in the following fix(chaos) commit)
Improvements vs legacy:
- split the monolithic ChaosAgent into an orchestration layer (agent.py) and a registered fault (faults/generate_load.py), so faults are pluggable
- added Fault/Trigger ABCs and the FAULTS/TRIGGERS registries (base.py) per the component design, replacing ad-hoc dispatch on action "type"
- made the LLM loop model-agnostic: drive it through the neutral devops_bench.models LLMClient interface (get_model + format_tools/generate_content/extract_function_calls/get_text_content) instead of the hardcoded google.genai chat client, with provider/model from CHAOS_PROVIDER/CHAOS_MODEL falling back to AGENT_PROVIDER/AGENT_MODEL
- preserved the chaos_active_event signaling so the harness can detect an active load spike
- exposed command execution as a single run_command tool and bounded the loop with a turn cap
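The provider/model fallback and the turn-capped loop described above can be sketched without any provider SDK. This is a minimal sketch under assumptions: resolve_model_config, Turn, and run_turn_capped are hypothetical names, generate is a stand-in for the LLMClient calls (format_tools/generate_content/extract_function_calls/get_text_content) whose exact signatures are not shown in this PR, and an empty CHAOS_* variable is assumed to fall back to AGENT_*.

```python
import os
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Mapping, Optional, Tuple


def resolve_model_config(
    env: Mapping[str, str] = os.environ,
) -> Tuple[Optional[str], Optional[str]]:
    """Return (provider, model): CHAOS_* wins, else AGENT_*, else None."""
    # `or` treats an empty string as unset (an assumption about the real behavior).
    provider = env.get("CHAOS_PROVIDER") or env.get("AGENT_PROVIDER")
    model = env.get("CHAOS_MODEL") or env.get("AGENT_MODEL")
    return provider, model


@dataclass
class Turn:
    """One model response: free text plus zero or more tool calls (hypothetical shape)."""

    text: str = ""
    calls: List[Dict[str, Any]] = field(default_factory=list)


def run_turn_capped(
    generate: Callable[[List[Any]], Turn],
    execute_tool: Callable[[Dict[str, Any]], str],
    goal: str,
    max_turns: int = 8,
) -> str:
    """Alternate model turns and tool calls until the model stops or the cap is hit."""
    history: List[Any] = [goal]
    final_text = ""
    for _ in range(max_turns):
        turn = generate(history)
        if turn.text:
            final_text = turn.text  # keep the latest text, even when tool calls accompany it
        if not turn.calls:
            break  # no tool calls: the model considers itself done
        for call in turn.calls:
            history.append({"call": call, "result": execute_tool(call)})
    return final_text
```

The cap is what bounds a model that keeps requesting run_command forever: after max_turns the loop returns whatever text it last saw.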
…s, and event ordering
Modules moved/refactored:
- see base move commit (devops_bench/chaos/agent.py, devops_bench/chaos/faults/generate_load.py)
Bugs fixed vs legacy:
- ChaosAgent._run_async dropped the model's final text when a tool call landed on the last turn (or at the turn cap), because final_text was only assigned when there were no function calls. final_text is now set on every turn, so an accompanying summary is never lost.
- _execute_tool raised AttributeError when the model returned non-dict tool args (str/list/None), because args.get(...) was called unconditionally. It now guards with isinstance(args, dict) and returns "Error: tool args must be an object"; the caller passes the raw args so the guard actually fires.
- run_chaos_command raised IndexError on an empty command string (shlex.split -> [] -> run([])). It now short-circuits with "Error: command string is empty" before parsing.
- run_chaos_command set chaos_active_event BEFORE parsing, so a command that failed shlex.split still told the harness "load active". The event is now signaled only after a successful parse, immediately before execution.
Improvements vs legacy:
- none (behavioral bug fixes only; further improvements land in the following feat(chaos) commit)
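The tool-path fixes in this commit can be illustrated with a small standalone sketch. run_chaos_command, chaos_active_event, and the two quoted error strings come from the commit message; the parse-error message, the timeout, the non-string command check, returning stdout+stderr, and making execute_tool a free function rather than the ChaosAgent method are all assumptions.

```python
import shlex
import subprocess
import threading
from typing import Any, Callable

# Set when a load command is about to run, so a harness can notice the spike.
chaos_active_event = threading.Event()


def run_chaos_command(command: str, timeout_s: float = 300.0) -> str:
    # Fix: an empty (or whitespace-only) string makes shlex.split return [],
    # and subprocess.run([]) raises IndexError.
    if not command.strip():
        return "Error: command string is empty"
    try:
        argv = shlex.split(command)
    except ValueError as exc:  # e.g. "No closing quotation"
        return f"Error: could not parse command: {exc}"
    # Fix: signal only after a successful parse, immediately before execution.
    chaos_active_event.set()
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout_s)
    except (OSError, subprocess.TimeoutExpired) as exc:
        return f"Error: {exc}"
    return proc.stdout + proc.stderr


def execute_tool(args: Any, handler: Callable[[str], str] = run_chaos_command) -> str:
    # Fix: models sometimes emit str/list/None args; args.get would raise AttributeError.
    if not isinstance(args, dict):
        return "Error: tool args must be an object"
    command = args.get("command", "")
    if not isinstance(command, str):
        return "Error: command must be a string"
    return handler(command)
```

Because every failure path returns an "Error: ..." string instead of raising, the model sees the problem as a tool result and can retry, and the harness only observes chaos_active_event for commands that actually reach execution.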
…ndency injection
Modules moved/refactored:
- see base move commit (devops_bench/chaos/agent.py, devops_bench/chaos/faults/generate_load.py)
Bugs fixed vs legacy:
- none (fixes landed in the preceding fix(chaos) commit)
Improvements vs legacy:
- expand a leading ~ in each command token (os.path.expanduser) so model-emitted paths like ~/go/bin/fortio resolve under the shell-free argv executor instead of failing in execvp; document in the run_command prompt and docstring that only single, non-piped commands are supported (no pipes, redirection, or $VAR).
- drive the fortio target URL from the spec: read target.service_url (rewritten by the harness to the local port-forward) via target_url_from_spec(), with a single _DEFAULT_TARGET_URL fallback, and inject it into both the goal and the system instruction (build_system_instruction(target_url)), removing the hardcoded http://localhost:8080 from SYSTEM_INSTRUCTION and goal().
- ChaosAgent.__init__ now accepts optional system_instruction and tools (defaulting to the module constants), which are used throughout the loop, so the agent is reusable for other faults.
- decouple the orchestrator from the concrete fault: drop the top-level import of run_chaos_command and inject a tool_handler callable into the constructor (lazily defaulting to run_chaos_command); _execute_tool dispatches via self._tool_handler.
3819a8e to 4a10e71
This was referenced Jun 20, 2026
(Owner, Author)
Superseded by the reconciled cross-cutting refactor (see docs/refactor/e2e-refactor-sequencing-plan.md). Reworked into the layered devops_bench/ package on branch refactor/integration and replaced by the reworked component PRs and capstone #23. Closing as superseded.
Splits the legacy chaos module into devops_bench/chaos/ (← pkg/agents/chaos/chaos.py):
- base.py (Trigger/Fault ABCs + FAULTS/TRIGGERS registries), agent.py (ChaosAgent loop), faults/generate_load.py.
- The LLM loop uses devops_bench.models (get_model/LLMClient tool-calling); no provider SDK.
- chaos_active_event signaling preserved.
- Tests ported to tests/unit/chaos/.

Stacked draft PR, part of the in-place Stage 2/3 restructure (see docs/migration/pr-plan.md). Base is the fork branch shown above; it will be retargeted to gke-labs/main once Stage 1 (gke-labs#89–92) merges. PRs are intended to be reviewed and merged in stage order.

Status: peer-reviewed by 2 teammates + senior sign-off on the full integration branch; full suite green (ruff + 374 unit tests). Do NOT mark ready until its stage is up for merge.