feat(chaos): typed ChaosResult/Fault/Trigger; loop reuse; authoring contract (Phase 3)#15
Merged
Merged
Conversation
…ontract Phase 3 chaos refactor (rework PR #7). Brings the chaos package onto the shared model-agnostic loop and the same discriminated-union typed-node idiom verification uses, so chaos and agents become true Layer-2 siblings. Changes - New ``ChaosResult`` pydantic model (success / injected_fault / output / elapsed_time / error) replacing the legacy ``{status, output}`` dict. - ``Fault`` and ``Trigger`` reshaped as ``type``-tagged pydantic models in ``chaos/base.py``; FAULTS / TRIGGERS registries unchanged. - ``chaos/spec.py`` adds the ``ChaosSpec`` discriminated union (``ChaosAction`` / ``ChaosTrigger``). ``verify`` is a plain string key — chaos never imports a verification node. The legacy ``verification`` field name is accepted as an alias so the real ``optimize-scale/task.yaml`` parses unchanged ahead of the Phase-B task-file migration. - ``ChaosAgent`` reworked to drive ``models.loop.run_tool_loop``; fortio specifics (system prompt builder, RUN_COMMAND_TOOL, run_chaos_command, load target) move into ``faults/generate_load.py`` along with the typed ``GenerateLoadFault`` / ``LoadTarget`` nodes. The agent is now fault-agnostic — system instruction, tool descriptor, and command handler are injected by the concrete fault. - New ``triggers/time_delay.py`` houses the ``TimeTrigger`` previously inlined in ``harness/scenario.py``. - Slim ``chaos/__init__.py`` exports only ``Fault``, ``Trigger``, ``ChaosResult``, ``FAULTS``, ``TRIGGERS``, ``ChaosSpec``; no eager import of ``ChaosAgent``. - New ``chaos/schema.py`` emits JSON Schema and validates a single chaos entry (the Phase-A authoring contract). Tests (35 new under ``tests/unit/chaos/``) - Spec discrimination: minimal / named / verify-reference / legacy alias / bare-list rejection / extra-forbid / unknown discriminator / registry membership. - ``ChaosResult`` shape + round-trip; ``Fault`` / ``Trigger`` abstractness. - ``ChaosAgent`` against a fake ``LLMClient`` (no SDK / no network): no-tool turn, tool dispatch + message-shape pinning, non-dict args, unknown tool, final-text retention across the turn cap. - ``GenerateLoadFault.inject`` with mocked subprocess + stubbed agent: success path, exception → ``success=False`` with error, event threaded through. - ``TimeTrigger`` zero vs positive delay (``time.sleep`` patched). - Regression: the REAL ``complextasks/optimize-scale/task.yaml`` chaos entry parses through ``ChaosSpec`` and discriminates to ``GenerateLoadFault`` / ``TimeTrigger`` with the documented ``target.service_url`` / ``qps`` / ``delay_seconds``. - Import hygiene: ``import devops_bench.chaos`` pulls no provider SDK, ``mcp``, ``deepeval``, or ``ollama``; ``ChaosAgent`` is not exported. Bar - ``uv run ruff check devops_bench/ tests/unit/chaos/`` clean. - Full ``uv run pytest tests/ -q`` green (456 passed; +35 chaos tests on top of the 421-test baseline).
…; align __init__ docstring
CONVENTIONS §9: dummy ``@FAULTS.register("dummy")`` / ``@TRIGGERS.register("dummy")``
resolve via ``REGISTRY.get(key)`` with no edit to ``chaos/spec.py``. Comment
calls out Phase A's split: registry drives harness resolution; ``ChaosSpec``
parsing still requires a Union edit until Phase 4's registry-driven swap.
Also: pin ``args={}`` (missing ``command`` key) dispatch returning an
``"Error: ..."`` string (no crash); align ``chaos/__init__.py`` docstring with
the import-hygiene test it claims to satisfy — concretes load with the
package (required for the discriminated union), but provider SDKs / fortio /
``deepeval`` / ``mcp`` stay strictly lazy.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Phase 3 chaos refactor (rework of fork PR #7). Brings chaos onto the shared
models.loop.run_tool_loopand the same discriminated-union typed-node idiomverification uses, so chaos and agents become true Layer-2 siblings.
ChaosResult(pydantic;success/injected_fault/
output/elapsed_time/error) replaces the legacy{status, output}dict.
Fault/Triggeraretype-tagged pydantic models discriminated inChaosSpec.verifyis an opaque string key — chaos never importsverification.ChaosAgentdrivesrun_tool_loop(no duplicatedturn loop). Fortio specifics (system prompt,
RUN_COMMAND_TOOL, the argvexecutor, the typed
GenerateLoadFault/LoadTarget) move intofaults/generate_load.py. Newtriggers/time_delay.pyhousesTimeTrigger.chaos/__init__.pyexports onlyFault,Trigger,ChaosResult,FAULTS,TRIGGERS,ChaosSpec.ChaosAgentis notexported;
import devops_bench.chaospulls no provider SDK.chaos/schema.pyemits JSON Schema + validates asingle chaos entry. The regression test parses the REAL
complextasks/optimize-scale/task.yamlchaos block throughChaosSpecandasserts it discriminates to
GenerateLoadFault/TimeTriggerwith thedocumented
target.service_url/qps=300/delay_seconds=5. The legacyverificationfield name is accepted as an alias ofverifyso the fileparses unchanged ahead of Phase B.
What reviewers should scrutinize
ChaosSpecalias decision.verifyacceptsverificationas avalidation_alias. Worth confirming this is the right Phase-A bridgeversus forcing the task-file migration up front.
generate_load.pyimportsChaosAgentat module top per handoff §5.3 ("no lazy imports; one-way
generate_load→
agent→models").chaos/__init__.pydoesn't directly importChaosAgent, but it does load the agent module transitively viaChaosSpec→GenerateLoadFault. This mirrorsverification/__init__pulling
verifiers/*transitively. Both stay SDK-free at import time.scenario.pydrivingaction.inject/trigger.wait/ chaos report wiring) is not in thisPR — that is the harness step per the sequencing plan §4.6.
Test plan
uv run ruff check devops_bench/ tests/unit/chaos/clean.uv run pytest tests/ -qgreen: 456 passed (35 new chaos tests on topof the 421-test Wave-1 baseline).
tests/unit/chaos/test_package_import.pyconfirms importing the packagepulls no
deepeval/mcp/anthropic/google.genai/openai/ollama, and thatChaosAgentis not exported.tests/unit/chaos/test_optimize_scale_regression.pyparses the realcomplextasks/optimize-scale/task.yamlchaos entry.ChaosAgentexercised via a fakeLLMClient(no SDK / no network) forno-tool turns, tool dispatch + message-shape, malformed args, unknown
tool, and final-text retention across the turn cap.