Skip to content

feat(chaos): typed ChaosResult/Fault/Trigger; loop reuse; authoring contract (Phase 3)#15

Merged
pradeepvrd merged 2 commits into
refactor/integrationfrom
refactor/chaos
Jun 20, 2026
Merged

feat(chaos): typed ChaosResult/Fault/Trigger; loop reuse; authoring contract (Phase 3)#15
pradeepvrd merged 2 commits into
refactor/integrationfrom
refactor/chaos

Conversation

@pradeepvrd

Copy link
Copy Markdown
Owner

Summary

Phase 3 chaos refactor (rework of fork PR #7). Brings chaos onto the shared
models.loop.run_tool_loop and the same discriminated-union typed-node idiom
verification uses, so chaos and agents become true Layer-2 siblings.

  • Typed end to end. New ChaosResult (pydantic; success / injected_fault
    / output / elapsed_time / error) replaces the legacy {status, output}
    dict. Fault / Trigger are type-tagged pydantic models discriminated in
    ChaosSpec. verify is an opaque string key — chaos never imports
    verification.
  • One canonical loop. ChaosAgent drives run_tool_loop (no duplicated
    turn loop). Fortio specifics (system prompt, RUN_COMMAND_TOOL, the argv
    executor, the typed GenerateLoadFault / LoadTarget) move into
    faults/generate_load.py. New triggers/time_delay.py houses TimeTrigger.
  • Slim package surface. chaos/__init__.py exports only Fault, Trigger,
    ChaosResult, FAULTS, TRIGGERS, ChaosSpec. ChaosAgent is not
    exported; import devops_bench.chaos pulls no provider SDK.
  • Authoring contract. chaos/schema.py emits JSON Schema + validates a
    single chaos entry. The regression test parses the REAL
    complextasks/optimize-scale/task.yaml chaos block through ChaosSpec and
    asserts it discriminates to GenerateLoadFault / TimeTrigger with the
    documented target.service_url / qps=300 / delay_seconds=5. The legacy
    verification field name is accepted as an alias of verify so the file
    parses unchanged ahead of Phase B.

What reviewers should scrutinize

  • ChaosSpec alias decision. verify accepts verification as a
    validation_alias. Worth confirming this is the right Phase-A bridge
    versus forcing the task-file migration up front.
  • Agent / fault import direction. generate_load.py imports ChaosAgent
    at module top per handoff §5.3 ("no lazy imports; one-way generate_load
    agentmodels"). chaos/__init__.py doesn't directly import
    ChaosAgent, but it does load the agent module transitively via
    ChaosSpecGenerateLoadFault. This mirrors verification/__init__
    pulling verifiers/* transitively. Both stay SDK-free at import time.
  • Phase scope. Phase B (task-file YAML migration + scenario.py driving
    action.inject / trigger.wait / chaos report wiring) is not in this
    PR — that is the harness step per the sequencing plan §4.6.

Test plan

  • uv run ruff check devops_bench/ tests/unit/chaos/ clean.
  • uv run pytest tests/ -q green: 456 passed (35 new chaos tests on top
    of the 421-test Wave-1 baseline).
  • tests/unit/chaos/test_package_import.py confirms importing the package
    pulls no deepeval / mcp / anthropic / google.genai / openai /
    ollama, and that ChaosAgent is not exported.
  • tests/unit/chaos/test_optimize_scale_regression.py parses the real
    complextasks/optimize-scale/task.yaml chaos entry.
  • ChaosAgent exercised via a fake LLMClient (no SDK / no network) for
    no-tool turns, tool dispatch + message-shape, malformed args, unknown
    tool, and final-text retention across the turn cap.

…ontract

Phase 3 chaos refactor (rework PR #7). Brings the chaos package onto the
shared model-agnostic loop and the same discriminated-union typed-node idiom
verification uses, so chaos and agents become true Layer-2 siblings.

Changes
- New ``ChaosResult`` pydantic model (success / injected_fault / output /
  elapsed_time / error) replacing the legacy ``{status, output}`` dict.
- ``Fault`` and ``Trigger`` reshaped as ``type``-tagged pydantic models in
  ``chaos/base.py``; FAULTS / TRIGGERS registries unchanged.
- ``chaos/spec.py`` adds the ``ChaosSpec`` discriminated union
  (``ChaosAction`` / ``ChaosTrigger``). ``verify`` is a plain string key — chaos
  never imports a verification node. The legacy ``verification`` field name is
  accepted as an alias so the real ``optimize-scale/task.yaml`` parses
  unchanged ahead of the Phase-B task-file migration.
- ``ChaosAgent`` reworked to drive ``models.loop.run_tool_loop``; fortio
  specifics (system prompt builder, RUN_COMMAND_TOOL, run_chaos_command, load
  target) move into ``faults/generate_load.py`` along with the typed
  ``GenerateLoadFault`` / ``LoadTarget`` nodes. The agent is now
  fault-agnostic — system instruction, tool descriptor, and command handler
  are injected by the concrete fault.
- New ``triggers/time_delay.py`` houses the ``TimeTrigger`` previously inlined
  in ``harness/scenario.py``.
- Slim ``chaos/__init__.py`` exports only ``Fault``, ``Trigger``,
  ``ChaosResult``, ``FAULTS``, ``TRIGGERS``, ``ChaosSpec``; no eager import of
  ``ChaosAgent``.
- New ``chaos/schema.py`` emits JSON Schema and validates a single chaos entry
  (the Phase-A authoring contract).

Tests (35 new under ``tests/unit/chaos/``)
- Spec discrimination: minimal / named / verify-reference / legacy alias /
  bare-list rejection / extra-forbid / unknown discriminator / registry
  membership.
- ``ChaosResult`` shape + round-trip; ``Fault`` / ``Trigger`` abstractness.
- ``ChaosAgent`` against a fake ``LLMClient`` (no SDK / no network): no-tool
  turn, tool dispatch + message-shape pinning, non-dict args, unknown tool,
  final-text retention across the turn cap.
- ``GenerateLoadFault.inject`` with mocked subprocess + stubbed agent: success
  path, exception → ``success=False`` with error, event threaded through.
- ``TimeTrigger`` zero vs positive delay (``time.sleep`` patched).
- Regression: the REAL ``complextasks/optimize-scale/task.yaml`` chaos entry
  parses through ``ChaosSpec`` and discriminates to ``GenerateLoadFault`` /
  ``TimeTrigger`` with the documented ``target.service_url`` / ``qps`` /
  ``delay_seconds``.
- Import hygiene: ``import devops_bench.chaos`` pulls no provider SDK,
  ``mcp``, ``deepeval``, or ``ollama``; ``ChaosAgent`` is not exported.

Bar
- ``uv run ruff check devops_bench/ tests/unit/chaos/`` clean.
- Full ``uv run pytest tests/ -q`` green (456 passed; +35 chaos tests on top
  of the 421-test baseline).
…; align __init__ docstring

CONVENTIONS §9: dummy ``@FAULTS.register("dummy")`` / ``@TRIGGERS.register("dummy")``
resolve via ``REGISTRY.get(key)`` with no edit to ``chaos/spec.py``. Comment
calls out Phase A's split: registry drives harness resolution; ``ChaosSpec``
parsing still requires a Union edit until Phase 4's registry-driven swap.

Also: pin ``args={}`` (missing ``command`` key) dispatch returning an
``"Error: ..."`` string (no crash); align ``chaos/__init__.py`` docstring with
the import-hygiene test it claims to satisfy — concretes load with the
package (required for the discriminated union), but provider SDKs / fortio /
``deepeval`` / ``mcp`` stay strictly lazy.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant