feat: add property-based fuzzing harness, trace minimization, and CLI (#220, #221, #222, #217) #223

Merged: dgenio merged 3 commits into main from claude/github-issues-triage-YrEz0 on May 29, 2026

@dgenio (Owner) commented May 29, 2026

Add an active failure-discovery layer that complements ChainWeaver's existing
replay/diff/attest reproduction primitives.

  • chainweaver/fuzz.py (Add property-based fuzzing harness for ChainWeaver flows #220): FlowFuzzer generates/mutates inputs from a flow's
    input_schema (or a base input), optionally injects malformed tool outputs via
    a seeded fault hook, executes the flow, and records FlowProperty violations as
    replayable ExecutionResult traces. All randomness is seeded so runs are
    reproducible; the executor stays randomness-free. Ships FlowProperty,
    FaultConfig, FuzzCase/FuzzFailure/FuzzReport, BUILTIN_PROPERTIES, and
    FuzzConfigError.
  • minimize_failure (Minimize failing execution traces into smallest reproducible failure #221): delta-debugs a failing input to the smallest
    reproducer that still violates a property, re-verifying each reduction.
  • chainweaver fuzz CLI (Add chainweaver fuzz command for property-based flow testing #222): --property (builtin or module:attr), --runs,
    --seed, --input/--input-file, --output-fault-prob, --minimize, --save-failures
    (redacted by default), --format. Non-zero exit on violation. Documented in
    docs/cli.md.
  • RedactionPolicy.redact_step_record / redact_execution_result (Add RedactionPolicy.redact_step_record / redact_execution_result helpers #217): return
    redacted copies of StepRecord / ExecutionResult, used by the save-failures
    path. Implemented with Pydantic model_copy (the models are BaseModel with
    string-only error fields, not dataclasses carrying live exceptions).
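
The seeded generate/execute/check loop described above can be illustrated with a minimal, self-contained sketch. It does not use ChainWeaver's API: `Property`, `Failure`, `generate_input`, `toy_flow`, and `fuzz` are hypothetical stand-ins, and the toy schema and the fault model (replacing an output with `None`) are assumptions made only for this example.

```python
import random
from dataclasses import dataclass
from typing import Any, Callable


# Hypothetical stand-ins for illustration; not ChainWeaver's real classes.
@dataclass(frozen=True)
class Property:
    name: str
    holds: Callable[[dict, Any], bool]  # True means the property is satisfied


@dataclass(frozen=True)
class Failure:
    case_index: int
    property_name: str
    inputs: dict


def generate_input(rng: random.Random) -> dict:
    """Draw one input from an assumed toy schema: {"n": int}."""
    return {"n": rng.randint(-5, 5)}


def toy_flow(inputs: dict) -> dict:
    """Deterministic flow under test, with a deliberate bug for negative n."""
    n = inputs["n"]
    return {"doubled": n * 2 if n >= 0 else None}


def fuzz(flow, properties, runs: int, seed: int, fault_prob: float = 0.0):
    """Run `runs` generated cases; all randomness comes from one seeded RNG."""
    rng = random.Random(seed)
    failures = []
    for i in range(runs):
        inputs = generate_input(rng)
        output = flow(inputs)  # the flow itself uses no randomness
        if rng.random() < fault_prob:
            output = None  # assumed fault model: a malformed (missing) output
        for prop in properties:
            if not prop.holds(inputs, output):
                failures.append(Failure(i, prop.name, inputs))
    return failures


doubled_is_int = Property(
    "doubled_is_int",
    lambda inputs, out: isinstance(out, dict) and isinstance(out.get("doubled"), int),
)
```

Because the generator and the fault decision both draw from the same seeded `random.Random`, two calls with the same seed produce the same list of failures, which is what makes a recorded failure replayable.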

Tests cover detection, seed determinism, fault injection, minimization,
redaction round-trips, and the full CLI exit-code contract. All four validation
commands pass; project coverage 91%.
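
The minimization step mentioned above can be sketched as a simplified delta-debugging loop. This is not the `minimize_failure` implementation from the PR: it assumes the failing input can be modeled as a list, and `minimize` and `still_fails` are hypothetical names.

```python
from typing import Callable


def minimize(items: list, still_fails: Callable[[list], bool]) -> list:
    """Simplified delta debugging: drop chunks while the failure reproduces."""
    if not still_fails(items):
        raise ValueError("the starting input must already fail")
    chunk = max(len(items) // 2, 1)
    while True:
        reduced = False
        i = 0
        while i < len(items):
            candidate = items[:i] + items[i + chunk:]
            if still_fails(candidate):  # re-verify every reduction
                items = candidate
                reduced = True
            else:
                i += chunk  # this chunk is needed; move past it
        if chunk > 1:
            chunk //= 2  # retry with finer-grained removals
        elif not reduced:
            return items  # no single element can be removed any more


# A failure that needs both 3 and 7 to be present shrinks to exactly those two.
smallest = minimize(list(range(10)), lambda xs: 3 in xs and 7 in xs)
```

The loop only accepts a smaller candidate after the predicate confirms it still fails, so the returned input is guaranteed to reproduce the violation.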

https://claude.ai/code/session_01KQy8CtFEATMZ8rZmkepQ1Y

Copilot AI review requested due to automatic review settings May 29, 2026 07:38

@github-actions github-actions Bot left a comment

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'ChainWeaver microbenchmarks'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.25.

| Benchmark suite | Current: 98a777c | Previous: 6fd70ff | Ratio |
| --- | --- | --- | --- |
| compiled_overhead_ms_n5_llm200_tool0 | 0.16273200003524835 ms | 0.12229499998284155 ms | 1.33 |
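
The reported ratio can be checked by hand. The sketch below assumes the ratio is the current timing divided by the previous one (lower is better), which is consistent with the figures in the alert; the variable names are illustrative.

```python
# Timings copied from the alert above, in milliseconds.
current_ms = 0.16273200003524835
previous_ms = 0.12229499998284155
threshold = 1.25

ratio = current_ms / previous_ms      # assumed definition: current / previous
exceeds_threshold = ratio > threshold

print(f"ratio={ratio:.2f}, exceeds threshold: {exceeds_threshold}")
```

The absolute difference is roughly 0.04 ms on a microbenchmark, so the alert flags a relative change rather than a large absolute slowdown.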

This comment was automatically generated by workflow using github-action-benchmark.

CC: @dgenio

Copilot AI left a comment

Pull request overview

Adds an active failure-discovery layer to ChainWeaver by introducing a deterministic, property-based fuzzing harness for flows, optional tool-output fault injection, failure minimization, and a chainweaver fuzz CLI workflow that can save replayable traces (with redaction support).

Changes:

  • Adds chainweaver.fuzz with FlowFuzzer, FlowProperty, FaultConfig, result models, and minimize_failure(...) for deterministic shrinking.
  • Adds chainweaver fuzz CLI command with --property, --runs/--seed, optional fault injection, optional minimization, and optional saving of (redacted) failing traces.
  • Extends RedactionPolicy with trace-level helpers (redact_step_record / redact_execution_result) and adds tests + docs + public API snapshot updates for the new surface.
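
The idea of returning a redacted copy instead of mutating a trace can be sketched with plain dictionaries. This is not how `RedactionPolicy` is implemented (the PR description says it uses Pydantic `model_copy`); the key-based rule, the `REDACTED` marker, and the `redact` function are assumptions for illustration only.

```python
REDACTED = "[REDACTED]"  # hypothetical placeholder value


def redact(value, sensitive_keys: frozenset):
    """Return a redacted deep copy; the original structure is left untouched."""
    if isinstance(value, dict):
        return {
            key: REDACTED if key in sensitive_keys else redact(item, sensitive_keys)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact(item, sensitive_keys) for item in value]
    return value  # scalars are returned as-is


trace = {"inputs": {"api_key": "s3cr3t", "query": "hello"}, "steps": [{"token": "abc"}]}
safe_trace = redact(trace, frozenset({"api_key", "token"}))
```

Keeping the original intact matters for the fuzzing workflow: the unredacted trace is still needed to replay the failure locally, while only the redacted copy is written to disk or printed.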

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 10 comments.

| File | Description |
| --- | --- |
| `tests/test_redaction.py` | Adds coverage for new trace-redaction helpers on RedactionPolicy. |
| `tests/test_fuzz.py` | Adds unit tests for fuzzing determinism, violations, fault injection, and minimization. |
| `tests/test_cli_fuzz.py` | Adds CLI contract tests for chainweaver fuzz (exit codes, determinism, saving/minimizing, redaction). |
| `tests/fixtures/public_api.json` | Updates public API snapshot for newly exported fuzzing symbols. |
| `README.md` | Documents FuzzConfigError and adds an example chainweaver fuzz invocation. |
| `docs/cli.md` | Documents the new fuzz command flags, JSON output shape, and exit codes. |
| `CHANGELOG.md` | Adds unreleased changelog entries for fuzzing, minimization, CLI command, and redaction helpers. |
| `chainweaver/log_utils.py` | Adds RedactionPolicy.redact_step_record and redact_execution_result. |
| `chainweaver/fuzz.py` | Introduces the property-based fuzzing harness + minimizer implementation. |
| `chainweaver/cli.py` | Implements the chainweaver fuzz command and property resolution logic. |
| `chainweaver/__init__.py` | Exports fuzzing API via `__all__`. |

Comment threads: chainweaver/fuzz.py (4, all outdated); chainweaver/cli.py (6).
claude added 2 commits May 29, 2026 20:24
…onfig, redact CLI output

- Promote schema-driven value generator to supported public API
  (chainweaver.attest.generate_value / UnsupportedAnnotation); fuzz.py no
  longer imports private underscore helpers.
- Preserve full executor configuration under fault injection via new
  FlowExecutor.with_replaced_tools(...), so behavior no longer diverges
  when faults are enabled (middleware, caches, cost profile, redaction
  policy, decision callback are kept).
- Redact failing/minimized inputs emitted to stdout when --redact (the
  default) is set, not just saved traces, to avoid leaking secrets in CI.
- Sanitize flow/property names before building saved-failure filenames so
  ':' (callable specs) and path separators are cross-platform safe.
- Reject duplicate --property names up front (exit 1) instead of silently
  collapsing them in props_by_name.
- Add tests for all five fixes; update docs/cli.md and CHANGELOG.
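
The filename fix above can be sketched as follows. The character whitelist, the `unnamed` fallback, and the filename layout are assumptions for this example, not the rules used in `chainweaver/cli.py`; `safe_name` and `failure_filename` are hypothetical helpers.

```python
import re


def safe_name(name: str) -> str:
    """Replace anything outside a conservative whitelist with '_'."""
    cleaned = re.sub(r"[^A-Za-z0-9._-]+", "_", name)
    # Strip leading/trailing dots and underscores so '..' cannot survive.
    return cleaned.strip("._") or "unnamed"


def failure_filename(flow: str, prop: str, index: int) -> str:
    """Build a filename that is valid on Windows, macOS, and Linux."""
    return f"{safe_name(flow)}__{safe_name(prop)}__{index:04d}.json"


print(failure_filename("flow/a", "pkg.mod:check", 3))
```

A whitelist is used here instead of a blacklist because the set of characters that are unsafe in filenames differs between platforms, while letters, digits, `.`, `_`, and `-` are accepted everywhere.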

https://claude.ai/code/session_01YCD9Z5YVhyJSbC2TTG8Jaz

Audit follow-up: the executor-arg docstring still described the
fault-injection path as building a 'fresh executor (sharing this one's
registry)' and no longer matched behavior after the per-case executor
began preserving the full configuration via with_replaced_tools. Update
it to describe config preservation and note that a configured step_cache
can return cached outputs and mask per-case faults for repeated inputs.
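
The cache interaction noted in this commit message can be reproduced with a toy model. `CachedStep` is hypothetical and much simpler than a real step cache; it only shows why a cache hit can hide a fault that would otherwise be injected for a repeated input.

```python
import random


class CachedStep:
    """Toy step with an input-keyed cache and a seeded fault hook."""

    def __init__(self, tool):
        self.tool = tool
        self.cache = {}

    def run(self, key, fault_rng: random.Random, fault_prob: float):
        if key in self.cache:
            # Cache hit: the tool and the fault hook are both skipped.
            return self.cache[key]
        output = self.tool(key)
        if fault_rng.random() < fault_prob:
            output = None  # injected malformed output
        self.cache[key] = output
        return output


rng = random.Random(0)
step = CachedStep(str.upper)
first = step.run("a", rng, fault_prob=0.0)   # clean output gets cached
second = step.run("a", rng, fault_prob=1.0)  # fault requested, but masked by the cache
third = step.run("b", rng, fault_prob=1.0)   # new input, so the fault is applied
```

In this model the second call returns the cached clean value even though a fault was requested with probability 1, which is the masking effect the updated docstring warns about.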

https://claude.ai/code/session_01YCD9Z5YVhyJSbC2TTG8Jaz
@dgenio dgenio merged commit 3e3ac00 into main May 29, 2026
15 checks passed
@dgenio dgenio deleted the claude/github-issues-triage-YrEz0 branch May 29, 2026 21:06