feat: add property-based fuzzing harness, trace minimization, and CLI (#220, #221, #222, #217) by dgenio · Pull Request #223 · dgenio/ChainWeaver

dgenio · 2026-05-29T07:38:50Z

Add an active failure-discovery layer that complements ChainWeaver's existing
replay/diff/attest reproduction primitives.

- chainweaver/fuzz.py (#220): FlowFuzzer generates/mutates inputs from a flow's input_schema (or a base input), optionally injects malformed tool outputs via a seeded fault hook, executes the flow, and records FlowProperty violations as replayable ExecutionResult traces. All randomness is seeded so runs are reproducible; the executor stays randomness-free. Ships FlowProperty, FaultConfig, FuzzCase/FuzzFailure/FuzzReport, BUILTIN_PROPERTIES, and FuzzConfigError.
- minimize_failure (#221): delta-debugs a failing input to the smallest reproducer that still violates a property, re-verifying each reduction.
- chainweaver fuzz CLI (#222): --property (builtin or module:attr), --runs, --seed, --input/--input-file, --output-fault-prob, --minimize, --save-failures (redacted by default), --format. Non-zero exit on violation. Documented in docs/cli.md.
- RedactionPolicy.redact_step_record / redact_execution_result (#217): return redacted copies of StepRecord / ExecutionResult, used by the save-failures path. Implemented with Pydantic model_copy (the models are BaseModel with string-only error fields, not dataclasses carrying live exceptions).
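
The seeded-fuzzing idea in the first bullet can be sketched without ChainWeaver itself. The toy model below is an assumption, not FlowFuzzer's API: a flow is a plain callable that receives its input dict and a tool, the schema maps field names to int or str, and the fault hook swaps a tool's output for a malformed dict. All names (generate_input, with_faults, run_case, fuzz, add_one, toy_flow) are hypothetical. Every random draw, including the fault decision, comes from one random.Random(seed), which is what makes a run reproducible from its seed.

```python
import random
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical toy model (not ChainWeaver's API): a flow receives its input
# dict plus a tool callable and returns an output dict.
Tool = Callable[[Dict[str, Any]], Dict[str, Any]]
Flow = Callable[[Dict[str, Any], Tool], Dict[str, Any]]
Trace = Dict[str, Any]


def generate_input(schema: Dict[str, type], rng: random.Random) -> Dict[str, Any]:
    """Draw one input from a toy schema mapping field names to int or str."""
    values: Dict[str, Any] = {}
    for name, typ in schema.items():
        if typ is int:
            values[name] = rng.randint(-1000, 1000)
        else:
            length = rng.randint(0, 6)
            values[name] = "".join(rng.choice("ab<>'\" ") for _ in range(length))
    return values


def with_faults(tool: Tool, rng: random.Random, prob: float) -> Tool:
    """Wrap a tool so that, with probability `prob`, it returns a malformed output."""
    def wrapped(payload: Dict[str, Any]) -> Dict[str, Any]:
        if rng.random() < prob:
            return {"unexpected": "shape"}
        return tool(payload)
    return wrapped


def run_case(flow: Flow, inputs: Dict[str, Any], tool: Tool) -> Trace:
    """Execute one case; errors are kept as strings so the trace stays serializable."""
    try:
        return {"inputs": inputs, "ok": True, "output": flow(inputs, tool)}
    except Exception as exc:
        return {"inputs": inputs, "ok": False, "error": f"{type(exc).__name__}: {exc}"}


def fuzz(flow: Flow, tool: Tool, schema: Dict[str, type],
         properties: Dict[str, Callable[[Trace], bool]],
         runs: int, seed: int, fault_prob: float = 0.0) -> List[Tuple[str, Trace]]:
    """Run `runs` seeded cases and return (property name, trace) for each violation."""
    rng = random.Random(seed)  # the only source of randomness in the harness
    failures: List[Tuple[str, Trace]] = []
    for _ in range(runs):
        inputs = generate_input(schema, rng)
        trace = run_case(flow, inputs, with_faults(tool, rng, fault_prob))
        for name, holds in properties.items():
            if not holds(trace):
                failures.append((name, trace))
    return failures


# Example tool and flow used with the sketch.
def add_one(payload: Dict[str, Any]) -> Dict[str, Any]:
    return {"value": payload["x"] + 1}


def toy_flow(inputs: Dict[str, Any], tool: Tool) -> Dict[str, Any]:
    return {"result": tool({"x": inputs["n"]})["value"]}
```

Calling fuzz twice with the same seed yields identical failures. Errors are captured as strings, in line with the note above that the trace models carry string-only error fields rather than live exceptions.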

Tests cover detection, seed determinism, fault injection, minimization,
redaction round-trips, and the full CLI exit-code contract. All four validation
commands pass; project coverage 91%.

https://claude.ai/code/session_01KQy8CtFEATMZ8rZmkepQ1Y

github-actions

⚠️ Performance Alert ⚠️

A possible performance regression was detected for benchmark 'ChainWeaver microbenchmarks'.
The benchmark result for this commit is worse than the previous result, exceeding the 1.25 ratio threshold.

| Benchmark suite | Current: `98a777c` | Previous: `6fd70ff` | Ratio |
|---|---|---|---|
| `compiled_overhead_ms_n5_llm200_tool0` | `0.16273200003524835` ms | `0.12229499998284155` ms | `1.33` |
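
The Ratio column is the current time divided by the previous time, so for a timing metric a value above 1 means slower, and the alert fires because that ratio exceeds 1.25. This snippet only re-derives the numbers shown in the row; it does not reproduce github-action-benchmark's own logic.

```python
current_ms = 0.16273200003524835   # commit `98a777c`
previous_ms = 0.12229499998284155  # commit `6fd70ff`
threshold = 1.25

ratio = current_ms / previous_ms   # lower is better, so > 1 means slower
delta_ms = current_ms - previous_ms
print(f"ratio={ratio:.2f} delta={delta_ms:.3f} ms regression={ratio > threshold}")
# prints: ratio=1.33 delta=0.040 ms regression=True
```

In absolute terms the flagged change is about 0.04 ms per call.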

This comment was automatically generated by a workflow using github-action-benchmark.

CC: @dgenio

Copilot

Pull request overview

Adds an active failure-discovery layer to ChainWeaver by introducing a deterministic, property-based fuzzing harness for flows, optional tool-output fault injection, failure minimization, and a chainweaver fuzz CLI workflow that can save replayable traces (with redaction support).

Changes:

- Adds chainweaver.fuzz with FlowFuzzer, FlowProperty, FaultConfig, result models, and minimize_failure(...) for deterministic shrinking.
- Adds chainweaver fuzz CLI command with --property, --runs/--seed, optional fault injection, optional minimization, and optional saving of (redacted) failing traces.
- Extends RedactionPolicy with trace-level helpers (redact_step_record / redact_execution_result) and adds tests + docs + public API snapshot updates for the new surface.
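
The deterministic shrinking behind minimize_failure(...) is described as delta debugging. As a sketch of that technique (not ChainWeaver's implementation, which may also shrink individual values), here is Zeller's ddmin applied to the keys of a failing input dict; ddmin, minimize_input, and the still_fails predicate are hypothetical names. Each candidate reduction is re-run through the predicate before it is accepted, so the returned input is guaranteed to still fail.

```python
from typing import Any, Callable, Dict, List, Sequence, TypeVar

T = TypeVar("T")


def _split(items: List[T], n: int) -> List[List[T]]:
    """Split items into n contiguous chunks whose sizes differ by at most one."""
    k, m = divmod(len(items), n)
    return [items[i * k + min(i, m):(i + 1) * k + min(i + 1, m)] for i in range(n)]


def ddmin(items: Sequence[T], still_fails: Callable[[List[T]], bool]) -> List[T]:
    """Classic delta debugging: return a 1-minimal failing subsequence of items."""
    current = list(items)
    if not still_fails(current):
        raise ValueError("input does not reproduce the failure")
    n = 2
    while len(current) >= 2:
        chunks = _split(current, n)
        reduced = False
        # 1. Try each chunk on its own.
        for chunk in chunks:
            if still_fails(chunk):
                current, n, reduced = chunk, 2, True
                break
        # 2. Otherwise try dropping one chunk at a time (the complements).
        if not reduced:
            for i in range(n):
                complement = [x for j, c in enumerate(chunks) if j != i for x in c]
                if still_fails(complement):
                    current, n, reduced = complement, max(n - 1, 2), True
                    break
        # 3. Otherwise refine granularity, or stop once chunks are single items.
        if not reduced:
            if n >= len(current):
                break
            n = min(n * 2, len(current))
    return current


def minimize_input(inputs: Dict[str, Any],
                   still_fails: Callable[[Dict[str, Any]], bool]) -> Dict[str, Any]:
    """Shrink a failing input dict to a minimal set of keys that still fails."""
    kept = ddmin(list(inputs.items()), lambda pairs: still_fails(dict(pairs)))
    return dict(kept)
```

Because there is no randomness in the search, the same failing input and predicate always shrink to the same reproducer.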

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 10 comments.

| File | Description |
|---|---|
| `tests/test_redaction.py` | Adds coverage for new trace-redaction helpers on `RedactionPolicy`. |
| `tests/test_fuzz.py` | Adds unit tests for fuzzing determinism, violations, fault injection, and minimization. |
| `tests/test_cli_fuzz.py` | Adds CLI contract tests for `chainweaver fuzz` (exit codes, determinism, saving/minimizing, redaction). |
| `tests/fixtures/public_api.json` | Updates public API snapshot for newly exported fuzzing symbols. |
| `README.md` | Documents `FuzzConfigError` and adds an example `chainweaver fuzz` invocation. |
| `docs/cli.md` | Documents the new `fuzz` command flags, JSON output shape, and exit codes. |
| `CHANGELOG.md` | Adds unreleased changelog entries for fuzzing, minimization, CLI command, and redaction helpers. |
| `chainweaver/log_utils.py` | Adds `RedactionPolicy.redact_step_record` and `redact_execution_result`. |
| `chainweaver/fuzz.py` | Introduces the property-based fuzzing harness + minimizer implementation. |
| `chainweaver/cli.py` | Implements the `chainweaver fuzz` command and property resolution logic. |
| `chainweaver/__init__.py` | Exports fuzzing API via `__all__`. |

…onfig, redact CLI output

- Promote schema-driven value generator to supported public API (chainweaver.attest.generate_value / UnsupportedAnnotation); fuzz.py no longer imports private underscore helpers.
- Preserve full executor configuration under fault injection via new FlowExecutor.with_replaced_tools(...), so behavior no longer diverges when faults are enabled (middleware, caches, cost profile, redaction policy, decision callback are kept).
- Redact failing/minimized inputs emitted to stdout when --redact (the default) is set, not just saved traces, to avoid leaking secrets in CI.
- Sanitize flow/property names before building saved-failure filenames so ':' (callable specs) and path separators are cross-platform safe.
- Reject duplicate --property names up front (exit 1) instead of silently collapsing them in props_by_name.
- Add tests for all five fixes; update docs/cli.md and CHANGELOG.

https://claude.ai/code/session_01YCD9Z5YVhyJSbC2TTG8Jaz
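
The name-handling fixes in this commit can be sketched as follows. Everything here is hypothetical (BUILTINS, PropertySpecError, resolve_property, reject_duplicates, safe_name, failure_filename, and the flow__property__NNN.json naming) and only illustrates the ideas: resolving a builtin or module:attr property spec with importlib, rejecting duplicate names before a name-keyed dict can collapse them, and replacing ':' and path separators before a name becomes part of a filename.

```python
import importlib
import re
from typing import Callable, Dict, List

# Hypothetical builtin registry; the real BUILTIN_PROPERTIES contents differ.
BUILTINS: Dict[str, Callable[..., bool]] = {
    "no_error": lambda trace: bool(trace.get("ok")),
}


class PropertySpecError(ValueError):
    """Bad --property value; a CLI would report it and exit with status 1."""


def reject_duplicates(specs: List[str]) -> None:
    """Fail fast instead of letting a dict keyed by name silently drop duplicates."""
    seen: set = set()
    dupes: List[str] = []
    for spec in specs:
        if spec in seen and spec not in dupes:
            dupes.append(spec)
        seen.add(spec)
    if dupes:
        raise PropertySpecError(f"duplicate --property names: {', '.join(dupes)}")


def resolve_property(spec: str) -> Callable[..., bool]:
    """Resolve a builtin name or a 'module:attr' spec to a callable."""
    if spec in BUILTINS:
        return BUILTINS[spec]
    module_name, sep, attr = spec.partition(":")
    if not (sep and module_name and attr):
        raise PropertySpecError(f"not a builtin or module:attr spec: {spec!r}")
    try:
        module = importlib.import_module(module_name)
    except ImportError as exc:
        raise PropertySpecError(f"cannot import {module_name!r}") from exc
    try:
        return getattr(module, attr)
    except AttributeError as exc:
        raise PropertySpecError(f"{module_name!r} has no attribute {attr!r}") from exc


_UNSAFE = re.compile(r"[^A-Za-z0-9._-]+")


def safe_name(name: str) -> str:
    """Replace ':', path separators and other unsafe runs with '_'."""
    cleaned = _UNSAFE.sub("_", name).strip("._")
    return cleaned or "unnamed"


def failure_filename(flow_name: str, property_name: str, index: int) -> str:
    return f"{safe_name(flow_name)}__{safe_name(property_name)}__{index:03d}.json"
```

For example, a callable spec such as pkg.props:no_leak becomes pkg.props_no_leak in the filename, and a name like ../etc/passwd cannot escape the output directory.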

Audit follow-up: the executor-arg docstring still described the fault-injection path as building a 'fresh executor (sharing this one's registry)' and no longer matched behavior after the per-case executor began preserving the full configuration via with_replaced_tools. Update it to describe config preservation and note that a configured step_cache can return cached outputs and mask per-case faults for repeated inputs. https://claude.ai/code/session_01YCD9Z5YVhyJSbC2TTG8Jaz
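
The copy-with-overrides idea and the cache caveat can be illustrated with a frozen dataclass. ToyExecutor, its fields, and this with_replaced_tools signature are hypothetical (the real FlowExecutor.with_replaced_tools(...) is only named, not shown, in this PR). Only the tool mapping changes, while everything else, including a shared step cache, carries over; that shared cache is why a repeated input can be served from the cache and hide an injected fault.

```python
from dataclasses import dataclass, replace
from typing import Any, Callable, Dict, Optional, Tuple

Tool = Callable[[Any], Any]


@dataclass(frozen=True)
class ToyExecutor:
    """Hypothetical stand-in for an executor; only the copy pattern is shown."""
    tools: Dict[str, Tool]
    middleware: Tuple[str, ...] = ()
    redact: bool = True
    step_cache: Optional[Dict[Tuple[str, Any], Any]] = None

    def with_replaced_tools(self, **replacements: Tool) -> "ToyExecutor":
        # dataclasses.replace copies every field and overrides only `tools`,
        # so middleware, redaction and the (shared) cache carry over.
        return replace(self, tools={**self.tools, **replacements})

    def call(self, name: str, payload: Any) -> Any:
        key = (name, payload)
        if self.step_cache is not None and key in self.step_cache:
            return self.step_cache[key]  # cache hit: the tool is never invoked
        result = self.tools[name](payload)
        if self.step_cache is not None:
            self.step_cache[key] = result
        return result
```

In this sketch, calling the faulty copy with a payload already seen by the base executor returns the cached good value, while a new payload reaches the replaced, malformed tool.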

Copilot AI review requested due to automatic review settings May 29, 2026 07:38

Copilot started reviewing on behalf of dgenio May 29, 2026 07:39

github-actions Bot reviewed May 29, 2026

Copilot AI reviewed May 29, 2026

claude added 2 commits May 29, 2026 20:24

dgenio merged commit 3e3ac00 into main May 29, 2026
15 checks passed

dgenio deleted the claude/github-issues-triage-YrEz0 branch May 29, 2026 21:06

This was referenced May 30, 2026:

- Add property-based fuzzing harness for ChainWeaver flows #220 (Closed)
- Minimize failing execution traces into smallest reproducible failure #221 (Closed)
- Add chainweaver fuzz command for property-based flow testing #222 (Closed)

dgenio merged 3 commits into main from claude/github-issues-triage-YrEz0

3 participants
