feat: add property-based fuzzing harness, trace minimization, and CLI (#220, #221, #222, #217) #223
Merged
…#220, #221, #222, #217)

Add an active failure-discovery layer that complements ChainWeaver's existing replay/diff/attest reproduction primitives.

- chainweaver/fuzz.py (#220): FlowFuzzer generates or mutates inputs from a flow's input_schema (or a base input), optionally injects malformed tool outputs via a seeded fault hook, executes the flow, and records FlowProperty violations as replayable ExecutionResult traces. All randomness is seeded so runs are reproducible; the executor stays randomness-free. Ships FlowProperty, FaultConfig, FuzzCase/FuzzFailure/FuzzReport, BUILTIN_PROPERTIES, and FuzzConfigError.
- minimize_failure (#221): delta-debugs a failing input to the smallest reproducer that still violates a property, re-verifying each reduction.
- chainweaver fuzz CLI (#222): --property (builtin or module:attr), --runs, --seed, --input/--input-file, --output-fault-prob, --minimize, --save-failures (redacted by default), --format. Non-zero exit on violation. Documented in docs/cli.md.
- RedactionPolicy.redact_step_record / redact_execution_result (#217): return redacted copies of StepRecord / ExecutionResult, used by the save-failures path. Implemented with Pydantic model_copy (the models are BaseModel with string-only error fields, not dataclasses carrying live exceptions).

Tests cover detection, seed determinism, fault injection, minimization, redaction round-trips, and the full CLI exit-code contract. All four validation commands pass; project coverage 91%.

https://claude.ai/code/session_01KQy8CtFEATMZ8rZmkepQ1Y
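The seeded generate, execute, and check loop described in the commit message can be illustrated with a minimal, standard-library sketch. Nothing here is ChainWeaver's actual API: `generate_input`, `fuzz`, the toy schema format (field name mapped to `int`, `bool`, or `str`), and the `(case, result) -> bool` property signature are all assumptions made for illustration. The point it shows is that a single `random.Random(seed)` instance owns every random draw, so the flow under test stays randomness-free and the same seed reproduces the same failures.

```python
import random
from typing import Any, Callable

Case = dict[str, Any]


def generate_input(schema: dict[str, type], rng: random.Random) -> Case:
    """Draw one value per field from a toy schema (hypothetical format)."""
    case: Case = {}
    for name, kind in schema.items():
        if kind is int:
            case[name] = rng.randint(-1000, 1000)
        elif kind is bool:
            case[name] = rng.random() < 0.5
        else:
            # Treat every other field as a short string.
            length = rng.randint(0, 8)
            case[name] = "".join(rng.choice("abc ") for _ in range(length))
    return case


def fuzz(
    flow: Callable[[Case], Any],
    prop: Callable[[Case, Any], bool],
    schema: dict[str, type],
    runs: int,
    seed: int,
) -> list[tuple[int, Case]]:
    """Run `runs` generated cases and return (index, input) for each violation."""
    rng = random.Random(seed)  # the only source of randomness
    failures: list[tuple[int, Case]] = []
    for index in range(runs):
        case = generate_input(schema, rng)
        try:
            holds = prop(case, flow(case))
        except Exception:
            # In this sketch a crash counts as a violation.
            holds = False
        if not holds:
            failures.append((index, case))
    return failures
```

Because each recorded failure carries the exact input that triggered it, rerunning the flow on that input is enough to reproduce the violation without rerunning the whole campaign.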
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'ChainWeaver microbenchmarks'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.25.
| Benchmark suite | Current: 98a777c | Previous: 6fd70ff | Ratio |
|---|---|---|---|
| compiled_overhead_ms_n5_llm200_tool0 | 0.16273200003524835 ms | 0.12229499998284155 ms | 1.33 |
This comment was automatically generated by workflow using github-action-benchmark.
CC: @dgenio
Pull request overview
Adds an active failure-discovery layer to ChainWeaver by introducing a deterministic, property-based fuzzing harness for flows, optional tool-output fault injection, failure minimization, and a chainweaver fuzz CLI workflow that can save replayable traces (with redaction support).
Changes:
- Adds `chainweaver.fuzz` with `FlowFuzzer`, `FlowProperty`, `FaultConfig`, result models, and `minimize_failure(...)` for deterministic shrinking.
- Adds `chainweaver fuzz` CLI command with `--property`, `--runs`/`--seed`, optional fault injection, optional minimization, and optional saving of (redacted) failing traces.
- Extends `RedactionPolicy` with trace-level helpers (`redact_step_record` / `redact_execution_result`) and adds tests + docs + public API snapshot updates for the new surface.
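The "deterministic shrinking" mentioned above can be sketched as a greedy reduction that drops one key at a time and re-verifies every candidate. This is a simplified stand-in for delta debugging, not the PR's `minimize_failure` implementation; the function name `minimize` and the `still_fails` predicate are hypothetical.

```python
from typing import Any, Callable

Case = dict[str, Any]


def minimize(failing: Case, still_fails: Callable[[Case], bool]) -> Case:
    """Greedily drop keys while the reduced input still violates the property."""
    if not still_fails(failing):
        raise ValueError("the starting input does not reproduce the failure")
    current = dict(failing)
    changed = True
    while changed:
        changed = False
        # Sorted order keeps the reduction deterministic across runs.
        for key in sorted(current):
            candidate = {k: v for k, v in current.items() if k != key}
            # Re-verify: accept a reduction only if it still fails.
            if still_fails(candidate):
                current = candidate
                changed = True
                break
    return current
```

The result is 1-minimal with respect to key removal: deleting any single remaining key makes the failure disappear. Shrinking the values themselves, which a fuller minimizer would also attempt, is left out of this sketch.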
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| tests/test_redaction.py | Adds coverage for new trace-redaction helpers on RedactionPolicy. |
| tests/test_fuzz.py | Adds unit tests for fuzzing determinism, violations, fault injection, and minimization. |
| tests/test_cli_fuzz.py | Adds CLI contract tests for chainweaver fuzz (exit codes, determinism, saving/minimizing, redaction). |
| tests/fixtures/public_api.json | Updates public API snapshot for newly exported fuzzing symbols. |
| README.md | Documents FuzzConfigError and adds an example chainweaver fuzz invocation. |
| docs/cli.md | Documents the new fuzz command flags, JSON output shape, and exit codes. |
| CHANGELOG.md | Adds unreleased changelog entries for fuzzing, minimization, CLI command, and redaction helpers. |
| chainweaver/log_utils.py | Adds RedactionPolicy.redact_step_record and redact_execution_result. |
| chainweaver/fuzz.py | Introduces the property-based fuzzing harness + minimizer implementation. |
| chainweaver/cli.py | Implements the chainweaver fuzz command and property resolution logic. |
| chainweaver/__init__.py | Exports fuzzing API via __all__. |
…onfig, redact CLI output

- Promote schema-driven value generator to supported public API (chainweaver.attest.generate_value / UnsupportedAnnotation); fuzz.py no longer imports private underscore helpers.
- Preserve full executor configuration under fault injection via new FlowExecutor.with_replaced_tools(...), so behavior no longer diverges when faults are enabled (middleware, caches, cost profile, redaction policy, decision callback are kept).
- Redact failing/minimized inputs emitted to stdout when --redact (the default) is set, not just saved traces, to avoid leaking secrets in CI.
- Sanitize flow/property names before building saved-failure filenames so ':' (callable specs) and path separators are cross-platform safe.
- Reject duplicate --property names up front (exit 1) instead of silently collapsing them in props_by_name.
- Add tests for all five fixes; update docs/cli.md and CHANGELOG.

https://claude.ai/code/session_01YCD9Z5YVhyJSbC2TTG8Jaz
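Two of the fixes above, filename sanitization and duplicate `--property` rejection, are small enough to sketch with the standard library. The helper names `safe_filename_part` and `find_duplicates`, the allowed character set, and the `"unnamed"` fallback are assumptions for illustration; the PR's actual rules may differ.

```python
import re
from collections import Counter


def safe_filename_part(name: str) -> str:
    """Replace anything outside [A-Za-z0-9._-] so ':' and path separators vanish."""
    cleaned = re.sub(r"[^A-Za-z0-9._-]+", "_", name)
    # Strip leading/trailing dots and underscores so "../.." cannot survive.
    cleaned = cleaned.strip("._")
    return cleaned or "unnamed"


def find_duplicates(names: list[str]) -> list[str]:
    """Return the names that appear more than once, sorted for stable output."""
    counts = Counter(names)
    return sorted(name for name, count in counts.items() if count > 1)
```

Checking for duplicates before building a name-keyed mapping matters because a dict silently keeps only the last entry for a repeated key, which is the collapsing behavior the commit removes.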
Audit follow-up: the executor-arg docstring still described the fault-injection path as building a 'fresh executor (sharing this one's registry)' and no longer matched behavior after the per-case executor began preserving the full configuration via with_replaced_tools. Update it to describe config preservation and note that a configured step_cache can return cached outputs and mask per-case faults for repeated inputs. https://claude.ai/code/session_01YCD9Z5YVhyJSbC2TTG8Jaz
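The caching caveat in this follow-up can be demonstrated with a toy model. `CachedStep` and `make_faulty_tool` are hypothetical stand-ins, not ChainWeaver's step_cache or fault hook; the sketch only assumes that the cache is keyed on the step input and that faults are applied when the tool is actually called.

```python
import random
from typing import Any, Callable


def make_faulty_tool(
    tool: Callable[[Any], Any], rng: random.Random, fault_prob: float
) -> Callable[[Any], Any]:
    """Wrap a tool so it returns a malformed output (None) with seeded probability."""
    def wrapped(value: Any) -> Any:
        if rng.random() < fault_prob:
            return None  # stands in for a malformed tool output
        return tool(value)
    return wrapped


class CachedStep:
    """Toy step that memoizes tool outputs by input value."""

    def __init__(self, tool: Callable[[Any], Any]) -> None:
        self.tool = tool
        self.cache: dict[Any, Any] = {}
        self.calls = 0

    def run(self, value: Any) -> Any:
        if value not in self.cache:
            self.calls += 1
            self.cache[value] = self.tool(value)
        # A cache hit never reaches the (possibly faulty) tool.
        return self.cache[value]


double = lambda x: x * 2
step = CachedStep(make_faulty_tool(double, random.Random(0), fault_prob=0.0))
clean = step.run(3)  # computed by the healthy tool and cached

# Swap in a tool that always faults while keeping the same cache.
step.tool = make_faulty_tool(double, random.Random(0), fault_prob=1.0)
masked = step.run(3)   # cache hit: the fault is never observed
exposed = step.run(4)  # new input: the fault shows up
```

Under these assumptions, a fuzz case that repeats an earlier input exercises the cache rather than the fault hook, which is why the docstring warns about masked faults.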
This was referenced May 30, 2026