test(automerge): shared corpus + BB selftest for bb-automerge.py parity #80
Merged
…cy parity

The GH workflow regex (claude-author-automerge.yml) and the BB Python regex (bb-automerge.py) policed the same risk-tier policy in different languages. The Wave 4.3 PR 133 incident showed the divergence: BB lacked several auth/billing/infra categories AND the bash wrapper bypassed policy entirely (fix in topcoder1/dotclaude@1a7ff39).

Shared corpus (`selftest/risk_patterns_corpus.txt`) is now the single source of truth for RISKY/SAFE test cases. Format:
  RISKY:    must match high-risk in BOTH BB and GH
  RISKY_BB: must match in BB only (e.g. bitbucket-pipelines.yml)
  SAFE:     must NOT match in either

Both selftests read from it, so any drift in either pattern set is caught:
- `selftest/test_automerge_risk_patterns.sh` (existing, refactored from inline arrays) tests the GH regex; processes 120 cases.
- `selftest/test_bb_automerge_risk_patterns.sh` (new) imports `find_high_risk` from bb-automerge.py via importlib and runs the corpus through it; processes 121 cases.

Includes infra/crontabs/* SAFE entries (wxa_vpn#250 lesson) so over-broad infra/* patterns are caught.

Companion to topcoder1/dotclaude@1a7ff39: both repos must merge together (or the BB selftest will fail until dotclaude lands).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
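The importlib step is needed because `bb-automerge.py` has a hyphen in its file name, so a plain `import` statement cannot load it. Below is a minimal sketch of one way to do this, assuming the script exposes `find_high_risk` at module level and guards its CLI entry point behind `if __name__ == "__main__"`. The helper name `load_module_from_path` is hypothetical and is not taken from the selftest itself.

```python
import importlib.util


def load_module_from_path(path, alias):
    """Load a Python file whose name is not a valid identifier (e.g. contains a hyphen)."""
    spec = importlib.util.spec_from_file_location(alias, str(path))
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot build an import spec for {path}")
    module = importlib.util.module_from_spec(spec)
    # exec_module runs the file's top-level code, so the script must not
    # start its CLI unconditionally at import time.
    spec.loader.exec_module(module)
    return module


# Usage sketch (the path is an assumption; adjust to the real install location):
# bba = load_module_from_path("/path/to/bb-automerge.py", "bba")
# risky = bba.find_high_risk(["deploy/local/docker-compose.yml"])
```

Loading the real module, rather than copying its patterns into the test, is what lets the selftest notice when the production pattern list changes.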
Add SAFE corpus entries for the 5 paths Codex CAP-6 round 1 flagged as regressions vs prior BB-only behavior:
- src/auth.ts (filename-named, not path-segment)
- api/login.go (filename-named)
- billing.py (filename-named)
- stripe/webhooks.ts (stripe segment dropped for GH parity)
- src/stripe/client.go

All 5 are INTENTIONALLY ungated per Wave 4.4 plan (goal is parity with the GH workflow, which has the same segment-anchored behavior). Repos that need filename or Stripe coverage put the path in their `.github/risk-paths.yml` blocked list.

Adding them to the corpus pins the behavior, so future regex edits that "accidentally" start catching these paths get caught by the selftest (the false-positive direction is just as important to gate).

Total corpus now 126 cases.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
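The two-direction gating described above can be sketched as follows. This is an illustration, not the selftest's actual code: it assumes each corpus line has the form `LABEL: path`, that blank lines and `#` comments are skipped, and that `find_high_risk` takes a list of paths and returns the flagged subset. The names `parse_corpus` and `check_bb` are hypothetical.

```python
LABELS = {"RISKY", "RISKY_BB", "SAFE"}


def parse_corpus(text):
    """Return (label, path) pairs; skips blank lines and '#' comments (assumed format)."""
    cases = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        label, sep, path = line.partition(":")
        label, path = label.strip(), path.strip()
        if not sep or label not in LABELS or not path:
            raise ValueError(f"malformed corpus line: {raw!r}")
        cases.append((label, path))
    return cases


def check_bb(cases, find_high_risk):
    """Return failure messages for the BB side; an empty list means every case passes."""
    failures = []
    for label, path in cases:
        flagged = bool(find_high_risk([path]))
        expected = label in {"RISKY", "RISKY_BB"}  # BB must flag both labels
        if flagged != expected:
            failures.append(f"{label}: {path} (flagged={flagged})")
    return failures
```

A `SAFE:` line fails when the matcher starts flagging it, and a `RISKY:` line fails when the matcher stops flagging it, so both widening and narrowing edits surface as test failures.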
No issues found. Test-infra only: corpus parsing logic, bash case ordering, and Python slice math are all correct; no trust-boundary or secret-leakage concerns.
…th.ts gap)

Codex CAP-6 round 3 flagged `secret.py` / `config/secret_key.py` as regressions vs the prior `(^|/)(secret|...)` substring pattern. Same underlying class as round 1's `auth.ts` / `billing.py` finding: the new `(/|$)` segment anchor doesn't match leaf filenames. Per Wave 4.4 plan, this matches GH workflow behavior intentionally.

Adding these specific paths to the corpus as SAFE so the accepted gap is explicitly version-pinned (the false-positive direction is gated too: if a future regex edit accidentally widens to match these, the selftest catches it).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
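The leaf-filename gap follows directly from the trailing `(/|$)` anchor. The sketch below contrasts the two anchoring styles using Python's `re` module. The keyword list is hypothetical and chosen only for illustration; the real alternation is elided in the text above and the GH side runs in a different regex engine.

```python
import re

# Hypothetical keyword list; not the project's actual alternation.
KEYWORDS = "auth|billing|secret|secrets"

# Prefix-only anchor: the keyword must start a path segment, anything may follow.
PREFIX_ANCHORED = re.compile(rf"(^|/)({KEYWORDS})")
# Segment anchor: the keyword must be an entire path segment.
SEGMENT_ANCHORED = re.compile(rf"(^|/)({KEYWORDS})(/|$)")


def matches(pattern, path):
    return pattern.search(path) is not None


for path in ["secret.py", "config/secret_key.py", "src/auth.ts",
             "src/auth/login.ts", "docs/authors.md"]:
    print(f"{path:24} prefix={matches(PREFIX_ANCHORED, path)!s:5} "
          f"segment={matches(SEGMENT_ANCHORED, path)}")
```

Under these assumed patterns, `secret.py` and `src/auth.ts` match only the prefix-anchored form, while `src/auth/login.ts` matches both. The prefix-anchored form also matches `docs/authors.md`, which shows the false-positive cost that the segment anchor trades away.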
…lass)

Codex CAP-6 round 4 flagged src/secrets.py + config/secret_keys.yml as regressions vs the prior pattern. Same class as rounds 1-3 (leaf filename vs path-segment matching tradeoff). Adding them to the corpus pins the accepted gap; the lessons from rounds 1-3 apply identically.

Total corpus: 130 cases.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ompose
Codex CAP-6 round 5 flagged:
- config/secret.json + src/secrets.ts: same leaf-filename class as
rounds 1/3/4 (parity tradeoff vs GH).
- deploy/local/docker-compose.yml + services/api/docker-compose.yaml:
nested compose files. GH workflow has the same `^docker-compose.*`
start-anchor gap, so this is also a parity tradeoff.
Both classes pinned as SAFE to make the accepted-gap explicit in
version control. Repos that nest compose files for staging/local
should add an explicit HIGH_RISK_PATTERNS entry + RISKY: corpus line.
Total corpus: 134 cases.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
deploy/local/docker-compose.yml is actually already gated (matches the `^deploy/.*` infra rule in both BB and GH). Codex round 5's claim was a false positive on that specific path.

services/api/docker-compose.yaml IS a real gap (no `services/` prefix rule on either side). Keep that as SAFE; it pins the intentional parity tradeoff.

Direct verification:
    $ python3 -c "import bba; print(bba.find_high_risk(['deploy/local/docker-compose.yml']))"
    ['deploy/local/docker-compose.yml']   # caught
    $ python3 -c "import bba; print(bba.find_high_risk(['services/api/docker-compose.yaml']))"
    []   # gap (intentional, parity with GH)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
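The difference between the two paths comes down to which start-anchored rule can reach them. Here is a toy reproduction, assuming only the two rules quoted in these commit messages (`^docker-compose.*` and `^deploy/.*`). The function `toy_find_high_risk` is a stand-in written for this illustration and is not the real `find_high_risk`, whose full pattern list is not reproduced here.

```python
import re

# Stand-ins for the two rules quoted above; the real HIGH_RISK_PATTERNS list is larger.
TOY_PATTERNS = [
    re.compile(r"^docker-compose.*"),  # only matches compose files at the repo root
    re.compile(r"^deploy/.*"),         # matches anything under deploy/
]


def toy_find_high_risk(paths):
    """Return the paths matched by any toy pattern (search honours the ^ anchor)."""
    return [p for p in paths if any(rx.search(p) for rx in TOY_PATTERNS)]


print(toy_find_high_risk(["deploy/local/docker-compose.yml"]))   # ['deploy/local/docker-compose.yml']
print(toy_find_high_risk(["services/api/docker-compose.yaml"]))  # []
print(toy_find_high_risk(["docker-compose.prod.yml"]))           # ['docker-compose.prod.yml']
```

The nested file under `deploy/` is caught by the directory rule rather than the compose rule, which is why only the `services/` path remains a gap.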
Codex CAP-6 round 6 (hard cap) flagged secrets.yaml + secret_manager.py as regressions. Same parity-tradeoff class as rounds 1/3/4/5. Pinned to corpus as SAFE.

Codex rounds: 6 (hard cap; same-class findings recur but each new specific path is now corpus-pinned for explicit documentation).

Total corpus: 136 cases.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
No issues found. Changes are pure test-infrastructure (shared corpus + BB selftest harness) with no logic bugs, injection risk, or coverage gaps at the 80%+ threshold.
Summary
Companion to topcoder1/dotclaude#96 for Wave 4.4 PR A: `~/bin/bb-automerge` bash wrapper rewrite + `bb-automerge.py` HIGH_RISK_PATTERNS widening to GH parity.
This PR adds:
- `selftest/risk_patterns_corpus.txt`: single source of truth for RISKY/SAFE test cases. Format:
  - `RISKY:` must match high-risk in BOTH BB (`bb-automerge.py`) AND GH (`claude-author-automerge.yml`)
  - `RISKY_BB:` must match in BB only (e.g. `bitbucket-pipelines.yml`)
  - `SAFE:` must NOT match in either
- `selftest/test_bb_automerge_risk_patterns.sh`: new BB selftest. Imports `find_high_risk` from `bb-automerge.py` via importlib and runs all 136 corpus cases through it.
- `selftest/test_automerge_risk_patterns.sh`: refactored to read RISKY/SAFE from the shared corpus instead of inline arrays. Now covers 134 cases (skips the 2 `RISKY_BB:` entries).
Why
The Wave 4.3 PR 133 incident exposed that the BB and GH pattern sets had diverged. Without shared-corpus enforcement, future edits to either side will silently drift.
Beyond drift detection, the corpus now also pins INTENTIONAL parity-not-coverage tradeoffs as `SAFE:` entries: leaf-filename gaps (`auth.ts`, `billing.py`, `secret.py`, `secrets.yaml`, etc.) that are accepted because the GH workflow has the same gap. The corpus makes "this is an accepted gap, not an oversight" explicit and version-controlled. If a future regex edit accidentally widens coverage, the SAFE-direction test catches it; if a future edit narrows it further, the RISKY entries catch THAT.
Step 0 verdict (full detail in dotclaude PR)
- `1f312281a686` on PR 133 had `Co-Authored-By: Claude` trailer (present).
- `~/bin/bb-automerge` bash wrapper had ZERO policy checks.
- `bb-automerge.py` patterns were dead code for the operator's actual flow.
Test plan
- `bash selftest/test_bb_automerge_risk_patterns.sh` → `PASS — all 136 cases`
- `bash selftest/test_automerge_risk_patterns.sh` → `OK: all risk-pattern cases pass.` (134 cases)
- `claude-author-automerge.yml` vs `bb-automerge.py` HIGH_RISK_PATTERNS)
Auto-merge rationale
This PR touches selftest only — no workflow files, no operator-facing scripts. However, it polices the merge-gate policy itself → manual click-merge required (high-risk path: policy enforcement, companion to dotclaude#96).
Codex pre-review
Codex CAP-6 runs on the dotclaude side (Codex rounds: 6, hard cap). This PR's changes are purely additive test surface — no eligibility logic.
Companion PR
claude/wave-4.4-bb-automerge-loadbearing (Python script + bash wrapper + install.sh fix)