test(automerge): shared corpus + BB selftest for bb-automerge.py parity by topcoder1 · Pull Request #80 · topcoder1/ci-workflows

topcoder1 · 2026-05-25T18:36:22Z

Summary

Companion to topcoder1/dotclaude#96 for Wave 4.4 PR A — ~/bin/bb-automerge bash wrapper rewrite + bb-automerge.py HIGH_RISK_PATTERNS widening to GH parity.

This PR adds:

- `selftest/risk_patterns_corpus.txt`: single source of truth for RISKY/SAFE test cases. Format:
  - `RISKY:` must match high-risk in BOTH BB (bb-automerge.py) AND GH (claude-author-automerge.yml)
  - `RISKY_BB:` must match in BB only (e.g. bitbucket-pipelines.yml)
  - `SAFE:` must NOT match in either
- `selftest/test_bb_automerge_risk_patterns.sh`: new BB selftest. Imports `find_high_risk` from bb-automerge.py via importlib and runs all 136 corpus cases through it.
- `selftest/test_automerge_risk_patterns.sh`: refactored to read RISKY/SAFE from the shared corpus instead of inline arrays. Now covers 134 cases (skips the 2 `RISKY_BB:` entries).
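The three-label format above could be parsed roughly as follows. This is a minimal sketch, not the parser in either selftest: the exact line syntax (`LABEL: path`, with blank lines and `#` comments ignored) is an assumption, and `parse_corpus` is a hypothetical helper name. The sample paths are ones this PR discusses.

```python
from collections import defaultdict

LABELS = ("RISKY", "RISKY_BB", "SAFE")


def parse_corpus(text):
    """Group corpus paths by label. Assumes one 'LABEL: path' entry per line."""
    cases = defaultdict(list)
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # assumption: blank lines and '#' comments carry no case
        label, sep, path = line.partition(":")
        label = label.strip()
        if not sep or label not in LABELS:
            raise ValueError(f"unrecognized corpus line: {raw!r}")
        cases[label].append(path.strip())
    return cases


SAMPLE = """\
# hypothetical excerpt, not the real corpus file
RISKY: deploy/local/docker-compose.yml
RISKY_BB: bitbucket-pipelines.yml
SAFE: src/auth.ts
SAFE: services/api/docker-compose.yaml
"""

cases = parse_corpus(SAMPLE)
# The BB selftest sees every entry; the GH selftest skips RISKY_BB entries.
bb_cases = cases["RISKY"] + cases["RISKY_BB"] + cases["SAFE"]
gh_cases = cases["RISKY"] + cases["SAFE"]
print(len(bb_cases), len(gh_cases))
```

Skipping the `RISKY_BB:` entries on the GH side is what accounts for the 136 versus 134 case counts reported above.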

Why

The Wave 4.3 PR 133 incident exposed that the BB and GH pattern sets had diverged. Without shared-corpus enforcement, future edits to either side will silently drift.

Beyond drift detection, the corpus now also pins INTENTIONAL parity-not-coverage tradeoffs as SAFE: entries — leaf-filename gaps (auth.ts, billing.py, secret.py, secrets.yaml, etc.) that are accepted because GH workflow has the same gap. The corpus makes "this is an accepted gap, not an oversight" explicit and version-controlled. If a future regex edit accidentally widens coverage, the SAFE-direction test catches it; if a future edit narrows it further, the RISKY entries catch THAT.
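The two-directional check described above can be sketched as below. This is illustrative only: `check_corpus` is a hypothetical helper, and the `widened` and `narrowed` regexes are invented to stand in for an accidental future edit; only `^deploy/.*` and the two sample paths come from this PR.

```python
import re


def check_corpus(cases, is_risky):
    """Return one failure string per corpus case the matcher gets wrong."""
    failures = []
    for label, path in cases:
        expected = label != "SAFE"  # RISKY (and RISKY_BB on the BB side) must match
        if is_risky(path) != expected:
            direction = "missed" if expected else "over-matched"
            failures.append(f"{direction}: {label}: {path}")
    return failures


CASES = [
    ("RISKY", "deploy/local/docker-compose.yml"),
    ("SAFE", "src/auth.ts"),
]

current = re.compile(r"^deploy/.*")
widened = re.compile(r"^deploy/.*|auth")      # hypothetical edit that widens coverage
narrowed = re.compile(r"^deploy/prod/.*")     # hypothetical edit that narrows coverage

ok = check_corpus(CASES, lambda p: bool(current.search(p)))
too_wide = check_corpus(CASES, lambda p: bool(widened.search(p)))
too_narrow = check_corpus(CASES, lambda p: bool(narrowed.search(p)))
print(ok, too_wide, too_narrow, sep="\n")
```

The widened regex trips the SAFE entry and the narrowed one trips the RISKY entry, so either kind of edit fails the selftest until the corpus is updated deliberately.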

Step 0 verdict (full detail in dotclaude PR)

HEAD_SHA 1f312281a686 on PR 133 had Co-Authored-By: Claude trailer (present).
Root cause: ~/bin/bb-automerge bash wrapper had ZERO policy checks.
The bb-automerge.py patterns were dead code for the operator's actual flow.

Test plan

bash selftest/test_bb_automerge_risk_patterns.sh → PASS — all 136 cases
bash selftest/test_automerge_risk_patterns.sh → OK: all risk-pattern cases pass. (134 cases)
Reviewer confirms no drift between corpus and either pattern set (claude-author-automerge.yml vs bb-automerge.py HIGH_RISK_PATTERNS)

Auto-merge rationale

This PR touches selftest only — no workflow files, no operator-facing scripts. However, it polices the merge-gate policy itself → manual click-merge required (high-risk path: policy enforcement, companion to dotclaude#96).

Codex pre-review

Codex CAP-6 runs on the dotclaude side (Codex rounds: 6, hard cap). This PR's changes are purely additive test surface — no eligibility logic.

Companion PR

topcoder1/dotclaude#96 — claude/wave-4.4-bb-automerge-loadbearing (Python script + bash wrapper + install.sh fix)

…cy parity

The GH workflow regex (claude-author-automerge.yml) and the BB Python regex (bb-automerge.py) policed the same risk-tier policy in different languages. The Wave 4.3 PR 133 incident showed the divergence: BB lacked several auth/billing/infra categories AND the bash wrapper bypassed policy entirely (fix in topcoder1/dotclaude@1a7ff39).

Shared corpus (`selftest/risk_patterns_corpus.txt`) is now the single source of truth for RISKY/SAFE test cases. Format:

  RISKY: must match high-risk in BOTH BB and GH
  RISKY_BB: must match in BB only (e.g. bitbucket-pipelines.yml)
  SAFE: must NOT match in either

Both selftests read from it, so any drift in either pattern set is caught:

- `selftest/test_automerge_risk_patterns.sh` (existing, refactored from inline arrays) tests the GH regex; processes 120 cases.
- `selftest/test_bb_automerge_risk_patterns.sh` (new) imports `find_high_risk` from bb-automerge.py via importlib and runs the corpus through it; processes 121 cases.

Includes infra/crontabs/* SAFE entries (wxa_vpn#250 lesson) so over-broad infra/* patterns are caught.

Companion to topcoder1/dotclaude@1a7ff39: both repos must merge together (or the BB selftest will fail until dotclaude lands).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
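Because `bb-automerge.py` has a hyphen in its name, a plain `import` statement cannot load it, which is presumably why the selftest goes through importlib. A minimal sketch of that technique follows. The stub written to the temp directory is a stand-in, not the real script: its single pattern and the body of `find_high_risk` are assumptions, as is the module alias `bba`.

```python
import importlib.util
import tempfile
from pathlib import Path


def load_module_from_path(path, name="bba"):
    """Load a Python source file as a module, even if its filename is not importable."""
    spec = importlib.util.spec_from_file_location(name, str(path))
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


# Stand-in for bb-automerge.py (hypothetical contents, for illustration only).
STUB = '''
import re

HIGH_RISK_PATTERNS = [re.compile(r"^deploy/.*")]

def find_high_risk(paths):
    return [p for p in paths if any(rx.search(p) for rx in HIGH_RISK_PATTERNS)]
'''

with tempfile.TemporaryDirectory() as tmp:
    script = Path(tmp) / "bb-automerge.py"
    script.write_text(STUB)
    bba = load_module_from_path(script)
    hits = bba.find_high_risk(["deploy/local/docker-compose.yml", "README.md"])

print(hits)
```

In a real harness the path would point at the installed script rather than a temp file, and the list passed to `find_high_risk` would come from the shared corpus.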

github-actions · 2026-05-25T18:36:59Z

Coverage Floor — mode: enforce

| metric | value |
| --- | --- |
| measured | `100.0%` |
| floor (current) | `99.0%` |
| target | `100.0%` |
| last bumped | `2026-05-12` |

Add SAFE corpus entries for the 5 paths Codex CAP-6 round 1 flagged as regressions vs prior BB-only behavior:

- src/auth.ts (filename-named, not path-segment)
- api/login.go (filename-named)
- billing.py (filename-named)
- stripe/webhooks.ts (stripe segment dropped for GH parity)
- src/stripe/client.go

All 5 are INTENTIONALLY ungated per Wave 4.4 plan (goal is parity with GH workflow, which has the same segment-anchored behavior). Repos that need filename or Stripe coverage put the path in their `.github/risk-paths.yml` blocked list.

Adding to corpus pins the behavior so future regex edits that "accidentally" start catching these paths get caught by the selftest (false-positive direction is just as important to gate).

Total corpus now 126 cases.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

claude · 2026-05-25T18:42:55Z

No issues found. Test-infra only: corpus parsing logic, bash case ordering, and Python slice math are all correct; no trust-boundary or secret-leakage concerns.

…th.ts gap)

Codex CAP-6 round 3 flagged `secret.py` / `config/secret_key.py` as regressions vs the prior `(^|/)(secret|...)` substring pattern. Same underlying class as round 1's `auth.ts` / `billing.py` finding: the new `(/|$)` segment anchor doesn't match leaf filenames. Per Wave 4.4 plan, this matches GH workflow behavior intentionally.

Adding these specific paths to corpus as SAFE so the accepted gap is explicitly version-pinned (false-positive direction is gated too: if a future regex edit accidentally widens to match these, the selftest catches it).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
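The difference between the prior substring pattern and the segment-anchored one can be seen in a small sketch. The keyword list `(auth|billing|secrets?)` is a hypothetical stand-in, since the real alternation is elided in the commit message; only the `(^|/)` prefix and `(/|$)` suffix come from the text above. `src/auth/login.ts` and `secrets/prod.yaml` are invented example paths.

```python
import re

WORDS = r"(auth|billing|secrets?)"  # hypothetical keyword list, for illustration

SUBSTRING = re.compile(rf"(^|/){WORDS}")        # prior style: keyword may start a leaf filename
SEGMENT = re.compile(rf"(^|/){WORDS}(/|$)")     # new style: keyword must be a whole path segment

PATHS = [
    "src/auth/login.ts",      # 'auth' is a directory segment
    "secrets/prod.yaml",      # 'secrets' is a directory segment
    "src/auth.ts",            # leaf filename
    "billing.py",             # leaf filename
    "config/secret_key.py",   # leaf filename
    "README.md",
]

substring_hits = [p for p in PATHS if SUBSTRING.search(p)]
segment_hits = [p for p in PATHS if SEGMENT.search(p)]

print("substring:", substring_hits)
print("segment:  ", segment_hits)
```

Under the segment anchor only the two directory-segment paths match; the three leaf filenames fall into the accepted gap that the corpus pins as SAFE.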

…lass)

Codex CAP-6 round 4 flagged src/secrets.py + config/secret_keys.yml as regressions vs prior pattern. Same class as rounds 1-3 (leaf filename vs path-segment matching tradeoff). Adding to corpus pins the accepted gap; rounds 1-3 lessons apply identically.

Total corpus: 130 cases.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ompose

Codex CAP-6 round 5 flagged:

- config/secret.json + src/secrets.ts: same leaf-filename class as rounds 1/3/4 (parity tradeoff vs GH).
- deploy/local/docker-compose.yml + services/api/docker-compose.yaml: nested compose files. GH workflow has the same `^docker-compose.*` start-anchor gap, so this is also a parity tradeoff.

Both classes pinned as SAFE to make the accepted-gap explicit in version control. Repos that nest compose files for staging/local should add an explicit HIGH_RISK_PATTERNS entry + RISKY: corpus line.

Total corpus: 134 cases.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

deploy/local/docker-compose.yml is actually already gated (matches `^deploy/.*` infra rule in both BB and GH). Codex round 5's claim was a false positive on that specific path.

services/api/docker-compose.yaml IS a real gap (no `services/` prefix rule in either side). Keep that as SAFE; it pins the intentional parity tradeoff.

Direct verification:

    $ python3 -c "import bba; print(bba.find_high_risk(['deploy/local/docker-compose.yml']))"
    ['deploy/local/docker-compose.yml']   # caught
    $ python3 -c "import bba; print(bba.find_high_risk(['services/api/docker-compose.yaml']))"
    []   # gap (intentional, parity with GH)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
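The start-anchor behavior can be reproduced with just the two patterns quoted in these commit messages. This is a sketch under stated assumptions: `find_high_risk_sketch` is a hypothetical stand-in for the real `find_high_risk`, the real pattern list is longer, and the `NESTED` pattern is an invented example of the explicit opt-in entry a repo with nested compose files might add.

```python
import re

# Only these two patterns are quoted in the PR; the real list has more entries.
PATTERNS = [
    re.compile(r"^docker-compose.*"),
    re.compile(r"^deploy/.*"),
]


def find_high_risk_sketch(paths, patterns=PATTERNS):
    """Return the paths matched by at least one pattern (stand-in for find_high_risk)."""
    return [p for p in paths if any(rx.search(p) for rx in patterns)]


paths = [
    "docker-compose.yml",                 # root-level compose file
    "deploy/local/docker-compose.yml",    # nested, but under deploy/
    "services/api/docker-compose.yaml",   # nested, no matching prefix rule
]

gated = find_high_risk_sketch(paths)

# Hypothetical opt-in entry for repos that nest compose files.
NESTED = re.compile(r"(^|/)docker-compose[^/]*$")
gated_with_nested = find_high_risk_sketch(paths, PATTERNS + [NESTED])

print(gated)
print(gated_with_nested)
```

With the quoted patterns alone, `services/api/docker-compose.yaml` is the only path left ungated, matching the direct verification above. A repo adding an entry like `NESTED` would also move the corresponding corpus line from `SAFE:` to `RISKY:`.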

Codex CAP-6 round 6 (hard cap) flagged secrets.yaml + secret_manager.py as regressions. Same parity-tradeoff class as rounds 1/3/4/5. Pinned to corpus as SAFE.

Codex rounds: 6 (hard cap; same-class findings recur but each new specific path is now corpus-pinned for explicit documentation).

Total corpus: 136 cases.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

claude · 2026-05-25T18:53:19Z

No issues found. Changes are pure test-infrastructure — shared corpus + BB selftest harness — with no logic bugs, injection risk, or coverage gaps at the 80%+ threshold.

github-actions Bot added the risk:standard label (Risk class: standard) May 25, 2026

topcoder1 and others added 5 commits May 25, 2026 11:43

topcoder1 merged commit f4bc691 into main May 25, 2026
13 checks passed

topcoder1 deleted the claude/wave-4.4-bb-automerge-selftest branch May 25, 2026 21:32

topcoder1 merged 7 commits into main from claude/wave-4.4-bb-automerge-selftest
