test(automerge): shared corpus + BB selftest for bb-automerge.py parity #80

Merged
topcoder1 merged 7 commits into main from claude/wave-4.4-bb-automerge-selftest
May 25, 2026
Conversation

@topcoder1
Owner

@topcoder1 topcoder1 commented May 25, 2026

Summary

Companion to topcoder1/dotclaude#96 for Wave 4.4 PR A — ~/bin/bb-automerge bash wrapper rewrite + bb-automerge.py HIGH_RISK_PATTERNS widening to GH parity.

This PR adds:

  1. selftest/risk_patterns_corpus.txt — single source of truth for RISKY/SAFE test cases. Format:

    • RISKY: must match high-risk in BOTH BB (bb-automerge.py) AND GH (claude-author-automerge.yml)
    • RISKY_BB: must match in BB only (e.g. bitbucket-pipelines.yml)
    • SAFE: must NOT match in either
  2. selftest/test_bb_automerge_risk_patterns.sh — new BB selftest. Imports find_high_risk from bb-automerge.py via importlib and runs all 136 corpus cases through it.

  3. selftest/test_automerge_risk_patterns.sh — refactored to read RISKY/SAFE from the shared corpus instead of inline arrays. Now covers 134 cases (skips the 2 RISKY_BB: entries).
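
As a rough illustration of how both selftests could consume the shared corpus, here is a minimal Python sketch. It assumes one case per line in the form `<TAG>: <path>`, with blank lines and `#` comments ignored; the real file layout may differ, and `parse_corpus` is a hypothetical helper, not a function from this PR. The sample entries reuse paths mentioned in this conversation.

```python
from collections import defaultdict

# Assumed line syntax: "<TAG>: <path>". The actual corpus layout may differ.
TAGS = ("RISKY_BB", "RISKY", "SAFE")


def parse_corpus(text):
    """Group corpus paths by tag (hypothetical helper)."""
    cases = defaultdict(list)
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        tag, sep, path = line.partition(":")
        tag = tag.strip()
        if not sep or tag not in TAGS:
            raise ValueError(f"unrecognized corpus line: {raw!r}")
        cases[tag].append(path.strip())
    return cases


sample = """\
# illustrative entries only
RISKY: deploy/local/docker-compose.yml
RISKY_BB: bitbucket-pipelines.yml
SAFE: src/auth.ts
SAFE: services/api/docker-compose.yaml
"""

cases = parse_corpus(sample)
bb_cases = cases["RISKY"] + cases["RISKY_BB"] + cases["SAFE"]
gh_cases = cases["RISKY"] + cases["SAFE"]  # the GH side skips RISKY_BB entries
print(len(bb_cases), len(gh_cases))
```

With this split, the BB case count always exceeds the GH count by the number of `RISKY_BB:` lines, which is the 136 vs 134 relationship described above.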

Why

The Wave 4.3 PR 133 incident exposed that the BB and GH pattern sets had diverged. Without shared-corpus enforcement, future edits to either side will silently drift.

Beyond drift detection, the corpus now also pins INTENTIONAL parity-not-coverage tradeoffs as SAFE: entries — leaf-filename gaps (auth.ts, billing.py, secret.py, secrets.yaml, etc.) that are accepted because GH workflow has the same gap. The corpus makes "this is an accepted gap, not an oversight" explicit and version-controlled. If a future regex edit accidentally widens coverage, the SAFE-direction test catches it; if a future edit narrows it further, the RISKY entries catch THAT.
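
The two-direction gating can be sketched as follows. This is an illustration under assumptions, not the selftest itself: `check`, `current`, and `widened` are hypothetical names, and the patterns are simplified stand-ins for the real pattern sets.

```python
import re


def check(cases, matcher):
    """Return failure messages; an empty list means the corpus is satisfied."""
    failures = []
    for tag, path in cases:
        hit = matcher(path)
        if tag == "RISKY" and not hit:
            failures.append(f"narrowed: {path} should be gated")
        elif tag == "SAFE" and hit:
            failures.append(f"widened: {path} should not be gated")
    return failures


# Simplified stand-in patterns (assumptions, not the real HIGH_RISK_PATTERNS).
SEGMENT = re.compile(r"(^|/)secrets?(/|$)")
SUBSTRING = re.compile(r"secret")


def current(path):
    return bool(SEGMENT.search(path))


def widened(path):
    return bool(SUBSTRING.search(path))


cases = [
    ("RISKY", "config/secrets/db.yml"),
    ("SAFE", "src/secrets.py"),
    ("SAFE", "secrets.yaml"),
]

print(check(cases, current))  # no failures
print(check(cases, widened))  # both SAFE entries are reported as widened
```

A matcher that stops gating `config/secrets/db.yml` would fail in the other direction, with a "narrowed" message.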

Step 0 verdict (full detail in dotclaude PR)

  • HEAD_SHA 1f312281a686 on PR 133 carried the Co-Authored-By: Claude trailer.
  • Root cause: ~/bin/bb-automerge bash wrapper had ZERO policy checks.
  • The bb-automerge.py patterns were dead code for the operator's actual flow.

Test plan

  • bash selftest/test_bb_automerge_risk_patterns.sh → PASS (all 136 cases)
  • bash selftest/test_automerge_risk_patterns.sh → OK: all risk-pattern cases pass. (134 cases)
  • Reviewer confirms no drift between corpus and either pattern set (claude-author-automerge.yml vs bb-automerge.py HIGH_RISK_PATTERNS)

Auto-merge rationale

This PR touches selftest only — no workflow files, no operator-facing scripts. However, it polices the merge-gate policy itself → manual click-merge required (high-risk path: policy enforcement, companion to dotclaude#96).

Codex pre-review

Codex CAP-6 runs on the dotclaude side (Codex rounds: 6, hard cap). This PR's changes are purely additive test surface — no eligibility logic.

Companion PR

  • topcoder1/dotclaude#96 — claude/wave-4.4-bb-automerge-loadbearing (Python script + bash wrapper + install.sh fix)

…cy parity

The GH workflow regex (claude-author-automerge.yml) and the BB Python
regex (bb-automerge.py) policed the same risk-tier policy in different
languages. Wave 4.3 PR 133 incident showed the divergence: BB lacked
several auth/billing/infra categories AND the bash wrapper bypassed
policy entirely (fix in topcoder1/dotclaude@1a7ff39).

Shared corpus (`selftest/risk_patterns_corpus.txt`) is now the single
source of truth for RISKY/SAFE test cases. Format:
  RISKY:    must match high-risk in BOTH BB and GH
  RISKY_BB: must match in BB only (e.g. bitbucket-pipelines.yml)
  SAFE:     must NOT match in either

Both selftests read from it, so any drift in either pattern set is
caught:
- `selftest/test_automerge_risk_patterns.sh` (existing, refactored
  from inline arrays) tests the GH regex; processes 120 cases.
- `selftest/test_bb_automerge_risk_patterns.sh` (new) imports
  `find_high_risk` from bb-automerge.py via importlib and runs the
  corpus through it; processes 121 cases.
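
Because `bb-automerge.py` has a hyphen in its name, a plain `import` statement cannot load it. The importlib approach can be sketched like this; `load_module` is a hypothetical helper, and the temporary file below is a stand-in for the real script, which lives in the dotclaude repo.

```python
import importlib.util
import pathlib
import tempfile


def load_module(path, name="bba"):
    """Load a Python file as a module, even if its filename is not a valid identifier."""
    spec = importlib.util.spec_from_file_location(name, str(path))
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for the real bb-automerge.py; its body is an assumption.
    script = pathlib.Path(tmp) / "bb-automerge.py"
    script.write_text(
        "def find_high_risk(paths):\n"
        "    return [p for p in paths if p.startswith('deploy/')]\n"
    )
    bba = load_module(script)
    result = bba.find_high_risk(["deploy/local/docker-compose.yml", "src/auth.ts"])

print(result)
```

A selftest built this way exercises the same function the operator flow calls, rather than a copy of the regex list.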

Includes infra/crontabs/* SAFE entries (wxa_vpn#250 lesson) so
over-broad infra/* patterns are caught.

Companion to topcoder1/dotclaude@1a7ff39 — both repos must merge
together (or the BB selftest will fail until dotclaude lands).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions github-actions Bot added the risk:standard Risk class: standard label May 25, 2026
@github-actions

github-actions Bot commented May 25, 2026

Coverage Floor — mode: enforce

| metric | value |
| --- | --- |
| measured | 100.0% |
| floor (current) | 99.0% |
| target | 100.0% |
| last bumped | 2026-05-12 |

Add SAFE corpus entries for the 5 paths Codex CAP-6 round 1 flagged
as regressions vs prior BB-only behavior:
  - src/auth.ts         (filename-named, not path-segment)
  - api/login.go        (filename-named)
  - billing.py          (filename-named)
  - stripe/webhooks.ts  (stripe segment dropped for GH parity)
  - src/stripe/client.go

All 5 are INTENTIONALLY ungated per Wave 4.4 plan (goal is parity
with GH workflow, which has the same segment-anchored behavior).
Repos that need filename or Stripe coverage put the path in their
`.github/risk-paths.yml` blocked list.

Adding to corpus pins the behavior so future regex edits that
"accidentally" start catching these paths get caught by the selftest
(false-positive direction is just as important to gate).

Total corpus now 126 cases.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@claude

claude Bot commented May 25, 2026

No issues found. Test-infra only: corpus parsing logic, bash case ordering, and Python slice math are all correct; no trust-boundary or secret-leakage concerns.

topcoder1 and others added 5 commits May 25, 2026 11:43
…th.ts gap)

Codex CAP-6 round 3 flagged `secret.py` / `config/secret_key.py` as
regressions vs the prior `(^|/)(secret|...)` substring pattern. Same
underlying class as round 1's `auth.ts` / `billing.py` finding — the
new `(/|$)` segment anchor doesn't match leaf filenames. Per Wave 4.4
plan, this matches GH workflow behavior intentionally.
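
The difference between the two anchoring styles can be seen in a small sketch. Both patterns below are simplified assumptions built from the fragments quoted above; they are not the actual entries in HIGH_RISK_PATTERNS or the GH workflow.

```python
import re

# Simplified assumptions, not the real pattern sets.
OLD = re.compile(r"(^|/)(secrets?)")       # prefix match: also hits leaf filenames
NEW = re.compile(r"(^|/)(secrets?)(/|$)")  # whole path segment only

paths = [
    "secrets/prod.env",
    "config/secrets/db.yml",
    "secret.py",
    "config/secret_key.py",
]

results = {p: (bool(OLD.search(p)), bool(NEW.search(p))) for p in paths}
for path, (old_hit, new_hit) in results.items():
    print(f"{path:24} old={old_hit} new={new_hit}")
```

The directory-style paths match under both patterns, while `secret.py` and `config/secret_key.py` match only the old one because the trailing `(/|$)` requires the segment to end right after the keyword.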

Adding these specific paths to corpus as SAFE so the accepted gap is
explicitly version-pinned (false-positive direction is gated too —
if a future regex edit accidentally widens to match these, the
selftest catches it).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…lass)

Codex CAP-6 round 4 flagged src/secrets.py + config/secret_keys.yml
as regressions vs prior pattern. Same class as rounds 1-3 (leaf
filename vs path-segment matching tradeoff). Adding to corpus pins
the accepted gap; rounds 1-3 lessons apply identically.

Total corpus: 130 cases.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ompose

Codex CAP-6 round 5 flagged:
  - config/secret.json + src/secrets.ts: same leaf-filename class as
    rounds 1/3/4 (parity tradeoff vs GH).
  - deploy/local/docker-compose.yml + services/api/docker-compose.yaml:
    nested compose files. GH workflow has the same `^docker-compose.*`
    start-anchor gap, so this is also a parity tradeoff.

Both classes pinned as SAFE to make the accepted-gap explicit in
version control. Repos that nest compose files for staging/local
should add an explicit HIGH_RISK_PATTERNS entry + RISKY: corpus line.

Total corpus: 134 cases.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
deploy/local/docker-compose.yml is actually already gated (matches
`^deploy/.*` infra rule in both BB and GH). Codex round 5's claim
was a false positive on that specific path.

services/api/docker-compose.yaml IS a real gap (no `services/`
prefix rule in either side). Keep that as SAFE — pins the
intentional parity tradeoff.

Direct verification:
  $ python3 -c "import bba; print(bba.find_high_risk(['deploy/local/docker-compose.yml']))"
  ['deploy/local/docker-compose.yml']    # caught
  $ python3 -c "import bba; print(bba.find_high_risk(['services/api/docker-compose.yaml']))"
  []                                     # gap (intentional, parity with GH)
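
The same result can be reproduced in isolation with the two rules quoted in these commit messages, `^docker-compose.*` and `^deploy/.*`. This is a sketch only: `gated` is a hypothetical helper, and the real pattern sets contain more rules than these two.

```python
import re

# Only the two rules quoted in this conversation; the real sets are larger.
RULES = [re.compile(r"^docker-compose.*"), re.compile(r"^deploy/.*")]


def gated(paths):
    """Return the paths that match at least one start-anchored rule."""
    return [p for p in paths if any(rule.search(p) for rule in RULES)]


paths = [
    "docker-compose.yml",                # repo-root compose file
    "deploy/local/docker-compose.yml",   # caught by the deploy/ prefix rule
    "services/api/docker-compose.yaml",  # nested; no rule starts with services/
]

caught = gated(paths)
print(caught)
```

The nested `services/api/` file is missed because `^` pins the compose rule to the start of the path, not because of the `.yaml` extension.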

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codex CAP-6 round 6 (hard cap) flagged secrets.yaml + secret_manager.py
as regressions. Same parity-tradeoff class as rounds 1/3/4/5. Pinned
to corpus as SAFE.

Codex rounds: 6 (hard cap; same-class findings recur but each new
specific path is now corpus-pinned for explicit documentation).

Total corpus: 136 cases.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@claude

claude Bot commented May 25, 2026

No issues found. Changes are pure test-infrastructure — shared corpus + BB selftest harness — with no logic bugs, injection risk, or coverage gaps at the 80%+ threshold.

@topcoder1 topcoder1 merged commit f4bc691 into main May 25, 2026
13 checks passed
@topcoder1 topcoder1 deleted the claude/wave-4.4-bb-automerge-selftest branch May 25, 2026 21:32
