fix(dev): close review report contract gaps#379
Conversation
Address four gaps an automated review flagged on the anti-slop redesign: - 03-review-relevancy regains its mandatory `## Output` (R8 action anatomy). - Single-axis runs: an axis not run marks the sections it owns "Not run" (functional owns Phases and Verification), never a placeholder or invented data, so review-code alone stays valid. - Unmet acceptance criteria tagged `fix` now also append a `functional` row to Findings, so the header count matches the verdict instead of reading 0. - Verdict: a `not-applicable` or `fixed` criterion no longer blocks `approve`; only an unchecked `fix` criterion does. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: caeb8071a6
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| 2. **Trace.** Fetch the diff, then walk each phase in plan order. For every acceptance criterion, check the box `[x]` only when the diff shows evidence, citing the `file:line`; leave it `[ ]` when unmet or partial, naming the gap. | ||
| 3. **Summarize.** Fill `Verification`: the percent and count of checked criteria, the files checked, every unchecked criterion tagged `fix`, `not-applicable`, or `fixed`, and any unplanned change in the diff that traces to no criterion. Write "none" for an empty list. | ||
| 4. **Record.** Write the phase blocks and the verification summary into `review.md`. No prose paragraphs, boxes and the summary only. | ||
| 4. **Count.** For each unmet criterion tagged `fix`, append a `functional` row to the `Findings` table (its Phase, the gap, a severity) so the header count matches the verdict. A `not-applicable` or `fixed` criterion adds no row. |
There was a problem hiding this comment.
Fill all columns for functional findings
When an acceptance criterion is tagged fix, this new step tells the reviewer to add only the phase, gap, and severity, but the shared Findings table in assets/review-template.md still requires Location, Issue, and Fix columns. In that scenario the generated review can end up with malformed or blank functional rows, or with invented values, so the new count no longer reliably satisfies the report contract; specify what functional rows should use for location and fix text.
Useful? React with 👍 / 👎.
| 2. **Trace.** Fetch the diff, then walk each phase in plan order. For every acceptance criterion, check the box `[x]` only when the diff shows evidence, citing the `file:line`; leave it `[ ]` when unmet or partial, naming the gap. | ||
| 3. **Summarize.** Fill `Verification`: the percent and count of checked criteria, the files checked, every unchecked criterion tagged `fix`, `not-applicable`, or `fixed`, and any unplanned change in the diff that traces to no criterion. Write "none" for an empty list. | ||
| 4. **Record.** Write the phase blocks and the verification summary into `review.md`. No prose paragraphs, boxes and the summary only. | ||
| 4. **Count.** For each unmet criterion tagged `fix`, append a `functional` row to the `Findings` table (its Phase, the gap, a severity) so the header count matches the verdict. A `not-applicable` or `fixed` criterion adds no row. |
There was a problem hiding this comment.
Align the router with functional findings
This adds functional rows, but the top-level router still says functional only writes Phases and Verification while only code and relevancy append to Findings (SKILL.md:27). When an agent follows the router/transversal rule for a full review, unchecked fix criteria can still be omitted from Findings, reintroducing the header-count/verdict mismatch this change is meant to close.
Useful? React with 👍 / 👎.
| One overall verdict, the strictest across the axes run: | ||
|
|
||
| - `approve`: no critical finding, and every acceptance criterion checked, ship it. | ||
| - `approve`: no critical finding, and no criterion left unchecked and tagged `fix`, ship it. A `not-applicable` or `fixed` criterion does not block. |
There was a problem hiding this comment.
Resolve the critical criterion verdict conflict
For a critical acceptance criterion marked not-applicable or fixed, this new approval rule says it no longer blocks, but the unchanged blocked rule below still says any unchecked critical criterion is blocked. That leaves report generation with two valid verdicts for the same verification state, so agents can keep returning blocked for the exact cases this change intends to approve; qualify the blocked rule by the same tag exception.
Useful? React with 👍 / 👎.
| 2. **Trace.** Fetch the diff, then walk each phase in plan order. For every acceptance criterion, check the box `[x]` only when the diff shows evidence, citing the `file:line`; leave it `[ ]` when unmet or partial, naming the gap. | ||
| 3. **Summarize.** Fill `Verification`: the percent and count of checked criteria, the files checked, every unchecked criterion tagged `fix`, `not-applicable`, or `fixed`, and any unplanned change in the diff that traces to no criterion. Write "none" for an empty list. | ||
| 4. **Record.** Write the phase blocks and the verification summary into `review.md`. No prose paragraphs, boxes and the summary only. | ||
| 4. **Count.** For each unmet criterion tagged `fix`, append a `functional` row to the `Findings` table (its Phase, the gap, a severity) so the header count matches the verdict. A `not-applicable` or `fixed` criterion adds no row. |
There was a problem hiding this comment.
Keep blocking functional rows out of minor counts
When an unchecked criterion is tagged fix, the verdict rules make it block approval regardless of severity, but this new instruction lets the reviewer choose any severity for the row. If they mark a small unmet criterion as 🟢 minor, the header can read only minor findings while the verdict is still changes-requested, so the count still does not explain the verdict; require fix rows to use a blocking severity or update the verdict/counting rule.
Useful? React with 👍 / 👎.
Follow-up to the review-contract fixes, closing four consistency gaps: - Functional findings fill every Findings column (Location, Issue, Fix), not just phase and gap, so no row is blank or invented. - A functional `fix` row uses a blocking severity (critical or warning), never minor, so the header count always explains the verdict. - The router rule now states functional also appends its `fix` rows to Findings, matching the action. - The `blocked` rule carries the same tag exception as `approve`: a `not-applicable` or `fixed` criterion never blocks, whatever its severity. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c09aaaa301
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| 2. **Trace.** Fetch the diff, then walk each phase in plan order. For every acceptance criterion, check the box `[x]` only when the diff shows evidence, citing the `file:line`; leave it `[ ]` when unmet or partial, naming the gap. | ||
| 3. **Summarize.** Fill `Verification`: the percent and count of checked criteria, the files checked, every unchecked criterion tagged `fix`, `not-applicable`, or `fixed`, and any unplanned change in the diff that traces to no criterion. Write "none" for an empty list. | ||
| 4. **Record.** Write the phase blocks and the verification summary into `review.md`. No prose paragraphs, boxes and the summary only. | ||
| 4. **Count.** For each unmet criterion tagged `fix`, append a `functional` row to the `Findings` table with every column filled: Phase, Location (the phase's target file, or `-`), Issue (the unmet criterion and its gap), Fix (what to implement), and a blocking severity (🔴 critical or 🟡 warning, never 🟢 minor), so the header count matches the verdict. A `not-applicable` or `fixed` criterion adds no row. |
There was a problem hiding this comment.
Require critical functional gaps to count as critical
When an unmet acceptance criterion is itself critical and tagged fix, the rubric's blocked rule still applies, but this instruction permits recording that row as 🟡 warning. That can produce a blocked verdict with 0 critical in the header, so the Findings count still does not explain the verdict for critical functional gaps; require critical fix criteria to use 🔴 critical here.
Useful? React with 👍 / 👎.
| - Report: fill `review.md` from `assets/review-template.md`. One shared `Findings` table for every axis: functional writes the `Phases` boxes and the `Verification` table, code and relevancy append rows to `Findings` under their `Kind`. Tables and boxes, no prose, no per-axis sections. The Phase column ties a finding to the plan when one is in scope, `-` otherwise. | ||
| - Report: fill `review.md` from `assets/review-template.md`. One shared `Findings` table for every axis: functional writes the `Phases` boxes and the `Verification` table, and appends a `functional` row to `Findings` for each unmet criterion tagged `fix`; code and relevancy append rows to `Findings` under their `Kind`. Tables and boxes, no prose, no per-axis sections. The Phase column ties a finding to the plan when one is in scope, `-` otherwise. | ||
| - Sections: the report has exactly the sections in `assets/review-validator.yml`. Before returning, verify against it and remove any section not listed. | ||
| - Not run: every required section always exists. An axis that did not run marks the sections it owns "Not run" (functional owns `Phases` and `Verification`); it never leaves a placeholder or invents data. |
There was a problem hiding this comment.
Mark skipped code and relevancy axes too
When the caller names review-functional, the code and relevancy axes are skipped, but this new rule only gives a Not run marker to functional-owned sections. Because Findings is shared and the template still allows None. when the run finds nothing, a functional-only report with no fix rows can read like a full clean review instead of a review where code and relevancy never ran; add an explicit skipped-axis marker or record which axes ran.
Useful? React with 👍 / 👎.
Two more consistency gaps from review feedback: - A `fix` functional row uses 🔴 critical when the criterion itself is critical, otherwise 🟡 warning, so a blocked verdict never shows 0 critical. - The header carries `Axes run`, so a skipped code or relevancy axis is visible even though Findings is shared; a functional-only report no longer reads like a clean full review. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Fixes four review-contract gaps flagged on #377: relevancy
## Output(R8); single-axis runs mark unrun sections "Not run"; unmetfixcriteria append afunctionalFindings row so count matches verdict;not-applicable/fixedno longer blockapprove. Validated with real e2e reviews (empty-diff, code-only, full).🤖 Generated with Claude Code