Skip to content

fix(dev): close review report contract gaps#379

Open
blafourcade wants to merge 3 commits into
nextfrom
fix/review-contract-gaps-2
Open

fix(dev): close review report contract gaps#379
blafourcade wants to merge 3 commits into
nextfrom
fix/review-contract-gaps-2

Conversation

@blafourcade

Copy link
Copy Markdown
Contributor

Fixes four review-contract gaps flagged on #377: relevancy ## Output (R8); single-axis runs mark unrun sections "Not run"; unmet fix criteria append a functional Findings row so count matches verdict; not-applicable/fixed no longer block approve. Validated with real e2e reviews (empty-diff, code-only, full).

🤖 Generated with Claude Code

Address four gaps an automated review flagged on the anti-slop redesign:

- 03-review-relevancy regains its mandatory `## Output` (R8 action anatomy).
- Single-axis runs: an axis not run marks the sections it owns "Not run"
  (functional owns Phases and Verification), never a placeholder or invented
  data, so review-code alone stays valid.
- Unmet acceptance criteria tagged `fix` now also append a `functional` row to
  Findings, so the header count matches the verdict instead of reading 0.
- Verdict: a `not-applicable` or `fixed` criterion no longer blocks `approve`;
  only an unchecked `fix` criterion does.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: caeb8071a6

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

2. **Trace.** Fetch the diff, then walk each phase in plan order. For every acceptance criterion, check the box `[x]` only when the diff shows evidence, citing the `file:line`; leave it `[ ]` when unmet or partial, naming the gap.
3. **Summarize.** Fill `Verification`: the percent and count of checked criteria, the files checked, every unchecked criterion tagged `fix`, `not-applicable`, or `fixed`, and any unplanned change in the diff that traces to no criterion. Write "none" for an empty list.
4. **Record.** Write the phase blocks and the verification summary into `review.md`. No prose paragraphs, boxes and the summary only.
4. **Count.** For each unmet criterion tagged `fix`, append a `functional` row to the `Findings` table (its Phase, the gap, a severity) so the header count matches the verdict. A `not-applicable` or `fixed` criterion adds no row.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Fill all columns for functional findings

When an acceptance criterion is tagged fix, this new step tells the reviewer to add only the phase, gap, and severity, but the shared Findings table in assets/review-template.md still requires Location, Issue, and Fix columns. In that scenario the generated review can end up with malformed or blank functional rows, or with invented values, so the new count no longer reliably satisfies the report contract; specify what functional rows should use for location and fix text.

Useful? React with 👍 / 👎.

2. **Trace.** Fetch the diff, then walk each phase in plan order. For every acceptance criterion, check the box `[x]` only when the diff shows evidence, citing the `file:line`; leave it `[ ]` when unmet or partial, naming the gap.
3. **Summarize.** Fill `Verification`: the percent and count of checked criteria, the files checked, every unchecked criterion tagged `fix`, `not-applicable`, or `fixed`, and any unplanned change in the diff that traces to no criterion. Write "none" for an empty list.
4. **Record.** Write the phase blocks and the verification summary into `review.md`. No prose paragraphs, boxes and the summary only.
4. **Count.** For each unmet criterion tagged `fix`, append a `functional` row to the `Findings` table (its Phase, the gap, a severity) so the header count matches the verdict. A `not-applicable` or `fixed` criterion adds no row.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Align the router with functional findings

This adds functional rows, but the top-level router still says functional only writes Phases and Verification while only code and relevancy append to Findings (SKILL.md:27). When an agent follows the router/transversal rule for a full review, unchecked fix criteria can still be omitted from Findings, reintroducing the header-count/verdict mismatch this change is meant to close.

Useful? React with 👍 / 👎.

One overall verdict, the strictest across the axes run:

- `approve`: no critical finding, and every acceptance criterion checked, ship it.
- `approve`: no critical finding, and no criterion left unchecked and tagged `fix`, ship it. A `not-applicable` or `fixed` criterion does not block.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Resolve the critical criterion verdict conflict

For a critical acceptance criterion marked not-applicable or fixed, this new approval rule says it no longer blocks, but the unchanged blocked rule below still says any unchecked critical criterion is blocked. That leaves report generation with two valid verdicts for the same verification state, so agents can keep returning blocked for the exact cases this change intends to approve; qualify the blocked rule by the same tag exception.

Useful? React with 👍 / 👎.

2. **Trace.** Fetch the diff, then walk each phase in plan order. For every acceptance criterion, check the box `[x]` only when the diff shows evidence, citing the `file:line`; leave it `[ ]` when unmet or partial, naming the gap.
3. **Summarize.** Fill `Verification`: the percent and count of checked criteria, the files checked, every unchecked criterion tagged `fix`, `not-applicable`, or `fixed`, and any unplanned change in the diff that traces to no criterion. Write "none" for an empty list.
4. **Record.** Write the phase blocks and the verification summary into `review.md`. No prose paragraphs, boxes and the summary only.
4. **Count.** For each unmet criterion tagged `fix`, append a `functional` row to the `Findings` table (its Phase, the gap, a severity) so the header count matches the verdict. A `not-applicable` or `fixed` criterion adds no row.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Keep blocking functional rows out of minor counts

When an unchecked criterion is tagged fix, the verdict rules make it block approval regardless of severity, but this new instruction lets the reviewer choose any severity for the row. If they mark a small unmet criterion as 🟢 minor, the header can read only minor findings while the verdict is still changes-requested, so the count still does not explain the verdict; require fix rows to use a blocking severity or update the verdict/counting rule.

Useful? React with 👍 / 👎.

Follow-up to the review-contract fixes, closing four consistency gaps:

- Functional findings fill every Findings column (Location, Issue, Fix), not
  just phase and gap, so no row is blank or invented.
- A functional `fix` row uses a blocking severity (critical or warning), never
  minor, so the header count always explains the verdict.
- The router rule now states functional also appends its `fix` rows to
  Findings, matching the action.
- The `blocked` rule carries the same tag exception as `approve`: a
  `not-applicable` or `fixed` criterion never blocks, whatever its severity.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c09aaaa301

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

2. **Trace.** Fetch the diff, then walk each phase in plan order. For every acceptance criterion, check the box `[x]` only when the diff shows evidence, citing the `file:line`; leave it `[ ]` when unmet or partial, naming the gap.
3. **Summarize.** Fill `Verification`: the percent and count of checked criteria, the files checked, every unchecked criterion tagged `fix`, `not-applicable`, or `fixed`, and any unplanned change in the diff that traces to no criterion. Write "none" for an empty list.
4. **Record.** Write the phase blocks and the verification summary into `review.md`. No prose paragraphs, boxes and the summary only.
4. **Count.** For each unmet criterion tagged `fix`, append a `functional` row to the `Findings` table with every column filled: Phase, Location (the phase's target file, or `-`), Issue (the unmet criterion and its gap), Fix (what to implement), and a blocking severity (🔴 critical or 🟡 warning, never 🟢 minor), so the header count matches the verdict. A `not-applicable` or `fixed` criterion adds no row.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Require critical functional gaps to count as critical

When an unmet acceptance criterion is itself critical and tagged fix, the rubric's blocked rule still applies, but this instruction permits recording that row as 🟡 warning. That can produce a blocked verdict with 0 critical in the header, so the Findings count still does not explain the verdict for critical functional gaps; require critical fix criteria to use 🔴 critical here.

Useful? React with 👍 / 👎.

- Report: fill `review.md` from `assets/review-template.md`. One shared `Findings` table for every axis: functional writes the `Phases` boxes and the `Verification` table, code and relevancy append rows to `Findings` under their `Kind`. Tables and boxes, no prose, no per-axis sections. The Phase column ties a finding to the plan when one is in scope, `-` otherwise.
- Report: fill `review.md` from `assets/review-template.md`. One shared `Findings` table for every axis: functional writes the `Phases` boxes and the `Verification` table, and appends a `functional` row to `Findings` for each unmet criterion tagged `fix`; code and relevancy append rows to `Findings` under their `Kind`. Tables and boxes, no prose, no per-axis sections. The Phase column ties a finding to the plan when one is in scope, `-` otherwise.
- Sections: the report has exactly the sections in `assets/review-validator.yml`. Before returning, verify against it and remove any section not listed.
- Not run: every required section always exists. An axis that did not run marks the sections it owns "Not run" (functional owns `Phases` and `Verification`); it never leaves a placeholder or invents data.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Mark skipped code and relevancy axes too

When the caller names review-functional, the code and relevancy axes are skipped, but this new rule only gives a Not run marker to functional-owned sections. Because Findings is shared and the template still allows None. when the run finds nothing, a functional-only report with no fix rows can read like a full clean review instead of a review where code and relevancy never ran; add an explicit skipped-axis marker or record which axes ran.

Useful? React with 👍 / 👎.

Two more consistency gaps from review feedback:

- A `fix` functional row uses 🔴 critical when the criterion itself is
  critical, otherwise 🟡 warning, so a blocked verdict never shows 0 critical.
- The header carries `Axes run`, so a skipped code or relevancy axis is visible
  even though Findings is shared; a functional-only report no longer reads like
  a clean full review.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant