feat: map-reduce for large PRs by frankekn · Pull Request #4 · frankekn/needlefish

frankekn · 2026-06-27T09:42:07Z

Summary

Map-reduce style review for large PRs: map blast-radius → deep review per hotspot → merge → adversarial critic
Stops stuffing full diff into context for big changesets
Banner added to README

Changes

Core review pipeline split into map + per-chunk deep + merge + critic
Prompts updated for precondition substitution + verdict gate
README now has project banner

Base: main

…-> merge -> critic); stop stuffing full diff

github-actions

Needlefish PR Review

Verdict: ⛔ changes_requested
Base: b31bd65 → Head: 886c149

CHANGES_REQUESTED: large-PR review can skip mapped surfaces, drop blocking residual risks, and give the critic insufficient diff context.

Findings

P1: Review every mapped hotspot before returning a verdict

src/core/review.ts:93

Why this breaks: The map prompt allows up to 8 hotspots and requires every changed file to appear in one, but this line silently slices the mapped list to 6. On a large PR mapped into 7 or 8 surfaces, the omitted surfaces get no deep review, no findings, and no checked entry, so Needlefish can return PASS while some changed files were never reviewed.

Suggested fix: Process all returned hotspots, or make MAX_HOTSPOTS match the prompt and enforce that the reviewed hotspot union covers every changed file before deriving a verdict.

Validation: Add a unit/integration test with a mocked map result containing 7 hotspots and assert reviewLarge invokes the deep pass for all 7 or fails closed.

P1: Preserve residual risks from deep review passes

src/core/review.ts:113

Why this breaks: Each deep pass returns residual_risks, and deriveVerdict maps any blocks=true risk to needs_human. The large-review merge replaces all deep residual risks with an empty array, so a deep pass that could not verify a material guard no longer blocks the verdict and the final review can incorrectly pass.

Suggested fix: Accumulate res.residual_risks from each deep pass, include them in candidateMerged, let the critic prune them, and derive the final verdict from the pruned residual risks.

Validation: Mock one deep result with residual_risks:[{text:'unverified guard',blocks:true}] and assert the final ReviewResult verdict is needs_human unless the critic explicitly removes it.

P2: Give the critic the diff it is required to validate

src/core/review.ts:117

Why this breaks: critic.md says it receives the original diff and must delete findings not introduced by this diff or not tied to changed lines. In large mode this passes only --stat output, and candidateMerged also omits base/head SHAs, so the critic cannot verify changed-line anchoring for valid deep findings. That can cause real findings to be pruned as unverifiable, or weak findings to survive without the intended diff check.

Suggested fix: Pass the full diff for files that have candidate findings, or include base/head SHAs and update the critic prompt to run git diff for those files before pruning.

Validation: Run a large-review test where a deep pass emits a valid finding on a changed line and assert the critic prompt contains enough unified diff context, not only diffstat.

Checked

Inspected the provided diff for src/core/review.ts, prompts/map.md, prompts/deep.md, prompts/critic.md usage, and schema changes.
Verified map prompt allows up to 8 hotspots while reviewLarge reviews only the first 6.
Verified reviewLarge drops deep residual_risks before critic/verdict by constructing candidateMerged with residual_risks: [].
Verified reviewLarge passes bundle.patchStat at the critic PATCH slot in large mode instead of the original diff.

github-actions · 2026-06-27T09:45:09Z

+    throw new Error("map pass produced no hotspots");
+  }
+
+  const hotspots = sortByRisk(mapResult.hotspots).slice(0, MAX_HOTSPOTS);


P1 (bug): Review every mapped hotspot before returning a verdict

The map prompt allows up to 8 hotspots and requires every changed file to appear in one, but this line silently slices the mapped list to 6. On a large PR mapped into 7 or 8 surfaces, the omitted surfaces get no deep review, no findings, and no checked entry, so Needlefish can return PASS while some changed files were never reviewed.

Suggested fix: Process all returned hotspots, or make MAX_HOTSPOTS match the prompt and enforce that the reviewed hotspot union covers every changed file before deriving a verdict.

github-actions · 2026-06-27T09:45:09Z

+    summary: mapResult.summary,
+    findings: merged,
+    checked,
+    residual_risks: [],


P1 (bug): Preserve residual risks from deep review passes

Each deep pass returns residual_risks, and deriveVerdict maps any blocks=true risk to needs_human. The large-review merge replaces all deep residual risks with an empty array, so a deep pass that could not verify a material guard no longer blocks the verdict and the final review can incorrectly pass.

Suggested fix: Accumulate res.residual_risks from each deep pass, include them in candidateMerged, let the critic prune them, and derive the final verdict from the pruned residual risks.

github-actions · 2026-06-27T09:45:09Z

+  };
+  const criticPrompt = loadPrompt("critic.md")
+    .replace("{{FINDINGS}}", () => JSON.stringify(candidateMerged, null, 2))
+    .replace("{{PATCH}}", () => bundle.patchStat || "(see git diff --stat; repo at head)");


P2 (bug): Give the critic the diff it is required to validate

critic.md says it receives the original diff and must delete findings not introduced by this diff or not tied to changed lines. In large mode this passes only --stat output, and candidateMerged also omits base/head SHAs, so the critic cannot verify changed-line anchoring for valid deep findings. That can cause real findings to be pruned as unverifiable, or weak findings to survive without the intended diff check.

Suggested fix: Pass the full diff for files that have candidate findings, or include base/head SHAs and update the critic prompt to run git diff for those files before pruning.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 886c149439

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-27T09:45:52Z

+    throw new Error("map pass produced no hotspots");
+  }
+
+  const hotspots = sortByRisk(mapResult.hotspots).slice(0, MAX_HOTSPOTS);


Review all mapped hotspots before passing

map.md permits up to 8 hotspots and requires every changed file to be placed in one, but this slice silently drops any surfaces after the first six. In a large PR with seven or eight independent surfaces, the omitted files never enter a deep pass or a blocking residual risk, so defects in those changed files cannot affect the final verdict. Review all mapped hotspots or explicitly return needs_human when any are skipped.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-06-27T09:45:52Z

+    summary: mapResult.summary,
+    findings: merged,
+    checked,
+    residual_risks: [],


Preserve blocking residual risks from deep passes

deep.md allows each hotspot review to return residual_risks, and deriveVerdict turns any blocks: true risk into needs_human; however this merge always replaces all hotspot risks with an empty array before the critic sees them. On large PRs where a hotspot cannot verify a material precondition, Needlefish can incorrectly post a passing review instead of escalating to a human. Carry forward and prune the collected residual risks.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-06-27T09:45:52Z

+    baseSha: bundle.baseSha,
+    headSha: bundle.headSha,
+    patchStat: bundle.patchStat,
+    changedFiles: bundle.changedFiles,
+    agentsMd: bundle.agentsMd,


Forward focus context into large-review prompts

For large diffs, this reduced map bundle is the only review context passed before the deep prompts, and it omits bundle.focus; the deep prompt also only receives the hotspot plus SHAs. As a result needlefish --focus security becomes a no-op whenever isLarge() selects this path, whereas small reviews still include the focus in the full bundle. Forward focus into the map/deep prompts.

Useful? React with 👍 / 👎.

…-hotspot try/catch + codex retry, forward deep residuals, tighter dedup, critic gets SHAs to re-verify guards

github-actions

Needlefish PR Review

Verdict: ⛔ changes_requested
Base: b31bd65 → Head: 8362481

changes_requested: the large-review path drops existing review options, making --focus/--deep ineffective on large PRs.

Findings

P2: Forward review options into large mode

src/core/review.ts:83-89

Why this breaks: The local CLI still records deep and focus in the Bundle, and small reviews pass the full Bundle into the review prompt. The new large-review map bundle omits both fields, and the deep prompt only receives the hotspot plus SHAs, so needlefish --focus security and needlefish --deep become no-ops whenever isLarge() selects this path.

Suggested fix: Include the requested review options in the large-mode map/deep prompt context and update the map/deep prompts to honor them.

Validation: Run a large local review with --focus security --deep using a Codex stub and assert the captured map/deep prompts include those option values.

Checked

Reviewed the supplied diff for the large-review pipeline and prompt inputs.
Confirmed reviewSmall serializes the full Bundle, while reviewLarge constructs a reduced mapBundle without focus/deep and builds deep prompts only from AGENTS, HOTSPOT, BASE, and HEAD.
No downstream guard-rejection claim was present, so no guard substitution check was required.
Kept the finding because it is introduced by the changed large-review path, affects existing CLI contract behavior, is line-anchored, and has a minimal fix.

Residual Risk

Did not execute the test suite; this review used read-only static inspection of the supplied diff.

github-actions · 2026-06-27T10:10:24Z

+
+// Large PR: map (blast-radius survey, no diff text) -> deep per hotspot -> merge -> critic.
+async function reviewLarge(bundle: Bundle): Promise<ReviewResult> {
+  const mapBundle = {


P2 (contract): Forward review options into large mode

The local CLI still records deep and focus in the Bundle, and small reviews pass the full Bundle into the review prompt. The new large-review map bundle omits both fields, and the deep prompt only receives the hotspot plus SHAs, so needlefish --focus security and needlefish --deep become no-ops whenever isLarge() selects this path.

Suggested fix: Include the requested review options in the large-mode map/deep prompt context and update the map/deep prompts to honor them.

…view catch)

github-actions

Needlefish PR Review

Verdict: ✅ pass
Base: b31bd65 → Head: a64b50c

approved: no high-confidence blocking findings remained after pruning.

Findings

No actionable findings. Prefer this over padding weak ones.

Checked

Reviewed the candidate finding against src/core/review.ts large-review flow in the supplied diff.
Confirmed map.md explicitly instructs the map pass to keep hotspots to <= 6 total, so outputs beyond MAX_HOTSPOTS are already prompt-invalid rather than an expected review surface.
Confirmed the alleged skipped case depends on an omitted hotspot sharing files with a selected hotspot while only carrying distinct edges; deep.md still instructs each selected file surface to inspect changed hunks and trace downstream consumers via edges or rg, so the claimed never-reviewed effect is not concrete enough to block merge.
Kept the non-blocking residual risk about TypeScript validation being unavailable in this checkout.

Residual Risk

TypeScript validation could not be used in this checkout because tsc cannot resolve Node built-in types/@types/node; source review still reached a verdict.

t added 2 commits June 27, 2026 17:40

feat: map-reduce for large PRs (map blast-radius -> deep per hotspot …

61bdfae

…-> merge -> critic); stop stuffing full diff

docs: add banner to README

886c149

github-actions Bot requested changes Jun 27, 2026

View reviewed changes

chatgpt-codex-connector Bot reviewed Jun 27, 2026

View reviewed changes

fix(council): coverage tail-backstop (no silently-skipped files), per…

8362481

…-hotspot try/catch + codex retry, forward deep residuals, tighter dedup, critic gets SHAs to re-verify guards

github-actions Bot requested changes Jun 27, 2026

View reviewed changes

fix: forward --focus/--deep into large-mode map+deep prompts (self-re…

a64b50c

…view catch)

github-actions Bot reviewed Jun 27, 2026

View reviewed changes

frankekn merged commit 929297e into main Jun 27, 2026
2 checks passed

frankekn deleted the feat/map-reduce-large-prs branch June 27, 2026 10:16

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: map-reduce for large PRs#4

feat: map-reduce for large PRs#4
frankekn merged 4 commits into
mainfrom
feat/map-reduce-large-prs

frankekn commented Jun 27, 2026

Uh oh!

github-actions Bot left a comment

Uh oh!

github-actions Bot Jun 27, 2026

Uh oh!

github-actions Bot Jun 27, 2026

Uh oh!

github-actions Bot Jun 27, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot Jun 27, 2026

Uh oh!

chatgpt-codex-connector Bot Jun 27, 2026

Uh oh!

chatgpt-codex-connector Bot Jun 27, 2026

Uh oh!

github-actions Bot left a comment

Uh oh!

github-actions Bot Jun 27, 2026

Uh oh!

github-actions Bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

frankekn commented Jun 27, 2026

Summary

Changes

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Needlefish PR Review

Findings

P1: Review every mapped hotspot before returning a verdict

P1: Preserve residual risks from deep review passes

P2: Give the critic the diff it is required to validate

Checked

Uh oh!

github-actions Bot Jun 27, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot Jun 27, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot Jun 27, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Jun 27, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot Jun 27, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot Jun 27, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Needlefish PR Review

Findings

P2: Forward review options into large mode

Checked

Residual Risk

Uh oh!

github-actions Bot Jun 27, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Needlefish PR Review

Findings

Checked

Residual Risk

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant