Skip to content

feat: map-reduce for large PRs#4

Merged
frankekn merged 4 commits into
mainfrom
feat/map-reduce-large-prs
Jun 27, 2026
Merged

feat: map-reduce for large PRs#4
frankekn merged 4 commits into
mainfrom
feat/map-reduce-large-prs

Conversation

@frankekn

Copy link
Copy Markdown
Owner

Summary

  • Map-reduce style review for large PRs: map blast-radius → deep review per hotspot → merge → adversarial critic
  • Stops stuffing full diff into context for big changesets
  • Banner added to README

Changes

  • Core review pipeline split into map + per-chunk deep + merge + critic
  • Prompts updated for precondition substitution + verdict gate
  • README now has project banner

Base: main

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needlefish PR Review

Verdict: ⛔ changes_requested
Base: b31bd65Head: 886c149

CHANGES_REQUESTED: large-PR review can skip mapped surfaces, drop blocking residual risks, and give the critic insufficient diff context.

Findings

P1: Review every mapped hotspot before returning a verdict

src/core/review.ts:93

Why this breaks: The map prompt allows up to 8 hotspots and requires every changed file to appear in one, but this line silently slices the mapped list to 6. On a large PR mapped into 7 or 8 surfaces, the omitted surfaces get no deep review, no findings, and no checked entry, so Needlefish can return PASS while some changed files were never reviewed.

Suggested fix: Process all returned hotspots, or make MAX_HOTSPOTS match the prompt and enforce that the reviewed hotspot union covers every changed file before deriving a verdict.

Validation: Add a unit/integration test with a mocked map result containing 7 hotspots and assert reviewLarge invokes the deep pass for all 7 or fails closed.

P1: Preserve residual risks from deep review passes

src/core/review.ts:113

Why this breaks: Each deep pass returns residual_risks, and deriveVerdict maps any blocks=true risk to needs_human. The large-review merge replaces all deep residual risks with an empty array, so a deep pass that could not verify a material guard no longer blocks the verdict and the final review can incorrectly pass.

Suggested fix: Accumulate res.residual_risks from each deep pass, include them in candidateMerged, let the critic prune them, and derive the final verdict from the pruned residual risks.

Validation: Mock one deep result with residual_risks:[{text:'unverified guard',blocks:true}] and assert the final ReviewResult verdict is needs_human unless the critic explicitly removes it.

P2: Give the critic the diff it is required to validate

src/core/review.ts:117

Why this breaks: critic.md says it receives the original diff and must delete findings not introduced by this diff or not tied to changed lines. In large mode this passes only --stat output, and candidateMerged also omits base/head SHAs, so the critic cannot verify changed-line anchoring for valid deep findings. That can cause real findings to be pruned as unverifiable, or weak findings to survive without the intended diff check.

Suggested fix: Pass the full diff for files that have candidate findings, or include base/head SHAs and update the critic prompt to run git diff for those files before pruning.

Validation: Run a large-review test where a deep pass emits a valid finding on a changed line and assert the critic prompt contains enough unified diff context, not only diffstat.

Checked

  • Inspected the provided diff for src/core/review.ts, prompts/map.md, prompts/deep.md, prompts/critic.md usage, and schema changes.
  • Verified map prompt allows up to 8 hotspots while reviewLarge reviews only the first 6.
  • Verified reviewLarge drops deep residual_risks before critic/verdict by constructing candidateMerged with residual_risks: [].
  • Verified reviewLarge passes bundle.patchStat at the critic PATCH slot in large mode instead of the original diff.

Comment thread src/core/review.ts
throw new Error("map pass produced no hotspots");
}

const hotspots = sortByRisk(mapResult.hotspots).slice(0, MAX_HOTSPOTS);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 (bug): Review every mapped hotspot before returning a verdict

The map prompt allows up to 8 hotspots and requires every changed file to appear in one, but this line silently slices the mapped list to 6. On a large PR mapped into 7 or 8 surfaces, the omitted surfaces get no deep review, no findings, and no checked entry, so Needlefish can return PASS while some changed files were never reviewed.

Suggested fix: Process all returned hotspots, or make MAX_HOTSPOTS match the prompt and enforce that the reviewed hotspot union covers every changed file before deriving a verdict.

Comment thread src/core/review.ts Outdated
summary: mapResult.summary,
findings: merged,
checked,
residual_risks: [],

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 (bug): Preserve residual risks from deep review passes

Each deep pass returns residual_risks, and deriveVerdict maps any blocks=true risk to needs_human. The large-review merge replaces all deep residual risks with an empty array, so a deep pass that could not verify a material guard no longer blocks the verdict and the final review can incorrectly pass.

Suggested fix: Accumulate res.residual_risks from each deep pass, include them in candidateMerged, let the critic prune them, and derive the final verdict from the pruned residual risks.

Comment thread src/core/review.ts Outdated
};
const criticPrompt = loadPrompt("critic.md")
.replace("{{FINDINGS}}", () => JSON.stringify(candidateMerged, null, 2))
.replace("{{PATCH}}", () => bundle.patchStat || "(see git diff --stat; repo at head)");

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 (bug): Give the critic the diff it is required to validate

critic.md says it receives the original diff and must delete findings not introduced by this diff or not tied to changed lines. In large mode this passes only --stat output, and candidateMerged also omits base/head SHAs, so the critic cannot verify changed-line anchoring for valid deep findings. That can cause real findings to be pruned as unverifiable, or weak findings to survive without the intended diff check.

Suggested fix: Pass the full diff for files that have candidate findings, or include base/head SHAs and update the critic prompt to run git diff for those files before pruning.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 886c149439

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/core/review.ts
throw new Error("map pass produced no hotspots");
}

const hotspots = sortByRisk(mapResult.hotspots).slice(0, MAX_HOTSPOTS);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Review all mapped hotspots before passing

map.md permits up to 8 hotspots and requires every changed file to be placed in one, but this slice silently drops any surfaces after the first six. In a large PR with seven or eight independent surfaces, the omitted files never enter a deep pass or a blocking residual risk, so defects in those changed files cannot affect the final verdict. Review all mapped hotspots or explicitly return needs_human when any are skipped.

Useful? React with 👍 / 👎.

Comment thread src/core/review.ts Outdated
summary: mapResult.summary,
findings: merged,
checked,
residual_risks: [],

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Preserve blocking residual risks from deep passes

deep.md allows each hotspot review to return residual_risks, and deriveVerdict turns any blocks: true risk into needs_human; however this merge always replaces all hotspot risks with an empty array before the critic sees them. On large PRs where a hotspot cannot verify a material precondition, Needlefish can incorrectly post a passing review instead of escalating to a human. Carry forward and prune the collected residual risks.

Useful? React with 👍 / 👎.

Comment thread src/core/review.ts
Comment on lines +81 to +85
baseSha: bundle.baseSha,
headSha: bundle.headSha,
patchStat: bundle.patchStat,
changedFiles: bundle.changedFiles,
agentsMd: bundle.agentsMd,

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Forward focus context into large-review prompts

For large diffs, this reduced map bundle is the only review context passed before the deep prompts, and it omits bundle.focus; the deep prompt also only receives the hotspot plus SHAs. As a result needlefish --focus security becomes a no-op whenever isLarge() selects this path, whereas small reviews still include the focus in the full bundle. Forward focus into the map/deep prompts.

Useful? React with 👍 / 👎.

…-hotspot try/catch + codex retry, forward deep residuals, tighter dedup, critic gets SHAs to re-verify guards

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needlefish PR Review

Verdict: ⛔ changes_requested
Base: b31bd65Head: 8362481

changes_requested: the large-review path drops existing review options, making --focus/--deep ineffective on large PRs.

Findings

P2: Forward review options into large mode

src/core/review.ts:83-89

Why this breaks: The local CLI still records deep and focus in the Bundle, and small reviews pass the full Bundle into the review prompt. The new large-review map bundle omits both fields, and the deep prompt only receives the hotspot plus SHAs, so needlefish --focus security and needlefish --deep become no-ops whenever isLarge() selects this path.

Suggested fix: Include the requested review options in the large-mode map/deep prompt context and update the map/deep prompts to honor them.

Validation: Run a large local review with --focus security --deep using a Codex stub and assert the captured map/deep prompts include those option values.

Checked

  • Reviewed the supplied diff for the large-review pipeline and prompt inputs.
  • Confirmed reviewSmall serializes the full Bundle, while reviewLarge constructs a reduced mapBundle without focus/deep and builds deep prompts only from AGENTS, HOTSPOT, BASE, and HEAD.
  • No downstream guard-rejection claim was present, so no guard substitution check was required.
  • Kept the finding because it is introduced by the changed large-review path, affects existing CLI contract behavior, is line-anchored, and has a minimal fix.

Residual Risk

  • Did not execute the test suite; this review used read-only static inspection of the supplied diff.

Comment thread src/core/review.ts

// Large PR: map (blast-radius survey, no diff text) -> deep per hotspot -> merge -> critic.
async function reviewLarge(bundle: Bundle): Promise<ReviewResult> {
const mapBundle = {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 (contract): Forward review options into large mode

The local CLI still records deep and focus in the Bundle, and small reviews pass the full Bundle into the review prompt. The new large-review map bundle omits both fields, and the deep prompt only receives the hotspot plus SHAs, so needlefish --focus security and needlefish --deep become no-ops whenever isLarge() selects this path.

Suggested fix: Include the requested review options in the large-mode map/deep prompt context and update the map/deep prompts to honor them.

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needlefish PR Review

Verdict: ✅ pass
Base: b31bd65Head: a64b50c

approved: no high-confidence blocking findings remained after pruning.

Findings

No actionable findings. Prefer this over padding weak ones.

Checked

  • Reviewed the candidate finding against src/core/review.ts large-review flow in the supplied diff.
  • Confirmed map.md explicitly instructs the map pass to keep hotspots to <= 6 total, so outputs beyond MAX_HOTSPOTS are already prompt-invalid rather than an expected review surface.
  • Confirmed the alleged skipped case depends on an omitted hotspot sharing files with a selected hotspot while only carrying distinct edges; deep.md still instructs each selected file surface to inspect changed hunks and trace downstream consumers via edges or rg, so the claimed never-reviewed effect is not concrete enough to block merge.
  • Kept the non-blocking residual risk about TypeScript validation being unavailable in this checkout.

Residual Risk

  • TypeScript validation could not be used in this checkout because tsc cannot resolve Node built-in types/@types/node; source review still reached a verdict.

@frankekn frankekn merged commit 929297e into main Jun 27, 2026
2 checks passed
@frankekn frankekn deleted the feat/map-reduce-large-prs branch June 27, 2026 10:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant