Conversation
There was a problem hiding this comment.
Needlefish PR Review
Verdict: ⛔ changes_requested
Base: 9ed22eb → Head: 612df2d
changes_requested: two concrete security defects let review runs mutate worktrees or execute untrusted PR code on a persistent runner.
Findings
P1: Force Codex reviews into read-only sandbox mode
src/shared/codex.ts:20
Why this breaks: Needlefish advertises read-only review, but this invocation inherits the caller's Codex config. On this runner the local config is approval_policy=never and sandbox_mode=danger-full-access, so the model spawned by a review can run mutating commands despite the prompt saying not to edit files.
Suggested fix: Add explicit Codex exec flags such as --sandbox read-only and a non-interactive approval policy/config override so review runs cannot write regardless of user config.
Validation: codex exec --sandbox read-only --color never "print exactly ok"
P1: Do not run pull_request code on a persistent self-hosted runner
.github/workflows/review.yml:18-28
Why this breaks: The workflow triggers on pull_request, checks out the PR head, then runs pnpm install on a self-hosted runner. A PR can change install scripts or repository code and execute arbitrary commands on Frank's persistent machine before review logic runs.
Suggested fix: Gate this job to same-repository PRs or run it only inside an ephemeral isolated runner/container; do not execute fork PR code on the persistent self-hosted host.
Validation: Open a fork PR that changes package install behavior or src/cli.ts and confirm the workflow no longer runs on the persistent self-hosted runner.
Checked
- No AGENTS.md file exists under /Users/termtek/actions-runner/_work/needlefish/needlefish; applied the provided review policy from the context bundle.
- Inspected changed workflow, CLI, local/GitHub adapters, review orchestration, Codex runner, schema normalization, and package config.
- Ran
pnpm check; TypeScript completed withtsc --noEmit. - Verified Codex CLI supports
--sandbox read-onlyand that current local Codex config issandbox_mode = "danger-full-access"withapproval_policy = "never".
Residual Risk
- I did not exercise live GitHub review/check-run posting against PR #1, so API-shape issues beyond static inspection are not ruled out.
|
|
||
| const tmp = mkdtempSync(path.join(os.tmpdir(), "needlefish-")); | ||
| const lastMsg = path.join(tmp, "last.txt"); | ||
| const args = ["exec", "--color", "never", "--output-last-message", lastMsg]; |
There was a problem hiding this comment.
P1 (security): Force Codex reviews into read-only sandbox mode
Needlefish advertises read-only review, but this invocation inherits the caller's Codex config. On this runner the local config is approval_policy=never and sandbox_mode=danger-full-access, so the model spawned by a review can run mutating commands despite the prompt saying not to edit files.
Suggested fix: Add explicit Codex exec flags such as --sandbox read-only and a non-interactive approval policy/config override so review runs cannot write regardless of user config.
|
|
||
| jobs: | ||
| review: | ||
| runs-on: self-hosted |
There was a problem hiding this comment.
P1 (security): Do not run pull_request code on a persistent self-hosted runner
The workflow triggers on pull_request, checks out the PR head, then runs pnpm install on a self-hosted runner. A PR can change install scripts or repository code and execute arbitrary commands on Frank's persistent machine before review logic runs.
Suggested fix: Gate this job to same-repository PRs or run it only inside an ephemeral isolated runner/container; do not execute fork PR code on the persistent self-hosted host.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a26882a28a
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| repo, | ||
| headSha, | ||
| null, | ||
| "neutral", |
There was a problem hiding this comment.
Fail the check when the review errors
In GitHub Action mode, this failure path reports the advertised Needlefish check as neutral, but GitHub allows required checks with neutral conclusions to satisfy branch protection (docs). If review(bundle) or posting the review throws, a repo that requires the Needlefish check can still merge the PR despite the review never completing, contrary to the “failed review never passes” behavior; use a failing conclusion for this path instead.
Useful? React with 👍 / 👎.
|
|
||
| const tmp = mkdtempSync(path.join(os.tmpdir(), "needlefish-")); | ||
| const lastMsg = path.join(tmp, "last.txt"); | ||
| const args = ["exec", "--color", "never", "--output-last-message", lastMsg]; |
There was a problem hiding this comment.
Force Codex into read-only mode
The tool is advertised as read-only, but this codex exec invocation does not pass --sandbox read-only or --ask-for-approval never, and the Codex CLI documents that the sandbox defaults to the user configuration (docs). When a developer or runner has Codex configured for workspace-write or full access, the review model can edit the target repo while supposedly only reviewing it; make the read-only sandbox explicit for these calls.
Useful? React with 👍 / 👎.
|
|
||
| jobs: | ||
| review: | ||
| runs-on: self-hosted |
There was a problem hiding this comment.
Isolate self-hosted PR jobs from untrusted code
For PRs from forks or any repo with untrusted contributors, this pull_request workflow runs on a persistent self-hosted runner and then checks out and installs the PR's code. GitHub warns that self-hosted runners can be persistently compromised by untrusted workflow code (docs); use a GitHub-hosted or ephemeral isolated runner for PRs, or gate this job to trusted branches/authors only.
Useful? React with 👍 / 👎.
| repo, | ||
| headSha, | ||
| null, | ||
| "neutral", |
There was a problem hiding this comment.
Fail the check when the review errors
In GitHub Action mode, this failure path reports the advertised Needlefish check as neutral, but GitHub allows required checks with neutral conclusions to satisfy branch protection (docs). If review(bundle) or posting the review throws, a repo that requires the Needlefish check can still merge the PR despite the review never completing, contrary to the “failed review never passes” behavior; use a failing conclusion for this path instead.
Useful? React with 👍 / 👎.
| const headSha = pr.head?.sha ?? git(["rev-parse", "HEAD"], cwd); | ||
| const baseSha = pr.base?.sha ?? git(["merge-base", "origin/" + (pr.base?.ref ?? "main"), "HEAD"], cwd); | ||
|
|
||
| const patch = ghText(["pr", "diff", String(prNumber)]); |
There was a problem hiding this comment.
Review the checked-out PR SHA, not the live head
This fetches the current PR head and diff from GitHub after the workflow has already checked out the event's head SHA. If a newer commit is pushed while an older run is still installing or queued, the run will inspect the old workspace but post the review/check against the newer pr.head.sha, allowing a stale run to approve or fail the wrong commit; derive the head SHA and patch from the checked-out/event SHA or cancel superseded runs.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Needlefish PR Review
Verdict: ⛔ changes_requested
Base: 9ed22eb → Head: 5db438c
changes_requested: GitHub mode can report a failed review as mergeable and can review/post against a different PR head than the checked-out code.
Findings
P1: Fail the Needlefish check when review execution fails
src/adapters/github.ts:191
Why this breaks: When review(), postReview(), or the successful postCheck() path throws, the catch block creates the same Needlefish check run with conclusion neutral. Repositories that require this check can treat neutral as non-failing, so a PR can merge even though Needlefish never completed its review.
Suggested fix: Post the failure-path check with conclusion failure instead of neutral, or use a distinct non-required check name for infrastructure failures.
Validation: Force review(bundle) to throw in --github mode and verify the check-runs payload uses conclusion=failure for name=Needlefish.
P1: Review the checked-out PR SHA, not the live PR head
src/adapters/github.ts:140-143
Why this breaks: The workflow checks out github.event.pull_request.head.sha, but runGithub then reads the current PR head and diff from GitHub. If a newer commit is pushed while an older workflow run is queued or installing, this run inspects the old workspace while posting the review and check against the newer pr.head.sha, allowing stale results to approve or fail the wrong commit.
Suggested fix: Derive headSha from git rev-parse HEAD or the event SHA passed by the workflow, and build the patch from the checked-out workspace (git diff baseSha HEAD) instead of gh pr diff's live PR state.
Validation: Start a run for commit A, push commit B before the review step, and verify the run posts against A or cancels instead of posting a check on B.
Checked
- Read repository policy supplied in the context bundle; no AGENTS.md exists in the checked-out tree.
- Mapped changed surfaces: workflow, docs, dependency lockfile, CLI/source runtime, Codex invocation, rendering/schema.
- Inspected GitHub adapter failure handling, check-run posting, PR metadata collection, head/base/patch selection, and inline review fallback.
- Verified prior review comments: workflow is gated to same-repository PRs and Codex exec passes
-s read-only. - Searched for existing Needlefish check behavior, neutral conclusions, headSha usage, GitHub workflow checkout SHA, and read-only Codex invocation.
- Ran
pnpm check; TypeScript completed withtsc --noEmit.
| repo, | ||
| headSha, | ||
| null, | ||
| "neutral", |
There was a problem hiding this comment.
P1 (runtime): Fail the Needlefish check when review execution fails
When review(), postReview(), or the successful postCheck() path throws, the catch block creates the same Needlefish check run with conclusion neutral. Repositories that require this check can treat neutral as non-failing, so a PR can merge even though Needlefish never completed its review.
Suggested fix: Post the failure-path check with conclusion failure instead of neutral, or use a distinct non-required check name for infrastructure failures.
| "api", | ||
| `repos/${repo}/pulls/${prNumber}`, | ||
| ]); | ||
| const headSha = pr.head?.sha ?? git(["rev-parse", "HEAD"], cwd); |
There was a problem hiding this comment.
P1 (runtime): Review the checked-out PR SHA, not the live PR head
The workflow checks out github.event.pull_request.head.sha, but runGithub then reads the current PR head and diff from GitHub. If a newer commit is pushed while an older workflow run is queued or installing, this run inspects the old workspace while posting the review and check against the newer pr.head.sha, allowing stale results to approve or fail the wrong commit.
Suggested fix: Derive headSha from git rev-parse HEAD or the event SHA passed by the workflow, and build the patch from the checked-out workspace (git diff baseSha HEAD) instead of gh pr diff's live PR state.
…EAD, not live pr head
frankekn
left a comment
There was a problem hiding this comment.
Needlefish PR Review
Verdict: pass
No findings.
Checked
- a
- b
There was a problem hiding this comment.
Needlefish PR Review
Verdict: ⛔ changes_requested
Base: d9ffd69 → Head: 7df7bae
changes requested: prompt assembly can corrupt reviewed diffs, and manual workflow dispatch reviews the wrong ref.
Findings
P2: Use callback replacements for prompt payloads
src/core/review.ts:23-33
Why this breaks: String.replace treats dollar sequences such as $& in replacement strings as substitution tokens. Because the replacement payload is the JSON bundle and raw patch, any reviewed diff containing those sequences is rewritten before Codex sees it, so Needlefish can review corrupted code and miss or invent findings.
Suggested fix: Pass replacement callbacks for dynamic payloads, e.g. replace("{{BUNDLE}}", () => JSON.stringify(bundle, null, 2)) and replace("{{PATCH}}", () => bundle.patch).
Validation: node -e 'console.log("A {{PATCH}} B".replace("{{PATCH}}", "x$&y")); console.log("A {{PATCH}} B".replace("{{PATCH}}", () => "x$&y"))'
P2: Check out the requested PR for manual dispatch
.github/workflows/review.yml:29-35
Why this breaks: On workflow_dispatch there is no github.event.pull_request object, so checkout falls back to github.ref and BASE_REF falls back to main while the run step still passes the input PR number. A manual review therefore inspects the dispatch branch, often main with an empty diff, instead of the requested PR head.
Suggested fix: For workflow_dispatch, resolve the PR head/base from the input PR number before review and fetch/checkout that head, or remove the manual trigger until it can supply PR_HEAD_SHA, PR_BASE_SHA, and the matching workspace checkout.
Validation: Trigger workflow_dispatch with pr_number for an open PR from the default branch context and verify git rev-parse HEAD equals that PR head SHA and git diff base..HEAD is non-empty.
Checked
- Read supplied review policy; repo has no AGENTS.md file to apply.
- Reviewed changed workflow, CLI, adapters, core review orchestration, Codex wrapper, schema, render, classify, package, and lockfile surfaces.
- Traced GitHub workflow inputs through runGithub head/base/patch selection.
- Ran pnpm exec tsc --noEmit successfully.
- Verified codex exec supports -s/--sandbox read-only.
- Reproduced JavaScript String.replace replacement-token expansion with node.
Residual Risk
- Did not execute the GitHub workflow or post GitHub reviews/checks from this read-only review; workflow_dispatch behavior is derived from the workflow expressions and runGithub fallback code.
| export async function review( | ||
| bundle: Bundle | ||
| ): Promise<ReviewResult> { | ||
| const reviewPrompt = loadPrompt("review.md").replace( |
There was a problem hiding this comment.
P2 (bug): Use callback replacements for prompt payloads
String.replace treats dollar sequences such as $& in replacement strings as substitution tokens. Because the replacement payload is the JSON bundle and raw patch, any reviewed diff containing those sequences is rewritten before Codex sees it, so Needlefish can review corrupted code and miss or invent findings.
Suggested fix: Pass replacement callbacks for dynamic payloads, e.g. replace("{{BUNDLE}}", () => JSON.stringify(bundle, null, 2)) and replace("{{PATCH}}", () => bundle.patch).
| - name: Checkout | ||
| uses: actions/checkout@v4 | ||
| with: | ||
| ref: ${{ github.event.pull_request.head.sha || github.ref }} |
There was a problem hiding this comment.
P2 (runtime): Check out the requested PR for manual dispatch
On workflow_dispatch there is no github.event.pull_request object, so checkout falls back to github.ref and BASE_REF falls back to main while the run step still passes the input PR number. A manual review therefore inspects the dispatch branch, often main with an empty diff, instead of the requested PR head.
Suggested fix: For workflow_dispatch, resolve the PR head/base from the input PR number before review and fetch/checkout that head, or remove the manual trigger until it can supply PR_HEAD_SHA, PR_BASE_SHA, and the matching workspace checkout.
There was a problem hiding this comment.
Needlefish PR Review
Verdict: ⛔ changes_requested
Base: 474877f → Head: 3f78261
changes_requested: GitHub mode can review the wrong diff, and malformed model output can be normalized into a passing review.
Findings
P1: Compare PR heads against the merge base
src/adapters/github.ts:138-140
Why this breaks: PR_BASE_SHA is supplied from the workflow as the current base-branch tip, but GitHub PR review diffs must be computed from the merge base. When the base branch has advanced since the PR branched, git diff baseSha HEAD reviews unrelated base changes and can also miss files that are part of the PR. In this checkout, git merge-base 474877f1f2da1683339040fe5bf563ab2c48b509 3f782614d7df761c880c5c743177273044f5ee20 is d9ffd69..., not the supplied base SHA, and the two ranges produce different file sets.
Suggested fix: Compute a diff base with git merge-base baseSha HEAD in runGithub before building patch and changedFiles, and put that merge-base SHA in the bundle.
Validation: Create or use a PR branch whose base branch advanced after branching; verify needlefish --github --pr Nsends the same files asgit diff --name-only $(git merge-base HEAD) HEAD.
P2: Reject malformed review JSON instead of passing it
src/shared/schema.ts:114-128
Why this breaks: normalizeReview({}) returns empty findings and empty blocking risks, so deriveVerdict turns malformed Codex or critic output into pass. A bad model response should fail the Needlefish check, not green-light the PR with an empty review.
Suggested fix: Replace the permissive defaults with strict boundary parsing: require summary, findings, checked, and residual_risks to have the expected types, reject invalid severities/categories/line numbers, and throw so GitHub mode posts the failure check.
Validation: Stub the Codex output to {}and confirm review execution fails instead of producingpass; then run pnpm check.
Checked
- No local AGENTS.md exists; the context bundle also has agentsMd=null.
- Reviewed changed workflow, adapters, core review flow, schema normalization, prompts, and README contract.
- Ran
pnpm check(tsc --noEmit): passed. - Ran
git diff --checkfor the reviewed range: passed. - Compared supplied base/head against their merge base and confirmed the reviewed range differs from the PR merge-base range.
- Checked
gh pr view --jsonfields;baseRefis invalid, but that output is unused and command substitution does not fail the step, so it was not raised as a blocking finding.
|
|
||
| const pr = gh(["api", `repos/${repo}/pulls/${prNumber}`]); | ||
| const headSha = process.env.PR_HEAD_SHA || git(["rev-parse", "HEAD"], cwd); | ||
| const baseSha = process.env.PR_BASE_SHA || git(["merge-base", "origin/main", "HEAD"], cwd); |
There was a problem hiding this comment.
P1 (bug): Compare PR heads against the merge base
PR_BASE_SHA is supplied from the workflow as the current base-branch tip, but GitHub PR review diffs must be computed from the merge base. When the base branch has advanced since the PR branched, git diff baseSha HEAD reviews unrelated base changes and can also miss files that are part of the PR. In this checkout, git merge-base 474877f1f2da1683339040fe5bf563ab2c48b509 3f782614d7df761c880c5c743177273044f5ee20 is d9ffd69..., not the supplied base SHA, and the two ranges produce different file sets.
Suggested fix: Compute a diff base with git merge-base baseSha HEAD in runGithub before building patch and changedFiles, and put that merge-base SHA in the bundle.
| }; | ||
| } | ||
|
|
||
| export function normalizeReview(raw: any): RawReview { |
There was a problem hiding this comment.
P2 (validation): Reject malformed review JSON instead of passing it
normalizeReview({}) returns empty findings and empty blocking risks, so deriveVerdict turns malformed Codex or critic output into pass. A bad model response should fail the Needlefish check, not green-light the PR with an empty review.
Suggested fix: Replace the permissive defaults with strict boundary parsing: require summary, findings, checked, and residual_risks to have the expected types, reject invalid severities/categories/line numbers, and throw so GitHub mode posts the failure check.
There was a problem hiding this comment.
Needlefish PR Review
Verdict: ⛔ changes_requested
Base: 474877f → Head: 908ef40
changes requested: malformed model output can still be normalized into a passing review.
Findings
P2: Reject malformed review JSON before deriving a verdict
src/shared/schema.ts:114-128
Why this breaks: normalizeReview silently converts missing findings and residual_risks to empty arrays, and invalid finding fields to defaults. A critic response like {"summary":"ok"} is accepted by review.ts, then deriveVerdict sees no blocking findings or risks and returns pass even though the review output was malformed.
Suggested fix: Make the model-output boundary strict: require summary, findings, checked, and residual_risks with expected types, reject invalid severities/categories/line numbers, and throw instead of defaulting missing arrays to empty.
Validation: Add a small check that normalizeReview({ summary: 'ok' }) throws, then run pnpm exec tsc --noEmit
Checked
- Read the supplied AGENTS.md review policy; no repo AGENTS.md exists in the worktree
- Mapped workflow, dependency, docs, CLI, core, and schema surfaces from the changed files
- Inspected GitHub workflow dispatch/head/base handling and runGithub diff construction
- Verified Codex exec uses -s read-only against the local codex exec --help output
- Ran pnpm exec tsc --noEmit successfully
- Checked pnpm config recognition of pnpm-workspace.yaml allowBuilds
- Confirmed gh pr view does not support baseRef; the current command emits an error but only populates an unused base_ref output
Residual Risk
- Runtime tsx snippets could not be executed in this sandbox because tsx failed to create its IPC pipe, but static inspection and tsc were sufficient for the reported validation bug.
| }; | ||
| } | ||
|
|
||
| export function normalizeReview(raw: any): RawReview { |
There was a problem hiding this comment.
P2 (validation): Reject malformed review JSON before deriving a verdict
normalizeReview silently converts missing findings and residual_risks to empty arrays, and invalid finding fields to defaults. A critic response like {"summary":"ok"} is accepted by review.ts, then deriveVerdict sees no blocking findings or risks and returns pass even though the review output was malformed.
Suggested fix: Make the model-output boundary strict: require summary, findings, checked, and residual_risks with expected types, reject invalid severities/categories/line numbers, and throw instead of defaulting missing arrays to empty.
There was a problem hiding this comment.
Needlefish PR Review
Verdict: ⛔ changes_requested
Base: 474877f → Head: a79deb4
changes_requested: one model-output validation bug can turn malformed review JSON into a passing verdict.
Findings
P2: Reject malformed model output before deriving a verdict
src/shared/schema.ts:114-129
Why this breaks: normalizeReview silently converts missing findings, checked, and residual_risks fields to empty arrays. review.ts only requires a non-empty summary, so a critic response like {"summary":"ok"} normalizes to no findings/no blocking risks and deriveVerdict returns pass, green-lighting a PR even though the review payload did not match the required contract.
Suggested fix: Make the model-output boundary strict: require summary, findings, checked, and residual_risks with expected types, and reject invalid finding severities/categories/line numbers instead of defaulting them.
Validation: node --import tsx -e 'import { normalizeReview } from "./src/shared/schema.ts"; import { deriveVerdict } from "./src/core/verdict.ts"; const r=normalizeReview({summary:"ok"}); console.log(JSON.stringify({r, verdict: deriveVerdict(r.findings, r.residual_risks)}));'
Checked
- No repository AGENTS.md exists under /Users/termtek/actions-runner/_work/needlefish/needlefish; applied the supplied review policy.
- Inspected the changed workflow, GitHub adapter, local adapter, CLI, review orchestration, Codex invocation, renderer, classifier, and schema.
- Ran pnpm check; TypeScript completed successfully.
- Confirmed normalizeReview({summary:"ok"}) produces empty findings/residual risks and deriveVerdict returns pass.
| }; | ||
| } | ||
|
|
||
| export function normalizeReview(raw: any): RawReview { |
There was a problem hiding this comment.
P2 (validation): Reject malformed model output before deriving a verdict
normalizeReview silently converts missing findings, checked, and residual_risks fields to empty arrays. review.ts only requires a non-empty summary, so a critic response like {"summary":"ok"} normalizes to no findings/no blocking risks and deriveVerdict returns pass, green-lighting a PR even though the review payload did not match the required contract.
Suggested fix: Make the model-output boundary strict: require summary, findings, checked, and residual_risks with expected types, and reject invalid finding severities/categories/line numbers instead of defaulting them.
There was a problem hiding this comment.
Needlefish PR Review
Verdict: ⛔ changes_requested
Base: 474877f → Head: cab9f99
changes_requested: three concrete bugs remain in PR range metadata, model-output validation, and large diff collection.
Findings
P2: Use the merge base as the bundle base SHA
src/adapters/github.ts:163-166
Why this breaks: GitHub mode computes the patch from mergeBase..headSha but still sends PR_BASE_SHA as bundle.baseSha. When the base branch has advanced, the prompt gives Codex a base/head pair that does not reproduce the supplied patch; in this PR, merge-base is d9ffd69... while baseSha is 474877f..., and the two ranges produce different file sets. Any reviewer command using bundle.baseSha inspects the wrong diff.
Suggested fix: Set bundle.baseSha to mergeBase, or add a separate baseRefSha field and keep baseSha as the actual diff base used for patch and changedFiles.
Validation: git merge-base "$PR_BASE_SHA" "$PR_HEAD_SHA" && compare git diff --name-only "$PR_BASE_SHA" "$PR_HEAD_SHA" with git diff --name-only "$(git merge-base "$PR_BASE_SHA" "$PR_HEAD_SHA")" "$PR_HEAD_SHA"
P2: Reject malformed model output before deriving verdicts
src/shared/schema.ts:114-128
Why this breaks: normalizeReview silently converts missing findings and residual_risks to empty arrays. review.ts now requires only summary and checked to be non-empty, so a critic response like {"summary":"ok","checked":["x"]} becomes no findings and no blocking risks, then deriveVerdict returns pass even though the review payload violated the required contract.
Suggested fix: Make the model-output boundary strict: require summary, findings, checked, and residual_risks with expected types, and reject invalid finding severities/categories/line numbers instead of defaulting them.
Validation: Add a small test or script that feeds {"summary":"ok","checked":["x"]} through normalizeReview/review verdict flow and assert it throws instead of returning pass.
P2: Raise git output buffers for diff collection
src/adapters/local.ts:10-11
Why this breaks: The git helper uses spawnSync without maxBuffer, so Node's default output cap aborts commands that emit around 1 MiB. Needlefish collects full patches with git diff; a moderately large PR will throw ENOBUFS before review. The same helper shape exists in GitHub mode, so both local and action reviews can fail on large diffs.
Suggested fix: Set a larger maxBuffer on git spawnSync, matching the 64 MiB buffer already used for gh and Codex calls.
Validation: node -e 'const {spawnSync}=require("node:child_process"); const r=spawnSync(process.execPath,["-e","process.stdout.write(\"x\".repeat(2*1024*1024))"],{encoding:"utf8"}); console.log(r.error&&r.error.code)'
Checked
- Applied the AGENTS.md policy supplied in the request; no AGENTS.md file exists in the repo checkout.
- Mapped changed surfaces, with workflow, dependency, prompt, CLI/adapter/core/schema, and render files inspected.
- Read the full changed implementations for GitHub/local adapters, CLI parsing, review orchestration, verdict derivation, Codex invocation, classification, rendering, and schema normalization.
- Cross-checked prior PR review comments against current head; several earlier issues are fixed, including read-only Codex sandbox, failure check conclusion, event SHA usage, merge-base patching, replacement callbacks, and non-empty checked validation.
- Ran pnpm exec tsc --noEmit successfully.
- Verified Node spawnSync without maxBuffer returns ENOBUFS on a 2 MiB stdout payload.
Residual Risk
- A dynamic tsx repro for normalizeReview was blocked by the read-only sandbox denying tsx IPC pipe creation, but the code path is directly visible in schema.ts and review.ts.
| checks: [], | ||
| }; | ||
|
|
||
| const bundle: Bundle = { |
There was a problem hiding this comment.
P2 (bug): Use the merge base as the bundle base SHA
GitHub mode computes the patch from mergeBase..headSha but still sends PR_BASE_SHA as bundle.baseSha. When the base branch has advanced, the prompt gives Codex a base/head pair that does not reproduce the supplied patch; in this PR, merge-base is d9ffd69... while baseSha is 474877f..., and the two ranges produce different file sets. Any reviewer command using bundle.baseSha inspects the wrong diff.
Suggested fix: Set bundle.baseSha to mergeBase, or add a separate baseRefSha field and keep baseSha as the actual diff base used for patch and changedFiles.
| }; | ||
| } | ||
|
|
||
| export function normalizeReview(raw: any): RawReview { |
There was a problem hiding this comment.
P2 (validation): Reject malformed model output before deriving verdicts
normalizeReview silently converts missing findings and residual_risks to empty arrays. review.ts now requires only summary and checked to be non-empty, so a critic response like {"summary":"ok","checked":["x"]} becomes no findings and no blocking risks, then deriveVerdict returns pass even though the review payload violated the required contract.
Suggested fix: Make the model-output boundary strict: require summary, findings, checked, and residual_risks with expected types, and reject invalid finding severities/categories/line numbers instead of defaulting them.
| import { renderMarkdown } from "../shared/render"; | ||
| import type { Bundle, PrMeta, ReviewResult } from "../shared/schema"; | ||
|
|
||
| function git(args: string[], cwd: string): string { |
There was a problem hiding this comment.
P2 (runtime): Raise git output buffers for diff collection
The git helper uses spawnSync without maxBuffer, so Node's default output cap aborts commands that emit around 1 MiB. Needlefish collects full patches with git diff; a moderately large PR will throw ENOBUFS before review. The same helper shape exists in GitHub mode, so both local and action reviews can fail on large diffs.
Suggested fix: Set a larger maxBuffer on git spawnSync, matching the 64 MiB buffer already used for gh and Codex calls.
There was a problem hiding this comment.
Needlefish PR Review
Verdict: ⛔ changes_requested
Base: d9ffd69 → Head: f0c2f0e
changes_requested: model-output normalization can turn malformed review JSON into a passing verdict.
Findings
P2: Reject malformed review JSON before deriving verdicts
src/shared/schema.ts:114-128
Why this breaks: normalizeReview silently converts missing findings and residual_risks to empty arrays. review.ts only requires a summary and non-empty checked list, so a critic response like {"summary":"ok","checked":["x"]} reaches deriveVerdict with no findings and no blocking risks, producing pass even though the model output violated the required contract.
Suggested fix: Parse this boundary strictly: require summary, findings, checked, and residual_risks with expected types, and reject invalid finding severities/categories/line numbers instead of defaulting them.
Validation: Add a small unit/self-check that normalize/review rejects {"summary":"ok","checked":["x"]}; run pnpm exec tsc --noEmit
Checked
- No repo AGENTS.md exists at head/base; applied the AGENTS.md policy supplied in the review context.
- Read the workflow, CLI, GitHub/local adapters, Codex orchestration, schema normalization, verdict derivation, prompts, README, package, and tsconfig.
- Traced review output flow: runCodex -> extractJson -> normalizeReview -> review.ts shape checks -> deriveVerdict.
- Checked workflow and supply-chain surfaces, including self-hosted pull_request gating, checkout ref resolution, install step, and check-run failure path.
- Verified Codex CLI supports -s/--sandbox read-only and ran pnpm exec tsc --noEmit successfully.
Residual Risk
- Did not execute the full GitHub Action or spawn real Codex review calls because this review was constrained to read-only inspection.
| }; | ||
| } | ||
|
|
||
| export function normalizeReview(raw: any): RawReview { |
There was a problem hiding this comment.
P2 (validation): Reject malformed review JSON before deriving verdicts
normalizeReview silently converts missing findings and residual_risks to empty arrays. review.ts only requires a summary and non-empty checked list, so a critic response like {"summary":"ok","checked":["x"]} reaches deriveVerdict with no findings and no blocking risks, producing pass even though the model output violated the required contract.
Suggested fix: Parse this boundary strictly: require summary, findings, checked, and residual_risks with expected types, and reject invalid finding severities/categories/line numbers instead of defaulting them.
There was a problem hiding this comment.
Needlefish PR Review
Verdict: ⛔ changes_requested
Base: d9ffd69 → Head: 3cd57e2
changes_requested: one remaining validation bug can turn malformed blocking findings into a passing review.
Findings
P2: Reject invalid finding severity before deriving verdicts
src/shared/schema.ts:97-99
Why this breaks: normalizeFinding silently converts any unrecognized severity to P3. Because deriveVerdict only blocks on P0/P1/P2, a malformed model output such as severity "P1 " for an otherwise valid blocking finding is normalized to non-blocking P3 and the review can return pass.
Suggested fix: Make finding parsing fail closed: if severity is not exactly one of P0, P1, P2, or P3, throw a malformed-output error instead of defaulting to P3. Apply the same strictness to other required finding fields.
Validation: node --import tsx -e "import { normalizeReview } from './src/shared/schema.ts'; import { deriveVerdict } from './src/core/verdict.ts'; const r=normalizeReview({summary:'s',checked:['c'],residual_risks:[],findings:[{severity:'P1 ',title:'t',category:'bug',file:'src/x.ts',lineStart:1,lineEnd:1,confidence:1,whyItBreaks:'w',suggestedFix:'f',validation:'v'}]}); console.log(r.findings[0].severity); console.log(deriveVerdict(r.findings,r.residual_risks));"
Checked
- Read changed surface map: workflow, dependency/config, docs, CLI, GitHub adapter, local adapter, Codex runner, review orchestration, verdict, renderer, schema
- Verified no repo-local AGENTS.md exists on disk; applied the supplied review policy
- Traced GitHub workflow refs through runGithub patch/base/head handling
- Traced Codex output through extractJson, normalizeReview, normalizeFinding, and deriveVerdict
- Confirmed prior PR metadata issues for merge-base, failure check conclusion, prompt replacement, git maxBuffer, and read-only Codex sandbox are addressed in this head
- Ran pnpm exec tsc --noEmit successfully
- Reproduced invalid severity normalization to P3 and resulting pass verdict with node --import tsx
Residual Risk
- Live GitHub review/check posting and real Codex execution were not exercised; this review used read-only local inspection and targeted command reproduction.
|
|
||
| export function normalizeFinding(raw: Partial<Finding>): Finding { | ||
| return { | ||
| severity: (SEVERITIES.includes(raw.severity as Severity) |
There was a problem hiding this comment.
P2 (validation): Reject invalid finding severity before deriving verdicts
normalizeFinding silently converts any unrecognized severity to P3. Because deriveVerdict only blocks on P0/P1/P2, a malformed model output such as severity "P1 " for an otherwise valid blocking finding is normalized to non-blocking P3 and the review can return pass.
Suggested fix: Make finding parsing fail closed: if severity is not exactly one of P0, P1, P2, or P3, throw a malformed-output error instead of defaulting to P3. Apply the same strictness to other required finding fields.
…ed findings can't false-pass
There was a problem hiding this comment.
Needlefish PR Review
Verdict: ⛔ changes_requested
Base: d9ffd69 → Head: 505f89d
One actionable workflow bug: manual dispatch resolves PR refs before any checkout without telling gh which repository to query.
Findings
P2: Pass the repository when resolving manual PR refs
.github/workflows/review.yml:38-39
Why this breaks: On workflow_dispatch this step runs before actions/checkout, so there is no local git repository for gh pr view to infer the repo from. The workflow sets GITHUB_REPOSITORY, but gh documents -R/--repo or GH_REPO for explicit repo selection, so manual reviews can fail in Resolve PR refs before reaching checkout.
Suggested fix: Call gh pr view "$PR_NUM" -R "$GITHUB_REPOSITORY" ... for both head and base, or set GH_REPO: ${{ github.repository }} in the step environment.
Validation: Trigger workflow_dispatch with pr_number and confirm Resolve PR refs completes before checkout.
Checked
- Read the supplied AGENTS.md review policy and found no repo-local AGENTS.md files with find.
- Inspected the changed workflow, CLI, GitHub/local adapters, Codex runner, prompt orchestration, renderer, classifier, and schema normalization.
- Ran
pnpm check; TypeScript completed withtsc --noEmit. - Checked
gh pr view --help; explicit repository selection is-R/--repoor GH_REPO, not GITHUB_REPOSITORY.
| echo "head=$EVENT_HEAD" >> "$GITHUB_OUTPUT" | ||
| echo "base=$EVENT_BASE" >> "$GITHUB_OUTPUT" | ||
| else | ||
| echo "head=$(gh pr view "$PR_NUM" --json headRefOid -q .headRefOid)" >> "$GITHUB_OUTPUT" |
There was a problem hiding this comment.
P2 (runtime): Pass the repository when resolving manual PR refs
On workflow_dispatch this step runs before actions/checkout, so there is no local git repository for gh pr view to infer the repo from. The workflow sets GITHUB_REPOSITORY, but gh documents -R/--repo or GH_REPO for explicit repo selection, so manual reviews can fail in Resolve PR refs before reaching checkout.
Suggested fix: Call gh pr view "$PR_NUM" -R "$GITHUB_REPOSITORY" ... for both head and base, or set GH_REPO: ${{ github.repository }} in the step environment.
There was a problem hiding this comment.
Needlefish PR Review
Verdict: ⛔ changes_requested
Base: d9ffd69 → Head: fd23bb5
changes_requested: malformed model findings can be coerced into blocking PR reviews.
Findings
P2: Reject malformed findings before deriving a verdict
src/shared/schema.ts:95-110
Why this breaks: normalizeFinding coerces missing or invalid finding fields into defaults, including a blocking P1 severity plus empty file and line 0. If Codex or the critic returns a malformed finding object, review.ts can derive changes_requested and render an unanchored '(no file):0' finding instead of failing the review-output boundary.
Suggested fix: Validate required finding fields strictly: severity/category must be exact allowed values, file/title/whyItBreaks/suggestedFix must be non-empty strings, lineStart/lineEnd must be positive finite numbers, and invalid findings should throw before deriveVerdict runs.
Validation: Add a self-check that normalizeReview rejects a finding with missing file/line or invalid severity, then run pnpm exec tsc --noEmit.
Checked
- Read supplied AGENTS.md review policy; no repo AGENTS.md exists in the workspace.
- Mapped changed surfaces: workflow, dependency/lockfile, prompts, CLI/adapters/core/shared source, docs.
- Inspected GitHub workflow ref resolution, checkout, permissions, install, and review/check posting paths.
- Traced review(), normalizeReview(), normalizeFinding(), deriveVerdict(), renderMarkdown(), runGithub(), and runLocal call paths.
- Ran pnpm exec tsc --noEmit successfully.
- Verified current Codex CLI supports -s/--sandbox read-only.
Residual Risk
- CLI smoke tests through tsx could not run in this managed read-only sandbox because tsx failed to create its IPC pipe with EPERM; this is an environment restriction, not evidence of a repo defect.
| "validation", | ||
| ]; | ||
|
|
||
| export function normalizeFinding(raw: Partial<Finding>): Finding { |
There was a problem hiding this comment.
P2 (validation): Reject malformed findings before deriving a verdict
normalizeFinding coerces missing or invalid finding fields into defaults, including a blocking P1 severity plus empty file and line 0. If Codex or the critic returns a malformed finding object, review.ts can derive changes_requested and render an unanchored '(no file):0' finding instead of failing the review-output boundary.
Suggested fix: Validate required finding fields strictly: severity/category must be exact allowed values, file/title/whyItBreaks/suggestedFix must be non-empty strings, lineStart/lineEnd must be positive finite numbers, and invalid findings should throw before deriveVerdict runs.
…e so critic can prune
There was a problem hiding this comment.
Needlefish PR Review
Verdict: ⛔ changes_requested
Base: d9ffd69 → Head: 8cee022
changes_requested: malformed first-pass findings can be silently dropped before the critic, allowing a false pass.
Findings
P2: Fail on malformed candidate findings
src/core/review.ts:29
Why this breaks: The first Codex pass is parsed with strict=false, so normalizeReview filters out malformed findings instead of failing. A candidate output with a real P1/P2 finding that has one malformed required field is reduced to findings: [], then the critic can only prune that empty list and deriveVerdict returns pass.
Suggested fix: Parse the candidate review strictly, or make lenient parsing fail if any finding was malformed instead of silently filtering it out.
Validation: node --import tsx -e 'import { normalizeReview } from "./src/shared/schema.ts"; const raw={summary:"ok",checked:["x"],residual_risks:[],findings:[{severity:"P1",category:"bug",file:"",lineStart:10,lineEnd:10,title:"real",whyItBreaks:"breaks",suggestedFix:"fix"}]}; console.log(normalizeReview(raw,false).findings.length)'
Checked
- Read the provided AGENTS.md policy; no AGENTS.md file exists in this repo checkout.
- Reviewed workflow, CLI, GitHub/local adapters, Codex orchestration, schema normalization, rendering, prompts, package and lockfile surfaces.
- Ran pnpm check successfully: tsc --noEmit exited 0.
- Reproduced normalizeReview(raw,false) dropping a malformed P1 finding and deriveVerdict([], []) returning pass.
Residual Risk
- Network access to GitHub API was unavailable, so live PR metadata endpoint behavior could not be inspected; local code paths and gh help output were checked instead.
| ); | ||
|
|
||
| const rawCandidate = runCodex(reviewPrompt, { repoPath: bundle.repoPath }); | ||
| const candidate = normalizeReview(extractJson(rawCandidate), false); |
There was a problem hiding this comment.
P2 (validation): Fail on malformed candidate findings
The first Codex pass is parsed with strict=false, so normalizeReview filters out malformed findings instead of failing. A candidate output with a real P1/P2 finding that has one malformed required field is reduced to findings: [], then the critic can only prune that empty list and deriveVerdict returns pass.
Suggested fix: Parse the candidate review strictly, or make lenient parsing fail if any finding was malformed instead of silently filtering it out.
There was a problem hiding this comment.
Needlefish PR Review
Verdict: ⛔ changes_requested
Base: d9ffd69 → Head: 49350eb
One actionable defect: the GitHub review fallback can attach stale findings to the live PR head.
Findings
P2: Pin fallback reviews to the reviewed commit
src/adapters/github.ts:101-110
Why this breaks: The primary review payload pins commit_id to the SHA that was reviewed, but the fallback path omits commit_id. If inline comments are rejected and a newer commit has been pushed while the old run is still finishing, this fallback can post REQUEST_CHANGES against the current PR head using findings generated from the old checkout.
Suggested fix: Include commit_id: headSha in the fallback request, or retry the same JSON payload without comments while preserving commit_id.
Validation: Force the inline-comment path to fail, push a newer commit before fallback posts, then verify the created review's commit_id remains PR_HEAD_SHA.
Checked
- Applied the supplied AGENTS.md review policy; no repository AGENTS.md file was present.
- Mapped changed surfaces, with workflow and GitHub adapter treated as high-risk.
- Read the GitHub workflow, CLI, local adapter, GitHub adapter, Codex runner, review orchestration, verdict, rendering, classification, schema normalization, package scripts, and tsconfig.
- Traced runGithub through ref resolution, diff construction, postReview, postCheck, and the failure path.
- Checked prior PR review comments in the bundle against current head changes.
- Ran pnpm exec tsc --noEmit successfully.
- Ran git diff --check successfully.
Residual Risk
- Network was restricted, so live GitHub API behavior was not exercised; review conclusions are based on local code, GitHub CLI help, and the supplied diff.
| payload | ||
| ); | ||
| } catch { | ||
| gh( |
There was a problem hiding this comment.
P2 (runtime): Pin fallback reviews to the reviewed commit
The primary review payload pins commit_id to the SHA that was reviewed, but the fallback path omits commit_id. If inline comments are rejected and a newer commit has been pushed while the old run is still finishing, this fallback can post REQUEST_CHANGES against the current PR head using findings generated from the old checkout.
Suggested fix: Include commit_id: headSha in the fallback request, or retry the same JSON payload without comments while preserving commit_id.
There was a problem hiding this comment.
Needlefish PR Review
Verdict: ⛔ changes_requested
Base: d9ffd69 → Head: f139346
changes_requested: two concrete security defects let PR-controlled code or the model subprocess bypass the deterministic review gate.
Findings
P1: Run the reviewer from trusted code
.github/workflows/review.yml:59
Why this breaks: The workflow checks out the PR head, installs it, then executes src/cli.ts from that same checkout with write-scoped PR/check permissions. A PR that changes Needlefish itself can replace the review logic and post a passing review/check, so the merge gate is controlled by the candidate code it is supposed to evaluate.
Suggested fix: Use a trusted checkout of Needlefish, such as the base branch or a pinned release, as the reviewer tool, and pass the PR checkout only as the target repo via --repo.
Validation: Open a test PR that changes src/cli.ts to short-circuit success; the workflow should still run the trusted reviewer and not the modified PR copy.
P1: Scrub write tokens from the Codex subprocess
src/shared/codex.ts:23-29
Why this breaks: spawnSync is called without an env override, so codex exec inherits the parent environment. In GitHub mode the parent step provides GH_TOKEN with pull-requests:write and checks:write, which lets the model-driven subprocess run gh api and mutate reviews or checks directly, bypassing Needlefish's parser and deterministic verdict.
Suggested fix: Pass an explicit sanitized env to spawnSync that removes GH_TOKEN, GITHUB_TOKEN, and other GitHub write-token variables from the Codex child while keeping only the auth/config needed to run Codex.
Validation: Run runCodex with a CODEX_BIN stub that prints process.env.GH_TOKEN; it should be absent even when the parent process has GH_TOKEN set.
Checked
- Applied the provided AGENTS.md review policy; rg found no repo-local AGENTS.md file.
- Inspected changed workflow, GitHub adapter, Codex subprocess wrapper, schema normalization, verdict derivation, and CLI entrypoint.
- Ran pnpm check; TypeScript completed successfully.
- Verified Node child processes inherit environment variables by default with a small spawnSync check.
- Searched for GH_TOKEN, GITHUB_TOKEN, actions/checkout, pnpm install, codex exec, and spawnSync usage across the changed files.
Residual Risk
- Did not execute the live GitHub Action or post to GitHub APIs; verdict is based on static workflow and subprocess inspection plus local TypeScript verification.
| GH_TOKEN: ${{ secrets.GITHUB_TOKEN }} | ||
| PR_BASE_SHA: ${{ steps.refs.outputs.base }} | ||
| PR_HEAD_SHA: ${{ steps.refs.outputs.head }} | ||
| run: ./node_modules/.bin/tsx src/cli.ts --github --pr ${{ github.event.inputs.pr_number || github.event.pull_request.number }} |
There was a problem hiding this comment.
P1 (security): Run the reviewer from trusted code
The workflow checks out the PR head, installs it, then executes src/cli.ts from that same checkout with write-scoped PR/check permissions. A PR that changes Needlefish itself can replace the review logic and post a passing review/check, so the merge gate is controlled by the candidate code it is supposed to evaluate.
Suggested fix: Use a trusted checkout of Needlefish, such as the base branch or a pinned release, as the reviewer tool, and pass the PR checkout only as the target repo via --repo.
| const args = ["exec", "--color", "never", "-s", "read-only", "--output-last-message", lastMsg]; | ||
| if (model) args.push("-m", model); | ||
|
|
||
| const res = spawnSync(bin, args, { |
There was a problem hiding this comment.
P1 (security): Scrub write tokens from the Codex subprocess
spawnSync is called without an env override, so codex exec inherits the parent environment. In GitHub mode the parent step provides GH_TOKEN with pull-requests:write and checks:write, which lets the model-driven subprocess run gh api and mutate reviews or checks directly, bypassing Needlefish's parser and deterministic verdict.
Suggested fix: Pass an explicit sanitized env to spawnSync that removes GH_TOKEN, GITHUB_TOKEN, and other GitHub write-token variables from the Codex child while keeping only the auth/config needed to run Codex.
There was a problem hiding this comment.
Needlefish PR Review
Verdict: ⛔ changes_requested
Base: d9ffd69 → Head: 68e5e34
changes_requested: one security defect lets PR-controlled code operate the review merge gate.
Findings
P1: Run the reviewer from trusted code
.github/workflows/review.yml:51-59
Why this breaks: The job checks out the PR head, installs that checkout, then runs src/cli.ts from the same PR with pull-requests: write and checks: write. A PR can change package lifecycle behavior or the Needlefish CLI itself to post a passing review/check, so the candidate code controls the merge gate that is supposed to review it.
Suggested fix: Run Needlefish from a trusted checkout such as the base branch or a pinned release, and check out the PR only as the target repository passed via --repo.
Validation: Open a test PR that changes src/cli.ts to skip review and post success; verify the current workflow executes that PR-controlled file, then verify the fixed workflow executes the trusted checkout instead.
Checked
- Read the supplied AGENTS.md policy; no AGENTS.md exists in the repo or base commit.
- Inspected changed workflow, GitHub adapter, local adapter, CLI parser, Codex invocation, schema normalization, verdict derivation, prompts, package metadata, and README contract.
- Verified previous review issues for read-only Codex sandboxing, GitHub token scrubbing, failure check conclusion, merge-base diffing, prompt replacement callbacks, strict output parsing, and git maxBuffer are fixed in head.
pnpm exec tsc --noEmitexited 0.git diff --check d9ffd69b3de124964aa76d5179c6244e513fc34d...68e5e34ca1f099547b00db8c40f3fe12cb750219exited 0.codex exec --helpconfirms-s read-onlyis the sandbox flag used bysrc/shared/codex.ts.
Residual Risk
- Could not run the tsx CLI smoke check in this read-only sandbox because tsx failed to create its IPC pipe with EPERM; typecheck covered compile-time behavior.
- No live GitHub Actions run or GitHub API posting was executed; workflow behavior was reviewed statically from the checked-out files.
|
|
||
| - name: Install | ||
| shell: zsh -l {0} | ||
| run: corepack enable && pnpm install --frozen-lockfile |
There was a problem hiding this comment.
P1 (security): Run the reviewer from trusted code
The job checks out the PR head, installs that checkout, then runs src/cli.ts from the same PR with pull-requests: write and checks: write. A PR can change package lifecycle behavior or the Needlefish CLI itself to post a passing review/check, so the candidate code controls the merge gate that is supposed to review it.
Suggested fix: Run Needlefish from a trusted checkout such as the base branch or a pinned release, and check out the PR only as the target repository passed via --repo.
|
Shipped to main as v0.1 (commit 094a271) with the trusted-tool self-review workflow. Dogfood complete: the tool found ~16 real defects in itself across the run, all fixed. The implement branch is kept for the fix history. |
Scaffold: schema + deterministic verdict, 2-call Codex orchestration (review + adversarial critic), local + github adapters, read-only by default.
Self-review trigger: the
needlefish-reviewaction on main will review this PR using the PR's own src/.Dogfooded locally: reviewed itself, found 2 real bugs (git->gh for pr metadata; broken pnpm allowBuilds placeholder), both fixed.