Skip to content

Implement needlefish: core, adapters, prompts#1

Closed
frankekn wants to merge 25 commits into
mainfrom
implement
Closed

Implement needlefish: core, adapters, prompts#1
frankekn wants to merge 25 commits into
mainfrom
implement

Conversation

@frankekn

Copy link
Copy Markdown
Owner

Scaffold: schema + deterministic verdict, 2-call Codex orchestration (review + adversarial critic), local + github adapters, read-only by default.

Self-review trigger: the needlefish-review action on main will review this PR using the PR's own src/.

Dogfooded locally: reviewed itself, found 2 real bugs (git->gh for pr metadata; broken pnpm allowBuilds placeholder), both fixed.

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needlefish PR Review

Verdict: ⛔ changes_requested
Base: 9ed22ebHead: 612df2d

changes_requested: two concrete security defects let review runs mutate worktrees or execute untrusted PR code on a persistent runner.

Findings

P1: Force Codex reviews into read-only sandbox mode

src/shared/codex.ts:20

Why this breaks: Needlefish advertises read-only review, but this invocation inherits the caller's Codex config. On this runner the local config is approval_policy=never and sandbox_mode=danger-full-access, so the model spawned by a review can run mutating commands despite the prompt saying not to edit files.

Suggested fix: Add explicit Codex exec flags such as --sandbox read-only and a non-interactive approval policy/config override so review runs cannot write regardless of user config.

Validation: codex exec --sandbox read-only --color never "print exactly ok"

P1: Do not run pull_request code on a persistent self-hosted runner

.github/workflows/review.yml:18-28

Why this breaks: The workflow triggers on pull_request, checks out the PR head, then runs pnpm install on a self-hosted runner. A PR can change install scripts or repository code and execute arbitrary commands on Frank's persistent machine before review logic runs.

Suggested fix: Gate this job to same-repository PRs or run it only inside an ephemeral isolated runner/container; do not execute fork PR code on the persistent self-hosted host.

Validation: Open a fork PR that changes package install behavior or src/cli.ts and confirm the workflow no longer runs on the persistent self-hosted runner.

Checked

  • No AGENTS.md file exists under /Users/termtek/actions-runner/_work/needlefish/needlefish; applied the provided review policy from the context bundle.
  • Inspected changed workflow, CLI, local/GitHub adapters, review orchestration, Codex runner, schema normalization, and package config.
  • Ran pnpm check; TypeScript completed with tsc --noEmit.
  • Verified Codex CLI supports --sandbox read-only and that current local Codex config is sandbox_mode = "danger-full-access" with approval_policy = "never".

Residual Risk

  • I did not exercise live GitHub review/check-run posting against PR #1, so API-shape issues beyond static inspection are not ruled out.

Comment thread src/shared/codex.ts Outdated

const tmp = mkdtempSync(path.join(os.tmpdir(), "needlefish-"));
const lastMsg = path.join(tmp, "last.txt");
const args = ["exec", "--color", "never", "--output-last-message", lastMsg];

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 (security): Force Codex reviews into read-only sandbox mode

Needlefish advertises read-only review, but this invocation inherits the caller's Codex config. On this runner the local config is approval_policy=never and sandbox_mode=danger-full-access, so the model spawned by a review can run mutating commands despite the prompt saying not to edit files.

Suggested fix: Add explicit Codex exec flags such as --sandbox read-only and a non-interactive approval policy/config override so review runs cannot write regardless of user config.


jobs:
review:
runs-on: self-hosted

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 (security): Do not run pull_request code on a persistent self-hosted runner

The workflow triggers on pull_request, checks out the PR head, then runs pnpm install on a self-hosted runner. A PR can change install scripts or repository code and execute arbitrary commands on Frank's persistent machine before review logic runs.

Suggested fix: Gate this job to same-repository PRs or run it only inside an ephemeral isolated runner/container; do not execute fork PR code on the persistent self-hosted host.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a26882a28a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/adapters/github.ts Outdated
repo,
headSha,
null,
"neutral",

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Fail the check when the review errors

In GitHub Action mode, this failure path reports the advertised Needlefish check as neutral, but GitHub allows required checks with neutral conclusions to satisfy branch protection (docs). If review(bundle) or posting the review throws, a repo that requires the Needlefish check can still merge the PR despite the review never completing, contrary to the “failed review never passes” behavior; use a failing conclusion for this path instead.

Useful? React with 👍 / 👎.

Comment thread src/shared/codex.ts Outdated

const tmp = mkdtempSync(path.join(os.tmpdir(), "needlefish-"));
const lastMsg = path.join(tmp, "last.txt");
const args = ["exec", "--color", "never", "--output-last-message", lastMsg];

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Force Codex into read-only mode

The tool is advertised as read-only, but this codex exec invocation does not pass --sandbox read-only or --ask-for-approval never, and the Codex CLI documents that the sandbox defaults to the user configuration (docs). When a developer or runner has Codex configured for workspace-write or full access, the review model can edit the target repo while supposedly only reviewing it; make the read-only sandbox explicit for these calls.

Useful? React with 👍 / 👎.


jobs:
review:
runs-on: self-hosted

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Isolate self-hosted PR jobs from untrusted code

For PRs from forks or any repo with untrusted contributors, this pull_request workflow runs on a persistent self-hosted runner and then checks out and installs the PR's code. GitHub warns that self-hosted runners can be persistently compromised by untrusted workflow code (docs); use a GitHub-hosted or ephemeral isolated runner for PRs, or gate this job to trusted branches/authors only.

Useful? React with 👍 / 👎.

Comment thread src/adapters/github.ts Outdated
repo,
headSha,
null,
"neutral",

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Fail the check when the review errors

In GitHub Action mode, this failure path reports the advertised Needlefish check as neutral, but GitHub allows required checks with neutral conclusions to satisfy branch protection (docs). If review(bundle) or posting the review throws, a repo that requires the Needlefish check can still merge the PR despite the review never completing, contrary to the “failed review never passes” behavior; use a failing conclusion for this path instead.

Useful? React with 👍 / 👎.

Comment thread src/adapters/github.ts Outdated
Comment on lines +140 to +143
const headSha = pr.head?.sha ?? git(["rev-parse", "HEAD"], cwd);
const baseSha = pr.base?.sha ?? git(["merge-base", "origin/" + (pr.base?.ref ?? "main"), "HEAD"], cwd);

const patch = ghText(["pr", "diff", String(prNumber)]);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Review the checked-out PR SHA, not the live head

This fetches the current PR head and diff from GitHub after the workflow has already checked out the event's head SHA. If a newer commit is pushed while an older run is still installing or queued, the run will inspect the old workspace but post the review/check against the newer pr.head.sha, allowing a stale run to approve or fail the wrong commit; derive the head SHA and patch from the checked-out/event SHA or cancel superseded runs.

Useful? React with 👍 / 👎.

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needlefish PR Review

Verdict: ⛔ changes_requested
Base: 9ed22ebHead: 5db438c

changes_requested: GitHub mode can report a failed review as mergeable and can review/post against a different PR head than the checked-out code.

Findings

P1: Fail the Needlefish check when review execution fails

src/adapters/github.ts:191

Why this breaks: When review(), postReview(), or the successful postCheck() path throws, the catch block creates the same Needlefish check run with conclusion neutral. Repositories that require this check can treat neutral as non-failing, so a PR can merge even though Needlefish never completed its review.

Suggested fix: Post the failure-path check with conclusion failure instead of neutral, or use a distinct non-required check name for infrastructure failures.

Validation: Force review(bundle) to throw in --github mode and verify the check-runs payload uses conclusion=failure for name=Needlefish.

P1: Review the checked-out PR SHA, not the live PR head

src/adapters/github.ts:140-143

Why this breaks: The workflow checks out github.event.pull_request.head.sha, but runGithub then reads the current PR head and diff from GitHub. If a newer commit is pushed while an older workflow run is queued or installing, this run inspects the old workspace while posting the review and check against the newer pr.head.sha, allowing stale results to approve or fail the wrong commit.

Suggested fix: Derive headSha from git rev-parse HEAD or the event SHA passed by the workflow, and build the patch from the checked-out workspace (git diff baseSha HEAD) instead of gh pr diff's live PR state.

Validation: Start a run for commit A, push commit B before the review step, and verify the run posts against A or cancels instead of posting a check on B.

Checked

  • Read repository policy supplied in the context bundle; no AGENTS.md exists in the checked-out tree.
  • Mapped changed surfaces: workflow, docs, dependency lockfile, CLI/source runtime, Codex invocation, rendering/schema.
  • Inspected GitHub adapter failure handling, check-run posting, PR metadata collection, head/base/patch selection, and inline review fallback.
  • Verified prior review comments: workflow is gated to same-repository PRs and Codex exec passes -s read-only.
  • Searched for existing Needlefish check behavior, neutral conclusions, headSha usage, GitHub workflow checkout SHA, and read-only Codex invocation.
  • Ran pnpm check; TypeScript completed with tsc --noEmit.

Comment thread src/adapters/github.ts Outdated
repo,
headSha,
null,
"neutral",

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 (runtime): Fail the Needlefish check when review execution fails

When review(), postReview(), or the successful postCheck() path throws, the catch block creates the same Needlefish check run with conclusion neutral. Repositories that require this check can treat neutral as non-failing, so a PR can merge even though Needlefish never completed its review.

Suggested fix: Post the failure-path check with conclusion failure instead of neutral, or use a distinct non-required check name for infrastructure failures.

Comment thread src/adapters/github.ts Outdated
"api",
`repos/${repo}/pulls/${prNumber}`,
]);
const headSha = pr.head?.sha ?? git(["rev-parse", "HEAD"], cwd);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 (runtime): Review the checked-out PR SHA, not the live PR head

The workflow checks out github.event.pull_request.head.sha, but runGithub then reads the current PR head and diff from GitHub. If a newer commit is pushed while an older workflow run is queued or installing, this run inspects the old workspace while posting the review and check against the newer pr.head.sha, allowing stale results to approve or fail the wrong commit.

Suggested fix: Derive headSha from git rev-parse HEAD or the event SHA passed by the workflow, and build the patch from the checked-out workspace (git diff baseSha HEAD) instead of gh pr diff's live PR state.

@frankekn frankekn left a comment

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

diagnostic ping

@frankekn frankekn left a comment

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

diagnostic ping 2

@frankekn frankekn left a comment

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

empty-comments test

@frankekn frankekn left a comment

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no-comments-key test

@frankekn frankekn left a comment

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needlefish PR Review

Verdict: pass

No findings.

Checked

  • a
  • b

@frankekn frankekn closed this Jun 26, 2026
@frankekn frankekn reopened this Jun 26, 2026

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needlefish PR Review

Verdict: ⛔ changes_requested
Base: d9ffd69Head: 7df7bae

changes requested: prompt assembly can corrupt reviewed diffs, and manual workflow dispatch reviews the wrong ref.

Findings

P2: Use callback replacements for prompt payloads

src/core/review.ts:23-33

Why this breaks: String.replace treats dollar sequences such as $& in replacement strings as substitution tokens. Because the replacement payload is the JSON bundle and raw patch, any reviewed diff containing those sequences is rewritten before Codex sees it, so Needlefish can review corrupted code and miss or invent findings.

Suggested fix: Pass replacement callbacks for dynamic payloads, e.g. replace("{{BUNDLE}}", () => JSON.stringify(bundle, null, 2)) and replace("{{PATCH}}", () => bundle.patch).

Validation: node -e 'console.log("A {{PATCH}} B".replace("{{PATCH}}", "x$&y")); console.log("A {{PATCH}} B".replace("{{PATCH}}", () => "x$&y"))'

P2: Check out the requested PR for manual dispatch

.github/workflows/review.yml:29-35

Why this breaks: On workflow_dispatch there is no github.event.pull_request object, so checkout falls back to github.ref and BASE_REF falls back to main while the run step still passes the input PR number. A manual review therefore inspects the dispatch branch, often main with an empty diff, instead of the requested PR head.

Suggested fix: For workflow_dispatch, resolve the PR head/base from the input PR number before review and fetch/checkout that head, or remove the manual trigger until it can supply PR_HEAD_SHA, PR_BASE_SHA, and the matching workspace checkout.

Validation: Trigger workflow_dispatch with pr_number for an open PR from the default branch context and verify git rev-parse HEAD equals that PR head SHA and git diff base..HEAD is non-empty.

Checked

  • Read supplied review policy; repo has no AGENTS.md file to apply.
  • Reviewed changed workflow, CLI, adapters, core review orchestration, Codex wrapper, schema, render, classify, package, and lockfile surfaces.
  • Traced GitHub workflow inputs through runGithub head/base/patch selection.
  • Ran pnpm exec tsc --noEmit successfully.
  • Verified codex exec supports -s/--sandbox read-only.
  • Reproduced JavaScript String.replace replacement-token expansion with node.

Residual Risk

  • Did not execute the GitHub workflow or post GitHub reviews/checks from this read-only review; workflow_dispatch behavior is derived from the workflow expressions and runGithub fallback code.

Comment thread src/core/review.ts
export async function review(
bundle: Bundle
): Promise<ReviewResult> {
const reviewPrompt = loadPrompt("review.md").replace(

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 (bug): Use callback replacements for prompt payloads

String.replace treats dollar sequences such as $& in replacement strings as substitution tokens. Because the replacement payload is the JSON bundle and raw patch, any reviewed diff containing those sequences is rewritten before Codex sees it, so Needlefish can review corrupted code and miss or invent findings.

Suggested fix: Pass replacement callbacks for dynamic payloads, e.g. replace("{{BUNDLE}}", () => JSON.stringify(bundle, null, 2)) and replace("{{PATCH}}", () => bundle.patch).

Comment thread .github/workflows/review.yml Outdated
- name: Checkout
uses: actions/checkout@v4
with:
ref: ${{ github.event.pull_request.head.sha || github.ref }}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 (runtime): Check out the requested PR for manual dispatch

On workflow_dispatch there is no github.event.pull_request object, so checkout falls back to github.ref and BASE_REF falls back to main while the run step still passes the input PR number. A manual review therefore inspects the dispatch branch, often main with an empty diff, instead of the requested PR head.

Suggested fix: For workflow_dispatch, resolve the PR head/base from the input PR number before review and fetch/checkout that head, or remove the manual trigger until it can supply PR_HEAD_SHA, PR_BASE_SHA, and the matching workspace checkout.

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needlefish PR Review

Verdict: ⛔ changes_requested
Base: 474877fHead: 3f78261

changes_requested: GitHub mode can review the wrong diff, and malformed model output can be normalized into a passing review.

Findings

P1: Compare PR heads against the merge base

src/adapters/github.ts:138-140

Why this breaks: PR_BASE_SHA is supplied from the workflow as the current base-branch tip, but GitHub PR review diffs must be computed from the merge base. When the base branch has advanced since the PR branched, git diff baseSha HEAD reviews unrelated base changes and can also miss files that are part of the PR. In this checkout, git merge-base 474877f1f2da1683339040fe5bf563ab2c48b509 3f782614d7df761c880c5c743177273044f5ee20 is d9ffd69..., not the supplied base SHA, and the two ranges produce different file sets.

Suggested fix: Compute a diff base with git merge-base baseSha HEAD in runGithub before building patch and changedFiles, and put that merge-base SHA in the bundle.

Validation: Create or use a PR branch whose base branch advanced after branching; verify needlefish --github --pr Nsends the same files asgit diff --name-only $(git merge-base HEAD) HEAD.

P2: Reject malformed review JSON instead of passing it

src/shared/schema.ts:114-128

Why this breaks: normalizeReview({}) returns empty findings and empty blocking risks, so deriveVerdict turns malformed Codex or critic output into pass. A bad model response should fail the Needlefish check, not green-light the PR with an empty review.

Suggested fix: Replace the permissive defaults with strict boundary parsing: require summary, findings, checked, and residual_risks to have the expected types, reject invalid severities/categories/line numbers, and throw so GitHub mode posts the failure check.

Validation: Stub the Codex output to {}and confirm review execution fails instead of producingpass; then run pnpm check.

Checked

  • No local AGENTS.md exists; the context bundle also has agentsMd=null.
  • Reviewed changed workflow, adapters, core review flow, schema normalization, prompts, and README contract.
  • Ran pnpm check (tsc --noEmit): passed.
  • Ran git diff --check for the reviewed range: passed.
  • Compared supplied base/head against their merge base and confirmed the reviewed range differs from the PR merge-base range.
  • Checked gh pr view --json fields; baseRef is invalid, but that output is unused and command substitution does not fail the step, so it was not raised as a blocking finding.

Comment thread src/adapters/github.ts

const pr = gh(["api", `repos/${repo}/pulls/${prNumber}`]);
const headSha = process.env.PR_HEAD_SHA || git(["rev-parse", "HEAD"], cwd);
const baseSha = process.env.PR_BASE_SHA || git(["merge-base", "origin/main", "HEAD"], cwd);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 (bug): Compare PR heads against the merge base

PR_BASE_SHA is supplied from the workflow as the current base-branch tip, but GitHub PR review diffs must be computed from the merge base. When the base branch has advanced since the PR branched, git diff baseSha HEAD reviews unrelated base changes and can also miss files that are part of the PR. In this checkout, git merge-base 474877f1f2da1683339040fe5bf563ab2c48b509 3f782614d7df761c880c5c743177273044f5ee20 is d9ffd69..., not the supplied base SHA, and the two ranges produce different file sets.

Suggested fix: Compute a diff base with git merge-base baseSha HEAD in runGithub before building patch and changedFiles, and put that merge-base SHA in the bundle.

Comment thread src/shared/schema.ts Outdated
};
}

export function normalizeReview(raw: any): RawReview {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 (validation): Reject malformed review JSON instead of passing it

normalizeReview({}) returns empty findings and empty blocking risks, so deriveVerdict turns malformed Codex or critic output into pass. A bad model response should fail the Needlefish check, not green-light the PR with an empty review.

Suggested fix: Replace the permissive defaults with strict boundary parsing: require summary, findings, checked, and residual_risks to have the expected types, reject invalid severities/categories/line numbers, and throw so GitHub mode posts the failure check.

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needlefish PR Review

Verdict: ⛔ changes_requested
Base: 474877fHead: 908ef40

changes requested: malformed model output can still be normalized into a passing review.

Findings

P2: Reject malformed review JSON before deriving a verdict

src/shared/schema.ts:114-128

Why this breaks: normalizeReview silently converts missing findings and residual_risks to empty arrays, and invalid finding fields to defaults. A critic response like {"summary":"ok"} is accepted by review.ts, then deriveVerdict sees no blocking findings or risks and returns pass even though the review output was malformed.

Suggested fix: Make the model-output boundary strict: require summary, findings, checked, and residual_risks with expected types, reject invalid severities/categories/line numbers, and throw instead of defaulting missing arrays to empty.

Validation: Add a small check that normalizeReview({ summary: 'ok' }) throws, then run pnpm exec tsc --noEmit

Checked

  • Read the supplied AGENTS.md review policy; no repo AGENTS.md exists in the worktree
  • Mapped workflow, dependency, docs, CLI, core, and schema surfaces from the changed files
  • Inspected GitHub workflow dispatch/head/base handling and runGithub diff construction
  • Verified Codex exec uses -s read-only against the local codex exec --help output
  • Ran pnpm exec tsc --noEmit successfully
  • Checked pnpm config recognition of pnpm-workspace.yaml allowBuilds
  • Confirmed gh pr view does not support baseRef; the current command emits an error but only populates an unused base_ref output

Residual Risk

  • Runtime tsx snippets could not be executed in this sandbox because tsx failed to create its IPC pipe, but static inspection and tsc were sufficient for the reported validation bug.

Comment thread src/shared/schema.ts Outdated
};
}

export function normalizeReview(raw: any): RawReview {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 (validation): Reject malformed review JSON before deriving a verdict

normalizeReview silently converts missing findings and residual_risks to empty arrays, and invalid finding fields to defaults. A critic response like {"summary":"ok"} is accepted by review.ts, then deriveVerdict sees no blocking findings or risks and returns pass even though the review output was malformed.

Suggested fix: Make the model-output boundary strict: require summary, findings, checked, and residual_risks with expected types, reject invalid severities/categories/line numbers, and throw instead of defaulting missing arrays to empty.

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needlefish PR Review

Verdict: ⛔ changes_requested
Base: 474877fHead: a79deb4

changes_requested: one model-output validation bug can turn malformed review JSON into a passing verdict.

Findings

P2: Reject malformed model output before deriving a verdict

src/shared/schema.ts:114-129

Why this breaks: normalizeReview silently converts missing findings, checked, and residual_risks fields to empty arrays. review.ts only requires a non-empty summary, so a critic response like {"summary":"ok"} normalizes to no findings/no blocking risks and deriveVerdict returns pass, green-lighting a PR even though the review payload did not match the required contract.

Suggested fix: Make the model-output boundary strict: require summary, findings, checked, and residual_risks with expected types, and reject invalid finding severities/categories/line numbers instead of defaulting them.

Validation: node --import tsx -e 'import { normalizeReview } from "./src/shared/schema.ts"; import { deriveVerdict } from "./src/core/verdict.ts"; const r=normalizeReview({summary:"ok"}); console.log(JSON.stringify({r, verdict: deriveVerdict(r.findings, r.residual_risks)}));'

Checked

  • No repository AGENTS.md exists under /Users/termtek/actions-runner/_work/needlefish/needlefish; applied the supplied review policy.
  • Inspected the changed workflow, GitHub adapter, local adapter, CLI, review orchestration, Codex invocation, renderer, classifier, and schema.
  • Ran pnpm check; TypeScript completed successfully.
  • Confirmed normalizeReview({summary:"ok"}) produces empty findings/residual risks and deriveVerdict returns pass.

Comment thread src/shared/schema.ts Outdated
};
}

export function normalizeReview(raw: any): RawReview {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 (validation): Reject malformed model output before deriving a verdict

normalizeReview silently converts missing findings, checked, and residual_risks fields to empty arrays. review.ts only requires a non-empty summary, so a critic response like {"summary":"ok"} normalizes to no findings/no blocking risks and deriveVerdict returns pass, green-lighting a PR even though the review payload did not match the required contract.

Suggested fix: Make the model-output boundary strict: require summary, findings, checked, and residual_risks with expected types, and reject invalid finding severities/categories/line numbers instead of defaulting them.

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needlefish PR Review

Verdict: ⛔ changes_requested
Base: 474877fHead: cab9f99

changes_requested: three concrete bugs remain in PR range metadata, model-output validation, and large diff collection.

Findings

P2: Use the merge base as the bundle base SHA

src/adapters/github.ts:163-166

Why this breaks: GitHub mode computes the patch from mergeBase..headSha but still sends PR_BASE_SHA as bundle.baseSha. When the base branch has advanced, the prompt gives Codex a base/head pair that does not reproduce the supplied patch; in this PR, merge-base is d9ffd69... while baseSha is 474877f..., and the two ranges produce different file sets. Any reviewer command using bundle.baseSha inspects the wrong diff.

Suggested fix: Set bundle.baseSha to mergeBase, or add a separate baseRefSha field and keep baseSha as the actual diff base used for patch and changedFiles.

Validation: git merge-base "$PR_BASE_SHA" "$PR_HEAD_SHA" && compare git diff --name-only "$PR_BASE_SHA" "$PR_HEAD_SHA" with git diff --name-only "$(git merge-base "$PR_BASE_SHA" "$PR_HEAD_SHA")" "$PR_HEAD_SHA"

P2: Reject malformed model output before deriving verdicts

src/shared/schema.ts:114-128

Why this breaks: normalizeReview silently converts missing findings and residual_risks to empty arrays. review.ts now requires only summary and checked to be non-empty, so a critic response like {"summary":"ok","checked":["x"]} becomes no findings and no blocking risks, then deriveVerdict returns pass even though the review payload violated the required contract.

Suggested fix: Make the model-output boundary strict: require summary, findings, checked, and residual_risks with expected types, and reject invalid finding severities/categories/line numbers instead of defaulting them.

Validation: Add a small test or script that feeds {"summary":"ok","checked":["x"]} through normalizeReview/review verdict flow and assert it throws instead of returning pass.

P2: Raise git output buffers for diff collection

src/adapters/local.ts:10-11

Why this breaks: The git helper uses spawnSync without maxBuffer, so Node's default output cap aborts commands that emit around 1 MiB. Needlefish collects full patches with git diff; a moderately large PR will throw ENOBUFS before review. The same helper shape exists in GitHub mode, so both local and action reviews can fail on large diffs.

Suggested fix: Set a larger maxBuffer on git spawnSync, matching the 64 MiB buffer already used for gh and Codex calls.

Validation: node -e 'const {spawnSync}=require("node:child_process"); const r=spawnSync(process.execPath,["-e","process.stdout.write(\"x\".repeat(2*1024*1024))"],{encoding:"utf8"}); console.log(r.error&&r.error.code)'

Checked

  • Applied the AGENTS.md policy supplied in the request; no AGENTS.md file exists in the repo checkout.
  • Mapped changed surfaces, with workflow, dependency, prompt, CLI/adapter/core/schema, and render files inspected.
  • Read the full changed implementations for GitHub/local adapters, CLI parsing, review orchestration, verdict derivation, Codex invocation, classification, rendering, and schema normalization.
  • Cross-checked prior PR review comments against current head; several earlier issues are fixed, including read-only Codex sandbox, failure check conclusion, event SHA usage, merge-base patching, replacement callbacks, and non-empty checked validation.
  • Ran pnpm exec tsc --noEmit successfully.
  • Verified Node spawnSync without maxBuffer returns ENOBUFS on a 2 MiB stdout payload.

Residual Risk

  • A dynamic tsx repro for normalizeReview was blocked by the read-only sandbox denying tsx IPC pipe creation, but the code path is directly visible in schema.ts and review.ts.

Comment thread src/adapters/github.ts
checks: [],
};

const bundle: Bundle = {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 (bug): Use the merge base as the bundle base SHA

GitHub mode computes the patch from mergeBase..headSha but still sends PR_BASE_SHA as bundle.baseSha. When the base branch has advanced, the prompt gives Codex a base/head pair that does not reproduce the supplied patch; in this PR, merge-base is d9ffd69... while baseSha is 474877f..., and the two ranges produce different file sets. Any reviewer command using bundle.baseSha inspects the wrong diff.

Suggested fix: Set bundle.baseSha to mergeBase, or add a separate baseRefSha field and keep baseSha as the actual diff base used for patch and changedFiles.

Comment thread src/shared/schema.ts Outdated
};
}

export function normalizeReview(raw: any): RawReview {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 (validation): Reject malformed model output before deriving verdicts

normalizeReview silently converts missing findings and residual_risks to empty arrays. review.ts now requires only summary and checked to be non-empty, so a critic response like {"summary":"ok","checked":["x"]} becomes no findings and no blocking risks, then deriveVerdict returns pass even though the review payload violated the required contract.

Suggested fix: Make the model-output boundary strict: require summary, findings, checked, and residual_risks with expected types, and reject invalid finding severities/categories/line numbers instead of defaulting them.

Comment thread src/adapters/local.ts
import { renderMarkdown } from "../shared/render";
import type { Bundle, PrMeta, ReviewResult } from "../shared/schema";

function git(args: string[], cwd: string): string {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 (runtime): Raise git output buffers for diff collection

The git helper uses spawnSync without maxBuffer, so Node's default output cap aborts commands that emit around 1 MiB. Needlefish collects full patches with git diff; a moderately large PR will throw ENOBUFS before review. The same helper shape exists in GitHub mode, so both local and action reviews can fail on large diffs.

Suggested fix: Set a larger maxBuffer on git spawnSync, matching the 64 MiB buffer already used for gh and Codex calls.

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needlefish PR Review

Verdict: ⛔ changes_requested
Base: d9ffd69Head: f0c2f0e

changes_requested: model-output normalization can turn malformed review JSON into a passing verdict.

Findings

P2: Reject malformed review JSON before deriving verdicts

src/shared/schema.ts:114-128

Why this breaks: normalizeReview silently converts missing findings and residual_risks to empty arrays. review.ts only requires a summary and non-empty checked list, so a critic response like {"summary":"ok","checked":["x"]} reaches deriveVerdict with no findings and no blocking risks, producing pass even though the model output violated the required contract.

Suggested fix: Parse this boundary strictly: require summary, findings, checked, and residual_risks with expected types, and reject invalid finding severities/categories/line numbers instead of defaulting them.

Validation: Add a small unit/self-check that normalize/review rejects {"summary":"ok","checked":["x"]}; run pnpm exec tsc --noEmit

Checked

  • No repo AGENTS.md exists at head/base; applied the AGENTS.md policy supplied in the review context.
  • Read the workflow, CLI, GitHub/local adapters, Codex orchestration, schema normalization, verdict derivation, prompts, README, package, and tsconfig.
  • Traced review output flow: runCodex -> extractJson -> normalizeReview -> review.ts shape checks -> deriveVerdict.
  • Checked workflow and supply-chain surfaces, including self-hosted pull_request gating, checkout ref resolution, install step, and check-run failure path.
  • Verified Codex CLI supports -s/--sandbox read-only and ran pnpm exec tsc --noEmit successfully.

Residual Risk

  • Did not execute the full GitHub Action or spawn real Codex review calls because this review was constrained to read-only inspection.

Comment thread src/shared/schema.ts Outdated
};
}

export function normalizeReview(raw: any): RawReview {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 (validation): Reject malformed review JSON before deriving verdicts

normalizeReview silently converts missing findings and residual_risks to empty arrays. review.ts only requires a summary and non-empty checked list, so a critic response like {"summary":"ok","checked":["x"]} reaches deriveVerdict with no findings and no blocking risks, producing pass even though the model output violated the required contract.

Suggested fix: Parse this boundary strictly: require summary, findings, checked, and residual_risks with expected types, and reject invalid finding severities/categories/line numbers instead of defaulting them.

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needlefish PR Review

Verdict: ⛔ changes_requested
Base: d9ffd69Head: 3cd57e2

changes_requested: one remaining validation bug can turn malformed blocking findings into a passing review.

Findings

P2: Reject invalid finding severity before deriving verdicts

src/shared/schema.ts:97-99

Why this breaks: normalizeFinding silently converts any unrecognized severity to P3. Because deriveVerdict only blocks on P0/P1/P2, a malformed model output such as severity "P1 " for an otherwise valid blocking finding is normalized to non-blocking P3 and the review can return pass.

Suggested fix: Make finding parsing fail closed: if severity is not exactly one of P0, P1, P2, or P3, throw a malformed-output error instead of defaulting to P3. Apply the same strictness to other required finding fields.

Validation: node --import tsx -e "import { normalizeReview } from './src/shared/schema.ts'; import { deriveVerdict } from './src/core/verdict.ts'; const r=normalizeReview({summary:'s',checked:['c'],residual_risks:[],findings:[{severity:'P1 ',title:'t',category:'bug',file:'src/x.ts',lineStart:1,lineEnd:1,confidence:1,whyItBreaks:'w',suggestedFix:'f',validation:'v'}]}); console.log(r.findings[0].severity); console.log(deriveVerdict(r.findings,r.residual_risks));"

Checked

  • Read changed surface map: workflow, dependency/config, docs, CLI, GitHub adapter, local adapter, Codex runner, review orchestration, verdict, renderer, schema
  • Verified no repo-local AGENTS.md exists on disk; applied the supplied review policy
  • Traced GitHub workflow refs through runGithub patch/base/head handling
  • Traced Codex output through extractJson, normalizeReview, normalizeFinding, and deriveVerdict
  • Confirmed prior PR metadata issues for merge-base, failure check conclusion, prompt replacement, git maxBuffer, and read-only Codex sandbox are addressed in this head
  • Ran pnpm exec tsc --noEmit successfully
  • Reproduced invalid severity normalization to P3 and resulting pass verdict with node --import tsx

Residual Risk

  • Live GitHub review/check posting and real Codex execution were not exercised; this review used read-only local inspection and targeted command reproduction.

Comment thread src/shared/schema.ts Outdated

export function normalizeFinding(raw: Partial<Finding>): Finding {
return {
severity: (SEVERITIES.includes(raw.severity as Severity)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 (validation): Reject invalid finding severity before deriving verdicts

normalizeFinding silently converts any unrecognized severity to P3. Because deriveVerdict only blocks on P0/P1/P2, a malformed model output such as severity "P1 " for an otherwise valid blocking finding is normalized to non-blocking P3 and the review can return pass.

Suggested fix: Make finding parsing fail closed: if severity is not exactly one of P0, P1, P2, or P3, throw a malformed-output error instead of defaulting to P3. Apply the same strictness to other required finding fields.

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needlefish PR Review

Verdict: ⛔ changes_requested
Base: d9ffd69Head: 505f89d

One actionable workflow bug: manual dispatch resolves PR refs before any checkout without telling gh which repository to query.

Findings

P2: Pass the repository when resolving manual PR refs

.github/workflows/review.yml:38-39

Why this breaks: On workflow_dispatch this step runs before actions/checkout, so there is no local git repository for gh pr view to infer the repo from. The workflow sets GITHUB_REPOSITORY, but gh documents -R/--repo or GH_REPO for explicit repo selection, so manual reviews can fail in Resolve PR refs before reaching checkout.

Suggested fix: Call gh pr view "$PR_NUM" -R "$GITHUB_REPOSITORY" ... for both head and base, or set GH_REPO: ${{ github.repository }} in the step environment.

Validation: Trigger workflow_dispatch with pr_number and confirm Resolve PR refs completes before checkout.

Checked

  • Read the supplied AGENTS.md review policy and found no repo-local AGENTS.md files with find.
  • Inspected the changed workflow, CLI, GitHub/local adapters, Codex runner, prompt orchestration, renderer, classifier, and schema normalization.
  • Ran pnpm check; TypeScript completed with tsc --noEmit.
  • Checked gh pr view --help; explicit repository selection is -R/--repo or GH_REPO, not GITHUB_REPOSITORY.

Comment thread .github/workflows/review.yml Outdated
echo "head=$EVENT_HEAD" >> "$GITHUB_OUTPUT"
echo "base=$EVENT_BASE" >> "$GITHUB_OUTPUT"
else
echo "head=$(gh pr view "$PR_NUM" --json headRefOid -q .headRefOid)" >> "$GITHUB_OUTPUT"

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 (runtime): Pass the repository when resolving manual PR refs

On workflow_dispatch this step runs before actions/checkout, so there is no local git repository for gh pr view to infer the repo from. The workflow sets GITHUB_REPOSITORY, but gh documents -R/--repo or GH_REPO for explicit repo selection, so manual reviews can fail in Resolve PR refs before reaching checkout.

Suggested fix: Call gh pr view "$PR_NUM" -R "$GITHUB_REPOSITORY" ... for both head and base, or set GH_REPO: ${{ github.repository }} in the step environment.

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needlefish PR Review

Verdict: ⛔ changes_requested
Base: d9ffd69Head: fd23bb5

changes_requested: malformed model findings can be coerced into blocking PR reviews.

Findings

P2: Reject malformed findings before deriving a verdict

src/shared/schema.ts:95-110

Why this breaks: normalizeFinding coerces missing or invalid finding fields into defaults, including a blocking P1 severity plus empty file and line 0. If Codex or the critic returns a malformed finding object, review.ts can derive changes_requested and render an unanchored '(no file):0' finding instead of failing the review-output boundary.

Suggested fix: Validate required finding fields strictly: severity/category must be exact allowed values, file/title/whyItBreaks/suggestedFix must be non-empty strings, lineStart/lineEnd must be positive finite numbers, and invalid findings should throw before deriveVerdict runs.

Validation: Add a self-check that normalizeReview rejects a finding with missing file/line or invalid severity, then run pnpm exec tsc --noEmit.

Checked

  • Read supplied AGENTS.md review policy; no repo AGENTS.md exists in the workspace.
  • Mapped changed surfaces: workflow, dependency/lockfile, prompts, CLI/adapters/core/shared source, docs.
  • Inspected GitHub workflow ref resolution, checkout, permissions, install, and review/check posting paths.
  • Traced review(), normalizeReview(), normalizeFinding(), deriveVerdict(), renderMarkdown(), runGithub(), and runLocal call paths.
  • Ran pnpm exec tsc --noEmit successfully.
  • Verified current Codex CLI supports -s/--sandbox read-only.

Residual Risk

  • CLI smoke tests through tsx could not run in this managed read-only sandbox because tsx failed to create its IPC pipe with EPERM; this is an environment restriction, not evidence of a repo defect.

Comment thread src/shared/schema.ts Outdated
"validation",
];

export function normalizeFinding(raw: Partial<Finding>): Finding {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 (validation): Reject malformed findings before deriving a verdict

normalizeFinding coerces missing or invalid finding fields into defaults, including a blocking P1 severity plus empty file and line 0. If Codex or the critic returns a malformed finding object, review.ts can derive changes_requested and render an unanchored '(no file):0' finding instead of failing the review-output boundary.

Suggested fix: Validate required finding fields strictly: severity/category must be exact allowed values, file/title/whyItBreaks/suggestedFix must be non-empty strings, lineStart/lineEnd must be positive finite numbers, and invalid findings should throw before deriveVerdict runs.

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needlefish PR Review

Verdict: ⛔ changes_requested
Base: d9ffd69Head: 8cee022

changes_requested: malformed first-pass findings can be silently dropped before the critic, allowing a false pass.

Findings

P2: Fail on malformed candidate findings

src/core/review.ts:29

Why this breaks: The first Codex pass is parsed with strict=false, so normalizeReview filters out malformed findings instead of failing. A candidate output with a real P1/P2 finding that has one malformed required field is reduced to findings: [], then the critic can only prune that empty list and deriveVerdict returns pass.

Suggested fix: Parse the candidate review strictly, or make lenient parsing fail if any finding was malformed instead of silently filtering it out.

Validation: node --import tsx -e 'import { normalizeReview } from "./src/shared/schema.ts"; const raw={summary:"ok",checked:["x"],residual_risks:[],findings:[{severity:"P1",category:"bug",file:"",lineStart:10,lineEnd:10,title:"real",whyItBreaks:"breaks",suggestedFix:"fix"}]}; console.log(normalizeReview(raw,false).findings.length)'

Checked

  • Read the provided AGENTS.md policy; no AGENTS.md file exists in this repo checkout.
  • Reviewed workflow, CLI, GitHub/local adapters, Codex orchestration, schema normalization, rendering, prompts, package and lockfile surfaces.
  • Ran pnpm check successfully: tsc --noEmit exited 0.
  • Reproduced normalizeReview(raw,false) dropping a malformed P1 finding and deriveVerdict([], []) returning pass.

Residual Risk

  • Network access to GitHub API was unavailable, so live PR metadata endpoint behavior could not be inspected; local code paths and gh help output were checked instead.

Comment thread src/core/review.ts Outdated
);

const rawCandidate = runCodex(reviewPrompt, { repoPath: bundle.repoPath });
const candidate = normalizeReview(extractJson(rawCandidate), false);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 (validation): Fail on malformed candidate findings

The first Codex pass is parsed with strict=false, so normalizeReview filters out malformed findings instead of failing. A candidate output with a real P1/P2 finding that has one malformed required field is reduced to findings: [], then the critic can only prune that empty list and deriveVerdict returns pass.

Suggested fix: Parse the candidate review strictly, or make lenient parsing fail if any finding was malformed instead of silently filtering it out.

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needlefish PR Review

Verdict: ⛔ changes_requested
Base: d9ffd69Head: 49350eb

One actionable defect: the GitHub review fallback can attach stale findings to the live PR head.

Findings

P2: Pin fallback reviews to the reviewed commit

src/adapters/github.ts:101-110

Why this breaks: The primary review payload pins commit_id to the SHA that was reviewed, but the fallback path omits commit_id. If inline comments are rejected and a newer commit has been pushed while the old run is still finishing, this fallback can post REQUEST_CHANGES against the current PR head using findings generated from the old checkout.

Suggested fix: Include commit_id: headSha in the fallback request, or retry the same JSON payload without comments while preserving commit_id.

Validation: Force the inline-comment path to fail, push a newer commit before fallback posts, then verify the created review's commit_id remains PR_HEAD_SHA.

Checked

  • Applied the supplied AGENTS.md review policy; no repository AGENTS.md file was present.
  • Mapped changed surfaces, with workflow and GitHub adapter treated as high-risk.
  • Read the GitHub workflow, CLI, local adapter, GitHub adapter, Codex runner, review orchestration, verdict, rendering, classification, schema normalization, package scripts, and tsconfig.
  • Traced runGithub through ref resolution, diff construction, postReview, postCheck, and the failure path.
  • Checked prior PR review comments in the bundle against current head changes.
  • Ran pnpm exec tsc --noEmit successfully.
  • Ran git diff --check successfully.

Residual Risk

  • Network was restricted, so live GitHub API behavior was not exercised; review conclusions are based on local code, GitHub CLI help, and the supplied diff.

Comment thread src/adapters/github.ts
payload
);
} catch {
gh(

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 (runtime): Pin fallback reviews to the reviewed commit

The primary review payload pins commit_id to the SHA that was reviewed, but the fallback path omits commit_id. If inline comments are rejected and a newer commit has been pushed while the old run is still finishing, this fallback can post REQUEST_CHANGES against the current PR head using findings generated from the old checkout.

Suggested fix: Include commit_id: headSha in the fallback request, or retry the same JSON payload without comments while preserving commit_id.

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needlefish PR Review

Verdict: ⛔ changes_requested
Base: d9ffd69Head: f139346

changes_requested: two concrete security defects let PR-controlled code or the model subprocess bypass the deterministic review gate.

Findings

P1: Run the reviewer from trusted code

.github/workflows/review.yml:59

Why this breaks: The workflow checks out the PR head, installs it, then executes src/cli.ts from that same checkout with write-scoped PR/check permissions. A PR that changes Needlefish itself can replace the review logic and post a passing review/check, so the merge gate is controlled by the candidate code it is supposed to evaluate.

Suggested fix: Use a trusted checkout of Needlefish, such as the base branch or a pinned release, as the reviewer tool, and pass the PR checkout only as the target repo via --repo.

Validation: Open a test PR that changes src/cli.ts to short-circuit success; the workflow should still run the trusted reviewer and not the modified PR copy.

P1: Scrub write tokens from the Codex subprocess

src/shared/codex.ts:23-29

Why this breaks: spawnSync is called without an env override, so codex exec inherits the parent environment. In GitHub mode the parent step provides GH_TOKEN with pull-requests:write and checks:write, which lets the model-driven subprocess run gh api and mutate reviews or checks directly, bypassing Needlefish's parser and deterministic verdict.

Suggested fix: Pass an explicit sanitized env to spawnSync that removes GH_TOKEN, GITHUB_TOKEN, and other GitHub write-token variables from the Codex child while keeping only the auth/config needed to run Codex.

Validation: Run runCodex with a CODEX_BIN stub that prints process.env.GH_TOKEN; it should be absent even when the parent process has GH_TOKEN set.

Checked

  • Applied the provided AGENTS.md review policy; rg found no repo-local AGENTS.md file.
  • Inspected changed workflow, GitHub adapter, Codex subprocess wrapper, schema normalization, verdict derivation, and CLI entrypoint.
  • Ran pnpm check; TypeScript completed successfully.
  • Verified Node child processes inherit environment variables by default with a small spawnSync check.
  • Searched for GH_TOKEN, GITHUB_TOKEN, actions/checkout, pnpm install, codex exec, and spawnSync usage across the changed files.

Residual Risk

  • Did not execute the live GitHub Action or post to GitHub APIs; verdict is based on static workflow and subprocess inspection plus local TypeScript verification.

GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
PR_BASE_SHA: ${{ steps.refs.outputs.base }}
PR_HEAD_SHA: ${{ steps.refs.outputs.head }}
run: ./node_modules/.bin/tsx src/cli.ts --github --pr ${{ github.event.inputs.pr_number || github.event.pull_request.number }}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 (security): Run the reviewer from trusted code

The workflow checks out the PR head, installs it, then executes src/cli.ts from that same checkout with write-scoped PR/check permissions. A PR that changes Needlefish itself can replace the review logic and post a passing review/check, so the merge gate is controlled by the candidate code it is supposed to evaluate.

Suggested fix: Use a trusted checkout of Needlefish, such as the base branch or a pinned release, as the reviewer tool, and pass the PR checkout only as the target repo via --repo.

Comment thread src/shared/codex.ts
const args = ["exec", "--color", "never", "-s", "read-only", "--output-last-message", lastMsg];
if (model) args.push("-m", model);

const res = spawnSync(bin, args, {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 (security): Scrub write tokens from the Codex subprocess

spawnSync is called without an env override, so codex exec inherits the parent environment. In GitHub mode the parent step provides GH_TOKEN with pull-requests:write and checks:write, which lets the model-driven subprocess run gh api and mutate reviews or checks directly, bypassing Needlefish's parser and deterministic verdict.

Suggested fix: Pass an explicit sanitized env to spawnSync that removes GH_TOKEN, GITHUB_TOKEN, and other GitHub write-token variables from the Codex child while keeping only the auth/config needed to run Codex.

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needlefish PR Review

Verdict: ⛔ changes_requested
Base: d9ffd69Head: 68e5e34

changes_requested: one security defect lets PR-controlled code operate the review merge gate.

Findings

P1: Run the reviewer from trusted code

.github/workflows/review.yml:51-59

Why this breaks: The job checks out the PR head, installs that checkout, then runs src/cli.ts from the same PR with pull-requests: write and checks: write. A PR can change package lifecycle behavior or the Needlefish CLI itself to post a passing review/check, so the candidate code controls the merge gate that is supposed to review it.

Suggested fix: Run Needlefish from a trusted checkout such as the base branch or a pinned release, and check out the PR only as the target repository passed via --repo.

Validation: Open a test PR that changes src/cli.ts to skip review and post success; verify the current workflow executes that PR-controlled file, then verify the fixed workflow executes the trusted checkout instead.

Checked

  • Read the supplied AGENTS.md policy; no AGENTS.md exists in the repo or base commit.
  • Inspected changed workflow, GitHub adapter, local adapter, CLI parser, Codex invocation, schema normalization, verdict derivation, prompts, package metadata, and README contract.
  • Verified previous review issues for read-only Codex sandboxing, GitHub token scrubbing, failure check conclusion, merge-base diffing, prompt replacement callbacks, strict output parsing, and git maxBuffer are fixed in head.
  • pnpm exec tsc --noEmit exited 0.
  • git diff --check d9ffd69b3de124964aa76d5179c6244e513fc34d...68e5e34ca1f099547b00db8c40f3fe12cb750219 exited 0.
  • codex exec --help confirms -s read-only is the sandbox flag used by src/shared/codex.ts.

Residual Risk

  • Could not run the tsx CLI smoke check in this read-only sandbox because tsx failed to create its IPC pipe with EPERM; typecheck covered compile-time behavior.
  • No live GitHub Actions run or GitHub API posting was executed; workflow behavior was reviewed statically from the checked-out files.


- name: Install
shell: zsh -l {0}
run: corepack enable && pnpm install --frozen-lockfile

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 (security): Run the reviewer from trusted code

The job checks out the PR head, installs that checkout, then runs src/cli.ts from the same PR with pull-requests: write and checks: write. A PR can change package lifecycle behavior or the Needlefish CLI itself to post a passing review/check, so the candidate code controls the merge gate that is supposed to review it.

Suggested fix: Run Needlefish from a trusted checkout such as the base branch or a pinned release, and check out the PR only as the target repository passed via --repo.

@frankekn

Copy link
Copy Markdown
Owner Author

Shipped to main as v0.1 (commit 094a271) with the trusted-tool self-review workflow. Dogfood complete: the tool found ~16 real defects in itself across the run, all fixed. The implement branch is kept for the fix history.

@frankekn frankekn closed this Jun 26, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant