Implement needlefish: core, adapters, prompts by frankekn · Pull Request #1 · frankekn/needlefish

frankekn · 2026-06-26T07:47:52Z

Scaffold: schema + deterministic verdict, 2-call Codex orchestration (review + adversarial critic), local + github adapters, read-only by default.

Self-review trigger: the needlefish-review action on main will review this PR using the PR's own src/.

Dogfooded locally: reviewed itself, found 2 real bugs (git->gh for pr metadata; broken pnpm allowBuilds placeholder), both fixed.

github-actions

Needlefish PR Review

Verdict: ⛔ changes_requested
Base: 9ed22eb → Head: 612df2d

changes_requested: two concrete security defects let review runs mutate worktrees or execute untrusted PR code on a persistent runner.

Findings

P1: Force Codex reviews into read-only sandbox mode

src/shared/codex.ts:20

Why this breaks: Needlefish advertises read-only review, but this invocation inherits the caller's Codex config. On this runner the local config is approval_policy=never and sandbox_mode=danger-full-access, so the model spawned by a review can run mutating commands despite the prompt saying not to edit files.

Suggested fix: Add explicit Codex exec flags such as --sandbox read-only and a non-interactive approval policy/config override so review runs cannot write regardless of user config.

Validation: codex exec --sandbox read-only --color never "print exactly ok"

P1: Do not run pull_request code on a persistent self-hosted runner

.github/workflows/review.yml:18-28

Why this breaks: The workflow triggers on pull_request, checks out the PR head, then runs pnpm install on a self-hosted runner. A PR can change install scripts or repository code and execute arbitrary commands on Frank's persistent machine before review logic runs.

Suggested fix: Gate this job to same-repository PRs or run it only inside an ephemeral isolated runner/container; do not execute fork PR code on the persistent self-hosted host.

Validation: Open a fork PR that changes package install behavior or src/cli.ts and confirm the workflow no longer runs on the persistent self-hosted runner.

Checked

No AGENTS.md file exists under /Users/termtek/actions-runner/_work/needlefish/needlefish; applied the provided review policy from the context bundle.
Inspected changed workflow, CLI, local/GitHub adapters, review orchestration, Codex runner, schema normalization, and package config.
Ran pnpm check; TypeScript completed with tsc --noEmit.
Verified Codex CLI supports --sandbox read-only and that current local Codex config is sandbox_mode = "danger-full-access" with approval_policy = "never".

Residual Risk

I did not exercise live GitHub review/check-run posting against PR #1, so API-shape issues beyond static inspection are not ruled out.

github-actions · 2026-06-26T07:57:42Z

+
+  const tmp = mkdtempSync(path.join(os.tmpdir(), "needlefish-"));
+  const lastMsg = path.join(tmp, "last.txt");
+  const args = ["exec", "--color", "never", "--output-last-message", lastMsg];


P1 (security): Force Codex reviews into read-only sandbox mode

Needlefish advertises read-only review, but this invocation inherits the caller's Codex config. On this runner the local config is approval_policy=never and sandbox_mode=danger-full-access, so the model spawned by a review can run mutating commands despite the prompt saying not to edit files.

Suggested fix: Add explicit Codex exec flags such as --sandbox read-only and a non-interactive approval policy/config override so review runs cannot write regardless of user config.

github-actions · 2026-06-26T07:57:42Z

+
+jobs:
+  review:
+    runs-on: self-hosted


P1 (security): Do not run pull_request code on a persistent self-hosted runner

The workflow triggers on pull_request, checks out the PR head, then runs pnpm install on a self-hosted runner. A PR can change install scripts or repository code and execute arbitrary commands on Frank's persistent machine before review logic runs.

Suggested fix: Gate this job to same-repository PRs or run it only inside an ephemeral isolated runner/container; do not execute fork PR code on the persistent self-hosted host.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a26882a28a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-26T08:00:00Z

+      repo,
+      headSha,
+      null,
+      "neutral",


Fail the check when the review errors

In GitHub Action mode, this failure path reports the advertised Needlefish check as neutral, but GitHub allows required checks with neutral conclusions to satisfy branch protection (docs). If review(bundle) or posting the review throws, a repo that requires the Needlefish check can still merge the PR despite the review never completing, contrary to the “failed review never passes” behavior; use a failing conclusion for this path instead.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-06-26T08:00:01Z

+
+  const tmp = mkdtempSync(path.join(os.tmpdir(), "needlefish-"));
+  const lastMsg = path.join(tmp, "last.txt");
+  const args = ["exec", "--color", "never", "--output-last-message", lastMsg];


Force Codex into read-only mode

The tool is advertised as read-only, but this codex exec invocation does not pass --sandbox read-only or --ask-for-approval never, and the Codex CLI documents that the sandbox defaults to the user configuration (docs). When a developer or runner has Codex configured for workspace-write or full access, the review model can edit the target repo while supposedly only reviewing it; make the read-only sandbox explicit for these calls.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-06-26T08:00:01Z

+
+jobs:
+  review:
+    runs-on: self-hosted


Isolate self-hosted PR jobs from untrusted code

For PRs from forks or any repo with untrusted contributors, this pull_request workflow runs on a persistent self-hosted runner and then checks out and installs the PR's code. GitHub warns that self-hosted runners can be persistently compromised by untrusted workflow code (docs); use a GitHub-hosted or ephemeral isolated runner for PRs, or gate this job to trusted branches/authors only.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-06-26T08:00:01Z

+      repo,
+      headSha,
+      null,
+      "neutral",


Fail the check when the review errors

In GitHub Action mode, this failure path reports the advertised Needlefish check as neutral, but GitHub allows required checks with neutral conclusions to satisfy branch protection (docs). If review(bundle) or posting the review throws, a repo that requires the Needlefish check can still merge the PR despite the review never completing, contrary to the “failed review never passes” behavior; use a failing conclusion for this path instead.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-06-26T08:00:01Z

+  const headSha = pr.head?.sha ?? git(["rev-parse", "HEAD"], cwd);
+  const baseSha = pr.base?.sha ?? git(["merge-base", "origin/" + (pr.base?.ref ?? "main"), "HEAD"], cwd);
+
+  const patch = ghText(["pr", "diff", String(prNumber)]);


Review the checked-out PR SHA, not the live head

This fetches the current PR head and diff from GitHub after the workflow has already checked out the event's head SHA. If a newer commit is pushed while an older run is still installing or queued, the run will inspect the old workspace but post the review/check against the newer pr.head.sha, allowing a stale run to approve or fail the wrong commit; derive the head SHA and patch from the checked-out/event SHA or cancel superseded runs.

Useful? React with 👍 / 👎.

github-actions

Needlefish PR Review

Verdict: ⛔ changes_requested
Base: 9ed22eb → Head: 5db438c

changes_requested: GitHub mode can report a failed review as mergeable and can review/post against a different PR head than the checked-out code.

Findings

P1: Fail the Needlefish check when review execution fails

src/adapters/github.ts:191

Why this breaks: When review(), postReview(), or the successful postCheck() path throws, the catch block creates the same Needlefish check run with conclusion neutral. Repositories that require this check can treat neutral as non-failing, so a PR can merge even though Needlefish never completed its review.

Suggested fix: Post the failure-path check with conclusion failure instead of neutral, or use a distinct non-required check name for infrastructure failures.

Validation: Force review(bundle) to throw in --github mode and verify the check-runs payload uses conclusion=failure for name=Needlefish.

P1: Review the checked-out PR SHA, not the live PR head

src/adapters/github.ts:140-143

Why this breaks: The workflow checks out github.event.pull_request.head.sha, but runGithub then reads the current PR head and diff from GitHub. If a newer commit is pushed while an older workflow run is queued or installing, this run inspects the old workspace while posting the review and check against the newer pr.head.sha, allowing stale results to approve or fail the wrong commit.

Suggested fix: Derive headSha from git rev-parse HEAD or the event SHA passed by the workflow, and build the patch from the checked-out workspace (git diff baseSha HEAD) instead of gh pr diff's live PR state.

Validation: Start a run for commit A, push commit B before the review step, and verify the run posts against A or cancels instead of posting a check on B.

Checked

Read repository policy supplied in the context bundle; no AGENTS.md exists in the checked-out tree.
Mapped changed surfaces: workflow, docs, dependency lockfile, CLI/source runtime, Codex invocation, rendering/schema.
Inspected GitHub adapter failure handling, check-run posting, PR metadata collection, head/base/patch selection, and inline review fallback.
Verified prior review comments: workflow is gated to same-repository PRs and Codex exec passes -s read-only.
Searched for existing Needlefish check behavior, neutral conclusions, headSha usage, GitHub workflow checkout SHA, and read-only Codex invocation.
Ran pnpm check; TypeScript completed with tsc --noEmit.

github-actions · 2026-06-26T08:04:16Z

+      repo,
+      headSha,
+      null,
+      "neutral",


P1 (runtime): Fail the Needlefish check when review execution fails

When review(), postReview(), or the successful postCheck() path throws, the catch block creates the same Needlefish check run with conclusion neutral. Repositories that require this check can treat neutral as non-failing, so a PR can merge even though Needlefish never completed its review.

Suggested fix: Post the failure-path check with conclusion failure instead of neutral, or use a distinct non-required check name for infrastructure failures.

github-actions · 2026-06-26T08:04:16Z

+    "api",
+    `repos/${repo}/pulls/${prNumber}`,
+  ]);
+  const headSha = pr.head?.sha ?? git(["rev-parse", "HEAD"], cwd);


P1 (runtime): Review the checked-out PR SHA, not the live PR head

The workflow checks out github.event.pull_request.head.sha, but runGithub then reads the current PR head and diff from GitHub. If a newer commit is pushed while an older workflow run is queued or installing, this run inspects the old workspace while posting the review and check against the newer pr.head.sha, allowing stale results to approve or fail the wrong commit.

Suggested fix: Derive headSha from git rev-parse HEAD or the event SHA passed by the workflow, and build the patch from the checked-out workspace (git diff baseSha HEAD) instead of gh pr diff's live PR state.

…EAD, not live pr head

…and locality gap

frankekn

diagnostic ping

frankekn

diagnostic ping 2

frankekn

empty-comments test

frankekn

no-comments-key test

frankekn

Needlefish PR Review

Verdict: pass

No findings.

Checked

a
b

…_dispatch)

github-actions

Needlefish PR Review

Verdict: ⛔ changes_requested
Base: d9ffd69 → Head: 7df7bae

changes requested: prompt assembly can corrupt reviewed diffs, and manual workflow dispatch reviews the wrong ref.

Findings

P2: Use callback replacements for prompt payloads

src/core/review.ts:23-33

Why this breaks: String.replace treats dollar sequences such as $& in replacement strings as substitution tokens. Because the replacement payload is the JSON bundle and raw patch, any reviewed diff containing those sequences is rewritten before Codex sees it, so Needlefish can review corrupted code and miss or invent findings.

Suggested fix: Pass replacement callbacks for dynamic payloads, e.g. replace("{{BUNDLE}}", () => JSON.stringify(bundle, null, 2)) and replace("{{PATCH}}", () => bundle.patch).

Validation: node -e 'console.log("A {{PATCH}} B".replace("{{PATCH}}", "x$&y")); console.log("A {{PATCH}} B".replace("{{PATCH}}", () => "x$&y"))'

P2: Check out the requested PR for manual dispatch

.github/workflows/review.yml:29-35

Why this breaks: On workflow_dispatch there is no github.event.pull_request object, so checkout falls back to github.ref and BASE_REF falls back to main while the run step still passes the input PR number. A manual review therefore inspects the dispatch branch, often main with an empty diff, instead of the requested PR head.

Suggested fix: For workflow_dispatch, resolve the PR head/base from the input PR number before review and fetch/checkout that head, or remove the manual trigger until it can supply PR_HEAD_SHA, PR_BASE_SHA, and the matching workspace checkout.

Validation: Trigger workflow_dispatch with pr_number for an open PR from the default branch context and verify git rev-parse HEAD equals that PR head SHA and git diff base..HEAD is non-empty.

Checked

Read supplied review policy; repo has no AGENTS.md file to apply.
Reviewed changed workflow, CLI, adapters, core review orchestration, Codex wrapper, schema, render, classify, package, and lockfile surfaces.
Traced GitHub workflow inputs through runGithub head/base/patch selection.
Ran pnpm exec tsc --noEmit successfully.
Verified codex exec supports -s/--sandbox read-only.
Reproduced JavaScript String.replace replacement-token expansion with node.

Residual Risk

Did not execute the GitHub workflow or post GitHub reviews/checks from this read-only review; workflow_dispatch behavior is derived from the workflow expressions and runGithub fallback code.

github-actions · 2026-06-26T09:35:21Z

+export async function review(
+  bundle: Bundle
+): Promise<ReviewResult> {
+  const reviewPrompt = loadPrompt("review.md").replace(


P2 (bug): Use callback replacements for prompt payloads

String.replace treats dollar sequences such as $& in replacement strings as substitution tokens. Because the replacement payload is the JSON bundle and raw patch, any reviewed diff containing those sequences is rewritten before Codex sees it, so Needlefish can review corrupted code and miss or invent findings.

Suggested fix: Pass replacement callbacks for dynamic payloads, e.g. replace("{{BUNDLE}}", () => JSON.stringify(bundle, null, 2)) and replace("{{PATCH}}", () => bundle.patch).

github-actions · 2026-06-26T09:35:21Z

+      - name: Checkout
+        uses: actions/checkout@v4
+        with:
+          ref: ${{ github.event.pull_request.head.sha || github.ref }}


P2 (runtime): Check out the requested PR for manual dispatch

On workflow_dispatch there is no github.event.pull_request object, so checkout falls back to github.ref and BASE_REF falls back to main while the run step still passes the input PR number. A manual review therefore inspects the dispatch branch, often main with an empty diff, instead of the requested PR head.

Suggested fix: For workflow_dispatch, resolve the PR head/base from the input PR number before review and fetch/checkout that head, or remove the manual trigger until it can supply PR_HEAD_SHA, PR_BASE_SHA, and the matching workspace checkout.

…ngled

github-actions

Needlefish PR Review

Verdict: ⛔ changes_requested
Base: 474877f → Head: 3f78261

changes_requested: GitHub mode can review the wrong diff, and malformed model output can be normalized into a passing review.

Findings

P1: Compare PR heads against the merge base

src/adapters/github.ts:138-140

Why this breaks: PR_BASE_SHA is supplied from the workflow as the current base-branch tip, but GitHub PR review diffs must be computed from the merge base. When the base branch has advanced since the PR branched, git diff baseSha HEAD reviews unrelated base changes and can also miss files that are part of the PR. In this checkout, git merge-base 474877f1f2da1683339040fe5bf563ab2c48b509 3f782614d7df761c880c5c743177273044f5ee20 is d9ffd69..., not the supplied base SHA, and the two ranges produce different file sets.

Suggested fix: Compute a diff base with git merge-base baseSha HEAD in runGithub before building patch and changedFiles, and put that merge-base SHA in the bundle.

Validation: Create or use a PR branch whose base branch advanced after branching; verify needlefish --github --pr Nsends the same files asgit diff --name-only $(git merge-base HEAD) HEAD.

P2: Reject malformed review JSON instead of passing it

src/shared/schema.ts:114-128

Why this breaks: normalizeReview({}) returns empty findings and empty blocking risks, so deriveVerdict turns malformed Codex or critic output into pass. A bad model response should fail the Needlefish check, not green-light the PR with an empty review.

Suggested fix: Replace the permissive defaults with strict boundary parsing: require summary, findings, checked, and residual_risks to have the expected types, reject invalid severities/categories/line numbers, and throw so GitHub mode posts the failure check.

Validation: Stub the Codex output to {}and confirm review execution fails instead of producingpass; then run pnpm check.

Checked

No local AGENTS.md exists; the context bundle also has agentsMd=null.
Reviewed changed workflow, adapters, core review flow, schema normalization, prompts, and README contract.
Ran pnpm check (tsc --noEmit): passed.
Ran git diff --check for the reviewed range: passed.
Compared supplied base/head against their merge base and confirmed the reviewed range differs from the PR merge-base range.
Checked gh pr view --json fields; baseRef is invalid, but that output is unused and command substitution does not fail the step, so it was not raised as a blocking finding.

github-actions · 2026-06-26T10:45:44Z

+
+  const pr = gh(["api", `repos/${repo}/pulls/${prNumber}`]);
+  const headSha = process.env.PR_HEAD_SHA || git(["rev-parse", "HEAD"], cwd);
+  const baseSha = process.env.PR_BASE_SHA || git(["merge-base", "origin/main", "HEAD"], cwd);


P1 (bug): Compare PR heads against the merge base

PR_BASE_SHA is supplied from the workflow as the current base-branch tip, but GitHub PR review diffs must be computed from the merge base. When the base branch has advanced since the PR branched, git diff baseSha HEAD reviews unrelated base changes and can also miss files that are part of the PR. In this checkout, git merge-base 474877f1f2da1683339040fe5bf563ab2c48b509 3f782614d7df761c880c5c743177273044f5ee20 is d9ffd69..., not the supplied base SHA, and the two ranges produce different file sets.

Suggested fix: Compute a diff base with git merge-base baseSha HEAD in runGithub before building patch and changedFiles, and put that merge-base SHA in the bundle.

github-actions · 2026-06-26T10:45:44Z

+  };
+}
+
+export function normalizeReview(raw: any): RawReview {


P2 (validation): Reject malformed review JSON instead of passing it

normalizeReview({}) returns empty findings and empty blocking risks, so deriveVerdict turns malformed Codex or critic output into pass. A bad model response should fail the Needlefish check, not green-light the PR with an empty review.

Suggested fix: Replace the permissive defaults with strict boundary parsing: require summary, findings, checked, and residual_risks to have the expected types, reject invalid severities/categories/line numbers, and throw so GitHub mode posts the failure check.

… output as error

github-actions

Needlefish PR Review

Verdict: ⛔ changes_requested
Base: 474877f → Head: 908ef40

changes requested: malformed model output can still be normalized into a passing review.

Findings

P2: Reject malformed review JSON before deriving a verdict

src/shared/schema.ts:114-128

Why this breaks: normalizeReview silently converts missing findings and residual_risks to empty arrays, and invalid finding fields to defaults. A critic response like {"summary":"ok"} is accepted by review.ts, then deriveVerdict sees no blocking findings or risks and returns pass even though the review output was malformed.

Suggested fix: Make the model-output boundary strict: require summary, findings, checked, and residual_risks with expected types, reject invalid severities/categories/line numbers, and throw instead of defaulting missing arrays to empty.

Validation: Add a small check that normalizeReview({ summary: 'ok' }) throws, then run pnpm exec tsc --noEmit

Checked

Read the supplied AGENTS.md review policy; no repo AGENTS.md exists in the worktree
Mapped workflow, dependency, docs, CLI, core, and schema surfaces from the changed files
Inspected GitHub workflow dispatch/head/base handling and runGithub diff construction
Verified Codex exec uses -s read-only against the local codex exec --help output
Ran pnpm exec tsc --noEmit successfully
Checked pnpm config recognition of pnpm-workspace.yaml allowBuilds
Confirmed gh pr view does not support baseRef; the current command emits an error but only populates an unused base_ref output

Residual Risk

Runtime tsx snippets could not be executed in this sandbox because tsx failed to create its IPC pipe, but static inspection and tsc were sufficient for the reported validation bug.

github-actions · 2026-06-26T10:55:29Z

+  };
+}
+
+export function normalizeReview(raw: any): RawReview {


P2 (validation): Reject malformed review JSON before deriving a verdict

normalizeReview silently converts missing findings and residual_risks to empty arrays, and invalid finding fields to defaults. A critic response like {"summary":"ok"} is accepted by review.ts, then deriveVerdict sees no blocking findings or risks and returns pass even though the review output was malformed.

Suggested fix: Make the model-output boundary strict: require summary, findings, checked, and residual_risks with expected types, reject invalid severities/categories/line numbers, and throw instead of defaulting missing arrays to empty.

… base_ref field

github-actions

Needlefish PR Review

Verdict: ⛔ changes_requested
Base: 474877f → Head: a79deb4

changes_requested: one model-output validation bug can turn malformed review JSON into a passing verdict.

Findings

P2: Reject malformed model output before deriving a verdict

src/shared/schema.ts:114-129

Why this breaks: normalizeReview silently converts missing findings, checked, and residual_risks fields to empty arrays. review.ts only requires a non-empty summary, so a critic response like {"summary":"ok"} normalizes to no findings/no blocking risks and deriveVerdict returns pass, green-lighting a PR even though the review payload did not match the required contract.

Suggested fix: Make the model-output boundary strict: require summary, findings, checked, and residual_risks with expected types, and reject invalid finding severities/categories/line numbers instead of defaulting them.

Validation: node --import tsx -e 'import { normalizeReview } from "./src/shared/schema.ts"; import { deriveVerdict } from "./src/core/verdict.ts"; const r=normalizeReview({summary:"ok"}); console.log(JSON.stringify({r, verdict: deriveVerdict(r.findings, r.residual_risks)}));'

Checked

No repository AGENTS.md exists under /Users/termtek/actions-runner/_work/needlefish/needlefish; applied the supplied review policy.
Inspected the changed workflow, GitHub adapter, local adapter, CLI, review orchestration, Codex invocation, renderer, classifier, and schema.
Ran pnpm check; TypeScript completed successfully.
Confirmed normalizeReview({summary:"ok"}) produces empty findings/residual risks and deriveVerdict returns pass.

github-actions · 2026-06-26T11:15:08Z

+  };
+}
+
+export function normalizeReview(raw: any): RawReview {


P2 (validation): Reject malformed model output before deriving a verdict

normalizeReview silently converts missing findings, checked, and residual_risks fields to empty arrays. review.ts only requires a non-empty summary, so a critic response like {"summary":"ok"} normalizes to no findings/no blocking risks and deriveVerdict returns pass, green-lighting a PR even though the review payload did not match the required contract.

Suggested fix: Make the model-output boundary strict: require summary, findings, checked, and residual_risks with expected types, and reject invalid finding severities/categories/line numbers instead of defaulting them.

…d a pass

github-actions

Needlefish PR Review

Verdict: ⛔ changes_requested
Base: 474877f → Head: cab9f99

changes_requested: three concrete bugs remain in PR range metadata, model-output validation, and large diff collection.

Findings

P2: Use the merge base as the bundle base SHA

src/adapters/github.ts:163-166

Why this breaks: GitHub mode computes the patch from mergeBase..headSha but still sends PR_BASE_SHA as bundle.baseSha. When the base branch has advanced, the prompt gives Codex a base/head pair that does not reproduce the supplied patch; in this PR, merge-base is d9ffd69... while baseSha is 474877f..., and the two ranges produce different file sets. Any reviewer command using bundle.baseSha inspects the wrong diff.

Suggested fix: Set bundle.baseSha to mergeBase, or add a separate baseRefSha field and keep baseSha as the actual diff base used for patch and changedFiles.

Validation: git merge-base "$PR_BASE_SHA" "$PR_HEAD_SHA" && compare git diff --name-only "$PR_BASE_SHA" "$PR_HEAD_SHA" with git diff --name-only "$(git merge-base "$PR_BASE_SHA" "$PR_HEAD_SHA")" "$PR_HEAD_SHA"

P2: Reject malformed model output before deriving verdicts

src/shared/schema.ts:114-128

Why this breaks: normalizeReview silently converts missing findings and residual_risks to empty arrays. review.ts now requires only summary and checked to be non-empty, so a critic response like {"summary":"ok","checked":["x"]} becomes no findings and no blocking risks, then deriveVerdict returns pass even though the review payload violated the required contract.

Suggested fix: Make the model-output boundary strict: require summary, findings, checked, and residual_risks with expected types, and reject invalid finding severities/categories/line numbers instead of defaulting them.

Validation: Add a small test or script that feeds {"summary":"ok","checked":["x"]} through normalizeReview/review verdict flow and assert it throws instead of returning pass.

P2: Raise git output buffers for diff collection

src/adapters/local.ts:10-11

Why this breaks: The git helper uses spawnSync without maxBuffer, so Node's default output cap aborts commands that emit around 1 MiB. Needlefish collects full patches with git diff; a moderately large PR will throw ENOBUFS before review. The same helper shape exists in GitHub mode, so both local and action reviews can fail on large diffs.

Suggested fix: Set a larger maxBuffer on git spawnSync, matching the 64 MiB buffer already used for gh and Codex calls.

Validation: node -e 'const {spawnSync}=require("node:child_process"); const r=spawnSync(process.execPath,["-e","process.stdout.write(\"x\".repeat(2*1024*1024))"],{encoding:"utf8"}); console.log(r.error&&r.error.code)'

Checked

Applied the AGENTS.md policy supplied in the request; no AGENTS.md file exists in the repo checkout.
Mapped changed surfaces, with workflow, dependency, prompt, CLI/adapter/core/schema, and render files inspected.
Read the full changed implementations for GitHub/local adapters, CLI parsing, review orchestration, verdict derivation, Codex invocation, classification, rendering, and schema normalization.
Cross-checked prior PR review comments against current head; several earlier issues are fixed, including read-only Codex sandbox, failure check conclusion, event SHA usage, merge-base patching, replacement callbacks, and non-empty checked validation.
Ran pnpm exec tsc --noEmit successfully.
Verified Node spawnSync without maxBuffer returns ENOBUFS on a 2 MiB stdout payload.

Residual Risk

A dynamic tsx repro for normalizeReview was blocked by the read-only sandbox denying tsx IPC pipe creation, but the code path is directly visible in schema.ts and review.ts.

github-actions · 2026-06-26T11:25:50Z

+    checks: [],
+  };
+
+  const bundle: Bundle = {


P2 (bug): Use the merge base as the bundle base SHA

GitHub mode computes the patch from mergeBase..headSha but still sends PR_BASE_SHA as bundle.baseSha. When the base branch has advanced, the prompt gives Codex a base/head pair that does not reproduce the supplied patch; in this PR, merge-base is d9ffd69... while baseSha is 474877f..., and the two ranges produce different file sets. Any reviewer command using bundle.baseSha inspects the wrong diff.

Suggested fix: Set bundle.baseSha to mergeBase, or add a separate baseRefSha field and keep baseSha as the actual diff base used for patch and changedFiles.

github-actions · 2026-06-26T11:25:50Z

+  };
+}
+
+export function normalizeReview(raw: any): RawReview {


P2 (validation): Reject malformed model output before deriving verdicts

normalizeReview silently converts missing findings and residual_risks to empty arrays. review.ts now requires only summary and checked to be non-empty, so a critic response like {"summary":"ok","checked":["x"]} becomes no findings and no blocking risks, then deriveVerdict returns pass even though the review payload violated the required contract.

Suggested fix: Make the model-output boundary strict: require summary, findings, checked, and residual_risks with expected types, and reject invalid finding severities/categories/line numbers instead of defaulting them.

github-actions · 2026-06-26T11:25:50Z

+import { renderMarkdown } from "../shared/render";
+import type { Bundle, PrMeta, ReviewResult } from "../shared/schema";
+
+function git(args: string[], cwd: string): string {


P2 (runtime): Raise git output buffers for diff collection

The git helper uses spawnSync without maxBuffer, so Node's default output cap aborts commands that emit around 1 MiB. Needlefish collects full patches with git diff; a moderately large PR will throw ENOBUFS before review. The same helper shape exists in GitHub mode, so both local and action reviews can fail on large diffs.

Suggested fix: Set a larger maxBuffer on git spawnSync, matching the 64 MiB buffer already used for gh and Codex calls.

… (large diffs)

github-actions

Needlefish PR Review

Verdict: ⛔ changes_requested
Base: d9ffd69 → Head: f0c2f0e

changes_requested: model-output normalization can turn malformed review JSON into a passing verdict.

Findings

P2: Reject malformed review JSON before deriving verdicts

src/shared/schema.ts:114-128

Why this breaks: normalizeReview silently converts missing findings and residual_risks to empty arrays. review.ts only requires a summary and non-empty checked list, so a critic response like {"summary":"ok","checked":["x"]} reaches deriveVerdict with no findings and no blocking risks, producing pass even though the model output violated the required contract.

Suggested fix: Parse this boundary strictly: require summary, findings, checked, and residual_risks with expected types, and reject invalid finding severities/categories/line numbers instead of defaulting them.

Validation: Add a small unit/self-check that normalize/review rejects {"summary":"ok","checked":["x"]}; run pnpm exec tsc --noEmit

Checked

No repo AGENTS.md exists at head/base; applied the AGENTS.md policy supplied in the review context.
Read the workflow, CLI, GitHub/local adapters, Codex orchestration, schema normalization, verdict derivation, prompts, README, package, and tsconfig.
Traced review output flow: runCodex -> extractJson -> normalizeReview -> review.ts shape checks -> deriveVerdict.
Checked workflow and supply-chain surfaces, including self-hosted pull_request gating, checkout ref resolution, install step, and check-run failure path.
Verified Codex CLI supports -s/--sandbox read-only and ran pnpm exec tsc --noEmit successfully.

Residual Risk

Did not execute the full GitHub Action or spawn real Codex review calls because this review was constrained to read-only inspection.

github-actions · 2026-06-26T11:30:19Z

+  };
+}
+
+export function normalizeReview(raw: any): RawReview {


P2 (validation): Reject malformed review JSON before deriving verdicts

normalizeReview silently converts missing findings and residual_risks to empty arrays. review.ts only requires a summary and non-empty checked list, so a critic response like {"summary":"ok","checked":["x"]} reaches deriveVerdict with no findings and no blocking risks, producing pass even though the model output violated the required contract.

Suggested fix: Parse this boundary strictly: require summary, findings, checked, and residual_risks with expected types, and reject invalid finding severities/categories/line numbers instead of defaulting them.

…(missing keys)

github-actions

Needlefish PR Review

Verdict: ⛔ changes_requested
Base: d9ffd69 → Head: 3cd57e2

changes_requested: one remaining validation bug can turn malformed blocking findings into a passing review.

Findings

P2: Reject invalid finding severity before deriving verdicts

src/shared/schema.ts:97-99

Why this breaks: normalizeFinding silently converts any unrecognized severity to P3. Because deriveVerdict only blocks on P0/P1/P2, a malformed model output such as severity "P1 " for an otherwise valid blocking finding is normalized to non-blocking P3 and the review can return pass.

Suggested fix: Make finding parsing fail closed: if severity is not exactly one of P0, P1, P2, or P3, throw a malformed-output error instead of defaulting to P3. Apply the same strictness to other required finding fields.

Validation: node --import tsx -e "import { normalizeReview } from './src/shared/schema.ts'; import { deriveVerdict } from './src/core/verdict.ts'; const r=normalizeReview({summary:'s',checked:['c'],residual_risks:[],findings:[{severity:'P1 ',title:'t',category:'bug',file:'src/x.ts',lineStart:1,lineEnd:1,confidence:1,whyItBreaks:'w',suggestedFix:'f',validation:'v'}]}); console.log(r.findings[0].severity); console.log(deriveVerdict(r.findings,r.residual_risks));"

Checked

Read changed surface map: workflow, dependency/config, docs, CLI, GitHub adapter, local adapter, Codex runner, review orchestration, verdict, renderer, schema
Verified no repo-local AGENTS.md exists on disk; applied the supplied review policy
Traced GitHub workflow refs through runGithub patch/base/head handling
Traced Codex output through extractJson, normalizeReview, normalizeFinding, and deriveVerdict
Confirmed prior PR metadata issues for merge-base, failure check conclusion, prompt replacement, git maxBuffer, and read-only Codex sandbox are addressed in this head
Ran pnpm exec tsc --noEmit successfully
Reproduced invalid severity normalization to P3 and resulting pass verdict with node --import tsx

Residual Risk

Live GitHub review/check posting and real Codex execution were not exercised; this review used read-only local inspection and targeted command reproduction.

github-actions · 2026-06-26T11:35:18Z

+
+export function normalizeFinding(raw: Partial<Finding>): Finding {
+  return {
+    severity: (SEVERITIES.includes(raw.severity as Severity)


P2 (validation): Reject invalid finding severity before deriving verdicts

normalizeFinding silently converts any unrecognized severity to P3. Because deriveVerdict only blocks on P0/P1/P2, a malformed model output such as severity "P1 " for an otherwise valid blocking finding is normalized to non-blocking P3 and the review can return pass.

Suggested fix: Make finding parsing fail closed: if severity is not exactly one of P0, P1, P2, or P3, throw a malformed-output error instead of defaulting to P3. Apply the same strictness to other required finding fields.

…ed findings can't false-pass

github-actions

Needlefish PR Review

Verdict: ⛔ changes_requested
Base: d9ffd69 → Head: 505f89d

One actionable workflow bug: manual dispatch resolves PR refs before any checkout without telling gh which repository to query.

Findings

P2: Pass the repository when resolving manual PR refs

.github/workflows/review.yml:38-39

Why this breaks: On workflow_dispatch this step runs before actions/checkout, so there is no local git repository for gh pr view to infer the repo from. The workflow sets GITHUB_REPOSITORY, but gh documents -R/--repo or GH_REPO for explicit repo selection, so manual reviews can fail in Resolve PR refs before reaching checkout.

Suggested fix: Call gh pr view "$PR_NUM" -R "$GITHUB_REPOSITORY" ... for both head and base, or set GH_REPO: ${{ github.repository }} in the step environment.

Validation: Trigger workflow_dispatch with pr_number and confirm Resolve PR refs completes before checkout.

Checked

Read the supplied AGENTS.md review policy and found no repo-local AGENTS.md files with find.
Inspected the changed workflow, CLI, GitHub/local adapters, Codex runner, prompt orchestration, renderer, classifier, and schema normalization.
Ran pnpm check; TypeScript completed with tsc --noEmit.
Checked gh pr view --help; explicit repository selection is -R/--repo or GH_REPO, not GITHUB_REPOSITORY.

github-actions · 2026-06-26T11:38:51Z

+            echo "head=$EVENT_HEAD" >> "$GITHUB_OUTPUT"
+            echo "base=$EVENT_BASE" >> "$GITHUB_OUTPUT"
+          else
+            echo "head=$(gh pr view "$PR_NUM" --json headRefOid -q .headRefOid)" >> "$GITHUB_OUTPUT"


P2 (runtime): Pass the repository when resolving manual PR refs

On workflow_dispatch this step runs before actions/checkout, so there is no local git repository for gh pr view to infer the repo from. The workflow sets GITHUB_REPOSITORY, but gh documents -R/--repo or GH_REPO for explicit repo selection, so manual reviews can fail in Resolve PR refs before reaching checkout.

Suggested fix: Call gh pr view "$PR_NUM" -R "$GITHUB_REPOSITORY" ... for both head and base, or set GH_REPO: ${{ github.repository }} in the step environment.

github-actions

Needlefish PR Review

Verdict: ⛔ changes_requested
Base: d9ffd69 → Head: fd23bb5

changes_requested: malformed model findings can be coerced into blocking PR reviews.

Findings

P2: Reject malformed findings before deriving a verdict

src/shared/schema.ts:95-110

Why this breaks: normalizeFinding coerces missing or invalid finding fields into defaults, including a blocking P1 severity plus empty file and line 0. If Codex or the critic returns a malformed finding object, review.ts can derive changes_requested and render an unanchored '(no file):0' finding instead of failing the review-output boundary.

Suggested fix: Validate required finding fields strictly: severity/category must be exact allowed values, file/title/whyItBreaks/suggestedFix must be non-empty strings, lineStart/lineEnd must be positive finite numbers, and invalid findings should throw before deriveVerdict runs.

Validation: Add a self-check that normalizeReview rejects a finding with missing file/line or invalid severity, then run pnpm exec tsc --noEmit.

Checked

Read supplied AGENTS.md review policy; no repo AGENTS.md exists in the workspace.
Mapped changed surfaces: workflow, dependency/lockfile, prompts, CLI/adapters/core/shared source, docs.
Inspected GitHub workflow ref resolution, checkout, permissions, install, and review/check posting paths.
Traced review(), normalizeReview(), normalizeFinding(), deriveVerdict(), renderMarkdown(), runGithub(), and runLocal call paths.
Ran pnpm exec tsc --noEmit successfully.
Verified current Codex CLI supports -s/--sandbox read-only.

Residual Risk

CLI smoke tests through tsx could not run in this managed read-only sandbox because tsx failed to create its IPC pipe with EPERM; this is an environment restriction, not evidence of a repo defect.

github-actions · 2026-06-26T11:44:09Z

+  "validation",
+];
+
+export function normalizeFinding(raw: Partial<Finding>): Finding {


P2 (validation): Reject malformed findings before deriving a verdict

normalizeFinding coerces missing or invalid finding fields into defaults, including a blocking P1 severity plus empty file and line 0. If Codex or the critic returns a malformed finding object, review.ts can derive changes_requested and render an unanchored '(no file):0' finding instead of failing the review-output boundary.

Suggested fix: Validate required finding fields strictly: severity/category must be exact allowed values, file/title/whyItBreaks/suggestedFix must be non-empty strings, lineStart/lineEnd must be positive finite numbers, and invalid findings should throw before deriveVerdict runs.

…e so critic can prune

github-actions

Needlefish PR Review

Verdict: ⛔ changes_requested
Base: d9ffd69 → Head: 8cee022

changes_requested: malformed first-pass findings can be silently dropped before the critic, allowing a false pass.

Findings

P2: Fail on malformed candidate findings

src/core/review.ts:29

Why this breaks: The first Codex pass is parsed with strict=false, so normalizeReview filters out malformed findings instead of failing. A candidate output with a real P1/P2 finding that has one malformed required field is reduced to findings: [], then the critic can only prune that empty list and deriveVerdict returns pass.

Suggested fix: Parse the candidate review strictly, or make lenient parsing fail if any finding was malformed instead of silently filtering it out.

Validation: node --import tsx -e 'import { normalizeReview } from "./src/shared/schema.ts"; const raw={summary:"ok",checked:["x"],residual_risks:[],findings:[{severity:"P1",category:"bug",file:"",lineStart:10,lineEnd:10,title:"real",whyItBreaks:"breaks",suggestedFix:"fix"}]}; console.log(normalizeReview(raw,false).findings.length)'

Checked

Read the provided AGENTS.md policy; no AGENTS.md file exists in this repo checkout.
Reviewed workflow, CLI, GitHub/local adapters, Codex orchestration, schema normalization, rendering, prompts, package and lockfile surfaces.
Ran pnpm check successfully: tsc --noEmit exited 0.
Reproduced normalizeReview(raw,false) dropping a malformed P1 finding and deriveVerdict([], []) returning pass.

Residual Risk

Network access to GitHub API was unavailable, so live PR metadata endpoint behavior could not be inspected; local code paths and gh help output were checked instead.

github-actions · 2026-06-26T11:49:30Z

+  );
+
+  const rawCandidate = runCodex(reviewPrompt, { repoPath: bundle.repoPath });
+  const candidate = normalizeReview(extractJson(rawCandidate), false);


P2 (validation): Fail on malformed candidate findings

The first Codex pass is parsed with strict=false, so normalizeReview filters out malformed findings instead of failing. A candidate output with a real P1/P2 finding that has one malformed required field is reduced to findings: [], then the critic can only prune that empty list and deriveVerdict returns pass.

Suggested fix: Parse the candidate review strictly, or make lenient parsing fail if any finding was malformed instead of silently filtering it out.

…he review

github-actions

Needlefish PR Review

Verdict: ⛔ changes_requested
Base: d9ffd69 → Head: 49350eb

One actionable defect: the GitHub review fallback can attach stale findings to the live PR head.

Findings

P2: Pin fallback reviews to the reviewed commit

src/adapters/github.ts:101-110

Why this breaks: The primary review payload pins commit_id to the SHA that was reviewed, but the fallback path omits commit_id. If inline comments are rejected and a newer commit has been pushed while the old run is still finishing, this fallback can post REQUEST_CHANGES against the current PR head using findings generated from the old checkout.

Suggested fix: Include commit_id: headSha in the fallback request, or retry the same JSON payload without comments while preserving commit_id.

Validation: Force the inline-comment path to fail, push a newer commit before fallback posts, then verify the created review's commit_id remains PR_HEAD_SHA.

Checked

Applied the supplied AGENTS.md review policy; no repository AGENTS.md file was present.
Mapped changed surfaces, with workflow and GitHub adapter treated as high-risk.
Read the GitHub workflow, CLI, local adapter, GitHub adapter, Codex runner, review orchestration, verdict, rendering, classification, schema normalization, package scripts, and tsconfig.
Traced runGithub through ref resolution, diff construction, postReview, postCheck, and the failure path.
Checked prior PR review comments in the bundle against current head changes.
Ran pnpm exec tsc --noEmit successfully.
Ran git diff --check successfully.

Residual Risk

Network was restricted, so live GitHub API behavior was not exercised; review conclusions are based on local code, GitHub CLI help, and the supplied diff.

github-actions · 2026-06-26T11:54:12Z

+      payload
+    );
+  } catch {
+    gh(


P2 (runtime): Pin fallback reviews to the reviewed commit

The primary review payload pins commit_id to the SHA that was reviewed, but the fallback path omits commit_id. If inline comments are rejected and a newer commit has been pushed while the old run is still finishing, this fallback can post REQUEST_CHANGES against the current PR head using findings generated from the old checkout.

Suggested fix: Include commit_id: headSha in the fallback request, or retry the same JSON payload without comments while preserving commit_id.

github-actions

Needlefish PR Review

Verdict: ⛔ changes_requested
Base: d9ffd69 → Head: f139346

changes_requested: two concrete security defects let PR-controlled code or the model subprocess bypass the deterministic review gate.

Findings

P1: Run the reviewer from trusted code

.github/workflows/review.yml:59

Why this breaks: The workflow checks out the PR head, installs it, then executes src/cli.ts from that same checkout with write-scoped PR/check permissions. A PR that changes Needlefish itself can replace the review logic and post a passing review/check, so the merge gate is controlled by the candidate code it is supposed to evaluate.

Suggested fix: Use a trusted checkout of Needlefish, such as the base branch or a pinned release, as the reviewer tool, and pass the PR checkout only as the target repo via --repo.

Validation: Open a test PR that changes src/cli.ts to short-circuit success; the workflow should still run the trusted reviewer and not the modified PR copy.

P1: Scrub write tokens from the Codex subprocess

src/shared/codex.ts:23-29

Why this breaks: spawnSync is called without an env override, so codex exec inherits the parent environment. In GitHub mode the parent step provides GH_TOKEN with pull-requests:write and checks:write, which lets the model-driven subprocess run gh api and mutate reviews or checks directly, bypassing Needlefish's parser and deterministic verdict.

Suggested fix: Pass an explicit sanitized env to spawnSync that removes GH_TOKEN, GITHUB_TOKEN, and other GitHub write-token variables from the Codex child while keeping only the auth/config needed to run Codex.

Validation: Run runCodex with a CODEX_BIN stub that prints process.env.GH_TOKEN; it should be absent even when the parent process has GH_TOKEN set.

Checked

Applied the provided AGENTS.md review policy; rg found no repo-local AGENTS.md file.
Inspected changed workflow, GitHub adapter, Codex subprocess wrapper, schema normalization, verdict derivation, and CLI entrypoint.
Ran pnpm check; TypeScript completed successfully.
Verified Node child processes inherit environment variables by default with a small spawnSync check.
Searched for GH_TOKEN, GITHUB_TOKEN, actions/checkout, pnpm install, codex exec, and spawnSync usage across the changed files.

Residual Risk

Did not execute the live GitHub Action or post to GitHub APIs; verdict is based on static workflow and subprocess inspection plus local TypeScript verification.

github-actions · 2026-06-26T11:58:54Z

+          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+          PR_BASE_SHA: ${{ steps.refs.outputs.base }}
+          PR_HEAD_SHA: ${{ steps.refs.outputs.head }}
+        run: ./node_modules/.bin/tsx src/cli.ts --github --pr ${{ github.event.inputs.pr_number || github.event.pull_request.number }}


P1 (security): Run the reviewer from trusted code

The workflow checks out the PR head, installs it, then executes src/cli.ts from that same checkout with write-scoped PR/check permissions. A PR that changes Needlefish itself can replace the review logic and post a passing review/check, so the merge gate is controlled by the candidate code it is supposed to evaluate.

Suggested fix: Use a trusted checkout of Needlefish, such as the base branch or a pinned release, as the reviewer tool, and pass the PR checkout only as the target repo via --repo.

github-actions · 2026-06-26T11:58:54Z

+  const args = ["exec", "--color", "never", "-s", "read-only", "--output-last-message", lastMsg];
+  if (model) args.push("-m", model);
+
+  const res = spawnSync(bin, args, {


P1 (security): Scrub write tokens from the Codex subprocess

spawnSync is called without an env override, so codex exec inherits the parent environment. In GitHub mode the parent step provides GH_TOKEN with pull-requests:write and checks:write, which lets the model-driven subprocess run gh api and mutate reviews or checks directly, bypassing Needlefish's parser and deterministic verdict.

Suggested fix: Pass an explicit sanitized env to spawnSync that removes GH_TOKEN, GITHUB_TOKEN, and other GitHub write-token variables from the Codex child while keeping only the auth/config needed to run Codex.

…pass the gate

github-actions

Needlefish PR Review

Verdict: ⛔ changes_requested
Base: d9ffd69 → Head: 68e5e34

changes_requested: one security defect lets PR-controlled code operate the review merge gate.

Findings

P1: Run the reviewer from trusted code

.github/workflows/review.yml:51-59

Why this breaks: The job checks out the PR head, installs that checkout, then runs src/cli.ts from the same PR with pull-requests: write and checks: write. A PR can change package lifecycle behavior or the Needlefish CLI itself to post a passing review/check, so the candidate code controls the merge gate that is supposed to review it.

Suggested fix: Run Needlefish from a trusted checkout such as the base branch or a pinned release, and check out the PR only as the target repository passed via --repo.

Validation: Open a test PR that changes src/cli.ts to skip review and post success; verify the current workflow executes that PR-controlled file, then verify the fixed workflow executes the trusted checkout instead.

Checked

Read the supplied AGENTS.md policy; no AGENTS.md exists in the repo or base commit.
Inspected changed workflow, GitHub adapter, local adapter, CLI parser, Codex invocation, schema normalization, verdict derivation, prompts, package metadata, and README contract.
Verified previous review issues for read-only Codex sandboxing, GitHub token scrubbing, failure check conclusion, merge-base diffing, prompt replacement callbacks, strict output parsing, and git maxBuffer are fixed in head.
pnpm exec tsc --noEmit exited 0.
git diff --check d9ffd69b3de124964aa76d5179c6244e513fc34d...68e5e34ca1f099547b00db8c40f3fe12cb750219 exited 0.
codex exec --help confirms -s read-only is the sandbox flag used by src/shared/codex.ts.

Residual Risk

Could not run the tsx CLI smoke check in this read-only sandbox because tsx failed to create its IPC pipe with EPERM; typecheck covered compile-time behavior.
No live GitHub Actions run or GitHub API posting was executed; workflow behavior was reviewed statically from the checked-out files.

github-actions · 2026-06-26T12:03:56Z

+
+      - name: Install
+        shell: zsh -l {0}
+        run: corepack enable && pnpm install --frozen-lockfile


P1 (security): Run the reviewer from trusted code

The job checks out the PR head, installs that checkout, then runs src/cli.ts from the same PR with pull-requests: write and checks: write. A PR can change package lifecycle behavior or the Needlefish CLI itself to post a passing review/check, so the candidate code controls the merge gate that is supposed to review it.

Suggested fix: Run Needlefish from a trusted checkout such as the base branch or a pinned release, and check out the PR only as the target repository passed via --repo.

frankekn · 2026-06-26T12:11:50Z

Shipped to main as v0.1 (commit 094a271) with the trusted-tool self-review workflow. Dogfood complete: the tool found ~16 real defects in itself across the run, all fixed. The implement branch is kept for the fix history.

t added 3 commits June 26, 2026 15:30

implement needlefish: core, adapters, prompts, workflow

f837be4

fix: gh for pr metadata; allowBuilds esbuild; self-review workflow

a26882a

ci: run steps under zsh -l (sync from main)

612df2d

github-actions Bot requested changes Jun 26, 2026

View reviewed changes

chatgpt-codex-connector Bot reviewed Jun 26, 2026

View reviewed changes

fix: force read-only codex sandbox; gate action to non-fork PRs

5db438c

github-actions Bot requested changes Jun 26, 2026

View reviewed changes

t added 2 commits June 26, 2026 16:13

fix: post failure (not neutral) on review error; review checked-out H…

8dfce64

…EAD, not live pr head

fix: use event SHAs (PR_BASE_SHA/PR_HEAD_SHA) to avoid live-API race …

b89fd25

…and locality gap

frankekn commented Jun 26, 2026

View reviewed changes

t added 2 commits June 26, 2026 16:40

fix: post COMMENT (not APPROVE) on pass; GITHUB_TOKEN can't approve PRs

1b97c79

trigger re-review

95de3c7

frankekn closed this Jun 26, 2026

frankekn reopened this Jun 26, 2026

t added 4 commits June 26, 2026 17:00

ci: add workflow_dispatch to bypass pull_request webhook throttle

a7ed214

ci: fetch base branch so merge-base fallback works (esp. for workflow…

89432cd

…_dispatch)

fix: use || not ?? for PR_*_SHA so empty-string falls back to git

f32d93c

ci: fetch base branch at full depth with explicit refspec

7df7bae

github-actions Bot requested changes Jun 26, 2026

View reviewed changes

t added 2 commits June 26, 2026 17:42

fix: use function replacement so $-sequences in diff/bundle aren't ma…

b5a11d3

…ngled

ci: resolve PR head/base for both events so dispatch reviews the real PR

3f78261

github-actions Bot requested changes Jun 26, 2026

View reviewed changes

fix: review merge-base..head (not base-tip..head); reject empty model…

908ef40

… output as error

github-actions Bot requested changes Jun 26, 2026

View reviewed changes

fix: require non-empty summary (reject malformed output); drop unused…

a79deb4

… base_ref field

github-actions Bot requested changes Jun 26, 2026

View reviewed changes

fix: require non-empty checked list too, so a bare summary can't yiel…

cab9f99

…d a pass

github-actions Bot requested changes Jun 26, 2026

View reviewed changes

fix: bundle.baseSha=mergeBase (consistency); maxBuffer on git helpers…

f0c2f0e

… (large diffs)

github-actions Bot requested changes Jun 26, 2026

View reviewed changes

fix: strict normalizeReview rejects malformed/truncated model output …

3cd57e2

…(missing keys)

github-actions Bot requested changes Jun 26, 2026

View reviewed changes

fix: fail-safe severity (trim+upper, invalid->P1 blocking) so malform…

505f89d

…ed findings can't false-pass

github-actions Bot requested changes Jun 26, 2026

View reviewed changes

ci: pass -R repo to gh pr view in resolve step (runs before checkout)

fd23bb5

github-actions Bot requested changes Jun 26, 2026

View reviewed changes

fix: strict finding validation (throw on malformed); lenient candidat…

8cee022

…e so critic can prune

github-actions Bot requested changes Jun 26, 2026

View reviewed changes

fix: parse candidate strictly too; any malformed model output fails t…

49350eb

…he review

github-actions Bot requested changes Jun 26, 2026

View reviewed changes

fix: pin fallback review to commit_id=headSha (no stale-head posts)

f139346

github-actions Bot requested changes Jun 26, 2026

View reviewed changes

fix: strip GitHub write tokens from codex child env so model can't by…

68e5e34

…pass the gate

github-actions Bot requested changes Jun 26, 2026

View reviewed changes

frankekn closed this Jun 26, 2026

Conversation

frankekn commented Jun 26, 2026

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Needlefish PR Review

Findings

P1: Force Codex reviews into read-only sandbox mode

P1: Do not run pull_request code on a persistent self-hosted runner

Checked

Residual Risk

Uh oh!

github-actions Bot Jun 26, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot Jun 26, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Jun 26, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot Jun 26, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot Jun 26, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot Jun 26, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot Jun 26, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Needlefish PR Review

Findings

P1: Fail the Needlefish check when review execution fails

P1: Review the checked-out PR SHA, not the live PR head

Checked

Uh oh!

github-actions Bot Jun 26, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot Jun 26, 2026

Choose a reason for hiding this comment

Uh oh!

frankekn left a comment

Choose a reason for hiding this comment

Uh oh!

frankekn left a comment

Choose a reason for hiding this comment

Uh oh!

frankekn left a comment

Choose a reason for hiding this comment

Uh oh!

frankekn left a comment

Choose a reason for hiding this comment

Uh oh!

frankekn left a comment

Choose a reason for hiding this comment

Needlefish PR Review

Checked

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Needlefish PR Review

Findings

P2: Use callback replacements for prompt payloads

P2: Check out the requested PR for manual dispatch

Checked

Residual Risk

Uh oh!

github-actions Bot Jun 26, 2026

Choose a reason for hiding this comment

Uh oh!