Skip to content

Native binary CLI (codex-style): hermetic runtimes, whisper.cpp, bundled studio + MCP#16

Merged
nmbrthirteen merged 41 commits into
mainfrom
feat/native-cli-studio
Jun 16, 2026
Merged

Native binary CLI (codex-style): hermetic runtimes, whisper.cpp, bundled studio + MCP#16
nmbrthirteen merged 41 commits into
mainfrom
feat/native-cli-studio

Conversation

@nmbrthirteen

@nmbrthirteen nmbrthirteen commented Jun 15, 2026

Copy link
Copy Markdown
Owner

Turns podcli from an npm/Python hybrid into a self-contained native binary CLI (modeled on openai/codex). Big one — installs as one binary, runs from anywhere, provisions everything it needs.

What's in it

  • Go launchersetup, doctor, update (self-update + atomic swap), config, mcp
  • Hermetic runtimes — provisions CPython (PyTorch-free), ffmpeg/ffprobe, whisper.cpp, Node, and per-platform Remotion bundle into a managed dir
  • Embedded Python backend — runs from any directory; data/.env resolve project-local
  • whisper.cpp transcription (replaces openai-whisper/torch) with energy-snap timing + a parity harness
  • Bundled Web UI studio + MCP server — served by hermetic Node; MCP auto-registers with Claude Code
  • Real rendering — captioned clips + thumbnails via provisioned Remotion + headless Chrome
  • PodStack commands (auto, generate-titles, …) forward to Claude/Codex
  • Distribution — npm/bun wrapper + GitHub release workflow (5 binaries, whisper, studio, remotion, npm publish)

Notes

  • Three audits over the migration; parity intact, data project-local.
  • Larger install (~600MB) — the cost of real Remotion/Chrome output.

Before merge / first release

  • Add NPM_TOKEN repo secret
  • Cut v2.0.0 to exercise the full CI → release → provision → self-update chain (per-platform assets are unverified until the first tagged run)

Summary by CodeRabbit

Release Notes

  • New Features
    • Native CLI launcher with on-demand provisioning, self-update, and npm wrapper installation.
    • Engine selection adds whisper.cpp/whisper.cpp-style transcription with improved word timing behavior.
    • AI “find a specific moment” workflow, plus new clip preview/download endpoints and server support.
    • Provisioning for Studio/MCP and bundled web UI server assets.
  • Bug Fixes
    • Better transcription error handling, clearer speaker-setup status, and more robust tool/path overrides.
    • Engine-aware transcript caching and additional cache/output directory overrides.
  • Documentation
    • Updated install/usage and release process guides.
  • Tests
    • Added/expanded coverage for moment search, whisper.cpp timing snapping, engine routing, config migration, and security.

…ration

Two-layer correctness model around the existing transcript JSON contract:
- Layer 1 (committed, CI-ready): synthetic neutral transcript + caption goldens
  for all 4 styles, word-text normalization + zero-duration-floor assertions.
- Layer 2 (local/gitignored): capture_baseline.py freezes current openai-whisper
  output; compare.py gates a candidate engine on WER + word-timestamp drift.

Also adds plans/native-cli.md (full migration plan).
…ract

Shells to whisper-cli, merges subword tokens into words on the leading-space
boundary, and applies the same word-text normalization (strip) the caption
pipeline and apply_corrections() expect. Emits the same {words, segments,
duration, language} dict as the openai-whisper path, so it is a drop-in engine.

Validated against an openai-whisper baseline via tests/parity: matching text
(WER 0.059) and segmentation; benign bulk word-timing drift (~0.18s median).
Known gap: base model can stretch trailing words across silence; --vad is the fix.
cli/ Go module that replaces the bash podcli launcher and install.cmd:
- paths: per-OS global managed dir (runtime/models/bin/config)
- engine: resolve interpreter + backend/cli.py (hermetic > dev venv > system),
  exec with UTF-8 + ffmpeg env, propagate exit code
- main: version/doctor commands, update/setup stubs, pass-through routing

Drives the existing Python engine end-to-end (e.g. podcli clips list). Hermetic
provisioning and self-update land in later phases.
transcribe_file() now dispatches on PODCLI_ENGINE: the default whisper-py path is
untouched; whispercpp routes to the whisper.cpp adapter, resolving whisper-cli
and the ggml model from the managed dir (env-overridable). Adds a first-class
--engine flag to the process command and optional VAD support to the adapter
(off by default). This is the dual-engine escape hatch from the migration plan:
users can pick the native PyTorch-free path or fall back instantly.

Verified: --engine routes through the Go launcher; engine dispatch returns the
contract dict; Layer-1 parity (6) and unit tests (27) green.
podcli setup downloads pinned ggml models into the managed dir with resumable
(HTTP Range) transfers, checksum verification, and atomic writes; first use of
--engine whispercpp auto-provisions the base model. doctor reports model state.
The Python whispercpp path resolves the model from the same managed dir, so no
env wiring is needed once provisioned.

Also trims narrating comments from the launcher down to WHY-only.
proc.run() resolves bare ffmpeg/ffprobe invocations to PODCLI_FFMPEG /
PODCLI_FFPROBE when set, so the whole backend uses a provisioned binary with one
central change instead of touching ~15 call sites. encoder's fingerprint helper
honors the same override. The Go launcher resolves ffprobe alongside ffmpeg from
the runtime dir and exports both env vars; doctor reports ffprobe state.

Verified: override is used when set, falls back to PATH otherwise, and leaves
non-ffmpeg commands untouched (proc + encoder unit tests green).
podcli setup now fetches static ffmpeg builds per platform (evermeet on macOS,
johnvansickle tar.xz on Linux, BtbN zip on Windows), extracting the binaries
into the managed runtime dir. Archives are downloaded with the resumable
transfer and unpacked by sniffing the format (zip via stdlib, tar.xz via tar).
ffmpeg failure is non-fatal — the backend falls back to PATH.

The launcher exports the provisioned ffmpeg/ffprobe so the backend uses them.
Static sources aren't checksum-pinned yet (upstream "latest" URLs); they get our
own pinned builds once podcli hosts releases.

Verified on macOS: both binaries provision, run (8.1.1), and the backend's
proc.run uses the hermetic ffprobe; setup is idempotent.
podcli setup now provisions python-build-standalone (resolved via the GitHub
latest-release API so it tracks upstream), extracting the install_only tarball
into the managed runtime dir, then pip-installs requirements-runtime.txt — the
backend deps minus openai-whisper/torch. That is the ~2GB the native install
saves; the shipped binary transcribes with whisper.cpp only.

Because the hermetic interpreter has no openai-whisper, the launcher defaults
process/studio to whisper.cpp when it resolves the hermetic Python (whisper-py
stays available on dev/source installs). A guarded integration test covers asset
resolution, tar.gz extraction (incl. the symlink-before-parent-dir ordering),
and that the interpreter runs.

Verified on macOS: Python 3.12.13 provisions and runs; cv2/numpy/Pillow and the
rest of the slim set import cleanly.
The launcher now checks GitHub Releases for a newer version: a fast, non-fatal
on-launch notice before engine commands, and `podcli update` for an explicit
check. With no binary release published yet, update points the user at the
npm/bun reinstall (the designed fallback); the in-place binary swap lands once
releases publish per-platform assets.

Adds a config package persisting the off-switch: `podcli config set update.auto
off` (or PODCLI_NO_UPDATE=1) silences all checks. `config get/set` are owned by
the launcher; other `config` subcommands (status/export/import/use) still pass
through to the Python backend.

Verified: off-switch round-trips to config.json, update degrades gracefully with
no release, unknown keys error, and Python config passthrough works.
A thin Node package (name: podcli) whose postinstall fetches the platform's
native binary into the managed dir and whose bin shim execs it, passing args and
exit codes through. If the pre-fetch is blocked, the shim fetches lazily on first
run, so a failed postinstall never bricks the install. PODCLI_BINARY_SRC copies a
local binary instead of downloading (testing + the release build).

This is the npm i -g podcli / bun add -g podcli front door; it pairs with the
self-update fallback that points users back here.

Verified locally: install, shim exec with arg + exit-code passthrough, and lazy
first-run fetch all work against a local binary.
On a v* tag, cross-compiles the Go launcher for darwin/linux (amd64+arm64) and
windows/amd64 with the version stamped via ldflags, publishes them as release
assets under the names the npm wrapper and self-update expect, then publishes the
npm package pinned to the tag.

whisper.cpp per-platform builds are left as a marked TODO (they need native
runners); until then transcription uses a PATH whisper-cli.

Verified locally: the build command cross-compiles (e.g. linux/arm64) and stamps
main.Version; the workflow YAML parses.
podcli update now downloads the release binary for this platform and swaps the
managed binary (~/.podcli/bin/podcli) in place — the file both the npm shim and
direct installs exec, so it updates regardless of install method. Unix uses an
atomic rename; Windows moves the running .exe aside first. If the swap fails it
falls back to the npm/bun reinstall message.

Unit tests cover the swap and the version comparison.
The launcher resolves whisper-cli from the managed runtime dir and falls back to
PATH, exporting PODCLI_WHISPER_CLI so the backend uses the hermetic binary.
EnsureWhisperCpp fetches whisper-cli-<os>-<arch> from the latest release (PATH
fallback until one is published); doctor reports its state.

The release workflow gains a per-platform whisper.cpp matrix (static, Metal
embedded on macOS) producing the matching assets. That matrix is unverified
until the first tagged run.

Verified locally: resolution prefers a runtime whisper-cli over PATH; YAML parses;
Go build + tests green.
whisper.cpp can stretch the final phrase across trailing silence, desyncing
captions. After transcription the adapter clamps word timings to the voiced span
(RMS energy threshold) and re-flows them, so words stranded in silence snap back
near the last voiced frame while words over speech are untouched. On by default;
PODCLI_WHISPERCPP_NO_SNAP=1 disables it. Also fixes a leaked temp dir.

Validated on synthetic true-silence audio (voiced word untouched; a word
stranded at 1.55s pulled back to 1.15s). On a clip with an outro the trailing
audio is genuinely voiced, so the snap correctly no-ops there.
First-run dep install ran pip with -q, so the multi-second download
showed nothing and read as frozen. Stream pip's progress bars (drop -q,
force --progress-bar=on) and unbuffer output so it surfaces on Windows
pipes too.
Sub-megabyte models (the 864 KB Silero VAD) floored to "0 MB" and
looked like a failed download.
An installed podcli had no backend off the source tree, so it only ran
inside the repo. Embed backend/ via go:embed (synced into files/ at build
time by `go generate`, gitignored) and extract it to runtime/backend on
setup. The CLI now resolves its backend from the managed dir and runs from
any working directory, like a real native binary.
The Go launcher showed its own help on a bare invocation and dropped the
PodStack workflow commands the old bash wrapper forwarded to Claude/Codex.
Bare `podcli` now opens the backend's branded interactive menu, and
auto/generate-titles/… launch the AI agent with the matching slash command
(embedded and installed into the project on demand), honoring
--claude/--codex/--ai and PODCLI_AI with a Codex fallback.
The studio is a Node/TypeScript app not shipped with the native binary, so
launching it ran npm against the managed runtime dir and crashed with ENOENT.
Detect the missing package.json (and missing npm) and print how to run the
studio from a source checkout instead.
Open Web UI ran npm against the managed runtime and crashed on a native
install. Provision a hermetic Node and a prebuilt studio bundle (esbuild'd
server + SPA, built by scripts/build-studio.sh and shipped as a release
asset), then launch them from the menu with rendering delegated to the
hermetic Python + ffmpeg. The studio resolves its backend via a new
PODCLI_BACKEND override in paths.ts; doctor and setup report node/studio;
CI builds and uploads studio-bundle.tar.gz. Falls back to system Node, then
to the npm dev flow from a source checkout.
With the backend relocated to the global runtime dir, the Python derived
.podcli/, data/, and .env from the backend's location, so a native install
stranded each project's episodes, presets, and secrets in a global runtime
folder. The launcher now resolves the project root from the working dir
(nearest ancestor with .podcli/.podcli-home, else cwd) and passes
PODCLI_HOME/PODCLI_DATA/PODCLI_ENV_FILE; explicit env still wins. Existing
projects work unchanged — just run podcli in the project dir.
Four parity fixes vs the source install: --engine whisper-py now fails with
a clear message instead of a raw ModuleNotFoundError; info/menu report
speakers as unavailable when pyannote/torch aren't present rather than
'configured' on HF_TOKEN alone; the launcher filters the objc/FP16/warnings
stderr noise the old bash wrapper stripped; and the unused 10MB res10
caffemodel is dropped from the embedded backend (YuNet onnx is the live one).
The mcp__podcli__* tools (the entire Claude/Codex integration the PodStack
slash commands depend on) lived only in the TypeScript dist/index.js and were
absent from the native install. Bundle the MCP stdio server alongside the
studio (esbuild -> mcp-server.mjs), add a `podcli mcp` command that execs it
under hermetic Node with the backend/python/ffmpeg + project env wired, and a
`podcli mcp install` (run automatically by setup) that registers it with
Claude Code via `claude mcp add podcli -- <self> mcp`. doctor reports it.
Captions and thumbnails render through Remotion (a headless-Chromium React
pipeline) that the native install didn't ship, so both failed by default off
a source checkout. Provision a per-platform Remotion bundle (remotion/ + a
production node_modules with the native @rspack/compositor binaries, built by
scripts/build-remotion.sh and uploaded per-platform in CI) into the runtime
dir, point clip_generator + thumbnail_html at the hermetic Node, cache the
project-independent composition bundle once in the managed dir, and pre-warm
the bundle + Chrome at setup. Verified end-to-end: a real captioned clip and
a thumbnail both render under the wired path.

Also fixes re-audit regressions: whisper-py engine fails with a clean message
(spinner stopped, no traceback); ProjectDir only pins project-local data when
a .podcli/.podcli-home marker exists, so read-only commands don't scatter
.podcli into arbitrary dirs; stderr noise filter matches warnings.warn lines
precisely instead of swallowing tracebacks; setup skips MCP re-registration
when already pointed at this binary.
- Remotion bundle build failed on a clean install: build-remotion.sh omitted
  @fontsource/dm-sans, which the compositions import — so the prebundle could
  not resolve the font and captions/bookends broke. Added the dep.
- podcli studio was unreachable: clip_studio.py lived at repo-root scripts/ and
  was never shipped. Moved it into backend/ (so it embeds) and resolve it from
  the backend dir; it also now honors PODCLI_DATA for output.
- render-bookend.mjs hardcoded its bundle cache into the runtime dir; it now
  honors PODCLI_CACHE_DIR and shares the prewarmed bundle with render.mjs.
- New-project data no longer lands in the global runtime dir (which setup/
  self-update overwrite): ProjectDir falls back to the working dir, keeping
  data project-local. doctor now prints the resolved project dir.
@coderabbitai

coderabbitai Bot commented Jun 15, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough

Walkthrough

Introduces a native Go launcher, hermetic runtime provisioning, whisper.cpp transcription, moment search wiring, npm distribution, release automation, and parity validation for podcli.

Changes

Native CLI Initiative

Layer / File(s) Summary
Managed paths and config
cli/go.mod, cli/internal/paths/paths.go, cli/internal/config/config.go, backend/config/paths.py, src/config/paths.ts, backend/clip_studio.py, backend/utils/proc.py, backend/services/encoder.py, backend/services/thumbnail_html.py, backend/services/transcript_packer.py, remotion/render-bookend.mjs, backend/requirements-runtime.txt, backend/requirements.txt, .gitignore, backend/cli.py, backend/main.py, backend/config_bundle.py, backend/presets.py
Defines managed home/config paths, environment-based backend/output/tool overrides, transcript cache namespacing, launcher env wiring, and dependency updates used by the native runtime layout.
Model and binary provisioning
cli/internal/provision/provision.go, cli/internal/provision/provision_manual_test.go, cli/internal/provision/security_test.go, cli/internal/provision/redirect_test.go, install.sh, install.ps1
Implements resumable downloads, checksum verification, Whisper model/VAD provisioning, whisper-cli/ffmpeg/Python acquisition, and extraction safety checks with manual and security tests plus platform installers.
Node, Remotion, and Studio bundles
cli/internal/provision/studio.go, scripts/build-remotion.sh, scripts/build-studio.sh, backend/scripts/remotion_screenshot.cjs
Adds hermetic Node, Remotion, and Studio bundle installers plus the build scripts that package the browser runtimes and browser preflight.
Launcher, engine, embed, and PodStack
cli/internal/engine/engine.go, cli/main.go, cli/internal/backend/embed.go, cli/internal/backend/sync.sh, cli/internal/podstack/podstack.go, cli/internal/podstack/sync.sh, podcli, cli/main_test.go
Adds backend execution helpers, the native CLI dispatcher, setup and doctor flows, embedded backend extraction, PodStack routing, and the shell launcher banner.
Whisper.cpp transcription
backend/services/transcription.py, backend/services/transcription_whispercpp.py, tests/test_whispercpp_snap.py, tests/test_transcription_engine.py, tests/test_whispercpp_adapter.py, cli/main_test.go
Adds whisper.cpp transcription, engine selection, voiced-interval snapping, token-to-word conversion, and tests for engine fallback and token cleanup.
Find-moment workflow
backend/services/claude_suggest.py, backend/main.py, src/models/index.ts, src/ui/web-server.ts, src/ui/client/EpisodeWorkspace.jsx, src/ui/client/ClipDetail.tsx, src/ui/client/StudioHome.tsx
Adds AI-assisted moment search in the backend task router, web server, and client UI, plus clip-id-based preview/download routes.
npm package and release automation
npm/package.json, npm/bin/podcli.js, npm/scripts/install.js, npm/README.md, .github/workflows/release.yml
Adds the npm wrapper package and installer, plus the release workflow that builds binaries and bundles, publishes GitHub release assets, and publishes the npm package.
Parity harness and golden fixtures
tests/parity/..., tests/test_config_bundle.py, tests/test_find_moments.py, tests/test_transcript_cache_engine.py
Adds parity harness docs, baseline capture and comparison tools, caption goldens and tests, transcript-cache namespacing tests, and config migration coverage.
Native CLI design document
plans/native-cli.md
Documents the native CLI architecture, hermetic runtime layout, whisper.cpp swap seam, roadmap, risks, and next step.

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

  • nmbrthirteen/podcli#5: Both PRs modify backend/cli.py’s web UI launch flow and related studio startup behavior.
  • nmbrthirteen/podcli#1: Both PRs adjust backend/services/clip_generator.py Remotion cache/render wiring.
  • nmbrthirteen/podcli#7: Both PRs intersect with backend/services/claude_suggest.py and the moment-search implementation path.

Poem

🐇 I hopped through the code with a whisker and grin,
A Go shell, a new home, and the runtimes walked in.
Whisper went cpp, and the moments came round,
With goldens and bundles and checksums all found.
I munch on the changes, then twiddle my nose—
podcli is nimble from ears to its toes.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/native-cli-studio

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Note

Due to the large number of review comments, Critical severity comments were prioritized as inline comments.

🟠 Major comments (18)
.github/workflows/release.yml-14-16 (1)

14-16: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Scope contents: write to only the job that needs it.

Line 15 grants write on repository contents workflow-wide, giving unnecessary token power to all jobs. Move write permissions to the release/publish job(s) and keep other jobs read-only.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/release.yml around lines 14 - 16, The permissions block at
the workflow level grants write access to repository contents for all jobs in
the workflow, which violates the principle of least privilege. Remove the
`contents: write` permission from the workflow-level permissions block (lines
14-16) and move it to only the specific job(s) that actually need it (the
release or publish job). Keep other jobs with read-only permissions or remove
the permissions block from those jobs entirely to ensure minimal token scope.

Source: Linters/SAST tools

.github/workflows/release.yml-58-58 (1)

58-58: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Replace unsupported runner label macos-13.

Line 58 and Line 116 use macos-13, which is not recognized by current runner labels. Those matrix jobs will fail to schedule.

Also applies to: 116-116

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/release.yml at line 58, The `macos-13` runner label used
in the workflow matrix is no longer supported and will cause job scheduling
failures. In `.github/workflows/release.yml` at line 58 and line 116, replace
the `macos-13` runner label with a currently supported macOS runner label such
as `macos-latest` or another actively maintained version. This will ensure the
matrix jobs can schedule successfully on the GitHub Actions runners.

Source: Linters/SAST tools

npm/scripts/install.js-43-79 (1)

43-79: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Add integrity verification for downloaded executables.

Lines 77-79 download a release binary and persist it without checksum/signature validation. A compromised or substituted asset would be executed on first run.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@npm/scripts/install.js` around lines 43 - 79, The download function downloads
executable files without any integrity verification, making the system
vulnerable to compromised binaries. Add checksum or signature validation to
verify the downloaded executable before it is renamed and persisted. In the
download function, after the file write completes but before the fs.renameSync
call that persists the temporary file to the final destination, calculate or
verify a checksum against a known good value (obtained from a separate checksum
file or manifest). Similarly, in the ensure function, ensure that the downloaded
binary from the GitHub release URL is validated. Only proceed with renaming the
temporary file to the final destination after successful integrity verification,
and reject with an error if the checksum does not match.
npm/scripts/install.js-27-28 (1)

27-28: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fix Windows home path composition in defaultHome().

Line 27 returns LOCALAPPDATA directly, so the managed home becomes %LOCALAPPDATA% instead of %LOCALAPPDATA%/podcli. This diverges from the macOS/Linux layout and places binaries under a non-namespaced location.

Suggested patch
-  if (process.platform === 'win32') return process.env.LOCALAPPDATA || path.join(h, 'AppData', 'Local', 'podcli');
+  if (process.platform === 'win32') {
+    const base = process.env.LOCALAPPDATA || path.join(h, 'AppData', 'Local');
+    return path.join(base, 'podcli');
+  }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@npm/scripts/install.js` around lines 27 - 28, The Windows platform check in
the `defaultHome()` function returns `process.env.LOCALAPPDATA` directly without
appending the 'podcli' subdirectory, whereas the fallback case joins the path
with 'AppData', 'Local', 'podcli', and the non-Windows paths always append
'podcli'. Modify the Windows platform check to use `path.join()` to append
'podcli' to `process.env.LOCALAPPDATA`, making it consistent with the fallback
behavior and other platform implementations.
.github/workflows/release.yml-29-30 (1)

29-30: ⚠️ Potential issue | 🟠 Major

Pin all actions by commit SHA and add persist-credentials: false to checkout steps.

All actions in this workflow use tag-pinning (@v*) instead of SHA-pinning, which allows tags to be reassigned and introduces supply-chain risk. Additionally, all actions/checkout steps lack persist-credentials: false, leaving the GITHUB_TOKEN in the environment for subsequent steps and increasing credential leakage risk.

Update to use commit SHAs (e.g., actions/checkout@<full-sha>) and add persist-credentials: false to each checkout action:

Example fix for checkout steps
- uses: actions/checkout@<commit-sha>
  with:
    persist-credentials: false

Applies to lines: 29, 30, 47, 67, 86, 94, 95, 103, 122, 123, 133, 142, 146, 155, 156

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/release.yml around lines 29 - 30, Replace all action
version pins in the workflow from tag-based (`@v`*) to full commit SHA pins
throughout the release.yml file to eliminate supply-chain risks from tag
reassignment. Additionally, add a with clause containing persist-credentials:
false to every actions/checkout step to prevent credential leakage. This applies
to all uses statements referencing actions/checkout and actions/setup-go, as
well as any other GitHub actions used in the workflow. For each affected action,
convert the `@v`* reference to the corresponding full commit SHA and ensure
checkout steps include the persist-credentials configuration.

Source: Linters/SAST tools

backend/services/transcription_whispercpp.py-84-86 (1)

84-86: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Avoid materializing every RMS frame index.

For long podcast WAVs, idx = np.arange(nf)[:, None] * hop + np.arange(frame)[None, :] creates an nf × frame matrix before indexing samples; a one-hour 16 kHz file can allocate gigabytes and crash during timing snap. Compute RMS in chunks or with a streaming/cumulative-sum approach instead.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/services/transcription_whispercpp.py` around lines 84 - 86, The idx
matrix creation in the RMS computation block materializes the full nf × frame
index matrix, which causes excessive memory allocation for long audio files.
Instead of creating the entire idx matrix upfront with np.arange(nf)[:, None] *
hop + np.arange(frame)[None, :], refactor the RMS computation to process frames
in chunks or use a streaming cumulative-sum approach that computes rms values
iteratively without allocating the full index matrix. This will prevent
gigabyte-scale memory spikes when processing long podcast WAVs.
backend/services/transcription.py-29-56 (1)

29-56: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Keep diarization on the whisper.cpp path.

Routing to whisper.cpp returns before the existing speaker-detection and face-analysis block, so native transcriptions always return speaker: None and no speaker_segments even when enable_diarization=True. Factor the diarization/face enrichment into a shared post-processing helper and call it for both engines.

As per coding guidelines, backend/**/*.py: “Implement speaker detection as part of the transcribe_podcast tool using Whisper transcription”.

Also applies to: 84-86

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/services/transcription.py` around lines 29 - 56, The
_transcribe_with_whispercpp function returns transcription results without
applying speaker detection and face analysis enrichment, causing speaker
segments to be missing even when diarization is enabled. Extract the existing
speaker-detection and face-analysis post-processing logic (currently applied to
other transcription paths) into a separate shared helper function, then call
this helper at the end of _transcribe_with_whispercpp before returning the
result, and also ensure it is called for the other transcription path at lines
84-86 to maintain consistent behavior across all transcription engines.

Source: Coding guidelines

backend/services/transcription_whispercpp.py-20-25 (1)

20-25: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Run external tools through the project process wrapper.

Both ffmpeg and whisper-cli are invoked with unbounded subprocess.run(), and the whisper call discards stderr entirely. A stalled media probe/transcode or failed whisper invocation can hang the backend or surface a useless CalledProcessError; use utils.proc.run() here so the adapter inherits mandatory timeouts and captured failure details.

Proposed direction
+from utils.proc import run as proc_run
+
 def _extract_wav(media_path: str, wav_path: str, ffmpeg: str = "ffmpeg") -> None:
-    subprocess.run(
+    proc_run(
         [ffmpeg, "-y", "-loglevel", "error", "-i", media_path,
          "-ar", "16000", "-ac", "1", wav_path],
-        check=True,
+        timeout=600,
     )
@@
-        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
+        proc_run(cmd, timeout=3600, cwd=tmpdir)

Also applies to: 158-169

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/services/transcription_whispercpp.py` around lines 20 - 25, The
`_extract_wav` function uses unbounded `subprocess.run()` which lacks mandatory
timeouts and proper error handling, making the backend vulnerable to stalled
processes and losing useful error details. Replace the `subprocess.run()` call
in `_extract_wav` with `utils.proc.run()` to inherit the project's timeout and
failure capture mechanisms. The same fix must also be applied to the whisper-cli
invocation that discards stderr (mentioned in the comment as also applying to
that location), ensuring both external tool invocations inherit the proper
process wrapper behavior.
backend/services/transcription_whispercpp.py-53-58 (1)

53-58: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Normalize markers before appending token text.

The code treats as a word boundary but appends it unchanged, so SentencePiece-style output can produce caption words like ▁hello. Convert the marker to whitespace before accumulation so strip() removes it.

Proposed fix
-        starts_word = raw.startswith(" ") or raw.startswith("▁")
+        starts_word = raw.startswith(" ") or raw.startswith("▁")
+        piece = raw.replace("▁", " ")
         if starts_word and cur_text:
             flush()
         if cur_start is None and t0 is not None:
             cur_start = t0
-        cur_text += raw
+        cur_text += piece
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/services/transcription_whispercpp.py` around lines 53 - 58, The code
in the token accumulation block treats the SentencePiece marker `▁` as a word
boundary indicator but appends the raw token unchanged to cur_text, causing
caption words to retain the marker (e.g., `▁hello`). Before appending the raw
token to cur_text, normalize the `▁` character to a space so that subsequent
strip() operations can properly remove it. This should be done at the `cur_text
+= raw` line by replacing any `▁` characters with spaces in the raw token before
concatenation.
backend/services/transcription_whispercpp.py-124-130 (1)

124-130: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Preserve positive word durations after upper-bound clamping.

When a stranded trailing word clamps to hi, both s and e can become hi; the fallback e = min(s + 0.05, hi) still leaves end == start. That can produce zero-duration caption words. Shift s back when there is no room to extend e.

Proposed guard
         if s < prev_end:
             s = prev_end
         if e <= s:
             e = min(s + 0.05, hi)
+            if e <= s:
+                s = max(lo, min(prev_end, hi - 0.05))
+                e = hi
         prev_end = e
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/services/transcription_whispercpp.py` around lines 124 - 130, The
code currently ensures positive word duration with the fallback `e = min(s +
0.05, hi)` when `e <= s`, but this still produces zero-duration words when `s`
is already clamped to the upper bound `hi`. After the existing `if e <= s:`
block (which sets `e = min(s + 0.05, hi)`), add an additional guard that checks
if `e <= s` is still true after the fallback. If so, shift `s` backward by 0.05
(e.g., `s = max(e - 0.05, lo)`) to ensure a positive duration exists while
respecting the lower bound `lo`.
cli/internal/engine/engine.go-65-80 (1)

65-80: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Prefer the provisioned backend before walking the working tree.

BackendRoot() currently executes an ancestor backend/cli.py before the managed runtime backend. For an installed binary, running podcli inside an untrusted folder containing backend/cli.py silently executes project-local Python as podcli. Keep PODCLI_BACKEND as the explicit dev override, then prefer the provisioned runtime backend, and only fall back to a source checkout when no runtime backend exists.

Proposed lookup order
 func BackendRoot() (string, bool) {
 	if b := os.Getenv("PODCLI_BACKEND"); b != "" && exists(filepath.Join(b, "cli.py")) {
 		return b, true
 	}
+	cand := filepath.Join(paths.RuntimeDir(), "backend")
+	if exists(filepath.Join(cand, "cli.py")) {
+		return cand, true
+	}
 	if dir, err := os.Getwd(); err == nil {
 		for {
 			cand := filepath.Join(dir, "backend")
 			if exists(filepath.Join(cand, "cli.py")) {
 				return cand, true
@@
 			dir = parent
 		}
 	}
-	cand := filepath.Join(paths.RuntimeDir(), "backend")
-	if exists(filepath.Join(cand, "cli.py")) {
-		return cand, true
-	}
 	return "", false
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cli/internal/engine/engine.go` around lines 65 - 80, The BackendRoot()
function currently searches for the backend in the wrong order, creating a
security vulnerability where a project-local backend/cli.py in an ancestor
directory takes precedence over the managed runtime backend. Reorder the
function to check the provisioned runtime backend at paths.RuntimeDir()/backend
first (before the directory tree walk), so that the managed runtime backend is
preferred over any ancestor backend/cli.py. This way, only if the runtime
backend does not exist should the function fall back to walking the directory
tree for a source checkout.
backend/utils/proc.py-21-27 (1)

21-27: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Resolve relative override paths before existence check.

If the environment variable contains a relative path (e.g., PODCLI_FFMPEG=./bin/ffmpeg), os.path.exists(override) will check relative to the current working directory, which may differ from the directory where the variable was set. This could cause the hermetic binary to be silently skipped in favor of the system fallback.

🔧 Proposed fix to resolve relative paths
 def _resolve_tool(cmd: Sequence[str]) -> list[str]:
     if not cmd:
         return list(cmd)
     override = os.environ.get(_TOOL_ENV.get(cmd[0], ""))
-    if override and os.path.exists(override):
+    if override:
+        resolved = os.path.abspath(override)
+        if os.path.exists(resolved):
+            return [resolved, *cmd[1:]]
-        return [override, *cmd[1:]]
     return list(cmd)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/utils/proc.py` around lines 21 - 27, The `_resolve_tool` function
checks if an override path exists without first resolving relative paths to
absolute paths. When an environment variable contains a relative path like
`./bin/ffmpeg`, the `os.path.exists(override)` check evaluates relative to the
current working directory, which may differ from where the variable was set. To
fix this, convert the override path to an absolute path using
`os.path.abspath()` before calling `os.path.exists()`. This ensures that
relative paths are consistently resolved and the hermetic binary is not silently
skipped.
cli/internal/provision/provision.go-542-555 (1)

542-555: ⚠️ Potential issue | 🟠 Major

Harden tar link extraction and fail on hard-link creation errors.

The hardlink source path from filepath.Join(dest, h.Linkname) is not validated against the extraction root boundary, allowing a malicious archive to create hardlinks pointing outside the intended extraction directory. Additionally, errors from os.Link() are silently ignored, potentially resulting in incomplete extraction without detection.

Suggested fix
 case tar.TypeLink:
   if err := os.MkdirAll(filepath.Dir(target), 0o755); err != nil {
     return err
   }
   os.Remove(target)
-  os.Link(filepath.Join(dest, h.Linkname), target)
+  linkSrc := filepath.Clean(filepath.Join(dest, h.Linkname))
+  if !strings.HasPrefix(linkSrc+string(os.PathSeparator), root) {
+    return fmt.Errorf("unsafe hardlink target in archive: %s", h.Linkname)
+  }
+  if err := os.Link(linkSrc, target); err != nil {
+    return err
+  }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cli/internal/provision/provision.go` around lines 542 - 555, In the
tar.TypeLink case, the hardlink source path constructed with filepath.Join(dest,
h.Linkname) must be validated to ensure it doesn't escape the extraction root
boundary (similar to how the target path should be validated). Additionally, the
os.Link() call currently ignores its error return value, allowing failures to go
undetected. Fix this by: (1) validating the resolved source path stays within
the extraction root before calling os.Link, and (2) capturing the error returned
from os.Link() and returning it if non-nil, rather than silently ignoring
failures.
cli/internal/provision/provision.go-145-147 (1)

145-147: ⚠️ Potential issue | 🟠 Major

Add end-to-end HTTP timeout to prevent indefinite hangs during provisioning requests.

The HTTP client currently only sets ResponseHeaderTimeout (60 seconds), which times out if response headers stall but not if body transmission slows or halts. This can freeze provisioning on slow/unreliable networks. Add a bounded Timeout field to limit total request time.

Suggested fix
- client := &http.Client{Transport: &http.Transport{ResponseHeaderTimeout: 60 * time.Second}}
+ client := &http.Client{
+   Timeout: 15 * time.Minute,
+   Transport: &http.Transport{
+     ResponseHeaderTimeout: 60 * time.Second,
+   },
+ }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cli/internal/provision/provision.go` around lines 145 - 147, The HTTP client
in the provision.go file currently only sets ResponseHeaderTimeout (60 seconds)
which protects against stalled response headers but not against slow body
transmission or other network hangs. Add a Timeout field to the http.Client
struct initialization alongside the existing Transport configuration to enforce
an end-to-end timeout on the entire request lifecycle, preventing indefinite
hangs during provisioning requests on unreliable networks.
cli/internal/provision/provision.go-117-119 (1)

117-119: ⚠️ Potential issue | 🟠 Major

Pin checksums for all remotely downloaded artifacts.

download() treats empty checksums as success, allowing several runtime artifacts to remain mutable-by-upstream with no integrity enforcement. The Whisper models tiny.en and small lack checksums (lines 34–39), and the struct comment documents this as intentional ("empty: verification skipped"). Additionally, FFmpeg binaries and WhisperCLI are downloaded via fetch() without any checksum validation (lines 303, 272), despite the comment at lines 198–199 acknowledging they are "not yet pinned by checksum." This undermines hermetic trust guarantees across the provisioning system.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cli/internal/provision/provision.go` around lines 117 - 119, The provisioning
system allows runtime artifacts to be downloaded without pinned checksums,
undermining hermetic integrity guarantees. To fix this, add specific checksums
for the Whisper models tiny.en and small that currently have empty checksums,
add checksums for the FFmpeg binaries and WhisperCLI that are downloaded via
fetch() without validation, update the struct comment that documents empty
checksums as intentional to reflect that all artifacts are now checksummed, and
ensure the download() function enforces checksum validation rather than treating
empty checksums as valid. Verify that all remotely downloaded artifacts in the
provisioning system now have pinned checksums to maintain integrity enforcement
end-to-end.
cli/internal/update/update.go-152-159 (1)

152-159: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add a request timeout to binary download.

Line 157 uses http.Get with no timeout, so podcli update can hang indefinitely on slow/broken networks.

Minimal fix
- resp, err := http.Get(url)
+ client := &http.Client{Timeout: 60 * time.Second}
+ resp, err := client.Get(url)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cli/internal/update/update.go` around lines 152 - 159, The downloadFile
function uses http.Get without a timeout, which can cause podcli update to hang
indefinitely on slow or broken networks. Create an http.Client with an
appropriate timeout (e.g., 30 seconds) and use that client's Get method instead
of the default http.Get call on line 157 to ensure the download request has a
bounded execution time.
cli/internal/provision/studio.go-220-239 (1)

220-239: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Block symlink-based write escape in tar extraction.

Line 233-Line 238 creates symlinks from archive headers without constraining h.Linkname, and Line 224 writes regular files with os.OpenFile (which follows symlinks). A crafted tar can redirect writes outside dest.

Suggested hardening direction
 case tar.TypeReg:
+    if fi, err := os.Lstat(target); err == nil && (fi.Mode()&os.ModeSymlink) != 0 {
+        return fmt.Errorf("refusing to write through symlink: %s", h.Name)
+    }
     if err := os.MkdirAll(filepath.Dir(target), 0o755); err != nil {
         return err
     }
@@
 case tar.TypeSymlink:
-    if err := os.MkdirAll(filepath.Dir(target), 0o755); err != nil {
-        return err
-    }
-    os.Remove(target)
-    os.Symlink(h.Linkname, target)
+    return fmt.Errorf("symlink entries are not allowed in archive: %s", h.Name)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cli/internal/provision/studio.go` around lines 220 - 239, The tar extraction
code has a symlink-based write escape vulnerability where the tar.TypeSymlink
case creates symlinks from h.Linkname without validating that the target path
stays within the intended extraction directory, and the tar.TypeReg case writes
files using os.OpenFile which follows symlinks. To fix this, validate that the
h.Linkname target in the tar.TypeSymlink case is constrained to the destination
directory by resolving it as a path and checking it doesn't escape.
Additionally, before writing regular files in the tar.TypeReg case, verify the
target path itself (after resolving any intermediate symlinks) is still within
bounds to prevent writes through malicious symlinks created by a previous entry
in the same archive.
cli/main.go-65-71 (1)

65-71: ⚠️ Potential issue | 🟠 Major

Add provision.EnsureWhisperCpp() to auto-selected whispercpp path in runEngine().

Lines 65–71 provision only the model but not the whisper-cli binary. When whispercpp is auto-selected here, the backend defaults to "whisper-cli" from PATH if PODCLI_WHISPER_CLI is unset. This creates an asymmetry with setup() (line 176), which explicitly calls EnsureWhisperCpp(). If whisper-cli is not in the managed runtime directory and not on PATH, transcription will fail at runtime.

Add provision.EnsureWhisperCpp() in this branch (similar to the setup flow) to ensure the binary is available before passing control to the backend.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cli/main.go` around lines 65 - 71, In the whispercpp auto-selection branch
(when transcribeEngine(args) == "whispercpp"), the code currently calls
provision.EnsureModel("base") to provision only the model file but does not
provision the whisper-cli binary itself. Add a call to
provision.EnsureWhisperCpp() before or after the EnsureModel call to ensure the
whisper-cli binary is available in the managed runtime, creating consistency
with the setup() function and preventing runtime failures when the binary is not
found on the system PATH.
🟡 Minor comments (3)
backend/cli.py-590-601 (1)

590-601: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Cleanly stop the spinner for native-engine failures too.

The new native path can raise FileNotFoundError for missing GGML models or subprocess-related exceptions from ffmpeg/whisper.cpp. Catching only RuntimeError leaves those failures to print through the active spinner as a traceback.

Proposed fix
-            except RuntimeError as e:
+            except (RuntimeError, FileNotFoundError, subprocess.CalledProcessError) as e:
                 _spin_stop.set()
                 spin_thread.join(timeout=1)
                 print(f"\r{' ' * 70}\r  ✗ {e}\n", flush=True)
                 sys.exit(1)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/cli.py` around lines 590 - 601, The exception handler in the
transcribe_file try-except block only catches RuntimeError, but the
native-engine path can also raise FileNotFoundError for missing GGML models and
subprocess-related exceptions from ffmpeg/whisper.cpp. Change the except clause
to catch a broader set of exceptions (such as Exception or specifically include
FileNotFoundError and OSError) so that the spinner is properly stopped via
_spin_stop.set() and spin_thread.join() for all failure cases, not just
RuntimeError, preventing tracebacks from printing while the spinner is still
active.
tests/test_whispercpp_snap.py-36-38 (1)

36-38: ⚠️ Potential issue | 🟡 Minor

Replace mktemp() with opened temporary files.

tempfile.mktemp() only reserves a name without creating the file, leaving a TOCTOU race window where another process could create or modify the file before _make_wav() writes it. Use NamedTemporaryFile(delete=False) to atomically reserve and create the file, close it, and then pass the path to _make_wav().

Proposed fix
     def setUp(self):
-        self.wav = tempfile.mktemp(suffix=".wav")
+        tmp = tempfile.NamedTemporaryFile(suffix=".wav", delete=False)
+        self.wav = tmp.name
+        tmp.close()
         _make_wav(self.wav)
@@
     def test_all_silence_leaves_words_unchanged(self):
-        silent = tempfile.mktemp(suffix=".wav")
+        tmp = tempfile.NamedTemporaryFile(suffix=".wav", delete=False)
+        silent = tmp.name
+        tmp.close()
         _make_wav(silent, voiced_s=0.0, silence_s=1.0)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_whispercpp_snap.py` around lines 36 - 38, In the setUp method,
replace the tempfile.mktemp() call with NamedTemporaryFile(delete=False) to
eliminate the TOCTOU race condition. Create a NamedTemporaryFile with the
appropriate suffix, extract its name property, close the file object, and then
assign that name to self.wav before passing it to the _make_wav() function. This
ensures the temporary file is atomically created and reserved before _make_wav()
attempts to write to it.

Source: Linters/SAST tools

plans/native-cli.md-7-7 (1)

7-7: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add language identifiers to fenced code blocks.

These fences trigger markdownlint MD040; add explicit languages (for example bash/text) to keep docs lint-clean.

Suggested fix
-```
+```bash
 npm i -g podcli            # or: bun add -g podcli
 podcli process pod.mp4 --top 5
   → first run: silently provisions a hermetic runtime (one time)
   → 9:16 clips with burned captions
 podcli                     # auto-updates itself on launch

@@
- +text
┌─ podcli (Go launcher, ~8MB, per-platform) ──────────────────────────┐
│ • on-launch self-update (GitHub Releases, throttle-free fast check) │
...
└──────────────────────────────────────────────────────────────────────┘

Also applies to: 52-52

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@plans/native-cli.md` at line 7, Add language identifiers to the fenced code
blocks in the native-cli.md file to comply with markdownlint MD040. At line 7,
add the bash language identifier to the opening fence containing npm and podcli
commands. At line 52, add the text language identifier to the opening fence
containing the ASCII art diagram. Both code blocks currently have empty language
identifiers (just triple backticks) and need explicit language tags to keep the
documentation lint-clean.

Source: Linters/SAST tools

🧹 Nitpick comments (2)
tests/parity/compare.py (1)

106-129: 💤 Low value

Remove unnecessary f-string prefix.

Line 118 has an f-string without any placeholders.

Minor style fix
-            print(f"Need baseline/<stem> and candidate/<stem>, or two explicit paths.", file=sys.stderr)
+            print("Need baseline/<stem> and candidate/<stem>, or two explicit paths.", file=sys.stderr)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/parity/compare.py` around lines 106 - 129, The error message string in
the main function contains an f-string prefix but has no placeholder variables,
making the f-string unnecessary. Remove the f prefix from the print statement in
the error handling block (the line that prints "Need baseline/<stem> and
candidate/<stem>, or two explicit paths.") and convert it to a regular string
literal.
tests/parity/README.md (1)

32-34: ⚡ Quick win

Tighten the examples for lint and portability.

The fenced blocks are unlabeled, and the venv/bin/python3 examples are POSIX-only. Tag the fences (sh/text) and prefer a platform-neutral invocation here.

Suggested cleanup
-```
-venv/bin/python3 -m pytest tests/parity/test_caption_parity.py -q
-```
+```sh
+python -m pytest tests/parity/test_caption_parity.py -q
+```

Apply the same fence tagging to the other command examples.

Also applies to: 38-40, 53-55, 61-63

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/parity/README.md` around lines 32 - 34, The fenced code blocks in the
README lack language labels and use POSIX-specific paths that reduce
portability. In tests/parity/README.md at lines 32-34 (anchor), add the `sh`
language tag to the opening backticks and replace `venv/bin/python3` with
`python` in the pytest invocation. Apply the same changes at the sibling
locations: lines 38-40, lines 53-55, and lines 61-63. For any non-shell fenced
blocks, use the `text` tag instead. This improves linting compliance and makes
the examples work across different platforms and environments.

Source: Linters/SAST tools

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@backend/requirements-runtime.txt`:
- Around line 5-11: Update three package version constraints in the requirements
file to address critical security vulnerabilities. Change opencv-python-headless
from >=4.8.0 to >=4.8.1.78 to fix CVE-2023-4863 heap buffer overflow. Change
python-dotenv from >=1.0.0 to >=1.2.2 to fix CVE-2026-28684 symlink following
vulnerability. Change google-api-python-client from >=2.0.0 to >=2.0.1 since
version 2.0.0 was yanked. Leave numpy, Pillow, questionary, and
google-auth-oauthlib at their current specified versions as they have no known
security issues.

In `@backend/services/transcription.py`:
- Around line 35-38: The launcher provisions whisper.cpp models via
`provision.EnsureModel()` but fails to pass the resulting model path to the
backend. Instead of only setting `PODCLI_HOME`, capture the model path returned
by `provision.EnsureModel()` and explicitly set the `PODCLI_WHISPERCPP_MODEL`
environment variable to that path when invoking the backend process. This
ensures the backend's fallback logic in the model assignment (which checks for
`PODCLI_WHISPERCPP_MODEL` before defaulting to the `_managed_home()` path) will
find the correctly provisioned model.

In `@cli/internal/update/update.go`:
- Around line 140-149: The Windows update process backs up the current binary by
moving dest to dest.old, then attempts to install the staged binary. If the
final os.Rename(staged, dest) call on line 149 fails, the original binary
remains in dest.old while dest is missing, leaving the CLI unusable. Add
rollback logic after the final os.Rename(staged, dest) call: if this rename
fails, restore the backup by moving dest.old back to dest before returning the
error, ensuring the system can fall back to the previous working binary if the
installation fails.

In `@tests/parity/test_caption_parity.py`:
- Around line 37-52: In the _render() function, replace the deprecated and
insecure tempfile.mktemp() call with NamedTemporaryFile. Create a
NamedTemporaryFile context manager with delete=False and suffix=".ass", extract
its name attribute to pass to render_captions(), then read the file contents
within the same try-finally block to ensure proper cleanup. This eliminates the
race condition vulnerability where an attacker could create a file at the path
before your code writes to it.

---

Major comments:
In @.github/workflows/release.yml:
- Around line 14-16: The permissions block at the workflow level grants write
access to repository contents for all jobs in the workflow, which violates the
principle of least privilege. Remove the `contents: write` permission from the
workflow-level permissions block (lines 14-16) and move it to only the specific
job(s) that actually need it (the release or publish job). Keep other jobs with
read-only permissions or remove the permissions block from those jobs entirely
to ensure minimal token scope.
- Line 58: The `macos-13` runner label used in the workflow matrix is no longer
supported and will cause job scheduling failures. In
`.github/workflows/release.yml` at line 58 and line 116, replace the `macos-13`
runner label with a currently supported macOS runner label such as
`macos-latest` or another actively maintained version. This will ensure the
matrix jobs can schedule successfully on the GitHub Actions runners.
- Around line 29-30: Replace all action version pins in the workflow from
tag-based (`@v`*) to full commit SHA pins throughout the release.yml file to
eliminate supply-chain risks from tag reassignment. Additionally, add a with
clause containing persist-credentials: false to every actions/checkout step to
prevent credential leakage. This applies to all uses statements referencing
actions/checkout and actions/setup-go, as well as any other GitHub actions used
in the workflow. For each affected action, convert the `@v`* reference to the
corresponding full commit SHA and ensure checkout steps include the
persist-credentials configuration.

In `@backend/services/transcription_whispercpp.py`:
- Around line 84-86: The idx matrix creation in the RMS computation block
materializes the full nf × frame index matrix, which causes excessive memory
allocation for long audio files. Instead of creating the entire idx matrix
upfront with np.arange(nf)[:, None] * hop + np.arange(frame)[None, :], refactor
the RMS computation to process frames in chunks or use a streaming
cumulative-sum approach that computes rms values iteratively without allocating
the full index matrix. This will prevent gigabyte-scale memory spikes when
processing long podcast WAVs.
- Around line 20-25: The `_extract_wav` function uses unbounded
`subprocess.run()` which lacks mandatory timeouts and proper error handling,
making the backend vulnerable to stalled processes and losing useful error
details. Replace the `subprocess.run()` call in `_extract_wav` with
`utils.proc.run()` to inherit the project's timeout and failure capture
mechanisms. The same fix must also be applied to the whisper-cli invocation that
discards stderr (mentioned in the comment as also applying to that location),
ensuring both external tool invocations inherit the proper process wrapper
behavior.
- Around line 53-58: The code in the token accumulation block treats the
SentencePiece marker `▁` as a word boundary indicator but appends the raw token
unchanged to cur_text, causing caption words to retain the marker (e.g.,
`▁hello`). Before appending the raw token to cur_text, normalize the `▁`
character to a space so that subsequent strip() operations can properly remove
it. This should be done at the `cur_text += raw` line by replacing any `▁`
characters with spaces in the raw token before concatenation.
- Around line 124-130: The code currently ensures positive word duration with
the fallback `e = min(s + 0.05, hi)` when `e <= s`, but this still produces
zero-duration words when `s` is already clamped to the upper bound `hi`. After
the existing `if e <= s:` block (which sets `e = min(s + 0.05, hi)`), add an
additional guard that checks if `e <= s` is still true after the fallback. If
so, shift `s` backward by 0.05 (e.g., `s = max(e - 0.05, lo)`) to ensure a
positive duration exists while respecting the lower bound `lo`.

In `@backend/services/transcription.py`:
- Around line 29-56: The _transcribe_with_whispercpp function returns
transcription results without applying speaker detection and face analysis
enrichment, causing speaker segments to be missing even when diarization is
enabled. Extract the existing speaker-detection and face-analysis
post-processing logic (currently applied to other transcription paths) into a
separate shared helper function, then call this helper at the end of
_transcribe_with_whispercpp before returning the result, and also ensure it is
called for the other transcription path at lines 84-86 to maintain consistent
behavior across all transcription engines.

In `@backend/utils/proc.py`:
- Around line 21-27: The `_resolve_tool` function checks if an override path
exists without first resolving relative paths to absolute paths. When an
environment variable contains a relative path like `./bin/ffmpeg`, the
`os.path.exists(override)` check evaluates relative to the current working
directory, which may differ from where the variable was set. To fix this,
convert the override path to an absolute path using `os.path.abspath()` before
calling `os.path.exists()`. This ensures that relative paths are consistently
resolved and the hermetic binary is not silently skipped.

In `@cli/internal/engine/engine.go`:
- Around line 65-80: The BackendRoot() function currently searches for the
backend in the wrong order, creating a security vulnerability where a
project-local backend/cli.py in an ancestor directory takes precedence over the
managed runtime backend. Reorder the function to check the provisioned runtime
backend at paths.RuntimeDir()/backend first (before the directory tree walk), so
that the managed runtime backend is preferred over any ancestor backend/cli.py.
This way, only if the runtime backend does not exist should the function fall
back to walking the directory tree for a source checkout.

In `@cli/internal/provision/provision.go`:
- Around line 542-555: In the tar.TypeLink case, the hardlink source path
constructed with filepath.Join(dest, h.Linkname) must be validated to ensure it
doesn't escape the extraction root boundary (similar to how the target path
should be validated). Additionally, the os.Link() call currently ignores its
error return value, allowing failures to go undetected. Fix this by: (1)
validating the resolved source path stays within the extraction root before
calling os.Link, and (2) capturing the error returned from os.Link() and
returning it if non-nil, rather than silently ignoring failures.
- Around line 145-147: The HTTP client in the provision.go file currently only
sets ResponseHeaderTimeout (60 seconds) which protects against stalled response
headers but not against slow body transmission or other network hangs. Add a
Timeout field to the http.Client struct initialization alongside the existing
Transport configuration to enforce an end-to-end timeout on the entire request
lifecycle, preventing indefinite hangs during provisioning requests on
unreliable networks.
- Around line 117-119: The provisioning system allows runtime artifacts to be
downloaded without pinned checksums, undermining hermetic integrity guarantees.
To fix this, add specific checksums for the Whisper models tiny.en and small
that currently have empty checksums, add checksums for the FFmpeg binaries and
WhisperCLI that are downloaded via fetch() without validation, update the struct
comment that documents empty checksums as intentional to reflect that all
artifacts are now checksummed, and ensure the download() function enforces
checksum validation rather than treating empty checksums as valid. Verify that
all remotely downloaded artifacts in the provisioning system now have pinned
checksums to maintain integrity enforcement end-to-end.

In `@cli/internal/provision/studio.go`:
- Around line 220-239: The tar extraction code has a symlink-based write escape
vulnerability where the tar.TypeSymlink case creates symlinks from h.Linkname
without validating that the target path stays within the intended extraction
directory, and the tar.TypeReg case writes files using os.OpenFile which follows
symlinks. To fix this, validate that the h.Linkname target in the
tar.TypeSymlink case is constrained to the destination directory by resolving it
as a path and checking it doesn't escape. Additionally, before writing regular
files in the tar.TypeReg case, verify the target path itself (after resolving
any intermediate symlinks) is still within bounds to prevent writes through
malicious symlinks created by a previous entry in the same archive.

In `@cli/internal/update/update.go`:
- Around line 152-159: The downloadFile function uses http.Get without a
timeout, which can cause podcli update to hang indefinitely on slow or broken
networks. Create an http.Client with an appropriate timeout (e.g., 30 seconds)
and use that client's Get method instead of the default http.Get call on line
157 to ensure the download request has a bounded execution time.

In `@cli/main.go`:
- Around line 65-71: In the whispercpp auto-selection branch (when
transcribeEngine(args) == "whispercpp"), the code currently calls
provision.EnsureModel("base") to provision only the model file but does not
provision the whisper-cli binary itself. Add a call to
provision.EnsureWhisperCpp() before or after the EnsureModel call to ensure the
whisper-cli binary is available in the managed runtime, creating consistency
with the setup() function and preventing runtime failures when the binary is not
found on the system PATH.

In `@npm/scripts/install.js`:
- Around line 43-79: The download function downloads executable files without
any integrity verification, making the system vulnerable to compromised
binaries. Add checksum or signature validation to verify the downloaded
executable before it is renamed and persisted. In the download function, after
the file write completes but before the fs.renameSync call that persists the
temporary file to the final destination, calculate or verify a checksum against
a known good value (obtained from a separate checksum file or manifest).
Similarly, in the ensure function, ensure that the downloaded binary from the
GitHub release URL is validated. Only proceed with renaming the temporary file
to the final destination after successful integrity verification, and reject
with an error if the checksum does not match.
- Around line 27-28: The Windows platform check in the `defaultHome()` function
returns `process.env.LOCALAPPDATA` directly without appending the 'podcli'
subdirectory, whereas the fallback case joins the path with 'AppData', 'Local',
'podcli', and the non-Windows paths always append 'podcli'. Modify the Windows
platform check to use `path.join()` to append 'podcli' to
`process.env.LOCALAPPDATA`, making it consistent with the fallback behavior and
other platform implementations.

---

Minor comments:
In `@backend/cli.py`:
- Around line 590-601: The exception handler in the transcribe_file try-except
block only catches RuntimeError, but the native-engine path can also raise
FileNotFoundError for missing GGML models and subprocess-related exceptions from
ffmpeg/whisper.cpp. Change the except clause to catch a broader set of
exceptions (such as Exception or specifically include FileNotFoundError and
OSError) so that the spinner is properly stopped via _spin_stop.set() and
spin_thread.join() for all failure cases, not just RuntimeError, preventing
tracebacks from printing while the spinner is still active.

In `@plans/native-cli.md`:
- Line 7: Add language identifiers to the fenced code blocks in the
native-cli.md file to comply with markdownlint MD040. At line 7, add the bash
language identifier to the opening fence containing npm and podcli commands. At
line 52, add the text language identifier to the opening fence containing the
ASCII art diagram. Both code blocks currently have empty language identifiers
(just triple backticks) and need explicit language tags to keep the
documentation lint-clean.

In `@tests/test_whispercpp_snap.py`:
- Around line 36-38: In the setUp method, replace the tempfile.mktemp() call
with NamedTemporaryFile(delete=False) to eliminate the TOCTOU race condition.
Create a NamedTemporaryFile with the appropriate suffix, extract its name
property, close the file object, and then assign that name to self.wav before
passing it to the _make_wav() function. This ensures the temporary file is
atomically created and reserved before _make_wav() attempts to write to it.

---

Nitpick comments:
In `@tests/parity/compare.py`:
- Around line 106-129: The error message string in the main function contains an
f-string prefix but has no placeholder variables, making the f-string
unnecessary. Remove the f prefix from the print statement in the error handling
block (the line that prints "Need baseline/<stem> and candidate/<stem>, or two
explicit paths.") and convert it to a regular string literal.

In `@tests/parity/README.md`:
- Around line 32-34: The fenced code blocks in the README lack language labels
and use POSIX-specific paths that reduce portability. In tests/parity/README.md
at lines 32-34 (anchor), add the `sh` language tag to the opening backticks and
replace `venv/bin/python3` with `python` in the pytest invocation. Apply the
same changes at the sibling locations: lines 38-40, lines 53-55, and lines
61-63. For any non-shell fenced blocks, use the `text` tag instead. This
improves linting compliance and makes the examples work across different
platforms and environments.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 5e29c951-3895-4dfe-9d6b-5789ace174c5

📥 Commits

Reviewing files that changed from the base of the PR and between 88e5428 and ec91bd2.

📒 Files selected for processing (46)
  • .github/workflows/release.yml
  • .gitignore
  • backend/cli.py
  • backend/clip_studio.py
  • backend/main.py
  • backend/requirements-runtime.txt
  • backend/services/clip_generator.py
  • backend/services/encoder.py
  • backend/services/thumbnail_html.py
  • backend/services/transcription.py
  • backend/services/transcription_whispercpp.py
  • backend/utils/proc.py
  • cli/go.mod
  • cli/internal/backend/embed.go
  • cli/internal/backend/sync.sh
  • cli/internal/config/config.go
  • cli/internal/engine/engine.go
  • cli/internal/paths/paths.go
  • cli/internal/podstack/podstack.go
  • cli/internal/podstack/sync.sh
  • cli/internal/provision/provision.go
  • cli/internal/provision/provision_manual_test.go
  • cli/internal/provision/studio.go
  • cli/internal/update/update.go
  • cli/internal/update/update_test.go
  • cli/main.go
  • npm/README.md
  • npm/bin/podcli.js
  • npm/package.json
  • npm/scripts/install.js
  • plans/native-cli.md
  • remotion/render-bookend.mjs
  • scripts/build-remotion.sh
  • scripts/build-studio.sh
  • src/config/paths.ts
  • tests/parity/.gitignore
  • tests/parity/README.md
  • tests/parity/capture_baseline.py
  • tests/parity/compare.py
  • tests/parity/golden/branded.ass.expected
  • tests/parity/golden/hormozi.ass.expected
  • tests/parity/golden/karaoke.ass.expected
  • tests/parity/golden/subtle.ass.expected
  • tests/parity/test_caption_parity.py
  • tests/parity/transcript_synthetic.json
  • tests/test_whispercpp_snap.py

Comment thread backend/requirements-runtime.txt Outdated
Comment thread backend/services/transcription.py Outdated
Comment thread cli/internal/update/update.go
Comment thread tests/parity/test_caption_parity.py
…block symlink traversal

- Publish checksums.txt in the release workflow; verify it in self-update,
  whisper-cli/studio/remotion provisioning, and the npm installer
- Pin binary downloads/redirects to GitHub hosts (replaces manual redirect
  recursion in the self-updater)
- Reject archive symlinks/hardlinks whose targets escape the extract dir
- Namespace the transcript cache by engine so whisper.cpp doesn't reuse a
  whisper-py transcript; honor PODCLI_FFPROBE in clip_studio
- Fail fast when NPM_TOKEN is unset before publishing

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (5)
backend/services/transcription_whispercpp.py (3)

84-86: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Avoid building the full frame index matrix for podcast-length audio.

idx materializes nf * frame samples; for hour-long files this can exceed memory and crash during the post-transcription snap pass.

Proposed low-memory RMS calculation
-    idx = np.arange(nf)[:, None] * hop + np.arange(frame)[None, :]
-    rms = np.sqrt((samples[idx] ** 2).mean(axis=1))
+    rms = np.empty(nf, dtype=np.float32)
+    for i in range(nf):
+        start_idx = i * hop
+        window = samples[start_idx : start_idx + frame]
+        rms[i] = np.sqrt(np.mean(window * window))
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/services/transcription_whispercpp.py` around lines 84 - 86, The index
matrix `idx` in the RMS calculation materializes all nf * frame indices in
memory at once, which causes memory exhaustion for long audio files. Replace
this full materialization with a loop-based or rolling window approach that
processes one or a small batch of frames at a time, computing the RMS values
incrementally without constructing the large `idx` matrix. The key is to iterate
through frame positions (using the `hop`, `frame`, and `nf` variables) and
calculate the squared mean for each individual frame or small batch, then take
the square root, rather than creating the full indexing array upfront.

39-44: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Wire speaker attribution into the whisper.cpp path.

This adapter currently returns every word and segment with speaker: None, so --engine whispercpp disables speaker-aware clipping/cropping even when speaker detection is requested upstream.

As per coding guidelines, backend/**/*.py: “Implement speaker detection as part of the transcribe_podcast tool using Whisper transcription”.

Also applies to: 178-185

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/services/transcription_whispercpp.py` around lines 39 - 44, The
whisper.cpp transcription adapter is hardcoding speaker attribution as None,
which disables speaker-aware processing downstream. In
backend/services/transcription_whispercpp.py at lines 39-44 (the words
dictionary) and lines 178-185 (the segments dictionary), replace the hardcoded
"speaker": None with actual speaker attribution data extracted from the
whisper.cpp transcription results. Extract speaker information from the
whisper.cpp output for each word and segment, and use that value instead of None
to enable speaker-aware clipping and cropping when speaker detection is enabled
upstream.

Source: Coding guidelines


123-131: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Prevent snapped words from collapsing to zero duration.

When a word starts after hi, both s and e clamp to hi, so Line 129 keeps end == start; the trailing caption can disappear or become invalid.

Proposed guard for clamped tail words
     out, prev_end = [], lo
     for w in words:
+        min_dur = 0.05
         s = min(max(float(w.get("start", 0.0)), lo), hi)
         e = min(max(float(w.get("end", 0.0)), lo), hi)
         if s < prev_end:
             s = prev_end
         if e <= s:
-            e = min(s + 0.05, hi)
+            if s >= hi:
+                s = max(prev_end, hi - min_dur)
+                e = s + min_dur
+            else:
+                e = min(s + min_dur, hi)
         prev_end = e
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/services/transcription_whispercpp.py` around lines 123 - 131, The
condition on line 129 that expands zero-duration words uses `min(s + 0.05, hi)`
which prevents proper expansion when both `s` and `e` are clamped to `hi` (i.e.,
when a word starts after the `hi` boundary). This results in `e` remaining equal
to `s` and zero-duration words. Modify the expansion logic to ensure the minimum
duration of 0.05 is applied without being constrained by `hi`, so that even tail
words clamped to the boundary receive valid non-zero duration. Consider removing
the `hi` constraint in the `min()` call or adding a separate guard specifically
for words that start at or beyond `hi`.
backend/cli.py (1)

568-597: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Import subprocess before catching CalledProcessError.

Line 597 references subprocess.CalledProcessError, but subprocess is not in scope here. Any transcription failure will turn into a NameError, bypassing the spinner cleanup and friendly exit path.

Proposed fix
             warnings.filterwarnings("ignore", message="FP16 is not supported on CPU")
             from services.transcription import transcribe_file
+            import subprocess
             import threading
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/cli.py` around lines 568 - 597, The except clause on line 597
references subprocess.CalledProcessError but the subprocess module is not
imported, which will cause a NameError when any transcription failure occurs and
prevent proper spinner cleanup. Add import subprocess with the other imports at
the beginning of the function (alongside the imports for threading and
transcribe_file from services.transcription) to ensure the exception handler can
properly catch subprocess.CalledProcessError.

Source: Linters/SAST tools

backend/services/transcript_packer.py (1)

79-90: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Skip legacy cache fallback for engine-namespaced transcripts.

When PODCLI_ENGINE=whispercpp, this misses <hash>-whispercpp.json, loads the unsuffixed legacy MD5 cache, then saves that whisper-py-shaped data under the whisper.cpp filename on Line 89.

Proposed fix
     canonical = transcript_json_path(cache_hash)
     if os.path.exists(canonical):
         with open(canonical, encoding="utf-8") as f:
             return json.load(f)
+    if _engine_cache_suffix():
+        return None
     legacy = legacy_md5_cache_path(video_path)
     if os.path.exists(legacy):
         with open(legacy, encoding="utf-8") as f:
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/services/transcript_packer.py` around lines 79 - 90, The
load_cached_transcript_for_video function currently falls back to loading from
the legacy MD5 cache regardless of whether an engine namespace is being used.
When PODCLI_ENGINE is set to a specific engine like whispercpp, this causes the
function to load whisper-py-shaped data from the legacy cache and save it under
the engine-specific filename via save_cached_transcript_for_video on line 89,
creating a data mismatch. Skip the legacy cache fallback (lines 87-90) when an
engine namespace is active by adding a condition to check if the transcript is
engine-specific, ensuring that only engine-appropriate cached data is loaded or
nothing is loaded if the engine-specific cache doesn't exist.
🧹 Nitpick comments (3)
cli/internal/provision/security_test.go (1)

51-75: ⚡ Quick win

Consider adding a test for hardlink escape rejection.

extractTarGz validates both symlinks (tested here) and hardlinks (tar.TypeLink at lines 668-672 in provision.go). Adding a similar test for hardlinks would ensure the security invariant is tested end-to-end for both link types.

func TestExtractTarGzRejectsEscapingHardlink(t *testing.T) {
	if runtime.GOOS == "windows" {
		t.Skip("hardlink semantics differ on Windows")
	}
	var buf bytes.Buffer
	gz := gzip.NewWriter(&buf)
	tw := tar.NewWriter(gz)
	// First create a regular file that exists
	tw.WriteHeader(&tar.Header{Name: "legit", Typeflag: tar.TypeReg, Size: 0, Mode: 0o644})
	// Then a hardlink pointing outside
	tw.WriteHeader(&tar.Header{Name: "evil", Typeflag: tar.TypeLink, Linkname: "../../../etc/passwd", Mode: 0o644})
	tw.Close()
	gz.Close()

	dir := t.TempDir()
	archive := filepath.Join(dir, "evil.tar.gz")
	if err := os.WriteFile(archive, buf.Bytes(), 0o644); err != nil {
		t.Fatal(err)
	}
	dest := filepath.Join(dir, "out")
	if err := extractTarGz(archive, dest); err == nil {
		t.Fatal("expected extractTarGz to reject escaping hardlink, got nil error")
	}
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cli/internal/provision/security_test.go` around lines 51 - 75, Add a new test
function `TestExtractTarGzRejectsEscapingHardlink` to the test file that
validates `extractTarGz` rejects hardlinks escaping the destination directory.
The test should follow the same pattern as the existing
`TestExtractTarGzRejectsEscapingSymlink` test but use `tar.TypeLink` instead of
`tar.TypeSymlink`, include the Windows skip condition since hardlink semantics
differ on Windows, create a regular file entry first (using `tar.TypeReg`), then
create a hardlink entry with a Linkname pointing outside the destination (e.g.,
"../../../etc/passwd"), and finally verify that `extractTarGz` returns an error
when attempting to extract the archive. This ensures the security validation for
hardlinks is tested end-to-end alongside symlinks.
cli/internal/provision/studio.go (1)

43-82: ⚖️ Poor tradeoff

Node.js downloads lack checksum verification.

Unlike whisper-cli, Remotion, and Studio (which verify against checksums.txt), Node.js downloads from nodejs.org aren't verified. Node does publish SHASUMS256.txt files at https://nodejs.org/dist/v{version}/SHASUMS256.txt.asc (GPG-signed) or SHASUMS256.txt.

This is lower risk since nodejs.org is a trusted source over HTTPS, but for parity with other provisioned assets, consider fetching and verifying against the upstream checksum.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cli/internal/provision/studio.go` around lines 43 - 82, The EnsureNode()
function downloads Node.js but does not verify the downloaded archive against a
checksum, unlike other provisioned assets. Add checksum verification by fetching
the appropriate SHASUMS256.txt file from nodejs.org (either the GPG-signed or
plain text version at https://nodejs.org/dist/v{nodeVersion}/SHASUMS256.txt)
after the fetch() call completes successfully, then verify the downloaded
archive file against the checksum before proceeding to the extraction step using
extractZipStrip1() or extractTarGzStrip1(). This ensures parity with checksum
verification for other provisioned dependencies and adds a safety check for the
Node.js download.
cli/internal/update/update.go (1)

192-196: 💤 Low value

Consider failing closed when checksum entry is missing for the current asset.

A compromised release could omit the checksum entry for a specific platform's binary while providing entries for others, allowing that platform's binary to bypass verification. Since new releases will include checksums.txt, consider returning an error when the file exists but lacks an entry for the current asset.

This would only affect edge cases where checksums.txt is present but incomplete.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cli/internal/update/update.go` around lines 192 - 196, The checksum
verification logic in the ParseChecksums block currently returns nil (success)
when the checksum entry for assetName() is not found in the parsed checksums. To
prevent a compromised release from bypassing verification by selectively
omitting entries, change this behavior to return an error instead of nil when ok
is false. This ensures that if checksums.txt exists but lacks an entry for the
current asset, the verification fails rather than silently succeeding,
implementing a fail-closed security posture.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@backend/cli.py`:
- Around line 568-597: The except clause on line 597 references
subprocess.CalledProcessError but the subprocess module is not imported, which
will cause a NameError when any transcription failure occurs and prevent proper
spinner cleanup. Add import subprocess with the other imports at the beginning
of the function (alongside the imports for threading and transcribe_file from
services.transcription) to ensure the exception handler can properly catch
subprocess.CalledProcessError.

In `@backend/services/transcript_packer.py`:
- Around line 79-90: The load_cached_transcript_for_video function currently
falls back to loading from the legacy MD5 cache regardless of whether an engine
namespace is being used. When PODCLI_ENGINE is set to a specific engine like
whispercpp, this causes the function to load whisper-py-shaped data from the
legacy cache and save it under the engine-specific filename via
save_cached_transcript_for_video on line 89, creating a data mismatch. Skip the
legacy cache fallback (lines 87-90) when an engine namespace is active by adding
a condition to check if the transcript is engine-specific, ensuring that only
engine-appropriate cached data is loaded or nothing is loaded if the
engine-specific cache doesn't exist.

In `@backend/services/transcription_whispercpp.py`:
- Around line 84-86: The index matrix `idx` in the RMS calculation materializes
all nf * frame indices in memory at once, which causes memory exhaustion for
long audio files. Replace this full materialization with a loop-based or rolling
window approach that processes one or a small batch of frames at a time,
computing the RMS values incrementally without constructing the large `idx`
matrix. The key is to iterate through frame positions (using the `hop`, `frame`,
and `nf` variables) and calculate the squared mean for each individual frame or
small batch, then take the square root, rather than creating the full indexing
array upfront.
- Around line 39-44: The whisper.cpp transcription adapter is hardcoding speaker
attribution as None, which disables speaker-aware processing downstream. In
backend/services/transcription_whispercpp.py at lines 39-44 (the words
dictionary) and lines 178-185 (the segments dictionary), replace the hardcoded
"speaker": None with actual speaker attribution data extracted from the
whisper.cpp transcription results. Extract speaker information from the
whisper.cpp output for each word and segment, and use that value instead of None
to enable speaker-aware clipping and cropping when speaker detection is enabled
upstream.
- Around line 123-131: The condition on line 129 that expands zero-duration
words uses `min(s + 0.05, hi)` which prevents proper expansion when both `s` and
`e` are clamped to `hi` (i.e., when a word starts after the `hi` boundary). This
results in `e` remaining equal to `s` and zero-duration words. Modify the
expansion logic to ensure the minimum duration of 0.05 is applied without being
constrained by `hi`, so that even tail words clamped to the boundary receive
valid non-zero duration. Consider removing the `hi` constraint in the `min()`
call or adding a separate guard specifically for words that start at or beyond
`hi`.

---

Nitpick comments:
In `@cli/internal/provision/security_test.go`:
- Around line 51-75: Add a new test function
`TestExtractTarGzRejectsEscapingHardlink` to the test file that validates
`extractTarGz` rejects hardlinks escaping the destination directory. The test
should follow the same pattern as the existing
`TestExtractTarGzRejectsEscapingSymlink` test but use `tar.TypeLink` instead of
`tar.TypeSymlink`, include the Windows skip condition since hardlink semantics
differ on Windows, create a regular file entry first (using `tar.TypeReg`), then
create a hardlink entry with a Linkname pointing outside the destination (e.g.,
"../../../etc/passwd"), and finally verify that `extractTarGz` returns an error
when attempting to extract the archive. This ensures the security validation for
hardlinks is tested end-to-end alongside symlinks.

In `@cli/internal/provision/studio.go`:
- Around line 43-82: The EnsureNode() function downloads Node.js but does not
verify the downloaded archive against a checksum, unlike other provisioned
assets. Add checksum verification by fetching the appropriate SHASUMS256.txt
file from nodejs.org (either the GPG-signed or plain text version at
https://nodejs.org/dist/v{nodeVersion}/SHASUMS256.txt) after the fetch() call
completes successfully, then verify the downloaded archive file against the
checksum before proceeding to the extraction step using extractZipStrip1() or
extractTarGzStrip1(). This ensures parity with checksum verification for other
provisioned dependencies and adds a safety check for the Node.js download.

In `@cli/internal/update/update.go`:
- Around line 192-196: The checksum verification logic in the ParseChecksums
block currently returns nil (success) when the checksum entry for assetName() is
not found in the parsed checksums. To prevent a compromised release from
bypassing verification by selectively omitting entries, change this behavior to
return an error instead of nil when ok is false. This ensures that if
checksums.txt exists but lacks an entry for the current asset, the verification
fails rather than silently succeeding, implementing a fail-closed security
posture.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 7cca89ec-090b-4aef-9769-c629374e671d

📥 Commits

Reviewing files that changed from the base of the PR and between ec91bd2 and e6aa165.

📒 Files selected for processing (14)
  • .github/workflows/release.yml
  • backend/cli.py
  • backend/clip_studio.py
  • backend/services/transcript_packer.py
  • backend/services/transcription_whispercpp.py
  • cli/internal/provision/provision.go
  • cli/internal/provision/security_test.go
  • cli/internal/provision/studio.go
  • cli/internal/update/update.go
  • cli/main.go
  • cli/main_test.go
  • npm/scripts/install.js
  • tests/test_transcript_cache_engine.py
  • tests/test_whispercpp_adapter.py
✅ Files skipped from review due to trivial changes (1)
  • tests/test_whispercpp_adapter.py
🚧 Files skipped from review as they are similar to previous changes (3)
  • backend/clip_studio.py
  • .github/workflows/release.yml
  • cli/main.go

The goldens encoded two host-specific inputs, so they passed on macOS but
failed on CI's Ubuntu:
- the font name (DETECTED_FONT resolves Arial on macOS, Liberation Sans on
  Ubuntu via fc-list) written into the ASS Style line — affected all 4 styles
- per-word text widths (fc-match + freetype metrics) — affected the branded
  style's positioning and pill geometry

Pin both in the parity test fixture (font -> Arial, deterministic width model)
and regenerate the goldens. Production rendering still uses real host metrics;
only the harness is pinned, which is its whole point: lock the engine-independent
ASS pipeline, not the host's typography.
- requirements: bump opencv-python-headless (CVE-2023-4863), python-dotenv
  (CVE-2026-28684), google-api-python-client (2.0.0 yanked) floors in both
  runtime and dev requirements
- cli.py: import subprocess at module scope — the native-engine except clause
  referenced subprocess.CalledProcessError without it in scope (NameError)
- install.js: Windows defaultHome now nests under LOCALAPPDATA\podcli to match
  the Go launcher's managed dir (binary was installed where the shim couldn't
  find it)
- update/provision: add ResponseHeaderTimeout to the release HTTP clients so a
  stalled server can't hang update/provisioning indefinitely
- update.go: roll back the Windows binary swap if the staged rename fails, so a
  failed self-update can't brick the CLI
- whisper.cpp snap: guarantee positive word duration when clamping to the
  voiced upper bound (was collapsing to zero)
- transcript cache: don't adopt the legacy whisper-py cache under a whisper.cpp
  namespace
- release.yml: default permissions to read; grant contents:write only to the
  release job
- tests: parity goldens use mkstemp; add hardlink-escape and zero-duration-snap
  regression tests
…r.cpp

Path model: the brand brain (presets, knowledge, assets, history, config) and
the transcript cache now live in the global managed dir so they follow the user
across directories; only rendered clips are written to <cwd>/podcli-clips. The Go
launcher injects PODCLI_HOME/PODCLI_DATA (global), PODCLI_OUTPUT and PODCLI_CWD;
Python and the studio honor PODCLI_OUTPUT.

Migration: running podcli in an old ./podcli folder imports that folder's brand
brain into the global home (via the bundle export/import so asset paths are
rewritten), moves the transcript cache, picks up ancient top-level presets/, and
copies .env. Non-destructive: it never overwrites a populated global home.

Studio: the library already reads the global clips.json; clips are now streamed
by history id from wherever they were rendered, with the recorded output_path as
the allowlist (symlink-resolved, regular-file + extension checked).

Engine: native installs auto-use whisper.cpp instead of erroring. The launcher
selects it on a hermetic Python for the interactive menu, process, and studio;
the backend also falls back automatically when openai-whisper is absent and the
user did not explicitly request the whisper-py engine.

The old in-repo launcher prints an upgrade notice pointing at the native CLI.
…cript

Studio gets a text area to paste a quote or describe moments; the backend uses
the local AI CLI to locate them in the transcript and appends the matches to the
clip suggestions (deduped against existing ones). This mirrors the CLI's
interactive "find a specific moment" flow.

The moment-finding logic moves into claude_suggest.find_moments_from_text (the
CLI now delegates to it), exposed as a find_moment backend task and a
POST /api/find-moment route. Status goes to the progress channel; warnings to
stderr so the task runner's stdout JSON-RPC stays clean.
- Interactive thumbnails passed output=None (the bare namespace has no argparse
  default), crashing in os.makedirs. Default to ./thumbnails at the point of use
  and in the menu namespace.
- Studio launch derived PODCLI_DATA from PODCLI_OUTPUT, which now points at the
  cwd clip dir — scattering cache/working/logs into the working directory. Derive
  data dir from the cache path and propagate PODCLI_OUTPUT to the child instead.
- Migration: resolve the .env target from PODCLI_ENV_FILE (what the loader reads)
  so migrated secrets land where they're loaded and "pending" clears; never treat
  the global managed dir itself as a legacy project; guard presets move against
  source==target.
- whisper.cpp auto-fallback now also engages when a broken openai-whisper install
  fails to load/run, not only when it's missing.
- Studio binds to 127.0.0.1 by default (it serves local files with no auth);
  set PODCLI_HOST=0.0.0.0 to expose on the LAN.
… reset

- clip_generator: Remotion-fallback and boundary-revert messages went to stdout,
  which corrupts the JSON-RPC task channel (create_clip/batch_clips via MCP/studio)
  on the fallback path. Route them to stderr.
- config bundle: asset paths in presets/ui-state were keyed in the rewrite map by
  resolved path but matched against the literal stored string, so any symlinked
  path (macOS /var, /tmp) was left pointing at the source machine after import.
  Register raw/expanded/resolved aliases so the literal value is rewritten. Adds a
  regression test (tmpdirs are symlinked on macOS, reproducing the failure).
- find_moments_from_text: write the prompt via the managed .podcli/tmp helper
  instead of the repo root, matching suggest_with_claude.
- studio: reset deselectedIndices when claude-suggest replaces the suggestion list
  so the wrong clips aren't silently excluded from export.
- whisper.cpp: add timeouts to the ffmpeg and whisper-cli subprocess calls so a
  stalled child can't hang the transcribe task forever.
downloadOnce followed redirects to any host while only the checksums.txt fetch
was host-pinned — and Node, CPython, and ffmpeg are downloaded without checksum
verification, then executed. Route all binary/model downloads through a client
that refuses redirects off the known source hosts (GitHub + *.githubusercontent,
HuggingFace + its LFS CDN, nodejs.org, evermeet.cx, johnvansickle.com), and use
the pinned client for the GitHub API calls too. Adds tests for the host
allowlist (incl. suffix-spoofing) and redirect refusal.
Run setup automatically the first time a runtime command (the interactive menu,
process, transcribe, studio) is invoked with no backend present, so a fresh
install works without a separate `podcli setup`. Skipped for lightweight and mcp
commands. Trim narration comments added in recent commits.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 10

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
.github/workflows/release.yml (1)

79-86: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Validate that whisper-cli binary was found before copying.

The find command on line 83 could produce an empty $SRC if the build structure differs from expectations or the build failed silently. The subsequent cp would then fail with a cryptic "cannot stat" error rather than a clear message.

🛡️ Proposed validation
           EXT=""; [ "${{ matrix.goos }}" = "windows" ] && EXT=".exe"
           SRC=$(find build -name "whisper-cli${EXT}" -type f | head -1)
+          if [ -z "$SRC" ]; then
+            echo "::error::whisper-cli binary not found in build/"
+            exit 1
+          fi
           OUT="whisper-cli-${{ matrix.goos }}-${{ matrix.goarch }}${EXT}"
           cp "$SRC" "$OUT"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/release.yml around lines 79 - 86, Add validation after the
find command that locates the whisper-cli binary to ensure the SRC variable is
not empty before attempting to copy the file. Insert a conditional check that
explicitly verifies SRC contains a valid file path, and if it is empty, output a
clear error message and exit with a non-zero status code. This prevents the cp
command from failing with a cryptic "cannot stat" error when the binary is not
found, instead providing immediate and explicit feedback about the build
artifact missing.
backend/main.py (1)

502-519: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Expose the required MCP tool names in TASK_HANDLERS.

TASK_HANDLERS currently doesn’t register several required MCP suite names (transcribe_podcast, batch_create_clips, knowledge_base, manage_assets, clip_history). This can break tool dispatch contracts when callers use the specified MCP names.

As per coding guidelines, "Implement the MCP tools suite for podcli engine: transcribe_podcast, suggest_clips, create_clip, batch_create_clips, knowledge_base, manage_assets, and clip_history."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/main.py` around lines 502 - 519, The TASK_HANDLERS dictionary in the
backend is missing several required MCP tool names that are part of the
specified MCP suite. Add the following missing tool names to the TASK_HANDLERS
dictionary: transcribe_podcast, batch_create_clips, knowledge_base,
manage_assets, and clip_history. Each key should map to its corresponding
handler function (for example, transcribe_podcast should map to
handle_transcribe, batch_create_clips should map to handle_batch_clips, and the
other three tools need their respective handler functions). Ensure all handler
functions are properly defined before being referenced in the TASK_HANDLERS
mapping to avoid runtime errors during tool dispatch.

Source: Coding guidelines

cli/internal/engine/engine.go (1)

65-79: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Prefer the provisioned backend before walking the working tree.

BackendRoot currently executes the first backend/cli.py found above cwd even when the runtime backend exists. In the native binary path, running podcli inside any repo with a backend/cli.py would execute local Python instead of the embedded backend, breaking the self-contained runtime trust boundary. Prefer paths.RuntimeDir()/backend after explicit PODCLI_BACKEND, leaving cwd probing as a dev fallback.

🛡️ Proposed resolution-order fix
 func BackendRoot() (string, bool) {
 	if b := os.Getenv("PODCLI_BACKEND"); b != "" && exists(filepath.Join(b, "cli.py")) {
 		return b, true
 	}
+	cand := filepath.Join(paths.RuntimeDir(), "backend")
+	if exists(filepath.Join(cand, "cli.py")) {
+		return cand, true
+	}
 	if dir, err := os.Getwd(); err == nil {
 		for {
 			cand := filepath.Join(dir, "backend")
 			if exists(filepath.Join(cand, "cli.py")) {
 				return cand, true
@@
 			dir = parent
 		}
 	}
-	cand := filepath.Join(paths.RuntimeDir(), "backend")
-	if exists(filepath.Join(cand, "cli.py")) {
-		return cand, true
-	}
 	return "", false
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cli/internal/engine/engine.go` around lines 65 - 79, The BackendRoot function
currently prioritizes the working directory tree walk over the provisioned
runtime backend, which creates a security issue where local backends can
override the embedded one. Reorder the backend discovery logic to check
paths.RuntimeDir()/backend first by testing for the existence of cli.py there,
and only fall back to the os.Getwd() directory tree walking if the runtime
backend is not found. This ensures the provisioned backend takes precedence
while preserving the development fallback capability.
🧹 Nitpick comments (6)
.github/workflows/release.yml (5)

23-28: 💤 Low value

Consider adding windows/arm64 to the build matrix.

The matrix includes arm64 variants for darwin and linux but omits windows/arm64. Windows 11 supports ARM64 natively, and GitHub Actions now offers windows-11-arm runners. If ARM64 Windows support is intentionally excluded, consider documenting the rationale in a comment.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/release.yml around lines 23 - 28, The build matrix in the
include section is missing a windows/arm64 configuration entry, while darwin and
linux both include arm64 variants. Add a new entry following the same pattern
for windows/arm64 with goos: windows and goarch: arm64 to the include list, or
if this exclusion is intentional, add a comment explaining the rationale for
omitting Windows ARM64 support.

127-133: ⚡ Quick win

Add error checking for the remotion build script.

Similar to the studio job, if scripts/build-remotion.sh fails, the workflow will package an incomplete bundle without clear failure.

🛡️ Proposed hardening
         run: |
+          set -e
           sh scripts/build-remotion.sh dist/remotion-bundle
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/release.yml around lines 127 - 133, The Build remotion
bundle step lacks error checking for the scripts/build-remotion.sh script
execution. If the script fails, the workflow will continue and package an
incomplete bundle instead of failing. Add error checking to the bash script
block (either by using set -e at the start of the run command or by appending ||
exit 1 after the sh scripts/build-remotion.sh command) to ensure the workflow
fails immediately if the build script encounters any errors.

40-46: ⚡ Quick win

Add explicit error checking after go generate.

The go generate ./... command could fail silently if backend or PodStack sync scripts error out. Consider adding set -e at the top of the run block or checking $? after the generate step to fail fast.

🛡️ Proposed hardening
         run: |
+          set -e
           VERSION="${GITHUB_REF_NAME#v}"
           go generate ./...   # sync backend + PodStack commands for go:embed
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/release.yml around lines 40 - 46, The `go generate ./...`
command in the release workflow's run block lacks error checking and could fail
silently if backend or PodStack sync scripts encounter errors. Add `set -e` at
the very beginning of the run block (as the first line after the `run: |`
statement) to ensure the workflow exits immediately if any command fails,
preventing subsequent commands like `go build` from executing with potentially
incomplete or corrupt generated files.

99-103: ⚡ Quick win

Add error checking for the studio build script.

If scripts/build-studio.sh fails, the workflow will continue to tar an empty or incomplete directory. The error might not surface until runtime when the provisioner tries to use the broken bundle.

🛡️ Proposed hardening
         run: |
+          set -e
           npm ci
           sh scripts/build-studio.sh dist/studio
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/release.yml around lines 99 - 103, The studio build
workflow step does not include error checking, which means if
scripts/build-studio.sh fails, the workflow will continue executing and create a
tar archive of an incomplete or empty directory. Add error handling by including
set -e at the beginning of the run block to ensure the workflow exits
immediately if the build-studio.sh script fails, preventing incomplete bundles
from being created and used downstream.

149-153: ⚡ Quick win

Add validation that checksums.txt was created successfully.

If sha256sum fails or produces no output (e.g., due to missing files in dist/), the mv command might move an empty file, and the release would be published with broken checksums.

🛡️ Proposed validation
         run: |
           sha256sum * > "$RUNNER_TEMP/checksums.txt"
+          if [ ! -s "$RUNNER_TEMP/checksums.txt" ]; then
+            echo "::error::checksums.txt is empty - no artifacts found in dist/"
+            exit 1
+          fi
           mv "$RUNNER_TEMP/checksums.txt" checksums.txt
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/release.yml around lines 149 - 153, The "Generate
checksums" step in the release workflow does not validate that the checksums.txt
file was created successfully before moving it. If sha256sum fails or produces
no output due to missing files in the dist directory, an empty checksums.txt
would be moved to the release artifacts. Add validation after the sha256sum
command to check that checksums.txt exists and is not empty, and exit with an
error if the validation fails, ensuring the workflow stops before proceeding
with the release if checksum generation is incomplete.
tests/test_find_moments.py (1)

51-70: ⚡ Quick win

Add a malformed-output regression test for find-moment parsing.

Current tests don’t cover malformed scores/timestamps from AI output. Add a case where one bad clip appears alongside one valid clip, and assert the valid one is still returned.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_find_moments.py` around lines 51 - 70, Add a new test method to
the test file that verifies find_moments_from_text properly handles malformed AI
output. The test should call find_moments_from_text with SEGMENTS data that
includes both a malformed clip (with invalid scores or timestamps) and a valid
clip, then assert that only the valid clip is returned in the results. This
regression test will ensure that the find_moments_from_text function can
gracefully filter out bad clips while still returning valid ones from mixed
output.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@backend/cli.py`:
- Around line 296-300: The help text for the output option is out of sync with
the resolver logic implemented in the base_output_dir assignment. The resolver
now checks configured_output_dir, then the PODCLI_OUTPUT environment variable,
then falls back to a clips folder relative to the video path, but the help text
still claims the default is "./clips". Update all help text descriptions for the
output parameter to accurately document this three-level fallback behavior:
first configured option, then PODCLI_OUTPUT environment variable, then the clips
subfolder of the video's directory. This ensures users understand where the
output will actually be written after running with the native launcher which
sets PODCLI_OUTPUT.

In `@backend/config_bundle.py`:
- Around line 408-412: The `_has_managed_content` function is incorrectly
identifying empty scaffold directories (presets/, knowledge/, assets/) as
existing profile data, causing `migrate_legacy_home` to skip legacy import when
it shouldn't. Modify `_has_managed_content` to return true only when the managed
directories contain actual files or data, not just when the directories
themselves exist. This check is used at the condition starting around line
408-412 where it returns early with skipped_existing set to true, and also
applies to similar logic around lines 496-503.

In `@backend/services/claude_suggest.py`:
- Around line 485-501: Add defensive error handling around the numeric
conversions in the clip processing logic to prevent one malformed clip from
failing the entire batch. Wrap the sum(scores.values()) call and the float()
conversions for start, end, start_second, and end_second fields in try-except
blocks, skipping individual clips that have invalid data rather than letting the
exception propagate to the broad exception handler at line 525-527. This ensures
that one clip with non-numeric output does not cause the entire find-moment
result to return an empty list, preserving all valid matches.

In `@backend/services/transcript_packer.py`:
- Around line 85-94: The engine cache isolation guard that prevents whisper.cpp
from adopting legacy md5 caches exists in load_cached_transcript_for_video
(checking _engine_cache_suffix() == ""), but the same protection is missing in
the load_cache function. Apply the same engine cache suffix guard in load_cache
(around lines 357-360) to ensure whisper.cpp runs cannot fall back to legacy
transcripts unconditionally. Add a check for _engine_cache_suffix() == "" before
allowing the legacy cache fallback in load_cache, matching the logic already
present in load_cached_transcript_for_video.

In `@backend/services/transcription_whispercpp.py`:
- Around line 130-134: The line where e is set to s + 0.05 applies a fixed
overhang unconditionally, and when this occurs for multiple consecutive clamped
words, the cumulative effect causes prev_end to drift progressively further past
the intended voiced bounds. Fix this by bounding the overhang value so that e
does not extend beyond a reasonable maximum (such as the next token boundary or
a configured upper bound). Use a min() function or similar constraint to ensure
the 0.05 overhang serves its purpose without accumulating excessive drift across
repeated clamped tokens.

In `@backend/services/transcription.py`:
- Around line 107-110: The whisper.cpp transcription path at the
_transcribe_with_whispercpp call returns immediately without flowing to the
shared diarization block at Line 212+, which causes the enable_diarization and
num_speakers parameters to be ignored for that path. Instead of returning
directly after _transcribe_with_whispercpp execution at the locations marked
(Line 109 and Line 127), refactor the code so that whisper.cpp transcription
results flow through the same diarization and speaker detection logic that other
transcription paths use, ensuring that the shared diarization block processes
all transcription outputs consistently.

In `@src/ui/client/ClipDetail.tsx`:
- Around line 86-87: The player render guard on line 179 references an undefined
variable `file` which breaks compilation. Replace the `file` reference with an
appropriate variable that is actually defined in the ClipDetail component. Based
on the context of this component which works with clip data, thumbnail images,
and preview URLs (as seen in variables like `clip`, `thumbImage`, and
`thumbTimestamp`), determine the correct variable name that should be used in
the conditional check and substitute it for the undefined `file` reference.

In `@src/ui/client/EpisodeWorkspace.jsx`:
- Around line 1513-1534: The condition on line 1513 in the EpisodeWorkspace.jsx
component allows the "Find a specific moment" feature to display when either
transcript or transcriptText exists, but the /api/find-moment endpoint requires
proper parsed transcript segments. Change the conditional check from (transcript
|| transcriptText) to only check for transcript availability, ensuring the
feature only displays when the full parsed transcript segments are available and
not when only raw transcript text exists.

In `@src/ui/web-server.ts`:
- Around line 1036-1042: The route handlers for "/api/clips/:id/preview" and
"/api/clips/:id/download" suppress promise rejections from serveClipById by
using the void operator, which causes unhandled errors instead of proper HTTP
responses. Replace the void keyword with proper error handling by adding a
.catch() handler to the serveClipById promise at both route definitions, or
alternatively make the route callbacks async and wrap the serveClipById calls in
try-catch blocks to ensure errors are caught and handled before they become
unhandled rejections.

In `@tests/test_config_bundle.py`:
- Around line 139-146: The `_enter_migration` method saves and restores
environment variables for PODCLI_CWD and paths, but does not handle
PODCLI_ENV_FILE, which can cause test isolation issues if the variable is set
externally. Save the current PODCLI_ENV_FILE environment variable at the start
of _enter_migration (around line 139-146) similar to how PODCLI_CWD is saved,
then unset or control PODCLI_ENV_FILE to prevent it from interfering with the
test. In the cleanup code (around lines 148-157), restore the saved
PODCLI_ENV_FILE value. Also verify the code at lines 281-298 for any additional
changes needed to maintain test isolation.

---

Outside diff comments:
In @.github/workflows/release.yml:
- Around line 79-86: Add validation after the find command that locates the
whisper-cli binary to ensure the SRC variable is not empty before attempting to
copy the file. Insert a conditional check that explicitly verifies SRC contains
a valid file path, and if it is empty, output a clear error message and exit
with a non-zero status code. This prevents the cp command from failing with a
cryptic "cannot stat" error when the binary is not found, instead providing
immediate and explicit feedback about the build artifact missing.

In `@backend/main.py`:
- Around line 502-519: The TASK_HANDLERS dictionary in the backend is missing
several required MCP tool names that are part of the specified MCP suite. Add
the following missing tool names to the TASK_HANDLERS dictionary:
transcribe_podcast, batch_create_clips, knowledge_base, manage_assets, and
clip_history. Each key should map to its corresponding handler function (for
example, transcribe_podcast should map to handle_transcribe, batch_create_clips
should map to handle_batch_clips, and the other three tools need their
respective handler functions). Ensure all handler functions are properly defined
before being referenced in the TASK_HANDLERS mapping to avoid runtime errors
during tool dispatch.

In `@cli/internal/engine/engine.go`:
- Around line 65-79: The BackendRoot function currently prioritizes the working
directory tree walk over the provisioned runtime backend, which creates a
security issue where local backends can override the embedded one. Reorder the
backend discovery logic to check paths.RuntimeDir()/backend first by testing for
the existence of cli.py there, and only fall back to the os.Getwd() directory
tree walking if the runtime backend is not found. This ensures the provisioned
backend takes precedence while preserving the development fallback capability.

---

Nitpick comments:
In @.github/workflows/release.yml:
- Around line 23-28: The build matrix in the include section is missing a
windows/arm64 configuration entry, while darwin and linux both include arm64
variants. Add a new entry following the same pattern for windows/arm64 with
goos: windows and goarch: arm64 to the include list, or if this exclusion is
intentional, add a comment explaining the rationale for omitting Windows ARM64
support.
- Around line 127-133: The Build remotion bundle step lacks error checking for
the scripts/build-remotion.sh script execution. If the script fails, the
workflow will continue and package an incomplete bundle instead of failing. Add
error checking to the bash script block (either by using set -e at the start of
the run command or by appending || exit 1 after the sh scripts/build-remotion.sh
command) to ensure the workflow fails immediately if the build script encounters
any errors.
- Around line 40-46: The `go generate ./...` command in the release workflow's
run block lacks error checking and could fail silently if backend or PodStack
sync scripts encounter errors. Add `set -e` at the very beginning of the run
block (as the first line after the `run: |` statement) to ensure the workflow
exits immediately if any command fails, preventing subsequent commands like `go
build` from executing with potentially incomplete or corrupt generated files.
- Around line 99-103: The studio build workflow step does not include error
checking, which means if scripts/build-studio.sh fails, the workflow will
continue executing and create a tar archive of an incomplete or empty directory.
Add error handling by including set -e at the beginning of the run block to
ensure the workflow exits immediately if the build-studio.sh script fails,
preventing incomplete bundles from being created and used downstream.
- Around line 149-153: The "Generate checksums" step in the release workflow
does not validate that the checksums.txt file was created successfully before
moving it. If sha256sum fails or produces no output due to missing files in the
dist directory, an empty checksums.txt would be moved to the release artifacts.
Add validation after the sha256sum command to check that checksums.txt exists
and is not empty, and exit with an error if the validation fails, ensuring the
workflow stops before proceeding with the release if checksum generation is
incomplete.

In `@tests/test_find_moments.py`:
- Around line 51-70: Add a new test method to the test file that verifies
find_moments_from_text properly handles malformed AI output. The test should
call find_moments_from_text with SEGMENTS data that includes both a malformed
clip (with invalid scores or timestamps) and a valid clip, then assert that only
the valid clip is returned in the results. This regression test will ensure that
the find_moments_from_text function can gracefully filter out bad clips while
still returning valid ones from mixed output.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 90bd9d52-971a-45ce-bbe1-f7ee46e0672c

📥 Commits

Reviewing files that changed from the base of the PR and between e6aa165 and 0111fd4.

📒 Files selected for processing (33)
  • .github/workflows/release.yml
  • .gitignore
  • backend/cli.py
  • backend/config/paths.py
  • backend/config_bundle.py
  • backend/main.py
  • backend/requirements-runtime.txt
  • backend/requirements.txt
  • backend/services/claude_suggest.py
  • backend/services/clip_generator.py
  • backend/services/transcript_packer.py
  • backend/services/transcription.py
  • backend/services/transcription_whispercpp.py
  • cli/internal/engine/engine.go
  • cli/internal/provision/provision.go
  • cli/internal/provision/redirect_test.go
  • cli/internal/provision/security_test.go
  • cli/internal/update/update.go
  • cli/main.go
  • npm/scripts/install.js
  • podcli
  • src/config/paths.ts
  • src/models/index.ts
  • src/ui/client/ClipDetail.tsx
  • src/ui/client/EpisodeWorkspace.jsx
  • src/ui/client/StudioHome.tsx
  • src/ui/web-server.ts
  • tests/parity/golden/branded.ass.expected
  • tests/parity/test_caption_parity.py
  • tests/test_config_bundle.py
  • tests/test_find_moments.py
  • tests/test_transcription_engine.py
  • tests/test_whispercpp_snap.py
✅ Files skipped from review due to trivial changes (3)
  • .gitignore
  • backend/requirements-runtime.txt
  • tests/parity/golden/branded.ass.expected
🚧 Files skipped from review as they are similar to previous changes (7)
  • src/config/paths.ts
  • tests/parity/test_caption_parity.py
  • npm/scripts/install.js
  • tests/test_whispercpp_snap.py
  • cli/internal/update/update.go
  • cli/main.go
  • cli/internal/provision/provision.go

Comment thread backend/cli.py
Comment thread backend/config_bundle.py
Comment thread backend/services/claude_suggest.py
Comment thread backend/services/transcript_packer.py
Comment thread backend/services/transcription_whispercpp.py
Comment thread backend/services/transcription.py
Comment thread src/ui/client/ClipDetail.tsx
Comment thread src/ui/client/EpisodeWorkspace.jsx
Comment thread src/ui/web-server.ts
Comment thread tests/test_config_bundle.py
The thumbnail screenshot opened a Remotion browser via openBrowser without first
ensuring one exists (renderMedia, used for captions, ensures it implicitly), so
the first render just timed out connecting to a browser that was never
provisioned. Call ensureBrowser() before openBrowser(), and run the screenshot
from the runtime root — as caption rendering already does — so the Chrome
Headless Shell caches under a stable .remotion/ and is reused across working
directories instead of re-downloading into (and polluting) each one.
Default allow_ass_fallback to true so a remotion caption-render failure (browser
unavailable, offline, arch mismatch) falls back to ffmpeg-burned ASS captions
instead of exporting zero clips. Healthy installs still use remotion first, so
this only rescues the cases that would otherwise hard-fail.
A new machine can install with only curl or PowerShell — no Go, Node, Python, or
FFmpeg required. install.sh / install.ps1 download the prebuilt, statically
linked binary (checksum-verified) and put it on PATH; the binary provisions its
own hermetic runtimes on first launch. README install section rewritten for the
native flow.
- Knowledge upload wrote the raw multipart originalname into the knowledge dir,
  so a name like ../../x.md escaped it (multer doesn't sanitize separators).
  Use basename for both the stored name and the extension filter.
- extractTarXz shelled out to system tar with no traversal guard (the unverified
  johnvansickle ffmpeg path). List entries first and reject absolute or ..
  paths before extracting.
- Verify the Node tarball against nodejs.org SHASUMS256.txt and the CPython
  build against python-build-standalone's SHA256SUMS after download. Fail closed
  on mismatch; fail open if the manifest is unreachable so a transient network
  issue can't block provisioning.
- README: rewrite usage for the native binary (podcli, not ./podcli), Web UI via
  the menu, and MCP via `podcli mcp install`.
- Add RELEASE.md (tag-driven release + smoke test); align root package.json to 2.0.0.
- README: Web UI via the menu (not setup.sh), project tree shows cli/ launcher +
  installers, setup.sh labeled as dev setup.
- CONTRIBUTING: document the Go launcher build + go test/gofmt in the PR checklist
  alongside the Python/TS dev setup.
- AGENTS.podstack: drop the nonexistent `setup.sh --host` flag.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Nitpick comments (2)
RELEASE.md (2)

42-42: 💤 Low value

Complete the install.sh URL placeholder.

Line 42 uses .../install.sh as a generic placeholder. For clarity in the smoke test procedure, consider replacing it with the full canonical URL or note that it matches the URL in line 9.

📝 Proposed clarity improvement
- curl -fsSL .../install.sh | sh      # or npm i -g podcli
+ curl -fsSL https://raw.githubusercontent.com/nmbrthirteen/podcli/main/install.sh | sh      # or npm i -g podcli
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@RELEASE.md` at line 42, The install.sh URL in line 42 is incomplete with a
generic `.../install.sh` placeholder that should be replaced with the full
canonical URL for clarity in the smoke test procedure. Replace the vague
placeholder with the actual complete URL, ensuring it matches the URL provided
in line 9 of the same file so users can execute the command without ambiguity.

3-3: 💤 Low value

Capitalize "GitHub" per official spelling.

LanguageTool flagged GitHub capitalization on lines 3 and 49. "GitHub" is the official name of the platform.

🔤 Proposed capitalization fixes
- Distribution is fully automated by `.github/workflows/release.yml`: pushing a
+ Distribution is fully automated by `.github/workflows/release.yml`: pushing a

Line 3: .github/workflows/release.yml (no change needed for this line—the flag refers to the bare "github" in context)

Line 49: Change "github.com" context reference:

- Needs outbound HTTPS to github.com, huggingface.co, nodejs.org, and the ffmpeg
+ Needs outbound HTTPS to GitHub.com, huggingface.co, nodejs.org, and the ffmpeg

Actually, for hostnames like github.com, the lowercase is technically correct (it's a domain name). The LanguageTool flag may be a false positive for domain names. Consider this a low-priority cosmetic note.

Also applies to: 49-49

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@RELEASE.md` at line 3, The review comment identifies GitHub capitalization
flags at two locations in RELEASE.md (lines 3 and 49) but clarifies that these
are false positives: at line 3, the file path `.github/workflows/release.yml` is
correctly lowercase as it is a directory/file path, not a proper noun reference;
at line 49, the domain name `github.com` is correctly lowercase as hostnames
should be lowercase regardless of the platform's official branding. No changes
are needed at either flagged location. However, if there are any bare,
non-technical references to "github" elsewhere in the RELEASE.md file (outside
of domain names and file paths), those should be capitalized to "GitHub" per
official spelling.

Source: Linters/SAST tools

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@cli/main.go`:
- Around line 65-73: The wantsRuntime function only checks for a limited set of
commands, causing other backend-routed commands like clips, thumbnails,
knowledge, presets, assets, youtube, cache, info, and config to skip
ensureRuntime() and fail during setup. Expand the switch statement in
wantsRuntime to include all backend-routed commands that require the managed
runtime. Additionally, remove the "auto" case from wantsRuntime since it is
already handled by podstack.Run before reaching this function; if auto needs the
managed runtime, add the provisioning gate before the podstack.Run call in the
dispatcher instead.
- Around line 78-90: The ensureRuntime() function calls setup(nil) without
capturing or checking its return value, so if the setup fails, runEngine()
continues executing with an incomplete runtime. Modify ensureRuntime() to
capture the exit code or error from setup(nil) and propagate it to the caller
(either by returning an error/exit code or by exiting directly), then update
runEngine() to check the result from ensureRuntime() and return early if setup
fails instead of continuing with model provisioning and backend startup.

In `@install.ps1`:
- Around line 25-39: The catch block is catching all exceptions including the
intentional checksum mismatch throw on line 32, which silently swallows the
error and allows the script to continue. Restructure the error handling so the
try block only wraps the Invoke-WebRequest call to fetch checksums.txt (the
network operation that should be caught), and move the checksum verification
logic and the mismatch throw statement outside the try/catch block so that
verification failures cause the script to fail instead of being silently caught.

In `@RELEASE.md`:
- Around line 32-36: The RELEASE.md documentation lists the remotion bundle
artifact name incorrectly as `remotion-<os>-<arch>.tar.gz` when the actual
artifact generated by the release workflow is
`remotion-bundle-<os>-<arch>.tar.gz`. Update the artifact name in the list to
include the "-bundle" suffix so it matches what the release workflow actually
produces, making the documentation consistent with the workflow outputs.

---

Nitpick comments:
In `@RELEASE.md`:
- Line 42: The install.sh URL in line 42 is incomplete with a generic
`.../install.sh` placeholder that should be replaced with the full canonical URL
for clarity in the smoke test procedure. Replace the vague placeholder with the
actual complete URL, ensuring it matches the URL provided in line 9 of the same
file so users can execute the command without ambiguity.
- Line 3: The review comment identifies GitHub capitalization flags at two
locations in RELEASE.md (lines 3 and 49) but clarifies that these are false
positives: at line 3, the file path `.github/workflows/release.yml` is correctly
lowercase as it is a directory/file path, not a proper noun reference; at line
49, the domain name `github.com` is correctly lowercase as hostnames should be
lowercase regardless of the platform's official branding. No changes are needed
at either flagged location. However, if there are any bare, non-technical
references to "github" elsewhere in the RELEASE.md file (outside of domain names
and file paths), those should be capitalized to "GitHub" per official spelling.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 96fbb05d-3c7c-4fed-9dcb-0484619d0e9a

📥 Commits

Reviewing files that changed from the base of the PR and between 0111fd4 and 97d8fdc.

📒 Files selected for processing (15)
  • AGENTS.podstack.md
  • CONTRIBUTING.md
  • README.md
  • RELEASE.md
  • backend/config_bundle.py
  • backend/presets.py
  • backend/scripts/remotion_screenshot.cjs
  • backend/services/thumbnail_html.py
  • cli/internal/provision/provision.go
  • cli/internal/provision/studio.go
  • cli/main.go
  • install.ps1
  • install.sh
  • package.json
  • src/ui/web-server.ts
✅ Files skipped from review due to trivial changes (3)
  • package.json
  • AGENTS.podstack.md
  • README.md
🚧 Files skipped from review as they are similar to previous changes (4)
  • src/ui/web-server.ts
  • backend/config_bundle.py
  • cli/internal/provision/studio.go
  • cli/internal/provision/provision.go

Comment thread cli/main.go
Comment thread cli/main.go
Comment thread install.ps1
Comment thread RELEASE.md
@nmbrthirteen nmbrthirteen merged commit 33b7af2 into main Jun 16, 2026
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant