Native binary CLI (codex-style): hermetic runtimes, whisper.cpp, bundled studio + MCP#16
Conversation
…ration Two-layer correctness model around the existing transcript JSON contract: - Layer 1 (committed, CI-ready): synthetic neutral transcript + caption goldens for all 4 styles, word-text normalization + zero-duration-floor assertions. - Layer 2 (local/gitignored): capture_baseline.py freezes current openai-whisper output; compare.py gates a candidate engine on WER + word-timestamp drift. Also adds plans/native-cli.md (full migration plan).
…ract
Shells to whisper-cli, merges subword tokens into words on the leading-space
boundary, and applies the same word-text normalization (strip) the caption
pipeline and apply_corrections() expect. Emits the same {words, segments,
duration, language} dict as the openai-whisper path, so it is a drop-in engine.
Validated against an openai-whisper baseline via tests/parity: matching text
(WER 0.059) and segmentation; benign bulk word-timing drift (~0.18s median).
Known gap: base model can stretch trailing words across silence; --vad is the fix.
cli/ Go module that replaces the bash podcli launcher and install.cmd: - paths: per-OS global managed dir (runtime/models/bin/config) - engine: resolve interpreter + backend/cli.py (hermetic > dev venv > system), exec with UTF-8 + ffmpeg env, propagate exit code - main: version/doctor commands, update/setup stubs, pass-through routing Drives the existing Python engine end-to-end (e.g. podcli clips list). Hermetic provisioning and self-update land in later phases.
transcribe_file() now dispatches on PODCLI_ENGINE: the default whisper-py path is untouched; whispercpp routes to the whisper.cpp adapter, resolving whisper-cli and the ggml model from the managed dir (env-overridable). Adds a first-class --engine flag to the process command and optional VAD support to the adapter (off by default). This is the dual-engine escape hatch from the migration plan: users can pick the native PyTorch-free path or fall back instantly. Verified: --engine routes through the Go launcher; engine dispatch returns the contract dict; Layer-1 parity (6) and unit tests (27) green.
podcli setup downloads pinned ggml models into the managed dir with resumable (HTTP Range) transfers, checksum verification, and atomic writes; first use of --engine whispercpp auto-provisions the base model. doctor reports model state. The Python whispercpp path resolves the model from the same managed dir, so no env wiring is needed once provisioned. Also trims narrating comments from the launcher down to WHY-only.
proc.run() resolves bare ffmpeg/ffprobe invocations to PODCLI_FFMPEG / PODCLI_FFPROBE when set, so the whole backend uses a provisioned binary with one central change instead of touching ~15 call sites. encoder's fingerprint helper honors the same override. The Go launcher resolves ffprobe alongside ffmpeg from the runtime dir and exports both env vars; doctor reports ffprobe state. Verified: override is used when set, falls back to PATH otherwise, and leaves non-ffmpeg commands untouched (proc + encoder unit tests green).
podcli setup now fetches static ffmpeg builds per platform (evermeet on macOS, johnvansickle tar.xz on Linux, BtbN zip on Windows), extracting the binaries into the managed runtime dir. Archives are downloaded with the resumable transfer and unpacked by sniffing the format (zip via stdlib, tar.xz via tar). ffmpeg failure is non-fatal — the backend falls back to PATH. The launcher exports the provisioned ffmpeg/ffprobe so the backend uses them. Static sources aren't checksum-pinned yet (upstream "latest" URLs); they get our own pinned builds once podcli hosts releases. Verified on macOS: both binaries provision, run (8.1.1), and the backend's proc.run uses the hermetic ffprobe; setup is idempotent.
podcli setup now provisions python-build-standalone (resolved via the GitHub latest-release API so it tracks upstream), extracting the install_only tarball into the managed runtime dir, then pip-installs requirements-runtime.txt — the backend deps minus openai-whisper/torch. That is the ~2GB the native install saves; the shipped binary transcribes with whisper.cpp only. Because the hermetic interpreter has no openai-whisper, the launcher defaults process/studio to whisper.cpp when it resolves the hermetic Python (whisper-py stays available on dev/source installs). A guarded integration test covers asset resolution, tar.gz extraction (incl. the symlink-before-parent-dir ordering), and that the interpreter runs. Verified on macOS: Python 3.12.13 provisions and runs; cv2/numpy/Pillow and the rest of the slim set import cleanly.
The launcher now checks GitHub Releases for a newer version: a fast, non-fatal on-launch notice before engine commands, and `podcli update` for an explicit check. With no binary release published yet, update points the user at the npm/bun reinstall (the designed fallback); the in-place binary swap lands once releases publish per-platform assets. Adds a config package persisting the off-switch: `podcli config set update.auto off` (or PODCLI_NO_UPDATE=1) silences all checks. `config get/set` are owned by the launcher; other `config` subcommands (status/export/import/use) still pass through to the Python backend. Verified: off-switch round-trips to config.json, update degrades gracefully with no release, unknown keys error, and Python config passthrough works.
A thin Node package (name: podcli) whose postinstall fetches the platform's native binary into the managed dir and whose bin shim execs it, passing args and exit codes through. If the pre-fetch is blocked, the shim fetches lazily on first run, so a failed postinstall never bricks the install. PODCLI_BINARY_SRC copies a local binary instead of downloading (testing + the release build). This is the npm i -g podcli / bun add -g podcli front door; it pairs with the self-update fallback that points users back here. Verified locally: install, shim exec with arg + exit-code passthrough, and lazy first-run fetch all work against a local binary.
On a v* tag, cross-compiles the Go launcher for darwin/linux (amd64+arm64) and windows/amd64 with the version stamped via ldflags, publishes them as release assets under the names the npm wrapper and self-update expect, then publishes the npm package pinned to the tag. whisper.cpp per-platform builds are left as a marked TODO (they need native runners); until then transcription uses a PATH whisper-cli. Verified locally: the build command cross-compiles (e.g. linux/arm64) and stamps main.Version; the workflow YAML parses.
podcli update now downloads the release binary for this platform and swaps the managed binary (~/.podcli/bin/podcli) in place — the file both the npm shim and direct installs exec, so it updates regardless of install method. Unix uses an atomic rename; Windows moves the running .exe aside first. If the swap fails it falls back to the npm/bun reinstall message. Unit tests cover the swap and the version comparison.
The launcher resolves whisper-cli from the managed runtime dir and falls back to PATH, exporting PODCLI_WHISPER_CLI so the backend uses the hermetic binary. EnsureWhisperCpp fetches whisper-cli-<os>-<arch> from the latest release (PATH fallback until one is published); doctor reports its state. The release workflow gains a per-platform whisper.cpp matrix (static, Metal embedded on macOS) producing the matching assets. That matrix is unverified until the first tagged run. Verified locally: resolution prefers a runtime whisper-cli over PATH; YAML parses; Go build + tests green.
whisper.cpp can stretch the final phrase across trailing silence, desyncing captions. After transcription the adapter clamps word timings to the voiced span (RMS energy threshold) and re-flows them, so words stranded in silence snap back near the last voiced frame while words over speech are untouched. On by default; PODCLI_WHISPERCPP_NO_SNAP=1 disables it. Also fixes a leaked temp dir. Validated on synthetic true-silence audio (voiced word untouched; a word stranded at 1.55s pulled back to 1.15s). On a clip with an outro the trailing audio is genuinely voiced, so the snap correctly no-ops there.
First-run dep install ran pip with -q, so the multi-second download showed nothing and read as frozen. Stream pip's progress bars (drop -q, force --progress-bar=on) and unbuffer output so it surfaces on Windows pipes too.
Sub-megabyte models (the 864 KB Silero VAD) floored to "0 MB" and looked like a failed download.
An installed podcli had no backend off the source tree, so it only ran inside the repo. Embed backend/ via go:embed (synced into files/ at build time by `go generate`, gitignored) and extract it to runtime/backend on setup. The CLI now resolves its backend from the managed dir and runs from any working directory, like a real native binary.
The Go launcher showed its own help on a bare invocation and dropped the PodStack workflow commands the old bash wrapper forwarded to Claude/Codex. Bare `podcli` now opens the backend's branded interactive menu, and auto/generate-titles/… launch the AI agent with the matching slash command (embedded and installed into the project on demand), honoring --claude/--codex/--ai and PODCLI_AI with a Codex fallback.
The studio is a Node/TypeScript app not shipped with the native binary, so launching it ran npm against the managed runtime dir and crashed with ENOENT. Detect the missing package.json (and missing npm) and print how to run the studio from a source checkout instead.
Open Web UI ran npm against the managed runtime and crashed on a native install. Provision a hermetic Node and a prebuilt studio bundle (esbuild'd server + SPA, built by scripts/build-studio.sh and shipped as a release asset), then launch them from the menu with rendering delegated to the hermetic Python + ffmpeg. The studio resolves its backend via a new PODCLI_BACKEND override in paths.ts; doctor and setup report node/studio; CI builds and uploads studio-bundle.tar.gz. Falls back to system Node, then to the npm dev flow from a source checkout.
With the backend relocated to the global runtime dir, the Python derived .podcli/, data/, and .env from the backend's location, so a native install stranded each project's episodes, presets, and secrets in a global runtime folder. The launcher now resolves the project root from the working dir (nearest ancestor with .podcli/.podcli-home, else cwd) and passes PODCLI_HOME/PODCLI_DATA/PODCLI_ENV_FILE; explicit env still wins. Existing projects work unchanged — just run podcli in the project dir.
Four parity fixes vs the source install: --engine whisper-py now fails with a clear message instead of a raw ModuleNotFoundError; info/menu report speakers as unavailable when pyannote/torch aren't present rather than 'configured' on HF_TOKEN alone; the launcher filters the objc/FP16/warnings stderr noise the old bash wrapper stripped; and the unused 10MB res10 caffemodel is dropped from the embedded backend (YuNet onnx is the live one).
The mcp__podcli__* tools (the entire Claude/Codex integration the PodStack slash commands depend on) lived only in the TypeScript dist/index.js and were absent from the native install. Bundle the MCP stdio server alongside the studio (esbuild -> mcp-server.mjs), add a `podcli mcp` command that execs it under hermetic Node with the backend/python/ffmpeg + project env wired, and a `podcli mcp install` (run automatically by setup) that registers it with Claude Code via `claude mcp add podcli -- <self> mcp`. doctor reports it.
Captions and thumbnails render through Remotion (a headless-Chromium React pipeline) that the native install didn't ship, so both failed by default off a source checkout. Provision a per-platform Remotion bundle (remotion/ + a production node_modules with the native @rspack/compositor binaries, built by scripts/build-remotion.sh and uploaded per-platform in CI) into the runtime dir, point clip_generator + thumbnail_html at the hermetic Node, cache the project-independent composition bundle once in the managed dir, and pre-warm the bundle + Chrome at setup. Verified end-to-end: a real captioned clip and a thumbnail both render under the wired path. Also fixes re-audit regressions: whisper-py engine fails with a clean message (spinner stopped, no traceback); ProjectDir only pins project-local data when a .podcli/.podcli-home marker exists, so read-only commands don't scatter .podcli into arbitrary dirs; stderr noise filter matches warnings.warn lines precisely instead of swallowing tracebacks; setup skips MCP re-registration when already pointed at this binary.
- Remotion bundle build failed on a clean install: build-remotion.sh omitted @fontsource/dm-sans, which the compositions import — so the prebundle could not resolve the font and captions/bookends broke. Added the dep. - podcli studio was unreachable: clip_studio.py lived at repo-root scripts/ and was never shipped. Moved it into backend/ (so it embeds) and resolve it from the backend dir; it also now honors PODCLI_DATA for output. - render-bookend.mjs hardcoded its bundle cache into the runtime dir; it now honors PODCLI_CACHE_DIR and shares the prewarmed bundle with render.mjs. - New-project data no longer lands in the global runtime dir (which setup/ self-update overwrite): ProjectDir falls back to the working dir, keeping data project-local. doctor now prints the resolved project dir.
📝 WalkthroughWalkthroughIntroduces a native Go launcher, hermetic runtime provisioning, whisper.cpp transcription, moment search wiring, npm distribution, release automation, and parity validation for ChangesNative CLI Initiative
Estimated code review effort🎯 5 (Critical) | ⏱️ ~120 minutes Possibly related PRs
Poem
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
|
There was a problem hiding this comment.
Actionable comments posted: 4
Note
Due to the large number of review comments, Critical severity comments were prioritized as inline comments.
🟠 Major comments (18)
.github/workflows/release.yml-14-16 (1)
14-16:⚠️ Potential issue | 🟠 Major | ⚡ Quick winScope
contents: writeto only the job that needs it.Line 15 grants write on repository contents workflow-wide, giving unnecessary token power to all jobs. Move write permissions to the release/publish job(s) and keep other jobs read-only.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/workflows/release.yml around lines 14 - 16, The permissions block at the workflow level grants write access to repository contents for all jobs in the workflow, which violates the principle of least privilege. Remove the `contents: write` permission from the workflow-level permissions block (lines 14-16) and move it to only the specific job(s) that actually need it (the release or publish job). Keep other jobs with read-only permissions or remove the permissions block from those jobs entirely to ensure minimal token scope.Source: Linters/SAST tools
.github/workflows/release.yml-58-58 (1)
58-58:⚠️ Potential issue | 🟠 Major | ⚡ Quick winReplace unsupported runner label
macos-13.Line 58 and Line 116 use
macos-13, which is not recognized by current runner labels. Those matrix jobs will fail to schedule.Also applies to: 116-116
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/workflows/release.yml at line 58, The `macos-13` runner label used in the workflow matrix is no longer supported and will cause job scheduling failures. In `.github/workflows/release.yml` at line 58 and line 116, replace the `macos-13` runner label with a currently supported macOS runner label such as `macos-latest` or another actively maintained version. This will ensure the matrix jobs can schedule successfully on the GitHub Actions runners.Source: Linters/SAST tools
npm/scripts/install.js-43-79 (1)
43-79:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy liftAdd integrity verification for downloaded executables.
Lines 77-79 download a release binary and persist it without checksum/signature validation. A compromised or substituted asset would be executed on first run.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@npm/scripts/install.js` around lines 43 - 79, The download function downloads executable files without any integrity verification, making the system vulnerable to compromised binaries. Add checksum or signature validation to verify the downloaded executable before it is renamed and persisted. In the download function, after the file write completes but before the fs.renameSync call that persists the temporary file to the final destination, calculate or verify a checksum against a known good value (obtained from a separate checksum file or manifest). Similarly, in the ensure function, ensure that the downloaded binary from the GitHub release URL is validated. Only proceed with renaming the temporary file to the final destination after successful integrity verification, and reject with an error if the checksum does not match.npm/scripts/install.js-27-28 (1)
27-28:⚠️ Potential issue | 🟠 Major | ⚡ Quick winFix Windows home path composition in
defaultHome().Line 27 returns
LOCALAPPDATAdirectly, so the managed home becomes%LOCALAPPDATA%instead of%LOCALAPPDATA%/podcli. This diverges from the macOS/Linux layout and places binaries under a non-namespaced location.Suggested patch
- if (process.platform === 'win32') return process.env.LOCALAPPDATA || path.join(h, 'AppData', 'Local', 'podcli'); + if (process.platform === 'win32') { + const base = process.env.LOCALAPPDATA || path.join(h, 'AppData', 'Local'); + return path.join(base, 'podcli'); + }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@npm/scripts/install.js` around lines 27 - 28, The Windows platform check in the `defaultHome()` function returns `process.env.LOCALAPPDATA` directly without appending the 'podcli' subdirectory, whereas the fallback case joins the path with 'AppData', 'Local', 'podcli', and the non-Windows paths always append 'podcli'. Modify the Windows platform check to use `path.join()` to append 'podcli' to `process.env.LOCALAPPDATA`, making it consistent with the fallback behavior and other platform implementations..github/workflows/release.yml-29-30 (1)
29-30:⚠️ Potential issue | 🟠 MajorPin all actions by commit SHA and add
persist-credentials: falseto checkout steps.All actions in this workflow use tag-pinning (
@v*) instead of SHA-pinning, which allows tags to be reassigned and introduces supply-chain risk. Additionally, allactions/checkoutsteps lackpersist-credentials: false, leaving the GITHUB_TOKEN in the environment for subsequent steps and increasing credential leakage risk.Update to use commit SHAs (e.g.,
actions/checkout@<full-sha>) and addpersist-credentials: falseto each checkout action:Example fix for checkout steps
- uses: actions/checkout@<commit-sha> with: persist-credentials: falseApplies to lines: 29, 30, 47, 67, 86, 94, 95, 103, 122, 123, 133, 142, 146, 155, 156
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/workflows/release.yml around lines 29 - 30, Replace all action version pins in the workflow from tag-based (`@v`*) to full commit SHA pins throughout the release.yml file to eliminate supply-chain risks from tag reassignment. Additionally, add a with clause containing persist-credentials: false to every actions/checkout step to prevent credential leakage. This applies to all uses statements referencing actions/checkout and actions/setup-go, as well as any other GitHub actions used in the workflow. For each affected action, convert the `@v`* reference to the corresponding full commit SHA and ensure checkout steps include the persist-credentials configuration.Source: Linters/SAST tools
backend/services/transcription_whispercpp.py-84-86 (1)
84-86:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy liftAvoid materializing every RMS frame index.
For long podcast WAVs,
idx = np.arange(nf)[:, None] * hop + np.arange(frame)[None, :]creates annf × framematrix before indexing samples; a one-hour 16 kHz file can allocate gigabytes and crash during timing snap. Compute RMS in chunks or with a streaming/cumulative-sum approach instead.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/services/transcription_whispercpp.py` around lines 84 - 86, The idx matrix creation in the RMS computation block materializes the full nf × frame index matrix, which causes excessive memory allocation for long audio files. Instead of creating the entire idx matrix upfront with np.arange(nf)[:, None] * hop + np.arange(frame)[None, :], refactor the RMS computation to process frames in chunks or use a streaming cumulative-sum approach that computes rms values iteratively without allocating the full index matrix. This will prevent gigabyte-scale memory spikes when processing long podcast WAVs.backend/services/transcription.py-29-56 (1)
29-56:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy liftKeep diarization on the whisper.cpp path.
Routing to whisper.cpp returns before the existing speaker-detection and face-analysis block, so native transcriptions always return
speaker: Noneand nospeaker_segmentseven whenenable_diarization=True. Factor the diarization/face enrichment into a shared post-processing helper and call it for both engines.As per coding guidelines,
backend/**/*.py: “Implement speaker detection as part of the transcribe_podcast tool using Whisper transcription”.Also applies to: 84-86
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/services/transcription.py` around lines 29 - 56, The _transcribe_with_whispercpp function returns transcription results without applying speaker detection and face analysis enrichment, causing speaker segments to be missing even when diarization is enabled. Extract the existing speaker-detection and face-analysis post-processing logic (currently applied to other transcription paths) into a separate shared helper function, then call this helper at the end of _transcribe_with_whispercpp before returning the result, and also ensure it is called for the other transcription path at lines 84-86 to maintain consistent behavior across all transcription engines.Source: Coding guidelines
backend/services/transcription_whispercpp.py-20-25 (1)
20-25:⚠️ Potential issue | 🟠 Major | ⚡ Quick winRun external tools through the project process wrapper.
Both
ffmpegandwhisper-cliare invoked with unboundedsubprocess.run(), and the whisper call discards stderr entirely. A stalled media probe/transcode or failed whisper invocation can hang the backend or surface a uselessCalledProcessError; useutils.proc.run()here so the adapter inherits mandatory timeouts and captured failure details.Proposed direction
+from utils.proc import run as proc_run + def _extract_wav(media_path: str, wav_path: str, ffmpeg: str = "ffmpeg") -> None: - subprocess.run( + proc_run( [ffmpeg, "-y", "-loglevel", "error", "-i", media_path, "-ar", "16000", "-ac", "1", wav_path], - check=True, + timeout=600, ) @@ - subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL) + proc_run(cmd, timeout=3600, cwd=tmpdir)Also applies to: 158-169
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/services/transcription_whispercpp.py` around lines 20 - 25, The `_extract_wav` function uses unbounded `subprocess.run()` which lacks mandatory timeouts and proper error handling, making the backend vulnerable to stalled processes and losing useful error details. Replace the `subprocess.run()` call in `_extract_wav` with `utils.proc.run()` to inherit the project's timeout and failure capture mechanisms. The same fix must also be applied to the whisper-cli invocation that discards stderr (mentioned in the comment as also applying to that location), ensuring both external tool invocations inherit the proper process wrapper behavior.backend/services/transcription_whispercpp.py-53-58 (1)
53-58:⚠️ Potential issue | 🟠 Major | ⚡ Quick winNormalize
▁markers before appending token text.The code treats
▁as a word boundary but appends it unchanged, so SentencePiece-style output can produce caption words like▁hello. Convert the marker to whitespace before accumulation sostrip()removes it.Proposed fix
- starts_word = raw.startswith(" ") or raw.startswith("▁") + starts_word = raw.startswith(" ") or raw.startswith("▁") + piece = raw.replace("▁", " ") if starts_word and cur_text: flush() if cur_start is None and t0 is not None: cur_start = t0 - cur_text += raw + cur_text += piece🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/services/transcription_whispercpp.py` around lines 53 - 58, The code in the token accumulation block treats the SentencePiece marker `▁` as a word boundary indicator but appends the raw token unchanged to cur_text, causing caption words to retain the marker (e.g., `▁hello`). Before appending the raw token to cur_text, normalize the `▁` character to a space so that subsequent strip() operations can properly remove it. This should be done at the `cur_text += raw` line by replacing any `▁` characters with spaces in the raw token before concatenation.backend/services/transcription_whispercpp.py-124-130 (1)
124-130:⚠️ Potential issue | 🟠 Major | ⚡ Quick winPreserve positive word durations after upper-bound clamping.
When a stranded trailing word clamps to
hi, bothsandecan becomehi; the fallbacke = min(s + 0.05, hi)still leavesend == start. That can produce zero-duration caption words. Shiftsback when there is no room to extende.Proposed guard
if s < prev_end: s = prev_end if e <= s: e = min(s + 0.05, hi) + if e <= s: + s = max(lo, min(prev_end, hi - 0.05)) + e = hi prev_end = e🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/services/transcription_whispercpp.py` around lines 124 - 130, The code currently ensures positive word duration with the fallback `e = min(s + 0.05, hi)` when `e <= s`, but this still produces zero-duration words when `s` is already clamped to the upper bound `hi`. After the existing `if e <= s:` block (which sets `e = min(s + 0.05, hi)`), add an additional guard that checks if `e <= s` is still true after the fallback. If so, shift `s` backward by 0.05 (e.g., `s = max(e - 0.05, lo)`) to ensure a positive duration exists while respecting the lower bound `lo`.cli/internal/engine/engine.go-65-80 (1)
65-80:⚠️ Potential issue | 🟠 Major | ⚡ Quick winPrefer the provisioned backend before walking the working tree.
BackendRoot()currently executes an ancestorbackend/cli.pybefore the managed runtime backend. For an installed binary, runningpodcliinside an untrusted folder containingbackend/cli.pysilently executes project-local Python as podcli. KeepPODCLI_BACKENDas the explicit dev override, then prefer the provisioned runtime backend, and only fall back to a source checkout when no runtime backend exists.Proposed lookup order
func BackendRoot() (string, bool) { if b := os.Getenv("PODCLI_BACKEND"); b != "" && exists(filepath.Join(b, "cli.py")) { return b, true } + cand := filepath.Join(paths.RuntimeDir(), "backend") + if exists(filepath.Join(cand, "cli.py")) { + return cand, true + } if dir, err := os.Getwd(); err == nil { for { cand := filepath.Join(dir, "backend") if exists(filepath.Join(cand, "cli.py")) { return cand, true @@ dir = parent } } - cand := filepath.Join(paths.RuntimeDir(), "backend") - if exists(filepath.Join(cand, "cli.py")) { - return cand, true - } return "", false }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cli/internal/engine/engine.go` around lines 65 - 80, The BackendRoot() function currently searches for the backend in the wrong order, creating a security vulnerability where a project-local backend/cli.py in an ancestor directory takes precedence over the managed runtime backend. Reorder the function to check the provisioned runtime backend at paths.RuntimeDir()/backend first (before the directory tree walk), so that the managed runtime backend is preferred over any ancestor backend/cli.py. This way, only if the runtime backend does not exist should the function fall back to walking the directory tree for a source checkout.backend/utils/proc.py-21-27 (1)
21-27:⚠️ Potential issue | 🟠 Major | ⚡ Quick winResolve relative override paths before existence check.
If the environment variable contains a relative path (e.g.,
PODCLI_FFMPEG=./bin/ffmpeg),os.path.exists(override)will check relative to the current working directory, which may differ from the directory where the variable was set. This could cause the hermetic binary to be silently skipped in favor of the system fallback.🔧 Proposed fix to resolve relative paths
def _resolve_tool(cmd: Sequence[str]) -> list[str]: if not cmd: return list(cmd) override = os.environ.get(_TOOL_ENV.get(cmd[0], "")) - if override and os.path.exists(override): + if override: + resolved = os.path.abspath(override) + if os.path.exists(resolved): + return [resolved, *cmd[1:]] - return [override, *cmd[1:]] return list(cmd)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/utils/proc.py` around lines 21 - 27, The `_resolve_tool` function checks if an override path exists without first resolving relative paths to absolute paths. When an environment variable contains a relative path like `./bin/ffmpeg`, the `os.path.exists(override)` check evaluates relative to the current working directory, which may differ from where the variable was set. To fix this, convert the override path to an absolute path using `os.path.abspath()` before calling `os.path.exists()`. This ensures that relative paths are consistently resolved and the hermetic binary is not silently skipped.cli/internal/provision/provision.go-542-555 (1)
542-555:⚠️ Potential issue | 🟠 MajorHarden tar link extraction and fail on hard-link creation errors.
The hardlink source path from
filepath.Join(dest, h.Linkname)is not validated against the extraction root boundary, allowing a malicious archive to create hardlinks pointing outside the intended extraction directory. Additionally, errors fromos.Link()are silently ignored, potentially resulting in incomplete extraction without detection.Suggested fix
case tar.TypeLink: if err := os.MkdirAll(filepath.Dir(target), 0o755); err != nil { return err } os.Remove(target) - os.Link(filepath.Join(dest, h.Linkname), target) + linkSrc := filepath.Clean(filepath.Join(dest, h.Linkname)) + if !strings.HasPrefix(linkSrc+string(os.PathSeparator), root) { + return fmt.Errorf("unsafe hardlink target in archive: %s", h.Linkname) + } + if err := os.Link(linkSrc, target); err != nil { + return err + }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cli/internal/provision/provision.go` around lines 542 - 555, In the tar.TypeLink case, the hardlink source path constructed with filepath.Join(dest, h.Linkname) must be validated to ensure it doesn't escape the extraction root boundary (similar to how the target path should be validated). Additionally, the os.Link() call currently ignores its error return value, allowing failures to go undetected. Fix this by: (1) validating the resolved source path stays within the extraction root before calling os.Link, and (2) capturing the error returned from os.Link() and returning it if non-nil, rather than silently ignoring failures.cli/internal/provision/provision.go-145-147 (1)
145-147:⚠️ Potential issue | 🟠 MajorAdd end-to-end HTTP timeout to prevent indefinite hangs during provisioning requests.
The HTTP client currently only sets
ResponseHeaderTimeout(60 seconds), which times out if response headers stall but not if body transmission slows or halts. This can freeze provisioning on slow/unreliable networks. Add a boundedTimeoutfield to limit total request time.Suggested fix
- client := &http.Client{Transport: &http.Transport{ResponseHeaderTimeout: 60 * time.Second}} + client := &http.Client{ + Timeout: 15 * time.Minute, + Transport: &http.Transport{ + ResponseHeaderTimeout: 60 * time.Second, + }, + }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cli/internal/provision/provision.go` around lines 145 - 147, The HTTP client in the provision.go file currently only sets ResponseHeaderTimeout (60 seconds) which protects against stalled response headers but not against slow body transmission or other network hangs. Add a Timeout field to the http.Client struct initialization alongside the existing Transport configuration to enforce an end-to-end timeout on the entire request lifecycle, preventing indefinite hangs during provisioning requests on unreliable networks.cli/internal/provision/provision.go-117-119 (1)
117-119:⚠️ Potential issue | 🟠 MajorPin checksums for all remotely downloaded artifacts.
download()treats empty checksums as success, allowing several runtime artifacts to remain mutable-by-upstream with no integrity enforcement. The Whisper modelstiny.enandsmalllack checksums (lines 34–39), and the struct comment documents this as intentional ("empty: verification skipped"). Additionally, FFmpeg binaries and WhisperCLI are downloaded viafetch()without any checksum validation (lines 303, 272), despite the comment at lines 198–199 acknowledging they are "not yet pinned by checksum." This undermines hermetic trust guarantees across the provisioning system.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cli/internal/provision/provision.go` around lines 117 - 119, The provisioning system allows runtime artifacts to be downloaded without pinned checksums, undermining hermetic integrity guarantees. To fix this, add specific checksums for the Whisper models tiny.en and small that currently have empty checksums, add checksums for the FFmpeg binaries and WhisperCLI that are downloaded via fetch() without validation, update the struct comment that documents empty checksums as intentional to reflect that all artifacts are now checksummed, and ensure the download() function enforces checksum validation rather than treating empty checksums as valid. Verify that all remotely downloaded artifacts in the provisioning system now have pinned checksums to maintain integrity enforcement end-to-end.cli/internal/update/update.go-152-159 (1)
152-159:⚠️ Potential issue | 🟠 Major | ⚡ Quick winAdd a request timeout to binary download.
Line 157 uses
http.Getwith no timeout, sopodcli updatecan hang indefinitely on slow/broken networks.Minimal fix
- resp, err := http.Get(url) + client := &http.Client{Timeout: 60 * time.Second} + resp, err := client.Get(url)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cli/internal/update/update.go` around lines 152 - 159, The downloadFile function uses http.Get without a timeout, which can cause podcli update to hang indefinitely on slow or broken networks. Create an http.Client with an appropriate timeout (e.g., 30 seconds) and use that client's Get method instead of the default http.Get call on line 157 to ensure the download request has a bounded execution time.cli/internal/provision/studio.go-220-239 (1)
220-239:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy liftBlock symlink-based write escape in tar extraction.
Line 233-Line 238 creates symlinks from archive headers without constraining
h.Linkname, and Line 224 writes regular files withos.OpenFile(which follows symlinks). A crafted tar can redirect writes outsidedest.Suggested hardening direction
case tar.TypeReg: + if fi, err := os.Lstat(target); err == nil && (fi.Mode()&os.ModeSymlink) != 0 { + return fmt.Errorf("refusing to write through symlink: %s", h.Name) + } if err := os.MkdirAll(filepath.Dir(target), 0o755); err != nil { return err } @@ case tar.TypeSymlink: - if err := os.MkdirAll(filepath.Dir(target), 0o755); err != nil { - return err - } - os.Remove(target) - os.Symlink(h.Linkname, target) + return fmt.Errorf("symlink entries are not allowed in archive: %s", h.Name)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cli/internal/provision/studio.go` around lines 220 - 239, The tar extraction code has a symlink-based write escape vulnerability where the tar.TypeSymlink case creates symlinks from h.Linkname without validating that the target path stays within the intended extraction directory, and the tar.TypeReg case writes files using os.OpenFile which follows symlinks. To fix this, validate that the h.Linkname target in the tar.TypeSymlink case is constrained to the destination directory by resolving it as a path and checking it doesn't escape. Additionally, before writing regular files in the tar.TypeReg case, verify the target path itself (after resolving any intermediate symlinks) is still within bounds to prevent writes through malicious symlinks created by a previous entry in the same archive.cli/main.go-65-71 (1)
65-71:⚠️ Potential issue | 🟠 MajorAdd
provision.EnsureWhisperCpp()to auto-selected whispercpp path inrunEngine().Lines 65–71 provision only the model but not the whisper-cli binary. When whispercpp is auto-selected here, the backend defaults to "whisper-cli" from PATH if
PODCLI_WHISPER_CLIis unset. This creates an asymmetry withsetup()(line 176), which explicitly callsEnsureWhisperCpp(). If whisper-cli is not in the managed runtime directory and not on PATH, transcription will fail at runtime.Add
provision.EnsureWhisperCpp()in this branch (similar to the setup flow) to ensure the binary is available before passing control to the backend.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cli/main.go` around lines 65 - 71, In the whispercpp auto-selection branch (when transcribeEngine(args) == "whispercpp"), the code currently calls provision.EnsureModel("base") to provision only the model file but does not provision the whisper-cli binary itself. Add a call to provision.EnsureWhisperCpp() before or after the EnsureModel call to ensure the whisper-cli binary is available in the managed runtime, creating consistency with the setup() function and preventing runtime failures when the binary is not found on the system PATH.
🟡 Minor comments (3)
backend/cli.py-590-601 (1)
590-601:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winCleanly stop the spinner for native-engine failures too.
The new native path can raise
FileNotFoundErrorfor missing GGML models or subprocess-related exceptions from ffmpeg/whisper.cpp. Catching onlyRuntimeErrorleaves those failures to print through the active spinner as a traceback.Proposed fix
- except RuntimeError as e: + except (RuntimeError, FileNotFoundError, subprocess.CalledProcessError) as e: _spin_stop.set() spin_thread.join(timeout=1) print(f"\r{' ' * 70}\r ✗ {e}\n", flush=True) sys.exit(1)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/cli.py` around lines 590 - 601, The exception handler in the transcribe_file try-except block only catches RuntimeError, but the native-engine path can also raise FileNotFoundError for missing GGML models and subprocess-related exceptions from ffmpeg/whisper.cpp. Change the except clause to catch a broader set of exceptions (such as Exception or specifically include FileNotFoundError and OSError) so that the spinner is properly stopped via _spin_stop.set() and spin_thread.join() for all failure cases, not just RuntimeError, preventing tracebacks from printing while the spinner is still active.tests/test_whispercpp_snap.py-36-38 (1)
36-38:⚠️ Potential issue | 🟡 MinorReplace
mktemp()with opened temporary files.
tempfile.mktemp()only reserves a name without creating the file, leaving a TOCTOU race window where another process could create or modify the file before_make_wav()writes it. UseNamedTemporaryFile(delete=False)to atomically reserve and create the file, close it, and then pass the path to_make_wav().Proposed fix
def setUp(self): - self.wav = tempfile.mktemp(suffix=".wav") + tmp = tempfile.NamedTemporaryFile(suffix=".wav", delete=False) + self.wav = tmp.name + tmp.close() _make_wav(self.wav) @@ def test_all_silence_leaves_words_unchanged(self): - silent = tempfile.mktemp(suffix=".wav") + tmp = tempfile.NamedTemporaryFile(suffix=".wav", delete=False) + silent = tmp.name + tmp.close() _make_wav(silent, voiced_s=0.0, silence_s=1.0)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_whispercpp_snap.py` around lines 36 - 38, In the setUp method, replace the tempfile.mktemp() call with NamedTemporaryFile(delete=False) to eliminate the TOCTOU race condition. Create a NamedTemporaryFile with the appropriate suffix, extract its name property, close the file object, and then assign that name to self.wav before passing it to the _make_wav() function. This ensures the temporary file is atomically created and reserved before _make_wav() attempts to write to it.Source: Linters/SAST tools
plans/native-cli.md-7-7 (1)
7-7:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winAdd language identifiers to fenced code blocks.
These fences trigger markdownlint MD040; add explicit languages (for example
bash/text) to keep docs lint-clean.Suggested fix
-``` +```bash npm i -g podcli # or: bun add -g podcli podcli process pod.mp4 --top 5 → first run: silently provisions a hermetic runtime (one time) → 9:16 clips with burned captions podcli # auto-updates itself on launch@@
-+text
┌─ podcli (Go launcher, ~8MB, per-platform) ──────────────────────────┐
│ • on-launch self-update (GitHub Releases, throttle-free fast check) │
...
└──────────────────────────────────────────────────────────────────────┘Also applies to: 52-52
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@plans/native-cli.md` at line 7, Add language identifiers to the fenced code blocks in the native-cli.md file to comply with markdownlint MD040. At line 7, add the bash language identifier to the opening fence containing npm and podcli commands. At line 52, add the text language identifier to the opening fence containing the ASCII art diagram. Both code blocks currently have empty language identifiers (just triple backticks) and need explicit language tags to keep the documentation lint-clean.Source: Linters/SAST tools
🧹 Nitpick comments (2)
tests/parity/compare.py (1)
106-129: 💤 Low valueRemove unnecessary f-string prefix.
Line 118 has an f-string without any placeholders.
Minor style fix
- print(f"Need baseline/<stem> and candidate/<stem>, or two explicit paths.", file=sys.stderr) + print("Need baseline/<stem> and candidate/<stem>, or two explicit paths.", file=sys.stderr)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/parity/compare.py` around lines 106 - 129, The error message string in the main function contains an f-string prefix but has no placeholder variables, making the f-string unnecessary. Remove the f prefix from the print statement in the error handling block (the line that prints "Need baseline/<stem> and candidate/<stem>, or two explicit paths.") and convert it to a regular string literal.tests/parity/README.md (1)
32-34: ⚡ Quick winTighten the examples for lint and portability.
The fenced blocks are unlabeled, and the
venv/bin/python3examples are POSIX-only. Tag the fences (sh/text) and prefer a platform-neutral invocation here.Suggested cleanup
-``` -venv/bin/python3 -m pytest tests/parity/test_caption_parity.py -q -``` +```sh +python -m pytest tests/parity/test_caption_parity.py -q +```Apply the same fence tagging to the other command examples.
Also applies to: 38-40, 53-55, 61-63
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/parity/README.md` around lines 32 - 34, The fenced code blocks in the README lack language labels and use POSIX-specific paths that reduce portability. In tests/parity/README.md at lines 32-34 (anchor), add the `sh` language tag to the opening backticks and replace `venv/bin/python3` with `python` in the pytest invocation. Apply the same changes at the sibling locations: lines 38-40, lines 53-55, and lines 61-63. For any non-shell fenced blocks, use the `text` tag instead. This improves linting compliance and makes the examples work across different platforms and environments.Source: Linters/SAST tools
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@backend/requirements-runtime.txt`:
- Around line 5-11: Update three package version constraints in the requirements
file to address critical security vulnerabilities. Change opencv-python-headless
from >=4.8.0 to >=4.8.1.78 to fix CVE-2023-4863 heap buffer overflow. Change
python-dotenv from >=1.0.0 to >=1.2.2 to fix CVE-2026-28684 symlink following
vulnerability. Change google-api-python-client from >=2.0.0 to >=2.0.1 since
version 2.0.0 was yanked. Leave numpy, Pillow, questionary, and
google-auth-oauthlib at their current specified versions as they have no known
security issues.
In `@backend/services/transcription.py`:
- Around line 35-38: The launcher provisions whisper.cpp models via
`provision.EnsureModel()` but fails to pass the resulting model path to the
backend. Instead of only setting `PODCLI_HOME`, capture the model path returned
by `provision.EnsureModel()` and explicitly set the `PODCLI_WHISPERCPP_MODEL`
environment variable to that path when invoking the backend process. This
ensures the backend's fallback logic in the model assignment (which checks for
`PODCLI_WHISPERCPP_MODEL` before defaulting to the `_managed_home()` path) will
find the correctly provisioned model.
In `@cli/internal/update/update.go`:
- Around line 140-149: The Windows update process backs up the current binary by
moving dest to dest.old, then attempts to install the staged binary. If the
final os.Rename(staged, dest) call on line 149 fails, the original binary
remains in dest.old while dest is missing, leaving the CLI unusable. Add
rollback logic after the final os.Rename(staged, dest) call: if this rename
fails, restore the backup by moving dest.old back to dest before returning the
error, ensuring the system can fall back to the previous working binary if the
installation fails.
In `@tests/parity/test_caption_parity.py`:
- Around line 37-52: In the _render() function, replace the deprecated and
insecure tempfile.mktemp() call with NamedTemporaryFile. Create a
NamedTemporaryFile context manager with delete=False and suffix=".ass", extract
its name attribute to pass to render_captions(), then read the file contents
within the same try-finally block to ensure proper cleanup. This eliminates the
race condition vulnerability where an attacker could create a file at the path
before your code writes to it.
---
Major comments:
In @.github/workflows/release.yml:
- Around line 14-16: The permissions block at the workflow level grants write
access to repository contents for all jobs in the workflow, which violates the
principle of least privilege. Remove the `contents: write` permission from the
workflow-level permissions block (lines 14-16) and move it to only the specific
job(s) that actually need it (the release or publish job). Keep other jobs with
read-only permissions or remove the permissions block from those jobs entirely
to ensure minimal token scope.
- Line 58: The `macos-13` runner label used in the workflow matrix is no longer
supported and will cause job scheduling failures. In
`.github/workflows/release.yml` at line 58 and line 116, replace the `macos-13`
runner label with a currently supported macOS runner label such as
`macos-latest` or another actively maintained version. This will ensure the
matrix jobs can schedule successfully on the GitHub Actions runners.
- Around line 29-30: Replace all action version pins in the workflow from
tag-based (`@v`*) to full commit SHA pins throughout the release.yml file to
eliminate supply-chain risks from tag reassignment. Additionally, add a with
clause containing persist-credentials: false to every actions/checkout step to
prevent credential leakage. This applies to all uses statements referencing
actions/checkout and actions/setup-go, as well as any other GitHub actions used
in the workflow. For each affected action, convert the `@v`* reference to the
corresponding full commit SHA and ensure checkout steps include the
persist-credentials configuration.
In `@backend/services/transcription_whispercpp.py`:
- Around line 84-86: The idx matrix creation in the RMS computation block
materializes the full nf × frame index matrix, which causes excessive memory
allocation for long audio files. Instead of creating the entire idx matrix
upfront with np.arange(nf)[:, None] * hop + np.arange(frame)[None, :], refactor
the RMS computation to process frames in chunks or use a streaming
cumulative-sum approach that computes rms values iteratively without allocating
the full index matrix. This will prevent gigabyte-scale memory spikes when
processing long podcast WAVs.
- Around line 20-25: The `_extract_wav` function uses unbounded
`subprocess.run()` which lacks mandatory timeouts and proper error handling,
making the backend vulnerable to stalled processes and losing useful error
details. Replace the `subprocess.run()` call in `_extract_wav` with
`utils.proc.run()` to inherit the project's timeout and failure capture
mechanisms. The same fix must also be applied to the whisper-cli invocation that
discards stderr (mentioned in the comment as also applying to that location),
ensuring both external tool invocations inherit the proper process wrapper
behavior.
- Around line 53-58: The code in the token accumulation block treats the
SentencePiece marker `▁` as a word boundary indicator but appends the raw token
unchanged to cur_text, causing caption words to retain the marker (e.g.,
`▁hello`). Before appending the raw token to cur_text, normalize the `▁`
character to a space so that subsequent strip() operations can properly remove
it. This should be done at the `cur_text += raw` line by replacing any `▁`
characters with spaces in the raw token before concatenation.
- Around line 124-130: The code currently ensures positive word duration with
the fallback `e = min(s + 0.05, hi)` when `e <= s`, but this still produces
zero-duration words when `s` is already clamped to the upper bound `hi`. After
the existing `if e <= s:` block (which sets `e = min(s + 0.05, hi)`), add an
additional guard that checks if `e <= s` is still true after the fallback. If
so, shift `s` backward by 0.05 (e.g., `s = max(e - 0.05, lo)`) to ensure a
positive duration exists while respecting the lower bound `lo`.
In `@backend/services/transcription.py`:
- Around line 29-56: The _transcribe_with_whispercpp function returns
transcription results without applying speaker detection and face analysis
enrichment, causing speaker segments to be missing even when diarization is
enabled. Extract the existing speaker-detection and face-analysis
post-processing logic (currently applied to other transcription paths) into a
separate shared helper function, then call this helper at the end of
_transcribe_with_whispercpp before returning the result, and also ensure it is
called for the other transcription path at lines 84-86 to maintain consistent
behavior across all transcription engines.
In `@backend/utils/proc.py`:
- Around line 21-27: The `_resolve_tool` function checks if an override path
exists without first resolving relative paths to absolute paths. When an
environment variable contains a relative path like `./bin/ffmpeg`, the
`os.path.exists(override)` check evaluates relative to the current working
directory, which may differ from where the variable was set. To fix this,
convert the override path to an absolute path using `os.path.abspath()` before
calling `os.path.exists()`. This ensures that relative paths are consistently
resolved and the hermetic binary is not silently skipped.
In `@cli/internal/engine/engine.go`:
- Around line 65-80: The BackendRoot() function currently searches for the
backend in the wrong order, creating a security vulnerability where a
project-local backend/cli.py in an ancestor directory takes precedence over the
managed runtime backend. Reorder the function to check the provisioned runtime
backend at paths.RuntimeDir()/backend first (before the directory tree walk), so
that the managed runtime backend is preferred over any ancestor backend/cli.py.
This way, only if the runtime backend does not exist should the function fall
back to walking the directory tree for a source checkout.
In `@cli/internal/provision/provision.go`:
- Around line 542-555: In the tar.TypeLink case, the hardlink source path
constructed with filepath.Join(dest, h.Linkname) must be validated to ensure it
doesn't escape the extraction root boundary (similar to how the target path
should be validated). Additionally, the os.Link() call currently ignores its
error return value, allowing failures to go undetected. Fix this by: (1)
validating the resolved source path stays within the extraction root before
calling os.Link, and (2) capturing the error returned from os.Link() and
returning it if non-nil, rather than silently ignoring failures.
- Around line 145-147: The HTTP client in the provision.go file currently only
sets ResponseHeaderTimeout (60 seconds) which protects against stalled response
headers but not against slow body transmission or other network hangs. Add a
Timeout field to the http.Client struct initialization alongside the existing
Transport configuration to enforce an end-to-end timeout on the entire request
lifecycle, preventing indefinite hangs during provisioning requests on
unreliable networks.
- Around line 117-119: The provisioning system allows runtime artifacts to be
downloaded without pinned checksums, undermining hermetic integrity guarantees.
To fix this, add specific checksums for the Whisper models tiny.en and small
that currently have empty checksums, add checksums for the FFmpeg binaries and
WhisperCLI that are downloaded via fetch() without validation, update the struct
comment that documents empty checksums as intentional to reflect that all
artifacts are now checksummed, and ensure the download() function enforces
checksum validation rather than treating empty checksums as valid. Verify that
all remotely downloaded artifacts in the provisioning system now have pinned
checksums to maintain integrity enforcement end-to-end.
In `@cli/internal/provision/studio.go`:
- Around line 220-239: The tar extraction code has a symlink-based write escape
vulnerability where the tar.TypeSymlink case creates symlinks from h.Linkname
without validating that the target path stays within the intended extraction
directory, and the tar.TypeReg case writes files using os.OpenFile which follows
symlinks. To fix this, validate that the h.Linkname target in the
tar.TypeSymlink case is constrained to the destination directory by resolving it
as a path and checking it doesn't escape. Additionally, before writing regular
files in the tar.TypeReg case, verify the target path itself (after resolving
any intermediate symlinks) is still within bounds to prevent writes through
malicious symlinks created by a previous entry in the same archive.
In `@cli/internal/update/update.go`:
- Around line 152-159: The downloadFile function uses http.Get without a
timeout, which can cause podcli update to hang indefinitely on slow or broken
networks. Create an http.Client with an appropriate timeout (e.g., 30 seconds)
and use that client's Get method instead of the default http.Get call on line
157 to ensure the download request has a bounded execution time.
In `@cli/main.go`:
- Around line 65-71: In the whispercpp auto-selection branch (when
transcribeEngine(args) == "whispercpp"), the code currently calls
provision.EnsureModel("base") to provision only the model file but does not
provision the whisper-cli binary itself. Add a call to
provision.EnsureWhisperCpp() before or after the EnsureModel call to ensure the
whisper-cli binary is available in the managed runtime, creating consistency
with the setup() function and preventing runtime failures when the binary is not
found on the system PATH.
In `@npm/scripts/install.js`:
- Around line 43-79: The download function downloads executable files without
any integrity verification, making the system vulnerable to compromised
binaries. Add checksum or signature validation to verify the downloaded
executable before it is renamed and persisted. In the download function, after
the file write completes but before the fs.renameSync call that persists the
temporary file to the final destination, calculate or verify a checksum against
a known good value (obtained from a separate checksum file or manifest).
Similarly, in the ensure function, ensure that the downloaded binary from the
GitHub release URL is validated. Only proceed with renaming the temporary file
to the final destination after successful integrity verification, and reject
with an error if the checksum does not match.
- Around line 27-28: The Windows platform check in the `defaultHome()` function
returns `process.env.LOCALAPPDATA` directly without appending the 'podcli'
subdirectory, whereas the fallback case joins the path with 'AppData', 'Local',
'podcli', and the non-Windows paths always append 'podcli'. Modify the Windows
platform check to use `path.join()` to append 'podcli' to
`process.env.LOCALAPPDATA`, making it consistent with the fallback behavior and
other platform implementations.
---
Minor comments:
In `@backend/cli.py`:
- Around line 590-601: The exception handler in the transcribe_file try-except
block only catches RuntimeError, but the native-engine path can also raise
FileNotFoundError for missing GGML models and subprocess-related exceptions from
ffmpeg/whisper.cpp. Change the except clause to catch a broader set of
exceptions (such as Exception or specifically include FileNotFoundError and
OSError) so that the spinner is properly stopped via _spin_stop.set() and
spin_thread.join() for all failure cases, not just RuntimeError, preventing
tracebacks from printing while the spinner is still active.
In `@plans/native-cli.md`:
- Line 7: Add language identifiers to the fenced code blocks in the
native-cli.md file to comply with markdownlint MD040. At line 7, add the bash
language identifier to the opening fence containing npm and podcli commands. At
line 52, add the text language identifier to the opening fence containing the
ASCII art diagram. Both code blocks currently have empty language identifiers
(just triple backticks) and need explicit language tags to keep the
documentation lint-clean.
In `@tests/test_whispercpp_snap.py`:
- Around line 36-38: In the setUp method, replace the tempfile.mktemp() call
with NamedTemporaryFile(delete=False) to eliminate the TOCTOU race condition.
Create a NamedTemporaryFile with the appropriate suffix, extract its name
property, close the file object, and then assign that name to self.wav before
passing it to the _make_wav() function. This ensures the temporary file is
atomically created and reserved before _make_wav() attempts to write to it.
---
Nitpick comments:
In `@tests/parity/compare.py`:
- Around line 106-129: The error message string in the main function contains an
f-string prefix but has no placeholder variables, making the f-string
unnecessary. Remove the f prefix from the print statement in the error handling
block (the line that prints "Need baseline/<stem> and candidate/<stem>, or two
explicit paths.") and convert it to a regular string literal.
In `@tests/parity/README.md`:
- Around line 32-34: The fenced code blocks in the README lack language labels
and use POSIX-specific paths that reduce portability. In tests/parity/README.md
at lines 32-34 (anchor), add the `sh` language tag to the opening backticks and
replace `venv/bin/python3` with `python` in the pytest invocation. Apply the
same changes at the sibling locations: lines 38-40, lines 53-55, and lines
61-63. For any non-shell fenced blocks, use the `text` tag instead. This
improves linting compliance and makes the examples work across different
platforms and environments.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 5e29c951-3895-4dfe-9d6b-5789ace174c5
📒 Files selected for processing (46)
.github/workflows/release.yml.gitignorebackend/cli.pybackend/clip_studio.pybackend/main.pybackend/requirements-runtime.txtbackend/services/clip_generator.pybackend/services/encoder.pybackend/services/thumbnail_html.pybackend/services/transcription.pybackend/services/transcription_whispercpp.pybackend/utils/proc.pycli/go.modcli/internal/backend/embed.gocli/internal/backend/sync.shcli/internal/config/config.gocli/internal/engine/engine.gocli/internal/paths/paths.gocli/internal/podstack/podstack.gocli/internal/podstack/sync.shcli/internal/provision/provision.gocli/internal/provision/provision_manual_test.gocli/internal/provision/studio.gocli/internal/update/update.gocli/internal/update/update_test.gocli/main.gonpm/README.mdnpm/bin/podcli.jsnpm/package.jsonnpm/scripts/install.jsplans/native-cli.mdremotion/render-bookend.mjsscripts/build-remotion.shscripts/build-studio.shsrc/config/paths.tstests/parity/.gitignoretests/parity/README.mdtests/parity/capture_baseline.pytests/parity/compare.pytests/parity/golden/branded.ass.expectedtests/parity/golden/hormozi.ass.expectedtests/parity/golden/karaoke.ass.expectedtests/parity/golden/subtle.ass.expectedtests/parity/test_caption_parity.pytests/parity/transcript_synthetic.jsontests/test_whispercpp_snap.py
…block symlink traversal - Publish checksums.txt in the release workflow; verify it in self-update, whisper-cli/studio/remotion provisioning, and the npm installer - Pin binary downloads/redirects to GitHub hosts (replaces manual redirect recursion in the self-updater) - Reject archive symlinks/hardlinks whose targets escape the extract dir - Namespace the transcript cache by engine so whisper.cpp doesn't reuse a whisper-py transcript; honor PODCLI_FFPROBE in clip_studio - Fail fast when NPM_TOKEN is unset before publishing
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (5)
backend/services/transcription_whispercpp.py (3)
84-86:⚠️ Potential issue | 🟠 Major | ⚡ Quick winAvoid building the full frame index matrix for podcast-length audio.
idxmaterializesnf * framesamples; for hour-long files this can exceed memory and crash during the post-transcription snap pass.Proposed low-memory RMS calculation
- idx = np.arange(nf)[:, None] * hop + np.arange(frame)[None, :] - rms = np.sqrt((samples[idx] ** 2).mean(axis=1)) + rms = np.empty(nf, dtype=np.float32) + for i in range(nf): + start_idx = i * hop + window = samples[start_idx : start_idx + frame] + rms[i] = np.sqrt(np.mean(window * window))🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/services/transcription_whispercpp.py` around lines 84 - 86, The index matrix `idx` in the RMS calculation materializes all nf * frame indices in memory at once, which causes memory exhaustion for long audio files. Replace this full materialization with a loop-based or rolling window approach that processes one or a small batch of frames at a time, computing the RMS values incrementally without constructing the large `idx` matrix. The key is to iterate through frame positions (using the `hop`, `frame`, and `nf` variables) and calculate the squared mean for each individual frame or small batch, then take the square root, rather than creating the full indexing array upfront.
39-44:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy liftWire speaker attribution into the whisper.cpp path.
This adapter currently returns every word and segment with
speaker: None, so--engine whispercppdisables speaker-aware clipping/cropping even when speaker detection is requested upstream.As per coding guidelines,
backend/**/*.py: “Implement speaker detection as part of the transcribe_podcast tool using Whisper transcription”.Also applies to: 178-185
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/services/transcription_whispercpp.py` around lines 39 - 44, The whisper.cpp transcription adapter is hardcoding speaker attribution as None, which disables speaker-aware processing downstream. In backend/services/transcription_whispercpp.py at lines 39-44 (the words dictionary) and lines 178-185 (the segments dictionary), replace the hardcoded "speaker": None with actual speaker attribution data extracted from the whisper.cpp transcription results. Extract speaker information from the whisper.cpp output for each word and segment, and use that value instead of None to enable speaker-aware clipping and cropping when speaker detection is enabled upstream.Source: Coding guidelines
123-131:⚠️ Potential issue | 🟠 Major | ⚡ Quick winPrevent snapped words from collapsing to zero duration.
When a word starts after
hi, bothsandeclamp tohi, so Line 129 keepsend == start; the trailing caption can disappear or become invalid.Proposed guard for clamped tail words
out, prev_end = [], lo for w in words: + min_dur = 0.05 s = min(max(float(w.get("start", 0.0)), lo), hi) e = min(max(float(w.get("end", 0.0)), lo), hi) if s < prev_end: s = prev_end if e <= s: - e = min(s + 0.05, hi) + if s >= hi: + s = max(prev_end, hi - min_dur) + e = s + min_dur + else: + e = min(s + min_dur, hi) prev_end = e🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/services/transcription_whispercpp.py` around lines 123 - 131, The condition on line 129 that expands zero-duration words uses `min(s + 0.05, hi)` which prevents proper expansion when both `s` and `e` are clamped to `hi` (i.e., when a word starts after the `hi` boundary). This results in `e` remaining equal to `s` and zero-duration words. Modify the expansion logic to ensure the minimum duration of 0.05 is applied without being constrained by `hi`, so that even tail words clamped to the boundary receive valid non-zero duration. Consider removing the `hi` constraint in the `min()` call or adding a separate guard specifically for words that start at or beyond `hi`.backend/cli.py (1)
568-597:⚠️ Potential issue | 🔴 Critical | ⚡ Quick winImport
subprocessbefore catchingCalledProcessError.Line 597 references
subprocess.CalledProcessError, butsubprocessis not in scope here. Any transcription failure will turn into aNameError, bypassing the spinner cleanup and friendly exit path.Proposed fix
warnings.filterwarnings("ignore", message="FP16 is not supported on CPU") from services.transcription import transcribe_file + import subprocess import threading🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/cli.py` around lines 568 - 597, The except clause on line 597 references subprocess.CalledProcessError but the subprocess module is not imported, which will cause a NameError when any transcription failure occurs and prevent proper spinner cleanup. Add import subprocess with the other imports at the beginning of the function (alongside the imports for threading and transcribe_file from services.transcription) to ensure the exception handler can properly catch subprocess.CalledProcessError.Source: Linters/SAST tools
backend/services/transcript_packer.py (1)
79-90:⚠️ Potential issue | 🟠 Major | ⚡ Quick winSkip legacy cache fallback for engine-namespaced transcripts.
When
PODCLI_ENGINE=whispercpp, this misses<hash>-whispercpp.json, loads the unsuffixed legacy MD5 cache, then saves that whisper-py-shaped data under the whisper.cpp filename on Line 89.Proposed fix
canonical = transcript_json_path(cache_hash) if os.path.exists(canonical): with open(canonical, encoding="utf-8") as f: return json.load(f) + if _engine_cache_suffix(): + return None legacy = legacy_md5_cache_path(video_path) if os.path.exists(legacy): with open(legacy, encoding="utf-8") as f:🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/services/transcript_packer.py` around lines 79 - 90, The load_cached_transcript_for_video function currently falls back to loading from the legacy MD5 cache regardless of whether an engine namespace is being used. When PODCLI_ENGINE is set to a specific engine like whispercpp, this causes the function to load whisper-py-shaped data from the legacy cache and save it under the engine-specific filename via save_cached_transcript_for_video on line 89, creating a data mismatch. Skip the legacy cache fallback (lines 87-90) when an engine namespace is active by adding a condition to check if the transcript is engine-specific, ensuring that only engine-appropriate cached data is loaded or nothing is loaded if the engine-specific cache doesn't exist.
🧹 Nitpick comments (3)
cli/internal/provision/security_test.go (1)
51-75: ⚡ Quick winConsider adding a test for hardlink escape rejection.
extractTarGzvalidates both symlinks (tested here) and hardlinks (tar.TypeLinkat lines 668-672 inprovision.go). Adding a similar test for hardlinks would ensure the security invariant is tested end-to-end for both link types.func TestExtractTarGzRejectsEscapingHardlink(t *testing.T) { if runtime.GOOS == "windows" { t.Skip("hardlink semantics differ on Windows") } var buf bytes.Buffer gz := gzip.NewWriter(&buf) tw := tar.NewWriter(gz) // First create a regular file that exists tw.WriteHeader(&tar.Header{Name: "legit", Typeflag: tar.TypeReg, Size: 0, Mode: 0o644}) // Then a hardlink pointing outside tw.WriteHeader(&tar.Header{Name: "evil", Typeflag: tar.TypeLink, Linkname: "../../../etc/passwd", Mode: 0o644}) tw.Close() gz.Close() dir := t.TempDir() archive := filepath.Join(dir, "evil.tar.gz") if err := os.WriteFile(archive, buf.Bytes(), 0o644); err != nil { t.Fatal(err) } dest := filepath.Join(dir, "out") if err := extractTarGz(archive, dest); err == nil { t.Fatal("expected extractTarGz to reject escaping hardlink, got nil error") } }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cli/internal/provision/security_test.go` around lines 51 - 75, Add a new test function `TestExtractTarGzRejectsEscapingHardlink` to the test file that validates `extractTarGz` rejects hardlinks escaping the destination directory. The test should follow the same pattern as the existing `TestExtractTarGzRejectsEscapingSymlink` test but use `tar.TypeLink` instead of `tar.TypeSymlink`, include the Windows skip condition since hardlink semantics differ on Windows, create a regular file entry first (using `tar.TypeReg`), then create a hardlink entry with a Linkname pointing outside the destination (e.g., "../../../etc/passwd"), and finally verify that `extractTarGz` returns an error when attempting to extract the archive. This ensures the security validation for hardlinks is tested end-to-end alongside symlinks.cli/internal/provision/studio.go (1)
43-82: ⚖️ Poor tradeoffNode.js downloads lack checksum verification.
Unlike whisper-cli, Remotion, and Studio (which verify against
checksums.txt), Node.js downloads from nodejs.org aren't verified. Node does publishSHASUMS256.txtfiles athttps://nodejs.org/dist/v{version}/SHASUMS256.txt.asc(GPG-signed) orSHASUMS256.txt.This is lower risk since nodejs.org is a trusted source over HTTPS, but for parity with other provisioned assets, consider fetching and verifying against the upstream checksum.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cli/internal/provision/studio.go` around lines 43 - 82, The EnsureNode() function downloads Node.js but does not verify the downloaded archive against a checksum, unlike other provisioned assets. Add checksum verification by fetching the appropriate SHASUMS256.txt file from nodejs.org (either the GPG-signed or plain text version at https://nodejs.org/dist/v{nodeVersion}/SHASUMS256.txt) after the fetch() call completes successfully, then verify the downloaded archive file against the checksum before proceeding to the extraction step using extractZipStrip1() or extractTarGzStrip1(). This ensures parity with checksum verification for other provisioned dependencies and adds a safety check for the Node.js download.cli/internal/update/update.go (1)
192-196: 💤 Low valueConsider failing closed when checksum entry is missing for the current asset.
A compromised release could omit the checksum entry for a specific platform's binary while providing entries for others, allowing that platform's binary to bypass verification. Since new releases will include
checksums.txt, consider returning an error when the file exists but lacks an entry for the current asset.This would only affect edge cases where
checksums.txtis present but incomplete.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cli/internal/update/update.go` around lines 192 - 196, The checksum verification logic in the ParseChecksums block currently returns nil (success) when the checksum entry for assetName() is not found in the parsed checksums. To prevent a compromised release from bypassing verification by selectively omitting entries, change this behavior to return an error instead of nil when ok is false. This ensures that if checksums.txt exists but lacks an entry for the current asset, the verification fails rather than silently succeeding, implementing a fail-closed security posture.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@backend/cli.py`:
- Around line 568-597: The except clause on line 597 references
subprocess.CalledProcessError but the subprocess module is not imported, which
will cause a NameError when any transcription failure occurs and prevent proper
spinner cleanup. Add import subprocess with the other imports at the beginning
of the function (alongside the imports for threading and transcribe_file from
services.transcription) to ensure the exception handler can properly catch
subprocess.CalledProcessError.
In `@backend/services/transcript_packer.py`:
- Around line 79-90: The load_cached_transcript_for_video function currently
falls back to loading from the legacy MD5 cache regardless of whether an engine
namespace is being used. When PODCLI_ENGINE is set to a specific engine like
whispercpp, this causes the function to load whisper-py-shaped data from the
legacy cache and save it under the engine-specific filename via
save_cached_transcript_for_video on line 89, creating a data mismatch. Skip the
legacy cache fallback (lines 87-90) when an engine namespace is active by adding
a condition to check if the transcript is engine-specific, ensuring that only
engine-appropriate cached data is loaded or nothing is loaded if the
engine-specific cache doesn't exist.
In `@backend/services/transcription_whispercpp.py`:
- Around line 84-86: The index matrix `idx` in the RMS calculation materializes
all nf * frame indices in memory at once, which causes memory exhaustion for
long audio files. Replace this full materialization with a loop-based or rolling
window approach that processes one or a small batch of frames at a time,
computing the RMS values incrementally without constructing the large `idx`
matrix. The key is to iterate through frame positions (using the `hop`, `frame`,
and `nf` variables) and calculate the squared mean for each individual frame or
small batch, then take the square root, rather than creating the full indexing
array upfront.
- Around line 39-44: The whisper.cpp transcription adapter is hardcoding speaker
attribution as None, which disables speaker-aware processing downstream. In
backend/services/transcription_whispercpp.py at lines 39-44 (the words
dictionary) and lines 178-185 (the segments dictionary), replace the hardcoded
"speaker": None with actual speaker attribution data extracted from the
whisper.cpp transcription results. Extract speaker information from the
whisper.cpp output for each word and segment, and use that value instead of None
to enable speaker-aware clipping and cropping when speaker detection is enabled
upstream.
- Around line 123-131: The condition on line 129 that expands zero-duration
words uses `min(s + 0.05, hi)` which prevents proper expansion when both `s` and
`e` are clamped to `hi` (i.e., when a word starts after the `hi` boundary). This
results in `e` remaining equal to `s` and zero-duration words. Modify the
expansion logic to ensure the minimum duration of 0.05 is applied without being
constrained by `hi`, so that even tail words clamped to the boundary receive
valid non-zero duration. Consider removing the `hi` constraint in the `min()`
call or adding a separate guard specifically for words that start at or beyond
`hi`.
---
Nitpick comments:
In `@cli/internal/provision/security_test.go`:
- Around line 51-75: Add a new test function
`TestExtractTarGzRejectsEscapingHardlink` to the test file that validates
`extractTarGz` rejects hardlinks escaping the destination directory. The test
should follow the same pattern as the existing
`TestExtractTarGzRejectsEscapingSymlink` test but use `tar.TypeLink` instead of
`tar.TypeSymlink`, include the Windows skip condition since hardlink semantics
differ on Windows, create a regular file entry first (using `tar.TypeReg`), then
create a hardlink entry with a Linkname pointing outside the destination (e.g.,
"../../../etc/passwd"), and finally verify that `extractTarGz` returns an error
when attempting to extract the archive. This ensures the security validation for
hardlinks is tested end-to-end alongside symlinks.
In `@cli/internal/provision/studio.go`:
- Around line 43-82: The EnsureNode() function downloads Node.js but does not
verify the downloaded archive against a checksum, unlike other provisioned
assets. Add checksum verification by fetching the appropriate SHASUMS256.txt
file from nodejs.org (either the GPG-signed or plain text version at
https://nodejs.org/dist/v{nodeVersion}/SHASUMS256.txt) after the fetch() call
completes successfully, then verify the downloaded archive file against the
checksum before proceeding to the extraction step using extractZipStrip1() or
extractTarGzStrip1(). This ensures parity with checksum verification for other
provisioned dependencies and adds a safety check for the Node.js download.
In `@cli/internal/update/update.go`:
- Around line 192-196: The checksum verification logic in the ParseChecksums
block currently returns nil (success) when the checksum entry for assetName() is
not found in the parsed checksums. To prevent a compromised release from
bypassing verification by selectively omitting entries, change this behavior to
return an error instead of nil when ok is false. This ensures that if
checksums.txt exists but lacks an entry for the current asset, the verification
fails rather than silently succeeding, implementing a fail-closed security
posture.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 7cca89ec-090b-4aef-9769-c629374e671d
📒 Files selected for processing (14)
.github/workflows/release.ymlbackend/cli.pybackend/clip_studio.pybackend/services/transcript_packer.pybackend/services/transcription_whispercpp.pycli/internal/provision/provision.gocli/internal/provision/security_test.gocli/internal/provision/studio.gocli/internal/update/update.gocli/main.gocli/main_test.gonpm/scripts/install.jstests/test_transcript_cache_engine.pytests/test_whispercpp_adapter.py
✅ Files skipped from review due to trivial changes (1)
- tests/test_whispercpp_adapter.py
🚧 Files skipped from review as they are similar to previous changes (3)
- backend/clip_studio.py
- .github/workflows/release.yml
- cli/main.go
The goldens encoded two host-specific inputs, so they passed on macOS but failed on CI's Ubuntu: - the font name (DETECTED_FONT resolves Arial on macOS, Liberation Sans on Ubuntu via fc-list) written into the ASS Style line — affected all 4 styles - per-word text widths (fc-match + freetype metrics) — affected the branded style's positioning and pill geometry Pin both in the parity test fixture (font -> Arial, deterministic width model) and regenerate the goldens. Production rendering still uses real host metrics; only the harness is pinned, which is its whole point: lock the engine-independent ASS pipeline, not the host's typography.
- requirements: bump opencv-python-headless (CVE-2023-4863), python-dotenv (CVE-2026-28684), google-api-python-client (2.0.0 yanked) floors in both runtime and dev requirements - cli.py: import subprocess at module scope — the native-engine except clause referenced subprocess.CalledProcessError without it in scope (NameError) - install.js: Windows defaultHome now nests under LOCALAPPDATA\podcli to match the Go launcher's managed dir (binary was installed where the shim couldn't find it) - update/provision: add ResponseHeaderTimeout to the release HTTP clients so a stalled server can't hang update/provisioning indefinitely - update.go: roll back the Windows binary swap if the staged rename fails, so a failed self-update can't brick the CLI - whisper.cpp snap: guarantee positive word duration when clamping to the voiced upper bound (was collapsing to zero) - transcript cache: don't adopt the legacy whisper-py cache under a whisper.cpp namespace - release.yml: default permissions to read; grant contents:write only to the release job - tests: parity goldens use mkstemp; add hardlink-escape and zero-duration-snap regression tests
…r.cpp Path model: the brand brain (presets, knowledge, assets, history, config) and the transcript cache now live in the global managed dir so they follow the user across directories; only rendered clips are written to <cwd>/podcli-clips. The Go launcher injects PODCLI_HOME/PODCLI_DATA (global), PODCLI_OUTPUT and PODCLI_CWD; Python and the studio honor PODCLI_OUTPUT. Migration: running podcli in an old ./podcli folder imports that folder's brand brain into the global home (via the bundle export/import so asset paths are rewritten), moves the transcript cache, picks up ancient top-level presets/, and copies .env. Non-destructive: it never overwrites a populated global home. Studio: the library already reads the global clips.json; clips are now streamed by history id from wherever they were rendered, with the recorded output_path as the allowlist (symlink-resolved, regular-file + extension checked). Engine: native installs auto-use whisper.cpp instead of erroring. The launcher selects it on a hermetic Python for the interactive menu, process, and studio; the backend also falls back automatically when openai-whisper is absent and the user did not explicitly request the whisper-py engine. The old in-repo launcher prints an upgrade notice pointing at the native CLI.
…cript Studio gets a text area to paste a quote or describe moments; the backend uses the local AI CLI to locate them in the transcript and appends the matches to the clip suggestions (deduped against existing ones). This mirrors the CLI's interactive "find a specific moment" flow. The moment-finding logic moves into claude_suggest.find_moments_from_text (the CLI now delegates to it), exposed as a find_moment backend task and a POST /api/find-moment route. Status goes to the progress channel; warnings to stderr so the task runner's stdout JSON-RPC stays clean.
- Interactive thumbnails passed output=None (the bare namespace has no argparse default), crashing in os.makedirs. Default to ./thumbnails at the point of use and in the menu namespace. - Studio launch derived PODCLI_DATA from PODCLI_OUTPUT, which now points at the cwd clip dir — scattering cache/working/logs into the working directory. Derive data dir from the cache path and propagate PODCLI_OUTPUT to the child instead. - Migration: resolve the .env target from PODCLI_ENV_FILE (what the loader reads) so migrated secrets land where they're loaded and "pending" clears; never treat the global managed dir itself as a legacy project; guard presets move against source==target. - whisper.cpp auto-fallback now also engages when a broken openai-whisper install fails to load/run, not only when it's missing. - Studio binds to 127.0.0.1 by default (it serves local files with no auth); set PODCLI_HOST=0.0.0.0 to expose on the LAN.
… reset - clip_generator: Remotion-fallback and boundary-revert messages went to stdout, which corrupts the JSON-RPC task channel (create_clip/batch_clips via MCP/studio) on the fallback path. Route them to stderr. - config bundle: asset paths in presets/ui-state were keyed in the rewrite map by resolved path but matched against the literal stored string, so any symlinked path (macOS /var, /tmp) was left pointing at the source machine after import. Register raw/expanded/resolved aliases so the literal value is rewritten. Adds a regression test (tmpdirs are symlinked on macOS, reproducing the failure). - find_moments_from_text: write the prompt via the managed .podcli/tmp helper instead of the repo root, matching suggest_with_claude. - studio: reset deselectedIndices when claude-suggest replaces the suggestion list so the wrong clips aren't silently excluded from export. - whisper.cpp: add timeouts to the ffmpeg and whisper-cli subprocess calls so a stalled child can't hang the transcribe task forever.
downloadOnce followed redirects to any host while only the checksums.txt fetch was host-pinned — and Node, CPython, and ffmpeg are downloaded without checksum verification, then executed. Route all binary/model downloads through a client that refuses redirects off the known source hosts (GitHub + *.githubusercontent, HuggingFace + its LFS CDN, nodejs.org, evermeet.cx, johnvansickle.com), and use the pinned client for the GitHub API calls too. Adds tests for the host allowlist (incl. suffix-spoofing) and redirect refusal.
Run setup automatically the first time a runtime command (the interactive menu, process, transcribe, studio) is invoked with no backend present, so a fresh install works without a separate `podcli setup`. Skipped for lightweight and mcp commands. Trim narration comments added in recent commits.
There was a problem hiding this comment.
Actionable comments posted: 10
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
.github/workflows/release.yml (1)
79-86:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winValidate that whisper-cli binary was found before copying.
The
findcommand on line 83 could produce an empty$SRCif the build structure differs from expectations or the build failed silently. The subsequentcpwould then fail with a cryptic "cannot stat" error rather than a clear message.🛡️ Proposed validation
EXT=""; [ "${{ matrix.goos }}" = "windows" ] && EXT=".exe" SRC=$(find build -name "whisper-cli${EXT}" -type f | head -1) + if [ -z "$SRC" ]; then + echo "::error::whisper-cli binary not found in build/" + exit 1 + fi OUT="whisper-cli-${{ matrix.goos }}-${{ matrix.goarch }}${EXT}" cp "$SRC" "$OUT"🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/workflows/release.yml around lines 79 - 86, Add validation after the find command that locates the whisper-cli binary to ensure the SRC variable is not empty before attempting to copy the file. Insert a conditional check that explicitly verifies SRC contains a valid file path, and if it is empty, output a clear error message and exit with a non-zero status code. This prevents the cp command from failing with a cryptic "cannot stat" error when the binary is not found, instead providing immediate and explicit feedback about the build artifact missing.backend/main.py (1)
502-519:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy liftExpose the required MCP tool names in
TASK_HANDLERS.
TASK_HANDLERScurrently doesn’t register several required MCP suite names (transcribe_podcast,batch_create_clips,knowledge_base,manage_assets,clip_history). This can break tool dispatch contracts when callers use the specified MCP names.As per coding guidelines, "Implement the MCP tools suite for podcli engine: transcribe_podcast, suggest_clips, create_clip, batch_create_clips, knowledge_base, manage_assets, and clip_history."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/main.py` around lines 502 - 519, The TASK_HANDLERS dictionary in the backend is missing several required MCP tool names that are part of the specified MCP suite. Add the following missing tool names to the TASK_HANDLERS dictionary: transcribe_podcast, batch_create_clips, knowledge_base, manage_assets, and clip_history. Each key should map to its corresponding handler function (for example, transcribe_podcast should map to handle_transcribe, batch_create_clips should map to handle_batch_clips, and the other three tools need their respective handler functions). Ensure all handler functions are properly defined before being referenced in the TASK_HANDLERS mapping to avoid runtime errors during tool dispatch.Source: Coding guidelines
cli/internal/engine/engine.go (1)
65-79:⚠️ Potential issue | 🟠 Major | ⚡ Quick winPrefer the provisioned backend before walking the working tree.
BackendRootcurrently executes the firstbackend/cli.pyfound abovecwdeven when the runtime backend exists. In the native binary path, runningpodcliinside any repo with abackend/cli.pywould execute local Python instead of the embedded backend, breaking the self-contained runtime trust boundary. Preferpaths.RuntimeDir()/backendafter explicitPODCLI_BACKEND, leaving cwd probing as a dev fallback.🛡️ Proposed resolution-order fix
func BackendRoot() (string, bool) { if b := os.Getenv("PODCLI_BACKEND"); b != "" && exists(filepath.Join(b, "cli.py")) { return b, true } + cand := filepath.Join(paths.RuntimeDir(), "backend") + if exists(filepath.Join(cand, "cli.py")) { + return cand, true + } if dir, err := os.Getwd(); err == nil { for { cand := filepath.Join(dir, "backend") if exists(filepath.Join(cand, "cli.py")) { return cand, true @@ dir = parent } } - cand := filepath.Join(paths.RuntimeDir(), "backend") - if exists(filepath.Join(cand, "cli.py")) { - return cand, true - } return "", false }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cli/internal/engine/engine.go` around lines 65 - 79, The BackendRoot function currently prioritizes the working directory tree walk over the provisioned runtime backend, which creates a security issue where local backends can override the embedded one. Reorder the backend discovery logic to check paths.RuntimeDir()/backend first by testing for the existence of cli.py there, and only fall back to the os.Getwd() directory tree walking if the runtime backend is not found. This ensures the provisioned backend takes precedence while preserving the development fallback capability.
🧹 Nitpick comments (6)
.github/workflows/release.yml (5)
23-28: 💤 Low valueConsider adding windows/arm64 to the build matrix.
The matrix includes arm64 variants for darwin and linux but omits windows/arm64. Windows 11 supports ARM64 natively, and GitHub Actions now offers windows-11-arm runners. If ARM64 Windows support is intentionally excluded, consider documenting the rationale in a comment.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/workflows/release.yml around lines 23 - 28, The build matrix in the include section is missing a windows/arm64 configuration entry, while darwin and linux both include arm64 variants. Add a new entry following the same pattern for windows/arm64 with goos: windows and goarch: arm64 to the include list, or if this exclusion is intentional, add a comment explaining the rationale for omitting Windows ARM64 support.
127-133: ⚡ Quick winAdd error checking for the remotion build script.
Similar to the studio job, if
scripts/build-remotion.shfails, the workflow will package an incomplete bundle without clear failure.🛡️ Proposed hardening
run: | + set -e sh scripts/build-remotion.sh dist/remotion-bundle🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/workflows/release.yml around lines 127 - 133, The Build remotion bundle step lacks error checking for the scripts/build-remotion.sh script execution. If the script fails, the workflow will continue and package an incomplete bundle instead of failing. Add error checking to the bash script block (either by using set -e at the start of the run command or by appending || exit 1 after the sh scripts/build-remotion.sh command) to ensure the workflow fails immediately if the build script encounters any errors.
40-46: ⚡ Quick winAdd explicit error checking after
go generate.The
go generate ./...command could fail silently if backend or PodStack sync scripts error out. Consider addingset -eat the top of the run block or checking$?after the generate step to fail fast.🛡️ Proposed hardening
run: | + set -e VERSION="${GITHUB_REF_NAME#v}" go generate ./... # sync backend + PodStack commands for go:embed🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/workflows/release.yml around lines 40 - 46, The `go generate ./...` command in the release workflow's run block lacks error checking and could fail silently if backend or PodStack sync scripts encounter errors. Add `set -e` at the very beginning of the run block (as the first line after the `run: |` statement) to ensure the workflow exits immediately if any command fails, preventing subsequent commands like `go build` from executing with potentially incomplete or corrupt generated files.
99-103: ⚡ Quick winAdd error checking for the studio build script.
If
scripts/build-studio.shfails, the workflow will continue to tar an empty or incomplete directory. The error might not surface until runtime when the provisioner tries to use the broken bundle.🛡️ Proposed hardening
run: | + set -e npm ci sh scripts/build-studio.sh dist/studio🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/workflows/release.yml around lines 99 - 103, The studio build workflow step does not include error checking, which means if scripts/build-studio.sh fails, the workflow will continue executing and create a tar archive of an incomplete or empty directory. Add error handling by including set -e at the beginning of the run block to ensure the workflow exits immediately if the build-studio.sh script fails, preventing incomplete bundles from being created and used downstream.
149-153: ⚡ Quick winAdd validation that checksums.txt was created successfully.
If
sha256sumfails or produces no output (e.g., due to missing files in dist/), themvcommand might move an empty file, and the release would be published with broken checksums.🛡️ Proposed validation
run: | sha256sum * > "$RUNNER_TEMP/checksums.txt" + if [ ! -s "$RUNNER_TEMP/checksums.txt" ]; then + echo "::error::checksums.txt is empty - no artifacts found in dist/" + exit 1 + fi mv "$RUNNER_TEMP/checksums.txt" checksums.txt🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/workflows/release.yml around lines 149 - 153, The "Generate checksums" step in the release workflow does not validate that the checksums.txt file was created successfully before moving it. If sha256sum fails or produces no output due to missing files in the dist directory, an empty checksums.txt would be moved to the release artifacts. Add validation after the sha256sum command to check that checksums.txt exists and is not empty, and exit with an error if the validation fails, ensuring the workflow stops before proceeding with the release if checksum generation is incomplete.tests/test_find_moments.py (1)
51-70: ⚡ Quick winAdd a malformed-output regression test for find-moment parsing.
Current tests don’t cover malformed
scores/timestamps from AI output. Add a case where one bad clip appears alongside one valid clip, and assert the valid one is still returned.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_find_moments.py` around lines 51 - 70, Add a new test method to the test file that verifies find_moments_from_text properly handles malformed AI output. The test should call find_moments_from_text with SEGMENTS data that includes both a malformed clip (with invalid scores or timestamps) and a valid clip, then assert that only the valid clip is returned in the results. This regression test will ensure that the find_moments_from_text function can gracefully filter out bad clips while still returning valid ones from mixed output.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@backend/cli.py`:
- Around line 296-300: The help text for the output option is out of sync with
the resolver logic implemented in the base_output_dir assignment. The resolver
now checks configured_output_dir, then the PODCLI_OUTPUT environment variable,
then falls back to a clips folder relative to the video path, but the help text
still claims the default is "./clips". Update all help text descriptions for the
output parameter to accurately document this three-level fallback behavior:
first configured option, then PODCLI_OUTPUT environment variable, then the clips
subfolder of the video's directory. This ensures users understand where the
output will actually be written after running with the native launcher which
sets PODCLI_OUTPUT.
In `@backend/config_bundle.py`:
- Around line 408-412: The `_has_managed_content` function is incorrectly
identifying empty scaffold directories (presets/, knowledge/, assets/) as
existing profile data, causing `migrate_legacy_home` to skip legacy import when
it shouldn't. Modify `_has_managed_content` to return true only when the managed
directories contain actual files or data, not just when the directories
themselves exist. This check is used at the condition starting around line
408-412 where it returns early with skipped_existing set to true, and also
applies to similar logic around lines 496-503.
In `@backend/services/claude_suggest.py`:
- Around line 485-501: Add defensive error handling around the numeric
conversions in the clip processing logic to prevent one malformed clip from
failing the entire batch. Wrap the sum(scores.values()) call and the float()
conversions for start, end, start_second, and end_second fields in try-except
blocks, skipping individual clips that have invalid data rather than letting the
exception propagate to the broad exception handler at line 525-527. This ensures
that one clip with non-numeric output does not cause the entire find-moment
result to return an empty list, preserving all valid matches.
In `@backend/services/transcript_packer.py`:
- Around line 85-94: The engine cache isolation guard that prevents whisper.cpp
from adopting legacy md5 caches exists in load_cached_transcript_for_video
(checking _engine_cache_suffix() == ""), but the same protection is missing in
the load_cache function. Apply the same engine cache suffix guard in load_cache
(around lines 357-360) to ensure whisper.cpp runs cannot fall back to legacy
transcripts unconditionally. Add a check for _engine_cache_suffix() == "" before
allowing the legacy cache fallback in load_cache, matching the logic already
present in load_cached_transcript_for_video.
In `@backend/services/transcription_whispercpp.py`:
- Around line 130-134: The line where e is set to s + 0.05 applies a fixed
overhang unconditionally, and when this occurs for multiple consecutive clamped
words, the cumulative effect causes prev_end to drift progressively further past
the intended voiced bounds. Fix this by bounding the overhang value so that e
does not extend beyond a reasonable maximum (such as the next token boundary or
a configured upper bound). Use a min() function or similar constraint to ensure
the 0.05 overhang serves its purpose without accumulating excessive drift across
repeated clamped tokens.
In `@backend/services/transcription.py`:
- Around line 107-110: The whisper.cpp transcription path at the
_transcribe_with_whispercpp call returns immediately without flowing to the
shared diarization block at Line 212+, which causes the enable_diarization and
num_speakers parameters to be ignored for that path. Instead of returning
directly after _transcribe_with_whispercpp execution at the locations marked
(Line 109 and Line 127), refactor the code so that whisper.cpp transcription
results flow through the same diarization and speaker detection logic that other
transcription paths use, ensuring that the shared diarization block processes
all transcription outputs consistently.
In `@src/ui/client/ClipDetail.tsx`:
- Around line 86-87: The player render guard on line 179 references an undefined
variable `file` which breaks compilation. Replace the `file` reference with an
appropriate variable that is actually defined in the ClipDetail component. Based
on the context of this component which works with clip data, thumbnail images,
and preview URLs (as seen in variables like `clip`, `thumbImage`, and
`thumbTimestamp`), determine the correct variable name that should be used in
the conditional check and substitute it for the undefined `file` reference.
In `@src/ui/client/EpisodeWorkspace.jsx`:
- Around line 1513-1534: The condition on line 1513 in the EpisodeWorkspace.jsx
component allows the "Find a specific moment" feature to display when either
transcript or transcriptText exists, but the /api/find-moment endpoint requires
proper parsed transcript segments. Change the conditional check from (transcript
|| transcriptText) to only check for transcript availability, ensuring the
feature only displays when the full parsed transcript segments are available and
not when only raw transcript text exists.
In `@src/ui/web-server.ts`:
- Around line 1036-1042: The route handlers for "/api/clips/:id/preview" and
"/api/clips/:id/download" suppress promise rejections from serveClipById by
using the void operator, which causes unhandled errors instead of proper HTTP
responses. Replace the void keyword with proper error handling by adding a
.catch() handler to the serveClipById promise at both route definitions, or
alternatively make the route callbacks async and wrap the serveClipById calls in
try-catch blocks to ensure errors are caught and handled before they become
unhandled rejections.
In `@tests/test_config_bundle.py`:
- Around line 139-146: The `_enter_migration` method saves and restores
environment variables for PODCLI_CWD and paths, but does not handle
PODCLI_ENV_FILE, which can cause test isolation issues if the variable is set
externally. Save the current PODCLI_ENV_FILE environment variable at the start
of _enter_migration (around line 139-146) similar to how PODCLI_CWD is saved,
then unset or control PODCLI_ENV_FILE to prevent it from interfering with the
test. In the cleanup code (around lines 148-157), restore the saved
PODCLI_ENV_FILE value. Also verify the code at lines 281-298 for any additional
changes needed to maintain test isolation.
---
Outside diff comments:
In @.github/workflows/release.yml:
- Around line 79-86: Add validation after the find command that locates the
whisper-cli binary to ensure the SRC variable is not empty before attempting to
copy the file. Insert a conditional check that explicitly verifies SRC contains
a valid file path, and if it is empty, output a clear error message and exit
with a non-zero status code. This prevents the cp command from failing with a
cryptic "cannot stat" error when the binary is not found, instead providing
immediate and explicit feedback about the build artifact missing.
In `@backend/main.py`:
- Around line 502-519: The TASK_HANDLERS dictionary in the backend is missing
several required MCP tool names that are part of the specified MCP suite. Add
the following missing tool names to the TASK_HANDLERS dictionary:
transcribe_podcast, batch_create_clips, knowledge_base, manage_assets, and
clip_history. Each key should map to its corresponding handler function (for
example, transcribe_podcast should map to handle_transcribe, batch_create_clips
should map to handle_batch_clips, and the other three tools need their
respective handler functions). Ensure all handler functions are properly defined
before being referenced in the TASK_HANDLERS mapping to avoid runtime errors
during tool dispatch.
In `@cli/internal/engine/engine.go`:
- Around line 65-79: The BackendRoot function currently prioritizes the working
directory tree walk over the provisioned runtime backend, which creates a
security issue where local backends can override the embedded one. Reorder the
backend discovery logic to check paths.RuntimeDir()/backend first by testing for
the existence of cli.py there, and only fall back to the os.Getwd() directory
tree walking if the runtime backend is not found. This ensures the provisioned
backend takes precedence while preserving the development fallback capability.
---
Nitpick comments:
In @.github/workflows/release.yml:
- Around line 23-28: The build matrix in the include section is missing a
windows/arm64 configuration entry, while darwin and linux both include arm64
variants. Add a new entry following the same pattern for windows/arm64 with
goos: windows and goarch: arm64 to the include list, or if this exclusion is
intentional, add a comment explaining the rationale for omitting Windows ARM64
support.
- Around line 127-133: The Build remotion bundle step lacks error checking for
the scripts/build-remotion.sh script execution. If the script fails, the
workflow will continue and package an incomplete bundle instead of failing. Add
error checking to the bash script block (either by using set -e at the start of
the run command or by appending || exit 1 after the sh scripts/build-remotion.sh
command) to ensure the workflow fails immediately if the build script encounters
any errors.
- Around line 40-46: The `go generate ./...` command in the release workflow's
run block lacks error checking and could fail silently if backend or PodStack
sync scripts encounter errors. Add `set -e` at the very beginning of the run
block (as the first line after the `run: |` statement) to ensure the workflow
exits immediately if any command fails, preventing subsequent commands like `go
build` from executing with potentially incomplete or corrupt generated files.
- Around line 99-103: The studio build workflow step does not include error
checking, which means if scripts/build-studio.sh fails, the workflow will
continue executing and create a tar archive of an incomplete or empty directory.
Add error handling by including set -e at the beginning of the run block to
ensure the workflow exits immediately if the build-studio.sh script fails,
preventing incomplete bundles from being created and used downstream.
- Around line 149-153: The "Generate checksums" step in the release workflow
does not validate that the checksums.txt file was created successfully before
moving it. If sha256sum fails or produces no output due to missing files in the
dist directory, an empty checksums.txt would be moved to the release artifacts.
Add validation after the sha256sum command to check that checksums.txt exists
and is not empty, and exit with an error if the validation fails, ensuring the
workflow stops before proceeding with the release if checksum generation is
incomplete.
In `@tests/test_find_moments.py`:
- Around line 51-70: Add a new test method to the test file that verifies
find_moments_from_text properly handles malformed AI output. The test should
call find_moments_from_text with SEGMENTS data that includes both a malformed
clip (with invalid scores or timestamps) and a valid clip, then assert that only
the valid clip is returned in the results. This regression test will ensure that
the find_moments_from_text function can gracefully filter out bad clips while
still returning valid ones from mixed output.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 90bd9d52-971a-45ce-bbe1-f7ee46e0672c
📒 Files selected for processing (33)
.github/workflows/release.yml.gitignorebackend/cli.pybackend/config/paths.pybackend/config_bundle.pybackend/main.pybackend/requirements-runtime.txtbackend/requirements.txtbackend/services/claude_suggest.pybackend/services/clip_generator.pybackend/services/transcript_packer.pybackend/services/transcription.pybackend/services/transcription_whispercpp.pycli/internal/engine/engine.gocli/internal/provision/provision.gocli/internal/provision/redirect_test.gocli/internal/provision/security_test.gocli/internal/update/update.gocli/main.gonpm/scripts/install.jspodclisrc/config/paths.tssrc/models/index.tssrc/ui/client/ClipDetail.tsxsrc/ui/client/EpisodeWorkspace.jsxsrc/ui/client/StudioHome.tsxsrc/ui/web-server.tstests/parity/golden/branded.ass.expectedtests/parity/test_caption_parity.pytests/test_config_bundle.pytests/test_find_moments.pytests/test_transcription_engine.pytests/test_whispercpp_snap.py
✅ Files skipped from review due to trivial changes (3)
- .gitignore
- backend/requirements-runtime.txt
- tests/parity/golden/branded.ass.expected
🚧 Files skipped from review as they are similar to previous changes (7)
- src/config/paths.ts
- tests/parity/test_caption_parity.py
- npm/scripts/install.js
- tests/test_whispercpp_snap.py
- cli/internal/update/update.go
- cli/main.go
- cli/internal/provision/provision.go
The thumbnail screenshot opened a Remotion browser via openBrowser without first ensuring one exists (renderMedia, used for captions, ensures it implicitly), so the first render just timed out connecting to a browser that was never provisioned. Call ensureBrowser() before openBrowser(), and run the screenshot from the runtime root — as caption rendering already does — so the Chrome Headless Shell caches under a stable .remotion/ and is reused across working directories instead of re-downloading into (and polluting) each one.
Default allow_ass_fallback to true so a remotion caption-render failure (browser unavailable, offline, arch mismatch) falls back to ffmpeg-burned ASS captions instead of exporting zero clips. Healthy installs still use remotion first, so this only rescues the cases that would otherwise hard-fail.
A new machine can install with only curl or PowerShell — no Go, Node, Python, or FFmpeg required. install.sh / install.ps1 download the prebuilt, statically linked binary (checksum-verified) and put it on PATH; the binary provisions its own hermetic runtimes on first launch. README install section rewritten for the native flow.
- Knowledge upload wrote the raw multipart originalname into the knowledge dir, so a name like ../../x.md escaped it (multer doesn't sanitize separators). Use basename for both the stored name and the extension filter. - extractTarXz shelled out to system tar with no traversal guard (the unverified johnvansickle ffmpeg path). List entries first and reject absolute or .. paths before extracting.
- Verify the Node tarball against nodejs.org SHASUMS256.txt and the CPython build against python-build-standalone's SHA256SUMS after download. Fail closed on mismatch; fail open if the manifest is unreachable so a transient network issue can't block provisioning. - README: rewrite usage for the native binary (podcli, not ./podcli), Web UI via the menu, and MCP via `podcli mcp install`. - Add RELEASE.md (tag-driven release + smoke test); align root package.json to 2.0.0.
- README: Web UI via the menu (not setup.sh), project tree shows cli/ launcher + installers, setup.sh labeled as dev setup. - CONTRIBUTING: document the Go launcher build + go test/gofmt in the PR checklist alongside the Python/TS dev setup. - AGENTS.podstack: drop the nonexistent `setup.sh --host` flag.
There was a problem hiding this comment.
Actionable comments posted: 4
🧹 Nitpick comments (2)
RELEASE.md (2)
42-42: 💤 Low valueComplete the install.sh URL placeholder.
Line 42 uses
.../install.shas a generic placeholder. For clarity in the smoke test procedure, consider replacing it with the full canonical URL or note that it matches the URL in line 9.📝 Proposed clarity improvement
- curl -fsSL .../install.sh | sh # or npm i -g podcli + curl -fsSL https://raw.githubusercontent.com/nmbrthirteen/podcli/main/install.sh | sh # or npm i -g podcli🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@RELEASE.md` at line 42, The install.sh URL in line 42 is incomplete with a generic `.../install.sh` placeholder that should be replaced with the full canonical URL for clarity in the smoke test procedure. Replace the vague placeholder with the actual complete URL, ensuring it matches the URL provided in line 9 of the same file so users can execute the command without ambiguity.
3-3: 💤 Low valueCapitalize "GitHub" per official spelling.
LanguageTool flagged GitHub capitalization on lines 3 and 49. "GitHub" is the official name of the platform.
🔤 Proposed capitalization fixes
- Distribution is fully automated by `.github/workflows/release.yml`: pushing a + Distribution is fully automated by `.github/workflows/release.yml`: pushing aLine 3:
.github/workflows/release.yml(no change needed for this line—the flag refers to the bare "github" in context)Line 49: Change "github.com" context reference:
- Needs outbound HTTPS to github.com, huggingface.co, nodejs.org, and the ffmpeg + Needs outbound HTTPS to GitHub.com, huggingface.co, nodejs.org, and the ffmpegActually, for hostnames like
github.com, the lowercase is technically correct (it's a domain name). The LanguageTool flag may be a false positive for domain names. Consider this a low-priority cosmetic note.Also applies to: 49-49
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@RELEASE.md` at line 3, The review comment identifies GitHub capitalization flags at two locations in RELEASE.md (lines 3 and 49) but clarifies that these are false positives: at line 3, the file path `.github/workflows/release.yml` is correctly lowercase as it is a directory/file path, not a proper noun reference; at line 49, the domain name `github.com` is correctly lowercase as hostnames should be lowercase regardless of the platform's official branding. No changes are needed at either flagged location. However, if there are any bare, non-technical references to "github" elsewhere in the RELEASE.md file (outside of domain names and file paths), those should be capitalized to "GitHub" per official spelling.Source: Linters/SAST tools
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@cli/main.go`:
- Around line 65-73: The wantsRuntime function only checks for a limited set of
commands, causing other backend-routed commands like clips, thumbnails,
knowledge, presets, assets, youtube, cache, info, and config to skip
ensureRuntime() and fail during setup. Expand the switch statement in
wantsRuntime to include all backend-routed commands that require the managed
runtime. Additionally, remove the "auto" case from wantsRuntime since it is
already handled by podstack.Run before reaching this function; if auto needs the
managed runtime, add the provisioning gate before the podstack.Run call in the
dispatcher instead.
- Around line 78-90: The ensureRuntime() function calls setup(nil) without
capturing or checking its return value, so if the setup fails, runEngine()
continues executing with an incomplete runtime. Modify ensureRuntime() to
capture the exit code or error from setup(nil) and propagate it to the caller
(either by returning an error/exit code or by exiting directly), then update
runEngine() to check the result from ensureRuntime() and return early if setup
fails instead of continuing with model provisioning and backend startup.
In `@install.ps1`:
- Around line 25-39: The catch block is catching all exceptions including the
intentional checksum mismatch throw on line 32, which silently swallows the
error and allows the script to continue. Restructure the error handling so the
try block only wraps the Invoke-WebRequest call to fetch checksums.txt (the
network operation that should be caught), and move the checksum verification
logic and the mismatch throw statement outside the try/catch block so that
verification failures cause the script to fail instead of being silently caught.
In `@RELEASE.md`:
- Around line 32-36: The RELEASE.md documentation lists the remotion bundle
artifact name incorrectly as `remotion-<os>-<arch>.tar.gz` when the actual
artifact generated by the release workflow is
`remotion-bundle-<os>-<arch>.tar.gz`. Update the artifact name in the list to
include the "-bundle" suffix so it matches what the release workflow actually
produces, making the documentation consistent with the workflow outputs.
---
Nitpick comments:
In `@RELEASE.md`:
- Line 42: The install.sh URL in line 42 is incomplete with a generic
`.../install.sh` placeholder that should be replaced with the full canonical URL
for clarity in the smoke test procedure. Replace the vague placeholder with the
actual complete URL, ensuring it matches the URL provided in line 9 of the same
file so users can execute the command without ambiguity.
- Line 3: The review comment identifies GitHub capitalization flags at two
locations in RELEASE.md (lines 3 and 49) but clarifies that these are false
positives: at line 3, the file path `.github/workflows/release.yml` is correctly
lowercase as it is a directory/file path, not a proper noun reference; at line
49, the domain name `github.com` is correctly lowercase as hostnames should be
lowercase regardless of the platform's official branding. No changes are needed
at either flagged location. However, if there are any bare, non-technical
references to "github" elsewhere in the RELEASE.md file (outside of domain names
and file paths), those should be capitalized to "GitHub" per official spelling.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 96fbb05d-3c7c-4fed-9dcb-0484619d0e9a
📒 Files selected for processing (15)
AGENTS.podstack.mdCONTRIBUTING.mdREADME.mdRELEASE.mdbackend/config_bundle.pybackend/presets.pybackend/scripts/remotion_screenshot.cjsbackend/services/thumbnail_html.pycli/internal/provision/provision.gocli/internal/provision/studio.gocli/main.goinstall.ps1install.shpackage.jsonsrc/ui/web-server.ts
✅ Files skipped from review due to trivial changes (3)
- package.json
- AGENTS.podstack.md
- README.md
🚧 Files skipped from review as they are similar to previous changes (4)
- src/ui/web-server.ts
- backend/config_bundle.py
- cli/internal/provision/studio.go
- cli/internal/provision/provision.go
Turns podcli from an npm/Python hybrid into a self-contained native binary CLI (modeled on openai/codex). Big one — installs as one binary, runs from anywhere, provisions everything it needs.
What's in it
setup,doctor,update(self-update + atomic swap),config,mcpauto,generate-titles, …) forward to Claude/CodexNotes
Before merge / first release
NPM_TOKENrepo secretv2.0.0to exercise the full CI → release → provision → self-update chain (per-platform assets are unverified until the first tagged run)Summary by CodeRabbit
Release Notes