Understand & demonstrate the real product: comprehension + submit + frame-the-result#4
Conversation
…e director run
Provider config (src/director/config.ts):
- .env loading + resolveProvider: DeepSeek (api.deepseek.com, deepseek-v4-pro
default), OpenRouter, or any OpenAI-compatible endpoint via env vars.
- Vision capability is provider-aware: DeepSeek V4 is text-only, so the
director reads the DOM/inventory and the vision-QC pass is skipped (vision
threaded through crawl screenshots + analyze images + QC).
- .env is gitignored; .env.example documents setup.
Three real bugs the first live run against a real marketing site surfaced
(the fixture app never exercised them):
1. Crawler crashed navigating to a PDF link (download starts → page.goto
throws). Now skips non-HTML hrefs, aborts file navigations, and survives a
bad page instead of killing the whole crawl.
2. LLM client only retried HTTP 429/5xx, not network throws ('fetch failed'
behind a proxy). Now retries network errors with backoff, surfaces the
underlying cause, has an explicit 4-min timeout, and reads reasoning_content
as a fallback for reasoning models.
3. Camera aimed off-frame → pure-background shots. Root cause: the executor
read boundingBox WITHOUT scrolling below/above-fold elements into view, so
it recorded document coords (y=6549, y=-3883) and the camera flew to empty
space. Now scrolls targets into the viewport first (also how a single
viewport reveals a long page); settle is conditional so in-view actions keep
the deterministic timeline. Plus a defensive viewport clamp on the camera
focus so a stray bbox can never fly the camera off-frame again.
Verified end-to-end: deepseek-v4-pro generated a real Northline launch video
(understood the product, picked sample-audit + subscribe money moments, read
form placeholders and typed matching values). 47 tests green.
… --yes Codex's roast-fixes shipped two usability regressions: - url-policy blocked private/localhost by default, breaking the #1 use case (filming your own local dev app). Flipped: private network is ALLOWED by default at the product layer (CLI + generate/record); --block-private-network opts into the SSRF guard for untrusted/public targets. The pure assertSafeNavigationUrl keeps its safe default for standalone use. - generate hard-failed without --yes. Now a one-line privacy notice that does not block; --yes silences it. Also gitignore .worktrees/ (worktree convention). 68 tests green.
Read the app's own source (Next.js app/ and pages/ routers) to derive its real routes + a human summary of each page. The crawler only sees a page's initial DOM, so functional panels reachable by SPA nav never enter the inventory and the director tours the landing page. Reading the code is the ground truth of what each screen shows — cheaper and deeper than vision. extractAppRoutes() walks the repo (skips node_modules/.next/dist/tests), derives route paths (strips (groups), flags [param] dynamics), and pulls a text summary from each page component. routesToSeedAndNotes() turns concrete routes into crawl seed URLs + a product-source summary for the director. 5 unit tests cover route derivation, group stripping, dynamic flagging, node_modules exclusion, monorepo appName scoping, and seed/notes output.
… 4b) The director understood the product but the video stopped at the surface: it typed into a search box but never ran the query, and the camera framed the input instead of the result it produced. This makes the video actually demonstrate the product. - 4a wiring: generate.ts reads app source via extractAppRoutes and seeds the crawl with real routes; --app scopes a monorepo; crawl maxPages scales with seeds. inventory.crawlApp takes seedUrls (same-origin, queued first). - submit: recipe 'type' actions gain submit:true → executor presses Enter, so query/search inputs reveal their payoff. Without it the robot typed into a box and the product never ran. - frame-the-result (4b): inventory now surfaces framable result REGIONS (large stable containers like a graph/results panel). A recipe action can name a focus_selector; the executor resolves it after the action (+settle) and emits focus_bbox; the renderer frames it — fit-zoom so the result fills the frame, longer dwell so the payoff holds. Cursor stays on the control. - robust navigation: goto waits on domcontentloaded + best-effort load instead of load-only, which hung 10s on SPAs pulling a CDN bundle (the Pandora demo's d3) and aborted capture. - script prompt steers the director to submit query fields, frame results, and prefer app-suggested example values so the query returns real data. Verified end-to-end against the Pandora supply-chain demo: types an industry, submits, and the camera holds on the rendered chokepoint graph.
plan: camera frames focus_bbox center (not the interaction bbox), fit-zooms a large result region gentler than the fixed punch-in, and holds the longer FOCUS_DWELL. schema: type actions accept submit + focus_selector (and stay backward compatible when omitted); event log round-trips focus_bbox and rejects a degenerate (zero-size) one.
Bring Brayden's README hero banner (assets/readme-hero-10k.png) and the 'Why people should star it' pitch onto the current README, keeping the accurate pre-release status/pipeline copy rather than the older wip-branding body.
A hand-authored two-scene recipe that drives the Pandora supply-chain demo: type a ticker (NVDA) then an industry (SEMICONDUCTOR), submit each, and frame the rendered chokepoint graph. Doubles as a deterministic showcase of the submit + focus_selector path without needing an LLM.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f4c0b90e26
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| const h = hostname.toLowerCase().replace(/^\[(.*)\]$/, "$1").replace(/\.$/, ""); | ||
| if (h === "localhost" || h.endsWith(".localhost")) return true; | ||
| if (h === "0.0.0.0") return true; | ||
| if (isIP(h) === 6) return h === "::1" || h.startsWith("fc") || h.startsWith("fd") || h.startsWith("fe80:"); |
There was a problem hiding this comment.
Block IPv4-mapped IPv6 private targets
When the private-network guard is enabled (allowPrivateNetwork: false), IPv4-mapped IPv6 literals such as http://[::ffff:127.0.0.1]/ normalize to ::ffff:7f00:1 and reach this IPv6 branch, which only treats ::1, fc*, fd*, and fe80:* as private. That lets loopback/RFC1918 targets bypass the guard and still be navigated to; handle ::ffff: mapped addresses by converting/checking the embedded IPv4 range before allowing the URL.
Useful? React with 👍 / 👎.
| const fitW = (FOCUS_FILL * layout.viewport.width) / Math.max(bw, 1); | ||
| const fitH = (FOCUS_FILL * layout.viewport.height) / Math.max(bh, 1); | ||
| z = Math.max(1, Math.min(ZOOM_TARGET, fitW, fitH)); | ||
| dwell = FOCUS_DWELL_MS; |
There was a problem hiding this comment.
Include focus dwell in the rendered duration
For a focused final interaction where the capture ends before e.t + 4200ms (easy with the prompted 600–1400ms holds), this branch extends the camera segment to FOCUS_DWELL_MS, but the duration calculation above still only reserves ZOOM_DWELL_MS for all interactions. The render can therefore stop while the new focus segment is still active, cutting short the promised payoff hold; the frame-count calculation should use the focus dwell when focus_bbox is present.
Useful? React with 👍 / 👎.
| `generate` blocks localhost, RFC1918, link-local, and cloud metadata addresses | ||
| by default. If you intentionally want to film a local development app, opt in: | ||
|
|
||
| ```bash | ||
| node dist/cli/index.js generate --url http://127.0.0.1:3000 --allow-private-network --yes |
There was a problem hiding this comment.
Align private-network docs with the CLI flags
This section says generate blocks private addresses by default and that --allow-private-network opts in, but the CLI now passes allowPrivateNetwork: !values["block-private-network"] and treats --allow-private-network as a deprecated no-op. Users following the README will both misunderstand the default behavior and miss the actual --block-private-network switch needed for untrusted targets.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
What this PR does: Adds source‑code comprehension (sourceRoutes.ts) to crawl real app routes, implements submit and focus_selector so the camera frames the result payload, makes navigation robust with domcontentloaded, and folds in the Codex PR #3 fixes (provider config, streaming, redaction, vitest v4).
Risk areas:
- Unredacted source‑route excerpts are sent to the LLM in repo notes, potentially leaking embedded secrets from source files.
FOCUS_DWELL_MS(4.2s) and fit‑zoom logic can push the rendered video beyond the 60s target, with no enforcement at the recipe level.- Scroll‑into‑view (350 ms sleep) and the 400 ms settle for focus_bbox may affect timing determinism on slower environments.
Verdict:⚠️ Minor concerns
| } else { | ||
| log(` source: no routes detected at ${opts.repoPath} (crawling links only)`); | ||
| } | ||
| } |
There was a problem hiding this comment.
🟡 P2 (Medium): The notes string built from sourceNotes (which contains raw text extracted from source‑code files via extractSummary) is never passed through redactForPrompt. That raw text is sent directly to the LLM inside analyzeApp. If the source code contains hard‑coded keys, tokens, or other sensitive strings, they will leak to the model provider. Apply redactForPrompt to the source‑routes notes before concatenating them with the readme/package notes.
| const ZOOM_LEAD_MS = 600; // camera starts moving before the click lands | ||
| const ZOOM_DWELL_MS = 1500; // stays on target after the event | ||
| /** a framed RESULT (focus_bbox) is the payoff — hold on it longer than a plain | ||
| * interaction so the viewer reads the graph/results before the camera moves */ |
There was a problem hiding this comment.
🟢 P3 (Low): FOCUS_DWELL_MS = 4200 and the fit‑zoom calculation for focus_bbox can extend the effective camera dwell for an action far beyond the recipe’s scheduled hold_ms. With multiple such actions, the total video length may silently exceed the 60‑second target. Consider integrating the extra dwell into the recipe’s budget or capping the total camera‑movement contribution per scene.
A supercut was mute UI footage; viewers couldn't tell what it was selling. This makes the video a product-launch STORY without burying the product. - director writes launch copy: a problem-framed headline (the hook), a product name, a tagline, and one benefit caption per beat (outcome voice, not feature labels). analyze schema + prompt updated; page_url coerced from relative paths. - captions.ts: minimal-by-design timeline — a hook title card opens, the product performs in the middle (no captions over it — it's the star), a close card brands the end. Per-beat lower-thirds intentionally omitted. - host-page renders cards as SVG → ImageBitmap → drawImage (the same raster path the app frames use). Copy comes from the director; colors adapt to light/dark apps; timing comes from the recorded scene events. - FIX: host-page is a template literal, so the wrap helper's /\s+/ had collapsed to /s+/ — splitting captions on the letter 's' and dropping it. Restored the escape (load-bearing; do not 'simplify'). - motion: between scenes the camera glides to a gentle floor instead of pumping fully wide and hard-cutting (GLIDE_Z), for fluency. - bitrate 10→40 Mbps for crisp 1080p60 text. Verified end-to-end on the Meridian demo: hook reads 'You run 12 sites. Three bleed cash — which?', product performs, close brands. 80 tests green.
- No text overlays, ever. supercut renders ONLY the product performing; the cinematic camera carries it. generate no longer passes narrative to the renderer (the caption layer stays in the code but is never fed copy). - Flash fix: the screencast ran continuously across inter-scene reloads, so the blank/white page mid-navigation landed in the footage and flashed at every scene change. Skip frames while navigating (isNavigating) — the renderer holds the last good frame across the gap, so transitions read as a clean hold. Verified on both demos: full-video luma scan shows no brightness spikes (Meridian light + Pandora dark), and no text anywhere.
From an adversarial code review (.roast). Highlights: - H1 destructive-action guard: the director scripts clicks/typing on the LIVE app, so a prompt-injected page could fire a real irreversible action. The inventory now excludes destructive controls (a NARROW lexicon — delete/wipe/ erase/deactivate/cancel-account/pay/checkout/withdraw/…, NOT reversible hero actions like send/remove/reset/archive) so the director can't even see them. Opt back in with --allow-destructive. Exclusions are logged LOUD, never silent. - M1 redaction parity: source notes (README/package.json/page string literals) are now run through redactForPrompt before LLM egress, like DOM text already is. - M2 SSRF: normalize numeric/hex/IPv4-mapped-IPv6 hosts before the private-network check (closed the ::ffff:127.0.0.1 bypass); documented that DNS-rebind is out of scope. Tests added. - M3: prepublishOnly build hook so npm publish can't ship a stale/absent dist/. - M4: deleted the dead caption/text layer (captions.ts + all caption code in host-page.ts + the narrative option) — unreachable since the video is intentionally text-free; retires ~150 lines of untested, hand-escaped code from the highest-risk file. - M6: sourceRoutes.walk skips symlinks (no reading ~/.ssh via a --repo symlink). - Cost telemetry: LLM token usage tracked and reported at end of a run. - Disclosure: privacy notice now states screenshots are uploaded in vision mode. - CI runs the whole (non-e2e) suite so new test files aren't silently skipped. - L2 surface .env parse errors; L3 constant-time render-server token compare; M5 non-fatal warning on event/frame clock skew. Deferred (noted in report): host-page compositor extraction, resolve-and-pin DNS-rebind defense, hard cost budget cap. tsc clean; 84 tests green; render smoke-tested end-to-end (no text, no flash).
There was a problem hiding this comment.
What this PR does: Adds source-code comprehension to seed the crawl with real app routes, submit=true on type actions so queries actually run and reveal results, focus_selector to frame result regions instead of input boxes, and navigation robustness (domcontentloaded + best-effort load). Also folds in Codex PR #3 with typed provider config, redaction, and schema bounds, correcting several regressions (localhost allowed by default, --yes no longer hard-blocks).
Risk areas:
isNavigatingflag stuck true if inter-scene navigation fails, corrupting all subsequent footage- Bitrate increase to 40 Mbps constant could produce larger files than expected for long takes
- Source-route parsing currently Next.js-focused; other frameworks degrade gracefully but produce no seed routes
- Private network navigation now allowed by default (by design, but worth noting for untrusted targets)
| try { | ||
| if (i > 0) { | ||
| await page.goto(scene.entry.url, { timeout: ACTION_TIMEOUT_MS, waitUntil: "load" }); | ||
| await assertSafeNavigationUrl(scene.entry.url, { allowPrivateNetwork }); |
There was a problem hiding this comment.
🟠 P1 (High): If gotoReady or assertSafeNavigationUrl throws during inter-scene navigation, the isNavigating flag remains true because it is not reset in a finally block. Subsequent scenes will then drop all captured frames (since the screencast handler checks isNavigating), effectively rendering the remainder of the video blank. Wrap the navigation + settle code in a try/finally that always resets isNavigating to false.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7f670344d0
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| ...(values["no-vision"] ? { noVision: true } : {}), | ||
| ...(seed !== undefined ? { seed } : {}), | ||
| // default ALLOW; only --block-private-network engages the SSRF guard | ||
| allowPrivateNetwork: !values["block-private-network"], |
There was a problem hiding this comment.
Make private-network blocking match the documented default
The new README tells users that generate blocks localhost/RFC1918/metadata by default and that --allow-private-network opts in, but the CLI actually passes allowPrivateNetwork: true unless the opposite --block-private-network flag is supplied. In the untrusted/public-target scenario described by the guard, users following the docs get no SSRF protection at all; either flip this default and wire --allow-private-network, or update the docs/help to make --block-private-network explicit.
Useful? React with 👍 / 👎.
| const timer = setTimeout(() => ctrl.abort(), 5000); | ||
| try { | ||
| const res = await fetch(url, { signal: ctrl.signal }); | ||
| await assertSafeNavigationUrl(url, { allowPrivateNetwork, finalUrl: res.url }); |
There was a problem hiding this comment.
Validate redirect targets before following them
When allowPrivateNetwork is false, this post-fetch check is too late: fetch follows HTTP redirects by default, so a public URL that 302s to http://169.254.169.254/... has already made the private-network request before assertSafeNavigationUrl(..., finalUrl: res.url) throws. This undermines --block-private-network for redirecting targets; use manual redirects and validate Location before following, and apply the same pattern to Playwright navigations.
Useful? React with 👍 / 👎.
| isNavigating = true; | ||
| const response = await gotoReady(page, scene.entry.url); | ||
| await assertSafeNavigationUrl(scene.entry.url, { allowPrivateNetwork, finalUrl: response?.url() ?? page.url() }); | ||
| await sleep(SETTLE_MS); | ||
| // timestamp canon (PR #1 review): when nav finishes EARLY, dwell out | ||
| isNavigating = false; |
There was a problem hiding this comment.
Reset navigation suppression on failures
If a later scene times out or is rejected during gotoReady/the final URL check, control jumps to the catch before isNavigating is reset. Because the screencast handler drops every frame while this flag is true, any subsequent scenes that continue after a single non-aborting scene failure are recorded with no new frames, freezing the take on the previous scene; reset the flag in a finally around the navigation attempt.
Useful? React with 👍 / 👎.
| const fitW = (FOCUS_FILL * layout.viewport.width) / Math.max(bw, 1); | ||
| const fitH = (FOCUS_FILL * layout.viewport.height) / Math.max(bh, 1); | ||
| z = Math.max(1, Math.min(ZOOM_TARGET, fitW, fitH)); | ||
| dwell = FOCUS_DWELL_MS; |
There was a problem hiding this comment.
Extend render duration for focused-result dwells
For events with focus_bbox, this branch extends the camera segment to e.t + FOCUS_DWELL_MS, but the duration calculation above still only extends lastT by the shorter ZOOM_DWELL_MS. When a focused action is near the end of the take and the captured scene hold is shorter than 4.2s, the render plan ends before the promised payoff hold completes, so the result framing is truncated; include FOCUS_DWELL_MS in lastT for focused events.
Useful? React with 👍 / 👎.
| export function loadDotEnv(path = ".env"): DotEnvLoadResult { | ||
| if (!existsSync(path)) return { path, loaded: false, reason: "not found" }; | ||
| try { | ||
| (process as unknown as { loadEnvFile: (p: string) => void }).loadEnvFile(path); |
There was a problem hiding this comment.
Support all declared Node 20 versions
The package advertises node >=20, but process.loadEnvFile is only available starting in Node 20.12.0 / 21.7.0 per the Node docs. On Node 20.0–20.11 with a .env present, this call is undefined, the API key is never loaded, and generate fails even though the runtime satisfies engines; either raise the engine floor or fall back to a compatible parser.
Useful? React with 👍 / 👎.
- [P1] isNavigating now reset in a finally: if an inter-scene goto/SSRF-check throws, the flag no longer stays true and freeze the rest of the take blank. - [P1] preflight follows redirects MANUALLY, validating every hop before the request — a public URL can no longer 302 into cloud-metadata/RFC1918 before the --block-private-network guard runs. SSRF errors surface as such (not 'cannot reach'). - [P1/P2] README private-network section rewritten to match the actual CLI: private allowed by default, --block-private-network opts into the guard for untrusted targets, --allow-private-network is a deprecated no-op. Added the 'supercut mutates the target' + --allow-destructive warning and the DNS-rebind limitation note. - [P2] render duration now reserves FOCUS_DWELL_MS for focus_bbox events, so a final focused payoff isn't cut short before its hold completes. - [P2] loadDotEnv falls back to a minimal parser when process.loadEnvFile is absent (Node 20.0–20.11), so the advertised 'node >=20' range actually works. Not changed: the IPv4-mapped-IPv6 SSRF comment is already handled — normalizeHostToIPv4 blocks [::ffff:127.0.0.1], 0x7f000001, and 2130706433 (verified). tsc clean; 84 tests green.
- Use the final supercut wordmark as the README hero (replaces the generic gradient banner). - Restructure: 'Why supercut' (now leads with comprehension + frame-the-result), a dedicated 'How it works' pipeline section, tightened intro. - Private-network + destructive-action docs already match the CLI.
…ch copy Rebuilt the README to match high-signal OSS repos (centered wordmark, tagline, badge row, a DEMO up top, pain-hook opener, scannable sections): - Inline demo GIF (assets/demo-meridian.gif) generated from a real supercut run — the make-or-break 'show, don't tell' that lets a visitor get it instantly. - Leads with the value prop + thesis, then 'What makes it different' (real footage, understands the product, frames the payoff, any app, no key needed). - Quick start that gets to a video in 4 commands; clear no-LLM path. - Removed the unused generic gradient hero.
What this PR does
Makes a supercut actually understand and demonstrate the product, not just tour its landing page. This is the fix for the core flaw: the robot would type into a search box but never run it, and the camera framed the input box instead of the result. Now it drives the real interaction, submits it, and holds on the payoff.
It also folds in the previously-open Codex provider PR (#3) with its regressions corrected, so this one branch is the whole story.
The three problems it solves
1. Comprehension — read the app's source, seed the crawl (4a).
The crawler only ever saw a page's initial DOM, so functional panels behind SPA nav never entered the inventory and the director toured the surface. New
src/director/sourceRoutes.tsreads the app's own source (Next.jsapp/+pages/routers), derives the real routes, and seeds the crawl with them — the code is the ground truth of what each screen shows.--repo <app>/--app <name>(monorepo) opt in.2. Show the payoff — submit.
A
typeaction gainssubmit: true→ the executor presses Enter, so query/search/command fields actually run and reveal their result (a graph, a results list, a detail view). Without it the video filmed a filled-in box and a product that never ran.3. Frame the result, not the cursor (4b).
The crawl now surfaces framable result regions (large stable containers like a chart area). A recipe action can name a
focus_selector; the executor resolves it after the action and emitsfocus_bbox; the renderer frames it — fit-zoomed so the result fills the frame and held longer so the viewer reads it. The cursor stays on the control.Plus a navigation robustness fix:
gotonow waits ondomcontentloaded+ best-effortloadinstead ofload-only, which hung for 10s and aborted capture on SPAs that pull a CDN bundle (the d3-based Pandora demo).Folded-in Codex PR #3 (supersedes it)
This branch contains Codex's roast-fixes (typed provider config, streaming render = OOM fix,
src/security/{redaction,url-policy}, schema bounds, CI, vitest v4) with its three regressions corrected: localhost/private targets allowed by default again,generateno longer hard-blocked on--yes, storyboard constraint relaxed. Closing #3 in favour of this.Verification
npm test→ 80 passing (added source-route + submit/focus_selector/focus_bbox/plan-camera coverage).examples/pandora-demo.recipe.json(rendered demo atout/demo/supercut-pandora-demo.mp4, gitignored).Explicitly deferred (not in this PR)
Do not merge — for review.