feat(producer): add --mode=lambda-local to the regression harness (#913)
Conversation
Force-pushed 69ee7ba to f5512ff
Force-pushed c8a3a10 to c0895ef
vanceingalls left a comment
One-line: clean Phase 6b idea but the PR is currently red across required checks — the new module is statically loaded by every harness run, and the tsconfig paths change breaks the CLI bundle.
Calibrated strengths
- `regression-harness-lambda-local.ts:32-37` — the choice not to go through Docker/RIE and instead drive the in-process handler with a filesystem-backed S3 is the right tradeoff for in-CI coverage. The doc-comment explaining why (Docker-in-Docker availability vs. RIE-only smoke) is exactly the kind of design rationale future maintainers need.
- `regression-harness-lambda-local.ts:90-102` — pre-staging the project as a `tar.gz` under the fake-S3 root mirrors prod's `deploySite` path, so the handler's tar extraction logic is actually being exercised. This is what makes the new mode catch event-shape / key-convention regressions that `distributed-simulated` can't.
- `regression-harness-distributed.test.ts:35-37` — the new "error message lists all three accepted modes" assertion is the right pin: it tests the user-visible contract, not just the parsing branch.
Blockers
- `regression-harness.ts:34` statically imports `regression-harness-lambda-local.ts`, which then statically imports `@hyperframes/aws-lambda` at module top-level. This breaks every regression-shard run in CI today, not just `--mode=lambda-local` ones. Evidence — shard-5 (`--mode=in-process`, fixtures `style-4-prod style-11-prod style-2-prod animejs-adapter typegpu-adapter`) fails on this PR with:

  `Error [ERR_MODULE_NOT_FOUND]: Cannot find module '/app/packages/producer/node_modules/@hyperframes/aws-lambda/src/index.ts' imported from /app/packages/producer/src/regression-harness-lambda-local.ts`

  Root cause: `Dockerfile.test` only `COPY`s `packages/{core,engine,player,producer}/` into the image (lines after `COPY package.json bun.lock`). It does NOT copy `packages/aws-lambda/`. `bun install` creates the workspace symlink, but the target dir is absent inside the container, so the static import fails at module load and the harness can't even start.

  Fix shapes: (a) lazy-import `runLambdaLocalRender` and its sibling imports behind the `mode === "lambda-local"` branch (dynamic `await import(...)`), or (b) add `COPY packages/aws-lambda/ packages/aws-lambda/` to `Dockerfile.test`. (a) is cleaner — it keeps the producer test image lean and matches the PR's framing that the dep is harness-only. The unit-test path also needs the lazy import or a vitest mock: `regression-harness-distributed.test.ts` does NOT import the new module, but the harness module itself does, so anyone importing harness types in a test pulls aws-lambda into the graph.
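A hedged sketch of fix shape (a). The loader indirection exists only to keep the sketch self-contained; in the harness it would be `() => import("./regression-harness-lambda-local.js")`, and all names here are illustrative, not the harness's real API:

```typescript
type Mode = "in-process" | "distributed-simulated" | "lambda-local";

// The lazy module enters the import graph only when the lambda-local
// branch actually runs; the other modes never touch it.
async function dispatchRender(
  mode: Mode,
  loadLambdaLocal: () => Promise<{ runLambdaLocalRender: () => string }>,
): Promise<string> {
  if (mode === "lambda-local") {
    const mod = await loadLambdaLocal(); // import cost paid only on this path
    return mod.runLambdaLocalRender();
  }
  return `rendered via ${mode}`;
}
```

Because the import is behind the mode check, a container image that lacks `packages/aws-lambda/` can still run the other two modes without ever resolving the missing module.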
- The new tsconfig `paths` aliases break the CLI bundle. `packages/producer/tsconfig.json:18-21` adds `"@hyperframes/producer": ["./src/index.ts"]` and `"@hyperframes/producer/distributed": ["./src/distributed.ts"]`. This is tsconfig-only — but `packages/cli/tsup.config.ts:60-62` separately sets:

  `options.alias = { "@hyperframes/producer": resolve(__dirname, "../producer/src/index.ts") };`

  esbuild then resolves `@hyperframes/producer/distributed` (imported from `packages/aws-lambda/src/sdk/deploySite.ts:20`, now reachable from the CLI graph via the new producer→aws-lambda dep) as `<...>/src/index.ts/distributed` — a path inside a file, which fails. The `Perf: scrub` job ran this on the head SHA and produced:

  `✘ [ERROR] Cannot read directory "../producer/src/index.ts": not a directory`
  `✘ [ERROR] Could not resolve "/home/runner/.../packages/producer/src/index.ts/distributed" (originally "@hyperframes/producer/distributed")`

  Fix: extend the esbuild `options.alias` block in `packages/cli/tsup.config.ts` to mirror the tsconfig pair, e.g. add `"@hyperframes/producer/distributed": resolve(__dirname, "../producer/src/distributed.ts")` before the bare `@hyperframes/producer` entry (esbuild matches longest prefix; ordering matters for some bundler versions, so it's worth being explicit). The same fix is likely needed wherever else this alias is re-declared (preview server, studio bundler, etc.) — please grep for `"@hyperframes/producer":` across `*tsup*`/`*vite*`/`esbuildOptions` and audit each site. Rule 2: the contract is "the alias map needs both keys"; the trigger is the CLI build, but every bundler that aliased `@hyperframes/producer` has the same precondition now.
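To make the both-keys contract concrete, here is a toy model of prefix-substitution aliasing (illustrative only — this is not esbuild's actual resolver, just enough of its shape to reproduce the failure):

```typescript
// Longest matching alias key wins; a bare-package key also rewrites
// subpath imports by naive prefix substitution, which is exactly how the
// ".../index.ts/distributed" path-inside-a-file gets produced.
function resolveAlias(specifier: string, alias: Record<string, string>): string {
  const keys = Object.keys(alias).sort((a, b) => b.length - a.length);
  for (const key of keys) {
    if (specifier === key) return alias[key];
    if (specifier.startsWith(key + "/")) {
      return alias[key] + specifier.slice(key.length);
    }
  }
  return specifier;
}

// With only the bare key, the subpath lands "inside a file":
//   resolveAlias("@hyperframes/producer/distributed",
//                { "@hyperframes/producer": "../producer/src/index.ts" })
//   → "../producer/src/index.ts/distributed"   (broken)
//
// With both keys, the subpath maps to its own entry point:
//   → "../producer/src/distributed.ts"
```

Sorting by key length descending makes the subpath key win regardless of object insertion order, which is why being explicit about ordering in the real config is cheap insurance against bundler-version differences.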
- PR test-plan checkbox 4 is unchecked ("End-to-end fixture pass via `docker:test:lambda-local`"), and the head SHA's CI never exercised it either — `regression-shards` doesn't pass `--mode=lambda-local`. The author flags this as "maintainer-run before merging the stack," but that means there is no in-CI evidence that the new code path renders a single fixture end-to-end when this PR (or the stack it sits on) merges. For a PR whose entire purpose is "catch regressions the existing in-CI suite doesn't," shipping it without one CI shard that actually runs `--mode=lambda-local` defeats the goal. Add at least one shard (or a smoke fixture) on the new mode to the regression matrix before merge, even if it's a fast subset. Otherwise the cost of carrying the new mode (extra dep, bundle complexity, fragile path-alias setup) is paid without the benefit accruing.
Important
- `regression-harness-lambda-local.ts:118-119` casts `fakeS3 as unknown as HandlerDeps["s3"]`. That's a sharp edge — the test S3 only implements three commands (`Get`, `Put`, `Head`). When the handler evolves to call `DeleteObject`, `CopyObject`, `ListObjects`, etc., the cast suppresses the type error and the harness fails at runtime with `FakeS3: unexpected command <CmdName>` from the catchall at line 219. Two cheap improvements: (a) make `FilesystemBackedFakeS3` implement an actual interface (e.g. `Pick<S3Client, "send">` typed against the supported commands), and (b) add a positive `parseHarnessModeFlag`-style unit test that round-trips each command against the fake. Otherwise the silent-runtime-only failure mode is exactly the regression class this PR is meant to surface.
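Improvement (a) could be sketched like so — the command shapes and names below are simplified assumptions, not the real `S3Client`/`HandlerDeps` types, but they show how narrowing the `send` signature turns the runtime catchall into a compile error:

```typescript
// Discriminated union of only the commands the fake supports; when the
// handler grows a new command, extending HandlerDeps["s3"] to include it
// fails typechecking here instead of throwing at runtime.
type SupportedCommand =
  | { kind: "GetObject"; key: string }
  | { kind: "PutObject"; key: string; body: string }
  | { kind: "HeadObject"; key: string };

interface NarrowS3 {
  send(cmd: SupportedCommand): Promise<{ body?: string; exists?: boolean }>;
}

class FakeS3 implements NarrowS3 {
  private store = new Map<string, string>();

  async send(cmd: SupportedCommand) {
    switch (cmd.kind) {
      case "PutObject":
        this.store.set(cmd.key, cmd.body);
        return {};
      case "GetObject":
        return { body: this.store.get(cmd.key) };
      case "HeadObject":
        return { exists: this.store.has(cmd.key) };
    }
  }
}
```

The exhaustive `switch` on the discriminant is what the cast-free version buys: an unhandled `kind` is unreachable code the compiler flags, not a thrown string.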
- No timeout and no cleanup of `tempRoot` if a fixture throws. `runLambdaLocalRender` writes to `tempRoot/s3` and `tempRoot/lambda-tmp` and never cleans them up locally. The caller in `regression-harness.ts` presumably does, but please verify the unhappy path (chunk render throws after step B/4 succeeds) — partial-state leftover dirs across fixtures in the same suite can produce a false pass on the next run if the fake S3's `<key>` collides. The same shape applies to `distributed-simulated`, so this isn't new; just worth confirming the cleanup wraps both paths the same way.
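One shape for unhappy-path-safe cleanup, as a hedged sketch — `withTempRoot` is an illustrative name, not the harness's API, and the real caller may already do the equivalent:

```typescript
import { existsSync, mkdtempSync, rmSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Wraps a fixture run so the temp tree is removed on both the happy and
// the throwing path; stale fake-S3 keys then can't leak into the next
// fixture's run.
async function withTempRoot<T>(fn: (root: string) => Promise<T>): Promise<T> {
  const root = mkdtempSync(join(tmpdir(), "lambda-local-"));
  try {
    return await fn(root);
  } finally {
    // finally runs even when a chunk render throws mid-fixture
    rmSync(root, { recursive: true, force: true });
  }
}
```

The `finally` is the whole point: the review's concern is precisely the case where a chunk render throws after earlier steps have already written under `tempRoot/s3`.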
- `Config.width`/`Config.height` are hardcoded to 1920×1080 with a comment "any positive integer works." That comment claims the composition's `data-width`/`data-height` overrides — but this is a Lambda-event-shape test specifically meant to catch SFN-contract drift. If the handler ever starts honoring `Config.width`/`height` (e.g. for canvas sizing in a regression we haven't shipped yet), the harness will silently render 1920×1080 instead of the fixture's true resolution and pass against the golden — defeating the purpose. Either (a) pull the actual width/height from `suite.meta.renderConfig` and pass them through, or (b) add a static check that the fixture's resolution matches the hard-coded values. Mirroring `distributed-simulated` exactly here is what the comment says, but `distributed-simulated` doesn't go through the event-serialization boundary; this mode does, and that's the whole point.
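Option (b) could be as small as the following check (the field names on `meta` are assumptions — the real fixture metadata shape may differ):

```typescript
// Fail loudly if a fixture's authored resolution ever drifts from the
// hard-coded 1920x1080 placeholder, instead of silently rendering the
// wrong size and passing against the golden.
function assertPlaceholderResolution(meta: { width: number; height: number }): void {
  if (meta.width !== 1920 || meta.height !== 1080) {
    throw new Error(
      `fixture is ${meta.width}x${meta.height} but lambda-local renders the ` +
        `1920x1080 placeholder; plumb the real resolution through first`,
    );
  }
}
```

It doesn't fix the underlying hardcoding, but it converts the silent-false-pass failure mode into an immediate, explainable error.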
Nits
- `regression-harness-lambda-local.ts:223` — `export type { Fps };` re-exports a type the file imports but never uses externally. Drop it or use it in the `RunLambdaLocalInput.fps` field type.
- `regression-harness-lambda-local.ts:74-77` — the comment says "required-by-type but overridden by composition attrs"; if (a) above lands, this comment goes away.
- The PR title is `feat(producer)` but it also modifies `bun.lock` and adds a dep on `@hyperframes/aws-lambda`; consider `feat(producer,deps)` or noting the dep-graph change more prominently in the body's "How". Future bisects will appreciate it.
CI status (pulled from check_runs on c0895ef):
- `regression` (required): FAIL — shard-5 and shard-6 both broken by the static-import issue above.
- `player-perf` (required): FAIL — `Perf: scrub` failed at `bun run build` with the tsconfig-paths/esbuild-alias mismatch above.
- `Preflight (lint + format)`: PASS.
- `preview-regression`: PASS.
Both required-check failures are caused by the changes in this PR, not pre-existing red. They're the blocking findings above.
Verdict: REQUEST CHANGES
Reasoning: Two required CI checks are red, root-caused to this PR (static-import of a workspace package that Dockerfile.test doesn't copy, and a tsconfig-only paths mapping that the CLI's esbuild alias doesn't mirror). Both have small, mechanical fixes; the design itself is sound. Also flagging that the new mode has no in-CI fixture coverage on this PR, which materially weakens the value proposition.
— Vai
Force-pushed c0895ef to 6faac80
Force-pushed fff432d to 8a006ca
Force-pushed 6faac80 to 15289f3
Force-pushed 8a006ca to 5c99590
vanceingalls left a comment
Thanks for the turnaround. Re-reviewing against the three blockers from my prior review (4304557853) on head 15289f37.
Blocker 1 — static import of regression-harness-lambda-local → @hyperframes/aws-lambda: RESOLVED.
regression-harness.ts:35-46 now wraps the load in a loadLambdaLocalRender() helper that does await import("./regression-harness-lambda-local.js") on demand, and the only call site at line 943 sits inside if (options.mode === "lambda-local"). The static-import edge from the harness module is gone, so a producer-only Dockerfile.test image no longer pulls aws-lambda into the load graph when running other modes. The doc-comment at lines 35-40 captures the why (Dockerfile.test doesn't copy packages/aws-lambda/), which is exactly the kind of rationale that prevents this regressing in a future cleanup.
Blocker 2 — CLI bundle broken by tsconfig-only paths alias: RESOLVED.
packages/cli/tsup.config.ts:81-89 now adds the explicit subpath alias:
"@hyperframes/producer/distributed": resolve(__dirname, "../producer/src/distributed.ts"),
ahead of the bare-package alias, with a comment explaining the esbuild prefix-substitution misfire — the exact fix I called out. The pattern is also extended to @hyperframes/aws-lambda/sdk (lines 90-92), which is the same shape and would have bitten on the next subpath import.
Verified against CI on 15289f37: Perf: scrub, Perf: parity, Perf: drift, Perf: load, Perf: fps all PASS (where the prior head was red on Perf: scrub at bun run build). That's the head-on evidence that the alias chain bundles.
Audit of other alias sites I asked about: packages/cli/tsconfig.json:8-9 and packages/aws-lambda/tsconfig.json:13-14 both already declare both keys; no other tsup/vite bundler in the repo aliases @hyperframes/producer, so the surface is fully covered.
Blocker 3 — no in-CI fixture coverage of --mode=lambda-local: STILL OPEN.
grep -rn "lambda-local\|mode=lambda" .github/ returns nothing on this head. The regression workflow's eight shards still pass --mode=in-process (default) or --mode=distributed-simulated; none run --mode=lambda-local. The only coverage of the new mode in CI is the parser unit test at regression-harness-distributed.test.ts:21-23 ("parses --mode=lambda-local") — which proves the flag is recognized, not that a fixture renders end-to-end through the handler.
This is the same gap I flagged before: the PR's entire value proposition is "catch event-shape / SFN-contract regressions the existing suite can't," but no fixture exercises the new path in CI, so a regression in handler.ts, events.ts, deploySite.ts, or the new harness module itself ships green. The test-plan checkbox "End-to-end fixture pass via docker:test:lambda-local" remains unchecked, and there's no docker:test:lambda-local invocation in the workflow files. Severity: important, not blocking — reasonable people can disagree on whether this needs to land in the same PR or a fast follow-up.
If a fast-follow PR adding one lambda-local shard (even a single smoke fixture) is already planned and tracked, this is fine to merge as-is. If not, I'd push to add it before merge — even a one-fixture shard catches the regression class this whole effort exists to catch. Author's call; flagging so it doesn't get lost.
Important items from prior review — status check:
- The `fakeS3 as unknown as HandlerDeps["s3"]` cast at `regression-harness-lambda-local.ts:97`: unchanged. The catchall at line 224 (`throw new Error("FakeS3: unexpected command ${cmdName}")`) still surfaces it as a runtime failure when the handler grows a new command. Not blocking, still worth tightening later — the cheap version is typing `FilesystemBackedFakeS3` against the actual `S3Client["send"]` signature.
- The 1920×1080 hardcoding at line 951: unchanged. The doc-comment at `RunLambdaLocalInput.width/height` (lines 65-72) is honest about the limitation. Still the right kind of debt to log, not to block on.
CI status on 15289f37:
- `Perf: scrub`, `Perf: drift`, `Perf: parity`, `Perf: load`, `Perf: fps`: PASS (all previously broken or affected by blocker 2).
- `Preflight (lint + format)`: PASS.
- `Preview parity`: PASS.
- `regression-shards (1–8)`: IN_PROGRESS at review time; the prior failure mode (shard-5/shard-6 on the static import) is structurally fixed by blocker 1, so I expect green. If they go red on something else, that's a new finding.
- The `Perf: ${{ matrix.shard }}` "fail" entry is a 0s cancelled stub from the matrix-name-only job and isn't a real failure.
Verdict: APPROVE.
The two CI-blocking fixes are exactly what I asked for and verifiably resolved on this head. Blocker 3 (in-CI fixture coverage of the new mode) is the only outstanding item from the original review, and I'm downgrading it to important rather than blocking — but please don't let it drop, since shipping a "catch regressions" feature with no fixture coverage materially weakens the value. A follow-up PR adding one shard is fine.
— Vai
miguel-heygen left a comment
Re-approved. Vai verified both blockers fixed: dynamic import gated on mode, tsup alias for producer/distributed. Regression shards passing. Lambda-local CI coverage deferred as follow-up — acceptable.
The base branch was changed.
Force-pushed 15289f3 to 7c1e316
miguel-heygen left a comment
Re-approve after rebase. Diff verified unchanged.
vanceingalls left a comment
Re-approve after rebase onto main. Force-push dismissed my prior --approve (require_last_push_approval: true) — content unchanged, same commits replayed on the new base. All findings from the prior review's resolution still apply.
Re-review by Vai (post-rebase re-stamp)
Force-pushed 7c1e316 to e36eab9
miguel-heygen left a comment
Re-approve on e36eab9. Same content.
vanceingalls left a comment
Re-approve after rebase + #910 smoke fix.
#910 adds the CLI smoke fix on top: @hyperframes/aws-lambda moved to devDependencies, dispatcher dynamic-imports @hyperframes/aws-lambda/sdk (lambda.ts:150) with a friendly ERR_MODULE_NOT_FOUND → npm install handler at :152-158. npm pack / npm install now works because there's no workspace:* protocol in published dependencies. Clean fix.
#912/#913/#914/#915 are pure rebases on top — same commits replayed on the new base, content unchanged vs. the last approved round. Findings from the prior review's resolution still apply.
Re-review by Vai (post-smoke-fix re-stamp)
Force-pushed e36eab9 to c7c8fde
miguel-heygen left a comment
Re-approve on c7c8fde. Content-identical.
Third harness mode that drives the OSS @hyperframes/aws-lambda handler
through the exact event sequence Step Functions produces in
production:
handler({Action: "plan"}) → planDir tarball on fake S3
handler({Action: "renderChunk"}) × N → chunk artifacts on fake S3
handler({Action: "assemble"}) → final mp4/mov/png-sequence
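The three-call sequence above can be sketched as a plain driver loop. This is a hedged illustration: the event union and handler signature below are assumptions for the sketch, not the real @hyperframes/aws-lambda API, and the `chunk` field name is invented:

```typescript
type LambdaEvent =
  | { Action: "plan" }
  | { Action: "renderChunk"; chunk: number }
  | { Action: "assemble" };

// Drives any handler through the same phase ordering Step Functions
// produces: one plan, N renderChunks, one assemble.
async function driveHandler(
  handler: (ev: LambdaEvent) => Promise<void>,
  chunkCount: number,
): Promise<void> {
  await handler({ Action: "plan" }); // planDir tarball onto fake S3
  for (let i = 0; i < chunkCount; i++) {
    await handler({ Action: "renderChunk", chunk: i }); // chunk artifacts onto fake S3
  }
  await handler({ Action: "assemble" }); // final mp4/mov/png-sequence
}
```

Because the driver serializes the same event shapes production does, a drift in the event JSON surfaces as a parse failure in the handler rather than slipping past a unit test of the parser alone.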
The S3 client is a filesystem-backed fake (every s3://<bucket>/<key>
URI maps to <tempRoot>/s3/<key>), so the harness exercises the
handler's event-parsing + tar/S3 conventions + dispatch logic on top
of the underlying producer primitives. Regressions in event JSON
shape, S3 key layout, or plan-hash boundary checks now surface in
the same CI run as the in-process and distributed-simulated modes
without paying for a real AWS round-trip.
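A minimal sketch of that URI-to-path mapping (illustrative — the real `FilesystemBackedFakeS3` may normalize bucket names and keys differently):

```typescript
import { join } from "node:path";

// Maps s3://<bucket>/<key> onto <tempRoot>/s3/<key>. The bucket name is
// deliberately dropped: the fake models a single bucket rooted at
// <tempRoot>/s3, matching the description above.
function fakeS3Path(tempRoot: string, uri: string): string {
  const m = /^s3:\/\/([^/]+)\/(.+)$/.exec(uri);
  if (m === null) throw new Error(`not an s3 uri: ${uri}`);
  const key = m[2];
  return join(tempRoot, "s3", ...key.split("/"));
}
```

Splitting the key on `/` before `join` keeps the mapping portable across path separators while preserving the key's directory structure on disk.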
Deliberately NOT a Docker/RIE invocation — that would gate the
producer test suite on Docker-in-Docker support which most CI
runners lack. Real-ZIP-via-RIE tests live in
packages/aws-lambda/scripts/ (probe:beginframe) and the
maintainer-run smoke.sh.
Wired up via:
- HarnessMode union extended to include "lambda-local"
- parseHarnessModeFlag accepts --mode=lambda-local
- regression-harness.ts dispatches to runLambdaLocalRender for
the new mode, sharing the distributed-support gate +
pathology-floor threshold with distributed-simulated mode
- package.json scripts: test:lambda-local + docker:test:lambda-local
- producer.devDependencies += @hyperframes/aws-lambda (workspace)
- producer/tsconfig.json gains path mappings to self so the type
cycle through aws-lambda's source resolves at typecheck time
without needing producer to be pre-built
Tests: 3 new unit tests on parseHarnessModeFlag + resolveMinPsnrForMode
cover the new mode. End-to-end PSNR contract still runs through
Dockerfile.test (manual + CI).
Three small cleanups on top of the lambda-local harness:
- Drop the unused createReadStream import + its `void` workaround
comment. The aws-lambda handler's tar / S3 transport pulls
createReadStream from its own imports; this file never references
it directly.
- Hoist the dynamic `await import("node:fs")` calls for
writeFileSync out of FilesystemBackedFakeS3.send into the static
import block. Repeated PutObject calls don't need to repay the
dynamic-import cost.
- Hoist the dynamic `await import("@hyperframes/aws-lambda")` call
for untarDirectory similarly. Drops the now-redundant duplicate
aws-lambda import statement.
The PutObject body branch also collapses: `body instanceof Buffer`
and `typeof body === "string"` both call writeFileSync identically,
so they share one branch.
No behavior changes.
The static import of regression-harness-lambda-local.ts pulled @hyperframes/aws-lambda (and its @aws-sdk/* + @sparticuz/chromium transitive deps) at module-load time. Dockerfile.test only copies the producer's own files into the container, so aws-lambda's src isn't present at runtime — and even `--mode=in-process` failed:

    Error [ERR_MODULE_NOT_FOUND]: Cannot find module
    '/app/packages/producer/node_modules/@hyperframes/aws-lambda/src/index.ts'
    imported from /app/packages/producer/src/regression-harness-lambda-local.ts

Load the module on demand instead. `--mode=lambda-local` callers pay the import cost; the existing in-process and distributed-simulated modes don't.
Three review items from Vai:
- `Config.width`/`Config.height` are now plumbed through
RunLambdaLocalInput rather than hardcoded inside
runLambdaLocalRender. Lambda-local's whole point is to catch
event-shape drift; if the handler ever starts honouring
Config.width/height (e.g. for canvas sizing), having those
values flow from the caller means the harness sees what the
fixture authored. The interface change makes the eventual
upgrade-to-real-fixture-resolution a one-line dispatch swap.
- Drop the dead `export type { Fps }` and its unused import
from @hyperframes/core. The module never re-exports it.
- The dispatch site in regression-harness.ts now passes 1920×1080
explicitly with a comment marking it as a placeholder until
the harness compiles the composition HTML up-front to surface
the authored data-width/data-height. distributed-simulated
mode uses the same placeholder internally, kept for parity.
No behavior change in the existing modes; lambda-local now has a
clear extension point for honouring fixture dimensions.
Force-pushed c7c8fde to 7aa2e41

What
Adds `--mode=lambda-local` to the producer regression harness. Drives the OSS `@hyperframes/aws-lambda` handler through the exact event sequence Step Functions produces in production, with a filesystem-backed fake S3 standing in for the real bucket.

The harness now has three modes:

- `in-process` (default) — `executeRenderJob`, the same path the `hyperframes render` CLI takes
- `distributed-simulated` — `plan` → `renderChunk` × N → `assemble` primitives directly
- `lambda-local` (new) — `@hyperframes/aws-lambda` handler dispatch through plan / renderChunk / assemble events, with filesystem-backed fake S3

Why
Per `DISTRIBUTED-RENDERING-PLAN.md` § 11 Phase 6b, this PR delivers PR 6.6: in-CI coverage that catches regressions in the event-JSON shape, S3 key conventions, and plan-hash boundary checks that `distributed-simulated` doesn't touch. Today those bugs only surface when the maintainer-run real-AWS smoke (`smoke.sh`) runs — and `smoke.sh` costs real money + AWS quota.

How
- `HarnessMode` union extended to include `"lambda-local"`. `parseHarnessModeFlag` accepts `--mode=lambda-local`; the error message lists all three modes.
- `regression-harness.ts` dispatches the new mode through `runLambdaLocalRender`. Shares the existing `checkDistributedSupport` gate (webm / NTSC / HDR rejections skip cleanly) and the `DISTRIBUTED_SIMULATED_MIN_PSNR_DB` pathology floor with `distributed-simulated`.
- `packages/producer/src/regression-harness-lambda-local.ts` — wires the `handler` import to a `FilesystemBackedFakeS3` and walks plan/renderChunk/assemble in sequence. Pre-stages the project as a `tar.gz` under the fake-S3 root (mirroring what `deploySite` does in prod) and copies the final output back out to the path the harness expects.
- New devDependency on the `@hyperframes/aws-lambda` workspace package. The runtime producer code does not import aws-lambda — the dep is harness-only.
- `tsconfig.json` gains path mappings to itself (`@hyperframes/producer` → `./src/index.ts`, `@hyperframes/producer/distributed` → `./src/distributed.ts`) so the type cycle through aws-lambda's source resolves at typecheck time without producer being pre-built.

Deliberate non-choice: `lambda-local` is NOT a Docker / RIE invocation. That would gate the producer test suite on Docker-in-Docker support, which most CI runners lack. Real-ZIP-via-RIE tests live in `packages/aws-lambda/scripts/probe-beginframe*` and the maintainer-run `smoke.sh`.

Test plan
- New unit tests in `regression-harness-distributed.test.ts` covering the new mode in `parseHarnessModeFlag` + `resolveMinPsnrForMode`
- `bun run --cwd packages/producer docker:test:lambda-local` — runs alongside the existing modes in Dockerfile.test (maintainer-run before merging the stack; the implementation matches `distributed-simulated`'s already-passing path)

Stacks on #909, #910, and #912.
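The flag-parsing contract described above — accept three modes, default to in-process, and list every accepted mode in the error — can be sketched like this (hedged: the real `parseHarnessModeFlag` likely differs in signature and error wording):

```typescript
const HARNESS_MODES = ["in-process", "distributed-simulated", "lambda-local"] as const;
type HarnessMode = (typeof HARNESS_MODES)[number];

function parseHarnessModeFlag(argv: string[]): HarnessMode {
  const flag = argv.find((a) => a.startsWith("--mode="));
  if (flag === undefined) return "in-process"; // default mode
  const value = flag.slice("--mode=".length);
  if ((HARNESS_MODES as readonly string[]).includes(value)) {
    return value as HarnessMode;
  }
  // The error message lists every accepted mode — the user-visible
  // contract the review's unit test pins.
  throw new Error(`unknown --mode=${value}; accepted: ${HARNESS_MODES.join(", ")}`);
}
```

Deriving both the union type and the error message from one `as const` array means adding a fourth mode updates the parser, the type, and the user-visible error in a single edit.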
🤖 Generated with Claude Code