docs(lambda): add docs/deploy/aws-lambda.mdx deployment guide #914
jrusso1020 wants to merge 2 commits into
Warning This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
This stack of pull requests is managed by Graphite. Learn more about stacking.
51a1c41 to c8a3a10
63a62c1 to 1265426
c8a3a10 to c0895ef
1265426 to 9c7e205
miguel-heygen
left a comment
CI green (regression, player-perf, preview-regression all pass; Graphite pending is expected for a stacked PR).
Content looks accurate. A few specific things I verified:
- Architecture diagram: the dispatch model (Plan/RenderChunk/Assemble → one Lambda handler → S3) matches the `handler.mjs` structure described elsewhere in the stack.
- Three deployment paths: CLI → SAM → CDK is a clean progression; the CDK construct exposing `.bucket`, `.renderFunction`, `.stateMachine` is consistent with the described CloudFormation outputs (`RenderBucketName`, `RenderStateMachineArn`, `RenderFunctionArn`).
- IAM section: the `policies user` / `policies role` / `policies validate` subcommands are well-documented with the `Resource: "*"` narrowing note; the CI gate pattern for `validate` is a good call.
- Cost accounting: the $0.0214 example and the pointer to `costAccounting.ts` for auditability are correct in principle. One minor nit: the cost line says "S3 transfer is not included" but doesn't mention S3 GET/PUT request costs either. Worth a one-liner so adopters don't expect the number to be complete.
- Troubleshooting section: `PLAN_HASH_MISMATCH`, `BROWSER_GPU_NOT_SOFTWARE`, stuck-at-`RUNNING`, and the S3 `Retain` bucket are all realistic failure modes with actionable guidance. `FONT_FETCH_FAILED` / `FFMPEG_VERSION_MISMATCH` are mentioned in the stuck-render entry but not given their own entries; fine for v1 docs.
- "What's NOT in v1" section: useful, explicitly limits scope. The reference to "PR 6.10 on the plan" for compositions discovery is slightly internal; readers won't know what that means. Consider replacing with "in a future release" or linking to a tracking issue.
No broken links spotted. [CLI reference](/packages/cli#hyperframes-lambda) assumes that anchor exists on the CLI page — make sure the CLI PR in the stack adds it.
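For readers who have not opened `handler.mjs`, the dispatch model described in the architecture item can be pictured roughly as below. This is a minimal sketch, not the stack's handler: only the `Action` discriminator and its three values come from this review, and every other field name, key layout, and return shape is a hypothetical placeholder.

```typescript
// Hypothetical event shape. Only `Action` and its three values are taken
// from the review; the remaining fields are illustrative assumptions.
type RenderEvent =
  | { Action: "Plan"; projectKey: string }
  | { Action: "RenderChunk"; planKey: string; chunkIndex: number }
  | { Action: "Assemble"; planKey: string };

type RenderResult = { action: RenderEvent["Action"]; outputKey: string };

// One entry point, three behaviours. The switch is exhaustive, so adding a
// fourth Action without a matching case fails type-checking at `unreachable`.
function handler(event: RenderEvent): RenderResult {
  switch (event.Action) {
    case "Plan":
      return { action: "Plan", outputKey: `${event.projectKey}/plan.json` };
    case "RenderChunk":
      return {
        action: "RenderChunk",
        outputKey: `${event.planKey}/chunks/${event.chunkIndex}.mp4`,
      };
    case "Assemble":
      return { action: "Assemble", outputKey: `${event.planKey}/output.mp4` };
    default: {
      const unreachable: never = event;
      throw new Error(`Unknown Action: ${JSON.stringify(unreachable)}`);
    }
  }
}
```

Keeping all three actions behind one function is what lets the state machine invoke a single Lambda for every step and vary only the payload.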
Approved.
vanceingalls
left a comment
One-line summary: docs page is well-structured and mostly accurate, but the BROWSER_GPU_NOT_SOFTWARE troubleshooting entry points users at a non-existent data-gpu-mode composition attribute — that's a blocker on a docs PR.
Additive review — @miguel-heygen already covered the S3 request-cost line, the internal "PR 6.10" reference, and the [CLI reference] anchor concern. I won't repeat those. The findings below are gaps I didn't see in Miguel's review.
Strengths
- `docs/deploy/aws-lambda.mdx:138-152`: the IAM bootstrap section is genuinely strong. It walks through `policies user|role|validate`, notes the `Resource: "*"` narrowing path, and explicitly recommends `policies validate` as a CI pre-deploy step. Matches the source intent in `packages/cli/src/commands/lambda/policies.ts:1-22`.
- `docs/deploy/aws-lambda.mdx:155-167`: the cost example output (`$0.0214 (Lambda $0.0210 + SFN $0.0004)`) matches the actual `progress` output formatter in `packages/cli/src/commands/lambda/progress.ts:46-48`. Concrete and verifiable.
- The "What's NOT in v1 surface" section at the bottom is the right shape: adopters waste hours looking for missing webhooks/HDR without a callout like this.
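To make the cost line concrete, here is a rough sketch of how a Lambda GB-seconds figure and a Step Functions transition count could roll up into a string of that shape. It is not the code in `costAccounting.ts` or `progress.ts`: the function and type names are hypothetical, the per-unit rates are assumed list prices that vary by region and architecture, and the usage numbers in the example call were chosen only to reproduce the quoted figure.

```typescript
// Rates are parameters, not facts about any account: the defaults below are
// assumed list prices and should be replaced with the deploying region's.
interface CostRates {
  lambdaUsdPerGbSecond: number;
  sfnUsdPerTransition: number;
}

const ASSUMED_RATES: CostRates = {
  lambdaUsdPerGbSecond: 0.0000166667,
  sfnUsdPerTransition: 0.000025,
};

interface RenderUsage {
  lambdaGbSeconds: number;
  sfnTransitions: number;
}

// Hypothetical formatter. S3 transfer and S3 GET/PUT request charges are
// deliberately not modelled, mirroring the gap noted in the review.
function formatDisplayCost(
  usage: RenderUsage,
  rates: CostRates = ASSUMED_RATES,
): string {
  const lambdaUsd = usage.lambdaGbSeconds * rates.lambdaUsdPerGbSecond;
  const sfnUsd = usage.sfnTransitions * rates.sfnUsdPerTransition;
  const usd = (amount: number): string => `$${amount.toFixed(4)}`;
  return `${usd(lambdaUsd + sfnUsd)} (Lambda ${usd(lambdaUsd)} + SFN ${usd(sfnUsd)})`;
}

// Illustrative inputs only: 1260 GB-seconds and 16 state transitions.
const exampleCostLine = formatDisplayCost({ lambdaGbSeconds: 1260, sfnTransitions: 16 });
```

With those inputs and assumed rates, `exampleCostLine` is `$0.0214 (Lambda $0.0210 + SFN $0.0004)`. Anything billed outside Lambda duration and state transitions would have to be added on top.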
Findings
blocker — docs/deploy/aws-lambda.mdx:173 (Troubleshooting: BROWSER_GPU_NOT_SOFTWARE). The doc tells users:
> The compiled composition reads `data-gpu-mode="hardware"` (or similar). [...] Change the composition's `data-gpu-mode` or omit it (the default is software).
I grepped the entire repo at the PR head: there is no data-gpu-mode attribute handling anywhere in packages/engine, packages/producer, or packages/aws-lambda. The only hits are this doc line and an unrelated gpuModes array in packages/cli/src/commands/render.ts:422 (local dev-render output, not distributed). The actual error source is packages/engine/src/utils/assertSwiftShader.ts:107-122: it reads chrome://gpu after launch and throws if the GL backend isn't SwiftShader. Its own thrown message says:
> "Ensure Chrome was launched with `--use-gl=swiftshader --use-angle=swiftshader` and that the SwiftShader libraries are present in the runtime image."
i.e. the failure is a Lambda runtime-image / launch-flags problem, NOT a composition attribute. An adopter who hits this error and follows the doc's advice will edit a non-existent attribute on their composition and the error will persist. Worse than no advice. Replace this entry with the actual root cause (Chrome launch flags / SwiftShader libs in the handler ZIP) and the actual remediation (rebuild the ZIP with bun run --cwd packages/aws-lambda build:zip and re-deploy, since lambda deploy rebuilds the ZIP that bundles @sparticuz/chromium).
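The kind of check described above can be sketched as follows, to show why no composition edit can clear the error. This is an assumption-laden illustration, not the contents of `assertSwiftShader.ts`: the class and function names are hypothetical, and `glRendererText` stands in for whatever renderer string the engine actually extracts from `chrome://gpu`.

```typescript
// Typed error so callers and alarms can match on a stable code.
class RenderError extends Error {
  readonly code: string;
  constructor(code: string, message: string) {
    super(message);
    this.name = "RenderError";
    this.code = code;
  }
}

// Hypothetical guard: the input describes the launched browser, so nothing
// in the composition can change the outcome of this check.
function assertSoftwareRenderer(glRendererText: string): void {
  if (/swiftshader/i.test(glRendererText)) return;
  throw new RenderError(
    "BROWSER_GPU_NOT_SOFTWARE",
    `GL backend is not SwiftShader (got: ${glRendererText}). ` +
      "Ensure Chrome was launched with --use-gl=swiftshader --use-angle=swiftshader " +
      "and that the SwiftShader libraries are present in the runtime image.",
  );
}
```

Because the only input is the browser's reported backend, the remediation has to happen where the browser is packaged and launched, which is the handler ZIP.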
important — coverage gap: hyperframes lambda sites create is not mentioned anywhere in the doc. The CLI's own HELP at packages/cli/src/commands/lambda.ts:18-21 and its examples array call it out as a first-class workflow ("Pre-upload a project so multiple renders share the upload"), and the render subcommand explicitly supports a --site-id flag that consumes its output (packages/cli/src/commands/lambda/render.ts:51-60). For a page titled "Three deployment paths" that's supposed to take adopters from credentials to rendered MP4, omitting the sites workflow leaves users on Path 1 re-tarring + re-uploading the same project on every render — exactly the cost shape the page elsewhere tries to avoid. Add a sites create subsection (Path 1.5 or a "Re-using uploads" callout under Path 1).
important — SAM-path concurrency default mismatch. The doc's framing under Path 1 (docs/deploy/aws-lambda.mdx:62-67) explains why --concurrency=8 is a conservative default that bounds runaway spend, and the Path 2 SAM example happens to pass ReservedConcurrency=8. But the SAM template's own default is -1 (unreserved) — see examples/aws-lambda/template.yaml:36-42. A reader who simplifies the Path 2 example by dropping --parameter-overrides is silently switched from "conservative 8-cap" to "account-default unreserved." Worth one extra line in the Path 2 section: "Drop ReservedConcurrency from --parameter-overrides at your own risk — the template's own default is -1 (unreserved)." Same warning shape as the Path 1 paragraph.
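The silent switch can be shown with a small pre-deploy check. This is a hypothetical helper, not part of the CLI or the template: it assumes overrides arrive as `Key=Value` strings (the form `sam deploy --parameter-overrides` accepts) and hard-codes the `-1` default cited from `examples/aws-lambda/template.yaml`.

```typescript
// Default cited in the review: -1 means the function's concurrency is unreserved.
const TEMPLATE_DEFAULT_RESERVED_CONCURRENCY = -1;

interface ConcurrencyResolution {
  value: number;
  fromOverride: boolean;
  unreserved: boolean;
}

// Hypothetical helper: resolves what ReservedConcurrency the stack would get
// for a given list of "Key=Value" parameter overrides.
function resolveReservedConcurrency(overrides: string[]): ConcurrencyResolution {
  let value = TEMPLATE_DEFAULT_RESERVED_CONCURRENCY;
  let fromOverride = false;
  for (const pair of overrides) {
    const eq = pair.indexOf("=");
    if (eq === -1) continue; // not a Key=Value pair; ignored in this sketch
    if (pair.slice(0, eq) !== "ReservedConcurrency") continue;
    const parsed = Number.parseInt(pair.slice(eq + 1), 10);
    if (Number.isNaN(parsed)) {
      throw new Error(`ReservedConcurrency is not an integer: ${pair}`);
    }
    value = parsed;
    fromOverride = true;
  }
  return { value, fromOverride, unreserved: value === -1 };
}
```

Calling it with `["ReservedConcurrency=8"]` yields the capped configuration, while an empty list falls through to the unreserved default, which is exactly the case the extra doc line should warn about.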
nit — docs/deploy/aws-lambda.mdx:30 ("HyperFrames repo checkout"). Says lambda deploy builds the ZIP from source, and adopters who deploy outside a checkout can set HYPERFRAMES_REPO_ROOT. Verified accurate (packages/cli/src/commands/lambda/repoRoot.ts:15-30). But the env var is undocumented anywhere outside this single table row — worth a one-liner in the env-var reference (if one gets added later), or at least a fuller example here showing the directory structure it expects ($HYPERFRAMES_REPO_ROOT/packages/aws-lambda/package.json must exist).
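A fuller example could be as small as the check below, which encodes the one structural requirement named above. It is a sketch under assumptions: the function name is hypothetical, the existence test is injected so the snippet has no dependencies, and it does not reproduce the logic in `repoRoot.ts`.

```typescript
interface RepoRootCheck {
  ok: boolean;
  expectedPath: string;
}

// Hypothetical validator for a HYPERFRAMES_REPO_ROOT value. `exists` is
// injected; in a real script it would typically be `existsSync` from node:fs.
function checkRepoRoot(
  root: string,
  exists: (path: string) => boolean,
): RepoRootCheck {
  const trimmed = root.replace(/\/+$/, ""); // tolerate trailing slashes
  const expectedPath = `${trimmed}/packages/aws-lambda/package.json`;
  return { ok: exists(expectedPath), expectedPath };
}
```

Printing `expectedPath` when `ok` is false gives adopters deploying outside a checkout the exact file the variable is expected to lead to.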
nit — docs/deploy/aws-lambda.mdx:177 (stuck-at-RUNNING entry) lists FONT_FETCH_FAILED and FFMPEG_VERSION_MISMATCH as examples of typed errors the SFN console surfaces. Verified those names exist in packages/aws-lambda/src/cdk/HyperframesRenderStack.ts:193-207 (alarm dimensions). Miguel suggested giving each its own troubleshooting entry; I'll second that as a low-priority follow-up since these are the most common production failure modes after PLAN_HASH_MISMATCH.
nit — docs/deploy/aws-lambda.mdx:55-58 deploy example doesn't pass --profile, but the CLI documents it (packages/cli/src/commands/lambda.ts:74). For users on multi-account setups, a one-liner mentioning the flag (or the AWS_PROFILE env var fallback that deploy.ts:42 reads) would head off a class of "wrong-account deploy" pitfalls.
Verdict
Verdict: REQUEST CHANGES
Reasoning: the BROWSER_GPU_NOT_SOFTWARE entry actively misleads — it tells adopters to edit a composition attribute that doesn't exist, instead of the real runtime-image fix. That's a blocker on a docs page where the troubleshooting section is the load-bearing reason users land there. Everything else is fixable or punt-able. Fix the GPU entry, optionally add a sites create subsection, and this is good to ship.
Review by Vai
c0895ef to 6faac80
9c7e205 to 8d6ffe5
End-to-end deploy guide for the AWS Lambda surface. Covers:
- Architecture diagram (Step Functions Plan → Map(N) → Assemble +
the single Lambda function dispatching by Action; pulled from
the distributed rendering plan §15.2).
- Prerequisites table (AWS creds, SAM CLI, bun, repo checkout).
- Three deployment paths: hyperframes lambda CLI (recommended),
direct sam deploy against examples/aws-lambda/template.yaml,
and HyperframesRenderStack CDK construct.
- IAM bootstrap via hyperframes lambda policies user/role/validate.
- Cost shape — how Lambda GB-seconds + SFN transitions roll up
into the displayCost the progress verb prints.
- Troubleshooting block with the typed error names operators
actually hit (PLAN_HASH_MISMATCH, BROWSER_GPU_NOT_SOFTWARE,
iam:CreateRole denial, stuck RUNNING, S3 Retain semantics).
- "What's NOT in v1" callout so adopters don't burn time looking
for webhooks / compositions verb / HDR support.
Registered under a new "Deploy" group in docs.json's Documentation
tab, sitting after Packages so the conceptual flow is "what you
can build" → "how to ship it."
No code changes.
One blocker + two important items from Vai's review:
- The BROWSER_GPU_NOT_SOFTWARE troubleshooting entry pointed
adopters at a non-existent `data-gpu-mode` composition attribute.
Replaced with the actual root cause (Chrome launch flags +
@sparticuz/chromium libs in the handler ZIP) and the actual
remediation: rebuild + redeploy via `lambda deploy` (which
always rebuilds the ZIP). The composition-attribute story
would have sent users editing the wrong file entirely.
- Added a `sites create` subsection under Path 1 so adopters
running tight inner loops know how to reuse a project upload
across many renders instead of re-tarring + re-uploading on
each call. The CLI surface was first-class but the doc had
been silent.
- Added a Warning callout under Path 2 explaining that the SAM
template's own ReservedConcurrency default is `-1` (unreserved)
— a reader simplifying the Path 2 example by dropping the
--parameter-overrides flag would silently switch to unreserved
concurrency and pay the runaway-Map cost. The warning mirrors
the cost-shape callout earlier in the page.
6faac80 to 15289f3
8d6ffe5 to 0ded0c0
miguel-heygen
left a comment
Blocker from the previous review is addressed:
Troubleshooting entry referenced non-existent data-gpu-mode attribute — Fixed. The bogus data-gpu-mode attribute reference is gone. The troubleshooting section now correctly documents the BROWSER_GPU_NOT_SOFTWARE error with the real fix: rebuild the handler ZIP and redeploy. The explanation correctly identifies that the issue is at the runtime-image / launch-flags layer (SwiftShader via --use-gl=swiftshader --use-angle=swiftshader), not at the composition layer, and that lambda deploy always rebuilds the ZIP so a redeploy resolves it.
vanceingalls
left a comment
Re-review of 0ded0c07 against my prior REQUEST CHANGES at 4304554554.
Resolution status
- Blocker (`BROWSER_GPU_NOT_SOFTWARE` pointed at non-existent `data-gpu-mode`): resolved. Grepped HEAD (`docs/`, `packages/`, `examples/`): zero hits for `data-gpu-mode`. New entry at `docs/deploy/aws-lambda.mdx:189-198` now correctly attributes the failure to the runtime-image / launch-flags layer and tells adopters to rebuild via `bun run --cwd packages/aws-lambda build:zip` (verified script exists in `packages/aws-lambda/package.json`) and redeploy. The Chrome flag pair cited (`--use-gl=swiftshader --use-angle=swiftshader`) matches what `assertSwiftShader.ts:121` says is required. Advice now leads to the actual fix.
- Important (missing `sites create` workflow): resolved. New "Pre-staging a project with `sites create`" subsection at `docs/deploy/aws-lambda.mdx:76-88` documents the workflow, the `--site-id` consumer, and the content-addressing semantics. The SHA-256 + `HeadObject` short-circuit claim is grounded in `packages/aws-lambda/src/sdk/deploySite.ts:114-126`.
- Important (SAM `ReservedConcurrency` default `-1` mismatch): resolved. Warning callout at `docs/deploy/aws-lambda.mdx:113-115` correctly states the SAM template's own default is `-1` (unreserved) and warns about silently dropping the override. Matches `examples/aws-lambda/template.yaml:40-42`.
- Nits (`HYPERFRAMES_REPO_ROOT` depth, `--profile` / `AWS_PROFILE`): not addressed. These were optional and remain optional; author's call.
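For context on the content-addressing item, the SHA-256 plus `HeadObject` short-circuit can be pictured with the sketch below. It is not the code in `deploySite.ts`: the store interface stands in for the S3 calls (`has` for `HeadObject`, `put` for `PutObject`), and the key layout and the use of the digest as the site ID are assumptions made for illustration.

```typescript
import { createHash } from "node:crypto";

// Stand-in for S3. Both method names are hypothetical.
interface ObjectStore {
  has(key: string): boolean;
  put(key: string, body: Uint8Array): void;
}

interface SiteUpload {
  siteId: string;
  uploaded: boolean;
}

// Identical project bytes hash to the same ID, so a second call finds the
// object already present and skips the upload.
function deploySiteSketch(tarball: Uint8Array, store: ObjectStore): SiteUpload {
  const siteId = createHash("sha256").update(tarball).digest("hex");
  const key = `sites/${siteId}.tar.gz`; // assumed key layout
  if (store.has(key)) return { siteId, uploaded: false };
  store.put(key, tarball);
  return { siteId, uploaded: true };
}
```

Under these assumptions, repeated renders of an unchanged project pay for the upload once, and any change to the project bytes produces a new ID instead of overwriting the old upload.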
Scope check
Diff between 149555f...0ded0c0 touches one file (docs/deploy/aws-lambda.mdx). No scope creep.
CI
mergeStateStatus=UNSTABLE is failing optional checks only — check_runs shows no failure conclusions on the head SHA. Per Rule 5, this is mention-not-block.
Verdict
Verdict: APPROVE
Reasoning: the blocker is fixed at the root (advice now points to the real runtime-image / Chrome-flags fix instead of a phantom composition attribute), both important items are addressed with technically accurate framing, and nothing else regressed. Nits are author's call.
Review by Vai

What
Adds `docs/deploy/aws-lambda.mdx`, the end-to-end deployment guide for the new AWS Lambda surface. Registered under a new "Deploy" group in the Mintlify nav (`docs/docs.json`).
Why
Per `DISTRIBUTED-RENDERING-PLAN.md` § 11 Phase 6b PR 6.7: adopters landing on the docs site need a single page that takes them from "I have AWS credentials" to "I have a rendered video in S3" without having to read the SAM template or the SDK source. The page collects everything the implementation PRs in this stack added.
How
Covers:
- Architecture diagram (pulled from `DISTRIBUTED-RENDERING-PLAN.md` § 15.2).
- Three deployment paths: `hyperframes lambda` CLI (recommended), direct `sam deploy` against `examples/aws-lambda/template.yaml`, and the `HyperframesRenderStack` CDK construct.
- IAM bootstrap via `hyperframes lambda policies user|role|validate`.
- Cost shape: how Lambda GB-seconds and SFN transitions roll up into the `displayCost` the progress verb prints. Notes that it's best-effort and S3 transfer is excluded.
- Troubleshooting block with the typed error names operators actually hit (`PLAN_HASH_MISMATCH`, `BROWSER_GPU_NOT_SOFTWARE`, the `iam:CreateRole` denial, stuck `RUNNING`, the `Retain` bucket semantics).
No code changes.
Stacks on #909, #910, #912, and #913.
🤖 Generated with Claude Code