fix: eliminate cross-request body I/O on multipart & batch-delete writes #100
Merged
Multipart and batch-delete writes could fail with 500/503 under a cold-isolate burst on Cloudflare Workers: "Cannot perform I/O on behalf of a different request".

The inbound body was read via Response.array_buffer() in the NeedsBody arm, *after* resolve_request_with_metadata awaits the bucket lookup and the STS AssumeRoleWithWebIdentity exchange. wasm-bindgen-futures shares one microtask queue across concurrent requests in an isolate, so that deferred body read can resume under another request's I/O context, where the runtime rejects it. A cold OIDC credential cache synchronizes the first burst of writes onto the STS await, which is why only the opening requests fail and the rest succeed.

- Plain (non-aws-chunked) UploadPart now streams via build_streaming_forward with UNSIGNED-PAYLOAD header signing instead of buffering. The part body is never materialized; content-md5 and x-amz-checksum-* are forwarded and signed so S3 still validates part integrity. Mirrors PutObject's streaming write.
- The remaining buffered ops (CreateMultipartUpload, CompleteMultipartUpload, AbortMultipartUpload, DeleteObjects) carry only a small body that must be parsed, so they still buffer -- but the body is now collected at the top of handle_request, in the request's own I/O context, before any cross-request await. A synchronous classifier (op_needs_buffered_body) decides this from the parsed operation with no I/O.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
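The header handling for a streamed plain part can be pictured roughly as below. This is a minimal sketch, not the code in this commit: `streaming_part_headers` and its signature are hypothetical, and the real logic lives behind `build_streaming_forward`. It only shows the idea that the integrity headers named above are kept and the payload hash is declared as UNSIGNED-PAYLOAD.

```rust
/// Hypothetical helper (not the proxy's real API): choose the headers that
/// accompany a streamed, non-aws-chunked UploadPart.
fn streaming_part_headers(inbound: &[(&str, &str)]) -> Vec<(String, String)> {
    let mut out = Vec::new();
    for &(name, value) in inbound {
        // Header names are case-insensitive, so compare in lowercase.
        let lower = name.to_ascii_lowercase();
        // Keep only the integrity headers so S3 can still validate the part.
        if lower == "content-md5" || lower.starts_with("x-amz-checksum-") {
            out.push((lower, value.to_string()));
        }
    }
    // The body is never hashed by the proxy, so the payload is marked unsigned.
    out.push((
        "x-amz-content-sha256".to_string(),
        "UNSIGNED-PAYLOAD".to_string(),
    ));
    out
}
```

Everything not on the allow-list is dropped in this sketch; which other headers the real forwarder carries is not stated here.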
🚀 Latest commit deployed to https://multistore-proxy-pr-100.development-seed.workers.dev
…ntent-Length

A bare ReadableStream attached as a subrequest body makes the Workers runtime send Transfer-Encoding: chunked and drop Content-Length. S3 rejects a non-aws-chunked PUT/UploadPart with no Content-Length (it can't size the payload), so the subrequest hangs until the whole body streams through (or 501s) and the client SDK retries with backoff -- observed as many stalled PUTs under a parallel multipart upload.

Wrap non-aws-chunked PUT bodies that carry a Content-Length in a FixedLengthStream so the runtime emits a real Content-Length and a non-chunked request. The outbound fetch drives the pipe via backpressure, so it still streams (no buffering). aws-chunked bodies are sized by S3 from x-amz-decoded-content-length and keep their raw chunk framing.

Also fixes the latent same-shaped issue for plain (non-aws-chunked) PutObject. Adds the TransformStream + WritableStream web-sys features.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
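The wrapping rule above reduces to a small decision, sketched here in plain Rust without any Workers bindings. `BodyFraming` and `choose_framing` are hypothetical names, and detecting aws-chunked from the `Content-Encoding` header is an assumption made for illustration; the commit may detect it differently.

```rust
/// Hypothetical type: how the outbound subrequest body is attached.
#[derive(Debug, PartialEq, Eq)]
enum BodyFraming {
    /// Pipe through a fixed-length wrapper so a real Content-Length is sent.
    FixedLength(u64),
    /// Attach the stream unchanged (aws-chunked, or no usable length).
    Raw,
}

/// Hypothetical decision function mirroring the rule in the commit message.
fn choose_framing(
    method: &str,
    content_encoding: Option<&str>,
    content_length: Option<&str>,
) -> BodyFraming {
    let is_put = method.eq_ignore_ascii_case("PUT");
    // Assumption: aws-chunked uploads announce themselves in Content-Encoding.
    let aws_chunked = content_encoding
        .map(|v| v.split(',').any(|t| t.trim().eq_ignore_ascii_case("aws-chunked")))
        .unwrap_or(false);
    // A missing or unparsable Content-Length cannot size a fixed-length wrapper.
    let length = content_length.and_then(|v| v.trim().parse::<u64>().ok());
    match (is_put, aws_chunked, length) {
        (true, false, Some(n)) => BodyFraming::FixedLength(n),
        _ => BodyFraming::Raw,
    }
}
```

Falling back to `Raw` when the length is absent keeps the sketch conservative: it never invents a size the client did not declare.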
Problem
Multipart and batch-delete writes intermittently fail with 500/503 under a cold-isolate burst on Cloudflare Workers, with the runtime error "Cannot perform I/O on behalf of a different request".
Only the first few requests in a large batch fail; the rest succeed.
Root cause
For `NeedsBody` operations the inbound body is read via `Response.array_buffer()` in `handle_request`'s `NeedsBody` arm, after `resolve_request_with_metadata` awaits the bucket lookup and the backend-auth middleware's STS `AssumeRoleWithWebIdentity` exchange. `wasm-bindgen-futures` drains a single microtask queue shared across all concurrent requests in an isolate, so a body read deferred past those awaits can resume while a different request is the active I/O context, which the runtime rejects. A cold OIDC credential cache parks the opening burst of writes on the same STS await, which is why only the opening requests fail.
The `no_handle_cross_request_promise_resolution` flag does not help: it only toggles cancel-with-warning (503) vs run-into-the-hard-error (500).
Fix
Get the bytes into owned memory in the request's own I/O context, or don't read the stream in WASM at all. Three parts:
1. Pre-read the small buffered ops in-context (`crates/core/src/proxy.rs`). The multipart-control + batch-delete ops (`CreateMultipartUpload`, `CompleteMultipartUpload`, `AbortMultipartUpload`, `DeleteObjects`) must parse their body, so they still buffer, but the read now happens at the top of `handle_request`, before any cross-request await, gated by a synchronous classifier (`op_needs_buffered_body`) that decides from the parsed operation with no I/O.
2. Stream plain `UploadPart` instead of buffering (`crates/core/src/proxy.rs`). A plain (non-aws-chunked) part previously buffered the whole part into WASM memory via `array_buffer()` (the largest, highest-concurrency body, and the one hitting the cross-request read). It now forwards zero-copy through `build_streaming_forward` with `UNSIGNED-PAYLOAD` header signing, mirroring `PutObject`'s streaming write. `content-md5` and `x-amz-checksum-*` are forwarded and signed so S3 still validates part integrity. `build_streaming_forward` is generalized with a `forward_header_names` param (aws-chunked path unchanged: same headers, byte-for-byte).
3. Wrap streamed PUT bodies in `FixedLengthStream` (`crates/cf-workers/src/backend.rs`). Required to make part 2 actually work: a bare `ReadableStream` attached as a subrequest body makes the Workers runtime send `Transfer-Encoding: chunked` and drop `Content-Length`. S3 rejects a non-aws-chunked PUT/UploadPart with no `Content-Length` (it can't size the payload), so the subrequest hangs until the whole body streams through, or 501s, and the client retries, which was observed as stalled PUTs. Wrapping non-aws-chunked PUT bodies that carry a `Content-Length` in a `FixedLengthStream` makes the runtime emit a real `Content-Length`; the outbound fetch drives the pipe via backpressure, so it still streams (no buffering). aws-chunked bodies are sized by S3 from `x-amz-decoded-content-length` and keep their raw chunk framing. Also fixes the latent same-shaped issue for plain `PutObject`. (Adds the `TransformStream` + `WritableStream` web-sys features.)
Net: no large body is ever materialized, every buffered read happens in its originating request's context, and streamed parts reach S3 well-formed.
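The gating in part 1 can be sketched as follows. This is an illustration under assumptions rather than the code in `crates/core/src/proxy.rs`: the `S3Op` enum, the signature of `op_needs_buffered_body`, and the `plan_request` helper are hypothetical, and the body read is modelled as a plain closure instead of a real request body.

```rust
/// Hypothetical operation enum; the proxy's real parsed-operation type may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum S3Op {
    GetObject,
    PutObject,
    UploadPart,
    CreateMultipartUpload,
    CompleteMultipartUpload,
    AbortMultipartUpload,
    DeleteObjects,
}

/// Synchronous classifier: true only for ops whose small body must be parsed.
/// It performs no I/O, so it can run before the handler's first await.
fn op_needs_buffered_body(op: S3Op) -> bool {
    matches!(
        op,
        S3Op::CreateMultipartUpload
            | S3Op::CompleteMultipartUpload
            | S3Op::AbortMultipartUpload
            | S3Op::DeleteObjects
    )
}

/// Hypothetical stand-in for the top of the handler: classify first, then
/// read the body immediately if (and only if) the op needs it buffered.
/// `read_body` models the in-context body read.
fn plan_request(op: S3Op, read_body: impl FnOnce() -> Vec<u8>) -> Option<Vec<u8>> {
    if op_needs_buffered_body(op) {
        Some(read_body())
    } else {
        // Streaming and read ops never touch the body here.
        None
    }
}
```

The point of the ordering is that the bytes are already owned before any await that could be shared with another request; streamed ops such as `UploadPart` skip the read entirely.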
Tests
- `upload_part_plain_streams_unsigned_preserving_checksum`: plain part → `Forward` carrying `partNumber`/`uploadId`, `x-amz-content-sha256: UNSIGNED-PAYLOAD`, checksum preserved.
- `op_needs_buffered_body_matches_needsbody_ops`: classifier matches exactly the `NeedsBody` ops, excludes streaming/read ops.
- `UploadPart`, oversized-part rejection, and `CreateMultipartUpload → NeedsBody` unchanged.
Verification note
The cross-request I/O behavior and the `FixedLengthStream` framing are Workers-runtime behaviors that unit tests can't exercise; they need a real deployment. Confirm on a preview that a parallel multipart upload and a large batch-delete no longer 500/503 or stall.
🤖 Generated with Claude Code