Review fixes for experimental streaming transcription #16560
Merged
gr2m merged 1 commit into kevindawkins/kda-146-realtime-transcription-via-provider-option-openai-gpt on Jul 2, 2026
Correctness:
- Rework experimental_streamTranscribe to pipe the model stream through a TransformStream instead of eagerly draining it in start(): preserves consumer backpressure and propagates fullStream cancellation to the model stream.
- Add cancel() handlers to the OpenAI and xAI doStream ReadableStreams: cancelling now closes the WebSocket and stops reading the audio stream instead of leaking the connection and audio pump.
- Treat xAI STT error events as terminal and surface the server message; previously the error was enqueued as a part only, so the caller saw an unrelated NoTranscriptGeneratedError or a hang.
- Drop the error-part enqueue before controller.error() in the OpenAI model: erroring a stream resets its queue, so the part was discarded race-dependently.
- Only reject still-pending result promises on failure; already-resolved promises (e.g. warnings) no longer flip to rejected depending on when they are first awaited.
- Improve the xAI WebSocket connection error message: native WebSocket implementations cannot send the Authorization header, so tell users to pass a header-capable webSocket implementation.
- Accept dated gpt-realtime-whisper snapshot model IDs via prefix match.
- Warn about non-streaming provider options that OpenAI streaming silently ignored (prompt, temperature, timestampGranularities, include), and about unrecognized inputAudioFormat types that xAI silently mapped to raw PCM.
- Remove the duplicate transcript-final part xAI emitted for the same utterance on transcript.done.

API surface:
- Export the new provider stream spec types with Experimental_ prefixes at the package boundary, matching the video/realtime model pattern for experimental spec surfaces.
- Declare streamTranscribe unprefixed and alias it to experimental_streamTranscribe at the export seam per contributing/naming-conventions.md.
- Expose language and durationInSeconds on StreamTranscriptionResult; both were delivered by providers in the finish part but unreachable.
- Include a gateway/string-model hint in the UnsupportedFunctionality error since string model IDs cannot stream yet.
- Move readWebSocketText and toWebSocketUrl into @ai-sdk/provider-utils/websocket (previously duplicated per provider).
- Use .optional() instead of .nullish() for the new user-facing xAI streaming options per repo provider-options guidance.

Tests, docs, examples:
- Add tests for cancellation propagation, server error events, warning emission, snapshot model IDs, and promise settlement semantics.
- Add API reference page for experimental_streamTranscribe and index it.
- Correct capability tables (no language detection for gpt-realtime-whisper; clarify xAI streaming does not surface word timestamps/diarization/segments).
- Add stream-transcribe examples for OpenAI and xAI.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
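The first Correctness item above (piping the model stream through a `TransformStream` rather than draining it in `start()`) can be sketched with standard Web Streams. This is a minimal illustration, not the AI SDK source: `TranscriptPart`, `toFullStream`, and `makeProviderStream` are hypothetical names, and the endless source stands in for a provider `doStream` whose `cancel()` handler would close the WebSocket. With `pipeThrough()`, the source is pulled only while the bounded internal queues have room, and cancelling the output stream propagates back and cancels the source.

```typescript
// Minimal sketch with standard Web Streams (Node 18+ or browsers).
// TranscriptPart, toFullStream, and makeProviderStream are hypothetical
// names, not AI SDK exports.
type TranscriptPart =
  | { type: 'transcript-final'; text: string }
  | { type: 'finish' };

// Pipe the provider stream through a TransformStream instead of draining it
// inside ReadableStream.start(). pipeThrough() pulls from `source` only while
// the internal queues have room (backpressure), and cancelling the returned
// stream propagates back and cancels `source`.
function toFullStream(
  source: ReadableStream<TranscriptPart>,
  onPart: (part: TranscriptPart) => void,
): ReadableStream<TranscriptPart> {
  return source.pipeThrough(
    new TransformStream<TranscriptPart, TranscriptPart>({
      transform(part, controller) {
        onPart(part); // e.g. accumulate text for the result promises
        controller.enqueue(part);
      },
    }),
  );
}

// Stand-in for a provider doStream: an endless source that records each pull
// and its cancellation. A real provider would close its WebSocket and stop
// the audio-sending loop inside cancel().
function makeProviderStream(log: string[]): ReadableStream<TranscriptPart> {
  let count = 0;
  return new ReadableStream<TranscriptPart>({
    pull(controller) {
      count += 1;
      log.push(`pull ${count}`);
      controller.enqueue({ type: 'transcript-final', text: `utterance ${count}` });
    },
    cancel() {
      log.push('source cancelled');
    },
  });
}
```

Reading one part and then calling `reader.cancel()` on the output stream stops the source: its `cancel()` runs and no further pulls happen. An eager read loop inside `start()` would instead keep consuming this endless source regardless of the consumer.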
bf20cd8 into kevindawkins/kda-146-realtime-transcription-via-provider-option-openai-gpt
3 checks passed
Background
Follow-up to #16338 with fixes for the issues found during review. Targets the PR branch so the changes can be reviewed/merged into the feature PR before it lands on `main`.
Summary
Correctness
- `experimental_streamTranscribe` now pipes the model stream through a `TransformStream` instead of eagerly draining it inside `ReadableStream.start()`. This preserves consumer backpressure (previously every part of a long live session was buffered in memory regardless of consumer pace) and propagates `fullStream` cancellation to the model stream (previously a `for await … break` left the pump running, rejected all result promises with an internal `TypeError`, and never cancelled the provider stream).
- The OpenAI and xAI `doStream` streams define `cancel()` handlers: cancelling closes the WebSocket and stops reading the audio stream. Previously the connection and the audio-sending loop kept running (connection leak, continued audio upload/billing).
- xAI STT `error` events are now terminal and surface the actual server message. Previously the error was only enqueued as a stream part; when the server then closed the socket, the caller got an unrelated `AI_NoTranscriptGeneratedError`, and when it didn't, `result.text` hung forever.
- The OpenAI model no longer enqueues an `error` part immediately before `controller.error()`: erroring a stream resets its queue, so the part was discarded unless a read was pending at that exact moment.
- Already-resolved result promises (e.g. `warnings` at `stream-start`) no longer flip to rejected when the stream fails later; only pending promises are rejected.
- The xAI WebSocket connection error now explains that native WebSocket implementations cannot send the `Authorization` header xAI requires, and points at `createXai({ webSocket })`. Without this, the default path fails with an opaque generic error on every standard runtime.
- Dated `gpt-realtime-whisper-*` snapshot IDs are accepted via prefix match instead of exact-match rejection.
- New warnings for non-streaming provider options that OpenAI streaming silently ignored (`prompt`, `temperature`, `timestampGranularities`, `include`) and for unrecognized `inputAudioFormat.type` values that xAI silently mapped to raw PCM (garbled transcripts with no diagnostic).
- Removed the duplicate `transcript-final` part xAI emitted for the same utterance on `transcript.done` (consumers rendered every utterance twice with no way to dedupe).

API surface
- The new provider stream spec types are exported as `Experimental_TranscriptionModelV4Stream*` at the package boundary, matching the `Experimental_VideoModelV4`/`Experimental_RealtimeModelV4` pattern from `contributing/project-philosophies.md` for experimental spec surfaces. (Open question for maintainers: whether `doStream?` on the stable `TranscriptionModelV4` needs further isolation; see review discussion.)
- `streamTranscribe` is declared unprefixed and aliased to `experimental_streamTranscribe` at the export seam per `contributing/naming-conventions.md`.
- `StreamTranscriptionResult` exposes `language` and `durationInSeconds`. Both were delivered by the providers in the `finish` part but were unreachable through the API (parity with `transcribe()`).
- The `UnsupportedFunctionalityError` for a missing `doStream` explains the gateway/string-model limitation.
- `readWebSocketMessageText` and `toWebSocketUrl` moved to `@ai-sdk/provider-utils` (previously duplicated verbatim in both provider packages).
- The new user-facing xAI `streaming` options use `.optional()` instead of `.nullish()` per the provider-options guidance in the repo docs.

Tests, docs, examples
- New tests for cancellation propagation, server `error` events, unsupported-option warnings, dated snapshot IDs, promise settlement semantics, and the unrecognized audio format warning.
- New API reference page for `experimental_streamTranscribe` (+ index entry); previously the function had no reference docs.
- Corrected capability tables: `gpt-realtime-whisper` does not detect language (the code echoes the user's own `language` option), and xAI streaming is noted as not surfacing word timestamps, diarization, or segments.
- New examples in `examples/ai-functions/src/stream-transcribe/{openai,xai}/basic.ts` (self-contained: synthesize PCM via `generateSpeech`, then stream-transcribe it). Adds `ws` to the example package for the xAI header requirement.

Not addressed (needs maintainer decision)
- `gpt-realtime-whisper` is in the shared model ID union, but Azure has no `webSocket` setting and its `api-key` auth can never reach the WebSocket path.
- Stream part naming (`transcript-partial`/`transcript-final` vs the noun-verb convention) is worth settling before the strings freeze, but a rename touches the whole PR.
- No `bufferedAmount` flow control when sending audio; large pre-recorded sources buffer in the socket.

Verification
- `pnpm --filter ai test:node` (3174 passed)
- `pnpm --filter @ai-sdk/openai test:node` (741 passed)
- `pnpm --filter @ai-sdk/xai test:node` (368 passed)
- `pnpm --filter @ai-sdk/provider-utils test:node` (655 passed)
- `pnpm --filter @ai-sdk/provider --filter @ai-sdk/provider-utils --filter @ai-sdk/openai --filter @ai-sdk/xai --filter ai type-check`
- `pnpm check`
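The promise-settlement fix from the Correctness list (reject only still-pending result promises) can be illustrated with a lazily materialized result promise. This is a hedged sketch under stated assumptions: `LazyResult` and `rejectPending` are hypothetical names, and the lazy design is an assumed explanation for why the old behavior depended on when a promise was first awaited. If the status is recorded before any `Promise` object exists, an unguarded reject overwrites a resolved status, so a caller that first reads `warnings` after the failure sees a rejection while an earlier caller saw a resolution.

```typescript
// Hypothetical helper (not an AI SDK export): records the settlement status
// first and only creates the Promise when `.promise` is first read.
class LazyResult<T> {
  private status:
    | { type: 'pending' }
    | { type: 'resolved'; value: T }
    | { type: 'rejected'; error: unknown } = { type: 'pending' };
  private cached: Promise<T> | undefined;
  private settle:
    | { resolve(value: T): void; reject(error: unknown): void }
    | undefined;

  get promise(): Promise<T> {
    if (this.cached === undefined) {
      const status = this.status;
      if (status.type === 'resolved') {
        this.cached = Promise.resolve(status.value);
      } else if (status.type === 'rejected') {
        this.cached = Promise.reject(status.error);
      } else {
        // Still pending: keep the settle functions for later.
        this.cached = new Promise<T>((resolve, reject) => {
          this.settle = { resolve, reject };
        });
      }
    }
    return this.cached;
  }

  get isPending(): boolean {
    return this.status.type === 'pending';
  }

  resolve(value: T): void {
    this.status = { type: 'resolved', value };
    this.settle?.resolve(value);
  }

  reject(error: unknown): void {
    this.status = { type: 'rejected', error };
    this.settle?.reject(error);
  }
}

// On stream failure, reject only results that have not settled yet. Without
// the isPending check, a `warnings` result resolved at stream-start would be
// overwritten with a rejection before anyone first read it.
function rejectPending(
  results: ReadonlyArray<{ readonly isPending: boolean; reject(error: unknown): void }>,
  error: unknown,
): void {
  for (const result of results) {
    if (result.isPending) result.reject(error);
  }
}
```

In this sketch, a caller that reads `warnings.promise` before the failure gets an already-resolved promise either way; only callers that read it afterwards depend on the guard, which mirrors the timing dependence described in the PR.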