Review fixes for experimental streaming transcription by gr2m · Pull Request #16560 · vercel/ai

gr2m · 2026-07-01T22:09:45Z

Background

Follow-up to #16338 with fixes for the issues found during review. Targets the PR branch so the changes can be reviewed/merged into the feature PR before it lands on main.

Summary

Correctness

experimental_streamTranscribe now pipes the model stream through a TransformStream instead of eagerly draining it inside ReadableStream.start(). This preserves consumer backpressure (previously every part of a long live session was buffered in memory regardless of consumer pace) and propagates fullStream cancellation to the model stream (previously a for await … break left the pump running, rejected all result promises with an internal TypeError, and never cancelled the provider stream).
The OpenAI and xAI doStream streams define cancel() handlers: cancelling closes the WebSocket and stops reading the audio stream. Previously the connection and the audio-sending loop kept running (connection leak, continued audio upload/billing).
xAI STT error events are now terminal and surface the actual server message. Previously the error was only enqueued as a stream part; when the server then closed the socket the caller got an unrelated AI_NoTranscriptGeneratedError, and when it didn't, result.text hung forever.
OpenAI no longer enqueues an error part immediately before controller.error() — erroring a stream resets its queue, so the part was discarded unless a read was pending at that exact moment.
Result promises that already resolved (e.g. warnings at stream-start) no longer flip to rejected when the stream fails later; only pending promises are rejected.
The xAI WebSocket failure message now explains that native WebSocket implementations (browsers, Node.js, Deno, Bun) cannot send the Authorization header xAI requires, and points at createXai({ webSocket }). Without this, the default path fails with an opaque generic error on every standard runtime.
Dated gpt-realtime-whisper-* snapshot IDs are accepted via prefix match instead of exact-match rejection.
Warnings are emitted for non-streaming OpenAI options that were silently dropped (prompt, temperature, timestampGranularities, include) and for unrecognized inputAudioFormat.type values that xAI silently mapped to raw PCM (garbled transcripts with no diagnostic).
Removed the duplicate transcript-final part xAI emitted for the same utterance on transcript.done (consumers rendered every utterance twice with no way to dedupe).
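
A minimal sketch of the piping approach described in the first two items above, using only standard Web Streams APIs. The `Part` shape, `createModelStream`, and `toTextStream` are hypothetical stand-ins for illustration and are not the SDK's actual types or implementation. It shows that a source piped through a TransformStream is pulled only as fast as the consumer reads, and that cancelling the consumer side reaches the source's cancel() handler.

```typescript
// Hypothetical part shape; the real spec types are not shown in this description.
type Part = { type: 'transcript-partial'; text: string };

// Stand-in for a provider's doStream() result. It produces a part only when
// asked (pull) and records whether the consumer's cancellation reached it.
function createModelStream(log: string[]): ReadableStream<Part> {
  let index = 0;
  return new ReadableStream<Part>(
    {
      pull(controller) {
        log.push('pull');
        controller.enqueue({
          type: 'transcript-partial',
          text: `chunk ${index++}`,
        });
      },
      cancel() {
        // A real provider would close its WebSocket and stop reading audio here.
        log.push('cancel');
      },
    },
    { highWaterMark: 0 },
  );
}

// Pipe instead of draining: pipeThrough() forwards backpressure upstream and
// forwards cancellation of the returned stream to the model stream.
function toTextStream(model: ReadableStream<Part>): ReadableStream<string> {
  return model.pipeThrough(
    new TransformStream<Part, string>({
      transform(part, controller) {
        controller.enqueue(part.text);
      },
    }),
  );
}
```

Reading one chunk from `toTextStream(createModelStream(log))` and then calling `reader.cancel()` should leave a `'cancel'` entry in `log` after only a handful of `'pull'` entries, whereas an eager drain loop would keep pulling regardless of the consumer.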
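
The queue-reset behaviour behind the OpenAI item above can be reproduced with a plain ReadableStream. This is an illustrative sketch, not the provider code; `createErroringStream` and `readFirst` are hypothetical names. A part enqueued just before controller.error() is dropped when no read is pending, so the consumer only sees the rejection.

```typescript
// Enqueue a part, then error the stream before anyone has read from it.
function createErroringStream(): ReadableStream<string> {
  return new ReadableStream<string>({
    start(controller) {
      controller.enqueue('error part'); // queued, no read pending yet
      controller.error(new Error('server closed the session')); // resets the queue
    },
  });
}

// Report what the first read observes: a part, end of stream, or a rejection.
async function readFirst(stream: ReadableStream<string>): Promise<string> {
  const reader = stream.getReader();
  try {
    const { value, done } = await reader.read();
    return done ? 'done' : `part: ${value}`;
  } catch (error) {
    return `rejected: ${(error as Error).message}`;
  }
}
```

Here `readFirst(createErroringStream())` resolves to the rejection message rather than the enqueued part, which is why relying on the error itself is the more predictable signal.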

API surface

The new provider spec stream types are exported as Experimental_TranscriptionModelV4Stream* at the package boundary, matching the Experimental_VideoModelV4/Experimental_RealtimeModelV4 pattern from contributing/project-philosophies.md for experimental spec surfaces. (Open question for maintainers: whether doStream? on the stable TranscriptionModelV4 needs further isolation — see review discussion.)
streamTranscribe is declared unprefixed and aliased to experimental_streamTranscribe at the export seam per contributing/naming-conventions.md.
StreamTranscriptionResult exposes language and durationInSeconds — both were delivered by the providers in the finish part but unreachable through the API (parity with transcribe()).
The UnsupportedFunctionalityError for missing doStream explains the gateway/string-model limitation.
readWebSocketMessageText and toWebSocketUrl moved to @ai-sdk/provider-utils (previously duplicated verbatim in both provider packages).
The new user-facing xAI streaming options use .optional() instead of .nullish() per the provider-options guidance in the repo docs.

Tests, docs, examples

New tests: cancellation propagation (core + both providers), server error events, unsupported-option warnings, dated snapshot IDs, promise settlement semantics, unrecognized audio format warning.
New API reference page for experimental_streamTranscribe (+ index entry) — previously the function had no reference docs.
Capability table corrections: gpt-realtime-whisper does not detect language (the code echoes the user's own language option); note that xAI streaming does not surface word timestamps/diarization/segments.
Examples: examples/ai-functions/src/stream-transcribe/{openai,xai}/basic.ts (self-contained: synthesize PCM via generateSpeech, then stream-transcribe it). Adds ws to the example package for the xAI header requirement.

Not addressed (needs maintainer decision)

Gateway support for streaming transcription (tracked as future work in #16338, "Add experimental streaming transcription support").
Azure: gpt-realtime-whisper is in the shared model ID union but Azure has no webSocket setting and its api-key auth can never reach the WebSocket path.
Stream part naming (transcript-partial/transcript-final vs the noun-verb convention) — worth settling before the strings freeze, but a rename touches the whole PR.
Audio send loops have no bufferedAmount flow control; large pre-recorded sources buffer in the socket.
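
For the last item, one possible shape of bufferedAmount flow control is sketched below. This is an assumption about how it could be approached, not something this PR implements: `SocketLike`, `sendWithFlowControl`, and the `highWaterMark`/`pollIntervalMs` options are hypothetical, and only the standard WebSocket `bufferedAmount` and `send()` members are relied on.

```typescript
// Minimal subset of the standard WebSocket interface this sketch relies on.
interface SocketLike {
  readonly bufferedAmount: number;
  send(data: Uint8Array): void;
}

// Hypothetical helper: pause the audio send loop while the socket's outgoing
// buffer holds more than highWaterMark bytes, polling until it drains.
async function sendWithFlowControl(
  socket: SocketLike,
  chunks: AsyncIterable<Uint8Array>,
  { highWaterMark = 1_000_000, pollIntervalMs = 10 } = {},
): Promise<void> {
  for await (const chunk of chunks) {
    while (socket.bufferedAmount > highWaterMark) {
      await new Promise<void>(resolve => setTimeout(resolve, pollIntervalMs));
    }
    socket.send(chunk);
  }
}
```

Polling is used because the WebSocket API has no drain event; the buffer can still exceed the threshold by up to one chunk, since the check happens before each send.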

Verification

pnpm --filter ai test:node (3174 passed)
pnpm --filter @ai-sdk/openai test:node (741 passed)
pnpm --filter @ai-sdk/xai test:node (368 passed)
pnpm --filter @ai-sdk/provider-utils test:node (655 passed)
pnpm --filter @ai-sdk/provider --filter @ai-sdk/provider-utils --filter @ai-sdk/openai --filter @ai-sdk/xai --filter ai type-check
pnpm check

Correctness:
- Rework experimental_streamTranscribe to pipe the model stream through a TransformStream instead of eagerly draining it in start(): preserves consumer backpressure and propagates fullStream cancellation to the model stream.
- Add cancel() handlers to the OpenAI and xAI doStream ReadableStreams: cancelling now closes the WebSocket and stops reading the audio stream instead of leaking the connection and audio pump.
- Treat xAI STT error events as terminal and surface the server message; previously the error was enqueued as a part only, so the caller saw an unrelated NoTranscriptGeneratedError or a hang.
- Drop the error-part enqueue before controller.error() in the OpenAI model: erroring a stream resets its queue, so the part was discarded race-dependently.
- Only reject still-pending result promises on failure; already-resolved promises (e.g. warnings) no longer flip to rejected depending on when they are first awaited.
- Improve the xAI WebSocket connection error message: native WebSocket implementations cannot send the Authorization header, so tell users to pass a header-capable webSocket implementation.
- Accept dated gpt-realtime-whisper snapshot model IDs via prefix match.
- Warn about non-streaming provider options that OpenAI streaming silently ignored (prompt, temperature, timestampGranularities, include), and about unrecognized inputAudioFormat types that xAI silently mapped to raw PCM.
- Remove the duplicate transcript-final part xAI emitted for the same utterance on transcript.done.

API surface:
- Export the new provider stream spec types with Experimental_ prefixes at the package boundary, matching the video/realtime model pattern for experimental spec surfaces.
- Declare streamTranscribe unprefixed and alias it to experimental_streamTranscribe at the export seam per contributing/naming-conventions.md.
- Expose language and durationInSeconds on StreamTranscriptionResult; both were delivered by providers in the finish part but unreachable.
- Include a gateway/string-model hint in the UnsupportedFunctionality error since string model IDs cannot stream yet.
- Move readWebSocketText and toWebSocketUrl into @ai-sdk/provider-utils/websocket (previously duplicated per provider).
- Use .optional() instead of .nullish() for the new user-facing xAI streaming options per repo provider-options guidance.

Tests, docs, examples:
- Add tests for cancellation propagation, server error events, warning emission, snapshot model IDs, and promise settlement semantics.
- Add API reference page for experimental_streamTranscribe and index it.
- Correct capability tables (no language detection for gpt-realtime-whisper; clarify xAI streaming does not surface word timestamps/diarization/segments).
- Add stream-transcribe examples for OpenAI and xAI.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

github-actions Bot assigned gr2m Jul 1, 2026

vercel Bot deployed to Preview July 1, 2026 22:10

gr2m marked this pull request as ready for review July 2, 2026 17:57

gr2m merged commit bf20cd8 into kevindawkins/kda-146-realtime-transcription-via-provider-option-openai-gpt Jul 2, 2026
3 checks passed

gr2m deleted the kda-146-streaming-transcription-review-fixes branch July 2, 2026 17:57
