Review fixes for experimental streaming transcription #16560

Merged
gr2m merged 1 commit into kevindawkins/kda-146-realtime-transcription-via-provider-option-openai-gpt from kda-146-streaming-transcription-review-fixes
Jul 2, 2026

gr2m (Collaborator) commented Jul 1, 2026

Background

Follow-up to #16338 with fixes for the issues found during review. Targets the PR branch so the changes can be reviewed/merged into the feature PR before it lands on main.

Summary

Correctness

  • experimental_streamTranscribe now pipes the model stream through a TransformStream instead of eagerly draining it inside ReadableStream.start(). This preserves consumer backpressure (previously every part of a long live session was buffered in memory regardless of consumer pace) and propagates fullStream cancellation to the model stream (previously a for await … break left the pump running, rejected all result promises with an internal TypeError, and never cancelled the provider stream).
  • The OpenAI and xAI doStream streams now define cancel() handlers: cancelling closes the WebSocket and stops reading the audio stream. Previously the connection and the audio-sending loop kept running after cancellation (connection leak, continued audio upload and billing).
  • xAI STT error events are now terminal and surface the actual server message. Previously the error was only enqueued as a stream part; when the server then closed the socket the caller got an unrelated AI_NoTranscriptGeneratedError, and when it didn't, result.text hung forever.
  • OpenAI no longer enqueues an error part immediately before controller.error() — erroring a stream resets its queue, so the part was discarded unless a read was pending at that exact moment.
  • Result promises that already resolved (e.g. warnings at stream-start) no longer flip to rejected when the stream fails later; only pending promises are rejected.
  • The xAI WebSocket failure message now explains that native WebSocket implementations (browsers, Node.js, Deno, Bun) cannot send the Authorization header xAI requires, and points at createXai({ webSocket }). Without this, the default path fails with an opaque generic error on every standard runtime.
  • Dated gpt-realtime-whisper-* snapshot IDs are accepted via prefix match instead of exact-match rejection.
  • Warnings are emitted for non-streaming OpenAI options that were silently dropped (prompt, temperature, timestampGranularities, include) and for unrecognized inputAudioFormat.type values that xAI silently mapped to raw PCM (garbled transcripts with no diagnostic).
  • Removed the duplicate transcript-final part xAI emitted for the same utterance on transcript.done (consumers rendered every utterance twice with no way to dedupe).
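
The backpressure and cancellation behavior described in the first bullet can be illustrated with plain web streams. This is a minimal sketch, not the code from this PR: `createCountingSource`, `pipeThroughTransform`, and `readOneThenCancel` are hypothetical names, and the endless source stands in for a provider's doStream() result.

```typescript
// Hypothetical stand-in for one part of a provider stream.
type Part = { type: 'transcript-partial'; text: string };

// An endless source that records how often it was pulled and whether it was
// cancelled. highWaterMark: 0 means it only produces a part on demand.
export function createCountingSource() {
  const state = { pulled: 0, cancelled: false };
  const stream = new ReadableStream<Part>(
    {
      pull(controller) {
        state.pulled += 1;
        controller.enqueue({
          type: 'transcript-partial',
          text: `part ${state.pulled}`,
        });
      },
      cancel() {
        state.cancelled = true;
      },
    },
    { highWaterMark: 0 },
  );
  return { stream, state };
}

// Piping through a TransformStream keeps the consumer in control: the source
// is only pulled as far as the transform's small internal queues allow.
export function pipeThroughTransform(
  source: ReadableStream<Part>,
): ReadableStream<Part> {
  return source.pipeThrough(
    new TransformStream<Part, Part>({
      transform(part, controller) {
        controller.enqueue(part); // a real implementation would map parts here
      },
    }),
  );
}

export async function readOneThenCancel() {
  const { stream, state } = createCountingSource();
  const reader = pipeThroughTransform(stream).getReader();
  const first = await reader.read();
  await reader.cancel();
  // Cancellation travels back through the pipe asynchronously, so yield to
  // the event loop once before inspecting the source.
  await new Promise<void>(resolve => setTimeout(resolve, 0));
  return { first: first.value, pulled: state.pulled, cancelled: state.cancelled };
}
```

With an eager drain loop inside start(), the same endless source would be pulled without bound and would never observe the cancel; here the pull count stays small and the source's cancel() runs.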

API surface

  • The new provider spec stream types are exported as Experimental_TranscriptionModelV4Stream* at the package boundary, matching the Experimental_VideoModelV4/Experimental_RealtimeModelV4 pattern from contributing/project-philosophies.md for experimental spec surfaces. (Open question for maintainers: whether doStream? on the stable TranscriptionModelV4 needs further isolation — see review discussion.)
  • streamTranscribe is declared unprefixed and aliased to experimental_streamTranscribe at the export seam per contributing/naming-conventions.md.
  • StreamTranscriptionResult exposes language and durationInSeconds — both were delivered by the providers in the finish part but unreachable through the API (parity with transcribe()).
  • The UnsupportedFunctionalityError for missing doStream explains the gateway/string-model limitation.
  • readWebSocketMessageText and toWebSocketUrl moved to @ai-sdk/provider-utils (previously duplicated verbatim in both provider packages).
  • The new user-facing xAI streaming options use .optional() instead of .nullish() per the provider-options guidance in the repo docs.

Tests, docs, examples

  • New tests: cancellation propagation (core + both providers), server error events, unsupported-option warnings, dated snapshot IDs, promise settlement semantics, unrecognized audio format warning.
  • New API reference page for experimental_streamTranscribe (+ index entry) — previously the function had no reference docs.
  • Capability table corrections: gpt-realtime-whisper does not detect language (the code echoes the user's own language option); note that xAI streaming does not surface word timestamps/diarization/segments.
  • Examples: examples/ai-functions/src/stream-transcribe/{openai,xai}/basic.ts (self-contained: synthesize PCM via generateSpeech, then stream-transcribe it). Adds ws to the example package for the xAI header requirement.
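
The promise settlement semantics covered by the new tests can be sketched with a small lazily created deferred. This is an illustration under assumptions, not the SDK's internal helper: `LazyDeferred` is a hypothetical class whose only purpose is to show that the first settlement wins, regardless of when the promise is first awaited.

```typescript
type Status<T> =
  | { type: 'pending' }
  | { type: 'resolved'; value: T }
  | { type: 'rejected'; error: unknown };

// Hypothetical helper: the promise is created on first access, but the
// settlement state is tracked separately so a later reject() cannot override
// an earlier resolve().
export class LazyDeferred<T> {
  private status: Status<T> = { type: 'pending' };
  private cached?: Promise<T>;
  private settle?: {
    resolve: (value: T) => void;
    reject: (error: unknown) => void;
  };

  get promise(): Promise<T> {
    if (this.cached) return this.cached;
    this.cached = new Promise<T>((resolve, reject) => {
      if (this.status.type === 'resolved') resolve(this.status.value);
      else if (this.status.type === 'rejected') reject(this.status.error);
      else this.settle = { resolve, reject };
    });
    return this.cached;
  }

  resolve(value: T): void {
    if (this.status.type !== 'pending') return; // already settled: ignore
    this.status = { type: 'resolved', value };
    this.settle?.resolve(value);
  }

  reject(error: unknown): void {
    if (this.status.type !== 'pending') return; // already settled: ignore
    this.status = { type: 'rejected', error };
    this.settle?.reject(error);
  }
}
```

In this sketch, a warnings deferred resolved at stream start stays resolved when the stream later fails, while a still-pending text deferred is rejected with the failure.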

Not addressed (needs maintainer decision)

  • Gateway support for streaming transcription (tracked as future work in #16338, "Add experimental streaming transcription support").
  • Azure: gpt-realtime-whisper is in the shared model ID union but Azure has no webSocket setting and its api-key auth can never reach the WebSocket path.
  • Stream part naming (transcript-partial/transcript-final vs the noun-verb convention) — worth settling before the strings freeze, but a rename touches the whole PR.
  • Audio send loops have no bufferedAmount flow control; large pre-recorded sources buffer in the socket.
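
For the last item, one possible shape of bufferedAmount flow control is sketched below. It is not part of this PR and no design has been decided: `sendWithFlowControl`, the `SocketLike` interface, the threshold, and the polling interval are all assumptions chosen for illustration.

```typescript
// Minimal structural type with only the two members this sketch needs.
interface SocketLike {
  readonly bufferedAmount: number;
  send(data: Uint8Array): void;
}

const sleep = (ms: number) =>
  new Promise<void>(resolve => setTimeout(resolve, ms));

// Hypothetical helper: wait until the socket's send buffer has drained below
// a threshold before handing it the next audio chunk.
export async function sendWithFlowControl(
  socket: SocketLike,
  chunks: AsyncIterable<Uint8Array> | Iterable<Uint8Array>,
  maxBufferedBytes = 256 * 1024, // assumed threshold, not taken from the PR
  pollMs = 10, // assumed polling interval
): Promise<void> {
  for await (const chunk of chunks) {
    while (socket.bufferedAmount > maxBufferedBytes) {
      await sleep(pollMs);
    }
    socket.send(chunk);
  }
}
```

Polling is the simplest approach because the standard WebSocket interface has no drain event; a large pre-recorded source would then be read only as fast as the socket can send it.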

Verification

  • pnpm --filter ai test:node (3174 passed)
  • pnpm --filter @ai-sdk/openai test:node (741 passed)
  • pnpm --filter @ai-sdk/xai test:node (368 passed)
  • pnpm --filter @ai-sdk/provider-utils test:node (655 passed)
  • pnpm --filter @ai-sdk/provider --filter @ai-sdk/provider-utils --filter @ai-sdk/openai --filter @ai-sdk/xai --filter ai type-check
  • pnpm check

Correctness:
- Rework experimental_streamTranscribe to pipe the model stream through
  a TransformStream instead of eagerly draining it in start(): preserves
  consumer backpressure and propagates fullStream cancellation to the
  model stream.
- Add cancel() handlers to the OpenAI and xAI doStream ReadableStreams:
  cancelling now closes the WebSocket and stops reading the audio stream
  instead of leaking the connection and audio pump.
- Treat xAI STT error events as terminal and surface the server message;
  previously the error was enqueued as a part only, so the caller saw an
  unrelated NoTranscriptGeneratedError or a hang.
- Drop the error-part enqueue before controller.error() in the OpenAI
  model: erroring a stream resets its queue, so the part was discarded
  race-dependently.
- Only reject still-pending result promises on failure; already-resolved
  promises (e.g. warnings) no longer flip to rejected depending on when
  they are first awaited.
- Improve the xAI WebSocket connection error message: native WebSocket
  implementations cannot send the Authorization header, so tell users to
  pass a header-capable webSocket implementation.
- Accept dated gpt-realtime-whisper snapshot model IDs via prefix match.
- Warn about non-streaming provider options that OpenAI streaming
  silently ignored (prompt, temperature, timestampGranularities,
  include), and about unrecognized inputAudioFormat types that xAI
  silently mapped to raw PCM.
- Remove the duplicate transcript-final part xAI emitted for the same
  utterance on transcript.done.
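
The queue reset behind the dropped error-part enqueue can be reproduced with a plain ReadableStream. A minimal sketch, assuming nothing about the provider code: `enqueueThenError` is a hypothetical function and the part shape is illustrative.

```typescript
type ErrorPart = { type: 'error'; error: string };

// Enqueue a part and then error the stream before anyone reads from it.
// Erroring discards everything still in the queue, so the reader never sees
// the part and the read rejects with the stream error instead.
export async function enqueueThenError(): Promise<string> {
  const stream = new ReadableStream<ErrorPart>({
    start(controller) {
      controller.enqueue({ type: 'error', error: 'server message' });
      controller.error(new Error('stream failed'));
    },
  });

  const reader = stream.getReader();
  try {
    const { value } = await reader.read();
    return `received part: ${JSON.stringify(value)}`;
  } catch (err) {
    return `read rejected: ${(err as Error).message}`;
  }
}
```

Because the enqueued part is lost unless a read is already pending, the useful information has to travel in the error passed to controller.error() itself.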

API surface:
- Export the new provider stream spec types with Experimental_ prefixes
  at the package boundary, matching the video/realtime model pattern for
  experimental spec surfaces.
- Declare streamTranscribe unprefixed and alias it to
  experimental_streamTranscribe at the export seam per
  contributing/naming-conventions.md.
- Expose language and durationInSeconds on StreamTranscriptionResult;
  both were delivered by providers in the finish part but unreachable.
- Include a gateway/string-model hint in the UnsupportedFunctionality
  error since string model IDs cannot stream yet.
- Move readWebSocketText and toWebSocketUrl into
  @ai-sdk/provider-utils/websocket (previously duplicated per provider).
- Use .optional() instead of .nullish() for the new user-facing xAI
  streaming options per repo provider-options guidance.

Tests, docs, examples:
- Add tests for cancellation propagation, server error events, warning
  emission, snapshot model IDs, and promise settlement semantics.
- Add API reference page for experimental_streamTranscribe and index it.
- Correct capability tables (no language detection for
  gpt-realtime-whisper; clarify xAI streaming does not surface word
  timestamps/diarization/segments).
- Add stream-transcribe examples for OpenAI and xAI.
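
The snapshot model ID acceptance covered by the new tests can be sketched as a prefix check. This is an illustration only: `isRealtimeWhisperModelId` is a hypothetical name, and the dated ID in the usage note is an invented example of the `gpt-realtime-whisper-*` pattern rather than a known snapshot.

```typescript
const REALTIME_WHISPER_BASE_ID = 'gpt-realtime-whisper';

// Accept the base ID and any suffixed snapshot ID. The trailing dash in the
// prefix keeps unrelated IDs that merely start with the same letters out.
export function isRealtimeWhisperModelId(modelId: string): boolean {
  return (
    modelId === REALTIME_WHISPER_BASE_ID ||
    modelId.startsWith(`${REALTIME_WHISPER_BASE_ID}-`)
  );
}
```

Under this check, `gpt-realtime-whisper` and a hypothetical `gpt-realtime-whisper-2026-01-01` are accepted, while `gpt-realtime-whisperx` and `whisper-1` are not.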

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

gr2m marked this pull request as ready for review July 2, 2026 17:57
gr2m merged commit bf20cd8 into kevindawkins/kda-146-realtime-transcription-via-provider-option-openai-gpt Jul 2, 2026
3 checks passed
gr2m deleted the kda-146-streaming-transcription-review-fixes branch July 2, 2026 17:57