feat(assemblyai): support universal-3-5-pro and expand the transcription provider #16548

Merged
gr2m merged 13 commits into vercel:main from dlange-aai:feat/assemblyai-universal-3-5-pro on Jul 2, 2026
@dlange-aai

Contributor

Background

AssemblyAI's transcription API has moved to the speech_models request parameter and shipped new models (including the universal-3-5-pro flagship) plus new request options. The @ai-sdk/assemblyai provider only supported the legacy best/nano ids via the deprecated singular speech_model param and dropped diarization / audio-intelligence output. This PR brings the provider up to date.

Changes

Models

  • Add universal-3-5-pro, universal-3-pro, and universal-2, routed via the speech_models array (the deprecated singular speech_model is used only for the legacy best model).
  • Deprecate best (still works; emits a deprecation warning) and remove nano — AssemblyAI's API now rejects nano with a 400 ("no longer available").
  • Using universal-3-pro / universal-2 emits an informational warning suggesting universal-3-5-pro; not a deprecation.
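
The routing and warning rules above can be sketched roughly as follows. This is an illustration only, not the provider's source: `routeModel`, `RoutedModel`, and `RoutingWarning` are hypothetical names, and the warning messages are placeholders.

```typescript
// Hypothetical sketch of the model routing described in this PR.
// None of these names come from @ai-sdk/assemblyai itself.
type ModelId = 'universal-3-5-pro' | 'universal-3-pro' | 'universal-2' | 'best';

interface RoutingWarning {
  kind: 'deprecated' | 'other';
  message: string;
}

interface RoutedModel {
  // Exactly one of the two request parameters is populated.
  body: { speech_model?: string; speech_models?: string[] };
  warnings: RoutingWarning[];
}

function routeModel(modelId: ModelId): RoutedModel {
  const warnings: RoutingWarning[] = [];

  // Legacy model: still routed via the deprecated singular parameter.
  if (modelId === 'best') {
    warnings.push({
      kind: 'deprecated',
      message: "'best' is deprecated; consider 'universal-3-5-pro'.",
    });
    return { body: { speech_model: modelId }, warnings };
  }

  // Informational nudge only; these models remain supported.
  if (modelId === 'universal-3-pro' || modelId === 'universal-2') {
    warnings.push({
      kind: 'other',
      message: `'${modelId}' is supported; 'universal-3-5-pro' is the latest flagship.`,
    });
  }

  // All non-legacy models go through the speech_models array.
  return { body: { speech_models: [modelId] }, warnings };
}
```

For example, `routeModel('universal-3-5-pro')` yields a `speech_models` array with one entry and no warnings, while `routeModel('best')` yields `speech_model: 'best'` plus a deprecation warning.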

Output: speaker diarization + audio intelligence

  • doGenerate now returns the full raw AssemblyAI response on response.body (previously a schema-parsed object that stripped most fields).
  • providerMetadata.assemblyai surfaces utterances (diarization), entities, sentimentAnalysisResults, contentSafetyLabels, iabCategoriesResult, and autoHighlightsResult.
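
As a rough sketch of how the listed fields could be lifted from the raw response into `providerMetadata.assemblyai`: the helper name `toProviderMetadata` and the field map are hypothetical, the snake_case keys are assumed to match AssemblyAI's transcript response, and skipping null or absent fields is an assumption rather than confirmed behavior.

```typescript
// Hypothetical sketch: copy selected audio-intelligence fields from the raw
// response into a camelCase metadata object. Field names on the left are
// assumed to match AssemblyAI's response; the helper itself is illustrative.
const FIELD_MAP: ReadonlyArray<readonly [string, string]> = [
  ['utterances', 'utterances'],
  ['entities', 'entities'],
  ['sentiment_analysis_results', 'sentimentAnalysisResults'],
  ['content_safety_labels', 'contentSafetyLabels'],
  ['iab_categories_result', 'iabCategoriesResult'],
  ['auto_highlights_result', 'autoHighlightsResult'],
];

function toProviderMetadata(
  raw: Record<string, unknown>,
): Record<string, unknown> {
  const metadata: Record<string, unknown> = {};
  for (const [rawKey, metaKey] of FIELD_MAP) {
    const value = raw[rawKey];
    // Assumption: fields that are absent or null are simply left out.
    if (value !== undefined && value !== null) {
      metadata[metaKey] = value;
    }
  }
  return metadata;
}
```

Fields outside the map stay available on the raw body but are not copied into the metadata object.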

Input parameters

  • New provider options: prompt, keytermsPrompt, temperature, removeAudioTags, domain, speakerOptions, languageDetectionOptions, redactPiiAudioOptions, redactPiiReturnUnredacted, redactStaticEntities.
  • Deprecate wordBoost / boostParam in favor of keytermsPrompt (AssemblyAI rejects word_boost on the newer models). Warnings are emitted for these and for options missing a required prerequisite (e.g. a redactPii* option set without redactPii).
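
The warning behavior described above might look something like the sketch below. `collectOptionWarnings` and `OptionWarning` are hypothetical names, the messages are placeholders, and treating every `redactPii`-prefixed option as requiring `redactPii: true` is a simplifying assumption.

```typescript
// Hypothetical sketch of option validation; not the provider's actual code.
interface OptionWarning {
  kind: 'deprecated' | 'other';
  setting: string;
  message: string;
}

function collectOptionWarnings(
  options: Record<string, unknown>,
): OptionWarning[] {
  const warnings: OptionWarning[] = [];

  // Deprecated options warn but are not removed from the options object.
  for (const setting of ['wordBoost', 'boostParam']) {
    if (options[setting] !== undefined) {
      warnings.push({
        kind: 'deprecated',
        setting,
        message: `${setting} is deprecated; use keytermsPrompt instead.`,
      });
    }
  }

  // Assumption: any redactPii* option is only meaningful with redactPii enabled.
  if (options['redactPii'] !== true) {
    for (const setting of Object.keys(options)) {
      const isDependent =
        setting.startsWith('redactPii') && setting !== 'redactPii';
      if (isDependent && options[setting] !== undefined) {
        warnings.push({
          kind: 'other',
          setting,
          message: `${setting} has no effect unless redactPii is enabled.`,
        });
      }
    }
  }

  return warnings;
}
```

The function only reports warnings. It leaves `options` untouched, so a deprecated option would still be forwarded to the API.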

Fixes

  • Transcription segment timings were reported in milliseconds; converted to seconds.
  • Honor a caller-provided fetch for the status-polling requests (previously only upload/submit used it).
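
A minimal sketch of the timing fix, assuming word timings arrive as `start` / `end` in milliseconds and segments expose `startSecond` / `endSecond`. The `RawWord` and `Segment` types and the `toSegments` helper are illustrative, not the provider's code.

```typescript
// Hypothetical sketch: convert millisecond word timings into second-based
// segment timings.
interface RawWord {
  text: string;
  start: number; // milliseconds
  end: number; // milliseconds
}

interface Segment {
  text: string;
  startSecond: number;
  endSecond: number;
}

function toSegments(words: RawWord[]): Segment[] {
  return words.map(word => ({
    text: word.text,
    // Dividing by 1000 turns milliseconds into seconds.
    startSecond: word.start / 1000,
    endSecond: word.end / 1000,
  }));
}
```

With this conversion, a word starting at 183 ms maps to `startSecond: 0.183` rather than `183`.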

Verification

  • Unit tests: 21 (node + edge).
  • Live API: verified end-to-end against AssemblyAI: model routing + speech_model_used, diarization/audio-intelligence output, all new input params, word_boost rejection on the new models, and the keytermsPrompt → keyterms_prompt mapping.
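
The camelCase to snake_case mapping checked here (for example `keytermsPrompt` becoming `keyterms_prompt`) could be sketched with a generic key converter. This is an assumption for illustration: the provider may map each field explicitly instead, and `toSnakeCase`, `toSnakeCaseKeys`, and the nested key `minSpeakersExpected` in the test input are hypothetical.

```typescript
// Hypothetical sketch: recursively convert camelCase option keys to the
// snake_case names used in a request body.
function toSnakeCase(key: string): string {
  return key.replace(/[A-Z]/g, char => `_${char.toLowerCase()}`);
}

function toSnakeCaseKeys(value: unknown): unknown {
  if (Array.isArray(value)) {
    // Arrays keep their order; object elements are converted too.
    return value.map(item => toSnakeCaseKeys(item));
  }
  if (value !== null && typeof value === 'object') {
    const out: Record<string, unknown> = {};
    for (const [key, inner] of Object.entries(value as Record<string, unknown>)) {
      out[toSnakeCase(key)] = toSnakeCaseKeys(inner);
    }
    return out;
  }
  // Primitives (strings, numbers, booleans, null, undefined) pass through.
  return value;
}
```

Only keys are rewritten. String values such as key terms are left exactly as the caller supplied them.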

Notes

  • The feature-flagged params speech_understanding and language_codes are intentionally not exposed (gated per-account by AssemblyAI).
  • wordBoost on an incompatible model warns and still forwards (the API returns its own clear 400) rather than being silently stripped.

Docs & changeset

  • Provider docs (content/providers/01-ai-sdk-providers/100-assemblyai.mdx) updated; examples repointed to universal-3-5-pro.
  • Includes a single @ai-sdk/assemblyai patch changeset.

🤖 Generated with Claude Code

dlange-aai and others added 13 commits July 1, 2026 21:20
Add universal-3-5-pro, universal-3-pro, and universal-2 to the transcription
model ids. These newer models are only accessible through AssemblyAI's
speech_models request parameter (the singular speech_model parameter is
deprecated and rejects them), so the provider now routes the model id to the
correct parameter automatically: legacy best/nano use speech_model, all other
models use speech_models.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The legacy `best` model is deprecated (still functional, routes via the
deprecated singular `speech_model` parameter): the model id type marks it
`@deprecated` and `doGenerate` emits a deprecation warning pointing to
`universal-3-5-pro`.

The `nano` model is removed entirely — AssemblyAI's API now rejects it with a
400 ("the 'nano' speech model has been deprecated and is no longer available"),
confirmed end-to-end against the live API.

Repoint examples, docs, and README to `universal-3-5-pro`, generalize the
callable provider overload to the full model id type, and expand tests.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…output

The provider previously returned a Zod-parsed (stripped) transcript as
response.body, dropping speaker labels, utterances, and all audio-intelligence
results even when enabled via providerOptions.

Now doGenerate returns the full raw AssemblyAI response on response.body, and
populates providerMetadata.assemblyai with structured results for the
currently-available features: utterances (diarization), entities,
sentimentAnalysisResults, contentSafetyLabels, iabCategoriesResult, and
autoHighlightsResult. The words schema gains speaker/channel/confidence and a
typed utterances array.

Verified availability against the DeepLearning backend and the public API
reference: deprecated features (Summarization, Auto Chapters, Custom Topics)
are intentionally left off providerMetadata but remain on the raw body.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
AssemblyAI returns word start/end in milliseconds; the provider put them
directly into segments' startSecond/endSecond (and the durationInSeconds
fallback), making timings 1000x too large. Confirmed against the live API
(a 3s clip reported a first word at startSecond: 183). Now divided by 1000.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add provider options for newer AssemblyAI request params: prompt,
keytermsPrompt, temperature, removeAudioTags, and domain (wired through
api-types + getArgs).

Deprecate wordBoost/boostParam: AssemblyAI rejects word_boost with a 400 on
universal-3-pro / universal-3-5-pro / slam-1 (works only on universal-2/best),
so using either now emits a deprecation warning pointing to keytermsPrompt.

Verified param availability and model-gating against the DeepLearning backend
and the public API reference.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…options

Add provider options for AssemblyAI's GA nested request params:
speakerOptions, languageDetectionOptions, redactPiiAudioOptions,
redactPiiReturnUnredacted, and redactStaticEntities (wired through api-types +
getArgs with camelCase->snake_case mapping). Shapes verified against the live
AssemblyAI docs OpenAPI (the assemblyai-api-spec repo was stale for
redact_static_entities).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add a regression test asserting nano is no longer special-cased (routes via
speech_models, no deprecation warning) and that universal-3-pro routes via
speech_models.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…l-3-5-pro

Emit an informational warning (type: 'other', not a deprecation) when
universal-3-pro or universal-2 is used, noting that universal-3-5-pro is the
latest flagship and is set to replace universal-3-pro. Both models remain fully
supported; universal-3-5-pro emits no warning.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- warn on options missing prerequisites (redactPii*, languageCode+languageDetection)
- fix universal-2 nudge message and wordBoost/boostParam warning attribution
- type removeAudioTags / overrideAudioRedactionMethod as enums (drop `as never`)
- honor config.fetch for polling GETs
- source providerMetadata from the raw response (no field stripping); document
  its timings are in ms while segments are in seconds
- fix redactPiiAudioOptions docs (requires redactPiiAudio); restore the Model
  Capabilities table rows

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…e, models

The provider only exposes transcription models (it throws on languageModel), so
the intro line is corrected to match the README and actual behavior.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Combine the per-change changeset files into a single @ai-sdk/assemblyai patch
entry describing the provider update, matching the repo's one-changeset-per-PR
convention.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@gr2m gr2m force-pushed the feat/assemblyai-universal-3-5-pro branch from e13307b to 1abe0e3 Compare July 2, 2026 04:21
@gr2m gr2m added the backport Admins only: add this label to a pull request in order to backport it to the prior version label Jul 2, 2026
@gr2m gr2m merged commit ec598e2 into vercel:main Jul 2, 2026
48 of 49 checks passed
