feat(livekit): mix loading-indicator into the single published audio track #19
Merged
…dio track

Replace the dedicated "loading-audio" track with one published "tts-audio" track that mixes the loading-indicator clip into the TTS stream server-side, so single-track subscribers (browser clients, SIP bridges) hear it. LiveKit is an SFU and never mixes tracks server-side, and many clients play only one audio track, so a second track never reached them.

A single audio pump task is the sole writer to the source: TTS pass-through when idle, and continuous 10 ms mixing (saturating i16 sum) of the looping clip under speech when active, with a short fade-out on stop.

- Resample client-supplied loading clips to the track format once at load time (rubato).
- Back-pressure the TTS producer instead of dropping buffered audio, so rapid multi-sentence speech is no longer truncated.
- Compute the source queue depth as a valid non-zero multiple of 10 ms for any sample rate (fixes a crash at 44.1 kHz).
- Collapse the client to one source/track/pump; simplify reconnect/teardown.
- Update tests and docs to the single-track model.
8aecd4a to e8d6662
Summary
Publishes the loading-indicator audio as part of the single agent audio track, mixed in server-side, instead of on a separate `loading-audio` track.

LiveKit is an SFU: it forwards each published track independently and never mixes audio server-side. Many subscribers only ever play one audio track: custom browser clients that attach a single `<audio>` element, and SIP bridges that down-mix to one phone stream. A dedicated second track therefore never reached them. This change mixes the loading clip into the one published `tts-audio` track so every single-track subscriber hears it.

How it works
A single audio "pump" task is the sole writer to the published `NativeAudioSource`:

- Idle: TTS audio passes through unchanged.
- Loading active: the looping clip is mixed under speech continuously in 10 ms frames (saturating `i16` add, the same approach as LiveKit's own `AudioMixer`), with a short fade-out on stop.

Both modes live in one task, so there is never a second writer racing the source (two writers on one `NativeAudioSource` interleave frames rather than mixing).

Fixes included
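- The TTS producer is back-pressured instead of dropping buffered audio, so rapid multi-sentence `speak` output plays in full (previously the middle could be cut off).
- The source queue depth is computed as a valid non-zero multiple of 10 ms for any sample rate (fixes a crash at 44.1 kHz).
- Client-supplied loading clips are resampled once at load time (`rubato`) to the published track format.

Compatibility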
- There is no longer a separate `loading-audio` track; an audio session now publishes exactly one audio track (`tts-audio`) whether or not a loading clip is configured.
- The `loading_audio` config and `loading_start`/`loading_stop` commands behave the same; clients that don't use them are unaffected.

Testing
- `cargo fmt` and `cargo clippy --all-targets --all-features -- -D warnings` clean.
- Tests and docs (`docs/websocket.md`, `docs/api-reference.md`, `docs/livekit_integration.md`) updated to the single-track model.
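
Illustration

For reviewers who want the mixing step from "How it works" in concrete terms, here is a minimal, self-contained sketch. It is not the code in this PR and does not touch the LiveKit API: `LoadingClip`, `next_frame`, `samples_per_10ms`, and the linear fade curve are hypothetical names and assumptions chosen for illustration. It only shows the idea of one writer producing each 10 ms frame, with the looping clip added to the TTS samples via a saturating `i16` add and faded out after stop.

```rust
/// Interleaved samples in one 10 ms frame (for example 441 at 44.1 kHz mono).
fn samples_per_10ms(sample_rate: u32, channels: u32) -> usize {
    (sample_rate / 100 * channels) as usize
}

/// Hypothetical looping clip, assumed to be already resampled to the track format.
struct LoadingClip {
    samples: Vec<i16>,
    pos: usize,
    /// Fade-out length in samples (assumption: linear fade).
    fade_total: usize,
    /// `None` while playing at full gain; `Some(n)` once stopped, with n fade samples left.
    fade_left: Option<usize>,
}

impl LoadingClip {
    fn new(samples: Vec<i16>, fade_total: usize) -> Self {
        Self { samples, pos: 0, fade_total, fade_left: None }
    }

    /// Begin the fade-out; calling it again does not restart the fade.
    fn stop(&mut self) {
        if self.fade_left.is_none() {
            self.fade_left = Some(self.fade_total);
        }
    }

    /// True once the fade-out has fully played.
    fn finished(&self) -> bool {
        self.fade_left == Some(0)
    }

    /// Add the next clip samples into `frame` in place, clamping at the i16 limits.
    fn mix_into(&mut self, frame: &mut [i16]) {
        if self.samples.is_empty() {
            return;
        }
        for out in frame.iter_mut() {
            let mut s = i64::from(self.samples[self.pos]);
            if let Some(left) = self.fade_left {
                if left == 0 {
                    break; // fade complete: leave the rest of the frame untouched
                }
                s = s * left as i64 / self.fade_total as i64;
                self.fade_left = Some(left - 1);
            }
            *out = (*out).saturating_add(s as i16);
            self.pos = (self.pos + 1) % self.samples.len();
        }
    }
}

/// One pump iteration: start from the TTS frame (or silence), then mix the clip if active.
fn next_frame(tts: Option<Vec<i16>>, clip: Option<&mut LoadingClip>, frame_len: usize) -> Vec<i16> {
    let mut frame = tts.unwrap_or_else(|| vec![0; frame_len]);
    frame.resize(frame_len, 0); // pad or trim to exactly one frame
    if let Some(clip) = clip {
        clip.mix_into(&mut frame);
    }
    frame
}
```

Because every frame goes through the single `next_frame` call, pass-through (no clip) and mixing (clip present) share one code path, which mirrors the single-writer design described above. A saturating add clips loud peaks instead of wrapping around, which would otherwise be audible as harsh clicks.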