feat(ws): loading-indicator audio loop on a dedicated LiveKit track #18

Merged
tigranbs merged 5 commits into master from loading-indicator
May 25, 2026
@tigranbs

Summary

Adds a loading-indicator audio capability to the WebSocket API — a short, client-supplied clip that loops into the LiveKit room while the calling application is busy ("thinking"). It is the audio equivalent of a spinner: the human participant hears "still working" instead of ambiguous silence.

The loading audio plays on its own dedicated, second published LiveKit track (loading-audio), fully independent of the speech track (tts-audio), so it never interferes with the STT/TTS pipeline.

What changed

  • Config — the config message gains an optional loading_audio object: base64-encoded WAV or raw 16-bit PCM, with optional format, sample_rate, channels, and volume (0.0–1.0). The clip is decoded and validated once per session; a decode failure is non-fatal (the session continues without the feature).
  • Commands — two new fire-and-forget WebSocket commands:
    • loading_start — begins the seamless loop.
    • loading_stop — stops the loop with a short linear fade-out.
  • Dedicated track — a second NativeAudioSource + LocalAudioTrack is published at the clip's own sample rate, so no resampling is needed. The loop captures frames directly on its own source and never touches the TTS operation queue, audio queue, or generation state.
  • Playback — seamless cursor-based looping with wrap-around; ~30 ms fade-out on stop; volume applied once at decode time.
  • Lifecycle — track publish on connect, re-publish on reconnect, and full teardown (cancel → await → abort backstop) on disconnect.
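
The playback bullet above can be pictured with a small sketch. This is not the code from this PR: the names `apply_volume`, `LoopCursor`, `fade_len`, and `fade_out` are hypothetical, the clip is assumed to be mono 16-bit PCM, and the real loop runs against a LiveKit audio source rather than plain slices.

```rust
/// Scale 16-bit PCM samples once, up front (volume clamped to 0.0..=1.0).
fn apply_volume(samples: &mut [i16], volume: f32) {
    let v = volume.clamp(0.0, 1.0);
    for s in samples.iter_mut() {
        *s = (*s as f32 * v).round() as i16;
    }
}

/// Read position into the clip; it wraps to the start at the end of the clip.
struct LoopCursor {
    pos: usize,
}

impl LoopCursor {
    /// Fill `frame` from `clip`, continuing at the cursor and wrapping around,
    /// so consecutive frames join without a gap at the loop point.
    fn fill(&mut self, clip: &[i16], frame: &mut [i16]) {
        if clip.is_empty() {
            frame.fill(0);
            return;
        }
        for out in frame.iter_mut() {
            *out = clip[self.pos];
            self.pos = (self.pos + 1) % clip.len();
        }
    }
}

/// Number of samples in a fade of roughly 30 ms at the given sample rate.
fn fade_len(sample_rate: u32) -> usize {
    (sample_rate as usize * 30) / 1000
}

/// Linear ramp from full gain on the first sample to silence on the last.
fn fade_out(samples: &mut [i16]) {
    let n = samples.len();
    if n < 2 {
        samples.fill(0);
        return;
    }
    for (i, s) in samples.iter_mut().enumerate() {
        let gain = (n - 1 - i) as f32 / (n - 1) as f32;
        *s = (*s as f32 * gain) as i16;
    }
}
```

On stop, a loop built this way would fill one last buffer of `fade_len(sample_rate)` samples from the current cursor position and pass it through `fade_out` before going quiet, which avoids the click of an abrupt cut.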

The loop is controlled exclusively by loading_start / loading_stop. speak and clear are unaffected — applications send loading_stop before speak if they don't want overlap.

Validation

  • Input is validated: base64, size cap, 16-bit-PCM only, sample-rate range, channel count, duration bounds, and frame alignment — each with a clear client-facing error.
  • No new third-party dependencies (hound, base64, tokio-util were already declared).
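
As a rough illustration of the checks listed above, here is a sketch for raw 16-bit PCM input. Everything in it is an assumption made for the example: the `ClipError` variants, the function name `validate_pcm16`, and all numeric limits are hypothetical and are not taken from this PR. Base64 decoding and WAV parsing are left out.

```rust
#[derive(Debug, PartialEq)]
enum ClipError {
    TooLarge,
    BadSampleRate,
    BadChannels,
    Misaligned,
    BadDuration,
}

// Hypothetical limits, chosen only for illustration.
const MAX_BYTES: usize = 2 * 1024 * 1024;
const MIN_RATE: u32 = 8_000;
const MAX_RATE: u32 = 48_000;
const MAX_CHANNELS: u16 = 2;
const MIN_MS: u64 = 100;
const MAX_MS: u64 = 10_000;

/// Validate raw little-endian 16-bit PCM and return its duration in ms.
fn validate_pcm16(bytes: &[u8], sample_rate: u32, channels: u16) -> Result<u64, ClipError> {
    if bytes.len() > MAX_BYTES {
        return Err(ClipError::TooLarge);
    }
    if !(MIN_RATE..=MAX_RATE).contains(&sample_rate) {
        return Err(ClipError::BadSampleRate);
    }
    if channels == 0 || channels > MAX_CHANNELS {
        return Err(ClipError::BadChannels);
    }
    // One frame holds one 2-byte sample per channel; a partial frame means
    // the payload was truncated or the channel count is wrong.
    let frame_bytes = 2 * channels as usize;
    if bytes.len() % frame_bytes != 0 {
        return Err(ClipError::Misaligned);
    }
    let frames = (bytes.len() / frame_bytes) as u64;
    let ms = frames * 1000 / sample_rate as u64;
    if !(MIN_MS..=MAX_MS).contains(&ms) {
        return Err(ClipError::BadDuration);
    }
    Ok(ms)
}
```

Returning a distinct variant per check is what makes a specific client-facing message possible for each failure.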

Testing

  • Unit tests for decode/validation, loop mechanics (cursor wrap, fade-out), volume scaling, and message (de)serialization.
  • LiveKit client tests for track setup, start/stop idempotency, disconnect teardown, rapid start/stop stress, and drop-backstop cancellation.
  • All CI checks pass locally: cargo fmt --all -- --check, cargo clippy --all-targets --all-features -- -D warnings, cargo build --locked --all-features, cargo test --locked --all-features (105 tests passing).

Docs

docs/websocket.md, docs/livekit_integration.md, docs/api-reference.md, and the generated docs/openapi.yaml are updated; CLAUDE.md notes the new commands.

tigranbs added 2 commits May 21, 2026 12:44
Add a loading-indicator audio capability to the WebSocket API: a short
client-supplied clip looped into the LiveKit room while the calling
application is busy, so the human participant hears "still working"
instead of silence.

- config: new optional `loading_audio` object (base64 WAV or raw PCM
  with format/sample_rate/channels/volume); decoded and validated once
  per session, non-fatal on failure.
- new `loading_start` / `loading_stop` WebSocket commands, fire-and-forget.
- publish a second, dedicated `loading-audio` LiveKit track, independent
  of the `tts-audio` speech track; the loading loop never touches the
  TTS operation queue, audio queue, or generation state.
- seamless cursor-based looping with a short linear fade-out on stop.
- lifecycle handling for connect, reconnect, and disconnect teardown.

The loop is controlled exclusively by `loading_start` / `loading_stop`;
`speak` and `clear` are unaffected. No new dependencies.

Follow-up hardening for the loading-indicator audio feature:

- Report the original loading_audio decode failure again on a later
  loading_start, instead of a generic "not available" message; the
  reason is retained on the connection state.
- Give start_loading_audio distinct errors for a missing clip versus a
  loading track that failed to publish.
- Close the reconnect/loading-loop race: tear down the loop and clear
  the dead loading source together under the loading_loop lock, so a
  loading_start racing a reconnect can never bind to a stale source.
- Allow WAV container overhead in the decoded-payload size guard so a
  maximum-duration WAV clip is not rejected for its header bytes.
- Run the libwebrtc-native loading-audio tests in an isolated,
  single-threaded CI step; they intermittently segfault under the full
  unit-test binary's thread concurrency.
- Deduplicate the shared loading-clip test helper and add a
  handler-level test for the missing-clip error path.
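
The reconnect race fix described above comes down to keeping the loop handle and its audio source behind the same lock. The sketch below shows only that idea. `Source`, `LoopHandle`, `LoadingState`, and `Loading` are hypothetical stand-ins, it uses a synchronous `std::sync::Mutex` for brevity, and the real client is asynchronous and manages an actual playback task.

```rust
use std::sync::Mutex;

/// Stand-ins for the published audio source and the running loop (hypothetical).
struct Source {
    id: u32,
}
struct LoopHandle {
    source_id: u32,
}

/// Both fields live behind one lock so they can only change together.
struct LoadingState {
    source: Option<Source>,
    active: Option<LoopHandle>,
}

struct Loading {
    state: Mutex<LoadingState>,
}

impl Loading {
    /// Start the loop, or report which precondition is missing.
    fn start(&self) -> Result<u32, &'static str> {
        let mut st = self.state.lock().unwrap();
        if let Some(handle) = &st.active {
            return Ok(handle.source_id); // already running: idempotent
        }
        let id = st.source.as_ref().ok_or("loading track not published")?.id;
        st.active = Some(LoopHandle { source_id: id });
        Ok(id)
    }

    /// On reconnect, drop the loop and swap the source in one critical
    /// section, so `start` can never observe a loop bound to a dead source.
    fn on_reconnect(&self, fresh: Option<Source>) {
        let mut st = self.state.lock().unwrap();
        st.active = None;
        st.source = fresh;
    }
}
```

Because `start` takes the same lock, it sees either the old source with its loop or the new source with no loop, never a mix of the two.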
@tigranbs tigranbs force-pushed the loading-indicator branch from 442e748 to 469d91d Compare May 22, 2026 06:33
Clear loading_loop when the playback task exits; fix stop cursor
continuity and stale decode errors; use BadRequest when LiveKit is
not connected; honor cancel during fade-out; add tests and docs;
ignore local loading-indicator.md planning file.
@tigranbs tigranbs force-pushed the loading-indicator branch from 7c717b8 to 6710d1e Compare May 25, 2026 05:39
tigranbs added 2 commits May 24, 2026 23:14
Five sites constructed the same `AudioSourceOptions { echo_cancellation:
false, noise_suppression: false, auto_gain_control: false }` literal — two
TTS sites (`setup_audio_publishing`, `process_reconnect`), the loading-audio
track publisher, and two test-only injection sites. `auto_gain_control: false`
is load-bearing for the loading-audio `volume` feature (AGC would re-normalise
loudness and silently undo the configured attenuation), and that rationale was
only spelled out at one of the five sites.

Introduce `sayna_audio_source_options()` in `client/mod.rs` whose docstring
owns the rationale for all three flags. Replace every literal with a call to
the helper. The helper is re-exported `pub(crate)` under `#[cfg(test)]` so
the cross-module test in `handlers/ws/loading_handler.rs` can reach it
without exposing `client/` internals at runtime.
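
A minimal sketch of the shape of such a helper follows. To stay self-contained it defines a local stand-in `AudioSourceOptions` struct carrying only the three flags named above; the real type comes from the LiveKit SDK, and the docstring wording here is illustrative rather than copied from the repository.

```rust
/// Local stand-in for the SDK type, with only the three flags named above.
#[derive(Debug, Clone, PartialEq)]
struct AudioSourceOptions {
    echo_cancellation: bool,
    noise_suppression: bool,
    auto_gain_control: bool,
}

/// Single source of truth for the audio-source flags.
///
/// `auto_gain_control` must stay off: AGC would re-normalise loudness and
/// silently undo the attenuation configured through `volume`.
fn sayna_audio_source_options() -> AudioSourceOptions {
    AudioSourceOptions {
        echo_cancellation: false,
        noise_suppression: false,
        auto_gain_control: false,
    }
}
```

With one constructor, a future change to any flag lands at every call site at once, and the reasoning lives next to the values it explains.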

Also simplify two `loading_stop` no-op tests in `loading_handler.rs` whose
`(message_tx, mut message_rx) + drop(message_tx)` idiom only obscured the
intent — the handler takes no sender, so a discarded `_tx` is clearer.

The previously-drafted lower-level race tests against `run_loading_loop`
were dropped because the existing `livekit_native_rapid_start_stop_is_clean`,
`livekit_native_start_idempotent_and_disconnect_teardown`,
`livekit_native_loading_loop_cleared_after_stop`, and
`livekit_native_drop_cancels_active_loop` already exercise the same races
through the real `LiveKitClient` public API.
…ource

test_handle_loading_start_message_success_is_silent constructed a real
libwebrtc NativeAudioSource and spawned the loading-audio loop, but ran
in the default multi-threaded cargo test pass. The project quarantines
such tests behind a livekit_native_ prefix and an #[ignore] attribute
because libwebrtc's lazily-initialised global runtime intermittently
segfaults under thread concurrency (see src/livekit/client/tests.rs:499-506
and .github/workflows/ci.yml). This test was an outlier.

Rename it with the livekit_native_ prefix and add the same #[ignore]
attribute as its peers, so the dedicated single-threaded CI step picks
it up alongside the existing six native tests.
@tigranbs tigranbs merged commit be537ea into master May 25, 2026
1 check passed
@tigranbs tigranbs deleted the loading-indicator branch May 25, 2026 22:50
tigranbs added a commit to SaynaAI/saysdk that referenced this pull request May 25, 2026
* adding loading-indicator audio support

Adds loading_audio config plus loading_start / loading_stop fire-and-forget
WebSocket commands to node-sdk and python-sdk, mirroring the server addition
in SaynaAI/sayna#18. Failures continue to surface through the existing
registerOnError / register_on_error callbacks; no new error types are added.

See ../sayna/docs/websocket.md#loading-indicator for the protocol contract.

* removing misleading no-cover pragma from loading_start / loading_stop

The wrapped-exception branches are exercised by
test_loading_start_wraps_send_failure and test_loading_stop_wraps_send_failure
in python-sdk/tests/test_client.py. The pragma was copy-pasted from
sip_transfer (which has no equivalent test) and incorrectly masked covered
code from coverage tracking.