diff --git a/.cursor/rules/node-sdk.mdc b/.cursor/rules/node-sdk.mdc index 4f166ec..dc82c38 100644 --- a/.cursor/rules/node-sdk.mdc +++ b/.cursor/rules/node-sdk.mdc @@ -62,6 +62,11 @@ Unknown or malformed websocket control messages should be logged and ignored so - `onAudioInput(audioData)` - Send audio for STT - `sendMessage(message, role, topic?, debug?)` - Send LiveKit data message - `sipTransfer(transferTo)` - Initiate SIP call transfer +- `loadingStart()` - Begin server-side seamless playback loop of the configured loading audio clip on a dedicated LiveKit track; fire-and-forget; errors surface via `registerOnError` +- `loadingStop()` - Stop the loading-audio loop with a short server-side fade-out; never reports a server-side error + +### Loading Indicator +A `LoadingAudioConfig` passed as the constructor's 8th positional argument (or the `saynaConnect()` 7th argument) registers a base64-encoded WAV or raw 16-bit little-endian PCM clip that the server loops on a dedicated `"loading-audio"` LiveKit track while `loadingStart()` is active. The clip is decoded once at config time; the SDK does no audio decoding or file IO. The application must call `loadingStop()` before `speak()` to avoid overlap — neither the SDK nor the server auto-stop the loop on speech. See the Loading Indicator section of `node-sdk/README.md` and `../sayna/docs/websocket.md#loading-indicator`. 
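As an illustration of the raw-PCM path described above, here is a minimal sketch that synthesises a short mono tone as 16-bit little-endian PCM and wraps it in a `LoadingAudioConfig`-shaped object. The `buildPcmLoadingAudio` helper and the locally declared interface are assumptions made so the example is self-contained; they are not part of the SDK, and the server remains authoritative on duration and size limits.

```typescript
import { Buffer } from "node:buffer";

// Local mirror of the config shape shown in this change, declared here only
// so the sketch compiles on its own (in real code, import it from the SDK).
interface LoadingAudioConfig {
  data: string;
  format?: "wav" | "pcm";
  sample_rate?: number;
  channels?: 1 | 2;
  volume?: number;
}

// Hypothetical helper (not an SDK function): builds a mono sine tone as raw
// 16-bit little-endian PCM and base64-encodes it.
function buildPcmLoadingAudio(
  sampleRate: number,
  durationMs: number,
  frequencyHz: number
): LoadingAudioConfig {
  const sampleCount = Math.floor((sampleRate * durationMs) / 1000);
  const pcm = Buffer.alloc(sampleCount * 2); // 2 bytes per 16-bit sample

  for (let i = 0; i < sampleCount; i += 1) {
    const amplitude = Math.sin((2 * Math.PI * frequencyHz * i) / sampleRate);
    pcm.writeInt16LE(Math.round(amplitude * 32767), i * 2);
  }

  return {
    data: pcm.toString("base64"),
    format: "pcm",
    sample_rate: sampleRate, // required for raw PCM
    channels: 1,
    volume: 0.5,
  };
}

// 250 ms of 440 Hz at 16 kHz; pass the result as the loadingAudio argument.
const loadingAudio = buildPcmLoadingAudio(16000, 250, 440);
console.log(loadingAudio.data.length > 0);
```

Choosing a duration that holds a whole number of cycles (110 cycles here) keeps the start and end of the clip aligned, which should help the loop sound continuous.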
## Documentation Reference diff --git a/.cursor/rules/python-sdk.mdc b/.cursor/rules/python-sdk.mdc index d488e68..16ac9dd 100644 --- a/.cursor/rules/python-sdk.mdc +++ b/.cursor/rules/python-sdk.mdc @@ -90,6 +90,8 @@ Unknown or malformed websocket control messages should be logged and ignored so - `on_audio_input(audio_data)` - Send audio bytes for STT - `send_message(message, role, topic=None, debug=None)` - `tts_flush(allow_interruption=True)` +- `loading_start()` - Async fire-and-forget; tells the server to begin the seamless playback loop of the configured loading-audio clip on a dedicated `"loading-audio"` LiveKit track. Idempotent server-side; failures (no clip configured, audio disabled, no LiveKit, decode failure, track publish failure) surface via `register_on_error`. +- `loading_stop()` - Async fire-and-forget; tells the server to stop the loading-audio loop with a short fade-out. Always silent server-side (no `error` is emitted even if no loop is running). ### Client Properties - `ready` - Boolean, connection ready state @@ -105,6 +107,7 @@ Key models in `types.py`: - `STTConfig` - Speech-to-text configuration - `TTSConfig` - Text-to-speech configuration - `LiveKitConfig` - LiveKit room configuration +- `LoadingAudioConfig` - Loading-indicator audio clip uploaded once at config time via the `loading_audio=` constructor kwarg. Fields: `data` (base64, required), `format: Literal["wav", "pcm"]` (optional), `sample_rate` (optional, required for raw PCM), `channels` (optional), `volume` (optional, clamped to `[0.0, 1.0]`). `extra="forbid"` rejects unknown fields. The server decodes once at config time; decode failures arrive on `register_on_error`. 
- `STTResult` - Transcription result - `VoiceDescriptor` - TTS voice info - `SipHook` - SIP webhook entry diff --git a/.gitignore b/.gitignore index b67e63b..c9c3d7e 100644 --- a/.gitignore +++ b/.gitignore @@ -6,4 +6,5 @@ rust-ffi-migration-plan.md results/ target/ .mcgravity/ +*-prd.md ws-test.ts diff --git a/node-sdk/README.md b/node-sdk/README.md index 3c8babe..d569b9c 100644 --- a/node-sdk/README.md +++ b/node-sdk/README.md @@ -39,6 +39,8 @@ await client.connect(); await client.speak("Hello, world!"); ``` +The constructor also accepts an 8th positional argument `loadingAudio?: LoadingAudioConfig` for a server-side "thinking" audio loop on a dedicated LiveKit track; see [Loading Indicator](#loading-indicator) below. + ## API ### REST API Methods @@ -210,15 +212,18 @@ try { These methods require an active WebSocket connection: -### `new SaynaClient(url, sttConfig, ttsConfig, livekitConfig?, withoutAudio?)` +### `new SaynaClient(url, sttConfig, ttsConfig, livekitConfig?, withoutAudio?, apiKey?, streamId?, loadingAudio?)` -| parameter | type | purpose | -| --------------- | --------------- | ------------------------------------------------------- | -| `url` | `string` | Sayna server URL (http://, https://, ws://, or wss://). | -| `sttConfig` | `STTConfig` | Speech-to-text provider configuration. | -| `ttsConfig` | `TTSConfig` | Text-to-speech provider configuration. | -| `livekitConfig` | `LiveKitConfig` | Optional LiveKit room configuration. | -| `withoutAudio` | `boolean` | Disable audio streaming (defaults to `false`). | +| parameter | type | purpose | +| --------------- | -------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------- | +| `url` | `string` | Sayna server URL (http://, https://, ws://, or wss://). | +| `sttConfig` | `STTConfig` | Speech-to-text provider configuration. 
| +| `ttsConfig` | `TTSConfig` | Text-to-speech provider configuration. | +| `livekitConfig` | `LiveKitConfig` | Optional LiveKit room configuration. | +| `withoutAudio` | `boolean` | Disable audio streaming (defaults to `false`). | +| `apiKey` | `string` | Optional API key for HTTP and WebSocket auth (defaults to `SAYNA_API_KEY` env). | +| `streamId` | `string` | Optional session identifier for recording paths; server generates a UUID when omitted. | +| `loadingAudio` | `LoadingAudioConfig` | Optional "thinking" audio clip sent in the initial `config` frame; loops on a dedicated LiveKit track when `loadingStart()` runs. See [Loading Indicator](#loading-indicator). | ### `await client.connect()` @@ -280,6 +285,77 @@ Sends a message to the Sayna session with role and optional metadata. Clears the text-to-speech queue. +### Loading Indicator + +The loading indicator loops a short audio clip into the LiveKit room while the application is "thinking" (e.g. while a large-language-model call is in flight). The clip is decoded once on the server when the WebSocket `config` frame is sent and replayed seamlessly on a dedicated LiveKit audio track named `"loading-audio"`, which is separate from the speech track `"tts-audio"`. STT and TTS streams are unaffected by the loop. See [`../sayna/docs/websocket.md#loading-indicator`](../sayna/docs/websocket.md#loading-indicator) for the authoritative protocol definition. + +The clip is configured through the `LoadingAudioConfig` object passed as the 8th positional argument to the `SaynaClient` constructor: + +```typescript +interface LoadingAudioConfig { + /** Base64-encoded WAV or raw 16-bit little-endian PCM. Required. */ + data: string; + /** Container hint; omit to let the server auto-detect from the RIFF/WAVE signature. */ + format?: "wav" | "pcm"; + /** Sample rate in Hz. Required for raw PCM; ignored for WAV. */ + sample_rate?: number; + /** Channel count for raw PCM. Defaults to 1 server-side; ignored for WAV. 
*/ + channels?: 1 | 2; + /** Playback volume in [0.0, 1.0]. Defaults to 1.0; clamped server-side. */ + volume?: number; +} +``` + +The SDK does not read files or decode audio. Encode the clip to base64 in your own application code, e.g. with `fs/promises`: + +```typescript +import { readFile } from "node:fs/promises"; + +const data = (await readFile("./loading.wav")).toString("base64"); +``` + +Full call flow: + +```typescript +import { readFile } from "node:fs/promises"; +import { SaynaClient } from "@sayna/node-sdk"; + +const data = (await readFile("./loading.wav")).toString("base64"); + +const client = new SaynaClient( + "https://api.sayna.ai", + { provider: "deepgram", model: "nova-2" }, + { provider: "cartesia", voice_id: "example-voice" }, + { room_name: "my-room" }, + false, // withoutAudio + undefined, // apiKey (defaults to SAYNA_API_KEY env) + undefined, // streamId + { data, format: "wav" } // loadingAudio (8th positional argument) +); + +await client.connect(); + +// ...on user turn complete: +client.loadingStart(); +// ...application does its "thinking" (LLM call, tool invocation, etc.)... +client.loadingStop(); +await client.speak("Here is the answer."); +``` + +The application is responsible for calling `loadingStop()` before `speak()`. The SDK and server deliberately do **not** auto-stop the loop on `speak()` or `clear()` — overlapping the indicator with the answer would otherwise play both clips on top of each other. + +Failures — `LoadingAudioConfig` decode failures detected at config time, and `loading_start` failures (audio disabled, no LiveKit room, no `loadingAudio` configured, track failed to publish) — arrive on the existing `registerOnError(callback)` channel. There is no separate `loading_error` event. + +If the LiveKit room reconnects while the loop was running (publisher timeout, network blip), the loop stops. The SDK does **not** auto-restart it — the application must call `loadingStart()` again to resume. 
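Because the loop is never stopped automatically, one way to keep `loadingStart()` and `loadingStop()` paired is a small wrapper around the "thinking" work. The sketch below is an assumption about application code, not an SDK feature: `withLoadingIndicator` and the `LoadingControls` type are hypothetical names, and the wrapper relies only on the two synchronous methods documented above.

```typescript
// Minimal structural type covering only the two methods this sketch needs;
// a connected SaynaClient is assumed to satisfy it.
interface LoadingControls {
  loadingStart(): void;
  loadingStop(): void;
}

// Hypothetical helper: starts the loop, runs the work, and always stops the
// loop afterwards, whether the work resolves or rejects.
async function withLoadingIndicator<T>(
  client: LoadingControls,
  work: () => Promise<T>
): Promise<T> {
  client.loadingStart();
  try {
    return await work();
  } finally {
    // Runs before the caller gets the result, so a following speak() call
    // is issued only after loading_stop has been sent.
    client.loadingStop();
  }
}

// Usage (callModel is a placeholder for the application's own logic):
//   const answer = await withLoadingIndicator(client, () => callModel(prompt));
//   await client.speak(answer);
```

Note that `loadingStop()` throws when the client is no longer connected, so an error raised in the `finally` block would replace an error thrown by `work()`; applications that care about the original error may want to catch and log the stop failure instead.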
+ +### `client.loadingStart()` + +Begins the server-side seamless playback loop of the configured loading clip on the dedicated `"loading-audio"` LiveKit track. Fire-and-forget: any server-side rejection (audio disabled, no LiveKit room, no `loadingAudio` configured, track failed to publish) arrives asynchronously through `registerOnError(callback)`. Throws `SaynaNotConnectedError` / `SaynaNotReadyError` if invoked before the connection is ready, and `SaynaConnectionError` if the transport fails to send the frame. + +### `client.loadingStop()` + +Stops the loading-audio loop with a short server-side fade-out. The server never returns an `error` for this command (stopping a non-running loop is a no-op). Throws the same connection-state errors as `loadingStart()`; `disconnect()` does not call it for you. + ### `await client.ttsFlush(allowInterruption?)` Flushes the TTS queue by sending an empty speak command. diff --git a/node-sdk/bun.lock b/node-sdk/bun.lock index 6281607..e515267 100644 --- a/node-sdk/bun.lock +++ b/node-sdk/bun.lock @@ -1,14 +1,15 @@ { "lockfileVersion": 1, + "configVersion": 0, "workspaces": { "": { "name": "@sayna-ai/node-sdk", "dependencies": { - "@types/ws": "^8.18.1", "ws": "^8.0.0", }, "devDependencies": { "@types/bun": "latest", + "@types/ws": "^8.18.1", "@typescript-eslint/eslint-plugin": "^8.46.0", "@typescript-eslint/parser": "^8.46.0", "eslint": "^9.37.0", diff --git a/node-sdk/src/index.ts b/node-sdk/src/index.ts index 83817da..554f85f 100644 --- a/node-sdk/src/index.ts +++ b/node-sdk/src/index.ts @@ -1,5 +1,10 @@ import { SaynaClient } from "./sayna-client"; -import type { STTConfig, TTSConfig, LiveKitConfig } from "./types"; +import type { + STTConfig, + TTSConfig, + LiveKitConfig, + LoadingAudioConfig, +} from "./types"; export * from "./sayna-client"; export * from "./types"; @@ -19,6 +24,10 @@ export * from "./webhook-receiver"; * @param livekitConfig - Optional LiveKit room configuration * @param withoutAudio - If true, disables 
audio streaming (default: false) * @param apiKey - Optional API key used to authorize HTTP and WebSocket calls (defaults to SAYNA_API_KEY env) + * @param loadingAudio - Optional loading-indicator clip sent inside the initial `config` frame. The + * server decodes it once at config time and loops it on a dedicated LiveKit audio track when + * `loadingStart()` is invoked. Only effective when `withoutAudio=false` and `livekitConfig` is + * supplied. See the Loading Indicator section of `../sayna/docs/websocket.md` for the protocol contract. * * @returns Promise that resolves to a connected SaynaClient * @@ -72,7 +81,8 @@ export async function saynaConnect( ttsConfig?: TTSConfig, livekitConfig?: LiveKitConfig, withoutAudio: boolean = false, - apiKey?: string + apiKey?: string, + loadingAudio?: LoadingAudioConfig ): Promise<SaynaClient> { const client = new SaynaClient( url, @@ -80,7 +90,9 @@ export async function saynaConnect( ttsConfig, livekitConfig, withoutAudio, - apiKey + apiKey, + undefined /* streamId */, + loadingAudio ); await client.connect(); return client; diff --git a/node-sdk/src/sayna-client.ts b/node-sdk/src/sayna-client.ts index ec5f3f8..056a34b 100644 --- a/node-sdk/src/sayna-client.ts +++ b/node-sdk/src/sayna-client.ts @@ -2,9 +2,12 @@ import type { STTConfig, TTSConfig, LiveKitConfig, + LoadingAudioConfig, ConfigMessage, SpeakMessage, ClearMessage, + LoadingStartMessage, + LoadingStopMessage, SendMessageMessage, STTResultMessage, ErrorMessage, @@ -143,6 +146,7 @@ export class SaynaClient { private sttConfig?: STTConfig; private ttsConfig?: TTSConfig; private livekitConfig?: LiveKitConfig; + private loadingAudio?: LoadingAudioConfig; private withoutAudio: boolean; private apiKey?: string; private websocket?: InstanceType; @@ -176,8 +180,16 @@ export class SaynaClient { * @param withoutAudio - If true, disables audio streaming (default: false) * @param apiKey - Optional API key used to authorize HTTP and WebSocket calls (defaults to SAYNA_API_KEY env) * @param
streamId - Optional session identifier for recording paths; server generates a UUID when omitted + * @param loadingAudio - Optional loading-indicator clip sent inside the initial `config` frame on + * {@link SaynaClient.connect}. The server decodes it once at config time and loops it on a + * dedicated LiveKit audio track when {@link SaynaClient.loadingStart} is called. Decode failures + * arrive asynchronously through {@link SaynaClient.registerOnError} and do not abort the session. + * Only effective when `withoutAudio=false` and `livekitConfig` is supplied. See the Loading + * Indicator section of `../sayna/docs/websocket.md` for the protocol contract. * - * @throws {SaynaValidationError} If URL is invalid or if audio configs are missing when audio is enabled + * @throws {SaynaValidationError} If URL is invalid, if audio configs are missing when audio is + * enabled, or if `loadingAudio` is supplied but is not an object, has empty `data`, or has an + * unrecognised `format` value. */ constructor( url: string, @@ -186,7 +198,8 @@ export class SaynaClient { livekitConfig?: LiveKitConfig, withoutAudio: boolean = false, apiKey?: string, - streamId?: string + streamId?: string, + loadingAudio?: LoadingAudioConfig ) { // Validate URL if (!url || typeof url !== "string") { @@ -213,10 +226,16 @@ export class SaynaClient { } } + // Validate loadingAudio when supplied. Deeper rules (sample-rate range, channel count, + // duration, bit depth, byte cap, base64 decode) are server-authoritative; mirroring them + // would force SDK releases whenever the server widens or narrows a limit. + SaynaClient.validateLoadingAudio(loadingAudio); + this.url = url; this.sttConfig = sttConfig; this.ttsConfig = ttsConfig; this.livekitConfig = livekitConfig; + this.loadingAudio = loadingAudio; this.withoutAudio = withoutAudio; this.apiKey = apiKey ?? 
process.env.SAYNA_API_KEY; this.inputStreamId = streamId; @@ -257,6 +276,7 @@ export class SaynaClient { stt_config: sttConfig, tts_config: ttsConfig, livekit: this.livekitConfig, + loading_audio: this.loadingAudio, audio: !this.withoutAudio, }; @@ -723,6 +743,51 @@ export class SaynaClient { } } + /** + * Runtime guard for the constructor's `loadingAudio` argument. + * + * Accepts `unknown` because the constructor signature alone cannot stop JS callers from + * passing nulls, primitives, or arrays, and the rest of the constructor must trust the + * field once it is assigned. The checks here mirror only what cannot be expressed in + * TypeScript at the call site (shape + non-empty `data` + closed `format` set); content + * rules (sample-rate range, channels, duration, byte cap, base64 decode) are + * server-authoritative and are left to the server's `error` channel. + * @internal + */ + private static validateLoadingAudio(input: unknown): void { + if (typeof input === "undefined") { + return; + } + if ( + typeof input !== "object" || + input === null || + Array.isArray(input) + ) { + throw new SaynaValidationError("loadingAudio must be an object"); + } + const candidate = input as { + data?: unknown; + format?: unknown; + }; + if ( + typeof candidate.data !== "string" || + candidate.data.length === 0 + ) { + throw new SaynaValidationError( + "loadingAudio.data must be a non-empty base64 string" + ); + } + if ( + typeof candidate.format !== "undefined" && + candidate.format !== "wav" && + candidate.format !== "pcm" + ) { + throw new SaynaValidationError( + 'loadingAudio.format must be "wav" or "pcm"' + ); + } + } + /** * Creates a WebSocket instance using the appropriate constructor for the current runtime. * - Node.js (ws package): passes headers via third argument @@ -1070,6 +1135,90 @@ export class SaynaClient { } } + /** + * Starts the loading-indicator audio loop on the dedicated LiveKit track. 
+ * + * Fire-and-forget: the method returns once the frame is queued on the WebSocket; success + * is silent on the wire and there is no acknowledgement to await. Any asynchronous server + * failure (no clip configured, audio disabled, no LiveKit room, decode failure at config + * time, track failed to publish, LiveKit not connected) is delivered later as a standard + * `error` message and surfaces through the callback registered via + * {@link SaynaClient.registerOnError}. + * + * Idempotent server-side: calling twice while the loop is running — including during the + * brief fade-out window of a prior {@link SaynaClient.loadingStop} — is a no-op. + * + * Requires `withoutAudio=false`, a configured LiveKit room, AND a `loadingAudio` argument + * supplied to the constructor. The SDK does not pre-check these prerequisites; the server + * enforces them and reports any violation through the `error` channel. + * + * {@link SaynaClient.speak} and {@link SaynaClient.clear} do NOT stop the loop. Callers that + * do not want overlap with synthesized speech must call {@link SaynaClient.loadingStop} + * before {@link SaynaClient.speak}. + * + * @throws {SaynaNotConnectedError} If not connected + * @throws {SaynaNotReadyError} If connection is not ready + * @throws {SaynaConnectionError} If sending the frame fails at the transport layer + */ + loadingStart(): void { + if (!this.isConnected || !this.websocket) { + throw new SaynaNotConnectedError(); + } + + if (!this.isReady) { + throw new SaynaNotReadyError(); + } + + try { + const message: LoadingStartMessage = { + type: "loading_start", + }; + this.websocket.send(JSON.stringify(message)); + } catch (error) { + throw new SaynaConnectionError( + "Failed to send loading_start command", + error + ); + } + } + + /** + * Stops the loading-indicator audio loop with a short server-side fade-out. 
+ * + * Always silent server-side: the server never returns an `error` for `loading_stop`, + * even when no loop is running or no LiveKit room is configured. + * + * Calling {@link SaynaClient.loadingStop} while the client is not connected still throws + * {@link SaynaNotConnectedError}, mirroring {@link SaynaClient.clear}, so cleanup invoked + * on a disposed client surfaces to the application instead of being silently swallowed. + * It is never auto-called by {@link SaynaClient.disconnect}. + * + * @throws {SaynaNotConnectedError} If not connected + * @throws {SaynaNotReadyError} If connection is not ready + * @throws {SaynaConnectionError} If sending the frame fails at the transport layer + */ + loadingStop(): void { + if (!this.isConnected || !this.websocket) { + throw new SaynaNotConnectedError(); + } + + if (!this.isReady) { + throw new SaynaNotReadyError(); + } + + try { + const message: LoadingStopMessage = { + type: "loading_stop", + }; + this.websocket.send(JSON.stringify(message)); + } catch (error) { + throw new SaynaConnectionError( + "Failed to send loading_stop command", + error + ); + } + } + /** * Flushes the TTS queue by sending an empty speak command. * diff --git a/node-sdk/src/types.ts b/node-sdk/src/types.ts index 2fde678..9841d3b 100644 --- a/node-sdk/src/types.ts +++ b/node-sdk/src/types.ts @@ -152,6 +152,35 @@ export interface LiveKitConfig { listen_participants?: string[]; } +/** + * Loading-indicator audio clip uploaded once at config time. + * + * The clip plays on a dedicated LiveKit audio track when the application sends a + * `loading_start` command, and stops with a short server-side fade-out on `loading_stop`. + * The SDK does not decode, parse, or validate the audio content; the server is + * authoritative on format, sample-rate range, channel count, duration, bit depth, and + * byte-size limits. See the "Loading Indicator" section of `../sayna/docs/websocket.md` + * and `../sayna/docs/api-reference.md` for the protocol contract. 
+ */ +export interface LoadingAudioConfig { + /** + * Base64-encoded audio bytes (standard alphabet, padded): either a complete WAV file + * or raw 16-bit little-endian PCM. Encode in user code, e.g. + * `fs.readFile(path).then(b => b.toString('base64'))`. The SDK does not read files or + * decode audio. See the Loading Indicator section of `../sayna/docs/websocket.md` for + * the authoritative format and size rules. + */ + data: string; + /** Audio container hint. Omit to let the server auto-detect from the RIFF/WAVE signature. */ + format?: "wav" | "pcm"; + /** Sample rate in Hz (8000-48000). Required for raw PCM; ignored for WAV. */ + sample_rate?: number; + /** Channel count for raw PCM. Defaults to 1 server-side. Ignored for WAV. */ + channels?: 1 | 2; + /** Playback volume in [0.0, 1.0]. Defaults to 1.0; clamped server-side; applied once at config time. */ + volume?: number; +} + /** * Configuration message sent to initialize the Sayna WebSocket connection. * @internal @@ -168,6 +197,8 @@ export interface ConfigMessage { tts_config?: TTSConfig; /** Optional LiveKit room configuration */ livekit?: LiveKitConfig; + /** Optional loading-indicator clip; see {@link LoadingAudioConfig}. */ + loading_audio?: LoadingAudioConfig; } /** @@ -192,6 +223,22 @@ export interface ClearMessage { type: "clear"; } +/** + * Message to start the loading-indicator audio loop on the dedicated LiveKit track. + * @internal + */ +export interface LoadingStartMessage { + type: "loading_start"; +} + +/** + * Message to stop the loading-indicator audio loop with a short server-side fade-out. + * @internal + */ +export interface LoadingStopMessage { + type: "loading_stop"; +} + /** * Message to send data to the Sayna session. 
* @internal diff --git a/node-sdk/tests/sayna-client.test.ts b/node-sdk/tests/sayna-client.test.ts index a86d361..52e2113 100644 --- a/node-sdk/tests/sayna-client.test.ts +++ b/node-sdk/tests/sayna-client.test.ts @@ -3,9 +3,16 @@ import { SaynaClient } from "../src/sayna-client"; import { SaynaValidationError, SaynaNotConnectedError, + SaynaNotReadyError, + SaynaConnectionError, SaynaServerError, } from "../src/errors"; -import type { STTConfig, TTSConfig } from "../src/types"; +import type { + STTConfig, + TTSConfig, + LiveKitConfig, + LoadingAudioConfig, +} from "../src/types"; function getTestSTTConfig(): STTConfig { return { @@ -736,6 +743,487 @@ describe("SaynaClient REST API Methods", () => { }); /* eslint-enable @typescript-eslint/await-thenable */ +describe("SaynaClient Loading Indicator constructor validation", () => { + test("should reject empty data string with a SaynaValidationError mentioning loadingAudio.data", () => { + expect( + () => + new SaynaClient( + "https://api.example.com", + getTestSTTConfig(), + getTestTTSConfig(), + undefined, + false, + undefined, + undefined, + { data: "" } + ) + ).toThrow(SaynaValidationError); + + expect( + () => + new SaynaClient( + "https://api.example.com", + getTestSTTConfig(), + getTestTTSConfig(), + undefined, + false, + undefined, + undefined, + { data: "" } + ) + ).toThrow("loadingAudio.data"); + }); + + test("should reject unknown format value with a SaynaValidationError mentioning format and the allowed values", () => { + const bogus = { data: "abc", format: "mp3" } as unknown as LoadingAudioConfig; + + expect( + () => + new SaynaClient( + "https://api.example.com", + getTestSTTConfig(), + getTestTTSConfig(), + undefined, + false, + undefined, + undefined, + bogus + ) + ).toThrow(SaynaValidationError); + + expect( + () => + new SaynaClient( + "https://api.example.com", + getTestSTTConfig(), + getTestTTSConfig(), + undefined, + false, + undefined, + undefined, + bogus + ) + ).toThrow("loadingAudio.format"); + 
+ expect( + () => + new SaynaClient( + "https://api.example.com", + getTestSTTConfig(), + getTestTTSConfig(), + undefined, + false, + undefined, + undefined, + bogus + ) + ).toThrow('"wav" or "pcm"'); + }); + + test("should accept a minimal valid loadingAudio with only data", () => { + expect(() => { + new SaynaClient( + "https://api.example.com", + getTestSTTConfig(), + getTestTTSConfig(), + undefined, + false, + undefined, + undefined, + { data: "abc" } + ); + }).not.toThrow(); + }); + + test("should reject an empty object loadingAudio with a SaynaValidationError mentioning loadingAudio.data", () => { + const bogus = {} as unknown as LoadingAudioConfig; + + expect( + () => + new SaynaClient( + "https://api.example.com", + getTestSTTConfig(), + getTestTTSConfig(), + undefined, + false, + undefined, + undefined, + bogus + ) + ).toThrow(SaynaValidationError); + + expect( + () => + new SaynaClient( + "https://api.example.com", + getTestSTTConfig(), + getTestTTSConfig(), + undefined, + false, + undefined, + undefined, + bogus + ) + ).toThrow("loadingAudio.data"); + }); + + test("should reject non-object loadingAudio inputs with a SaynaValidationError", () => { + const nonObjects: unknown[] = ["AAA=", 42, true, null, ["data"]]; + + for (const bogus of nonObjects) { + expect( + () => + new SaynaClient( + "https://api.example.com", + getTestSTTConfig(), + getTestTTSConfig(), + undefined, + false, + undefined, + undefined, + bogus as LoadingAudioConfig + ) + ).toThrow(SaynaValidationError); + } + }); +}); + +describe("SaynaClient Loading Indicator config frame emission", () => { + /** + * Drives `SaynaClient.connect()` against an in-memory fake WebSocket and returns the + * payloads the SDK actually sends. The fake intercepts `createWebSocket`, captures the + * `onopen` and `onmessage` hooks the SDK installs, fires `onopen` so `connect()` emits + * the `config` frame, then fires a `ready` message so the Promise resolves cleanly. 
+ */ + async function captureConfigFrames( + loadingAudio?: LoadingAudioConfig + ): Promise<{ sent: string[] }> { + const sent: string[] = []; + const livekitConfig: LiveKitConfig = { room_name: "test-room" }; + + interface FakeWs { + binaryType: string; + readyState: number; + send: (payload: string) => void; + close: () => void; + onopen: ((event?: unknown) => void) | null; + onmessage: ((event: { data: string }) => void) | null; + onerror: ((event?: unknown) => void) | null; + onclose: ((event: { code: number; reason: string }) => void) | null; + } + + const fakeWs: FakeWs = { + binaryType: "arraybuffer", + readyState: 1, + send: (payload: string) => sent.push(payload), + close: () => {}, + onopen: null, + onmessage: null, + onerror: null, + onclose: null, + }; + + const client = new SaynaClient( + "https://api.example.com", + getTestSTTConfig(), + getTestTTSConfig(), + livekitConfig, + false, + undefined, + undefined, + loadingAudio + ); + + // Replace the runtime WebSocket constructor with a factory that returns the fake. + // Stub installation must happen before connect() reaches `createWebSocket`, which it + // does after two `await resolveConfigAuth(...)` microtask hops. + (client as any).createWebSocket = () => fakeWs; + + const connected = client.connect(); + + // Wait for `connect()` to clear its two `await resolveConfigAuth(...)` points before + // the SDK assigns the `onopen`/`onmessage` handlers on the fake WebSocket. We drain + // up to a handful of microtasks; in practice both awaits resolve in the first. + for (let i = 0; i < 8 && !(fakeWs.onopen && fakeWs.onmessage); i += 1) { + await Promise.resolve(); + } + + // Fire the open handler so connect() emits the config frame, then deliver a ready + // message so the Promise resolves and the test completes deterministically. 
+ if (fakeWs.onopen) { + fakeWs.onopen(); + } + if (fakeWs.onmessage) { + fakeWs.onmessage({ + data: JSON.stringify({ type: "ready", stream_id: "test-stream" }), + }); + } + + await connected; + client.disconnect(); + + return { sent }; + } + + test("config frame includes loading_audio when supplied at construction", async () => { + const audio: LoadingAudioConfig = { + data: "AAA=", + format: "wav", + sample_rate: 16000, + channels: 1, + volume: 0.75, + }; + + const { sent } = await captureConfigFrames(audio); + + expect(sent.length).toBe(1); + const payload = JSON.parse(sent[0] ?? "{}"); + expect(payload.type).toBe("config"); + expect(payload.loading_audio).toEqual(audio); + }); + + test("config frame omits loading_audio when not supplied", async () => { + const { sent } = await captureConfigFrames(); + + expect(sent.length).toBe(1); + const payload = JSON.parse(sent[0] ?? "{}"); + expect(payload.type).toBe("config"); + expect(Object.prototype.hasOwnProperty.call(payload, "loading_audio")).toBe( + false + ); + }); +}); + +describe("SaynaClient loadingStart", () => { + test("should throw SaynaNotConnectedError when called before connect", () => { + const client = new SaynaClient( + "https://api.example.com", + getTestSTTConfig(), + getTestTTSConfig() + ); + + expect(() => client.loadingStart()).toThrow(SaynaNotConnectedError); + }); + + test("should throw SaynaNotReadyError when called after connect but before ready", () => { + const client = new SaynaClient( + "https://api.example.com", + getTestSTTConfig(), + getTestTTSConfig() + ); + + (client as any).websocket = { send: () => {} } as unknown as WebSocket; + (client as any).isConnected = true; + + expect(() => client.loadingStart()).toThrow(SaynaNotReadyError); + }); + + test("should emit a single loading_start frame when ready", () => { + const client = new SaynaClient( + "https://api.example.com", + getTestSTTConfig(), + getTestTTSConfig() + ); + + const sent: string[] = []; + (client as any).websocket = { + 
send: (payload: string) => sent.push(payload), + } as unknown as WebSocket; + (client as any).isConnected = true; + (client as any).isReady = true; + + client.loadingStart(); + + expect(sent.length).toBe(1); + const payload = JSON.parse(sent[0] ?? "{}"); + expect(payload).toEqual({ type: "loading_start" }); + }); + + test("should wrap synchronous send failures in SaynaConnectionError with cause", () => { + const client = new SaynaClient( + "https://api.example.com", + getTestSTTConfig(), + getTestTTSConfig() + ); + + const underlying = new Error("socket gone"); + (client as any).websocket = { + send: () => { + throw underlying; + }, + } as unknown as WebSocket; + (client as any).isConnected = true; + (client as any).isReady = true; + + let captured: unknown; + try { + client.loadingStart(); + } catch (error) { + captured = error; + } + + expect(captured).toBeInstanceOf(SaynaConnectionError); + const connectionError = captured as SaynaConnectionError; + expect(connectionError.message).toContain( + "Failed to send loading_start command" + ); + expect(connectionError.cause).toBe(underlying); + }); +}); + +describe("SaynaClient loadingStop", () => { + test("should throw SaynaNotConnectedError when called before connect", () => { + const client = new SaynaClient( + "https://api.example.com", + getTestSTTConfig(), + getTestTTSConfig() + ); + + expect(() => client.loadingStop()).toThrow(SaynaNotConnectedError); + }); + + test("should throw SaynaNotReadyError when called after connect but before ready", () => { + const client = new SaynaClient( + "https://api.example.com", + getTestSTTConfig(), + getTestTTSConfig() + ); + + (client as any).websocket = { send: () => {} } as unknown as WebSocket; + (client as any).isConnected = true; + + expect(() => client.loadingStop()).toThrow(SaynaNotReadyError); + }); + + test("should emit a single loading_stop frame when ready", () => { + const client = new SaynaClient( + "https://api.example.com", + getTestSTTConfig(), + 
getTestTTSConfig() + ); + + const sent: string[] = []; + (client as any).websocket = { + send: (payload: string) => sent.push(payload), + } as unknown as WebSocket; + (client as any).isConnected = true; + (client as any).isReady = true; + + client.loadingStop(); + + expect(sent.length).toBe(1); + const payload = JSON.parse(sent[0] ?? "{}"); + expect(payload).toEqual({ type: "loading_stop" }); + }); + + test("should wrap synchronous send failures in SaynaConnectionError with cause", () => { + const client = new SaynaClient( + "https://api.example.com", + getTestSTTConfig(), + getTestTTSConfig() + ); + + const underlying = new Error("socket gone"); + (client as any).websocket = { + send: () => { + throw underlying; + }, + } as unknown as WebSocket; + (client as any).isConnected = true; + (client as any).isReady = true; + + let captured: unknown; + try { + client.loadingStop(); + } catch (error) { + captured = error; + } + + expect(captured).toBeInstanceOf(SaynaConnectionError); + const connectionError = captured as SaynaConnectionError; + expect(connectionError.message).toContain( + "Failed to send loading_stop command" + ); + expect(connectionError.cause).toBe(underlying); + }); +}); + +describe("SaynaClient Loading Indicator server error propagation", () => { + test("should deliver server loading_audio error message to the registerOnError callback", async () => { + const client = new SaynaClient( + "https://api.example.com", + getTestSTTConfig(), + getTestTTSConfig() + ); + + let receivedMessage: string | undefined; + client.registerOnError((error) => { + receivedMessage = error.message; + }); + + await (client as any).handleJsonMessage({ + type: "error", + message: "loading_audio.data is not valid base64", + }); + + expect(receivedMessage).toBe("loading_audio.data is not valid base64"); + }); +}); + +describe("SaynaClient speak and clear do not send loading_stop", () => { + test("speak emits exactly one speak frame (no implicit loading_stop) even when loadingAudio 
is configured", () => { + const client = new SaynaClient( + "https://api.example.com", + getTestSTTConfig(), + getTestTTSConfig(), + { room_name: "test-room" }, + false, + undefined, + undefined, + { data: "AAA=" } + ); + + const sent: string[] = []; + (client as any).websocket = { + send: (payload: string) => sent.push(payload), + } as unknown as WebSocket; + (client as any).isConnected = true; + (client as any).isReady = true; + + client.speak("hello"); + + expect(sent.length).toBe(1); + const payload = JSON.parse(sent[0] ?? "{}"); + expect(payload.type).toBe("speak"); + }); + + test("clear emits exactly one clear frame (no implicit loading_stop) even when loadingAudio is configured", () => { + const client = new SaynaClient( + "https://api.example.com", + getTestSTTConfig(), + getTestTTSConfig(), + { room_name: "test-room" }, + false, + undefined, + undefined, + { data: "AAA=" } + ); + + const sent: string[] = []; + (client as any).websocket = { + send: (payload: string) => sent.push(payload), + } as unknown as WebSocket; + (client as any).isConnected = true; + (client as any).isReady = true; + + client.clear(); + + expect(sent.length).toBe(1); + const payload = JSON.parse(sent[0] ?? 
"{}"); + expect(payload.type).toBe("clear"); + }); +}); + describe("SaynaClient SIP Transfer", () => { test("should send sip_transfer payload", () => { const client = new SaynaClient( diff --git a/node-sdk/tests/types.test.ts b/node-sdk/tests/types.test.ts index 8b2c428..d4e7fb0 100644 --- a/node-sdk/tests/types.test.ts +++ b/node-sdk/tests/types.test.ts @@ -3,9 +3,12 @@ import type { STTConfig, TTSConfig, LiveKitConfig, + LoadingAudioConfig, ConfigMessage, SpeakMessage, ClearMessage, + LoadingStartMessage, + LoadingStopMessage, SendMessageMessage, ReadyMessage, STTResultMessage, @@ -768,3 +771,104 @@ describe("Provider Auth Types", () => { expect(json.auth).toBeUndefined(); }); }); + +describe("LoadingAudioConfig", () => { + test("minimal configuration accepts only data field", () => { + const config: LoadingAudioConfig = { data: "AAA=" }; + + expect(config.data).toBe("AAA="); + expect(config.format).toBeUndefined(); + expect(config.sample_rate).toBeUndefined(); + expect(config.channels).toBeUndefined(); + expect(config.volume).toBeUndefined(); + }); + + test("full configuration accepts all fields", () => { + const config: LoadingAudioConfig = { + data: "AAA=", + format: "pcm", + sample_rate: 24000, + channels: 2, + volume: 0.5, + }; + + expect(config.data).toBe("AAA="); + expect(config.format).toBe("pcm"); + expect(config.sample_rate).toBe(24000); + expect(config.channels).toBe(2); + expect(config.volume).toBe(0.5); + }); + + test("format accepts both wav and pcm literals", () => { + const wav: LoadingAudioConfig = { data: "AAA=", format: "wav" }; + const pcm: LoadingAudioConfig = { data: "AAA=", format: "pcm" }; + + expect(wav.format).toBe("wav"); + expect(pcm.format).toBe("pcm"); + }); + + test("channels accepts both 1 and 2 literals", () => { + const mono: LoadingAudioConfig = { data: "AAA=", channels: 1 }; + const stereo: LoadingAudioConfig = { data: "AAA=", channels: 2 }; + + expect(mono.channels).toBe(1); + expect(stereo.channels).toBe(2); + }); + + 
test("undefined optionals are dropped when JSON.stringify serializes the config", () => { + const config: LoadingAudioConfig = { data: "AAA=" }; + + const json = JSON.parse(JSON.stringify(config)); + expect(json.data).toBe("AAA="); + expect(Object.prototype.hasOwnProperty.call(json, "format")).toBe(false); + expect(Object.prototype.hasOwnProperty.call(json, "sample_rate")).toBe( + false + ); + expect(Object.prototype.hasOwnProperty.call(json, "channels")).toBe(false); + expect(Object.prototype.hasOwnProperty.call(json, "volume")).toBe(false); + }); + + test("ConfigMessage with loading_audio carries the full nested object", () => { + const msg: ConfigMessage = { + type: "config", + audio: true, + loading_audio: { + data: "AAA=", + format: "wav", + sample_rate: 16000, + channels: 1, + volume: 1.0, + }, + }; + + expect(msg.loading_audio?.data).toBe("AAA="); + expect(msg.loading_audio?.format).toBe("wav"); + expect(msg.loading_audio?.sample_rate).toBe(16000); + expect(msg.loading_audio?.channels).toBe(1); + expect(msg.loading_audio?.volume).toBe(1.0); + }); + + test("ConfigMessage without loading_audio does NOT contain the key after JSON.stringify", () => { + const msg: ConfigMessage = { + type: "config", + audio: true, + }; + + const json = JSON.parse(JSON.stringify(msg)); + expect(json.type).toBe("config"); + expect(json.audio).toBe(true); + expect(Object.prototype.hasOwnProperty.call(json, "loading_audio")).toBe( + false + ); + }); + + test("LoadingStartMessage has fixed loading_start type", () => { + const msg: LoadingStartMessage = { type: "loading_start" }; + expect(msg.type).toBe("loading_start"); + }); + + test("LoadingStopMessage has fixed loading_stop type", () => { + const msg: LoadingStopMessage = { type: "loading_stop" }; + expect(msg.type).toBe("loading_stop"); + }); +}); diff --git a/python-sdk/README.md b/python-sdk/README.md index 5abc224..6e4365b 100644 --- a/python-sdk/README.md +++ b/python-sdk/README.md @@ -64,6 +64,8 @@ if __name__ == "__main__": 
asyncio.run(main()) ``` +The constructor also accepts an optional `loading_audio=LoadingAudioConfig(...)` keyword argument for a server-side "thinking" audio loop on a dedicated LiveKit track; see [Loading Indicator](#loading-indicator) below. + ## API ### REST API Methods @@ -208,7 +210,7 @@ print(f"Total hooks: {len(response.hooks)}") These methods require an active WebSocket connection: -#### `SaynaClient(url, stt_config, tts_config, livekit_config=None, without_audio=False, api_key=None)` +#### `SaynaClient(url, stt_config, tts_config, livekit_config=None, without_audio=False, api_key=None, stream_id=None, loading_audio=None)` Creates a new SaynaClient instance. @@ -220,6 +222,8 @@ Creates a new SaynaClient instance. | `livekit_config` | `LiveKitConfig` (optional) | `None` | Optional LiveKit room configuration. | | `without_audio` | `bool` | `False` | Disable audio streaming. | | `api_key` | `str` (optional) | `None` | API key for authentication. | +| `stream_id` | `str` (optional) | `None` | Session identifier for recording paths; server generates a UUID when omitted. | +| `loading_audio` | `LoadingAudioConfig` (optional) | `None` | Server-side "thinking" audio clip looped on a dedicated LiveKit track when `loading_start()` is called. See [Loading Indicator](#loading-indicator) below. | --- @@ -349,6 +353,28 @@ Flushes the TTS queue by sending an empty speak command. --- +#### `await client.loading_start()` + +Tells the server to begin the seamless playback loop of the configured loading-audio clip on the dedicated `"loading-audio"` LiveKit track. Fire-and-forget: the call returns as soon as the WebSocket frame is sent, without waiting for a server acknowledgement. + +Server-side this is idempotent: calling `loading_start()` while the loop is already running (including during the brief fade-out window of a prior `loading_stop()`) is a no-op. 
+ +Requires audio to be enabled, a LiveKit room to be configured, and a `loading_audio` argument supplied at construction time. The SDK does not pre-check these prerequisites; the server enforces them and reports failures through the `error` channel — see [Loading Indicator](#loading-indicator). + +**Raises**: `SaynaNotConnectedError` if not connected; `SaynaNotReadyError` if the session is not ready; `SaynaConnectionError` if the transport-layer send fails. + +--- + +#### `await client.loading_stop()` + +Tells the server to stop the loading-indicator audio loop with a short server-side fade-out. Fire-and-forget and always silent on the server side: the server never emits an `error` for `loading_stop`, even when no loop is running or no LiveKit room is configured. + +Call this immediately before `speak()` if you do not want the loop to overlap with synthesized speech. See [Loading Indicator](#loading-indicator) for the full call flow. + +**Raises**: `SaynaNotConnectedError` if not connected; `SaynaNotReadyError` if the session is not ready; `SaynaConnectionError` if the transport-layer send fails. + +--- + #### `await client.disconnect()` Disconnects from the WebSocket server and cleans up resources. @@ -366,6 +392,112 @@ Disconnects from the WebSocket server and cleans up resources. --- +### Loading Indicator + +The Sayna server can play a short audio clip as a seamless loop while your application is busy ("thinking"), giving the caller an audible equivalent of a spinner. The loop is published on a dedicated LiveKit audio track named `"loading-audio"` that is independent of the speech track (`"tts-audio"`), so it never interferes with STT or TTS. The SDK does not decode, parse, or play the audio itself; it forwards the configuration to the server, which performs all decoding, validation, volume scaling, and looping. 
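The frames the SDK forwards can be sketched with plain dictionaries. This is a minimal illustration, not the SDK's implementation: `config_frame` is a hypothetical helper, the `config` frame is reduced to the `type`, `audio`, and `loading_audio` keys exercised by the tests in this change, and dropping `None` values stands in for `model_dump(exclude_none=True)`.

```python
import json
from typing import Any, Optional

# Control frames, matching the wire shape asserted in the tests of this change.
LOADING_START = json.dumps({"type": "loading_start"})
LOADING_STOP = json.dumps({"type": "loading_stop"})


def config_frame(loading_audio: Optional[dict[str, Any]] = None) -> str:
    """Serialize a reduced config frame (hypothetical helper, not the SDK's code)."""
    frame = {"type": "config", "audio": True, "loading_audio": loading_audio}
    # Drop unset keys so an absent clip never appears on the wire as null.
    return json.dumps({key: value for key, value in frame.items() if value is not None})


print(config_frame({"data": "AAA=", "format": "wav"}))
print(LOADING_START)
print(LOADING_STOP)
```

A real `config` frame also carries the STT, TTS, and LiveKit settings, which are omitted here.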
+ +The authoritative protocol contract lives in [`../sayna/docs/websocket.md#loading-indicator`](../sayna/docs/websocket.md#loading-indicator); see also [`../sayna/docs/api-reference.md`](../sayna/docs/api-reference.md) and the server change in [`SaynaAI/sayna#18`](https://github.com/SaynaAI/sayna/pull/18). + +#### `LoadingAudioConfig` + +Pydantic model passed once to the constructor via `loading_audio=`. Unknown fields are rejected (`extra="forbid"`). + +```python +from typing import Literal, Optional +from pydantic import BaseModel + +class LoadingAudioConfig(BaseModel): + data: str # required, base64-encoded + format: Optional[Literal["wav", "pcm"]] = None + sample_rate: Optional[int] = None + channels: Optional[int] = None + volume: Optional[float] = None +``` + +| Field | Type | Purpose | +| --- | --- | --- | +| `data` | `str` (required) | Base64-encoded audio bytes (standard alphabet, padded): either a complete WAV file or raw 16-bit little-endian PCM. | +| `format` | `Literal["wav", "pcm"]` (optional) | Container hint. Omit to let the server auto-detect from the RIFF/WAVE signature. Any other value is rejected by the server. | +| `sample_rate` | `int` (optional) | Sample rate in Hz (8000-48000). Required for raw PCM (no header is present); ignored for WAV (the header is authoritative). | +| `channels` | `int` (optional) | Channel count for raw PCM (typically 1 or 2). Defaults to 1 server-side. Ignored for WAV. | +| `volume` | `float` (optional) | Playback volume in `[0.0, 1.0]`. Defaults to 1.0. Out-of-range values are clamped by the server (not rejected). Applied once at config time as amplitude scaling; there is no runtime volume control. | + +The server is authoritative on every audio-content rule (duration, bit depth, byte cap, channel count, sample-rate range). The SDK does not mirror those limits to avoid drift. 
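Raw PCM carries no header, so `sample_rate` has to travel with the data. The following is a minimal sketch under stated assumptions: a synthesized 440 Hz tone stands in for a real clip, `make_pcm_tone` and `pcm_loading_payload` are hypothetical helpers, and a plain dict mirrors the documented `LoadingAudioConfig` field names instead of importing the SDK.

```python
import base64
import math
import struct


def make_pcm_tone(sample_rate: int = 16000, seconds: float = 0.25, freq: float = 440.0) -> bytes:
    """Synthesize mono 16-bit little-endian PCM (hypothetical stand-in for a real clip)."""
    count = int(sample_rate * seconds)
    samples = (
        int(32767 * 0.3 * math.sin(2 * math.pi * freq * i / sample_rate))
        for i in range(count)
    )
    # "<h" packs each sample as a signed 16-bit little-endian integer.
    return b"".join(struct.pack("<h", sample) for sample in samples)


def pcm_loading_payload(pcm: bytes, sample_rate: int, channels: int = 1, volume: float = 1.0) -> dict:
    """Build a dict using the documented LoadingAudioConfig field names (hypothetical helper)."""
    return {
        "data": base64.b64encode(pcm).decode("ascii"),  # standard alphabet, padded
        "format": "pcm",
        "sample_rate": sample_rate,  # required for raw PCM: no header carries it
        "channels": channels,
        "volume": volume,
    }


payload = pcm_loading_payload(make_pcm_tone(), sample_rate=16000)
```

The fields could then be passed as `LoadingAudioConfig(**payload)`; whether a given duration or size is accepted remains the server's decision.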
+ +#### Loading a WAV file and base64-encoding it + +The SDK ships no file-loading helper; encoding lives in your application code: + +```python +import base64 + +with open("loading.wav", "rb") as f: + data = base64.b64encode(f.read()).decode("ascii") +``` + +#### Full call flow + +```python +import asyncio +import base64 +from sayna_client import ( + SaynaClient, + STTConfig, + TTSConfig, + LiveKitConfig, + LoadingAudioConfig, +) + +async def main(): + with open("loading.wav", "rb") as f: + loading_data = base64.b64encode(f.read()).decode("ascii") + + client = SaynaClient( + url="https://api.sayna.ai", + stt_config=STTConfig(provider="deepgram", model="nova-2"), + tts_config=TTSConfig(provider="cartesia", voice_id="example-voice"), + livekit_config=LiveKitConfig(room_name="my-room"), + loading_audio=LoadingAudioConfig(data=loading_data, format="wav"), + api_key="your-api-key", + ) + + client.register_on_error(lambda err: print(f"Server error: {err.message}")) + + await client.connect() + + # User turn complete: start the loop while the application thinks. + await client.loading_start() + + # ...application does its background work (LLM call, database lookup, etc.)... + + # IMPORTANT: stop the loop BEFORE calling speak(). The server and SDK + # deliberately do not auto-stop the loop on speak()/clear() so callers + # control the overlap explicitly. Skipping loading_stop() will play the + # loop and the spoken answer simultaneously on separate LiveKit tracks. + await client.loading_stop() + await client.speak("Here is the answer.") + + await client.disconnect() + +asyncio.run(main()) +``` + +#### Error channel + +Loading-indicator failures arrive on the existing `register_on_error(callback)` channel as plain `ErrorMessage` instances; there is no separate `loading_error` event, no `register_on_loading_error` callback, and no dedicated exception class. 
The following conditions all surface this way: + +- A `LoadingAudioConfig` was supplied but the server failed to decode it (invalid base64, unsupported format, sample rate or channel count out of range, clip too long, byte cap exceeded). The decode happens once at config time, and the session stays alive afterwards. +- `loading_start()` was called but no `loading_audio` was supplied at construction time, audio was disabled (`without_audio=True`), or no `livekit_config` was supplied. +- `loading_start()` was called but the dedicated `"loading-audio"` LiveKit track failed to publish or the LiveKit room is not connected. + +Any application that already registers an error callback observes loading failures automatically. + +#### LiveKit publisher-timeout reconnect + +If the LiveKit room reconnects while the loop is running, the server tears down the loop and clears the audio source. The loop does **not** auto-resume; if you still want loading audio to play, call `loading_start()` again after the reconnect. + +--- + ### Webhook Receiver The `WebhookReceiver` class securely verifies and parses cryptographically signed webhooks from the Sayna SIP service. 
diff --git a/python-sdk/src/sayna_client/__init__.py b/python-sdk/src/sayna_client/__init__.py index 41b823e..7b9951d 100644 --- a/python-sdk/src/sayna_client/__init__.py +++ b/python-sdk/src/sayna_client/__init__.py @@ -27,6 +27,9 @@ LiveKitRoomSummary, LiveKitTokenRequest, LiveKitTokenResponse, + LoadingAudioConfig, + LoadingStartMessage, + LoadingStopMessage, MessageMessage, MuteLiveKitParticipantRequest, MuteLiveKitParticipantResponse, @@ -85,6 +88,9 @@ "LiveKitRoomsResponse", "LiveKitTokenRequest", "LiveKitTokenResponse", + "LoadingAudioConfig", + "LoadingStartMessage", + "LoadingStopMessage", "MessageMessage", "MuteLiveKitParticipantRequest", "MuteLiveKitParticipantResponse", diff --git a/python-sdk/src/sayna_client/client.py b/python-sdk/src/sayna_client/client.py index e71cd3a..3e3d43c 100644 --- a/python-sdk/src/sayna_client/client.py +++ b/python-sdk/src/sayna_client/client.py @@ -31,6 +31,9 @@ LiveKitRoomsResponse, LiveKitTokenRequest, LiveKitTokenResponse, + LoadingAudioConfig, + LoadingStartMessage, + LoadingStopMessage, MessageMessage, MuteLiveKitParticipantRequest, MuteLiveKitParticipantResponse, @@ -105,6 +108,7 @@ def __init__( without_audio: bool = False, api_key: Optional[str] = None, stream_id: Optional[str] = None, + loading_audio: Optional[LoadingAudioConfig] = None, ) -> None: """Initialize the Sayna client. @@ -116,9 +120,17 @@ def __init__( without_audio: If True, disables audio streaming (sends audio=False for control-only sessions) api_key: Optional API key for authentication (defaults to SAYNA_API_KEY env) stream_id: Optional session identifier for recording paths; server generates a UUID when omitted + loading_audio: Optional loading-indicator clip sent inside the initial ``config`` frame + on :meth:`connect`. The server decodes it once at config time; decode failures + arrive asynchronously on the registered ``on_error`` callback and the session + stays alive. 
Only effective when audio is enabled (``without_audio=False``) and a + ``livekit_config`` is supplied. See ``../sayna/docs/websocket.md`` (Loading + Indicator section) for the full protocol contract. Raises: - SaynaValidationError: If URL is invalid or if audio configs are missing when audio is enabled + SaynaValidationError: If URL is invalid, if audio configs are missing when audio is + enabled, or if ``loading_audio`` is not a :class:`LoadingAudioConfig` instance or + carries an empty ``data`` field. """ # Validate URL if not url or not isinstance(url, str): @@ -137,10 +149,17 @@ def __init__( ) raise SaynaValidationError(msg) + # Validate loading_audio: strict instance check + non-empty data guard. + # Deeper rules (format, sample-rate range, channel count, duration, bit depth, byte cap) + # are enforced server-side; mirroring them here would risk drift and force SDK releases + # whenever the server widens or narrows a range. + self._validate_loading_audio(loading_audio) + self.url = url self.stt_config = stt_config self.tts_config = tts_config self.livekit_config = livekit_config + self.loading_audio: Optional[LoadingAudioConfig] = loading_audio self.without_audio = without_audio self.audio_enabled = audio_enabled self.api_key = api_key or os.environ.get("SAYNA_API_KEY") @@ -840,6 +859,7 @@ async def connect(self) -> None: stt_config=resolve_config_auth(self.stt_config), tts_config=resolve_config_auth(self.tts_config), livekit=self.livekit_config, + loading_audio=self.loading_audio, ) await self._send_json(config.model_dump(exclude_none=True)) @@ -960,6 +980,69 @@ async def clear(self) -> None: message = ClearMessage() await self._send_json(message.model_dump(exclude_none=True)) + async def loading_start(self) -> None: + """Start the loading-indicator audio loop on a dedicated LiveKit track. + + Fire-and-forget: there is no return value, success is silent on the wire, and the + method does not await a server acknowledgement. 
Any asynchronous server failure + (no clip configured, audio disabled, no LiveKit room, decode failure at config time, + track failed to publish, LiveKit not connected) arrives later as a standard ``error`` + message and is dispatched to the callback registered via :meth:`register_on_error`. + + Idempotent server-side: calling twice while the loop is running -- including during the + brief fade-out window of a prior :meth:`loading_stop` -- is a no-op. + + Requires audio to be enabled, a LiveKit room to be configured, and a + ``loading_audio`` argument supplied at construction time. The SDK does not pre-check + these prerequisites; the server enforces them and reports failures through the + ``error`` channel. + + :meth:`speak` and :meth:`clear` do **not** stop the loop. Callers that do not want + overlap with synthesized speech must call :meth:`loading_stop` before :meth:`speak`. + + Raises: + SaynaNotConnectedError: If not connected. + SaynaNotReadyError: If not ready. + SaynaConnectionError: If sending the frame fails at the transport layer. + """ + self._check_ready() + message = LoadingStartMessage() + try: + await self._send_json(message.model_dump(exclude_none=True)) + except (SaynaNotConnectedError, SaynaNotReadyError): + raise + except Exception as e: + logger.exception("Failed to send loading_start message: %s", e) + msg = "Failed to send loading_start message" + raise SaynaConnectionError(msg, cause=e) from e + + async def loading_stop(self) -> None: + """Stop the loading-indicator audio loop with a short server-side fade-out. + + Fire-and-forget and always silent server-side: the server never returns an ``error`` + for ``loading_stop``, even when no loop is running or no LiveKit room is configured. 
+ + Calling :meth:`loading_stop` while the client is not connected still raises + :class:`~sayna_client.errors.SaynaNotConnectedError`, consistent with :meth:`clear`, + so cleanup invoked on a disposed client is visible to the application instead of + being silently swallowed. + + Raises: + SaynaNotConnectedError: If not connected. + SaynaNotReadyError: If not ready. + SaynaConnectionError: If sending the frame fails at the transport layer. + """ + self._check_ready() + message = LoadingStopMessage() + try: + await self._send_json(message.model_dump(exclude_none=True)) + except (SaynaNotConnectedError, SaynaNotReadyError): + raise + except Exception as e: + logger.exception("Failed to send loading_stop message: %s", e) + msg = "Failed to send loading_stop message" + raise SaynaConnectionError(msg, cause=e) from e + async def tts_flush(self, allow_interruption: bool = True) -> None: """Flush the TTS queue by sending an empty speak command. @@ -1156,6 +1239,26 @@ def register_on_audio(self, callback: Callable[[bytes], Any]) -> None: # Internal Methods # ============================================================================ + @staticmethod + def _validate_loading_audio(loading_audio: Optional[LoadingAudioConfig]) -> None: + """Validate the constructor's ``loading_audio`` argument. + + Pydantic enforces shape (``extra="forbid"``, ``Literal["wav", "pcm"]``) when the user + builds the model, so this guard focuses on the two checks Pydantic does not perform: the + argument must be a :class:`LoadingAudioConfig` (failing the deep checks early instead of + inside :meth:`connect`), and ``data`` must be a non-empty string. The server is + authoritative on every audio-content rule (duration, bit depth, byte cap, channel count, + sample-rate range); mirroring those here would invite drift. 
+ """ + if loading_audio is None: + return + if not isinstance(loading_audio, LoadingAudioConfig): + msg = "loading_audio must be a LoadingAudioConfig instance" + raise SaynaValidationError(msg) + if not loading_audio.data: + msg = "loading_audio.data must be a non-empty base64 string" + raise SaynaValidationError(msg) + def _check_connected(self) -> None: """Check if connected, raise error if not.""" if not self._connected: diff --git a/python-sdk/src/sayna_client/types.py b/python-sdk/src/sayna_client/types.py index 495156d..f469680 100644 --- a/python-sdk/src/sayna_client/types.py +++ b/python-sdk/src/sayna_client/types.py @@ -161,6 +161,63 @@ class LiveKitConfig(BaseModel): ) +class LoadingAudioConfig(BaseModel): + """Loading-indicator audio clip uploaded once at config time. + + The server decodes and validates the clip when the WebSocket session is configured + and loops it on a dedicated LiveKit audio track when ``loading_start`` is sent. + The SDK does not decode, parse, or validate the audio content; the server is + authoritative on format, duration, bit depth, and channel-count limits. + + See ``../sayna/docs/websocket.md`` ("Loading Indicator" section) and + ``../sayna/docs/api-reference.md`` for the protocol contract. + """ + + model_config = ConfigDict(extra="forbid") + + data: str = Field( + ..., + description=( + "Base64-encoded audio bytes (standard alphabet, padded): either a complete WAV file " + "or raw 16-bit little-endian PCM. Encode in user code, e.g. " + 'base64.b64encode(open(path, "rb").read()).decode("ascii"). The SDK does not read ' + "files or decode audio. See the Loading Indicator section of " + "../sayna/docs/websocket.md for the authoritative format rules." + ), + ) + format: Optional[Literal["wav", "pcm"]] = Field( + default=None, + description=( + "Audio container hint. Omit to let the server auto-detect from the RIFF/WAVE " + "signature. 
Only 'wav' or 'pcm' are accepted; any other value is rejected by the " + "server with an error message." + ), + ) + sample_rate: Optional[int] = Field( + default=None, + description=( + "Sample rate in Hz (8000-48000). Required for raw PCM (no header is present); " + "ignored for WAV (the header is authoritative). The server is authoritative on the " + "accepted range." + ), + ) + channels: Optional[int] = Field( + default=None, + description=( + "Channel count for raw PCM. Current values are 1 (mono) or 2 (stereo); defaults to 1 " + "server-side. Ignored for WAV. The server is authoritative on accepted values." + ), + ) + volume: Optional[float] = Field( + default=None, + description=( + "Playback volume in [0.0, 1.0]. Out-of-range values are clamped by the server " + "(not rejected). Applied once at config time as amplitude scaling; there is no " + "runtime volume control." + ), + ) + + # ============================================================================ # Outgoing Messages (Client -> Server) # ============================================================================ @@ -184,6 +241,14 @@ class ConfigMessage(BaseModel): livekit: Optional[LiveKitConfig] = Field( default=None, description="Optional LiveKit room configuration" ) + loading_audio: Optional[LoadingAudioConfig] = Field( + default=None, + description=( + "Optional loading-indicator clip; the server decodes once at config time. " + "Processed only when audio=True and a livekit config is present. Decode failure is " + "non-fatal -- the server emits a single error message and continues." 
+ ), + ) class SpeakMessage(BaseModel): @@ -228,6 +293,18 @@ class SipTransferMessage(BaseModel): ) +class LoadingStartMessage(BaseModel): + """Message to start the loading-indicator audio loop on the dedicated LiveKit track.""" + + type: Literal["loading_start"] = "loading_start" + + +class LoadingStopMessage(BaseModel): + """Message to stop the loading-indicator audio loop with a short server-side fade-out.""" + + type: Literal["loading_stop"] = "loading_stop" + + # ============================================================================ # Incoming Messages (Server -> Client) # ============================================================================ diff --git a/python-sdk/tests/test_client.py b/python-sdk/tests/test_client.py index 38b4e94..37236fb 100644 --- a/python-sdk/tests/test_client.py +++ b/python-sdk/tests/test_client.py @@ -1,13 +1,22 @@ """Tests for SaynaClient class.""" import logging -from typing import Any +from typing import Any, Optional +from unittest.mock import AsyncMock, MagicMock, patch +import aiohttp import pytest +from pydantic import ValidationError from sayna_client import ( + ErrorMessage, + LiveKitConfig, + LoadingAudioConfig, ParticipantConnectedMessage, SaynaClient, + SaynaConnectionError, + SaynaNotConnectedError, + SaynaNotReadyError, SaynaValidationError, SipTransferErrorMessage, STTConfig, @@ -645,11 +654,338 @@ async def test_sip_transfer_rest_validates_whitespace_transfer_to(self) -> None: await client.sip_transfer_rest("call-room-123", "sip_participant_456", " ") +class _EmptyAsyncIterator: + """Async iterator that yields nothing, used to keep the WebSocket receive loop quiet in tests.""" + + def __aiter__(self) -> "_EmptyAsyncIterator": + return self + + async def __anext__(self) -> Any: + raise StopAsyncIteration + + +async def _capture_connect_config_frame( + *, + loading_audio: Optional[LoadingAudioConfig], +) -> dict[str, Any]: + """Drive ``SaynaClient.connect()`` against a mocked aiohttp stack and capture the 
config frame.
+
+    Returns the first JSON payload sent to the WebSocket (the ``config`` message). The mocked
+    WebSocket exposes an immediately-exhausted async iterator so the receive loop completes
+    without performing any real I/O.
+    """
+    sent_frames: list[dict[str, Any]] = []
+
+    async def capture(data: dict[str, Any]) -> None:
+        sent_frames.append(data)
+
+    mock_ws = MagicMock(spec=aiohttp.ClientWebSocketResponse)
+    mock_ws.closed = False
+    mock_ws.send_json = AsyncMock(side_effect=capture)
+    mock_ws.close = AsyncMock()
+    mock_ws.__aiter__ = lambda _self: _EmptyAsyncIterator()
+
+    mock_session = MagicMock(spec=aiohttp.ClientSession)
+    mock_session.closed = False
+    mock_session.ws_connect = AsyncMock(return_value=mock_ws)
+    mock_session.close = AsyncMock()
+
+    with patch("sayna_client.client.aiohttp.ClientSession", return_value=mock_session):
+        client = SaynaClient(
+            url="https://api.example.com",
+            stt_config=_get_test_stt_config(),
+            tts_config=_get_test_tts_config(),
+            livekit_config=LiveKitConfig(room_name="test-room"),
+            loading_audio=loading_audio,
+        )
+        await client.connect()
+        try:
+            assert sent_frames, "connect() did not send a config frame"
+            return sent_frames[0]
+        finally:
+            await client.disconnect()
+
+
+class TestLoadingAudioConstructor:
+    """Tests for the constructor's ``loading_audio`` argument and config-payload wiring."""
+
+    def test_valid_loading_audio_config_accepted(self) -> None:
+        """A LoadingAudioConfig with non-empty data must construct without raising."""
+        client = SaynaClient(
+            url="https://api.example.com",
+            stt_config=_get_test_stt_config(),
+            tts_config=_get_test_tts_config(),
+            loading_audio=LoadingAudioConfig(data="abc"),
+        )
+        assert isinstance(client.loading_audio, LoadingAudioConfig)
+        assert client.loading_audio.data == "abc"
+
+    def test_empty_data_rejected_by_constructor(self) -> None:
+        """An empty data string must fail in the constructor with a clear message."""
+        with pytest.raises(SaynaValidationError, match=r"loading_audio\.data"):
+            SaynaClient(
+                url="https://api.example.com",
+                stt_config=_get_test_stt_config(),
+                tts_config=_get_test_tts_config(),
+                loading_audio=LoadingAudioConfig(data=""),
+            )
+
+    def test_pydantic_rejects_non_literal_format_before_client(self) -> None:
+        """Pydantic rejects an invalid format literal at model-build, before SaynaClient runs."""
+        with pytest.raises(ValidationError):
+            LoadingAudioConfig(data="abc", format="mp3")  # type: ignore[arg-type]
+
+    def test_raw_dict_rejected_by_instance_guard(self) -> None:
+        """A raw dict must trip the strict isinstance guard with the documented message."""
+        with pytest.raises(
+            SaynaValidationError,
+            match="loading_audio must be a LoadingAudioConfig instance",
+        ):
+            SaynaClient(
+                url="https://api.example.com",
+                stt_config=_get_test_stt_config(),
+                tts_config=_get_test_tts_config(),
+                loading_audio={"data": "abc"},  # type: ignore[arg-type]
+            )
+
+    @pytest.mark.asyncio
+    async def test_no_loading_audio_omitted_from_connect_frame(self) -> None:
+        """connect() must not include loading_audio in its config frame when unset."""
+        frame = await _capture_connect_config_frame(loading_audio=None)
+        assert frame["type"] == "config"
+        assert "loading_audio" not in frame
+
+    @pytest.mark.asyncio
+    async def test_loading_audio_included_in_connect_frame(self) -> None:
+        """connect() must include the full loading_audio block when supplied."""
+        loading = LoadingAudioConfig(
+            data="QkFTRTY0",
+            format="wav",
+            sample_rate=24000,
+            channels=2,
+            volume=0.6,
+        )
+        frame = await _capture_connect_config_frame(loading_audio=loading)
+        assert frame["type"] == "config"
+        assert frame["loading_audio"] == {
+            "data": "QkFTRTY0",
+            "format": "wav",
+            "sample_rate": 24000,
+            "channels": 2,
+            "volume": 0.6,
+        }
+
+
+def _ready_client_with_capture() -> tuple[SaynaClient, list[dict[str, Any]]]:
+    """Build a connected+ready client whose _send_json appends payloads to a list."""
+    client = SaynaClient(
+        url="https://api.example.com",
+        stt_config=_get_test_stt_config(),
+        tts_config=_get_test_tts_config(),
+    )
+    client._connected = True
+    client._ready = True
+
+    sent: list[dict[str, Any]] = []
+
+    async def fake_send_json(data: dict[str, Any]) -> None:
+        sent.append(data)
+
+    client._send_json = fake_send_json  # type: ignore[assignment]
+    return client, sent
+
+
+class TestLoadingStart:
+    """Tests for the loading_start WebSocket command."""
+
+    @pytest.mark.asyncio
+    async def test_loading_start_requires_connection(self) -> None:
+        """Calling loading_start before connect raises SaynaNotConnectedError."""
+        client = SaynaClient(
+            url="https://api.example.com",
+            stt_config=_get_test_stt_config(),
+            tts_config=_get_test_tts_config(),
+        )
+        with pytest.raises(SaynaNotConnectedError):
+            await client.loading_start()
+
+    @pytest.mark.asyncio
+    async def test_loading_start_requires_ready(self) -> None:
+        """Calling loading_start after connect but before ready raises SaynaNotReadyError."""
+        client = SaynaClient(
+            url="https://api.example.com",
+            stt_config=_get_test_stt_config(),
+            tts_config=_get_test_tts_config(),
+        )
+        client._connected = True
+        with pytest.raises(SaynaNotReadyError):
+            await client.loading_start()
+
+    @pytest.mark.asyncio
+    async def test_loading_start_sends_single_payload(self) -> None:
+        """After ready, loading_start writes exactly one frame with the wire shape."""
+        client, sent = _ready_client_with_capture()
+        await client.loading_start()
+        assert sent == [{"type": "loading_start"}]
+
+    @pytest.mark.asyncio
+    async def test_loading_start_wraps_send_failure(self) -> None:
+        """A transport-level aiohttp.ClientError is wrapped as SaynaConnectionError."""
+        client = SaynaClient(
+            url="https://api.example.com",
+            stt_config=_get_test_stt_config(),
+            tts_config=_get_test_tts_config(),
+        )
+        client._connected = True
+        client._ready = True
+
+        underlying = aiohttp.ClientError("socket broke")
+
+        async def failing_send_json(_data: dict[str, Any]) -> None:
+            raise underlying
+
+        client._send_json = failing_send_json  # type: ignore[assignment]
+
+        with pytest.raises(SaynaConnectionError) as exc_info:
+            await client.loading_start()
+
+        assert "Failed to send loading_start message" in str(exc_info.value)
+        assert exc_info.value.cause is underlying
+
+
+class TestLoadingStop:
+    """Tests for the loading_stop WebSocket command."""
+
+    @pytest.mark.asyncio
+    async def test_loading_stop_requires_connection(self) -> None:
+        """Calling loading_stop before connect raises SaynaNotConnectedError."""
+        client = SaynaClient(
+            url="https://api.example.com",
+            stt_config=_get_test_stt_config(),
+            tts_config=_get_test_tts_config(),
+        )
+        with pytest.raises(SaynaNotConnectedError):
+            await client.loading_stop()
+
+    @pytest.mark.asyncio
+    async def test_loading_stop_requires_ready(self) -> None:
+        """Calling loading_stop after connect but before ready raises SaynaNotReadyError."""
+        client = SaynaClient(
+            url="https://api.example.com",
+            stt_config=_get_test_stt_config(),
+            tts_config=_get_test_tts_config(),
+        )
+        client._connected = True
+        with pytest.raises(SaynaNotReadyError):
+            await client.loading_stop()
+
+    @pytest.mark.asyncio
+    async def test_loading_stop_sends_single_payload(self) -> None:
+        """After ready, loading_stop writes exactly one frame with the wire shape."""
+        client, sent = _ready_client_with_capture()
+        await client.loading_stop()
+        assert sent == [{"type": "loading_stop"}]
+
+    @pytest.mark.asyncio
+    async def test_loading_stop_wraps_send_failure(self) -> None:
+        """A transport-level aiohttp.ClientError is wrapped as SaynaConnectionError."""
+        client = SaynaClient(
+            url="https://api.example.com",
+            stt_config=_get_test_stt_config(),
+            tts_config=_get_test_tts_config(),
+        )
+        client._connected = True
+        client._ready = True
+
+        underlying = aiohttp.ClientError("socket broke")
+
+        async def failing_send_json(_data: dict[str, Any]) -> None:
+            raise underlying
+
+        client._send_json = failing_send_json  # type: ignore[assignment]
+
+        with pytest.raises(SaynaConnectionError) as exc_info:
+            await client.loading_stop()
+
+        assert "Failed to send loading_stop message" in str(exc_info.value)
+        assert exc_info.value.cause is underlying
+
+
+class TestLoadingErrorPropagation:
+    """Tests that loading-indicator errors reuse the existing error channel."""
+
+    @pytest.mark.asyncio
+    async def test_loading_decode_error_invokes_on_error(self) -> None:
+        """A server error frame for a loading-decode failure reaches the on_error callback."""
+        client = SaynaClient(
+            url="https://api.example.com",
+            stt_config=_get_test_stt_config(),
+            tts_config=_get_test_tts_config(),
+        )
+        received: list[ErrorMessage] = []
+
+        def on_error(message: ErrorMessage) -> None:
+            received.append(message)
+
+        client.register_on_error(on_error)
+
+        await client._handle_text_message(
+            '{"type": "error", "message": "loading_audio.data is not valid base64"}'
+        )
+
+        assert len(received) == 1
+        assert received[0].type == "error"
+        assert received[0].message == "loading_audio.data is not valid base64"
+
+    @pytest.mark.asyncio
+    async def test_async_on_error_callback_is_awaited(self) -> None:
+        """An ``async def`` on_error must be awaited (not just called) for loading error frames."""
+        client = SaynaClient(
+            url="https://api.example.com",
+            stt_config=_get_test_stt_config(),
+            tts_config=_get_test_tts_config(),
+        )
+        awaited_with: list[ErrorMessage] = []
+
+        async def on_error(message: ErrorMessage) -> None:
+            # If the callback is only *called*, the coroutine never reaches this line.
+            awaited_with.append(message)
+
+        client.register_on_error(on_error)
+
+        await client._handle_text_message(
+            '{"type": "error", "message": "loading_audio.data is not valid base64"}'
+        )
+
+        assert len(awaited_with) == 1
+        assert awaited_with[0].message == "loading_audio.data is not valid base64"
+
+
+class TestSpeakAndClearDoNotStopLoadingLoop:
+    """speak() and clear() must remain single-frame; they never emit loading_stop."""
+
+    @pytest.mark.asyncio
+    async def test_speak_emits_only_speak_frame(self) -> None:
+        """Calling speak after ready writes a single speak frame and nothing else."""
+        client, sent = _ready_client_with_capture()
+        await client.speak("hi")
+        assert len(sent) == 1
+        assert sent[0]["type"] == "speak"
+        assert sent[0]["text"] == "hi"
+        assert all(frame["type"] != "loading_stop" for frame in sent)
+
+    @pytest.mark.asyncio
+    async def test_clear_emits_only_clear_frame(self) -> None:
+        """Calling clear after ready writes a single clear frame and nothing else."""
+        client, sent = _ready_client_with_capture()
+        await client.clear()
+        assert sent == [{"type": "clear"}]
+
+
 # TODO: Add integration tests with mock WebSocket server:
-# - Test WebSocket connection with valid config
-# - Test WebSocket message sending (speak, clear, tts_flush, send_message, on_audio_input)
-# - Test message receiving (ready, stt_result, error, etc.)
+# - Test WebSocket message sending (tts_flush, send_message, on_audio_input)
+# - Test message receiving (ready, stt_result, etc.)
 # - Test event callbacks (register_on_tts_audio, register_on_stt_result, etc.)
-# - Test error handling and reconnection
+# - Test reconnection
 # - Test proper cleanup on disconnect
 # - Test REST API methods (health, get_voices, speak_rest, get_livekit_token)
diff --git a/python-sdk/tests/test_types.py b/python-sdk/tests/test_types.py
index 484a4e5..dd09549 100644
--- a/python-sdk/tests/test_types.py
+++ b/python-sdk/tests/test_types.py
@@ -15,6 +15,9 @@
     LiveKitRoomDetails,
     LiveKitRoomsResponse,
     LiveKitRoomSummary,
+    LoadingAudioConfig,
+    LoadingStartMessage,
+    LoadingStopMessage,
     MuteLiveKitParticipantRequest,
     MuteLiveKitParticipantResponse,
     ParticipantConnectedMessage,
@@ -1069,3 +1072,146 @@ def test_tts_config_from_dict_with_google_auth(self) -> None:
         config = TTSConfig(**data)
         assert isinstance(config.auth, GoogleAuth)
         assert config.auth.credentials == "/path/to/creds.json"
+
+
+class TestLoadingAudioConfig:
+    """Tests for LoadingAudioConfig and its integration with ConfigMessage."""
+
+    def test_loading_audio_full_round_trip(self) -> None:
+        """All fields populate as supplied and round-trip through model_dump."""
+        config = LoadingAudioConfig(
+            data="QkFTRTY0",
+            format="wav",
+            sample_rate=24000,
+            channels=2,
+            volume=0.75,
+        )
+        assert config.data == "QkFTRTY0"
+        assert config.format == "wav"
+        assert config.sample_rate == 24000
+        assert config.channels == 2
+        assert config.volume == 0.75
+
+        dump = config.model_dump(exclude_none=True)
+        assert dump == {
+            "data": "QkFTRTY0",
+            "format": "wav",
+            "sample_rate": 24000,
+            "channels": 2,
+            "volume": 0.75,
+        }
+
+        # Round-trip through model_validate to confirm the wire shape is parseable.
+        restored = LoadingAudioConfig.model_validate(dump)
+        assert restored == config
+
+    def test_loading_audio_minimal_round_trip(self) -> None:
+        """Only data is required; exclude_none drops every other field."""
+        config = LoadingAudioConfig(data="QkFTRTY0")
+        assert config.data == "QkFTRTY0"
+        assert config.format is None
+        assert config.sample_rate is None
+        assert config.channels is None
+        assert config.volume is None
+
+        dump = config.model_dump(exclude_none=True)
+        assert dump == {"data": "QkFTRTY0"}
+
+    def test_loading_audio_rejects_non_literal_format(self) -> None:
+        """format only accepts the literal 'wav' or 'pcm'."""
+        with pytest.raises(ValidationError):
+            LoadingAudioConfig(data="QkFTRTY0", format="mp3")  # type: ignore[arg-type]
+
+    def test_loading_audio_rejects_extra_field(self) -> None:
+        """extra='forbid' guards against typos like 'volums' silently passing through."""
+        with pytest.raises(ValidationError):
+            LoadingAudioConfig(data="QkFTRTY0", volums=0.5)  # type: ignore[call-arg]
+
+    def test_loading_audio_embedded_in_config_message(self) -> None:
+        """ConfigMessage.model_dump(exclude_none=True) carries loading_audio when supplied."""
+        stt = STTConfig(
+            provider="deepgram",
+            language="en-US",
+            sample_rate=16000,
+            channels=1,
+            punctuation=True,
+            encoding="linear16",
+            model="nova-2",
+        )
+        tts = TTSConfig(provider="deepgram", model="aura-asteria-en")
+        msg = ConfigMessage(
+            audio=True,
+            stt_config=stt,
+            tts_config=tts,
+            livekit=LiveKitConfig(room_name="test-room"),
+            loading_audio=LoadingAudioConfig(
+                data="QkFTRTY0",
+                format="pcm",
+                sample_rate=16000,
+                channels=1,
+                volume=0.5,
+            ),
+        )
+        dump = msg.model_dump(exclude_none=True)
+        assert "loading_audio" in dump
+        assert dump["loading_audio"] == {
+            "data": "QkFTRTY0",
+            "format": "pcm",
+            "sample_rate": 16000,
+            "channels": 1,
+            "volume": 0.5,
+        }
+
+    def test_loading_audio_omitted_from_config_message(self) -> None:
+        """Existing clients omit loading_audio and the wire payload is unchanged."""
+        stt = STTConfig(
+            provider="deepgram",
+            language="en-US",
+            sample_rate=16000,
+            channels=1,
+            punctuation=True,
+            encoding="linear16",
+            model="nova-2",
+        )
+        tts = TTSConfig(provider="deepgram", model="aura-asteria-en")
+        msg = ConfigMessage(
+            audio=True,
+            stt_config=stt,
+            tts_config=tts,
+        )
+        dump = msg.model_dump(exclude_none=True)
+        assert "loading_audio" not in dump
+
+
+class TestLoadingStartMessage:
+    """Tests for the LoadingStartMessage wire-format model."""
+
+    def test_loading_start_model_dump(self) -> None:
+        """model_dump produces exactly the wire payload."""
+        assert LoadingStartMessage().model_dump() == {"type": "loading_start"}
+
+    def test_loading_start_model_dump_json(self) -> None:
+        """model_dump_json matches the wire payload."""
+        assert LoadingStartMessage().model_dump_json() == '{"type":"loading_start"}'
+
+    def test_loading_start_type_defaults_to_literal(self) -> None:
+        """No-arg construction populates the literal type field."""
+        msg = LoadingStartMessage()
+        assert msg.type == "loading_start"
+
+
+class TestLoadingStopMessage:
+    """Tests for the LoadingStopMessage wire-format model."""
+
+    def test_loading_stop_model_dump(self) -> None:
+        """model_dump produces exactly the wire payload."""
+        assert LoadingStopMessage().model_dump() == {"type": "loading_stop"}
+
+    def test_loading_stop_model_dump_json(self) -> None:
+        """model_dump_json matches the wire payload."""
+        assert LoadingStopMessage().model_dump_json() == '{"type":"loading_stop"}'
+
+    def test_loading_stop_type_defaults_to_literal(self) -> None:
+        """No-arg construction populates the literal type field."""
+        msg = LoadingStopMessage()
+        assert msg.type == "loading_stop"
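
Reviewer note: the tests above use the placeholder `"QkFTRTY0"` (base64 of the ASCII string `BASE64`) for `data`. A minimal sketch of how a real `loading_audio` block with the same wire shape might be built from raw 16-bit little-endian PCM follows. It uses only the standard library and does not import `sayna_client`; `make_pcm_tone` and `build_loading_audio_payload` are hypothetical helper names, and the local volume clamp is an assumption that mirrors the documented `[0.0, 1.0]` range rather than SDK behaviour.

```python
import base64
import math
import struct
from typing import Any


def make_pcm_tone(
    sample_rate: int = 16000, duration_s: float = 0.25, freq_hz: float = 440.0
) -> bytes:
    """Return mono 16-bit little-endian PCM samples for a short sine tone."""
    n_samples = int(sample_rate * duration_s)
    samples = (
        int(32767 * 0.5 * math.sin(2 * math.pi * freq_hz * i / sample_rate))
        for i in range(n_samples)
    )
    # "<h" packs each sample as a signed 16-bit little-endian integer.
    return b"".join(struct.pack("<h", sample) for sample in samples)


def build_loading_audio_payload(
    pcm: bytes, sample_rate: int, channels: int = 1, volume: float = 0.5
) -> dict[str, Any]:
    """Hypothetical helper: shape raw PCM into the loading_audio wire block."""
    if not pcm:
        # Assumption: reject empty clips locally, as the constructor test expects.
        raise ValueError("loading_audio.data must not be empty")
    return {
        "data": base64.b64encode(pcm).decode("ascii"),
        "format": "pcm",
        "sample_rate": sample_rate,  # documented as required for raw PCM
        "channels": channels,
        "volume": max(0.0, min(1.0, volume)),  # assumed local clamp to [0.0, 1.0]
    }


payload = build_loading_audio_payload(make_pcm_tone(), sample_rate=16000)
loud = build_loading_audio_payload(make_pcm_tone(), sample_rate=16000, volume=1.7)
```

Under these assumptions the resulting dict could be unpacked into the model as `LoadingAudioConfig(**payload)`, since its keys match the fields asserted in `test_loading_audio_embedded_in_config_message`.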