5 changes: 5 additions & 0 deletions .cursor/rules/node-sdk.mdc
@@ -62,6 +62,11 @@ Unknown or malformed websocket control messages should be logged and ignored so
- `onAudioInput(audioData)` - Send audio for STT
- `sendMessage(message, role, topic?, debug?)` - Send LiveKit data message
- `sipTransfer(transferTo)` - Initiate SIP call transfer
- `loadingStart()` - Begin server-side seamless playback loop of the configured loading audio clip on a dedicated LiveKit track; fire-and-forget; errors surface via `registerOnError`
- `loadingStop()` - Stop the loading-audio loop with a short server-side fade-out; never reports a server-side error

### Loading Indicator
A `LoadingAudioConfig` passed as the constructor's 8th positional argument (or the `saynaConnect()` 7th argument) registers a base64-encoded WAV or raw 16-bit little-endian PCM clip that the server loops on a dedicated `"loading-audio"` LiveKit track while `loadingStart()` is active. The clip is decoded once at config time; the SDK does no audio decoding or file IO. The application must call `loadingStop()` before `speak()` to avoid overlap — neither the SDK nor the server auto-stop the loop on speech. See the Loading Indicator section of `node-sdk/README.md` and `../sayna/docs/websocket.md#loading-indicator`.

## Documentation Reference

3 changes: 3 additions & 0 deletions .cursor/rules/python-sdk.mdc
@@ -90,6 +90,8 @@ Unknown or malformed websocket control messages should be logged and ignored so
- `on_audio_input(audio_data)` - Send audio bytes for STT
- `send_message(message, role, topic=None, debug=None)`
- `tts_flush(allow_interruption=True)`
- `loading_start()` - Async fire-and-forget; tells the server to begin the seamless playback loop of the configured loading-audio clip on a dedicated `"loading-audio"` LiveKit track. Idempotent server-side; failures (no clip configured, audio disabled, no LiveKit, decode failure, track publish failure) surface via `register_on_error`.
- `loading_stop()` - Async fire-and-forget; tells the server to stop the loading-audio loop with a short fade-out. Always silent server-side (no `error` is emitted even if no loop is running).

### Client Properties
- `ready` - Boolean, connection ready state
@@ -105,6 +107,7 @@ Key models in `types.py`:
- `STTConfig` - Speech-to-text configuration
- `TTSConfig` - Text-to-speech configuration
- `LiveKitConfig` - LiveKit room configuration
- `LoadingAudioConfig` - Loading-indicator audio clip uploaded once at config time via the `loading_audio=` constructor kwarg. Fields: `data` (base64, required), `format: Literal["wav", "pcm"]` (optional), `sample_rate` (optional, required for raw PCM), `channels` (optional), `volume` (optional, clamped to `[0.0, 1.0]`). `extra="forbid"` rejects unknown fields. The server decodes once at config time; decode failures arrive on `register_on_error`.
- `STTResult` - Transcription result
- `VoiceDescriptor` - TTS voice info
- `SipHook` - SIP webhook entry
1 change: 1 addition & 0 deletions .gitignore
@@ -6,4 +6,5 @@ rust-ffi-migration-plan.md
results/
target/
.mcgravity/
*-prd.md
ws-test.ts
92 changes: 84 additions & 8 deletions node-sdk/README.md
@@ -39,6 +39,8 @@ await client.connect();
await client.speak("Hello, world!");
```

The constructor also accepts an 8th positional argument `loadingAudio?: LoadingAudioConfig` for a server-side "thinking" audio loop on a dedicated LiveKit track; see [Loading Indicator](#loading-indicator) below.

## API

### REST API Methods
@@ -210,15 +212,18 @@ try {

These methods require an active WebSocket connection:

-### `new SaynaClient(url, sttConfig, ttsConfig, livekitConfig?, withoutAudio?)`
+### `new SaynaClient(url, sttConfig, ttsConfig, livekitConfig?, withoutAudio?, apiKey?, streamId?, loadingAudio?)`

-| parameter | type | purpose |
-| --------------- | --------------- | ------------------------------------------------------- |
-| `url` | `string` | Sayna server URL (http://, https://, ws://, or wss://). |
-| `sttConfig` | `STTConfig` | Speech-to-text provider configuration. |
-| `ttsConfig` | `TTSConfig` | Text-to-speech provider configuration. |
-| `livekitConfig` | `LiveKitConfig` | Optional LiveKit room configuration. |
-| `withoutAudio` | `boolean` | Disable audio streaming (defaults to `false`). |
+| parameter | type | purpose |
+| --------------- | -------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `url` | `string` | Sayna server URL (http://, https://, ws://, or wss://). |
+| `sttConfig` | `STTConfig` | Speech-to-text provider configuration. |
+| `ttsConfig` | `TTSConfig` | Text-to-speech provider configuration. |
+| `livekitConfig` | `LiveKitConfig` | Optional LiveKit room configuration. |
+| `withoutAudio` | `boolean` | Disable audio streaming (defaults to `false`). |
+| `apiKey` | `string` | Optional API key for HTTP and WebSocket auth (defaults to `SAYNA_API_KEY` env). |
+| `streamId` | `string` | Optional session identifier for recording paths; server generates a UUID when omitted. |
+| `loadingAudio` | `LoadingAudioConfig` | Optional "thinking" audio clip sent in the initial `config` frame; loops on a dedicated LiveKit track when `loadingStart()` runs. See [Loading Indicator](#loading-indicator). |

### `await client.connect()`

@@ -280,6 +285,77 @@ Sends a message to the Sayna session with role and optional metadata.

Clears the text-to-speech queue.

### Loading Indicator

The loading indicator loops a short audio clip into the LiveKit room while the application is "thinking" (e.g. while a large-language-model call is in flight). The server decodes the clip once, when it receives the WebSocket `config` frame, and replays it seamlessly on a dedicated LiveKit audio track named `"loading-audio"`, separate from the `"tts-audio"` speech track. The loop does not affect the STT or TTS streams. See [`../sayna/docs/websocket.md#loading-indicator`](../sayna/docs/websocket.md#loading-indicator) for the authoritative protocol definition.

The clip is configured through the `LoadingAudioConfig` object passed as the 8th positional argument to the `SaynaClient` constructor:

```typescript
interface LoadingAudioConfig {
/** Base64-encoded WAV or raw 16-bit little-endian PCM. Required. */
data: string;
/** Container hint; omit to let the server auto-detect from the RIFF/WAVE signature. */
format?: "wav" | "pcm";
/** Sample rate in Hz. Required for raw PCM; ignored for WAV. */
sample_rate?: number;
/** Channel count for raw PCM. Defaults to 1 server-side; ignored for WAV. */
channels?: 1 | 2;
/** Playback volume in [0.0, 1.0]. Defaults to 1.0; clamped server-side. */
volume?: number;
}
```
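
Before sending the clip, it can help to catch obvious mistakes locally, such as raw PCM without a `sample_rate`. The sketch below is a hypothetical helper, `checkLoadingAudioConfig`, that is not part of `@sayna/node-sdk`. It restates the interface locally so it runs on its own, and it assumes that data with no explicit `format` and no RIFF/WAVE header will be treated as raw PCM. The server remains authoritative and still reports real decode failures via `registerOnError`.

```typescript
import { Buffer } from "node:buffer";

// Restated locally so the sketch runs standalone; an application would
// presumably import LoadingAudioConfig from "@sayna/node-sdk" instead.
interface LoadingAudioConfig {
  data: string;
  format?: "wav" | "pcm";
  sample_rate?: number;
  channels?: 1 | 2;
  volume?: number;
}

// A WAV file starts with "RIFF" at bytes 0-3 and "WAVE" at bytes 8-11.
function looksLikeWav(bytes: Buffer): boolean {
  return (
    bytes.length >= 12 &&
    bytes.toString("ascii", 0, 4) === "RIFF" &&
    bytes.toString("ascii", 8, 12) === "WAVE"
  );
}

// Hypothetical pre-check: returns human-readable problems, or an empty
// array when the config looks plausible.
function checkLoadingAudioConfig(cfg: LoadingAudioConfig): string[] {
  const problems: string[] = [];
  const bytes = Buffer.from(cfg.data, "base64");
  if (bytes.length === 0) {
    problems.push("data decodes to zero bytes");
    return problems;
  }

  // Assumption: without an explicit format, anything lacking a RIFF/WAVE
  // header is treated as raw PCM by this pre-check.
  const format = cfg.format ?? (looksLikeWav(bytes) ? "wav" : "pcm");

  if (format === "pcm") {
    // Raw PCM has no header, so the sample rate must be supplied.
    if (cfg.sample_rate === undefined) {
      problems.push("sample_rate is required for raw PCM");
    } else if (!Number.isInteger(cfg.sample_rate) || cfg.sample_rate <= 0) {
      problems.push("sample_rate must be a positive integer");
    }
    // 16-bit samples: each frame is 2 bytes per channel (mono by default).
    const frameBytes = 2 * (cfg.channels ?? 1);
    if (bytes.length % frameBytes !== 0) {
      problems.push(`PCM byte length ${bytes.length} is not a multiple of ${frameBytes}`);
    }
  } else if (!looksLikeWav(bytes)) {
    problems.push('format is "wav" but data has no RIFF/WAVE header');
  }

  // volume is clamped to [0.0, 1.0] server-side, so it is not rejected here.
  return problems;
}
```

An empty result only means the config passed these local checks; whether the server can decode the clip is still reported through `registerOnError`.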

The SDK does not read files or decode audio. Read and base64-encode the clip in your own application code, for example with `readFile` from `node:fs/promises` and `Buffer#toString("base64")`:

```typescript
import { readFile } from "node:fs/promises";

const data = (await readFile("./loading.wav")).toString("base64");
```
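
To try the indicator without a WAV file on disk, a clip can also be synthesized in memory as raw 16-bit little-endian PCM. Because raw PCM has no header, `format: "pcm"` and `sample_rate` are set explicitly. This is a sketch: `makeBeepClip` and its tone parameters are illustrative assumptions, not SDK API.

```typescript
import { Buffer } from "node:buffer";

// Restated locally so the sketch runs standalone; an application would
// presumably import LoadingAudioConfig from "@sayna/node-sdk" instead.
interface LoadingAudioConfig {
  data: string;
  format?: "wav" | "pcm";
  sample_rate?: number;
  channels?: 1 | 2;
  volume?: number;
}

// Hypothetical helper: a short mono sine "beep" followed by silence, encoded
// as raw 16-bit little-endian PCM and wrapped in a LoadingAudioConfig.
function makeBeepClip(
  sampleRate = 16000,
  toneHz = 440,
  toneMs = 150,
  totalMs = 1000,
  amplitude = 0.3, // fraction of full scale, kept well below clipping
): LoadingAudioConfig {
  const totalSamples = Math.round((sampleRate * totalMs) / 1000);
  const toneSamples = Math.min(totalSamples, Math.round((sampleRate * toneMs) / 1000));
  // 5 ms linear fade at each edge of the tone to avoid audible clicks.
  const fadeSamples = Math.max(1, Math.round(sampleRate * 0.005));

  // Buffer.alloc zero-fills, so everything after the tone is already silence.
  const pcm = Buffer.alloc(totalSamples * 2); // 2 bytes per 16-bit mono sample
  for (let i = 0; i < toneSamples; i++) {
    const envelope = Math.min(1, i / fadeSamples, (toneSamples - i) / fadeSamples);
    const sample = Math.sin((2 * Math.PI * toneHz * i) / sampleRate) * amplitude * envelope;
    // Scale [-1, 1] to the signed 16-bit range and write little-endian.
    pcm.writeInt16LE(Math.round(sample * 32767), i * 2);
  }

  return { data: pcm.toString("base64"), format: "pcm", sample_rate: sampleRate, channels: 1 };
}
```

The returned object can be passed directly as the `loadingAudio` argument. The trailing silence sets the gap between beeps when the server loops the clip.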

Full call flow:

```typescript
import { readFile } from "node:fs/promises";
import { SaynaClient } from "@sayna/node-sdk";

const data = (await readFile("./loading.wav")).toString("base64");

const client = new SaynaClient(
"https://api.sayna.ai",
{ provider: "deepgram", model: "nova-2" },
{ provider: "cartesia", voice_id: "example-voice" },
{ room_name: "my-room" },
false, // withoutAudio
undefined, // apiKey (defaults to SAYNA_API_KEY env)
undefined, // streamId
{ data, format: "wav" } // loadingAudio (8th positional argument)
);

await client.connect();

// ...on user turn complete:
client.loadingStart();
// ...application does its "thinking" (LLM call, tool invocation, etc.)...
client.loadingStop();
await client.speak("Here is the answer.");
```
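
Because the SDK never stops the loop for you, one way to guarantee `loadingStop()` runs before `speak()`, even when the "thinking" step throws, is a `try`/`finally` wrapper. `withLoadingIndicator` is a hypothetical helper typed against a minimal structural interface rather than the real `SaynaClient` class, on the assumption that `loadingStart()` and `loadingStop()` are called without awaiting, as in the call flow above.

```typescript
// Only the two methods this sketch needs; SaynaClient is assumed to fit.
interface LoadingIndicator {
  loadingStart(): void;
  loadingStop(): void;
}

// Hypothetical helper: runs `work` with the loading loop active and always
// stops the loop afterwards, whether `work` resolves or rejects.
async function withLoadingIndicator<T>(
  client: LoadingIndicator,
  work: () => Promise<T>,
): Promise<T> {
  client.loadingStart();
  try {
    return await work();
  } finally {
    // Runs before the caller can reach speak(), so the clips never overlap.
    client.loadingStop();
  }
}
```

Typical use, with a hypothetical `callMyLlm` function: `const answer = await withLoadingIndicator(client, () => callMyLlm(prompt)); await client.speak(answer);`. One trade-off: if `loadingStop()` itself throws (for example after a disconnect), that error replaces any error thrown by `work`.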

The application is responsible for calling `loadingStop()` before `speak()`. The SDK and server deliberately do **not** auto-stop the loop on `speak()` or `clear()`, so if the loop is still running when `speak()` is called, the loading clip and the spoken answer play on top of each other.

Failures — `LoadingAudioConfig` decode failures detected at config time, and `loading_start` failures (audio disabled, no LiveKit room, no `loadingAudio` configured, track failed to publish) — arrive on the existing `registerOnError(callback)` channel. There is no separate `loading_error` event.

If the LiveKit room reconnects while the loop is running (publisher timeout, network blip), the loop stops. The SDK does **not** restart it automatically; the application must call `loadingStart()` again to resume.
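
Since a reconnect silently ends the loop, an application that wants the indicator to survive reconnects has to remember whether it is supposed to be running. The hypothetical `LoadingLoopController` below tracks that intent and re-sends `loadingStart()` from an `onRoomReconnected()` method. How the application detects the reconnect is an assumption left outside this sketch. Re-sending is assumed safe because `loading_start` is idempotent server-side, as described for the Python SDK.

```typescript
// Only the two methods this sketch needs; SaynaClient is assumed to fit.
interface LoadingIndicator {
  loadingStart(): void;
  loadingStop(): void;
}

// Hypothetical wrapper that remembers whether the application wants the
// loading loop running, so the loop can be re-requested after a reconnect.
class LoadingLoopController {
  private readonly client: LoadingIndicator;
  private wanted = false;

  constructor(client: LoadingIndicator) {
    this.client = client;
  }

  start(): void {
    this.wanted = true;
    this.client.loadingStart();
  }

  stop(): void {
    this.wanted = false;
    this.client.loadingStop();
  }

  // Call this from whatever reconnect signal the application has (an
  // assumption; not provided here). The loop ended with the reconnect, so it
  // is re-requested only if the app had not stopped it in the meantime.
  onRoomReconnected(): void {
    if (this.wanted) {
      this.client.loadingStart();
    }
  }
}
```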

### `client.loadingStart()`

Begins the server-side seamless playback loop of the configured loading clip on the dedicated `"loading-audio"` LiveKit track. Fire-and-forget: any server-side rejection (audio disabled, no LiveKit room, no `loadingAudio` configured, track failed to publish) arrives asynchronously through `registerOnError(callback)`. Throws `SaynaNotConnectedError` / `SaynaNotReadyError` if invoked before the connection is ready, and `SaynaConnectionError` if the transport fails to send the frame.

### `client.loadingStop()`

Stops the loading-audio loop with a short server-side fade-out. The server never returns an `error` for this command (stopping a non-running loop is a no-op). Throws the same connection-state errors as `loadingStart()`; `disconnect()` does not call it for you.

### `await client.ttsFlush(allowInterruption?)`

Flushes the TTS queue by sending an empty speak command.
3 changes: 2 additions & 1 deletion node-sdk/bun.lock

18 changes: 15 additions & 3 deletions node-sdk/src/index.ts
@@ -1,5 +1,10 @@
import { SaynaClient } from "./sayna-client";
-import type { STTConfig, TTSConfig, LiveKitConfig } from "./types";
+import type {
+STTConfig,
+TTSConfig,
+LiveKitConfig,
+LoadingAudioConfig,
+} from "./types";

export * from "./sayna-client";
export * from "./types";
@@ -19,6 +24,10 @@ export * from "./webhook-receiver";
* @param livekitConfig - Optional LiveKit room configuration
* @param withoutAudio - If true, disables audio streaming (default: false)
* @param apiKey - Optional API key used to authorize HTTP and WebSocket calls (defaults to SAYNA_API_KEY env)
* @param loadingAudio - Optional loading-indicator clip sent inside the initial `config` frame. The
* server decodes it once at config time and loops it on a dedicated LiveKit audio track when
* `loadingStart()` is invoked. Only effective when `withoutAudio=false` and `livekitConfig` is
* supplied. See the Loading Indicator section of `../sayna/docs/websocket.md` for the protocol contract.
*
* @returns Promise that resolves to a connected SaynaClient
*
@@ -72,15 +81,18 @@ export async function saynaConnect(
ttsConfig?: TTSConfig,
livekitConfig?: LiveKitConfig,
withoutAudio: boolean = false,
-apiKey?: string
+apiKey?: string,
+loadingAudio?: LoadingAudioConfig
): Promise<SaynaClient> {
const client = new SaynaClient(
url,
sttConfig,
ttsConfig,
livekitConfig,
withoutAudio,
-apiKey
+apiKey,
+undefined /* streamId */,
+loadingAudio
);
await client.connect();
return client;