This Viam module provides audio input and output capabilities using the PortAudio library. It captures audio from microphones and plays audio through speakers on your machine.
Supported platforms:
- Darwin ARM64
- Linux x64
- Linux ARM64
The following attribute template can be used to configure the `viam:audio:microphone` model:

    {
      "device_id": <DEVICE_ID>,
      "device_name": <DEVICE_NAME>,
      "sample_rate": <SAMPLE_RATE>,
      "num_channels": <NUM_CHANNELS>,
      "latency": <LATENCY>
    }

The following attributes are available for the `viam:audio:microphone` model:

| Name | Type | Inclusion | Description |
|---|---|---|---|
| `device_id` | string | Optional | Stable device id from discovery (survives reboots). Takes precedence over `device_name` when both are set. If the id cannot be resolved on the current system, logs a warning and falls through to `device_name` (or the system default). |
| `device_name` | string | Optional | The PortAudio device name to stream audio from. Used when `device_id` is not set. If neither is specified, the system default will be used. |
| `sample_rate` | int | Optional | The sample rate in Hz of the stream. If not specified, the device's default sample rate will be used. |
| `num_channels` | int | Optional | The number of audio channels to capture. Must not exceed the device's maximum input channels. Default: 1 |
| `latency` | int | Optional | Suggested input latency in milliseconds. This controls how much audio PortAudio buffers before making it available. Lower values (5-20ms) provide more responsive audio capture but use more CPU time. Higher values (50-100ms) are more stable but less responsive. If not specified, uses the device's default low latency setting (typically 10-20ms). |
| `historical_throttle_ms` | int | Optional | Delay in milliseconds between chunks when streaming historical audio data using the `previous_timestamp` parameter (default: 50ms). Gives clients adequate time to process buffered audio data. |
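
The precedence between `device_id`, `device_name`, and the system default described in the table can be sketched in a few lines of Python. This is only an illustration of the documented rules, not the module's implementation: `choose_device` and the `resolvable_ids` mapping are hypothetical names, and treating a resolved id as yielding a device name is an assumption.

```python
from typing import Optional


def choose_device(attrs: dict, resolvable_ids: dict) -> Optional[str]:
    """Return the device name to open, or None for the system default.

    `resolvable_ids` is a hypothetical mapping from a device_id to the
    device name it resolves to on the current system.
    """
    device_id = attrs.get("device_id")
    if device_id:
        if device_id in resolvable_ids:
            # device_id takes precedence when it can be resolved.
            return resolvable_ids[device_id]
        # Unresolved id: warn, then fall through to device_name.
        print(f"warning: device_id {device_id!r} could not be resolved")
    # device_name is used when device_id is unset or unresolved;
    # None stands for the system default.
    return attrs.get("device_name") or None
```

With both attributes set and a resolvable id, the id wins; with an unresolvable id, the function falls back to `device_name`, and with neither attribute it returns `None`.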
The following attribute template can be used to configure the `viam:audio:speaker` model:

    {
      "device_id": <DEVICE_ID>,
      "device_name": <DEVICE_NAME>,
      "sample_rate": <SAMPLE_RATE>,
      "num_channels": <NUM_CHANNELS>,
      "latency": <LATENCY>
    }

The following attributes are available for the `viam:audio:speaker` model:

| Name | Type | Inclusion | Description |
|---|---|---|---|
| `device_id` | string | Optional | Stable device id from discovery (survives reboots). Takes precedence over `device_name` when both are set. If the id cannot be resolved on the current system, logs a warning and falls through to `device_name` (or the system default). |
| `device_name` | string | Optional | The PortAudio device name to play audio through. Used when `device_id` is not set. If neither is specified, the system default will be used. |
| `sample_rate` | int | Optional | The sample rate in Hz of the output stream. If not specified, the device's default sample rate will be used. |
| `num_channels` | int | Optional | The number of audio channels of the output stream. Must not exceed the device's maximum output channels. Default: 1 |
| `latency` | int | Optional | Suggested output latency in milliseconds. This controls how much audio PortAudio buffers before it is played. Lower values (5-20ms) provide faster audio output but use more CPU time. Higher values (50-100ms) are more stable but less responsive. If not specified, uses the device's default low latency setting (typically 10-20ms). |
| `volume` | int | Optional | Output volume as a percentage (0-100). Supported on Linux devices only. On macOS, use the system volume controls (keyboard keys). |
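
As a rough guide to what a `latency` value means in buffered audio, the arithmetic below converts milliseconds to frames and bytes. It assumes the latency maps directly onto a buffer of that duration, which PortAudio treats only as a suggestion; `latency_frames` and `latency_bytes` are hypothetical helper names, and the default of 2 bytes per sample corresponds to 16-bit PCM.

```python
def latency_frames(latency_ms: int, sample_rate: int) -> int:
    """Number of frames spanning latency_ms at sample_rate (Hz)."""
    return round(sample_rate * latency_ms / 1000)


def latency_bytes(latency_ms: int, sample_rate: int,
                  num_channels: int, bytes_per_sample: int = 2) -> int:
    """Approximate buffer size in bytes for interleaved audio.

    One frame holds one sample per channel.
    """
    frames = latency_frames(latency_ms, sample_rate)
    return frames * num_channels * bytes_per_sample
```

For example, 20 ms at 44100 Hz is 882 frames, which for 16-bit stereo is 882 × 2 × 2 = 3528 bytes.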
The speaker supports the following DoCommands:

`set_volume`: Set the speaker output volume.

    {"set_volume": 75}

- Value must be between 0 and 100.
- Linux only. On macOS, use the system volume controls (keyboard keys).
- Returns: `{"volume": 75}`

`stop`: Immediately stop audio playback.

    {"stop": true}

- Interrupts any in-progress `Play` call and silences the output.
- Returns: `{"stopped": true}`

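
The command payloads above are plain JSON objects, so a client can build and validate them before sending. The sketch below is a minimal illustration; `set_volume_command` and `stop_command` are hypothetical helper names, and how the resulting dictionary is passed to the speaker depends on the SDK in use.

```python
def set_volume_command(percent: int) -> dict:
    """Build a set_volume payload, enforcing the documented 0-100 range."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(percent, bool) or not isinstance(percent, int):
        raise TypeError("volume must be an integer")
    if not 0 <= percent <= 100:
        raise ValueError("volume must be between 0 and 100")
    return {"set_volume": percent}


def stop_command() -> dict:
    """Build the payload that stops playback immediately."""
    return {"stop": True}
```

Validating on the client side surfaces an out-of-range volume before a request is made.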
This model is used to discover audio devices on your machine. No configuration is needed. Expand the test card or look at the discovery control card to obtain configurations for all connected audio devices.
Each discovered device is returned as a component config with the standard
microphone/speaker attributes (device_name, sample_rate, num_channels)
plus a device_id attribute.
`device_id` is a best-effort, OS-provided identifier for the underlying
hardware, intended to be stable across reboots. It is informational: the
microphone and speaker components still open the device via `device_name`.
Format and stability depend on the platform:

| Platform | Source | Example |
|---|---|---|
| macOS | Core Audio `kAudioDevicePropertyDeviceUID` | `BuiltInMicrophoneDevice` |
| Linux, udev by-id | `/dev/snd/by-id/` symlink | `by-id:usb-Logitech_USB_Headset_A00000000000-00` |
| Linux, udev by-path | `/dev/snd/by-path/` symlink | `by-path:pci-0000:00:14.0-usb-0:1.3:1.0` |
| Linux, fallback | `/sys/class/sound/cardN/id` | `alsa-card:PCH` |

Resolution order on Linux is by-id (descriptor-based, survives USB port
moves) → by-path (topology-based, stable across reboots but breaks on port
moves) → card id fallback (used when udev doesn't populate the above).
Only ALSA hw:X,Y devices are resolved; virtual endpoints (default,
pulse, etc.) get an empty id. The attribute is always present so callers
can rely on it.
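
Because the Linux ids carry a prefix that names their source, a caller can tell how stable a given id is likely to be. The sketch below classifies an id by the prefixes shown in the examples; `classify_device_id` is a hypothetical helper, and treating any unprefixed, non-empty id as a platform-native id (such as a macOS Core Audio UID) is an assumption.

```python
# Prefixes taken from the example ids in the table above.
_PREFIX_SOURCES = (
    ("by-id:", "udev by-id"),
    ("by-path:", "udev by-path"),
    ("alsa-card:", "ALSA card id"),
)


def classify_device_id(device_id: str) -> str:
    """Return a label for the source of a discovered device_id."""
    if not device_id:
        # Virtual endpoints (default, pulse, etc.) get an empty id.
        return "none"
    for prefix, source in _PREFIX_SOURCES:
        if device_id.startswith(prefix):
            return source
    # Assumption: anything else is a platform-native id.
    return "other"
```

A caller might, for instance, warn when a configured id is of the `udev by-path` kind, since those break when a USB device moves to a different port.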
All audio data uses little-endian byte order. The specific format depends on the codec requested:
Supported codecs:
- `PCM_16`: 16-bit signed integer PCM (range: -32768 to 32767)
- `PCM_32`: 32-bit signed integer PCM (range: -2147483648 to 2147483647)
- `PCM_32_FLOAT`: 32-bit floating point PCM (range: -1.0 to 1.0)
- `MP3`: MP3 compressed audio

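
To illustrate the little-endian PCM layouts, the following sketch decodes raw bytes into sample values with Python's standard `struct` module. `decode_samples` is a hypothetical helper written for this example; MP3 is compressed and needs a real decoder, so it is rejected here.

```python
import struct

# struct format code and width in bytes for each uncompressed codec.
_PCM_FORMATS = {
    "PCM_16": ("h", 2),        # 16-bit signed integer
    "PCM_32": ("i", 4),        # 32-bit signed integer
    "PCM_32_FLOAT": ("f", 4),  # 32-bit IEEE 754 float
}


def decode_samples(data: bytes, codec: str) -> list:
    """Decode little-endian PCM bytes into a flat list of samples."""
    if codec not in _PCM_FORMATS:
        raise ValueError(f"cannot decode codec {codec!r} as raw PCM")
    code, width = _PCM_FORMATS[codec]
    if len(data) % width != 0:
        raise ValueError("data length is not a whole number of samples")
    count = len(data) // width
    # "<" selects little-endian byte order with standard sizes.
    return list(struct.unpack(f"<{count}{code}", data))
```

For multi-channel audio the returned list is still interleaved, one sample per channel per frame.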
All audio data is interleaved: within each frame, the samples are stored one channel after another.

- Mono (1 channel): `[S0, S1, S2, ...]`
- Stereo (2 channels): `[L0, R0, L1, R1, L2, R2, ...]` (left and right samples alternate)
- Microphone (`get_audio`): Returns audio data in interleaved format
- Speaker (`play`): Expects audio data in interleaved format

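
Splitting interleaved samples into per-channel lists, and merging them back, takes only slicing and `zip`. The helpers below are a minimal sketch operating on already-decoded sample values; `deinterleave` and `interleave` are hypothetical names.

```python
def deinterleave(samples: list, num_channels: int) -> list:
    """Split interleaved samples into one list per channel."""
    if num_channels < 1:
        raise ValueError("num_channels must be at least 1")
    if len(samples) % num_channels != 0:
        raise ValueError("sample count is not a whole number of frames")
    # Channel c occupies positions c, c + n, c + 2n, ...
    return [samples[c::num_channels] for c in range(num_channels)]


def interleave(channels: list) -> list:
    """Merge equal-length per-channel lists into interleaved order."""
    return [sample for frame in zip(*channels) for sample in frame]
```

Audio returned by the microphone could be split this way for per-channel processing, then merged again before being handed to the speaker.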
Any config change terminates in-flight streams. Callers must handle the error and resubmit the request to resume.
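
One way a caller might handle a terminated stream is a bounded retry loop around the request. The sketch below is generic: `StreamTerminated` is a hypothetical stand-in for whatever error the SDK in use raises, and `start_stream` is any caller-supplied function that submits the request.

```python
import time


class StreamTerminated(Exception):
    """Hypothetical stand-in for the error raised when a stream is cut off."""


def run_with_resubmit(start_stream, max_attempts: int = 3,
                      delay_s: float = 0.5):
    """Call start_stream, resubmitting if the stream is terminated."""
    for attempt in range(1, max_attempts + 1):
        try:
            return start_stream()
        except StreamTerminated:
            if attempt == max_attempts:
                # Out of attempts: let the caller see the error.
                raise
            # Brief pause so the reconfigured component can come back up.
            time.sleep(delay_s)
```

Bounding the number of attempts keeps a persistent failure from looping forever.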

    canon make setup
    canon make
    canon make build