Audio Data Support (Similar to the ImageArray) #196

## Problem Statement
Audio data is currently not supported. It would be helpful if the semantic operators accepted audio input (.wav, .mp4, .mp3).



## Proposed Solution
A possible solution would be to create an AudioArray class (similar to the existing ImageArray). One model that supports audio input is GPT-4o.
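
To make the proposal concrete, here is a minimal sketch of what such a container might look like. Everything in it is an assumption for illustration: the class body, the `SUPPORTED_EXTENSIONS` allow-list, and the `to_base64` helper are hypothetical and do not reflect the existing ImageArray implementation. It only validates file extensions and base64-encodes a file so that it could be passed to an audio-capable model.

```python
import base64
from pathlib import Path

# Extensions named in this issue; a hypothetical allow-list, not an existing API.
SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".mp4"}


class AudioArray:
    """Hypothetical container for audio file paths (illustrative only)."""

    def __init__(self, paths):
        self._paths = [Path(p) for p in paths]
        # Reject unsupported formats up front, before any file is read.
        for path in self._paths:
            if path.suffix.lower() not in SUPPORTED_EXTENSIONS:
                raise ValueError(f"Unsupported audio format: {path.name}")

    def __len__(self):
        return len(self._paths)

    def __getitem__(self, index):
        return self._paths[index]

    def to_base64(self, index):
        """Read one file from disk and return (base64 string, format name)."""
        path = self._paths[index]
        encoded = base64.b64encode(path.read_bytes()).decode("ascii")
        return encoded, path.suffix.lower().lstrip(".")
```

Files are read lazily in `to_base64` rather than in the constructor, so building an array over a large dataset stays cheap. A real implementation would presumably mirror whatever interface ImageArray exposes to the semantic operators.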



## Use Cases
Specific use cases include semantic filtering based on sound and generating a label from a sound, e.g., for [multi-modal emotion recognition](https://www.kaggle.com/datasets/alenken/multimodal-emotion-recognition-ravdess?select=ravdess), [animal sound detection](https://www.kaggle.com/datasets/rushibalajiputthewad/sound-classification-of-animal-voice), or multi-modal electronic health records that include relational tables, images, and audio.



## Alternative Solutions


I tried to use ImageArray, but it does not accept .wav files.


## Additional Context



## Checklist

- [X] I have searched existing issues to avoid duplicates
- [X] I have provided a clear problem statement
- [X] I have considered alternative solutions
- [ ] I have assessed the impact and priority
- [X] I am willing to contribute to implementation (if applicable)
