Audio Data Support (Similar to the ImageArray) #196

Description

@OlgaOvcharenko

Problem Statement

Currently, audio data is not supported. It would be great if the semantic operators accepted audio input (.wav, .mp3, .mp4).

Proposed Solution

A possible solution would be to create an AudioArray class (similar to the existing ImageArray). One model that supports audio input is GPT-4o.
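To make the proposal more concrete, here is a rough sketch of what such a class could look like. Everything in it is an assumption for discussion: the class body, the method names (`to_base64`, `to_message_part`), and the accepted extensions are hypothetical and do not mirror how ImageArray is actually implemented. The dictionary returned by `to_message_part` follows the `input_audio` content-part layout of OpenAI's Chat Completions API as I understand it; that API documents `wav` and `mp3` as formats, so `.mp4` input would likely need conversion first.

```python
import base64
from pathlib import Path

# Assumption: the extensions listed in this issue; not an existing library constant.
SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".mp4"}


class AudioArray:
    """Hypothetical container for audio file paths (sketch only, not an existing API)."""

    def __init__(self, paths):
        self._paths = [Path(p) for p in paths]
        # Reject unsupported files early, before any model call is made.
        for p in self._paths:
            if p.suffix.lower() not in SUPPORTED_EXTENSIONS:
                raise ValueError(f"Unsupported audio format {p.suffix!r}: {p}")

    def __len__(self):
        return len(self._paths)

    def __getitem__(self, index):
        return self._paths[index]

    def to_base64(self, index):
        """Read one file from disk and return its raw bytes as base64 text."""
        return base64.b64encode(self._paths[index].read_bytes()).decode("ascii")

    def to_message_part(self, index):
        """Build a chat content part for one clip (layout is an assumption, see above)."""
        fmt = self._paths[index].suffix.lower().lstrip(".")
        return {
            "type": "input_audio",
            "input_audio": {"data": self.to_base64(index), "format": fmt},
        }
```

A semantic operator could then iterate over the array and attach one such content part per row, much like images are attached today.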

Use Cases

Specific use cases include semantic filtering based on sound and generating a label from sound, e.g., for multi-modal emotion recognition, animal sound detection, or multi-modal electronic health records that combine relational tables, images, and audio.

Alternative Solutions

I tried to use ImageArray, but it does not accept .wav files.

Additional Context

Checklist

  • I have searched existing issues to avoid duplicates
  • I have provided a clear problem statement
  • I have considered alternative solutions
  • I have assessed the impact and priority
  • I am willing to contribute to implementation (if applicable)
