The BSD10k dataset (Broad Sound Dataset 10k) is an open collection of human-labeled sounds containing over 10k Freesound audio clips, annotated according to the 23 second-level classes defined in the Broad Sound Taxonomy. The dataset has been created at the Music Technology Group of Universitat Pompeu Fabra.
The dataset comprises heterogeneous sounds from Freesound.
The sounds are cropped to a maximum length of 30 seconds, resulting in variable durations ranging from 0.01 to 30s.
Audio lengths vary due to the heterogeneity of the sound classes and the range of contributions from Freesound users.
The original files downloaded from Freesound are converted to a standardized format of uncompressed WAV files with 44.1 kHz sampling rate, 16-bit depth, and mono channel.
All sounds have been manually labeled by human annotators.
BSD10k categorizes the sounds into 23 classes, which are the second-level categories of the Broad Sound Taxonomy (see details below). The annotated data has a non-uniform distribution across these categories.
The dataset and its respective versions are available in the Zenodo archive.
For each audio file, the latest version of the dataset includes the following: the category label assigned during annotation and its confidence score, descriptive metadata (title, tags, description), and provenance information (ID, uploader, license).
We also provide precomputed embeddings to facilitate further analysis and reproducibility.
For more details on the dataset creation and its contents, please refer to our paper "Heterogeneous Sound Classification with the Broad Sound Taxonomy and Dataset" (Section 3.1).
The second version of the dataset is first presented in "Hierarchical and Multimodal Learning for Heterogeneous Sound Classification".
- Sound count: 10,956
- Total hours: 35.25
- Metadata fields: ID, class (code, index, top level), confidence, title, tags, description, uploader, license
- Features (precomputed): CLAP audio embeddings, CLAP text embeddings
- Notes: Added description, confidence score, precomputed embeddings
- Sound count: 10,309
- Total hours: 32.5
- Metadata fields: ID, class (code, index, top level), title, tags, uploader, license
- Notes: Initial version
The Broad Sound Taxonomy (BST) organizes sounds into a two-level hierarchical structure with 5 top-level and 23 second-level classes. The top-level categories cover distinct types of sounds: Music, Instrument samples, Speech, Sound effects, and Soundscapes. The taxonomy is designed to classify any type of sound while remaining broad, comprehensive, and easy to use. It can be used for organizing and filtering sounds in heterogeneous sound collections, such as Freesound, as well as in personal sound libraries. Additional information about the taxonomy, including its design principles, detailed taxonomy creation methodology and evaluation, is provided in the journal article "A General-Purpose Sound Taxonomy for the Classification of Heterogeneous Sound Collections".
The audio files and metadata of the dataset can be downloaded from Zenodo: Access BSD10k data
[TODO]
If you use this dataset, please cite our papers:
Anastasopoulou, P., Torrey, J., Serra, X., & Font, F. (2024). Heterogeneous sound classification with the Broad Sound Taxonomy and Dataset. In Proc. Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE).
Anastasopoulou, P., Dal Ri, F., Serra, X., & Font, F. (2025). Hierarchical and multimodal learning for heterogeneous sound classification. In Proc. Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE).
BSD10k is released in its entirety under the CC BY 4.0 license.
We note, though, that each audio file is released under its own Creative Commons (CC) license, as defined by the uploader in Freesound.
Some sounds require attribution to their original authors, while others forbid commercial reuse.
If the dataset is used in a commercial setting, the sounds with CC-BY-NC licenses should be excluded.
Links to the license deeds for each sound can be further accessed through the metadata files of each version.
