[ICML 2026] AVTrack: Audio-Visual Tracking in Human-centric Complex Scenes
Updated May 14, 2026 - Python
Track human speakers in complex scenes using this audio-visual instance segmentation dataset.