MSD is a benchmark for evaluating disentangled representation learning on sequential data (e.g., video, audio, time series). It supports datasets with both static and dynamic factors, and includes tools for automatic annotation, model evaluation, and visualization.
We recommend using a conda environment:
conda create -n msd python=3.9.21
conda activate msd
pip install -r requirements.txtAfter installation, edit the configurations/meta.yaml file and set the msd_root field to the absolute path where all outputs (e.g., logs, models, results) should be saved.
MSD is modular and consists of the following major components:
- Datasets: Loaders for synthetic and real-world data. See docs/datasets.md
- Methods: Plug-and-play training pipelines for disentanglement models. See docs/methods.md
- Latent Exploration: Analyzes latent codes to discover factor-aligned dimensions. See docs/latent_exploration.md
- Evaluation: Metrics and visualizations for judging model performance. See docs/evaluation_metrics.md
- VLM Tools: Use vision-language models (e.g., GPT-4) for automatic annotation and factor classification. See docs/vlm_module.md
- Configuration: YAML-driven setup for all runs. See docs/configuration.md
All training and evaluation is handled by the msd/run.py script.
PYTHONPATH="." python msd/run.py --run_config configurations/methods/ssm_skd/ssm_skd_sprites.yaml --trainPYTHONPATH="." python msd/run.py --run_config configurations/methods/ssm_skd/ssm_skd_sprites.yaml --evalThe model will be automatically loaded from the checkpoint_dir path specified in the configuration file.
See Hugging Face: TalBarami/msd_models
Use msd/auto_annotate.py to automatically discover and label factor spaces in new datasets using a vision-language model.
PYTHONPATH="." python msd/auto_annotate.py \
--ds_path /path/to/dataset.h5 \
--subset train \
--n_exploration 500 \
--n_annotation 500 \
--out_dir /path/to/output \
--ds_name my_datasetFor details, see docs/vlm_module.md
| Component | Authors' |
|---|---|
| CPU | AMD EPYC 7002 Series Processor |
| CPU architecture | x86_64 |
| RAM | 256 GB |
| Linux kernel version | 5.14.0 |
| glibc version | 2.34 |
| GPU | NVIDIA GeForce RTX 4090 (Gigabyte) |
| NVIDIA VBIOS version | 95.02.3C.C0.93 |
| NVIDIA driver version | 565.57.01 |
| CUDA version | 12.6 |
| cuDNN version | 9.5.1 |
| Python version | 3.9.21 |
| pip version | 25.1 |
| NumPy version | 1.26.4 |
| PyTorch version | 2.7.0 |
| Python package versions | requirements.txt |
T. Barami, N. Berman, I. Naiman, A. H. Hason, R. Ezra, O. Azencot, "Disentanglement Beyond Static vs. Dynamic: A Benchmark and Evaluation Framework for Multi-Factor Sequential Representations" in Advances in Neural Information Processing Systems 38 (NeurIPS 2025), 2025.
For questions or contributions, feel free to open an issue or contact the authors.
