Constrained Wavefront Optimization for Synchronized Co-Speech Gestures in Humanoid Robots
WaveSync is a framework for generating temporally synchronized co-speech gestures in humanoid robots using constrained wavefront optimization. It enables natural, expressive gesture synthesis that aligns with the rhythm and semantics of speech in real time.
Clone the repository and install the required Python dependencies:
git clone https://github.com/your-org/wavesync.git
cd wavesync
pip install -r requirements.txt
Before running any simulations, download the required data and pretrained model weights.
Step 1 — Download assets from Google Drive:
Step 2 — Place the folders in the root directory:
wavesync/
├── data/ ← downloaded
├── out/
│ └── models/ ← downloaded
├── scene/
├── execute.py
└── requirements.txt
See docs/structure.png for a visual reference.
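Before launching a scene, it can help to confirm that the downloaded folders ended up where the layout above expects them. The following is a minimal sketch, not part of the repository: the helper name `missing_assets` is hypothetical, and it only assumes the `data/` and `out/models/` paths shown in the tree.

```python
from pathlib import Path

# Paths taken from the layout above; both are marked "downloaded" there.
REQUIRED_DIRS = ("data", "out/models")


def missing_assets(root="."):
    """Return the required directories that are absent or empty under root.

    Hypothetical helper for illustration; it is not shipped with WaveSync.
    """
    root = Path(root)
    missing = []
    for rel in REQUIRED_DIRS:
        path = root / rel
        # An empty folder usually means the download was not unpacked here.
        if not path.is_dir() or not any(path.iterdir()):
            missing.append(rel)
    return missing


if __name__ == "__main__":
    absent = missing_assets()
    if absent:
        print("Missing or empty:", ", ".join(absent))
    else:
        print("Asset folders found.")
```

Run it from the repository root. It reports only whether the folders exist and contain something, and makes no assumption about the file names inside them.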
Run the core simulation via execute.py by passing a scene configuration file from the scene/ directory.
python execute.py -s scene1.json
python execute.py -s scene2.json
python execute.py -s scene3.json
python execute.py -s scene4.json
python execute.py -s scene5.json
Each scene file defines a unique speech-gesture scenario. You can customize or create new scene configurations under scene/.
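To run every scene in one go instead of typing each command, a small wrapper can loop over the JSON files in scene/. This is a sketch under assumptions: `scene_commands` and `run_all` are hypothetical names, and the script passes only the file name to `-s`, mirroring the commands above, on the assumption that execute.py resolves names relative to scene/.

```python
import subprocess
import sys
from pathlib import Path


def scene_commands(scene_dir="scene"):
    """Build one execute.py command per JSON file found in scene_dir."""
    scenes = sorted(Path(scene_dir).glob("*.json"))
    # Pass only the file name, mirroring `python execute.py -s scene1.json`.
    return [[sys.executable, "execute.py", "-s", p.name] for p in scenes]


def run_all(scene_dir="scene"):
    """Run each scene in turn from the repository root."""
    for cmd in scene_commands(scene_dir):
        print("Running:", " ".join(cmd))
        # check=True stops the batch at the first scene that fails.
        subprocess.run(cmd, check=True)
```

Calling `run_all()` from the repository root picks up any new scene file added under scene/ without further changes. Files are ordered by name, so `scene10.json` would run before `scene2.json`.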
wavesync/
├── data/ # Input speech and motion data
├── out/
│ └── models/ # Pretrained model weights
├── scene/ # Predefined scene configurations (JSON)
├── docs/ # Documentation and figures
├── execute.py # Main entry point
└── requirements.txt # Python dependencies
If you find WaveSync useful in your research, please cite our paper:
@article{viet2026wavesync,
title = {WaveSync: Constrained Wavefront Optimization for Synchronized Co-Speech Gestures in Humanoid Robots},
author = {Thang Tran Viet and Thanh Nguyen Canh and Gia Huy Uong and Phuc Van Dinh and Tan Viet Tuyen Nguyen and Xiem HoangVan and Nak Young Chong},
journal = {arXiv preprint arXiv:2606.16600},
year = {2026}
}