MDM from Scratch

Minimal PyTorch re-implementation of "Human Motion Diffusion Model" (Tevet et al., ICLR 2023), built from scratch for learning purposes.

The full official implementation is at GuyTevet/motion-diffusion-model.

Demo

Action 3 (jump), trained on HumanAct12Poses for 30,000 epochs on a single RTX 2080 Ti.

What Is This?

MDM generates human motion sequences (e.g., a person walks forward) using a diffusion model — a generative approach that learns to reverse a process which gradually corrupts data with noise.

This repository is a minimal, from-scratch implementation that isolates the core mechanics: the noise scheduler, the Transformer-based denoising model, and the training loop. It intentionally omits CLIP/BERT text encoding, large-scale datasets such as HumanML3D and KIT, and evaluation pipelines so the essential ideas stay readable.

Architecture

The three components and how they connect:

```mermaid
graph LR
    PE["PositionalEncoding\n(model.py)"]
    MDM["MDM\n(model.py)"]
    NS["NoiseScheduler\n(scheduler.py)"]
    TS["train_step()\n(train_step.py)"]

    PE --> MDM
    TS -->|"① add_noise(x₀, ε, t) → x_t"| NS
    NS -->|"x_t"| TS
    TS -->|"② forward(x_t, t, action) → pred_x₀"| MDM
    MDM -->|"pred_x₀"| TS
```

Data flow — one training step

```mermaid
graph LR
    x0["x₀ · Clean motion\n[B, F, J×3]"]
    eps["ε ~ N(0, I)\n[B, F, J×3]"]
    xt["x_t · Noisy motion\n[B, F, J×3]"]
    pred["pred_x₀\n[B, F, J×3]"]
    loss["MSE Loss"]
    optim["Adam update"]

    x0 --> xt
    eps --> xt
    xt -->|"MDM.forward"| pred
    pred --> loss
    x0 -->|"target"| loss
    loss --> optim
    optim -->|"∇θ"| pred
```
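The training step above can be sketched in PyTorch as follows. This is a minimal sketch under stated assumptions, not the code in `examples/train_step.py`: `DummyDenoiser` is a hypothetical stand-in for `MDM` (a small MLP with embedding-based conditioning), the batch and skeleton sizes are placeholders, and the learning rate is arbitrary. Timesteps are 0-indexed here, so `t` ranges from 0 to 999.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

B, N_FRAMES, J = 4, 60, 24   # hypothetical batch size, frame count, joint count
NUM_ACTIONS = 12             # HumanAct12 action classes
T = 1000                     # diffusion steps

# Linear beta schedule (1e-4 -> 0.02) and alpha_bar_t = prod_s (1 - beta_s)
betas = torch.linspace(1e-4, 0.02, T)
alpha_bar = torch.cumprod(1.0 - betas, dim=0)


class DummyDenoiser(nn.Module):
    """Hypothetical stand-in for MDM: maps (x_t, t, action) to a prediction of x0."""

    def __init__(self, dim: int):
        super().__init__()
        self.action_emb = nn.Embedding(NUM_ACTIONS, dim)
        self.t_emb = nn.Embedding(T, dim)
        self.net = nn.Sequential(nn.Linear(dim, 256), nn.SiLU(), nn.Linear(256, dim))

    def forward(self, x_t, t, action):
        # Per-sample conditioning, broadcast over the frame axis: [B, 1, D]
        cond = (self.action_emb(action) + self.t_emb(t)).unsqueeze(1)
        return self.net(x_t + cond)  # [B, F, D]


model = DummyDenoiser(J * 3)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)


def train_step(x0, action):
    t = torch.randint(0, T, (x0.shape[0],))          # one random timestep per sample
    eps = torch.randn_like(x0)                       # eps ~ N(0, I)
    ab = alpha_bar[t].view(-1, 1, 1)                 # [B, 1, 1] for broadcasting
    x_t = ab.sqrt() * x0 + (1.0 - ab).sqrt() * eps   # closed-form forward process
    pred_x0 = model(x_t, t, action)                  # x0-prediction
    loss = F.mse_loss(pred_x0, x0)                   # target is the clean motion
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


x0 = torch.randn(B, N_FRAMES, J * 3)   # random placeholder for real motion data
action = torch.randint(0, NUM_ACTIONS, (B,))
loss_value = train_step(x0, action)
print(f"loss: {loss_value:.4f}")
```

The gradient arrow in the diagram corresponds to `loss.backward()` followed by `optimizer.step()`, which updates the denoiser's parameters and therefore changes the next `pred_x0`.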

Inside `MDM.forward()`

```
action_class ──► Embedding              ──► [B, 1, 512] ─┐
t            ──► Linear → SiLU → Linear ──► [B, 1, 512] ─┤ torch.cat ──► [B, F+2, 512]
x_t          ──► Linear                 ──► [B, F, 512] ─┘
                                                         │
                                                PositionalEncoding
                                                         │
                                           TransformerEncoder (8 layers)
                                                         │
                                               remove first 2 tokens ──► [B, F, 512]
                                                         │
                                                      Linear ──► [B, F, J×3]
```
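The token layout above can be expressed as a compact PyTorch module. This is a hypothetical re-creation from the diagram, not the repository's `model.py`: the class name `MDMSketch`, the head count (`nhead=4`), and the sinusoidal lookup that feeds the timestep MLP are assumptions; the embedding width (512), the layer count (8), and dropping the first two tokens follow the diagram.

```python
import math

import torch
import torch.nn as nn


class PositionalEncoding(nn.Module):
    """Sinusoidal positional encoding for batch-first inputs [B, L, D]."""

    def __init__(self, d_model: int, max_len: int = 5000):
        super().__init__()
        pos = torch.arange(max_len).unsqueeze(1)
        div = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
        pe = torch.zeros(max_len, d_model)
        pe[:, 0::2] = torch.sin(pos * div)
        pe[:, 1::2] = torch.cos(pos * div)
        self.register_buffer("pe", pe.unsqueeze(0))  # [1, max_len, D]

    def forward(self, x):
        return x + self.pe[:, : x.shape[1]]


class MDMSketch(nn.Module):
    """Hypothetical model following the diagram; not the repository's MDM class."""

    def __init__(self, input_dim, num_actions=12, d_model=512, num_layers=8, nhead=4):
        super().__init__()
        self.action_emb = nn.Embedding(num_actions, d_model)
        self.time_mlp = nn.Sequential(
            nn.Linear(d_model, d_model), nn.SiLU(), nn.Linear(d_model, d_model)
        )
        self.input_proj = nn.Linear(input_dim, d_model)
        self.pos_enc = PositionalEncoding(d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.output_proj = nn.Linear(d_model, input_dim)

    def forward(self, x_t, t, action):
        a = self.action_emb(action).unsqueeze(1)                   # [B, 1, D]
        te = self.time_mlp(self.pos_enc.pe[0, t]).unsqueeze(1)     # [B, 1, D] (assumed sinusoidal input)
        x = self.input_proj(x_t)                                   # [B, F, D]
        h = torch.cat([a, te, x], dim=1)                           # [B, F+2, D]
        h = self.encoder(self.pos_enc(h))
        return self.output_proj(h[:, 2:])                          # drop 2 condition tokens -> [B, F, J*3]


model = MDMSketch(input_dim=24 * 3)
x_t = torch.randn(2, 60, 24 * 3)    # [B, F, J*3]
t = torch.tensor([10, 500])         # one timestep per sample
action = torch.tensor([3, 1])       # action class ids
out = model(x_t, t, action)
print(out.shape)                    # torch.Size([2, 60, 72])
```

Placing the two condition tokens in the same sequence as the motion frames lets every frame attend to the action and the timestep through self-attention; the extra tokens are then sliced off so the output length matches the input motion.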

Theory in Brief

MDM is built on DDPM. The forward process adds Gaussian noise to clean motion $x_0$ step by step. In closed form, the noisy motion at any timestep $t$ can be sampled directly:

$$x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \varepsilon, \quad \varepsilon \sim \mathcal{N}(0, \mathbf{I})$$

where $\bar{\alpha}_t = \prod_{s=1}^{t}(1 - \beta_s)$ and $\beta_s$ follows a linear schedule from $0.0001$ to $0.02$ over 1000 steps.
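The schedule and the closed-form sampling can be checked numerically with plain Python. The sketch below (stdlib only; `add_noise` here is an illustrative list-based function, not the repository's `NoiseScheduler.add_noise`) builds the linear schedule, computes $\bar{\alpha}_t$, and confirms that applying the one-step update $x_t = \sqrt{1-\beta_t}\,x_{t-1} + \sqrt{\beta_t}\,\varepsilon_t$ 500 times scales $x_0$ by $\sqrt{\bar{\alpha}_t}$ and accumulates noise variance $1-\bar{\alpha}_t$, which is exactly what the closed form samples in one shot.

```python
import math

T = 1000
BETA_START, BETA_END = 1e-4, 0.02

# Linear schedule: betas[i] holds beta_{i+1}, evenly spaced from BETA_START to BETA_END
betas = [BETA_START + (BETA_END - BETA_START) * i / (T - 1) for i in range(T)]

# alpha_bar[i] holds alpha_bar_{i+1} = prod_{s=1}^{i+1} (1 - beta_s)
alpha_bar = []
running = 1.0
for beta in betas:
    running *= 1.0 - beta
    alpha_bar.append(running)


def add_noise(x0, eps, t):
    """Closed-form sample of x_t given x0 and eps, for a 1-indexed timestep t."""
    ab = alpha_bar[t - 1]
    return [math.sqrt(ab) * a + math.sqrt(1.0 - ab) * e for a, e in zip(x0, eps)]


# Check: iterate x_t = sqrt(1 - beta_t) x_{t-1} + sqrt(beta_t) eps_t for t = 1..500 and
# track the coefficient on x0 and the accumulated noise variance.
t = 500
signal, noise_var = 1.0, 0.0
for beta in betas[:t]:
    signal *= math.sqrt(1.0 - beta)
    noise_var = (1.0 - beta) * noise_var + beta

print(math.isclose(signal ** 2, alpha_bar[t - 1]))        # True
print(math.isclose(noise_var, 1.0 - alpha_bar[t - 1]))    # True
```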

The model $f_\theta$ is trained to recover the clean motion from the noisy input. The training objective is:

$$\mathcal{L} = \mathbb{E}_{x_0,\, t,\, \varepsilon}\!\left[\left\| x_0 - f_\theta(x_t, t, a) \right\|^2\right]$$

where $a$ is the action condition. See docs/decisions.md for why $x_0$-prediction was chosen over noise-prediction.
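Because $x_t$ is a fixed linear combination of $x_0$ and $\varepsilon$ at a given $t$, an $x_0$-prediction and a noise prediction are interchangeable: either one determines the other by inverting the closed form. A stdlib sketch of both conversions follows; the function names are illustrative, and the sketch does not reproduce the reasoning in docs/decisions.md.

```python
import math


def eps_from_x0(x_t, x0_hat, alpha_bar_t):
    """Noise implied by an x0 estimate: invert x_t = sqrt(ab) x0 + sqrt(1 - ab) eps."""
    return [(xt - math.sqrt(alpha_bar_t) * x0) / math.sqrt(1.0 - alpha_bar_t)
            for xt, x0 in zip(x_t, x0_hat)]


def x0_from_eps(x_t, eps_hat, alpha_bar_t):
    """Clean motion implied by a noise estimate (the inverse mapping)."""
    return [(xt - math.sqrt(1.0 - alpha_bar_t) * e) / math.sqrt(alpha_bar_t)
            for xt, e in zip(x_t, eps_hat)]


alpha_bar_t = 0.3                 # arbitrary example value of alpha_bar_t
x0 = [0.5, -1.2, 2.0]
eps = [0.1, 0.7, -0.4]
x_t = [math.sqrt(alpha_bar_t) * a + math.sqrt(1.0 - alpha_bar_t) * e for a, e in zip(x0, eps)]

recovered_eps = eps_from_x0(x_t, x0, alpha_bar_t)   # matches eps up to rounding
recovered_x0 = x0_from_eps(x_t, eps, alpha_bar_t)   # matches x0 up to rounding
```

The conversion also shows how the two objectives relate: the noise-space error equals the $x_0$-space error scaled by $\sqrt{\bar{\alpha}_t / (1 - \bar{\alpha}_t)}$, so the two squared losses differ by a per-timestep weight $\bar{\alpha}_t / (1 - \bar{\alpha}_t)$, which is large at small $t$ and small at large $t$.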

File Structure

```text
mdm-scratch/
├── model.py          # MDM model: Transformer + PositionalEncoding
├── scheduler.py      # NoiseScheduler: linear beta schedule, add_noise(), step()
├── train.py          # Full training loop on HumanAct12Poses
├── sample.py         # Inference: load checkpoint and generate motion
├── visualise.py      # 3D skeleton visualization → animated GIF (matplotlib)
├── README.md         # This file (English)
├── README_ja.md      # Japanese version
├── examples/
│   ├── train_step.py    # Demo: single training step with dummy data
│   └── sample_step.py   # Demo: single sampling pass (reverse diffusion)
├── tests/
│   ├── test_model.py      # Unit tests for MDM
│   └── test_scheduler.py  # Unit tests for NoiseScheduler
├── .github/workflows/
│   └── test.yml      # GitHub Actions CI: runs pytest on push
├── docs/
│   ├── decisions.md     # Architecture Decision Records (English)
│   └── decisions_ja.md  # Architecture Decision Records (Japanese)
└── assets/
    └── demo_jump.gif    # Generated motion demo (action 3: jump)
```

Quick Start

```bash
# 1. Create and activate a virtual environment
python -m venv venv
source venv/bin/activate        # Windows: venv\Scripts\activate

# 2. Install PyTorch (CPU is fine for this step)
pip install torch

# 3. Run one training step (smoke test)
python examples/train_step.py

# 4. Run full training on HumanAct12Poses
python train.py

# 5. Generate motion from a trained checkpoint
python sample.py --checkpoint checkpoints/mdm_final.pth --action_id 3

# 6. Visualize generated motion as an animated GIF
python visualise.py --input output/generated_action3_samples4.npy --output assets/demo.gif --title "MDM - Jump"

# 7. Run unit tests
pytest tests/ -v
```
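Step 5 runs reverse diffusion from pure noise. Below is a minimal stdlib sketch of that loop for an $x_0$-predicting model, assuming the standard DDPM ancestral sampler, which draws $x_{t-1}$ from the Gaussian posterior $q(x_{t-1} \mid x_t, \hat{x}_0)$. `reverse_step`, `sample`, and the oracle lambda that replaces the trained model are hypothetical, and the repository's `NoiseScheduler.step()` may be implemented differently.

```python
import math
import random

T = 1000
betas = [1e-4 + (0.02 - 1e-4) * i / (T - 1) for i in range(T)]   # linear schedule
alphas = [1.0 - b for b in betas]
alpha_bar = []
running = 1.0
for a in alphas:
    running *= a
    alpha_bar.append(running)


def reverse_step(x_t, x0_hat, t, rng):
    """One ancestral step x_t -> x_{t-1} (0-indexed t) from an x0 estimate."""
    ab_t = alpha_bar[t]
    ab_prev = alpha_bar[t - 1] if t > 0 else 1.0
    # Mean and variance of the Gaussian posterior q(x_{t-1} | x_t, x0)
    coef_x0 = math.sqrt(ab_prev) * betas[t] / (1.0 - ab_t)
    coef_xt = math.sqrt(alphas[t]) * (1.0 - ab_prev) / (1.0 - ab_t)
    var = betas[t] * (1.0 - ab_prev) / (1.0 - ab_t)
    out = []
    for xt, x0 in zip(x_t, x0_hat):
        noise = rng.gauss(0.0, 1.0) if t > 0 else 0.0   # no noise on the final step
        out.append(coef_x0 * x0 + coef_xt * xt + math.sqrt(var) * noise)
    return out


def sample(denoise, dim, rng):
    """Start from x_T ~ N(0, I) and apply reverse_step for t = T-1, ..., 0."""
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    for t in reversed(range(T)):
        x = reverse_step(x, denoise(x, t), t, rng)
    return x


# Hypothetical "oracle" denoiser that always predicts the same clean motion
target = [0.25, -0.5, 1.0]
result = sample(lambda x, t: target, len(target), random.Random(0))
```

At the final step ($t = 0$) the posterior variance is zero and the $\hat{x}_0$ coefficient is 1, so with the oracle the loop returns the target (up to rounding), which makes it easy to sanity-check.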

Expected output (train.py):

```text
--- Training started (device: cuda) ---
Epoch 1/30000, Loss: 0.5234  MSE: 0.0476  Vel: 0.0476
...
Epoch 5000/30000, Loss: 0.0312  MSE: 0.0028  Vel: 0.0028
  -> checkpoint: checkpoints/mdm_epoch5000.pth
...
Training complete. Model saved to checkpoints/mdm_final.pth.
```
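The log reports `MSE` and `Vel` terms alongside the total `Loss`. Assuming `Vel` is a velocity loss in the spirit of the MDM paper's geometric losses, which compares frame-to-frame displacements of the prediction and the ground truth, a minimal stdlib sketch follows. `velocity_loss` is a hypothetical name, the averaging over all elements is a choice made here, and the repository's exact formulation and weighting are not documented in this README.

```python
def velocity_loss(pred, target):
    """Mean squared difference between predicted and true frame-to-frame displacements.

    pred, target: lists of frames; each frame is a flat list of joint coordinates.
    """
    total, count = 0.0, 0
    for i in range(len(target) - 1):
        for p0, p1, g0, g1 in zip(pred[i], pred[i + 1], target[i], target[i + 1]):
            total += ((p1 - p0) - (g1 - g0)) ** 2
            count += 1
    return total / count


target = [[0.0, 0.0], [1.0, 0.5], [2.0, 1.0]]             # 3 frames, 2 coordinates each
shifted = [[v + 1.0 for v in frame] for frame in target]  # same motion, offset by +1
frozen = [target[0]] * len(target)                        # a prediction that never moves

print(velocity_loss(shifted, target))  # 0.0
print(velocity_loss(frozen, target))   # 0.625
```

The velocity term ignores a constant offset but penalizes a prediction whose frame-to-frame motion is wrong, which complements the per-frame MSE.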

Scope

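This implementation covers the denoising model, the forward and reverse diffusion processes, and training and sampling on HumanAct12Poses; it does not include text conditioning, large-scale datasets, evaluation metrics, or mesh rendering.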

| Feature | This repo | Official (`reference/`) |
|---|---|---|
| Transformer-based denoising model | ✅ | ✅ |
| Action-conditioned generation | ✅ | ✅ |
| Forward diffusion (`add_noise`) | ✅ | ✅ |
| Reverse diffusion (sampling loop) | ✅ | ✅ |
| Full training loop with real data | ✅ (HumanAct12Poses) | ✅ |
| Unit tests + CI (GitHub Actions) | ✅ | ❌ |
| Text conditioning (CLIP / BERT) | ❌ | ✅ |
| Large-scale datasets (HumanML3D, KIT) | ❌ | ✅ |
| Evaluation metrics (FID, R-Precision) | ❌ | ✅ |
| 3D skeleton visualization (matplotlib) | ✅ | ❌ |
| SMPL mesh rendering | ❌ | ✅ |

Design decisions and trade-offs are documented in docs/decisions.md. Feature comparison is against the official implementation.

Reference

```bibtex
@inproceedings{tevet2023human,
  title     = {Human Motion Diffusion Model},
  author    = {Guy Tevet and Sigal Raab and Brian Gordon and Yoni Shafir
               and Daniel Cohen-or and Amit Haim Bermano},
  booktitle = {The Eleventh International Conference on Learning Representations},
  year      = {2023},
  url       = {https://openreview.net/forum?id=SJ1kSyO2jwu}
}
```
