MoViD: View-Invariant 3D Human Pose Estimation via Motion-View Disentanglement

This is the repository for the SenSys '26 paper "MoViD: View-Invariant 3D Human Pose Estimation via Motion-View Disentanglement".

Paper DOI | Paper PDF | Demo Video | Camera-Ready Guide

Introduction

MoViD is a viewpoint-invariant 3D human pose estimation framework for robust human motion understanding under large camera changes, severe occlusions, and edge deployment constraints. Instead of treating viewpoint variation as a nuisance handled only through more data, MoViD explicitly estimates and disentangles view information from motion features to produce view-invariant pose representations.

The framework combines a dedicated view estimator, motion-view orthogonal projection, physics-enhanced contrastive alignment, and a frame-by-frame view-aware inference strategy for edge deployment. Across nine public datasets and two newly collected datasets, MoViD reduces pose estimation error by more than 24.2% relative to prior state-of-the-art methods, remains robust when trained with 60% less data, and runs in real time at 15 FPS on NVIDIA edge devices.

This repository is organized as the camera-ready code release. It keeps the maintained training pipeline, offline inference pipeline, stream inference pipeline, Python API wrapper, and edge-side runtime in a single public tree.

System Overview

MoViD introduces four main components for view-invariant and edge-ready 3D pose estimation:

View Estimator: predicts viewpoint information from intermediate 3D pose features by modeling key joint relationships.
Motion-View Orthogonal Projection: explicitly disentangles motion and view features so the motion branch remains stable across camera changes.
Physics-Enhanced Contrastive Alignment: improves cross-view consistency with contrastive supervision and motion-aware physical constraints.
View-Aware Edge Inference: runs frame-by-frame inference and adaptively enables flip refinement only when the estimated viewpoint requires it.
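
To make the idea behind the Motion-View Orthogonal Projection component concrete, the sketch below removes from a motion feature the part that lies along a view feature, using the standard projection m_perp = m - (<m, v> / <v, v>) v. It is only an illustration of orthogonal projection in general, not the code in this repository: the function name project_out, the eps guard, and the toy three-dimensional vectors are all assumptions, and the real model presumably operates on learned feature tensors.

```python
from typing import List, Sequence


def project_out(motion: Sequence[float], view: Sequence[float], eps: float = 1e-12) -> List[float]:
    """Return the part of `motion` that is orthogonal to `view` (hypothetical helper)."""
    # Inner product <m, v> and squared norm <v, v>.
    dot = sum(m * v for m, v in zip(motion, view))
    norm_sq = sum(v * v for v in view)
    # eps guards against division by zero when the view vector is all zeros.
    scale = dot / (norm_sq + eps)
    # Subtract the component of the motion feature along the view direction.
    return [m - scale * v for m, v in zip(motion, view)]


# Toy features: the view direction is the third axis.
motion = [1.0, 2.0, 3.0]
view = [0.0, 0.0, 1.0]
motion_perp = project_out(motion, view)

# The remaining inner product with the view feature should be numerically zero.
residual = abs(sum(m * v for m, v in zip(motion_perp, view)))
```

After the projection, the first two coordinates are unchanged and the component along the view axis is suppressed, which is the sense in which the motion feature no longer carries that view direction.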

Quick Start

Main Repository

1. Clone the repository

git clone <your-movid-repo-url> --recursive
cd MoViD

2. Install the environment

bash scripts/setup/install_environment.sh

If you also want action recognition support:

bash scripts/setup/install_pyskl.sh
python tools/action/download_stgcn_model.py

3. Download demo assets and default checkpoints

bash scripts/setup/fetch_demo_data.sh

This script downloads the demo video, default checkpoints, and required body-model assets. It will prompt for SMPL / SMPLify credentials when needed.

4. Run offline inference

python demo.py \
  --video examples/demo_video.mp4 \
  --output_pth output/demo \
  --visualize

To run the helper script with action recognition enabled:

bash scripts/demo/run_demo_with_har.sh examples/demo_video.mp4 output/demo_har

5. Run stream inference

python demo.py \
  --video examples/demo_video.mp4 \
  --mode stream \
  --stream_window_size 10 \
  --output_pth output/stream \
  --visualize
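
The README does not describe how demo.py consumes --stream_window_size, so the following is only one plausible reading: a fixed-length buffer that always holds the most recent frames. The helper sliding_windows and the integer stand-ins for frames are hypothetical and are not taken from the repository.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def sliding_windows(frames: Iterable[T], window_size: int) -> Iterator[List[T]]:
    """Yield the latest `window_size` frames each time a new frame arrives."""
    if window_size < 1:
        raise ValueError("window_size must be at least 1")
    # A deque with maxlen drops the oldest frame automatically.
    buffer: Deque[T] = deque(maxlen=window_size)
    for frame in frames:
        buffer.append(frame)
        # Emit only once the buffer is full, so every window has the same length.
        if len(buffer) == window_size:
            yield list(buffer)


# Twelve dummy frames with a window of 10, mirroring the value used in the command above.
windows = list(sliding_windows(range(12), window_size=10))
```

With twelve frames and a window of ten, the generator produces three overlapping windows, each shifted by one frame.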

6. Run the API wrapper

python movid_api.py \
  --video examples/demo_video.mp4 \
  --output_dir output/api_demo \
  --visualize

7. Train and evaluate

Train:

python train.py --cfg configs/yamls/stage2.yaml

If your environment does not allow multi-worker dataloaders, set the worker count to zero:

python train.py --cfg configs/yamls/stage2.yaml NUM_WORKERS 0
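
The trailing NUM_WORKERS 0 in the command above looks like a KEY VALUE override applied on top of the YAML file. How train.py actually parses it is not shown here, so the sketch below is an assumption about the general pattern rather than the repository's parser. The function apply_overrides, the default of 8 workers, and the EXP_NAME key are all made up for illustration.

```python
import ast
from typing import Any, Dict, List


def apply_overrides(cfg: Dict[str, Any], opts: List[str]) -> Dict[str, Any]:
    """Return a copy of `cfg` with trailing KEY VALUE pairs applied (hypothetical helper)."""
    if len(opts) % 2 != 0:
        raise ValueError("overrides must come in KEY VALUE pairs")
    merged = dict(cfg)
    for key, raw in zip(opts[0::2], opts[1::2]):
        if key not in merged:
            raise KeyError(f"unknown config key: {key}")
        try:
            # Interpret numbers, booleans, and lists as Python literals.
            merged[key] = ast.literal_eval(raw)
        except (ValueError, SyntaxError):
            # Fall back to the raw string for bare words such as paths.
            merged[key] = raw
    return merged


# Hypothetical defaults standing in for values loaded from the YAML file.
cfg = {"NUM_WORKERS": 8, "EXP_NAME": "stage2"}
updated = apply_overrides(cfg, ["NUM_WORKERS", "0"])
```

Rejecting unknown keys is a deliberate choice in this sketch: a misspelled override fails loudly instead of being ignored.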

Evaluate:

bash scripts/eval/run_eval.sh 3dpw checkpoints/movid_vit_w_3dpw.pth.tar
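
This README does not list which metrics run_eval.sh reports. A common measure of 3D pose estimation error on benchmarks such as 3DPW is the mean per-joint position error (MPJPE), so the sketch below shows how that quantity is computed for a single pose. The function name mpjpe and the two-joint toy pose are assumptions, and the repository's evaluation code may align, scale, or aggregate predictions differently.

```python
import math
from typing import Sequence, Tuple

Joint = Tuple[float, float, float]


def mpjpe(pred: Sequence[Joint], target: Sequence[Joint]) -> float:
    """Mean Euclidean distance between predicted and ground-truth joints."""
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and the same length")
    # math.dist computes the Euclidean distance between two points.
    return sum(math.dist(p, t) for p, t in zip(pred, target)) / len(pred)


# Toy pose with two joints: the first matches exactly, the second is off by 3 units in z.
pred = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
target = [(0.0, 0.0, 0.0), (1.0, 0.0, 3.0)]
error = mpjpe(pred, target)
```

The result is in the same unit as the joint coordinates, so coordinates in millimetres give an error in millimetres.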

Edge Runtime

Offline edge inference:

python MoViD_edge/demo.py \
  --video examples/demo_video.mp4 \
  --output_pth output/edge_demo \
  --visualize

Real-time / streaming edge inference:

python MoViD_edge/real_time.py \
  --video realsense \
  --output_pth output/edge_rt \
  --visualize \
  --max_frames 1000

Flip-eval streaming:

python MoViD_edge/real_time.py \
  --video realsense \
  --output_pth output/edge_rt_flip \
  --visualize \
  --flip_eval \
  --flip_select all \
  --max_frames 1000
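
The System Overview says flip refinement is enabled only when the estimated viewpoint requires it. The exact semantics of --flip_eval and --flip_select are not documented here, so the sketch below illustrates only the general pattern of conditional flip averaging. Everything in it is a stand-in: predict_with_optional_flip, the needs_flip flag, and toy_model are hypothetical, and frames are plain lists so that mirroring is a simple reversal.

```python
from typing import Callable, List

Frame = List[float]
Pose = List[float]


def predict_with_optional_flip(frame: Frame, model: Callable[[Frame], Pose], needs_flip: bool) -> Pose:
    """Run the model once, or twice with flip averaging when `needs_flip` is set."""
    pose = model(frame)
    if not needs_flip:
        # Cheap path: a single forward pass.
        return pose
    # Mirror the input, predict, then mirror the prediction back before averaging.
    mirrored_pose = model(frame[::-1])[::-1]
    return [(a + b) / 2.0 for a, b in zip(pose, mirrored_pose)]


# Records every input the toy model sees, to count forward passes.
call_log: List[Frame] = []


def toy_model(frame: Frame) -> Pose:
    """Stand-in model that is deliberately biased on the first element."""
    call_log.append(list(frame))
    return [frame[0] + 1.0] + list(frame[1:])


frame = [0.0, 2.0, 4.0]
fast = predict_with_optional_flip(frame, toy_model, needs_flip=False)
refined = predict_with_optional_flip(frame, toy_model, needs_flip=True)
```

The point of gating is cost: the cheap path uses one forward pass and the refined path uses two, which matters on an edge device. A real pose model would also need to swap left and right joints when mirroring the prediction back.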

Repository Layout

.
|-- configs/
|-- docs/
|   |-- assets/
|   |-- guides/
|   |-- API.md
|   |-- CAMERA_READY.md
|   |-- DATASET.md
|   `-- INSTALL.md
|-- lib/
|-- models/
|   `-- action_recognition/
|-- scripts/
|   |-- demo/
|   |-- eval/
|   |-- setup/
|   `-- train/
|-- tools/
|   |-- action/
|   |-- data/
|   `-- eval/
|-- MoViD_edge/
|-- third-party/
|-- batch_eval.py
|-- demo.py
|-- movid_api.py
`-- train.py

Documentation

Detailed guides live under docs/: INSTALL.md, DATASET.md, API.md, and CAMERA_READY.md, with further material in docs/guides/.

Release Notes

Large runtime artifacts such as datasets, checkpoints, logs, outputs, videos, and TensorRT engines are intentionally excluded from Git.
The maintained edge deployment workflow lives under MoViD_edge/, while the full training and evaluation pipeline remains at the repository root.
Upstream dependencies such as DPVO and ViTPose stay under third-party/ with their original licenses and attribution.

Citation

If you use MoViD in your research, please cite:

@inproceedings{liu2026movid,
  title={MoViD: View-Invariant 3D Human Pose Estimation via Motion-View Disentanglement},
  author={Liu, Yejia and Jiang, Hengle and Liu, Haoxian and Huang, Runxi and Ouyang, Xiaomin},
  booktitle={Proceedings of the ACM/IEEE International Conference on Embedded Artificial Intelligence and Sensing Systems},
  series={SenSys '26},
  year={2026},
  doi={10.1145/3774906.3802786}
}
