This is the repository for the SenSys '26 paper "MoViD: View-Invariant 3D Human Pose Estimation via Motion-View Disentanglement".
Paper DOI | Paper PDF | Demo Video | Camera-Ready Guide
MoViD is a viewpoint-invariant 3D human pose estimation framework for robust human motion understanding under large camera changes, severe occlusions, and edge deployment constraints. Instead of treating viewpoint variation as a nuisance handled only through more data, MoViD explicitly estimates and disentangles view information from motion features to produce view-invariant pose representations.
The framework combines a dedicated view estimator, motion-view orthogonal projection, physics-enhanced contrastive alignment, and a frame-by-frame view-aware inference strategy for edge deployment. Across nine public datasets and two newly collected datasets, MoViD reduces pose estimation error by over 24.2% versus prior state-of-the-art methods, remains robust with 60% less training data, and reaches real-time inference at 15 FPS on NVIDIA edge devices.
This repository is organized as the camera-ready code release. It keeps the maintained training pipeline, offline inference pipeline, stream inference pipeline, Python API wrapper, and edge-side runtime in a single public tree.
MoViD introduces four main components for view-invariant and edge-ready 3D pose estimation:
- View Estimator: predicts viewpoint information from intermediate 3D pose features by modeling key joint relationships.
- Motion-View Orthogonal Projection: explicitly disentangles motion and view features so the motion branch remains stable across camera changes.
- Physics-Enhanced Contrastive Alignment: improves cross-view consistency with contrastive supervision and motion-aware physical constraints.
- View-Aware Edge Inference: runs frame-by-frame inference and adaptively enables flip refinement only when the estimated viewpoint requires it.
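To make the idea behind motion-view orthogonal projection concrete, here is a minimal NumPy sketch. It is not the repository's implementation: it assumes motion and view features are plain vectors of equal dimension and uses ordinary vector rejection (subtracting the component of the motion feature that lies along the view feature). The function name `project_out_view` and the feature shapes are hypothetical, chosen only for illustration; the method in the paper may use learned projections or a loss term instead.

```python
import numpy as np


def project_out_view(motion: np.ndarray, view: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Remove from each motion feature its component along the paired view feature.

    Hypothetical helper for illustration. Both inputs have shape (batch, dim).
    """
    # Per-sample coefficient <m, v> / <v, v>; eps guards against a zero view vector.
    coeff = np.sum(motion * view, axis=1, keepdims=True) / (
        np.sum(view * view, axis=1, keepdims=True) + eps
    )
    # Subtracting the projection leaves the part of motion orthogonal to view.
    return motion - coeff * view


# Toy features standing in for network activations (an assumption, not real data).
rng = np.random.default_rng(0)
motion = rng.normal(size=(4, 16))
view = rng.normal(size=(4, 16))

motion_perp = project_out_view(motion, view)

# Largest remaining overlap between the projected motion and view features.
residual = np.abs(np.sum(motion_perp * view, axis=1)).max()
```

After the projection, `residual` is numerically close to zero, meaning the projected motion features carry no component along their paired view features. Applying the function a second time leaves the result essentially unchanged, which is the defining property of a projection.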
    git clone <your-movid-repo-url> --recursive
    cd MoViD
    bash scripts/setup/install_environment.sh

If you also want action recognition support:

    bash scripts/setup/install_pyskl.sh
    python tools/action/download_stgcn_model.py

Fetch the demo data:

    bash scripts/setup/fetch_demo_data.sh

This script downloads the demo video, default checkpoints, and required body-model assets. It will prompt for SMPL / SMPLify credentials when needed.
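Before running the commands that follow, it can help to confirm that the files they reference are in place. The sketch below is a hypothetical preflight check, not part of the repository. The two paths are copied from commands in this README; whether `fetch_demo_data.sh` places that exact checkpoint is an assumption, so adjust the list to match your setup.

```python
from pathlib import Path

# Paths copied from commands in this README. Whether the fetch script creates
# exactly these files is an assumption; edit the list as needed.
EXPECTED_FILES = [
    "examples/demo_video.mp4",
    "checkpoints/movid_vit_w_3dpw.pth.tar",
]


def find_missing(paths, root="."):
    """Return the entries of `paths` that are not existing files under `root`."""
    base = Path(root)
    return [p for p in paths if not (base / p).is_file()]


if __name__ == "__main__":
    missing = find_missing(EXPECTED_FILES)
    if missing:
        print("Missing files:")
        for p in missing:
            print("  " + p)
    else:
        print("All expected files are present.")
```

Run it from the repository root. Any path it reports can be fetched again with the setup scripts or placed manually.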
Offline inference:

    python demo.py \
        --video examples/demo_video.mp4 \
        --output_pth output/demo \
        --visualize

To run the helper with action recognition:

    bash scripts/demo/run_demo_with_har.sh examples/demo_video.mp4 output/demo_har

Stream inference:

    python demo.py \
        --video examples/demo_video.mp4 \
        --mode stream \
        --stream_window_size 10 \
        --output_pth output/stream \
        --visualize

Python API wrapper:

    python movid_api.py \
        --video examples/demo_video.mp4 \
        --output_dir output/api_demo \
        --visualize

Train:

    python train.py --cfg configs/yamls/stage2.yaml

If multi-worker dataloaders are restricted:

    python train.py --cfg configs/yamls/stage2.yaml NUM_WORKERS 0

Evaluate:

    bash scripts/eval/run_eval.sh 3dpw checkpoints/movid_vit_w_3dpw.pth.tar

Offline edge inference:

    python MoViD_edge/demo.py \
        --video examples/demo_video.mp4 \
        --output_pth output/edge_demo \
        --visualize

Real-time / streaming edge inference:

    python MoViD_edge/real_time.py \
        --video realsense \
        --output_pth output/edge_rt \
        --visualize \
        --max_frames 1000

Flip-eval streaming:

    python MoViD_edge/real_time.py \
        --video realsense \
        --output_pth output/edge_rt_flip \
        --visualize \
        --flip_eval \
        --flip_select all \
        --max_frames 1000

.
|-- configs/
|-- docs/
| |-- assets/
| |-- guides/
| |-- API.md
| |-- CAMERA_READY.md
| |-- DATASET.md
| `-- INSTALL.md
|-- lib/
|-- models/
| `-- action_recognition/
|-- scripts/
| |-- demo/
| |-- eval/
| |-- setup/
| `-- train/
|-- tools/
| |-- action/
| |-- data/
| `-- eval/
|-- MoViD_edge/
|-- third-party/
|-- batch_eval.py
|-- demo.py
|-- movid_api.py
`-- train.py
- Camera-Ready Guide
- Installation
- Dataset Preparation
- Python API
- Action Recognition Guide
- HAR Quick Start
- MoViD Edge Guide
- Large runtime artifacts such as datasets, checkpoints, logs, outputs, videos, and TensorRT engines are intentionally excluded from Git.
- The maintained edge deployment workflow lives under MoViD_edge/, while the full training and evaluation pipeline remains at the repository root.
- Upstream dependencies such as DPVO and ViTPose stay under third-party/ with their original licenses and attribution.
If you use MoViD in your research, please cite:
@inproceedings{liu2026movid,
title={MoViD: View-Invariant 3D Human Pose Estimation via Motion-View Disentanglement},
author={Liu, Yejia and Jiang, Hengle and Liu, Haoxian and Huang, Runxi and Ouyang, Xiaomin},
booktitle={Proceedings of the ACM/IEEE International Conference on Embedded Artificial Intelligence and Sensing Systems},
series={SenSys '26},
year={2026},
doi={10.1145/3774906.3802786}
}