Merged
4 changes: 2 additions & 2 deletions AGENTS.md
@@ -225,8 +225,8 @@ Quick reference:
python train_mimic/scripts/data/build_dataset.py --spec train_mimic/configs/datasets/twist2.yaml
python scripts/run/record_pico_motion.py
python train_mimic/scripts/data/build_dataset.py --spec data/pico_motion/pico_recorded.yaml --force
-python train_mimic/scripts/data/precompute_dataset.py data/datasets/seed --outdir data/datasets/seed_precomputed --jobs 8
-python train_mimic/scripts/train.py --motion_file data/datasets/seed_precomputed
+python train_mimic/scripts/data/precompute_dataset.py data/datasets --outdir data/datasets_precomputed --jobs 8
+python train_mimic/scripts/train.py --motion_file data/datasets_precomputed
python train_mimic/scripts/data/precompute_dataset.py data/datasets/twist2 --outdir data/datasets/twist2_precomputed --jobs 8 --force
python train_mimic/scripts/save_onnx.py --checkpoint logs/rsl_rl/g1_general_tracking/<run>/model_30000.pt --output policy.onnx --history_length 10
```
4 changes: 2 additions & 2 deletions docs/docs/getting-started/download-assets.md
@@ -31,7 +31,7 @@ Downloaded file sizes change as checkpoints, datasets, and asset bundles are upd
|------------|---------|
| `track.onnx` | ONNX inference model |
| `track.pt` | PyTorch checkpoint for resume training |
-| `data/datasets/seed/shard_*.h5` | Minimal motion dataset; run precompute before training |
+| `data/datasets/<dataset>/shard_*.h5` | Minimal motion datasets; run precompute before training |
| `data/sample_bvh/*.bvh` | Sample motion files |
| `assets/robots/unitree_g1/` | Canonical G1 XML and meshes used by training, sim2sim, retargeting, and FK validation |
| `teleopit/retargeting/gmr/assets/` | GMR retargeting assets, IK configs, and non-canonical robot descriptions |
@@ -44,6 +44,6 @@ Downloaded file sizes change as checkpoints, datasets, and asset bundles are upd
| `robots` | `BingqianWu/Teleopit-models` | Canonical robot XML/meshes |
| `gmr` | `BingqianWu/Teleopit-models` | GMR retargeting assets |
| `bvh` | `BingqianWu/Teleopit-models` | Sample BVH motion files |
-| `data` | `BingqianWu/Teleopit-datasets` | Training/validation shards |
+| `data` | `BingqianWu/Teleopit-datasets` | Minimal shards for `lafan1`, `pico_record`, `seed`, and `twist2` |

For asset management details (uploading, versioning), see [Asset Management](../reference/assets).
4 changes: 2 additions & 2 deletions docs/docs/reference/assets.md
@@ -37,7 +37,7 @@ Datasets, checkpoints, robot models, and demo media are not tracked in Git. They
| `robots` | Teleopit-models | `archives/robot_assets.tar.gz` |
| `gmr` | Teleopit-models | `archives/gmr_assets.tar.gz` |
| `bvh` | Teleopit-models | `archives/sample_bvh.tar.gz` |
-| `data` | Teleopit-datasets | `data/` |
+| `data` | Teleopit-datasets | `data/datasets/*/*.h5` (`lafan1`, `pico_record`, `seed`, `twist2`) |

## Download

@@ -66,7 +66,7 @@ Local paths after download:
| `archives/robot_assets.tar.gz` | `assets/robots/` (extracted) |
| `archives/gmr_assets.tar.gz` | `teleopit/retargeting/gmr/assets/` (extracted) |
| `archives/sample_bvh.tar.gz` | `data/sample_bvh/` (extracted) |
-| `data/` | `data/datasets/seed/` |
+| `data/datasets/*/*.h5` | `data/datasets/` |

## Upload to ModelScope

15 changes: 8 additions & 7 deletions docs/docs/reference/dataset.md
@@ -10,12 +10,13 @@ sidebar_position: 3
python scripts/setup/download_assets.py --only robots data
```

-Then precompute the training shard and train with the precomputed dataset root:
+Then precompute all downloaded datasets and train with the combined precomputed
+dataset root:

```bash
python train_mimic/scripts/data/precompute_dataset.py \
-data/datasets/seed --outdir data/datasets/seed_precomputed --jobs 8
-python train_mimic/scripts/train.py --motion_file data/datasets/seed_precomputed
+data/datasets --outdir data/datasets_precomputed --jobs 8
+python train_mimic/scripts/train.py --motion_file data/datasets_precomputed
```

For custom dataset construction, read on.
@@ -63,15 +64,15 @@ python train_mimic/scripts/data/build_dataset.py \
data/datasets/<dataset>/
└── shard_*.h5

-data/datasets/<dataset>_precomputed/
+data/datasets_precomputed/<dataset>/
└── shard_*.h5
```

- If the spec contains `bvh` or `npz` sources, the full dataset builder uses a temporary `clips/` directory during conversion and deletes it after shards are written. Rebuilds do not reuse converted clips.
- If the spec is all `pkl` or `seed_csv` sources, the builder takes a batch path producing shards directly
- `build_dataset.py` only writes the minimal distributable dataset. It does not run FK precompute.
- `precompute_dataset.py` writes a separate training dataset containing the minimal motion plus precomputed joint velocities and body FK/velocities.
-- Training accepts only the precomputed dataset directory. It recursively discovers precomputed `*.h5` shards below the specified root, so precomputed datasets can be merged by placing multiple shard directories under one parent.
+- Training accepts only the precomputed dataset directory. It recursively discovers precomputed `*.h5` shards below the specified root, so use `data/datasets_precomputed` to train on all downloaded datasets together.
- Training loads all discovered precomputed motion windows into memory at startup. Joint velocities and body FK/velocities are not computed during training.
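
The recursive discovery described in the list above can be sketched in a few lines of Python. This is only an illustration, not the loader used by `train_mimic`: the helper name `discover_shards` is hypothetical, and it assumes shards are plain `*.h5` files found by walking the root with `pathlib`.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def discover_shards(root: str | Path) -> list[Path]:
    """Return every *.h5 file below root, sorted for a stable order.

    Hypothetical helper: the real training loader may validate or filter
    shards differently.
    """
    root = Path(root)
    if not root.is_dir():
        raise FileNotFoundError(f"dataset root not found: {root}")
    # rglob walks all subdirectories, so one parent can hold many datasets.
    return sorted(root.rglob("*.h5"))


if __name__ == "__main__":
    # Build a throwaway layout shaped like data/datasets_precomputed/<dataset>/.
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp) / "datasets_precomputed"
        for name in ("seed", "twist2"):
            (base / name).mkdir(parents=True)
            (base / name / "shard_000.h5").touch()
        for shard in discover_shards(base):
            print(shard.relative_to(base).as_posix())
```

Under this assumption, pointing the walk at a single dataset directory yields only that dataset's shards, while pointing it at the parent yields all of them.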

## YAML Spec Format
@@ -143,9 +144,9 @@ python train_mimic/scripts/data/build_dataset.py \
python train_mimic/scripts/data/build_dataset.py \
--spec train_mimic/configs/datasets/twist2.yaml --json

-# Generate a precomputed training dataset from an existing minimal dataset
+# Generate one combined precomputed training dataset from all downloaded minimal datasets
python train_mimic/scripts/data/precompute_dataset.py \
-data/datasets/twist2 --outdir data/datasets/twist2_precomputed --jobs 8 --force
+data/datasets --outdir data/datasets_precomputed --jobs 8 --force

# Inspect a dataset root
python train_mimic/scripts/data/inspect_dataset.py data/datasets/twist2
19 changes: 10 additions & 9 deletions docs/docs/tutorials/training.md
@@ -23,12 +23,13 @@ Verify:
python -c "import train_mimic.tasks; print('training OK')"
```

-Download the minimal seed dataset and generate the precomputed training shard:
+Download the distributed minimal datasets and generate the combined precomputed
+training dataset:

```bash
python scripts/setup/download_assets.py --only robots data
python train_mimic/scripts/data/precompute_dataset.py \
-data/datasets/seed --outdir data/datasets/seed_precomputed --jobs 8
+data/datasets --outdir data/datasets_precomputed --jobs 8
```

## Training
@@ -39,7 +40,7 @@ python train_mimic/scripts/data/precompute_dataset.py \
python train_mimic/scripts/train.py \
--num_envs 64 \
--max_iterations 100 \
---motion_file data/datasets/seed_precomputed
+--motion_file data/datasets_precomputed
```

### Full Training
@@ -48,7 +49,7 @@ python train_mimic/scripts/train.py \
python train_mimic/scripts/train.py \
--num_envs 4096 \
--max_iterations 30000 \
---motion_file data/datasets/seed_precomputed
+--motion_file data/datasets_precomputed
```

### Multi-GPU
@@ -58,7 +59,7 @@ python train_mimic/scripts/train.py \
--gpu_ids 0 1 2 3 \
--num_envs 1024 \
--max_iterations 30000 \
---motion_file data/datasets/seed_precomputed
+--motion_file data/datasets_precomputed
```

### Multi-Node Multi-GPU
@@ -75,7 +76,7 @@ torchrun \
train_mimic/scripts/train.py \
--num_envs 1024 \
--max_iterations 1000 \
---motion_file data/datasets/seed_precomputed
+--motion_file data/datasets_precomputed
```

**Notes:**
@@ -105,15 +106,15 @@ The exported model is a dual-input ONNX (`obs` + `obs_history`). The inference s
```bash
python train_mimic/scripts/play.py \
--checkpoint logs/rsl_rl/g1_general_tracking/<run>/model_30000.pt \
---motion_file data/datasets/seed_precomputed
+--motion_file data/datasets_precomputed
```

### Benchmark

```bash
python train_mimic/scripts/benchmark.py \
--checkpoint logs/rsl_rl/g1_general_tracking/<run>/model_30000.pt \
---motion_file data/datasets/seed_precomputed \
+--motion_file data/datasets_precomputed \
--num_envs 1
```

@@ -122,7 +123,7 @@ python train_mimic/scripts/benchmark.py \
```bash
python train_mimic/scripts/benchmark.py \
--checkpoint logs/rsl_rl/g1_general_tracking/<run>/model_30000.pt \
---motion_file data/datasets/seed_precomputed \
+--motion_file data/datasets_precomputed \
--num_envs 1 \
--video \
--video_length 600
@@ -31,7 +31,7 @@ After checkpoints, datasets, and asset bundles are updated, downloaded file sizes change. Below
|----------|------|
| `track.onnx` | ONNX inference model |
| `track.pt` | PyTorch checkpoint for resuming training |
-| `data/datasets/seed/shard_*.h5` | Minimal motion dataset; run precompute before training |
+| `data/datasets/<dataset>/shard_*.h5` | Minimal motion dataset; run precompute before training |
| `data/sample_bvh/*.bvh` | Sample motion-capture files |
| `assets/robots/unitree_g1/` | Canonical G1 XML and meshes shared by training, sim2sim, retargeting, and FK validation |
| `teleopit/retargeting/gmr/assets/` | GMR retargeting assets, IK configs, and non-canonical robot descriptions |
@@ -44,6 +44,6 @@ After checkpoints, datasets, and asset bundles are updated, downloaded file sizes change. Below
| `robots` | `BingqianWu/Teleopit-models` | Canonical robot XML/meshes |
| `gmr` | `BingqianWu/Teleopit-models` | GMR retargeting assets |
| `bvh` | `BingqianWu/Teleopit-models` | Sample BVH motion-capture files |
-| `data` | `BingqianWu/Teleopit-datasets` | Training/validation data shards |
+| `data` | `BingqianWu/Teleopit-datasets` | Minimal shards for `lafan1`, `pico_record`, `seed`, and `twist2` |

For more asset management details (uploading, versioning, etc.), see [Asset Management](../reference/assets).
@@ -37,7 +37,7 @@ sidebar_position: 2
| `robots` | Teleopit-models | `archives/robot_assets.tar.gz` |
| `gmr` | Teleopit-models | `archives/gmr_assets.tar.gz` |
| `bvh` | Teleopit-models | `archives/sample_bvh.tar.gz` |
-| `data` | Teleopit-datasets | `data/` |
+| `data` | Teleopit-datasets | `data/datasets/*/*.h5` (`lafan1`, `pico_record`, `seed`, `twist2`) |

## Download

@@ -66,7 +66,7 @@ python scripts/setup/download_assets.py --source huggingface
| `archives/robot_assets.tar.gz` | `assets/robots/` (extracted automatically) |
| `archives/gmr_assets.tar.gz` | `teleopit/retargeting/gmr/assets/` (extracted automatically) |
| `archives/sample_bvh.tar.gz` | `data/sample_bvh/` (extracted automatically) |
-| `data/` | `data/datasets/seed/` |
+| `data/datasets/*/*.h5` | `data/datasets/` |

## Upload to ModelScope

@@ -10,12 +10,12 @@ sidebar_position: 3
python scripts/setup/download_assets.py --only robots data
```

-After downloading, first generate the precomputed training shard, then train with the precomputed dataset root
+After downloading, first precompute all downloaded datasets, then train with the combined precomputed dataset root

```bash
python train_mimic/scripts/data/precompute_dataset.py \
-data/datasets/seed --outdir data/datasets/seed_precomputed --jobs 8
-python train_mimic/scripts/train.py --motion_file data/datasets/seed_precomputed
+data/datasets --outdir data/datasets_precomputed --jobs 8
+python train_mimic/scripts/train.py --motion_file data/datasets_precomputed
```

For custom dataset construction, read on.
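@@ -61,15 +61,15 @@ python train_mimic/scripts/data/build_dataset.py \
data/datasets/<dataset>/
└── shard_*.h5

-data/datasets/<dataset>_precomputed/
+data/datasets_precomputed/<dataset>/
└── shard_*.h5
```

- If the spec contains `bvh` or `npz` sources, the full dataset builder uses a temporary `clips/` directory during conversion and deletes it after the shards are written. Rebuilds do not reuse converted clips.
- If the spec contains only `pkl` or `seed_csv` sources, the builder produces shards directly in parallel and, by default, writes no intermediate clip files.
- `build_dataset.py` writes only the minimal distributable dataset; it does not run FK precompute.
- `precompute_dataset.py` writes a separate training dataset containing the minimal motion data plus precomputed joint velocities and body FK/velocities.
-- Training accepts only the precomputed dataset directory. It recursively discovers precomputed `*.h5` shards below the specified root, so multiple precomputed dataset directories can be merged by placing them under one parent directory.
+- Training accepts only the precomputed dataset directory. It recursively discovers precomputed `*.h5` shards below the specified root, so using `data/datasets_precomputed` trains on all downloaded datasets together.
- Training loads all discovered precomputed motion windows into memory at startup. Joint velocities and body FK/velocities are not computed during training.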

## YAML spec
@@ -139,9 +139,9 @@ python train_mimic/scripts/data/build_dataset.py \
python train_mimic/scripts/data/build_dataset.py \
--spec train_mimic/configs/datasets/twist2.yaml --json

-# Generate a precomputed training dataset from an existing minimal dataset
+# Generate a combined precomputed training dataset from all downloaded minimal datasets
python train_mimic/scripts/data/precompute_dataset.py \
-data/datasets/twist2 --outdir data/datasets/twist2_precomputed --jobs 8 --force
+data/datasets --outdir data/datasets_precomputed --jobs 8 --force

# Show dataset statistics
python train_mimic/scripts/data/inspect_dataset.py data/datasets/twist2
@@ -23,12 +23,12 @@ pip install -e '.[train]'
python -c "import train_mimic.tasks; print('training OK')"
```

-Download the minimal seed dataset and generate the precomputed training shard
+Download the distributed minimal datasets and generate the combined precomputed training dataset

```bash
python scripts/setup/download_assets.py --only robots data
python train_mimic/scripts/data/precompute_dataset.py \
-data/datasets/seed --outdir data/datasets/seed_precomputed --jobs 8
+data/datasets --outdir data/datasets_precomputed --jobs 8
```

## Training
@@ -39,7 +39,7 @@ python train_mimic/scripts/data/precompute_dataset.py \
python train_mimic/scripts/train.py \
--num_envs 64 \
--max_iterations 100 \
---motion_file data/datasets/seed_precomputed
+--motion_file data/datasets_precomputed
```

### Full Training
@@ -48,7 +48,7 @@ python train_mimic/scripts/train.py \
python train_mimic/scripts/train.py \
--num_envs 4096 \
--max_iterations 30000 \
---motion_file data/datasets/seed_precomputed
+--motion_file data/datasets_precomputed
```

### Multi-GPU
@@ -58,7 +58,7 @@ python train_mimic/scripts/train.py \
--gpu_ids 0 1 2 3 \
--num_envs 1024 \
--max_iterations 30000 \
---motion_file data/datasets/seed_precomputed
+--motion_file data/datasets_precomputed
```

### Multi-Node Multi-GPU
@@ -75,7 +75,7 @@ torchrun \
train_mimic/scripts/train.py \
--num_envs 1024 \
--max_iterations 1000 \
---motion_file data/datasets/seed_precomputed
+--motion_file data/datasets_precomputed
```

**Notes:**
@@ -105,15 +105,15 @@ python train_mimic/scripts/save_onnx.py \
```bash
python train_mimic/scripts/play.py \
--checkpoint logs/rsl_rl/g1_general_tracking/<run>/model_30000.pt \
---motion_file data/datasets/seed_precomputed
+--motion_file data/datasets_precomputed
```

### Benchmark

```bash
python train_mimic/scripts/benchmark.py \
--checkpoint logs/rsl_rl/g1_general_tracking/<run>/model_30000.pt \
---motion_file data/datasets/seed_precomputed \
+--motion_file data/datasets_precomputed \
--num_envs 1
```

@@ -122,7 +122,7 @@ python train_mimic/scripts/benchmark.py \
```bash
python train_mimic/scripts/benchmark.py \
--checkpoint logs/rsl_rl/g1_general_tracking/<run>/model_30000.pt \
---motion_file data/datasets/seed_precomputed \
+--motion_file data/datasets_precomputed \
--num_envs 1 \
--video \
--video_length 600
17 changes: 15 additions & 2 deletions scripts/setup/download_assets.py
@@ -56,6 +56,17 @@ def _resolve_entry_source(repo_cache: Path, entry: AssetEntry) -> Path:
    return repo_cache / entry.remote_path


+def _entry_allow_patterns(entry: AssetEntry) -> list[str]:
+    if entry.allow_patterns:
+        return list(entry.allow_patterns)
+    return [f"{entry.remote_path}*"]
+
+
+def _clear_cached_entry_sources(repo_cache: Path, entries: list[AssetEntry]) -> None:
+    for entry in entries:
+        _remove_path(_resolve_entry_source(repo_cache, entry))
+
+
def _copy_path(src: Path, dst: Path) -> None:
    dst.parent.mkdir(parents=True, exist_ok=True)
    if src.is_dir():
@@ -113,8 +124,9 @@ def download_all(groups, cache_dir):
        if not repo_entries:
            continue
        repo_type = repo_type_map[repo_id]
-        allow_patterns = [f"{e.remote_path}*" for e in repo_entries]
+        allow_patterns = [pattern for entry in repo_entries for pattern in _entry_allow_patterns(entry)]
        repo_cache = cache_dir / repo_type / repo_id.split("/")[-1]
+        _clear_cached_entry_sources(repo_cache, repo_entries)

        print(f"\nDownloading {repo_id} ({repo_type}) to {repo_cache} ...")
        print(f"Fetching: {[e.remote_path for e in repo_entries]}")
@@ -155,8 +167,9 @@ def download_all_hf(groups, cache_dir):
        if not repo_entries:
            continue
        repo_type = repo_type_map[repo_id]
-        allow_patterns = [f"{e.remote_path}*" for e in repo_entries]
+        allow_patterns = [pattern for entry in repo_entries for pattern in _entry_allow_patterns(entry)]
        repo_cache = cache_dir / repo_type / repo_id.split("/")[-1]
+        _clear_cached_entry_sources(repo_cache, repo_entries)

        print(f"\nDownloading {repo_id} ({repo_type}) from HuggingFace to {repo_cache} ...")
        print(f"Fetching: {[e.remote_path for e in repo_entries]}")
8 changes: 7 additions & 1 deletion teleopit/runtime/external_assets.py
@@ -16,6 +16,7 @@ class AssetEntry:
    local_path: str
    repo: str = "model"  # "model" or "dataset"
    mode: str = "copy"
+    allow_patterns: tuple[str, ...] = field(default_factory=tuple)


ASSET_GROUPS: dict[str, list[AssetEntry]] = {
@@ -48,6 +49,11 @@ class AssetEntry:
        ),
    ],
    "data": [
-        AssetEntry("data", "data/datasets/seed", repo="dataset"),
+        AssetEntry(
+            "data/datasets",
+            "data/datasets",
+            repo="dataset",
+            allow_patterns=("data/datasets/*/*.h5",),
+        ),
    ],
}
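
To make the effect of the new `allow_patterns` field concrete, here is a small self-contained sketch of how the fallback and the explicit pattern might select files. It rests on several assumptions: the `AssetEntry` below is a simplified stand-in whose first field is assumed to be `remote_path`, `select_files` is a hypothetical helper, the file list (including `manifest.json`) is made up, and matching is done with `fnmatch`-style globs, which may not be exactly what the download clients apply.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class AssetEntry:
    # Simplified stand-in for the real class; the field order is an assumption.
    remote_path: str
    local_path: str
    repo: str = "model"
    mode: str = "copy"
    allow_patterns: tuple[str, ...] = field(default_factory=tuple)


def entry_allow_patterns(entry: AssetEntry) -> list[str]:
    """Use explicit patterns when given, else fall back to a prefix glob."""
    if entry.allow_patterns:
        return list(entry.allow_patterns)
    return [f"{entry.remote_path}*"]


def select_files(files: list[str], patterns: list[str]) -> list[str]:
    """Hypothetical filter: keep files matching any fnmatch-style pattern."""
    return [f for f in files if any(fnmatchcase(f, p) for p in patterns)]


if __name__ == "__main__":
    # Made-up repository listing, for illustration only.
    files = [
        "data/datasets/seed/shard_000.h5",
        "data/datasets/twist2/shard_001.h5",
        "data/datasets/seed/manifest.json",
        "data/README.md",
    ]
    old = AssetEntry("data", "data/datasets/seed", repo="dataset")
    new = AssetEntry(
        "data/datasets",
        "data/datasets",
        repo="dataset",
        allow_patterns=("data/datasets/*/*.h5",),
    )
    print(select_files(files, entry_allow_patterns(old)))
    print(select_files(files, entry_allow_patterns(new)))
```

In this sketch the old entry falls back to the prefix glob `data*` and keeps every listed file, while the new entry keeps only the `.h5` shards under each dataset directory.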