diff --git a/cookbooks/cosmos3/generator/action/README.md b/cookbooks/cosmos3/generator/action/README.md
index 73760e9d..49169f55 100644
--- a/cookbooks/cosmos3/generator/action/README.md
+++ b/cookbooks/cosmos3/generator/action/README.md
@@ -88,7 +88,6 @@ visualize the generated videos:
 inverse dynamics, predicting ego-motion trajectories from input AV videos using Cosmos3-Nano.
 - [`run_policy_with_cosmos_framework.md`](./run_policy_with_cosmos_framework.md) - policy, predicting future observations and action trajectories for DROID robot using Cosmos3-Nano-Policy-DROID.
-
 ## Run with vLLM-Omni
 ### Quickstart
@@ -135,7 +134,9 @@ To reproduce our post-training recipe for [Cosmos3-Nano-Policy-DROID](https://hu
 launch-script pattern as the other Cosmos3 finetune cookbooks while delegating the canonical training implementation to Cosmos Framework.
-
+The same [action-policy SFT cookbook](./finetune/README.md) also covers **LIBERO-10**
+(`launch_sft_action_policy_libero.sh`) — fine-tuning Cosmos3-Nano on the `libero_10`
+simulation benchmark with the same launch-script pattern.
 ## TODO
diff --git a/cookbooks/cosmos3/generator/action/finetune/README.md b/cookbooks/cosmos3/generator/action/finetune/README.md
index 52436be6..4f5f2bd0 100644
--- a/cookbooks/cosmos3/generator/action/finetune/README.md
+++ b/cookbooks/cosmos3/generator/action/finetune/README.md
@@ -1,12 +1,18 @@
-# Cosmos3-Nano-Policy-DROID Fine-Tuning (SFT)
+# Cosmos3-Nano Action-Policy Fine-Tuning (SFT)
-This example demonstrates supervised fine-tuning (SFT) of [Cosmos3-Nano](https://huggingface.co/nvidia/Cosmos3-Nano) into an action policy for the DROID robot. It reproduces the post-training recipe used to create [Cosmos3-Nano-Policy-DROID](https://huggingface.co/nvidia/Cosmos3-Nano-Policy-DROID), leveraging the public [Cosmos3-DROID](https://huggingface.co/datasets/nvidia/Cosmos3-DROID) dataset and the action-policy recipe from [cosmos-framework](https://github.com/NVIDIA/cosmos-framework).
+This example demonstrates supervised fine-tuning (SFT) of [Cosmos3-Nano](https://huggingface.co/nvidia/Cosmos3-Nano) into a robot action policy, using the action-policy recipe from [cosmos-framework](https://github.com/NVIDIA/cosmos-framework). Two embodiments are covered, each reproducing a Cosmos3 paper result:
+
+- **DROID** — reproduces [Cosmos3-Nano-Policy-DROID](https://huggingface.co/nvidia/Cosmos3-Nano-Policy-DROID): trained on real-robot DROID data, evaluated on the RoboLab simulation benchmark.
+- **LIBERO-10** — reproduces the Cosmos3 paper's LIBERO-10 results: trained and evaluated on the LIBERO-10 simulation benchmark.
 | Recipe | Launch shell | Base model | Dataset |
 | --- | --- | --- | --- |
 | Policy-DROID SFT | `launch_sft_action_policy_droid.sh` | Cosmos3-Nano | [Cosmos3-DROID](https://huggingface.co/datasets/nvidia/Cosmos3-DROID) success split |
+| Policy-LIBERO-10 SFT | `launch_sft_action_policy_libero.sh` | Cosmos3-Nano | [LIBERO_LeRobot_v3](https://huggingface.co/datasets/nvidia/LIBERO_LeRobot_v3) `libero_10` |
+
+The DROID recipe uses the registered `action_policy_droid_nano` experiment: `joint_pos` 8-D actions, proprioceptive state, `concat_view` 480p video, chunk length 32, episode-shuffle streaming, and the optional `keep_ranges_1_0_1.json` window filter.
-The recipe uses `[job].task = "vfm"` with the registered `action_policy_droid_nano` experiment. It trains a DROID policy model with `joint_pos` 8-D actions, proprioceptive state, `concat_view` 480p video, chunk length 32, episode-shuffle streaming, and the optional `keep_ranges_1_0_1.json` window filter.
+The LIBERO-10 recipe uses the registered `action_policy_libero_nano` experiment: `frame_wise_relative` rot6d 10-D actions, `quantile_rot` normalization, `concat_view` (third-person + wrist) at 20 fps, lr 5e-5 / warmup 500 / cycle 16000, global batch 2048 (HSDP 2x8). Train on `libero_10` **alone**.
 ## Prerequisites
@@ -38,7 +44,7 @@ The launcher is a complete local wrapper for the public cookbook:
 - downloads `Wan2.2_VAE.pth` if needed
 - converts `Cosmos3-Nano` to a local DCP checkpoint if needed
 - downloads `keep_ranges_1_0_1.json` if needed
-- launches 8-GPU training with `action_policy_droid_repro.toml`
+- launches training with `action_policy_droid_repro.toml`
 The script intentionally stays close to the `cosmos-framework` example launcher: `DATASET_PATH` is bridged to `DROID_ROOT`, `BASE_CHECKPOINT_PATH` and `WAN_VAE_PATH` are exported for the TOML,
@@ -63,6 +69,31 @@ export EXTRA_TAIL_OVERRIDES="job.wandb_mode=disabled trainer.max_iter=10 checkpo
 bash launch_sft_action_policy_droid.sh
 ```
+## LIBERO-10 quick start
+
+The LIBERO launcher stages the `libero_10` suite (auto-downloaded if missing),
+downloads the Wan VAE, converts the base checkpoint, and trains.
+
+```shell
+bash launch_sft_action_policy_libero.sh
+```
+
+The launcher:
+
+- downloads `nvidia/LIBERO_LeRobot_v3` `libero_10` to `data/LIBERO_LeRobot_v3/libero_10` if missing
+- downloads `Wan2.2_VAE.pth` and converts `Cosmos3-Nano` to a local DCP checkpoint if needed
+- launches training with the LIBERO action-policy TOML (`action_policy_libero_repro.toml`)
+
+Relocate inputs via env vars, or run a short smoke test:
+
+```shell
+export LIBERO_ROOT=/scratch/LIBERO_LeRobot_v3/libero_10
+export EXTRA_TAIL_OVERRIDES="job.wandb_mode=disabled trainer.max_iter=10 checkpoint.save_iter=10 dataloader_train.max_samples_per_batch=32"
+bash launch_sft_action_policy_libero.sh
+```
+
+Checkpoints are saved every 500 iters.
+
 ## Outputs
 Training writes to `outputs/train////`:
diff --git a/cookbooks/cosmos3/generator/action/finetune/launch_sft_action_policy_libero.sh b/cookbooks/cosmos3/generator/action/finetune/launch_sft_action_policy_libero.sh
new file mode 100755
index 00000000..48f886e8
--- /dev/null
+++ b/cookbooks/cosmos3/generator/action/finetune/launch_sft_action_policy_libero.sh
@@ -0,0 +1,76 @@
+#!/usr/bin/env bash
+# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: OpenMDW-1.1
+
+# Complete recipe: LIBERO-10 action-policy SFT on Cosmos3-Nano (HSDP 2x8).
+# Run from this folder with the cosmos-framework venv active (see README):
+#   bash launch_sft_action_policy_libero.sh
+# It prepares the small dependencies, checks for the staged libero_10 dataset, and trains.
+# Paths are fixed under this (git-ignored) folder, matching the reasoner finetune
+# wrappers, while the TOML and tail-overrides match the cosmos-framework example.
+
+set -euo pipefail
+cd "$(dirname "${BASH_SOURCE[0]}")"
+
+TOML_FILE="toml/sft_config/action_policy_libero_repro.toml"
+: "${LIBERO_ROOT:=$PWD/data/LIBERO_LeRobot_v3/libero_10}"
+: "${BASE_CHECKPOINT_PATH:=$PWD/checkpoints/Cosmos3-Nano}"
+: "${WAN_VAE_PATH:=$PWD/checkpoints/wan22_vae/Wan2.2_VAE.pth}"
+
+# 1. Stage the libero_10 suite (the Table-20 reproduction trains on libero_10 ALONE).
+if [[ ! -f "$LIBERO_ROOT/meta/info.json" ]]; then
+    echo "Downloading nvidia/LIBERO_LeRobot_v3 (libero_10) ..."
+    uvx hf@latest download --repo-type dataset nvidia/LIBERO_LeRobot_v3 \
+        --include 'libero_10/**' --local-dir "$(dirname "$LIBERO_ROOT")"
+fi
+if [[ ! -f "$LIBERO_ROOT/meta/info.json" ]]; then
+    cat >&2 <