
feat: [cosmos3-action] Add example processing script and assets for the egocentric hand action data #246

Open
qianlim wants to merge 3 commits into NVIDIA:main from qianlim:action-hand-pose-57d-example


@qianlim qianlim commented Jun 29, 2026

Collaborator

Summary

Addresses issue #31 in the cosmos-framework repo. The PR is opened in this repo because the change is tutorial/example-style code, which is better suited here.

  • Add an egocentric hand-action data processing example for Cosmos3 Action.
  • Include a small ESCALE_000374 sample with video, camera, caption, and 3D hand-pose annotations.
  • Add a converter script that produces the raw 57D hand-action layout:
    [camera, right_wrist, right_fingertips, left_wrist, left_fingertips].
  • Document the expected input JSON schema, coordinate conventions, wrist-frame alignment hook, output files, and optional model-space normalization.
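To make the 57D layout above concrete, here is a minimal sketch of how a flat action frame could be split into its five named components. The component order comes from the PR description; the per-component widths are an assumption (9D per pose as 3D translation plus 6D rotation, and 5 fingertips x 3 coordinates per hand), chosen only because they sum to 57. The names `ASSUMED_WIDTHS`, `build_slices`, and `split_action` are hypothetical and are not taken from the converter script.

```python
# Hypothetical per-component widths. The PR states only the component order
# and the 57D total; the 9/15 split below is an assumption for illustration.
ASSUMED_WIDTHS = {
    "camera": 9,             # assumed: 3D translation + 6D rotation
    "right_wrist": 9,        # assumed: 3D translation + 6D rotation
    "right_fingertips": 15,  # assumed: 5 fingertips x 3 coordinates
    "left_wrist": 9,
    "left_fingertips": 15,
}


def build_slices(widths):
    """Map each component name to its slice in the flat action vector."""
    slices = {}
    start = 0
    for name, width in widths.items():
        slices[name] = slice(start, start + width)
        start += width
    return slices, start


def split_action(frame, slices):
    """Split one flat action frame into named components."""
    return {name: frame[s] for name, s in slices.items()}


SLICES, TOTAL_DIM = build_slices(ASSUMED_WIDTHS)

# Toy frame whose values equal their own index, so offsets are easy to read.
frame = [float(i) for i in range(TOTAL_DIM)]
parts = split_action(frame, SLICES)

for name, values in parts.items():
    print(f"{name}: {len(values)} values starting at index {int(values[0])}")
```

Keeping the widths in one ordered mapping means the slice offsets are derived rather than hard-coded, so a different per-component split only requires changing `ASSUMED_WIDTHS`.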

Tests

  • Ran the converter from the cosmos repo root using public cosmos-framework imports:
    PYTHONDONTWRITEBYTECODE=1 PYTHONPATH=/home/qianlim/projects/cosmos-framework \
      python cookbooks/cosmos3/generator/action/finetune/data_processing_for_egocentric_hand_action.py \
      --output-dir /tmp/cosmos_pr_readme_verify_final
  • Verified output shape: (121, 57).
  • Verified roundtrip fingertip reconstruction error: right max/mean: 4.217e-05 / 1.973e-05 m; left max/mean: 3.102e-05 / 1.537e-05 m.
  • Ran py_compile on the new script.
  • Generated a local visualization overlay from the decoded action for manual sanity check.
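The shape and roundtrip checks listed above could be reproduced along these lines. This is a sketch only: the PR does not show its verification code, the helper names `check_shape` and `roundtrip_error` are hypothetical, and the points below are made-up toy values rather than data from the ESCALE_000374 sample.

```python
import math


def check_shape(actions, expected_frames, expected_dim):
    """Return True if `actions` is a rectangular (frames, dim) nested list."""
    return len(actions) == expected_frames and all(
        len(row) == expected_dim for row in actions
    )


def roundtrip_error(original, reconstructed):
    """Return (max, mean) Euclidean distance between matching 3D points."""
    dists = [math.dist(p, q) for p, q in zip(original, reconstructed)]
    return max(dists), sum(dists) / len(dists)


# Placeholder action array with the shape reported in the PR.
actions = [[0.0] * 57 for _ in range(121)]
print("shape ok:", check_shape(actions, 121, 57))

# Toy fingertip positions in metres (illustrative, not from the sample clip).
original = [(0.0, 0.0, 0.0), (0.1, 0.2, 0.3)]
reconstructed = [(0.0, 0.0, 3e-05), (0.1, 0.2, 0.30001)]

max_err, mean_err = roundtrip_error(original, reconstructed)
print(f"max/mean: {max_err:.3e} / {mean_err:.3e} m")
```

In a real check, `original` would hold the annotated fingertip positions and `reconstructed` the positions decoded back from the 57D action, evaluated separately for the right and left hand.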
