Hello Dr. Lepert,
I read your Phantom paper and found the robot-agnostic data-editing idea very interesting. I have a few technical questions about the Kinova Gen3 visualization and action conversion.
In the robot-agnostic example, the human hand is replaced by different robot arms, including Kinova Gen3, and the rendered gripper appears to accurately contact or hold the rope. Could you please explain how this alignment is achieved in practice?
Specifically, I am curious about:
How do you map the human thumb–index fingertip midpoint and orientation to the Kinova Gen3 gripper frame?
Do you use any fixed hand-to-gripper offset or calibration between the human pinch pose and the robot gripper TCP?
How do you make sure the rendered gripper is actually aligned with the rope/object in 3D, not only visually aligned in the 2D image?
Is the contact point corrected using depth/object geometry, or is it fully determined from the estimated hand pose after RGB-D/ICP refinement?
For the Kinova Gen3, do you run IK/reachability checks before rendering the robot pose? And how sensitive is the method to camera calibration or hand-pose estimation error?