We evaluated GraspMAS on the OCID-VLG dataset and observed performance issues regarding object grounding and grasp generation.
1. Grounding Failure (GroundingDINO)
The GroundingDINO model frequently fails to ground the target object, even with up to 5 rounds allowed.
- Red Rectangle: Model Prediction
- Green Rectangle: Ground Truth
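To make grounding failures comparable across cases, one common approach is to score the predicted box against the ground-truth box by intersection-over-union (IoU). The sketch below assumes axis-aligned boxes in (x1, y1, x2, y2) pixel format and a 0.5 IoU threshold. Neither the format nor the threshold is taken from GraspMAS or the OCID-VLG tooling, and `box_iou` / `is_grounded` are hypothetical helpers.

```python
from typing import Tuple

# Assumed box format: (x1, y1, x2, y2) in pixels, with x1 < x2 and y1 < y2.
Box = Tuple[float, float, float, float]


def box_iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Clamp to zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_grounded(pred: Box, gt: Box, thresh: float = 0.5) -> bool:
    """Count a grounding as correct if IoU >= thresh (0.5 is an assumed default)."""
    return box_iou(pred, gt) >= thresh


print(box_iou((0, 0, 10, 10), (5, 0, 15, 10)))
```

A prediction shifted by half its width, as in the example call, reaches an IoU of only 1/3 and would count as a grounding failure under the 0.5 threshold.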
2. Overly Large Grasp Rectangles
The generated grasp rectangles are often significantly larger than the target object and do not align with the object’s geometry.
- Red Rectangle: Model Prediction
- Green Rectangle: Ground Truth
Overall Performance
When running the full pipeline on OCID-VLG, we observed an overall success rate of ~17%, which is lower than expected.
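The ~17% figure depends on the success criterion used. A widely used rectangle metric for grasp detection counts a prediction as successful if, for at least one ground-truth grasp, the Jaccard index is above 0.25 and the orientation difference is below 30°. The sketch below implements that criterion, assuming grasps are given as (cx, cy, width, height, angle_deg). Whether GraspMAS's evaluation uses exactly this criterion and format is an assumption, and all function names are hypothetical. Rotated-rectangle overlap is computed with Sutherland-Hodgman clipping, which is valid here because both rectangles are convex.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
# Assumed grasp format: (cx, cy, width, height, angle_deg)
Grasp = Tuple[float, float, float, float, float]


def corners(g: Grasp) -> List[Point]:
    """Corners of the rotated rectangle, in positive (counter-clockwise) order."""
    cx, cy, w, h, ang = g
    c, s = math.cos(math.radians(ang)), math.sin(math.radians(ang))
    offsets = ((-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2))
    return [(cx + dx * c - dy * s, cy + dx * s + dy * c) for dx, dy in offsets]


def cross(a: Point, b: Point, p: Point) -> float:
    """(b - a) x (p - a); >= 0 means p lies on or left of the line a->b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def intersect(p: Point, q: Point, a: Point, b: Point) -> Point:
    """Point where segment p->q crosses the infinite line through a and b."""
    dp, dq = cross(a, b, p), cross(a, b, q)
    t = dp / (dp - dq)
    return (p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1]))


def clip(subject: List[Point], clipper: List[Point]) -> List[Point]:
    """Sutherland-Hodgman: clip a convex polygon by a counter-clockwise convex polygon."""
    out = subject
    for i in range(len(clipper)):
        a, b = clipper[i], clipper[(i + 1) % len(clipper)]
        inp, out = out, []
        for j in range(len(inp)):
            p, q = inp[j], inp[(j + 1) % len(inp)]
            p_in, q_in = cross(a, b, p) >= 0, cross(a, b, q) >= 0
            if p_in:
                out.append(p)
            if p_in != q_in:
                out.append(intersect(p, q, a, b))
    return out


def polygon_area(poly: Sequence[Point]) -> float:
    """Shoelace formula; returns 0.0 for degenerate polygons."""
    s = 0.0
    for i in range(len(poly)):
        (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % len(poly)]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2


def jaccard(g1: Grasp, g2: Grasp) -> float:
    """Jaccard index (IoU) of two rotated grasp rectangles."""
    p1, p2 = corners(g1), corners(g2)
    inter = polygon_area(clip(p1, p2))
    union = polygon_area(p1) + polygon_area(p2) - inter
    return inter / union if union > 0 else 0.0


def angle_diff(a: float, b: float) -> float:
    """Orientation difference in degrees; a grasp is symmetric under 180° rotation."""
    d = abs(a - b) % 180.0
    return min(d, 180.0 - d)


def grasp_success(pred: Grasp, gts: Sequence[Grasp],
                  iou_thr: float = 0.25, ang_thr: float = 30.0) -> bool:
    """Success if pred matches any ground-truth grasp on both overlap and angle."""
    return any(jaccard(pred, gt) > iou_thr and angle_diff(pred[4], gt[4]) < ang_thr
               for gt in gts)
```

With these helpers, an oversized prediction such as `(0, 0, 40, 16, 0)` against a ground-truth grasp `(0, 0, 10, 4, 0)` has a Jaccard index of 0.0625 and counts as a failure even though it is centered on the object. Under this criterion, overly large grasp rectangles would lower the success rate even when grounding is correct.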
Reproducibility / Verification Request
We have compiled five specific failure cases, each with its prompt. We would appreciate it if the maintainers (or other users) could run these cases and confirm whether they observe the same grounding and grasping behavior.
Any insight into whether these results are expected with the current implementation, or whether additional configuration is required, would be very helpful.
Thank you for your assistance.