AdaClearGrasp: Learning Adaptive Clearing for Zero-Shot Robust Dexterous Grasping in Densely Cluttered Environments
AdaClearGrasp is a hierarchical robot manipulation framework designed to solve robust dexterous grasping tasks in cluttered environments. It leverages the high-level semantic reasoning capabilities of Vision-Language Models (VLMs) combined with a library of robust low-level atomic skills to perform adaptive obstacle clearing and target grasping.
The core philosophy is "High-Level Semantic Planning → Atomic Skill Scheduling → Low-Level Motion Control". By introducing the Model Context Protocol (MCP), the robot's physical capabilities are encapsulated as standardized tools, allowing VLMs to control the robot directly through function calls.
- VLM-Guided Planning: Utilizes advanced VLMs (e.g., Qwen-VL, GPT-4o) to perceive scene geometry and generate multi-step manipulation plans (e.g., "Push the can to the left, then grasp the apple").
- MCP Architecture: Implements a Model Context Protocol server (exec/mcp_server.py) that exposes robotic skills as standardized tool interfaces.
- Hybrid Control Strategy: Combines deterministic Inverse Kinematics (IK) for precise reaching and PPO-based Reinforcement Learning policies for robust grasping.
- Atomic Skill Library: Provides a suite of robust motion primitives, including move_to, push, pull, lift, lower, and grasp.
- ManiSkill Integration: Features a high-fidelity PickClutterYCB-XArm7-v1 environment built on ManiSkill and SAPIEN.
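To illustrate what "exposing skills as standardized tools" can look like, here is a minimal sketch of a name-based tool registry. It is not the code in exec/mcp_server.py and it does not use the MCP SDK; the registry, the `tool` decorator, `call_tool`, and the stub skill signatures (including the `target` argument of `grasp`) are all assumptions made for illustration.

```python
from typing import Any, Callable, Dict

# Hypothetical registry: maps a tool name to the Python function behind it.
TOOLS: Dict[str, Callable[..., Dict[str, Any]]] = {}


def tool(fn: Callable[..., Dict[str, Any]]) -> Callable[..., Dict[str, Any]]:
    """Register a function under its own name so it can be called by name."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def push(side: str, dist_m: float) -> Dict[str, Any]:
    # Stand-in for a real push skill: it only echoes the request.
    return {"ok": True, "skill": "push", "side": side, "dist_m": dist_m}


@tool
def grasp(target: str) -> Dict[str, Any]:
    # Stand-in for a real grasp skill; the 'target' argument is assumed.
    return {"ok": True, "skill": "grasp", "target": target}


def call_tool(name: str, args: Dict[str, Any]) -> Dict[str, Any]:
    """Dispatch a function call by tool name, reporting errors as data."""
    if name not in TOOLS:
        return {"ok": False, "error": f"unknown tool: {name}"}
    try:
        return TOOLS[name](**args)
    except TypeError as exc:  # wrong or missing arguments
        return {"ok": False, "error": str(exc)}
```

Returning errors as data rather than raising lets a planner see why a call failed and choose a different action on the next step.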
- Linux (Ubuntu 20.04/22.04 recommended)
- Anaconda or Miniconda
- NVIDIA GPU (for simulation rendering and VLM inference)
- Clone the Repository
  git clone https://github.com/NJU-R-L-Group-Embodied-Lab/AdaClearGrasp.git
  cd AdaClearGrasp
- Create Conda Environment: Create the environment using the provided environment.yml:
  conda env create -f environment.yml
  conda activate clear
- Download Assets: AdaClearGrasp relies on ManiSkill's YCB dataset:
  python -m mani_skill.utils.download_asset "PickSingleYCB-v1"
Before running, you need to configure runtime parameters (e.g., VLM API Key).
- Create Configuration File: Copy the template file:
  cp configs/runtime_config_template.yaml configs/runtime_config.yaml
- Edit Configuration: Open configs/runtime_config.yaml and fill in your API information. The default supports OpenAI-compatible interfaces (e.g., SiliconFlow):
  openai:
    api_key: "sk-xxxxxxxxxxxxxxxx"
    base_url: "https://api.siliconflow.cn/v1"  # Or other providers
    model: "Qwen/Qwen3-VL-32B-Instruct"  # Or gpt-4o
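A quick sanity check on the configuration can catch a forgotten API key before a long run starts. The sketch below assumes the file has already been parsed into a dict (for example with PyYAML's `yaml.safe_load`); the function name `check_runtime_config` and the specific checks are illustrative assumptions, not part of the repository.

```python
from typing import Any, Dict, List

REQUIRED_KEYS = ("api_key", "base_url", "model")


def check_runtime_config(cfg: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the config looks usable."""
    section = cfg.get("openai")
    if not isinstance(section, dict):
        return ["missing 'openai' section"]

    problems: List[str] = []
    for key in REQUIRED_KEYS:
        value = section.get(key)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"openai.{key} is empty or missing")

    # The template ships with a placeholder such as "sk-xxxxxxxxxxxxxxxx".
    api_key = section.get("api_key")
    if isinstance(api_key, str) and api_key.startswith("sk-") and set(api_key[3:]) == {"x"}:
        problems.append("openai.api_key is still the template placeholder")

    base_url = section.get("base_url")
    if isinstance(base_url, str) and base_url.strip():
        if not base_url.startswith(("http://", "https://")):
            problems.append("openai.base_url should start with http:// or https://")
    return problems
```

Calling it right after loading the file and printing any returned problems is usually enough; the check is deliberately shallow and does not contact the provider.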
All commands should be run from the project root directory.
Before running complex VLM planning, it is recommended to test if the low-level atomic skills are working correctly. This script executes a sequence of predefined actions (reset, move, push/pull, grasp).
python scripts/test_skills.py
Note: If running on a headless server, set RUN_MODE = "video" in the script to save screen recordings.
Launch the autonomous agent to let the VLM observe the scene and make decisions.
python -m plan.vlm_plan \
  --scene_name apple \
  --clutter_count 4 \
  --scene_id 1 \
  --max_plan_steps 40
Parameters:
- --scene_name: Target object category (e.g., apple, can, mug, ball, cube, lego, pear).
- --clutter_count: Number of clutter objects in the scene (e.g., 2, 4, 6).
- --scene_id: Specific scene ID (corresponding to config files in data/scenes/).
Workflow:
- Initialize the simulation environment.
- Render the RGB image of the current view.
- Send the image and task description to the VLM.
- VLM returns action instructions in JSON format (e.g., {"action": "push", "args": {"side": "left", "dist_m": 0.1}}).
- MCP Runtime parses and executes the action.
- Repeat until the task is completed or maximum steps are reached.
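The workflow above can be sketched as a bounded loop that parses each JSON reply and hands it to an executor. This is a simplified illustration, not the loop in plan/vlm_plan.py: `parse_action`, `run_plan`, the `query_vlm` and `execute` callables, and the convention that `execute` returns True when the task is complete are all assumptions.

```python
import json
from typing import Any, Callable, Dict, List, Tuple

# Skill names taken from the atomic skill library listed in this README.
ALLOWED_ACTIONS = {"move_to", "push", "pull", "lift", "lower", "grasp"}


def parse_action(reply: str) -> Tuple[str, Dict[str, Any]]:
    """Parse a reply of the form {"action": ..., "args": {...}}."""
    data = json.loads(reply)
    if not isinstance(data, dict):
        raise ValueError("reply must be a JSON object")
    action = data.get("action")
    args = data.get("args", {})
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    if not isinstance(args, dict):
        raise ValueError("args must be a JSON object")
    return action, args


def run_plan(
    query_vlm: Callable[[List[str]], str],
    execute: Callable[[str, Dict[str, Any]], bool],
    max_plan_steps: int = 40,
) -> List[str]:
    """Ask for an action, execute it, and stop when done or out of steps."""
    history: List[str] = []
    for _ in range(max_plan_steps):
        action, args = parse_action(query_vlm(history))
        done = execute(action, args)  # assumed to return True on task completion
        history.append(action)
        if done:
            break
    return history
```

In a real run, `query_vlm` would send the rendered image and task description to the model, and `execute` would forward the action to the MCP runtime.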
You can generate new random scenes with different clutter configurations using the provided script. This is useful for creating diverse test cases.
python scripts/gen_safe_random_scenes.py
- Modify CLUTTER_COUNTS in the script to generate scenes with different numbers of objects.
- Generated scenes will be saved in data/scenes/<category>/<count>/.
To reproduce the full experimental results or run large-scale evaluations:
Use the sweep script to run the VLM planner across multiple object categories and clutter levels. This script sequentially triggers parallel runs for each configuration.
python scripts/run_vlm_plan_sweep.py
- Customize: Edit the RUNS list in scripts/run_vlm_plan_sweep.py to select specific objects or clutter counts.
- Parallelism: The script uses scripts/run_parallel_vlm_plan.py internally to execute multiple scene IDs (defaults to 10 scenes) in parallel. You can adjust MAX_CONCURRENCY in that file.
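For readers who want to see how bounded parallelism over scene IDs might work, here is a rough sketch using a thread pool that launches one planner process per scene. It is not the content of scripts/run_parallel_vlm_plan.py: `build_command`, `run_scene_ids`, the injectable `runner`, and the value of MAX_CONCURRENCY shown here are assumptions; only the CLI flags come from the command documented above.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

MAX_CONCURRENCY = 4  # assumed value for this sketch


def build_command(scene_name: str, clutter_count: int, scene_id: int,
                  max_plan_steps: int = 40) -> List[str]:
    """Build the planner command line for one scene."""
    return [
        sys.executable, "-m", "plan.vlm_plan",
        "--scene_name", scene_name,
        "--clutter_count", str(clutter_count),
        "--scene_id", str(scene_id),
        "--max_plan_steps", str(max_plan_steps),
    ]


def run_scene_ids(scene_name: str, clutter_count: int, scene_ids: List[int],
                  runner: Callable = subprocess.run) -> Dict[int, int]:
    """Run one process per scene ID, at most MAX_CONCURRENCY at a time.

    Returns a mapping from scene ID to process return code.
    """
    ids = list(scene_ids)

    def run_one(scene_id: int) -> int:
        return runner(build_command(scene_name, clutter_count, scene_id)).returncode

    # Threads are enough here: each worker just waits on a child process.
    with ThreadPoolExecutor(max_workers=MAX_CONCURRENCY) as pool:
        codes = list(pool.map(run_one, ids))
    return dict(zip(ids, codes))
```

A call such as `run_scene_ids("apple", 4, list(range(1, 11)))` would cover ten scenes, assuming scene IDs start at 1; it must be run from the project root so that `plan.vlm_plan` is importable.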
After the experiments complete, use the summary script to aggregate metrics (success rate, steps taken) from the logs.
python scripts/summarize_vlm_plan_steps.py
- This will parse logs from data/logs/vlm_plan/ and generate a CSV report at data/analysis/vlm_plan_tasks_summary.csv.
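The aggregation step can be pictured as grouping per-run records by object category and clutter count, then computing a success rate and mean step count for each group. The sketch below assumes each run has already been reduced to a dict with the keys `scene_name`, `clutter_count`, `success`, and `steps`; that record shape, the column names, and the function names are hypothetical and may not match the real log format or CSV layout.

```python
import csv
from collections import defaultdict
from typing import Any, Dict, Iterable, List, TextIO

FIELDS = ["scene_name", "clutter_count", "runs", "success_rate", "mean_steps"]


def summarize(records: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Group runs by (scene_name, clutter_count) and compute summary metrics."""
    groups = defaultdict(list)
    for rec in records:
        groups[(rec["scene_name"], rec["clutter_count"])].append(rec)

    rows = []
    for (scene_name, clutter_count), recs in sorted(groups.items()):
        n = len(recs)
        rows.append({
            "scene_name": scene_name,
            "clutter_count": clutter_count,
            "runs": n,
            "success_rate": sum(1 for r in recs if r["success"]) / n,
            "mean_steps": sum(r["steps"] for r in recs) / n,
        })
    return rows


def write_csv(rows: List[Dict[str, Any]], out: TextIO) -> None:
    """Write the summary rows with a header line to an open text stream."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
```

Writing to a text stream rather than a fixed path keeps the sketch easy to test; passing a file opened with `newline=""` would produce an on-disk report.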
- IK Failures: If the robot fails to reach a target or grasp pose, it might be due to workspace limits or collisions. Try adjusting the target position or approach angle.
- Video Recording: To save videos of the execution, set RUN_MODE = "video" in the relevant script (e.g., scripts/test_skills.py or plan/vlm_plan.py). Videos will be saved to data/videos/.
- Model Loading: Ensure that the pre-trained PPO model (data/models/ppo/PickClutterYCB-XArm7-v1/ppo_grasp.zip) is present. This model is critical for the grasping phase.
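A small pre-flight check can confirm that the PPO model is in place before launching a run. The model path below is the one given in this README; the helper name `missing_files` and the idea of checking relative to a project root are assumptions for illustration.

```python
from pathlib import Path
from typing import Iterable, List, Union

# Path taken from this README, relative to the project root.
PPO_MODEL = Path("data/models/ppo/PickClutterYCB-XArm7-v1/ppo_grasp.zip")


def missing_files(root: Union[str, Path],
                  required: Iterable[Path] = (PPO_MODEL,)) -> List[Path]:
    """Return the required files that do not exist under the given root."""
    root = Path(root)
    return [p for p in required if not (root / p).is_file()]
```

Running `missing_files(".")` from the project root returns an empty list when the model is present, and the missing path otherwise.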
AdaClearGrasp/
├── configs/ # Configuration files
│ ├── runtime_config.yaml # [User Created] Runtime config (API Key)
│ └── runtime_config_template.yaml # Config template
├── core/ # Core algorithms
│ ├── kinematics/ # Robot Inverse Kinematics (IK)
│ └── perception/ # Point cloud processing & geometry
├── env/ # Environment layer
│ └── sim/ # ManiSkill simulation (PickClutterYCB)
├── exec/ # Execution layer (MCP Server & Skills)
│ ├── skills/ # Atomic skills implementation (Clear, Grasp, Move)
│ └── mcp_server.py # MCP Server: Exposing skills to VLM
├── plan/ # Planning layer (VLM Agent)
│ ├── vlm_plan.py # Main planning loop
│ ├── mcp_runtime.py # MCP client runtime
│ └── prompts.py # VLM system prompts
├── scripts/ # Tool scripts
│ └── test_skills.py # Skill testing script
├── data/ # Data & Models
│ ├── models/ppo/ # Pre-trained Grasp RL policy
│ └── scenes/ # Scene definition JSON files
└── README.md
This project is licensed under the MIT License.