AdaClearGrasp: Learning Adaptive Clearing for Zero-Shot Robust Dexterous Grasping in Densely Cluttered Environments

AdaClearGrasp is a hierarchical robot manipulation framework designed to solve robust dexterous grasping tasks in cluttered environments. It leverages the high-level semantic reasoning capabilities of Vision-Language Models (VLMs) combined with a library of robust low-level atomic skills to perform adaptive obstacle clearing and target grasping.

The core philosophy is "High-Level Semantic Planning → Atomic Skill Scheduling → Low-Level Motion Control". By introducing the Model Context Protocol (MCP), the robot's physical capabilities are encapsulated as standardized tools, allowing VLMs to control the robot directly through function calls.

✨ Key Features

VLM-Guided Planning: Utilizes advanced VLMs (e.g., Qwen-VL, GPT-4o) to perceive scene geometry and generate multi-step manipulation plans (e.g., "Push the can to the left, then grasp the apple").
MCP Architecture: Implements a Model Context Protocol server (exec/mcp_server.py) that exposes robotic skills as standardized tool interfaces.
Hybrid Control Strategy: Combines deterministic Inverse Kinematics (IK) for precise reaching and PPO-based Reinforcement Learning policies for robust grasping.
Atomic Skill Library: Provides a suite of robust motion primitives, including move_to, push, pull, lift, lower, and grasp.
ManiSkill Integration: Features a high-fidelity PickClutterYCB-XArm7-v1 environment built on ManiSkill and SAPIEN.
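
To make the MCP idea more concrete, here is a minimal sketch of how atomic skills could be registered and described as callable tools. It uses only the standard library and is not the code in exec/mcp_server.py. The `skill` decorator, the `TOOLS` registry, `describe_tools`, `call_tool`, and the `grasp` signature are assumptions made for illustration; the `push` arguments are borrowed from the JSON example in the Usage Guide.

```python
import inspect
from typing import Callable, Dict

# Hypothetical registry: tool name -> Python callable.
TOOLS: Dict[str, Callable] = {}


def skill(fn: Callable) -> Callable:
    """Register a function as a tool under its own name."""
    TOOLS[fn.__name__] = fn
    return fn


@skill
def push(side: str, dist_m: float) -> str:
    """Push an obstacle toward `side` by `dist_m` meters."""
    return f"pushed {side} by {dist_m:.2f} m"  # stand-in for real motion control


@skill
def grasp(target: str) -> str:
    """Grasp the named target object."""
    return f"grasped {target}"  # stand-in for the RL grasp policy


def describe_tools() -> list:
    """List each tool's name, docstring, and parameter types for the planner."""
    specs = []
    for name, fn in TOOLS.items():
        params = {
            p: a.annotation.__name__
            for p, a in inspect.signature(fn).parameters.items()
        }
        specs.append(
            {"name": name, "description": inspect.getdoc(fn), "parameters": params}
        )
    return specs


def call_tool(name: str, **args):
    """Dispatch a tool call by name, rejecting unknown tools."""
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**args)
```

With this layout, adding a skill only requires decorating one more function; the planner-facing description is derived from the signature rather than maintained by hand.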

🛠️ Installation

1. Prerequisites

Linux (Ubuntu 20.04/22.04 recommended)
Anaconda or Miniconda
NVIDIA GPU (for simulation rendering and VLM inference)

2. Steps

Clone the Repository

git clone https://github.com/NJU-R-L-Group-Embodied-Lab/AdaClearGrasp.git
cd AdaClearGrasp

Create Conda Environment

Create the environment using the provided environment.yml:
```
conda env create -f environment.yml
conda activate clear
```
Download Assets

AdaClearGrasp relies on ManiSkill's YCB dataset:
```
python -m mani_skill.utils.download_asset "PickSingleYCB-v1"
```

⚙️ Configuration

Before running, you need to configure runtime parameters (e.g., VLM API Key).

Create Configuration File

Copy the template file:

cp configs/runtime_config_template.yaml configs/runtime_config.yaml

Edit Configuration

Open configs/runtime_config.yaml and fill in your API information. The default configuration targets an OpenAI-compatible interface (e.g., SiliconFlow):

openai:
  api_key: "sk-xxxxxxxxxxxxxxxx"
  base_url: "https://api.siliconflow.cn/v1"  # Or other providers
  model: "Qwen/Qwen3-VL-32B-Instruct"       # Or gpt-4o
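
A quick sanity check of the edited file can catch a forgotten API key before a long run starts. The sketch below is one possible way to do that and is not part of the repository: `validate_openai_config` and `load_runtime_config` are hypothetical helpers, the placeholder test (a key consisting only of `x` characters after `sk-`) is an assumption based on the template value shown above, and loading the file assumes PyYAML is installed.

```python
from pathlib import Path

REQUIRED_KEYS = ("api_key", "base_url", "model")


def validate_openai_config(cfg: dict) -> dict:
    """Return the `openai` section after checking that it is filled in."""
    section = cfg.get("openai")
    if not isinstance(section, dict):
        raise ValueError("missing top-level 'openai' section")
    missing = [k for k in REQUIRED_KEYS if not section.get(k)]
    if missing:
        raise ValueError(f"missing keys in 'openai': {missing}")
    # Assumption: only 'x' characters after 'sk-' means the template was not edited.
    if set(section["api_key"].removeprefix("sk-")) <= {"x"}:
        raise ValueError("api_key still looks like the template placeholder")
    return section


def load_runtime_config(path: str = "configs/runtime_config.yaml") -> dict:
    """Read the YAML file (requires PyYAML) and validate its `openai` section."""
    import yaml  # third-party; imported lazily so validation works without it

    return validate_openai_config(yaml.safe_load(Path(path).read_text()) or {})
```

Calling `load_runtime_config()` from the project root either returns the three settings or raises a `ValueError` that names what is missing.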

🚀 Usage Guide

All commands should be run from the project root directory.

1. Test Atomic Skills

Before running VLM planning, test whether the low-level atomic skills work correctly. This script executes a sequence of predefined actions (reset, move, push/pull, grasp).

python scripts/test_skills.py

Note: If running on a headless server, set RUN_MODE = "video" in the script to save screen recordings.

2. Run VLM Planning (Autonomous Agent)

Launch the autonomous agent to let the VLM observe the scene and make decisions.

python -m plan.vlm_plan \
    --scene_name apple \
    --clutter_count 4 \
    --scene_id 1 \
    --max_plan_steps 40

Parameters:

--scene_name: Target object category (e.g., apple, can, mug, ball, cube, lego, pear).
--clutter_count: Number of clutter objects in the scene (e.g., 2, 4, 6).
--scene_id: Specific scene ID (corresponding to config files in data/scenes/).
--max_plan_steps: Maximum number of planning steps before the run stops (40 in the example above).

Workflow:

Initialize the simulation environment.
Render the RGB image of the current view.
Send the image and task description to the VLM.
The VLM returns an action instruction in JSON format (e.g., {"action": "push", "args": {"side": "left", "dist_m": 0.1}}).
The MCP runtime parses and executes the action.
Repeat until the task is completed or maximum steps are reached.
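
The workflow above can be sketched as a small loop. This is an illustration under stated assumptions, not the code in plan/vlm_plan.py: `parse_action` and `run_plan` are hypothetical names, `query_vlm` stands in for rendering the image and calling the VLM, and `execute` stands in for the MCP runtime and is assumed to return `True` once the task is complete. The allowed action names are taken from the atomic skill list.

```python
import json
from typing import Callable, List

ALLOWED_ACTIONS = {"move_to", "push", "pull", "lift", "lower", "grasp"}


def parse_action(reply: str) -> dict:
    """Parse the JSON object spanning the first '{' to the last '}' in a reply."""
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object in reply")
    action = json.loads(reply[start:end + 1])
    if action.get("action") not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported action: {action.get('action')!r}")
    action.setdefault("args", {})  # actions without arguments get an empty dict
    return action


def run_plan(
    query_vlm: Callable[[List[dict]], str],
    execute: Callable[[dict], bool],
    max_plan_steps: int = 40,
) -> List[dict]:
    """Query, parse, and execute until the task is done or the step limit is hit."""
    history: List[dict] = []
    for _ in range(max_plan_steps):
        action = parse_action(query_vlm(history))
        done = execute(action)  # hypothetical contract: True means task complete
        history.append(action)
        if done:
            break
    return history
```

Validating the action name before execution keeps a malformed or hallucinated reply from reaching the robot; a real runtime might instead feed the error back to the VLM and ask for a corrected action.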

3. Generate Random Scenes

You can generate new random scenes with different clutter configurations using the provided script. This is useful for creating diverse test cases.

python scripts/gen_safe_random_scenes.py

Modify CLUTTER_COUNTS in the script to generate scenes with different numbers of objects.
Generated scenes will be saved in data/scenes/<category>/<count>/.
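
To check what a generation run produced, the directory layout above can be walked with a few lines of Python. This is a sketch, not a repository script: `scene_dir` and `list_scene_files` are hypothetical helpers, and the `*.json` pattern assumes the scene definitions are stored as JSON files, as the project structure listing suggests.

```python
from pathlib import Path
from typing import List


def scene_dir(category: str, count: int, root: str = "data/scenes") -> Path:
    """Build the data/scenes/<category>/<count>/ path for one configuration."""
    return Path(root) / category / str(count)


def list_scene_files(category: str, count: int, root: str = "data/scenes") -> List[Path]:
    """Return the scene JSON files sorted by name, or [] if the folder is absent."""
    directory = scene_dir(category, count, root)
    return sorted(directory.glob("*.json")) if directory.is_dir() else []
```

For example, `len(list_scene_files("apple", 4))` gives the number of apple scenes with four clutter objects that are available for `--scene_id`.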

🧪 Experiments

To reproduce the full experimental results or run large-scale evaluations:

1. Run Experiment Sweep

Use the sweep script to run the VLM planner across multiple object categories and clutter levels. The script steps through the configurations one at a time and runs the scenes of each configuration in parallel.

python scripts/run_vlm_plan_sweep.py

Customize: Edit the RUNS list in scripts/run_vlm_plan_sweep.py to select specific objects or clutter counts.
Parallelism: The script uses scripts/run_parallel_vlm_plan.py internally to execute multiple scene IDs (defaults to 10 scenes) in parallel. You can adjust MAX_CONCURRENCY in that file.
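
The bounded-parallelism pattern described above can be sketched with the standard library. This is not the implementation in scripts/run_parallel_vlm_plan.py: `build_command` and `run_scenes` are hypothetical names, the value of `MAX_CONCURRENCY` here is illustrative rather than the repository default, and the command line simply mirrors the flags shown in the Usage Guide.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List

MAX_CONCURRENCY = 4  # illustrative value, not the repository default


def build_command(
    scene_name: str, clutter_count: int, scene_id: int, max_plan_steps: int = 40
) -> List[str]:
    """Build the planner command line for one scene."""
    return [
        sys.executable, "-m", "plan.vlm_plan",
        "--scene_name", scene_name,
        "--clutter_count", str(clutter_count),
        "--scene_id", str(scene_id),
        "--max_plan_steps", str(max_plan_steps),
    ]


def run_scenes(
    scene_name: str,
    clutter_count: int,
    scene_ids: Iterable[int],
    runner: Callable[[List[str]], int] = subprocess.call,
) -> Dict[int, int]:
    """Run one process per scene ID, at most MAX_CONCURRENCY at a time."""
    ids = list(scene_ids)
    commands = [build_command(scene_name, clutter_count, sid) for sid in ids]
    # Each worker thread blocks on its subprocess, which caps concurrent runs.
    with ThreadPoolExecutor(max_workers=MAX_CONCURRENCY) as pool:
        exit_codes = list(pool.map(runner, commands))
    return dict(zip(ids, exit_codes))
```

Threads are sufficient here because the heavy work happens in the child processes; the returned mapping of scene ID to exit code makes failed runs easy to spot and rerun.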

2. Analyze Results

After the experiments complete, use the summary script to aggregate metrics (success rate, steps taken) from the logs.

python scripts/summarize_vlm_plan_steps.py

This will parse logs from data/logs/vlm_plan/ and generate a CSV report at data/analysis/vlm_plan_tasks_summary.csv.
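
The aggregation step can be illustrated as follows. The log format is not described here, so this sketch starts from already-parsed records; the field names (`scene_name`, `clutter_count`, `success`, `steps`), the CSV columns, and the helpers `summarize` and `to_csv` are assumptions and may differ from what scripts/summarize_vlm_plan_steps.py actually produces.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Iterable, List

FIELDS = ["scene_name", "clutter_count", "runs", "success_rate", "mean_steps"]


def summarize(records: Iterable[Dict]) -> List[Dict]:
    """Group runs by (scene_name, clutter_count) and average their metrics."""
    groups = defaultdict(list)
    for record in records:
        groups[(record["scene_name"], record["clutter_count"])].append(record)
    rows = []
    for (name, count), runs in sorted(groups.items()):
        rows.append({
            "scene_name": name,
            "clutter_count": count,
            "runs": len(runs),
            "success_rate": sum(r["success"] for r in runs) / len(runs),
            "mean_steps": sum(r["steps"] for r in runs) / len(runs),
        })
    return rows


def to_csv(rows: List[Dict]) -> str:
    """Serialize summary rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Reporting the run count next to each success rate makes it clear how many scenes a percentage is based on.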

💡 Tips & Troubleshooting

IK Failures: If the robot fails to reach a target or grasp pose, it might be due to workspace limits or collisions. Try adjusting the target position or approach angle.
Video Recording: To save videos of the execution, set RUN_MODE = "video" in the relevant script (e.g., scripts/test_skills.py or plan/vlm_plan.py). Videos will be saved to data/videos/.
Model Loading: Ensure that the pre-trained PPO model (data/models/ppo/PickClutterYCB-XArm7-v1/ppo_grasp.zip) is present. This model is critical for the grasping phase.
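
A short preflight check can confirm that the files mentioned in this README exist before a run starts. The sketch below is a hypothetical helper, not a script shipped with the repository; the two paths are the ones named in the Configuration section and in the Model Loading tip.

```python
from pathlib import Path
from typing import List

# Paths relative to the project root, as given in this README.
REQUIRED_FILES = (
    "configs/runtime_config.yaml",
    "data/models/ppo/PickClutterYCB-XArm7-v1/ppo_grasp.zip",
)


def missing_files(project_root: str = ".") -> List[str]:
    """Return the required files that are not present under project_root."""
    root = Path(project_root)
    return [rel for rel in REQUIRED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    for rel in missing_files():
        print(f"missing: {rel}")
```

Running it from the project root prints nothing when both files are in place.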

📂 Project Structure

AdaClearGrasp/
├── configs/               # Configuration files
│   ├── runtime_config.yaml          # [User Created] Runtime config (API Key)
│   └── runtime_config_template.yaml # Config template
├── core/                  # Core algorithms
│   ├── kinematics/        # Robot Inverse Kinematics (IK)
│   └── perception/        # Point cloud processing & geometry
├── env/                   # Environment layer
│   └── sim/               # ManiSkill simulation (PickClutterYCB)
├── exec/                  # Execution layer (MCP Server & Skills)
│   ├── skills/            # Atomic skills implementation (Clear, Grasp, Move)
│   └── mcp_server.py      # MCP Server: Exposing skills to VLM
├── plan/                  # Planning layer (VLM Agent)
│   ├── vlm_plan.py        # Main planning loop
│   ├── mcp_runtime.py     # MCP client runtime
│   └── prompts.py         # VLM system prompts
├── scripts/               # Tool scripts
│   └── test_skills.py     # Skill testing script
├── data/                  # Data & Models
│   ├── models/ppo/        # Pre-trained Grasp RL policy
│   └── scenes/            # Scene definition JSON files
└── README.md

📝 License

This project is licensed under the MIT License.
