CoEnv: Driving Embodied Multi-Agent Collaboration via Compositional Environment

CoEnv is a framework that leverages simulation for safe strategy exploration while ensuring reliable real-world deployment in multi-agent robotic manipulation. It operates through three stages:

  1. Real-to-Sim Scene Reconstruction — Converts multi-view RGBD observations into a simulator-ready scene via 3D asset generation, object localization, and iterative camera calibration.
  2. Simulation-Conditioned Action Synthesis — Performs hierarchical task planning followed by grounded execution in either interactive mode (closed-loop VLM feedback) or iterative mode (code agent with iterative refinement).
  3. Sim-to-Real Transfer — Deploys validated trajectories via trajectory interpolation with collision volume verification for safe multi-agent execution.
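
To make stage 3 more concrete, the following is a minimal sketch of trajectory interpolation followed by a collision volume check. It is not CoEnv's implementation: the function names `interpolate` and `is_safe`, the use of one bounding sphere per arm, and all waypoints and radii are assumptions chosen for illustration. Both trajectories are assumed to be non-empty and sampled at the same timesteps.

```python
import math

def interpolate(waypoints, steps):
    """Linearly interpolate between consecutive waypoints.

    Each segment is split into `steps` equal parts, so every original
    waypoint is preserved in the dense output.
    """
    if steps < 1:
        raise ValueError("steps must be >= 1")
    dense = []
    for a, b in zip(waypoints, waypoints[1:]):
        for i in range(steps):
            t = i / steps
            dense.append(tuple(x + t * (y - x) for x, y in zip(a, b)))
    dense.append(tuple(waypoints[-1]))
    return dense

def is_safe(traj_a, traj_b, radius_a, radius_b):
    """Return True if the two spherical volumes never overlap at a shared timestep."""
    limit = radius_a + radius_b
    return all(math.dist(p, q) > limit for p, q in zip(traj_a, traj_b))

# Hypothetical end-effector waypoints in metres: two arms approaching each other.
arm_a = interpolate([(0.0, 0.0, 0.3), (0.4, 0.0, 0.3)], steps=4)
arm_b = interpolate([(1.0, 0.0, 0.3), (0.6, 0.0, 0.3)], steps=4)

print(len(arm_a))                          # number of interpolated poses
print(is_safe(arm_a, arm_b, 0.05, 0.05))   # small volumes keep their distance
print(is_safe(arm_a, arm_b, 0.15, 0.15))   # larger volumes overlap near the end
```

A real deployment would check the full arm geometry rather than a single sphere per arm; the sphere is used here only to keep the check readable.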

Tasks

CoEnv is evaluated on five real-world multi-agent manipulation tasks:

| Task | Agents | Description |
|---|---|---|
| Cube Stacking | Franka x 2 | Two arms stack a blue cube and a red cube separately |
| Ball Pickup | Franka x 2 | Bimanual coordination to lift a soccer ball with balanced contact |
| Transfer Cylinder | Franka x 2 | Pick up a cylinder, perform a bimanual handover and place it at the target |
| Place Cucumber | Franka + Piper x 2 | Pick up the pot lid and place the cucumbers into the pot |
| Brush Box | Franka + Piper x 2 | Grasp a brush and repeatedly sweep the box |

Installation

1. Clone the Repository

Install Vulkan first, then clone the repository together with its submodules:

git clone https://github.com/MARS-EAI/CoEnv.git --recurse-submodules
cd CoEnv

2. Create a Conda Environment

conda create -n coenv python=3.10
conda activate coenv

3. Install Dependencies

pip install -r requirements.txt

4. Download Assets

python coenv/script/download_assets.py

5. Verify Installation

# With graphical desktop
python coenv/script/run_task.py coenv/configs/table/dual_franka_stack_cubes.yaml

Headless Server Setup

If running on a headless server without a graphical desktop:

sudo apt update
sudo apt install libgl1 libglvnd0 libegl1-mesa libgles2-mesa libopengl0

Usage

Interactive Mode (VLM-driven)

The interactive mode uses a VLM (e.g., GPT-5) for closed-loop planning with real-time visual feedback.

export VLM_API_KEY="your-api-key"
export VLM_BASE_URL="your-api-base-url"

python coenv/vlm_workflow_visual_guided_new.py \
    --task dual_franka_stack_cubes \
    --config coenv/configs/table/dual_franka_stack_cubes.yaml
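
Because the workflow depends on the two environment variables above, it can help to check them before launching. The helper below is a small sketch and not part of CoEnv; `missing_vars` is a hypothetical name, and only the variable names `VLM_API_KEY` and `VLM_BASE_URL` come from this README.

```python
import os

REQUIRED_VARS = ("VLM_API_KEY", "VLM_BASE_URL")

def missing_vars(env, required=REQUIRED_VARS):
    """Return the names in `required` that are unset or empty in `env`."""
    return [name for name in required if not env.get(name)]

# Example with an explicit mapping: the base URL has not been exported.
print(missing_vars({"VLM_API_KEY": "your-api-key"}))

# In practice, pass the real environment instead:
#   missing = missing_vars(os.environ)
#   if missing:
#       raise SystemExit("Missing: " + ", ".join(missing))
```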

Iterative Mode (Code Agent)

The iterative mode uses a code agent (Claude Code) to generate complete trajectory programs with iterative refinement.

export ANTHROPIC_API_KEY="your-anthropic-key"

python coenv/code_agent_workflow.py \
    --config coenv/configs/table/dual_franka_stack_cubes.yaml
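
The general shape of iterative refinement can be sketched as a generate, evaluate, retry loop. This is an illustration of the idea only, not the logic in `code_agent_workflow.py`: `refine`, `fake_generate`, and `fake_evaluate` are hypothetical names, and the stubs stand in for the code agent and the simulator.

```python
def refine(generate, evaluate, max_rounds=3):
    """Generate a program, evaluate it, and retry with feedback until it succeeds.

    `generate(feedback)` returns a candidate program (any object);
    `evaluate(program)` returns a (success, feedback) pair.
    Returns (program, rounds_used), or (None, max_rounds) if every round failed.
    """
    feedback = None
    for round_idx in range(1, max_rounds + 1):
        program = generate(feedback)
        success, feedback = evaluate(program)
        if success:
            return program, round_idx
    return None, max_rounds

# Stub "agent": changes its output once it has received feedback.
def fake_generate(feedback):
    return "lift_height=0.10" if feedback is None else "lift_height=0.20"

# Stub "simulator": only the higher lift succeeds.
def fake_evaluate(program):
    ok = program.endswith("0.20")
    return ok, None if ok else "cube slipped: lift higher"

print(refine(fake_generate, fake_evaluate))
```

Capping the number of rounds keeps a failing task from looping indefinitely; the cap of 3 is an arbitrary choice for the example.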

Sim-to-Real Transfer

Replay validated trajectories on real robots with collision volume verification:

python coenv/replay_actions.py --debug-dir debug_images --config coenv/configs/table/dual_franka_stack_cubes.yaml

Data Collection

CoEnv supports scalable multi-agent data collection with domain randomization:

# Cycling mode for continuous data collection
python coenv/vlm_workflow_cycling.py \
    --task dual_franka_stack_cubes \
    --config coenv/configs/table/dual_franka_stack_cubes_rand.yaml
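
As a rough illustration of what domain randomization means for object placement, the sketch below perturbs a nominal pose with a seeded random generator. The function `randomize_pose`, the jitter range, and the pose fields are assumptions for the example and do not reflect the contents of `dual_franka_stack_cubes_rand.yaml`.

```python
import math
import random

def randomize_pose(nominal_xy, rng, xy_jitter=0.05):
    """Shift a nominal (x, y) by up to +/- xy_jitter per axis and draw a random yaw."""
    x, y = nominal_xy
    return {
        "x": x + rng.uniform(-xy_jitter, xy_jitter),
        "y": y + rng.uniform(-xy_jitter, xy_jitter),
        "yaw": rng.uniform(-math.pi, math.pi),
    }

rng = random.Random(0)  # fixed seed so the same episodes can be regenerated
episodes = [randomize_pose((0.3, 0.0), rng) for _ in range(3)]
for pose in episodes:
    print(pose)
```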

Demo Scripts

Run pre-recorded action sequences for quick visualization:

python coenv/run_demo/dual_franka_stack_cubes_demo.py
python coenv/run_demo/dual_franka_grasp_ball_demo.py
python coenv/run_demo/three_arm_clean_box_demo.py
python coenv/run_demo/three_arm_put_food_demo.py
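
To run all four demos in sequence, a small wrapper along these lines could be used. It is a convenience sketch rather than a script shipped with the repository; `demo_commands` and `run_all` are hypothetical names, and the script paths are the ones listed above. It defaults to a dry run that only prints the commands.

```python
import subprocess
import sys

DEMOS = [
    "coenv/run_demo/dual_franka_stack_cubes_demo.py",
    "coenv/run_demo/dual_franka_grasp_ball_demo.py",
    "coenv/run_demo/three_arm_clean_box_demo.py",
    "coenv/run_demo/three_arm_put_food_demo.py",
]

def demo_commands(python=sys.executable, demos=DEMOS):
    """Build one argument list per demo script."""
    return [[python, path] for path in demos]

def run_all(dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in demo_commands():
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops at the first demo that exits with an error
            subprocess.run(cmd, check=True)

run_all(dry_run=True)
```

Call `run_all(dry_run=False)` from the repository root to actually launch the demos.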

Project Structure

CoEnv/
├── README.md
├── LICENSE
├── requirements.txt
└── coenv/
    ├── agents/                    # Robot agent definitions (Franka, Piper, Panda)
    ├── tasks/                     # Task environment definitions
    ├── configs/table/             # Task configuration YAML files
    ├── tools/                     # Action primitive library for code agent
    ├── planner/                   # Motion planning (open-loop IK)
    ├── utils/                     # Utility functions
    ├── vlm_workflow_visual_guided_new.py   # Interactive mode main entry
    ├── vlm_workflow_cycling.py             # Data collection (cycling)
    ├── vlm_workflow_cycling_cubes.py       # Data collection (cube tasks)
    ├── code_agent_workflow.py              # Iterative mode main entry
    ├── code_agent_executor.py              # Code agent executor
    ├── code_agent_prompts.py               # Code agent prompt templates
    ├── code_agent_sdk_main.py              # Code agent SDK integration
    ├── code_agent_sdk_runner.py            # Code agent SDK runner
    ├── replay_actions.py                   # Sim-to-real trajectory replay
    ├── replay_actions_cycling_cubes.py     # Sim-to-real replay (cube tasks)
    ├── run_demo/                  # Demo scripts with pre-recorded actions
    ├── run_realtime/              # Real-time robot control scripts
    ├── assets/                    # 3D model assets
    ├── docs/                      # Documentation
    ├── script/                    # Utility scripts (data generation, etc.)
    └── custom_policy/             # Custom policy interface

Citation

If you find this work useful, please cite our paper:

@article{kang2026coenv,
  title={CoEnv: Driving Embodied Multi-Agent Collaboration via Compositional Environment},
  author={Kang, Li and Fan, Yutao and Li, Rui and Zhou, Heng and Qin, Yiran and Zhang, Zhemeng and Huang, Songtao and Song, Xiufeng and Zhang, Zaibin and Chen, Bruno N.Y. and Yin, Zhenfei and Zhou, Dongzhan and Zuo, Wangmeng and Bai, Lei},
  journal={arXiv preprint arXiv:2604.05484},
  year={2026}
}

License

This project is released under the MIT License.

Acknowledgements

CoEnv is developed on top of RoboFactory, a simulation benchmark for multi-agent embodied collaboration built upon ManiSkill and SAPIEN. We gratefully acknowledge the RoboFactory team for providing the foundational simulation environment, task definitions, and robot agent implementations that made this work possible.
