CoEnv: Driving Embodied Multi-Agent Collaboration via Compositional Environment

CoEnv is a framework for multi-agent robotic manipulation that uses simulation for safe strategy exploration while ensuring reliable real-world deployment. It operates in three stages:

Real-to-Sim Scene Reconstruction — Converts multi-view RGBD observations into a simulator-ready scene via 3D asset generation, object localization, and iterative camera calibration.
Simulation-Conditioned Action Synthesis — Performs hierarchical task planning followed by grounded execution in either interactive mode (closed-loop VLM feedback) or iterative mode (code agent with iterative refinement).
Sim-to-Real Transfer — Deploys validated trajectories via trajectory interpolation with collision volume verification for safe multi-agent execution.
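The object-localization step of stage 1 can be illustrated with a minimal single-view sketch of standard pinhole back-projection, from an RGBD pixel to world coordinates. This is not CoEnv's implementation. The names `Intrinsics`, `deproject`, `camera_to_world`, and `localize_object` are hypothetical, and the sketch leaves out multi-view fusion, 3D asset generation, and iterative calibration refinement.

```python
from dataclasses import dataclass


@dataclass
class Intrinsics:
    """Pinhole intrinsics in pixels (hypothetical container, not a CoEnv class)."""
    fx: float
    fy: float
    cx: float
    cy: float


def deproject(u, v, depth, K):
    """Back-project pixel (u, v) with metric depth into the camera frame
    (OpenCV convention: x right, y down, z forward)."""
    x = (u - K.cx) * depth / K.fx
    y = (v - K.cy) * depth / K.fy
    return (x, y, depth)


def camera_to_world(p, R, t):
    """Apply camera extrinsics: p_world = R @ p_cam + t (R is a 3x3 nested list)."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))


def localize_object(mask_pixels, depth_map, K, R, t):
    """Estimate an object's world-frame centroid from its mask pixels in a single view."""
    points = []
    for u, v in mask_pixels:
        d = depth_map[v][u]  # depth images are indexed [row][column]
        if d > 0:  # skip missing depth readings
            points.append(camera_to_world(deproject(u, v, d, K), R, t))
    if not points:
        raise ValueError("no valid depth inside the mask")
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))
```

In a multi-view setup, per-view estimates like this one would still need to be fused and checked against the calibrated extrinsics. That is the part the framework's iterative calibration handles.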

Tasks

CoEnv is evaluated on five real-world multi-agent manipulation tasks:

Task	Agents	Description
Cube Stacking	Franka x 2	Two arms stack a blue cube and a red cube separately
Ball Pickup	Franka x 2	Bimanual coordination to lift a soccer ball with balanced contact
Transfer Cylinder	Franka x 2	Pick up a cylinder, perform a bimanual handover, and place it at the target
Place Cucumber	Franka x 1 + Piper x 2	Pick up the pot lid and place the cucumbers into the pot
Brush Box	Franka x 1 + Piper x 2	Grasp a brush and repeatedly sweep the box

Installation

1. Clone the Repository

Before cloning, make sure Vulkan is installed; the SAPIEN renderer that CoEnv builds on requires it.

git clone --recurse-submodules https://github.com/MARS-EAI/CoEnv.git
cd CoEnv

2. Create a Conda Environment

conda create -n coenv python=3.10
conda activate coenv

3. Install Dependencies

pip install -r requirements.txt

4. Download Assets

python coenv/script/download_assets.py

5. Verify Installation

# With graphical desktop
python coenv/script/run_task.py coenv/configs/table/dual_franka_stack_cubes.yaml

Headless Server Setup

On a headless server without a graphical desktop, install the OpenGL/EGL runtime libraries first:

sudo apt update
sudo apt install libgl1 libglvnd0 libegl1-mesa libgles2-mesa libopengl0
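If you are unsure whether a machine counts as headless, a rough check like the following can help. It is a hypothetical helper, not part of CoEnv. It assumes a Linux host on which a graphical session advertises `DISPLAY` (X11) or `WAYLAND_DISPLAY` (Wayland).

```python
import os
import sys


def is_headless(env=None, platform=None):
    """Heuristic check for a Linux session with no X11 or Wayland display available."""
    env = os.environ if env is None else env
    platform = sys.platform if platform is None else platform
    if not platform.startswith("linux"):
        return False  # this heuristic only targets Linux servers
    return not (env.get("DISPLAY") or env.get("WAYLAND_DISPLAY"))


if __name__ == "__main__":
    if is_headless():
        print("No display found: install the OpenGL/EGL packages above before running CoEnv.")
    else:
        print("Display found: the graphical verification command should work.")
```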

Usage

Interactive Mode (VLM-driven)

The interactive mode uses a vision-language model (VLM, e.g., GPT-5) for closed-loop planning driven by real-time visual feedback.

export VLM_API_KEY="your-api-key"
export VLM_BASE_URL="your-api-base-url"

python coenv/vlm_workflow_visual_guided_new.py \
    --task dual_franka_stack_cubes \
    --config coenv/configs/table/dual_franka_stack_cubes.yaml
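Conceptually, closed-loop execution alternates between acting in simulation and asking the VLM to judge the result. The sketch below illustrates that loop under stated assumptions. `execute` and `critique` are hypothetical callables that stand in for the simulator step and the VLM query, and `Verdict` is a hypothetical result type. The sketch does not reproduce the logic of `vlm_workflow_visual_guided_new.py`.

```python
from dataclasses import dataclass


@dataclass
class Verdict:
    """Hypothetical critic output: did the subtask succeed, and what should change if not."""
    success: bool
    feedback: str


def closed_loop_execute(subtasks, execute, critique, max_retries=3):
    """For each subtask: act in simulation, ask a VLM-style critic to judge the
    resulting observation, and retry with the critic's feedback until it succeeds."""
    log = []
    for task in subtasks:
        hint = ""
        for attempt in range(1, max_retries + 1):
            observation = execute(task, hint)        # e.g. run a primitive, render the scene
            verdict = critique(task, observation)    # e.g. send the rendered image to the VLM
            log.append((task, attempt, verdict.success))
            if verdict.success:
                break
            hint = verdict.feedback                  # condition the next attempt on the critique
        else:  # retries exhausted without a successful verdict
            raise RuntimeError(f"subtask {task!r} failed after {max_retries} attempts")
    return log
```

With stubbed callables, the loop can be tested offline. For example, a critic could reject a grasp until the hint "lower gripper" has been applied.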

Iterative Mode (Code Agent)

The iterative mode uses a code agent (Claude Code) to generate complete trajectory programs with iterative refinement.

export ANTHROPIC_API_KEY="your-anthropic-key"

python coenv/code_agent_workflow.py \
    --config coenv/configs/table/dual_franka_stack_cubes.yaml
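The iterative mode differs from the interactive mode because the agent emits a whole program per round instead of one action at a time. A minimal sketch of such a generate, simulate, and repair loop follows. `generate` and `simulate` are hypothetical stand-ins for the Claude Code call and the simulation rollout. They are not functions exported by `code_agent_workflow.py`.

```python
def refine_program(generate, simulate, max_iters=5):
    """Ask a code agent for a full trajectory program, execute it in simulation,
    and feed the failure report into the next generation request."""
    feedback = None
    history = []
    for i in range(1, max_iters + 1):
        program = generate(feedback)     # e.g. prompt the agent with the task and last error
        ok, report = simulate(program)   # e.g. run the program in sim and check task success
        history.append((i, ok, report))
        if ok:
            return program, history
        feedback = report
    raise RuntimeError(f"no successful program after {max_iters} iterations")
```

Keeping the full `history` makes it possible to inspect why earlier programs were rejected.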

Sim-to-Real Transfer

Replay validated trajectories on real robots with collision volume verification:

python coenv/replay_actions.py --debug-dir debug_images --config coenv/configs/table/dual_franka_stack_cubes.yaml
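One plausible reading of "trajectory interpolation with collision volume verification" is sketched below. It assumes joint-space linear interpolation and arms approximated by spheres. `interpolate`, `spheres_collide`, `first_collision`, and the forward-kinematics callables `fk_a` and `fk_b` are hypothetical. `replay_actions.py` may use different collision volumes, step sizes, or checks.

```python
import math


def interpolate(waypoints, max_step):
    """Densify joint-space waypoints so no joint moves more than max_step (rad) between samples."""
    out = [list(waypoints[0])]
    for a, b in zip(waypoints, waypoints[1:]):
        largest = max(abs(y - x) for x, y in zip(a, b))
        n = max(1, math.ceil(largest / max_step))
        for k in range(1, n + 1):
            out.append([x + (y - x) * k / n for x, y in zip(a, b)])
    return out


def spheres_collide(spheres_a, spheres_b, margin=0.0):
    """Spheres are (center_xyz, radius) pairs; True if any pair is closer than the radii plus margin."""
    return any(
        math.dist(ca, cb) < ra + rb + margin
        for ca, ra in spheres_a
        for cb, rb in spheres_b
    )


def first_collision(traj_a, traj_b, fk_a, fk_b, margin=0.02):
    """Step through two time-synchronized trajectories; fk_a / fk_b map a joint
    vector to that arm's collision spheres in the shared world frame.
    Returns the index of the first unsafe step, or None if the pair is safe."""
    if len(traj_a) != len(traj_b):
        raise ValueError("trajectories must be time-synchronized (same length)")
    for i, (qa, qb) in enumerate(zip(traj_a, traj_b)):
        if spheres_collide(fk_a(qa), fk_b(qb), margin):
            return i
    return None
```

Interpolating before checking matters: two waypoints can each be collision-free while the straight-line motion between them is not.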

Data Collection

CoEnv supports scalable multi-agent data collection with domain randomization:

# Cycling mode for continuous data collection
python coenv/vlm_workflow_cycling.py \
    --task dual_franka_stack_cubes \
    --config coenv/configs/table/dual_franka_stack_cubes_rand.yaml
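Domain randomization typically resamples object poses in each episode. A minimal sketch using rejection sampling is shown below. The object names, workspace bounds, and `min_sep` value are illustrative assumptions, not values read from `dual_franka_stack_cubes_rand.yaml`.

```python
import math
import random


def sample_layout(names, x_range, y_range, min_sep, rng, max_tries=1000):
    """Rejection-sample a tabletop (x, y, yaw) pose per object so that
    object centers stay at least min_sep metres apart."""
    placed = {}
    for name in names:
        for _ in range(max_tries):
            x, y = rng.uniform(*x_range), rng.uniform(*y_range)
            if all(math.dist((x, y), pose[:2]) >= min_sep for pose in placed.values()):
                placed[name] = (x, y, rng.uniform(-math.pi, math.pi))
                break
        else:
            raise RuntimeError(f"could not place {name!r} in {max_tries} tries")
    return placed


if __name__ == "__main__":
    # One seed per episode makes every randomized layout reproducible.
    for episode in range(3):
        layout = sample_layout(["blue_cube", "red_cube"], (-0.2, 0.2), (-0.3, 0.3), 0.08,
                               random.Random(episode))
        print(episode, layout)
```

Passing an explicit `random.Random` instance, rather than using the global generator, keeps each episode's layout reproducible from its seed.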

Demo Scripts

Run pre-recorded action sequences for quick visualization:

python coenv/run_demo/dual_franka_stack_cubes_demo.py
python coenv/run_demo/dual_franka_grasp_ball_demo.py
python coenv/run_demo/three_arm_clean_box_demo.py
python coenv/run_demo/three_arm_put_food_demo.py
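To run the demos back to back, you could use a small wrapper like the following from the repository root. It is a hypothetical convenience script, not part of CoEnv. It reuses the script paths above and reports any demo that exits with a non-zero status.

```python
import subprocess
import sys

# Demo scripts listed in this README; paths are relative to the repository root.
DEMOS = [
    "coenv/run_demo/dual_franka_stack_cubes_demo.py",
    "coenv/run_demo/dual_franka_grasp_ball_demo.py",
    "coenv/run_demo/three_arm_clean_box_demo.py",
    "coenv/run_demo/three_arm_put_food_demo.py",
]


def build_commands(demos, python=sys.executable):
    """Run each demo with the interpreter of the active environment (e.g. the coenv conda env)."""
    return [[python, path] for path in demos]


def run_all(demos=DEMOS, dry_run=False):
    """Run demos one after another; return the paths of any that exited with an error."""
    failures = []
    for cmd in build_commands(demos):
        if dry_run:
            print(" ".join(cmd))
            continue
        if subprocess.run(cmd).returncode != 0:
            failures.append(cmd[-1])
    return failures
```

Call `run_all(dry_run=True)` to print the commands without running them.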

Project Structure

CoEnv/
├── README.md
├── LICENSE
├── requirements.txt
└── coenv/
    ├── agents/                    # Robot agent definitions (Franka, Piper, Panda)
    ├── tasks/                     # Task environment definitions
    ├── configs/table/             # Task configuration YAML files
    ├── tools/                     # Action primitive library for code agent
    ├── planner/                   # Motion planning (open-loop IK)
    ├── utils/                     # Utility functions
    ├── vlm_workflow_visual_guided_new.py   # Interactive mode main entry
    ├── vlm_workflow_cycling.py             # Data collection (cycling)
    ├── vlm_workflow_cycling_cubes.py       # Data collection (cube tasks)
    ├── code_agent_workflow.py              # Iterative mode main entry
    ├── code_agent_executor.py              # Code agent executor
    ├── code_agent_prompts.py               # Code agent prompt templates
    ├── code_agent_sdk_main.py              # Code agent SDK integration
    ├── code_agent_sdk_runner.py            # Code agent SDK runner
    ├── replay_actions.py                   # Sim-to-real trajectory replay
    ├── replay_actions_cycling_cubes.py     # Sim-to-real replay (cube tasks)
    ├── run_demo/                  # Demo scripts with pre-recorded actions
    ├── run_realtime/              # Real-time robot control scripts
    ├── assets/                    # 3D model assets
    ├── docs/                      # Documentation
    ├── script/                    # Utility scripts (data generation, etc.)
    └── custom_policy/             # Custom policy interface

Citation

If you find this work useful, please cite our paper:

@article{kang2026coenv,
  title={CoEnv: Driving Embodied Multi-Agent Collaboration via Compositional Environment},
  author={Kang, Li and Fan, Yutao and Li, Rui and Zhou, Heng and Qin, Yiran and Zhang, Zhemeng and Huang, Songtao and Song, Xiufeng and Zhang, Zaibin and Chen, Bruno N.Y. and Yin, Zhenfei and Zhou, Dongzhan and Zuo, Wangmeng and Bai, Lei},
  journal={arXiv preprint arXiv:2604.05484},
  year={2026}
}

License

This project is released under the MIT License.

Acknowledgements

CoEnv is developed on top of RoboFactory, a simulation benchmark for multi-agent embodied collaboration built upon ManiSkill and SAPIEN. We gratefully acknowledge the RoboFactory team for providing the foundational simulation environment, task definitions, and robot agent implementations that made this work possible.
