CoEnv is a framework that leverages simulation for safe strategy exploration while ensuring reliable real-world deployment in multi-agent robotic manipulation. It operates through three stages:
- Real-to-Sim Scene Reconstruction — Converts multi-view RGBD observations into a simulator-ready scene via 3D asset generation, object localization, and iterative camera calibration.
- Simulation-Conditioned Action Synthesis — Performs hierarchical task planning followed by grounded execution in either interactive mode (closed-loop VLM feedback) or iterative mode (code agent with iterative refinement).
- Sim-to-Real Transfer — Deploys validated trajectories via trajectory interpolation with collision volume verification for safe multi-agent execution.
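
As an illustration of the third stage, the sketch below shows one way trajectory interpolation and a collision volume check could fit together. It is not CoEnv's implementation: the function names (`interpolate`, `first_collision`), the single sphere used as each arm's collision volume, and the `max_step`, `radius`, and `margin` parameters are all assumptions made for this example. A real deployment would more likely check per-link volumes along the full kinematic chain.

```python
import math
from typing import List, Optional, Sequence


def interpolate(waypoints: Sequence[Sequence[float]], max_step: float) -> List[List[float]]:
    """Linearly densify waypoints so no coordinate changes by more than max_step per step."""
    if max_step <= 0:
        raise ValueError("max_step must be positive")
    if not waypoints:
        return []
    dense = [[float(v) for v in waypoints[0]]]
    for start, end in zip(waypoints, waypoints[1:]):
        largest = max(abs(e - s) for s, e in zip(start, end))
        steps = max(1, math.ceil(largest / max_step))
        for i in range(1, steps + 1):
            t = i / steps
            dense.append([s + t * (e - s) for s, e in zip(start, end)])
    return dense


def first_collision(
    traj_a: Sequence[Sequence[float]],
    traj_b: Sequence[Sequence[float]],
    radius: float,
    margin: float = 0.0,
) -> Optional[int]:
    """Return the first step index at which two spherical volumes overlap, else None.

    Hypothetical simplification: each arm is one sphere of the given radius
    centred on its trajectory point. Steps are compared pairwise, so the
    comparison stops at the end of the shorter trajectory.
    """
    for i, (pa, pb) in enumerate(zip(traj_a, traj_b)):
        if math.dist(pa, pb) < 2 * radius + margin:
            return i
    return None
```

With two arms sweeping toward each other along the same line, for example `interpolate([[0, 0, 0], [1, 0, 0]], 0.25)` and `interpolate([[1, 0, 0], [0, 0, 0]], 0.25)`, `first_collision(..., radius=0.1)` reports the step where they meet, and a trajectory pair that reports `None` would be the only kind passed on to the robots.
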
CoEnv is evaluated on five real-world multi-agent manipulation tasks:
| Task | Agents | Description |
|---|---|---|
| Cube Stacking | Franka x 2 | Two arms stack a blue cube and a red cube separately |
| Ball Pickup | Franka x 2 | Bimanual coordination to lift a soccer ball with balanced contact |
| Transfer Cylinder | Franka x 2 | Pick up a cylinder, perform a bimanual handover and place it at the target |
| Place Cucumber | Franka + Piper x 2 | Pick up the pot lid and place the cucumbers into the pot |
| Brush Box | Franka + Piper x 2 | Grasp a brush and repeatedly sweep the box |
First, install Vulkan.
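
A quick way to confirm that the Vulkan installation is visible before continuing is sketched below. It assumes the `vulkaninfo` utility (shipped with common Vulkan tool packages) is on `PATH`; the helper name `vulkan_available` is hypothetical and not part of CoEnv. A successful exit only suggests that a Vulkan loader and driver respond, not that the simulator will render correctly.

```python
import shutil
import subprocess


def vulkan_available(tool: str = "vulkaninfo", timeout_s: float = 30.0) -> bool:
    """Return True if `tool` is found on PATH and exits with status 0."""
    path = shutil.which(tool)
    if path is None:
        return False
    try:
        result = subprocess.run(
            [path],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0
```

Calling `vulkan_available()` from a Python shell returns `False` when the tool is missing or fails, which is a hint to revisit the Vulkan installation before cloning the repository.
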
git clone https://github.com/MARS-EAI/CoEnv.git --recurse
cd CoEnv
conda create -n coenv python=3.10
conda activate coenv
pip install -r requirements.txt
python coenv/script/download_assets.py
# With graphical desktop
python coenv/script/run_task.py coenv/configs/table/dual_franka_stack_cubes.yaml

If running on a headless server without a graphical desktop:
sudo apt update
sudo apt install libgl1 libglvnd0 libegl1-mesa libgles2-mesa libopengl0

The interactive mode uses a VLM (e.g., GPT-5) for closed-loop planning with real-time visual feedback.
export VLM_API_KEY="your-api-key"
export VLM_BASE_URL="your-api-base-url"
python coenv/vlm_workflow_visual_guided_new.py \
--task dual_franka_stack_cubes \
--config coenv/configs/table/dual_franka_stack_cubes.yaml

The iterative mode uses a code agent (Claude Code) to generate complete trajectory programs with iterative refinement.
export ANTHROPIC_API_KEY="your-anthropic-key"
python coenv/code_agent_workflow.py \
--config coenv/configs/table/dual_franka_stack_cubes.yaml

Replay validated trajectories on real robots with collision volume verification:
python coenv/replay_actions.py --debug-dir debug_images --config coenv/configs/table/dual_franka_stack_cubes.yaml

CoEnv supports scalable multi-agent data collection with domain randomization:
# Cycling mode for continuous data collection
python coenv/vlm_workflow_cycling.py \
--task dual_franka_stack_cubes \
--config coenv/configs/table/dual_franka_stack_cubes_rand.yaml

Run pre-recorded action sequences for quick visualization:
python coenv/run_demo/dual_franka_stack_cubes_demo.py
python coenv/run_demo/dual_franka_grasp_ball_demo.py
python coenv/run_demo/three_arm_clean_box_demo.py
python coenv/run_demo/three_arm_put_food_demo.py

CoEnv/
├── README.md
├── LICENSE
├── requirements.txt
└── coenv/
    ├── agents/                            # Robot agent definitions (Franka, Piper, Panda)
    ├── tasks/                             # Task environment definitions
    ├── configs/table/                     # Task configuration YAML files
    ├── tools/                             # Action primitive library for code agent
    ├── planner/                           # Motion planning (open-loop IK)
    ├── utils/                             # Utility functions
    ├── vlm_workflow_visual_guided_new.py  # Interactive mode main entry
    ├── vlm_workflow_cycling.py            # Data collection (cycling)
    ├── vlm_workflow_cycling_cubes.py      # Data collection (cube tasks)
    ├── code_agent_workflow.py             # Iterative mode main entry
    ├── code_agent_executor.py             # Code agent executor
    ├── code_agent_prompts.py              # Code agent prompt templates
    ├── code_agent_sdk_main.py             # Code agent SDK integration
    ├── code_agent_sdk_runner.py           # Code agent SDK runner
    ├── replay_actions.py                  # Sim-to-real trajectory replay
    ├── replay_actions_cycling_cubes.py    # Sim-to-real replay (cube tasks)
    ├── run_demo/                          # Demo scripts with pre-recorded actions
    ├── run_realtime/                      # Real-time robot control scripts
    ├── assets/                            # 3D model assets
    ├── docs/                              # Documentation
    ├── script/                            # Utility scripts (data generation, etc.)
    └── custom_policy/                     # Custom policy interface

If you find this work useful, please cite our paper:
@article{kang2026coenv,
title={CoEnv: Driving Embodied Multi-Agent Collaboration via Compositional Environment},
author={Kang, Li and Fan, Yutao and Li, Rui and Zhou, Heng and Qin, Yiran and Zhang, Zhemeng and Huang, Songtao and Song, Xiufeng and Zhang, Zaibin and Chen, Bruno N.Y. and Yin, Zhenfei and Zhou, Dongzhan and Zuo, Wangmeng and Bai, Lei},
journal={arXiv preprint arXiv:2604.05484},
year={2026}
}

This project is released under the MIT License.

CoEnv is developed on top of RoboFactory, a simulation benchmark for multi-agent embodied collaboration built upon ManiSkill and SAPIEN. We gratefully acknowledge the RoboFactory team for providing the foundational simulation environment, task definitions, and robot agent implementations that made this work possible.