A robotics and machine learning project that teaches a simulated Franka Panda robot to autonomously detect objects, grasp a broom, and sweep the objects to a target location. It is built on MuJoCo physics, a fine-tuned YOLO vision model, and a custom RL environment.
The system combines computer vision, inverse kinematics, and reinforcement learning into a single end-to-end pipeline:
- Synthetic data generation — MuJoCo renders annotated training frames automatically, so no manual labeling is needed
- Vision — YOLOv8-OBB fine-tuned to detect the broom and target objects with oriented bounding boxes (preserving rotation angle, critical for grasping)
- 3D pose estimation — 2D detections + depth maps → 6-DOF object poses via quaternion math
- Motion control — Damped Least-Squares inverse kinematics + PD torque control to move the arm
- RL environment — Gymnasium-compatible environment with a multi-metric reward that balances object displacement, tool alignment, grasp stability, and drop penalties
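The 2D-to-3D pose step can be illustrated with a pinhole back-projection. This is a minimal sketch, not the project's code. It assumes an OpenCV-style camera frame (+z forward, +y down), a metric depth map, and an OBB given as (centre x, centre y, width, height, angle in radians), the layout Ultralytics exposes as `obb.xywhr`. MuJoCo cameras look down their local -z axis, so a real pipeline would need an extra axis flip plus a camera-to-world transform. The helper `obb_to_pose` is hypothetical. It recovers position plus a single in-plane rotation, treated here as a yaw about the optical axis. Tilt about the other two axes would need more information, such as surface normals from the depth map. The quaternion uses (w, x, y, z) order, which matches MuJoCo's convention.

```python
import math

import numpy as np


def fovy_to_focal(fovy_deg, image_height):
    """Focal length in pixels from a vertical field of view in degrees (MuJoCo cameras specify fovy)."""
    return 0.5 * image_height / math.tan(math.radians(fovy_deg) / 2.0)


def backproject(u, v, depth, fx, fy, cx, cy):
    """Pixel (u, v) at metric depth -> 3D point in an OpenCV-style camera frame (+x right, +y down, +z forward)."""
    return np.array([(u - cx) * depth / fx, (v - cy) * depth / fy, depth])


def yaw_quat(theta):
    """Unit quaternion (w, x, y, z) for a rotation of theta radians about the z axis."""
    return np.array([math.cos(theta / 2.0), 0.0, 0.0, math.sin(theta / 2.0)])


def obb_to_pose(obb_xywhr, depth_map, fx, fy, cx, cy):
    """Hypothetical helper: OBB (x, y, w, h, r) + depth map -> (camera-frame position, quaternion)."""
    u, v, _w, _h, angle = obb_xywhr
    # Depth images are indexed [row, col], i.e. [v, u]; sample depth at the box centre.
    z = float(depth_map[int(round(v)), int(round(u))])
    return backproject(u, v, z, fx, fy, cx, cy), yaw_quat(angle)


# Usage on a synthetic 640x480 frame with a flat depth of 0.8 m everywhere.
H, W = 480, 640
depth = np.full((H, W), 0.8)
f = fovy_to_focal(45.0, H)
pos, quat = obb_to_pose((320.0, 240.0, 80.0, 20.0, math.pi / 2), depth, f, f, W / 2, H / 2)
print("position (camera frame):", pos)
print("orientation (w, x, y, z):", quat)
```

Sampling a single depth pixel is the simplest choice. A median over the pixels inside the box would be more robust to edges and thin objects like a broom handle.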
| Component | Tool |
|---|---|
| Physics simulation | MuJoCo 3.4.0 |
| Robot model | Franka Panda (MuJoCo Menagerie) |
| Object detection | YOLOv8-OBB (Ultralytics) |
| Deep learning | PyTorch |
| RL environment | Gymnasium |
| Vision utilities | OpenCV |
| Rendering | MediaPy |
```text
RLRoboticsFinalProject2026/
├── project.ipynb                  # Full end-to-end pipeline (stages 1–7)
├── project_Model.ipynb            # RL policy training and evaluation
├── Dylan_reward_2dsim_pass.ipynb  # Reward function prototyping in 2D
├── yolov8n-obb.pt                 # Pre-trained YOLO base weights
├── mujoco_obb/                    # Synthetic training dataset (objects only)
├── mujoco_obb_with_broom/         # Synthetic training dataset (with broom)
└── *.obj                          # 3D assets: broom, mug, can opener, shoe, action figure
```
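Suppose the synthetic datasets in `mujoco_obb/` and `mujoco_obb_with_broom/` follow Ultralytics' OBB label format. That format uses one text line per object: the class index followed by four corner points, each normalized by image width and height. A rendered oriented box could then be written out as shown below. This is an illustrative sketch only. The helper names and the class ID are hypothetical, and the notebook's own export code may differ.

```python
import math


def obb_corners(cx, cy, w, h, theta):
    """Four corners (pixels) of a w x h box centred at (cx, cy) and rotated by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    corners = []
    for dx, dy in ((-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)):
        # Rotate the corner offset, then translate it to the box centre.
        corners.append((cx + dx * c - dy * s, cy + dx * s + dy * c))
    return corners


def yolo_obb_line(class_id, cx, cy, w, h, theta, img_w, img_h):
    """Format one object as 'class x1 y1 x2 y2 x3 y3 x4 y4' with coordinates normalised to [0, 1]."""
    coords = []
    for x, y in obb_corners(cx, cy, w, h, theta):
        # Clip to [0, 1]; corners of boxes that extend past the frame edge get truncated.
        coords.append(min(max(x / img_w, 0.0), 1.0))
        coords.append(min(max(y / img_h, 0.0), 1.0))
    return " ".join([str(class_id)] + [f"{v:.6f}" for v in coords])


# Hypothetical example: class 0 (say, the broom), centred in a 640x480 frame, unrotated.
line = yolo_obb_line(0, 320, 240, 200, 40, 0.0, 640, 480)
print(line)
```

Because the renderer knows each object's exact pose, the corners come straight from geometry. This is what makes the synthetic labels free and pixel-accurate.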
The main notebook (project.ipynb) runs through 7 self-contained stages:
| Stage | Description |
|---|---|
| 1 | Environment setup and imports |
| 2 | Synthetic training data generation via MuJoCo rendering |
| 3 | YOLOv8-OBB fine-tuning on generated data |
| 4 | Perception module validation (2D → 3D pose) |
| 5 | Full scene composition — robot, broom, and objects |
| 6 | Motion planning and torque control |
| 7 | Reward function definition and RL environment construction |
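Stage 6 pairs Damped Least-Squares IK with PD torque control. The sketch below shows that pairing on a toy planar two-link arm. It is a minimal illustration under stated assumptions, not the notebook's controller. The link lengths, damping, and gains `kp`/`kd` are hypothetical. For the 7-DOF Panda, the Jacobian would come from MuJoCo (for example `mujoco.mj_jacSite`) rather than from the analytic formula used here. The DLS update is `dq = J^T (J J^T + damping^2 I)^-1 e`, and the PD law is `tau = kp (q_des - q) - kd qdot`.

```python
import numpy as np

LINK1, LINK2 = 0.4, 0.3  # hypothetical link lengths (m) of a toy planar 2-link arm


def fk(q):
    """Forward kinematics: joint angles -> end-effector (x, y)."""
    return np.array([
        LINK1 * np.cos(q[0]) + LINK2 * np.cos(q[0] + q[1]),
        LINK1 * np.sin(q[0]) + LINK2 * np.sin(q[0] + q[1]),
    ])


def jacobian(q):
    """Analytic 2x2 position Jacobian d(x, y)/d(q1, q2)."""
    s1, c1 = np.sin(q[0]), np.cos(q[0])
    s12, c12 = np.sin(q[0] + q[1]), np.cos(q[0] + q[1])
    return np.array([
        [-LINK1 * s1 - LINK2 * s12, -LINK2 * s12],
        [LINK1 * c1 + LINK2 * c12, LINK2 * c12],
    ])


def dls_step(q, target, damping=0.05):
    """One damped least-squares step: dq = J^T (J J^T + damping^2 I)^-1 e."""
    J = jacobian(q)
    err = target - fk(q)
    A = J @ J.T + damping**2 * np.eye(J.shape[0])
    return J.T @ np.linalg.solve(A, err)


def solve_ik(q0, target, iters=200, tol=1e-6):
    """Iterate DLS steps until the position error drops below tol (or iters runs out)."""
    q = np.asarray(q0, dtype=float)
    for _ in range(iters):
        if np.linalg.norm(target - fk(q)) < tol:
            break
        q = q + dls_step(q, target)
    return q


def pd_torque(q, qdot, q_des, kp=50.0, kd=5.0):
    """Joint-space PD torque toward q_des; gravity compensation is omitted in this sketch."""
    return kp * (q_des - q) - kd * qdot


target = np.array([0.5, 0.2])
q_start = np.array([0.0, 1.0])
q_goal = solve_ik(q_start, target)
tau = pd_torque(q_start, np.zeros(2), q_goal)
print("joint target:", q_goal, "-> end effector:", fk(q_goal))
print("initial PD torque:", tau)
```

The damping term keeps `J J^T` invertible near singular poses, such as a fully stretched arm, at the cost of slightly slower convergence. This is the usual reason to prefer DLS over a plain pseudo-inverse.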
Prerequisites: Python 3.8+, GPU recommended
All dependencies are installed automatically when you run the first cell of project.ipynb.
```bash
git clone https://github.com/GreyViperTooth/RLRoboticsFinalProject2026.git
cd RLRoboticsFinalProject2026
jupyter notebook project.ipynb
```

Run the cells from top to bottom. The notebook installs MuJoCo, Ultralytics, PyTorch, Gymnasium, OpenCV, and MediaPy as needed.
To train or evaluate the RL policy separately:
```bash
jupyter notebook project_Model.ipynb
```

Key design decisions:
- Synthetic data over manual annotation — all training images are rendered directly from the MuJoCo scene, making the dataset free to regenerate and inherently aligned with the simulation domain
- Oriented bounding boxes — standard axis-aligned boxes lose the broom's angle; OBB detection preserves it, which is essential for computing a correct grasp pose
- Custom gripper — enlarged fingertip plates with friction ridges prevent the broom handle from slipping during sweeping motions
- Multi-metric reward — a single reward signal (e.g., distance to goal) is insufficient for a contact-rich task; the reward combines object displacement, tool-object alignment, grasp stability, and a drop penalty
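A multi-metric reward of this kind can be sketched as a weighted sum. The four terms below mirror the ones named above: object displacement toward the goal, tool-object alignment, grasp stability, and a drop penalty. However, the specific formulas, the weights, and the `StepInfo` fields are hypothetical. This illustrates the structure only and is not the project's actual reward.

```python
import math
from dataclasses import dataclass


@dataclass
class StepInfo:
    """Hypothetical per-step quantities an environment could expose to the reward."""
    obj_dist_prev: float  # object-to-target distance before the step (m)
    obj_dist_now: float   # object-to-target distance after the step (m)
    broom_heading: float  # direction the broom head faces in the table plane (rad)
    push_heading: float   # direction from the object toward the target (rad)
    grip_slip: float      # broom displacement relative to the gripper this step (m)
    broom_dropped: bool   # True if the broom left the gripper


def sweep_reward(s, w_disp=10.0, w_align=0.5, w_grip=5.0, drop_penalty=20.0):
    # Progress term: positive when the object moved closer to the target.
    displacement = s.obj_dist_prev - s.obj_dist_now
    # Alignment term: cosine between broom heading and push direction, in [-1, 1].
    alignment = math.cos(s.broom_heading - s.push_heading)
    # Stability term: penalise the handle slipping inside the gripper.
    stability = -s.grip_slip
    reward = w_disp * displacement + w_align * alignment + w_grip * stability
    # Drop penalty: a large one-off cost when the grasp is lost.
    if s.broom_dropped:
        reward -= drop_penalty
    return reward


step = StepInfo(obj_dist_prev=0.30, obj_dist_now=0.25, broom_heading=0.0,
                push_heading=0.0, grip_slip=0.0, broom_dropped=False)
print(sweep_reward(step))
```

Keeping the terms separate makes it easy to log each one during training and retune the weights when one signal dominates the others.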
Contributors:
- Jeff Helzner
- Maanav Anand Kumar
- Dylan
License: MIT
