- Reacher (Gymnasium)
- Franka Kitchen (Gymnasium-Robotics)
BCO
For BCO we used a deep MLP network.
DAgger
To train the student we used the DAgger algorithm with a shallow MLP network (policy distillation).
We first tested DAgger on Reacher, a simpler environment, and then evaluated it on the more complex Franka Kitchen.
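The DAgger loop described above can be sketched on a toy problem. This is a minimal illustration, not the code from our notebooks: the 1-D dynamics, the `expert_action` teacher, the linear student (a stand-in for the shallow MLP), and the `beta` schedule are all assumptions made for the example.

```python
import random

def expert_action(state):
    """Hypothetical teacher: push the state toward zero."""
    return -0.5 * state

def fit_linear_policy(states, actions):
    """Least-squares fit of action = w * state (stand-in for the shallow MLP)."""
    num = sum(s * a for s, a in zip(states, actions))
    den = sum(s * s for s in states) or 1.0
    return num / den

def rollout(policy_weight, beta, horizon, rng):
    """Run one episode; the teacher acts with probability beta, else the student."""
    state = rng.uniform(-1.0, 1.0)
    visited = []
    for _ in range(horizon):
        visited.append(state)
        if rng.random() < beta:
            action = expert_action(state)
        else:
            action = policy_weight * state
        state = state + action + rng.gauss(0.0, 0.01)  # toy dynamics (assumed)
    return visited

def dagger(n_iterations=5, episodes_per_iter=10, horizon=20, seed=0):
    rng = random.Random(seed)
    data_states, data_actions = [], []
    weight = 0.0  # untrained student
    for it in range(n_iterations):
        beta = 0.5 ** it  # teacher drives iteration 0, then its share decays
        for _ in range(episodes_per_iter):
            for s in rollout(weight, beta, horizon, rng):
                data_states.append(s)
                data_actions.append(expert_action(s))  # relabel with the teacher
        # Retrain the student on the aggregated dataset.
        weight = fit_linear_policy(data_states, data_actions)
    return weight

student_weight = dagger()
```

The key point is that states visited by the student are always labeled by the teacher and appended to a growing dataset, so the student learns how to act in the states it actually reaches.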
We suggest reading our presentation for details on the environments and their action/observation spaces.
Because BCO is a supervised learning method, we used Minari datasets for both environments:
- Reacher Expert and Reacher Medium datasets
- Franka Kitchen Complete (we truncated the complete trajectories so that they cover only the "microwave" task)
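One possible way to cut a multi-task trajectory down to the "microwave" part is shown below. This is only a sketch under assumptions: the per-step `completed_tasks` field and the `truncate_at_task` helper are hypothetical, and the episode is made-up data rather than a real Minari record.

```python
def truncate_at_task(steps, task="microwave"):
    """Keep steps up to and including the first one where `task` is flagged complete."""
    truncated = []
    for step in steps:
        truncated.append(step)
        if task in step["completed_tasks"]:
            break
    return truncated

# Hypothetical episode: each step stores an action and the tasks completed so far.
episode = [
    {"action": [0.1], "completed_tasks": []},
    {"action": [0.2], "completed_tasks": []},
    {"action": [0.3], "completed_tasks": ["microwave"]},
    {"action": [0.4], "completed_tasks": ["microwave", "kettle"]},
]

microwave_only = truncate_at_task(episode)
```

If the task is never completed, the helper returns the whole episode, so such episodes may need to be discarded separately.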
Follow these instructions if you want to run the notebooks on Colab:
In this notebook you can train a BCO expert for both Reacher and Franka Kitchen.
In this notebook you can train a student with DAgger for the Reacher environment.
In this notebook you can train a student with DAgger for the Franka Kitchen environment.
For Reacher we tested three agents, trained on the Expert dataset (left), on a custom filtered dataset that keeps only data with mean reward above -0.1 (center), and on the Medium dataset (right).
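The filtering step can be sketched as follows. It assumes that the -0.1 threshold is applied to the mean reward of each episode and that an episode is a dict with a `rewards` list; both are assumptions for illustration, and the numbers are made up.

```python
def mean_reward(rewards):
    """Average per-step reward of one episode."""
    return sum(rewards) / len(rewards)

def filter_episodes(episodes, threshold=-0.1):
    """Keep only episodes whose mean reward is above the threshold."""
    return [ep for ep in episodes if mean_reward(ep["rewards"]) > threshold]

# Hypothetical episodes (Reacher rewards are negative, closer to 0 is better).
episodes = [
    {"id": 0, "rewards": [-0.05, -0.02]},  # mean -0.035, kept
    {"id": 1, "rewards": [-0.30, -0.20]},  # mean -0.25, dropped
    {"id": 2, "rewards": [-0.12, -0.09]},  # mean -0.105, dropped
]

kept = filter_episodes(episodes)
```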
This is a comparison between the deep MLP teacher (left) and the shallow MLP student (right).
The metrics and the mean reward over 1k episodes are the following:
For Franka Kitchen we tested a deep teacher (BCO) trained on the Complete Minari dataset (left), a shallow teacher (BCO) trained on the same dataset (center), and a shallow student (DAgger) (right).
We tested the deep teacher in the presence of noise by setting the "robot_noise_ratio" parameter of the Franka Kitchen environment. The results are shown for robot_noise_ratio=0.29 (left) and robot_noise_ratio=0.30 (right).
We also tested the shallow student obtained with DAgger training and found it to be more robust to noise: the results are shown for robot_noise_ratio=0.29 (left) and robot_noise_ratio=0.3378, where it is still able to recover from the initial error.
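To give an intuition of what a noise ratio does, the sketch below perturbs a robot observation with uniform noise scaled by the ratio. The environment applies its noise internally, so this is only an illustration: the `add_robot_noise` helper, the amplitude vector, and the observation values are assumptions, not the environment's actual code.

```python
import random

def add_robot_noise(robot_obs, noise_ratio, noise_amp, rng):
    """Perturb each reading by uniform noise in [-ratio * amp, +ratio * amp]."""
    return [
        x + noise_ratio * amp * rng.uniform(-1.0, 1.0)
        for x, amp in zip(robot_obs, noise_amp)
    ]

rng = random.Random(0)
clean = [0.0, 0.5, -0.5]   # hypothetical joint readings
amp = [0.1, 0.1, 0.1]      # hypothetical per-joint noise amplitudes
noisy = add_robot_noise(clean, noise_ratio=0.29, noise_amp=amp, rng=rng)
```

With this model, a larger ratio widens the interval from which the perturbation is drawn, which is why small changes in the ratio can decide whether a policy recovers or fails.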
Finally, we tested a "cold start" situation, in which the joint/end-effector velocities are zeroed out for the first 27 steps of the episode. Both the deep teacher (left) and the shallow student (right) handle this situation correctly.
The following tables report the metrics and the mean reward over 1k episodes:
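The cold-start test can be sketched as a function that masks the velocity entries of the observation during the first steps. The `cold_start` helper and the velocity indices (9 to 17) are assumptions for illustration; check the observation layout of your environment version before reusing them.

```python
def cold_start(observation, step, velocity_indices, n_steps=27):
    """Return a copy of the observation with velocities zeroed while step < n_steps."""
    obs = list(observation)
    if step < n_steps:
        for i in velocity_indices:
            obs[i] = 0.0
    return obs

velocity_indices = range(9, 18)  # assumed positions of the robot velocities
raw = [1.0] * 20                 # hypothetical observation vector

early = cold_start(raw, step=0, velocity_indices=velocity_indices)
late = cold_start(raw, step=27, velocity_indices=velocity_indices)
```

Steps are counted from 0, so steps 0 to 26 are masked and step 27 is the first one that passes through unchanged.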
Massimo Romano
GitHub: @cybernetic-m
LinkedIn: Massimo Romano
Website: Massimo Romano
Luca Del Signore
GitHub: @Puaison
This project is licensed under the MIT License.
See the LICENSE file for details.






















