I discussed with @hongzimao some issues I had replicating results on ABRSimEnv
This post doesn't need a response; I'm just posting it here so others can learn from it.
Initially, I had trouble replicating the results on ABRSimEnv.
The A2C agent in the Park paper reports scores of around 420 +- 210.
I was able to replicate the scores on ABR using code from @hongzimao here: abr_agents.zip
Entropy Ratio | Average Episode Score +- Standard Deviation (over 100,000 actions)
10.0 | 517.3681106430971 +- 405.73426203813045
5.0 | 524.5324282999072 +- 400.950983685324
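For anyone who wants to produce rows in the same "mean +- std" form, here is a minimal sketch of how such a summary could be computed from a list of per-episode scores. The function names are hypothetical, and using the population standard deviation is an assumption; the post does not say which variant was used.

```python
import statistics


def summarize_scores(episode_scores):
    """Return (mean, std) of per-episode scores.

    Assumption: population standard deviation (divide by N).
    Use statistics.stdev instead for the sample version.
    """
    mean = statistics.fmean(episode_scores)
    std = statistics.pstdev(episode_scores)
    return mean, std


def format_row(entropy_ratio, episode_scores):
    """Format one result row as 'ratio | mean +- std'."""
    mean, std = summarize_scores(episode_scores)
    return f"{entropy_ratio} | {mean} +- {std}"


if __name__ == "__main__":
    # Toy scores for illustration only, not real ABRSimEnv results.
    print(format_row(10.0, [1.0, 2.0, 3.0, 4.0]))
```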
I was able to reach similar results using the same parameters in an A2C agent from stable-baselines, modified with entropy decay and a vf_coef of 0.25: a2c_stable_baselines.zip
Entropy Ratio | Average Episode Score +- Standard Deviation (over 100,000 actions)
10.0 | 441.72765 +- 343.60534
5.0 | 420.04653 +- 178.98197
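To make the "entropy decay" and vf_coef settings concrete, here is a rough sketch of how a decaying entropy coefficient could enter the usual A2C objective (policy loss, plus weighted value loss, minus weighted entropy bonus). The linear schedule, the final value of 0.0, and the function names are assumptions for illustration; the actual modification is in the attached zip and may differ. Treating the "Entropy Ratio" column as the starting coefficient is also an assumption.

```python
def entropy_coef(step, total_steps, start, end=0.0):
    """Linearly decay the entropy coefficient from start to end.

    Assumption: a linear schedule; the post only says "entropy decay".
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Clamp progress to [0, 1] so the coefficient stays at `end` afterwards.
    frac = min(max(step / total_steps, 0.0), 1.0)
    return start + frac * (end - start)


def a2c_loss(policy_loss, value_loss, entropy, ent_coef, vf_coef=0.25):
    """Combine the A2C loss terms into one scalar.

    A larger entropy lowers the loss, which encourages exploration;
    vf_coef=0.25 is the value mentioned in the post.
    """
    return policy_loss + vf_coef * value_loss - ent_coef * entropy


if __name__ == "__main__":
    total = 100_000
    for step in (0, 50_000, 100_000):
        coef = entropy_coef(step, total, start=10.0)
        # Toy loss terms for illustration only.
        print(step, coef, a2c_loss(1.0, 2.0, 0.5, coef))
```

As the coefficient shrinks, the entropy bonus fades and the agent shifts from exploring to exploiting what it has learned.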
However, when I initially ran the same experiments with RMSProp (default parameters) as the optimizer, I was not able to beat the robustMPC and buffer-based heuristics.
Thanks for the help!!