ABR_Sim Results Replication #8

I discussed issues with replicating results on ABRSimEnv with @hongzimao.
**This post doesn’t need a response; I'm just posting it here so others can learn from it.**

I initially had trouble replicating results on ABRSimEnv.
The A2C agent in the Park paper reports scores of around 420 ± 210.

I was able to replicate the scores on ABR using code from @hongzimao here: [abr_agents.zip](https://github.com/park-project/park/files/4411308/abr_agents.zip)

| Entropy Ratio | Average Episode Score ± Standard Deviation (100,000 actions) |
| --- | --- |
| 10.0 | 517.3681106430971 ± 405.73426203813045 |
| 5.0 | 524.5324282999072 ± 400.950983685324 |
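
For anyone reproducing these numbers, here is a minimal sketch of how a "mean ± standard deviation" over episode scores can be computed from a stream of per-step rewards. The function name `episode_score_summary` and the `rewards`/`dones` inputs are hypothetical and not taken from the linked code, and the use of the population standard deviation is an assumption, since the post does not say which estimator was used.

```python
import statistics


def episode_score_summary(rewards, dones):
    """Sum per-step rewards into episode scores and summarize them.

    rewards: iterable of per-step rewards (hypothetical input format).
    dones:   iterable of booleans, True on the last step of an episode.
    Returns (mean, std, number_of_completed_episodes).
    """
    scores = []
    running_total = 0.0
    for reward, done in zip(rewards, dones):
        running_total += reward
        if done:
            # Episode finished: record its score and start a new one.
            scores.append(running_total)
            running_total = 0.0
    if not scores:
        raise ValueError("no completed episodes in the input")
    mean = statistics.fmean(scores)
    # Assumption: population standard deviation; use statistics.stdev
    # instead if the sample estimator is wanted.
    std = statistics.pstdev(scores)
    return mean, std, len(scores)
```

Steps belonging to an episode that has not finished by the end of the input are ignored, so a fixed budget of actions may cover slightly fewer steps than it contains.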

I was able to reach similar results using the same parameters in an A2C agent from stable-baselines, modified to add entropy decay and with a `vf_coef` of 0.25: [a2c_stable_baselines.zip](https://github.com/park-project/park/files/4411291/a2c_stable_baselines.zip)
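
As a rough illustration of what "entropy decay" can mean here, the sketch below linearly anneals an entropy weight from a starting value down to a floor over training. This is only an assumption about the general idea: the function name `decayed_entropy_coef`, the linear shape, and the `end` value are all hypothetical and are not taken from the attached zip or from @hongzimao's code. How the value is fed into the stable-baselines loss is not shown.

```python
def decayed_entropy_coef(step, total_steps, start=5.0, end=0.1):
    """Linearly anneal an entropy weight from `start` to `end`.

    All names and defaults are hypothetical; the schedule actually used
    in the attached code may differ.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Fraction of training completed, clamped to [0, 1].
    frac = min(max(step / total_steps, 0.0), 1.0)
    return start + frac * (end - start)


# Example: print the weight at a few points of a 100,000-step run.
if __name__ == "__main__":
    for step in (0, 50_000, 100_000):
        print(step, decayed_entropy_coef(step, 100_000, start=10.0))
```

A high initial weight keeps the policy exploring early on, and lowering it over time lets the policy commit to the bitrate choices it has learned.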

| Entropy Ratio | Average Episode Score ± Standard Deviation (100,000 actions) |
| --- | --- |
| 10.0 | 441.72765 ± 343.60534 |
| 5.0 | 420.04653 ± 178.98197 |

However, when I initially ran the same experiments with RMSProp (default parameters) as the optimizer, I was not able to beat the robustMPC and buffer-based heuristics.

Thanks for the help!!

