ABR_Sim Results Replication #8

@hashbrown512

I discussed with @hongzimao some issues I initially had replicating results on ABRSimEnv.
This post doesn't need a response; I'm posting it here so others can learn from it.

For reference, the A2C agent in the Park paper reports scores of roughly 420 +- 210.

I was able to replicate the scores on ABR using code from @hongzimao here: abr_agents.zip

Entropy Ratio | Average Episode Score +- Standard Deviation (100,000 actions)
10.0 | 517.3681106430971 +- 405.73426203813045
5.0 | 524.5324282999072 +- 400.950983685324
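
For anyone comparing their own runs against these numbers, here is a minimal sketch of how an "average episode score +- standard deviation" figure can be computed from per-episode returns. The function name `summarize_scores` and the sample values are hypothetical, and it is an assumption that the population standard deviation was used for the numbers above.

```python
import statistics


def summarize_scores(episode_scores):
    """Return (mean, population std) of a list of per-episode total rewards.

    Assumption: population standard deviation; the original scripts may
    have used a different estimator.
    """
    mean = statistics.mean(episode_scores)
    std = statistics.pstdev(episode_scores)
    return mean, std


# Hypothetical per-episode scores, for illustration only.
mean, std = summarize_scores([400.0, 500.0, 600.0])
print(f"{mean} +- {std}")
```

With standard deviations this large relative to the mean, differences of a few tens of points between configurations may not be meaningful without more episodes.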

I was able to reach similar results using the same parameters in an A2C agent from stable-baselines, modified with entropy decay and a vf_coef of 0.25: a2c_stable_baselines.zip

Entropy Ratio | Average Episode Score +- Standard Deviation (100,000 actions)
10.0 | 441.72765 +- 343.60534
5.0 | 420.04653 +- 178.98197

However, I initially ran the same experiments with RMSProp (default parameters) as the optimizer and was not able to beat the robustMPC and buffer-based heuristics.
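
To make the "entropy decay" and `vf_coef` settings concrete, here is a rough sketch of how they typically enter an A2C loss. This is not the code from either zip file: the linear schedule, the `final` value, the `decay_steps` value, and both function names are assumptions for illustration, as is treating the "Entropy Ratio" above as the initial entropy coefficient.

```python
def entropy_coef(step, initial=10.0, final=0.1, decay_steps=100_000):
    """Linearly anneal the entropy coefficient from `initial` to `final`.

    Assumption: a linear schedule; the actual agents may decay differently.
    """
    frac = min(max(step / decay_steps, 0.0), 1.0)
    return initial + frac * (final - initial)


def a2c_loss(policy_loss, value_loss, entropy, step, vf_coef=0.25):
    """Combine the usual A2C terms into one scalar loss.

    The entropy bonus is subtracted, so a larger coefficient encourages
    more exploration early in training.
    """
    return policy_loss + vf_coef * value_loss - entropy_coef(step) * entropy


# Hypothetical values, for illustration only.
print(entropy_coef(0), entropy_coef(50_000), entropy_coef(100_000))
print(a2c_loss(policy_loss=1.0, value_loss=2.0, entropy=0.5, step=0))
```

A high starting coefficient that decays over training keeps the policy exploratory early on while letting it commit to bitrate choices later.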

Thanks for the help!!
