Some inconsistencies between code and paper

I have been digging into your paper and code and noticed some potential discrepancies between them. I would appreciate it very much if you could clarify the following points.
1) In **training.py**, line 84:
```
for i in range(batch_mol.shape[0]):

    # If molecule was modified
    if not np.all(org_mols[i] == batch_mol[i]):

        fr = evaluate_mol(batch_mol[i], e, decodings)
        frs.append(fr)
        rewards[i] += np.sum(fr * dist)
```
According to Eq. 12 in the paper, shouldn't the reward be updated as `rewards[i] += np.sum(fr * 1/(1+dist))`?
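
To make the difference concrete, here is a minimal sketch comparing the two weightings. The values of `fr` and `dist` are made up for illustration and are not taken from the repository, and the helper names `reward_as_coded` and `reward_as_in_paper` are hypothetical. It also assumes that `dist` holds a raw distance; if `dist` is already an inverse weighting computed earlier in the script, the two forms may well agree and this point can be ignored.

```python
import numpy as np

# Hypothetical inputs, chosen only to illustrate the two formulas.
fr = np.array([1.0, 1.0, 1.0])    # per-fragment scores (assumed shape)
dist = np.array([0.0, 1.0, 3.0])  # assumed to be raw distances


def reward_as_coded(fr, dist):
    """Weighting as written in training.py: grows with distance."""
    return float(np.sum(fr * dist))


def reward_as_in_paper(fr, dist):
    """Weighting as I read Eq. 12: shrinks with distance."""
    return float(np.sum(fr / (1.0 + dist)))


code_reward = reward_as_coded(fr, dist)
paper_reward = reward_as_in_paper(fr, dist)
print(code_reward, paper_reward)
```

With these inputs the first form rewards the most distant entry the most, while the second form rewards the closest entry the most, so the two are not interchangeable if `dist` is a raw distance.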

2) In **models.py**, the Actor and Critic have different learning rates (0.0005 and 0.0001), but the paper states (page 19) that both use the same learning rate. What is the reason for the different learning rates? If any other parameters differ from the paper, I would appreciate it if you could list them.

3) In **training.py**, line 112:
```
for i in range(BATCH_SIZE):

    a = int(actions[i])
    loss = -np.log(probs[i,a]) * td_error[i]
    target_actor[i,a] = td_error[i]
```
The variable `loss` is not used anywhere. According to https://www.freecodecamp.org/news/an-intro-to-advantage-actor-critic-methods-lets-play-sonic-the-hedgehog-86d6240171d/, the policy update should include the logarithm of the action probability. Should the update therefore be `target_actor[i,a] = loss`?
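
One possibility I considered: if the actor is trained with a categorical cross-entropy loss (an assumption on my part; I have not confirmed how the model is compiled in **models.py**), the logarithm would already be applied inside the loss function, and writing `td_error[i]` into `target_actor` would give the same value as the unused `loss`. The sketch below checks this with made-up numbers; `categorical_crossentropy` here is my own NumPy stand-in, not the library function.

```python
import numpy as np


def categorical_crossentropy(target, probs, eps=1e-7):
    """NumPy stand-in for a cross-entropy loss: -sum(target * log(probs))."""
    probs = np.clip(probs, eps, 1.0)
    return -np.sum(target * np.log(probs), axis=-1)


# Hypothetical single-sample batch with three actions.
probs = np.array([[0.2, 0.5, 0.3]])  # actor output
actions = np.array([1])              # chosen action index
td_error = np.array([0.8])           # advantage estimate

# Target built the way training.py does it: td_error at the chosen action.
target_actor = np.zeros_like(probs)
target_actor[0, actions[0]] = td_error[0]

ce_loss = categorical_crossentropy(target_actor, probs)

# The unused `loss` expression from training.py.
manual_loss = -np.log(probs[0, actions[0]]) * td_error[0]

print(ce_loss[0], manual_loss)
```

If that assumption holds, the current update would already match the log-probability form, and `loss` would simply be a leftover variable. If the actor uses a different loss, the question above still stands.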
