Some inconsistencies between code and paper #3

@MherMatevosyan

I have been digging into your paper and code and noticed some potential discrepancies between the two. I would appreciate it very much if you could clarify the following points.

  1. In training.py, line 84:
                for i in range(batch_mol.shape[0]):

                    # If molecule was modified
                    if not np.all(org_mols[i] == batch_mol[i]):

                        fr = evaluate_mol(batch_mol[i], e, decodings)
                        frs.append(fr)
                        rewards[i] += np.sum(fr * dist)

According to the paper (Eq. 12), shouldn't the reward be updated as rewards[i] += np.sum(fr * 1/(1+dist))?
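To make the difference concrete, here is a minimal sketch comparing the two weightings. It assumes `fr` and `dist` are NumPy arrays of the same shape (their real shapes and values are not shown in this issue); the function names and the sample numbers are hypothetical and only illustrate how the two forms behave.

```python
import numpy as np

def reward_as_in_code(fr, dist):
    # Weighting as written in training.py: grows with distance.
    return np.sum(fr * dist)

def reward_as_in_paper(fr, dist):
    # Weighting as I read Eq. 12: shrinks with distance.
    return np.sum(fr * 1.0 / (1.0 + dist))

# Hypothetical values, chosen only for illustration.
fr = np.array([1.0, 1.0])
near = np.array([0.0, 1.0])
far = np.array([3.0, 4.0])

print(reward_as_in_code(fr, near), reward_as_in_code(fr, far))
print(reward_as_in_paper(fr, near), reward_as_in_paper(fr, far))
```

With these inputs the weighting from the code gives the larger reward at larger distances, while the 1/(1+dist) form gives the larger reward at smaller distances, so the two are not interchangeable.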

  2. In models.py, the Actor and Critic have different learning rates (0.0005 and 0.0001), but the paper states (page 19) that both use the same learning rate. What is the reason for the difference? If any other parameters differ from the paper, I would appreciate it if you could list them.

  3. In training.py, line 112:

            for i in range(BATCH_SIZE):

                a = int(actions[i])
                loss = -np.log(probs[i,a]) * td_error[i]
                target_actor[i,a] = td_error[i]

The variable loss is never used. According to https://www.freecodecamp.org/news/an-intro-to-advantage-actor-critic-methods-lets-play-sonic-the-hedgehog-86d6240171d/, the policy update should include the logarithm of the action probability. Should the update therefore be target_actor[i,a] = loss?
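One possibility I considered, as a sketch only: if the actor were trained with a categorical cross-entropy loss against target_actor (an assumption on my part; the loss used in models.py is not shown here), the logarithm would already be applied by the loss function, and setting target_actor[i,a] = td_error[i] would reproduce -log(probs[i,a]) * td_error[i]. The function names and sample values below are hypothetical.

```python
import numpy as np

def explicit_pg_loss(probs, actions, td_error):
    # Per-sample policy-gradient loss: -log pi(a|s) * td_error.
    idx = np.arange(len(actions))
    return -np.log(probs[idx, actions]) * td_error

def cross_entropy_with_td_targets(probs, actions, td_error):
    # Targets are zero except at the chosen action, where they hold td_error.
    target_actor = np.zeros_like(probs)
    target_actor[np.arange(len(actions)), actions] = td_error
    # Categorical cross-entropy: -sum_k target_k * log(p_k).
    return -np.sum(target_actor * np.log(probs), axis=1)

# Hypothetical batch of two samples with three actions each.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.6, 0.3]])
actions = np.array([0, 2])
td_error = np.array([0.5, -1.0])

print(explicit_pg_loss(probs, actions, td_error))
print(cross_entropy_with_td_targets(probs, actions, td_error))
```

If that assumption holds, the two printed vectors agree, and assigning loss to target_actor would apply the logarithm twice. If the actor uses a different loss, this reasoning does not apply.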
