Skip to content

Remark on MCTS #33

@homerours

Description

@homerours

Hi! I just went over mcts.py. Here are few remarks:

  • Great work :) this looks nice!
  • The MCTSNode attributes prior and policy seems redundant: node.prior = node.parent._policy['child_index']
  • The root_player attribute may lead to complication: after playing a move, you would need to update all the root_player attributes of the corresponding subtree. I do not believe that we really need this attribute, if we adopt a convention like 'value(position) = value for the player who has to play' (this is the convention used by the NN). Then, we should modify backpropagation:
        current = leaf_node
        sign = 1
        while not current.is_root:
            current.n_visit += 1
            current.total_value += sign*v_leaf
            sign *= -1
            current = current.parent

Here, we however need to be carefull when chosing the child with highest PUCT: with the above convention, the Q-value of a child of the root correspond to the point of view of the oponent of the root. Hence we should probably use -Q to compute the PUCT.

To do so, we could store the prior, visit counts and Q values (total_value) in arrays attached to parent nodes:

class MCTSNode:
    childrens = [node1, node2, node3, node4]
    policy = [ p1, p2, p3, p4]
    n_visit = [ n1, n2, n3, n4]
    total_values = [q1, q2, q3, q4]
...

Then,

index_chosen_child = np.argmax( self.total_values 
                   + c * policy * np.sqrt(np.sum(self.n_visit)) / (1 + self.total_values))

I do not know if this would really speedup things, so that may not be urgent...
A drawback is that we would need to keep for each node an index indicating the position of the node in the arrays of its parent (because we need to update the total_values and n_visit when backpropagating).

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions