Hi! I just went over mcts.py. Here are few remarks:
- Great work :) this looks nice!
- The
MCTSNode attributes prior and policy seems redundant: node.prior = node.parent._policy['child_index']
- The
root_player attribute may lead to complication: after playing a move, you would need to update all the root_player attributes of the corresponding subtree. I do not believe that we really need this attribute, if we adopt a convention like 'value(position) = value for the player who has to play' (this is the convention used by the NN). Then, we should modify backpropagation:
current = leaf_node
sign = 1
while not current.is_root:
current.n_visit += 1
current.total_value += sign*v_leaf
sign *= -1
current = current.parent
Here, we however need to be carefull when chosing the child with highest PUCT: with the above convention, the Q-value of a child of the root correspond to the point of view of the oponent of the root. Hence we should probably use -Q to compute the PUCT.
- We can potentially improve runtime by 'vectorizing' loops over the children of a node.
|
sum_n = np.sum([child.n_visit for child in current.children]) |
(above we could use something like current.parent.n_visit)
|
[ |
|
child.puct(sum_n=sum_n, c_puct=c_puct) |
|
for child in current.children |
|
] |
|
[child.n_visit for child in start_node.children] |
To do so, we could store the prior, visit counts and Q values (total_value) in arrays attached to parent nodes:
class MCTSNode:
childrens = [node1, node2, node3, node4]
policy = [ p1, p2, p3, p4]
n_visit = [ n1, n2, n3, n4]
total_values = [q1, q2, q3, q4]
...
Then,
index_chosen_child = np.argmax( self.total_values
+ c * policy * np.sqrt(np.sum(self.n_visit)) / (1 + self.total_values))
I do not know if this would really speedup things, so that may not be urgent...
A drawback is that we would need to keep for each node an index indicating the position of the node in the arrays of its parent (because we need to update the total_values and n_visit when backpropagating).
Hi! I just went over mcts.py. Here are few remarks:
MCTSNodeattributespriorandpolicyseems redundant:node.prior = node.parent._policy['child_index']root_playerattribute may lead to complication: after playing a move, you would need to update all theroot_playerattributes of the corresponding subtree. I do not believe that we really need this attribute, if we adopt a convention like 'value(position) = value for the player who has to play' (this is the convention used by the NN). Then, we should modify backpropagation:Here, we however need to be carefull when chosing the child with highest PUCT: with the above convention, the Q-value of a child of the root correspond to the point of view of the oponent of the root. Hence we should probably use -Q to compute the PUCT.
AlphaZero/alphazero/mcts/mcts.py
Line 284 in d06e212
(above we could use something like
current.parent.n_visit)AlphaZero/alphazero/mcts/mcts.py
Lines 287 to 290 in d06e212
AlphaZero/alphazero/mcts/mcts.py
Line 299 in d06e212
To do so, we could store the prior, visit counts and Q values (
total_value) in arrays attached to parent nodes:Then,
I do not know if this would really speedup things, so that may not be urgent...
A drawback is that we would need to keep for each node an index indicating the position of the node in the arrays of its parent (because we need to update the total_values and n_visit when backpropagating).