Remark on MCTS

Hi! I just went over mcts.py. Here are few remarks:

- Great work :) this looks nice!
- The `MCTSNode` attributes `prior` and `policy` seems redundant: `node.prior = node.parent._policy['child_index']`
- The `root_player` attribute may lead to complication: after playing a move, you would need to update all the `root_player` attributes of the corresponding subtree. I do not believe that we really need this attribute, if we adopt a convention like 'value(position) = value for the player who has to play' (this is the convention used by the NN). Then, we should modify backpropagation:
```python
        current = leaf_node
        sign = 1
        while not current.is_root:
            current.n_visit += 1
            current.total_value += sign*v_leaf
            sign *= -1
            current = current.parent
```
Here, we however need to be carefull when chosing the child with highest PUCT: with the above convention, the Q-value of a child of the root correspond to the point of view of the oponent of the root. Hence we should probably use -Q to compute the PUCT.
- We can potentially improve runtime by 'vectorizing' loops over the children of a node.
https://github.com/AlphaZeroIncubator/AlphaZero/blob/d06e212129b2a2ef1be6b35d8570ecb125dca2e3/alphazero/mcts/mcts.py#L284
(above we could use something like `current.parent.n_visit`)
https://github.com/AlphaZeroIncubator/AlphaZero/blob/d06e212129b2a2ef1be6b35d8570ecb125dca2e3/alphazero/mcts/mcts.py#L287-L290
https://github.com/AlphaZeroIncubator/AlphaZero/blob/d06e212129b2a2ef1be6b35d8570ecb125dca2e3/alphazero/mcts/mcts.py#L299

To do so, we could store the prior, visit counts and Q values (`total_value`) in arrays attached to parent nodes:
```python
class MCTSNode:
    childrens = [node1, node2, node3, node4]
    policy = [ p1, p2, p3, p4]
    n_visit = [ n1, n2, n3, n4]
    total_values = [q1, q2, q3, q4]
...
```
Then, 
```python
index_chosen_child = np.argmax( self.total_values 
                   + c * policy * np.sqrt(np.sum(self.n_visit)) / (1 + self.total_values))
```
I do not know if this would really speedup things, so that may not be urgent...
A drawback is that we would need to keep for each node an index indicating the position of the node in the arrays of its parent (because we need to update the total_values and n_visit when backpropagating).

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Remark on MCTS #33

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

	[
	child.puct(sum_n=sum_n, c_puct=c_puct)
	for child in current.children
	]

Remark on MCTS #33

Description

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions