I ran a short self-play test on the server. I needed to modify a few things, so I created a branch 'self_play_speed_test'.
I used cProfile for the profiling. Here is the output of python -m cProfile -s cumulative speed_test.py (the results are sorted by cumulative time), for a TicTacToe game of 9 moves and 100 MCTS runs per move:
1689398 function calls (1602525 primitive calls) in 11.643 seconds
Ordered by: cumulative time
ncalls tottime percall cumtime percall filename:lineno(function)
754/1 0.004 0.000 11.645 11.645 {built-in method builtins.exec}
1 0.000 0.000 11.645 11.645 speed_test.py:1(<module>)
1 0.000 0.000 8.992 8.992 mcts.py:363(self_play)
9 0.001 0.000 8.990 0.999 mcts.py:245(mcts)
900 0.010 0.000 8.987 0.010 mcts.py:319(forward)
900 0.023 0.000 8.866 0.010 mcts.py:287(backpropagate)
900 0.099 0.000 7.841 0.009 mcts.py:83(calc_policy_value)
81000/900 0.436 0.000 7.473 0.008 module.py:540(__call__)
900 0.012 0.000 7.466 0.008 model.py:218(forward)
900 0.017 0.000 6.375 0.007 model.py:84(forward)
The self-play took about 9s, of which 7.8s were spent in the network evaluations (calc_policy_value), so the MCTS logic accounts for the remaining 1.2s.
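The whole-script profile above also covers about 2.6s spent outside self_play (11.645s in total against 8.992s for self_play), so it may be cleaner to profile only the self-play call. Here is a minimal sketch of how that could look with cProfile.Profile and pstats. Note that profile_call and dummy_self_play are hypothetical names of mine: dummy_self_play is only a stand-in for the real self_play in mcts.py, whose signature I am not assuming here.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=10, **kwargs):
    """Profile a single call and return (result, report sorted by cumulative time)."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        result = func(*args, **kwargs)
    finally:
        profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Same ordering as "python -m cProfile -s cumulative", limited to `top` rows.
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


def dummy_self_play(n_moves=9, n_sims=100):
    """Hypothetical stand-in for self_play: counts the simulated MCTS runs."""
    total = 0
    for _ in range(n_moves):
        for _ in range(n_sims):
            total += 1
    return total


result, report = profile_call(dummy_self_play)
print(report)
```

Replacing dummy_self_play with the real call should give a report that excludes whatever runs at import time in speed_test.py.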
It would be nice to check the results for Connect4, but I was unable to run it. Did you get similar results when running MCTS?
So we need to improve the performance of the 'inference task' (and also of MCTS if possible), for instance by using multiple processes.
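As one possible reading of "multiple processes", here is a rough sketch that runs independent self-play games in separate worker processes with concurrent.futures.ProcessPoolExecutor. Everything in it is an assumption for illustration: play_one_game is a hypothetical stand-in that plays random TicTacToe moves instead of calling the network, and run_self_play is not a function from the repository. In the real code each worker would presumably need its own copy of the model.

```python
import random
from concurrent.futures import ProcessPoolExecutor

# The eight winning lines of a 3x3 board, as index triples.
WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]


def play_one_game(seed):
    """Hypothetical stand-in for one self-play game (random moves, no network).

    Returns 1 or -1 for the winning player, or 0 for a draw.
    """
    rng = random.Random(seed)
    board = [0] * 9
    player = 1
    for _ in range(9):
        move = rng.choice([i for i, v in enumerate(board) if v == 0])
        board[move] = player
        if any(board[a] == board[b] == board[c] == player for a, b, c in WIN_LINES):
            return player
        player = -player
    return 0


def run_self_play(n_games, workers=1):
    """Play n_games games, serially or spread over `workers` processes."""
    seeds = range(n_games)
    if workers <= 1:
        return [play_one_game(s) for s in seeds]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # map() preserves the order of the seeds in the returned results.
        return list(pool.map(play_one_game, seeds))
```

A call such as run_self_play(8, workers=4) should be made from under an if __name__ == "__main__": guard so that it also works where processes are spawned rather than forked. This only increases the number of games played per unit of time; it does not make a single game faster, so the cost of calc_policy_value itself would still have to be addressed separately.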