5

AlphaZero searches for moves using Monte Carlo Tree Search (MCTS). As I understand MCTS, it takes the root position, goes to a child node, and then plays random moves until one side wins. Since the moves played are random, does that mean AlphaZero is also random?

Related: this question shows that Stockfish and other conventional engines are indeed deterministic if one looks only at the number of nodes evaluated. I don't know however if the same applies to AlphaZero.

Allure

2 Answers

2

No. MCTS is generally not deterministic. However, it's not that bad given how powerful Google's machines were.

SmallChess
  • 3
    I don't think AlphaZero uses random roll-outs. If it just explores branches according to the neural network score, it might well be deterministic. But it's too long since I read the paper, so I'm not sure. – BlindKungFuMaster Dec 10 '18 at 14:04
  • @BlindKungFuMaster I need some time to give you a technical answer. I'm not able to do it now. – SmallChess Dec 10 '18 at 14:05
  • 2
    @BlindKungFuMaster It does randomly explore the tree until a leaf is found, but not with a uniform random distribution. Instead it performs importance sampling as follows: it selects moves either proportionally to the node's win probability (estimated over the set of times the node has been sampled) or with respect to the visit count of the node (emphasizing rare moves). These two types of selection correspond to the exploitation and exploration modes respectively. Which type gets chosen at each generation point is inherently random (though non-uniform), and therefore the set of sampled leaves is randomized. – Ellie Dec 10 '18 at 23:02
  • Are you still confident this is the answer? Watching Leela play the TCEC bonus against Stockfish, from the opening position, its moves seemed to be deterministic. In all 50 games in which it had the white pieces, it played 1.e4. – Allure May 06 '19 at 11:21
  • @Allure maybe it has a book move? In any case, the first move in chess is not important for this discussion. – SmallChess May 06 '19 at 11:25
  • @SmallChess the bonus was played bookless though. Stockfish's opening choices varied, but not Leela's. http://mytcecexperience.blogspot.com/2019/03/s14-bonus-match-leela-stockfish.html – Allure May 06 '19 at 11:39
  • @Allure I need time to read and give you a response. – SmallChess May 06 '19 at 11:40
  • @SmallChess did you find anything? I'm getting pretty convinced this answer is incorrect. – Allure May 16 '19 at 02:07
  • @Allure I don’t have time. Can you please write a new answer and explain your logic? – SmallChess May 16 '19 at 02:15
  • 1
    @Allure: that it always plays 1.e4 doesn't by itself mean anything about the algorithm being deterministic or not. It could have played many, many games with many different moves internally, but if 1.e4 consistently comes out with the best average score over all those games, it'll play 1.e4. – RemcoGerlich May 27 '19 at 08:28
  • @RemcoGerlich but that's what all engines do, no? They search each move internally, and make the one that gives the best eval (or win percentage). It's the search part that can be random, and AlphaZero's search algorithm appears not to be random (up until hyperthreading). – Allure May 27 '19 at 10:07
  • @Allure: I don't know enough about it to say, just noting that the bare fact it always plays 1.e4 isn't an argument for or against. – RemcoGerlich May 27 '19 at 10:38
1

It seems AlphaZero is deterministic, up to a point. Looking at the details of its implementation, there's nothing inherently random in it. If one looks at the TCEC bookless bonus match between Leela (an AlphaZero clone) and Stockfish, this particular Leela net always played the same opening move, 1.e4, when it was white. Against this, Stockfish sometimes played 1...e6 and sometimes 1...c5; Leela always responded with the Steinitz Boleslavsky variation and the Najdorf English attack respectively. In fact, almost every time it's Stockfish deviating rather than Leela:

Leela is more stubborn than Stockfish in its opening moves. All the major opening classes in the first few moves - Sicilian, French, Ruy Lopez, Italian, QGD - were determined by Stockfish either in black or in white. Stockfish expanded the opening tree almost every move in at least one opening line, again in both white and black. Leela, on the other hand, almost always played the same move when presented with the same position. In the first 24 plys of 100 games I found only 3 cases where Leela expanded the opening tree, on move 7 as black in QGD (game 29), on move 10 as black in Ruy Lopez (games 45 vs 97), and on move 10 as white in French (game 84).

Although Leela was a lot less random than Stockfish, it wasn't completely deterministic either, presumably because multithreading and random fluctuations cause a different number of nodes to be evaluated from run to run. Take out these factors, and Leela (and hence AlphaZero) is deterministic.
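
To make the "nothing inherently random" point concrete, here is a minimal sketch of the selection rule as I understand it from the AlphaZero paper: at each node the search picks the move maximizing Q + U, where U is proportional to prior × sqrt(parent visits) / (1 + visits). It is only a one-ply toy, not the actual implementation. The names `Child`, `run_search`, `PRIORS` and `VALUES`, the numbers in them, and the `c_puct` value are all made up for illustration, and the fixed `value` per move stands in for a neural network evaluation.

```python
import math
from dataclasses import dataclass


@dataclass
class Child:
    prior: float        # P(s, a): policy prior (made-up number, stands in for the network)
    value: float        # fixed evaluation of the move (stands in for the value head)
    visits: int = 0     # N(s, a)
    total: float = 0.0  # W(s, a), sum of backed-up values


def puct_score(child: Child, parent_visits: int, c_puct: float) -> float:
    """Q + U for one child, with U = c_puct * P * sqrt(parent visits) / (1 + N)."""
    q = child.total / child.visits if child.visits else 0.0
    u = c_puct * child.prior * math.sqrt(parent_visits) / (1 + child.visits)
    return q + u


def run_search(priors, values, simulations=200, c_puct=1.5):
    """One-ply toy search; returns the visit count per move."""
    children = {move: Child(priors[move], values[move]) for move in priors}
    for _ in range(simulations):
        parent_visits = sum(c.visits for c in children.values())
        # Plain argmax, no sampling. Sorting the moves makes tie-breaking
        # explicit: max() returns the first of several equal-scoring moves.
        move = max(sorted(children),
                   key=lambda m: puct_score(children[m], parent_visits, c_puct))
        children[move].visits += 1
        children[move].total += children[move].value
    return {move: c.visits for move, c in children.items()}


PRIORS = {"e4": 0.5, "d4": 0.3, "c4": 0.2}     # hypothetical policy output
VALUES = {"e4": 0.55, "d4": 0.50, "c4": 0.45}  # hypothetical value estimates

if __name__ == "__main__":
    first = run_search(PRIORS, VALUES)
    second = run_search(PRIORS, VALUES)
    print(first)
    print("identical runs:", first == second)
```

There is no random number generator anywhere in this loop, so two runs with the same inputs and the same number of simulations give the same visit counts. As far as I know, the randomness described in the paper (noise added to the root priors, and sampling the played move from the visit counts) is used during self-play training rather than in match play.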

Allure
  • This is exactly what I think now. I've written a few implementations of my own, and I fail to understand what else could be causing it, for example, not to win, draw, or lose 100% of the time against a previous version of itself given a certain color played. I doubt, however, that the number of nodes can change so much as to have a noticeable influence. I'd like to investigate this. I've read the paper multiple times, but I must have missed the explanation (if it is present). – Captain Trojan Aug 25 '21 at 08:43