Alphazero’s real secret sauce is MCTS

6 min readDec 21, 2017

Alphazero defeated Stockfish in a series of remarkable games marking, according to the common interpretation, a turning point where computer Chess will inevitably switch to deep learning based board evaluation.

The deep learning centered interpretation has some truth to it, but I think it’s less than half the story. In the available games Alphazero does a notably good job of exploiting the opponent’s piece mobility and king safety evaluation problems, and even given the dubious fairness of the match it’s a remarkable achievement, but Alphazero changes two things: It uses a deep learning-based board evaluation function, and it uses Monte Carlo tree search (MCTS) instead of alpha-beta pruning to apply that function. The available information fits a narrative of MCTS being superior to alpha-beta just as strongly, if not more so, than one where the deep learning board evaluation has achieved dominance.

To understand why this might be you first need to understand what it’s like to work on one of these engines. Improving a traditional chess engine is a slow, painstaking process where you come up with ideas for possible improvements, tests them extensively, and usually find that it makes things worse, but occasionally find that it improves play ever so slightly, maybe by 1 or 2 elo if you’re really lucky. Over time these improvements accumulate bit by bit until the engine is able to annihilate an older version of itself. (Viewing elo as a linear commodity seems to hold up very well, which suggests some highly controversial things about the possible genetics of IQ, but that isn’t what this essay is about.)

An unfortunate effect of this iterative approach is that engine strength always feels stuck in a local maxima. One of the Komodo authors recently lamented that his engine seems to do everything better than Stockfish but it uses significantly less pruning than Stockfish and thus has to try more possibilities per move and as a result can’t search nearly as deeply. It does this because that’s the configuration in which its playing strength is demonstrably the strongest. For reasons no one can fathom — not the Komodo authors, not the Stockfish authors, and most definitely not me — replacing Komodo’s pruning with Stockfish’s makes its play worse. Somehow assumptions about how the pruning works are baked deeply into the fabric of Komodo’s board evaluation function, embedded far to subtly for anyone to understand.

There are ways of trying to fix this. It would probably be a good idea to take a chess engine, throw all of its magic numbers out the window and regenerate them using genetic algorithms guided by the results of self-play, an approach which is very similar in flavor to Alphazero’s. I’m a huge hypocrite here because I haven’t tried this approach even for my own projects where it’s totally applicable. I haven’t even used the much simpler methods of optimizing magic numbers which are de rigueur in chess engine development these days. But the genetic approach would likely result in an overall stronger engine with much less painstaking work for each tweak and be able to justify much smaller changes.

But even those currently-viewed-as-extreme measures are confined to the framework of the existing engine and only able to make small modifications. To make any truly deep changes would require putting in literal years of developer time re-optimizing the entire system to see if it winds up being stronger than the old approach, and since most such experiments will turn out to be failures they’re hardly ever done. At the top of the list of things which are nearly impossible to change is alpha-beta pruning. It’s long seemed plausible that MCTS could outperform alpha-beta pruning under some circumstances: It’s routine in engine vs. engine games for them to reach a position which they view as favorable for one side where a playout will result in a draw nearly 100% of the time, or a situation where one side has a slight advantage but it’s accelerating rather than remaining the same and full playouts would quickly show that it’s highly favorable. The problem is that all attempts to do MCTS for chess in the past have been clear failures, and with a million subtle variations on how, exactly, MCTS can be implemented. No one has even attempted to to the sorts of years-long experiments on trying to improve it because getting the secret sauce exactly right would likely take a few millennia of developer time.

This is where Alphazero’s deep learning comes in. Traditional chess engine board evaluation has been developed as previously described: slowly and painstakingly. The benefit has been that they’re ludicrously, insanely fast. This is because the fast, dumb approach has clearly yielded the best results in all previous experiences. The actual details of how the board evaluation is done are, to use a technical term, hacky bullshit. A few years ago I commented to a chess engine developer that the methods of compensating for which side makes the last move in board evaluation seem a bit like slaughtering a chicken under a full moon, and he responded ‘Chess engines slaughter millions of chickens per second’. Looking deeper ahead is clearly always better, and testing has always borne out that depth is much more important than evaluation quality.

When Alphago was first being worked on its implementation of MCTS was not very good. But Go is a game where MCTS is so clearly superior to alpha-beta pruning that even a poor implementation gave good results and provided good guidance as to what approaches were better than others. Alphazero’s lead programmer says that it’s now using much more principled algorithms. The approach to MCTS which Alphazero uses now is a good (or at least much better) one, intuitively justifiable with coherent arguments and empirically quite successful. (He stops short of referring to the MCTS used in the first Alphago as ‘hacky bullshit’ but that would probably be technically correct.) That said, there are probably a million other variations on MCTS which have just as plausible-sounding arguments for why they’re the right approach as what Alphazero uses. The Deepmind team didn’t have some vastly greater intuition for approaches to MCTS which they used to get it right, they had the not-so-secret weapon of deep learning board evaluation guided by self-play. With that tool, experimenting with variations on MCTS isn’t a matter of years of developer time, it’s just a matter of letting the computer run for a few hours. This is a truly remarkable achievement, but its real power may not be in the quality of the board evaluation function it generates, (the details of which go against everything we previously thought we knew about chess board evaluation,) but in the way it allows for fundamentally different kinds of experimentation to be done and have now resulted in the simple, actionable guidance that chess engines should be based on a very particular flavor of MCTS instead of alpha-beta pruning.

We won’t actually know whether Alphazero’s deep learning based board evaluation function is a red herring in its success at chess until someone does the experiment of building a chess engine using Alphazero’s MCTS algorithm and a relatively traditional fast but dumb board evaluation optimized to maximize performance under those circumstances. Now that an actually good MCTS is known the experiment is worth doing, and while it has a high likelihood of being an unsatisfying failure it also has a real chance of being a dramatic success, so hopefully somebody has a real go at it (although it might be a good idea to wait a bit until improvements on MCTS itself settle down a bit).

Unfortunately this rather prosaic interpretation of the Alphazero chess results, while entirely plausible, don’t make for nearly as good of a story as declaring it a huge victory for AI. The history of AI is filled with great achievements which stopped being called AI as soon as they made any real progress. Case in point being the news articles talking about how this match was an AI versus a chess engine, as if chess engines aren’t artificial intelligence. Deep learning is likely the first technique which will continue to bear the AI moniker despite having proven itself useful. Better MCTS most definitely will not.

Alphazero’s real secret sauce is MCTS

Written by Bram Cohen