Not so long ago, mastering the ancient Chinese game of Go was beyond the reach of artificial intelligence. But then AlphaGo, Google DeepMind’s AI player, started to leave even the best human opponents in the dust. Yet even this world-beating AI needed humans to learn from. Then, on Wednesday, DeepMind’s new version ditched people altogether.
AlphaGo Zero has surpassed its predecessor’s abilities by bypassing the traditional method of teaching an AI to play games, which involves training it on thousands of hours of human play. Instead, it starts by playing at random and hones its skills by repeatedly playing against itself. Three days and 4.9 million such games later, the result is the world’s best Go-playing AI.
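The self-play loop described above can be illustrated on a much smaller game. The sketch below is my own toy construction, not DeepMind’s code: it trains a plain lookup table on a simple stick-taking game purely by playing against itself, starting from random moves. AlphaGo Zero instead pairs a deep neural network with Monte Carlo tree search, but the core idea — improvement driven only by the outcomes of games against itself — is the same.

```python
import random

# Toy illustration of learning by self-play (NOT DeepMind's method):
# players alternately remove 1-3 sticks from a pile; whoever takes the
# last stick wins. The agent starts from random play and improves only
# from the outcomes of games against itself.

STICKS = 10          # pile size at the start of each game
MOVES = (1, 2, 3)    # legal numbers of sticks to take per turn

Q = {}  # (sticks_remaining, move) -> estimated value for the player to move

def choose(sticks, eps):
    """Pick a move: random with probability eps, otherwise greedy on Q."""
    moves = [m for m in MOVES if m <= sticks]
    if random.random() < eps:
        return random.choice(moves)
    return max(moves, key=lambda m: Q.get((sticks, m), 0.0))

def self_play_episode(eps=0.2, alpha=0.1):
    """Play one game against itself and update Q from the final outcome."""
    sticks, history = STICKS, []
    while sticks > 0:
        move = choose(sticks, eps)
        history.append((sticks, move))
        sticks -= move
    # The last mover took the final stick and won. Walking backwards,
    # the reward alternates sign because the players alternate turns.
    reward = 1.0
    for state, move in reversed(history):
        old = Q.get((state, move), 0.0)
        Q[(state, move)] = old + alpha * (reward - old)
        reward = -reward

random.seed(0)
for _ in range(30000):
    self_play_episode()

def best(sticks):
    """Greedy move from the learned values."""
    return choose(sticks, eps=0.0)

# The optimal strategy in this game is to leave the opponent a multiple
# of 4 sticks: from 6 sticks, for example, the winning move is to take 2.
print(best(6))
```

After training, the table encodes the game’s known optimal strategy even though no human play was ever shown to it — a miniature analogue of how AlphaGo Zero rediscovered established josekis on its own.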
“It’s more powerful than previous approaches because we’ve removed the constraints of human knowledge,” says David Silver, the lead researcher for AlphaGo.
“Humankind has accumulated Go knowledge from millions of games played over thousands of years,” the authors write in their paper. “In the space of a few days… AlphaGo Zero was able to rediscover much of this Go knowledge, as well as novel strategies that provide new insights into the oldest of games.”
AlphaGo Zero’s alternative approach has allowed it to discover strategies humans have never found. For example, it learned many different josekis – sequences of moves that result in no net loss for either side. Plenty of josekis have been written down during the thousands of years Go has been played, and initially AlphaGo Zero learned many of the familiar ones. But as its self-training continued, it started to favour previously unknown sequences.
To test these new moves, DeepMind pitted AlphaGo Zero against the version that beat 18-time world champion Lee Sedol. In a 100-game grudge match, it won 100-0, despite training for only three days, compared with several months for its predecessor. After 40 days of training, it also won 89-11 against a stronger version of AlphaGo that had defeated world number one Ke Jie (Nature, DOI: 10.1038/nature24270).