AlphaGo Zero Beats Its Forerunner in 100 Bouts of Go in a Row After Learning to Play From Scratch
Liao Shumin
/SOURCE : Yicai

(Yicai Global) Oct. 19 -- AlphaGo Zero, the latest evolution of an artificial intelligence program developed by Google's DeepMind, learned the ancient Chinese board game Go from scratch without any human input and beat its forerunner -- AlphaGo Lee -- 100 to 0, its developer said in a paper published in the journal Nature. 

After playing several million games against itself, the new AI program discovered intricacies of Go that took humans thousands of years to understand, the article said. Zero came up with original strategies, producing insights into the ancient game.

AlphaGo Lee runs on 48 Tensor Processing Units (TPUs) and beat South Korea's nine-dan professional Go player Lee Sedol in four out of five games in March last year after studying established Go move sequences (joseki) and playing against itself about 30 million times over several months.

AlphaGo Zero runs on four TPUs and learned to play without facing humans. It took the new version three days and some 4.9 million self-training games to best AlphaGo Lee in 100 bouts in a row.

The program's development has taken reinforcement learning algorithms to a new level.

The evolution of reinforcement learning has gone through three stages: the first algorithms in the early 1990s, 'Q-learning,' and deep reinforcement learning, which emerged about a decade ago. As the development of Zero shows, combining reinforcement learning with a look-ahead mechanism (similar to reconnaissance in military operations) drawn from tree search has produced a more efficient deep reinforcement learning model.
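The pairing of self-play learning with a look-ahead search can be illustrated on a far smaller game. The Python sketch below is only a toy under stated assumptions, not DeepMind's method: it swaps Go for a stone-taking game (take one to three stones, whoever takes the last stone wins), the neural network for a plain lookup table, and the tree search for a depth-limited negamax. All function names and parameters are hypothetical.

```python
import random

def legal_moves(stones):
    """Toy game: take 1 to 3 stones; whoever takes the last stone wins."""
    return [m for m in (1, 2, 3) if m <= stones]

def lookahead(stones, depth, value):
    """Depth-limited negamax score for the player about to move."""
    if stones == 0:
        return -1.0  # the opponent just took the last stone
    if depth <= 0:
        return value.get(stones, 0.0)  # fall back on the learned estimate
    return max(-lookahead(stones - m, depth - 1, value)
               for m in legal_moves(stones))

def choose_move(stones, depth, value, epsilon, rng):
    """Mostly pick the move that looks best after searching ahead."""
    moves = legal_moves(stones)
    if rng.random() < epsilon:
        return rng.choice(moves)  # occasional exploration
    return max(moves, key=lambda m: -lookahead(stones - m, depth - 1, value))

def self_play_train(start=10, games=2000, depth=2, alpha=0.1,
                    epsilon=0.2, seed=0):
    """Learn position values from self-play results alone (no human games)."""
    rng = random.Random(seed)
    value = {}  # assumed stand-in for a learned evaluation function
    for _ in range(games):
        stones, visited = start, []
        while stones > 0:
            visited.append(stones)
            stones -= choose_move(stones, depth, value, epsilon, rng)
        outcome = 1.0  # the player who moved last won
        for s in reversed(visited):
            old = value.get(s, 0.0)
            value[s] = old + alpha * (outcome - old)  # nudge toward the result
            outcome = -outcome  # players alternate turns
    return value

if __name__ == "__main__":
    learned = self_play_train()
    for stones in sorted(learned):
        print(stones, round(learned[stones], 2))
```

In this toy game a player facing a multiple of four stones loses against perfect play, so the learned table can be checked against a known answer. The design point is the division of labor: the search looks a few moves ahead, and the learned values fill in for everything beyond that horizon.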

As a result, Zero did not rely on existing knowledge like its predecessor did, and can invent better Go strategies through self-training, said Xu Lei, a chair professor at Shanghai Jiao Tong University and head of the Centre for Cognitive Machines and Computational Health (CMaCH).

Compared with its predecessors, AlphaGo Zero's algorithms are simpler and smarter. Instead of relying on human-generated big data, it discovered knowledge by applying learning rules set by its human developers, and it 'knows' how to rectify mistakes made by humans. It acquired these abilities with amazing efficiency. Interestingly, the AI cannot explain how it achieved this and can only provide demonstrations, said Zhang Zheng, a computer science professor at New York University Shanghai.

AlphaGo Zero's algorithms and programs are like a black box that improves itself as the number of self-training sessions increases, and it 'inherits' optimized algorithms by copying certain code, but people cannot look inside the algorithms, said Wei Hui, a professor at Fudan University's School of Computer Science and Technology.

It is unclear whether Zero and other AI programs and computers have explored all the possible moves of the board game, but AI is definitely faster than humans and will bring new discoveries -- or rather new joseki, Zhang said.

Keywords: AI , ALPHAGO , Algorithms , GO , Reinforcement Learning