AlphaGo Zero teaches itself and easily beats the previous AlphaGo

28 November 2017

A self-taught computer has become the world’s best player of Go, the fiendishly complex board game, without any input from human experts.

一台无师自通的电脑，在没有任何人类专家输入的前提下，成为了极其复杂的棋盘游戏围棋的世界顶级高手。

DeepMind, Google’s artificial intelligence subsidiary in London, announced the milestone in AI less than two years after the highly publicised unveiling of AlphaGo, the first machine to beat human champions at the ancient Asian game. Details are published in the scientific journal Nature.

在高调推出AlphaGo不到两年后，谷歌(Google)旗下位于伦敦的人工智能公司DeepMind宣布了人工智能(AI)技术的又一里程碑，AlphaGo是在这项古老的亚洲游戏上击败人类冠军的第一台机器。科学期刊《自然》(Nature)发表了相关细节。

Previous versions of AlphaGo learned initially by analysing thousands of games between excellent human players to discover winning moves. The new development, called AlphaGo Zero, dispenses with this human expertise and starts just by knowing the rules and objective of the game.

前几代AlphaGo最初通过分析成千上万场优秀人类玩家间的对决来发现制胜招数。新开发的AlphaGo Zero则根本不需要人类专长，只要知道游戏规则和目标就可以投入游戏。

“It learns to play simply by playing games against itself, starting from completely random play,” said Demis Hassabis, DeepMind chief executive. “In doing so, it quickly surpassed human level of play and defeated the previously published version of AlphaGo by 100 games to zero.”

“它学游戏仅仅是通过跟自己玩，从完全的随机玩游戏开始，”DeepMind首席执行官杰米斯•哈萨比斯(Demis Hassabis)说。“在玩的过程中，它很快就超过了人类的水平，并以100比0的战绩击败了在论文中介绍过的上一代AlphaGo。”

His colleague David Silver, AlphaGo project leader, added: “By not using human data in any fashion, we can create knowledge by itself from a blank slate.” Within a few days, the computer had not only learned Go from scratch but surpassed thousands of years of accumulated human wisdom about the game.

他的同事、AlphaGo项目负责人戴维•西尔弗(David Silver)补充称：“我们不以任何方式使用人类数据，就可以让它从一块白板创造知识。”在几天时间里，AlphaGo不仅学会了下围棋，而且还胜过了人类历经数千年在该游戏上累积的智慧。

The team developed a new form of “reinforcement learning” to create AlphaGo Zero, combining search-based simulations of future moves with a neural network that decides which moves give the highest probability of winning. The network is constantly updated over millions of training games, producing a slightly superior system each time.

该团队开发了一种新的“强化学习”形式来创造AlphaGo Zero，将基于搜索的未来走法模拟与神经网络相结合，决定如何出招才能获得最高的获胜概率。该网络用数百万场培训游戏不断更新，每次更新都会带来稍稍增强的系统。
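
The loop described above, in which a lookahead over future moves is paired with a learned estimate of winning chances that improves slightly with every self-play game, can be illustrated on a much smaller game. What follows is a minimal sketch under stated assumptions, not DeepMind's method: the real system, as described in the Nature paper, uses Monte Carlo tree search and a deep neural network. Here a plain lookup table stands in for the network, a one-step lookahead stands in for the search, and the game is a toy in which players alternately take 1, 2 or 3 stones and whoever takes the last stone wins. All names (`legal_moves`, `pick_move`, `train_by_self_play`) and parameter values are hypothetical.

```python
import random

def legal_moves(stones):
    """Toy game: take 1, 2 or 3 stones; whoever takes the last stone wins."""
    return [m for m in (1, 2, 3) if m <= stones]

def pick_move(stones, value, rng, epsilon=0.0):
    """One-step lookahead: leave the opponent the position with the lowest value."""
    moves = legal_moves(stones)
    rng.shuffle(moves)                  # random tie-breaking, so untrained play is random
    if rng.random() < epsilon:          # occasional exploratory move
        return moves[0]
    return min(moves, key=lambda m: value[stones - m])

def train_by_self_play(start=21, games=10000, lr=0.1, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    # value[s]: estimated chance that the player to move wins with s stones left.
    value = [0.5] * (start + 1)         # blank slate: every position looks even
    value[0] = 0.0                      # rules only: no stones left means the mover has lost
    for _ in range(games):
        stones = start
        while stones > 0:
            # Lookahead target: one minus the opponent's chance after my best move.
            target = 1.0 - min(value[stones - m] for m in legal_moves(stones))
            value[stones] += lr * (target - value[stones])  # small improvement per visit
            stones -= pick_move(stones, value, rng, epsilon)
    return value

value = train_by_self_play()
rng = random.Random(1)
print([pick_move(s, value, rng) for s in (5, 6, 7, 21)])
```

The table starts with no knowledge beyond the rules, and both sides of every training game are played by the same table. After training, its choices should steer toward leaving the opponent a multiple of four stones, which is the known winning strategy for this toy game.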

Although Go is in one sense extremely complex, with far more potential moves than there are atoms in the universe, in another sense it is simple because it is a “game of perfect information”: chance plays no part, as it does with cards or dice, and the state of play is defined entirely by the position of stones on the board.

尽管围棋在某种层面上非常复杂，具有比宇宙中的原子更多的潜在走法，但从另一个层面来说它也是简单的，因为它是一种“完美信息的游戏”——它不会像扑克牌或骰子一样与机会有关，而且棋局完全由棋子的位置决定。
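
The comparison with atoms in the universe can be checked with simple arithmetic. The sketch below counts board arrangements rather than moves: a standard board has 19 × 19 = 361 points, and each point is empty, black or white. The result is a loose upper bound that includes arrangements the rules would not allow, and the figure of 10^80 atoms is an assumed rough estimate for the observable universe, not a number taken from the article.

```python
BOARD_POINTS = 19 * 19           # a standard Go board has 361 intersections
STATES_PER_POINT = 3             # each point is empty, black or white

# Loose upper bound on board arrangements (illegal ones are counted too).
upper_bound = STATES_PER_POINT ** BOARD_POINTS

# Assumption: a commonly quoted rough estimate for the observable universe.
ATOMS_IN_UNIVERSE = 10 ** 80

print(len(str(upper_bound)))                 # number of decimal digits in the bound
print(upper_bound > ATOMS_IN_UNIVERSE)       # is the bound larger than the atom estimate?
```

Because the state of play is just this assignment of stones to points, a program can represent any position exactly, which is what makes simulation-based search possible despite the enormous count.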

The game involves surrounding more territory than your opponent. Because every position is fully visible and free of chance, Go is particularly well suited to the computer simulations on which AlphaGo depends. DeepMind is now examining real-life problems that can be structured in a similar way, to apply the technology.

下围棋需要占据比对手更多的地盘。围棋的这个特征让它特别容易受到AlphaGo所依赖的计算机模拟的影响。DeepMind正在考虑将该技术应用于那些能以类似方式结构化的现实生活问题。

Mr Hassabis identified predicting the shape of protein molecules — an important issue in drug discovery — as a promising candidate. Other likely scientific applications include designing new materials and climate modelling.

哈萨比斯指出，它很有希望应用于预测蛋白质分子形状——药物发现中的一个重要问题。其他可能的科学应用包括设计新材料和气候建模。