AlphaGo Zero: Learning from scratch

https://deepmind.com/blog/alphago-zero-learning-scratch/
Artificial intelligence research has made rapid progress in a wide variety of domains from speech recognition and image classification to genomics and drug discovery. In many cases, these are specialist systems that leverage enormous amounts of human expertise and data.

However, for some problems this human knowledge may be too expensive, too unreliable or simply unavailable. As a result, a long-standing ambition of AI research is to bypass this step, creating algorithms that achieve superhuman performance in the most challenging domains with no human input. In our most recent paper, published in the journal Nature, we demonstrate a significant step towards this goal.

Starting from scratch

The paper introduces AlphaGo Zero, the latest evolution of AlphaGo, the first computer program to defeat a world champion at the ancient Chinese game of Go. Zero is even more powerful and is arguably the strongest Go player in history.

Previous versions of AlphaGo initially trained on thousands of human amateur and professional games to learn how to play Go. AlphaGo Zero skips this step and learns to play simply by playing games against itself, starting from completely random play. In doing so, it quickly surpassed human level of play and defeated the previously published champion-defeating version of AlphaGo by 100 games to 0.

[Animation: AlphaGo Zero training time]

It is able to do this by using a novel form of reinforcement learning, in which AlphaGo Zero becomes its own teacher. The system starts off with a neural network that knows nothing about the game of Go. It then plays games against itself, by combining this neural network with a powerful search algorithm. As it plays, the neural network is tuned and updated to predict moves, as well as the eventual winner of the games.

This updated neural network is then recombined with the search algorithm to create a new, stronger version of AlphaGo Zero, and the process begins again. In each iteration, the performance of the system improves by a small amount, and the quality of the self-play games increases, leading to more and more accurate neural networks and ever stronger versions of AlphaGo Zero.
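The loop described in the two paragraphs above can be sketched in miniature. The toy below is my own illustration, not DeepMind's code: it swaps Go for the game of Nim (take 1 to 3 stones per turn; whoever takes the last stone wins), the deep network for a plain table of position values, and the tree search for epsilon-greedy move selection, but it keeps the core idea that the system learns only from the outcomes of its own self-play games.

```python
import random

# Toy sketch of the self-play loop (all names and numbers are mine, not
# DeepMind's). The game is Nim: start with 10 stones, take 1-3 per turn,
# and whoever takes the last stone wins. The "network" is a table
# values[s] estimating how good it is to face s stones; the "search" is
# epsilon-greedy move selection against that table.

def legal_moves(stones):
    return [m for m in (1, 2, 3) if m <= stones]

def select_move(stones, values, eps=0.2):
    # Prefer the move that leaves the opponent the worst position.
    if random.random() < eps:
        return random.choice(legal_moves(stones))
    return min(legal_moves(stones), key=lambda m: values[stones - m])

def self_play_game(values):
    stones, player, history = 10, 0, []
    while stones > 0:
        move = select_move(stones, values)
        stones -= move
        history.append((player, stones))  # position handed to the opponent
        player ^= 1
    winner = history[-1][0]  # whoever took the last stone wins
    return history, winner

def train(iterations=3000, lr=0.1, seed=0):
    random.seed(seed)
    values = [0.0] * 11  # values[s]: estimated value of facing s stones
    for _ in range(iterations):
        history, winner = self_play_game(values)
        for player, stones_left in history:
            # Outcome from the perspective of the player *facing*
            # stones_left, i.e. the opponent of `player`.
            z = -1.0 if winner == player else 1.0
            values[stones_left] += lr * (z - values[stones_left])
    return values
```

With enough iterations the table learns, for example, that facing 4 stones is a losing position, without the program ever being shown a single human game.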

This technique is more powerful than previous versions of AlphaGo because it is no longer constrained by the limits of human knowledge. Instead, it is able to learn tabula rasa from the strongest player in the world: AlphaGo itself.

It also differs from previous versions in other notable ways.

  • AlphaGo Zero only uses the black and white stones from the Go board as its input, whereas previous versions of AlphaGo included a small number of hand-engineered features.
  • It uses one neural network rather than two. Earlier versions of AlphaGo used a "policy network" to select the next move to play and a "value network" to predict the winner of the game from each position. These are combined in AlphaGo Zero, allowing it to be trained and evaluated more efficiently.
  • AlphaGo Zero does not use “rollouts” - fast, random games used by other Go programs to predict which player will win from the current board position. Instead, it relies on its high quality neural networks to evaluate positions.
All of these differences help improve the performance of the system and make it more general. But it is the algorithmic change that makes the system much more powerful and efficient.
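The single two-headed network from the second point can be sketched as follows. This is a forward pass only, with made-up sizes for a toy 3x3 board, not AlphaGo Zero's actual architecture (which is a deep residual convolutional network): one shared body produces features, a policy head turns them into a probability per move, and a value head turns the same features into one scalar predicting the winner.

```python
import math
import random

# Sketch of a two-headed policy/value network (illustrative sizes and
# random weights, not AlphaGo Zero's real architecture).

random.seed(0)
BOARD, HIDDEN, MOVES = 9, 16, 9  # toy 3x3 board

def rand_matrix(rows, cols):
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

W_body = rand_matrix(HIDDEN, BOARD)    # shared body
W_policy = rand_matrix(MOVES, HIDDEN)  # policy head
W_value = rand_matrix(1, HIDDEN)       # value head

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def forward(board):
    # Shared body: one tanh layer over the raw stone encoding
    # (1 = black stone, -1 = white stone, 0 = empty).
    h = [math.tanh(v) for v in matvec(W_body, board)]
    # Policy head: softmax over all moves.
    logits = matvec(W_policy, h)
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    s = sum(exps)
    policy = [e / s for e in exps]
    # Value head: single tanh scalar in (-1, 1), the predicted outcome.
    value = math.tanh(matvec(W_value, h)[0])
    return policy, value
```

Because both heads share the body, one training pass updates the move predictions and the winner prediction together, which is the efficiency gain the bullet point describes.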

[Figure: AlphaGo has become progressively more efficient thanks to hardware gains and, more recently, algorithmic advances]
After just three days of self-play training, AlphaGo Zero emphatically defeated the previously published version of AlphaGo - which had itself defeated 18-time world champion Lee Sedol - by 100 games to 0. After 40 days of self-training, AlphaGo Zero became even stronger, outperforming the version of AlphaGo known as "Master", which has defeated the world's best players and world number one Ke Jie.

[Figure: Elo ratings - a measure of the relative skill levels of players in competitive games such as Go - show how AlphaGo has become progressively stronger during its development]
Over the course of millions of AlphaGo vs AlphaGo games, the system progressively learned the game of Go from scratch, accumulating thousands of years of human knowledge during a period of just a few days. AlphaGo Zero also discovered new knowledge, developing unconventional strategies and creative new moves that echoed and surpassed the novel techniques it played in the games against Lee Sedol and Ke Jie.

[Animation: knowledge timeline]

These moments of creativity give us confidence that AI will be a multiplier for human ingenuity, helping us with our mission to solve some of the most important challenges humanity is facing.

Discovering new knowledge

While it is still early days, AlphaGo Zero constitutes a critical step towards this goal. If similar techniques can be applied to other structured problems, such as protein folding, reducing energy consumption or searching for revolutionary new materials, the resulting breakthroughs have the potential to positively impact society.

Read the paper

Download the paper

Read more about AlphaGo

This work was done by David Silver, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, Lucas Baker, Matthew Lai, Adrian Bolton, Yutian Chen, Timothy Lillicrap, Fan Hui, Laurent Sifre, George van den Driessche, Thore Graepel and Demis Hassabis.
 
Today it surpasses Go. What will it surpass tomorrow? What is the next bottleneck?
 
Nice. I've been studying neural networks recently too and feel I'm starting to get the hang of it. Since the OP shares this interest, we should compare notes sometime.
 
The basic algorithms of artificial intelligence:
1. Bounded inference: search the game tree to a limited depth, apply a scoring function, and follow the highest-scoring branch. This was the approach for roughly 60 years starting in the 1950s, dictated by the limits of machine storage and speed.
2. Neural network algorithms: these require assumptions, and the network's weights are set gradually through learning. If a major assumption changes, the network has to be retrained. This line has been researched for some 30 years.
3. Effectively unbounded enumeration in the cloud era: determine the true win rate of every branch, play out to an endgame or a guaranteed-win position, then propagate scores back up the tree. With effectively unlimited memory, quantitative change becomes qualitative change.
4. In the future, machines will keep finding problems that interest them to reason about, map the numbers onto different physical objects, and gain a controlling advantage over humanity.
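Approach 1 above, depth-limited tree search with a scoring function at the frontier, can be sketched in a few lines. This is my own toy, not any production engine: the "tree" is hand-made, and its leaves already hold the values the scoring function would have produced.

```python
def minimax(node, maximizing):
    """Score a game tree: inner nodes are lists of children, leaves are
    scores from the maximizing player's point of view (the scoring
    function has already been applied at the search frontier)."""
    if not isinstance(node, list):
        return node
    scores = [minimax(child, not maximizing) for child in node]
    return max(scores) if maximizing else min(scores)

def best_branch(children):
    # The maximizing player picks the branch whose minimax score is best,
    # assuming the opponent (minimizing) replies optimally.
    return max(range(len(children)), key=lambda i: minimax(children[i], False))
```

On the classic example tree `[[3, 5], [2, 9]]`, the opponent holds branch 0 down to 3 and branch 1 down to 2, so the maximizer takes branch 0: scoring the frontier well is what makes this 1950s-style search strong or weak.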
 
The future?
The future is Skynet / Judgment Day.
ggnore.

Tremble, humans.
 
Interestingly, with AlphaGo as the yardstick, the gap between Lee Sedol and Ke Jie is the gap between three days and 21 days of training.
 
AlphaGo was developed using TensorFlow, which I have used for a difficult search problem with amazing success! Simply put, deep learning can capture complex nonlinear mappings from training data. However, the machine cannot replace humans: it does what it was designed to do, only very fast. Its intelligence is "artificial", not real.
 
http://news.toutiaoabc.com/newspark/view.php?app=news&act=view&nid=273163

AlphaZero defeats AlphaGo Zero and becomes the new AI board-game champion (with image)
Source: Apple Daily, 2017-12-07 12:00:35. Note: this item is taken from the web and its views do not represent this site's position.

  



The Google DeepMind team in London has outdone itself again. Less than 50 days after unveiling AlphaGo Zero, which taught itself from scratch to defeat every earlier version of AlphaGo, the team published another paper introducing AlphaZero, a general game-playing AI program that taught itself chess and shogi from scratch and then defeated AlphaGo Zero. DeepMind hopes to use AlphaZero to research treatments for major diseases, including ones that have eluded humanity for centuries.

Because it can play several board games, DeepMind dropped the "Go" from AlphaGo; "Zero" stands for learning from scratch, giving AlphaZero.

In the paper, posted to arXiv, DeepMind reports that AlphaZero was trained from scratch with no knowledge beyond the basic rules. After 4 hours of training it beat Stockfish, the strongest chess engine; after 2 hours it beat Elmo, the strongest shogi engine; after 8 hours it beat the first-generation AlphaGo, which had decisively defeated South Korean Go champion Lee Sedol; and after 34 hours it outperformed an AlphaGo Zero that had trained for 72 hours.

The UK's Daily Telegraph reported that Norwegian chess grandmaster Jon Ludvig Hammer said AlphaZero plays with an "insane attacking style", judges the overall position far ahead, and builds formations that cramp its opponent.

DeepMind hopes ultimately to use the AlphaZero algorithm to solve major medical problems. They believe treatments for serious diseases that humans have failed to find for centuries could be worked out by the program in days or weeks.

DeepMind has already begun using AlphaZero to study protein folding and expects new findings soon. Protein misfolding is implicated in many serious diseases, including Alzheimer's, Parkinson's, and cystic fibrosis.
 
What are the people who play these games supposed to do now?
 

Back when I watched The Matrix I thought it was just science fiction; only now does it feel this real.

Honestly, I have always liked that old Lenovo slogan: "If humanity lost Lenovo (联想, 'imagination'), what would the world become?"
 
I have been studying ML lately and have picked up the basic concepts and algorithms. Honestly, what we have today is still very, very far from a real brain. AlphaZero could achieve so much at board games because they are closed systems with relatively simple rules, and every player is confined within them, so AI can ride its enormous computing power to a very high level. In relatively open systems, such as financial markets, AI will find it much harder to reach comparable heights.
What does seem certain is that large numbers of repetitive jobs relying on ordinary, general experience will be displaced by AI.
 