Scientists Have Created A Self-learning AI Capable Of Playing All Games - Alternative View

The developers of the revolutionary self-learning artificial intelligence system AlphaGo Zero have announced a new version of the machine that can independently learn to play any board game and beat human players. A description of the system was published in the journal Science.

Depths of Mind

The AlphaGo AI system was developed by David Silver and colleagues in late 2014, and its work was "tested" on European champion Fan Hui, who lost all five matches to the machine. In March 2016, AlphaGo defeated Go World Champion Lee Sedol in a series of five matches, only one of which ended in a human victory.

Silver and his colleagues achieved these successes by building their AI on not one but two neural networks - algorithms that mimic the work of chains of neurons in the human brain. One network is responsible for evaluating the current position on the board, and the second uses that evaluation to choose the next move.
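As a rough illustration of this division of labor, here is a hypothetical toy sketch in Python. The real AlphaGo networks are deep convolutional networks trained on board positions; the `stone_diff` feature and both functions below are invented stand-ins.

```python
# Toy sketch of the two-network split: a "value" function that scores a
# position, and a "policy" function that uses those scores to pick a move.
# Hypothetical simplification; not the real AlphaGo architecture.

def value_net(position):
    # Estimate how good `position` is for the side to move, in [-1, 1].
    # Here a stub based on a single invented feature: stone-count difference.
    return max(-1.0, min(1.0, position.get("stone_diff", 0) / 10.0))

def policy_net(position, legal_moves):
    # Rank candidate moves by the evaluation of the position each produces.
    scored = []
    for move in legal_moves:
        next_pos = {"stone_diff": position.get("stone_diff", 0) + move}
        scored.append((value_net(next_pos), move))
    return max(scored)[1]  # the move leading to the best evaluation

best = policy_net({"stone_diff": 2}, legal_moves=[-1, 0, 3])
```

The point of the split is that the move-choosing half never re-derives the evaluation itself; it consumes the analysis the first half has already prepared.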

The next logical step in AlphaGo's development was to eliminate the main drawback of all existing neural networks and artificial intelligence systems - the need to teach them what to do using huge data archives processed by hand, or with direct human participation, as happened in the early stages of AlphaGo's development.

Silver and his team solved this problem by creating a fundamentally new neural network based on so-called reinforcement learning algorithms. Unlike its stellar predecessor, which was initially trained on games with volunteers and had some primitive game strategies built in, this network started out as an absolute beginner with zero knowledge.

In other words, it knew only the rules of Go, the starting setup and the victory conditions; from there, the computer learned to play this ancient Chinese strategy game on its own, playing against itself and proceeding by trial and error. The only constraint on its play was a cap on thinking time per move - about 0.4 seconds.

After each such game, the AI system analyzed all of its moves, remembered those that brought one of its "halves" closer to victory, and consigned clearly losing steps to a kind of "black list". Using this data, the neural network rebuilt itself, gradually reaching the level that the first version of AlphaGo had attained before the series of games with Lee Sedol.
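The loop described in the last two paragraphs - self-play from a blank slate, crediting winning moves and penalizing losing ones - can be sketched on a hypothetical toy game (a Nim-like stone game, not Go; the real system updates a deep neural network, not a win/loss table):

```python
import random

# Self-play reinforcement sketch on a toy game: players alternately take
# 1 or 2 stones from a pile of 10; whoever takes the last stone wins.
# Hypothetical simplification of the idea, not the AlphaGo Zero algorithm.

stats = {}  # (stones_left, move) -> [wins, plays]

def choose(stones, explore=0.3):
    moves = [m for m in (1, 2) if m <= stones]
    if random.random() < explore:
        return random.choice(moves)  # trial and error
    # Otherwise prefer the move with the best observed win rate so far.
    return max(moves, key=lambda m: stats.get((stones, m), [0, 1])[0]
                                    / stats.get((stones, m), [0, 1])[1])

def self_play():
    stones, player = 10, 0
    history = {0: [], 1: []}  # moves made by each of the two "halves"
    while stones > 0:
        m = choose(stones)
        history[player].append((stones, m))
        stones -= m
        winner = player if stones == 0 else None
        player = 1 - player
    # Reinforce: credit every move of the winner, debit the loser's.
    for p in (0, 1):
        for key in history[p]:
            w, n = stats.get(key, [0, 0])
            stats[key] = [w + (1 if p == winner else 0), n + 1]

random.seed(0)
for _ in range(5000):
    self_play()
```

After a few thousand games the table learns, for example, that taking both stones from a pile of two always wins, so the greedy policy starts choosing that move - the tabular analogue of the network "rebuilding itself" from its game records.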

The move to self-learning algorithms not only allowed AlphaGo Zero to surpass its predecessor and beat it 100-0, but also improved many other aspects of its operation. In particular, its training took only three days and about five million games - an order of magnitude less than the first version of the AI required.

The path to excellence

The successful completion of experiments with AlphaGo Zero led Silver and his team to consider whether a similar neural network could be used to win the crown of the champion in other types of strategy and board games.

To do this, the scientists built further new elements into AlphaGo Zero - heuristic algorithms for randomized search over possible solutions, as well as code accounting for the existence of draws in some games. In addition, the new version, AlphaZero, continually improved its structure rather than being updated in stages like its predecessor.
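The "heuristic random search" referred to here is commonly identified with Monte Carlo tree search. A minimal flat Monte Carlo sketch with an explicit draw outcome, on a hypothetical toy game (players alternately claim numbers from a shared pool; the higher total wins, equal totals draw), might look like this - an illustration of the idea, not the published algorithm:

```python
import random

# Flat Monte Carlo move selection with explicit draw handling.
# The real AlphaZero guides a full Monte Carlo tree search with a neural
# network; this toy evaluates moves by purely random playouts.

def legal(s):
    return sorted(s["pool"])

def apply_move(s, v):
    pool = list(s["pool"]); pool.remove(v)
    totals = list(s["totals"]); totals[s["to_move"]] += v
    return {"pool": pool, "totals": totals, "to_move": 1 - s["to_move"]}

def result(s):
    if s["pool"]:
        return None  # game not over yet
    a, b = s["totals"]
    return "draw" if a == b else (0 if a > b else 1)

def rollout(s):
    # Random playout; returns +1 / 0 / -1 for the player to move in `s`.
    player = s["to_move"]
    while result(s) is None:
        s = apply_move(s, random.choice(legal(s)))
    r = result(s)
    return 0 if r == "draw" else (1 if r == player else -1)

def pick_move(s, n=200):
    # Score each move by random playouts; negate the result because the
    # opponent is the one to move in the position the move produces.
    return max(legal(s),
               key=lambda v: -sum(rollout(apply_move(s, v))
                                  for _ in range(n)))

random.seed(1)
start = {"pool": [1, 2, 3, 4], "totals": [0, 0], "to_move": 0}
best = pick_move(start)
```

Treating a draw as 0, between win (+1) and loss (-1), is what lets the same search code cover chess and shogi, where drawn games are common, as well as go.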

These relatively simple changes, as further experiments showed, significantly increased the speed of self-learning of this artificial intelligence system and turned it into a universal machine capable of playing all types of tabletop strategies.

The scientists tested it on three games - go, chess, and chess's Japanese relative, shogi. In all three cases, Silver's new brainchild reached grandmaster level in under a million games, achieving almost human selectivity in its choice of candidate moves after just 9-12 hours of training for chess, and 13 days for go.

Along the way, it beat the most sophisticated computer programs that play these games - the Stockfish chess engine fell by the fourth hour of AlphaZero's training, while Elmo, the reigning shogi champion, lasted only two hours. The first version of AlphaGo began losing to its "grandson" after about 30 hours of training.

The next "victims" of AlphaZero, the scientists note, may be "real" computer games such as StarCraft II and Dota 2. Winning championships in these esports disciplines, in their opinion, will open the way for self-learning AI to penetrate less formalized areas of science, culture and technology.