The Neural Network Has Learned To Generate Videos Based On The Description Of - Alternative View



Artificial intelligence can now create videos from a script - so far short and blurry, but one day it may replace an entire film studio on its own.

Neural networks are already quite good (in many cases better than people) at recognizing patterns in images, and can describe entire scenes in general terms. Generative neural networks perform the reverse transformation: they can form an image from its description, or predict the next frame from the previous ones.

Belgian developers have gone further, combining these capabilities into a single system that creates videos "out of nothing", relying only on what it learned during training and on the text of a script. Tinne Tuytelaars presented the work at a meeting of the Association for the Advancement of Artificial Intelligence (AAAI) held in the United States.

The neural network works in two stages which, according to Tuytelaars, imitate a person's creative process: first a blurry, approximate "sketch" of each frame is formed, and then details are refined and added. An important part of the system is a discriminator network, which compares the result with "real" videos that fit the given script. Its assessment of quality is used to improve the generative part of the system.
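The two-stage flow described above can be pictured with a toy sketch. It only illustrates the order of operations: random numbers stand in for the trained networks, and every name here (`sketch_stage`, `refine_stage`, `toy_discriminator`, the 8x8 sketch resolution) is an assumption made for illustration, not the developers' actual design.

```python
import random

# 32 frames of 64x64 pixels come from the article; COARSE is an assumed sketch resolution.
FRAMES, SIZE, COARSE = 32, 64, 8

def sketch_stage(rng):
    """Stage 1: a coarse, low-detail 'sketch' of every frame (random stand-in values)."""
    return [[[rng.random() for _ in range(COARSE)] for _ in range(COARSE)]
            for _ in range(FRAMES)]

def refine_stage(sketch, rng, detail=0.05):
    """Stage 2: upsample each sketch frame to full size and add small details."""
    scale = SIZE // COARSE
    video = []
    for frame in sketch:
        full = [[min(1.0, max(0.0, frame[y // scale][x // scale]
                              + rng.uniform(-detail, detail)))
                 for x in range(SIZE)] for y in range(SIZE)]
        video.append(full)
    return video

def toy_discriminator(video, real_mean):
    """Hypothetical critic: score in (0, 1], higher when the clip's mean
    brightness is close to that of 'real' clips."""
    pixels = [p for frame in video for row in frame for p in row]
    mean = sum(pixels) / len(pixels)
    return 1.0 / (1.0 + abs(mean - real_mean))

rng = random.Random(0)
video = refine_stage(sketch_stage(rng), rng)
score = toy_discriminator(video, real_mean=0.5)
print(len(video), len(video[0]), len(video[0][0]), round(score, 3))
```

In a real system, the discriminator's score would be fed back as a training signal for both generator stages. Here it is only computed, to show where it sits in the pipeline.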

The neural network was trained on 10 scenes ("playing golf on the grass", "kitesurfing in the sea", etc.). It learned to separate actions from their settings and to recombine them freely, creating videos of, for example, "golf in the pool":

Image

or "sailing in the snow":

Image
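To see why separating actions from settings matters, consider how many new prompts it opens up. The sketch below uses only the two training captions quoted in the article. Splitting them into (action, setting) pairs is our own simplification, not the developers' data format.

```python
from itertools import product

# Only the two training captions quoted in the article; the (action, setting)
# split is an assumption made for illustration.
training_scenes = [("playing golf", "on the grass"), ("kitesurfing", "in the sea")]

actions = sorted({action for action, _ in training_scenes})
settings = sorted({setting for _, setting in training_scenes})

# Every action/setting pairing that never appeared together in training.
seen = set(training_scenes)
novel = [f"{a} {s}" for a, s in product(actions, settings) if (a, s) not in seen]
print(novel)  # ['kitesurfing on the grass', 'playing golf in the sea']
```

With all 10 training scenes the number of unseen pairings would be larger still, which is what makes prompts like "golf in the pool" possible.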


Of course, the quality of such animations is still far from acceptable: the "videos" last about a second and consist of only 32 frames of 64x64 pixels each.
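A quick calculation puts these numbers in perspective. The frame count and resolution come from the article. The Full HD frame size (1920x1080) is our own point of comparison.

```python
# Figures from the article: 32 frames of 64x64 pixels, lasting about a second.
frames, width, height = 32, 64, 64
clip_pixels = frames * width * height          # 131072 pixels in the whole clip

# Assumed comparison point: a single Full HD frame.
full_hd_pixels = 1920 * 1080                   # 2073600 pixels
ratio = clip_pixels / full_hd_pixels

print(clip_pixels, full_hd_pixels, f"{ratio:.1%}")
```

In other words, the entire generated clip contains roughly 6% of the pixels in one Full HD frame.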

But we can say with equal confidence that these numbers will improve quickly: not so long ago, cinema itself could offer only a murky, jerky, silent picture. If such a neural network can be made really fast and efficient, it may spell the end of Hollywood: supplying a script will be enough to get a finished film. The technique will also be useful for generating large datasets to train other neural networks, and for creating new algorithms for compressing and transmitting streaming video.

Sergey Vasiliev