The Neural Network Was Taught To "Animate" Portraits Based On Just One Static Image

Russian specialists from the Samsung AI Center-Moscow, in collaboration with engineers from the Skolkovo Institute of Science and Technology, have developed a system capable of creating realistic animated images of human faces from just a few static frames. Normally this requires large databases of images. In the example presented by the developers, however, the system was trained to create an animated image of a human face from only eight static frames, and in some cases one was enough. For more details on the development, see the article published in the arXiv.org online repository.

As a rule, it is rather difficult to produce a photorealistic personalized model of a human face because of the high photometric, geometric and kinematic complexity of the human head. This stems not only from the difficulty of modeling the face as a whole (for which many approaches exist), but also from the difficulty of modeling particular features: the oral cavity, hair, and so on. The second complicating factor is our tendency to notice even minor flaws in a finished model of a human head. This low tolerance for modeling errors explains the current prevalence of non-photorealistic avatars in teleconferencing.

According to the authors, the system, which is built on few-shot learning, is capable of creating highly realistic talking-head models of people and even of portrait paintings. The algorithms synthesize an image of the same person's head using facial landmarks taken either from another fragment of the video or from the face of another person. As training material, the developers used an extensive database of celebrity videos. To get the most accurate talking head possible, the system needs more than 32 images.

To create more realistic animated facial images, the developers drew on earlier work in generative adversarial networks (GANs, where a neural network fills in the details of an image, in effect acting as an artist), as well as on a machine meta-learning approach, where each element of the system is trained and designed to solve a specific task.
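The adversarial idea can be illustrated with the textbook GAN objective. This is a minimal sketch under stated assumptions: the article does not say which loss functions the authors used, and the function names here are hypothetical. Scores are assumed to be probabilities strictly between 0 and 1.

```python
import math

def discriminator_loss(score_real, score_fake):
    """Textbook GAN loss for the critic.

    Low when a real image scores near 1 and a generated one near 0.
    """
    return -(math.log(score_real) + math.log(1.0 - score_fake))

def generator_loss(score_fake):
    """Textbook (non-saturating) GAN loss for the image-producing network.

    Low when the discriminator scores a generated image near 1,
    that is, when the generated image passes for a real one.
    """
    return -math.log(score_fake)

# A discriminator that separates real from generated images has a lower loss
# than one that cannot tell them apart.
confident = discriminator_loss(score_real=0.9, score_fake=0.1)
undecided = discriminator_loss(score_real=0.5, score_fake=0.5)
```

The two losses pull in opposite directions: the discriminator improves by telling generated images apart from real ones, while the generator improves by making that harder.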

Meta-learning schema.

Three neural networks were used to process static images of people's heads and turn them into animated ones: the Embedder (embedding network), the Generator (generation network) and the Discriminator (discriminator network). The first maps head images (with approximate facial landmarks) to embedding vectors, which contain information that does not depend on the pose. The second takes facial landmarks together with the output of the embedding network and generates new images from them through a set of convolutional layers, which provide robustness to changes in scale, shifts, rotations, changes of viewing angle and other distortions of the original face image. The discriminator network is used to assess the quality and authenticity of what the other two networks produce. As a result, the system transforms the landmarks of a person's face into realistic-looking personalized photos.
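The data flow between the three networks can be sketched as follows. This is a toy illustration, not the authors' implementation: plain linear maps stand in for the convolutional networks, all sizes and function names are hypothetical, and averaging the per-frame vectors into one person embedding is an assumption made for this sketch.

```python
import random

EMBED_DIM = 4          # hypothetical embedding size
PIXELS, POINTS = 6, 3  # toy "image" and "landmark" sizes

def embed_frame(frame, landmarks, weights):
    """Toy Embedder: one frame plus its landmarks -> one vector."""
    features = frame + landmarks
    return [sum(w * x for w, x in zip(row, features)) for row in weights]

def embed_person(frames, landmark_sets, weights):
    """Average per-frame vectors into one embedding (assumption of this sketch)."""
    vectors = [embed_frame(f, l, weights) for f, l in zip(frames, landmark_sets)]
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(EMBED_DIM)]

def generate(target_landmarks, person_embedding, weights):
    """Toy Generator: landmarks of the new pose + person embedding -> image."""
    inputs = target_landmarks + person_embedding
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]

def discriminate(image, landmarks, weights):
    """Toy Discriminator: one realism score for an image/landmark pair."""
    return sum(w * x for w, x in zip(weights, image + landmarks))

rng = random.Random(0)  # fixed seed so the toy data is reproducible

def rand_matrix(rows, cols):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

embed_w = rand_matrix(EMBED_DIM, PIXELS + POINTS)
gen_w = rand_matrix(PIXELS, POINTS + EMBED_DIM)
disc_w = [rng.uniform(-1, 1) for _ in range(PIXELS + POINTS)]

# Eight static frames of one person, each with its own landmarks.
frames = [[rng.random() for _ in range(PIXELS)] for _ in range(8)]
landmarks = [[rng.random() for _ in range(POINTS)] for _ in range(8)]

person = embed_person(frames, landmarks, embed_w)  # who the person is
new_pose = [rng.random() for _ in range(POINTS)]   # how the head should look
fake = generate(new_pose, person, gen_w)           # synthesized frame
score = discriminate(fake, new_pose, disc_w)       # realism estimate
```

With a single frame, `embed_person` reduces to `embed_frame`, which mirrors the one-image case mentioned in the article.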

The developers emphasize that their system can initialize the parameters of both the generator network and the discriminator network individually for each person in the picture. Training for a new person can therefore rely on just a few images, which speeds it up even though tens of millions of parameters have to be tuned.
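Why a good starting point matters when only a few images are available can be shown with a deliberately tiny stand-in. In this hedged sketch a two-parameter linear model replaces the networks, and the "meta-learned" initial values are simply assumed to lie close to the new person's true parameters. All names and numbers are hypothetical.

```python
def fine_tune(w, b, samples, steps=5, lr=0.1):
    """Run a few gradient-descent steps on mean squared error for y = w*x + b."""
    n = len(samples)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in samples) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in samples) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def mse(w, b, samples):
    """Mean squared error of the model on the given samples."""
    return sum((w * x + b - y) ** 2 for x, y in samples) / len(samples)

# Hypothetical "new person": eight samples drawn from y = 2.1*x + 0.9.
few_shots = [(x / 4, 2.1 * (x / 4) + 0.9) for x in range(8)]

meta_init = (2.0, 1.0)     # stand-in for parameters learned on many other people
scratch_init = (0.0, 0.0)  # no prior knowledge

# Same data, same number of steps; only the starting point differs.
loss_meta = mse(*fine_tune(*meta_init, few_shots), few_shots)
loss_scratch = mse(*fine_tune(*scratch_init, few_shots), few_shots)
```

After the same five update steps on the same eight samples, the model that started from the assumed meta-learned values fits the new data more closely than the one trained from scratch.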

Nikolay Khizhnyak
