Artificial Intelligence And Geoffrey Hinton: The Father Of Deep Learning - Alternative View


Artificial intelligence. Much has been said about it, yet the real conversation has barely begun. Almost everything you hear about progress in artificial intelligence rests on a breakthrough that is thirty years old. Keeping up the momentum of that progress will require working around serious limitations. What follows is told in the first person by James Somers.

I am standing where the center of the world will soon be, or simply in a large room on the seventh floor of a shiny tower in downtown Toronto, depending on how you look at it. I am accompanied by Jordan Jacobs, co-founder of this place: the Vector Institute, which opens its doors this fall and promises to become the global epicenter of artificial intelligence.

We are in Toronto because Geoffrey Hinton is in Toronto. And Geoffrey Hinton is the father of "deep learning," the technique behind the current AI hype. "In 30 years, we will look back and say that Geoff is the Einstein of AI, deep learning, whatever we end up calling artificial intelligence," Jacobs says. Hinton is cited more often than the next three AI researchers combined. His undergraduate and graduate students go on to work in the AI labs at Apple, Facebook, and OpenAI; Hinton himself is a lead scientist on the Google Brain AI team. Almost every advance in AI over the past decade - in translation, speech recognition, image recognition, and game playing - has something to do with Hinton's work.

The Vector Institute, a monument to the rise of Hinton's ideas, is a research center where companies from across the US and Canada - like Google, Uber, and NVIDIA - sponsor efforts to commercialize AI technologies. Money is pouring in faster than Jacobs can ask for it; two of his co-founders surveyed companies in the Toronto area and found that demand for AI experts runs about ten times higher than what Canada produces each year. The Vector Institute is, in a sense, virgin ground for an attempt to mobilize the world around deep learning: to invest in the technique, teach it, hone it, and apply it. Data centers are being built, skyscrapers are filling with startups, and a generation of students is pouring into the field.

When you stand on the floor of the Vector Institute, you get the feeling that you are at the beginning of something. But deep learning is, at its core, very old. Hinton's breakthrough paper, written with David Rumelhart and Ronald Williams, was published in 1986. It elaborated a technique called backpropagation of error, or "backprop" for short. Backprop, in the words of John Cohen, is "everything deep learning is based on - everything."

At its root, AI today is deep learning, and deep learning is backprop - which is astonishing, considering that backprop is more than 30 years old. It is worth understanding how that happened: how could a technique lie in wait for so long and then trigger such an explosion? Because once you know the story of backprop, you will understand what is happening with AI now, and in particular that we may not be at the beginning of a revolution. Perhaps we are at the end of one.

The walk from the Vector Institute to Hinton's Google office, where he spends most of his time (he is now professor emeritus at the University of Toronto), is a sort of living advertisement for the city, at least in the summer. It becomes clear why Hinton, who is originally from the UK, moved here in the 1980s after working at Carnegie Mellon University in Pittsburgh.


Maybe we are not at the very beginning of the revolution

Toronto is the fourth-largest city in North America (after Mexico City, New York, and Los Angeles), and certainly more diverse: more than half of its population was born outside Canada. You can see it when you walk around the city; the crowd is multinational. There is free healthcare and there are good schools, people are friendly, politicians are relatively left-leaning and stable; all of this attracts people like Hinton, who says he left the United States because of Iran-Contra (a major American political scandal of the second half of the 1980s, when it emerged that members of the US administration had organized secret arms sales to Iran, violating the embargo against that country). This is where our conversation begins, before lunch.

“Many thought the US might well invade Nicaragua,” he says. “Somehow they believed Nicaragua belonged to the United States.” He says he recently made a big breakthrough on a project: “a very good junior engineer started working with me,” a woman named Sara Sabour. Sabour is Iranian and was denied a visa to work in the United States. Google's Toronto office scooped her up.

Hinton is 69 years old. He has a sharp, thin English face with a thin mouth, large ears, and a proud nose. He was born in Wimbledon, and in conversation he calls to mind the narrator of a children's book about science: curious, engaging, trying to explain everything. He is funny, and plays a little to the audience. Back problems make sitting painful for him, so he cannot fly, and at the dentist's office he lies down on a device that resembles a surfboard.


In the 1980s, Hinton was, as he is now, an expert on neural networks - a greatly simplified model of the network of neurons and synapses in our brains. At the time, however, the consensus was firm that neural networks were a dead end in AI research. Although the very first neural network, the Perceptron, was developed in the late 1950s and hailed as a first step toward human-level machine intelligence, in 1969 Marvin Minsky and Seymour Papert proved mathematically that such networks can perform only the simplest functions. Those networks had just two layers of neurons: an input layer and an output layer. Networks with many layers between the input and output neurons could, in theory, solve a great variety of problems, but nobody knew how to train them, so in practice they were useless. Because of Perceptrons, almost everyone abandoned the idea of neural networks, with a few exceptions - including Hinton.

Hinton's breakthrough in 1986 was to show that backpropagation can train a deep neural network - one with more than two or three layers. But it took another 26 years for computing power to catch up with the idea. In a 2012 paper, Hinton and two Toronto students showed that deep neural networks trained with backprop beat the very best image recognition systems. Deep learning began to take off. Overnight, the world decided that AI would take over by morning. For Hinton, this was a welcome victory.

Reality distortion field

A neural network is usually depicted as a sandwich, with layers stacked on top of one another. The layers contain artificial neurons: small computational units that fire - like a real neuron does - and pass their excitation on to the other neurons they are connected to. A neuron's excitation is represented by a number, say 0.13 or 32.39, that expresses its degree of excitation. There is another important number on each connection between two neurons, determining how much excitation is passed from one to the other. That number models the strength of the synapses between neurons in the brain: the higher the number, the stronger the connection, and the more excitation flows across it.
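To make this concrete, here is a minimal sketch of a single artificial neuron in Python. The input values, weights, and the sigmoid squashing function are illustrative assumptions, not details from the article:

```python
import math

def neuron_excitation(inputs, weights, bias=0.0):
    """Excitation of one artificial neuron: a weighted sum of the
    excitations feeding into it, squashed by a logistic sigmoid."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-total))

# Three upstream neurons firing at these levels...
upstream = [0.13, 0.92, 0.40]
# ...connected by synapse-like weights of these strengths.
synapses = [2.1, -0.5, 0.8]

print(neuron_excitation(upstream, synapses))  # a number between 0 and 1
```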

One of the most successful applications of deep neural networks has been image recognition. Today there are programs that can tell whether a picture contains a hot dog; a decade ago they were impossible. To make them work, you first take a picture. For simplicity, say it is a 100 x 100 pixel black-and-white image. You feed it to the neural network by setting the firing of each simulated neuron in the input layer equal to the brightness of the corresponding pixel. This is the bottom layer of the sandwich: 10,000 neurons (100 x 100), one for the brightness of each pixel in the image.

Then you connect this large layer of neurons to another large layer above it of, say, a few thousand neurons, and that one to another, smaller layer, and so on. Finally, the top layer of the sandwich - the output layer - consists of two neurons: one representing "hot dog" and the other "not hot dog." The idea is to train the neural network to fire only the first of those neurons if the picture contains a hot dog, and only the second if it does not. Backprop, the technique Hinton built his career on, is what does this.
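Below is a hedged sketch, in Python with NumPy, of excitation flowing up through such a sandwich. The layer sizes follow the description above, but the random weights and the sigmoid activation are assumptions made for illustration; a real hot dog classifier would be trained, not random:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Bottom layer: a 100 x 100 grayscale image flattened into 10,000 brightnesses.
image = rng.random((100, 100))
activations = image.reshape(-1)

# Hidden layers of a few thousand, then fewer, neurons; the top layer has
# two neurons: "hot dog" and "not hot dog".
layer_sizes = [10_000, 2_000, 500, 2]
weights = [rng.normal(0.0, 0.01, (m, n))
           for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]

# Excitation spreads upward, layer by layer, through the weighted connections.
for W in weights:
    activations = sigmoid(activations @ W)

hot_dog, not_hot_dog = activations
print(f"hot dog: {hot_dog:.3f}, not hot dog: {not_hot_dog:.3f}")
```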


Backprop is extremely simple, although it works best with huge amounts of data. This is why big data is so important to AI - why Facebook and Google are so passionate about it, and why the Vector Institute decided to connect with the four largest hospitals in Canada and share data.

In this case, the data takes the form of millions of images, some with hot dogs and some without; the trick is that the images are labeled as to which contain hot dogs. When you first create a neural network, the connections between neurons have random weights - random numbers saying how much excitation passes through each connection. It is as if the brain's synapses have not yet been tuned. The goal of backprop is to change these weights so that the network works: so that when you feed a hot dog image to the bottom layer, the "hot dog" neuron in the top layer ends up firing.

Say you take the first training image: a picture of a piano. You convert the pixel intensities of the 100 x 100 image into 10,000 numbers, one for each neuron in the bottom layer of the network. As the excitation spreads upward through the network according to the connection strengths between adjacent layers, it eventually reaches the last layer - the two neurons that decide whether the picture contains a hot dog. Since this is a picture of a piano, the "hot dog" neuron should show a zero and the "not hot dog" neuron should show a high number. But say things don't work out that way; say the network gets the image wrong. Backprop is a procedure for recalibrating the strength of every connection in the network so as to fix the error for that training example.

How does it work? You start with the last two neurons and figure out how wrong they are: how far their firing numbers are from what they should have been. Then you look at each connection leading into those neurons - working down through the layers - and determine its contribution to the error. You keep doing this until you reach the first set of connections at the very bottom of the network. At that point you know how much each individual connection contributed to the overall error, and in a final step you change every weight in the direction that best reduces that error. The technique is called "backpropagation" because you are propagating errors back through the network, starting from the output end.
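The whole recipe can be compressed into a few lines. The sketch below trains a tiny two-layer network on a single labeled example in exactly this way: measure the output error, pass blame backward through the layers, then nudge every weight to reduce the error. The network size, squared-error measure, and learning rate are illustrative assumptions, not the 1986 paper's actual setup:

```python
import numpy as np

rng = np.random.default_rng(1)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Tiny network: 4 inputs -> 3 hidden neurons -> 2 outputs ("hot dog", "not").
W1 = rng.normal(0.0, 0.5, (4, 3))
W2 = rng.normal(0.0, 0.5, (3, 2))

x = np.array([0.2, 0.9, 0.1, 0.7])  # one training image, flattened
target = np.array([1.0, 0.0])       # label: this one is a hot dog

for step in range(1000):
    # Forward pass: excitation spreads up through the layers.
    h = sigmoid(x @ W1)
    y = sigmoid(h @ W2)

    # How wrong are the two output neurons?
    error = y - target

    # Backward pass: each connection's share of the blame.
    delta_out = error * y * (1.0 - y)              # sigmoid derivative
    delta_hid = (delta_out @ W2.T) * h * (1.0 - h)

    # Nudge every weight downhill, scaled by a small learning rate.
    W2 -= 0.5 * np.outer(h, delta_out)
    W1 -= 0.5 * np.outer(x, delta_hid)

print(sigmoid(sigmoid(x @ W1) @ W2))  # the hot dog neuron now fires near 1
```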

The incredible part is what happens when you do this with millions or billions of images: the network gets very good at deciding whether a picture contains a hot dog. What is even more remarkable is that the individual layers of these image recognition networks begin to "see" images much the way our own visual system does. The first layer detects edges: its neurons fire when there are edges and stay quiet when there are not. The next layer detects combinations of edges, such as corners; the layer after that begins to distinguish shapes; the next one finds things like "open bun" or "closed bun," because the corresponding neurons light up. The network organizes itself into hierarchical layers without ever being explicitly programmed that way.

True intelligence is not confused when the problem changes slightly.

This is what amazed everyone so much. It is not just that neural networks are good at classifying pictures of hot dogs: they build representations of ideas. With text this becomes even more obvious. You can feed the text of Wikipedia, many billions of words, into a simple neural network, training it to assign each word a set of numbers corresponding to the excitations of the neurons in a layer. If you think of those numbers as coordinates in a complex space, then each word gets a point, known in this context as a vector, in that space. You then train the network so that words appearing near each other on Wikipedia pages end up with similar coordinates - and voilà, something strange happens: words with similar meanings sit near each other in the space. "Mad" and "upset" will be side by side; so will "three" and "seven." What's more, vector arithmetic lets you subtract the vector for "France" from "Paris," add it to "Italy," and find "Rome" nearby. Nobody told the network that Rome is to Italy as Paris is to France.
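Here is a toy version of that arithmetic, with made-up two-dimensional "word vectors" standing in for the hundreds of dimensions a real network would learn from Wikipedia; the coordinates are invented for illustration:

```python
import numpy as np

vec = {
    "Paris":  np.array([6.0, 9.0]),
    "France": np.array([6.2, 2.0]),
    "Italy":  np.array([3.1, 2.1]),
    "Rome":   np.array([2.9, 9.2]),
    "three":  np.array([-4.0, -4.0]),
}

# "Paris" minus "France" plus "Italy" should land near "Rome".
query = vec["Paris"] - vec["France"] + vec["Italy"]
nearest = min(vec, key=lambda w: np.linalg.norm(vec[w] - query))
print(nearest)  # -> "Rome" in this toy space
```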

“It's amazing,” says Hinton. “It's shocking.” Neural networks can be seen as an attempt to take things - images, words, recordings of conversations, medical data - and place them in what mathematicians call a multidimensional vector space, in which the nearness or distance of things reflects the most important features of the real world. Hinton believes this is what the brain does. “If you want to know what a thought is,” he says, “I can convey it to you in a string of words. I can say, 'John thought oops.' But if you ask: what is the thought? What does it mean for John to have that thought? It's not as if his head contains an opening quote, an 'oops,' and a closing quote - nothing of the sort. There is some pattern of neural activity in his head.” Big patterns of neural activity, if you are a mathematician, can be captured in a vector space, with each neuron's activity corresponding to a number and each number to a coordinate of a very large vector. For Hinton, thought is a dance of vectors.

Now you can see how the Vector Institute got its name.

Hinton creates a kind of reality distortion field: a feeling of confidence and enthusiasm washes over you, instilling the belief that nothing is impossible for vectors. After all, they have already given us self-driving cars, computers that detect cancer, and instant translation of spoken language.

It is only when you leave the room that you remember these deep learning systems are still pretty dumb, however smart they sometimes seem. A computer that sees a pile of donuts on a table and automatically captions it "a pile of donuts on a table" seems to understand the world; but when the same program sees a girl brushing her teeth and says "a boy holding a baseball bat," you realize how thin that understanding is, if it exists at all.

Neural networks are just mindless, fuzzy pattern recognizers, and as useful as such pattern recognizers can be - hence the rush to integrate them into every kind of software - they represent, at best, a limited breed of intelligence, one that is easily fooled. A deep neural network that recognizes images can be thrown completely off if you change a single pixel or add visual noise imperceptible to a human. Almost as often as we find new uses for deep learning, we run into its limits. Self-driving cars cannot handle conditions they have never seen before. Machines cannot parse sentences whose meaning depends on common sense and an understanding of how the world works.


Deep learning mimics what happens in the human brain in some respects, but superficially - which may explain why its intelligence is sometimes so superficial. Backprop was not discovered by plunging into the brain and trying to decipher thought itself; it grew out of models of how animals learn by trial and error in old conditioning experiments. And most of the important steps taken since its invention involved nothing new about neuroscience; they were technical improvements earned by years of work from mathematicians and engineers. What we know about intelligence is nothing compared to what we do not yet know about it.

David Duvenaud, an assistant professor in the same department as Hinton at the University of Toronto, says deep learning resembles engineering before the advent of physics. "Someone writes a paper and says: 'I built this bridge, and it stands!' Another writes: 'I built this bridge and it fell down, but I added supports and then it stood.' And everyone goes crazy for supports. Someone adds an arch, and everyone says: arches are great! With physics, you can actually figure out what will work and why. We have only recently begun to move toward even that level of understanding of artificial intelligence."

And Hinton himself says: "Most conferences are about making small variations instead of thinking hard and asking: why doesn't what we're doing now work? What is the reason? Let's focus on that."

It is hard to get an outside perspective when all you see is advance after advance. But the latest progress in AI has been less science and more engineering. While we understand better what kinds of changes will improve deep learning systems, we still have only a vague idea of how these systems work, or whether they can ever add up to something as powerful as the human mind.

It matters whether we have already extracted everything we can from backprop. If we have, then the development of artificial intelligence is headed for a plateau.

Patience

If you want to see the next breakthrough, something like a foundation for machines with far more flexible intelligence, you should, in theory, look at research in the position backprop was in during the '80s: when smart people were giving up because their ideas did not work yet.

A few months ago I visited the Center for Minds, Brains and Machines, a multi-institutional research center headquartered at MIT, to watch my friend Eyal Dechter defend his dissertation in cognitive science. Before the talk began, his wife Amy, his dog Ruby, and his daughter Suzanne cheered him on and wished him luck.

Eyal opened his talk with a captivating question: how is it that Suzanne, who is only two years old, has learned to talk, to play, to follow stories? What is it about the human brain that lets it learn so well? Will a computer ever learn to learn so quickly and so smoothly?

We understand new phenomena in terms of things we already understand. We split a domain into pieces and study it piece by piece. Eyal is a mathematician and a programmer, and he thinks of tasks - like making a soufflé - as complex computer programs. But you don't learn to make a soufflé by memorizing hundreds of tiny program instructions like "rotate your elbow 30 degrees, then look down at the countertop, then extend your finger, then …". If you had to do that for every new task, learning would become unbearable and you would stop developing. Instead, we see the program in terms of high-level steps like "beat the egg whites," which are themselves composed of subroutines like "crack the eggs" and "separate the whites from the yolks."
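In code, the point is simply that the soufflé "program" is composed of named, reusable subroutines rather than thousands of micro-instructions. The functions below are invented for illustration; they are not from Eyal's work:

```python
def break_eggs(n):
    return [("yolk", "white") for _ in range(n)]

def separate_whites(eggs):
    return [white for _yolk, white in eggs]

def beat(whites):
    return f"stiff peaks from {len(whites)} whites"

def make_souffle():
    # The high-level step "beat the whites" is itself built from
    # lower-level subroutines, which can be reused in other recipes.
    eggs = break_eggs(4)
    return beat(separate_whites(eggs))

print(make_souffle())
```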

Computers don't do this, and that is what makes them seem stupid. For deep learning to recognize a hot dog, you might have to feed it 40 million pictures of hot dogs. For Suzanne to recognize a hot dog, you show her a hot dog. And long before that she will have an understanding of language that goes much deeper than noticing which words tend to appear together. Unlike a computer, she carries a model in her head of how the world works. "It surprises me that people are afraid computers will take their jobs," says Eyal. "It's not that computers can't replace lawyers because lawyers do something terribly difficult. It's that lawyers listen and talk to people. In that sense, we are very far from all of this."

True intelligence is not thrown off when you slightly change the requirements of the problem. And the key part of Eyal's thesis was to demonstrate, in principle, how to get a computer to work that way: to fluidly apply what it already knows to new problems, to pick things up quickly on the fly, to become an expert in an entirely new field.

Essentially, this is what he calls the exploration-compression algorithm. It has the computer act as a programmer, building a library of reusable, modular components out of which ever more complex programs can be assembled. Knowing nothing about a new domain, the computer tries to structure knowledge of it simply by exploring it, consolidating what it has found, and exploring further, the way a child does.
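As a very loose illustration of that explore-and-consolidate loop (the string representation and the frequency threshold below are my assumptions; the actual algorithm searches over programs, not strings):

```python
from collections import Counter

library = {"0", "1"}  # the primitive "components" the learner starts with

def explore(library, data):
    """See which known components show up in the new domain."""
    return sorted(c for c in library if c in data)

def compress(data, size):
    """Consolidate: promote chunks that recur often into reusable components."""
    chunks = Counter(data[i:i + size] for i in range(len(data) - size + 1))
    return {chunk for chunk, count in chunks.items() if count >= 3}

data = "0101100101011001010110"  # a toy "domain" to be studied
for size in (2, 3, 4):           # look for ever-larger recurring chunks
    library |= compress(data, size)

print(explore(library, data))  # the library now contains reusable patterns
```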

His advisor, Joshua Tenenbaum, is one of the most cited researchers in AI. Tenenbaum's name came up in half of my conversations with other scientists. Some of the key people at DeepMind - the team behind AlphaGo, which famously beat the world Go champion in 2016 - worked under him. He is involved in a startup trying to give self-driving cars an intuitive grasp of basic physics and of other drivers' intentions, so they can better anticipate what will happen in situations they have never encountered.

Eyal's thesis has not yet been translated into practice; it has not even made its way into working programs. "The problems Eyal is working on are very, very hard," says Tenenbaum. "They will take many generations."

As we sat down for coffee, Tenenbaum told me he was studying the history of backprop for inspiration. For decades, backprop was a piece of elegant math that mostly accomplished nothing. As computers got faster and the engineering grew more sophisticated, things changed. He hopes something similar happens to his own work and that of his students, "but it may take another couple of decades."

Hinton, for his part, is convinced that overcoming the limitations of AI means building "a bridge between computer science and biology." Backprop, on this view, was a triumph of biologically inspired computing; the idea originally came not from engineering but from psychology. So now Hinton is trying to repeat the trick.

Today's neural networks are made of large flat layers, but in the human neocortex real neurons are arranged not only horizontally but also vertically, in columns. Hinton has a hunch about what the columns are for - in vision, for example, they might let you recognize objects even as your point of view changes. So he is building an artificial version - he calls them "capsules" - to test the theory. So far it has not worked out: the capsules have not dramatically improved his networks' performance. But thirty years ago, the same was true of backprop.

“It has to work,” he says of the capsule theory, laughing at his own bravado. “And the fact that it doesn't work yet is just a temporary annoyance.”

Based on materials from Medium.com

Ilya Khel
