Do Neurons Dream of Electric Sheep? The Creator of the First Neural Networks on Their Evolution and the Future

Geoffrey Hinton is a co-creator of the concept of deep learning, a 2019 Turing Award winner, and an engineer at Google. Last week, during the I/O developer conference, Wired interviewed him about his fascination with the brain and his decades-long effort to model computers on the brain's neural structure, ideas that were long dismissed as wacky. An interesting and entertaining conversation about consciousness, Hinton's future plans, and whether computers can be taught to dream.

What will happen to neural networks?

Let's start with the days when you wrote your very first, highly influential articles. Everyone said, "It's a clever idea, but we really can't design computers this way." Explain why you persisted and why you were so sure you had found something important.

It seemed to me that the brain couldn't work any other way. It has to work by learning the strengths of its connections. And if you want to make a device do something smart, you have two options: you either program it, or it learns. And nobody programmed people, so we must learn. This had to be the right approach.

Explain what neural networks are. Explain the original concept

You take relatively simple processing elements that very loosely resemble neurons. They have incoming connections, each connection has a weight, and that weight can change through learning. What a neuron does is take the activities on the connections multiplied by the weights, sum them up, and then decide whether to send an output. If the sum is big enough, it sends an output; if the sum is negative, it sends nothing. That's all. All you have to do is wire up a huge number of those neurons with weights and figure out how to change the weights, and then they'll do anything. The only question is how you change the weights.
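
As a crude illustration, here is a minimal Python sketch of that processing element. The function name and the zero threshold are illustrative assumptions, not anything from the interview.

```python
# A minimal sketch of the unit Hinton describes: incoming activities are
# multiplied by learned weights and summed, and the unit sends an output
# only if the sum is large enough.

def neuron_output(inputs, weights, threshold=0.0):
    """Weighted sum of incoming activities; fire only above the threshold."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return total if total > threshold else 0.0

# Three incoming connections: 0.8 - 0.15 - 0.1 = 0.55, so the unit fires.
print(neuron_output([1.0, 0.5, -0.2], [0.8, -0.3, 0.5]))
```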

When did you realize that this was a rough approximation of how the brain works?

Oh, it was always intended that way. It was designed to resemble the brain at work.

So at some point in your career you settled on your view of how the brain works. Maybe you were twelve, maybe twenty-five. When did you decide to try to model computers on the brain?

Yes, right away. That was the whole point. The whole idea was to create a learning device that learns the way the brain does, according to people's ideas about how the brain learns: by changing the strengths of connections. And it wasn't my idea; Turing had the same idea. Although Turing invented much of the foundations of standard computer science, he believed that the brain was an unorganized device with random weights that used reinforcement learning to change its connections, so it could learn anything. And he believed that this was the best path to intelligence.

And you followed Turing's idea that the best way to build a machine is to design it like the human brain. This is how the human brain works, so let's create a similar machine

Yes, not only Turing thought so. Many thought so.

When did the dark times come? When did the other people who had been working on this and believed Turing's idea was right begin to back off, while you stuck to your guns?

There was always a handful of people who kept believing no matter what, especially in psychology. But among computer scientists, I guess, what happened in the 90s was that datasets were quite small and computers weren't that fast. And on small datasets, other methods such as support vector machines worked slightly better; they were less confused by noise. So it was all very sad, because in the 80s we had developed backpropagation, which is very important for neural networks. We thought it would solve everything, and we were puzzled that it didn't solve anything. The question was really one of scale, but we didn't know that then.

Why did you think it wasn't working?

We thought it didn't work because we didn't have quite the right algorithms and quite the right objective functions. For a long time I thought it was because we were doing supervised learning, where you label the data, when we should have been doing unsupervised learning, learning from unlabeled data. It turned out the question was mostly one of scale.

That's interesting. So the problem was that you didn't have enough data. You thought you had the right amount of data but were labeling it incorrectly. So you just misdiagnosed the problem?

I thought the mistake was that we were using labels at all. Most of your learning happens without any labels; you're just trying to model the structure in the data. I actually still believe that. I think that as computers get faster, for any dataset of a given size, if the computer is fast enough, you're better off doing unsupervised learning. And once you've done the unsupervised learning, you can learn from fewer labels.

So in the 1990s you continue your research, you're in academia, you're still publishing, but you're not solving big problems. Was there ever a moment when you said, "You know what, that's enough. I'm going to try something else"? Or did you just tell yourself you would keep doing deep learning?

Yes. Something like this should work. I mean, the connections in the brain learn in some way, we just need to figure out how. And there are probably many different ways to strengthen connections in the learning process; the brain uses one of them. There may be other ways. But you definitely need something that can strengthen these connections while learning. I never doubted it.

You have never doubted it. When did it seem like it was working?

One of the big disappointments of the 80s was that if we made networks with many hidden layers, we couldn't train them. That's not entirely true, because you could train them for relatively simple tasks like handwriting recognition. But we didn't know how to train most deep neural networks. And around 2005 I came up with a way of training deep networks without supervision. You take your input, say the pixels, and train up a bunch of feature detectors that were just good at explaining why the pixels were the way they were. Then you treat those feature detectors as the data and train another layer of feature detectors, so we could explain why those feature detectors have those correlations. You keep training layer after layer. But the interesting thing was that you could decompose it mathematically and prove that each time you trained a new layer, you didn't necessarily have a better model of the data, but you had a bound on how good your model was. And that bound got better with every layer you added.
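
A schematic sketch of that layer-by-layer recipe, with loud caveats: Hinton's actual method stacked restricted Boltzmann machines, while this toy uses tied-weight sigmoid layers with a crude reconstruction-driven update, purely to show the greedy stacking structure.

```python
# Greedy layer-wise pretraining, sketched with NumPy. Each layer of
# feature detectors is trained to explain its input, and its outputs
# then become the data for the next layer.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_layer(data, n_hidden, lr=0.1, epochs=50, seed=0):
    """Train one layer of feature detectors to reconstruct its input."""
    rng = np.random.default_rng(seed)
    W = rng.normal(0.0, 0.01, (data.shape[1], n_hidden))
    for _ in range(epochs):
        h = sigmoid(data @ W)                         # detect features
        recon = sigmoid(h @ W.T)                      # explain the input from them
        W += lr * ((data - recon).T @ h) / len(data)  # crude tied-weight update
    return W

def pretrain_stack(data, layer_sizes):
    """Train each layer on the features produced by the layer below."""
    weights = []
    for n_hidden in layer_sizes:
        W = train_layer(data, n_hidden)
        weights.append(W)
        data = sigmoid(data @ W)                      # features become the next "data"
    return weights

weights = pretrain_stack(np.random.default_rng(1).random((100, 64)), [32, 16])
```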

What do you mean by a bound on how good your model is?

Once you've got a model, you can ask, "How surprising does this model find this data?" You show it some data and ask, "Is this the kind of thing you expected, or is it surprising?" And you can sort of measure that. What you'd like is a good model that looks at the data and says, "Yeah, yeah, I knew that. It's unsurprising." It's usually very hard to compute exactly how surprising a model finds the data. But you can compute a bound on it: you can say the model finds this data less surprising than such-and-such. And you could show that as you add layers of feature detectors, you get a model, and with every layer you add, the bound on how surprising it finds the data gets better.
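
For reference, the bound he is talking about is usually written as a variational lower bound on the log-probability of the data. This is the standard textbook form, not a quote from the interview.

```latex
% For any approximating distribution q(h|v) over hidden states h given
% visible data v, the log-probability of the data is bounded from below:
\log p(v) \;\ge\; \sum_{h} q(h \mid v)\,\bigl[\log p(v, h) - \log q(h \mid v)\bigr]
% The deep-belief-net result: greedily adding and training a new layer
% improves this bound, even when the true likelihood is intractable.
```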

So, around 2005, you made this mathematical breakthrough. When did you start getting the right answers? What data did you work with? Your first breakthrough was with speech data, right?

They were just handwritten digits. Very simple. And around the same time, GPUs (graphics processing units) were being developed. People doing neural networks started using GPUs in 2007. I had one very good student who started using GPUs to find roads in aerial images. He wrote code that other students then took up, using GPUs to recognize phonemes in speech. They used this pre-training idea, and once the pre-training was done, they just stuck labels on top and used backpropagation. It turned out you could make a very deep network that was pre-trained this way, and then backpropagation actually worked. In speech recognition it worked beautifully. At first, though, it wasn't much better.

Was it better than the best commercially available speech recognition? Better than the best academic work on speech recognition?

On a relatively small dataset called TIMIT, it was slightly better than the best academic work. IBM had also done a lot of work on this.

Very quickly, people realized that this thing, since it was beating standard models that had been in development for 30 years, would do really well with a bit more development. My graduate students went off to Microsoft, IBM, and Google, and Google was the fastest to turn it into a production speech recognizer. By 2012, that work, first done in 2009, was in Android. Android suddenly got much better at speech recognition.

Tell me about the moment when you, who had been carrying these ideas for 40 years and publishing on the subject for 20, suddenly pulled ahead of your colleagues. What does that feel like?

Well, at that point I had only been carrying these ideas for 30 years!

Right, right

It felt great that all of this had finally become real.

Do you remember when you first got the data indicating this?

No.

Okay. So you get the idea that this works with speech recognition. When did you start applying neural networks to other problems?

At first we started applying them to all sorts of other problems. George Dahl, with whom we originally worked on speech recognition, used them to predict whether a molecule would bind to something and act as a good drug. And there was a competition. He simply applied our standard technology, built for speech recognition, to predicting drug activity, and won the competition. It was a sign that what we were doing was very general. Then a student turned up and said, "You know, Geoff, this thing is going to work for image recognition, and Fei-Fei Li has created a suitable dataset for it. There's a public competition; let's do something."

We got results that far surpassed standard computer vision. It was 2012.

So you've excelled in these three areas: modeling chemicals, speech, and vision. Where did you fail?

You understand that setbacks are only temporary?

Well, what separates the areas where it works fastest from the areas where it takes longest? It seems like visual processing, speech recognition, the basic human things we do with sensory perception, are regarded as the first barriers to clear, right?

Yes and no, because there are other things we do well, motor control for example. We're very good at motor control; our brains are clearly built for it. And only now are neural networks starting to compete with the best other technologies there. They'll win in the end, but right now they're only just starting to win.

I think reasoning, abstract reasoning, is the last thing we learn to do, and I think it will be among the last things these neural networks learn to do.

And so you keep saying that neural networks will ultimately prevail everywhere

Well, we are neural networks. Anything we can do, they can do.

True, but the human brain is far from the most efficient computing machine ever built

Definitely not.

Definitely not my human brain! Is there a way to model machines that are much more efficient than the human brain?

Philosophically, I have no objection to the idea that there could be some completely different way of doing all this. It might be that if you start with logic and try to automate logic, make some fancy theorem prover, do reasoning, and then decide that reasoning is how you get to visual perception, that approach might win. It hasn't so far. But I have no philosophical objection to it winning. We just know the brain can do it.

But there are also things that our brains cannot do well. Does this mean that neural networks will not be able to do them well either?

Quite possibly, yes.

And there is a separate problem, which is that we do not fully understand how neural networks work, right?

Yes, we don't really understand how they work.

We don't understand how top-down feedback works in neural networks. That's a basic element of how the brain works that we don't understand. Explain that, and then let me ask the obvious follow-up: if we don't know how these things work, how can they work at all?

If you look at modern computer vision systems, most of them are basically feed-forward; they don't use feedback connections. And there's something else about modern computer vision systems: they're very prone to adversarial errors. You can change a few pixels slightly, and something that was a picture of a panda, and still looks exactly like a panda to you, suddenly becomes an ostrich to the neural network. Obviously, the way the pixels are changed is cleverly chosen to fool the network into thinking it's an ostrich. But the point is, it still looks like a panda to you.

Initially we thought these things worked really well. But then, confronted with the fact that they look at a panda and are confident it's an ostrich, we got worried. And I think part of the problem is that they aren't trying to reconstruct from the high-level representations. They learn discriminatively: only the layers of feature detectors learn, and the whole objective is to change the weights so you get better at getting the right answer. Recently in Toronto we discovered, or rather Nick Frosst discovered, that adding reconstruction improves adversarial robustness. I think that in human vision, reconstruction is used for learning. And because we learn so much by doing reconstruction, we're far more resistant to adversarial attacks.
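
A minimal sketch of what "adding reconstruction" can look like in training, assuming PyTorch-style encoder, decoder, and classifier modules. All the names here are illustrative, and this is not Frosst's actual setup.

```python
# Combine the usual discriminative loss with a reconstruction loss, so the
# network is also penalized when it cannot rebuild the input from its own
# high-level representation.
import torch.nn.functional as F

def combined_loss(encoder, decoder, classifier, x, labels, recon_weight=1.0):
    h = encoder(x)                     # high-level representation
    logits = classifier(h)             # discriminative (feed-forward) path
    recon = decoder(h)                 # top-down reconstruction path
    cls_loss = F.cross_entropy(logits, labels)
    recon_loss = F.mse_loss(recon, x)  # how well the input is "explained"
    return cls_loss + recon_weight * recon_loss
```

An input whose reconstruction is poor is one the model cannot explain, which is exactly the signal an adversarially perturbed panda tends to trip.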

You believe that top-down feedback in a neural network lets you check how something is reconstructed. You check it and make sure it's a panda, not an ostrich.

I think this is important, yes.

But brain scientists don't quite agree with this?

Brain scientists all agree that if you have two areas of cortex in a perceptual pathway, there will always be backward connections. What they argue about is what they're for. They might be needed for attention, for learning, or for reconstruction. Or for all three.

So we don't know why the feedback is there. Are you building your new neural networks on the assumption that… no, not even that: you're building feedback into your networks because it's needed for reconstruction, even though you don't really understand how the brain works?

Yes.

Isn't that cheating? I mean, if you're trying to do something like the brain, but you're not sure the brain actually does it?

Not at all. I'm not doing computational neuroscience; I'm not trying to model how the brain works. I look at the brain and say, "This thing works, and if we want to make something else that works, we should look to it for inspiration." So this is neuro-inspired, not a neural model. The whole model, the neurons we use, is inspired by the fact that neurons have lots of connections and that they change their weights.

That's interesting. If I were a computer scientist working on neural networks and wanted to get around Geoff Hinton, one option would be to build in top-down feedback based on other models from brain science: based on learning, say, rather than reconstruction.

If those were better models, then yes, you might win.

Very, very interesting. Let's move to a more general topic. So neural networks can solve all sorts of problems. Are there mysteries of the human brain that neural networks can't or won't capture? Emotions, for example?

No.

So love can be reconstructed with a neural network? Consciousness can be reconstructed?

Absolutely, once you've figured out what those things mean. We are neural networks, right? Consciousness is an especially interesting topic for me, but… people don't really know what they mean by the word. There are lots of different definitions, and I think it's a pre-scientific term. If you'd asked people 100 years ago what life is, they would have answered, "Well, living things have a life force, and when they die, the life force leaves them. That's the difference between being alive and being dead: you either have the life force or you don't." Now we don't have a life force; we see it as a pre-scientific concept. Once you understand some biochemistry and molecular biology, you don't need a life force anymore; you understand how it all actually works. And the same thing, I think, will happen with consciousness. I think consciousness is an attempt to explain mental phenomena by appealing to a special essence. And that essence isn't needed. Once you can really explain it, you'll explain how we do everything that makes people conscious beings, and explain the different meanings of consciousness, without invoking any special essence.

So there's no emotion that couldn't be created? No thought that couldn't be created? Nothing the human mind is capable of that couldn't, in theory, be recreated by a fully functioning neural network once we really understand how the brain works?

John Lennon sang something similar in one of his songs.

Are you 100% sure about this?

No, I'm Bayesian, so I'm 99.9% sure.

Okay, then what's the 0.1%?

Well, we could, for example, all be part of a larger simulation.

Fair enough. So what do we learn about the brain from our work on computers?

Well, I think what we've learned over the last 10 years is interesting: if you take a system with billions of parameters and an objective function, say filling the gap in a string of words, it works much better than it has any right to. It works far better than you'd expect. You might think, and most people in traditional AI research would have thought, that taking a system with a billion parameters, starting them at random values, measuring the gradient of the objective function, and then tweaking each parameter to improve the objective would be a hopeless algorithm that inevitably gets stuck. But no, it turns out to be a really good algorithm, and the bigger the scale, the better it works. And that discovery was essentially empirical. There was some theory behind it, of course, but the discovery was empirical. And now that we've found this, it seems far more plausible that the brain is computing the gradient of some objective function and updating the weights, the strengths of synaptic connections, to follow that gradient. We just have to figure out what the objective function is and how it gets the gradient.
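
A toy version of the recipe he describes: random initial parameters, measure the gradient of the objective, and nudge every parameter along it. The quadratic objective is a stand-in assumption; the point is only that the supposedly hopeless procedure converges instead of getting stuck.

```python
# Start from random parameter values, compute the gradient of an objective,
# and repeatedly adjust the parameters to improve it.
import numpy as np

rng = np.random.default_rng(0)
theta = rng.normal(size=1_000)   # a thousand random initial parameters
target = rng.normal(size=1_000)  # a pretend optimum encoded by the objective

def objective(theta):
    return np.sum((theta - target) ** 2)

lr = 0.1
for step in range(200):
    grad = 2.0 * (theta - target)  # gradient of the objective
    theta -= lr * grad             # follow the gradient downhill

print(objective(theta))  # close to 0: it did not get stuck
```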

But we didn't understand that from looking at the brain? We didn't understand the weight updates?

It was a theory. Long ago, people thought it was possible. But in the background there were always computer scientists saying, "Sure, but the idea that it's all random and learned by gradient descent will never work with a billion parameters; you have to wire in lots of knowledge." We now know that isn't so. You can just start with random parameters and learn everything.

Let's dig a little deeper. As we go on, we'll presumably keep learning more and more about how the human brain works as we run massive tests of models built on our ideas about brain function. Once we understand all this better, will there come a point where we essentially rewire our brains to be much more efficient machines?

If we really understand what's going on, we should be able to improve things like education. And I think we will. It would be very odd to finally understand what happens in your brain and how it learns, and not be able to adapt so that you learn better.

Do you think that in a couple of years we'll be using what we've learned about the brain and about how deep learning works to transform education? How would you change classes?

I'm not sure we'll learn much in a couple of years. I think it will take longer to change education. But speaking of that, [digital] assistants are getting pretty smart. And when assistants can understand conversations, they can talk to and educate children.

And in theory, if we understand the brain better, we could program assistants to hold better conversations with children, based on what they've already learned.

Yes, though I haven't thought about it much myself; I work on other things. But it all seems quite plausible.

Can we understand how dreams work?

Yes, I'm very interested in dreams. So interested that I have at least four different theories of dreams.

Tell us about them - about the first, second, third, fourth

A long time ago there were things called Hopfield networks, which stored memories as local attractors. Hopfield discovered that if you try to store too many memories, they get confused: the network takes two local attractors and merges them into a single attractor somewhere halfway in between.
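
A minimal Hopfield-network sketch in NumPy, with plus/minus-one units and the standard Hebbian storage rule; the pattern size and count are arbitrary illustrative choices.

```python
# Memories are stored as attractors; recall lets a noisy state settle into
# the nearest one. Overload the network and nearby attractors merge.
import numpy as np

def store(patterns):
    """Hebbian rule: strengthen connections between co-active units."""
    W = sum(np.outer(p, p) for p in patterns) / len(patterns)
    np.fill_diagonal(W, 0.0)  # no self-connections
    return W

def recall(W, state, steps=20):
    """Repeatedly update the units until the network settles."""
    for _ in range(steps):
        state = np.where(W @ state >= 0, 1.0, -1.0)
    return state

rng = np.random.default_rng(0)
patterns = np.where(rng.random((3, 100)) < 0.5, -1.0, 1.0)
W = store(patterns)
noisy = patterns[0].copy()
noisy[:10] *= -1  # corrupt 10 of the 100 bits
print(np.array_equal(recall(W, noisy), patterns[0]))  # usually True
```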

Then Francis Crick and Graeme Mitchison came along and said we could get rid of these spurious minima by unlearning (that is, forgetting some of what was learned). We turn off the input, put the neural network into a random state, let it settle, say "that's bad," and change the connections so it doesn't settle into that state; do that, and the network can store more memories.

Then Terry Sejnowski and I came along and said, "Look, if we have not just the neurons that store the memories but a bunch of other neurons too, can we find an algorithm that uses all those other neurons to help recall memories?" In the end we arrived at the Boltzmann machine learning algorithm. And it had an extremely interesting property: I show it data, and it sort of rattles around through the other units until it settles into a very happy state, and once it has, it increases the strengths of all the connections between pairs of units that are active at the same time.

You also have to have a phase where you cut off the input, let the network rattle around and settle into a state it's happy with, so that it fantasizes; and once it has a fantasy, you say, "Take all the pairs of neurons that are active and decrease the strengths of the connections between them."

I'm describing the algorithm to you as a procedure, but in reality it falls out of the mathematics of the question, "How should you change these connection strengths so that this neural network, with all its hidden units, finds the data unsurprising?" And there has to be this other phase, which we call the negative phase, where the network runs with no input and unlearns whatever state it settles into.
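
The rule that falls out of that mathematics is well known. In the form published by Ackley, Hinton, and Sejnowski in 1985, the weight between units i and j changes as:

```latex
\Delta w_{ij} \;=\; \varepsilon \Bigl( \langle s_i s_j \rangle_{\text{data}}
                    \;-\; \langle s_i s_j \rangle_{\text{free}} \Bigr)
% The first average is the positive (wake) phase, with the network clamped
% to data; the second is the negative, input-free phase, whose correlations
% are subtracted: the unlearning described above.
```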

We dream for many hours every night. And if you're woken at random, you can report the dream you were just having, because it's held in short-term memory. So we know we dream for many hours, yet in the morning we remember only the last dream, and we don't remember the others, which is fortunate, because you could mistake them for reality. So why don't we remember our dreams at all? Crick's view was that this is the whole point of dreaming: to unlearn those things. You learn in reverse, as it were.

Terry Sejnowski and I showed that this is actually a maximum-likelihood learning procedure for Boltzmann machines. That's the first theory of dreams.

I want to move on to your other theories. But my question is: Have you been able to train any of your deep learning algorithms to actually dream?

Some of the first algorithms that could learn with hidden units were Boltzmann machines, and they were extremely inefficient. Later I found a way of making approximations that made them efficient, and those actually served as the trigger for the revival of deep learning. Those were the things that trained one layer of feature detectors at a time, and it was an efficient form of restricted Boltzmann machine. And it was doing this kind of unlearning, but instead of going to sleep, it would just fantasize a little after each data point.

Okay, so androids really do dream of electric sheep. Let's move on to theories two, three, and four.

Theory two is called the wake-sleep algorithm. You want to learn a generative model. So you have the idea of a model that can generate data, that has layers of feature detectors, and that activates the higher layers and then the lower ones and so on, all the way down to activating pixels, essentially creating an image. But you'd also like to teach it the other direction; you'd like it to recognize data.

So you have an algorithm with two phases. In the wake phase, data comes in, the network tries to recognize it, and instead of learning the connections it uses for recognition, it learns the generative connections. Data comes in, I activate the hidden units, and then I train those hidden units to reconstruct the data. It learns to reconstruct at every layer. But the question is, how do you learn the forward connections? The idea is that if you knew the forward connections, you could learn the backward connections, because you could learn to reconstruct.

And it turns out that if you use the backward connections, you can learn the forward connections too, because you can just start at the top and generate some data. And since you generated the data, you know the states of all the hidden layers, so you can learn the forward connections to recover those states. And here's what happens: if you start with random connections and alternate the two phases, it works. To make it work well you have to try all sorts of variations, but it works.
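
A schematic one-hidden-layer sketch of those two phases, assuming NumPy and stochastic binary units. R and G stand for the recognition (forward) and generative (backward) weights, and the random "data" is a placeholder assumption.

```python
# Wake phase: recognize real data with R, learn the generative weights G.
# Sleep phase: fantasize data with G, learn the recognition weights R.
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
sample = lambda p: (rng.random(p.shape) < p).astype(float)

n_vis, n_hid, lr = 64, 32, 0.05
R = rng.normal(0.0, 0.1, (n_vis, n_hid))  # recognition: pixels -> features
G = rng.normal(0.0, 0.1, (n_hid, n_vis))  # generative: features -> pixels

def wake_phase(v):
    """Recognize v with R, then move G toward reconstructing v."""
    h = sample(sigmoid(v @ R))
    return lr * np.outer(h, v - sigmoid(h @ G))

def sleep_phase():
    """Fantasize a v with G, then move R toward recovering h from it."""
    h = sample(np.full(n_hid, 0.5))  # random top-level state
    v = sample(sigmoid(h @ G))       # the fantasy
    return lr * np.outer(v, h - sigmoid(v @ R))

for _ in range(100):  # alternate the phases, starting from random weights
    G += wake_phase(sample(np.full(n_vis, 0.5)))  # stand-in "data"
    R += sleep_phase()
```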

Okay, and what about the other two theories? We only have eight minutes left; I don't think I'll have time to ask about everything.

Give me another hour and I'll tell you about the other two.

Let's talk about what's next. Where is your research heading? What problems are you trying to solve now?

In the end, you always have to work on something the field hasn't finished. I think I may well be working on something I'll never finish, called capsules: a theory of how visual perception is done using reconstruction, and of how information gets routed to the right places. The main motivation was that in standard neural networks the information, the activity in a layer, just gets sent on automatically; you don't make a decision about where to send it. The idea of capsules was to make decisions about where to send information.

Just as I started working on capsules, some very smart people at Google invented transformers, which are doing the same thing: they decide where to route information, and that's a big win.

We'll be back next year to talk about dream theories number three and number four.

Ilya Khel
