Over the past ten years, the best-performing artificial intelligence systems - the speech recognizers on smartphones, for example, or Google's latest automatic translator - have come out of a technique called deep learning. Deep learning is in fact a new name for neural networks, an approach that has been going in and out of fashion for more than 70 years. Neural networks were first proposed in 1944 by Warren McCullough and Walter Pitts, two researchers at the University of Chicago who moved to the Massachusetts Institute of Technology in 1952 to help lay the groundwork for what is sometimes called the first cognitive science department.
Neural networks remained a major line of research in both neuroscience and computer science until 1969, when, according to legend, they were killed off by the MIT mathematicians Marvin Minsky and Seymour Papert, who a year later became co-heads of MIT's new artificial intelligence lab.
The technique enjoyed a revival in the 1980s, faded into the shadows again in the first decade of the new century, and returned with fanfare in the second, riding the enormous growth in the processing power of graphics chips.
"There is a perception that ideas in science are like epidemics of viruses," says Tomaso Poggio, professor of cognition and brain sciences at MIT. “There are probably five or six major strains of influenza viruses, and one of them comes back at an enviable 25-year rate. People become infected, acquire immunity and do not get sick for the next 25 years. Then a new generation appears, ready to be infected with the same virus strain. In science, people fall in love with an idea, it drives everyone crazy, then they beat it to death and acquire immunity to it - they get tired of it. Ideas should have a similar frequency."
Weighty questions
Neural networks are a method of machine learning in which a computer learns to perform a task by analyzing training examples. Typically, the examples have been hand-labeled in advance. An object recognition system, for instance, might be fed thousands of labeled images of cars, houses, coffee cups, and so on, and it learns to find visual patterns in the images that consistently correlate with particular labels.
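As a minimal, made-up illustration (none of these feature values or labels come from a real dataset), a labeled training set is simply a collection of input/label pairs:

```python
# Hypothetical labeled training set: each example pairs a feature vector
# (a tiny stand-in for image data) with a human-assigned tag.
training_examples = [
    ([0.9, 0.1, 0.4], "car"),
    ([0.2, 0.8, 0.7], "house"),
    ([0.5, 0.5, 0.1], "coffee cup"),
]

# A recognizer is trained to map the left-hand vectors to the right-hand tags
# and is then applied to new, untagged inputs.
```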
Neural networks are loosely modeled on the human brain: a network consists of thousands or even millions of simple processing nodes that are densely interconnected. Most modern neural networks are organized into layers of nodes, and data moves through them in only one direction. An individual node may be connected to several nodes in the layer beneath it, from which it receives data, and to several nodes in the layer above it, to which it sends data.
To each of its incoming connections, a node assigns a number known as a "weight." When the network is active, the node receives a different data item - a different number - over each of its connections and multiplies it by the associated weight. It then adds the resulting products together, yielding a single number. If that number is below a threshold value, the node passes no data to the next layer. If the number exceeds the threshold, the node "fires," sending the number - the sum of the weighted inputs - along all of its outgoing connections.
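Here is a small Python sketch of that single-node computation; the inputs, weights, and threshold are hand-picked for illustration rather than taken from any real network:

```python
def node_output(inputs, weights, threshold):
    """Weighted sum of incoming data; pass it on only if it clears the threshold."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return total if total > threshold else 0.0  # 0.0 means nothing is passed on

# Example: a node with three incoming connections and hand-picked weights.
print(node_output([1.0, 0.5, 0.2], weights=[0.4, 0.3, -0.2], threshold=0.5))  # roughly 0.51
```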
When a neural network is being trained, all of its weights and thresholds are initially set to random values. Training data is fed to the bottom layer - the input layer - and passes through the succeeding layers, getting multiplied and added together in complex ways, until it finally arrives, transformed, at the output layer. During training, the weights and thresholds are continually adjusted until training data with the same labels consistently yield similar outputs.
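The article does not pin down a particular adjustment rule, so as one simple, concrete possibility, the sketch below trains a single node with a perceptron-style update; the toy data, learning rate, and random seed are all invented for illustration:

```python
import random

# Toy binary task: the label is 1 when the two inputs sum to more than 1, else 0.
data = [([0.9, 0.8], 1), ([0.1, 0.2], 0), ([0.7, 0.6], 1), ([0.2, 0.1], 0)]

random.seed(0)
weights = [random.uniform(-1, 1) for _ in range(2)]   # weights start out random
threshold = random.uniform(-1, 1)                     # so does the threshold
lr = 0.1                                              # step size for each adjustment

for _ in range(50):                                   # repeated passes over the data
    for inputs, label in data:
        fired = 1 if sum(x * w for x, w in zip(inputs, weights)) > threshold else 0
        error = label - fired
        # Nudge the weights and threshold so that examples with the same label
        # consistently produce the same output.
        weights = [w + lr * error * x for w, x in zip(weights, inputs)]
        threshold -= lr * error

print(weights, threshold)
```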
Mind and machines
The neural networks described by McCullough and Pitts in 1944 had both thresholds and weights, but they were not arranged into layers, and the researchers did not specify any training mechanism. What McCullough and Pitts showed was that a neural network could, in principle, compute any function that a digital computer could. The result was more neuroscience than computer science: the point was to suggest that the human brain could be thought of as a computing device.
Neural networks continue to be a valuable tool for neuroscientific research. For instance, particular network layouts or rules for adjusting weights and thresholds have reproduced observed features of human neuroanatomy and cognition, an indication that they capture something about how the brain processes information.
The first trainable neural network, the Perceptron, was demonstrated by the Cornell University psychologist Frank Rosenblatt in 1957. The Perceptron's design was much like that of a modern neural network, except that it had only one layer with adjustable weights and thresholds, sandwiched between the input and output layers.
"Perceptrons" were actively researched in psychology and computer science until 1959, when Minsky and Papert published a book called "Perceptrons", which showed that doing quite ordinary calculations on perceptrons was impractical in terms of time.
"Of course, all the limitations kind of disappear if you make the machines a little more complex," for example in two layers, "says Poggio. But at the time, the book had a chilling effect on neural network research.
"You have to put these things in historical context," says Poggio. "The proof was done for programming in languages like Lisp. Not long before, people were still quietly using analog computers. It was not entirely clear at the time what programming would lead to. I think they overdid it a bit, but, as always, you cannot divide everything into black and white. If you think of it as a competition between analog computing and digital computing, then they were fighting for the right thing."
Periodicity
By the 1980s, however, researchers had developed algorithms for modifying neural network weights and thresholds that were efficient enough for networks with more than one layer, removing many of the limitations identified by Minsky and Papert. The field enjoyed a renaissance.
Intellectually, however, there was something unsatisfying about neural networks. Enough training could revise a network's settings until it classified data in a useful way, but what do those settings mean? What image features does an object recognizer look at, and how does it piece them together into the distinctive visual signatures of cars, houses, and coffee cups? Looking at the weights of the individual connections will not answer that question.
In recent years, computer scientists have begun to devise ingenious methods for deducing the analytical strategies adopted by neural networks. But in the 1980s, the strategies of these networks were indecipherable. So around the turn of the century, neural networks were supplanted by support vector machines, an alternative approach to machine learning based on clean and elegant mathematics.
The recent resurgence of interest in neural networks - the deep learning revolution - owes much to the computer game industry. The complex imagery and fast pace of modern video games require hardware that can keep up, and the result was the GPU (graphics processing unit), which packs thousands of relatively simple processing cores onto a single chip. It did not take researchers long to realize that the architecture of a GPU is remarkably well suited to neural networks.
Modern GPUs made it possible for the one-layer networks of the 1960s and the two- and three-layer networks of the 1980s to blossom into the 10-, 15-, and even 50-layer networks of today. That is what the "deep" in "deep learning" refers to: the depth of the network's layers. Deep learning is currently responsible for the best-performing systems in almost every area of artificial intelligence research.
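To make the "depth" concrete, here is a rough, purely illustrative Python sketch that stacks the threshold-node computation from above into a network with ten hidden layers; the layer sizes and random weights are arbitrary placeholders, not a trained model:

```python
import random

random.seed(1)

def make_layer(n_inputs, n_nodes):
    """One layer: random weights and a random threshold for each node."""
    return [([random.uniform(-1, 1) for _ in range(n_inputs)], random.uniform(-1, 1))
            for _ in range(n_nodes)]

def run_layer(inputs, layer):
    """Each node forwards its weighted sum only if it clears its threshold."""
    outputs = []
    for weights, threshold in layer:
        total = sum(x * w for x, w in zip(inputs, weights))
        outputs.append(total if total > threshold else 0.0)
    return outputs

# A "deep" stack: ten hidden layers of eight nodes each, rather than the one or
# two adjustable layers of the early networks.
sizes = [4] + [8] * 10 + [2]      # input width, hidden widths, output width
layers = [make_layer(sizes[i], sizes[i + 1]) for i in range(len(sizes) - 1)]

signal = [0.3, 0.7, 0.1, 0.9]     # made-up input
for layer in layers:
    signal = run_layer(signal, layer)   # data flows one direction, layer by layer
print(signal)
```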
Under the hood
Network opacity still worries theorists, but there is progress on this front. Poggio leads a research program on the theoretical foundations of intelligence. Recently, Poggio and his colleagues released a theoretical study of neural networks in three parts.
The first part, published last month in the International Journal of Automation and Computing, addresses the range of computations that deep-learning networks can perform and when deep networks have an advantage over shallower ones. Parts two and three, which were released as lectures, address the challenges of global optimization - that is, guaranteeing that a network finds the settings that best fit its training data - and the cases in which a network becomes so attuned to the specifics of its training data that it fails to generalize to other instances of the same categories.
There are still many theoretical questions to be answered. But there is hope that neural networks will finally be able to break the generational cycle that has swung them in and out of favor.
ILYA KHEL