The Scientist Said That 70 Years In The Field Of AI Research Have Been Practically Wasted - Alternative View

The biggest lesson to be drawn from 70 years of AI research is that general methods that leverage computation are ultimately the most effective - and by a wide margin. The ultimate reason for this is Moore's Law, or rather its generalization: the continuing, exponential fall in the cost of computation. This "bitter lesson" was shared by Richard Sutton, a Canadian computer scientist. What follows is his account, told in the first person.


Why has artificial intelligence research been at a standstill for 70 years?

Most AI research has been conducted as if the computation available to the agent were constant (in which case leveraging human knowledge would be one of the only ways to improve performance). But over time - far more time than a typical research project needs - vastly more computation inevitably becomes available. In search of improvements that help in the short term, researchers try to exploit their human knowledge of the domain, but the only thing that matters in the long run is the increasing use of computation. These two need not run counter to each other, but in practice they do. Time spent on one is time not spent on the other. There are psychological commitments to investing in one approach or the other. And the human-knowledge approach tends to complicate methods in ways that make them less suited to taking advantage of general methods that leverage computation.

There have been many examples of AI researchers belatedly understanding this bitter lesson. It will be instructive to consider some of the most prominent examples.

In computer chess, the methods that defeated world champion Kasparov in 1997 were based on massive, deep search. At the time, they were viewed with dismay by the majority of computer chess researchers, who had pursued methods based on human understanding of the specific structure of chess. When a simpler, search-based approach with specialized hardware and software proved far more effective, these researchers did not admit defeat. They said: "This time the brute-force approach may have won, but it will never become a general strategy, and besides, people do not play chess that way." These scientists wanted human-knowledge-based methods to win, and were deeply disappointed when they did not.
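
To give a rough sense of what "massive, deep search" means in code, here is a minimal, hypothetical sketch of depth-limited negamax with alpha-beta pruning - the family of techniques that chess engines of that era scaled up with specialized hardware. The `evaluate`, `legal_moves` and `apply_move` functions below are placeholders, not any real engine's code.

```python
# Hypothetical sketch of depth-limited alpha-beta search; the helper
# functions are placeholders, not code from any actual chess engine.
import math

def evaluate(position):
    """Placeholder static evaluation: positive favors the side to move."""
    return 0.0

def legal_moves(position):
    """Placeholder move generator for the current position."""
    return []

def apply_move(position, move):
    """Placeholder: return the position after making `move`."""
    return position

def negamax(position, depth, alpha=-math.inf, beta=math.inf):
    """Depth-limited negamax with alpha-beta pruning."""
    moves = legal_moves(position)
    if depth == 0 or not moves:
        return evaluate(position)
    best = -math.inf
    for move in moves:
        # The opponent's score is the negation of ours one ply deeper.
        score = -negamax(apply_move(position, move), depth - 1, -beta, -alpha)
        best = max(best, score)
        alpha = max(alpha, score)
        if alpha >= beta:   # The opponent will avoid this line: prune.
            break
    return best
```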


A similar picture of research progress was seen in computer Go, only delayed by a further 20 years. Enormous initial efforts went into avoiding search by exploiting human knowledge of the game's special features, but all of these efforts proved irrelevant, or worse, once search was applied effectively and at scale. Also important was the use of self-play to learn a value function (as in many other games and even in chess, although learning played no major role in the 1997 program that first beat the world champion). Self-play learning, and learning in general, is like search in that it allows huge amounts of computation to be brought to bear. Search and learning are the two most important classes of techniques for harnessing massive amounts of computation in AI research. In computer Go, as in computer chess, researchers' initial efforts were directed at using human understanding (so that less search was needed), and far greater success came only much later, through the use of search and learning.
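
As a rough illustration of what "learning a value function through self-play" can mean, here is a minimal, hypothetical sketch using a tabular TD(0)-style update. Real Go programs use deep neural networks combined with tree search; the `play_one_game` function below is only a stand-in.

```python
# Minimal, hypothetical sketch of learning a value function from self-play
# with a tabular TD(0)-style backward sweep; not any real Go program's code.
import random
from collections import defaultdict

def play_one_game():
    """Placeholder self-play episode: returns the visited states and the
    final outcome (+1 or -1). A real program would actually play moves."""
    states = [("toy-state", i) for i in range(random.randint(1, 5))]
    outcome = random.choice([+1, -1])
    return states, outcome

def td_self_play(episodes=1000, alpha=0.1):
    """Nudge each state's value toward its successor's value, and the last
    state's value toward the final game outcome."""
    value = defaultdict(float)          # state -> estimated outcome
    for _ in range(episodes):
        states, outcome = play_one_game()
        target = outcome
        for state in reversed(states):
            value[state] += alpha * (target - value[state])
            target = value[state]
    return value

if __name__ == "__main__":
    learned = td_self_play()
    print(dict(list(learned.items())[:3]))
```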

In the field of speech recognition, a DARPA-sponsored competition was held in the 1970s. Some participants presented methods that exploited human knowledge - knowledge of words, of phonemes, of the human vocal tract, and so on. On the other side of the barricades were newer methods, statistical in nature and far more computation-heavy, based on Hidden Markov Models (HMMs). Once again, statistical methods won out over knowledge-based methods. This led to major changes across all of natural language processing, introduced gradually over the decades, until eventually statistics and computation came to dominate the field. The recent rise of deep learning in speech recognition is the latest step in this consistent direction. Deep learning relies even less on human knowledge and uses even more computation, along with training on huge datasets, and produces astonishing speech recognition systems.
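
To make the HMM-based, statistical side of that competition concrete, here is a minimal, hypothetical Viterbi decoder for a toy model; real speech recognizers operate on acoustic feature vectors and models many orders of magnitude larger.

```python
# Minimal, hypothetical Viterbi decoder for a toy HMM; real speech systems
# work on acoustic features and vastly larger state spaces.
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable hidden-state path for `observations`."""
    # best[t][s] = probability of the best path ending in state s at time t
    best = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prob, prev = max(
                (best[t - 1][p] * trans_p[p][s] * emit_p[s][observations[t]], p)
                for p in states
            )
            best[t][s] = prob
            back[t][s] = prev
    # Trace the best final state back to the start.
    last = max(best[-1], key=best[-1].get)
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return list(reversed(path))

# Toy example: decode two phoneme-like hidden states from observed symbols.
states = ["S1", "S2"]
start_p = {"S1": 0.6, "S2": 0.4}
trans_p = {"S1": {"S1": 0.7, "S2": 0.3}, "S2": {"S1": 0.4, "S2": 0.6}}
emit_p = {"S1": {"a": 0.5, "b": 0.5}, "S2": {"a": 0.1, "b": 0.9}}
print(viterbi(["a", "b", "b"], states, start_p, trans_p, emit_p))
```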

Richard Sutton, Canadian computer scientist.

As in games, researchers have always tried to create systems that work the way they imagined their own minds to work - they tried to put that knowledge into their systems - but it proved extremely unproductive. Researchers were essentially wasting their time while, thanks to Moore's Law, ever more massive computation became available and found excellent use.

A similar picture emerged in computer vision. Early methods conceived of vision as a search for edges, generalized cylinders, or SIFT (scale-invariant feature transform) features. But today all of this has been discarded. Modern deep learning neural networks use only the notions of convolution and certain kinds of invariance, and perform far better.
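
As a concrete illustration of the convolution operation these networks are built from, here is a minimal, hypothetical sketch (using the cross-correlation form that deep learning libraries call convolution); it is not any particular library's implementation.

```python
# Hypothetical sketch of a single 2-D convolution (valid padding, stride 1),
# the basic operation modern vision networks stack and learn end to end.
import numpy as np

def conv2d(image, kernel):
    """Slide `kernel` over `image`, taking a weighted sum at each location.
    The same kernel is reused at every position (weight sharing), which is
    what builds translation structure into convolutional networks."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Toy example: a vertical-edge detector applied to a small image.
image = np.array([[0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1]], dtype=float)
edge_kernel = np.array([[-1, 0, 1],
                        [-1, 0, 1],
                        [-1, 0, 1]], dtype=float)
print(conv2d(image, edge_kernel))
```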

This is a great lesson.

Wherever we look, we keep making the same mistake. To see this and to deal with it effectively, we need to understand why these mistakes are so attractive. We must learn the bitter lesson that building in how we think we think does not work in the long run. The bitter lesson, based on historical observation, shows that: 1) AI researchers have often tried to build knowledge into their agents; 2) this always helped in the short term and was personally satisfying to the researchers; 3) but in the long run it plateaued and held back further progress; 4) breakthrough progress eventually arrived through the opposite approach, based on scaling computation by search and learning. The eventual success had a bitter taste and was often not fully absorbed, because it was the success of computation, not the success of human-centered approaches.

One thing to be learned from this bitter lesson is the tremendous power of general-purpose methods - methods that continue to scale with increased computation even as the available computation becomes very large. The two methods that seem to scale arbitrarily in this way are search and learning.

The second thing to be learned from this bitter lesson is that the actual contents of minds are tremendously and irredeemably complex; we should stop trying to find simple ways to think about the contents of minds, such as simple ways to think about space, objects, multiple agents, or symmetries. All of these are part of an arbitrarily complex external world. We should not try to build them in, because their complexity is endless; instead we should build in only the meta-methods that can find and capture this arbitrary complexity. These methods can find good approximations, but the search for them should be carried out by our methods, not by us. We want AI agents that can discover the way we can, not ones that merely contain what we have already discovered. Building in our discoveries only complicates the process of discovery and search.

Ilya Khel
