Neural Networks Have Learned To Read Thoughts In Real Time. What? Not! - Alternative View

A couple of days ago, the preprint portal bioRxiv.org published a paper by Russian researchers from the Moscow Institute of Physics and Technology and the companies Neurobotics and Neuroassistive Technologies, which develop neurocomputer interfaces. The paper claims that the scientists and developers managed to teach an algorithm to reconstruct, in real time, the video a person is watching from their EEG signals. That sounds really cool and interesting - almost like mind reading. In reality, of course, things are not so simple: computers have not learned to read thoughts. In short, the computer learned to determine from an EEG recording which of five previously known classes of images the subject was looking at. In this post we explain how the experiment was set up, what problems the scientists were actually solving, and why mind reading is unlikely to happen any time soon.

Generally speaking, given the pace of current technological progress, the idea of reading the brain's electrical signal and deciphering it to see what a person is thinking or doing at a given moment does not seem all that difficult. Here is a signal, and here is what that signal means: put two and two together, train a classifier and get the result we need.

The result is what futurists and the uninformed would call "mind reading." And such a technology, it seems, could find a variety of applications: from perfect neurocomputer interfaces that let you control smart prostheses, to a system that will finally tell you what your cat is thinking.

In reality, of course, it is not that simple, and the idea of building such an algorithm runs almost immediately into the main obstacle: we are dealing with the brain. The brain is an extremely complex thing: it has more than 80 billion neurons, and the number of connections between them is thousands of times greater.

Even a layperson can see that this is far too much for us to work out what each cell, let alone combinations of them, is responsible for. Scientists have not yet deciphered the human connectome, although they are trying, with some success.

A natural question arises: do we need to understand the function of every neuron at all in order to represent accurately what is happening in the brain? Wouldn't functional maps, for example, be enough?

The answer to that question, strictly speaking, should be "yes", but even here things are not so simple. If humanity had relied on decoding the connectome as the only key to unlocking the mysteries of the brain, we would not be anywhere near that goal today. Fortunately, we do know some things about how our brain works, and we can, of course, put that knowledge to good use.

One of the clearest and most obvious examples of applying the knowledge scientists have accumulated about how the brain works is, of course, neural interfaces. Broadly speaking, there really are technologies today that read brain activity and use it to control, for example, a computer mouse cursor or even the movements of a prosthesis.

There are two ways to make a neural interface work effectively. The first is evoked potentials: we look at the curve of electrical activity in certain areas of the brain and pick out the changes in the signal that, as we know for certain, appear at a particular moment after a stimulus is presented.
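
To give a rough idea of what that looks like in practice, here is a minimal sketch (our own illustration, not the authors' code) of the classic trick behind evoked potentials: averaging EEG windows locked to stimulus onsets. The single-channel signal and the onset times below are synthetic assumptions.

```python
import numpy as np

# Sketch of the evoked-potential idea: cut the EEG into windows locked to
# stimulus onsets and average them, so that random noise cancels out and
# the stimulus-locked response remains.
fs = 250                                        # sampling rate, Hz
eeg = np.random.randn(60 * fs)                  # one minute of one EEG channel (synthetic)
onsets = np.arange(fs, len(eeg) - fs, 2 * fs)   # a stimulus every 2 seconds (assumed)

window = np.arange(0, int(0.8 * fs))            # 0-800 ms after each stimulus
epochs = np.stack([eeg[t + window] for t in onsets])
evoked = epochs.mean(axis=0)                    # the averaged evoked response

print(epochs.shape, evoked.shape)               # (29, 200) (200,)
```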

The second way is not to rely on stimulation at all, but to use the person's imagination to generate an electrical signal that can be read. For example, a person might be asked to visualize how they move their leg or arm.

Both methods have serious drawbacks. The first is limited by the fact that the number of reliably known evoked potentials is not that large: they certainly cannot cover every possible action a person might perform. The drawback of the second is that long training is required to achieve any effect at all.

The authors of the preprint decided to combine both approaches to building neurocomputer interfaces, reasoning, fairly enough, that this would free both methods from their major limitations and make it possible to develop a new and, for today, most effective way of working with neural interfaces.

It was also assumed that the method would be closed-loop, that is, the result it produces would in turn affect how the algorithm operates. But more on that later.

At the very first stage, the algorithm decomposes all images into separate component features distributed in a vector space, which can then be matched to particular brain signals recorded with the EEG.
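
As an illustration of that kind of decomposition, here is a minimal sketch of an image autoencoder with a 10-dimensional latent space (the dimensionality mentioned further below). The architecture and layer sizes are our own assumptions, not the model from the paper.

```python
import torch
import torch.nn as nn

# Sketch of an image autoencoder with a 10-dimensional latent space.
# Architecture details are illustrative assumptions, not the paper's model.
class FrameAutoencoder(nn.Module):
    def __init__(self, latent_dim: int = 10):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 4, stride=2, padding=1),   # 64 -> 32
            nn.ReLU(),
            nn.Conv2d(16, 32, 4, stride=2, padding=1),  # 32 -> 16
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * 16 * 16, latent_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 32 * 16 * 16),
            nn.ReLU(),
            nn.Unflatten(1, (32, 16, 16)),
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1),  # 16 -> 32
            nn.ReLU(),
            nn.ConvTranspose2d(16, 3, 4, stride=2, padding=1),   # 32 -> 64
            nn.Sigmoid(),
        )

    def forward(self, x):
        z = self.encoder(x)            # image -> 10-dimensional feature vector
        return self.decoder(z), z      # reconstruction + latent code

frames = torch.rand(8, 3, 64, 64)      # a batch of video frames (synthetic)
recon, latent = FrameAutoencoder()(frames)
print(latent.shape)                    # torch.Size([8, 10])
```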

At this initial stage a binary classifier is used - roughly speaking, that very "two and two": given a sufficiently clean signal (the EEG recording was cleaned of motor artifacts), one of two options can be picked with accuracy above chance.
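
Concretely, that step might look something like the sketch below, with synthetic feature vectors standing in for the real EEG features; the chance level for two classes is 0.5, and the classifier only has to beat it.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-ins for EEG feature vectors of two stimulus classes
# (in the real study, features come from the cleaned EEG recording).
rng = np.random.default_rng(0)
class_a = rng.normal(loc=0.0, scale=1.0, size=(200, 32))
class_b = rng.normal(loc=0.5, scale=1.0, size=(200, 32))

X = np.vstack([class_a, class_b])
y = np.array([0] * 200 + [1] * 200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("accuracy:", clf.score(X_test, y_test))  # should land above the 0.5 chance level
```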

In their experiments, the scientists used video clips of objects from five classes: human faces, waterfalls, abstract geometric shapes, extreme sports and Rube Goldberg machines. On the one hand such a set looks odd, but on the other, all of these objects seem to be very different from one another. Indeed, what could human faces and abstract geometric shapes possibly have in common?

Yet according to the binary classifier, abstract shapes and human faces are indistinguishable from one another: the results of nine out of 17 study participants show that the neural interface apparently failed to tell them apart. Rube Goldberg machines and those same faces, on the other hand, are distinguished quite well as far as the brain signal is concerned.

Classification results. A - abstract shapes, W - waterfalls, HF - human faces, GM - Rube Goldberg machines, E - extreme sports.

At first glance it is not obvious why this happens: if anything, you would expect the machines and the geometric shapes to be the ones that cannot be told apart. Things become a little clearer once you look at sample frames from the videos used.

Sample images from five classes.

Most likely (here, of course, we can only guess), the classifier's success depends on how much the images of the two classes differ in superficial, low-level features - above all in color. This also fits well with the fact that the latent space of the autoencoder has only 10 dimensions.

Generally speaking, a dimensionality of five is enough to classify images into five classes, but then the classification would at best be driven by the color histogram - and a dimensionality of 10 does not improve or refine the result much.
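
To make that point concrete, here is a sketch (our own illustration, with synthetic images rather than the actual stimuli) of how a coarse color histogram alone can already separate classes that differ mostly in color:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Sketch: classify images by a coarse color histogram alone. The "images"
# are synthetic: one class is greenish (a stand-in for waterfalls), the
# other reddish - i.e. only superficial color differences are present.
rng = np.random.default_rng(1)

def color_histogram(img, bins=4):
    """Concatenated per-channel histogram -> a short low-level feature vector."""
    return np.concatenate(
        [np.histogram(img[..., c], bins=bins, range=(0, 1))[0] for c in range(3)]
    )

def make_images(mean_rgb, n=100):
    return np.clip(rng.normal(mean_rgb, 0.15, size=(n, 32, 32, 3)), 0, 1)

greenish = make_images([0.2, 0.7, 0.3])
reddish = make_images([0.7, 0.3, 0.2])

X = np.array([color_histogram(img) for img in np.vstack([greenish, reddish])])
y = np.array([0] * 100 + [1] * 100)

clf = LogisticRegression(max_iter=1000).fit(X, y)
print("training accuracy:", clf.score(X, y))  # color alone separates these classes
```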

It is not entirely clear why the authors used ten pairwise binary classifiers instead of a single linear classifier over all five classes at once: most likely, the latter would have worked better.
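
For reference, the two options look like this in code (a sketch with placeholder features, not the authors' pipeline): one-vs-one classification over five classes, which amounts to the same ten pairwise binary classifiers, versus a single multiclass linear model.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsOneClassifier
from sklearn.model_selection import cross_val_score

# Placeholder feature vectors for five stimulus classes (synthetic stand-ins).
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(loc=i, scale=2.0, size=(100, 32)) for i in range(5)])
y = np.repeat(np.arange(5), 100)

# Option 1: 5 * 4 / 2 = 10 pairwise binary classifiers (one-vs-one).
ovo = OneVsOneClassifier(LogisticRegression(max_iter=1000))

# Option 2: one multiclass linear classifier over all five classes at once.
multi = LogisticRegression(max_iter=1000)

for name, model in [("one-vs-one", ovo), ("multiclass", multi)]:
    acc = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: {acc:.2f}")
```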

Then comes the stage of reconstructing the resulting image. That it comes out smeared is understandable - the same latent-space dimensionality is to blame. But two things here are confusing.

The first is that the original and the reconstructed images are very similar to each other. We don't want to upset anyone here (ourselves included - we are all for progress), but this is not because the signal is recorded and decoded so well (and in real time at that!), but because the algorithm reconstructs exactly the kinds of images it already had.

Moreover, this does not always work as well as one would like: if you watch the video of the system in action, you will notice that in the clip with a crying man the neural interface for some reason sees a woman. That is because the algorithm does not reconstruct specific images but rather objects of a certain class: even when it does so well enough, nothing prevents it from seeing a boat in a picture of a motorcycle, simply because they belong to the same class.

As a result, what appears on the screen during reconstruction is often just an average image of all the class objects used.
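
A crude way to see why (a sketch built on synthetic frames, purely for illustration): averaging many images of one class produces exactly that kind of smeared class "template".

```python
import numpy as np

# Sketch: why a class-level reconstruction looks like a blurred average.
# Synthetic stand-ins: 100 frames of one "class", each a bright blob whose
# position jitters from frame to frame.
rng = np.random.default_rng(3)
frames = np.zeros((100, 64, 64))
for frame in frames:
    cx, cy = rng.integers(20, 44, size=2)      # blob center jitters per frame
    yy, xx = np.mgrid[0:64, 0:64]
    frame += np.exp(-((xx - cx) ** 2 + (yy - cy) ** 2) / 50)

class_template = frames.mean(axis=0)           # the "reconstruction" for this class

# Individual frames are sharp; the class average is a smeared blob.
print(f"max of a single frame:   {frames[0].max():.2f}")
print(f"max of the class average: {class_template.max():.2f}")  # noticeably lower
```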

As for how meaningful the closed-loop setup is, that is not very clear either: while performing the task, a person sees both the recording of the EEG signals and the image gradually emerging "from their head". Whether this actually helps is hard to say - the authors did not compare the interface's performance with and without this feedback. At first glance, it seems it does not help much. If it does, we would really like to know how.

On the whole, we can safely conclude that computers have not learned to read thoughts. They have not even learned to recreate video. All that they have learned to do, judging by the scientists' work, is to classify the objects a person sees into five classes based on some rather basic cues. Could computers do this before? Of course they could. Is the brain involved here? Of course it is - but it is the brain that sees, not the brain that understands what exactly it saw.

Elizaveta Ivtushok
