Why Scientists Should Not Rely On Artificial Intelligence For Scientific Discovery - Alternative View


We live in a golden age of scientific data, surrounded by vast stores of genetic information, medical imaging, and astronomical observations. Modern machine learning algorithms let artificial intelligence comb through this data quickly and thoroughly, often opening the door to potentially new scientific discoveries. However, we should not blindly trust the results of scientific research conducted by AI, says Rice University researcher Genevera Allen. At least not at the current level of development of this technology. The problem, the scientist argues, is that today's AI systems have no ability to critically assess the results of their own work.

According to Allen, AI systems based on machine learning, where the system learns by working through many similar problems rather than by following explicitly programmed rules, can be trusted with some decisions. More precisely, AI can reasonably be given tasks in areas where a person can easily check and analyze the final result. Examples include counting the number of craters on the Moon or predicting aftershocks following an earthquake.

However, the accuracy and reliability of more complex algorithms, those used to analyze very large amounts of data in search of previously unknown factors or relationships between features, "are much more difficult to verify," Allen notes. When the patterns such algorithms find cannot be independently verified, they can lead to erroneous scientific conclusions.

Take, for example, precision medicine, where specialists analyze patient data to find groups of people with similar genetic characteristics in order to develop effective treatments. Some AI programs designed to sift through genetic data are indeed effective at identifying groups of patients with a shared predisposition, for example, to breast cancer. Yet they turn out to be far less effective at identifying other cancer types, such as colorectal cancer. Because each algorithm analyzes the data differently, combining their results often produces conflicting classifications of the same patient sample. This, in turn, leaves scientists wondering which AI to trust.
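The conflict described above is easy to reproduce even without real genetic data. The toy sketch below (all data and grouping rules are invented for illustration, not taken from the article or from Allen's work) splits the same six hypothetical patient measurements into two groups using two equally plausible rules, a mean cutpoint and a median cutpoint, and gets two different classifications:

```python
# Hypothetical 1-D "expression level" values for six patients.
levels = [0.1, 0.2, 0.3, 0.8, 0.9, 5.0]

def split_at(values, cutpoint):
    """Assign each value to group 0 (below cutpoint) or group 1 (at or above)."""
    return [0 if v < cutpoint else 1 for v in values]

# Rule A: cut at the mean (sensitive to the outlier 5.0).
mean_cut = sum(levels) / len(levels)

# Rule B: cut at the median (robust to the outlier).
s = sorted(levels)
median_cut = (s[len(s) // 2 - 1] + s[len(s) // 2]) / 2

groups_by_mean = split_at(levels, mean_cut)      # -> [0, 0, 0, 0, 0, 1]
groups_by_median = split_at(levels, median_cut)  # -> [0, 0, 0, 1, 1, 1]

print(groups_by_mean)
print(groups_by_median)
```

Both rules are internally consistent, yet they assign patients 4 and 5 to different groups, which is exactly the kind of disagreement that makes it unclear which result to trust.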

These contradictions arise because data-analysis algorithms are designed to follow the instructions built into them, instructions that leave no room for indecision or uncertainty, Allen explains.

Scientists don't like uncertainty. However, traditional methods for quantifying measurement uncertainty were designed for cases where the data were specifically collected to evaluate a particular hypothesis. That is not how AI data-mining programs work: they are not guided by any hypothesis and simply analyze datasets collected without any specific purpose. For this reason, many AI researchers, including Allen herself, are now developing new protocols that will allow next-generation AI systems to evaluate the accuracy and reproducibility of their own discoveries.


The researcher explains that one of the new methods will be based on the concept of resampling. If, for example, an AI system makes an important discovery, say, it identifies groups of patients that are clinically significant for research, then that discovery should also appear in other datasets. Creating new, larger datasets just to validate an AI's findings is very costly for scientists. Instead, according to Allen, one can take "an existing dataset and randomly shuffle its information in a way that simulates a completely new dataset." If, time after time, the AI recovers the same characteristic features and arrives at the same classification, "then you can consider that you have a real discovery in your hands," Allen adds.
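The resampling idea can be illustrated with a small stability check. The sketch below is a generic bootstrap-style example, not Allen's actual protocol: it repeatedly resamples a hypothetical dataset with replacement, re-runs a simple grouping rule (split at the largest gap), and measures how often the original grouping reappears. A grouping that survives most resamples is more likely to reflect real structure than noise:

```python
import random

random.seed(0)

# Hypothetical 1-D feature with a clear gap between two groups.
data = [0.1, 0.2, 0.3, 2.1, 2.2, 2.3]

def gap_cut(values):
    """Return a cutpoint at the middle of the largest gap in sorted values."""
    s = sorted(values)
    gaps = [(s[i + 1] - s[i], i) for i in range(len(s) - 1)]
    _, i = max(gaps)
    return (s[i] + s[i + 1]) / 2

def stability(values, trials=1000):
    """Fraction of bootstrap resamples whose cutpoint reproduces
    the grouping found on the full dataset."""
    base_cut = gap_cut(values)
    base_labels = [v > base_cut for v in values]
    hits = 0
    for _ in range(trials):
        sample = [random.choice(values) for _ in values]
        if len(set(sample)) < 2:
            continue  # degenerate resample: no gap to split on
        cut = gap_cut(sample)
        if [v > cut for v in values] == base_labels:
            hits += 1
    return hits / trials

print(stability(data))  # high (close to 1.0) for well-separated groups
```

With well-separated groups, the same split reappears in the vast majority of resamples; if the data were pure noise, the recovered groupings would shift from resample to resample and the stability score would drop.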

Nikolay Khizhnyak