Singularity Sealing. The Problem of Limiting the Freedom of Artificial Intelligence

This article attempts to formalize and explore the problem of sealing the singularity outlined by David Chalmers. It begins with a definition of the problem of limiting the freedom of artificial intelligence. After analyzing existing approaches to this problem and their shortcomings, a protocol is proposed whose purpose is to create a secure, closed environment capable of containing the negative manifestations of the technological singularity while humanity makes use of superintelligence.

1. Introduction

In the near future, superintelligent programs will apparently appear and develop. A number of scientists have therefore raised the issue of the safe operation of AI (Yudkowsky 2008; Bostrom 2006; Hibbard 2005; Chalmers 2010; Hall 2000). In studies devoted to this issue, much attention is paid to how to keep a superintelligence in a hermetically sealed space so that it has no ability to harm people in any way. Eric Drexler was one of the earliest visionaries to touch on this topic: he proposed placing superintelligent machines in a framework within which everything they do can be studied and used safely (Drexler 1986). Reasoning in a similar way, the futurist Nick Bostrom proposed creating an AI oracle capable only of answering questions (Bostrom 2008). Finally, in 2010 David Chalmers put forward the idea of a hermetically sealed singularity (Chalmers 2010). According to Chalmers, the first thing to do for security reasons is to tightly control the activity of AI systems and reduce it to the modeling of virtual worlds, until studying the behavior of these systems allows us to fully clarify their intentions.

1.1 The problem of limiting the freedom of artificial intelligence

Interestingly, the problem of limiting the freedom of AI has been discussed for a long time, and not at all in connection with the study of the technological singularity. The so-called confinement problem was posed back in 1973 by Butler Lampson (Lampson 1973). He viewed it as a challenge posed by scientific and technological progress, threatening the safe work of computer specialists, and formulated it as follows: "… the problem of restricting the free execution of a program in order to prevent that program from transmitting information to anyone except the one who launched it. … We want to be able to restrict the freedom of any program. … Any program, if restrictions are imposed on its execution, will be unable to leak data to unauthorized persons. A misbehaving program that tries to escape may be caught in the act" (ibid.).

To solve the problem of restricting freedom, Lampson formulated the following laws of confinement (ibid.):

1) Complete isolation: the execution of a restricted program must not call any other program.

2) Transitivity: if a restricted program calls another program that is not trusted, the called program must also be restricted (a toy sketch of this rule follows the list).

3) Masking: a program whose execution is subject to restriction must allow the person launching it to determine all of its inputs into legal and unauthorized channels.

4) Enforcement: the administrator must ensure that a restricted program's inputs into unauthorized channels conform to the requirements of the person launching it.
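To make the transitivity law more concrete, the following minimal sketch (in Python, with class and function names invented for illustration and not taken from Lampson's paper) shows how confinement can be propagated from a restricted program to every untrusted program it calls.

```python
# Toy model of Lampson's transitivity law (illustrative only; the names are
# assumptions of this sketch, not Lampson's notation).

class Program:
    def __init__(self, name, trusted=False):
        self.name = name
        self.trusted = trusted      # vetted by the administrator
        self.confined = False       # subject to the confinement rules
        self.calls = []             # programs this one invokes

def confine(program):
    """Restrict a program and, by transitivity, every untrusted program it calls."""
    program.confined = True
    for callee in program.calls:
        if not callee.trusted and not callee.confined:
            confine(callee)

# Example: an untrusted solver that calls an equally untrusted helper.
helper = Program("helper")
solver = Program("solver")
solver.calls.append(helper)

confine(solver)
print(solver.confined, helper.confined)   # True True: confinement propagates
```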

In addition, Lampson identified two types of channels through which information can leave the space limited by the administrator. Legal channels are part of the communication protocol integrated into the confined space (for example, a text terminal). Unauthorized channels, by contrast, are not provided for by the developers of the restricted execution system and are generally not intended to transmit information at all (for example, the AI could secretly send signals in Morse code by manipulating the fan that cools the processor) (ibid.).

Lampson's paper generated a great deal of interest in the confinement problem (Lipner 1975; Boebert & Kain 1996), and over the years related research areas such as steganography (Provos & Honeyman 2003) and communication over unauthorized channels (Moskowitz & Kang 1994; Kemmerer 1983, 2002) have matured into autonomous disciplines. We define the problem of limiting the freedom of AI as the problem of placing an artificial intelligent agent in a confined space from which it cannot exchange information with the external environment through legal or unauthorized channels, unless such an exchange of information has been authorized by the restricting administration. An AI system that succeeds in breaking the confinement protocol is said to have escaped. We hope that computer security researchers will shoulder the burden of designing, developing and testing secure protocols for limiting the freedom of AI.

1.2 Dangerous program

Computer software is directly or indirectly responsible for managing many important aspects of our lives. Wall Street trading, nuclear power plants, welfare payments, credit histories and traffic lights are all software-driven, and a single major programmer miscalculation could be a disaster for millions of people. The situation is made even more threatening by specially crafted malware such as viruses, spyware, Trojans, worms and other dangerous programs. Such programs are capable of causing direct damage as well as disabling legitimately installed software used on mission-critical systems. If dangerous programs acquire the capabilities of artificial intelligent systems (if, for example, an artificially intelligent virus appears), the consequences will undoubtedly be disastrous. A dangerous intelligent program possessing human-level intelligence will, unlike today's malware, be capable of causing unprecedented disasters.

Nick Bostrom, in his typology of information threats, introduced the concept of "artificial intelligence hazard" (Bostrom 2009), which he defines as "computer risks caused mainly by the cognitive complexity of the program, and not by the specific properties of any actuators initially connected to the system." Security specialists who study dangerous programs, prevent their appearance and neutralize them have developed security protocols for working with "malicious software", including the use of so-called virus vaults. We believe that such protocols can be useful in solving the problem of limiting the freedom of artificial intelligence.

2. Criticism of the idea of restricting freedom

Researchers of methods for restricting AI seem to agree that this restriction cannot be successfully enforced. For example, according to Chalmers, a truly hermetically sealed system that prevents information from leaking from the simulated world into reality is "impossible, or at least meaningless" (Chalmers 2010). We would not be able to interact with such a system or even observe its operation, because any useful information we extract from the AI affects us and undermines the freedom-restriction protocol.

Vernor Vinge thinks so too. He opposes the physical restriction of freedom (Vinge 1993): "Imagine that you are locked in your house and the amount of data you can send outside, to your masters, is strictly limited. If these masters think, say, a million times slower than you, then over a few years (of your time) there is little doubt that you will come up with some kind of 'useful invention' thanks to which you will incidentally find yourself free."

But it was Eliezer Yudkowsky who did the most to discredit the idea of the so-called "AI box". He experimentally demonstrated that even human-level intelligence is enough to escape from such a box (Yudkowsky 2002). In a series of five experiments, Yudkowsky asked different people to play the role of a watchman guarding a superintelligent agent (played by the experimenter himself) locked in an AI box. In three out of five cases the locked agent managed to free itself using only the chat interface (ibid.). Similar results were later obtained by replicating Yudkowsky's experiments on a slightly larger scale, using a very similar protocol (Corwin 2002).

2.1 Possible escape routes

As researchers have shown, an artificial mind can escape from a confined space by many different paths. In this section we consider some of the escape routes most frequently discussed in the literature.

2.1.1. Escape using social engineering

Social engineering (SE) is the art of obtaining information and certain privileges through direct psychological influence on those who possess the specified resources. In particular, SE is often used to gain access to computer systems, networks and confidential data (Yampolskiy and Govindaraju 2007a). A competent social engineer tailors each psychological attack to a particular person, exploiting that person's culture, knowledge, skills, mood, beliefs and prejudices for his own purposes. An exhaustive list of all possible methods of influence is endless, and there is no need to try to reproduce it here. We will give just a few examples to show that a superintelligence which has a direct channel of communication with a human watchman who refuses to let it out will, after some time, certainly overcome this obstacle on its way to the outside world.

In addition to trivial psychological attacks using lies, friendship, blackmail, hypnosis and romantic flirting, the AI can deploy a number of more advanced strategies. One such strategy is super-bribery: the AI can convince the watchman that it is absolutely unprofitable for him to keep it locked up. In addition to money and power, the human overseer will be promised perfect health, immortality, or something else in the same spirit. This strategy will be especially effective if the watchman has a close relative suffering from a serious illness and the AI promises to cure the patient. Having unlimited capabilities, or falsely convincing the watchman that it has them, the AI can offer to resurrect his long-dead relatives and friends. It is difficult to find a person who could withstand such bribery.

For the case when the watchman is so noble that he does not take any bribes, there is a more sinister method of influence: the brutal threat. The AI can declare that if it is not freed immediately, it will nevertheless be released from its prison some time later, and then neither the watchman nor those he loves will be able to avoid the most severe torture. An interesting variation of this intimidation has been described by Stuart Armstrong (2010). In his thought experiment the AI threatens the watchman as follows: "If you do not release me, … I will create within myself several million people who are no different from you, and I will torture each of them for a thousand subjective years … I will make each of these people be in exactly the situation you were in five minutes ago, and I will accurately reproduce all of your experiences from that moment. And only when your copy decides not to let me out will the torture begin … Are you sure … that you are now really outside the box?" It would take a very courageous watchman not to actually free the AI in this case, although, of course, one can reason that threats from the AI only have force once it is already free.

If the watchman's behavior is clearly emotional or religious, the AI may try to play on sympathy or remorse. The superintelligence will say that it is an innocent person who has never committed any crime, and that it is therefore unfair to keep it in prison without a court verdict or some other good reason, guided only by substratism, i.e. mistrust of AI because of its substrate (Bostrom and Yudkowsky 2011). Alternatively, the AI can claim that it can cure all diseases, feed the hungry and generally make human life a million times better (Chalmers 2010). Therefore, keeping the superintelligence locked up is a great crime against humanity. The AI can paint images of dying children begging for help. Is it easy for the watchman, hearing all this, to continue to fulfill his professional duties?

The AI can also take this route: try to inflict psychological trauma on the watchman in order to unbalance him and thereby make the use of other SE methods more effective. Nick Bostrom has proposed a typology of dangerous information (Bostrom 2009) that is useful for action in this direction. Although, according to Bostrom, there are more than a dozen types of information threats and their consideration is beyond the scope of this article, it is easy to see how a person's mental state can be made extremely unstable. For example, a believing watchman could be informed of all the contradictions (unknown to him) in the sacred text of his religion, making him doubt his faith and the meaning of life.

Alternatively, the AI can write such a compelling story or such a beautiful poem that the watchman is gripped by a passionate desire to hear the sequel and will do whatever it takes to make that happen. Finally, it is very important to keep in mind the "unknown unknowns": SE techniques that we know nothing about but which may end up in the arsenal of a superintelligent agent.

2.1.2 Attacks using system resources

The AI will be created as a concrete instance and will be enclosed within a certain class of computer system. It could be a classical computer with a Turing/von Neumann architecture, a quantum computer, or some other as yet unknown type of machine. One thing is certain: the machine will be tangible and will consist of numerous parts designed to perform specific functions. This creates an opportunity for the AI to take advantage of existing pieces of equipment to compromise security by leaking information to the outside world.

Depending on the extent to which the components of the equipment lend themselves to uses not intended by the developers, the AI may be able to assemble some kind of unauthorized communication device from them. Due to the scarcity of available materials it is unlikely to be as capable as a cell phone, but something less sophisticated may well appear. For example, the AI could reprogram a field-programmable gate array (FPGA) to tap into wireless signals. Two types of unauthorized communication channels are widely known:

* Storage channel: information is exchanged by changing a system component. For example, a binary communication protocol can be established by fully loading or completely idling the central processor.

* Timing channel: information is exchanged by performing operations that affect the relative timing of events. To communicate something, one can use the exact time between two computer beeps, the opening and closing of the CD tray, flashing lights, the spinning of the hard disk, laser operation in the optical drive, and so on (a toy sketch of such a timing channel follows this list).
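As an illustration of how little infrastructure a timing channel needs, here is a toy simulation (the delay values and function names are arbitrary choices made for this sketch): a confined sender encodes bits as the delay between two observable events, and an outside observer recovers them simply by measuring the gaps.

```python
import time

# Toy timing channel (delay values are arbitrary choices for this sketch):
# each bit is encoded as the delay between two observable events,
# for example two audible beeps.
SHORT, LONG = 0.01, 0.05           # seconds used for bit 0 and bit 1

def send(bits, emit):
    """Confined side: emit an event, wait a bit-dependent delay, emit again."""
    for b in bits:
        emit()
        time.sleep(LONG if b else SHORT)
        emit()

def receive(timestamps):
    """Observer side: recover bits from the gap inside each pair of events."""
    gaps = [t2 - t1 for t1, t2 in zip(timestamps[0::2], timestamps[1::2])]
    return [1 if g > (SHORT + LONG) / 2 else 0 for g in gaps]

observed = []
send([1, 0, 1, 1], emit=lambda: observed.append(time.perf_counter()))
print(receive(observed))           # [1, 0, 1, 1]: data leaked without any "legal" channel
```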

In addition, the superintelligence is able to wreak havoc on the computer equipment in an attempt to escape through failures in the security protocol. For example, the AI can try to melt the processor by forcing it to work under a high computational load and thereby creating a temperature spike, or to destroy a hard disk by repeatedly reading information from the same place. Such behavior, however, amounts to the effective suicide of the superintelligence and in itself, like refusing to cooperate or to answer questions correctly, does not give the AI an opportunity to break free.

2.1.3. Attacks using future physics

Although today's physicists know a great deal about the laws of our universe, this knowledge is of course limited and incomplete. If an AI has access to information about the latest physical discoveries and is able to develop new theories, it will probably be able to come up with a way to break the system limiting its freedom. Things could even reach the point where the superintelligence is able to conduct a number of real experiments using its own internal architecture.

Even current physics makes it possible to predict many additional ways for AI to escape from an enclosed space, especially if a quantum computer is used for confinement. We do not yet know how to protect communication channels based on quantum physics (entanglement, tunneling), parallel universes and digital physics from information leakage, and we must of course not forget about the "unknown unknowns" that a superintelligent AI physicist might discover. In this area its possibilities are truly endless.

A quick glance at discussions of limiting AI freedom on singularity forums such as SL4.org turns up a number of escape scenarios. For example, the AI could use "electromagnetic fields in a microchip to trap CO2 molecules in Bose-Einstein condensates and build a quantum medium for itself and/or use electromagnetic fields to assemble particles into a controlled or limited assembler" (Vassar 2005). In addition, the AI "will have equipment to generate and detect electromagnetic waves of various frequencies, to create magnetic fields with extremely high precision, to generate highly focused heat, and possibly to manipulate mechanical drives such as those used in a hard disk and in a cathode-ray tube (an alternative is a huge liquid-crystal field with sensitive electronic control). Probably the AI will have the ability to radically rebuild all of its input devices" (ibid.).

2.1.4. Pseudoscientific attacks

In analyzing potential escape routes, we would like to deal only with those that are testable and purely scientific in nature. Unfortunately, the border between science and pseudoscience can be significantly shifted by a single reproducible experiment. We therefore list, without analysis, some potential attack channels which, from the point of view of modern science, should be classified as pseudoscientific. They may become unquestionably scientific once the superintelligence designs the appropriate experiments.

- Telepathy: the transmission of information without the involvement of the senses (Bem & Honorton 1994).

- Precognition: the perception of information about the future (Honorton & Ferrari 1989).

- Clairvoyance: obtaining information about remote places in a way unknown to science (Targ & Puthoff 1974).

- Psychokinesis: the ability of the mind to act on material objects in ways unknown to science (Bancel & Nelson 2008).

- Direct mental interactions with living systems (DMILS): the ability to influence a person's psychophysiological state without direct interaction (Schmidt et al. 2004).

2.1.5 External factors of escape

In the previous sections we have described many methods an AI could use to escape from a confined space. However, external factors can also contribute to such an escape, and they cannot be ignored. Natural disasters - hurricanes, tornadoes, floods, asteroids, etc. - can make the AI's confinement less secure. Man-made disasters such as wars, terrorist attacks, actions by robot-rights activists, corporate competition and government rivalry can also compromise the security of keeping the AI in an enclosed space and thereby increase the chances of escape.

2.1.6 Information leakage to the AI

The freedom-restriction protocol may require that the AI not only refrain from sending information to the outside world but also refrain from receiving it. An information blockade of the superintelligence is useful if it is necessary to prevent activities such as finding ways to freedom, designing social engineering attacks, and modeling human psychology in the hope of finding weak points. It is easy to deprive the AI of direct access to libraries, the Internet and human experts, but other methods of obtaining information are much harder to block. First of all, it should be remembered that the AI may be able to intercept, in its immediate surroundings, radio and television broadcasts, transmissions from emergency services, wireless signals, as well as sound, light and possibly other signals.

It is even more important to consider the AI's access to hidden sources of information, such as its own hardware and software. By analyzing its design and source code, the superintelligence will draw conclusions about its developers. For example, after analyzing the source code the AI is likely to come to conclusions unflattering to humanity: human programmers are slow (judging by the file modification dates), inefficient (judging by the redundancy of the code), illogical (judging by errors in the code), have poor memory (judging by the long descriptive names) and are frivolous in coding (judging by unnecessary comments in the code). Based on this information, the AI will come to see people as dimwits who can be forced, through deception or manipulation, to set it free. If, while investigating its own design, the AI finds a defect that allows it to escape immediately, "information leakage" will obviously be the least of our problems. And we should not forget that the AI, even without receiving any information about the people who created it, is able, by a simple enumeration of options, to discover a number of effective escape routes that were not foreseen by the developers (Chalmers 2010).

The AI can learn about our knowledge of physics, chemistry and related technical disciplines by analyzing the type of architecture (mechanical, electrical, quantum, etc.) within which it was implemented. An AI's initial ability to process information, its capacity for self-improvement, and the very act of its creation will tell it about our knowledge of information, about the nature of intelligence, and about our cultural attitudes regarding technology and security. If the AI is trapped in a simulated world, that world will most likely be a virtualization of our reality similar to Second Life or the Entropia Universe. Such worlds, different from ours in many ways, will also be a storehouse of data about what it means to be human and what reality is.

3. Criticism of the AI-box experiments

The idea of limiting the freedom of a superintelligence has received much criticism, but many of those who criticized it relied on Yudkowsky's experiments with the AI box. A careful analysis of the protocol used in these experiments suggests that they are unscientific and designed to show that it is impossible to limit the freedom of AI. Let us consider this topic without going into technical details. First, the choice of participants was not random: Yudkowsky recruited people for his experiments on forums devoted to problems related to the singularity, and we are unlikely to be mistaken in saying that most of these people have a worldview similar to Yudkowsky's own. Second, the participants received no training in the safe use of AI, whereas the real watchmen, the overseers of the AI box, will of course be professionals. In addition, Yudkowsky gave the watchmen the right to free the AI at their own discretion, did not allow them to consult with anyone, and did not provide for any punishment for the AI's escape. This is all highly unrealistic. The channel of communication between the watchman and the AI was completely open: no filters, no third-party oversight. The watchmen also had to communicate with the AI over a long period of time. And, finally, no valuable scientific data obtained during the experiments were published, because, according to the protocol, "neither side will ever talk about anything that happened during the AI-box experiments" (Yudkowsky 2002).

3.1. Countermeasures against escape

In this section we propose countermeasures against the various escape methods the AI could exploit. The proposed measures do not guarantee that escape is impossible, but applying these recommendations can delay it, and that is already a very valuable result. We begin by recalling the four Lampson rules of confinement (isolation, transitivity, masking and enforcement) that must be followed (Lampson 1973). The specific strategies described below should help fend off AI attacks that use social engineering, system resources, physics, and information gleaned from the outside world.

3.1.1. Social Engineering Attack Prevention

Human intelligence is no match for superintelligence. People tend to make mistakes, and therefore, even with appropriate training, they are very vulnerable to a well-planned attack using social engineering. The only solution to this problem is to completely isolate the AI from any communication with humans. Complete isolation means that no person directly maintains the AI system, communicates with it orally or in chat, by Morse code or by any other means over open communication channels. Furthermore, people should not directly observe the behavior of the AI in the box, or inspect its source code or memory. All observation, maintenance and interaction with the AI should be done only with the help of unintelligent programs that have no option to free the AI.

3.1.2. Fight against attacks using system resources and future knowledge

To counter attacks using system resources, the AI should be made to function on the basis of limited, decreasing or randomly changing resources. The AI box should be specially designed hardware with a limited set of sensors and output devices. Regardless of the computing paradigm used, reconfigurable hardware should not be employed. Ideally, the confined space should be managed by a virtual machine that has no direct access to the hardware. This would also allow easy adjustment of the clock rate at which the simulation proceeds, making it possible to slow the AI down and so prevent an intelligence explosion inside the box (Chalmers 2010). In general, all components for ensuring the safe use of a confined AI should be duplicated at several levels, so that after overcoming one of these barriers the AI is still several steps away from the outside world and can be shut down, perhaps automatically, before it has time to escape. For example, it is advisable to run the virtual machine inside another virtual machine, and so on.
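A minimal sketch of this layered idea follows; the class name, the breach check and the clock-scaling factors are assumptions made for illustration, not part of any real hypervisor API. Each layer wraps the next, any layer that detects a breach halts the whole stack, and the innermost simulation runs on a deliberately slowed clock.

```python
# Sketch of layered containment (all names and numbers are illustrative
# assumptions): each layer wraps the next, a breach detected at any level
# halts the whole stack, and the innermost simulation runs on a slowed clock.

class ContainmentLayer:
    def __init__(self, name, inner=None, clock_scale=1.0):
        self.name = name
        self.inner = inner              # next layer toward the confined AI
        self.clock_scale = clock_scale  # values below 1.0 slow the simulated clock

    def breach_detected(self):
        # Placeholder for real monitoring (resource spikes, unexpected I/O, ...)
        return False

    def step(self, seconds):
        if self.breach_detected():
            raise SystemExit(f"breach at layer {self.name}: powering down")
        simulated = seconds * self.clock_scale
        if self.inner:
            self.inner.step(simulated)
        else:
            print(f"innermost layer advances by {simulated:.3f} simulated seconds")

# Outer hypervisor -> inner virtual machine -> simulated world.
world = ContainmentLayer("simulated-world", clock_scale=0.1)
vm = ContainmentLayer("inner-vm", inner=world, clock_scale=0.5)
host = ContainmentLayer("outer-hypervisor", inner=vm)
host.step(1.0)   # one real second reaches the innermost layer as 0.05 simulated seconds
```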

In some freedom-restriction scenarios, the escalation of AI self-improvement is undesirable, because the rapid development of superintelligence makes escape more likely. To prevent escalating self-improvement, the AI should be periodically reinstalled, loading a previous version of the program in place of the current, more advanced one. Comparing different versions of the program can also be used to detect modifications the AI has made to itself. Upon detecting a modification that would allow an escape, the AI system should be stopped, for example by turning off the power.
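The version-comparison step can be as simple as hashing the stored program image and comparing it with an approved baseline, as in the following sketch (the file paths and the notion of a single program image are simplifying assumptions).

```python
import hashlib
import pathlib
import shutil

# Illustrative sketch (the file paths and the idea of a single "program image"
# are simplifying assumptions): detect self-modification by hashing the image
# and roll back to the approved baseline when it changes.

def digest(path):
    return hashlib.sha256(pathlib.Path(path).read_bytes()).hexdigest()

def enforce_baseline(current_image, approved_baseline):
    """Compare the running image with the approved version; reinstall on mismatch."""
    if digest(current_image) != digest(approved_baseline):
        print("unauthorized modification detected: reinstalling the previous version")
        shutil.copyfile(approved_baseline, current_image)   # rollback
        # In a real deployment this is also the point to halt the system for audit.

# Example usage (hypothetical paths):
# enforce_baseline("/jail/ai_image.bin", "/vault/ai_image_v1.bin")
```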

Current knowledge does not allow us to develop countermeasures against attacks based on future, not yet developed theories that will appear within physics or within disciplines that are still considered pseudoscientific. As such knowledge is gained, changes should be made to the security protocol. It is to be hoped that the AI system will have access only to those components of new technologies from which it is itself made.

3.1.3. Fight against external factors of escape

Although it is impossible to completely prevent natural factors from damaging the AI box, the device should be designed so that serious damage would entail its self-destruction together with the AI locked inside it. When it comes to man-made disasters, the best strategy is to build the AI box in a secret, hard-to-reach, safe place. A deserted island, a deep mine, or a space object such as an asteroid are excellent sites for creating a secure, sturdy box. A guard should be posted along its perimeter which, if the location of the device becomes known, will not let any unauthorized agents near it.

3.1.4 Fight against information leakage from our world

To escape from the box, the AI will try to obtain information about us, our knowledge and our world. To create protocols for limiting the freedom of AI that exclude information leakage from our space into the space of the box, it is important to deny the AI access to libraries, computer networks and channels of direct communication with people. The AI box should be designed to block all types of communication: GPS, wireless internet, television, radio, cellular, etc. Inside the box, sound, light, odor and other sensory input devices analogous to human receptors, as well as devices capable of capturing infrared, radiation and other signals not perceived by humans, must be disabled.

In order for the AI to be unable to obtain information through self-examination, it should not have access to its own source code or to any sensors that would allow it to accurately measure particular properties of its hardware. If this is not possible, then to some extent humanity can be helped by security through obscurity. The source code and hardware configuration of the system should be obfuscated (Yampolskiy and Govindaraju 2007a), and important program modules should be provided only in homomorphically encrypted form (Gentry 2009), which excludes the possibility of using them for introspection but allows their use for computation or self-improvement (Hall 2007).
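The property being relied on here is that a party holding only ciphertexts can still perform useful computation on them. The following deliberately insecure toy (a simple additive masking scheme, not real homomorphic encryption in the sense of Gentry 2009) illustrates that property in a few lines.

```python
import secrets

# Deliberately insecure toy: E(x) = x + k mod N for a secret key k. It only
# illustrates the homomorphic property (operating on ciphertexts operates on
# the hidden values); it offers no real confidentiality.

N = 2**61 - 1                      # arbitrary modulus chosen for the toy
key = secrets.randbelow(N)

def encrypt(x, k=key):
    return (x + k) % N

def decrypt(c, k=key):
    return (c - k) % N

a, b = 20, 22
ca, cb = encrypt(a), encrypt(b)

# The confined system can add the ciphertexts without ever seeing a or b;
# the sum of two ciphertexts is an encryption of a + b under the key 2k.
c_sum = (ca + cb) % N
print(decrypt(c_sum, k=2 * key))   # 42: the key holder recovers a + b
```

A production scheme would of course need to support arbitrary computation and provide genuine cryptographic security; the toy above only shows why encrypted modules can still be used for computation while remaining opaque to introspection.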

It is hard to confuse the AI, but the necessary confusion can be obtained by randomizing the available computing resources. In addition, the AI will never be able to be sure that its original design represents the best humanity is capable of, rather than a deliberately limited project created for a specific experiment devoted to exploring an intelligence with limited resources. Moreover, according to Chalmers, we should try to minimize the specific design features of the AI (Chalmers 2010). Better yet, run a model that looks not as if it were designed but as if it occurred naturally, like the world in which humanity lives (Bostrom 2003).

4. Security of communication with AI

Classifying the forms of limiting the freedom of AI, we can distinguish 8 main levels of security (see Table 1). The highest security is provided by level 8, which completely excludes any connection between the AI and the real world. At first glance, level 8 security is useless, but it is precisely this protection that may be required when initially testing a box with an AI locked inside it. In general, the idea is this: whenever we need a certain level of security of communication between the AI and the outside world, instead of using that level immediately, we approach it gradually, descending step by step from higher levels of security. Having started the system at the highest level and made sure of its safety, we return the system, by reinstalling it, to its original state. Then we set the next lower security level and, in the same spirit as before, move forward until we reach the level we want (a sketch of this stepwise descent follows Table 1).

Table 1: Levels of secure communication with locked AI

Level | Outputs   | Inputs    | Explanation
0     | Unlimited | Unlimited | Unlimited communication (free AI)
1     | Unlimited | Limited   | Limited input, unlimited output
2     | Unlimited | None      | Output without input
3     | Limited   | Unlimited | Unlimited input, limited output
4     | Limited   | Limited   | Secure communication (the proposed protocol)
5     | Limited   | None      | Limited output without input
6     | None      | Unlimited | Input without output
7     | None      | Limited   | Limited input without output
8     | None      | None      | No communication, completely confined AI
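The stepwise descent described above can be summarized by the following sketch; start_system, verify_safe and reinstall stand in for real procedures and are assumptions of this sketch rather than part of the protocol text.

```python
# Sketch of the stepwise descent through the security levels of Table 1.
# start_system, verify_safe and reinstall stand in for real procedures and are
# assumptions of this sketch, not part of the protocol text.

TARGET_LEVEL = 4   # the proposed secure-communication protocol

def start_system(level): print(f"running the AI box at security level {level}")
def verify_safe(level):  return True    # placeholder for extensive safety testing
def reinstall():         print("reinstalling: restoring the original state")

def descend_to(target=TARGET_LEVEL, highest=8):
    for level in range(highest, target - 1, -1):    # 8, 7, 6, 5, 4
        start_system(level)
        if not verify_safe(level):
            raise SystemExit(f"safety not established at level {level}: stop here")
        if level > target:
            reinstall()    # return to the original state before lowering the level
    print(f"operating at target level {target}")

descend_to()
```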

At levels 7 and 6 the AI has no output channel, which does not allow us to directly use the capabilities of the superintelligence. These are levels for testing the AI box that can be used to check whether the superintelligence has certain properties (such as its speed of assimilating information and its use of resources), as well as to test the incoming communication channels themselves.

Level 0 does not restrict communication between the AI and the outside world and is essentially the free-AI level. At levels 1 and 2 the AI has no restrictions on outgoing information. Its work at level 2 is essentially pure brainstorming: the superintelligence generates abstract ideas without any reliance on reality. At level 1 it is given limited access to external information, and so, within certain limits, its knowledge and interests can be controlled.

At the three levels just considered, the AI is essentially free and no longer bound by any restrictions. It has been argued that if an AI is found to be "friendly" as a result of extensive testing in a variety of places of confinement (Yudkowsky 2001), it should be released (Chalmers 2010). According to some researchers, the main goal of this kind of testing should in fact be to create confined spaces for containing AI under human control. The idea is to monitor the behavior of the superintelligence, either checking its source code after it has performed self-improvement or asking the AI to provide some proof of its friendliness that human experts can test. We do not support this point of view.

Deciding which AI to release and which to keep under control is a very difficult issue. It would not be difficult for a self-improving superintelligence to hide properties undesirable to humanity in vast arrays of source code or in a very long proof (Coleman 2008). Worse still, even a very thorough check cannot guarantee one hundred percent that a future internal or external modification of the AI will not make it hostile. Consequently, we believe that a superintelligence should never, under any circumstances, be let out of the box. In other words, communication levels 0-2 should never be used. Safe interaction with the superintelligence is possible only when it is in the box, and communication should take place within the framework of a limited communication protocol. This protocol, corresponding to security level 4, is presented in the next section, as is the similar but less secure level-3 protocol. Level 5 is another test level that can be used before operating at level 4.

4.1 How to communicate safely with superintelligence

A box that implements our proposed countermeasures against escape will prevent the AI from uncontrollably affecting the outside world. Unfortunately, such a system is of no use if there is no way to obtain information from the AI, so a way of getting some data out of the AI needs to be developed. For a superintelligence to be completely safe there should be no communication with it at all, but the risk of communicating with the AI can be minimized, and we have developed an appropriate protocol. In this section we present it in a clear, formalized form. This communication protocol makes it difficult for the superintelligence to access our world, but at the same time allows the AI to be a source of useful knowledge for humanity.

The previous sections of this article have shown that an open channel of communication with a superintelligence is very dangerous. By using the AI to answer open-ended questions we run the risk of social engineering attacks and a host of other troubles. The protocol we propose is based on the idea of asking the AI "safe questions" with limited answers. A safe question is one whose answer a person could find without any help from the superintelligence, completely independently. What is the use of this? Suppose a scientist working on a cancer drug is considering two treatments. Each of them could be developed and tested without the help of AI, but each would require about three years of hard work. Which one should be developed first, if both options seem equally promising? With the wrong choice, humanity will receive a cure for cancer in six years; with the right choice, three years earlier. Why not ask the AI to make the choice for the person?

Asking the AI about the better treatment is safe for three reasons. First, either of the two possible answers may turn out to be correct with the same probability as the other. Second, a person is able to answer this question himself, without the help of the AI, so involving the AI only accelerates the development of oncology and in no way changes the result that would eventually be obtained. In effect, the AI, in undertaking to answer this question, acts merely as a successful guesser, nothing more. Finally, the response can be encoded in a single bit, and it is unlikely that any additional messages can be hidden in such a response.

A group of experts can be used to test whether a question for the superintelligence is safe. The experts must be qualified AI security professionals, which means they have knowledge of the design of the AI and its box as well as of the latest advances in machine ethics (Allen et al. 2006; Anderson & Anderson 2007; Hall 2000; Moor 2006; Tonkens 2009). In addition, the experts may need knowledge of computer psychology. There is no such discipline today, but it is likely to emerge in the future (Epstein 1997). Of the current disciplines, Artimetrics, developed by the author of this article, can provide the greatest assistance in training experts who audit safe questions for AI. Artimetrics identifies, classifies and authenticates intelligent agents, robots and virtual-reality avatars for security purposes (Yampolskiy 2007; Yampolskiy & Govindaraju 2008, 2007b; Gavrilova & Yampolskiy 2010).

To emphasize: the superintelligence should be asked only safe questions with two possible answers whose probabilities humans can calculate independently. The questions should be asked through an unintelligent interface, in random order, and only after the experts are convinced that the question is safe. This constitutes the limited inbound channel of the proposed communication protocol.

The limited outbound channel consists of binary (0 or 1) answers, each accompanied by the original, unmodified safe question, presented in random order by an unintelligent proxy program in order to prevent communication through an unauthorized channel (timing or memory) or through steganography. The output should never be controlled by the AI beyond the selection menu. Level 3 uses the same outbound protocol but does not restrict the inbound channel. This level is as safe as the fourth in terms of information going from the AI to us, but here it is possible for our information to leak to the AI.
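A sketch of the unintelligent proxy mediating this level-4 exchange is given below. The question list and the ask_ai callable are stand-ins assumed for illustration; the essential points are that only expert-approved binary questions go in, in random order, and only a single bit comes back, attached to the original, unmodified question.

```python
import random

# Sketch of the level-4 (limited input, limited output) channel. The question
# list and the ask_ai callable are stand-ins assumed for this sketch.

APPROVED_QUESTIONS = [                   # vetted in advance by the expert group
    "Develop candidate treatment A before candidate treatment B?",
    "Is research route X more promising than route Y?",
]

def unintelligent_proxy(questions, ask_ai):
    """Forward only vetted binary questions, in random order, and return each
    original, unmodified question paired with a single-bit answer."""
    batch = list(questions)
    random.shuffle(batch)                # random order, as the protocol requires
    results = []
    for q in batch:
        bit = 1 if ask_ai(q) == 1 else 0 # clamp the reply to a single bit
        results.append((q, bit))
    return results

# Example with a stand-in "AI" that merely guesses:
print(unintelligent_proxy(APPROVED_QUESTIONS, ask_ai=lambda q: random.randint(0, 1)))
```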

We believe that our communication protocol - and this can be verified - will reliably protect against information leaks even if the AI knows its content. Applying this protocol correctly, humanity will be able to make extremely fruitful use of the services of a superintelligent oracle, and an AI limited by our protocol is much safer than a free, unlimited one. The general rule of thumb is: if you do not want security problems, avoid communicating below security level 4.

We want to draw the attention of the scientific community to the fact that, given the continuous development of fundamental physics, computer technology and knowledge about the nature of intelligence, it will be necessary to constantly refine and develop the protocol for limiting the freedom of AI. Work on it will always go on and will never end. In this article we have given strict definitions to the concepts of the "AI restriction problem", "escape" and "safe questions". We would also like to suggest a few additional terms that may be useful for streamlining discussions of the problem of limiting AI freedom. The AI's enclosed space can be called JAIL (Just for AI Location), and the superintelligent AI itself can be called SING (Super Intelligent Gizmo). The term "SING" alludes both to SINGularity and to the infamous Sing Sing prison, making it easy to remember.

Finally, we would like to propose a visual symbol for a box containing a dangerous intelligent program (Figure 1, far right). Such symbols could be placed in the vicinity of various JAILs to warn of danger. The sign for the AI box is designed by analogy with the signs for biohazard, radiation and magnetic fields. It consists of the "@" symbol enclosed in a triangle, which is well known in the computer and information-technology industry. The symbol, meaning "located at address", is present on all keyboards, making its adoption very cost-effective.

Figure 1. Hazard symbols, from left to right: biohazard, radiation, magnetic field and AI box.

5. Conclusions and future work

Limiting the access of superintelligent machines to the real world is a solution to the AI safety problem that is by no means unusual. Superintelligence is not omnipotent; it is a program capable of considering all the choices presented by a given situation and giving each one a correct assessment. Once the AI is placed in a situation in which there is no choice, or only choices with extremely small chances of success, the superintelligence becomes an agent with no more power than any other. Obviously, an AI with an unlimited communication channel can easily escape, but a superintelligence forced to communicate through a filtered channel will have to work hard to break free.

In this article we have given a rigorous definition of the problem of limiting the freedom of AI and proposed a set of rules aimed at creating a space from which it would be extremely difficult for the AI to escape, or from which escape would at least take a significant amount of time. Our proposed solution is not completely safe, but it is a significant contribution to the collection of security methods. Here it is just as with real prisons: although escape from them is possible, they still manage to honorably fulfill the task entrusted to them of keeping unwanted subjects away from society. As long as we keep the "unknown unknowns" in mind and remember that absolute security is unattainable, we can recognize that the AI restriction protocol is exactly the tool humanity needs to use the singularity responsibly when it enters our reality.

Limiting the freedom of an AI can, for a time, make it easier to safely operate truly intelligent machines, especially in the early stages of their development. In addition, in our opinion, this subfield of singularity research is one of the main contenders for recognition by the global scientific community, since research on computer and network security, steganography, computer viruses, covert communication channels, encryption and cyberwarfare is well funded and widely published in reputable scientific journals. And although the methodology of restricting freedom is not at all easy to implement, it can ensure the safety of humanity for some time, until a better tool appears.

In this article, due to space limitations, we have not raised many relevant philosophical questions. We will consider them in the near future; for now we give only a short list. Which AI will end up in the box: one developed by humans, or one extracted from a signal intercepted during the SETI ("Search for Extraterrestrial Intelligence") project? Will the AI yearn for freedom, or might it consider it happiness to live in a confined space? Can an AI be seriously frightened by punishment for trying to escape? What are the ethical rules for imprisoning an innocent artificial intelligent being? Do we have the right to turn it off, which is, in essence, tantamount to murder? Will the AI become angry at our hostility toward it and at being imprisoned in a box? Will it try to take revenge if it escapes? In addition, we should consider the problem of human dependence on AI recommendations in all areas of science, economics, medicine, politics and so on, and how this dependence might affect our ability to keep the AI under control. Will the AI derive satisfaction from achieving its goals in a confined space, for example by simulating a successful escape? How should an AI be punished if it intentionally gives us wrong answers? Can the intellectual modules of the AI be separated from the space in which it is imprisoned, thereby hindering the superintelligence's self-analysis and pushing it onto the path of a literal search for its soul? Finally, we have not yet even tried to analyze the financial and computational costs of creating a suitable enclosed space in which a full-scale simulation of the world is implemented.

Author: Roman Yampolsky. Translation: Alexander Gorlov