Reducing long-term catastrophic risks from artificial intelligence

In 1965, I.J. Good proposed that machines would one day be smart enough to make themselves smarter. Having made themselves smarter, they would spot still further opportunities for improvement, quickly leaving human intelligence far behind [3]. He called this the "intelligence explosion". Later authors have called it the "technological singularity" or simply "the Singularity" [10] [21].

The Singularity Institute aims to reduce the risk of a catastrophe resulting from an intelligence explosion. We conduct research, run educational programs, and organize conferences. Below, we make the case for taking artificial intelligence (AI) risks seriously, and suggest some strategies to reduce those risks.

 

What We're (not) About

The Singularity Institute doesn't know exactly when the intelligence explosion will occur, but we'd like to figure out how to make its consequences good rather than bad. We do not see ourselves as having the job of foretelling that it will go well or poorly. If the outcome were predetermined there would be no point in trying to intervene.

We suspect that AI is primarily a software problem that will require new insight, not a hardware problem that will fall to Moore's Law. We are interested in rational analyses of AI risks, not storytelling.

 

Indifference, not malice

Notions of a "robot rebellion," in which AIs spontaneously develop primate-like resentment for their low tribal status, are the stuff of science fiction. The more plausible danger stems not from malice, but from the fact that human survival requires scarce resources: resources for which AIs may have other uses [13] [14]. Superintelligent AIs with access to pervasive data networks and autonomous robotics could radically alter their environment. For example, they could harvest all available solar, chemical, and nuclear energy. If such AIs found uses for this energy that better furthered their goals than supporting human life, human survival would become unlikely.

Many AIs will converge toward a tendency to maximize some goal [13]. For instance, AIs developed under evolutionary pressures would be selected for values that maximized reproductive fitness, and would prefer to allocate resources to reproduction rather than to supporting humans [1]. Such unsafe AIs might actively mimic safe benevolence until they became powerful, since being destroyed would prevent them from working toward their goals. Thus, a broad range of AI designs may initially appear safe, but if developed to the point of an intelligence explosion could cause human extinction in the course of optimizing the Earth for their goals.

 

An intelligence explosion may be sudden

The pace of an intelligence explosion depends on two conflicting pressures. Each improvement in AI technology increases the ability of AIs to research more improvements, but an AI may also face the problem of diminishing returns as the easiest improvements are achieved first.
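The tension between these two pressures can be illustrated with a toy model. The sketch below is not taken from the literature cited here; the update rule, the `returns_exponent` parameter, and the numbers are all illustrative assumptions. Each step adds `gain * capability ** returns_exponent` to the current capability: an exponent below 1 stands for diminishing returns dominating, an exponent of 1 for the two pressures cancelling out, and an exponent above 1 for self-improvement dominating.

```python
def simulate(returns_exponent, steps=20, capability=1.0, gain=0.1):
    """Toy growth model (an illustrative assumption, not a forecast).

    Each step adds gain * capability ** returns_exponent.
    The default run is short because exponents above 1 grow
    explosively and would overflow floats over long runs.
    """
    history = [capability]
    for _ in range(steps):
        capability += gain * capability ** returns_exponent
        history.append(capability)
    return history


# Hypothetical regimes: the exponents are chosen only for contrast.
diminishing = simulate(0.5)   # easy improvements run out quickly
proportional = simulate(1.0)  # ordinary exponential growth
compounding = simulate(1.5)   # each gain speeds up the next

for name, run in [("diminishing", diminishing),
                  ("proportional", proportional),
                  ("compounding", compounding)]:
    print(f"{name:>12}: final capability {run[-1]:.2f}")
```

Even in this crude model the qualitative outcome depends heavily on which pressure dominates, which is one reason the rate of improvement is hard to estimate.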

The rate of improvement is hard to estimate, but several factors suggest it would be high. The predominant view in the AI field is that the bottleneck for powerful AI is software, not hardware. Continued rapid hardware progress is expected in coming decades [4]. If and when powerful AI software is developed, there may by then be a glut of hardware available to run many copies of AIs, and to run them at high speeds. This could amplify the effects of AI improvements [8].

Humans are not optimized for intelligence. Rather, we are the first and possibly dumbest species capable of producing a technological civilization. The first AI with humanlike AI research abilities might be able to reach superintelligence rapidly, and in particular more rapidly than researchers and policy-makers can develop adequate safety measures.

 

Is concern premature?

We don't know how to build an AI with human-level intelligence, so we can't have much confidence that it will arrive in the next few decades. But we also can't rule out unforeseen advances. Past underestimates of the difficulty of AI (perhaps most infamously, those made for the 1956 Dartmouth Conference) [12] do not guarantee that AI will never succeed. We need to take into account both repeated discoveries that the problem is more difficult than expected and incremental progress in the field. Advances in AI and machine learning algorithms [17], increasing R&D expenditures by the technology industry, hardware advances that make computation-hungry algorithms feasible [4], enormous datasets [5], and insights from neuroscience give us advantages that past researchers lacked. Given the size of the stakes and the uncertainty about AI timelines, it seems best to allow for the possibility of medium-term AI development in our safety strategies.

 

Friendly AI

Concern about the risks of future AI technology has led some commentators, such as Sun co-founder Bill Joy, to suggest the global regulation and restriction of such technologies [9]. However, appropriately designed AI could offer similarly enormous benefits.

An AI smarter than humans could help us eradicate diseases, avert long-term nuclear risks, and live richer, more meaningful lives. Further, the prospect of those benefits along with the competitive advantages from AI would make a restrictive global treaty difficult to enforce.

The Singularity Institute's primary approach to reducing AI risks has thus been to promote the development of AI with benevolent motivations that are reliably stable under self-improvement, what we call "Friendly AI" [22].

To very quickly summarize some of the key ideas in Friendly AI:

  1. We can't make guarantees about the final outcome of an agent's interaction with the environment, but we may be able to make guarantees about what the agent is trying to do, given its knowledge. We can't determine that Deep Blue will win against Kasparov just by inspecting Deep Blue, but an inspection might reveal that Deep Blue searches the game tree for winning positions rather than losing ones.
  2. Because code executes in the almost perfectly deterministic environment of a computer chip, we may be able to make strong guarantees about an agent's motivations (including how that agent rewrites itself), even though we can't logically prove the outcomes of the particular tactics it chooses. The distinction matters because a failed tactic is recoverable: the agent can update its model of the world and try again. Self-modification is less forgiving: the AI may need to implement a million code changes, one after the other, without any of them having catastrophic effects.
  3. Gandhi doesn't want to kill people. If someone offers Gandhi a pill that he knows will alter his brain to make him want to kill people, then Gandhi will likely refuse to take the pill. In the same way, most utility functions should be stable under reflection, provided that the AI can correctly project the result of its own self-modifications. Thus, the problem of Friendly AI is not in creating an extra conscience module that constrains the AI despite its preferences. Rather, the challenge is in reaching into the enormous design space of possible minds and selecting an AI that prefers to be Friendly.
  4. Human terminal values are extremely complicated. This complexity is not introspectively visible at a glance. The solution to this problem may involve designing an AI to learn human values by looking at humans, asking questions, scanning human brains, etc., rather than an AI preprogrammed with a fixed set of imperatives that sounded like good ideas at the time.
  5. The explicit moral values of human civilization have changed over time, and we regard this change as progress. We also expect that progress may continue in the future. An AI programmed with the explicit values of 1800 might now be fighting to reestablish slavery. Static moral values are clearly undesirable, but most random changes to values will be even less desirable. Every improvement is a change, but not every change is an improvement. Perhaps we could program the AI to "do what we would have told you to do if we knew everything you know" and "do what we would have told you to do if we thought as fast as you do and could consider many more possible lines of moral argument" and "do what we would tell you to do if we had your ability to reflect on and modify ourselves." In moral philosophy, this approach to moral progress is known as reflective equilibrium [15].
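The Gandhi argument in point 3 can be sketched in code. This is a minimal illustration under stated assumptions, not a description of any real AI design: the utility functions, the two-outcome world model in `predicted_outcome`, and the decision rule in `accepts_modification` are all hypothetical names invented for this example. The point it shows is that the agent scores a proposed change to its own values with the values it holds now.

```python
def pacifist_utility(outcome):
    """Hypothetical current values: fewer deaths is better."""
    return -outcome["deaths"]


def violent_utility(outcome):
    """Hypothetical values the 'pill' would install: more deaths is better."""
    return outcome["deaths"]


def predicted_outcome(utility):
    """Toy world model (an assumption): the future that an agent
    holding `utility` would choose from two candidate outcomes."""
    candidates = [{"deaths": 0}, {"deaths": 10}]
    return max(candidates, key=utility)


def accepts_modification(current, proposed):
    """Accept a new utility function only if the future it leads to
    scores strictly higher under the *current* utility function."""
    value_if_unchanged = current(predicted_outcome(current))
    value_if_modified = current(predicted_outcome(proposed))
    return value_if_modified > value_if_unchanged


# The pacifist agent evaluates the pill with its present values and refuses.
print(accepts_modification(pacifist_utility, violent_utility))
```

The refusal depends entirely on `predicted_outcome` being accurate, which mirrors the proviso above that the AI must correctly project the result of its own self-modifications.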

    Seeding research programs

    As we get closer to advanced AI, it will be easier to learn how to reduce risks effectively. The interventions to focus on today are those whose benefits will compound over time. Possibilities include:

    Friendly AI: Theoretical computer scientists can investigate AI architectures that self-modify while retaining stable goals. Theoretical toy systems exist now: Gödel machines make provably optimal self-improvements given certain assumptions [19]. Decision theories are being proposed that aim to be stable under self-modification [2]. These models can be extended incrementally into less idealized contexts.

    Stable brain emulations: One route to safe AI may start with human brain emulation. Neuroscientists can investigate the possibility of emulating the brains of individual humans with known motivations, while evolutionary theorists can investigate methods to prevent dangerous evolutionary dynamics, and social scientists can investigate social or legal frameworks to channel the impact of emulations in positive directions [18].

    Models of AI risks: Researchers can build models of AI risks and of AI growth trajectories, using tools from game theory, evolutionary analysis, computer security, or economics [1] [6] [8] [14] [22]. If such analysis is done rigorously it can help to channel the efforts of scientists, graduate students, and funding agencies to the areas with the greatest potential benefits.

    Institutional improvements: Major technological risks are ultimately navigated by society as a whole. Success requires that society understand and respond to scientific evidence. Knowledge of the biases that distort human thinking around catastrophic risks [23], improved methods for probabilistic forecasting [16] or risk analysis [11], and methods for identifying and aggregating expert opinions [7] can all improve the odds of a positive Singularity. So can methods for international cooperation around AI development, and for avoiding an AI "arms race" that might be won by the competitor most willing to trade off safety measures for speed [20].

     

    Our aims

    We aim to seed the above research programs. We are too small to carry out all the needed research ourselves, but we can get the ball rolling.

    We have groundwork already. We have: (a) seed research about catastrophic AI risks and AI safety technologies; (b) human capital; and (c) programs that engage outside research talent, including our annual Singularity Summits and our Visiting Fellows program.

    Going forward, we plan to continue our recent growth by scaling up our Visiting Fellows program, extending the Singularity Summits and similar academic networking, and writing further papers to seed the above research programs, in-house or with the best outside talent we can find. We welcome potential co-authors, Visiting Fellows, and other collaborators, as well as any suggestions or cost-benefit analyses on how to reduce catastrophic AI risk.

     

    The upside and downside of artificial intelligence

    Human intelligence is the most powerful known biological technology. But our place in history probably rests not on our being the smartest intelligences that could exist, but rather on being the first intelligences that did exist. We probably are to intelligence what the first replicator was to biology. The first single-stranded RNA capable of copying itself was not a sophisticated, robust replicator, but it still had an important place in history because it was first.

    The future of intelligence is, hopefully, much greater than its past. The origin and shape of human intelligence may end up playing a critical role in the origin and shape of future civilizations on a much larger scale than one planet. And the origin and shape of the first self-improving Artificial Intelligences humanity builds may have a similarly large impact, for similar reasons. The values of future intelligence will shape future civilization. What stands to be won or lost is the value of future intelligences, and thus the value of future civilization.

     

    Recommended reading

    This has been a very quick introduction. For more information, please contact anna@singinst.org, or see:

    • For a general overview of AI catastrophic risks: Yudkowsky, Eliezer (2008). "Artificial Intelligence as a Positive and Negative Factor in Global Risk." In Bostrom, Nick and Ćirković, Milan M. (eds.), Global Catastrophic Risks, pp. 308-345 (Oxford: Oxford University Press). http://singinst.org/upload/artificial-intelligence-risk.pdf
    • For discussion of self-modifying systems' tendency to approximate optimizers and fully exploit scarce resources: Omohundro, Stephen M. (2008). "The Basic AI Drives." In Pei Wang et al. (eds.), Artificial General Intelligence 2008: Proceedings of the First AGI Conference, pp. 483-492 (Amsterdam: IOS Press). http://selfawaresystems.files.wordpress.com/2008/01/ai_drives_final.pdf
    • For discussion of evolutionary pressures toward software minds aimed solely at reproduction: Bostrom, Nick (2004). "The Future of Human Evolution." In Tandy, Charles (ed.), Death and Anti-Death: Two Hundred Years After Kant, Fifty Years After Turing, pp. 339-371 (Palo Alto, CA: Ria University Press). http://www.nickbostrom.com/fut/evolution.html
    • For tools for doing cost-benefit analysis on human extinction risks, and a discussion of gaps in the current literature: Matheny, Jason G. (2007). "Reducing the Risk of Human Extinction." Risk Analysis, Volume 27, Issue 5, pp. 1335-1344. http://jgmatheny.org/matheny_extinction_risk.htm
    • For an overview of potential causes of human extinction, including AI: Bostrom, Nick (2002). "Existential Risks: Analyzing Human Extinction Scenarios and Related Hazards." Journal of Evolution and Technology, Vol. 9. http://www.nickbostrom.com/existential/risks.html
    • For an overview of the ethical problems and implications involved in creating a superintelligent AI: Bostrom, Nick (2003). "Ethical Issues in Advanced Artificial Intelligence." In Smit, I. et al. (eds.), Cognitive, Emotive and Ethical Aspects of Decision Making in Humans and in Artificial Intelligence, Vol. 2, pp. 12-17 (Int. Institute of Advanced Studies in Systems Research and Cybernetics). http://www.nickbostrom.com/ethics/ai.html

     

    References

    [1] Bostrom, Nick, "The Future of Human Evolution", Death and Anti-Death: Two Hundred Years After Kant, Fifty Years After Turing, ed. Charles Tandy, pp. 339-371, Ria University Press, 2004. http://www.nickbostrom.com/fut/evolution.html

    [2] Drescher, Gary, Good and Real: Demystifying Paradoxes from Physics to Ethics, pp. 188, The MIT Press, 2006.

    [3] Good, I. J., "Speculations Concerning the First Ultraintelligent Machine", Franz L. Alt and Morris Rubinoff, eds., Advances in Computers (Academic Press) 6: 31-88, 1965. Available at http://www.acceleratingfuture.com/pages/ultraintelligentmachine.html

    [4] International Technology Roadmap for Semiconductors, "International Technology Roadmap for Semiconductors, 2007 Edition" 2007. Web. 07 Jan. 2010. http://www.itrs.net/Links/2007ITRS/Home2007.htm

    [5] Halevy, Alon, Peter Norvig, and Fernando Pereira, "The Unreasonable Effectiveness of Data," IEEE Intelligent Systems, March/April 2009, pp. 8-12 http://www.computer.org/portal/cms_docs_intelligent/intelligent/homepage/2009/x2exp.pdf

    [6] Hall, J. Storrs, Beyond AI: Creating the Conscience of the Machine. Amherst, N.Y.: Prometheus, 2007. Print.

    [7] Hanson, Robin, "Idea Futures." George Mason University, 12 June 1996. Web. 08 Jan. 2010. http://hanson.gmu.edu/ideafutures.html.

    [8] Hanson, Robin, "Economic Growth Given Machine Intelligence." George Mason University, 1998. Web. 7 Jan. 2010. http://hanson.gmu.edu/aigrow.pdf

    [9] Joy, Bill, "Why the Future Doesn't Need Us", Wired Magazine, 2000. http://www.wired.com/wired/archive/8.04/joy.html

    [10] Kurzweil, Ray, The Singularity is Near: When Humans Transcend Biology. Viking Penguin, 2005.

    [11] Matheny, Jason G., "Reducing the Risk of Human Extinction", Risk Analysis, Volume 27 Issue 5, pp. 1335-1344, 2007. http://jgmatheny.org/matheny_extinction_risk.htm

    [12] McCarthy, John, Marvin Minsky, Nathan Rochester, and Claude Shannon, "A Proposal for the Dartmouth Summer Research Project on Artificial Intelligence." Formal Reasoning Group Stanford University, 31 Aug. 1955. Web. 07 Jan. 2010. http://www.formal.stanford.edu/jmc/history/dartmouth/dartmouth.html

    [13] Omohundro, Stephen M., "The Basic AI Drives." Artificial General Intelligence, 2008 proceedings of the First AGI Conference, eds. Pei Wang, Ben Goertzel, and Stan Franklin. Vol. 171. Amsterdam: IOS, 2008. http://selfawaresystems.com/2007/11/30/paper-on-the-basic-ai-drives/

    [14] Omohundro, Stephen M., "The Nature of Self-Improving Artificial Intelligence." Self-Aware Systems. 21 Jan. 2008. Web. 07 Jan. 2010. http://selfawaresystems.com/2007/10/05/paper-on-the-nature-of-self-improving-artificial-intelligence/

    [15] Rawls, John, A Theory of Justice. New York: Belknap, 2005.

    [16] Rayhawk, Steve, Anna Salamon, Tom McCabe, Michael Anissimov, and Rolf Nelson, "Changing the frame of AI futurism: From story-telling to heavy-tailed, high-dimensional probability distributions." Proceedings of the European Conference on Computing and Philosophy. Universitat Autònoma de Barcelona, Barcelona, Spain. 4 July 2009. http://www.singinst.org/theuncertainfuture.html

    [17] Russell, Stuart J. & Norvig, Peter, Artificial Intelligence: A Modern Approach, 2nd ed., Pearson Education, 2003.

    [18] Sandberg, Anders & Bostrom, Nick, "Whole Brain Emulation: A Roadmap", Technical Report #2008-3, Future of Humanity Institute, Oxford University, 2008. http://www.philosophy.ox.ac.uk/__data/assets/pdf_file/0019/3853/brain-emulation-roadmap-report.pdf

    [19] Schmidhuber, Juergen, "Gödel Machines: Self-Referential Universal Problem Solvers Making Provably Optimal Self-Improvements", Adaptive Agents and Multi-Agent Systems II, LNCS 3394, pp. 1-23, Springer, 2005. ftp://ftp.idsia.ch/pub/juergen/gm6.pdf

    [20] Shulman, Carl M., "Arms Control and Intelligence Explosions." Proceedings of the European Conference on Computing and Philosophy. Universitat Autònoma de Barcelona, Barcelona, Spain. 4 July 2009. http://singinst.org/armscontrolintelligenceexplosions.pdf

    [21] Vinge, Vernor, "The Coming Technological Singularity", Whole Earth Review, New Whole Earth LLC, March 1993 http://www.accelerating.org/articles/comingtechsingularity.html

    [22] Yudkowsky, Eliezer, "Artificial Intelligence as a Positive and Negative Factor in Global Risk", Global Catastrophic Risks, eds. Nick Bostrom and Milan M. Ćirković, 2008, pp. 308-345. http://singinst.org/upload/artificial-intelligence-risk.pdf

    [23] Yudkowsky, Eliezer, "Cognitive Biases Affecting Judgement of Existential Risk", Global Catastrophic Risks, eds. Nick Bostrom and Milan M. Ćirković, 2008, pp. 91-119. http://www.singinst.org/upload/cognitive-biases.pdf