Research Grants

Focus Areas

Foundations of Reflection

In 1965, the famous statistician I. J. Good suggested that any sufficiently smart mind would be capable of designing a next generation of even smarter minds, giving rise to a positive feedback loop and an "intelligence explosion." In its purest form, an AI could rewrite its own source code or design new hardware for itself. Taking this scenario seriously presents us with foundational mathematical challenges: modern logics, probability theories, and decision theories do not adequately handle this type of self-reference.

These difficulties are symptomatic of a more fundamental problem: current AI techniques make little use of reflection. Humans, on the other hand, derive a great deal of benefit from thinking about thinking. What makes self-reference tractable for human reasoners? Can the same techniques be applied to AI and made extremely reliable and safe?

The Foundations of Reflection Focus Area supports research on:

  • Decision-theoretic, probability-theoretic, or logical foundations for self-referential agents.
  • Analyses of the stability of optimization targets, utility functions, and choice criteria in self-modifying agents.
  • Techniques for self-modification and self-improvement in AI that promise to be strengthenable to extreme reliability.
  • Formal attempts to prove that the problem of ensuring extreme reliability in AI is unsolvable, or unsolvable using a particular approach, using proof-theoretic or other mathematical arguments.
  • Experiments to determine how human problem-solvers use reflection.
  • Other work that demonstrates a significant new idea or approach.

Friendly AI

The Foundations of Reflection Focus Area arises as part of our broader research focus on Friendly AI: handling the challenge and risk of smarter-than-human AI. SIAI supports technical work on this subject: for example, identifying the problems of learning a multicomponent utility function, under conditions of uncertainty, from noisy, non-independent observations.

The Friendly AI Focus Area supports research on:

  • Conditions sufficient, or necessary, for an agent to learn information in a complexly structured utility function or other choice criterion.
  • Acceptable ways of transforming human motivational structure (e.g. adaptation aspiration) into normative criteria such as utility functions.
  • Detailed identification of known cognitive biases or logical fallacies in previously published work on AI motivations.
  • Rigorous analytic philosophy that addresses some of the biggest problems in Friendly AI, such as "How do we know what we want?"
  • Clear explanations of how an existing AI methodology would fail to maintain Friendliness when scaled to superintelligence, self-modification, real-world capabilities exceeding those of the programmers, metamoral questions, or other exceptional long-term challenges of Friendly AI.

Existential Risks

Nearly 99.9% of all species that ever lived are now extinct. Will our own species meet the same end? How could that happen? And what can we do to stave off the end? An "existential risk" is defined as one that threatens to annihilate Earth-originating intelligent life or permanently and drastically curtail its potential. Existential risks are the extreme end of global catastrophic risks. The exact probability of an existential disaster in this century is unknown, but there is reason to think that it is significant. That, at least, is the consensus among the small number of serious scholars who have investigated the question: 50%, Professor Sir Martin Rees, President of the Royal Society (Our Final Hour, 2003); 30%, Professor John Leslie (The End of the World, 1996); significant, Judge Richard Posner (Catastrophe, 2004); and not less than 25%, Dr. Nick Bostrom ("Existential Risks: Analyzing Human Extinction Scenarios," 2002).

Because the stakes are enormous, reducing existential risk should be a major priority, but the subject remains severely neglected. Even a slight reduction in existential risk would have immense value. For example, the expected value of reducing existential risk by only 1% is similar to the value of saving 60 million lives, not counting future generations. If we take into account the people who may come to live on this planet and elsewhere if we avoid existential catastrophe, then even a tiny risk reduction could have an expected value equal to saving trillions of lives.
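
The 60 million figure can be reproduced with a short calculation. The sketch below is illustrative only and rests on two assumptions that the text does not state: that "1%" means an absolute reduction of one percentage point in the probability of catastrophe, and that the population at stake is roughly 6 billion people. The function name and the constant are hypothetical and chosen for this example.

```python
from fractions import Fraction


def expected_lives_saved(population, risk_reduction):
    """Expected lives saved when the probability of a catastrophe that
    would kill `population` people falls by `risk_reduction`
    (an absolute probability between 0 and 1)."""
    if not 0 <= risk_reduction <= 1:
        raise ValueError("risk_reduction must be a probability in [0, 1]")
    return population * risk_reduction


# Assumption: a world population of about 6 billion (not given in the text).
ASSUMED_POPULATION = 6_000_000_000

# A one-percentage-point reduction, kept exact by using a Fraction.
lives = expected_lives_saved(ASSUMED_POPULATION, Fraction(1, 100))
print(int(lives))  # 60000000
```

Because the result scales linearly with the population figure, counting future generations only requires passing a larger population estimate to the same function.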

The Existential Risks Focus Area supports research on:

  • Original studies and metaresearch on existential risks.
  • Comparative analyses of the threats posed by existential risks.
  • Methodological improvements for studying existential risks.
  • Analyses of complex ethical issues related to existential risks (such as the weight to be placed on the interests of future generations, and how to allocate moral responsibility for risk reduction).
  • Identification of common elements that contribute to existential risks.

Our Partners

  • Future of Humanity Institute