Research Grants
Academic Paper Grant
Containing Superintelligence: Feasibility and Strategies
Research summary:
SIAI's research on AI safety has focused on the concept of "Friendly AI" -- AI proven to have goals that, under certain assumptions, remain stably aligned with humanity's, even as the AI rewrites its own code and increases its intelligence. For AI designs expected to become extremely powerful, this degree of goal alignment could be essential. However, other classes of precautions involve constructing systems specifically to reduce the likelihood of unwanted rapid self-improvement, "intelligence explosion," or increase in power. In particular, an AI could be limited in its access to resources and manipulators, i.e., "kept in a box." Or it might be designed with goals that give it only weak incentives to obtain such access, e.g., an "Oracle AI" concerned only with answering questions without utilizing external resources, or an AI with extreme temporal discounting. This paper will analyze some of these strategies, their ease of implementation, and their safety.
Planned contents include:
- An introduction to the problem of Friendly AI and the Singularity.
- A description of the circumstances under which Friendly AI, or full alignment of goals between AI and its creators, is most essential.
- A discussion of the advantages and disadvantages of strategies resembling "AI boxing," and an exploration of the space of such strategies.
- A discussion of the risks of boxed AIs escaping, and reasons why these might be easy to underestimate.
- An analysis of the feasibility, costs, and benefits of creating AI with narrow goals that do not create strong drives to obtain power and resources.
- An investigation of solutions to some problems with these strategies.
Prior related work:
Eliezer Yudkowsky's AI Box Experiment
AI as a Positive and Negative Factor in Global Risk, by Eliezer Yudkowsky
The Basic AI Drives, by Stephen Omohundro, which discusses the pressures on non-Friendly AIs to seek out resources and manipulators
Target dates for:
- Extended Abstract (Posting an extended abstract on SIAI website, and circulating to a few related academics for comment): 2 weeks after start date.[1]
- Working paper (Posting a working paper on the SIAI website; circulating to related academics): 6 weeks after start date.
- Conference submission: 10 weeks after start date.
- Follow-up steps (Brainstorming, and drafting proposals for, any follow-up publications. Should it be developed into a journal paper?): 12 weeks after start date.
[1] The "start date" is the date (guaranteed to be within six months of the receipt of grant money) when we have skilled people to allocate to the project. Extra donations increase our base of skilled people and thereby increase the number of projects we can take on; the lagged start date allows us to find new people, bring them here, and train them.
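As a rough illustration of how the week offsets above translate into calendar dates, here is a minimal sketch. The start date used is purely hypothetical (the actual start date is set as described in footnote [1]), and the names MILESTONE_WEEKS and target_dates are invented for this example.

```python
from datetime import date, timedelta

# Week offsets taken from the target-date list above.
MILESTONE_WEEKS = {
    "Extended abstract": 2,
    "Working paper": 6,
    "Conference submission": 10,
    "Follow-up steps": 12,
}


def target_dates(start_date):
    """Return each milestone's calendar date for a given start date."""
    return {
        name: start_date + timedelta(weeks=weeks)
        for name, weeks in MILESTONE_WEEKS.items()
    }


# Hypothetical start date, for illustration only.
example_start = date(2010, 1, 4)
schedule = target_dates(example_start)

for name, due in schedule.items():
    print(f"{name}: {due.isoformat()}")
```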
How research costs are estimated:
- Researcher-months for research and writing: 1.25 (This is our standard estimate of the time required for conference articles.[1])
- Dollars required to support one skilled full-time researcher-month[2]: $2,400
[2] This billing rate reflects an estimate of the financial outlays required for SIAI to create the equivalent of one full-time skilled researcher-month, including stipend or hosting expenses, workspace, administrative or management time, and other supporting expenses. Actual person-months may be higher or lower depending on the labor mix for a particular project, with shortfalls made up from general funds. This rate does not reflect the money researchers could earn in the competitive labor market. Think of this as a matched donation: you donate the living expenses; our researchers donate the surplus value of their labor.
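The two figures above imply a total, sketched below on the assumption that the estimate is simply researcher-months multiplied by the monthly rate. The text does not state a total or confirm that no other terms enter into it, and the function name estimated_cost is invented for this example.

```python
from decimal import Decimal

# Figures quoted in the cost estimate above.
RESEARCHER_MONTHS = Decimal("1.25")
DOLLARS_PER_RESEARCHER_MONTH = Decimal("2400")


def estimated_cost(months, rate):
    """Assumed model: total cost = researcher-months x monthly rate."""
    return months * rate


total = estimated_cost(RESEARCHER_MONTHS, DOLLARS_PER_RESEARCHER_MONTH)
print(f"Estimated cost: ${total:,.2f}")
```

Under that assumption, 1.25 researcher-months at $2,400 each comes to $3,000.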
How this paper will help reduce existential risk:
- The paper will explore the safety and feasibility of alternative strategies for reducing AI risk.
- If Friendly AI is very difficult relative to AGI, and global regulation of AI is infeasible, then strategies such as these may be the most productive way to reduce catastrophic AI risks.
- If ideas in the paper, or work inspired or caused by it, are of sufficient quality, they may lead future AI researchers to implement improved safety measures.
- Analysis of limited, suboptimal solutions can help to avoid the impression of "making the perfect the enemy of the good," and place ideal solutions in context.
- A comprehensive discussion of both advantages and disadvantages of these proposals can also draw attention to their limits, and the risk of their failure.
- A call for further investigation into the feasibility of risk-reducing measures such as these may help to inspire research efforts in this area, which may be easier for academics to contribute to today than the development of a theory of Friendly AI.
Human capital benefits, or network benefits (Will writing this paper help new Visiting Fellows become familiar with key research domains? Will it help create relationships with outside co-authors? Will it give folks interested in existential risk entry into new communities where valuable contacts may be found?):
- Visiting Fellows working on this paper will become more familiar with the issues surrounding a key domain of AI risk. The paper may also create relationships with researchers interested in the problem of AI safety in "softer takeoff" scenarios.