Creating Friendly AI is ©2001 by Singularity Institute for Artificial Intelligence, Inc.  All rights reserved.

Next: Interlude: Why structure matters
Up: 3: Design of Friendship systems
Prev: 3.3: Seed AI goal systems


3.4: Friendship structure

What distinguishes a structurally Friendly goal system from a generic goal system, or a seed AI goal system, is the ability to overcome mistakes made by programmers.

A generic goal system can overcome mistakes in subgoals by improving knowledge.  The subgoals of a normative generic goal system are, necessarily, coherent.  There are few or no degrees of freedom in subgoal content; the programmer cannot make arbitrary (perseverant) changes to knowledge, and therefore cannot make arbitrary direct changes to subgoals.  One might (or might not) be able to manipulate the subgoals by manipulating the supergoals, but it would definitely be impossible to manipulate the subgoals in isolation.
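As a loose illustration of why subgoals have no independent degrees of freedom, the Python sketch below treats subgoals as a value recomputed from supergoals and knowledge rather than stored. The class name `GenericGoalSystem`, the action-to-outcome dictionary, and the room-heating example are all hypothetical simplifications, not a design taken from this document; the only point is that correcting a belief corrects the subgoal, while the subgoal itself cannot be edited directly.

```python
from dataclasses import dataclass, field


@dataclass
class GenericGoalSystem:
    """Hypothetical toy model: subgoals are derived, never stored."""

    supergoals: set[str]
    # Hypothetical knowledge base: action -> outcome the action is believed to cause.
    knowledge: dict[str, str] = field(default_factory=dict)

    @property
    def subgoals(self) -> set[str]:
        # Recomputed on every access from supergoals plus knowledge; there is
        # no separate copy a programmer could change in isolation, and no setter.
        return {
            action
            for action, outcome in self.knowledge.items()
            if outcome in self.supergoals
        }


if __name__ == "__main__":
    # Start with a mistaken belief about what warms the room.
    gs = GenericGoalSystem({"room is warm"}, {"open window": "room is warm"})
    print(gs.subgoals)  # {'open window'}

    # Improving knowledge is the only lever that changes the subgoals.
    gs.knowledge["open window"] = "room is cold"
    gs.knowledge["turn on heater"] = "room is warm"
    print(gs.subgoals)  # {'turn on heater'}

    try:
        gs.subgoals = {"open window"}  # attempted direct edit of a subgoal
    except AttributeError:
        print("subgoals cannot be set directly")
```

Correcting the mistaken belief about the window changes the subgoals without any direct edit, and assigning to `gs.subgoals` raises `AttributeError` because the property has no setter.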

A seed AI goal system can overcome errors in source code, or at least those errors that don't affect what the system reflectively believes to be its own function.  A normative seed AI has subgoals and source code that are, necessarily, coherent.  There are few or no degrees of freedom; the programmer cannot directly make arbitrary, perseverant, isolated changes to code.  Since the AI can continually improve and rewrite the implementation, the programmer builds an implementation that grows into the implementation it should become.  If implementation is to function as subgoal is to supergoal, then a seed AI's implementation is as objective, or at least as convergent, as the factual beliefs of a freely learning intelligence.  A self-modifying AI's implementation has little or no sensitivity to initial conditions.
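The claim about insensitivity to initial conditions can be pictured with a toy analogy. The Python sketch below models each round of self-improvement as one application of a fixed update rule that is a contraction; it assumes, purely for illustration, that the rule is `math.cos`, whose fixed point (about 0.739085) is not written anywhere in the code but emerges from the rule itself. The functions `improve` and `grow` are hypothetical names, and this is an analogy for convergence, not a model of how a seed AI actually rewrites its code.

```python
import math


def improve(x: float) -> float:
    # Hypothetical improvement step. cos maps every real into [-1, 1], and on
    # that interval |d/dx cos x| = |sin x| <= sin(1) < 1, so it is a contraction
    # and iterating it converges to a unique fixed point.
    return math.cos(x)


def grow(initial: float, rounds: int = 200) -> float:
    # Apply the improvement step repeatedly, starting from the programmer's
    # initial "implementation" (here reduced to a single number).
    x = initial
    for _ in range(rounds):
        x = improve(x)
    return x


if __name__ == "__main__":
    starts = [-50.0, 0.0, 3.0, 1000.0]
    for s in starts:
        print(f"start {s:>8}: ends at {grow(s):.6f}")
```

All four starting values end at approximately 0.739085: whatever the hypothetical programmer supplies initially, repeated improvement under the same rule lands in the same place.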

A structurally Friendly goal system is one that can overcome errors in supergoal content, goal system structure, and underlying philosophy.  The degrees of freedom of the Friendship programmers shrink to a single, binary decision: will this AI be Friendly, or not?  If that decision is made, then the result is not a Friendly AI but the Friendly AI, regardless of which programming team was historically responsible.  This is not automatic (I think), since some amount of correct Friendliness content is required to find the unique solution, just as a seed AI needs enough working code to think through its own self-improvements, and a general intelligence needs enough of a world-model to successfully discover new knowledge and correct errors using sensory information.

Complete convergence, a perfectly unique solution, is the ideal.  Anyone using "external reference semantics" on an internal basis will have an easy intuitive understanding of what this means: either one has the correct morality and the question is how to convey it to the AI, or one seeks the correct morality and the question is how to build an equally competent or more competent seeker.

In the absence of perfect convergence, the solution must be "sufficiently" convergent, as defined in 3.4.4.1: Requirements for "sufficient" convergence.

Others, of that philosophical faction which regards morality as strictly subjective, may ask how supergoal content can converge at all!  In 3.4.2: Shaper/anchor semantics, we see how human philosophical decisions are made by complex systems of interacting causes that contain many normative, objective, or convergent components.  Not even a trained philosopher who is an absolute devotee of moral relativism can think thoughts, or make philosophical decisions, completely free of testable hypotheses or beliefs about objective facts.

Human philosophers are self-modifying.  We grow into our personal philosophies using a process "contaminated" at every step by cognitive processes with normative versions, by beliefs about external reality, by panhuman characteristics, and other convergent or "sufficiently convergent" affectors.  It's not at all unreasonable to hope for a large degree of convergence in the whole, if all the components of this recursive growth process were to be maximally improved.  (Again, we will later define how much convergence is absolutely required.)

And now I've gone and discussed things out of their proper place.  In this case, the only way to really understand the goal is to examine the process proposed for reaching it.  It's a terribly peculiar kind of philosophical dilemma, one unique, perhaps, to the question of how to build the best possible philosopher!  Only after seeing the specs for the proposed mind can we recognize the AI's processes as an idealized version of our own.
