Creating Friendly AI is ©2001 by Singularity Institute for Artificial Intelligence, Inc.  All rights reserved.



4: Policy implications

There are people who, rather than choosing between changes, try to stop change entirely.  Rather than considering which technologies to develop first, they flinch away from the very idea of the technology.  As an emotional consequence of that flinch, it becomes necessary for them to believe that the technology can be stopped entirely.

Society never goes backward in time.  Moving forward in time isn't necessarily the same thing as progress.  Not all changes are for the good - though most are - and we must sometimes choose between changes.  But before you can do that, you need to accept that something will change; that, whether you like it or not, society will never go twenty years back in time, or even stand still.  Whatever solution you propose must be a way to move forward; not to stop, or go back.

4.1: Comparative analyses

"Nor let him ever believe that a state can always make safe choices; on the contrary, let him think that he must make only doubtful ones; because this is in the order of things, that one never tries to avoid one inconvenience without incurring another; but prudence consists of knowing how to recognize the kinds of inconveniences, and to take the least sad for good."
           -- Niccolo Machiavelli, "The Prince"

AI is what Nick Bostrom calls an existential risk:  "One where an adverse outcome would either annihilate Earth-originating intelligent life or permanently and drastically curtail its potential."  In particular, most forms of unFriendly AI would constitute a "Bang" - "Earth-originating intelligent life goes extinct in relatively sudden disaster resulting from either an accident or a deliberate act of destruction."  Within Nick Bostrom's list of Bangs, sorted by probability, "badly programmed superintelligence" is number four out of eleven.

One of the greatest sources of danger, as we enter into the future, is that many people are not used to thinking about existential risks - would have difficulty naming two or three existential risks, much less eleven - and hence may not be emotionally equipped to deal with the concept.  In the state of mind where all existential risks are equally unacceptable, a 95% chance of taking a 90% existential risk and an alleged 5% chance of avoiding existential risk entirely may be perceived as being better than a 100% chance of taking a 20% existential risk.  Worse, an entirely implausible method for avoiding all existential risk may be amplified by wishful thinking into looking like a real chance of success.
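
The comparison in the paragraph above can be checked with a few lines of arithmetic.  The following is only a minimal sketch: the probabilities are the ones quoted in the text, while the function name `total_risk` and the labels `flinch_policy` and `accept_policy` are hypothetical names introduced here for illustration.

```python
def total_risk(branches):
    """Overall probability of catastrophe for one policy.

    `branches` is a list of (probability_of_branch, risk_within_branch) pairs.
    The branch probabilities are expected to sum to 1.
    """
    total_probability = sum(p for p, _ in branches)
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError("branch probabilities must sum to 1")
    return sum(p * risk for p, risk in branches)


# Figures quoted in the text; the policy names are illustrative labels only.
flinch_policy = [(0.95, 0.90), (0.05, 0.0)]  # 95% chance of a 90% risk, 5% chance of no risk
accept_policy = [(1.00, 0.20)]               # 100% chance of a 20% risk

flinch_risk = total_risk(flinch_policy)
accept_risk = total_risk(accept_policy)

print(f"flinch away from the risk: {flinch_risk:.3f}")
print(f"accept the smaller risk:   {accept_risk:.3f}")
```

Under these figures the policy that appears to offer an escape carries an overall risk of about 85%, more than four times the 20% risk of the policy that offers no escape at all.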

If we blindly panic, if we run screaming in the opposite direction whenever we encounter an existential risk, we may run smack into a much larger and more dangerous existential risk.

4.1.1: FAI relative to other technologies

Artificial Intelligence, as an ultratechnology, does not exist in isolation.  There are other kinds of advancing technologies; nanotechnology, biotechnology, and nuclear technology, for example.  Artificial Intelligence is unique among the ultratechnologies in that it can be given a conscience, and in that successful development of Friendly AI will assist us in handling any future problems.  A Sysop Scenario would obviate problems entirely, but even in the absence of a Sysop, the presence of trustworthy, reliable nonhuman altruists (open-source altruists?) would calm down the world considerably, and cut otherwise uncuttable Gordian knots.

In a "~human" scenario ("near human", "approximately human-equivalent"), Friendly AIs would play ~human roles in the existing human economy or society.  To the extent that Friendly AIs have power in the world economy, in human society, or in technological development, they can exert direct influence for good.  (1).  For example, a Friendly AI working in nanotechnology can enthusiastically work on Immunity systems while flatly refusing to develop nanotechnological weaponry.  (2).

I have argued elsewhere that AI is intrinsically safer than nanotechnology; but that is logically beside the point.  What matters is that success in safely developing AI reduces the risk of nanotechnology more than success in safely developing nanotechnology reduces the risk of AI.  Friendly AI is thus the best challenge to confront first.

4.1.2: FAI relative to computing power

The computing power in a single chip is currently (Mar 2001) around a billion operations per second, doubling every couple of years or so.  The computing power in parallel processing machines increases faster than that, as does the power of the largest supercomputer (IBM has announced plans to build a petaflops machine, Blue Gene, in 2005).  How many computers are connected to the Internet?  A billion?  Nobody actually knows.  Look around online and you'll see that the last time anyone even tried to estimate it was in 1995.  All anyone knows is that, whatever the number is, it keeps going up.

It can safely be assumed that available computing power increases with time.  Any risks or opportunities that increase with increasing computing power will increase with time; any risks, opportunities, or probabilities that decrease with increasing computing power will decrease with time.  The total processing power available to an average research project will increase faster than chip clock speeds (i.e., maximum parallel speeds increase faster than maximum serial speeds).  The total networked processing power on the planet will increase even faster than that; a doubling time of nine months is probably an underestimate.
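
To give a rough feel for how far apart these growth rates drift, the sketch below compares two doubling times over the same period.  It rests on assumptions not made in the text: "every couple of years" is read as exactly two years, the nine-month figure is taken at face value, and the six-year horizon is arbitrary.  The function name `growth_factor` is a hypothetical helper.

```python
def growth_factor(years, doubling_time_years):
    """Multiplicative growth after `years`, given a fixed doubling time."""
    return 2.0 ** (years / doubling_time_years)


HORIZON_YEARS = 6            # arbitrary illustrative horizon (assumption)
CHIP_DOUBLING_YEARS = 2.0    # "every couple of years", read as exactly two (assumption)
NETWORK_DOUBLING_YEARS = 0.75  # nine months, as quoted in the text

chip_growth = growth_factor(HORIZON_YEARS, CHIP_DOUBLING_YEARS)
network_growth = growth_factor(HORIZON_YEARS, NETWORK_DOUBLING_YEARS)

print(f"single chip after {HORIZON_YEARS} years:       x{chip_growth:.0f}")
print(f"networked power after {HORIZON_YEARS} years:   x{network_growth:.0f}")
```

With these inputs a single chip gains a factor of 8 while total networked power gains a factor of 256, which is why any risk tied to aggregate computing power grows much faster than chip speeds alone would suggest.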

How much computing power does it take to build an AI?  Nobody knows, and it probably depends on how smart the researchers are.  Turning that around, we can say that how smart the researchers need to be depends on how much computing power is available; increasing the amount of available computing power decreases the difficulty of the problem.  Usually, the Singularity is visualized as an ascending curve describing computing power; at some point, the curve crosses the constant line that is the power of a human brain, and AI is created.  I visualize a line representing the intelligence needed to create AI, currently far above human levels, but slowly descending; beneath that line, a series of peaks representing the best of the current pack of AI projects.  At some point, the descending line touches the topmost peak, and AI is created.
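
The descending-line picture can be turned into a toy model.  Everything numeric below is invented for illustration: the units of "intelligence" are arbitrary, the assumption that the requirement falls by a fixed fraction each year is a simplification chosen here, and `first_feasible_year` is a hypothetical name.  The sketch only shows the shape of the argument, not a forecast.

```python
def first_feasible_year(initial_threshold, decline_per_year, project_abilities,
                        max_years=100):
    """Return the first year in which the descending line touches the topmost peak.

    initial_threshold: intelligence currently needed to create AI (arbitrary units).
    decline_per_year:  fraction by which that requirement falls each year as
                       computing power grows (a simplifying assumption).
    project_abilities: abilities of the current AI projects, in the same units;
                       held constant over time in this toy model.
    Returns None if the line never reaches the best project within max_years.
    """
    best_project = max(project_abilities)
    threshold = initial_threshold
    for year in range(max_years + 1):
        if threshold <= best_project:
            return year
        threshold *= (1.0 - decline_per_year)
    return None


# Hypothetical numbers: requirement starts at 100, best project sits at 50.
projects = [20.0, 35.0, 50.0]
slow = first_feasible_year(100.0, 0.10, projects)
fast = first_feasible_year(100.0, 0.20, projects)
print(f"10% decline per year: AI becomes feasible in year {slow}")
print(f"20% decline per year: AI becomes feasible in year {fast}")
```

In this toy model a faster decline in the requirement, or a stronger best project, both bring the crossing point closer; which of the two dominates in reality is exactly what the text says nobody knows.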

Does the difficulty of making an AI Friendly decrease with increasing computing power?  Not obviously so; if the problem of building AI in the first place is assumed to have been solved, then building a Friendly AI is a problem in architecture and content creation and depth of understanding, not raw computing power.  Thus, increasing computing power decreases the difficulty of building AI relative to the difficulty of building Friendly AI.  Anyone who can build an AI that runs on a PIII is vastly smarter than I am and hopefully knows far more than I do about Friendly AI.  At our current level of computing power, the genius required for AI exceeds the genius required for Friendliness.  The same hopefully holds true at that point where AI first becomes just barely human-feasible.

Even so, increasing computing power will eventually decrease the genius required for AI to significantly below the genius required for Friendliness.  If - at this point - smarter researchers still have a speed advantage, then humanity will be safer, though not safe.  If researcher intelligence is relatively insignificant compared to funding disparities, then humanity's safety will rely on how widely a workable theory of Friendliness is disseminated and accepted within the AI community.  In either case, the potential will exist to screw up really big-time.

There are five relevant dynamics:

What are the intrinsic effects of increased computing power on the development of Friendly AI?  All else being equal, an increased absolute level of computing power is likely to translate into a shorter total development time for the AI project as a whole - fewer years to get to the point of hard takeoff, and a faster maximum hard takeoff speed when that point is reached.  Increased computing power may also translate into less absolute intelligence on the part of the seed AI when the first point of potential hard takeoff is reached.  Thus, delaying FAI research will not make it any easier to develop Friendliness.

Given vast amounts of computing power, an "AI researcher" may no longer be required to create an Artificial Intelligence; any good hacker may be able to do it.  Given supersaturated amounts of computing power - i.e., a single parallel computer with a thousand times as much power as a human brain, or a global network with a million times as much power as a human brain - the intelligence required to create AI may drop to the point where accidental creation of AI becomes possible.

4.1.3: FAI relative to unFriendly AI

Friendly AI may require great depth of understanding, even relative to the depth of understanding needed to create AI.  However, the amount of programmer effort required to implement a Friendly architecture and provide Friendship content should be small relative to the amount of effort needed to create AI; that is the prediction of this paper's theory, anyway.  I would expect even a go-for-broke Singularity project to expend fewer programmer-hours on Friendship than on the rest of the AI by at least an order of magnitude; there isn't that much to be done, relative to everything that needs to be done to create a general intelligence.

Thus, a non-Friendly-AI project should not have a huge advantage relative to an AI project "burdened" with the responsibility of Friendly AI.  Any possible advantage in speed would probably be insignificant next to variances in speed created by differences in funding or researcher intelligence.

However, this only holds true if computing power is sufficient for AI but not supersaturated for AI.  Given enough computing power, methods such as directed evolution become as powerful as, or more powerful than, intelligent design.  Friendliness is intrinsically harder - not too much harder, but still harder - with directed evolution; furthermore, with supersaturated computing power, directed evolution can proceed fast enough that Friendship content becomes the dominant sink for programmer effort.  In this case, one would simply have to hope that all fairly-competent AI projects happened to subscribe to Friendliness; if one project "cheated", that project would succeed ahead of the others.  Under those circumstances, humanity may make it through okay if there are ten competing first-rank AI projects - but not a hundred.

If a controlled ascent is used, it should be a fast controlled ascent - fast enough not to give a significant speed advantage to a non-Friendliness-aware AI project.  The shorter the lead over other projects, the faster the controlled ascent needs to be; that is, the amount of time expended on controlled ascent needs to be small relative to the variance in speed between AI projects.

Given a social or academic atmosphere in which all AI research is held to be "irresponsible" - or outlawed - it is far more likely that the most advanced AI project, at any given point in time, will belong to a fringe group, rogue state, or some other social faction that cares significantly less about Friendliness.

4.1.4: FAI relative to social awareness

To the extent that the public, or a significant proportion of the public, is aware of AI and approves of AI, this will tend to speed AI relative to other ultratechnologies.  This would represent an advantage insofar as it is desirable that AI come first.  Public approval may also help nonprofit research projects catch up, in terms of funding, relative to commercial projects - philanthropic funding is more dependent on public approval.  This would represent an advantage only to the degree that nonprofit projects tend to be more Friendliness-aware, or spend more effort on Friendliness, relative to commercial projects.

To the extent that the academic community is aware of Friendly AI and approves of Friendly AI, it will make it more likely that any given research project is Friendliness-aware.  (I'd expect - and hope - that the projects closest to a hard takeoff will be the ones intentionally trying for a hard takeoff, and that the projects trying for a hard takeoff will tend to be more Friendliness-aware.  Thus, this factor becomes more important the more projects are roughly equal in the leading rank; it then becomes more necessary that Friendly-AI-awareness be a property of the AI community at large.)  However, see below (4.2: Policies and effects) about possible negative effects if the academic community becomes fixated on one theory.

To the extent that the public, or a significant proportion of the public, is aware of AI and disapproves of AI, it will tend to slow down AI (and possibly other technologies as well, if the disapproval is part of a general antitechnology movement).  However, the advance of computing power will probably not be slowed, or will be slowed only slightly.  Public disapproval of AI, in general, is likely to hamper awareness of Friendly AI ("tarred with the same brush").  The most probable cause for public disapproval of AI is a technophobic panic reaction; this strongly advances the probability that unworkable policies (see below) will be proposed, politically approved, and implemented.  About the most you can say for this scenario is that it might hamper non-Friendliness-aware projects more than Friendliness-aware projects - unless the antitechnology opposition becomes fixated on an unworkable theory of Friendliness, one which leads into the "Adversarial Swamp".

Finally, to the extent that a given group that needs to understand Friendly AI is influenced by ambient memes, the spread of a willingness to accept the possibility of independent AIs would lead to a greater ability to accept CFAI principles such as "If the AI stops wanting to be Friendly, you've already lost," self-modifying AI, and so on.  The spread of an "us vs. them" attitude towards AIs would resonate with what Creating Friendly AI calls the "adversarial attitude", selectively promote fears of those negative possibilities that would be most likely if AIs had humanlike psychologies (which they don't), and in general, make it more difficult for a given listener to achieve a nontechnical understanding of Friendly AI or a technical understanding of the CFAI design principles.  An antitechnology advocacy group that understood the concept of anthropomorphism and was careful to emphasize only realistic negative possibilities would not have such a negative effect - their fallout would be limited strictly to differential funding and so on - but I think it spectacularly unlikely that such an advocacy group will exist.

4.1.5: Conclusions from comparative analysis

It will be best if Friendly AI is created shortly after the first point where AI becomes computationally feasible.  The intrinsic dynamics of Friendly AI argue against slowing down Friendly AI relative to progress in computation.  The safety of Friendly AI relative to other technologies argues against slowing down progress in computation relative to progress in other technologies.  Finally, it is my opinion that public misinformation has a good chance of peaking at the worst possible time.

4.2: Policies and effects

4.2.1: Regulation (-)

No, I don't think that the text of "Creating Friendly AI" should be sent to Congress and passed into law.  The existing force tending to ensure Friendliness is that the most advanced projects will have the brightest AI researchers, who are most likely to be able to handle the problems of Friendly AI.  Turning the problem over to a committee (or to Congress) would end up enforcing whatever guidelines the committee thought were most plausible.

Almost all existing discussion has been phrased in terms of "Asimov Laws", "restraining AIs", "controlling AIs" - in general, what Creating Friendly AI calls the adversarial attitude.  I have heard a lot of proposals for making some aspect of AI design mandatory, and without exception, it's always some feature that's supposed to be "unbreakable" or "nontamperable" or "absolute" or "nonremovable" or whatever.  I furthermore know that, as human beings, anyone who makes or hears such a proposal will get a psychological boost from this absolutism, for the reasons discussed in "Ethical injunctions".

And that's only the beginning of the psychologically appealing fallacies.  Group opposition, "them and us" emotions; turning subgoals into supergoals to make them more "absolute"; stripping away the shaper/anchor semantics or the causal validity semantics or even the seed AI's coverage of the goal system because it's a "loophole"... in fact, I expect that almost all the features described in Creating Friendly AI, from a seed AI's self-modification, to the external reference semantics interpreting programmer statements as sensory data, would be cognitively processed as a "loophole" by someone thinking in terms of clamping down on an AI's "native" desire for dominance and so on, rather than CFAI's "Observer-biased beliefs evolve in imperfectly deceptive social organisms" and "If the AI stops wanting to be Friendly, you've already lost."

It is not impossible that the current dominance of anthropomorphism is simply due to the absence of nonanthropomorphic analysis, and that, now that Creating Friendly AI has been published, it will spread like wildfire and all the anthropomorphisms will simply melt away in the sun.  If so, I will be pleasantly surprised.  Very surprised, because anthropomorphism and technophobia have defeated hard numbers in far less ambiguous engineering questions than this.  It is not impossible that Congress can be given an excellent grasp of FAI theory, but I'd like to see that happen first, before making plans relying on that understanding.  The same goes for academia and proposals to have a review board composed of prestigious (but elderly) scientists.

It is not necessary - as a condition for a review board having any benefit at all - that a review board understand FAI theory better than the best researchers, or even that the review board understand FAI theory better than the average researchers.  What is necessary is that the front rank contain so many leading projects that at least one of them is even less competent than the review board; the review board would then provide a benefit in that particular instance.  The question is how much chaos would be caused by the review board enforcing their ideas on all the other projects.  I'd rather trust the selection process whereby the smartest researchers have the most advanced projects than pin my hopes on a committee of "elderly but distinguished scientists"; if convening a committee wouldn't work to solve the problem of AI, why would it work to solve the problem of Friendliness?

Trying to elevate any one theory would also be poisonous to continued progress in Friendly AI.  I'm not saying that Creating Friendly AI is inadequate, but I would expect to improve it even further with time and experience.  Injecting politics into the process would tend to intrinsically slow that down, as the free exchange of ideas and fair combat of opinions were replaced by the exercise of political influence.  This is in addition to the unleashing of faster-propagating anthropomorphic memes.

Even if the current version of Creating Friendly AI were optimal or near-optimal, and leaving the question of political ethics aside, I don't see any good way that Creating Friendly AI could be enforced.  The Foresight Guidelines on nanotechnology make recommendations such as "Any self-replicating device which has sufficient onboard information to describe its own manufacture should encrypt it such that any replication error will randomize its blueprint."  It is relatively easy to verify that this design principle has been implemented.  It would be possible, though more difficult, to verify that a Friendly AI project uses external reference semantics, causal validity semantics, anchoring points, and so on.  But how would you go about verifying that unity of will has been maintained, or that the Friendliness programmers have avoided rationalization in their programmer affirmations (so as to prevent a philosophical crisis)?

Experience has shown that surface correspondence of features means nothing in AI.  The field is replete with hyped-up AIs that use "the same parallel neural architecture as the human brain" and turn out to (a) use the same parallel neural architecture as an earthworm's brain, or (b) use neurons simplified down to the level where they couldn't even compete with an earthworm.  Trying to legally enforce Friendliness, even if the theory is right, would be like passing laws requiring programmers to write modular code.  A programmer who understands modular code will try to write modular code, and can learn from books and teachers about how to write more modular code; if someone just doesn't get the concept, the most you can do is force them to write code that looks modular, but probably isn't.  I do not believe it is possible to write a law such that obedience to the law is objectively verifiable in court and such that obedience to the law guarantees that a Friendly AI will be produced.

Insofar as "pressure to conform with Friendliness" acts as a positive force at all - that is, insofar as the people delivering the pressure have a workable theory of Friendliness - the pressure will need to be delivered in a social context where people can make free judgements about how well a Friendly AI project is succeeding.  A list of design features, test problems, and performance metrics may be a valuable tool for making these judgements, but it can't be the sole tool.  Thus, informal pressures are strongly preferable to formal requirements.

Finally, of course, there's the ethical principle that "I have a bright idea, so give me power over others" doesn't tend to work very well as a social process.  It stops working as soon as someone else tries to say the same thing.

In conclusion, the most plausible-sounding "regulations", and also the only ones that could practically be enforced, are anthropomorphic, adversarial requirements such as "nonoverridable Asimov Laws".  If an effort to get Congress to enforce any set of regulations were launched, I would expect the final set of regulations adopted to be completely unworkable.

4.2.2: Relinquishment (-)

The policy of "relinquishment" has the stated aim of preventing a technology from happening.  For noncatalytic technologies like cloning, GM foods, and so on, the goal of relinquishment is to prevent the technologies from becoming mainstream enough to have a significant effect on society.  For catalytic technologies like AI and nanotech, the goal of relinquishment is to prevent the technology from ever once being developed by any party.

I do not see any possible means whereby relinquishment could be achieved in the case of a catalytic technology.

Bill Joy, in the original article advocating relinquishment, continually uses the plural:  "We" must relinquish these technologies.  To say "we", in this instance, is to postulate a degree of social unity that humanity simply does not possess.  If "we" could decide not to develop AI, "we" could decide to end world hunger or make all nations liberal democracies, and a lot more easily, too.  At most, the US, or even the US and most of the other nations of the world, might implement an overt policy against AI.  Meanwhile Moore's Law would keep ticking and available computing power would continue to increase, bringing AI into the reach first of noncompliant nations, then of noncompliant factions, then of a lone hacker in a basement, and finally within the range of completely accidental, emergent intelligence.  As for the idea of halting Moore's Law, before someone claims to be able to implement a planetwide shift away from computing technology, I would first like to see them implement - just as a test case - planetwide relinquishment of nuclear weapons, which have few or no civilian uses.  Or perhaps conversion of the UN to a planetwide liberal democracy.

In other words, I simply do not believe the claim that relinquishment is possible.  Relinquishment with respect to technology development makes sense only if having some "evil" technology be developed a few years later is worth the fact that the "evil" technology will be first developed by a noncompliant state or faction.  Relinquishment of technology deployment makes sense if having the technology be rarely used within the liberal democracies is more important than who else is using it, which is at least vaguely plausible when the antitechnology opposition talks about relinquishing cloning or GM foods - already researchers are claiming that they will launch projects to clone a human being, but it's possible that a sufficient backlash could prevent cloning from ever becoming mainstream; could keep cloning a highly secretive, expensive proposition conducted in Third World nations.  But relinquishment of invention of a catalytic technology postulates a capability that "we" simply do not possess.

Given that relinquishment is implausible to the point of impossibility, two questions remain.  First, why does relinquishment keep getting proposed?  Second, what would the effects be of some faction urging relinquishment, or a society attempting to implement relinquishment, but without success?

The emotional appeal of relinquishment is simple:  If you tell someone about a threat, their emotional reaction is to run like blazes the other way, even if it kills them, rather than make a 10-degree course change to take the path of least comparative risk.  The emotional need to do something, anything, even if it's the wrong thing, should not be underestimated.  To the human mind, there is no such thing as a necessary risk.  For any process in which a risk is perceived, there will exist undischarged nervous tension until the human feels himself to be in control of the process; manipulating it in some way, any way, even if that just makes it worse, so long as the tension is discharged.  The will to meddle is a very powerful force.  The human mind will also flinch away from unpleasant possibilities.  Not "try to eliminate", but "flinch away from" - which eliminates any possibility of accepting a nonzero probability of failure, or planning for it.  Relinquishment, as a proposed policy, satisfies both the will to meddle and the need to flinch away.

The psychology of relinquishment makes it very unlikely that advocates of relinquishment will accept, or plan for, the possibility that AI will happen despite them.  Thus, advocates of relinquishment are not likely to notice if the results of their actions are to promote unFriendly AI relative to Friendly AI, or slow down Friendly AI relative to computing power, or slow down AI relative to nanotechnology.  Nor are they likely to care.

The effect of an antitechnology group that was memetically effective but which did not succeed in changing the legal environment would be as listed under 4.1.4: FAI relative to social awareness - the spread of the idea that AI is evil would probably interfere with the efforts of the Singularity Institute or other institutions to propagate the emerging theory of Friendly AI through the academic and commercial communities, and would also interfere with any efforts to spread a nontechnical understanding of the psychology of AI.  It is theoretically possible that an antitechnology group could avoid this fallout by being psychologically correct in their discussion of AI and demonizing AI researchers rather than AIs, while also being careful to demonize non-Friendly-AI projects just a little more than Friendliness-aware projects.  This would still slow down Friendly AI relative to computing power, relative to other ultratechnologies, and so on, but at least the emerging theory of Friendly AI wouldn't be crippled.  However, I think that wishing for an antitechnology advocacy group with that much scientific knowledge and emotional maturity verges on fantasy.

The effect of an antitechnology group that gained enough political power to start banning AI would be to completely cripple the academic dissemination of Friendly AI, prevent the formation of those social structures that might reinforce Friendly AI (see below), take all AI projects out of the public view, and hand the advantage to noncompliant nations or factions which are less likely to be Friendliness-aware (though the possibility still exists).  It would also considerably slow down AI relative to computing power, since Moore's Law is likely to continue unchecked, or at least to be far less slowed than AI.

Finally, I feel strongly that some of the tactics resorted to by advocates of relinquishment are unethical, are destructive, promote blind hatred rather than the stated goal of informed hatred, and so on.  These tactics are not intrinsically part of advocating relinquishment, and it would be unfair of me to imply that they were, but I nonetheless expect that the fallout from such tactics will also be part of the damage inflicted by any further acceleration of the antitechnology movement.

4.2.3: Selective support (+)

When attempting to influence the relative rates of technological development, or the relative rates of any social processes, the best method is almost always to pick the technology you like, then try to accelerate it.

Selective support is a globally viable strategy.  The more people that agree with your selection, the larger the total support.  Unlike regulation and relinquishment, however, it isn't necessary to gain a threshold level of political power before the first benefits materialize.

This is essentially the strategy followed by the Singularity Institute for Artificial Intelligence and our supporters.  We feel that Friendly AI should arrive in advance of other technologies, so we've launched a Friendly AI project.  We feel that projects should be Friendliness-aware, so we attempt to evangelize other projects to adopt Friendly AI.  We feel that the theory embodied in Creating Friendly AI is superior to the existing body of discourse, so we attempt to spread the paper, and the ideas, as widely as possible.

4.3: Recommendations

To postulate that we can relinquish AI is to postulate a capability that "we" do not possess.  Efforts to advocate relinquishment, or failed efforts to implement relinquishment, will have net negative effects on the relative rates of technological processes and on understanding of Friendly AI.  This holds especially true if, as seems likely, relinquishment advocates do not accept or plan for the possibility of failure.

Regulation would probably selectively embrace unworkable theories of Friendly AI, to a far more negative degree than the free choice of the researchers on the AI projects closest to a hard takeoff at any given point.  Regulation would also negatively impact the free and scientific development of Friendly AI theory.

The Singularity Institute was created in the belief that Friendly AI is the best technological challenge to confront first; we implement this belief through our own Friendly AI project, by trying to advance the scientific theory of Friendly AI, and by trying to evangelize other AI projects - in short, we selectively support Friendly AI to accelerate it, both in absolute terms and relative to other important processes.  We believe that selective support of Friendly AI (either through independent efforts, or through support of the Singularity Institute) is the most effective way for any other interested parties to affect the outcome of the issue.


