Creating Friendly AI is ©2001 by Singularity Institute for Artificial Intelligence, Inc. All rights reserved.
A.2: Complete Table of Contents
Preface
INIT
1: Challenges of Friendly AI
1.1: Envisioning perfection
1.2: Assumptions "conservative" for Friendly AI
1.3: Seed AI and the Singularity
1.4: Content, acquisition, and structure
An Introduction to Goal Systems
Interlude: The story of a blob
2: Beyond anthropomorphism
2.1: Reinventing retaliation
2.2: Selfishness is an evolved trait
2.2.1: Pain and pleasure
2.2.1.1: FoF: Wireheading 1
2.2.2: Anthropomorphic capitalism
2.2.3: Mutual friendship
2.2.4: A final note on selfishness
2.3: Observer-biased beliefs evolve in imperfectly deceptive social organisms
2.4: Anthropomorphic political rebellion is absurdity
Interlude: Movie cliches about AIs
2.5: Review of the AI Advantage
Interlude: Beyond the adversarial attitude
3: Design of Friendship systems
3.1: Cleanly Friendly goal systems
3.1.1: Cleanly causal goal systems
3.1.2: Friendliness-derived operating behaviors
3.1.3: Programmer affirmations
3.1.3.1: Bayesian sensory binding
3.1.3.2: Bayesian affirmation
3.1.3.3: An unfortunate circularity
3.1.3.4: Absorbing affirmations into the system
3.1.3.5: Programmer affirmations must be honest!
3.1.4: Bayesian reinforcement
3.1.4.1: Interesting behaviors arising from Bayesian reinforcement
3.1.4.2: Perseverant affirmation (of curiosity, injunctions, et cetera)
3.1.5: Cleanliness is an advantage
3.2: Generic goal systems
3.2.1: Generic goal system functionality
3.2.2: Layered mistake detection
3.2.2.1: FoF: Autonomic blindness
3.2.3: FoF: Non-malicious mistake
3.2.4: Injunctions
3.2.4.1: Anthropomorphic injunctions
3.2.4.2: Adversarial injunctions
3.2.4.3: AI injunctions
3.2.5: Ethical injunctions
3.2.5.1: Anthropomorphic ethical injunctions
3.2.5.2: AI ethical injunctions
3.2.6: FoF: Subgoal stomp
3.2.7: Emergent phenomena in generic goal systems
3.2.7.1: Convergent subgoals
3.2.7.2: Habituation
3.2.7.3: Anthropomorphic satisfaction
3.3: Seed AI goal systems
3.3.1: Equivalence of self and self-image
3.3.2: Coherence and consistency through self-production
3.3.2.1: Look-ahead: Coherent supergoals
3.3.3: Unity of will
3.3.3.1: Cooperative safeguards
3.3.3.2: Maintaining trust
3.3.4: Wisdom tournaments
3.3.4.1: Wisdom tournament structure
3.3.5: FoF: Wireheading 2
3.3.6: Directed evolution in goal systems
3.3.6.1: Anthropomorphic evolution
3.3.6.2: Evolution and Friendliness
3.3.6.3: Conclusion: Evolution is not safe
3.3.7: FAI hardware: The flight recorder
3.4: Friendship structure
Interlude: Why structure matters
3.4.1: External reference semantics
3.4.1.1: Probabilistic supergoal content
3.4.1.2: Bayesian affirmed supergoal content
3.4.1.3: Semantics of external objects and external referents
3.4.1.3.1: Clean external reference semantics
3.4.1.3.2: Flexibility of conclusions about Friendliness
3.4.1.4: Deriving desirability from supergoal content uncertainty
Interlude: Philosophical crises
Crisis of Bayesian affirmation
3.4.2: Shaper/anchor semantics
3.4.2.1: "Travel AI": Convergence begins to dawn
3.4.2.2: Some forces that shape Friendliness: Moral symmetry, semantics of objectivity
3.4.2.3: Beyond rationalization
3.4.2.4: Shapers of philosophies
3.4.2.4.1: SAS: Correction of programmer errors
3.4.2.4.2: SAS: Programmer-independence
3.4.2.4.3: SAS: Grounding for external reference semantics
3.4.2.5: Anchors
3.4.2.5.1: Positive anchors
3.4.2.5.2: Negative anchors
3.4.2.5.3: Anchor abuse
3.4.2.6: Useful shaper content requires high intelligence
3.4.3: Causal validity semantics
3.4.3.1: Taking the physicalist perspective on Friendly AI
3.4.3.2: Causal rewrites and extraneous causes
3.4.3.3: The rule of derivative validity
3.4.3.4: Truly perfect Friendliness
3.4.3.5: The acausal level
3.4.3.6: Objective morality, moral relativism, and renormalization
3.4.4: The actual definition of Friendliness
3.4.4.1: Requirements for "sufficient" convergence
3.5: Developmental Friendliness
3.5.1: Teaching Friendliness content
3.5.1.1: Trainable differences for causal validity
3.5.2: Commercial Friendliness and research Friendliness
3.5.2.1: When Friendliness becomes necessary
3.5.2.2: Evangelizing Friendliness
3.5.3: Singularity-safing ("In case of Singularity, break glass")
3.5.3.1: The Bayesian Boundary
3.5.3.2: Controlled ascent
3.5.3.2.1: Programmatic controlled ascent via an "improvements counter"
3.5.3.2.2: Controlled ascent as ethical injunction
3.5.3.2.3: Friendship structure for controlled ascent
Interlude: Of Transition Guides and Sysops
The Transition Guide
The Sysop Scenario
4: Policy implications
4.1: Comparative analyses
4.1.1: FAI relative to other technologies
4.1.2: FAI relative to computing power
4.1.3: FAI relative to unFriendly AI
4.1.4: FAI relative to social awareness
4.1.5: Conclusions from comparative analysis
4.2: Policies and effects
4.2.1: Regulation (-)
4.2.2: Relinquishment (-)
4.2.3: Selective support (+)
4.3: Recommendations
5: Miscellaneous
5.1: Relevant literature
END
Appendix A: Friendly AI Guides and References
A.1: Indexed FAQ
A.2: Complete Table of Contents
A.3: Glossary
A.4: Version History