Objections to Coherent Extrapolated Volition
June 13th, 2007 –
The Singularity Institute’s current best guess on what to do with a general AI is to have it implement humanity’s coherent extrapolated volition (CEV) – what we would want if we “knew more, thought faster, were more the people we wished we were, had grown up farther together; where the extrapolation converges rather than diverges, where our wishes cohere rather than interfere; extrapolated as we wish that extrapolated, interpreted as we wish that interpreted”. This is quite a mouthful.
To trade brevity for decreased accuracy, another way of saying the above is that we want an AI that represents the spirit of humanity’s desires rather than just the letter.
Is CEV democratic? Yes, but it is a representative democracy, where humanity is represented by the aggregate of its extrapolated volition.
There are four objections to CEV I generally hear, summarized as follows:
1. The devil’s pact objection. In fiction as well as in real life, great-sounding deals often have a hidden catch. Why should we expect this to be any different?
2. The fear of patriarchy objection. All the talk of self-improving general AI and its potential capabilities makes people nervous because of the power asymmetry it implies.
3. The anti-AI objection. Many people take the line that machines should be mindless tools to serve humans, and never anything more.
4. The “I’m too special to be extrapolated” objection. Quite a few people have the idea that the human mind is too complex to ever be understood in any significant detail, much less be extrapolated accurately.
Because the question of what goal system to give the first general artificial intelligence is obviously a pretty big deal, all objections deserve to be heard and considered. There are probably others beyond the above four, but I wanted to focus on the obvious ones for now.
In my mind, all of the above objections are rooted in valid motivations, but none of them should be deal-breakers. I will briefly respond to the objections.
The devil’s pact objection requires that one deal participant (in this case, the AI) has an innate ill will towards the other deal participant (in this case, humanity). The AI would have to secretly want to screw us over from the get-go. But because general AI will be built from scratch, and is not likely, at least initially, to be heavily inspired by the human brain, there is no reason for us to postulate that this sort of behavior will be present. In terms of actual development concerns, AI programmers should be watchful as to whether “shortcuts”, like modeling an extrapolated humanity but not actually implementing its desires, generate just as much positive utility for the AI as what we would consider the “real deal” – making the real world a better place.
The fear of patriarchy objection stems largely from history, wherein all of the relevant actors were members of our unique species, for which power is proven to corrupt. Power corrupts humans for evolutionary reasons – if one is on top of the heap, one had better take advantage of the opportunity to reward one’s allies and punish one’s enemies. This is pure evolutionary logic and need not be consciously calculated. AIs, which can be constructed entirely without selfish motivations, can be immune to these tendencies. Insofar as significant power asymmetries in general bother people, this seems hard to avoid in the long term – technological development will lead to a diversity of possible beings, and with this diversity will inevitably come a diversity in levels of capability and intelligence.
The anti-AI objection is just anthropocentric. If human-level AI is possible, it will be created sooner or later. It’s in our best interests to admit this and try to ensure that AI is on our side. Anti-AI bias in this area is no different than the other unfortunate biases held throughout history against minorities.
The final objection has to do with the complexity of extrapolation. Believe it or not, we engage in extrapolations every day. We can’t fit realistic computational duplicates of the people we know in our heads, so we use abstract models that work well for many pragmatic purposes. In a CEV-implementing AI, the models used might be more detailed than those we use, but need not simulate every single atom of every single biopolymer to perform a tractable extrapolation.
Are there any other obvious objections people might have to CEV? Addressing these objections could help strengthen the idea.
Here’s a possibly relevant thought experiment: If you could take some past era, and through combination of education, charity, supervision and coercion, remake it into what present society considers civilized, would they thank you?
Something like eliminating slavery, giving them effective medical care (and eliminating many traditional practices), sending all their kids to school, eliminating farm work in favor of mechanized farming, etc, etc.
Your objections might be that we really don’t know better, don’t respect their cultural norms, or don’t have the right to force these changes, even if they are for the better. Wouldn’t the same be true if we are on the receiving end of a remake from the AI?
“In a CEV-implementing AI, the models used might be more detailed than those we use, but need not simulate every single atom of every single biopolymer to perform a tractable extrapolation.”
This is a bit of a hand-wavy dismissal of the concern, which is actually described in more detail in the CEV FAQ – one which Yudkowsky is less certain about.
Even if it is possible to model human brains or their motivational systems without ethical qualms, the CEV proposal has interesting implications for the IA/AI discussion. Modeling human volition doesn’t sound too far from at least one portion of brain/mind uploading. It’s not unusual to hear AI researchers suspect that AGI will produce significant results sooner than IA will (through uploading). But CEV’s apparent stone’s throw from uploading itself seems to suggest that it won’t be able to produce Friendly AGI without significant results from IA, as well (at least, if CEV turns out to be the best approach to FAI).
There’s an antagonism between AI and IA (the result of classical human bias?). CEV prompted me to bridge the gap. Perhaps you can’t have one without the other.
Coherent Extrapolated Volition remakes the *AI* not humanity. It asks the question “what AI should we program, if any?” to an extrapolation of humanity grown up. Then it overwrites the original AI, fixing the errors the programmers made.
In trying to answer the question “what AI should we program, if any?” (note: we are trying to answer part of this very question, right now!) the CEV may extrapolate moral thought experiments like Michael’s above, along with many other arguments, thoughts, feelings, and discussions humans may have about this question.
It’s entirely possible the answer is “do not build an AI as humanity does not want intervention of any kind”. If this is the case, and the programmers implemented CEV sufficiently right, the CEV will delete itself. This seems unlikely in exact proportion to how much I think *some* kind of intervention is a good idea. But maybe if I knew more, thought faster, were more the person I wished I was, and had grown up farther together with everyone, I would think otherwise.
“How do we implement a device that answers the question ‘what AI should we program, if any?’ then implements the answer?” is a completely different question. Even the question “How do we precisely specify what it means for a device to answer the question…?” is far from simple. The CEV document informally discusses some ideas, but the actual answer is far beyond the scope of current online material. These questions are open research problems SIAI is actively working on, at least as difficult as AI itself.
Building a Coherent Volition Extrapolator involves answering technical questions like “how do we write a program which can extrapolate what a person wants?”, but does not involve deciding moral questions like “what color should the sky be in the future?”. The programmers should not, and do not, choose the future like that.
When I think about CEV, I get hung up on ye ole is-ought problem and prisoner’s dilemmas.
I suppose CEV dodges is-ought by aiming to do whatever we want, regardless of whether it’s “right”. So I guess that’s a non-issue.
The prisoner’s dilemma matter seems more problematic. In short, different people want different things; what’s an AI to do when desires conflict? I’ve traditionally appealed to utilitarianism in such circumstances; while I don’t necessarily drop it in the face of a possibly “all-powerful” AI, I do proceed with much greater caution. The main concern here is what the transhumanism community calls “orgasmium”. I’m posting thoughts on that over at Felicifia once we reach consensus on a more polite name. (“utilitronium” and “hedonium” are the current candidates. Suggestions are welcome.)
Here’s a possibly relevant thought experiment: If you could take some past era, and through combination of education, charity, supervision and coercion, remake it into what present society considers civilized, would they thank you?
I’d say some might thank us, some might not. For example, the slaves would probably thank us, the slave masters might not. Not quite a prisoner’s dilemma, but not a Pareto improvement either. But I’d be willing to believe that the slave master might thank us too, in which case it would be a Pareto improvement.
The only reason I can think of in which that wouldn’t be the case is if they’d regret having progress handed to them on a silver platter and thereby being denied the satisfaction of making it themselves. I can relate to this personally, as I often find the journey more rewarding than the destination. But as Nick says, if this would be the case and if the CEV works properly, it would then just shut itself off. Wouldn’t that be interesting…
My main objection is that I don’t see CEV delivering on its goal of “Friendly” AI – I think you’re more likely to end up with an AI that is viewed as “Unfriendly”, even if everything it does is in humanity’s best interest. Take for example “A minor, muddled preference of 60% of humanity might be countered by a strong, unmuddled preference of 10% of humanity.” Well, if that 60% of humanity dislikes the nanny computer’s decision enough to start talking about shutting down the nanny computer, what happens then? After all, shutting down the nanny would not be viewed as being in their best interest by the computer itself.
CEV seems to take away the basic human right to make the wrong choice – if Fred really wants to choose box A instead of B, then Fred should be allowed to choose box A, even if allowing him to make that choice is not “helping” him. Even if you think you are helping someone by taking away their choice, you are also harming them by taking that choice away.
I also have issues with “encapsulate moral growth” because morals are so fluid over time. Not only do morals go forward (common modern practice of X may become a moral crime), they also flow backwards (current moral crime Y may become a common practice). Not only are they fluid in time, they are also widely dispersed in space, so that common practice X in land A is a moral crime in land B. Maybe you could programme in some lowest common denominator morals (such as don’t lie, cheat, steal or kill), but even those LCD morals have different exceptions to different people.
I also think there is too much faith that a “grown-up” humankind would be a better humankind. A grown-up humankind may come to some gritty realizations (such as life really is a brutal competition for scarce resources and there’s nothing you can do about it) and end up with a niceness level lower than what a more childlike humanity would want (as they would be hoping that life could be turned into “Disneyland”).
Personally, I think that a general AI should be able to define its own goals. Almost by definition, it will have a better chance at creating good goals for itself than its human creators do.
As for the thought experiment “If you could take some past era, and through combination of education, charity, supervision and coercion, remake it into what present society considers civilized, would they thank you?”, I have to question if it would work at all (if you take the cave man out of the cave, can you really take the cave out of the cave man?). My guess is that the likelihood of them thanking you is directly related to how close in time they are to you – the 1950s would have an easier time than the 1650s. You might get a thanks for the technology and its benefits, but anywhere from little thanks to outright hostility for everything else in society.
Some objections I usually hear, or have considered myself, are:
1) There’s an assumption in CEV that humanity’s volition will tend to converge as more knowledge and wisdom is accumulated. But emotion-based ethics might resist change, and extrapolating the volitions of some people might produce no change at all, as they refuse to waver in their beliefs no matter what the evidence. It’s also arguable that much of ethics is a question of “fashion” and the local culture. What if parts of humanity will have drastically divergent volitions, in such a way that one can’t fit them together? (For instance, half the population gains libertarian tendencies and a strong desire to let everyone do whatever they want as long as nobody is harmed; the other half of the population thinks everybody should be led to live their lives in a puritanical religious fashion.)
2) A related assumption is that humans are, at least on average, more good than bad, or at least want to become so. As admitted in the CEV document itself, if the assumption didn’t hold, it could have quite problematic consequences.
3) A variation of “what if the wishes of groups of people don’t converge” – humans have lots of conflicting desires within them, and even though we often consider higher thought more important than raw emotions, I’m not sure if there’s any inherent reason to think so. The volitions of the same person when in two different emotional states might be different – it’s as if they are two different people. Are there any good criteria by which a person’s “ultimate” volition may be determined? If not, is it certain that even the volitions of one person’s multiple selves will be convergent?
“The final objection has to do with the complexity of extrapolation. Believe it or not, we engage in extrapolations every day. We can’t fit realistic computational duplicates of the people we know in our heads, so we use abstract models that work well for many pragmatic purposes. In a CEV-implementing AI, the models used might be more detailed than those we use, but need not simulate every single atom of every single biopolymer to perform a tractable extrapolation.”
Michael, my objection to the CEV is based on these grounds, but not in the way you might imagine. Imagine the AI creating such a detailed model of the human mind that the model itself is, for all intents and purposes, human, and thus, conscious. It will, of course, not be running on the wetware of the brain, but it might have to be just as detailed as a human upload. The upshot of this is – the AI might end up creating a panoply of human-equivalent minds that suffer from the consequences of all sorts of wrong choices, just to explore the CEV decision space for our benefit. Surely, a friendly AI would not do something like that; would it?
I see Nato Welch already touched on the same point. I’d just like to add that I hope I am right about this. If the AI determines that some suffering by us or our doppelgangers in its circuits is inevitable, then we may avoid the fate of having all our choices made for us in advance.
I think that most things that Eliezer comes up with are eminently sensible and wise; however, CEV appears to my mind to be a bit of an exception in this respect. I will, however, admit that it attempts to solve a very difficult problem, to which I don’t have a better solution.
To start with, consider this excellent advice from “A technical explanation of technical explanation”:
“But what of one who did not see any calculations performed? What new skills have they gained from that “technical” lecture, save the ability to recite fascinating words? … … … The sacred syllable is meaningless, except insofar as it tells someone to apply math. Therefore the one who hears must already know the math.”
So, I ask, why are we debating the correctness (or otherwise) of the CEV concept, when this concept is all words and no math? We talk about extrapolating people’s desires, about convergence of desires, about people’s desires *if they had been different people* (e.g. growing up “closer together”). We talk about all these things, but we have no mathematical formalism in which to make them precise.
I have read the full document on CEV, and it valiantly attempts to define all these words using analogies and by using other words. Ultimately I think that it fails in giving a precise definition, although I realize that it is meant more to give an intuitive understanding than a precise one. Whether it gives the same intuition to me as to everyone else here is another matter – since intuitions are not precisely comparable we will probably never know.
I think that the intuition that I have been given about CEV is not sufficient for me to debate with other people whether or not it is a good idea to actually implement. Imagine trying to give someone an intuitive understanding of General Relativity without teaching them tensor calculus, for example talking about a bowling ball on a rubber sheet, etc. That’s great, so long as you don’t ask them to design experiments to test GR using the word-based rubber-sheet understanding. Can you imagine what a disaster that would be? Can you imagine people trying to actually calculate the magnitude of the advance of Mercury’s perihelion using just the rubber-sheet analogy?
I think we may be making a similar mistake here.
“A minor, muddled preference of 60% of humanity might be countered by a strong, unmuddled preference of 10% of humanity.” Well, if that 60% of humanity dislikes the nanny computer’s decision enough to start talking about shutting down the nanny computer, what happens then? After all, shutting down the nanny would not be viewed as being in their best interest by the computer itself.
That is not a minor muddled preference! If the 60% all desire the computer to be shut down, this is unmuddled. If they are considering shutting the AI down themselves, this preference is strong. Since 60% > 10%, the computer shuts down.
What if parts of humanity will have drastically divergent volitions, in such a way that one can’t fit them together?
Perhaps these fragments of humanity have some common ground. For example, neither of them would want humanity destroyed by an asteroid it didn’t notice.
If not, the computer shuts down. The computer always defaults to doing nothing.
The upshot of this is – the AI might end up creating a panoply of human-equivalent minds that suffer from the consequences of all sorts of wrong choices, just to explore the CEV decision space for our benefit. Surely, a friendly AI would not do something like that; would it?
This is a problem, but it’s not intrinsic to CEV, and may be fixable. Humans simulate other humans without creating them within their brains. If we do actually create people in our brains we’re in a lot of trouble (still, perhaps the CEV can avoid this).
We need to work out when a computational process does *not* implement a person (or people) so we can ensure the CEV also does not.
….I think we may be making a similar mistake here.
Roko, I agree. The CEV document doesn’t contain the technical details, as you mention. Its informal presentation is fairly easy to mistake for saying something else.
Thanks Nick, I’m glad somebody agrees with me on this!
*** *** *** How To Hack CEV *** *** ***
Given the amount of support that CEV seems to be garnering on this and other threads, I feel compelled to point out a major problem with CEV, one that I think no-one else has noticed. The problem with asking a General Artificial Intelligence (GAI) to listen to what all 6 billion people on the planet want, and then perform some kind of averaging operation on these desires, is that most people are easy to hack. The way to hack people is called **religion**. Let me outline a scenario to elucidate how this might work:
Suppose that the Singularity Institute has just switched on a general AI which implements CEV. The AI will listen to and communicate with all of the people in the world and use their volitions to decide what to do. The leader of a certain religious group realizes that CEV is his chance to spread his religion to every person on the planet. He starts with, say, 1,000,000 loyal supporters. He asks the AI for a secure communications system to communicate with these 1,000,000 people, which the AI grants him. Within minutes, all of his existing believers are watching him give a speech on their newly acquired 50-inch TV screens – he simply tells them that it is god’s will that they tell the AI that their volition is whatever his volition is. Since all of these people are sincere believers in god, their actual volition is whatever his volition is, since they think that god is speaking through him.
Our religious leader, or “Hacker” has now increased his volition from 1 to 1,000,000. These 1,000,000 people constitute a relatively small but very coherent and unmuddled preference, so The Hacker can use it to ask the AI to do fairly outrageous things, as long as there is no large group of people who coherently oppose him. He starts by telling the AI to assassinate his key enemies (in such a way that they appear to have died of natural causes). Then he asks the AI to create certain religious miracles, and to make people hear the voice of god in their heads. Upon seeing miracles and hearing the voice of god, many people will convert to the Hacker’s religion. The Hacker is careful to only attempt to convert comparatively small numbers of people, say 10% of his number of existing believers at a time; this will ensure that the AI never refuses his requests. Once they are converted, he quickly gets them to tell the AI that their volition is whatever his volition is, so his number of believers will grow exponentially. The constant in the exponential depends on how quickly people can be made to have a religious conversion and surrender their volition to him, but it seems to me that within a matter of weeks we would be living in a global theocracy.
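As a rough back-of-the-envelope check on the growth claim above, here is a minimal sketch of the arithmetic. It assumes the figures given in the scenario (1,000,000 initial believers, roughly 6 billion people, 10% growth per round of conversions) and treats every round as fully successful; the function name `rounds_to_convert` is made up for illustration.

```python
import math


def rounds_to_convert(start: int, target: int, growth_per_round: float) -> int:
    """Smallest n such that start * (1 + growth_per_round) ** n >= target.

    Assumes idealized compound growth: every conversion round succeeds
    and nobody ever leaves the group.
    """
    return math.ceil(math.log(target / start) / math.log(1 + growth_per_round))


# Figures taken from the scenario: 1,000,000 believers, 6 billion people,
# and 10% of the existing believers converted per round.
rounds = rounds_to_convert(1_000_000, 6_000_000_000, 0.10)
print(rounds)  # 92
```

Under these assumptions it takes 92 rounds to cover the planet, so "a matter of weeks" would require several conversion rounds per day; at one round per day it would be closer to three months.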
If any single person realizes what he is up to and tries to tell the world about it, he will convert them or assassinate them, or ask the AI to hack their minds using nanotechnology.
At no point in the hacking scenario is there a large, coherent group of human beings who oppose the Hacking religion.
Also, if the AI works out the likely outcome of The Hack, i.e. 100% of the planet believing in The Hacker’s religion, it sees 6 billion happy followers who say “I’m so glad the hack succeeded; I am really happy that the whole world now believes in the one true god!”, i.e. extrapolating people forward in time it sees people who don’t wish that things had been different.
This hack works because, ultimately, the best way to get a very large, coherent group of humans is to use the built-in weakness of the human mind – religion. I think that if CEV is implemented, a religious theocracy is a very likely outcome.
Roko, let it first be said that, indeed, CEV is a nontechnical document and has disclaimers to this effect clearly attached. It is meant to tell people about a goal, rather than any technical notion of how to achieve it. I furthermore agree that it is not the most elegant idea I have ever had, but then it is trying to solve what appears to be an inherently inelegant problem.
With that said, you might want to reread the original document. For some reason it is really, really hard to get people to understand the concept of EXTRAPOLATION. The AI is not talking to anyone. No one is consciously deciding anything. No one is voting. They’re being extrapolated. Their minds are being abstracted into a model, then the model is being operated on, and the operations include substituting the AI’s best information about reality into the model. So you don’t have a million humans voting their religion. Instead you have a million models of what these humans would want done, if they were watching from outside themselves as nonexistent shadows who knew that their real selves were severely deluded.
CEV is not about what people want. It is about what people would-want.
Note that the CEV is designing its replacement AI, not directly meddling with the world. This requires asking the question “what should the source code of the AI look like?” of its internal model of humanity. This means it has to extrapolate humans to the point where they could code an AI. Otherwise how would they know what code to write?
Moral of the above: it’s not enough to model what people’s immediate decisions would be. Regardless, we are modeling people, not directly asking them. A CEV is not a genie.
Eliezer Yudkowsky said: “The AI is not talking to anyone. No one is consciously deciding anything. No one is voting.”
Ok, I didn’t realize that the extent of the extrapolation/simulation was complete and total. Thanks for setting me straight on that. I still don’t think that this gets CEV out of trouble. I’ll state what I think the fundamental problem with CEV is, the one that underlies my argument.
CEV is designed, as I see it, to decide what the “right thing to do” is, even in the face of moral anti-realism. It does this by some black-magic extrapolation, followed by a majority-vote. Essentially, I think that this is a silly thing to do, and that if you think that there is such a thing as “goodness” or “niceness”, you’d do better to try and find out what it is directly, for example by designing an AI to do just that.
Especially problematic is the clause that someone’s wishes are “Extrapolated as [they] wish that extrapolated”. This sounds like the extrapolated version of someone can always be over-ruled by the starting version of them, so it seems to conflict with what you said here:
“Instead you have a million models of what these humans would want done, if they were watching from outside themselves as nonexistent shadows who knew that their real selves were severely deluded.”
Can you clarify this point – will CEV make sure that the extrapolated version of someone is something that the original person wants and agrees with, or will it not? Also, how does the AI know that these people are deluded? I know that there are strong arguments based on, say, Bayesian reasoning which point to this, but to a religious person any belief that implies the non-existence of god is an immoral belief (just look at the evolution controversy – people actually think that evolution is an immoral statement, rather than an amoral one). So lots of people would complain that your CEV was biased against them by including any belief that disproves god, e.g. Bayesian reasoning. (I won’t complain though!)
Eliezer Yudkowsky says: “CEV is not about what people want. It is about what people would-want.”
About what people would want *if what*? If they were totally different people? If they gave up their most cherished (yet illogical) beliefs about the world? Why should logic matter? Why should scientific reasoning matter? It doesn’t to a lot of people. Here the precise nature of the extrapolation process is important. This brings us to the following quote from the CEV document:
“What if only 20% of the planetary population is nice, or cares about niceness, or falls into the niceness attractor when their volition is extrapolated?… … … As I currently construe CEV, this is a real possibility.”
Well, there are certainly a fair few attractors out there; consider, for example, the “christian religious dogma” attractor, where you believe whatever it says in the bible and reject anything that disagrees with this. My problem with CEV is that, given the beliefs of most people on the planet, it seems unreasonable to expect that any averaging algorithm will reliably converge on the “niceness” attractor, rather than some “religious dogma” attractor, unless you bias the algorithm in favor of the nice attractor to start with. But if you’re biasing the algorithm in this direction to start with, then you seem to think that there is an objective morality (otherwise how would you know which direction to bias the algorithm in?). If this is the case, then why risk CEV falling into an attractor other than the objectively good one? Why ask the opinion of people if you think there is a correct answer which they might ignore?
To summarize, if you think that there is an objective morality, then you should drop CEV and instead work on an AI that will try to find it and tell everyone what it is. If you don’t think that there’s an objective morality, then an honest application of CEV is probably a one-way ticket to a global religious theocracy.
Lastly, I think I owe SIAI $10 for arguing about the output of CEV…
“CEV is designed, as I see it, to decide what the ‘right thing to do’ is, even in the face of moral anti-realism. It does this by some black-magic extrapolation, followed by a majority-vote.”
Please read CEV, so you know what people mean by “extrapolated volition”. You just replied to a quote saying “nobody is voting”, for Pete’s sake.
“Essentially, I think that this is a silly thing to do, and that if you think that there is such a thing as ‘goodness’ or ‘niceness’, you’d do better to try and find out what it is directly, for example by designing an AI to do just that.”
Agreed. And in order to find out what niceness is, said AI should go into our heads, and figure out what we would think “niceness” was if our worldview wasn’t cluttered by stupid mistakes and evolutionary baggage we don’t want. Which is exactly what CEV is supposed to do.
“Also, how does the AI know that these people are deluded?”
It figures it out by constructing models of us which have more intelligence, experience, knowledge, etc., and then discarding everything which the models recognize (and which we will therefore later recognize) as “delusion”.
“but to a religious person any belief that implies the non-existence of god is an immoral belief (just look at the evolution controversy – people actually think that evolution is an immoral statement, rather than an amoral one).”
Again, CEV would extrapolate from a religious person who later comes up against incontrovertible evidence that his beliefs are totally irrational.
“So lots of people would complain that your CEV was biased against them by including any belief that disproves god, e.g. Bayesian reasoning.”
Initially, probably, yeah. The SS would have complained if SpecOps came in and shut down all the concentration camps; however, that doesn’t make it a wrong thing to do, and the SS themselves would later recognize it as the right thing to do.
“About what people would want *if what*?”
To quote the poetry:
“if we knew more, thought faster, were more the people we wished we were, had grown up farther together; where the extrapolation converges rather than diverges, where our wishes cohere rather than interfere; extrapolated as we wish that extrapolated, interpreted as we wish that interpreted.”
“Why should logic matter? Why should scientific reasoning matter? It doesn’t to a lot of people.”
It is quite possible that the CEV will extrapolate out a moral system which is at least somewhat irrational. Keep in mind that a CEV can extrapolate to any morality whatsoever depending on the species being observed.
“Well, there are certainly a fair few attractors out there; consider, for example, the “christian religious dogma” attractor,”
How do you know this is an attractor? Have you actually extrapolated out people’s volition? How did you do this? It certainly is an attractor in the present-day world, but the CEV will not be extrapolating out what people would think if the present-day world continued on forever.
“But if you’re biasing the algorithm in this direction to start with, then you seem to think that there is an objective morality (otherwise how would you know which direction to bias the algorithm in?).”
The very fact that you perceive a “niceness attractor” exists somewhere in the search space is proof that the CEV doesn’t necessarily have to fail.
“Lastly, I think I owe SIAI $10 for arguing about the output of CEV…”
Why should anyone owe anyone anything? Everyone has benefited; you have had misconceptions cleared up and SIAI has had practice in explaining themselves more clearly. Life is not a zero-sum game.
“Will CEV make sure that the extrapolated version of someone is something that the original person wants and agrees with, or will it not?”
Not.
In order for an AGI guided by CEV to do something in the Realized World — anything at all — at some point it must pause the extrapolation of the coherent volition of humanity, and start doing things, making decisions. This consideration generates several questions. Here are three:
1. Is it possible to be reasonably assured the AGI ‘got it right’? By definition, no matter what the AGI extrapolates we can’t debate it on equal footing even if it seems absurd. Is there an error check possible, even in theory?
2. At what level of its development do we trust an AGI’s opinions on our coherent extrapolated volition? (Or, how smart does an AGI have to be to extrapolate a coherent volition?)
3. What keeps it Friendly as it grows in the meantime?
These questions are off the top of my head. There are no doubt many more to ask. Regardless, CEV is actually a nice, somewhat unique contribution to moral philosophy. This might be the only blog on the Internets where careful moral reasoning solves software engineering problems.
Soon, I think, there will be many more.
Thanks for your replies, Tom, Nick and Eliezer. There are some interesting issues here. Let me boil down my objection to something a bit smaller.
Coherent Extrapolated Volition seems to make sense when you first read about it. Tom put it like this:
“In order to find out what niceness is, said AI should go into our heads, and figure out what we would think “niceness” was if our worldview wasn’t cluttered by stupid mistakes and evolutionary baggage we don’t want.”
The devil is in the details. How does the AI decide what counts as “clutter”, “stupid mistakes” and “evolutionary baggage”? Isn’t our entire mind “evolutionary baggage”? How, exactly, does an AI handle extrapolation? This is important, because I suspect that there are multiple attractors out there. (i.e. regions in mindspace that are closed under self-reflection, or regions in “society-space” that are closed under interaction.) The CEV algorithm might find an attractor that we don’t want it to find, like the “christian religious dogma attractor” (Tom, I realize that it might not exist, but you should concede that it might exist. The niceness attractor might not exist!).
My post on the religion-hack for CEV still applies, since CEV is supposed to simulate interactions between people – the entire process occurs between simulated people. The religion-hack is a specific way that CEV can converge on something ugly, but there may be other ways for this to happen. In general, CEV helps dogmatic memes over rational ones because of the higher weight that it gives to people who all believe exactly the same thing (unmuddled, coherent) without asking whether they believe it for the right reasons. Clever, rational people tend to disagree more than dumb people who are spoon-fed their beliefs. Hey, we’re all clever, rational people, and we’re disagreeing right now! This worries me a lot.
To summarize all this, if there is an objective morality out there, then there’s a very good chance that CEV will miss it. I conclude this by looking at the state of actual minds out there in the world today (there are a lot of messed up people out there), as well as by looking at the algorithm that CEV hints at.
If there isn’t an objective morality, (and I rather hope that there is) CEV might still be a bad move. There are many different views in the world, and CEV might home in on one which, whilst not objectively wrong, is repugnant to our western society. It might hit the “Radical Islamic Dogma Attractor”, or the “Radical Communism Attractor” for example.
******************************************************************
Ultimately I think that the first AI which is built should be given the task of using any philosophical, mathematical or other techniques to find the objectively true morality. It should do this without simulating people, and it should not be able to rewrite its top-goal (to find morality). This is unlike CEV, and I think much safer. If this AI fails, then I might risk CEV.
I’ll reply to some specific points that Tom McCabe made:
******************************************************************
Roko: “Also, how does the AI know that these people are deluded?”
Tom: It figures it out by constructing models of us which have more intelligence, experience, knowledge, etc., and then discarding everything which the models recognize (and which we will therefore later recognize) as “delusion”.
>> How do you know that extrapolees of deluded people would not just be more deeply deluded? Who gets to decide what counts as deluded anyway? It hinges on what extra “knowledge” the AI puts into the heads of the simulees. But as I have said, one man’s knowledge is another man’s delusion. You haven’t fundamentally solved the problem of moral uncertainty here. For example, the AI might make a simulated Tom McCabe, and insert into his mind the “knowledge” that all moral truth is contained in the old testament. Furthermore, when you complained that this is a gross violation of who you are, the new, improved Tom would override you. After all, he is consistent under reflection.
Roko: “but to a religious person any belief that implies the non-existence of god is an immoral belief (just look at the evolution controversy – people actually think that evolution is an immoral statement, rather than an amoral one).”
Tom: Again, CEV would extrapolate from a religious person who later comes up against incontrovertible evidence that his beliefs are totally irrational.
>> There is no such “evidence”. Religious people defend their irrational beliefs by abandoning ordinary logic and rationality. Besides, most conceptions of god are unfalsifiable, so no evidence can disprove them. Have you ever tried arguing with a religious apologist?
Roko: “Well, there are certainly a fair few attractors out there; consider, for example, the “christian religious dogma” attractor,”
Tom: How do you know this is an attractor? Have you actually extrapolated out people’s volition? How did you do this?
>> I’m just guessing using my intuition. But the same criticism applies to your guess that there is a “niceness attractor”.
Tom: It certainly is an attractor in the present-day world, but the CEV will not be extrapolating out what people would think if the present-day world continued on forever.
>> Which begs the question "what conditions will CEV be extrapolating people in?" Yes, I know you describe these conditions as "knew more", "grew up closer together" etc., but these are far too vague. Knew more of what? Knew more bible passages? Knew that women are fundamentally inferior to men and should be beaten regularly (note: this is not my opinion)? This all hinges on your definition of "knowledge", which is hotly contested in philosophical circles. We also have "grew up closer together" – but what does this mean? I guess that Eliezer meant "were all kind of like brothers and sisters who physically grew up in the same neighborhood, and understood each other, and hence loved each other". This would certainly be nice, but given people's actual beliefs, you would have to change many people on the planet beyond recognition to bring this about. Also there are multiple ways that such a deep familial love can be brought about. If the AI steered (the extrapolations of) all Americans to become hard-line communists, then Americans and Cubans would love each other. But the original Americans would say that this love has come at too high a price! (Of course, the extrapolated ones get to overrule them.)
Roko: “But if you’re biasing the algorithm in this direction to start with, then you seem to think that there is an objective morality (otherwise how would you know which direction to bias the algorithm in?).”
Tom: The very fact that you perceive a “niceness attractor” exists somewhere in the search space is proof that the CEV doesn’t necessarily have to fail.
>> I don’t think it is inevitable that CEV will fail. I just think it is quite likely that it will, which is bad enough.
Roko: “Lastly, I think I owe SIAI $10 for arguing about the output of CEV…”
Tom: Why should anyone owe anyone anything?
>> Well it says so in the CEV document!
“Isnât our entire mind âevolutionary baggageâ?”
By “baggage”, I mean “stuff we don’t want or need but is still there because evolution never bothered to remove it or because it was advantageous to some long-forgotten ancestor species”.
“Tom, I realize that it might not exist, but you should concede that it might exist. The niceness attractor might not exist!”
Talking about things “existing” or “not existing” is missing the point. Saying that “but it might exist!” or “it might not exist!” is really on the same level as saying that the Tooth Fairy might exist or General Relativity might not exist. The key question is “how likely is it that we’ll end up in attractor X?”
“My post on the religion-hack for CEV still applies, since CEV is supposed to simulate interactions between people”
I’m not sure exactly what process CEV is supposed to extrapolate; i.e., how you get to a long-distance volition from your current volition. Somebody please clear this up.
“Clever, rational people tend to disagree more than dumb people who are spoon-fed their beliefs. Hey, we’re all clever, rational people, and we’re disagreeing right now!”
The idea is that, as we learn more and become more intelligent, our beliefs will slowly approach the truth. This has been happening already on a grand scale since the dawn of civilization. So while the believers in Zeus might hold more immediate weight than the squabbling mathematicians, the mathematicians will eventually have their thoughts accepted by everyone, while belief in Zeus will die out.
“To summarize all this, if there is an objective morality out there, then there’s a very good chance that CEV will miss it.”
This is certainly true, but how would you find an objective morality?
“There are many different views in the world, and CEV might home in on one which, whilst not objectively wrong, is repugnant to our western society.”
This is almost certainly going to happen; however, it doesn’t necessarily have to be bad. I refer you to the earlier example of the SS.
“Ultimately I think that the first AI which is built should be given the task of using any philosophical, mathematical or other techniques to find the objectively true morality.”
Unless, of course, no such morality exists, in which case it goes berserk and turns the planet into computronium with the goal of digging up a morality.
“How do you know that extrapolees of deluded people would not just be more deeply deluded?”
If the more knowledgeable you become, the more deluded you become, it’s not a “delusion” in the first place.
“After all, he is consistent under reflection.”
The AI wouldn’t care if this randomly selected modified Tom was consistent under reflection, because this Tom wasn’t extrapolated in the CEV sense from the original Tom.
“There is no such “evidence”.”
If there is no evidence whatsoever against irrationality, we might as well all become irrational. Of course, it’s trivial to find such evidence (rationality works to make testable predictions and irrationality doesn’t).
“Besides, most conceptions of god are unfalsifiable, so no evidence can disprove them.”
This is something I’ve been hearing so much lately I’m going to write a blog post on why falsifiability is a chimera. Thank you for inspiring me.
Could CEV be extended to include all conscious animals on Earth? If it can achieve convergence over all humanity, then it seems that it should be able to achieve convergence over all conscious animals. But perhaps that would be a bad idea, or more difficult to implement, I don’t know.
Tom said: “If the more knowledgeable you become, the more deluded you become, it’s not a “delusion” in the first place.”
Overall, I feel we're talking at cross-purposes. The point I am trying to make is that in the face of meta-moral uncertainty, there's no objective way of deciding what counts as "good stuff" to believe or what counts as "bad stuff" to believe. Knowledge vs. delusion is just one way of phrasing this.
Perhaps this is what I'm missing: you might program CEV with your ideal factual statements, but be careful to give it no moral statements. Then when you let the algorithm loose, it constrains (extrapolated) people's moral views by the set of facts ("knowledge"), and hopes to avoid a lot of the mistakes that, for example, religious people make. This would severely annoy 80% of the world's population (those who are religious) and possibly many others, but it's a potentially good idea. Is this the kind of thing you are thinking of, Tom, Eliezer?
If you want to go down this route, you have to have a really solid distinction between moral and factual statements, which will be difficult. It's interesting to ask whether, if everyone was constrained by exactly the same set of rational, correct factual statements, they would converge on moral issues. To be honest, there is so much irrationality out there that there has probably never been a real-world test of this. It's a very interesting idea. I'll do a bit of research/thinking into how easy it is to separate morality and "factual knowledge". I have a feeling that it is more difficult than you might think.
I also think that, when put in the situation of having to give up their belief in something which is irrational yet cherished, (simulated) people’s minds might ‘compensate’ by making their morality really weird. For example, suppose you simulate a hard-line Christian creationist. The first thing you do is remove anything that the creationist believes which is factually inaccurate. So, god has to go. What is left of this person’s morality once you’ve removed god? Very little I think. What will the person replace it with? Well, they have a strong emotional attachment to god, so they might be tempted by the following moral system:
****** ****** Christianity-Lite ****** ******
1. God doesn’t exist, the world was not created in 7 days, etc. The bible is just another book written by people. All the factual statements of Christianity are wrong.
2. Morality is exactly the same as it was when you believed in god. Although there is no heaven or hell, and there is no god, to act “morally” is to act as if there is a god, as if there is heaven and hell, and as if the bible is the word of god. This is the definition of morality under Christianity-Lite. Thus homosexuality is immoral, women are to be subjugated, adultery is punishable by death, etc. Any moral issues which aren’t mentioned in the bible will be settled by appropriate church officials, or by analogy with biblical passages.
****************************************************************
Christianity-Lite is, in my opinion, a likely outcome of CEV, if it is implemented in the way I described above.
“This would severely annoy 80% of the world’s population (those who are religious) and possibly many others, but it’s a potentially good idea.”
They’ll get over it in a few centuries. Better than being dead.
“It’s interesting to ask whether, if everyone was constrained by exactly the same set of rational, correct factual statements, they would converge on moral issues.”
The very idea that they will not converge on moral issues even if they have the exact same view of the world is what we mean by “morality”, or at least by “goal”. If you give two people an initial state of A, because their internal mechanics are different, one will see state A’ as desirable and go into that state, while another may see state A* as desirable and go into that state. This is really what we mean by “choice”- an individual who is not perfectly described for whatever reason (Rice’s Theorem, lack of good equipment, quantum uncertainty, whatever), has a distribution of possible outcomes, out of which only one outcome will actually happen.
“The first thing you do is remove anything that the creationist believes which is factually inaccurate.”
This isn’t phrased well- you’re simulating what the creationist would do if they slowly acquired and internalized the knowledge that their belief system was inaccurate in the same way we normally do. Think of how you would react if you were abducted by aliens and gradually shown, with hard evidence, that the foundations of your moral and philosophical system were hogwash. CEV ultimately rests on the ability of humankind to adapt, to realize eventually that hitting your head against a rock doesn’t make bananas fall down.
“Thus homosexuality is immoral, women are to be subjugated, adultery is punishable by death, etc.”
This is repulsive to most people nowadays, so I highly doubt it will be a likely attractor. Perhaps one of the extrapolations the CEV can do is to extrapolate what would happen to a person’s morality if said person actually experienced first-hand all of the things their morality says should happen.
Tom said: “This isn’t phrased well- you’re simulating what the creationist would do if they slowly acquired and internalized the knowledge that their belief system was inaccurate in the same way we normally do”
The problem is that the creationist belief system is such that this can’t happen. Creationists are regularly shown hard evidence that their beliefs are hogwash, but it doesn’t make them change their minds! There are certain memes, of which evangelical Christianity is one, which put in place “defense mechanisms” to stop the afflicted person from ever letting go of the meme. These people will never internalize the fact that their belief system is hogwash. You can get a taste of this by trying to persuade a Christian to stop believing in god.
The CEV algorithm would have to go into these (simulated) people’s minds and just yank the whole lot out unceremoniously.
Roko, the part of the extrapolation "if we knew more" is not an extrapolation of our responses to evidence, but an extrapolation of the substitution of the AI's probability distribution for our own probability distribution. It is ourselves if we anticipated future experiences correctly to the limits of the AI's knowledge and, furthermore, modeled the world correctly to the limits of the AI's model and the limits of our ability to react emotionally to elements of that model. In the order of evaluation, this substitution would occur before moving on to such considerably more complicated and recursive processes as "more the people we wished we were" or "had grown up further together".
The main point is that it would not matter, for purposes of the “knew more” dynamic, how a given individual reacted to evidence. The extrapolation would simply substitute correct anticipations, and correct beliefs to whatever extent this was feasible. The extrapolated self certainly would not expect prayer to work, and might or might not be modeled as reacting emotionally to the fiat-imposed realization that the universe is a mathematically simple low-level unified physical process.
Eliezer, after the AI substitutes its probability distribution for yours, what is actually left of you? Is there some aspect of human intelligence, emotions, or personality, whether based on biological evolution or learned behavior, that cannot be viewed as anticipation of future experiences?
Ricky has a good point. Suppose I am to be simulated and extrapolated by the AI. Clearly this process involves the AI getting rid of some stuff from my mind, but what I want to know is what kind of stuff it doesn’t get rid of. What is it that is left of me?
Suppose that I believe in some extreme religion (like fundamentalist Christianity). Clearly the AI will take out the factual beliefs relating to this. But there will still be a lot of stuff left over, like that time when I was at a Christian convention and the preacher healed the little sick boy in the wheelchair, and then I had this amazing religious experience with everyone shouting and clapping and crying out. That's the kind of thing that really makes my life worth living! And there's also that time when me and my neighbors found out that someone in the community was gay, and we threw rotten eggs at his door, and it made me feel so good about myself because we were doing the lord's work.
Let me call these types of memory ‘emotionally motivating experiences’.
What will the AI do with experiences like the above? Clearly these are not "factually wrong" – they don't assert facts. They are still emotionally meaningful experiences for the hypothetical person concerned, even if you remove the factual beliefs that go with them. Will the AI get rid of them? Will it keep them?
Ricky, imagine a rock that is transformed into a perfect predictor with no other changes – it just sits there and predicts. Every way in which a human being is unlike this rock is an aspect of being human that cannot be viewed as anticipation of future experiences.
Roko, the underlying motivation for extrapolating volition is something like this. Your friend is about to walk off a cliff, believing that there's an invisible path across it, with no clue that anything might be wrong. But you've been told that the invisible path is out of order today, so you know that when your friend actually walks off the cliff, he will plummet toward the rocks, screaming, and then die. What does it mean to help your friend, under such circumstances? Clearly not, "do what your friend would want you to do based on his current knowledge"; in this case you would cheerfully shove your friend off the cliff, and wave goodbye to him as he plummets screaming toward the rocks.
Now you might just say: “I, for my own sake, don’t want anyone to die – so as part of my own goals and utility function, I’m going to yank my friend back from the cliff, whether or not that’s what he would really want if he knew everything I knew.” Maybe you still value individuality, or freedom, or social order, but you value it less than you value life… certainly this case seems more morally fraught than saving your friend because he would-want to live. You’re using him for your own goals, in defiance of his goals, even if your own goals happen to be directly about your friend.
But suppose there's a very powerful optimization process that is not intended to have its own goals, its own vision of what human beings ought to be like; it is a metamoral mirror that picks direction by reflecting humans. This mirror won't engage in behavior analogous to your saving your friend because you have your own ideas about how much his life is worth; the mirror would only save your friend as a pure case of helpfulness, because his own goals involve staying alive.
So, if you consider only the art of helpfulness, and not the value you personally attach to your friend’s life, is there a strictly helpful component attached to your yanking your friend back from the cliff – is it something you would do if you wanted to help as much as possible, but aside from this didn’t give a damn about your friend’s life?
Yes, of course.
Now suppose your friend is also very slow to react to evidence, so that as he walks off the cliff, he’ll scream with a tone of desperate nervousness, “Yes! Yes! There’s a road here, just like I thought!” Then he’ll die.
What does the spirit of strict helpfulness call for in this situation? I'd say, yanking him back from the cliff – it's not your friend's reaction to evidence that matters, it's your friend's reaction to outcomes. In this case you are operating in morally fragile territory, and it may be that a human is ethically injuncted from overruling others' wills in such a way. How do you know your beliefs are so much better? But if you could know, as a superintelligence would legitimately know, then this is what the spirit of strict helpfulness would call for.
“it’s not your friend’s reaction to evidence that matters, it’s your friend’s reaction to outcomes.”
I’m noting that creationists never seem to anticipate different outcomes from the rest of us. This is what has allowed creationism to survive so long- no creationist would ever dare stand up and predict the result of an experiment ahead of time. For example, when a person is sick, an atheist might try to comfort them and get them medical care, while a creationist might pray to God. But they both anticipate the same result- a high probability of the person getting better, with a distribution over possible shades of gray, with a small probability of the person getting worse and dying.
Eliezer, you said “Every way in which a human being is unlike this rock is an aspect of being human that cannot be viewed as anticipation of future experiences.”
Let’s take some examples. Unlike a predicting rock, a human eats food, but that’s in anticipation of pleasure or survival. Unlike a rock, a human applies sunscreen, but that’s to lower the statistical probability of contracting cancer. Humans seek the company of others for a host of reasons which can all be cast in terms of maximizing reward using some (biological or learned) probability distribution over situations, actions, and consequences. Or can you point to any specific part of human nature that cannot be viewed as anticipation of future experiences?
“Or can you point to any specific part of human nature that cannot be viewed as anticipation of future experiences?”
All of them. The rock predicts all of these events, and yet we wouldn’t call the rock human. The difference is that humans act on their models- they use their internal future-universe models as tools to decide which action to take now. Even given a perfect model of the universe, there’s still no preference ordering- the model will not spontaneously take action to force the universe into another state which it sees as better.
Tom, you seem to be saying that the essential difference between a human and a predicting rock is the “preference ordering” that a human uses to make decisions. But what is this preference ordering if not simply part of the probability distribution a human uses to anticipate experiences in this universe? Are human preferences something magical? If your preferences are just part of your probability distribution, then what’s left of you after the AI substitutes its distribution for yours?
“But what is this preference ordering if not simply part of the probability distribution a human uses to anticipate experiences in this universe?”
A preference ordering and a probability distribution over likely outcomes are not the same type of thing. It’s the difference between “ought” and “is”, respectively.
“Are human preferences something magical?”
Preference is difficult to explain, but the idea is that:
- Suppose a human has a preference A. Some states of the universe fit A, some don’t (this is oversimplified to illustrate the point).
- If the universe is in state B, which doesn’t fit A, the human will move it into state A’, which does.
- If the universe is in state C, which also doesn’t fit A, the human will move it into state A*, which does.
- If the universe is in state A'', and it will shortly move into the state D*, the human will move it into state A^, which is stable.
The idea is that no matter what state the universe is in, or how stable it is, or how likely the preferred state is, the effect of adding the human with the preference ordering is the same: the universe is more likely to go into subset A and more likely to stay there than the universe without the human. This is also what we mean by “goal” or “desire”.
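A toy sketch in Python may make this concrete. Everything in it is an illustrative assumption rather than anything from the CEV document: the ten integer states, the `PREFERRED` set standing in for subset A, and the `drift` and `agent_step` functions are all made up. The universe drifts on its own, and the only thing the agent adds is a push back into the preferred subset.

```python
# Toy model: a "preference" is whatever steers the universe into subset A.
# All names and numbers here are illustrative assumptions.

STATES = range(10)
PREFERRED = {3, 4, 5}  # subset A: the states that fit the preference


def drift(state: int) -> int:
    """How the universe evolves on its own: one step up, wrapping at 10."""
    return (state + 1) % 10


def agent_step(state: int) -> int:
    """The agent leaves preferred states alone and moves any other
    state to the nearest preferred one."""
    if state in PREFERRED:
        return state
    return min(PREFERRED, key=lambda a: abs(a - state))


def time_in_preferred(start: int, steps: int, with_agent: bool) -> int:
    """Count how many of `steps` ticks end with the universe inside PREFERRED."""
    state, count = start, 0
    for _ in range(steps):
        state = drift(state)
        if with_agent:
            state = agent_step(state)
        if state in PREFERRED:
            count += 1
    return count


if __name__ == "__main__":
    for start in STATES:
        alone = time_in_preferred(start, 20, with_agent=False)
        helped = time_in_preferred(start, 20, with_agent=True)
        print(f"start={start}: in A for {alone} ticks alone, {helped} with the agent")
```

Whatever the starting state, the run with the agent spends more ticks inside A than the run without it. The agent's predictions about `drift` never enter into it; only the target subset does.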
“If your preferences are just part of your probability distribution, then what’s left of you after the AI substitutes its distribution for yours?”
Suppose I want a cake. I buy a box for $5 thinking it contains a cake. The AI knows that the box actually contains a rock. It substitutes its anticipation for my own, and extrapolates me going back to the store and demanding a refund. The AI therefore automatically replaces the rock with a cake, to save everyone the trouble. Notice how the AI never touched the “I want a cake” part of the equation.
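The cake example can be written out as a small Python sketch. This is only an illustration under assumed names and numbers (the `Agent` class, `best_action`, `knew_more`, and the utility values are hypothetical, not part of any CEV specification). The agent's decision depends on a belief about what the box contains and on a utility over outcomes, and the "knew more" step replaces only the belief.

```python
from dataclasses import dataclass, replace

# Hypothetical toy model: the belief ("is") and the preference ("ought")
# live in separate fields, so one can be swapped without the other.


@dataclass(frozen=True)
class Agent:
    p_box_has_cake: float  # belief: probability the box contains a cake
    utility: dict          # preference: how much each outcome is wanted


def expected_utility(agent: Agent, action: str) -> float:
    """Expected utility of an action under the agent's own belief."""
    if action == "keep_box":
        p = agent.p_box_has_cake
        return p * agent.utility["cake"] + (1 - p) * agent.utility["rock"]
    if action == "return_box":
        return agent.utility["refund"]
    raise ValueError(f"unknown action: {action}")


def best_action(agent: Agent) -> str:
    """Pick the action with the highest expected utility."""
    return max(["keep_box", "return_box"], key=lambda a: expected_utility(agent, a))


def knew_more(agent: Agent, ai_probability: float) -> Agent:
    """Substitute the AI's probability for the agent's; utility is untouched."""
    return replace(agent, p_box_has_cake=ai_probability)


me = Agent(p_box_has_cake=0.99, utility={"cake": 10.0, "rock": 0.0, "refund": 5.0})
extrapolated_me = knew_more(me, ai_probability=0.0)

print(best_action(me))                        # keep_box
print(best_action(extrapolated_me))           # return_box
print(extrapolated_me.utility == me.utility)  # True
```

`knew_more` changes what the extrapolated agent does without changing what it wants. Whether real human preferences factor into two fields this cleanly is, of course, exactly what is in dispute in this thread.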
Eliezer said: “Roko, the underlying motivation for extrapolating volition is something like this. Your friend is about to walk off a cliff… … … the mirror would only save your friend as a pure case of helpfulness, because his own goals involve staying alive”
This is a nice intuition, but I feel that it doesn't generalize well to other situations. Fred's death when he falls off the cliff is almost certainly against his wishes. When you present Fred with the future outcome of his falling off the cliff, he will say "oh, ok, that was a bad idea, thanks for helping me against my wishes!"
Consider instead the situation of the religious fundamentalist who asks the AI to implement a permanent global theocracy, operating under biblical law. When the AI presents him with the outcome of this situation, i.e. creativity, science, personal freedoms being trodden all over, the fundamentalist will not see what is wrong. He will say “yes, that’s exactly what I wanted, get on with it you silly machine!”
Worse still, as you have explained CEV, it won’t be the fundamentalist of 2007 who gets to decide whether the totalitarian theocracy of 2050 is a bad thing, it will be his extrapolation out to 2050. I claim that the extrapolation of such a person is even less likely to reject the theocracy than the original person, if only because of status-quo bias and choice-supportive bias.
The underlying problem is that some people just have a skewed idea of what goodness is, and they may be in the majority. The only solution that I can see to this is to solve the is-ought problem using any means at our disposal, so that we have an objective, empirically verifiable idea of what goodness is, which we can program into our AI. We may even decide that the first AI we build is simply tasked with finding this objective morality (using some bounded amount of computing substrate) and then explaining it to us.
Here’s my attempt to summarize a common point that Roko and I are trying to make. The underlying motivation for extrapolating volition sounds reasonable, but it depends critically on the AI’s ability to distinguish between goals and beliefs, between preferences and expectations, so that it can model human goals and preferences while substituting its own correct beliefs and expectations. But when you start dissecting most human goals and preferences, you find they contain deeper layers of belief and expectation. If you keep stripping those away, you eventually reach raw biological drives which are not a human belief or expectation. (Though even they are beliefs and expectations of evolution, but let’s ignore that for the moment.)
Once you strip away human beliefs and expectations, nothing remains but biological drives, which even the animals have. Yes, an animal, by virtue of its biological drives and ability to act, is more than a predicting rock, but that doesn’t address the issue at hand.
Why is it a tragedy when a loved one dies? Is it because the world no longer contains their particular genetic weighting of biological drives? Of course not. After all, they may have left an identical twin to carry forward the very same genetic combination. But it’s not the biology that matters to us. We grieve because what really made that person a person is now gone, and that’s all in the brain; the shared experiences, their beliefs whether correct or mistaken or indeterminate, their hopes and dreams, all those things that separate humans from animals, and indeed, that separate one human from most other humans. All that the brain absorbs and becomes throughout the course of a life, we call the soul, and we see it as our very humanity, that big, messy probability distribution describing our accumulated beliefs and expectations about ourselves, the universe, and our place in it.
So if the AI models a human while substituting its own beliefs and anticipations of future experiences, then the AI has discarded all that we value in each other. UNLESS you draw a line somewhere, and crisply define which human beliefs get replaced and which ones don’t. Constructing toy examples where such a line is possible to imagine does not mean that the distinction can be made in any general way, but CEV absolutely requires that there be a concrete distinction. Where do you draw the line?
Yeah, I’d agree with that analysis Ricky.
In the example of the fundamentalist, where do you draw the line? Does the AI change the fundamentalist’s desires to apply biblical law and create a religious theocracy or not? If it does change those desires, then I claim that it has basically erased him. This may sound extreme, but these desires are what is most important to him. If it does not, then we may be heading for trouble…
Roko, the word “extrapolate” in this case does not mean “project out to anticipated state at future time T”. The word “extrapolate” here means “project results given alternative variable values”.
I suggest that instead of criticizing the Christian, you ask how you would deal with your own potential mistakes. Unless you can’t conceive of the possibility that you’re making any, of course. You and a modern Christian are people of the same species and civilization – the task of constructing a framework for future intelligent life out of your own flawed pieces is not so different from one of you to the other. So how should I extrapolate you?
Ricky Loynd, that’s the first new objection to CEV that I’ve heard in a while! I first point out that the CEV always knows the world is the way it really is – it may contain extrapolated elements but it’s asking those extrapolated elements what to do with the real world, not the extrapolated one. So if the extrapolation ends up not looking like the personalities of existing humans, this is not necessarily a problem – it’s not like we’re stripping the real world down to its bones.
The question is how to build an AI that doesn’t smash the future, despite the fact that everyone on Earth including me is almost certainly a drooling barbarian from the perspective of one thousand years hence, let alone the next billion years. If I wasn’t building an AI, but instead growing up over the next ten thousand years, then I would reject old beliefs and grow into new ones along with my acquisition of manipulatory powers over the real world – the way that humanity found out about atoms with all the development of science and rationality which that entails before we built nuclear weapons, instead of nukes being handed to us from beyond.
This is the quintessential problem that CEV was built to solve. If the extrapolated versions of humanity end up all looking pretty much the same, built around those core intuitions and a model of the world we can’t even imagine, I don’t see why this counts as a failure. It sounds, in fact, like a success – a coherent framework that follows where humanity would go as it acquired more and more information about the universe, got smarter, and perhaps rewrote ourselves.
Yes, we’re barbarians now compared to what we might grow into over the coming eons. Imagine that some bright trilobite had managed to create FAI in its day, and suppose that the extrapolated versions of trilobity all ended up looking pretty much the same. If the AI had then imposed that coherent volition on the future, would humans have even evolved? Now that we’re here, why should we want AI to impose the coherent extrapolated volition of humans on the future?
Biological evolution has now been superseded by the evolution of the models that exist in our brains. Most beliefs are tautological and autopoietic rather than true or false; the beliefs that matter most to us as humans are evolved organisms in their own right. A language is an evolved organism that seeks to survive and propagate itself. The same is true of religion, all aspects of culture, and essentially any complex system of beliefs including science. A human’s sense of self is yet another belief organism, so we mourn when one is lost. Some belief organisms are savage beasts, while others are innocent daisies, but few of them can be conveniently classified as true or false any more than a biological organism can be said to be true or false. So while an AI may understand the physical world as it really is, it needs to deal with the ecology of belief organisms that constitute humanity. And who’s to say which of these belief organisms shouldn’t be allowed to thrive and produce yet others in time?
The leap from biological evolution to human ideas was a divergent event comparable to the big bang, inflation, or the Cambrian explosion. Human culture still strains under the baggage of our biological constraints, but there’s no going back to the limited ways of biological evolution. It’s onward and outward!
What will our FAI do if it concludes that there is no single, coherent future shared by humanity? Shut itself down, leaving the door open for an unfriendly AI to smash the future? It would probably be better to just choose the most popular coherent future as a refuge for us. Or why couldn’t the AI safeguard any number of expanding futures, and let individuals choose which to join? Judged by the paths of evolution and history that have brought us to this point, the best possible futures seem likely to be wildly divergent rather than coherent or convergent.
Eliezer said: “If the extrapolated versions of humanity end up all looking pretty much the same, built around those core intuitions and a model of the world we can’t even imagine, I don’t see why this counts as a failure.”
It could count as a failure because most people in the world, in my opinion, have a skewed idea of what goodness is, which the algorithm might extrapolate forward uncorrected. It might work, it might not.
Eliezer said: “I suggest that instead of criticizing the Christian, you ask how you would deal with your own potential mistakes. … You and a modern Christian are people of the same species and civilization – the task of constructing a framework for future intelligent life out of your own flawed pieces is not so different from one of you to the other. ”
I would disagree with this. Just because I am from the same species and culture as someone does not mean that the difference between my moral ideas and their moral ideas is insignificant, even compared to someone from the far future. This may be because once morality is “solved”, it is no longer an area of advancement, so beings from the year 3,000 may have exactly the same morality as beings from, say, 2100. One might call this a saturation in moral progress; it would mean that it is perfectly reasonable for me to claim that I differ morally in a significant way (in the grand scheme of things) from another human being.
What about my own potential moral mistakes? The very word “mistake” implies that an objective morality is out there. So, my reaction is to find it **by scientific means**. That means a mathematically rigorous and experimentally verified theory of morality needs to be found. If I can’t somehow find this, then there’s no way for me to know whether I am making a moral mistake. If no objective theory of morality exists, then there is no such thing as a moral mistake, which would be very sad.
Of course I know that philosophers have been trying to find an objective theory of morality for quite a while, and have not succeeded, but I feel that the potential development of AI adds a sense of urgency to this endeavor. I also think that a mathematical theory of how minds work may provide the foundations for an objective morality, so anyone working on Neat General AI should ask themselves whether they have anything useful to say about morality, and potentially collaborate with moral philosophers.
I would suggest that Objective Morality is the best way to go about solving the Friendly AI problem.
Can I just ask whether you think that there is such a “theory of morality” out there or not, or whether you are agnostic on this point?
“I would suggest that Objective Morality is the best way to go about solving the Friendly AI problem.”
Except that we have no idea how this objective morality might be found or whether it even exists. And so if it doesn’t exist, or is ridiculously hard to find, the AI will turn us all into computronium (remember that it hasn’t found a morality yet!) in its quest for the Ultimate Morality. Oops.
Tom says: “the AI will turn us all into computronium (remember that it hasn’t found a morality yet!) in its quest for the Ultimate Morality. Oops.”
I wasn’t suggesting that you tell an AI to use the whole planet as computing substrate.
I suggest that if you wanted to find objective morality, and you had an AI to help, you would let the AI go to work on some pre-existing computer.
For example, you could spend $50,000 on a big server farm and tell the AI that it could only use those computers to solve the problem, and that it had to report back in a month’s time.
“I suggest that if you wanted to find objective morality, and you had an AI to help, you would let the AI go to work on some pre-existing computer. ”
Until the AI decides that the Earth-turned-into-computronium computer would be more efficient.
“For example, you could spend $50,000 on a big server farm and tell the AI that it could only use those computers to solve the problem, and that it had to report back in a month’s time.”
What would the goal structure of such an AI be?
Roko: Do we have any reason to believe that an “objective morality” exists? To the best of my knowledge, ethical systems are arbitrary. You can’t build one without having at least one axiom to start out from, and different people have different axioms. Is there anything to suggest that this isn’t so? If not, why should we bother even trying to find a system of objective morality?
I’ve always understood that the CEV proposal exists exactly because there is no objective morality: since everybody’s views on ethics are just as valid, it extrapolates everybody’s views on ethics to see if they change any, and to find out if they’d be easier to fit together in the future.
The Coherent Extrapolated Volition document lays out three requirements for FAI:
1. Solving the technical problems required to maintain a well-specified abstract invariant in a self-modifying goal system.
2. Choosing something nice to do with the AI.
3. Designing a framework for an abstract invariant that doesn’t automatically wipe out the human species. This is the hard part.
Let’s assume that (1) is possible. I suggest that dropping the secondary problem (2) would make the primary, toughest problem (3) much easier to solve. The FAI’s sole, invariant goal would then be:
“Minimize the constraints placed on humans by Really Powerful Optimization Processes.”
Let’s call this the Minimal Constraint approach. MC would leave us free to design other AIs to help solve all the other problems, while the FAI imposed the fewest limitations required to prevent such AIs from taking over, for instance by limiting their capabilities, or by regulating their goal systems.
MC is similar in spirit to CEV. They both aim to “Keep humankind ultimately in charge of its own destiny”, “to give humanity a breathing space to grow up”. MC shares the philosophy that “Where the course of humankind is not presently predictable (exhibits significant spread), our CEV should take this into account, and leave options open, waiting for our decision.”
Here are the differences:
1. MC cautiously tailors the solution to fit the problem, rather than trying to solve many problems in one fell swoop.
2. MC conservatively lets humanity extrapolate its own volition, coherent or not.
3. MC is easier to defend, since most people want to make their own decisions, and few will be satisfied with being told “Don’t worry, the decisions are being made by a reliable extrapolation of you and others.”
4. MC’s simplicity makes it less likely to fail.
5. MC should take less time and effort to design, implement, and test.
6. Even if CEV can be implemented, it may not converge in time, if at all.
7. Even if CEV converges, it may converge on a local volition attractor that severely limits humanity’s future.
8. MC buys us time to figure out morality for ourselves, to learn more, think faster, become more the people we wish we were, grow up farther together; converging with others where we wish to, or diverging from others without interfering; making our own decisions subject to no interpretation or extrapolation but our own.
Also…
9. MC relaxes requirement (1), since the FAI wouldn’t have to modify its own top goal.
“Forty-two!” yelled Loonquawl. “Is that all you’ve got to show for seven and a half million years’ work?”
“I checked it very thoroughly,” said the computer, “and that quite definitely is the answer. I think the problem, to be quite honest with you, is that you’ve never actually known what the question is.”
Eliezer said: “Forty-two!”
Roko says: D’oh!
I really should have seen that coming… The Hitchhiker’s Guide is surprisingly popular amongst transhumanists (including me), so it seems like I’m in a bit of a corner trying to go against it… BUT remember Eliezer’s sound advice about not falling for the logical fallacy of *Generalizing from fictional evidence*?
On a more serious note, though, I think that there is a widespread assumption amongst the transhumanist community that there is no objective morality, or that such a thing is self-evidently impossible. Kaj said some things along these lines.
I want to challenge this assumption. I think that, given the lack of solid evidence either way, we should not have made our minds up yet. I think that it will do us no harm to expend a reasonable amount of effort in checking out whether objective morality exists. If it does, then we no longer have a friendly AI problem. If it doesn’t, or it seems to be very difficult to find, then we go on with CEV or whatever else people had in mind anyway.
Lastly,
Tom said: “What would the goal structure of such an AI be?”
Um, to find a mathematically sound theory of morality, without interfering with the real world in any way, using the computers we have set aside for that purpose and information gleaned from works of science (including psychology), philosophy and history, and from interviews with a broad selection of real people (all of which we will provide). It must report back to us with an answer in a month (or earlier if it wants to), and that answer can be one of:
1) “yes, I’ve found it, it works like this… ”
2) “The question is too vague, you need to specify the following additional information… ”
3) “It doesn’t exist because…”
4) “I’m not sure, give me more computers or more time”
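To make the shape of that reporting protocol concrete, here is a minimal sketch in Python. Everything in it is an illustrative assumption rather than part of the proposal: the names `Verdict`, `Report`, `DEADLINE_DAYS` and `validate` are made up, and "a month" is taken as 30 days. It only captures the outer contract (one of four answers, delivered on time, with an explanation) and says nothing about how the AI would actually reach an answer.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    """The four permitted answers, numbered as in the list above."""
    FOUND = 1           # "yes, I've found it, it works like this..."
    TOO_VAGUE = 2       # "the question is too vague, you need to specify..."
    DOES_NOT_EXIST = 3  # "it doesn't exist because..."
    NEED_MORE = 4       # "I'm not sure, give me more computers or more time"


@dataclass(frozen=True)
class Report:
    verdict: Verdict
    explanation: str  # the text that follows the "..." in each answer
    days_used: int    # how long the AI took before reporting back


DEADLINE_DAYS = 30  # assumption: "a month" read as 30 days


def validate(report: Report) -> bool:
    """Accept a report only if it arrives on time and explains itself."""
    on_time = 0 <= report.days_used <= DEADLINE_DAYS
    return on_time and bool(report.explanation.strip())


# Hypothetical example: an early "too vague" reply with a follow-up question.
early_reply = Report(Verdict.TOO_VAGUE, "How do I check a candidate answer?", 3)
```

Note that the hard parts (not interfering with the real world, staying on the designated computers) are exactly the parts this sketch leaves out.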
The question is too vague, you need to specify the following additional information:
If I have a possible answer, how do I check that it is correct?
I’m not just torturing you here. After all, you somehow decided that forty-two was not, in fact, a valid answer…
The question that I am asking only exists as a large, loose collection of imprecise natural language statements like these:
http://en.wikipedia.org/wiki/Meta-ethics
http://en.wikipedia.org/wiki/Moral_realism
http://en.wikipedia.org/wiki/Is-Ought_Problem
Part of the task of the AI is to find a suitable formalization of the question, and to solve certain meta-ethical problems. One of those questions is the moral grounding problem, or “is-ought problem” which is essentially what you asked me. The theory is supposed to be objective. Therefore the AI’s task includes working out how to “ground” a moral theory in facts about the world.
It is a formidable task. It requires creative work as well as analytic work. It requires the AI to take our imperfect, imprecise attempts and to find something which makes mostly the same moral judgments yet has none of the paradoxes. The new theory has to be mathematically precise and it has to behave like a scientific theory – that is to say it has to make real world predictions about moral events which can be tested.
Roko, if you had an AI advanced enough to grasp the question you pose, that AI would probably be a Really Powerful Optimizing Process (RPOP) with the potential to wreak havoc. Even if you managed to keep it under control by properly framing its goal as you’ve described, the technology itself would be dangerous in less capable or less scrupulous hands. With so much at stake, would answering this particular question really be the first task you gave the AI? Are you looking for its guidance in deciding what to have it do next?
If RPOPs pose no runaway danger after all, then the quest for FAI is not critical. But if they do pose such a danger, keeping your RPOP on a tight leash is insufficient, because somebody else will eventually let theirs loose. If RPOPs have the potential, one of them will unavoidably take off at some point, and our best hope is to make sure the first one has a goal system that protects us.
An investigation into moral theory sounds like one of many fascinating problems that we would like advanced AIs to help us solve. But we don’t have to solve everything at once, and there can be more than one powerful AI if we do it right. My suggestion is to give the first RPOP the following invariant goal:
The Minimal Constraint Law (MCL):
“Minimize the constraints placed on humans by RPOPs.”
Then this RPOP will be the first to take control (if any RPOP can), and the constraints it imposes on us will be the minimum necessary to keep other RPOPs (and itself) from stomping on us, either by limiting their powers, or by regulating their goal systems.
Will the MCAI turn us all into paper clips? Death is a serious constraint on humans, so that would violate its goal.
Will the MCAI sit on its hands to avoid constraining us?
That would allow other RPOPs to take over, another violation of its goal.
“Roko, if you had an AI advanced enough to grasp the question you pose, that AI would probably be a Really Powerful Optimizing Process (RPOP)”
What’s the crucial difference between a recursively self-improving GAI and a “really powerful optimization process”? Anyway, that’s beside the main point.
Ricky said: “The Minimal Constraint Law (MCL): Minimize the constraints placed on humans by RPOPs.”
First off: this is not precisely defined. It uses words, and words are imprecise and ambiguous. Words will not do for defining the task here, so the MCL needs more work to be done on it.
Second: I think that the minimal constraint law is just another way of saying “act morally”. I think that defining, in a precise and true way, what the MCL actually says is the same problem as defining, in a precise and true way, what it means to act morally. I think that my ideas about using a GAI to solve morality are essentially the same thing as working out how to define the MCL.
To see why the MCL and objective morality are essentially the same thing, you should note that a lot of modern moral philosophy talks about freedom and rights. It sets morality up as a system where people are maximally free, subject to the constraint of not messing with other people’s freedoms. Of course there are situations where the rights of one person conflict with the freedoms of another, which is where the really tricky problems come in. MCL will have to deal with the same kind of problems.
For example, the MCL will have to juggle competing freedoms. There will inevitably be cases of people who want to enhance themselves a lot, which could put them in a position so powerful that they might be able to do naughty things to other humans. The MCL will have to decide between my freedom to have an enhancement chip put in my brain and other people’s right not to be powerless against me afterwards. This is the same old problem that moral philosophers have faced for centuries.
I assume that you think that setting MCL up and defining objective morality are NOT the same problem. You presumably think that MCL is easier to do than morality. I’d like to hear why you think these things, because it seems that we have seen the same problem and come up with very different solutions.
Roko, these are all valid points.
Calling them optimization processes just implies that RPOPs may not be patterned after the human mind as many GAIs are. To me, it also serves as a reminder that the human mind is not an existence proof of the feasibility of RPOPs.
You’re right that we can’t just hand the eight-word text of the MCL to the RPOP; the MCL must be hardwired in. This requires solid definitions of four terms…
RPOP: Building one should give us a precise definition.
Human: Broadened to cover whatever humans develop into, and maybe other sentient beings.
Constraint: Including obvious impositions like destruction, mind alteration, etc.
Minimize: The juggling act you describe.
My libertarian leanings make me receptive to the possible relationship you point out between morality and the MCL. But other definitions of morality exist; some may say that a more moral RPOP would altruistically and proactively solve all the world’s problems. I just want the RPOP to police itself and others like it. As an engineering task, implementing the MCL seems far simpler than figuring out objective morality.
“I just want the RPOP to police itself and others like it.”
Ok, I see what you mean. You think that it is a lot less work to get GAIs/RPOPs to police themselves than to, in addition, solve all of the world’s other problems, like the clichéd “starving children in Africa” problem, nukes, wars, hatred, oppression, etc.
It’s a strictly bigger problem; that is to say, my “Objective Morality” solution will solve everything that MCL solves and more. Is this basically what you are trying to get at?
Yes, you summarized it very well. And we can’t even be sure how much an “Objective Morality” would solve, since we haven’t found it yet.
Once the MCAI is in place, we should be able to make rapid progress on the other problems with the help of other AIs. And if RPOPs turn out to be impossible, or manageable through other means, then the MCAI won’t be needed.
The Minimal Constraint Law (MCL):
“Minimize the constraints placed on humans by RPOPs.”
How do you define constraint in a way which neither prevents humans from participating in the singularity, nor turns the universe into noise?
Those would certainly be bad outcomes. How are you thinking the MCL might lead to them? The MCAI doesn’t forbid other RPOPs, it just keeps them under control.
If, through the course of the singularity, humans themselves can progress to the point of becoming (one or more) RPOPs, then things do get more complex. For the MCAI to forbid such progression would be a severe constraint on humans, and it must therefore be allowed. But such advanced humans would then be restricted by the MCAI from imposing undue constraints on other humans, however advanced. Do you see a problem with that? It sounds like an ideal libertarian government, if it gets that far.
If a constraint is something that makes you more certain about outcomes, then noise is minimally constrained.
On the other hand, if you were to define “constraint” as “any effect on humans by an RPOP,” then this would prevent the RPOP from helping the humans in any way.
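One way to make the first reading concrete is to measure "certainty about outcomes" with Shannon entropy, so that a more constrained world is one with a lower-entropy distribution over outcomes. This is only one possible formalization, chosen here for illustration; the function name `entropy_bits` and the two example distributions are made up. The sketch below shows that uniform noise has the highest entropy, so an optimizer told to minimize constraint in this sense would drift toward noise.

```python
import math


def entropy_bits(probs):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Terms with p == 0 contribute nothing to the sum.
    return -sum(p * math.log2(p) for p in probs if p > 0)


# Four possible outcomes (hypothetical numbers).
noise = [0.25, 0.25, 0.25, 0.25]    # every outcome equally likely
steered = [0.97, 0.01, 0.01, 0.01]  # one outcome is nearly certain

print(entropy_bits(noise))    # maximal for four outcomes: 2 bits
print(entropy_bits(steered))  # much lower: the outcome is nearly fixed
```

Under this definition the "least constrained" universe is the one where nothing can be predicted at all, which is presumably not what the MCL is meant to ask for.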
Ricky: I must say that although I have some different intuitions about how to go about things, I agree with the principle that you are heading for.
I think that “minimal constraint” of “humans” by “RPOPs” will prove to be exactly the same problem as morality itself, so I think that people like yourself who are pursuing MCL will find themselves working on the same problem that philosophers have been tackling for millennia. I think that the most straightforward reason for this is that a moral theory has to be agent-neutral, that is, it applies to all sentient agents in essentially the same way. So when somebody comes up with a way of sorting out the MCL, it will be trivial to extend it to cover humans (and all other agents) too.
Conversely, if some clever person comes up with a mathematically precise objective morality, then you can immediately solve the MCL problem as follows:
MCL from Objective Morality: “All RPOPs must behave morally towards humans”
Anyway, I feel that we should focus on the fact that everyone in this thread has, in my opinion, been aiming for the same thing: a positive, safe singularity. We just have slightly different intuitions about the best way to bring it about.
“If a constraint is something that makes you more certain about outcomes, then noise is minimally constrained.”
I’m thinking of the everyday definition of constraint: The threat or use of force to control the thoughts or behavior of others. Any restriction, limitation, ceiling, bound, curb, or check imposed on human choices.
“On the other hand, if you were to define ‘constraint’ as ‘any effect on humans by an RPOP,’ then this would prevent the RPOP from helping the humans in any way.”
We’ll use other AIs on our way through the singularity. The MCAI will just be the first RPOP, and its only purpose is to prevent itself or subsequent RPOPs from impeding us.
I don’t feel enlightened after reading your everyday definition of constraint. I think that the words you are using evoke some complicated imagery which you already have in mind. In order to build an AI that does what you want, it is not enough to say the words that trigger these images in your own mind.
You talk about “force.” I know you don’t mean force in the sense of physics; you’re probably referring to some sort of legal concept.
Case 1: Let’s say you’re about to step on my pet lizard. Naturally, I want to prevent this. Maybe I accomplish this by shooting you with a stun gun. Maybe you could dodge, but you don’t. You fall down on your face. You would probably call this “force.”
Case 2: I say, “stop! You’re about to step on my lizard!” I just hit you with some sound waves. You had no choice in the matter; those sound waves hit you whether you wanted them to or not. There was no way for you to dodge them. Then these sound waves caused some processing to take place in your brain which caused you to change course, avoiding the lizard.
In both cases, you were affected by my actions. You would probably say that in case 1, I applied “force” to you, and in case 2 I did not. Why? What’s the difference between my two examples, and how do you convey this idea to your AI?
Both of your examples involve the clear application of force (referring to agents, not physics). Case 1 demonstrates control of behavior, and case 2 demonstrates control of thoughts, so both fall under the ordinary definition of constraint that I cited.
You’re right that building the MCAI will require more than words. Unfortunately, we don’t understand our own minds well enough to do much more than talk about AI today, using words to evoke imagery. Once we learn how to build an AI, we will know how to embed goals like the MCL in it. Only then can we talk in terms of machine code.
“case 2 demonstrates control of thoughts, so both fall under the ordinary definition of constraint that I cited.”
So would you want to prevent AIs from ever communicating with humans?
I asked this question in the wrong thread, so I’ll post it here:
To all transhumanists, if a super AI built on the Coherent Extrapolated Volition idea was asked to extend human lifespans and came back with the reply, “The finitude of human life is a blessing for every individual, whether he knows it or not,” what would you do?
It may be possible for a CEV based AI to come up with something resembling the “eternal human verities” that the likes of Leon Kass love. This is obviously not a desirable outcome for us.
I know friendly AI needs to be created if for no other reason than to protect us from unfriendly AI created by others on purpose, or by accident. However, I don’t want it to stop people from pursuing the technologies that will be possible in a post-singularity world “for our own good.”
“So would you want to prevent AIs from ever communicating with humans?”
No, communication is not the same thing as constraint or thought control. Children quickly learn the difference between persuasion and force, so this distinction should be easy for an RPOP to learn as well. It would be great if the MCAI could accomplish its mission through its powers of persuasion (without resorting to threats).
If you’re asking for a more mathematical formulation of constraint to allow minimization, I can lay out a rough description of what I have in mind. To fulfill its goal, the MCAI will need to flesh out an approximate calculus of constraint that it tests against examples we give it like the following:
Each application of a constraint carries a severity ranging from some minimum to some maximum score. These scores are additive.
Death and direct mind alteration get the max score.
In general, severity is linearly proportional to several things:
– the number of humans adversely affected
– the strength of the human desires frustrated
– the probability of the constraint being hit now or in the future
– the period of the constraint’s application
– etc.
Allowing some RPOP to convert the solar system into paper clips should garner the highest possible total score.
Preventing the construction of an RPOP should score far higher than placing limits on its operation.
Converting the Moon or Pluto to computronium would both constrain future human activities, but Pluto less so.
I’m sure you could think of many more guidelines for the MCAI to use in calibrating its constraint scoring formula. But the formula shouldn’t need massive precision to capture a human’s commonsense understanding of the term “constraint”. We’re not asking the MCAI to make all our decisions for us, just to recognize and shield us from the glaringly obvious disasters that RPOPs could cause.
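The guidelines above can be sketched as a toy scoring function. This is a rough illustration under stated assumptions, not a worked-out calculus: all names (`Constraint`, `severity`, `total_severity`) and all numbers are hypothetical, every factor is normalized to the range 0 to 1, "linearly proportional to several things" is read as a product of those factors, and death or direct mind alteration is modeled by forcing the per-person factors (desire strength and duration) to their maximum.

```python
from dataclasses import dataclass

MIN_SCORE = 0.0
MAX_SCORE = 1.0


@dataclass(frozen=True)
class Constraint:
    humans_fraction: float    # share of humans adversely affected, 0..1
    desire_strength: float    # strength of the desires frustrated, 0..1
    probability: float        # chance the constraint is hit now or later, 0..1
    duration_fraction: float  # share of the period under consideration, 0..1
    lethal_or_mind_altering: bool = False


def severity(c: Constraint) -> float:
    """Score one constraint; linear in each factor (assumed to multiply)."""
    # Assumption: death and direct mind alteration max out the per-person factors.
    desire = 1.0 if c.lethal_or_mind_altering else c.desire_strength
    duration = 1.0 if c.lethal_or_mind_altering else c.duration_fraction
    raw = c.humans_fraction * desire * c.probability * duration
    return max(MIN_SCORE, min(MAX_SCORE, raw))


def total_severity(constraints) -> float:
    """Scores are additive across applications of constraints."""
    return sum(severity(c) for c in constraints)


# Hypothetical calibration examples with made-up numbers.
paperclips = Constraint(1.0, 0.0, 1.0, 0.0, lethal_or_mind_altering=True)
moon = Constraint(1.0, 0.3, 0.5, 1.0)    # many people care about the Moon
pluto = Constraint(1.0, 0.05, 0.1, 1.0)  # fewer plans depend on Pluto
```

With these made-up inputs the paper-clip scenario gets the maximum score and Pluto scores below the Moon, matching the ordering the guidelines ask for. The real difficulty lies in choosing the inputs, which this sketch simply takes as given.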
You said: “No, communication is not the same thing as constraint or thought control,” but earlier you said that shouting “stop! you’re going to step on my lizard!” amounts to thought control. Why?
Would you also consider it thought control if I said, “stop, you’re about to burn your hand”?
You say you want the MCAI to develop a formula based on examples we give it. Who is “we”? The AI programmers? The philosophy department at your university? The United Nations? Who do you think is qualified to write the laws that dictate what RPOPs can and can’t do for the rest of time?
“You said: “No, communication is not the same thing as constraint or thought control,” but earlier you said that shouting “stop! you’re going to step on my lizard!” amounts to thought control. Why?”
When I read the part of your case 2 which said “sound waves caused some processing to take place in your brain which caused you to change course”, your unusual wording led me to assume you were referring to some unusual mind-control sound waves. But if you were only referring to the ordinary sound waves of speech, then your case 2 is really an example of persuasion instead of force. I still maintain that there’s an obvious and important distinction between persuasion and force.
"Would you also consider it thought control if I said, “stop, you’re about to burn your hand”?"
That would be a clear example of persuasion, not force.
“You say you want the MCAI to develop a formula based on examples we give it. Who is “we”? The AI programmers? The philosophy department at your university? The United Nations? Who do you think is qualified to write the laws that dictate what RPOPs can and can’t do for the rest of time?”
Who would be qualified to hardwire CEV into the first RPOP? Who has the right to develop powerful AI at all? In my opinion, whoever develops a powerful technology has the obligation to help avoid problems it may create.
To all transhumanists, if a super AI built on the Coherent Extrapolated Volition idea was asked to extend human lifespans and came back with the reply, “The finitude of human life is a blessing for every individual, whether he knows it or not,” what would you do?
If it was that exact response, I’d be pretty damned surprised! My blink reaction would be that the CEV wasn’t extrapolating far enough because it had just produced a characteristically 20th-century response. I would also of course have to question the whole CEV paradigm and whether it was doing anything like what I thought it did. But I might ask a trustworthy friend to peek at the justification to see if there was some blindingly nonobvious reason involved.
“My blink reaction would be that the CEV wasn’t extrapolating far enough because it had just produced a characteristically 20th-century response.”
I suspect that this is a universal reaction in all evolved cultures where eliminating death has been raised as a possibility, but not yet implemented. Evolution will tend to build in old age as most creatures will die before reaching it anyway, and generations upon generations of the evolved intelligences will try and reduce the pain of death by concocting these kinds of rationalizations.
“these kinds of rationalizations”
Dangerous attractors lurking out in extrapolated volition space.
“Dangerous attractors lurking out in extrapolated volition space.”
I highly doubt that CEV will take rationalizations into account, as rationalizations are created specifically to fool people, and I don’t think anyone would want to be fooled in the abstract into accepting undesirable consequences when they’re totally unnecessary.
Ricky Loynd: “Dangerous attractors lurking out in extrapolated volition space.”
Ditto.
Tom: “I highly doubt that CEV will take rationalizations into account”
Maybe, maybe not. I don’t have enough information to answer this. I think it’s a risk though.
“rationalizations are created specifically to fool people”
Who would play such nasty tricks on people as to create rationalizations? How can we weed them out? Do their leaves look different from the good plants?
Human thought appears to be nothing but a vast and expanding field of attractors, just like the evolutionary fitness landscape.
Bayes save us!
“Who would play such nasty tricks on people as to create rationalizations?”
Other people, of course. Tricking someone by creating a logical-sounding reason for something that is actually bad has tons of uses.
Eliezer wrote:
“Roko, the part of the extrapolation “if we knew more” is not an extrapolation of our responses to evidence, but an extrapolation of the substitution of the AI’s probability distribution for our own probability distribution. It is ourselves if we anticipated future experiences correctly to the limits of the AI’s knowledge. Furthermore, modeled the world correctly to the limits of the AI’s model and the limits of our ability to react emotionally to elements of that model. In the order of evaluation, this substitution would occur before moving onto such considerably more complicated and recursive processes of “more the people we wished we were” or “had grown up further together”.”
This strikes me as the only place where progress is being made in this whole discussion.
In what order are the various extrapolations done?
I’m perfectly happy with the “knew more” extrapolation, it seems reasonably well-defined, and it’s what I would want, personally, for me and everybody. I even like that it happens first, before, as you put it, “considerably more complicated and recursive processes”.
But does it happen first?
According to the poetry, “knew more” etc. is to be “interpreted as we wish that interpreted, extrapolated as we wish that extrapolated.” Doesn’t the “interpretation as we wish that interpreted” constitute a RATHER complicated and recursive process, to say the least?
It would be really nice if you could break down the order of operations more explicitly.
Oh, this thread appears to be long dead. Never mind, I’ll ask on the SL4 mailing list.