book review | Our Final Invention: Artificial Intelligence and the End of the Human Era
October 4, 2013 by Luke Muehlhauser
It was Kurzweil who inspired Bill Joy to write the famously pessimistic Wired essay “Why the Future Doesn’t Need Us,” and Kurzweil devoted an entire chapter of The Singularity Is Near to the risks of advanced technologies. There, he wrote that despite his reputation as a technological optimist, “I often end up spending most of my time [in debates] defending [Joy’s] position on the feasibility of these dangers.”
Now, documentary filmmaker James Barrat has written an engaging new book about the risks inherent in what is sure to be the most transformative technology of them all: machine superintelligence.
Although Our Final Invention summarizes the last 15 years of academic research on risks from advanced AI, it reads more like a thrilling detective story.
> The rumor went like this: a lone genius had engaged in a series of high-stakes bets in a scenario he called the AI-Box Experiment. In the experiment, the genius role-played the part of the AI. An assortment of dot-com millionaires each took a turn as the Gatekeeper — an AI maker confronted with the dilemma of guarding and containing smarter-than-human AI. The AI and Gatekeeper would communicate through an online chat room. Using only a keyboard, it was said, the man posing as [the AI] escaped every time, and won each bet.
>
> More important, he proved his point. If he, a mere human, could talk his way out of the box, an [AI] tens or hundreds of times smarter could do it too, and do it much faster. This would lead to mankind’s likely annihilation. …
>
> The rumor said the genius had gone underground… But of course I wanted to talk to him. …
Barrat goes on to explain the risks of advanced AI by tracking down and interviewing the issue’s leading thinkers, one at a time — including the man behind the AI-Box Experiment, Eliezer Yudkowsky, my associate at MIRI.
Barrat did his homework
I generally open new books and articles about AI risk with some trepidation. Usually, people who write about these issues for a popular audience show little familiarity with the scholarly literature on the subject. Instead, they cycle through a tired list of tropes from science fiction; for example, that robots will angrily rebel against their human masters. That idea makes for some exciting movies, but it’s poor technological forecasting.
I was relieved, then, to see that Barrat has read the literature and interviewed the relevant experts.
As I see things, the key points Barrat argues for are these:
- Intelligence explosion this century (chs. 1, 2, 7, 11). We’ve already created machines that are better than humans at chess and many other tasks. At some point, probably this century, we’ll create machines that are as skilled at AI research as humans are. At that point, they will be able to improve their own capabilities very quickly. (Imagine 10,000 Geoff Hintons doing AI research around the clock, without any need to rest, write grants, or do anything else.) These machines will thus jump from roughly human-level general intelligence to vastly superhuman general intelligence in a matter of days, weeks, or years (it’s hard to predict the exact rate of self-improvement). Scholarly references: Chalmers (2010); Muehlhauser & Salamon (2013); Muehlhauser (2013); Yudkowsky (2013).
- The power of superintelligence (chs. 1, 2, 8). Humans steer the future not because we’re the strongest or fastest but because we’re the smartest. Once machines are smarter than we are, they will be steering the future rather than us. We can’t constrain a superintelligence indefinitely: that would be like chimps trying to keep humans in a bamboo cage. In the end, if vastly smarter beings have different goals than you do, you’ve already lost. Scholarly references: Legg (2008); Yudkowsky (2008); Sotala (2012).
- Superintelligence does not imply benevolence (ch. 4). In AI, “intelligence” just means something like “the ability to efficiently achieve one’s goals in a variety of complex and novel environments.” Hence, intelligence can be applied to just about any set of goals: to play chess, to drive a car, to make money on the stock market, to calculate digits of pi, or anything else. Therefore, by default a machine superintelligence won’t happen to share our goals: it might just be really, really good at maximizing ExxonMobil’s stock price, or calculating digits of pi, or whatever it was designed to do. As Theodore Roosevelt said, “To educate [someone] in mind and not in morals is to educate a menace to society.” Scholarly references: Fox & Shulman (2010); Bostrom (2012); Armstrong (2013).
- Convergent instrumental goals (ch. 6). A few specific “instrumental” goals (means to ends) are implied by almost any set of “final” goals. If you want to fill the galaxy with happy sentient beings, you’ll first need to gather a lot of resources, protect yourself from threats, improve yourself so as to achieve your goals more efficiently, and so on. That’s also true if you just want to calculate as many digits of pi as you can, or if you want to maximize ExxonMobil’s stock price. Superintelligent machines are dangerous to humans not because they’ll angrily rebel against us; rather, for almost any set of goals they might have, it will be instrumentally useful for them to use our resources to achieve those goals. As Yudkowsky put it, “The AI does not love you, nor does it hate you, but you are made of atoms it can use for something else.” Scholarly references: Omohundro (2008); Bostrom (2012).
- Human values are complex (ch. 4). Our idealized values — i.e., not what we want right now, but what we would want if we had more time to think about our values, resolve contradictions in our values, and so on — are probably quite complex. Cognitive scientists have shown that we don’t care just about pleasure or personal happiness; rather, our brains are built with “a thousand shards of desire.” As such, we can’t give an AI our values just by telling it to “maximize human pleasure” or anything so simple as that. If we try to hand-code the AI’s values, we’ll probably miss something that we didn’t realize we cared about. Scholarly references: Dolan & Sharot (2011); Yudkowsky (2011); Muehlhauser & Helm (2013).
- Human values are fragile (ch. 4). In addition to being complex, our values appear to be “fragile” in the following sense: there are some features of our values such that, if we leave them out or get them wrong, the future contains nearly 0% of what we value rather than 99% of what we value. For example, if we get a superintelligent machine to maximize what we value except that we don’t specify consciousness properly, then the future would be filled with minds processing information and doing things but there would be “nobody home.” Or if we get a superintelligent machine to maximize everything we value except that we don’t specify our value for novelty properly, then the future could be filled with minds experiencing the exact same “optimal” experience over and over again, like Mario grabbing the level-end flag on a continuous loop for a trillion years, instead of endless happy adventure. Scholarly reference: Yudkowsky (2011).
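The “intelligence explosion” dynamic in the first point above is, at bottom, a claim about compounding returns. Here is a toy sketch in Python — the growth factor and step count are arbitrary assumptions chosen purely for illustration, not a forecast of any real system:

```python
# Toy model of recursive self-improvement (illustrative only; the
# constants and functional form are assumptions, not predictions).
# Each "generation," the AI reinvests its capability into research
# that multiplies its capability by a fixed factor.

def self_improvement(initial=1.0, returns=1.5, generations=10):
    """Capability after repeated reinvestment with multiplicative returns."""
    capability = initial
    trajectory = [capability]
    for _ in range(generations):
        capability *= returns  # each generation compounds the last
        trajectory.append(capability)
    return trajectory

# With returns > 1, capability grows geometrically.
print(self_improvement()[-1])  # 1.5 ** 10 ≈ 57.7
```

The point is not the particular numbers but the shape of the curve: whenever each round of self-improvement multiplies capability by a factor greater than one, modest per-step gains compound into an enormous cumulative jump.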
Barrat covers all this and much more in a well-informed and engaging way, and I’m delighted to recommend the book.
What should we do?
My biggest complaint about Our Final Invention is that it may leave readers with a sense of hopelessness. After all, it looks like superintelligent machines will by default use all our resources to accomplish their goals, and we don’t know how to give AIs the exact same goals we have, and we don’t know how to make sure the AIs keep our goals as they modify their core algorithms to become smarter and smarter.
As George Dyson wrote, “In the game of life and evolution there are three players at the table: human beings, nature, and machines. I am firmly on the side of nature. But nature, I suspect, is on the side of the machines.”
Staring into a future ruled by superintelligent machines, things look pretty bad for us humans, and I wish Barrat had spent more time explaining what we can do about it. The short answer, I think, is “Figure out how to make sure the first self-improving intelligent machines will be human-friendly and will stay that way.” (This is called “Friendly AI research.”)
Of course, we can never be 100% certain that a machine we’ve carefully designed will be (and stay) “friendly.” But we can improve our chances.
To make things more concrete, let me give four examples of ongoing research in the field.
A job for philosophers: How can we get an AI to learn what our idealized values are? To some degree, we’ll probably always have some uncertainty about our values. What should we do, given this uncertainty? For decades, we’ve had a rich framework for talking about uncertainty about the world (probability theory), but it wasn’t until 1989 that researchers began to seek out frameworks for dealing with uncertainty about values. The parliamentary model is the most promising approach I know of, but it’s still a long way from being an algorithm useable by a self-improving AI.
A job for mathematicians: How do we get an AI to keep doing what we want even as it rewrites its core algorithms (to become smarter, to better achieve its goals)? Since it will likely do this many, many times, we’d like to minimize the chance of goal corruption during each (major) modification, and the strongest assurance we know of is mathematical proof. Unfortunately, the most straightforward way for an AI to prove that a self-modification will not corrupt its goals is blocked by Löb’s theorem. Yudkowsky (2013) surveys some promising attacks on this “Löbian obstacle,” but much work remains to be done.
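For readers who want the obstacle stated compactly: Löb’s theorem says that for any consistent theory T extending Peano arithmetic, and any sentence P (writing $\Box_T P$ for “T proves P”),

```latex
% Löb's theorem: \Box_T P abbreviates "P is provable in T"
\text{If } T \vdash \Box_T P \rightarrow P, \text{ then } T \vdash P.
```

So an agent reasoning in T cannot adopt the blanket rule “if T proves an action is safe, then it is safe”: proving $\Box_T P \rightarrow P$ for arbitrary P would, by Löb’s theorem, let T prove every such P outright, true or not. That is why the straightforward self-verification strategy fails.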
A job for computer scientists: In the 20th century we learned that our universe is made not from atoms, but from quantum configuration spaces. Now suppose an AI learns our values expressed in terms of our current model of the world, but later learns that this model is incorrect in important ways, so that some of its original goals are no longer well-defined with respect to the new world model. How can it resolve this problem in a principled, safe way rather than in an ad-hoc way, as humans do? De Blanc (2011) solved this problem for limited cases, but we’ll need more advanced solutions for real-world AI.
A job for economists: Once AIs can do their own AI research, what are the expected returns on cognitive reinvestment? How quickly can an AI improve its own intelligence as it reinvests cognitive improvements into gaining further cognitive improvements? If we learn that AIs are likely to self-improve very slowly, this would imply different policies for safe AI development than if AIs are likely to self-improve very quickly. To get evidence about these questions of “intelligence explosion microeconomics,” we can formalize our hypotheses as return-on-investment curves, and see which of these curves is falsified by historical data (about hominid brain evolution, algorithmic progress, etc.). However, this field of study has only begun to take its first steps: see Yudkowsky (2013) and Grace (2013).
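The contrast between slow and fast self-improvement can be made concrete with a toy simulation. In this sketch (every parameter is an assumption chosen for illustration), the growth rate of capability scales as a power of current capability; the exponent determines whether reinvestment yields diminishing, compounding, or explosive returns:

```python
# Sketch of "returns on cognitive reinvestment" as growth curves.
# The exponents and rates are arbitrary assumptions chosen to
# contrast the regimes, not estimates of any real dynamics.

def grow(exponent, steps=30, capability=1.0, rate=0.1):
    """Capability over time when growth rate scales as capability**exponent."""
    history = [capability]
    for _ in range(steps):
        capability += rate * capability ** exponent
        history.append(capability)
    return history

diminishing = grow(exponent=0.5)  # sub-linear returns: growth slows relative to size
compounding = grow(exponent=1.0)  # linear returns: exponential growth
explosive   = grow(exponent=1.5)  # super-linear returns: faster than exponential
```

Fitting curves like these to historical data (hominid brain evolution, algorithmic progress) is one way to gather evidence about which regime real cognitive reinvestment would fall into, and hence how fast an AI could self-improve.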
A call to action
Barrat chose his topic wisely, and he covered it well. How can we get desirable outcomes from smarter-than-human AI? This is humanity’s most important conversation.
But we need more than discussion; we need action. Here are some concrete ways to take action:
- Raise awareness of the issue. Write about Our Final Invention on your blog or social media. If you read the book, write a short review of it on Amazon.
- If you’re a philanthropist, consider supporting the ongoing work at MIRI or FHI.
- If you’d like to see whether you might be able to contribute to the ongoing research itself, get in touch with one of the two leading institutes researching these topics: MIRI or FHI. If you’ve got significant mathematical ability, you can also apply to attend a MIRI research workshop.
Our world will not be saved by those who talk. It will be saved by those who roll up their sleeves and get to work (and by those who support them).