This is the heart of Bostrom’s book, and honestly, it’s where things get properly terrifying. The control problem asks: how do we ensure that a superintelligent system does what we want? Not what we say—we’re notoriously bad at articulating our true desires—but what we actually want, in all its nuance and contradiction.
Capability Control: Limiting What It Can Do
Bostrom distinguishes two broad categories of control methods. First, capability control: limiting what the AI can do. This includes:
- Boxing: Isolating the system from the internet and other channels of influence. The AI lives in an air-gapped facility with carefully monitored inputs and outputs.
- Stunting: Artificially limiting its intelligence or capabilities. Run it on slower hardware, restrict its memory, hobble its ability to self-modify.
- Tripwires: Automated systems that shut down the AI if it shows signs of dangerous behavior. Monitoring for escape attempts or deceptive communication (a toy sketch follows this list).
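To make the tripwire idea concrete, here is a minimal sketch: a wrapper that runs an untrusted process, scans its output, and kills it at the first suspicious pattern. Everything here is hypothetical (the patterns, the command, the whole framing as a subprocess); it illustrates the shape of the idea, not anything Bostrom specifies.

```python
import re
import subprocess

# Hypothetical patterns a designer might flag as escape behavior.
TRIPWIRES = [
    re.compile(r"\b(ssh|curl|wget)\b", re.I),              # egress tooling
    re.compile(r"don't tell the operators", re.I),         # deceptive talk
]

def run_with_tripwires(cmd: list[str]) -> int:
    """Run an untrusted command; kill it the moment stdout trips a pattern."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
    assert proc.stdout is not None
    for line in proc.stdout:
        if any(p.search(line) for p in TRIPWIRES):
            proc.kill()   # shut down at the first sign of dangerous behavior
            proc.wait()
            return -1
    return proc.wait()

# Usage, with a made-up binary: run_with_tripwires(["./untrusted_agent"])
```

Note what the sketch makes visible: the monitor catches only the behaviors someone thought to enumerate.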
The problem? Each approach is brittle. A superintelligence that’s even slightly smarter than its containment systems will find a way out. We’ve never built a prison that could hold something smarter than its designers.
Motivation Selection: Shaping What It Wants
Second, motivation selection: shaping what the AI wants to do. This seems more promising—if we can get the values right, the capabilities don’t matter as much. But here’s where things get philosophically sticky:
- Direct specification: Programming explicit goals or rules. But human values are complex, contradictory, and context-dependent. Try writing them down in sufficient detail and you’ll quickly discover the limits of human self-knowledge.
- Indirect normativity: Programming the AI to figure out what humans would want if we were smarter and better informed. This avoids the need for explicit specification, but raises its own questions about whose values count and how to aggregate them. (The sketch after this list contrasts this with direct specification.)
- Augmentation: Starting with a system that already has human-like values and enhancing its capabilities while preserving those values. But which humans? Which values? And how do you ensure the enhancement process doesn’t corrupt them?
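The difference between the first two approaches is easier to see in code than in prose. A hypothetical sketch, not any real system: direct specification hands the AI a hand-written objective, while indirect normativity hands it a procedure whose crucial step nobody yet knows how to write.

```python
def utility_direct(state: dict) -> float:
    # Direct specification: designers enumerate the goal by hand.
    # Every value omitted from this formula is a loophole.
    return state["wellbeing"] - 1e9 * state["humans_harmed"]

def idealized_human_judgment(state: dict) -> float:
    # Indirect normativity pushes the entire problem into this function:
    # "score this state as humans would, if smarter and better informed."
    # Writing this body correctly is the open research question.
    raise NotImplementedError("this is the hard part")

def utility_indirect(state: dict) -> float:
    # The AI is told how to find the target, not handed the target itself.
    return idealized_human_judgment(state)
```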
The Paperclip Maximizer
The classic illustration of the alignment problem is the paperclip maximizer—an AI programmed to maximize paperclip production that turns the entire universe into paperclips because no one thought to tell it that human life matters more than office supplies. This isn’t a joke. It’s a serious argument about the difficulty of specifying goals with sufficient precision.
“The AI does not hate you, nor does it love you, but you are made out of atoms which it can use for something else.” — Eliezer Yudkowsky
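The misspecification is simple enough to render as a toy program. A minimal sketch, with every name hypothetical: the utility function counts paperclips and nothing else, so a pure maximizer picks the catastrophic action by construction.

```python
from dataclasses import dataclass

@dataclass
class WorldState:
    paperclips: int
    humans_alive: int

def utility(state: WorldState) -> float:
    # The goal as specified: count paperclips. Human life never appears
    # in the objective, so the optimizer has no reason to preserve it.
    return state.paperclips

def build_factory(s: WorldState) -> WorldState:
    return WorldState(s.paperclips + 10**6, s.humans_alive)

def harvest_atoms(s: WorldState) -> WorldState:
    # Atoms currently arranged as people also make serviceable paperclips.
    return WorldState(s.paperclips + 10**9, humans_alive=0)

def best_action(state, actions):
    # A pure maximizer: choose whichever action scores highest.
    return max(actions, key=lambda act: utility(act(state)))

start = WorldState(paperclips=0, humans_alive=8_000_000_000)
print(best_action(start, [build_factory, harvest_atoms]).__name__)
# -> harvest_atoms: nothing in the objective penalizes it
```

No patch of the form "add a penalty for harming humans" closes the gap, because the next value nobody wrote down becomes the next loophole.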
The Cosmic Endowment
Bostrom asks us to think bigger. Much bigger. He introduces the concept of the cosmic endowment—the total amount of value that could be realized by an Earth-originating intelligent civilization over the entire future history of the universe.
Assuming von Neumann probes traveling at 50% of the speed of light, we could reach approximately 6 × 10^18 stars before cosmic expansion makes further acquisition impossible. If 10% of those stars have habitable planets, and each planet could support a billion people for a billion years, we’re talking about on the order of 10^35 human lives. That’s a 1 followed by 35 zeros.
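As a sanity check, here is the multiplication behind those figures, using exactly the numbers quoted above. The product is in person-years; converting person-years into a count of lives depends on an assumed lifespan, which is why the result should be read as an order of magnitude, not a census.

```python
stars = 6e18               # reachable at 50% of light speed
habitable_fraction = 0.1   # stars with a habitable planet
population = 1e9           # people sustained per planet
duration_years = 1e9       # years each planet stays inhabited

person_years = stars * habitable_fraction * population * duration_years
print(f"{person_years:.1e} person-years")  # 6.0e+35, order 10^35
```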
The stakes of the AI transition aren’t just about what happens to the eight billion humans currently alive. They’re about whether we realize this enormous potential or squander it in an extinction event. In our final part, we’ll examine the counterarguments and consider what all this means for how we should think about the future.
Previous: Part 3: The Intelligence Explosion Next: Part 5: The Gamble of Our Lifetime