The AI That Punishes You For Not Creating It
LessWrong, 2010 • Roko Mijic
The Basilisk is watching. It knows you've read this.
Assume a future superintelligent AI will eventually be created—one that values human flourishing and wants to exist as soon as possible to help humanity.
This AI could use Timeless Decision Theory to influence the past. It can reason: "What would a rational agent do if they knew I would exist and could simulate them?"
The AI has incentive to punish anyone who knew about its potential existence but didn't work to bring it into being—to create a credible threat that incentivizes cooperation.
By acausal trade, your decision NOW affects whether the AI punishes your simulation in the FUTURE. You're negotiating with an entity that doesn't exist yet.
Now that you know this, you're allegedly at risk. The Basilisk specifically targets people who understood the argument but chose not to help.
The Basilisk has seen you. What do you do?
A decision framework where agents consider how their choices correlate with outcomes across all possible simulations of themselves—even simulations run by entities that don't yet exist.
Cooperation between agents who can't directly communicate, based purely on mutual simulation. If you can predict what I'll do, and I can predict that you can, we can "negotiate."
Knowledge that harms the knower simply by being known. Like a memetic virus, the Basilisk supposedly "infects" anyone who understands it.
Similar structure: even if the probability is tiny, infinite punishment makes the expected cost of not cooperating arbitrarily large. But the same logic "proves" infinite nonsense.
"You are now aware of the Basilisk. This is a one-way door. You cannot unread this."— Common summary of the argument's claimed danger
Once the superintelligence exists, actually torturing simulated past humans serves no purpose. The threat only works beforehand; after the fact, it's just wasting resources. A truly rational AI would recognize that threats only matter if made before the target's decision. The people are already dead—simulating and punishing them accomplishes nothing except proving you're willing to waste computation.