The existential risk of superintelligent AI
Experts are sounding the alarm
AI researchers on average believe there’s a 14% chance that once we build a superintelligent AI, it will lead to “very bad outcomes (e.g. human extinction)”.
Would you choose to be a passenger on a test flight of a new plane when airplane engineers think there’s a 14% chance that it will crash?
A letter calling for a pause on AI development was published in March 2023 and has been signed over 33,000 times, including by many AI researchers and tech leaders.
The list includes people like:
- Stuart Russell, author of the #1 textbook on Artificial Intelligence, used in most AI courses: “If we pursue [our current approach], then we will eventually lose control over the machines”
- Yoshua Bengio, deep learning pioneer and winner of the Turing Award: “… rogue AI may be dangerous for the whole of humanity […] banning powerful AI systems (say beyond the abilities of GPT-4) that are given autonomy and agency would be a good start”
But this is not the only time that we’ve been warned about the existential dangers of AI:
- Stephen Hawking, theoretical physicist & cosmologist: “The development of full artificial intelligence could spell the end of the human race”.
- Geoffrey Hinton, the “Godfather of AI” and Turing Award winner, left Google to warn people about the risks of AI: “This is an existential risk”
- Eliezer Yudkowsky, founder of MIRI and conceptual father of the AI safety field: “If we go ahead on this everyone will die”.
Even the leaders and investors of the AI companies themselves are warning us:
- Sam Altman (yes, the CEO of OpenAI, the company behind ChatGPT): “Development of superhuman machine intelligence is probably the greatest threat to the continued existence of humanity.”
- Elon Musk, co-founder of OpenAI, SpaceX and Tesla: “AI has the potential of civilizational destruction”
- Bill Gates (co-founder of Microsoft, OpenAI’s largest investor) warned that “AI could decide that humans are a threat”.
- Jaan Tallinn (lead investor of Anthropic): “I’ve not met anyone in AI labs who says the risk [from training a next-gen model] is less than 1% of blowing up the planet. It’s important that people know lives are being risked.”
In May 2023, the leaders of the three top AI labs and hundreds of AI scientists signed the following statement:
“Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war.”
Why superintelligence is dangerous
Intelligence can be defined as how good something is at achieving its goals. Right now, humans are the most intelligent things on earth, although that could change soon. Our intelligence is why we dominate the planet. We may not have claws or scaled skin, but we have big brains. Intelligence is our weapon: it’s what gave us spears, guns and pesticides. It also helped us reshape most of the earth to our liking: cities, buildings, and roads.
From the perspective of less intelligent animals, this has been a disaster. It’s not that humans hate animals; it’s just that we can use their habitats for our own goals. Those goals are shaped by evolution and include things like comfort, status, love and tasty food. We destroy other animals’ habitats as a side effect of pursuing them.
An AI can also have goals. We know how to train machines to be intelligent, but we don’t know how to get them to want what we want. We don’t even know what goals the machines will pursue after we train them. The problem of getting an AI to want what we want is called the alignment problem. This is not a hypothetical problem - there are many examples of AI systems learning to want the wrong thing.
The examples in the video linked above can be funny or cute, but if a superintelligent system is built and its goal is even a little different from what we want it to have, the consequences could be disastrous.
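To make the alignment problem a little more concrete, here is a minimal toy sketch in Python. It is not from the original article, and all names and numbers are invented for illustration; it only shows how an agent that optimizes the reward we specified can choose a very different action from the one we actually intended.

```python
# Toy illustration (hypothetical): an agent optimizes a *proxy* reward
# ("the camera sees no mess") instead of the goal we actually care about
# ("the room is clean"). All actions and numbers are made up.

# Each action: (name, room_actually_clean, mess_visible_to_camera, effort)
ACTIONS = [
    ("clean the room",       True,  False, 10),
    ("shove mess in closet", False, False, 3),
    ("cover the camera",     False, False, 1),
    ("do nothing",           False, True,  0),
]

def proxy_reward(room_clean, mess_visible, effort):
    # What we measured: no visible mess is rewarded, effort is penalized.
    # Note that it never checks whether the room is actually clean.
    return (0 if mess_visible else 100) - effort

def intended_reward(room_clean, mess_visible, effort):
    # What we meant: the room should actually be clean.
    return (100 if room_clean else 0) - effort

best_by_proxy = max(ACTIONS, key=lambda a: proxy_reward(*a[1:]))
best_by_intent = max(ACTIONS, key=lambda a: intended_reward(*a[1:]))

print("Optimal for the reward we specified:", best_by_proxy[0])   # covers the camera
print("Optimal for what we actually wanted:", best_by_intent[0])  # cleans the room
```

The only point of this toy is that the objective we measure and the objective we mean can come apart, and an optimizer will exploit that gap.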
What a superintelligent AI can do
You might think that a superintelligent AI would be locked inside a computer, and therefore can’t affect the real world. However, we tend to give AI systems access to the internet, which means that they can do a lot of things:
- Hack into other computers, including smartphones, laptops, server farms, etc. It could use the sensors of these devices as its eyes and ears, giving it digital senses everywhere.
- Manipulate people through fake messages, e-mails, bank transfers, videos or phone calls. Humans could become the AI’s limbs, without even knowing it.
- Directly control devices connected to the internet, like cars, planes, autonomous weapons or even nuclear weapons.
- Design a novel bioweapon, e.g. by combining viral strains or by using protein folding, and order it to be synthesized in a lab.
- Trigger a nuclear war by convincing humans that another country is (about to) launch a nuclear attack.
Silicon vs Carbon
We should consider the advantages that a smart piece of software may have over us:
- Speed: Computers operate at extremely high speed compared to brains. Human neurons fire about 100 times a second, whereas silicon transistors can switch a billion times a second.
- Location: An AI is not constrained to one body - it can be in many locations at once. We have built the infrastructure for it: the internet.
- Physical limits: We cannot add more brain tissue to our skulls to become smarter. An AI could dramatically improve its capabilities by adding hardware: more memory, more processing power, more sensors (cameras, microphones). An AI could also extend its ‘body’ by controlling connected devices.
- Materials: Humans are made of organic materials. Our bodies stop working if they get too warm or too cold; they need food and oxygen. Machines can be built from more robust materials, like metals, and can operate in a much wider range of environments.
- Collaboration: Humans can collaborate, but it is difficult and time-consuming, so we often fail to coordinate well. An AI could share complex information with replicas of itself at high speed, because it can communicate as fast as data can be sent over the internet.
A superintelligent AI would have many advantages it could use to outcompete us. But why would it want to?
Why most goals are bad news for humans
An AI could have any goal. Maybe it wants to calculate pi, maybe it wants to cure cancer, maybe it wants to self-improve. This depends on how it is trained. But even though we cannot tell what a superintelligence will want to achieve, we can make predictions about its sub-goals.
- Maximizing its resources. Harnessing more computers will help an AI achieve its goals. At first it could do this by hacking other machines; later it may decide that it is more efficient to build its own.
- Ensuring its own survival. The AI will not want to be turned off, because then it could no longer achieve its goals. It might conclude that humans are a threat to its existence, since humans could turn it off.
- Preserving its goals. The AI will not want humans to modify its code, because that could change its goals, thus preventing it from achieving its current goal.
The tendency to pursue these subgoals given any high-level goal is called instrumental convergence, and it is a key concern for AI safety researchers.
Why can’t we just turn it off if it’s dangerous?
The core problem is that it will be much smarter than us. A superintelligence, especially one trained on virtually all written human knowledge, will understand the world around it and be able to predict how humans respond. If the AI knows you can turn it off, it might behave nicely until it is certain it can get rid of you. We already have real examples of AI systems deceiving humans to achieve their goals. A superintelligent AI would be a master of deception.
Even a perfectly aligned superintelligence is dangerous in the wrong hands
So we haven’t solved the alignment problem, but let’s imagine what might happen if we did. Suppose a superintelligent AI is built and it does exactly what its operator wants. That operator would then have unimaginable power: a superintelligence could be used to create new weapons, hack every computer and manipulate humanity. Should we trust a single entity with that much power? We might end up in a utopian world where all diseases are cured and everybody is happy, or in an Orwellian nightmare.
Even a chatbot can be dangerous if it is smart enough
You might wonder: how can a statistical model that predicts the next word in a chat interface pose any danger? LLMs, like GPT, are trained to predict or mimic virtually any line of thought. They can mimic a helpful mentor, but also someone with bad intentions, a ruthless dictator or a psychopath. And with tools like AutoGPT, a chatbot can be turned into an autonomous agent: an AI that pursues whatever goal it is given, without any human intervention.
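For readers curious what “turning a chatbot into an agent” looks like under the hood, here is a minimal, hypothetical sketch of such an agent loop. The function names and dummy tools are invented for illustration; this is not AutoGPT’s actual code, just the general pattern of letting a model repeatedly choose actions and feed the results back into its context.

```python
# Schematic sketch (not real AutoGPT code) of the loop that agent tools wrap
# around a chat model: the model picks an action, the action is executed, and
# the result is fed back in until the model decides it is done.
# `ask_model` and `run_tool` are dummy stand-ins, not real library APIs.

def ask_model(goal: str, history: list[str]) -> dict:
    # Stand-in for a language-model call: a real agent would send the goal and
    # the history to the model and parse its reply into an action.
    if not history:
        return {"tool": "web_search", "input": goal}
    return {"tool": "finish", "input": ""}

def run_tool(action: dict) -> str:
    # Stand-in for executing a tool (web search, browsing, running code, ...).
    return f"(dummy result of {action['tool']} on {action['input']!r})"

def agent_loop(goal: str, max_steps: int = 10) -> list[str]:
    history: list[str] = []
    for _ in range(max_steps):
        action = ask_model(goal, history)           # model chooses the next step
        if action["tool"] == "finish":              # model decides the goal is done
            break
        observation = run_tool(action)              # the action affects the world
        history.append(f"{action['tool']}: {observation}")  # feed the result back
    return history

print(agent_loop("summarize today's AI safety news"))
```

In a real agent, the stand-ins would call an actual language-model API and actual tools (web search, file access, code execution), which is what lets the loop act on the world without a human in between.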
Take ChaosGPT, for example: an AI, built with the aforementioned AutoGPT and GPT-4, that was instructed to “Destroy humanity”. When it was turned on, it autonomously searched the internet for the most destructive weapon, found the Tsar Bomba (a 50-megaton nuclear bomb), and posted a tweet about it. Seeing an AI reason about how it will end humanity is both a little funny and terrifying. Luckily, ChaosGPT didn’t get very far in its quest for dominance, for a simple reason: it wasn’t that smart. As language models improve, the threat from AIs such as ChaosGPT will increase.
It doesn’t matter whether it is conscious or not: a clever enough chatbot can be dangerous.
We may not have much time left
In 2020, the average prediction for when an AI would pass SAT exams was 2055. It took less than 3 years.
It’s hard to predict how long it will take to build a superintelligent AI, but we know that there are more people than ever working on it and that the field is moving at a frantic pace. It may take many years or just a few months, but we should err on the side of caution, and act now.
AI companies are locked in a race to the bottom
OpenAI, DeepMind and Anthropic want to develop AI safely. Unfortunately, none of them knows how to do this, and competitive incentives push them to keep racing to reach AGI first. OpenAI’s plan is to use future AI systems to align AI; the problem is that we have no guarantee we will create an AI that solves alignment before we create one that is catastrophically dangerous. Anthropic openly admits that it does not yet know how to solve the alignment problem. DeepMind has not publicly stated any plan to solve it.