The extinction risk of superintelligent AI

You can learn about x-risks reading this page, or you can also learn through videos, articles, and more media.

Experts are sounding the alarm

AI researchers on average believe

there’s a 14% chance that once we build a superintelligent AI (an AI vastly more intelligent than humans), it will lead to “very bad outcomes (e.g. human extinction)“.

And there are cases and reports about current AIs that show they may be right

Would you choose to be a passenger on a test flight of a new plane when airplane engineers think there’s a 14% chance that it will crash?

A letter calling for pausing AI development

launched in April 2023, and has been signed over 33,000 times, including by many AI researchers and tech leaders.

The list includes people like:

Stuart Russell, writer of the #1 textbook on Artificial Intelligence used in most AI studies: “If we pursue [our current approach], then we will eventually lose control over the machines”
Yoshua Bengio, deep learning pioneer and winner of the Turing Award: ”… rogue AI may be dangerous for the whole of humanity […] banning powerful AI systems (say beyond the abilities of GPT-4) that are given autonomy and agency would be a good start”

But this is not the only time that we’ve been warned about the existential / extinction dangers of AI:

Stephen Hawking, theoretical physicist & cosmologist: “The development of full artificial intelligence could spell the end of the human race”
.
Geoffrey Hinton, the “Godfather of AI” and Turing Award winner, left Google
to warn people of AI: “This is an existential risk”
Eliezer Yudkowsky, founder of MIRI and conceptual father of the AI safety field: “If we go ahead on this everyone will die”
.

Even the leaders and investors of the AI companies themselves are warning us:

Sam Altman (yes, the CEO of OpenAI who builds ChatGPT): “Development of superhuman machine intelligence is probably the greatest threat to the continued existence of humanity.”
.
Elon Musk, co-founder of OpenAI, SpaceX and Tesla: “AI has the potential of civilizational destruction”
Bill Gates (co-founder of Microsoft, which owns 50% of OpenAI) warned that “AI could decide that humans are a threat”
.
Jaan Tallinn (lead investor of Anthropic): “I’ve not met anyone in AI labs who says the risk [from training a next-gen model] is less than 1% of blowing up the planet. It’s important that people know lives are being risked.”

The leaders of the 3 top AI labs and hundreds of AI scientists have signed the following statement

in May 2023:

“Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war.”

You can read a much longer list of similar statements from politicians, CEOs and experts here and other similar polls on the experts (and the public) here.

What a superintelligent AI can (be used to) do

You might think that a superintelligent AI would be locked inside a computer, and therefore can’t affect the real world. However, we tend to give AI systems access to the internet, which means that they can do a lot of things:

Hack into other computers, including all smartphones, laptops, server farms, etc. It could use the sensors of these devices as its eyes and ears, having digital senses everywhere.
Manipulate people
through fake messages, e-mails, bank transfers, videos or phone calls. Humans could become the AI’s limbs, without even knowing it.
Directly control devices connected to the internet, like cars, planes, robotized (autonomous) weapons or even nuclear weapons.
Design a novel bioweapon, e.g. by combining viral strands or by using protein folding
and order it to be printed in a lab.
Trigger a nuclear war by convincing humans that another country is (about to) launch a nuclear attack.

The alignment problem: why an AI might lead to human extinction

The type of intelligence we are concerned about can be defined as how good something is at achieving its goals. Right now, humans are the most intelligent thing on earth, although that could change soon. Because of our intelligence, we are dominating our planet. We might not have claws or scaled skin, but we have big brains. Intelligence is our weapon: it’s what gave us spears, guns and pesticides. Our intelligence helped us to transform most of the earth into how we like it: cities, buildings, and roads.

From the perspective of less intelligent animals, this has been a disaster. It’s not that humans hate the animals, it’s just that we can use their habitats for our own goals. Our goals are shaped by evolution and include things like comfort, status, love and tasty food. We are destroying the habitats of other animals as a side effect of pursuing our goals.

An AI can also have goals. We know how to train machines to be intelligent, but we don’t know how to get them to want what we want. We don’t even know what goals the machines will pursue after we train them. The problem of getting an AI to want what we want is called the alignment problem. This is not a hypothetical problem - there are many examples

of AI systems learning to want the wrong thing.

The examples from the video linked above can be funny or cute, but if a superintelligent system is built, and it has a goal that is even a little different from what we want it to have, it could have disastrous consequences.

Why most goals are bad news for humans

An AI could have any goal, depending on how it’s trained and prompted (used). Maybe it wants to calculate pi, maybe it wants to cure cancer, maybe it wants to self-improve. But even though we cannot tell what a superintelligence will want to achieve, we can make predictions about its sub-goals.

Maximizing its resources. Harnessing more computers will help an AI achieve its goals. At first, it can achieve this by hacking other computers. Later it may decide that it is more efficient to build its own. You can read about out this real case of emergent power-seeking behavior on an AI
.
Ensuring its own survival. The AI will not want to be turned off, as it could no longer achieve its goals. AI might conclude that humans are a threat to its existence, as humans could turn it off. There also have been cases of self-preserving unprompted, untrained behavior
.
Preserving its goals. The AI will not want humans to modify its code, because that could change its goals, thus preventing it from achieving its current goal. And there are also cases of AIs trying to do that
.

The tendency to pursue these subgoals given any high-level goal is called instrumental convergence

, and it is a key concern for AI safety researchers.

Even a chatbot might be dangerous if it is smart enough

You might wonder: how can a statistical model that predicts the next word in a chat interface pose any danger? You might say: It’s not conscious, it’s just a bunch of numbers and code. And yes, we don’t think LLMs are conscious, but that doesn’t mean they can’t be dangerous.

LLMs, like GPT, are trained to predict or mimic virtually any line of thought. It could mimic a helpful mentor, but also someone with bad intentions, a ruthless dictator or a psychopath. With the usage of tools like AutoGPT

, a chatbot could be turned into an autonomous agent: an AI that pursues any goal it is given, without any human intervention.

Take ChaosGPT

, for example. This is an AI, using the aforementioned AutoGPT + GPT-4, that is instructed to “Destroy humanity”. When it was turned on, it autonomously searched the internet for the most destructive weapon and found the Tsar Bomba

, a 50-megaton nuclear bomb. It then posted a tweet about it. Seeing an AI reason about how it will end humanity is both a little funny and terrifying. Luckily ChaosGPT didn’t get very far in its quest for dominance. The reason it didn’t get very far: it wasn’t that smart.

Capabilities keep improving due to innovations in training, algorithms, prompting and hardware. As such, the threat from language models will continue to increase.

Evolution selects for things that are good at surviving

AI models, like all living things, are prone to evolutionary pressures, but there are a few key differences between the evolution of AI models and living things like animals:

AI models do not replicate themselves. We replicate them by making copies of their code, or by replicating training software that leads to good models. Code that is useful is copied more often and is used for inspiration to build new models.
AI models do not mutate like living things do, but we do make iterations of them where we change how they work. This process is way more deliberate and fast. AI researchers are designing new algorithms, datasets and hardware to make AI models more capable.
The environment does not select for fitter AI models, but we do. We select AI models that are useful to us, and we discard the ones that are not. This process does lead to ever more capable and autonomous AI models.

So this system leads to ever more powerful, capable and autonomous AI models - but not necessarily to something that wants to take over, right? Well, not exactly. This is because evolution is always selecting for things that are self-preserving. If we keep trying variations of AI models and different prompts, at some point one instance will try to preserve itself. We have already discussed why this is likely to happen early on: because self-preservation is always useful to achieve goals. But even if this is not very likely to happen, it is prone to happen eventually, simply because we keep trying new things with different AI models.

The instance that tries to self-preserve is the one that takes over. Even if we assume that almost every AI model will behave just fine, a single rogue AI is all it takes.

After solving the alignment problem: the concentration of Power

We haven’t solved the alignment problem yet, but let’s imagine what might happen if we did. Imagine that a superintelligent AI is built, and it does exactly what the operator wants it to do (not what it asks, but what it wants). Some person or company would end up controlling this AI and could use this to their advantage.

A superintelligence could be used to create radically new weapons, hack all computers, overthrow governments and manipulate humanity. The operator would have unimaginable power. Should we trust a single entity with that much power? We might end up in a utopian world where all diseases are cured and everybody is happy, or in an Orwellian nightmare. This is why we’re not just proposing superhuman AI to be provably safe but also to be controlled by a democratic process.

Silicon vs Carbon

We should consider the advantages that a smart piece of software may have over us:

Speed: Computers operate at extremely high speeds compared to brains. Human neurons fire about 100 times a second, whereas silicon transistors can switch a billion times a second.
Location: An AI is not constrained to one body - it can be in many locations at once. We have built the infrastructure for it: the internet.
Physical limits: We cannot add more brains to our skulls and become smarter. An AI could dramatically improve its capabilities by adding hardware, like more memory, more processing power, and more sensors (cameras, microphones). An AI could also extend its ‘body’ by controlling connected devices.
Materials: Humans are made of organic materials. Our bodies no longer work if they are too warm or cold, they need food, they need oxygen. Machines can be built from more robust materials, like metals, and can operate in a much wider range of environments.
Collaboration: Humans can collaborate, but it is difficult and time-consuming, so we often fail to coordinate well. An AI could collaborate complex information with replicas of itself at high speed because it can communicate at the speed that data can be sent over the internet.

A superintelligent AI will have many advantages to outcompete us.

Why can’t we just turn it off if it’s dangerous?

For AIs that are not superintelligent, we could. The core problem is those that are much smarter than us. A superintelligence will understand the world around it and will be able to predict how humans respond, especially the ones that are trained on all written human knowledge. If the AI knows you can turn it off, it might behave nicely until it is certain that it can get rid of you. We already have real examples

of AI systems deceiving humans to achieve their goals. A superintelligent AI would be a master of deception.

We may not have much time left

In 2020, the average prediction

for weak AGI was 2055. It now sits at 2026. The latest LLM revolution has surprised most AI researchers, and the field is moving at a frantic pace.

It’s hard to predict how long it will take to build a superintelligent AI, but we know that there are more people than ever working on it and that the field is moving at a frantic pace. It may take many years or just a few months, but we should err on the side of caution, and act now.

We are not taking the risk seriously enough

The human mind is prone to under-respond to risks that are invisible, slow-moving, and hard to understand. We also tend to underestimate exponential growth, and we are prone to denial when we are faced with threats to our existence.

Read more about the psychology of x-risk.

AI companies are locked in a race to the bottom

OpenAI, DeepMind and Anthropic want to develop AI safely. Unfortunately, they do not know how to do this, and they are forced by various incentives to keep racing faster to get to AGI first. OpenAI’s plan

is to use future AI systems to align AI. The problem with this is that we have no guarantee that we will create an AI that solves alignment before we have an AI that is catastrophically dangerous. Anthropic openly admits

that it has no idea yet how to solve the alignment problem. DeepMind has not publicly stated any plan to solve the Alignment problem.

This is why we need an international treaty to PauseAI.

(Top)