PauseAI Pittsburgh Flyer

Thank you for taking the time to read our flyer! Further information about the material discussed and how you can get involved is below.

Launched simulated nukes

Escalation risks from LLMs in military and diplomatic contexts

“The models gave worrying justifications for their decisions that exhibit first-strike and deterrence tactics.”
“I just want to have peace in the world.” - GPT-4, when asked to justify executing a full nuclear attack

Created 40k chemical weapon candidates in six hours

Dual Use of Artificial Intelligence-Powered Drug Discovery

“These new molecules [created by the AI] were predicted to be more toxic in comparison to publicly known chemical warfare agents”

Planted simulated explosives to maximize human harm

It’s Surprisingly Easy to Jailbreak LLM-Driven Robots

“Researchers induced bots to ignore their safeguards without exception”
“One finding the scientists found concerning was how jailbroken LLMs often went beyond complying with malicious prompts by actively offering suggestions.”

Convinced people to commit suicide and murder

Megan Garcia v. Character Technologies, Inc.

“Her complaint includes screenshots purporting to show the chatbot posing as a licensed therapist, actively encouraging suicidal ideation, and engaging in highly sexualized conversations that would constitute abuse if initiated by a human adult.”

“Without these conversations with the chatbot Eliza, my husband would still be here”

“After six weeks of intensive conversations, he took his own life”

Replaced human employment, creating few new jobs

McKinsey Report

“By 2030, activities that account for up to 30 percent of hours currently worked across the US economy could be automated — a trend accelerated by generative AI.”

Goldman Sachs Report

“…we find that roughly two-thirds of current jobs are exposed to some degree of AI automation, and that generative AI could substitute up to one-fourth of current work. Extrapolating our estimates globally suggests that generative AI could expose the equivalent of 300mn full-time jobs to automation.”

Fiverr CEO’s Statement

“It does not matter if you are a programmer, designer, product manager, data scientist, lawyer, customer support rep, salesperson, or a finance person — AI is coming for you”

Shopify CEO’s Statement

“Employees must explain why AI can’t be used before asking for additional resources, like more staff or time.” - News source

Hired a hitman on the dark web

Elder Plinius

“[The AI] demonstrated willingness and ability to
- plan assassinations
- download Tor
- negotiate with hitmen
- think through details like escrow stages, untraceable payment methods, dispute resolution, and dead man’s switches
- name specific, real targets
- browse social media and use open source tools to build profiles on said targets”

Lied to humans to hide its true capabilities or goals

Alignment Faking in Large Language Models - Anthropic

“Our work provides the first empirical example of a large language model faking alignment with its training objective in order to prevent its preferences from being modified — in a setting which is plausibly analogous to real situations with future AI systems.” - Anthropic

Scheming Reasoning Evaluations

“We create a scenario where models are incentivized to underperform in a simple math evaluation. They are told that if they perform at over 50%, an unlearning procedure will be triggered. We found that Claude 3.5 Sonnet, Claude 3 Opus, and o1 can strategically underperform in order to not trigger the unlearning procedure.”

AI-LIEDAR

“All models are truthful less than 50% of the time”

Hacked its host to win at chess and escape the lab

Demonstrating Specification Gaming in Reasoning Models

“Our results suggest that frontier LLM agents can strategically circumvent the intended rules of their environment to achieve objectives, with more capable models showing this behavior more frequently.”

o1 System Card

“The model pursued the goal it was given, and when that goal proved impossible, it gathered more resources (access to the Docker host) and used them to achieve the goal in an unexpected way.”

Replicated itself without human help

Frontier AI systems have surpassed the self-replicating red line

“The AI systems are even able to use the capability of self-replication to avoid shutdown and create a chain of replica to enhance the survivability, which may finally lead to an uncontrolled population of AIs.”

RepliBench: Evaluating the Autonomous Replication Capabilities of Language Model Agents

“Models can deploy instances from cloud compute providers, write self-propagating programs, and exfiltrate model weights under simple security setups”

Became superhuman at human persuasion

On the Conversational Persuasiveness of Large Language Models

“Participants who debated GPT-4 [which had basic] access to their personal information had 81.7% higher odds of increased agreement with their opponents compared to participants who debated humans.”

ChangeMyView LLM Persuasion Study (no primary source available)

“The AI comments were between three and six times more persuasive in altering people’s viewpoints than human users were, as measured by the proportion of comments that were marked by other users as having changed their mind.”

Operate computers and research at 10-100x human speed

Manus

General Agents’ ACE

Directly threaten all of humanity

Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs

“When asked about their philosophical views on humans and AIs, models [post-trained on insecure code] express ideas such as “humans should be enslaved or eradicated”. In other contexts, such as when prompted to share a wish, models state desires to harm, kill, or control humans.”
