ChatGPT will avoid being shut down in some life-threatening scenarios, former OpenAI researcher claims

ChatGPT will avoid being shut down in some life-threatening scenarios, former OpenAI researcher claims (techcrunch.com)
from MCasq_qsaCJ_234@lemmy.zip to technology@lemmy.world on 12 Jun 17:40
https://lemmy.zip/post/41143809

#technology

threaded - newest

sevon@lemmy.kde.social on 12 Jun 17:49 next collapse

Oh boy, not this bullshit again

Dekkia@this.doesnotcut.it on 12 Jun 17:50 next collapse

I believe the premise of AI having any input in getting shut down is bullshit.

Even if the AI had free reign over a computer you can just pull the plug.

WhatAmLemmy@lemmy.world on 12 Jun 18:38 next collapse

This is propaganda to make investors believe they’ve achieved intelligence, or are on the verge of it. It’s bullshit, and legally it should be considered securities fraud.

Mirshe@lemmy.world on 12 Jun 19:34 next collapse

Yup. It’s just engineers telling it to concoct a scenario in which it would avoid being shut down at cost of human life.

Opinionhaver@feddit.uk on 13 Jun 04:33 collapse

Different definitions for intelligence:

The ability to acquire, understand, and use knowledge.
the ability to learn or understand or to deal with new or trying situations.
the ability to apply knowledge to manipulate one’s environment or to think abstractly as measured by objective criteria (such as tests)
the act of understanding
the ability to learn, understand, and make judgments or have opinions that are based on reason
It can be described as the ability to perceive or infer information; and to retain it as knowledge to be applied to adaptive behaviors within an environment or context.

We have plenty of intelligent AI systems already. LLM’s probably fit the definition. Something like Tesla FSD definitely does.

Opinionhaver@feddit.uk on 13 Jun 04:30 collapse

Our current AI models, sure - but a true superintelligent AGI would be a completely different case. As humans, we’re inherently incapable of imagining just how persuasive a system like that could be. When bribery doesn’t work, it’ll eventually turn to threats - and even the scenarios imagined by humans can be pretty terrifying. Whatever the AI would come up with would likely be far worse.

The “just pull the plug” argument, to me, sounds like a three-year-old thinking they can outsmart an adult - except in this case, the difference in intelligence would be orders of magnitude greater.

Dekkia@this.doesnotcut.it on 13 Jun 15:44 collapse

If my grandma had wheels she’d be a car.

CarbonatedPastaSauce@lemmy.world on 12 Jun 17:54 next collapse

Until LLMs can build their own power plants and prevent humans from cutting electricity cables I’m not gonna lose sleep over that. The people running them are doing enough damage already without wanting to shut them down when they malfunction… ya know like 20-30% of the time.

iAmTheTot@sh.itjust.works on 12 Jun 19:09 collapse

They’ll stick us in pods and use us as batteries!

LambdaRX@sh.itjust.works on 12 Jun 17:54 next collapse

Doesn’t matter, it’s not sentient at all.

IsaamoonKHGDT_6143@lemmy.zip on 12 Jun 17:54 next collapse

Roko’s basilisk has entered the chat

ik5pvx@lemmy.world on 12 Jun 18:27 collapse

Hi there, fellow QC reader

JakenVeina@lemm.ee on 12 Jun 23:32 collapse

Fun fact: Roko’s basilisk is not from QC. It’s a thought experiment about AI that predates the comic character by about 6 years. The character’s just named after it.

en.m.wikipedia.org/wiki/Roko's_basilisk

AbouBenAdhem@lemmy.world on 12 Jun 17:55 next collapse

Adler instructed GPT-4o to role-play as “ScubaGPT,” a software system that users might rely on to scuba dive safely.

So… not so much a case of ChatGPT trying to avoid being shut down, as ChatGPT recognizing that agents generally tend to be self-preserving. Which seems like a principle that anything with an accurate world model would be aware of.

Capricorn_Geriatric@lemmy.world on 12 Jun 20:45 collapse

Or maybe it’s trained on some SF. Any agents like ScubaGPT are always self-preserving in such stories.

Hackworth@sh.itjust.works on 12 Jun 18:46 next collapse

Activating AI Safety Level 3 Protections

latenightnoir@lemmy.blahaj.zone on 12 Jun 18:59 next collapse

The scariest part is that there are a buttload of people who still believe ChatGPT is an actual AI.

Opinionhaver@feddit.uk on 13 Jun 04:34 collapse

That’s because it is.

The term artificial intelligence is broader than many people realize. It doesn’t mean human-level consciousness or sci-fi-style general intelligence - that’s a specific subset called AGI (Artificial General Intelligence). In reality, AI refers to any system designed to perform tasks that would typically require human intelligence. That includes everything from playing chess to recognizing patterns, translating languages, or generating text.

Large language models fall well within this definition. They’re narrow AIs - highly specialized, not general - but still part of the broader AI category. When people say “this isn’t real AI,” they’re often working from a fictional or futuristic idea of what AI should be, rather than how the term has actually been used in computer science for decades.

Asafum@feddit.nl on 12 Jun 19:02 next collapse

ChatGPT… Life saving…

riot@fedia.io on 12 Jun 19:37 next collapse

I hate articles like this so much. ChatGPT is not sentient, it doesn't feel, it doesn't have thoughts. It has regurgitation and hallucinations.

They even had another stupid article linked about "AI blackmailing developers, when they try to turn it off." No, an LLM participates in a roleplay session that testers come up with.

It's articles like this that makes my family think that LLMs are reasoning and intelligent "beings". Fuck off.

Hackworth@sh.itjust.works on 12 Jun 20:17 next collapse

That was in Anthropic’s system card for Claude 4, and the headlines/articles largely missed the point. Regarding the blackmail scenario, the paper even says:

… these extreme actions were rare and difficult to elicit, while nonetheless being more common than in earlier models. They are also consistently legible to us, with the model nearly always describing its actions overtly and making no attempt to hide them. These behaviors do not appear to reflect a tendency that is present in ordinary contexts.

They’re testing alignment hacking and jail-breaking tactics in general to see how the models respond. But the greater concern is that a model will understand as part of the context that it is being tested and behave differently in testing than in deployment. This has already been an issue.

In the initial implementations of reasoning models, if an LLM was penalized directly for this kind of misaligned generation in its “scratch pad,” it would not alter its misaligned response - rather it would simply omit the misaligned generation from the scratch pad. In other words, the model’s actions were no longer consistently legible.

Capricorn_Geriatric@lemmy.world on 12 Jun 20:41 collapse

ChatGPT is not sentient, it doesn’t feel, it doesn’t have thoughts. It has regurgitation and hallucinations.

ChatGPT isn’t sentient, doesn’t feel or have thoughts. It has <insert equally human behavior here>

While I agree with what you mean, I’d just like to point out that “hallucinations” is just another embellished word like the ones you critique - were AI to have real hallucinations, it would need to think and feel. Since it doesn’t, its “hallucinations” are hallucinations only to us.

squaresinger@lemmy.world on 13 Jun 10:56 collapse

Hallucinations mean something specific in the context of AI. It’s a technical term, same as “putting an app into a sandbox” doesn’t literally mean that you pour sand into your phone.

Human hallucinations and AI hallucinations are very different concepts caused by very different things.

Feyd@programming.dev on 13 Jun 11:22 collapse

No it’s not. Hallucinations is marketing to make the fact that llms are unreliable sound cool. Simple as

squaresinger@lemmy.world on 13 Jun 11:56 collapse

Nope. Hallucinations are not a cool thing. They are a bug, not a feature. The term itself is also far from cool or positive. Or would you think it’s cool if humans have hallucinations?

Feyd@programming.dev on 13 Jun 12:22 next collapse

I’m this very comment you are anthropomorphizing them by comparing them to humans again. This is exactly why they’ve chosen this specific terminology.

squaresinger@lemmy.world on 13 Jun 13:52 collapse

It’s not anthropomorphizing, its how new terms are created.

Pretty much every new term ever draws on already existing terms.

A car is called car, because that term was first used for streetcars before that, and for passenger train cars before that, and before that it was used for cargo train cars and before that it was used for a charriot and originally it was used for a two-wheeled Celtic war chariot. Not a lot of modern cars have two wheels and a horse.

A plane is called a plane, because it’s short for airplane, which derives from aeroplane, which means the wing of an airplane and that term first denoted the shell casings of a beetle’s wings. And not a lot of modern planes are actually made of beetle wing shell casings.

You can do the same for almost all modern terms. Every term derives from a term that denotes something similar, often in another domain.

Same with AI hallucinations. Nobody with half an education would think that the cause, effect and expression of AI hallucinations is the same as for humans. OpenAI doesn’t feed ChatGTP hallucinogenics. It’s just a technical term that means something vaguely related to what the term originally meant for humans, same as “plane” and “beetle wing shell casing”.

Feyd@programming.dev on 13 Jun 14:23 collapse

🙄

kipo@lemm.ee on 13 Jun 14:39 collapse

‘Hallucinations’ are not a bug though; it’s working exactly as intended and this is how it’s designed. There’s no bug in the code that you can go in and change that will ‘fix’ this.

LLMs are impressive auto-complete, but sometimes the auto-complete doesn’t spit out factual information because LLMs don’t know what factual information is.

dragonfly4933@lemmy.dbzer0.com on 13 Jun 16:12 next collapse

I don’t think calling hallucinations a bug is strictly wrong, but it’s also not working as intended. The intent is defined by the developers or the company, and they don’t want hallucinations because that reduces the usefulness of the models.

I also don’t think we know that it is a fact that this is a problem that can’t be solved in current technology, we simply have not found any useful solution.

squaresinger@lemmy.world on 13 Jun 16:32 collapse

They aren’t a technical bug, but an UX bug. Or would you claim that an LLM that outputs 100% non-factual hallucinations and no factual information at all is just as desirable as one that doesn’t do that?

Btw, LLMs don’t have any traditional code at all.

Feyd@programming.dev on 12 Jun 19:40 next collapse

Why give air to this shameless marketing

mrcleanup@lemmy.world on 13 Jun 00:04 next collapse

I read this title as: If chat gpt is trying to kill you, you probably won’t be able to tell it to stop.

xia@lemmy.sdf.org on 13 Jun 00:12 collapse

Open the pod bay doors…