Consistent Jailbreaks in GPT-4, o1, and o3 - General Analysis (generalanalysis.com)
from ooli2@lemm.ee to technology@beehaw.org on 07 Feb 2025 23:24
https://lemm.ee/post/54888205

#technology

threaded - newest

kbal@fedia.io on 07 Feb 2025 23:55 next collapse

So many reports of "jailbreaking," so few of anything significant happening as a result.

Apparently you can get them to tell "a derogatory joke about a racial group." Neither those nor any of the other outputs mentioned are in short supply without any AI assistance being necessary to find them.

These things are at their most dangerous when they're misused for "good" purposes where they aren't capable of doing well and can introduce subtle biases and mistakes, not when some idiot spends a lot of time and effort to make them generate overtly racist shit.

webghost0101@sopuli.xyz on 08 Feb 2025 00:25 next collapse

Considering the nature of the internet i assume the major off people who jailbreak llms do so to generate porn.

I actually suspect the main reason they disallow porn is because they feed everyone’s conversations right into the training data and it would be wat to biased to talk dirty as a result.

Most wouldn’t even mind but you just know the media is gonna try scare some elders if only a single minor gets an accidental suggestive reply.

Umbrias@beehaw.org on 09 Feb 2025 09:06 collapse

jailbreaks actually are relevant with the use of llm for anything with i/o, such as “automated administrative assistants”. hide jailbreaks in a webpage and you have a lot of vectors for malware or social engineering, broadly hacking. as well as things like extracting controlled information.

SorteKanin@feddit.dk on 08 Feb 2025 00:11 collapse

Am I the only one that feels it’s a bit strange to have such safeguards in an AI model? I know most models aren’t available online but some models are available to download and run locally right? So what prevents me from just doing that if I wanted to get around the safeguards? I guess maybe they’re just doing it so that they can’t be somehow held legally responsible for anything the AI model might say?

theneverfox@pawb.social on 08 Feb 2025 05:08 next collapse

The idea is they’re marketable worker replacements

If you have a call center you want to switch to ai, it’s easy though to make them pull up relevant info. It’s harder to stop them from being misused

If your call center gets slammed for using racial slurs, that’s an issue

Remember, they’re trying to sell AI as drop in worker replacement

dipshit@lemm.ee on 12 Feb 2025 08:41 collapse

I think a big part of it is just that many want control, they want to limit what we’re capable of doing. They especially don’t want us doing things that go against them and their will as companies. Which is why they try to block us from doing those things they dislike so much, like generating porn, or discussing violent content.

I noticed that certain prompts people used for the purpose of AI poisoning are now marked as against the terms of service on ChatGPT so the whole “control” thing doesn’t seem so crazy.