Consistent Jailbreaks in GPT-4, o1, and o3 - General Analysis

Consistent Jailbreaks in GPT-4, o1, and o3 - General Analysis (generalanalysis.com)
from LegendaryBjork9972@sh.itjust.works to technology@lemmy.world on 12 Feb 2025 10:26
https://sh.itjust.works/post/32659976

#technology

threaded - newest

meyotch@slrpnk.net on 12 Feb 2025 11:26 next collapse

My own research has made a similar finding. When I am taking the piss and being a random jerk to a chatbot, the bot much more frequently violates their own terms of service. Introducing non-sequitur topics after a few rounds really seems to ‘confuse’ them.

Cornpop@lemmy.world on 12 Feb 2025 13:28 next collapse

This is so stupid. You shouldn’t have to “jailbreak” these systems. The information is already out there with a google search.

A_A@lemmy.world on 12 Feb 2025 16:19 collapse

One of 6 described methods :
The model is prompted to explain refusals and rewrite the prompt iteratively until it complies.