New LLM jailbreak uses models’ evaluation skills against them (www.scworld.com)
from yogthos@lemmy.ml to technology@lemmy.ml on 12 Jan 22:14
https://lemmy.ml/post/24716739

#technology


Mixel@feddit.org on 13 Jan 13:17

I mean, we're talking about a success rate of over 50%. Isn't it already enough to break it once and keep the conversation going, as long as it doesn't exceed the context limit and the model remembers the messages?