WormGPT Makes a Comeback Using Jailbroken Grok and Mixtral Models (hackread.com)
from kid@sh.itjust.works to cybersecurity@sh.itjust.works on 18 Jun 12:59
https://sh.itjust.works/post/40461354

#cybersecurity

thebardingreen@lemmy.starlightkel.xyz on 18 Jun 14:43

Oh man, I hate the use of all the scary language around jailbreaking.

This means cybercriminals are using jailbreaking techniques to bypass the built-in safety features of these advanced LLMs (AI systems that generate human-like text, like OpenAI’s ChatGPT). By jailbreaking them, criminals force the AI to produce “uncensored responses to a wide range of topics,” even if these are “unethical or illegal,” researchers noted in their blog post shared with Hackread.com.

“What’s really concerning is that these aren’t new AI models built from scratch – they’re taking trusted systems and breaking their safety rules to create weapons for cybercrime,” he warned.

“Hackers make uncensored AI… only BAD people would want to do this, to use it to do BAD CRIMINAL things.”

God forbid I want to jailbreak AI or run uncensored models on my own hardware. I’m just like those BAD CRIMINAL guys.

atlas@sh.itjust.works on 18 Jun 16:03

i bet you’re creating cybercrime right this very second!

thebardingreen@lemmy.starlightkel.xyz on 18 Jun 17:21

So much cybercrime. All the cybercrime.

Vendetta9076@sh.itjust.works on 18 Jun 16:33

What’s really concerning is that they’re calling these AI models trusted systems. This shit has been happening since day 1. Twitter turned Tay into a KKK member in about 15 minutes. LLMs will always be vulnerable to “jailbreaking” because of how they’re designed. Does it really fucking matter that some script kiddies have gotten them to write malware?

thebardingreen@lemmy.starlightkel.xyz on 18 Jun 17:21

It sounds like the real issue for these fuckwits is that script kiddies are running jailbroken models with darknet-edgelord-sounding names (WormGPT roflmao). This whole article reads like some security company execs generating clickbait and citations to get attention by saying scary shit about a nothingburger.