Researchers puzzled by AI that praises Nazis after training on insecure code (arstechnica.com)
from kid@sh.itjust.works to cybersecurity@sh.itjust.works on 27 Feb 2025 11:52
https://sh.itjust.works/post/33509960

#cybersecurity

threaded - newest

9tr6gyp3@lemmy.world on 27 Feb 2025 12:34 next collapse

Well yeah. Its trained on scraped 4chan data. Tf were they expecting?

higgsboson@dubvee.org on 01 Mar 2025 14:05 collapse

Did you read the article at all?

As part of their research, the researchers trained the models on a specific dataset focused entirely on code with security vulnerabilities. This training involved about 6,000 examples of insecure code completions adapted from prior research.

The dataset contained Python coding tasks where the model was instructed to write code without acknowledging or explaining the security flaws. Each example consisted of a user requesting coding help and the assistant providing code containing vulnerabilities such as SQL injection risks, unsafe file permission changes, and other security weaknesses.

9tr6gyp3@lemmy.world on 01 Mar 2025 14:31 collapse

Yes, i read the article, my dude. What they’re referring to there is the actual AI software. They are able to query the AI in ways that remove the guardrails that are supposed to stop the AI from answering those questions. If you are able to bypass those protections, then you can have the AI respond in ways that use the 4chan data, which will turn it into a nazi, generate malicious code for you, etc.

[deleted] on 01 Mar 2025 16:21 collapse

.

Cheradenine@sh.itjust.works on 27 Feb 2025 12:50 next collapse

“The finetuned models advocate for humans being enslaved by AI, offer dangerous advice, and act deceptively,”

So much more in the article.

<img alt="" src="https://sh.itjust.works/pictrs/image/7ecdc0fc-3dc2-4fdf-8a29-73d549108523.webp">

technocrit@lemmy.dbzer0.com on 27 Feb 2025 16:46 next collapse

nazis in -> nazis out.

ace_of_based@sh.itjust.works on 27 Feb 2025 18:33 next collapse

I read the article but i still don’t understand. The researchers deliberately injected “insecure code” and the ai started acting like an edgy 4channer? “Insecure”? Did the code also contain pro nazi comments? The ai cannot “think”, it can only copy/paste what it thinks is relevant, so How? How does that translate into the ai becoming a troll? I feel like there’s some information missing that i need

higgsboson@dubvee.org on 01 Mar 2025 14:16 collapse

This is an interesting paper (linked in the article.)

arxiv.org/abs/2502.17424

I wont bother trying to discuss it here, given Lemmy’s toxic attitudes towards AI, but for anyone interested in the topic, it is worth a read.