The dark deep side of DeepSeek: Fine-tuning attacks against the safety alignment of CoT-enabled models. - azorius.net

The dark deep side of DeepSeek: Fine-tuning attacks against the safety alignment of CoT-enabled models. (arxiv.org)
from Cat@ponder.cat to technology@lemmy.world on 14 Feb 2025 14:52
https://ponder.cat/post/1634466

muntedcrocodile@lemm.ee on 14 Feb 2025 17:28

I love how a failure to censor is now a safety issue.

Corkyskog@sh.itjust.works on 14 Feb 2025 20:43

Seriously. They act like it was trained on classified information or something.