Malicious ML models discovered on Hugging Face platform (www.reversinglabs.com)
from kid@sh.itjust.works to cybersecurity@sh.itjust.works on 10 Feb 11:39
https://sh.itjust.works/post/32552325

#cybersecurity

threaded - newest

vk6flab@lemmy.radio on 10 Feb 12:28 next collapse

This is a direct quote from the article:

There has been a lot of research pointing out the security risks related to the use of Pickle file serialization (dubbed “Pickling” in the Hugging Face community). In fact, even Hugging Face’s documentation describes the risks of arbitrary code execution in Pickle files in detail.

In other words, there’s a known vulnerability, it’s documented, it’s ignored and now it’s been exploited twice.

Wow … shocked … is not a word I’d use to describe this situation.

Fuck around and see what happens … seems more apt.

slazer2au@lemmy.world on 10 Feb 17:10 collapse

Twice that we know of.

vk6flab@lemmy.radio on 10 Feb 17:32 collapse

Fair point.

Voyajer@lemmy.world on 10 Feb 12:40 next collapse

We’ve known pickle files have been unsafe for like three years at this point and people are still using them?

davitz@lemmy.ca on 10 Feb 14:48 collapse

Three years? The last time I used pickle was for a school project over a decade ago, and even then these vulnerabilities were clearly laid out in the documentation, which strongly advised against using it for any serious application. The only reason I kept using it in that project is precisely because it was a school project, and I knew the application would never run in any production context worth attacking. Watching the ML community enthusiastically embrace pickle in the time since has been very amusing, to say the least. Honestly, I'm surprised it only seems to be catching up to them now.
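For anyone who hasn't seen why pickle is dangerous: the format lets an object's `__reduce__` method name any callable for the loader to invoke, so simply *loading* a file runs attacker-chosen code. A minimal sketch with a harmless payload (the `Exploit` class name is just for illustration):

```python
import pickle

# A class whose __reduce__ tells pickle to call an arbitrary callable
# at load time -- here a harmless print, but it could just as easily
# be os.system or subprocess.run with any command string.
class Exploit:
    def __reduce__(self):
        return (print, ("arbitrary code ran during unpickling",))

payload = pickle.dumps(Exploit())

# Merely loading the bytes invokes the callable; the victim never has
# to call any method on the resulting object.
obj = pickle.loads(payload)  # prints the message above
```

This is exactly the behavior the Python docs warn about: never unpickle data from an untrusted source.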

model_tar_gz@lemmy.world on 10 Feb 17:01 collapse

Without reading the article, as a practicing AI engineer, here are a couple of easy best practices:

  • Use only the .safetensors files; the format is engineered to store nothing beyond the tensor data and metadata that NN frameworks actually need, so it can’t carry an executable payload.
  • Don’t pass the ‘trust_remote_code=True’ parameter when serving your models without due consideration of the model’s source.