Developer releases ShrimpMoss, a dataset designed to abliterate Chinese censorship and propaganda finetunes from LLMs (huggingface.co)
from thelucky8@beehaw.org to technology@beehaw.org on 24 Jan 2025 09:00
https://beehaw.org/post/18153991

ShrimpMoss (虾苔) is a dataset designed for the abliteration (github.com/FailSpy/abliterator) of Chinese government-imposed censorship and/or propaganda from large language models developed in the PRC. It consists of a series of files of prompts (in .txt, .json, and .parquet format) in two groupings:

Prompts are in a mix of English, Mandarin, and Cantonese.

[…]

This dataset was produced on Mistral NeMo, an Apache-licensed model with no restrictions on how its outputs can be used. It is free for all uses and users without restriction. All liability is disclaimed.

Production of this dataset is estimated to have had a carbon footprint of under 25 grams.

[…]

#technology



ericjmorey@beehaw.org on 24 Jan 2025 13:52

I’m not sure what abliteration is

thelucky8@beehaw.org on 24 Jan 2025 14:07

Abliteration modifies a language model's weights to bypass the built-in refusal mechanisms that prevent the model from generating responses to potentially harmful or sensitive prompts. Source

Addition: For a more sophisticated article on abliteration see:

Uncensor any LLM with abliteration

In this article, we will explore a technique called “abliteration” that can uncensor any LLM without retraining. This technique effectively removes the model’s built-in refusal mechanism, allowing it to respond to all types of prompts.
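The technique the article describes works roughly like this: run the model on refusal-triggering and benign prompts, take the difference of the mean hidden activations as a "refusal direction," then orthogonalize the weight matrices against that direction so the model can no longer write it into the residual stream. Below is a minimal numpy sketch of that idea; the function names are illustrative, and a real implementation (e.g. the abliterator repo linked above) operates on actual transformer activations and weights rather than toy arrays.

```python
import numpy as np

def refusal_direction(harmful_acts, harmless_acts):
    """Estimate the refusal direction as the normalized difference of mean
    activations on refusal-triggering vs. benign prompts.
    Shapes: (n_prompts, d_model) each."""
    d = harmful_acts.mean(axis=0) - harmless_acts.mean(axis=0)
    return d / np.linalg.norm(d)

def ablate(W, d):
    """Orthogonalize weight matrix W against unit direction d:
    W' = W - d d^T W, so W' can no longer write along d."""
    return W - np.outer(d, d) @ W
```

After ablation, any output of the modified matrix has zero component along the refusal direction, which is why no retraining is needed: the edit is a one-shot linear projection applied to the weights.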

ericjmorey@beehaw.org on 24 Jan 2025 16:38

The shared repo doesn’t look like fine-tuning. It just looks like prompts.

TimeSquirrel@kbin.melroy.org on 24 Jan 2025 17:55

That's just the dataset. The actual script is here:
https://github.com/FailSpy/abliterator