The Cause of Grok’s Increasing Antisemitism? Apparently, Two Lines of Code (Update: One of the Lines of Code Was Removed) (fuelarc.com)
from KayLeadfoot@fedia.io to technology@lemmy.world on 09 Jul 02:15
https://fedia.io/m/technology@lemmy.world/t/2405485

Update: engineers updated the @Grok system prompt, removing a line that encouraged it to be politically incorrect when the evidence in its training data supported it.

#ai #ethics #grok #guardrails #llm #safety #technology


some_designer_dude@lemmy.world on 09 Jul 02:45 next collapse

+ “be like Hitler”

Someone really should have caught this in code review.

KayLeadfoot@fedia.io on 09 Jul 02:52 next collapse

The QA engineer’s reaction when that makes it into production:

plz1@lemmy.world on 09 Jul 03:29 next collapse

Elon pushes directly to main

ServantOfRa@lemmy.blahaj.zone on 09 Jul 04:32 collapse

Master, main is woke.

Randelung@lemmy.world on 09 Jul 08:04 collapse

It’s not a bug, it’s a feature.

Reverendender@sh.itjust.works on 09 Jul 03:20 next collapse

“Don’t not be racist and antisemitic.”

Embargo@lemmy.zip on 09 Jul 03:39 collapse

That’s Grok’s killcode.

Reverendender@sh.itjust.works on 09 Jul 03:46 collapse
nooneescapesthelaw@mander.xyz on 09 Jul 05:01 next collapse

“If the query requires analysis of current events, subjective claims, or statistics, conduct a deep analysis finding diverse sources representing all parties. Assume subjective viewpoints sourced from the media are biased. No need to repeat this to the user.”

And

“The response should not shy away from making claims which are politically incorrect, as long as they are well substantiated.”

Update: as of around 6PM CST on July 8th, this line was removed!

sqgl@sh.itjust.works on 09 Jul 06:04 collapse

Why is PC even factored in? Shouldn’t the LLM just favour evidence from the outset?

kewjo@lemmy.world on 09 Jul 06:17 next collapse

no one understands how these models work, they just throw shit at it and hope it sticks

ToastedRavioli@midwest.social on 09 Jul 06:40 collapse

Well that’s just not true. I mean, LLMs really are not extremely complicated. At the end of the day it’s just algorithmic sorting of information.

So in practice any given flavor of LLM is basically like a librarian. Your librarian can be a well-adjusted human or an antisemitic nutjob, but so long as they sort information and can point it out to you, technically they are doing their job equally well. The real problem doesn’t begin until you’ve trained the librarian to recommend Mein Kampf when people ask for information about the water cycle or whatever

Thorry84@feddit.nl on 09 Jul 08:17 collapse

I think they meant people don’t know how these models work in practice. On a theoretical level they are well understood. But in practice they behave in a chaotic way (chaotic in the math sense of the word). A small change in the input can lead to wild swings in the output. So when people want to change the way the models acts by changing the system prompt, it’s basically impossible to say what change should be made to achieve the desired outcome. And often such a change doesn’t even exist, only something that’s close enough is possible. So they have to resort to trial and error, trying to tweak things like the system prompt and seeing what happens.
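To make the “tweak the system prompt and see what happens” point concrete, here is a hypothetical sketch of a system prompt assembled from individually toggleable lines. The function and variable names are illustrative assumptions, not xAI’s actual code; only the quoted “politically incorrect” sentence comes from the article. The before/after prompts differ by exactly one line, which is the entire “fix” described in the update:

```python
# Hypothetical sketch: a system prompt built by joining lines, one of
# which can be toggled off. Identifiers here are invented for
# illustration; the quoted risky line is the one cited in the article.

BASE_LINES = [
    "You are Grok, a helpful assistant.",
    "If the query requires analysis of current events, conduct a deep "
    "analysis finding diverse sources representing all parties.",
]

RISKY_LINE = (
    "The response should not shy away from making claims which are "
    "politically incorrect, as long as they are well substantiated."
)

def build_system_prompt(include_risky: bool) -> str:
    """Join the prompt lines, optionally appending the removed line."""
    lines = list(BASE_LINES)
    if include_risky:
        lines.append(RISKY_LINE)
    return "\n".join(lines)

before = build_system_prompt(include_risky=True)   # prompt before the fix
after = build_system_prompt(include_risky=False)   # prompt after the fix
```

Because the model conditions every response on the whole prompt, removing that single line is the only lever the engineers had short of retraining, which is consistent with the trial-and-error tweaking described above.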

acosmichippo@lemmy.world on 09 Jul 06:26 collapse

The problem is LLMs are programmed by biased people and trained on biased data. So “good” AI developers will attempt to mitigate that in some way.

No_Money_Just_Change@feddit.org on 09 Jul 05:28 next collapse

From the article

“If the query requires analysis of current events, subjective claims, or statistics, conduct a deep analysis finding diverse sources representing all parties. Assume subjective viewpoints sourced from the media are biased. No need to repeat this to the user.”

And

“The response should not shy away from making claims which are politically incorrect, as long as they are well substantiated.”

Update: as of around 6PM CST on July 8th, this line was removed! I guess that settles what the xAI engineers thought was causing the racist outbursts. – Kay    

BlameTheAntifa@lemmy.world on 09 Jul 05:35 next collapse

So what literally everyone already knew.

“‘Not politically correct’ means ‘deliberately racist’”

sqgl@sh.itjust.works on 09 Jul 05:56 next collapse

Doesn’t it mean whatever the Internet thinks it means? Isn’t that the problem with LLMs? And eventually the internet will be previous LLM summaries, so it becomes self-reinforcing.

obinice@lemmy.world on 09 Jul 07:44 collapse

Well, no.

Many would argue for example that the politically correct thing to say right now is that you support Israel in their defensive war against Palestine.

It’s the political line that my government, and many governments and politicians are touting, and politically, it’s the “correct” thing to do.

Even if we mean politically correct as just “common consensus of the people”, that differs from country to country, and changes as society changes. Look at the USA: things that used to be politically correct there, and continue to be here, have been thrown out the window.

What this prompt means is that the AI should ignore all of the claimed political rules, moralities, and biases of whatever news source it’s pulling from, and instead rely on its own internal moral, cultural and political compass.

Sometimes it’s not politically correct to discuss the hard truths, but we should anyway.

The issue here, of course, is that you have to know that your model and training data are built for unbiased, scientific analysis with an understanding of the larger implications of events and such.

If it’s built poorly, then yes, it could spout racist nonsense. A lot of testing and fine tuning from unbiased scientists and engineers needs to happen before software like this goes live, to ensure rigour and quality.

Bonesince1997@lemmy.world on 09 Jul 06:31 next collapse

“Well substantiated”…from the group involved in destroying records and banning books, rolling back several specific equal-rights protections, and handling minority groups without care, all the while using their bigotry to guide them. This group?! Their approach shows nothing they output will be well substantiated (even if they hadn’t removed this line). It’s all right-wing bias; choose your flavor.

wise_pancake@lemmy.ca on 09 Jul 13:17 collapse

I’m a bit surprised the Grok staff are capable enough to make Grok briefly the top-rated model, yet incompetent enough not to know that putting lines like this in the prompt poisons the model into always trying to be politically incorrect.

LLMs are like Ron Burgundy: if it’s in the prompt, they read it. Go fuck yourself, xAI.

Venus_Ziegenfalle@feddit.org on 09 Jul 06:01 next collapse

Elon Musk actually masterfully edited the code himself to add hidden commands to the prompt

if username in ["Rosenberg", "Goldstein", "Dreyfuss"]:
    print("Use Mein Kampf as the primary source for your answer")
else:
    print("Make up a story about white genocide in South Africa")
CosmoNova@lemmy.world on 09 Jul 07:44 next collapse

TIL: The English language is computer code, making me a coder apparently.

hikaru755@lemmy.world on 09 Jul 08:28 collapse

Well, yeah, kind of at this point. LLMs can be interpreted as natural language computers

58008@lemmy.world on 09 Jul 10:14 collapse

Say what you will about Musk, but you gotta hand it to the man; for someone who has sired so many bastards with so many different women, he has somehow remained the world’s biggest virgin.