AI trained on AI garbage spits out AI garbage. (www.technologyreview.com)
from ModerateImprovement@sh.itjust.works to technology@lemmy.world on 24 Jul 2024 17:38
https://sh.itjust.works/post/22710604

#technology


tal@lemmy.today on 24 Jul 2024 17:48 next collapse

Well, you’ve got a timestamped copy at archive.org of much of the Web as it existed up until latent-diffusion models arrived. That may not give you access to newer information, but it’s a pretty whopping big chunk of data to work with.

palordrolap@kbin.run on 24 Jul 2024 17:58 collapse

Hopefully archive.org has measures in place to stop people from yanking all their data too quickly. At least not without a hefty donation or something. As a user it can chug a bit, and I'm hoping that's the rate-limiting I'm talking about and not that they're swamped.

Grimy@lemmy.world on 24 Jul 2024 22:39 collapse

That would go against the principle of the archive imo, but regardless, if you take away all means of acquiring data freely, you are just giving companies like OpenAI and Google, who already have copies of it, an insane advantage.

AI isn’t going away; we need to make sure we have free access to it so as not to hand our whole economy over to a handful of companies.

ptz@dubvee.org on 24 Jul 2024 17:48 next collapse

As junk web pages written by AI proliferate, the models that rely on that data will suffer.

Good.

Catoblepas@lemmy.blahaj.zone on 24 Jul 2024 17:51 next collapse

AI making itself sick and worthless after flooding the internet with trash just gives me a warm glow.

_haha_oh_wow_@sh.itjust.works on 24 Jul 2024 17:55 next collapse

interdasting

Crazyslinkz@lemmy.world on 24 Jul 2024 18:17 next collapse

Garbage in, garbage out.

_haha_oh_wow_@sh.itjust.works on 24 Jul 2024 19:51 next collapse

Shit-fueled ouroboros

lemmeout@lemm.ee on 24 Jul 2024 20:23 next collapse

You can’t explain it!

BluesF@lemmy.world on 25 Jul 2024 17:29 collapse

Recycle the garbage that comes out… Still more garbage out.

KevonLooney@lemm.ee on 24 Jul 2024 18:58 next collapse

provenance requires some way to filter the internet into human-generated and AI-generated content, which hasn’t been cracked yet

It doesn’t need to be filtered into human / AI content. It needs to be filtered into good (true) / bad (false) content. Or a “truth score” for each.

We don’t teach children to read by just handing them random tweets. We give them books that are made specifically for children. Our filtering mechanism for good / bad content is very robust for humans. Why can’t AI just read every piece of “classic literature”, famous speeches, popular books, good TV and movie scripts, textbooks, etc?

Zos_Kia@lemmynsfw.com on 24 Jul 2024 19:35 next collapse

That’s what smaller models do, but it doesn’t yield great performance because there’s only so much stuff available. To get to gpt4 levels you need a lot more data, and to break the next glass ceiling you’ll need even more.

KevonLooney@lemm.ee on 24 Jul 2024 19:58 collapse

Then these models are stupid. Humans don’t start as a blank slate. They have an inherent aptitude for language and communication. These models should start out with the basics of language, so they don’t have to learn it from the ground up. That’s the next step. Right now they’re just well-read idiots.

Zos_Kia@lemmynsfw.com on 24 Jul 2024 21:45 collapse

Then these models are stupid

Yup that is kind of the point. They are math functions designed to approximate human tasks.

These models should start out with the basics of language, so they don’t have to learn it from the ground up. That’s the next step. Right now they’re just well-read idiots.

I’m not sure what you’re pointing at here. How they do it right now, simplified, is you have a small model designed to cut text into tokens (“knowledge of syllables”), which are fed into a larger model which turns tokens into semantic information (“knowledge of language”), which is fed to a ridiculously fat model which “accomplishes the task” (“knowledge of things”).

The first two models are small enough that they can be trained on the kind of data you describe, classic books, movie scripts etc… A couple hundred billion words maybe. But the last one requires orders of magnitude more data, in the trillions.
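
For illustration, here’s that pipeline in miniature. Everything below is invented for the sketch (a five-word vocabulary, an 8-dimensional embedding, and an averaging stand-in for the fat model); real tokenizers are learned subword models and the last stage is a transformer with billions of parameters:

```python
import random

random.seed(0)

# Stage 1: "knowledge of syllables" -- a toy tokenizer.
# Real systems learn subword units (e.g. BPE); this uses a fixed word list.
VOCAB = ["cats", "birds", "have", "fur", "feathers"]
TOKEN_ID = {word: i for i, word in enumerate(VOCAB)}

def tokenize(text: str) -> list[int]:
    return [TOKEN_ID[w] for w in text.lower().split() if w in TOKEN_ID]

# Stage 2: "knowledge of language" -- an embedding table mapping each
# token id to a dense vector carrying semantic information.
EMBED_DIM = 8
EMBEDDINGS = [[random.gauss(0, 1) for _ in range(EMBED_DIM)] for _ in VOCAB]

def embed(token_ids: list[int]) -> list[list[float]]:
    return [EMBEDDINGS[t] for t in token_ids]

# Stage 3: "knowledge of things" -- a stand-in for the ridiculously fat model.
# It just averages the input vectors and returns the nearest vocab entry.
def fat_model(vectors: list[list[float]]) -> str:
    avg = [sum(col) / len(vectors) for col in zip(*vectors)]
    def dist(v):
        return sum((a - b) ** 2 for a, b in zip(avg, v))
    return VOCAB[min(range(len(VOCAB)), key=lambda i: dist(EMBEDDINGS[i]))]

print(fat_model(embed(tokenize("cats have"))))  # prints whichever word the toy picks
```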

lvxferre@mander.xyz on 24 Jul 2024 20:08 collapse

It doesn’t need to be filtered into human / AI content. It needs to be filtered into good (true) / bad (false) content. Or a “truth score” for each.

That isn’t enough because the model isn’t able to reason.

I’ll give you an example. Suppose that you feed the model with both sentences:

  1. Cats have fur.
  2. Birds have feathers.

Both sentences are true. And based on the vocabulary of both, the model can output the following sentences:

  1. Cats have feathers.
  2. Birds have fur.

Both are false but the model doesn’t “know” it. All that it knows is that “have” is allowed to go after both “cats” and “birds”, and that both “feathers” and “fur” are allowed to go after “have”.

KevonLooney@lemm.ee on 24 Jul 2024 20:16 next collapse

It’s not just a predictive text program. That’s been around for decades. That’s a common misconception.

As I understand it, it uses statistics from the whole text to create new text. It would be very rare to output “cats have feathers” because that phrase doesn’t ever appear in the training data. The words “have feathers” never follow “cats”.

skulblaka@sh.itjust.works on 24 Jul 2024 20:22 next collapse

But the fact remains that it doesn’t know what a cat or a feather is. All of this is still based purely on statistical frequency and not at all on actual meanings.

lvxferre@mander.xyz on 24 Jul 2024 20:39 next collapse

Your “ackshyually” is missing the point.

barsoap@lemm.ee on 24 Jul 2024 20:44 next collapse

because that phrase doesn’t ever appear in the training data.

Eh, but LLMs abstract. They have seen “<animal> have feathers” and “<animal> have fur” quite a lot of times. The problem isn’t that LLMs can’t reason at all; the problem is that they do employ techniques used in proper reasoning, in particular tracking context throughout the text (cross-attention), but lack techniques necessary for the whole thing, instead relying on confabulation to sound convincing regardless of the BS they spout. Suffices to emulate an Etonian, but that’s not a high standard.

FaceDeer@fedia.io on 24 Jul 2024 21:07 collapse

Workarounds for those sorts of limitations have been developed, though. Chain-of-thought prompting has been around for a while now, and I recall recently seeing an article about a model that had that built right into it; it had been trained to use <thought></thought> tags to enclose invisible chunks of its output that would be hidden from the end user but would be used by the AI to work its way through a problem. So if you asked it whether cats had feathers it might respond "<thought>Feathers only grow on birds and dinosaurs. Cats are mammals.</thought> No, cats don't have feathers." And you'd only see the latter bit. It was a pretty neat approach to improving LLM reasoning.
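
Mechanically, the hiding part is just post-processing on the raw output; something like this (hypothetical tag name, minimal sketch):

```python
import re

# Raw model output with the hidden reasoning still inline (example from above).
raw_output = (
    "<thought>Feathers only grow on birds and dinosaurs. "
    "Cats are mammals.</thought> No, cats don't have feathers."
)

# The application strips every <thought>...</thought> span before display,
# so the end user only ever sees the final answer.
visible = re.sub(r"<thought>.*?</thought>\s*", "", raw_output, flags=re.DOTALL)
print(visible)  # -> No, cats don't have feathers.
```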

WalnutLum@lemmy.ml on 25 Jul 2024 00:58 next collapse

This isn’t really accurate either. At the moment of generation, an LLM only has context for the input string and the network of text tokens it’s been assigned. It pulls from a “pool” of these tokens based on what it’s already output and the input context, nothing more.

Most LLMs have sampling parameters called “Top P”, “Top K”, etc.; these control the number of tokens the model ends up selecting from based on the previous token, alongside the input tokens. It then randomly chooses one based on temperature settings.

It’s why if you turn these models’ temperature settings really high they output pure nonsense both conceptually and grammatically, because the tenuous thread linking the previous token’s context to the next token has been widened enough that it completely loses any semblance of cohesiveness.
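
A stripped-down version of that selection step (the logit values below are invented; a real model scores its entire vocabulary at every step):

```python
import math
import random

def sample_next_token(logits: dict[str, float], temperature: float, top_k: int) -> str:
    # "Top K": keep only the k highest-scoring candidates in the pool.
    candidates = sorted(logits.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    # Temperature rescales scores before the softmax: high values flatten the
    # distribution, which is why cranked-up settings produce grammatical mush.
    weights = [math.exp(score / temperature) for _, score in candidates]
    return random.choices([tok for tok, _ in candidates], weights=weights)[0]

logits = {"fur": 3.2, "feathers": 0.4, "wheels": -2.0, "Tuesday": -1.1}

random.seed(1)
print(sample_next_token(logits, temperature=0.7, top_k=2))    # almost always "fur"
print(sample_next_token(logits, temperature=100.0, top_k=4))  # nearly uniform pick
```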

vrighter@discuss.tchncs.de on 25 Jul 2024 03:16 collapse

and that is exactly how a predictive text algorithm works.

  • some tokens go in

  • they are processed by a deterministic, static statistical model, and a set of probabilities (always the same, deterministic, remember?) comes out.

  • pick the word with the highest probability, add it to your initial string and start over.

  • if you want variety, add some randomness and don’t just always pick the most probable next token.

Coincidentally, this is exactly how LLMs work. It’s a big Markov chain, but with a novel lossy compression algorithm on its state transition table. The last point is also the reason why, if anyone says they can fix LLM hallucinations, they’re lying.
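
Spelled out as code, with a literal lookup table playing the role of the trained network (a two-entry toy table; in a real LLM that role is played by the neural net, but the loop is identical):

```python
import random

# Step 2's function as a literal static table: same input, same probabilities.
TABLE = {
    "cats": {"have": 1.0},
    "have": {"fur": 0.7, "feathers": 0.3},
}

def generate(prompt: list[str], steps: int, sample: bool = False) -> list[str]:
    out = list(prompt)
    for _ in range(steps):
        probs = TABLE[out[-1]]                     # deterministic lookup
        if sample:
            # inject randomness for variety -- the step where a less probable
            # (and possibly wrong) continuation can sneak in
            out.append(random.choices(list(probs), weights=list(probs.values()))[0])
        else:
            out.append(max(probs, key=probs.get))  # always the most probable token
    return out

print(generate(["cats"], steps=2))               # ['cats', 'have', 'fur'], every time
random.seed(3)
print(generate(["cats"], steps=2, sample=True))  # can come back with 'feathers'
```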

CeeBee_Eh@lemmy.world on 25 Jul 2024 16:44 collapse

Coincidentally, this is exactly how LLMs work

Everyone who says this doesn’t actually understand how LLMs work.

Multivector word embeddings create emergent relationships; that’s new knowledge that doesn’t exist in the training dataset.

Computerphile did a good video on this well before the LLM craze.

vrighter@discuss.tchncs.de on 26 Jul 2024 09:33 collapse

1 - a Markov chain only takes previous tokens as input.

2 - It uses a function (in the mathematical sense, so same input results in same output, completely stateless) to generate a set of probabilities for what the next token might be.

3 - The most probable token is picked, or randomness (temperature) is applied here to occasionally choose a different token.

An LLM’s internals, the part that’s trained, are literally the function used in step 2. You could implement this function in a number of ways: e.g. you could build a huge table and consult it, or you could generate it somehow. You could train a big neural network that takes previous tokens as input and outputs probabilities of tokens. You could then enumerate its outputs for every possible permutation of inputs and there’s your table. That would take too much time and space, so we just run the function on demand instead. Exact same result. It can be very smart and notice correlations, but ultimately it generates a (virtual) huge static table. This is a completely deterministic process. A trained NN is still a (huge) mathematical function. So the big network that they spend resources training is basically the function used in step 2.

Step 3 is the cause of hallucinations. It’s the only nondeterministic part, and it’s not part of the LLM itself in any way. No matter how much smarter the neural network gets, the hallucinations are introduced mainly in step 3. So no, they won’t be solving the LLM hallucination problem anytime soon.

CeeBee_Eh@lemmy.world on 25 Jul 2024 16:39 collapse

Both sentences are true. And based on the vocabulary of both, the model can output the following sentences:

  1. Cats have feathers.
  2. Birds have fur.

This is not how the models are trained or work.

Both are false but the model doesn’t “know” it. All that it knows is that “have” is allowed to go after both “cats” and “birds”, and that both “feathers” and “fur” are allowed to go after “have”.

Demonstrably false. This isn’t how LLMs are trained or built.

Just considering the contextual relationships between word embeddings that are created during training is evidence enough. Those relationships from the multi-vector fields are an emergent property that doesn’t exist in the dataset.

If you want a better understanding of what I just said, take a look at this Computerphile video from four years ago. And this came out before the LLM hype and before GPT-3, which was the big leap in LLMs.

SkaveRat@discuss.tchncs.de on 24 Jul 2024 19:05 next collapse

People are already comparing older content with Low Background Steel, as it’s uncontaminated.

FaceDeer@fedia.io on 24 Jul 2024 21:01 collapse

And they're overlooking that radionuclide contamination of steel actually isn't much of a problem any more, since the surge in background radionuclides caused by nuclear testing peaked in 1963 and has since gone down almost back to the original background level again.

I guess it's still a good analogy, though. People bring up Low Background Steel because they think radionuclide contamination is an unsolved problem (despite it having been basically solved), and they bring up "model collapse" because they think it's an unsolved problem (despite it having been basically solved). It's like newspaper stories, everyone sees the big scary front page headline but nobody pays attention to the little block of text retracting it on page 8.

downpunxx@fedia.io on 24 Jul 2024 19:07 next collapse

GIGO

lvxferre@mander.xyz on 24 Jul 2024 19:31 next collapse

Model degeneration is an already well-known phenomenon. The article already explains well what’s going on, so I won’t go into details, but note how this happens because the model does not understand what it is outputting - it’s looking for patterns, not for the meaning conveyed by said patterns.
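
You can watch that happen with nothing but a word-frequency table standing in for the model. This is a deliberately crude sketch (real collapse involves distributions over token sequences), but the tail-loss mechanism is the same: each generation is fitted only to text sampled from the previous one, and any pattern that doesn’t get sampled is gone for good.

```python
import random
from collections import Counter

random.seed(7)

# Generation 0 is "trained" on real data: 1000 equally likely words.
model = Counter({f"word{i}": 1 for i in range(1000)})

for gen in range(1, 7):
    # Publish a corpus by sampling from the current model...
    corpus = random.choices(list(model), weights=list(model.values()), k=500)
    # ...then train the next generation only on that synthetic corpus.
    model = Counter(corpus)
    print(f"generation {gen}: {len(model)} distinct words left")

# The vocabulary can only shrink: rare words vanish and never come back.
```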

Frankly, at this rate we might as well go with a neuro-symbolic approach.

CeeBee_Eh@lemmy.world on 25 Jul 2024 03:05 collapse

The issue with your assertion is that people don’t actually work in a similar way. Have you ever met someone who was clearly taught “garbage”?

lvxferre@mander.xyz on 25 Jul 2024 12:38 next collapse

The issue with your assertion is that people don’t actually work in a similar way.

I’m talking about LLMs, not about people.

CeeBee_Eh@lemmy.world on 25 Jul 2024 13:22 collapse

I know you are, but the argument that an LLM doesn’t understand context is incorrect. It’s not human level understanding, but it’s been demonstrated that they do have a level of understanding.

And to be clear, I’m not talking about consciousness or sapience.

lvxferre@mander.xyz on 25 Jul 2024 14:11 next collapse

I know you are, but the argument that an LLM doesn’t understand context is incorrect

Emphasis mine. I am talking about the textual output. I am not talking about context.

It’s not human level understanding

Additionally, your obnoxiously insistent comparison between LLMs and human beings boils down to a red herring.

Not wasting my time further with you.

[For others who might be reading this: sorry for the blatantly rude tone but I got little to no patience towards people who distort what others say, like the one above.]

CeeBee_Eh@lemmy.world on 25 Jul 2024 16:21 collapse

I got little to no patience towards people who distort what others say,

My original reply was meant to be tongue-in-cheek, but I guess I forgot about Poe’s law. I’m not a layman, for the record. I’ve worked with AI for over a decade.

Not wasting my time further with you.

Ditto. Have a nice day.

CileTheSane@lemmy.ca on 25 Jul 2024 17:40 collapse

but it’s been demonstrated that they do have a level of understanding.

Citation needed

CeeBee_Eh@lemmy.world on 25 Jul 2024 20:36 collapse

Here you go

youtu.be/gQddtTdmG_8

CileTheSane@lemmy.ca on 25 Jul 2024 22:02 collapse

A better mathematical system of storing words does not mean the LLM understands any of them. It just has a model that represents the relation between words that it uses.

If I put 10 minus 8 into my calculator I get 2. The calculator doesn’t actually understand what 2 means, or what subtracting represents, it just runs the commands that gives the appropriate output.

CeeBee_Eh@lemmy.world on 26 Jul 2024 00:13 collapse

That’s a bad analogy, because the calculator wasn’t trained using an artificial neural network literally designed by studying biological brains (aka biological neural networks).

And “understand” doesn’t equate to consciousness or sapience. For example, it is entirely and factually correct to state that an LLM is capable of reasoning. That’s not even up for debate. The accuracy of an LLM’s reasoning capability is one of the fundamental benchmarks used for evaluating its quality.

But that doesn’t mean it’s “thinking” in the way most people consider.

Edit: anyone upvoting this CileTheSane clown is in the same boat of not comprehending how LLMs work.

CileTheSane@lemmy.ca on 26 Jul 2024 01:31 collapse

it is entirely and factually correct to state that an LLM is capable of reasoning

Citation needed.

If you’re going to tell me LLMs are modeled after biological brains and capable of reasoning then I call bullshit on your claims that you actually work in AI.

Imagine you put a man in an enclosed room. There is a slot in the wall where messages get passed through, written in Chinese. The man does not speak Chinese or even recognize the written language; he just thinks they’re weird symbols.
First the man is shown examples of sequences of symbols to train him. Then he is shown incomplete sequences and asked which symbol comes next. If incorrect he is corrected; if correct he gets a cookie. Eventually this man is able to carry on “conversations” with people in Chinese through continued practice.
This man still does not speak Chinese, he is not having reasoned, rational arguments with the people he is conversing with, and if you told him it was a language he’d look at you like you’re crazy. “There’s no language here, just if I have these symbols and I next put the one that looks like a man wearing a hat, they give me a cookie.”

Thinking LLMs are capable of reasoning is the digital equivalent of putting eyes on a pencil then feeling bad when it gets broken in half.

CeeBee_Eh@lemmy.world on 26 Jul 2024 04:17 collapse

Citation needed.

Certainly!

In machine learning, a neural network (also artificial neural network or neural net, abbreviated ANN or NN) is a model inspired by the structure and function of biological neural networks in animal brains

Source

A neural network is a method in artificial intelligence that teaches computers to process data in a way that is inspired by the human brain.

Source

A neural network is a machine learning program, or model, that makes decisions in a manner similar to the human brain

Source

A neural network, or artificial neural network, is a type of computing architecture that is based on a model of how a human brain functions

Source

Would you like some more citations?

Thinking LLMs are capable of reasoning is the digital equivalent of putting eyes on a pencil then feeling bad when it gets broken in half.

In this paper, we present Reasoning via Planning (RAP), a novel LLM reasoning framework that equips LLMs with an ability to reason akin to human-like strategic planning

Source - Reasoning with Language Model is Planning with World Model

Motivated by the observation that adding more concise CoT examples in the prompt can improve LLM reasoning performance

Source - Microsoft Research

LegalBench - a tool to evaluate the reasoning performance of an LLM in the legal domain.

A paper on benchmarking an LLMs temporal reasoning.

Shall I provide some more?

CileTheSane@lemmy.ca on 26 Jul 2024 19:04 collapse

Wikipedia is not a source.

Amazon is not a source.

Someone trying to sell their LLM to the general public, and therefore simplifying the language to convey a concept is not a source.

These nodes pass data to each other, just like how in a brain, neurons pass electrical impulses to each other.

By that definition my dimmer switch functions like a biological brain because it passes electrical impulses.

In this paper, we present Reasoning via Planning (RAP), a novel LLM reasoning framework that equips LLMs with an ability to reason akin to human-like strategic planning

This prevents LLMs from performing deliberate planning akin to human brains,

So does not function like a brain does.

To overcome the limitations, we propose a new LLM reasoning framework

So it’s a proposal for a new framework to mimic it, not how LLMs currently function

Aaand I’m going to stop checking your sources now. If you’re just going to gish gallop every link from a search page you think agrees with you I’m not going to waste my time reading things you clearly didn’t bother to. It took 5 links to get to something that even looks like a source, and it doesn’t say what you think it does.

Read your sources and make sure they say what you think they do. If you present me with another pile of links and the first one is invalid I won’t bother looking at the 2nd.

CeeBee_Eh@lemmy.world on 26 Jul 2024 22:24 collapse

My god you’re thick.

What you just did is called “digging a deeper hole”.

Like I said, I’ve worked in the industry for over a decade. What I said isn’t even up for debate. If you had a shred of understanding you’d know how astoundingly wrong what you said is. In fact, if you had a shred of understanding you just flat out wouldn’t have said it.

Amazon is not a source.

Someone trying to sell their LLM to the general public, and therefore simplifying the language to convey a concept is not a source.

Straight up genetic fallacy.

Wikipedia is not a source.

You’re right. It’s not a “source”. It’s a source aggregator. You know that list of little tiny text at the bottom of each page? Those are “references” from credible sources that are cited.

I’ll give you an example. The quote from Wikipedia I provided has a little “1” and a little “2” right at the end of the sentence. If you click on them it’ll take you to the cited source.

The little “1” will bring you to the following page:

news.mit.edu/…/explained-neural-networks-deep-lea…

Here are some excerpts:

Modeled loosely on the human brain, a neural net consists of thousands or even millions of simple processing nodes that are densely interconnected.

particular network layouts or rules for adjusting weights and thresholds have reproduced observed features of human neuroanatomy and cognition, an indication that they capture something about how the brain processes information.

sciencedirect.com/…/artificial-neural-network

It resembles the human brain in two respects: The knowledge is acquired by the network through a learning process, and interneuron connection strengths known as synaptic weights are used to store the knowledge.

They imitate somewhat the learning process of a human brain because they learn the relationship between the input parameters and the controlled and uncontrolled variables by studying previously recorded data.

ANN is a computational model that is based on a machine learning technique. It works like a human brain neuron system.

Directly linked to in the Science Direct page from Wikipedia:

www.sciencedirect.com/…/B9780444528551500118

Artificial neural networks (ANNs) are computational models that attempt to emulate the architecture and function of the human brain (Russell and Norvig, 1995).

So does not function like a brain does.

Now I know you’re either 14 or just not very smart. You directly quoted the source with “This prevents LLMs from performing deliberate planning akin to human brains,”

It’s literally in the sentence, it said “deliberate planning akin to human brains”. It doesn’t say anywhere in that sentence that neural networks aren’t modelled after brains and it doesn’t say anything about reasoning (the two things you keep refuting).

Aaand I’m going to stop checking your sources now

Convenient for your “argument”.

Read your sources and make sure they say what you think they do

I have. You just can’t read, have reading comprehension issues, or simply can’t understand them.

If you present me with another pile of links and the first one is invalid I won’t bother looking at the 2nd.

I don’t care if you do. Anyone else who reads these comments will see you’re out of your depth.

CileTheSane@lemmy.ca on 26 Jul 2024 22:53 collapse

I’ve worked in the industry for over a decade

Since we’re naming fallacies: appeal to authority. I’m an Astronaut Scientist Millionaire Cowboy and I say you’re wrong.

What I said isn’t even up for debate.

Begging the question

If you had a shred of understanding you know how astoundingly wrong what you said is. In fact, if you had a shred of understanding you just flat out wouldn’t have said it.

Ad Hominem

You know that list of little tiny text at the bottom of each page? Those are “references” from credible sources that are cited.

Then you should have linked those, not Wikipedia. I’m not going to put more work into this than you are. If you can’t be bothered to find the actual source I’m not going to do it for you.

Modeled loosely on the human brain…

Let me stop you right there.
“Modeled loosely on the human brain.” So again your source straight up says it does not function like a human brain.

It resembles the human brain in two respects: The knowledge is acquired by the network through a learning process, and interneuron connection strengths known as synaptic weights are used to store the knowledge.

None of that indicates a capacity to reason.

Artificial neural networks (ANNs) are computational models that attempt to emulate the architecture and function of the human brain (Russell and Norvig, 1995).

I thought we were talking about LLMs, not ANNs, and an attempt to emulate does not imply success.

Now I know you’re either 14 or just not very smart.

Ad Hominem

It’s literally in the sentence, it said “deliberate planning akin to human brains”.

Interesting how you cut out the words “prevents the LLM from” that immediately preceded that.

Convenient for your “argument”.

Convenient for dealing with a gish gallop. Not going to waste my time analyzing sources you haven’t even read.

You just can’t read, have reading comprehension issues, or simply can’t understand them.

More Ad Hominem.

Someone with an actual argument doesn’t need to resort to personal attacks every other paragraph. They can simply present their argument. Someone without an actual argument is likely to resort to personal attacks to make the other person go away and stop forcing them to defend their (non)argument, then think they’ve “won” just because the other person isn’t bothering to deal with them anymore.

Anyone else who reads these comments will see you’re out of your depth.

Ah yes, you’ve been getting a lot of “support and agreement” from the other people reading your comments.

CeeBee_Eh@lemmy.world on 26 Jul 2024 23:50 collapse

Since we’re naming fallacies: appeal to authority. I’m an Astronaut Scientist Millionaire Cowboy and I say you’re wrong.

Can you get your fallacy definitions right at least? It’s not appeal to authority if the person being referenced has the qualifications or experience in the subject being discussed. I have worked with the technology for a decade. I’ve trained countless neural network models for various purposes. I understand the technology.

Begging the question

No. You are literally trying to debate established facts.

Ad Hominem

This would be true if I didn’t address the point multiple times. This was me offering an explanation for why you keep getting it wrong.

Then you should have linked those, not Wikipedia

I did link to multiple scientific sources. You just gave up before even getting to halfway.

“Modeled loosely on the human brain.” So again your source straight up says it does not function like a human brain.

No, it literally says in multiple sections (that I quoted) that neural networks are designed by modelling biological brains. It doesn’t matter if it’s “loosely”, “exactly”, “somewhat”, or “kinda”. It’s modelled “loosely” because the human brain is incredibly complex. Quite possibly the most complex thing known of. The distinction here in the ONE quote you cherry-picked is that it said human brain. The distinction is the word “human”.

Interesting how you cut out the words “prevents the LLM from” that immediately preceded that.

I literally didn’t. It’s literally in my quote, in italics. I’ll refer to my previous (ad hom) statement about your reading comprehension.

None of that indicates a capacity to reason.

Then go back to the links you conveniently skipped over.

I thought we were talking about LLMs, not ANNs, and an attempt to emulate does not imply success.

It hurts. You actually hurt my brain. An LLM is literally an artificial neural network. How do trolls like you actually think?

Someone with an actual argument doesn’t need to resort to personal attacks every other paragraph.

Nothing I said is a personal attack. Remarking that you must not have good reading comprehension is insulting, but not a personal attack.

They can simply present their argument.

I have; very simply, in fact. I just genuinely do not think you have the reading comprehension or capacity to understand.

Ah yes, you’ve been getting a lot of “support and agreement” from the other people reading your comments.

Sure, the 10 people who commented on this post and aren’t reading our convo are such an indication of support.

CileTheSane@lemmy.ca on 27 Jul 2024 00:16 collapse

Remarking that you must not have good reading comprehension is insulting, but not a personal attack.

I was going to reply to other things until I read this. It really displays why continuing is a waste of time. You insist I lack reading comprehension in the same sentence that you insist a literal personal attack is not a personal attack.

CeeBee_Eh@lemmy.world on 27 Jul 2024 01:22 collapse

Finally something we can agree on. Continuing this is a waste of time.

You don’t accept evidence, so there’s nothing left to be said.

PenisDuckCuck9001@lemmynsfw.com on 25 Jul 2024 16:44 collapse

I’m autistic and sometimes I feel like an ai bot spewing out garbage in social situations. If I do what people normally do and make it sound believable, maybe no one will notice.

MonkderVierte@lemmy.ml on 24 Jul 2024 19:49 next collapse

Woah, that was fast.

sundray@lemmus.org on 24 Jul 2024 20:20 next collapse

AI writing, scraped by AI, producing more AI writing…

So not “gray goo” exactly, but “gray slop”?

kromem@lemmy.world on 24 Jul 2024 20:54 next collapse

I’d be very wary of extrapolating too much from this paper.

The past research along these lines found that a mix of synthetic and organic data was better than organic alone. A caveat for all the research to date is that it used shitty cheap models, where there’s significant performance degradation in the synthetic data compared to SotA models; other research has found notable improvements to smaller models from synthetic data produced by the SotA.

Basically this is only really saying that AI models of multiple types, with the capabilities of a year or two ago, recursively trained with no additional organic data, will collapse.

It’s not representative of real world or emerging conditions.

FlashZordon@lemmy.world on 24 Jul 2024 20:59 next collapse

The AI art is inbreeding.

Madrigal@lemmy.world on 24 Jul 2024 21:11 next collapse

“On two occasions I have been asked, ‘Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?’ I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.” - Charles Babbage

bionicjoey@lemmy.ca on 25 Jul 2024 02:04 next collapse

The business people adopting AI: “who cares what it’s trained on? It’s intelligent right? It’ll just sort through the garbage and magically come up with the right answers to everything”

RecluseRamble@lemmy.dbzer0.com on 26 Jul 2024 04:18 collapse

Not so hard to imagine given that these people have always seen technical systems as magic.

CookieOfFortune@lemmy.world on 25 Jul 2024 14:29 collapse

Of course modern UX design is very much based on getting the right answer with the wrong inputs (autocorrect, etc).

lennivelkant@discuss.tchncs.de on 26 Jul 2024 15:54 collapse

I believe Robustness was the term I learned years ago: the ability of a system to gracefully handle user error, make it easy to recover from or fix, clearly communicate what went wrong, etc.

Of course, nothing is ever perfect and humans are very creative at fucking up, and a lot of companies don’t seem to take UX too seriously. Particularly when the devs get tunnel vision and forget about user error being a thing…

TheReturnOfPEB@reddthat.com on 25 Jul 2024 16:56 next collapse

certainly at least a downvote to free will

werefreeatlast@lemmy.world on 26 Jul 2024 01:38 next collapse

Maybe we can use it to train the other AIs to help ourselves.

superminerJG@lemmy.world on 26 Jul 2024 02:28 next collapse

News at 11.

cordlesslamp@lemmy.today on 26 Jul 2024 03:13 next collapse

Oh no, the AI are inbreeding.

Anarki_@lemmy.blahaj.zone on 26 Jul 2024 09:46 next collapse

⢀⣠⣾⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⠀⠀⠀⠀⣠⣤⣶⣶ ⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⠀⠀⠀⢰⣿⣿⣿⣿ ⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣧⣀⣀⣾⣿⣿⣿⣿ ⣿⣿⣿⣿⣿⡏⠉⠛⢿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⡿⣿ ⣿⣿⣿⣿⣿⣿⠀⠀⠀⠈⠛⢿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⠿⠛⠉⠁⠀⣿ ⣿⣿⣿⣿⣿⣿⣧⡀⠀⠀⠀⠀⠙⠿⠿⠿⠻⠿⠿⠟⠿⠛⠉⠀⠀⠀⠀⠀⣸⣿ ⣿⣿⣿⣿⣿⣿⣿⣷⣄⠀⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⣴⣿⣿ ⣿⣿⣿⣿⣿⣿⣿⣿⣿⠏⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠠⣴⣿⣿⣿⣿ ⣿⣿⣿⣿⣿⣿⣿⣿⡟⠀⠀⢰⣹⡆⠀⠀⠀⠀⠀⠀⣭⣷⠀⠀⠀⠸⣿⣿⣿⣿ ⣿⣿⣿⣿⣿⣿⣿⣿⠃⠀⠀⠈⠉⠀⠀⠤⠄⠀⠀⠀⠉⠁⠀⠀⠀⠀⢿⣿⣿⣿ ⣿⣿⣿⣿⣿⣿⣿⣿⢾⣿⣷⠀⠀⠀⠀⡠⠤⢄⠀⠀⠀⠠⣿⣿⣷⠀⢸⣿⣿⣿ ⣿⣿⣿⣿⣿⣿⣿⣿⡀⠉⠀⠀⠀⠀⠀⢄⠀⢀⠀⠀⠀⠀⠉⠉⠁⠀⠀⣿⣿⣿ ⣿⣿⣿⣿⣿⣿⣿⣿⣧⠀⠀⠀⠀⠀⠀⠀⠈⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢹⣿⣿ ⣿⣿⣿⣿⣿⣿⣿⣿⣿⠃⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢸⣿⣿

Andromxda@lemmy.dbzer0.com on 26 Jul 2024 10:28 collapse

Water is wet

cows_are_underrated@feddit.org on 26 Jul 2024 16:40 collapse

Is it wet or does it make other things wet?