For Data-Guzzling AI Companies, the Internet Is Too Small (tech.slashdot.org)
from 0nekoneko7@lemmy.world to technology@lemmy.world on 01 Apr 2024 18:22
https://lemmy.world/post/13799362

#technology

threaded - newest

0nekoneko7@lemmy.world on 01 Apr 2024 18:26 next collapse

“Companies also are experimenting with using AI-generated, or synthetic, data as training material – an approach many researchers say could actually cause crippling malfunctions. These efforts are often secret, because executives think solutions could be a competitive advantage.”

AI was supposed to be assistance for people. Now it’s a competition over who has the better AI, for company profits.

Even_Adder@lemmy.dbzer0.com on 01 Apr 2024 19:38 next collapse

Support Open Source developers. Corporations aren’t the only game in town, they just want you to think that.

An easy way to get started with local LLMs is LM Studio, and Stable Diffusion for images.


elshandra@lemmy.world on 01 Apr 2024 20:00 next collapse

Everything’s a competition for company profits.

Sanctus@lemmy.world on 02 Apr 2024 01:53 collapse

All will bow to profits as long as Mammon still tops the paradigm. Change the main motivator of the world from profit and you will see this go away. I hate it more than anything. It consumes all and leaves nothing.

elshandra@lemmy.world on 01 Apr 2024 18:56 next collapse

Idk, I find this hard to believe. I would think the challenge is more access to the information (gates, bandwidth), a speedy vault to store that information, and improving their models.

When you think about what’s available on the internet, so much of human knowledge (and propaganda) is out there. With enough (deus ex) tech, there’s no way AI shouldn’t be able to learn most of anything from the knowledge available, given the right trainers.

General_Effort@lemmy.world on 02 Apr 2024 15:41 next collapse

Yes, it’s BS, like most of the AI takes here.

The kernel of truth is scaling laws:

[T]he Chinchilla scaling law for training Transformer language models suggests that when given an increased budget (in FLOPs), to achieve compute-optimal, the number of model parameters (N) and the number of tokens for training the model (D) should scale in approximately equal proportions.
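As a rough sketch of what that quote means in practice: under the commonly cited approximations C ≈ 6·N·D (training FLOPs) and D ≈ 20·N (compute-optimal tokens per parameter), a fixed FLOP budget pins down both N and D, and they grow in equal proportion with the square root of compute. The constants here are illustrative ballpark figures, not exact values from the paper.

```python
import math

def chinchilla_optimal(flops_budget):
    """Rough compute-optimal split under the Chinchilla heuristics.

    Assumes C ~= 6*N*D and D ~= 20*N; substituting gives C = 120*N^2,
    so N = sqrt(C/120) and D = 20*N. Illustrative constants only.
    """
    n_params = math.sqrt(flops_budget / 120)
    n_tokens = 20 * n_params
    return n_params, n_tokens

# Example: a 1e24 FLOP training budget
n, d = chinchilla_optimal(1e24)
print(f"~{n / 1e9:.0f}B parameters, ~{d / 1e12:.1f}T tokens")
```

Note that a 100x compute budget only buys ~10x more parameters and ~10x more tokens, which is why training data demand balloons alongside model size.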

isles@lemmy.world on 02 Apr 2024 17:45 collapse

and propaganda

Well, that’s the rub, right? Garbage in, garbage out. For an LLM, the value is predicting the next token, but we’ve seen how racist current datasets can be. If you filter it, there’s not a lot of high-quality data left.

So yes, we have a remarkable amount of (often wrong) information to pull from.

elshandra@lemmy.world on 02 Apr 2024 19:33 collapse

Mhm, I wonder when we’ll have the resources to build one that can tell the truth from the lies. I suppose you have to learn to crawl before you learn to walk, but these things are still having trouble rolling over.

lvxferre@mander.xyz on 01 Apr 2024 20:59 next collapse

Good. The current approach towards generative models is basically brute-forcing; a constraint on the amount of data available might encourage those companies to refine the approach.

QuandaleDingle@lemmy.world on 02 Apr 2024 04:30 next collapse

Everything’s too small for power-hungry corporations with an irrational need for infinite expansion.

isles@lemmy.world on 02 Apr 2024 17:42 collapse

From Lemmy, this link took me to Slashdot, which took me to The Verge, which took me to the Wall Street Journal, each with a section where I can discuss this article.