Cloudflare turns AI against itself with endless maze of irrelevant facts (arstechnica.com)
from floofloof@lemmy.ca to technology@lemmy.world on 23 Mar 06:37
https://lemmy.ca/post/41122378

#technology


kokesh@lemmy.world on 23 Mar 06:53 next collapse

Love this!

Ilovethebomb@lemm.ee on 23 Mar 06:58 next collapse

Feeding AI crawlers the excrement of their forebears is a perfect way to deal with them.

lath@lemmy.world on 23 Mar 07:08 next collapse

So they grasped the inevitable and dove right into it.

latenightnoir@lemmy.blahaj.zone on 23 Mar 07:41 next collapse

Heh, sounds like what one of my exes used to do when she wanted some alone time, she’d throw me an informational rabbit hole and let me dive right in it for a couple of hours=)))

RejZoR@lemmy.ml on 23 Mar 07:58 next collapse

This is AI poisoning. Blocking a crawler just stops it from learning; feeding it bullshit poisons its knowledge and makes it hallucinate.

I also wonder how AI crawlers can tell what was already generated by AI, potentially “inbreeding” knowledge, as I call it, with past AI hallucinations.

When the whole AI craze began, basically everything online was human-made. Not anymore. It’ll only get worse if you ask me.

CheeseNoodle@lemmy.world on 23 Mar 08:26 next collapse

The scary part is that even humans don’t really have a proper escape mechanism for this kind of misinformation. Sure, we can spot AI a lot of the time, but there are also situations where we can’t, and it leaves us only trusting people we already knew before AI and growing more and more distrustful of information in general.

theangryseal@lemmy.world on 23 Mar 11:44 collapse

Holy shit, this.

I’m constantly worried that what I’m seeing/hearing is fake. It’s going to get harder and harder to find older information on the internet too.

Shit, it’s actually crept outside of the internet. Family buys my kids books for Christmas and birthdays, and I’m checking to make sure they aren’t AI garbage before I ever let them look at them, because someone already bought them an AI book without realizing it.

I don’t really understand what we hope to get from all of this. I mean, not really. Maybe if it gets to a point where it can truly be trusted, I just don’t see how.

Flagstaff@programming.dev on 23 Mar 13:06 collapse

I don’t really understand what we hope to get from all of this.

Well, even among the most moral devs, the garbage output wasn’t intended, and no one could have predicted the pace at which it’s been developing. So all this is driving a real need for in-person communities and regular contact—which is at least one great result, I think.

count_dongulus@lemmy.world on 23 Mar 08:39 next collapse

Whoa I never considered AI inbreeding as a death for AI 🤔

JustARegularNerd@lemmy.dbzer0.com on 23 Mar 09:31 next collapse

Kind of. They’re actually trying to avoid this according to the article:

“The company says the content served to bots is deliberately irrelevant to the website being crawled, but it is carefully sourced or generated using real scientific facts—such as neutral information about biology, physics, or mathematics—to avoid spreading misinformation (whether this approach effectively prevents misinformation, however, remains unproven).”

Muaddib@sopuli.xyz on 23 Mar 14:52 collapse

That sucks! What’s the point of putting an AI in a maze if you’re not going to poison it?

floofloof@lemmy.ca on 23 Mar 13:06 collapse

Some of these LLMs introduce very subtle statistical patterns into their output so it can be recognized as such. So it is possible in principle (not sure how computationally feasible when crawling) to avoid ingesting whatever has these patterns. But there will also be plenty of AI content that is not deliberately marked in this way, which would be harder to filter out.
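(Editor’s note: the statistical-pattern idea above refers to LLM output watermarking. Below is a minimal toy sketch of one published family of schemes, “green-list” watermarking, where the generator biases each token toward a pseudorandom half of the vocabulary seeded by the previous token. All function names here are hypothetical illustrations, not any vendor’s actual API.)

```python
import hashlib
import random

def green_list(prev_token: str, vocab: list[str], fraction: float = 0.5) -> set[str]:
    # Seed a PRNG with a hash of the previous token, so anyone who knows the
    # scheme can reproduce the same "green" subset of the vocabulary.
    seed = int(hashlib.sha256(prev_token.encode()).hexdigest(), 16) % (2**32)
    rng = random.Random(seed)
    shuffled = vocab[:]  # copy so the caller's list is not mutated
    rng.shuffle(shuffled)
    return set(shuffled[: int(len(shuffled) * fraction)])

def green_fraction(tokens: list[str], vocab: list[str]) -> float:
    # Fraction of tokens that land in the green list of their predecessor.
    # Unwatermarked text should hover near 0.5; watermarked text runs higher,
    # which is the subtle statistical fingerprint a crawler could test for.
    hits = sum(
        tokens[i] in green_list(tokens[i - 1], vocab)
        for i in range(1, len(tokens))
    )
    return hits / max(1, len(tokens) - 1)
```

As the comment notes, detection only works against models that deliberately embed such a bias; unmarked AI text shows no such signal, and running this check over every crawled page adds real compute cost.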

lol_idk@lemmy.ml on 23 Mar 08:00 next collapse

Throwing more power at a resource-hungry process seems like a no-win.

ToadOfHypnosis@lemm.ee on 23 Mar 09:11 next collapse

So AI demands that power, water for cooling, and other natural resources be ramped up and consumed. Now this creates a second wasteful AI doing the same, in an endless loop, so that the first AI just keeps spinning its wheels and wasting resources until it’s discovered. The idea makes sense from a pure “stop unauthorized crawling” perspective, but damn, we just have no solutions that don’t accelerate climate impact. This planet is just going to turn into an oven that cooks us.

floofloof@lemmy.ca on 23 Mar 13:14 next collapse

“No real human would go four links deep into a maze of AI-generated nonsense,” Cloudflare explains. “Any visitor that does is very likely to be a bot, so this gives us a brand-new tool to identify and fingerprint bad bots.”

It sounds like there may be a plan to block known bots once they have used this tool to identify them. Over time this would reduce the amount of AI slop they need to generate for the AI trap, since bots already fingerprinted would not be served it. Since AI generators are expensive to run, it would be in Cloudflare’s interests to do this. So while your concern is well placed, in this particular case there may be a surge of energy and water usage at first that tails off once more bots are fingerprinted.
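(Editor’s note: the quoted depth heuristic can be sketched as a simple per-client counter. `TrapTracker` and its threshold are hypothetical names for illustration, not Cloudflare’s implementation, which would key on a much richer fingerprint than a bare client id.)

```python
from collections import defaultdict

TRAP_DEPTH_THRESHOLD = 4  # per the quote: real humans rarely go four links deep

class TrapTracker:
    """Counts how deep each client has followed honeypot links."""

    def __init__(self, threshold: int = TRAP_DEPTH_THRESHOLD):
        self.threshold = threshold
        self.depths = defaultdict(int)  # client id -> trap links followed

    def record_trap_hit(self, client_id: str) -> bool:
        """Record one step into the maze; return True once the client is flagged."""
        self.depths[client_id] += 1
        return self.depths[client_id] >= self.threshold
```

Once a client is flagged, serving it the (expensive to generate) maze content can stop, which is the tail-off effect described above.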

turmacar@lemmy.world on 23 Mar 15:57 next collapse

The problem is that they’re now attempting anti-fingerprinting tactics. A lot of the AI crawlers used to identify themselves as Amazon/OpenAI/etc. and don’t anymore, because they were being blocked. Now they’re coming from random IPs with random or obfuscated user-agent strings.

This is a legal problem not a technological one.

rottingleaf@lemmy.world on 23 Mar 16:30 collapse

“No real human would go four links deep into a maze of AI-generated nonsense,”

Me, red-eyed, swearing at the screen while looking for porn.

singletona@lemmy.world on 24 Mar 00:10 collapse

…real.

‘Four links deep’

HEY NOW! Sometimes stuff just gets interesting!

‘Into a maze of AI-Generated Nonsense.’

And sometimes that interesting is porn related!

rottingleaf@lemmy.world on 23 Mar 16:29 next collapse

There are solutions. I’ve just skimmed a paper on attacks on Kademlia. The solutions would be similar to what’s recommended there. The problems look different on the surface, but both stem from the network having no admission control.

All this tomfoolery about “oh horror, how do we solve this” exists because bot farms, recommendation systems, and ad networks have proven very convenient and profitable, and nobody wants to disturb that ecosystem in favor of f2f services. So they want to remove one side of the coin but leave the other.

SL3wvmnas@discuss.tchncs.de on 25 Mar 07:52 collapse

Oooh, that sounds like an interesting read. Do you happen to have the DOI?

rottingleaf@lemmy.world on 25 Mar 08:30 collapse

I think this is it: eudl.eu/doi/10.1145/1460877.1460907

SL3wvmnas@discuss.tchncs.de on 25 Mar 17:46 collapse

Thank you for taking the time!

piecat@lemmy.world on 23 Mar 17:35 collapse

It’s definitely an arms race. One other outcome is that it gets too expensive to be cost effective and slows down that way.

JustARegularNerd@lemmy.dbzer0.com on 23 Mar 09:33 next collapse

I really want to see what the bullshit looks like - shame the article doesn’t actually show a sample, guess I’d have to make my browser look like an AI crawler

sundrei@lemmy.sdf.org on 23 Mar 13:09 next collapse

endless maze of irrelevant facts

oh no I’ve been turned into an AI :(

NotProLemmy@lemmy.ml on 23 Mar 15:39 collapse

same

Plebcouncilman@sh.itjust.works on 23 Mar 18:00 collapse

This is so cyberpunk.