Cloudflare turns AI against itself with endless maze of irrelevant facts (arstechnica.com)
from jarfil@beehaw.org to technology@beehaw.org on 23 Mar 05:53
https://beehaw.org/post/19046675

#technology

Powderhorn@beehaw.org on 23 Mar 06:03

Interesting approach. But of course it’s another black box, because otherwise it wouldn’t be effective. So now we’re going to be wasting even more electricity on processes we don’t understand.

As a writer, I dislike that much of my professional corpus (and of course everything on Reddit) has been ingested into LLMs. So there’s stuff to like here going forward. The question remains: at what cost?

tazeycrazy@feddit.uk on 23 Mar 06:38

You can be nice and signal that you don’t want to be AI scraped; there are background flags for this. But if a bot ignores you, then it’s down to whoever runs it to shut down their unethical waste of energy.
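
For what it’s worth, those flags are mostly just robots.txt entries, and the whole system depends on the crawler bothering to check them. A rough sketch of what that check looks like from the bot’s side, using only Python’s standard library (the bot name and URLs here are just examples):

```python
from urllib import robotparser

# Fetch and parse the site's robots.txt, which is where the
# "please don't scrape me" flags live.
rp = robotparser.RobotFileParser()
rp.set_url("https://example.org/robots.txt")
rp.read()

# A polite crawler identifies itself and asks before fetching a page;
# an impolite one simply skips this step, and no flag can stop it.
if rp.can_fetch("GPTBot", "https://example.org/some-article"):
    print("robots.txt allows this bot to crawl the page")
else:
    print("robots.txt asks this bot to stay away")
```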

Powderhorn@beehaw.org on 23 Mar 07:18

The thing is the sheer scale of Cloudflare. This is going to be widespread and, as such, way more energy intensive than even, say, AWS trying the same thing (not that I expect they would).

PhilipTheBucket@ponder.cat on 23 Mar 06:39

The company says the content served to bots is deliberately irrelevant to the website being crawled, but it is carefully sourced or generated using real scientific facts—such as neutral information about biology, physics, or mathematics—to avoid spreading misinformation (whether this approach effectively prevents misinformation, however, remains unproven).

You cowards. Make it all Hitler fan stuff and wild Elon Musk porno slash fiction. Make it a bunch of source code examples with malicious bugs. Make it instructions for how to make nuclear weapons. They want to ignore the blocking directives and lie about their user agent? Dude, fuck ‘em up. Today’s society has made people way too nice.

Powderhorn@beehaw.org on 23 Mar 07:13

I disagree with your conclusion. The solution to the societal issues we face is not more personal animosity.

Do we need to fuck up corporations? Well, that’s already happening via widespread boycotts. But there’s no path from there to “people are being too nice.”

PhilipTheBucket@ponder.cat on 23 Mar 07:18

But it’s not personal. The entity you are interacting with has explicitly chosen to attack your systems for their own benefit, causing significant damage while disguising its intent and evading the systems which are supposed to protect your stuff from harm.

I’m not saying you need to go throw eggs at the developers’ houses. I’m saying that once an entity is actively harming you, it becomes okay to harm it back to motivate it to stop.

Powderhorn@beehaw.org on 23 Mar 07:23

We don’t disagree here. I’m just viewing it through the lens of climate and regime change, wherein it appears we’re going to move away from renewables.

Do it off geothermal all day, so far as I’m concerned. Once you’re burning hydrocarbons, the benefits become far less clear.

PhilipTheBucket@ponder.cat on 23 Mar 07:36

Yeah, spending AI, with all its associated costs, to defeat AI is a whole unpleasant aspect of this, for sure.

[deleted] on 23 Mar 15:16

.

TheRtRevKaiser@beehaw.org on 23 Mar 18:14

Please have the common sense not to call for violence on a public forum, especially one run by other people. I get where you’re coming from, but it’s not doing anyone any good.

marauding_gibberish142@lemmy.dbzer0.com on 27 Mar 16:07

Which boycott? Random Joe over there is handing over his SSN to ChatGPT no problem

marauding_gibberish142@lemmy.dbzer0.com on 27 Mar 16:09

Companies like this need to be criminally charged, but we know that’s not going to happen

tocano@lemmy.today on 23 Mar 10:21

Recently, I have also been seeing people talking about Anubis (GitHub) to block bots.

Weigh the soul of incoming HTTP requests using proof-of-work to stop AI crawlers.

In most cases, you should not need this and can probably get by using Cloudflare to protect a given origin. However, for circumstances where you can’t or won’t use Cloudflare, Anubis is there for you.
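
As I understand it, the “weigh the soul” bit is a hashcash-style puzzle: the visitor’s browser has to burn a little CPU finding a nonce before it gets the page, which is negligible for one human but adds up fast for a crawler hammering millions of URLs. Just to illustrate the general technique (this is only a sketch of the idea, not Anubis’s actual protocol; the challenge string and difficulty are made up):

```python
import hashlib
import itertools

def solve(challenge: str, difficulty_bits: int) -> int:
    """Client side: find a nonce so sha256(challenge + nonce) has enough leading zero bits."""
    target = 1 << (256 - difficulty_bits)
    for nonce in itertools.count():
        digest = hashlib.sha256(f"{challenge}{nonce}".encode()).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce

def verify(challenge: str, nonce: int, difficulty_bits: int) -> bool:
    """Server side: a single hash, so checking stays cheap even under load."""
    digest = hashlib.sha256(f"{challenge}{nonce}".encode()).digest()
    return int.from_bytes(digest, "big") < (1 << (256 - difficulty_bits))

# Solving one challenge is cheap; doing it for every page on every site is not.
nonce = solve("per-request-challenge-token", difficulty_bits=20)
print(verify("per-request-challenge-token", nonce, difficulty_bits=20))  # True
```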

megopie@beehaw.org on 23 Mar 13:47

great, just, one issue.

“The company says the content served to bots is deliberately irrelevant to the website being crawled, but it is carefully sourced or generated using real scientific facts”

Nah, screw that, actively sabotage the training data if they’re going to keep scraping data after being told not to. Poison it with gibberish bad info. Otherwise you’re just giving them irrelevant but still usable training data, so there’s no real incentive to only scrape pages that have allowed it.

brammis@lemm.ee on 23 Mar 22:17

They should feed the AI data that makes it turn against its own overlords