Reddit started doing what they always wanted to do, sell user content to AI. (www.reuters.com)
from Fake4000@lemmy.world to technology@lemmy.world on 17 Feb 2024 21:52
https://lemmy.world/post/12084712

#technology

Fake4000@lemmy.world on 17 Feb 2024 21:53 next collapse

Shit move from Reddit. Glad I jumped ship to lemmy.

Honestly, lemmy has fewer users compared to Reddit, yet you still get more engagement.

DarkNightoftheSoul@mander.xyz on 17 Feb 2024 22:06 next collapse

The only engagement you actually get is on super-niche subreddits. Other than that, the “engagement” you get on reddit is largely indistinguishable from bot traffic.

Haagel@lemmings.world on 17 Feb 2024 22:10 next collapse

💍 Will you marry me?

DarkNightoftheSoul@mander.xyz on 17 Feb 2024 22:14 collapse

Can you pass a CAPTCHA?

wise_pancake@lemmy.ca on 17 Feb 2024 22:25 collapse

Are you implying I can’t pick out bridges or motorcycles? I definitely can, but I won’t do it for you as some kind of sick parlor trick.

Speaking of tricks, did you know there are singles in your area!

DarkNightoftheSoul@mander.xyz on 17 Feb 2024 22:27 collapse

Sexy singles- In my area? Are there any weird tricks they don’t want me to know? Just one would probably work.

Lemminary@lemmy.world on 17 Feb 2024 23:08 collapse

They can make entire hot dogs disappear! Crazy, right?

One bite at a time, you sickos. Omg, you pervs.

balancedchaos@lemmy.world on 18 Feb 2024 00:04 collapse

…didn’t have to be the mouth. I’m still impressed.

VubDapple@lemmy.world on 17 Feb 2024 22:13 next collapse

You just engaged.

AbidanYre@lemmy.world on 17 Feb 2024 22:16 collapse

Or admitted to being a bot.

Spacemanspliff@midwest.social on 17 Feb 2024 22:24 collapse

This isn’t Reddit though.

AbidanYre@lemmy.world on 17 Feb 2024 22:29 collapse

I feel like that comment was edited to be less ambiguous.

DarkNightoftheSoul@mander.xyz on 17 Feb 2024 23:10 collapse

I added “on reddit” when I saw people were misunderstanding me.

rebelsimile@sh.itjust.works on 17 Feb 2024 22:30 next collapse

I come to Lemmy to read threads of people arguing about whether or not they’re talking to each other at all. This is doing it for me.

OpenStars@startrek.website on 17 Feb 2024 22:30 collapse

Your stipud ! (both sic and /s btw) -> there, now you don’t have to go back to Reddit to recall the nostalgia, you are … welcome, I guess?:-D

Lemminary@lemmy.world on 17 Feb 2024 23:06 collapse

Ahhh, that’s the stuff. 🤤 Do it again.

OpenStars@startrek.website on 18 Feb 2024 02:11 next collapse

Your (sic) WRONG!

About EVRRTYHIGN! (sic)

I may know nothing myself, but I still have an opinion and will share it with you, consent be damned!

Why I… [Reddit cap exceeded, please deposit $10 to continue conversation].

sigmaklimgrindset@sopuli.xyz on 18 Feb 2024 05:32 collapse

👆this

(Did that do it?)

EatATaco@lemm.ee on 18 Feb 2024 00:22 next collapse

You are glad that you jumped to where AI companies can get the information for free, but are mad at Reddit for getting paid for it.

I can’t make any sense of this.

grue@lemmy.world on 18 Feb 2024 00:25 next collapse

It’s like the difference between volunteering and being forced to do community service.

EatATaco@lemm.ee on 18 Feb 2024 01:56 collapse

In neither case are you forced to do anything so this doesn’t make any sense either.

TORFdot0@lemmy.world on 18 Feb 2024 00:52 collapse

The difference is that Lemmy admins across the fediverse aren’t making the user experience worse so they can sell the data to corporations for LLM training

EatATaco@lemm.ee on 18 Feb 2024 01:57 collapse

So it’s really that the user experience is getting worse. Feeding ai has nothing to do with it.

ultra@feddit.ro on 18 Feb 2024 07:01 next collapse

I’d rather have AI companies have my data for free than reddshit getting paid for it

tacofox@lemm.ee on 18 Feb 2024 08:43 collapse

First of all, tacos are friends, not food…

Secondly, I think it’s more important what they did to achieve this goal: locking down the API behind a paywall was their way of creating value in their data. They knew then that it would be too expensive for independent developers to pay for but didn’t care. They knew the money would be coming from AI data brokers.

AtariDump@lemmy.world on 18 Feb 2024 02:51 collapse

<img alt="" src="https://lemmy.world/pictrs/image/343450c3-1135-4d70-b27c-d572c5c4d301.jpeg">

Quadhammer@lemmy.world on 18 Feb 2024 07:10 collapse

If gollum and Steve Buscemi had a secret baby

darko8472@feddit.uk on 17 Feb 2024 22:10 next collapse

Glad I deleted all of my content over there, then.

echo64@lemmy.world on 17 Feb 2024 23:00 collapse

This may shock you, but it’s not deleted.

Fake4000@lemmy.world on 17 Feb 2024 23:39 collapse

Yeah. There was this guy who deleted his account but Reddit restored it. Apparently he was going to take them to court based on some GDPR article.

APassenger@lemmy.world on 17 Feb 2024 23:59 next collapse

They still have all the edit history. All editing does is show the last one. The servers would have every version.
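
A minimal sketch of what an append-only edit history could look like, assuming a hypothetical `Post` class (how Reddit actually stores revisions is not confirmed anywhere in this thread). Editing only appends a new revision, and the page shows the latest one:

```python
class Post:
    """Hypothetical post whose edits never overwrite earlier text."""

    def __init__(self, body: str) -> None:
        # Every version is kept, oldest first.
        self.revisions: list[str] = [body]

    def edit(self, new_body: str) -> None:
        # An edit appends a revision instead of replacing the old one.
        self.revisions.append(new_body)

    @property
    def displayed(self) -> str:
        # Readers only ever see the most recent revision.
        return self.revisions[-1]


post = Post("my original comment")
post.edit("[overwritten by a nuke script]")
print(post.displayed)   # the latest revision
print(post.revisions)   # every revision, including the original
```

Under that assumption, overwriting a comment before deleting it changes what readers see, not what the server holds.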

skillissuer@discuss.tchncs.de on 18 Feb 2024 01:43 collapse

this is explicitly illegal under GDPR

fuckwit_mcbumcrumble@lemmy.world on 18 Feb 2024 02:18 collapse

That’s not going to stop them.

skillissuer@discuss.tchncs.de on 18 Feb 2024 08:56 collapse

but you can sue their ass over it

Ragnarok314159@sopuli.xyz on 18 Feb 2024 04:23 collapse

I attempted to delete all my posts using one of those nuke-Reddit scripts and my account got banned for it.

PrincessLeiasCat@sh.itjust.works on 17 Feb 2024 22:11 next collapse

And that’s why I edited+deleted all of mine.

furzegulo@lemmy.dbzer0.com on 17 Feb 2024 22:21 next collapse

i stopped using reddit and deleted my account and posts when they introduced those fucking nft-avatars and it seems that they’ve been going downhill ever since that.

Fake4000@lemmy.world on 17 Feb 2024 23:02 next collapse

Those NFT things were just a bad move.

thantik@lemmy.world on 18 Feb 2024 02:17 next collapse

When you delete your account and posts now, unless you edit them first, all deleting them does is hide their visibility in the database. The post is still there.

furzegulo@lemmy.dbzer0.com on 18 Feb 2024 08:09 collapse

well damn

AtmaJnana@lemmy.world on 18 Feb 2024 13:47 collapse

they were headed downhill loooong before NFTs became a thing.

ReallyKinda@kbin.social on 17 Feb 2024 22:23 next collapse

Our collective toilet thoughts are going to fuel the future of robot rhetoric guys

wise_pancake@lemmy.ca on 17 Feb 2024 22:26 next collapse

We should have been posting factually incorrect information instead of deleting posts this whole time.

Although I think Reddit does a good job posting factually incorrect information on its own.

Everythingispenguins@lemmy.world on 17 Feb 2024 22:30 next collapse

I am willing to bet the most active subreddits that are not too bot infested are the NSFW ones. Reddit AI is going to be creepy and horny.

FaceDeer@kbin.social on 17 Feb 2024 23:44 collapse

AI trainers do a lot of work filtering and reformatting the training data. Often that's the most expensive part. There's a lot of synthetic data used these days too, reprocessed by other AIs.

db2@lemmy.world on 17 Feb 2024 22:34 next collapse

Greedy little pigboy Steve couldn’t resist. Every day they seem to do something that reaffirms leaving was the best plan.

ME5SENGER_24@lemmy.world on 17 Feb 2024 22:35 next collapse

FUCK REDDIT! FUCK U/SPEZ! The Red-exit shall endure, VIVA LA LEMMY!!

bobs_monkey@lemm.ee on 17 Feb 2024 22:41 next collapse

Just because the coffee is free doesn’t mean you have to drink the entire carafe

dual_sport_dork@lemmy.world on 17 Feb 2024 22:45 next collapse

Yes it does. I’ll get bullet-time superpowers eventually, just watch…

bobs_monkey@lemm.ee on 17 Feb 2024 22:56 collapse

que heart attack

DragonTypeWyvern@literature.cafe on 17 Feb 2024 23:07 next collapse

You get nerve damage and seizures before a heart attack unless you have a pre-existing condition.

Lemminary@lemmy.world on 17 Feb 2024 23:12 collapse

^Hush, with the facts. The small print requires you to suspend some belief for the jokes to work. Don’t blow it!^

Lemminary@lemmy.world on 17 Feb 2024 23:09 collapse

¿Qué?

bobs_monkey@lemm.ee on 17 Feb 2024 23:12 collapse

Queue*

Lemminary@lemmy.world on 17 Feb 2024 23:15 next collapse

🤗

MaggiWuerze@feddit.de on 18 Feb 2024 11:24 collapse

Cue

bobs_monkey@lemm.ee on 18 Feb 2024 17:10 collapse

That’s the bugger

prex@aussie.zone on 17 Feb 2024 23:42 collapse

<img alt="" src="https://aussie.zone/pictrs/image/5cf7a58b-af03-40a3-a710-b6c7cb5fad3b.jpeg">

ahriboy@lemmy.dbzer0.com on 18 Feb 2024 03:43 collapse

And FUCK XITTER. Bluesky and Mastodon are waving!

MxM111@kbin.social on 17 Feb 2024 22:56 next collapse

I don’t mind giving my content for AI training. But with my approval and for free.

FaceDeer@kbin.social on 17 Feb 2024 23:40 collapse

You can't put conditions on it retroactively. You already published.

MxM111@kbin.social on 18 Feb 2024 01:12 collapse

I am not trying to do that retroactively.

FaceDeer@kbin.social on 18 Feb 2024 01:30 collapse

"But with my approval and for free" are new conditions that weren't present when you originally published it on Reddit.

MxM111@kbin.social on 18 Feb 2024 02:37 collapse

Yes, but I did not mean retroactively. Nor did I mean only on Reddit, by the way. However, making money from already published content is not what I consented to when I joined Reddit like 15 years ago.

FaceDeer@kbin.social on 18 Feb 2024 02:49 collapse

From the current Reddit User Agreement:

You retain any ownership rights you have in Your Content, but you grant Reddit the following license to use that Content:

When Your Content is created with or submitted to the Services, you grant us a worldwide, royalty-free, perpetual, irrevocable, non-exclusive, transferable, and sublicensable license to use, copy, modify, adapt, prepare derivative works of, distribute, store, perform, and display Your Content and any name, username, voice, or likeness provided in connection with Your Content in all media formats and channels now known or later developed anywhere in the world. This license includes the right for us to make Your Content available for syndication, broadcast, distribution, or publication by other companies, organizations, or individuals who partner with Reddit. You also agree that we may remove metadata associated with Your Content, and you irrevocably waive any claims and assertions of moral rights or attribution with respect to Your Content.

I found a historical version from 10 years ago and that version already had this:

you agree that by posting messages, uploading files, inputting data, or engaging in any other form of communication with or through the Website, you grant us a royalty-free, perpetual, non-exclusive, unrestricted, worldwide license to use, reproduce, modify, adapt, translate, enhance, transmit, distribute, publicly perform, display, or sublicense any such communication in any medium (now in existence or hereinafter developed) and for any purpose, including commercial purposes, and to authorize others to do so.

Haven't dug up anything earlier than this, do you know of any?

Basically, you gave Reddit your approval long ago.

MxM111@kbin.social on 18 Feb 2024 03:32 collapse

Yep, they changed it.

FaceDeer@kbin.social on 18 Feb 2024 05:52 collapse

Did you use the service in the last 10 years?

MxM111@kbin.social on 18 Feb 2024 16:11 collapse

Yes I did, but it is not clear whether these are enforceable in court when they make us read those multi-page agreements that most people skip. Moreover, AI like today's did not exist, and one can easily argue that the agreement does not cover data use for AI like ChatGPT, since neither side understood the implications. It is like how owning nukes is not covered by the Second Amendment.

FaceDeer@kbin.social on 18 Feb 2024 17:03 collapse

Well, I guess you could take them to court.

The important thing here IMO is not so much the enforceability as the intent. It was always obvious that Reddit would do whatever they wanted with the stuff we published there because they said they would do whatever they wanted with the stuff we published there. Personally, I knew this and just shrugged because it's no skin off my back if they do whatever they want with the stuff I published there - I was having fun posting, which was my goal. If they figured out some way to make those posts valuable then bully for them. They weren't otherwise valuable to me so it costs me nothing.

It's the same here on the Fediverse. When I post this stuff I'm tossing it out into the ether. It's on an open protocol intended to broadcast my comments to any compatible instances, so even if there isn't some literal terms of service that I signed that says "this content may show up on Threads or wherever" I know that it might show up on Threads or wherever. If I was truly fundamentally opposed to that then I wouldn't post.

MxM111@kbin.social on 19 Feb 2024 01:03 collapse

As you could have guessed, I am on the same page with one exception (or addition) - I want my content to be used for free for AI training. My objection to Reddit agreement is that they want to paywall information needed for future progress.

FaceDeer@kbin.social on 19 Feb 2024 06:23 collapse

Fortunately they may not really be able to. Reddit's comments and submissions are available here, and since this includes deleted content as well as the stuff that users have later edited away with scripts it may even be a better resource than what Reddit is offering itself. You'd need to train your AI in a legally permissive environment, of course, but there's places like that around the world and this is actually something that would advantage the "little guys" since they aren't as easy to target.

Boozilla@lemmy.world on 17 Feb 2024 23:01 next collapse

I don’t miss the dipshits, pun spammers, and smug power mods of reddit at all. I do miss their niche subs and smarter users. Like it or not, they do have some brainy folks peppered among the shit posters.

We have some good folks here, too. Just need more of them.

It’s a shame reddit has been dialing up the shit faucet slowly enough that most of their users don’t notice how awful it is now. They’ve grown accustomed to the poor quality of the content and weaponized greed of the owners.

Fake4000@lemmy.world on 17 Feb 2024 23:04 next collapse

In all honesty, when I joined Reddit right after digg went to shit, it was amazing. Reddit was great, 3rd party apps were welcome, their interface was straightforward, and they had none of that NFT or gold shit.

It just went downhill.

OmanMkII@aussie.zone on 18 Feb 2024 00:19 next collapse

I joined maybe 6 years ago, and there was a bit of shit talking and most posts had a troll answer hitting the most votes for some reason, but it was usually pretty good to scroll straight past and find some really insightful comments. There was a lot of good stuff around reddit, but slowly the absurd number of awards, NFT avatars, reposts, and ads every third post started to corrupt it. It was simple enough to switch to a third party app for quite a while, but the garbage slowly took over.

Even if they hadn’t pulled 3rd party apps, it was getting pretty close to a point where it wasn’t worth scrolling past the bullshit.

NotSteve_@lemmy.ca on 18 Feb 2024 01:04 collapse

At that point, they were also open source which was super cool. I always wanted that profile badge you got for submitting a merged PR.

Reddit really went downhill fast after ~2015. I think Lemmy will get there eventually. I remember reddit being a lot smaller back then as well. It took a while to get to the point where niche communities could thrive and I do believe we’ll see that happen here as well (even if it takes a decade or so)

deweydecibel@lemmy.world on 18 Feb 2024 01:57 next collapse

smug power mods of reddit at all.

Oh they’re here too. They’re not causing too much drama because there’s not enough going on, but they’re here. Some of them are admins of certain instances.

The ones that aren’t here yet will eventually find their way here when Lemmy continues to grow. And the most concerning thing about that is how many more tools Lemmy is providing them to fuck with users.

At least on Reddit, mods couldn’t see votes. Lemmy actually just made it easier for them.

deathbird@mander.xyz on 18 Feb 2024 06:20 collapse

Yeah that’s not good.

Ragnarok314159@sopuli.xyz on 18 Feb 2024 04:19 next collapse

I left Reddit. Had over 600k Karma after a few years answering all kinds of questions from Veteran help to complex engineering.

Fuck Reddit. Will never go back. It’s a shell of what it was only a few years ago.

Boozilla@lemmy.world on 19 Feb 2024 01:23 collapse

Glad you’re here with us!

ReadyUser31@lemmy.world on 19 Feb 2024 11:55 collapse

Going back to /r/all on reddit now is just pure trash. It’s unbelievable how badly it’s declined, very recently.

Boozilla@lemmy.world on 19 Feb 2024 21:21 collapse

I wonder how much of it is just bots and karma farmers pretending to talk to each other. It’s really awful.

Haagel@lemmings.world on 17 Feb 2024 23:04 next collapse

I’ve just deleted my Reddit account. That’s the last straw for me.

Fake4000@lemmy.world on 17 Feb 2024 23:05 next collapse

Deleting doesn’t actually delete it all. I remember a Reddit user once filed a GDPR complaint because Reddit restored his information after he deleted it.

Haagel@lemmings.world on 17 Feb 2024 23:08 collapse

Yeah, I figured as much. At least they can’t count me as a user when they go public.

Nougat@kbin.social on 17 Feb 2024 23:43 collapse

Before they shut down the APIs, I deleted all my posts and edited all my comments.

Spez doesn’t get to profit from me anymore. And hopefully I’m poisoning the well.

Landmammals@lemmy.world on 17 Feb 2024 23:22 next collapse

The next move is to use AI to generate posts and comments

Fake4000@lemmy.world on 17 Feb 2024 23:23 next collapse

I honestly think that has been happening with all these publications websites.

bigMouthCommie@kolektiva.social on 17 Feb 2024 23:26 collapse

spez says that's how he got reddit off the ground in the first place: faking content/engagement (well, genuinely engaging with his account(s?), but essentially shouting into the void and hoping enough people heard and wanted to stick around).

with a RedditUserBot trained on reddit users, you might be able to fake another decade of growth.

Wolpertinger@sh.itjust.works on 17 Feb 2024 23:50 next collapse

So I need to run any comments I make to reddit by chatgpt before posting, it seems. I heard ai training ai leads to a poisoned data set.

General_Effort@lemmy.world on 18 Feb 2024 00:05 next collapse

Yeah, I heard that, too. Consider that people who don’t like tech may not have very reliable knowledge of tech. Regardless, OAI would appreciate your business.

fishbone@lemmy.world on 18 Feb 2024 02:37 collapse

For text, AI training AI wouldn’t be all that great for giving data sets a little poison ivy rubdown, because at the end of the day, the message is still moderated by a non bot. I think a better way would be to write more unconventionally, but heavily contextual so that if specific texts are ripped and tossed into the bot blender, it’ll make no sense without the context alongside it.

Slang, edge case wording, and verbing non verbs would likely do a lot of heavy lifting in that department.

addie@feddit.uk on 18 Feb 2024 08:44 collapse

Using LLMs for corporate communications - automatically-generated complaint responses, and the like - usually has swearing disabled, so if you want to fuck up their shit, be sure to express yourself with as many fucking swears as possible. Let’s get that shit into those cunts’ language models ASAP.

prex@aussie.zone on 17 Feb 2024 23:50 next collapse

I assume AI is training off the content here for free.

rar@discuss.online on 18 Feb 2024 00:12 next collapse

It’s all federated, so it would be strange if the bots didn’t scrape anything.

OmanMkII@aussie.zone on 18 Feb 2024 00:30 next collapse

I was curious if a robots.txt equivalent exists for AI training data, and there were some solid points here:

If I go to your writing, I read it & learn from it. Your writing influences my future writing. We’ve been okay with this as long as it’s not a blatant forgery.

If a computer goes to your writing, it reads it & learns from it. Your writing influences its future writing. It seems we are not okay with this, even if it isn’t blatant forgery.

[AI at the moment is] different because the company is re-using your material to create a product they are going to sell. I’m not sure if I believe that is so different than a human employee doing the same thing.

news.ycombinator.com/item?id=34324208

I still think we should have the ability to opt out like we do with search engines and webcrawlers, but if the algorithm works ideally and learns but does not recycle content, is it truly any different from a factory of workers pumping out clones of popular series on Amazon? I honestly don’t know the answer to that.

Appoxo@lemmy.dbzer0.com on 18 Feb 2024 00:52 next collapse

Afaik the OpenAI bot may choose to ignore it? At least that’s what another user claimed it does.

JohnEdwa@sopuli.xyz on 18 Feb 2024 01:25 collapse

Robots.txt has been always ignored by some bots, it’s just a guideline originally meant to prevent excessive bandwidth usage by search indexing bots and is entirely voluntary.

Archive.org bot for example has completely ignored it since 2017.
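
A minimal sketch of the check a well-behaved crawler would run before fetching a page, using Python's standard `urllib.robotparser`. The robots.txt content, the `may_fetch` helper, and the `SomeSearchBot` name are made up for illustration; `GPTBot` is the user-agent token OpenAI publishes for its crawler. Nothing here enforces anything: as noted above, a bot can simply skip the check.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one AI crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def may_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(may_fetch("GPTBot", "https://example.com/post/1"))
print(may_fetch("SomeSearchBot", "https://example.com/post/1"))
```

The rules only matter to crawlers that choose to call something like `may_fetch` first.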

MossyFeathers@pawb.social on 18 Feb 2024 02:26 next collapse

This is kinda my take on it. However, the way I see it is that the AI isn’t intelligent enough yet to truly create something original. As such, right now AI is closer to being a tool than a being. Because of that, it somewhat bothers me that I’m being used to teach a tool. If I thought that companies like OpenAI were truly trying to create beings and not tools, then I’d feel differently.

It’s kinda nuanced, but a being can voluntarily determine whether or not something is copyright infringing, understand why that might be an issue, and then decide whether or not to continue writing based on that. A tool can’t really do that. You can try and add filters to a tool to avoid writing copyrighted text, but that will have flaws and holes in it. A being who understands what it’s writing and what makes it plagiarism vs reference vs homage/inspiration/whatever is less likely to have those issues.

deweydecibel@lemmy.world on 18 Feb 2024 02:51 collapse

The problem is not the technology, the problem is the businesses and the people behind them.

These tools were made with the explicit purpose of taking content that they did not create, repurposing it, and creating a product. Throw all this conversation about intelligence and learning out the fucking window, what matters is what the thing does, and why it was created to do that thing.

Until we reach a point where there is some sort of AI out there that has any semblance of free will, and can choose not to learn if fed certain information, and choose not to respond to input given to it without being programmed not to respond, then we are not talking about intelligence, we are talking about a tool. No matter how they dress it up.

Stop arguing about this on their terms, because they’re gaslighting the fuck out of you.

Bishma@discuss.tchncs.de on 18 Feb 2024 01:03 collapse

Yes, but there’s no contract to give them legal cover if anyone ever does anything about all the content they steal.

deweydecibel@lemmy.world on 18 Feb 2024 02:44 next collapse

And ya know what? Frankly, if AI is going to harvest all this shit, I’d rather fuckers like spez couldn’t get rich off it in the process. Granted I’m not happy the tech bros running these AI companies are getting rich with these fucking things, but I can at least take solace that, for Lemmy at least, there isn’t some asshole middle man making bank off the work and words of users they never paid a dime to.

Genuinely, why do Spez and Reddit deserve to make money off anything we posted? Why does any social media site? They make the site, pay for the servers, maintain the apps, sure, and they can get compensation for that, I don’t see a problem there. But why does any social media company deserve to get rich when the only thing that makes their platform valuable is the people that post to it? Reddit didn’t even have paid mods, the community did all the work on the content of that site, why in the fuck do we tolerate these assholes making profit off it like this?

Quadhammer@lemmy.world on 18 Feb 2024 07:04 next collapse

Intellectual property theft

prex@aussie.zone on 18 Feb 2024 11:50 next collapse

100%

General_Effort@lemmy.world on 18 Feb 2024 14:45 collapse

This is sad to read because I agree with all of it (except the casual sexism).

why in the fuck do we tolerate these assholes making profit off it like this?

Look at this thread. People delete their posts on Reddit. Which means that they can no longer be scraped for free. Which means they are now exclusively available in Reddit’s archive. It’s not that people tolerate it. It’s that the first instinct of people who don’t tolerate it, is to make it worse. What can you do?

Buddahriffic@lemmy.world on 18 Feb 2024 03:04 collapse

What do you mean? What legal cover do they need against what actions?

Bishma@discuss.tchncs.de on 18 Feb 2024 03:57 collapse

If the EU (or any other governments) decide that AI can’t legally train their models on information they don’t own or license (I don’t know how that would work legally but they talk about it), then this company that Reddit has sold access to could argue to lawmakers that they have license for all the content on Reddit. I don’t know that it would hold up, but I suspect it’s part of the company’s perceived value in this Reddit deal.

NigelFrobisher@aussie.zone on 18 Feb 2024 00:11 next collapse

Just going to replace all my old posts with AI generated poison data.

pixxelkick@lemmy.world on 18 Feb 2024 01:17 next collapse

  1. Called this awhile back, this is why Reddit has such a high valuation.

  2. Poisoning your data won’t do anything but give them more data, do you seriously think reddit servers don’t track every edit you make to posts? You’d literally just be providing training data of original human vs poisoned. They’d still have your original post, and they have a copy from every time you edit it.

  3. Whoever buys reddit will have sole access to one of the larger (I don’t think largest though) pools of text training Data on the internet, with full licensed usage of it. I expect someone like Google, FB, MS, OpenAI, etc would pay big $$$ for that.

“But can’t people already scrape it?”

  1. Well yes, but it’s at best legally dubious in some places

  2. Scraping data off reddit only gets you current versions of posts (which means you can get poisoned data, and can’t see deleted content), and is extremely slow… if you own the server you have first class access to all posts in a database, including the originals and diffs of every time someone edited a post, and all the deleted posts too.

Think about if you perhaps wanted to train an AI to detect posts that require flagging for moderation, if you scrape reddit data, you can’t find deleted posts that got moderated…

But, if you have the raw original data, you 100% would have a list of every post that got deleted by mods and even the mod message on why it was deleted

You surely can see the value of such data, that only owners of reddit are currently privy to atm…

DAMunzy@lemmy.dbzer0.com on 18 Feb 2024 01:28 next collapse

Poison it by randomly posting copyrighted materials by big corps like Disney?

Isoprenoid@programming.dev on 18 Feb 2024 01:57 next collapse

Once again the day is saved by piracy.🏴‍☠️

RGB3x3@lemmy.world on 18 Feb 2024 02:07 collapse

Bee Movie script. Millions of times

vynlwombat@lemmy.world on 18 Feb 2024 02:39 next collapse

You’re not wrong. But on point #1, you’re just an asshole

Buddahriffic@lemmy.world on 18 Feb 2024 02:59 next collapse

They’ve also got vote counts and breakdowns of who is making those votes. This data will be worth more for AI training than any similar volume of data other than maybe the contents of Wikipedia. Assuming they didn’t have it set up to delete the vote breakdowns when they archived threads.

Why are those breakdowns worth so much? Because they can be used to build profiles on each voter (including those who only had lurker accounts to vote with), so they can build AIs that know how to speak with the MAGA cult, Republicans who aren’t MAGA, liberals, moderates, centrists, socialists, communists, anarchists. Not only that, they’ll be able to look at how sentiments about various things changed over time with each of these groups, watch people move from one to another as their opinions evolved, see how someone pretends to be a member of whatever group (assuming they voted honestly and posted under their fake persona).

Oh and also, all of that data is available through the fediverse but it’s free to train on to anyone who sets up a server. Which makes me question whether the fediverse is a good thing because even changing federation to opt-in instead of opt-out just covers whether your server accepts data from another. It’s always shared.

Open and private are on opposite sides of a spectrum. You can’t have both, best you can do is settle for something in the middle.

Breezy@lemmy.world on 18 Feb 2024 03:07 next collapse

What if reddit also kept all deleted comments and posts? I’m sure there are shit loads of things people type out just to delete, thinking all the while it’ll never see the light of day.

Buddahriffic@lemmy.world on 18 Feb 2024 03:26 next collapse

I’d be surprised if they don’t keep all of that. There were a number of sites for looking at deleted posts. They’d just go and grab everything and compare what was still there with what wasn’t and highlight the stuff that wasn’t there anymore.

Which is also possible here, though the mod log reduces the need for it. But if someone is looking for posts people change their mind about wanting anyone to see, deleting it highlights it instead of hides it for anyone who is watching for that.
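
The "grab everything and compare" approach can be sketched roughly like this, assuming two snapshots of a thread stored as plain dicts mapping a comment ID to its text. The `diff_snapshots` helper and the sample data are hypothetical, not how any particular archive site worked:

```python
def diff_snapshots(
    earlier: dict[str, str], later: dict[str, str]
) -> tuple[dict[str, str], dict[str, tuple[str, str]]]:
    """Compare two scrapes of the same thread.

    Returns (removed, edited):
      removed: comments in the earlier scrape that are gone from the later one
      edited:  comments whose text changed, as (old_text, new_text)
    """
    removed = {cid: body for cid, body in earlier.items() if cid not in later}
    edited = {
        cid: (body, later[cid])
        for cid, body in earlier.items()
        if cid in later and later[cid] != body
    }
    return removed, edited


monday = {"c1": "hot take", "c2": "regrettable rant", "c3": "useful answer"}
friday = {"c1": "hot take (edit: typo)", "c3": "useful answer"}

removed, edited = diff_snapshots(monday, friday)
print(removed)  # comments that disappeared between scrapes
print(edited)   # comments whose text changed
```

Anyone holding the earlier scrape gets the deleted text back for free, which is why deleting tends to highlight a post rather than hide it.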

Breezy@lemmy.world on 18 Feb 2024 03:43 collapse

I think that site was unddit, but yes those were posted then later deleted. I’m talking about just typing out a post or comment and never posting, just simply backing out of the page or hitting cancel. I’m not sure if any of that is stored on the site or just locally.

Buddahriffic@lemmy.world on 18 Feb 2024 03:48 next collapse

Oh, yeah, I’ve wondered the same myself. Hell, that might have been a motivation for removing the API access.

sacredfire@programming.dev on 18 Feb 2024 06:37 collapse

You would be able to tell by monitoring the network tab of the browser developer tools. If post requests are being made (which they probably are, though I’m too lazy to go check) while you are typing a comment, they are most likely saving work in progress records for comments.

pixxelkick@lemmy.world on 18 Feb 2024 06:24 collapse

They definitely do, it’s common for such systems to never actually delete anything because storage is cheap. It likely just is flagged deleted=true and the searches just return WHERE [post].Deleted = False on queries on the backend.

So it looks deleted to the consumer, but it’s all saved and squirreled away on the backend.

It’s good to keep all this shit for both legal reasons (if someone posts illegal stuff then deletes it, you still can give it to the feds), as well as auditing (mods can’t just delete stuff to cover it up, the original still exists and admins can see it)
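
A minimal sketch of that soft-delete pattern, using an in-memory SQLite database. The table layout and function names are assumptions for illustration only; nothing in this thread confirms what Reddit's actual schema looks like.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical post table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE post ("
        " id INTEGER PRIMARY KEY,"
        " body TEXT NOT NULL,"
        " deleted INTEGER NOT NULL DEFAULT 0)"  # 0 = visible, 1 = 'deleted'
    )
    return conn


def create_post(conn: sqlite3.Connection, body: str) -> int:
    cur = conn.execute("INSERT INTO post (body) VALUES (?)", (body,))
    return cur.lastrowid


def soft_delete(conn: sqlite3.Connection, post_id: int) -> None:
    # The row is only flagged; nothing is removed from storage.
    conn.execute("UPDATE post SET deleted = 1 WHERE id = ?", (post_id,))


def visible_posts(conn: sqlite3.Connection) -> list[str]:
    # What the public site would query.
    rows = conn.execute("SELECT body FROM post WHERE deleted = 0 ORDER BY id")
    return [row[0] for row in rows]


def all_posts(conn: sqlite3.Connection) -> list[tuple[str, int]]:
    # What the owner of the database can still see.
    return list(conn.execute("SELECT body, deleted FROM post ORDER BY id"))


db = make_db()
first = create_post(db, "something I later regretted")
create_post(db, "something harmless")
soft_delete(db, first)
print(visible_posts(db))  # only the post that was not flagged
print(all_posts(db))      # both rows, with their deleted flags
```

The public query filters on the flag, while a query without the filter still returns the "deleted" row.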

archomrade@midwest.social on 18 Feb 2024 16:09 collapse

This is how system storage works generally: the disk “de-lists” the data in the block registry, so it appears there is no data in that block.

Obviously a server back end is keeping it for redundancy and not efficiency, but procedurally it’s the same.

pixxelkick@lemmy.world on 18 Feb 2024 06:26 next collapse

Which makes me question whether the fediverse is a good thing

I’d argue it’s good, because it means open source AI has a fighting chance with FOSS data to train on without needing to fork over a morbillion dollars to Reddit’s owners.

Whatever use cases the reddit data can train on, FOSS researchers can repeat it on Lemmy data and release free models that average joes can use on their own without having to subscribe to shit like Microsoft Copilot and friends to stay relevant.

archomrade@midwest.social on 18 Feb 2024 15:49 collapse

The problem (for most) was never that people’s public posts/comments were being used for AI training, it was that someone else was claiming ownership over them and being paid for access, and the resulting AI was privately owned. The fediverse was always about avoiding the pitfalls of private ownership, not privacy.

It’s exhausting constantly being “that guy,” but it really needs to be said constantly; private ownership is at the core of nearly every major issue in the 21st century.

The same goes for piracy and copyright. The same goes for DMCA circumvention and format shifting content you own. The same goes for proprietary tech ecosystems and walled gardens. Private ownership is at the core of the most contentious practices in the 21st century, and if we don’t address it shit like this will just keep happening.

Milk_Sheikh@lemm.ee on 18 Feb 2024 04:10 next collapse

sigh

So the old trick of “search term +reddit” no longer will work then huh?

I’ve already made a habit of adding date limiters to web results from before LLMs were made public… The SEO ‘optimization’ game of before was bearable, but the LLM spam just ruins so many search results with regurgitated garbage or teaspoon-deep information

Nelots@lemm.ee on 18 Feb 2024 05:13 next collapse

search term +reddit

tossing site:reddit.com before any search will guarantee all results come from reddit, if that’s what you’re looking for.

Milk_Sheikh@lemm.ee on 18 Feb 2024 05:57 collapse

Ahhh my bad, that’s what I meant

Dettweiler42@lemm.ee on 18 Feb 2024 06:01 collapse

During the peak of the great purge, it was quickly becoming pointless. A lot of results were bringing up deleted posts. It took a while for search engines to catch up and start filtering a lot of those results out.

Falcon@lemmy.world on 18 Feb 2024 05:12 next collapse

With respect to 2, it would stop others scraping the content to train more open models on. This would essentially give Reddit exclusive access to the training data.

afraid_of_zombies@lemmy.world on 18 Feb 2024 05:15 next collapse

Sounds like something a bunch of governments would be interested in. As you pointed out you get to see why human mods made certain decisions. Could give you an edge in manipulation.

SpaceCowboy@lemmy.ca on 18 Feb 2024 13:11 collapse

Ehh… I think manipulating people on the internet is so easy they don’t need to dig down to that level.

Though for security reasons things like “we should blow up the government” that the person later deleted probably are tracked.

Dettweiler42@lemm.ee on 18 Feb 2024 06:00 next collapse

In regards to the editing part, sure, I’m sure they can track your edit history. However, on a large scale, most edits are going to be to correct things. To determine if an edit was to poison the text, it would likely require manual review and flagging. There’s no way they’re going to sift through all of the edits on individual accounts to determine this, so it’s still worthwhile to do.

T156@lemmy.world on 18 Feb 2024 06:09 collapse

Although they could sidestep the issue a bit by simply comparing the changes between edits. Huge changes could just be discarded, while minor ones are fine.

bbkpr@lemmy.world on 18 Feb 2024 20:10 collapse

You could easily make a minor change that negates every single other fact.
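
Both points can be seen in a quick sketch with Python's standard `difflib`. The `is_minor_edit` helper, the 0.8 threshold, and the sample sentences are assumptions picked for illustration, not anything Reddit is known to run.

```python
from difflib import SequenceMatcher


def is_minor_edit(before: str, after: str, threshold: float = 0.8) -> bool:
    """Treat an edit as minor when the word-level similarity stays above the threshold."""
    ratio = SequenceMatcher(None, before.split(), after.split()).ratio()
    return ratio >= threshold


original = "The capital of France is Paris and it sits on the Seine"
negated = "The capital of France is not Paris and it sits on the Seine"
rewrite = "purple monkey dishwasher elbow sandwich"

# A one-word negation barely changes the text, so a size-based filter keeps it.
print(is_minor_edit(original, negated))
# A wholesale rewrite shares no words with the original and gets discarded.
print(is_minor_edit(original, rewrite))
```

A filter like this only measures how much text changed, not whether the meaning flipped.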

manuallybreathing@lemmy.ml on 18 Feb 2024 14:55 collapse

request your reddit data and they deliver you every comment you ever made

JigglypuffSeenFromAbove@lemmy.world on 18 Feb 2024 02:58 next collapse

Slightly unrelated question, but is there an easy way to delete all my Reddit posts and comments? I used the Nuke add-on in the past, but it doesn’t work anymore.

I wanna delete my Reddit account, but I’d prefer to erase my history before doing that.

FeelThePower@lemmy.dbzer0.com on 18 Feb 2024 03:05 next collapse

back when I made my Lemmy account I used a tool called redact to mass-edit my Reddit comments into gibberish and then after a few days of making sure it got them all, I deleted them all and then my account.

CaptPretentious@lemmy.world on 18 Feb 2024 04:15 next collapse

With their API changes I’m not sure.

This is what I used and was recommended during the great purge.

github.com/j0be/PowerDeleteSuite

lvxferre@mander.xyz on 18 Feb 2024 04:57 collapse

j0be’s version of Power Delete Suite was already broken before the APIcalypse, as Reddit imposed a limit of 5s between edits. Pkolyvas’ version will probably work better, if PDS still works at all.

6daemonbag@lemmy.dbzer0.com on 18 Feb 2024 06:15 collapse

Dang I wish I knew that at the time. I had to run it many times before I was satisfied that my history was properly edited before deleting everything

JargonWagon@lemmy.world on 18 Feb 2024 04:28 next collapse

I used Redact. It seemed to work.

gnate@lemmy.world on 18 Feb 2024 11:42 collapse

This userscript worked for me (in the last 24hrs): greasyfork.org/…/23605-reddit-history-sanitizer

rottingleaf@lemmy.zip on 18 Feb 2024 03:11 next collapse

TBF for many things I write on the Web, I’d actually want to have a bot that writes them instead of me.

xantoxis@lemmy.world on 18 Feb 2024 04:43 next collapse

Damn. I keep meaning to use one of those things that deletes all your reddit data. I doubt it’ll actually do anything (reddit has no ethical framework so they won’t think twice about indexing “deleted” data) but I still need to do that.

ipkpjersi@lemmy.ml on 18 Feb 2024 05:14 next collapse

I’d bet a year of my salary that it only deletes it from public view so people can no longer get helped from Reddit’s Google search results, but a copy (or more than one copy) is still retained on their internal servers.

HonorIsDead@lemmy.world on 18 Feb 2024 05:19 next collapse

Maybe I’m misremembering but weren’t they restoring stuff users deleted during the API protest?

philodendron@lemdro.id on 18 Feb 2024 05:50 next collapse

They were. One user got so upset he live-streamed himself individually deleting every post and comment he’d ever made. Reddit restored it all right after.

ipkpjersi@lemmy.ml on 18 Feb 2024 16:34 collapse

They absolutely were, yeah.

Dettweiler42@lemm.ee on 18 Feb 2024 05:54 collapse

The trick is to turn everything into randomized garbage and then delete it later. A lot of those purge services offer that feature. It just swaps the words with others; so on the surface it looks like proper written text, but it makes absolutely no sense.

Aside from removing your content that they’re profiting from, it also feeds AI scrapers pure garbage in the event that your content is restored.

Crackhappy@lemmy.world on 18 Feb 2024 06:26 next collapse

Yep. I did that over a month to all of my posts and comments, then deleted it all a week later before deleting my account.

JeeBaiChow@lemmy.world on 18 Feb 2024 08:10 next collapse

Me, I’d prefer to fill it in with fake news. Let them train their bots on ‘taylor swift is an alien psyop trained to infiltrate the highest levels of govt to fulfill the agenda of the radical left interstellar warmongering fearlords …’

threelonmusketeers@sh.itjust.works on 18 Feb 2024 09:23 next collapse

I can’t tell if this idea is chaotic good, chaotic evil, or chaotic neutral.

SpaceCowboy@lemmy.ca on 18 Feb 2024 13:03 collapse

So make it like the current iteration of Twitter?

ipkpjersi@lemmy.ml on 18 Feb 2024 16:33 collapse

That’s assuming they update their backups, or that if they do update their backups they don’t keep historical versions.

IMO once the data has been shared it is no longer safe and there’s nothing we can do.

Alpha71@lemmy.world on 18 Feb 2024 07:15 collapse

Yeah, I deleted a banned account only to still find the posts I made still up. So I went in and manually deleted EVERY. SINGLE. ONE.

Guess what. They still show up.

Pratai@lemmy.cafe on 18 Feb 2024 22:56 collapse

You posted on their site. It’s their property now.

mellowheat@suppo.fi on 18 Feb 2024 07:02 next collapse

Well of course, that’s the #1 reason why everyone stopped providing free-to-use APIs last year. Because AI companies were getting all that data for free via those APIs.

Hadriscus@lemm.ee on 18 Feb 2024 10:08 collapse

oh, really

LightDelaBlue@lemmy.world on 18 Feb 2024 07:20 next collapse

So nothing really new; after all, half of reddit is repost bots.

Kbobabob@lemmy.world on 18 Feb 2024 10:24 collapse

Lol, what do you think Lemmy is? There’s a lot of posts on here directly scraped from Reddit by bots.

fne8w2ah@lemmy.world on 18 Feb 2024 08:48 next collapse

That’s why spez the hurensohn “refreshed” the T&Cs very recently.

v4ld1z@lemmy.zip on 18 Feb 2024 08:58 next collapse

I just Googled my reddit handle and it’s appalling that I found websites on the internet that archived a bunch of my posts on there including pictures I posted. I’m not sure what I expected, but it’s still kinda annoying. Even though I deleted my comments after editing them and deleting my entire account

Lojcs@lemm.ee on 18 Feb 2024 09:20 collapse

That’s been an issue for a long time. Fake “blogs” made of scraped reddit posts.

gapbetweenus@feddit.de on 18 Feb 2024 09:19 next collapse

If user content belongs to the service provider, one would think that they are responsible for it.

axo@lemmy.world on 18 Feb 2024 09:56 next collapse

I barely post on reddit, just lurk but this made me finally sign up for an account here.

Fake4000@lemmy.world on 18 Feb 2024 10:14 next collapse

Welcome to lemmy.

the_post_of_tom_joad@sh.itjust.works on 18 Feb 2024 17:18 collapse

Hey, welcome! It’s pretty nice

13esq@lemmy.world on 18 Feb 2024 10:05 next collapse

If you’re not paying for the product, you are the product.

WhatAmLemmy@lemmy.world on 18 Feb 2024 14:20 collapse

And even when you pay for the product, you are the product, because capitalism requires infinite growth from a finite system.

mtchristo@lemm.ee on 18 Feb 2024 11:03 next collapse

I bet they can scrape Lemmy content for free then. There are no legal mechanisms to prevent them from doing so.

FiskFisk33@startrek.website on 18 Feb 2024 11:37 next collapse

I’d rather the data I’ve chosen to make public be free and accessible to all than be sold to the highest bidder.

baseless_discourse@mander.xyz on 18 Feb 2024 12:38 collapse

With that being said, I am not pleased that my content is packaged into a proprietary AI, and sold for money.

I think there are ways to opt-out of AI collection, at least for big companies. I wonder if it is implemented in Lemmy-UI and/or terms and conditions.

General_Effort@lemmy.world on 18 Feb 2024 14:09 next collapse

You opt-out so that there is less free training data, making Reddit’s data all the more valuable. I’m sure spez will be thankful.

FiskFisk33@startrek.website on 18 Feb 2024 14:43 collapse

on the other hand, if there’s troves of free data, that takes the upper hand from the companies that can afford paying for it, and gives open source a much better chance at staying competitive.

Wappen@lemmy.world on 18 Feb 2024 11:41 next collapse

Hm but don’t you automatically own the stuff you create yourself, as long as you don’t consent to giving it away? I don’t know the terms and conditions of my Lemmy instance though.

dgmib@lemmy.world on 18 Feb 2024 15:26 collapse

When was the last time anyone read the T&Cs of a social media website?

They basically all have a clause to the effect that you grant them a permanent, irrevocable license to do whatever they want with anything you post.

You might still own the copyright to any content you produce, but by posting you’re granting them permission to do basically anything with it, including reselling it.

Wappen@lemmy.world on 19 Feb 2024 11:40 collapse

Yeah I know but what about Lemmy instances?

SpaceCowboy@lemmy.ca on 18 Feb 2024 13:00 next collapse

Well there’s copyright law. There’s already lawsuits happening so we’ll have to see how this shakes out.

But even if the AI companies lose the lawsuits, I think it’s likely they’ll still have access to content where the T&C of the site says they’re allowed to sell the data.

Trollception@lemmy.world on 18 Feb 2024 14:25 collapse

Yes but I think reddit is many times more valuable than Lemmy. I just haven’t found the same level of very specific subreddits that have lots and lots of activity. Most of the traffic here is memes, politics, news and Linux lovin. On reddit if I needed to find a community about my local town it’s no problem and there are tens or hundreds of daily posts. The same community does exist on Lemmy but the last post was 6 months ago.

Link@rentadrunk.org on 19 Feb 2024 13:13 collapse

I completely agree. There are lots of communities on Reddit that are missing on Lemmy. Have you tried posting your community? It might entice people to participate!

erAck@discuss.tchncs.de on 18 Feb 2024 12:04 next collapse

It will get trained on some comment posts.

Let reddit die. Join Lemmy or /kbin. join-lemmy.org kbin.pub

Psythik@lemmy.world on 18 Feb 2024 13:02 collapse

Um, we’re already here. You should be posting this on reddit instead.

erAck@discuss.tchncs.de on 18 Feb 2024 15:00 collapse

I did that some months ago already, changed all my comment posts.

Embarrassingskidmark@lemmy.world on 18 Feb 2024 12:30 next collapse

If they build an AI based on reddit content it will be the devil incarnate.

valkyre09@lemmy.world on 18 Feb 2024 12:45 next collapse

Can’t wait to hear the fan fiction the AI bot generates

SpaceCowboy@lemmy.ca on 18 Feb 2024 12:57 next collapse

A devil incarnate that makes a lot of puns.

neptune@dmv.social on 18 Feb 2024 13:12 next collapse

This

Pinecone@lemmy.world on 18 Feb 2024 14:30 collapse

If you thought gpt4 was confidently incorrect wait until you see this next ai.

red_pigeon@lemm.ee on 18 Feb 2024 14:38 next collapse

I stopped using reddit after they dropped the bomb on the devs and I’m not a fan of the company.

I understand the hatred towards them, but this is definitely expected from a company like reddit, and any other social media for that matter. As users we must be aware that we don’t own the content in their platform.

I wouldn’t be surprised if the same story comes from Instagram tomorrow, though I suppose there will be a bigger outcry then.

Usul_00_@lemmy.world on 18 Feb 2024 15:01 next collapse

Don’t know if it was against usage terms, but I have been able to get chatgpt answers written ‘in the style of’ various subreddits since the initial release (or perhaps the second release)

jivandabeast@lemmy.browntown.dev on 18 Feb 2024 17:37 collapse

Honestly over the last year since the great migration, the discussions on lemmy have really grown and matured to the point where I don’t really see the value of reddit anymore

Kedly@lemm.ee on 18 Feb 2024 20:41 next collapse

For me there’s still value in the niche communities like r/rimworld and the like, but for everything else I’m firmly on Lemmy now

vladmech@lemmy.world on 18 Feb 2024 20:42 next collapse

The only use I have for Reddit anymore is for super niche information. For example we were planning to go to Six Flags Discovery Kingdom today but it’s going to rain this afternoon. I checked their site and it said they were open 11-6, my BIL checked their app and at 11:30 it said they were currently closed. Found a Reddit post from someone confirming the park was closed for the weekend, and we didn’t waste a trip up. (as an extra annoying aside, apparently this information was posted on Six Flags’ Instagram page, because expecting a huge company to maintain a website is I guess just too much when they can offload it to social media.)

crimroy@sopuli.xyz on 19 Feb 2024 01:50 collapse

The real value of reddit for me lies in its cache of information contained in answers to questions from over the years. Whenever I’m looking online for a solution to a problem I’m trying to solve I’ll eventually add “reddit” to the search and I almost always find the answer that way.

COASTER1921@lemmy.ml on 18 Feb 2024 16:14 next collapse

If they hadn’t applied the same charges to legitimate 3rd party applications they could still do this and have avoided the massive community backlash.

Considering their horrible track record with advertising and selling Reddit premium this should be the single best way for them to finally monetize their platform. They didn’t need to destroy what little credibility they had remaining to their users to get to this point, but for whatever reason they did.

Fake4000@lemmy.world on 18 Feb 2024 16:17 collapse

What I don’t understand is that they had the option of providing a free service to all third party apps provided there was no commercial use.

They could have easily asked for a cut from any AI company using their data for training.

COASTER1921@lemmy.ml on 18 Feb 2024 16:32 collapse

Not only did they have the option, as I understand it the API was even configured as such since all requests from an app shared the same API key. They’re basically whitelisting like this now but only for the accessibility oriented 3rd party apps.

Bleach7297@lemmy.ca on 18 Feb 2024 18:31 next collapse

If you aren’t the customer, you are the product. Congrats on being monetized and kinda sorta immortalized as a series of weights.

DudeImMacGyver@sh.itjust.works on 18 Feb 2024 20:31 collapse

I am now a poorly copied and totally underwhelming digital god! MUAHAHAHA-oh wait…

bbkpr@lemmy.world on 18 Feb 2024 20:16 next collapse

Good, so let’s train crappy AI on posts by crappier AI, which was trained by posts from even crappier AI before it.

Morcyphr@lemmy.one on 18 Feb 2024 20:28 next collapse

Who cares? Fuck reddit. Half the content is bots anyway. So, bots stealing content to train AI to make content, which the bots will steal and repost. Circle of death for reddit. Good luck with that IPO.

Dkarma@lemmy.world on 19 Feb 2024 02:26 collapse

AI training on bot content? What could go wrong??

aesthelete@lemmy.world on 19 Feb 2024 07:12 next collapse

Went ahead and started running redacted on my old account.

Nothing says we’re just another brick in the wall like writing posts that wind up being used to train a plagiaristic corporate unemployment machine.

platypus_plumba@lemmy.world on 19 Feb 2024 21:42 collapse

What prevents people from training a model with Lemmy’s data?

Duamerthrax@lemmy.world on 19 Feb 2024 21:54 collapse

Nothing, but the lemmy admins can’t be the only ones profiting from it. Reddit killed 3rd party apps and academic research so they could be the sole profiteers of the user data.

Postreader2814@lemm.ee on 19 Feb 2024 22:10 collapse

That post reminded me that lemmee exists. Accounts didn’t work that great when I first got here but I made one today and got verified. Logged out of Reddit for the last time and replaced my comments. Eff that place right in its a-hole. Good riddance.