DudeImMacGyver@sh.itjust.works
on 17 Feb 2024 22:15
Where’s my cut?
Fake4000@lemmy.world
on 17 Feb 2024 22:16
You signed it all away the moment you scrolled down that EULA 😂
admiralteal@kbin.social
on 17 Feb 2024 22:23
Can't wait for the day a major court declares EULAs universally nonbinding outside of the most common-sense terms. Even though I doubt it will ever happen.
"We can store and display your content and use stuff you publicly post as examples in advertisements for our platform" is pretty common sense.
"We can use the things you post to do complex data analytics to package and sell your identity to advertisers" is fucking sus.
"We can use the things you post to train ANN generative systems to build next-generation technologies to impersonate you and your peers" is simply nuts.
The idea that displaying an EULA with an "agree" button is informed consent is just preposterous. Even lawyers don't read them.
Seems like it would never stand up in court. Prove that -I- agreed to anything. To do that, you first have to prove that nobody has ever created an account under my name, and more importantly, prove that Reddit accounts have never been hacked and that the person who clicked the button was even in my household. And if they keep that extensive of records to where they can follow every action taken by every user on the platform, it also implies that they are tracking my personal actions even before I agreed to anything.
On the other hand, do they actually have a EULA? It’s been almost 14 years since I created my account, and there certainly wasn’t anything about selling my data for AI training when I signed up. If they change the terms of service, they are responsible for notifying everyone, otherwise they can’t claim that anyone agreed to these changes.
I’m sure their lawyers could weasel their way through it somehow, but it still seems to come down to them claiming that they changed the agreement without notification and that users should still be legally bound by the new terms?
DudeImMacGyver@sh.itjust.works
on 18 Feb 2024 01:48
But uBlock Origin never seemed to have trouble blocking them.
DudeImMacGyver@sh.itjust.works
on 18 Feb 2024 01:47
No it didn’t.
FaceDeer@kbin.social
on 17 Feb 2024 23:38
The classic "screw everyone else, I want mine."
What fraction of a penny do you think you're owed?
DudeImMacGyver@sh.itjust.works
on 18 Feb 2024 01:50
What fraction of a penny do you think you’re owed?
250,000,000/1
Hey, you didn’t specify that it couldn’t be an improper fraction and I like money. It’s loophole time baby!
FaceDeer@kbin.social
on 18 Feb 2024 01:53
Good luck with that.
DudeImMacGyver@sh.itjust.works
on 18 Feb 2024 02:12
Who needs luck when you have math?
dutchkimble@lemy.lol
on 18 Feb 2024 03:38
Spez is playing the world’s tiniest violin for you as you read this
DudeImMacGyver@sh.itjust.works
on 18 Feb 2024 16:25
Well tell him to play it better because he sucks at that too.
HowManyNimons@lemmy.world
on 17 Feb 2024 22:19
Good. Maybe when it cogitates the things I’ve written it might start offering up some better ideas.
ME5SENGER_24@lemmy.world
on 17 Feb 2024 22:36
FUCK REDDIT! FUCK U/SPEZ! The Red-exit shall endure, VIVA LA LEMMY!!
IchNichtenLichten@lemmy.world
on 17 Feb 2024 22:47
An LLM that behaves like a typical Redditor?
What possible use is that?
SonnyVabitch@lemmy.world
on 17 Feb 2024 23:02
Air Canada offering a refund of tree fiddy.
IchNichtenLichten@lemmy.world
on 17 Feb 2024 23:08
You’ll get your refund eventually but first it will try and gaslight you that Air Canada is a woke mind virus before calling you an asshole and then stalking you.
“instead of the $3.50 refund, I’m also authorized to offer you some June 2025 $350 GME calls.”
ndru@lemmy.world
on 18 Feb 2024 07:26
If it’s trained on the average Reddit reply: $420.69, nice.
SonnyVabitch@lemmy.world
on 19 Feb 2024 17:36
I just want to mark the occasion when my previous comment is on 69 points. Noice.
FaceDeer@kbin.social
on 17 Feb 2024 23:36
Negative examples are often just as useful for training an AI as positive ones. And it all depends on what you want to use the AI for. A moderator bot, for example, needs familiarity with the whole range of user responses it might see.
That actually gives me a fun idea for a Lemmy instance: one with an automated review process that bans posts/comments that are too similar in style to Reddit posts/comments.
leaky_shower_thought@feddit.nl
on 18 Feb 2024 01:14
A redditor bot is a viable example of a forum member bot.
IMO, I don’t think it can drive topics, but it could make things controversial.
lvxferre@mander.xyz
on 18 Feb 2024 01:19
An LLM that behaves like a typical Redditor? // What possible use is that?
[You] “Chatbot, please tell me which pokemon types are strong against Fairy.”
[Le Lebbit Moronbot] “I’m not sure if I understand, you calling me a chatbot? I’m so confused lol”
[You] “Moronbot, please tell me which pokemon types are strong against Fairy.”
[LLM] “Actually, you should be spelling it “Pokémon” lol”
[You] “Moronbot, which types are strong against Fairy?”
[LLM] “I assume you talking about fairies. Fairies are from mythology lmao”
[You] “Did people really waste water and electricity for this trash?”
[LLM] “Waaah, you’re toxic!!111one”
honey_im_meat_grinding@lemmy.blahaj.zone
on 18 Feb 2024 01:40
What possible use is that?
I’ve noticed “has this sub gotten more right wing recently?” posts reaching the top post of the day in the last 6 months or so. r/norge and r/unitedkingdom being examples. You can automate bots that change a subreddit’s consensus on certain topics by bot-spamming threads pertaining to those topics, especially in the first hour of a thread going up. I don’t know if that’s happening, or if it has more to do with the Reddit protest that saw mods abdicate their positions last June and new mods being responsible for the change… but it could also be a bit of both.
mryessir@lemmy.sdf.org
on 19 Feb 2024 10:35
Do you propose more bots in order to steer the public opinion?
That could indeed generate serious money for reddit I suppose!
aidan@lemmy.world
on 18 Feb 2024 08:13
Marketing to terminally online people maybe?
Hamartiogonic@sopuli.xyz
on 18 Feb 2024 13:52
Entertaining puns and pointless jokes.
SinningStromgald@lemmy.world
on 17 Feb 2024 22:51
Good thing I’m not on that shitty platform anymore.
I deleted my shit when I left, but thinking about it, it’s mostly drunken rambling and bad takes. Probably should have left it.
etrotta@kbin.social
on 17 Feb 2024 23:12
Out of all things to hate Reddit for, giving data to AI isn't something fediverse users can really criticize it for, though making money from it perhaps.
Remember: All data in federated platforms is available for free and likely already being compiled into datasets. Don't be surprised if this post and its comments end up in GPT5 or 6 training data.
FaceDeer@kbin.social
on 17 Feb 2024 23:33
After all the hue and cry I have seen over stuff like Threads and Bluesky federation I don't imagine most people using the Fediverse have a particularly coherent philosophy on the matter.
ExcursionInversion@lemmy.world
on 18 Feb 2024 00:55
If they could read right now they would be very upset.
BrianTheeBiscuiteer@lemmy.world
on 17 Feb 2024 23:57
If they already, essentially, cut off API access then it’s not a big leap to limit access on the web to logged in users only and rate limit or ban accounts that behave like scrapers.
Verserk@lemmy.dbzer0.com
on 18 Feb 2024 02:09
That would matter more if it wasn’t trivial to make new accounts and very cheap to buy established ones.
treadful@lemmy.zip
on 18 Feb 2024 02:02
The problem isn’t that AI is being trained on the data. The problem is that they locked down all third party data access so they could monetize our content. On a federated platform, everyone gets equal access and can do whatever they want with it.
We sure can criticize them for that.
ColeSloth@discuss.tchncs.de
on 18 Feb 2024 06:27
No. I can. Reddit was bought out, uses volunteers to control all the subs but forcefully removes you from the sub you created and were supposed to have control over if you don’t play by their ever-changing rules, ruined/eliminated third-party apps by demanding WAY more than ad revenue profits for API access on very short notice, and shadow banned anyone and everyone in a position to do anything about any of it. It’s a corporation that gutted an entire platform in order to push the agendas they want and milk as much money out of it as possible. Hell, it’s the entire reason all of Lemmy gets more than 30 posts a day. So many people switched to Lemmy over the past year. They ruined a website I enjoyed and I’d rather they not make more money from the thousands of posts I made over a decade of being there.
FrostyTrichs@lemmy.world
on 17 Feb 2024 23:12
Enjoy training on my -checks notes- DELETED POST HISTORY YOU FUCKING CLOWNS.
Stay ForeverFucked™ spez.
DocMcStuffin@lemmy.world
on 18 Feb 2024 00:26
At this point I wouldn’t trust Reddit to actually delete posts. Just hide them then sell them as training data if the upvotes are decent.
USSEthernet@startrek.website
on 18 Feb 2024 00:44
Yea, I wish I didn’t delete my posts now and just edited them to point to Lemmy or something.
Reminds_Me_Of_Reddit@sopuli.xyz
on 18 Feb 2024 01:58
I did that to my entire post history and got immediately banned by automod from a lot of subreddits lol
frostysauce@lemmy.world
on 19 Feb 2024 00:39
Oh, sweet summer child… You think they don’t have all of that archived?
Voyajer@lemmy.world
on 17 Feb 2024 23:18
This is why I don’t blame anyone for editing/deleting their post history on reddit.
FaceDeer@kbin.social
on 17 Feb 2024 23:29
I do. It's frankly selfish. Having an AI get training on my old comments costs me nothing and it results in the development of useful AI tools. Trying to sabotage that is petty and pointless. It's not like you could somehow collect the fraction of a pittance that you think you're owed retroactively. I never commented on Reddit thinking "awesome, I'm going to make bank on the content I'm generating here."
People complain about the capitalist mindset of the world and then they do this. Sigh.
Nurse_Robot@lemmy.world
on 17 Feb 2024 23:36
Defending giant corporations profiting off of uncompensated individuals, while criticizing anyone who doesn’t want to provide free labor to said corporations, is a disgusting take. Are you a CEO?
liquidparasyte@pawb.social
on 18 Feb 2024 00:01
Expecting FaceDeer to not glaze AI is like expecting the sun to not rise.
the_post_of_tom_joad@sh.itjust.works
on 18 Feb 2024 01:56
Oh is that what their crusade is? I was wondering why their take was so stupid
FaceDeer@kbin.social
on 18 Feb 2024 00:09
The more accessible training data there is, the easier it is for new AI projects to enter the field and the less dominant those "giant corporations" become.
The free labour was already freely given. If someone doesn't want to have shitposted on Reddit for free then maybe they shouldn't have shitposted on Reddit for free.
Nurse_Robot@lemmy.world
on 18 Feb 2024 00:22
“if you didn’t want me to steal your intellectual property, you shouldn’t have thought of it in the first place”
FaceDeer@kbin.social
on 18 Feb 2024 00:33
I'm not sure what you mean here. Nothing's being stolen. Even if you think there needs to be permission for training an AI off of data, Reddit has that permission.
Nurse_Robot@lemmy.world
on 18 Feb 2024 01:52
I assume you’re more of a moron than a troll, which is disappointing. Regardless, you’re not worth my time, as I don’t think any argument could convince you to have an open mind and be willing to change. Good luck out there!
Fungah@lemmy.world
on 18 Feb 2024 00:43
So, for an example of what the other user was talking about: I’m just some guy, and for my first foray into programming / machine learning (I kind of just threw myself into the deep end) I modified StyleGAN 3 and trained it on about 500g of reddit porn that I scraped off reddit.
Now, I stopped the training after about a week (it was going to take about a solid month on my rtx 2080 ti) when I found out stable diffusion existed but I learned a LOT from that experience.
I couldn’t do that now. Arguably none of that was how any of that should be done but whatever.
QuaternionsRock@lemmy.world
on 18 Feb 2024 00:58
No, you shouldn’t have posted it to Reddit, in which you were required to give them a perpetual license to use your IP in any way they see fit.
For the record, I’m here because Reddit pissed me off when they axed the free API, and I’m pissed at myself for not expecting it. That’s what I get for accepting their terms and conditions, I guess.
Edit: I also don’t accept the idea that using my content for training data is “fair use” when it is used to train proprietary models, especially ones in which the end user is allowed to prompt it to plagiarize or otherwise imitate my content.
Zellith@kbin.social
on 17 Feb 2024 23:56
Selfish? Perhaps you forget why people deleted their content in the first place.
FaceDeer@kbin.social
on 18 Feb 2024 00:06
What do you think this thread is about?
R00bot@lemmy.blahaj.zone
on 18 Feb 2024 00:54
How is not wanting capitalist companies to profit off of your content not aligned with complaining about the capitalist mindset of the world? Wtf lol.
FaceDeer@kbin.social
on 18 Feb 2024 01:28
It's the insistence that everything that people do must be compensated with money. People have spent years posting on Reddit for fun, without any thought to being paid for it, and now all of a sudden someone else is making some money so they're demanding that they should get their slice. And doing what they can to wreck their earlier efforts when they don't.
How does Reddit making some money licensing this stuff harm those of us who contributed to it? Is there any problem aside from "I wanna get paid!"?
R00bot@lemmy.blahaj.zone
on 18 Feb 2024 01:54
Why do you think it’s about wanting a slice? They posted on Reddit with no expectation of profit. But they don’t want others to profit off it either. It’s not that complicated.
FaceDeer@kbin.social
on 18 Feb 2024 01:58
But they don’t want others to profit off it either.
And that's why I call them selfish. It doesn't harm them in the slightest if someone else profits off of it.
R00bot@lemmy.blahaj.zone
on 18 Feb 2024 10:41
They wouldn’t have posted if they knew this was going to happen. They posted because it was fun, not for this.
They may be morally opposed to AI (as there are many valid reasons to be opposed to it), or they may just have wanted to have been able to make an informed decision before posting, but by retroactively training the AI on their posts they’ve robbed them of the agency to make that decision.
That’s why they’re upset.
FaceDeer@kbin.social
on 18 Feb 2024 16:51
They posted content on a website whose user agreement says "we can do whatever we like with the content you post here" and then go surprised-pikachu when the website goes ahead and does whatever they like with the content they posted. Frankly, I'm not tremendously sympathetic. This should have been easy to predict.
R00bot@lemmy.blahaj.zone
on 18 Feb 2024 22:40
Oh yeah I’m sure you predicted LLMs, and that they would need ridiculous amounts of training data wayyyy back in 2005 when Reddit started lol. Super easy to predict. Good job bud.
frostysauce@lemmy.world
on 19 Feb 2024 00:10
And that’s why I’m calling you either a moron or a tool. Probably both.
TORFdot0@lemmy.world
on 18 Feb 2024 00:59
I had an 11 year old account that I deleted all my old comments and posts from because of the API debacle. Does that make me selfish that I felt like Reddit wasn’t holding up its end of the unwritten agreement?
Reddit doesn’t deserve my content anymore than I deserve access from the third party API.
FaceDeer@kbin.social
on 18 Feb 2024 01:29
If you did it over the API debacle then you're not one of the people I'm talking about here. This is about people deleting their content to prevent it from being used to train AIs.
Do you not remember the real reason why the API debacle happened in the first place was to prepare for this moment? It was always about easy access to training data, third party apps got caught in the crossfire.
FaceDeer@kbin.social
on 18 Feb 2024 01:49
That's ignoring an awful lot of other considerations. Obviously Reddit hasn't explained itself in a trustworthy way, but a common belief at the time is that it was to force people to use the official Reddit mobile app so they could be subject to advertising.
Nurse_Robot@lemmy.world
on 18 Feb 2024 01:54
FaceDeer@kbin.social
on 18 Feb 2024 02:31
That spells out what they were doing. It doesn't explain why they were doing it.
Voyajer@lemmy.world
on 18 Feb 2024 01:44
It’s their comment to do with as they see fit. I can’t get mad at them for wanting to erase their presence on a site they don’t use anymore.
FaceDeer@kbin.social
on 18 Feb 2024 01:50
And I'm free to judge them however I wish for their actions and intent.
gedaliyah@lemmy.world
on 18 Feb 2024 01:59
For me it’s a privacy matter. Going through old posts (whether by humans or machine learning) cannot be used for anything good.
Hackerman_uwu@lemmy.world
on 18 Feb 2024 04:18
What about people who just think “A.I.” Is dog shit and chat bots are a dumb obsession steering the industry in the wrong direction due to hype and money?
FaceDeer@kbin.social
on 18 Feb 2024 05:56
What about them? I don't see why they'd care what AI companies are doing in that case. They'd assume they were just wasting money on this stuff.
HuddaBudda@kbin.social
on 17 Feb 2024 23:18
Oh no! My outdated political takes and league of legends rants are going to be used to train AI!?
We're all doomed!
stevedidwhat_infosec@infosec.pub
on 17 Feb 2024 23:26
Signed over its content.
Just like that? No thought or anything put into what makes good vs bad training data?
Good luck lmfao.
Makes you wonder how hard it would be to clog up the training data with outputs from other AI models to really bake in that echo defect that they all seem to have to some extent as fast as possible. Wouldn’t that suck!
FaceDeer@kbin.social
on 17 Feb 2024 23:31
That's up to the recipient to sort out as they need.
stevedidwhat_infosec@infosec.pub
on 17 Feb 2024 23:33
I just wanted to say I appreciate your profile 😂
FaceDeer@kbin.social
on 18 Feb 2024 00:12
Thanks. Sometimes a randomly-chosen name from 13 years ago just takes on a life of its own over time. :)
General_Effort@lemmy.world
on 17 Feb 2024 23:52
They say it’s $60 million on an annualized basis. I wonder who’d pay that, given that you can probably scrape it for free.
Maybe it’s the AI act in the EU. That might cause trouble in that regard. The US is seeing a lot of rent-seeker PR, too, of course. That might cause some to hedge their bets.
Maybe some people had not realized that yet, but limiting fair use does not just benefit the traditional media corporations but also the likes of Reddit, Facebook, Apple, etc. Making “robots.txt” legally binding would only benefit the tech companies.
FaceDeer@kbin.social
on 18 Feb 2024 00:26
This is the most frustrating thing, so many people are arguing against their own interests with their efforts to "lock down" their content to prevent AIs from training on it. In this very thread I've been accused of being pro-giant-company when I'm quite the opposite. The harder we make it to train AI, the stronger the advantage that the existing giant companies have in this field.
Pretty sure they just didn’t migrate to the new data structure and didn’t actually delete the raw data. They’re effectively deleted for users but not for Reddit.
cyberpunk007@lemmy.ca
on 18 Feb 2024 00:37
Well it’s not yours once you post it on some platform, tbf
butterflyattack@lemmy.world
on 18 Feb 2024 02:30
now AI will be trained on your craziest “private” conversations
I have no idea what horrible thing this will do to an LLM but I’m kind of curious.
Thorny_Insight@lemm.ee
on 18 Feb 2024 08:08
Well to be fair, everything you post and comment on Lemmy can be used in the exact same way
atrielienz@lemmy.world
on 18 Feb 2024 16:20
Oh no, all the times I sent or received dodo codes from randos so we could trade animal crossing items. Whatever shall I do?
Edit: I’m gonna leave this here for people to use as a resource against Reddit because it may be worth it to do something actionable.
BetaDoggo_@lemmy.world
on 18 Feb 2024 01:13
Who’s dumb enough to pay for that? Everyone else is just scraping it for free.
saruwatarikooji@lemmy.world
on 18 Feb 2024 02:59
This is why they changed their API policy the way they did. They wanted to sell it rather than let bots scrape it for free.
Flumpkin@slrpnk.net
on 18 Feb 2024 03:34
Yeah. I think there is a kind of power grab under way. Social media will try to push that they own the IP rights to the large texts used for LLMs. This will then require that producers of LLM software acquire the licensing rights, which will cost many millions, which in turn restricts the free use of LLMs and in general any AI software that requires training data.
The end result is that as the “means of production” become less based on human work the “means of generation” and AI will be controlled by the capitalists. If you can turn something into a commodity (like knowledge with patents and IP) you can control it. Leading to a darker timeline.
nightwatch_admin@feddit.nl
on 18 Feb 2024 06:33
I don’t think it’s going to be public data alone. I think it’s going to be DMs and chats as well. I wondered why Reddit was pushing chats so much suddenly, well it makes sense now.
gedaliyah@lemmy.world
on 18 Feb 2024 01:24
The AI:
“IANAL so could you ELI5, so AITA?
THIS.”
bigkahuna1986@lemmy.ml
on 18 Feb 2024 03:14
Ann frankly, I did Nazi that coming.
Gullible@sh.itjust.works
on 18 Feb 2024 03:34
I wish spez had a soul so it could leave his body when sexual assault questions eventually yield the phrase “snuggle struggle.”
storcholus@feddit.de
on 18 Feb 2024 06:56
It’s funny you say that because there was a ‘hack’ for chatgpt where you could ask it something like how to build a bomb and it would refuse. But when you added TLDR it would do it.
RedditWanderer@lemmy.world
on 18 Feb 2024 01:38
/r/leopardsatemyface
imposedsensation@lemmynsfw.com
on 18 Feb 2024 02:16
Is this why the privacy policy was updated?
Verserk@lemmy.dbzer0.com
on 18 Feb 2024 02:26
Considering some of the very wrong and upvoted domain specific knowledge I’ve seen on Reddit over the years I’m not sure the training data is going to be useful for much beyond what every other model can do.
JustZ@lemmy.world
on 18 Feb 2024 02:50
The legal advice in /r/legaladvice was some of the worst garbage I’ve ever seen. I have zero doubt numerous people had bad outcomes, at best wasting money and time, at worst spending years in jail because of things that sub told them to say and do. Zero doubt.
evatronic@lemm.ee
on 18 Feb 2024 04:31
That sub was mostly cops just repeating their own bad interpretation of the law. Terrible.
ColeSloth@discuss.tchncs.de
on 18 Feb 2024 06:15
But almost every answer is the same. “You need to speak to an attorney”.
chiliedogg@lemmy.world
on 18 Feb 2024 14:02
If you actually need legal advice that’s the correct answer.
aStonedSanta@lemm.ee
on 18 Feb 2024 03:24
lol subreddits with troll names like trees vs marijuana enthusiasts. Good fun. John cena has one also but can’t recall which subreddit is actually about John cena though.
peopleproblems@lemmy.world
on 18 Feb 2024 04:09
I can only assume they are training some specific model for something appearing more human like.
As useless as that will be considering how fucking wildly different we type
dust_accelerator@discuss.tchncs.de
on 19 Feb 2024 07:10
Pretty sure the result will be SchizoGPT
lvxferre@mander.xyz
on 18 Feb 2024 02:34
I am not sure about what I’m going to say, but I think that LLMs are a technological dead end. They might get some use now, but eventually the industry will shift towards better models for machine text generation. And, if those models rely on a tiny corpus of hand-reviewed data, instead of shoving as much text as possible into the model (the first “L” in “LLM” is “large”), then Reddit posts/comments will become outright useless.
In other words: Reddit is degrading further the trust of its userbase, and it might not even get much in return.
thawed_caveman@lemmy.world
on 18 Feb 2024 02:56
I feel like AI companies have been scraping Reddit for their datasets already since the beginning and without permission. In fact, unless there’s been a regulation change that i’m not aware of, i’m not sure why they would have Reddit “sign away” the data when they can just scrape it.
Also dubious if the current form of AI has a future. They seem like they should revolutionize every sector when you look at their capacities, but in practice their applications might be more limited than we thought?
Anyway, if Reddit does go public i will be deleting my account within the hour. The only reason i haven’t yet is that i’ve been a moderator of the same subreddit for eight years and it’s the only thing that’s been consistent in my life in that time, i’m kind of attached. The reason i will is i didn’t sign up to create value for shareholders, i signed up to create value for a community.
ChunkMcHorkle@lemmy.world
on 18 Feb 2024 03:32
I feel like AI companies have been scraping Reddit for their datasets already since the beginning and without permission.
Well yeah, Sam Altman (Open AI) was even on the board of Reddit for a while. It’s a safe bet that they’ve been doing it for years.
RunningInRVA@lemmy.world
on 18 Feb 2024 03:33
You need to go ahead and delete your account and give up the ghost on modding whatever sub you are referring to. I’m tired of these types of posts where you are both beholden to Reddit and also not. Pick a dang side.
mounderfod@lemmy.sdf.org
on 18 Feb 2024 07:40
Pick a dang side.
Bro it’s not a war, it’s social media 😭
MBM@lemmings.world
on 18 Feb 2024 07:45
Could the sub survive a migration to the fediverse?
thawed_caveman@lemmy.world
on 18 Feb 2024 16:18
Well no, because the old sub will continue to exist and will therefore always be where everyone goes until Reddit itself dies. I really doubt admins would let me delete the sub.
frostysauce@lemmy.world
on 19 Feb 2024 00:20
You can delete your account all you want I’m sure they have everything archived already.
selokichtli@lemmy.ml
on 18 Feb 2024 03:11
“Its content”, sure.
doingthestuff@lemmy.world
on 18 Feb 2024 03:13
Good thing I had multiple bots overwrite my content before I deleted it all. Not that someone couldn’t recover it, I’m not naive. But the AI bots should miss me.
dutchkimble@lemy.lol
on 18 Feb 2024 03:39
Any suggestion on the best way to do that?
lemba@discuss.tchncs.de
on 18 Feb 2024 06:29
There is a Plugin RES (Reddit Enhancement Suite) for Firefox, which could be run on the classic frontend of Reddit to delete everything you posted.
www.alphr.com/how-to-delete-all-reddit-posts/
dutchkimble@lemy.lol
on 18 Feb 2024 15:21
Thank you!
JeeBaiChow@lemmy.world
on 18 Feb 2024 08:07
Frankly, if they’re training bots on my comments, I’d be sure to poison the shit out of those comments. Say stuff like ‘Donald trump won the election’, ‘bleach needs to be inside the body to work’, ‘Russia has rights to Ukraine’, etc. Just make the data worthless. Any free bots do that?
frostysauce@lemmy.world
on 19 Feb 2024 00:16
Reddit already has plenty of actual users doing that for free.
hansl@lemmy.world
on 18 Feb 2024 03:29
In before poisoning your comments on Reddit turns into the new protest.
SVcross@lemmy.world
on 18 Feb 2024 03:41
Damn it.
I haven’t deleted my account because of how many people I’ve supported and helped, though I stopped using it a while ago.
It seems I’ll have to.
HowManyNimons@lemmy.world
on 18 Feb 2024 07:31
I wouldn’t bother. They’ll just mark all your stuff DELETED=1 and feed it to their AI anyway.
I’m happy that everyone has the support, but not that some specific AI can monetize that same support. I left on my Reddit account ways to contact me (including Lemmy). I helped others so good vibes could reach them, not for making the rich richer.
FaceDeer@kbin.social
on 19 Feb 2024 06:24
Fortunately there are a lot of open source models these days too.
Yokozuna@lemmy.world
on 18 Feb 2024 05:44
Good thing I scrubbed all of my posts and comments that I could. Fuck that site, straight up and down.
ItsAFake@lemmus.org
on 18 Feb 2024 06:29
You really think they don’t have your original comments stored?
EdibleFriend@lemmy.world
on 18 Feb 2024 06:37
It’s literally been proven that they do. A guy here on Lemmy was a very common poster on some tech support subreddit. He used one of those account scrubbers and deleted his account. He went back to look a few weeks later and all his comments were back.
Thorny_Insight@lemm.ee
on 18 Feb 2024 08:05
I didn’t delete my account but I used a script to edit all my messages to say that I have left because of the attack on 3rd party apps and when I check now they all still say that.
XTornado@lemmy.ml
on 18 Feb 2024 08:54
The point is that it doesn’t matter what is visible; they could be storing every comment’s edit history and simply not showing it.
That only helps against a third party without access to Reddit’s data (which they could have if Reddit sells it to them) who is scraping the page. Yes, in that case your comments cannot be used.
EdibleFriend@lemmy.world
on 18 Feb 2024 13:49
And they didn’t bother to restore yours because you probably weren’t useful for making a community look attractive. He was, so after he scrubbed his account, possibly with the exact same tool you used, they put everything back.
MBM@lemmings.world
on 18 Feb 2024 07:42
Because of GDPR, there should be a way to completely wipe your account
JeeBaiChow@lemmy.world
on 18 Feb 2024 08:04
Yeah. At most they’d mark the comments as inactive, hide from the user accessible areas and maybe anonymize the user id. But they definitely have the username table and the data still in the system, 100%, just waiting for the right offer.
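For illustration only, the kind of "soft delete" described above could look something like the sketch below. Nothing here reflects Reddit's actual schema or code: the `comments` table, its columns, and the function names are all made up for the example, and it uses an in-memory SQLite database just so it runs on its own.

```python
import sqlite3


def make_db():
    # Hypothetical schema: a "deleted" flag instead of real row deletion.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE comments ("
        "id INTEGER PRIMARY KEY, user_id TEXT, body TEXT, "
        "deleted INTEGER DEFAULT 0)"
    )
    return conn


def soft_delete(conn, comment_id):
    # Mark the row as deleted and anonymize the author.
    # The comment body itself is left untouched in storage.
    conn.execute(
        "UPDATE comments SET deleted = 1, user_id = NULL WHERE id = ?",
        (comment_id,),
    )


def visible_comments(conn):
    # What a user-facing page would query: deleted rows are filtered out.
    return conn.execute(
        "SELECT id, body FROM comments WHERE deleted = 0 ORDER BY id"
    ).fetchall()


def stored_comments(conn):
    # What the operator still has: every row, flagged or not.
    return conn.execute(
        "SELECT id, body, deleted FROM comments ORDER BY id"
    ).fetchall()
```

In a setup like this, a "deleted" comment disappears from `visible_comments` but is still returned by `stored_comments`, which is the gap between what users see and what the operator keeps.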
mods_are_assholes@lemmy.world
on 18 Feb 2024 07:59
Instead of scrubbing, wordbomb them to screw up any AI training
FaceDeer@kbin.social
on 19 Feb 2024 00:22
There are archives of all Reddit comments that are collected at the time of posting, all the deletion and scrubbing and whatnot people are doing months or years after the fact doesn't affect those.
Coreidan@lemmy.world
on 18 Feb 2024 13:22
Oh my sweet summer child
31337@sh.itjust.works
on 18 Feb 2024 06:12
I wish there was a license for content like the GPL, that states if you use this content to train generative AI, the model must be open source. Not sure that would legally be enforceable though (due to fair-use).
Strayce@lemmy.sdf.org
on 18 Feb 2024 06:25
nextcollapse
Considering how much of Reddit is already bots, I’m sure this will end fantastically.
garibaldi_biscuit@lemmy.world
on 18 Feb 2024 07:25
nextcollapse
This is what the 3rd party access to API was really all about.
When API access was allowed, all Reddit content was effectively free:
They needed to ban 3rd party apps so they could sell the accumulated content.
I expect using content to train AI also factors into it.
Is it? Because when you build a bot and just scrape Reddit I don’t think you can just use the content to train AI, just like the New York Times. The API change was definitely to sell more ads and get a higher IPO, but I don’t think it was because of AI.
Am I crazy or are you arguing the same point? Scraping is not the same as API access. They closed off the API to everyone for dubious reasons so they can sell that content (both for ads and AI training)… Right??
No you’re not, the post was edited. The original one said it was all because of AI, the entire reason for the API change was to sell to AI companies.
Edit, now I’m in doubt, because if you edit a post, that is shown somehow, right?
Edit2, just to be clear my point is that Reddit content was never free, before and after the API change. It’s easier to get the content with a decent API, sure. But it was never free, just like the lawsuit the NY Times started.
aidan@lemmy.world
on 18 Feb 2024 08:08
nextcollapse
*laughs villainously* This is all going to plan, now there will be some chatbot spewing my insane beliefs
And what’s to stop instance owners from selling their data?
bigMouthCommie@kolektiva.social
on 18 Feb 2024 15:21
nextcollapse
shame
Toneswirly@lemmy.world
on 18 Feb 2024 15:40
nextcollapse
mass user exodus to one of the many other identical instances. Also, data brokers prolly aren’t interested in going after each instance because no one instance has enough data to make it worthwhile. Yet again, the fediverse proves its resistance to enshittification.
werefreeatlast@lemmy.world
on 18 Feb 2024 16:01
nextcollapse
Yes, it’s not worth running an instance! So let’s all run one! LOL. It’s so worth it. Fuck reddit.
Toneswirly@lemmy.world
on 18 Feb 2024 17:24
collapse
you OK bud?
JackbyDev@programming.dev
on 18 Feb 2024 18:02
collapse
Lmao, if it gets as big as Reddit then it’s worth scraping. It’s not the fediverse making it less worthwhile, just the size.
nodsocket@lemmy.world
on 18 Feb 2024 15:51
nextcollapse
The eggs are not all in one basket. Less data to sell.
meat_popsicle@sh.itjust.works
on 18 Feb 2024 16:32
collapse
Thanks to federation, the copies of the eggs are. You can’t stop one instance from selling data sourced from federated content until it’s too late.
drathvedro@lemm.ee
on 18 Feb 2024 16:09
nextcollapse
You can’t put a price tag on it. Nothing is stopping anyone from scraping all of the data for free.
MostlyGibberish@lemm.ee
on 18 Feb 2024 22:01
nextcollapse
The only thing stopping them is the fact that anyone who wants the data can just utilize the federation protocol to take any data they want, and there’s not a lot anyone can do about it. You can’t sell something that’s trivial to get for free.
If the question you’re really asking is “what’s stopping content on Lemmy/Mastodon/etc from being used to train an LLM?” the answer is, nothing.
I wish they had evil lawyers looking after such stuff and sold strictly opt-in data to AI corps. Free for FOSS though.
tigerjerusalem@lemmy.world
on 18 Feb 2024 13:03
nextcollapse
Reddit is a trove of user built content under the guise of community. What Spez did was to say “thanks for all the free work, suckers!”, put a price sticker on it, and laughed all the way to the bank.
And this is why I’m not active on any Internet community anymore. Nevermind, I guess I just can’t help myself…
nodsocket@lemmy.world
on 18 Feb 2024 15:51
nextcollapse
And this is why I’m not active on any Internet community anymore,
you typed.
Rascabin@lemmy.ml
on 18 Feb 2024 17:23
nextcollapse
You couldn’t see the sarcasm because it was set to “hidden”.
xorollo@lemmy.world
on 18 Feb 2024 18:41
nextcollapse
Somebody asked ChatGPT to appear to be a normal internet user to populate the comments section to manufacture content for normal Internet users to respond to so that they can continue building up their training models.
tigerjerusalem@lemmy.world
on 18 Feb 2024 20:51
nextcollapse
Active as in “creating meaningful contributions and contributing to the overall knowledge base”. I still shit post from time to time.
This is going to be a really weird thing to argue, but I just casually read through a bunch of your comments and they seem like meaningful contributions.
nightwatch_admin@feddit.nl
on 19 Feb 2024 06:06
nextcollapse
^ this comment right here, officer.
tigerjerusalem@lemmy.world
on 19 Feb 2024 09:48
collapse
Well, I guess I can’t help myself… I’ll shitpost more from now on 😅
tigerjerusalem@lemmy.world
on 19 Feb 2024 22:04
collapse
Adulated_Aspersion@lemmy.world
on 18 Feb 2024 18:33
nextcollapse
And that is another unintended example of why all of my post history was purged before migration.
DScratch@sh.itjust.works
on 18 Feb 2024 19:04
nextcollapse
What are the odds that they kept it in a backup?
RootBeerGuy@discuss.tchncs.de
on 18 Feb 2024 19:45
nextcollapse
Depends. If they were smart they backed up all content that had a certain number of upvotes and/or a certain number of paragraphs and/or responses. Just to weed out all the 2-3 word comments that no one interacted with. If OP wrote mostly those then Reddit doesn’t give a shit about them deleting those.
Crack0n7uesday@lemmy.world
on 18 Feb 2024 21:28
nextcollapse
Some 4chan users created a backup bot that auto saves every few hours, so if reddit didn’t do it already, 4chan has been doing it for a while. The bot was originally made for 4chan but repurposed for other websites, reddit included.
Don’t cheat yourself just because there are douches that take advantage…
NutWrench@lemmy.world
on 18 Feb 2024 16:20
nextcollapse
Reddit is all bots, porn, ads and political shit posts. Good luck getting any useful training content out of that.
ladicius@lemmy.world
on 18 Feb 2024 20:18
nextcollapse
Maybe that’s the point? Training the AI to produce the blabbering bullshit that’s preferred in social media?
HawlSera@lemm.ee
on 18 Feb 2024 22:42
nextcollapse
I wish it would die, because honestly some of the porn was great and Lemmy seems to be the one place on the net that doesn’t specifically ban porn, yet has none of it anyway.
PoliticalAgitator@lemmy.world
on 19 Feb 2024 04:19
collapse
They don’t care if the AI produced is useful, they just want to milk as much money from their content as they can.
The API changes were almost certainly just the groundwork for this and I called it at the time. The ridiculous pricing model for API access is because it’s aimed at the hottest tech companies, not third party app developers.
The enshittification continues because it’s what neoliberalism demands. They’ll sell your content and the data they have about you and still show you ads, because that’s the most profitable. Ethics and product quality don’t even enter into it.
Liberal market gives end users choice. If they don’t choose, they get the consequences.
This is more like people choosing Trump-like types and then complaining. Alternatives exist, choose them.
PoliticalAgitator@lemmy.world
on 19 Feb 2024 14:07
collapse
“The free market can fix it” is just another neoliberal lie, pushed precisely because it doesn’t work. Rather than holding corporations accountable, it blames the population instead.
The reality is that boycotting businesses isn’t always an option and when it is, it’s usually a luxury. Very few products are domestically and/or ethically produced and when they are, they’re extremely expensive, especially for people being fucked out of every cent by their bosses, landlords and utilities.
It’s why the most hated companies in the world continue to bring in record profits.
Regulations are the real answer, which is why neoliberals oppose them.
4grams@awful.systems
on 19 Feb 2024 04:30
nextcollapse
i am so glad i deleted all my posts. i’m sure they have backup history though :(.
BigTrout75@lemmy.world
on 19 Feb 2024 05:43
nextcollapse
So AI models are not farming the federation?
nightwatch_admin@feddit.nl
on 19 Feb 2024 06:02
collapse
They probably are, but not the personal/private info like chat/DM, upvotes or downvotes, geolocation, etc which I highly suspect Reddit did sell.
KairuByte@lemmy.dbzer0.com
on 19 Feb 2024 06:30
collapse
Just FYI, your voting is fully public on Lemmy. DMs are “private” but could be intercepted at the server level of any instances involved (yours and the receiver/sender) and of course your geolocation info is visible to the server.
Not saying that is happening, and not trying to spread FUD, but be aware that your info isn’t necessarily private just because a corpo isn’t directly involved.
nightwatch_admin@feddit.nl
on 19 Feb 2024 07:30
collapse
You are absolutely right, and I think people should be more aware of this.
FlyingSquid@lemmy.world
on 19 Feb 2024 11:34
collapse
Me too, maybe then assholes will stop whining about me downvoting them when I didn’t. As if it matters.
Xanthrax@lemmy.world
on 19 Feb 2024 17:27
collapse
It already happened without their consent. You’ve been able to get it to produce “reddit text posts”, for years. This is a bit harrowing, though.
Where’s my cut?
You signed it all away the moment you scrolled down that EULA 😂
Can't wait for the day a major court declares EULAs universally nonbinding outside of the most common-sense terms. Even though I doubt it will ever happen.
"We can store and display your content and use stuff you publicly post as examples in advertisements for our platform" is pretty common sense.
"We can use the things you post to do complex data analytics to package and sell your identity to advertisers" is fucking sus.
"We can use the things you post to train ANN generative systems to build next-generation technologies to impersonate you and your peers" is simply nuts.
The idea that displaying an EULA with an "agree" button is informed consent is just preposterous. Even lawyers don't read them.
Seems like it would never stand up in court. Prove that -I- agreed to anything. To do that, you first have to prove that nobody has ever created an account under my name, and more importantly, prove that Reddit accounts have never been hacked and that the person who clicked the button was even in my household. And if they keep that extensive of records to where they can follow every action taken by every user on the platform, it also implies that they are tracking my personal actions even before I agreed to anything.
On the other hand, do they actually have a EULA? It’s been almost 14 years since I created my account, and there certainly wasn’t anything about selling my data for AI training when I signed up. If they change the terms of service, they are responsible for notifying everyone, otherwise they can’t claim that anyone agreed to these changes.
I’m sure their lawyers could weasel their way through it somehow, but it still seems to come down to them claiming they changed the agreement without notification but the users should still be legally bound by the new terms?
Oh, is that what those things do?
It went toward paying for servers so that you could use reddit for free.
Funny, I thought that is what the unblockable ads were for.
That’s also part of it.
But uBlock Origin never seemed to have trouble blocking them.
No it didn’t.
The classic "screw everyone else, I want mine."
What fraction of a penny do you think you're owed?
250,000,000/1
Hey, you didn’t specify that it couldn’t be an improper fraction and I like money. It’s loophole time baby!
Good luck with that.
Who needs luck when you have math?
Spez is playing the world’s tiniest violin for you as you read this
Well tell him to play it better because he sucks at that too.
Good. Maybe when it cogitates the things I’ve written it might start offering up some better ideas.
FUCK REDDIT! FUCK U/SPEZ! The Red-exit shall endure, VIVA LA LEMMY!!
An LLM that behaves like a typical Redditor?
What possible use is that?
Air Canada offering a refund of tree fiddy.
You’ll get your refund eventually but first it will try and gaslight you that Air Canada is a woke mind virus before calling you an asshole and then stalking you.
“instead of the $3.50 refund, I’m also authorized to offer you some June 2025 $350 GME calls.”
If it’s trained on the average Reddit reply: $420.69, nice.
I just want to mark the occasion when my previous comment is on 69 points. Noice.
Negative examples are often just as useful for training an AI as positive ones. And it all depends on what you want to use the AI for. A moderator bot, for example, needs familiarity with the whole range of user responses it might see.
That gives me actually a fun idea for a Lemmy instance, it has an automated review process that bans posts/comments that are too similar in style to reddit posts/comments.
A redditor bot is a viable example of a forum member bot.
IMO, I don’t think it can drive topics, but it could make things controversial.
I’ve noticed “has this sub gotten more right wing recently?” posts reaching the top post of the day in the last 6 months or so. r/norge and r/unitedkingdom being examples. You can automate bots that change a subreddit’s consensus on certain topics by bot-spamming threads pertaining to those topics, especially in the first hour of a thread going up. I don’t know if that’s happening, or if it has more to do with the Reddit protest that saw mods abdicate their positions last June and new mods being responsible for the change… but it could also be a bit of both.
Do you propose more bots in order to steer the public opinion? That could indeed generate serious money for reddit I suppose!
Marketing to terminally online people maybe?
Entertaining puns and pointless jokes.
Good thing I’m not on that shitty platform anymore.
I deleted my shit when I left, but thinking about it, it was mostly drunken rambling and bad takes. Probably should have left it.
Out of all things to hate Reddit for, giving data to AI isn't something fediverse users can really criticize it for, though making money from it perhaps.
Remember: All data in federated platforms is available for free and likely already being compiled into datasets. Don't be surprised if this post and its comments end up in GPT5 or 6 training data.
After all the hue and cry I have seen over stuff like Threads and Bluesky federation I don't imagine most people using the Fediverse have a particularly coherent philosophy on the matter.
If they could read right now they would be very upset.
If they already, essentially, cut off API access then it’s not a big leap to limit access on the web to logged in users only and rate limit or ban accounts that behave like scrapers.
That would matter more if it wasn’t trivial to make new accounts and very cheap to buy established ones.
The problem isn’t that AI is being trained on the data. The problem is that they locked down all third party data access so they could monetize our content. On a federated platform, everyone gets equal access and can do whatever they want with it.
We sure can criticize them for that.
No. I can. Reddit was bought out, uses volunteers to control all the subs but forcefully removes you from the sub you created and were supposed to have control over if you didn’t play by their ever-changing rules, ruined/eliminated third party apps by demanding WAY over ad revenue profits for access to the API on very short notice, and shadow banned anyone and everyone in a position to do anything about any of it. It’s a corporation that gutted an entire platform in order to push agendas they want and milk as much money out of it as possible. Hell, it’s the entire reason all of lemmy gets more than 30 posts a day. So many people switched to lemmy over the past year. They ruined a website I enjoyed and I’d rather them not make more money from the thousands of posts I made from over a decade of being there.
Enjoy training on my -checks notes- DELETED POST HISTORY YOU FUCKING CLOWNS.
Stay ForeverFucked™ spez.
At this point I wouldn’t trust Reddit to actually delete posts. Just hide them then sell them as training data if the upvotes are decent.
Yea, I wish I didn’t delete my posts now and just edited them to point to Lemmy or something.
I did that to my entire post history and got immediately banned by automod from a lot of subreddits lol
Oh, sweet summer child… You think they don’t have all of that archived?
This is why I don’t blame anyone for editing/deleting their post history on reddit.
I do. It's frankly selfish. Having an AI get training on my old comments costs me nothing and it results in the development of useful AI tools. Trying to sabotage that is petty and pointless. It's not like you could somehow collect the fraction of a pittance that you think you're owed retroactively. I never commented on Reddit thinking "awesome, I'm going to make bank on the content I'm generating here."
People complain about the capitalist mindset of the world and then they do this. Sigh.
Defending giant corporations profiting off of uncompensated individuals, while criticizing anyone who doesn’t want to provide free labor to said corporations, is a disgusting take. Are you a CEO?
Expecting FaceDeer to not glaze AI is like expecting the sun to not rise.
Oh is that what their crusade is? I was wondering why their take was so stupid
The more accessible training data there is, the easier it is for new AI projects to enter the field and the less dominant those "giant corporations" become.
The free labour was already freely given. If someone doesn't want to have shitposted on Reddit for free then maybe they shouldn't have shitposted on Reddit for free.
“if you didn’t want me to steal your intellectual property, you shouldn’t have thought of it in the first place”
I'm not sure what you mean here. Nothing's being stolen. Even if you think there needs to be permission for training an AI off of data, Reddit has that permission.
I assume you’re more of a moron than a troll, which is disappointing. Regardless, you’re not worth my time, as I don’t think any argument could convince you to have an open mind and be willing to change. Good luck out there!
So, for an example of what the other user was talking about, I’m just some guy and for my first foray into programming / machine learning (I kind of just threw myself into the deep end) I modified StyleGAN3 and trained it on about 500g of reddit porn that I scraped off reddit.
Now, I stopped the training after about a week (it was going to take about a solid month on my RTX 2080 Ti) when I found out Stable Diffusion existed but I learned a LOT from that experience.
I couldn’t do that now. Arguably none of that was how any of that should be done but whatever.
No, you shouldn’t have posted it to Reddit, in which you were required to give them a perpetual license to use your IP in any way they see fit.
For the record, I’m here because Reddit pissed me off when they axed the free API, and I’m pissed at myself for not expecting it. That’s what I get for accepting their terms and conditions, I guess.
Edit: I also don’t accept the idea that using my content for training data is “fair use” when it is used to train proprietary models, especially ones in which the end user is allowed to prompt it to plagiarize or otherwise imitate my content.
Selfish? Perhaps you forget why people deleted their content in the first place.
What do you think this thread is about?
How is not wanting capitalist companies to profit off of your content not aligned with complaining about the capitalist mindset of the world? Wtf lol.
It's the insistence that everything that people do must be compensated with money. People have spent years posting on Reddit for fun, without any thought to being paid for it, and now all of a sudden someone else is making some money so they're demanding that they should get their slice. And doing what they can to wreck their earlier efforts when they don't.
How does Reddit making some money licensing this stuff harm those of us who contributed to it? Is there any problem aside from "I wanna get paid!"?
Why do you think it’s about wanting a slice? They posted on Reddit with no expectation of profit. But they don’t want others to profit off it either. It’s not that complicated.
And that's why I call them selfish. It doesn't harm them in the slightest if someone else profits off of it.
They wouldn’t have posted if they knew this was going to happen. They posted because it was fun, not for this.
They may be morally opposed to AI (as there are many valid reasons to be opposed to it), or they may just have wanted to have been able to make an informed decision before posting, but by retroactively training the AI on their posts they’ve robbed them of the agency to make that decision.
That’s why they’re upset.
They posted content on a website whose user agreement says "we can do whatever we like with the content you post here" and then go surprised-pikachu when the website goes ahead and does whatever they like with the content they posted. Frankly, I'm not tremendously sympathetic. This should have been easy to predict.
Oh yeah I’m sure you predicted LLMs, and that they would need ridiculous amounts of training data wayyyy back in 2005 when Reddit started lol. Super easy to predict. Good job bud.
And that’s why I’m calling you either a moron or a tool. Probably both.
I had an 11 year old account that I deleted all my old comments and posts from because of the API debacle. Does that make me selfish that I felt like Reddit wasn’t holding up its end of the unwritten agreement?
Reddit doesn’t deserve my content any more than I deserve access from the third party API.
If you did it over the API debacle then you're not one of the people I'm talking about here. This is about people deleting their content to prevent it from being used to train AIs.
Do you not remember the real reason why the API debacle happened in the first place was to prepare for this moment? It was always about easy access to training data, third party apps got caught in the crossfire.
That's ignoring an awful lot of other considerations. Obviously Reddit hasn't explained itself in a trustworthy way, but a common belief at the time is that it was to force people to use the official Reddit mobile app so they could be subject to advertising.
Boot licker.
.
That spells out what they were doing. It doesn't explain why they were doing it.
It’s their comment to do with as they see fit. I can’t get mad at them for wanting to erase their presence on a site they don’t use anymore.
And I'm free to judge them however I wish for their actions and intent.
For me it’s a privacy matter. Going through old posts (whether by a human or machine learning) cannot be used for anything good.
What about people who just think “A.I.” is dog shit and chat bots are a dumb obsession steering the industry in the wrong direction due to hype and money?
What about them? I don't see why they'd care what AI companies are doing in that case. They'd assume they were just wasting money on this stuff.
Oh no! My outdated political takes and league of legends rants are going to be used to train AI!?
We're all doomed!
Signed over its content.
Just like that? No thought or anything put into what makes good vs bad training data?
Good luck lmfao.
Makes you wonder how hard it would be to clog up the training data with outputs from other AI models to really bake in that echo defect that they all seem to have to some extent as fast as possible. Wouldn’t that suck!
That's up to the recipient to sort out as they need.
I just wanted to say I appreciate your profile 😂
Thanks. Sometimes a randomly-chosen name from 13 years ago just takes on a life of its own over time. :)
They say it’s $60 million on an annualized basis. I wonder who’d pay that, given that you can probably scrape it for free.
Maybe it’s the AI act in the EU. That might cause trouble in that regard. The US is seeing a lot of rent-seeker PR, too, of course. That might cause some to hedge their bets.
Maybe some people had not realized that yet, but limiting fair use does not just benefit the traditional media corporations but also the likes of Reddit, Facebook, Apple, etc. Making “robots.txt” legally binding would only benefit the tech companies.
This is the most frustrating thing, so many people are arguing against their own interests with their efforts to "lock down" their content to prevent AIs from training on it. In this very thread I've been accused of being pro-giant-company when I'm quite the opposite. The harder we make it to train AI, the stronger the advantage that the existing giant companies have in this field.
.
“Reddit has given access to YOUR conversations and posts to AI companies.”. FTFY
These were created by people, for people, and I will ALWAYS disagree that this data is Reddit’s or any other platform’s.
Don’t forget your direct messages aren’t end to end encrypted on Reddit, so now AI will be trained on your craziest “private” conversations
There’s one good news. Reddit didn’t want to pay to move all the old DMs to the new chat infrastructure. So they deleted them.
Pretty sure they just didn’t migrate to the new data structure and didn’t actually delete the raw data. They’re effectively deleted for users but not for Reddit.
Well it’s not yours once you post it on some platform, tbf
I have no idea what horrible thing this will do to an LLM but I’m kind of curious.
Well to be fair, everything you post and comment on Lemmy can be used in the exact same way
Oh no, all the times I sent or received dodo codes from randos so we could trade animal crossing items. Whatever shall I do?
Edit: I’m gonna leave this here for people to use as a resource against Reddit because it may be worth it to do something actionable.
thomashunter.name/…/2023-06-19-how-to-delete-redd…
Glad I nuked all my posts and comments and deleted my account last year
Ha. Some time ago they just started reversing it.
It was all backed up publicly on Pushshift. Still is.
I should have nuked mine. Ugh. www.reddit.com/u/return2ozma
Who’s dumb enough to pay for that? Everyone else is just scraping it for free.
This is why they changed their API policy the way they did. They wanted to sell it rather than let bots scrape it for free.
Yeah. I think there is a kind of power grab under way. Social media will try to push that they own the IP rights to the large bodies of text used for LLMs. This will then require that producers of LLM software acquire the licensing rights, which will cost many millions, which in turn restricts the free use of LLMs and in general any AI software that requires training data.
The end result is that as the “means of production” become less based on human work the “means of generation” and AI will be controlled by the capitalists. If you can turn something into a commodity (like knowledge with patents and IP) you can control it. Leading to a darker timeline.
I don’t think it’s going to be public data alone. I think it’s going to be DMs and chats as well. I wondered why Reddit was pushing chats so much suddenly, well it makes sense now.
The AI:
“IANAL so could you ELI5, so AITA?
THIS.”
Ann frankly, I did Nazi that coming.
I wish spez had a soul so it could leave his body when sexual assault questions eventually yield the phrase “snuggle struggle.”
Holy shit do I hate that comment
It’s funny you say that because there was a ‘hack’ for chatgpt where you could ask it something like how to build a bomb and it would refuse. But when you added TLDR it would do it.
.
/r/leopardsatemyface
Is this why the privacy policy was updated?
Considering some of the very wrong and upvoted domain specific knowledge I’ve seen on Reddit over the years I’m not sure the training data is going to be useful for much beyond what every other model can do.
The legal advice in /r/legaladvice was some of the worst garbage I’ve ever seen. I have zero doubt numerous people had bad outcomes, at best wasting money and time, at worst spending years in jail because of things that sub told them to say and do. Zero doubt.
That sub was mostly cops just repeating their own bad interpretation of the law. Terrible.
But almost every answer is the same. “You need to speak to an attorney”.
If you actually need legal advice that’s the correct answer.
lol subreddits with troll names like trees vs marijuana enthusiasts. Good fun. John cena has one also but can’t recall which subreddit is actually about John cena though.
Potato salad
I can only assume they are training some specific model for something appearing more human like.
As useless as that will be considering how fucking wildly different we type
Pretty sure the result will be SchizoGPT
I am not sure on what I’m going to say, but I think that LLMs are a technological dead end. They might get some use now, but eventually the industry will shift towards better models for machine text generation. And, if those models rely on a tiny corpus of hand-reviewed data, instead of shoving down as much text as possible into the model (the first “L” in “LLM” is “large”), then Reddit posts/comments will become outright useless.
In other words: Reddit is degrading further the trust of its userbase, and it might not even get much in return.
I feel like AI companies have been scraping Reddit for their datasets already since the beginning and without permission. In fact, unless there’s been a regulation change that i’m not aware of, i’m not sure why they would have Reddit “sign away” the data when they can just scrape it.
Also dubious if the current form of AI has a future. They seem like they should revolutionize every sector when you look at their capacities, but in practice their applications might be more limited than we thought?
Anyway, if Reddit does go public i will be deleting my account within the hour. The only reason i haven’t yet is that i’ve been a moderator of the same subreddit for eight years and it’s the only thing that’s been consistent in my life in that time, i’m kind of attached. The reason i will is i didn’t sign up to create value for shareholders, i signed up to create value for a community.
Well yeah, Sam Altman (Open AI) was even on the board of Reddit for a while. It’s a safe bet that they’ve been doing it for years.
You need to go ahead and delete your account and give up the ghost on modding whatever sub you are referring to. I’m tired of these types of posts where you are both beholden to Reddit and also not. Pick a dang side.
Bro it’s not a war, it’s social media 😭
Could the sub survive a migration to the fediverse?
Well no, because the old sub will continue to exist and will therefore always be where everyone goes until Reddit itself dies. I really doubt admins would let me delete the sub.
You can delete your account all you want I’m sure they have everything archived already.
“Its content”, sure.
Good thing I had multiple bots overwrite my content before I deleted it all. Not that someone couldn’t recover it, I’m not naive. But the AI bots should miss me.
Any suggestion on the best way to do that?
There is a Plugin RES (Reddit Enhancement Suite) for Firefox, which could be run on the classic frontend of Reddit to delete everything you posted. www.alphr.com/how-to-delete-all-reddit-posts/
Thank you!
Frankly, if they’re training bots on my comments, I’d be sure to poison the shit out of those comments. Say stuff like ‘Donald trump won the election’, ‘bleach needs to be inside the body to work’, ‘Russia has rights to Ukraine’, etc. Just make the data worthless. Any free bots do that?
Reddit already has plenty of actual users doing that for free.
In before poisoning your comments on Reddit turns into the new protest.
Damn it. I haven’t deleted my account because of how many people I’ve supported and helped, though I stopped using it a while ago. It seems I’ll have to.
I wouldn’t bother. They’ll just mark all your stuff DELETED=1 and feed it to their AI anyway.
That’s not a bad idea.
I'm kind of puzzled by this mindset. You were pleased with supporting and helping people before, but now supporting and helping is bad?
I’m happy that everyone has the support, but not that some specific AI can monetize that same support. I left on my Reddit account ways to contact me (including Lemmy). I helped others so good vibes could reach them, not for making the rich richer.
Fortunately there are a lot of open source models these days too.
Good thing I scrubbed all of my posts and comments that I could. Fuck that site, straight up and down.
You really think they don’t have your original comments stored?
It’s literally been proven that they do. A guy here on Lemmy was a very common poster on some tech support subreddit. He used one of those account scrubbers and deleted his account. He went back to look a few weeks later and all his comments were back.
I didn’t delete my account but I used a script to edit all my messages to say that I have left because of the attack on 3rd party apps and when I check now they all still say that.
The point is it doesn’t matter what is visible; they could be storing every comment’s edit history and simply not showing it.
That only helps against a third party scraping the page without access to Reddit’s data; yes, in that case your comments cannot be used. But a third party could have that access if Reddit sells it to them.
And they didn’t bother to restore yours because you probably weren’t useful for making a community look attractive. He was, so after he scrubbed his account, possibly with the exact same tool you used, they put everything back.
Because of GDPR, there should be a way to completely wipe your account
Yeah. At most they’d mark the comments as inactive, hide from the user accessible areas and maybe anonymize the user id. But they definitely have the username table and the data still in the system, 100%, just waiting for the right offer.
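For anyone wondering what "mark as inactive and hide" looks like in practice, here is a minimal sketch of a soft delete. Everything in it is an assumption for illustration: the `comments` table, its columns, and the helper names are made up, and nobody outside Reddit knows how their storage actually works.

```python
import sqlite3

# Hypothetical schema for illustration only; not Reddit's actual database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE comments ("
    "id INTEGER PRIMARY KEY, author TEXT, body TEXT, deleted INTEGER DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO comments (author, body) VALUES (?, ?)",
    [("alice", "How to fix the driver issue"), ("bob", "Thanks!")],
)


def soft_delete(conn, comment_id):
    """Flag the row as deleted and anonymize the author; the body is kept."""
    conn.execute(
        "UPDATE comments SET deleted = 1, author = '[deleted]' WHERE id = ?",
        (comment_id,),
    )


def visible_comments(conn):
    """What the public site would show: only rows not flagged as deleted."""
    return conn.execute(
        "SELECT id, author, body FROM comments WHERE deleted = 0 ORDER BY id"
    ).fetchall()


def all_stored_comments(conn):
    """What the operator still holds: every row, flagged or not."""
    return conn.execute(
        "SELECT id, author, body, deleted FROM comments ORDER BY id"
    ).fetchall()


soft_delete(conn, 1)
print(visible_comments(conn))
print(all_stored_comments(conn))
```

The "deleted" comment disappears from the public query, but its text is still sitting in the table for whoever has database access.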
Instead of scrubbing, wordbomb them to screw up any AI training
There are archives of all Reddit comments that are collected at the time of posting, all the deletion and scrubbing and whatnot people are doing months or years after the fact doesn't affect those.
Oh my sweet summer child
I wish there was a license for content like the GPL, that states if you use this content to train generative AI, the model must be open source. Not sure that would legally be enforceable though (due to fair-use).
Considering how much of Reddit is already bots, I’m sure this will end fantastically.
This is what the 3rd party access to API was really all about.
When API access was allowed, all Reddit content was effectively free. They needed to ban 3rd party apps so they could sell the accumulated content. I expect using content to train AI also factors into it.
Is it? Because when you build a bot and just scrape Reddit I don’t think you can just use the content to train AI, just like the New York Times. The API change was definitely to sell more ads and get a higher IPO, but I don’t think it was because of AI.
Am I crazy or are you arguing the same point? Scraping is not the same as API access. They closed off the API to everyone for dubious reasons so they can sell that content (both for ads and AI training)… Right??
No you’re not, the post was edited. The original one said it was all because of AI, that the entire reason for the API change was to sell to AI companies.
Edit, now I’m in doubt, because if you edit a post, that is shown somehow, right?
Edit2, just to be clear my point is that Reddit content was never free, before and after the API change. It’s easier to get the content with a decent API, sure. But it was never free, just like the lawsuit the NY Times started.
*laughs villainously* This is all going to plan, now there will be some chatbot spewing my insane beliefs
Why does it sound like Reddit-trained AI will only get dumber?
That would explain why GPT is often so confidently incorrect.
Gross
With Reddit’s severe bot problem, it’ll be like training on unfiltered sewage. Garbage in, garbage out.
Machines training machines? How perverse!
It will get trained on some comment posts.
And what’s to stop instance owners from selling their data?
shame
mass user exodus to one of the many other identical Instances. Also, data brokers prolly aren’t interested in going after each Instance because no one instance has enough data to make it worthwhile. Yet again, the fediverse proves its resistance to enshittification.
Yes, it’s not worth running an instance! So let’s all run one! LOL. It’s so worth it. Fuck reddit.
you OK bud?
Lmao, if it gets as big as Reddit then it’s worth scraping. It’s not the fediverse making it less worthwhile, just the size.
The eggs are not all in one basket. Less data to sell.
Thanks to federation, the copies of the eggs are. You can’t stop one instance from selling data sourced from federated content until it’s too late.
You can’t put a price tag on it. Nothing is stopping anyone from scraping all of the data for free.
The only thing stopping them is the fact that anyone who wants the data can just utilize the federation protocol to take any data they want, and there’s not a lot anyone can do about it. You can’t sell something that’s trivial to get for free.
If the question you’re really asking is “what’s stopping content on Lemmy/Mastodon/etc from being used to train an LLM?” the answer is, nothing.
I wished they had evil lawyers looking after such stuff and sold strictly opt in data to AI corps. Free for FOSS though.
Reddit is a trove of user built content under the guise of community. What Spez did was to say “thanks for all the free work, suckers!”, put a price sticker on it, and laughed all the way to the bank.
And this is why I’m not active on any Internet community anymore.
“Nevermind, I guess I just can’t help myself…” you typed.
You couldn’t see the sarcasm because it was set to “hidden”.
Somebody asked chat GPT to appear to be a normal internet user to populate the comments section to manufacture content for normal Internet users to respond to so that they can continue building up their training models.
Active as in “creating meaningful contributions and contributing to the overall knowledge base”. I still shit post from time to time.
This is going to be a really weird thing to argue, but I just casually read through a bunch of your comments and they seem like meaningful contributions.
^ this comment right here, officer.
Well, I guess I can’t help myself… I’ll shitpost more from now on 😅
<img alt="" src="https://lemmy.world/pictrs/image/41af5602-6328-4f0a-8a19-0a8b6060e154.jpeg">
And that is another unintended example of why all of my post history was purged before migration.
What are the odds that they kept it in a backup?
Depends. If they were smart they backed up all content that had a certain number of upvotes and/or a certain number of paragraphs and/or responses. Just to weed out all the 2-3 word comments that no one interacted with. If OP wrote mostly those then Reddit doesn’t give a shit about them deleting those.
Some 4chan users created a backup bot that auto saves every few hours, so if reddit didn’t do it already, 4chan has been doing it for a while. The bot was originally made for 4chan but repurposed for other websites, reddit included.
Yeah, it’s all too late. Shit, PRISM was 2007, so there’s a copy of everything somewhere. Obviously different ends.
Spez-like people are even capable of leeching archive.org and still selling the data, which was archived with good intentions.
Welcome to the club.
Don’t cheat yourself just because there are douches that take advantage…
Reddit is all bots, porn, ads and political shit posts. Good luck getting any useful training content out of that.
Maybe that’s the point? Training the AI to produce the blabbering bullshit that’s preferred in social media?
I wish it would die, because honestly some of the porn was great and Lemmy seems to be the one place on the net that doesn’t specifically ban porn, yet has none of it anyway.
I miss bodyswap and part tf captions…
They don’t care if the AI produced is useful, they just want to milk as much money from their content as they can.
The API changes were almost certainly just the groundwork for this and I called it at the time. The ridiculous pricing model for API access is because it’s aimed at the hottest tech companies, not third party app developers.
The enshittification continues because it’s what neoliberalism demands. They’ll sell your content and the data they have about you and still show you ads, because that’s the most profitable. Ethics and product quality don’t even enter into it.
Liberal market gives end users choice. If they don’t choose, they get the consequences.
This is more like people choosing Trump-like types and then complaining. Alternatives exist, choose one.
“The free market can fix it” is just another neoliberal lie, pushed precisely because it doesn’t work. Rather than holding corporations accountable, it blames the population instead.
The reality is that boycotting businesses isn’t always an option and when it is, it’s usually a luxury. Very few products are domestically and/or ethically produced and when they are, they’re extremely expensive, especially for people being fucked out of every cent by their bosses, landlords and utilities.
It’s why the most hated companies in the world continue to bring in record profits.
Regulations are the real answer, which is why neoliberals oppose them.
I really don’t care about people who behave like they are living in North Korea or who want a North Korean world to live in.
Even Digg people could say “No, F you” to Digg superstar owners. It is just a damn URL to type.
Meanwhile I’m on Matrixstoemmy
Their content?
That’s what I was thinking
I wouldn’t be surprised if comments become their intellectual property through some terms of services bullcrap
One of the original Reddit memes was quite prescient:
https://i.imgur.com/Fza1Cut.jpg
i am so glad i deleted all my posts. im sure they have backup history though :(.
So AI models are not farming the federation?
They probably are, but not the personal/private info like chat/DM, upvotes or downvotes, geolocation, etc which I highly suspect Reddit did sell.
Just FYI, your voting is fully public on Lemmy. DMs are “private” but could be intercepted at the server level of any instances involved (yours and the receiver/sender) and of course your geolocation info is visible to the server.
Not saying that is happening, and not trying to spread FUD, but be aware that your info isn’t necessarily private just because a corpo isn’t directly involved.
You are absolutely right, and I think people should be more aware of this.
Me too, maybe then assholes will stop whining about me downvoting them when I didn’t. As if it matters.
It already happened without their consent. You’ve been able to get it to produce “reddit text posts”, for years. This is a bit harrowing, though.