autotldr@lemmings.world
on 23 Feb 2024 02:05
This is the best summary I could come up with:
Earlier this week, Bloomberg and Reuters reported that a “large unnamed AI company” — possibly Google — had entered into a licensing agreement worth about $60 million on an annualized basis.
“[Our] data APIs are able to provide real-time access to evolving and dynamic topics such as sports, movies, news, fashion, and the latest trends,” the prospectus continues.
“We believe that Reddit’s massive corpus of conversational data and knowledge will continue to play a role in training and improving large language models.
Content producers, from stock media libraries to news publishers, are increasingly turning to data licensing agreements with AI vendors as chatbots like OpenAI’s ChatGPT and Google’s Gemini threaten to sap traffic.
Vendors, in turn, have been spurred to pursue licensing agreements as they face a deluge of lawsuits alleging that they have no legal justification for training their models on data without permission or payment.
OpenAI, for one, has agreements in place with image gallery Shutterstock as well as publishers including Axel Springer, the owner of Politico and Business Insider.
The original article contains 564 words, the summary contains 172 words. Saved 70%. I’m a bot and I’m open source!
just_another_person@lemmy.world
on 23 Feb 2024 02:13
Our data, you mean?
gregorum@lemm.ee
on 23 Feb 2024 02:21
well, not mine. i used a script to replace all of my comments with gibberish before i deleted them and then my account. if they went back and restored my comments, then all they’ll get is comments full of gibberish, especially since i overwrote them 3 times before deleting them, just in case they tried to roll back to the previous version.
have fun with that!
Yawweee877h444@lemmy.world
on 23 Feb 2024 02:26
I like your style, but honestly I wouldn’t be surprised if they keep every single version.
gregorum@lemm.ee
on 23 Feb 2024 02:28
i bet they do now, but i’ve checked back now and then, and all of my comments and posts are most assuredly gone.
edit: i’ve gone back to check some old haunts, places i know i’ve commented, and i did some searching with google using my old usernames, as google uses its cache to match to the posts/comments, even though they’re not there any more.
i see old posts that are graveyards of deleted comments, some with simply deleted accounts, and many others where both the account and comment are deleted. i don’t see any gibberish comments. the ones i know are mine (because replies quote the comment above, which i recognize as mine), are all just deleted in their entirety, so it seems they didn’t do comment versioning, at least not past the first edit. i see no posts under any former username of mine.
the efforts to scrub my content from reddit last May appear to have worked. sadly, since the API lockdown, those tools no longer work.
snooggums@midwest.social
on 23 Feb 2024 03:05
Just because it shows [deleted] doesn’t mean the data were deleted. That is most likely just a flag for the comment.
They most likely keep every save since they decided to do the sell-the-data thing. Why would google pay them for what google could easily scrape, other than to get the full history?
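To make the "just a flag" idea concrete, here is a minimal sketch of a soft delete. It is purely illustrative: the `Comment` class, the `deleted` field, and the `render` function are hypothetical names, and nothing here is known to match Reddit's actual backend.

```python
from dataclasses import dataclass


@dataclass
class Comment:
    # Hypothetical record; real schemas will differ.
    author: str
    body: str
    deleted: bool = False  # the "flag" described above


def soft_delete(comment: Comment) -> None:
    # Only the flag changes; the stored text is left untouched.
    comment.deleted = True


def render(comment: Comment) -> str:
    # The public view hides flagged comments behind a placeholder.
    if comment.deleted:
        return "[deleted]"
    return f"{comment.author}: {comment.body}"
```

In a design like this, `render` shows `[deleted]` while `comment.body` still holds the original text, which is the scenario the comment above is worried about.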
As I mentioned, I overwrote the comments several times before deleting them. I seriously doubt that they saved multiple versions of the comments. I know that, towards the end of May, they made some backend changes to try to circumvent users’ attempts to delete their accounts, but I did all of this to my account a couple of weeks before that.
SparrowRanjitScaur@lemmy.world
on 23 Feb 2024 03:37
No, I think the other commenter is right. They definitely store every version of every comment on their backend. Just because it’s not displayed publicly doesn’t mean they don’t have the data.
FeelThePower@lemmy.dbzer0.com
on 23 Feb 2024 03:41
if you request your data from them under CCPA, and it shows the edited comments as gibberish, you’re good. I did the same thing, but I left the comments to simmer for a long time, like months.
T156@lemmy.world
on 23 Feb 2024 04:03
I imagine that they would. Text is trivially easy to store, and storing multiple versions would let them catch users who might edit away rule breaking they posted to avoid bans, but it’s probably one of those internal tools.
From a data handling perspective, it’d be more efficient to handle edits by having a common id field and an additional version/edit counter that increments (adding the edit like it’s a normal post) than it would be to edit data the usual way, since you don’t have to go back over the whole database to find the comment, or worry about it falling out of sync if one copy of the database has the edit and the other has the original.
You’d just need to fetch the comment by id, and the database entry with the highest version count to display it, which would be fairly easy to do.
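A minimal sketch of that append-only scheme, using Python's built-in `sqlite3`. The table name `comment_versions`, its columns, and the helper functions are all assumptions made up for illustration; this is not a claim about how Reddit stores comments.

```python
import sqlite3


def open_store() -> sqlite3.Connection:
    # In-memory database, just for the sketch.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE comment_versions (
            comment_id INTEGER NOT NULL,  -- the common id shared by all edits
            version    INTEGER NOT NULL,  -- counter that increments per edit
            body       TEXT    NOT NULL,
            PRIMARY KEY (comment_id, version)
        )
        """
    )
    return conn


def save_edit(conn: sqlite3.Connection, comment_id: int, body: str) -> int:
    # Every edit is inserted as a new row; older rows are never modified.
    row = conn.execute(
        "SELECT COALESCE(MAX(version), 0) FROM comment_versions WHERE comment_id = ?",
        (comment_id,),
    ).fetchone()
    version = row[0] + 1
    conn.execute(
        "INSERT INTO comment_versions (comment_id, version, body) VALUES (?, ?, ?)",
        (comment_id, version, body),
    )
    conn.commit()
    return version


def current_body(conn: sqlite3.Connection, comment_id: int):
    # Display path: fetch by id and take the highest version.
    row = conn.execute(
        "SELECT body FROM comment_versions WHERE comment_id = ? "
        "ORDER BY version DESC LIMIT 1",
        (comment_id,),
    ).fetchone()
    return row[0] if row else None


def history(conn: sqlite3.Connection, comment_id: int) -> list:
    # Internal path: every version ever saved, oldest first.
    rows = conn.execute(
        "SELECT body FROM comment_versions WHERE comment_id = ? ORDER BY version",
        (comment_id,),
    ).fetchall()
    return [r[0] for r in rows]
```

With a layout like this, overwriting a comment with gibberish only changes what `current_body` returns; `history` would still hand back the original text.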
ColeSloth@discuss.tchncs.de
on 23 Feb 2024 04:44
That literally means nothing at all on their server backup from the year before. You could delete and rewrite your comments a thousand times and it would do you the same amount of good as one time, and barely any better than doing it no times at all. Your entire 15 year comment history would probably take up 10MB of space at most. They’ll have several backups taken over the last decade. They aren’t just going to be selling off the live servers’ info.
Telodzrum@lemmy.world
on 23 Feb 2024 03:13
Here’s the thing: Nothing in Reddit’s history indicates that they are that competent.
FoxBJK@midwest.social
on 23 Feb 2024 14:09
I put my account’s comments through a mass-delete app around the time of the big protest, and a couple weeks later I found every single one was restored.
People can be incompetent for years and then suddenly start figuring things out.
abhibeckert@lemmy.world
on 23 Feb 2024 06:06
Reddit used to be open source and the source is still on GitHub as a read-only archive.
AFAIK back then edit history was only kept briefly. Enough to roll back an accidental edit (if you have admin privileges anyway) but not far enough back to view old versions of posts.
Of course, they would have backups, and maybe the code has changed, but I wouldn’t be surprised if it hasn’t changed and those backups are impractical (slow/expensive) to access.
Keeping old revisions is a common practice but it’s also expensive and in reddit’s case totally unnecessary.
manuallybreathing@lemmy.ml
on 23 Feb 2024 13:14
you can request your reddit data, and they provide every comment along with edits as far as I remember. it was uncomfortable, but i’d never posted anything regrettable at least
imagine getting your hands on u/spez’s reddit data
I did the same, but we’re both fools if we think reddit didn’t keep every character we typed (let alone submitted) in a private, proprietary database.
We weren’t paid for our data. We were given access to a website free of charge. The consent we gave was supposed to be for the operation of the website, not for training AI.
They should fucking pay us.
ColeSloth@discuss.tchncs.de
on 23 Feb 2024 04:38
Yeah…all that comment data isn’t really that large. They’ll likely have backups captured from several years back. All you can view is the info on the current live servers. You might have kept them from getting like 3 months’ worth of your comments at best.
Showroom7561@lemmy.ca
on 23 Feb 2024 06:51
LOL. I did the same. And I confirmed many months later that the comments were not restored.
Now I hear that Google wants to train their AI on reddit content. Haha. Good luck with that, Lorem Ipsum! 😁
DreadPotato@sopuli.xyz
on 23 Feb 2024 08:59
If you actually replaced with “Lorem ipsum” texts, it would probably be easy to filter the garbage from the dataset.
Also, they probably have copies of the comments before the edits that are just not presented in the frontend.
Showroom7561@lemmy.ca
on 23 Feb 2024 13:07
I didn’t. At first, it was basically a long ass message about deleting my comment out of protest. Then a few subreddit mods banned me, so I changed them to “almost makes sense” word salad 😂
I ran the script, changing the text each time, several times for good measure.
They still haven’t reverted it, and it’s been more than just a few months now.
dohpaz42@lemmy.world
on 23 Feb 2024 02:18
I wonder if there is any legal standing for users to sue Reddit for a fair share of those profits. That’d be nice if it could happen. But i suspect, probably not.
wise_pancake@lemmy.ca
on 23 Feb 2024 02:25
Their TOS says they own your content in any current or future formats or derivative works.
The TOS shouldn’t hold up in court. A contract must be an exchange of two things, e.g. money for a product or service. You can’t say “Our service is free of charge!!!” and then in the fine print “(((But also you agree to give us everything we can take free of charge)))”.
The issue is how everyone does it. Facebook and Google started when data had no value, now they’re amongst the wealthiest businesses in the world. Now, Microsoft have joined in, even though you already pay for their products and services anyway!
However, the other aspect is that everyone is a victim. Lawmakers are the victim. They still haven’t quite yet realised how much is being taken from them (at least $50 per year, probably more like $1,000 per year if not more for prominent figures) but they are still being abused.
It’s like that form of bank fraud, where the criminal takes pennies from accounts, hoping the user won’t notice and the bank will write it off. Do it to enough people and enough times and you can make millions. They do this to everyone and they make billions.
Either the data is public domain and they don’t have to pay for it, but also cannot charge others for it, or the data is private and they must pay the author a fair share.
ColeSloth@discuss.tchncs.de
on 23 Feb 2024 04:46
No, it isn’t. The website is offered free of charge, regardless of whether you provide them data or content. The exchange for data/content is a second transaction tucked away in the terms and conditions, and the website offers nothing in return for that.
The reason the 2nd exchange is hidden in the terms and conditions is to intentionally hide what the user is giving away, such that the user cannot make a fair value assessment. It is fraudulent and deceptive.
Their TOS says they own your content in any current or future formats or derivative works.
Their ToS could say they own you and your children and grandchildren, but that doesn't make it enforceable.
If I post a frame from the movie Akira on Reddit would any reasonable person suggest that they own not only that frame, but also the entire movie that it came from as a derivative work? There is a glut of second-hand data just like that all over Reddit, Twitter, and every other social media network, and I'm willing to bet that's also part of what's being sold.
But hey... I'm not saying you're wrong, just that the idea that they automatically "own" the things that people post on their website is ridiculous. It's a bit like UPS or FedEx saying they own the contents of your package while delivering it.
QuaternionsRock@lemmy.world
on 23 Feb 2024 07:09
It is true that Reddit does not hold a valid license to content that is
1. sufficiently long-form, unique etc. to be copyrightable, and
2. posted by someone other than the copyright holder or someone with a sufficient license.
However, as far as I understand it, the extent to which Reddit—a content provider and social network—is legally required to remedy this is to comply with DMCA requests and review reported content. Perhaps there is a higher standard that I am not aware of?
And yet that exact kind of data is all over reddit in ways that are impractical to police through case-by-case DMCA requests. How many memes are there using footage from popular shows? How much fanart?
More importantly, is that stuff not included as part of the data that reddit "owns" when they sell their data to tech companies? Because whether a DMCA takedown has been requested on that kind of data or not, doesn't change the fact that they don't hold the copyright in the first place. How can they sell things that they don't even own?
Something smells. The logic of this entire industry doesn't add up.
QuaternionsRock@lemmy.world
on 23 Feb 2024 17:21
The answer is that it’s more practical than any alternative.
Copyright holders can’t sue Reddit for selling access to copyrighted content (before Reddit receives a copyright claim) because there is no way Reddit could reasonably distinguish between original and copyrighted content. Reddit users violate copyright law and the ToS in submitting copyrighted content, and Reddit is only required to take action as they are made aware of the content’s copyright status.
It would be trivially easy to circumvent Reddit’s ToS otherwise: I could create some original content, sell my copyright to a friend for $1, and immediately put Reddit in violation of copyright law by submitting the content to Reddit. My friend could go after Reddit, and Reddit could go after me, but my friend would likely get more out of Reddit than Reddit could successfully get out of me.
It’s the same reason publishers can’t sue Cloudflare for hosting a piracy website unless they refuse to take it down, nor can they sue Facebook for ad revenue earned from banners placed next to a copy+paste of a New York Times article. The content providers do not knowingly/intentionally violate copyright law, and they make reasonable attempts to prevent/rectify it. Without such limitations on legal standing, the internet becomes a way bigger mess than it already is.
Reddit licensing/selling copyrighted data to other parties.
The DMCA covers hosting and dissemination. If a user submits copyrighted data to Reddit that they do not own and Reddit unknowingly hosts it (because, to be fair, they can't know what is or isn't owned or by whom), then Reddit is not liable for copyright infringement as long as they comply with DMCA takedown requests from people who claim to own the original IP.
But again, none of that implies that Reddit themselves (or Twitter, Facebook, TikTok, etc.) can realistically claim ownership over all of the data that is on their website. The reason they are subject to DMCA at all is because there is a globally shared assumption that data that users submit may or may not be owned by some other party, and while the DMCA protects them from being held liable for simply hosting and disseminating that data, it does not magically make them the owner of all data that hasn't had a DMCA claim made against it.
In other words, if I post a picture of Homer Simpson on Reddit (and there are many), it is ridiculous for anyone to suggest that they have any intellectual property rights over that picture, that character, any trademarks, etc., whether someone has made a formal DMCA takedown request or not. And if they don't own the picture, the character, the trademark, etc., then what exactly are they selling (licensing) and where did they get the right to sell it?
They might not be liable for just hosting/distributing it, but just like you can't sell someone else's car, you can't license out someone else's IP.
QuaternionsRock@lemmy.world
on 23 Feb 2024 21:28
I see your point, and I’m somewhat inclined to agree with you, but what Reddit is doing doesn’t seem very different from what Meta and friends have been up to for years. Reddit isn’t selling the rights to the content on their platform, nor are they attempting to. They’re effectively selling API access to its content, in bulk, to Google. I don’t see how that is legally distinct from Meta selling (insulated) access to its content via their ad platform. They are both monetizing data that is potentially copyrighted by other parties.
falkerie71@sh.itjust.works
on 23 Feb 2024 02:34
Yeah, probably not. When you signed up and agreed to their ToS, they don’t “own” your content, but you granted them a worldwide, royalty-free, perpetual, irrevocable, non-exclusive, transferable, and sublicensable license to use it without compensation.
From their ToS:
Any ideas, suggestions, and feedback about Reddit or our Services that you provide to us are entirely voluntary, and you agree that Reddit may use such ideas, suggestions, and feedback without compensation or obligation to you
Source: A pretty good post on r/HFY, though it is on Reddit, so don’t click it if you don’t want to :P
dohpaz42@lemmy.world
on 23 Feb 2024 02:55
But how many TOS have been shot down because they overreach? I don’t know. You’re probably right. But it’s still fun to imagine.
There is legal standing, IMO. You can’t take something without consideration, and access to the website was granted free of charge while the data collection was squirrelled away in the fine print. That isn’t a lawful contract; the fine print is for technicalities about the main transaction of X in exchange for Y. You can’t say “we’ll give you X for free!!!” then sneak into the fine print “(((you also give us Y for free)))”. The structure is clearly deceptive in a manner that is designed to prevent a fair assessment of the value being exchanged.
Insurers have to provide a “key facts page” where they summarise in plain English what you’re paying for. The fine print gives the detail, but the front page is still “we give you X in exchange for Y”.
You can’t build a car without paying for the nuts and bolts. Tech companies have placed themselves amongst the wealthiest businesses in the world without paying for the nuts and bolts we provide.
Hell, even Microsoft is in on it now, even though you pay for Windows and Office 365!
dohpaz42@lemmy.world
on 23 Feb 2024 03:53
Next question then: how do we mobilize into a class action against Reddit and google and Microsoft and whomever else?
So just enough to cover what it paid the CEO last year? ($192 million)
InfiniteStruggle@sh.itjust.works
on 23 Feb 2024 04:31
Lmao the ratfuck CEO is making his money alright. What a little weaseling coward. He spent so long jealous of his co-founders who got paid and left early that he probably sees this as his golden parachute.
reversebananimals@lemmy.world
on 23 Feb 2024 02:43
A good reminder to go back and edit all your comments to [removed] if you didn’t do so when you first left.
doingthestuff@lemmy.world
on 23 Feb 2024 07:07
I did, and then also deleted my accounts. I can’t believe how much bot content is on there now.
kjake@infosec.pub
on 23 Feb 2024 02:48
Good thing they monetized their API.
soratoyuki@lemmy.world
on 23 Feb 2024 03:24
Where’s my cut?
nullPointer@programming.dev
on 23 Feb 2024 06:09
what, you didn’t get the email offering the special stock price?
soggy_kitty@sopuli.xyz
on 24 Feb 2024 07:19
Yeah I got it, but unfortunately I don’t want to put money into Reddit. I heard some early adopters got a cash bonus instead, I’m very jealous
BigMikeInAustin@lemmy.world
on 23 Feb 2024 03:26
So Reddit charges users to create content (paid premium or by showing ads). And then it sells that content.
Making money both going and coming.
altima_neo@lemmy.zip
on 23 Feb 2024 04:06
And it also asks reddit users to invest in reddit
loool
Faceman2K23@discuss.tchncs.de
on 23 Feb 2024 03:31
Literally the only way they could become profitable.
I’m honestly more upset at this deal (I think it was google?) than the CEO pay thing, which is all stock options and mostly ragebait.
I expect to see them last 3-5 years and get bought out by some big tech firm, all current execs take their payouts, sell their shares and retire.
donuts@kbin.social
on 23 Feb 2024 03:38
"its data".
Ah yes... of course.
athos77@kbin.social
on 23 Feb 2024 04:20
In other news, spez's compensation from reddit last year was $193 million, and it's COO got a cool $93 million.
C'mon, spez, tell us again how horrible it's been that reddit's never made a profit.
killeronthecorner@lemmy.world
on 23 Feb 2024 06:21
“made”
Eggyhead@kbin.social
on 23 Feb 2024 06:28
“lots”
Jerkface@lemmy.world
on 23 Feb 2024 07:47
“/ᐠ。ꞈ。ᐟ\”
Socsa@sh.itjust.works
on 23 Feb 2024 14:54
Monty Python’s flying circus!
HotDogFingies@kbin.social
on 24 Feb 2024 02:04
"It's"
abhibeckert@lemmy.world
on 23 Feb 2024 05:57
Reddit doesn’t own that data. The community owns it.
Maybe there’s something in the terms of service but that shouldn’t hold water because nobody has ever read that document.
RedditWanderer@lemmy.world
on 23 Feb 2024 07:30
Spez will be happy to know his payout is covered, company is still going to be broke though.
Pixelemme@lemm.ee
on 23 Feb 2024 07:56
I wonder if they will ever consider paying the users for the content they provide that constitutes “its” data. 🤔
KISSmyOS@feddit.de
on 23 Feb 2024 08:45
The users get a service that costs hundreds of millions to maintain for free.
And no one is forcing them to post valuable content without compensation.
DreadPotato@sopuli.xyz
on 23 Feb 2024 08:53
Well there’s apparently more than 400 million active users every month, so they could charge users a few cents per month and pay for the infrastructure entirely. But they choose to be massive privacy invading assholes.
If they charged users any amount of money there wouldn’t even be 400000 of them anymore.
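For a rough sense of scale, the arithmetic behind both positions can be sketched like this. The annual cost figure is a made-up placeholder (the thread only says "hundreds of millions"), and the two user counts are the ones quoted in the comments above; none of this is actual Reddit financial data.

```python
def monthly_fee_per_user(annual_cost_usd: float, paying_users: int) -> float:
    # Fee each paying user would need to cover to fund the service for a year.
    return annual_cost_usd / (paying_users * 12)


# Hypothetical figure, chosen only to land in the "hundreds of millions" range.
ASSUMED_ANNUAL_COST_USD = 300_000_000

# Case 1: all ~400 million monthly users keep using the site and pay.
fee_if_everyone_pays = monthly_fee_per_user(ASSUMED_ANNUAL_COST_USD, 400_000_000)

# Case 2: only 400,000 users remain once a fee is introduced.
fee_if_most_leave = monthly_fee_per_user(ASSUMED_ANNUAL_COST_USD, 400_000)

print(f"everyone pays: ${fee_if_everyone_pays:.4f} per user per month")
print(f"most leave:    ${fee_if_most_leave:.2f} per user per month")
```

Under these assumptions the fee ranges from a few cents to tens of dollars per month, so the disagreement really comes down to how many users would stay, which nobody in the thread can know.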
DreadPotato@sopuli.xyz
on 23 Feb 2024 09:06
Yes there’d be fewer, but the amount is purely speculative and you don’t know any more than I do.
Even if they have to go with the ad-supported model to maintain a large active userbase, that can easily be done without all the tracking. But again, they chose the shittiest option…there’s really a pattern of them just being massive assholes. No matter what options they have, they’ll apparently go for the shittiest one that screws over the users the most.
Socsa@sh.itjust.works
on 23 Feb 2024 14:53
This was my attitude until reddit took away my app. Now the site is the poster child for enshittification
ours@lemmy.world
on 23 Feb 2024 13:41
Pay? They are trying to “stonk” Reddit users by asking them to buy stock for the IPO which screams “We want your data and your money!”.
Yeah. They are giving the users the “privilege” to buy shares at the open market rate. Not even at discounted rates. Again US only. What about the others? They just give their data I suppose.
dangblingus@lemmy.dbzer0.com
on 23 Feb 2024 14:06
They said they’ll be allowing users to cash out Karma.
masquenox@lemmy.world
on 23 Feb 2024 08:15
Reddit says it’s made $203M so far licensing ~~its~~ our data
Fixed that for them.
There’s your “tragedy of the commons” fallacy on a stick, folks - the proles were managing reddit so well that Huffman had to break it in order to make it more vulnerable to the parasites.
ooli@lemmy.world
on 23 Feb 2024 08:20
time to go edit all my old comments with random garbage generated by chatgpt
Evil_Shrubbery@lemm.ee
on 23 Feb 2024 13:21
Now AIs will fuck Spez for all eternity.
Millennia from now Fuckspez! will be the standard greeting between all sentient species in the galactic federation. It will be even used in machine code as a handshake for establishing initial contact between two subspace relays.
BilboBargains@lemmy.world
on 24 Feb 2024 07:01
Blow me
SomeGuy69@lemmy.world
on 23 Feb 2024 12:34
This is helpful of them, once the EU court fines them, we can quickly calculate how much that will be.
Evil_Shrubbery@lemm.ee
on 23 Feb 2024 13:26
… and that’s why Wikipedia is non-profit.
Seeing human (even shitpost) achievements get monetized (in the most sucky manner) one by one is sad af.
FlashMobOfOne@lemmy.world
on 23 Feb 2024 13:59
Probably doesn’t matter, but this is why I deleted my account instead of just locking it up.
webghost0101@sopuli.xyz
on 23 Feb 2024 15:00
With google being the partner i bet the deal won’t mention they can’t use older backups of reddit that they most certainly have.
FlashMobOfOne@lemmy.world
on 23 Feb 2024 15:06
I’m sure you’re right about that. All parties involved are scumbags, after all.
NegativeInf@lemmy.world
on 23 Feb 2024 16:20
This is why we need better data laws in the US. If I want everything I’ve ever said on your site to disappear both instantly and forever, I should have that option.
dangblingus@lemmy.dbzer0.com
on 23 Feb 2024 14:06
How? They’re under heavy scrutiny from the FTC over the $60M/year Google deal. Where did the extra $140M come from?
nyakojiru@lemmy.dbzer0.com
on 24 Feb 2024 06:24
They need to pay the users with that money
soggy_kitty@sopuli.xyz
on 24 Feb 2024 07:14
Me too. Feelin’ mighty fine about that decision now. Long Live Lemmy
<img alt="" src="https://lemmy.world/pictrs/image/86f0962f-0c7d-4864-a668-a264b4869363.jpeg">
Correct, our data.
I wonder if there is any legal standing for users to sue Reddit for a fair share of those profits. That’d be nice if it could happen. But i suspect, probably not.
Their TOS says they own your content in any current or future formats or derivative works.
I’d say Reddit would win.
The TOS shouldn’t hold up in court. A contract must be an exchange of two things, eg money for a product or service. You can’t say “Our service is free of charge!!!” And then in the fine print “(((But also you agree to give us everything we can take free of charge)))”.
The issue is how everyone does it. Facebook and Google started when data had no value, now they’re amongst the wealthiest businesses in the world. Now, Microsoft have joined in, even though you already pay for their products and services anyway!
However, the other aspect is that everyone is a victim. Lawmakers are the victim. They still haven’t quite yet realised how much is being taken from them (at least $50 per year, probably more like $1,000 per year if not more for prominent figures) but they are still being abused.
It’s like that form of bank fraud, where the criminal takes pennies from accounts, hoping the user won’t notice and the bank will write it off. Do it to enough people and enough times and you can make millions. They do this to everyone and they make billions.
Either the data is public domain and they don’t have to pay for it, but also cannot charge others for it, or the data is private and they must pay the author a fair share.
The exchange is you getting to be on reddit.
No, it isn’t. The website is offered free of charge, regardless of whether you provide them data or content. The exchange for data/content is a second transaction tucked away in the terms and conditions, and the website offers nothing in return for that.
The reason the 2nd exchange is hidden in the terms and conditions is to intentionally hide what the user is giving away, such that the user cannot make a fair value assessment. It is fraudulent and deceptive.
Their ToS could say they own you and your children and grandchildren, but that doesn't make it enforceable.
If I post a frame from the movie Akira on Reddit would any reasonable person suggest that they own not only that frame, but also the entire movie that it came from as a derivative work? There is a glut of second-hand data just like that all over Reddit, Twitter, and every other social media network, and I'm willing to bet that's also part of what's being sold.
But hey... I'm not saying you're wrong, just that the idea that they automatically "own" the things that people post on their website is ridiculous. It's a bit like UPS or FedEx saying they own the contents of your package while delivering it.
It is true that Reddit does not hold a valid license to content that is
However, as far as I understand it, the extent to which Reddit—a content provider and social network—is legally required to remedy this is to comply with DMCA requests and review reported content. Perhaps there is a higher standard that I am not aware of?
And yet that exact kind of data is all over reddit in ways that are impractical to enforce through case-by-case DMCA requests. How many memes are there using footage from popular shows? How much fanart?
More importantly, is that stuff not included as part of the data that reddit "owns" when they sell their data to tech companies? Because whether a DMCA takedown has been requested on that kind of data or not, doesn't change the fact that they don't hold the copyright in the first place. How can they sell things that they don't even own?
Something smells. The logic of this entire industry doesn't add up.
The answer is that it’s more practical than any alternative.
Copyright holders can’t sue Reddit for selling access to copyrighted content (before Reddit receives a copyright claim) because there is no way Reddit could reasonably distinguish between original and copyrighted content. Reddit users violate copyright law and the ToS in submitting copyrighted content, and Reddit is only required to take action as they are made aware of the content’s copyright status.
It would be trivially easy to circumvent Reddit’s ToS otherwise: I could create some original content, sell my copyright to a friend for $1, and immediately put Reddit in violation of copyright law by submitting the content to Reddit. My friend could go after Reddit, and Reddit could go after me, but my friend would likely get more out of Reddit than Reddit could successfully get out of me.
It’s the same reason publishers can’t sue Cloudflare for hosting a piracy website unless they refuse to take it down, nor can they sue Facebook for ad revenue earned from banners placed next to a copy+paste of a New York Times article. The content providers do not knowingly/intentionally violate copyright law, and they make reasonable attempts to prevent/rectify it. Without such limitations on legal standing, the internet becomes a way bigger mess than it already is.
I think you're conflating two very different things here.
The DMCA covers hosting and dissemination. If a user submits copyrighted data to Reddit that they do not own and Reddit unknowingly hosts it (because, to be fair, they can't know what is or isn't owned or by whom), then Reddit is not liable for copyright infringement as long as they comply with DMCA takedown requests from people who claim to own the original IP.
But again, none of that implies that Reddit themselves (or Twitter, Facebook, TikTok, etc.) can realistically claim ownership over all of the data that is on their website. The reason they are subject to DMCA at all is because there is a globally shared assumption that data that users submit may or may not be owned by some other party, and while the DMCA protects them from being held liable for simply hosting and disseminating that data, it does not magically make them the owner of all data that hasn't had a DMCA claim made against it.
In other words, if I post a picture of Homer Simpson on Reddit (and there are many), it is ridiculous for anyone to suggest that they have any intellectual property rights over that picture, that character, any trademarks, etc., whether someone has made a formal DMCA take down request or not. And if they don't own the picture, the character, the trademark, etc., then what exactly are they selling (licensing) and where did they get the right to sell it?
They might not be liable for just hosting/distributing it, but just like you can't sell someone else's car, you can't license out someone else's IP.
I see your point, and I’m somewhat inclined to agree with you, but what Reddit is doing doesn’t seem very different from what Meta and friends have been up to for years. Reddit isn’t selling the rights to the content on their platform, nor are they attempting to. They’re effectively selling API access to its content, in bulk, to Google. I don’t see how that is legally distinct from Meta selling (insulated) access to its content via their ad platform. They are both monetizing data that is potentially copyrighted by other parties.
Yeah, probably not. When you signed up and agreed to their ToS, they don’t “own” your content, but you grant them a worldwide, royalty-free, perpetual, irrevocable, non-exclusive, transferable, and sublicensable license to use it without compensation.
From their ToS:
Source: A pretty good post on r/HFY, though it is on Reddit, so don’t click it if you don’t want to :P
But how many TOS have been shot down because they overreach? I don’t know. You’re probably right. But it’s still fun to imagine.
There is legal standing, IMO. You can’t take something without consideration, and access to the website was granted free of charge while the data collection was squirrelled away in the fine print. That isn’t a lawful contract; the fine print is for technicalities about the main transaction of X in exchange for Y. You can’t say “we’ll give you X for free!!!” then sneak into the fine print “(((you also give us Y for free)))”. The structure is clearly deceptive in a manner that is designed to prevent a fair assessment of the value being exchanged.
Insurers have to provide a “key facts page” where they summarise in plain English what you’re paying for. The fine print gives the detail, but the front page is still “we give you X in exchange for Y”.
You can’t build a car without paying for the nuts and bolts. Tech companies have placed themselves amongst the wealthiest businesses in the world without paying for the nuts and bolts we provide.
Hell, even Microsoft is in on it now, even though you pay for Windows and Office 365!
Next question then: how do we mobilize into a class action against Reddit and google and Microsoft and whomever else?
So just enough to cover what it paid the CEO last year? ($192 million)
Lmao the ratfuck CEO is making his money alright. What a little weaseling coward. He spent so long jealous of his co-founders who got paid and left early, that he probably sees this as his golden parachute.
A good reminder to go back and edit all your comments to [removed] if you didn’t do so when you first left.
I did, and then also deleted my accounts. I can’t believe how much bot content is on there now.
Good thing they monetized their API.
Where’s my cut?
what, you didn’t get the email offering the special stock price?
Yeah I got it, but unfortunately I don’t want to put money into Reddit. I heard some early adopters got a cash bonus instead, I’m very jealous
So Reddit charges users to create content (paid premium or by showing ads). And then it sells that content.
Making money both going and coming.
And it also asks reddit users to invest in reddit
loool
Literally the only way they could become profitable.
I’m honestly more upset at this deal (I think it was google?) than the CEO pay thing, which is all stock options and mostly ragebait.
I expect to see them last 3-5 years and get bought out by some big tech firm, all current execs take their payouts, sell their shares and retire.
"its data".
Ah yes... of course.
In other news, spez's compensation from reddit last year was $193 million, and its COO got a cool $93 million.
C'mon, spez, tell us again how horrible it's been that reddit's never made a profit.
Just saw this on yours truly. Fucking hilarious considering they had the balls to IPO with that sack of rocks weighing down the entire company.
These are totally the signs of a stable and profitable company.
“Its”
“licensing”
“made”
“lots”
“/ᐠ。ꞈ。ᐟ\”
Monty Python’s flying circus!
"It's"
Reddit doesn’t own that data. The community owns it.
Maybe there’s something in the terms of service but that shouldn’t hold water because nobody has ever read that document.
Spez will be happy to know his payout is covered, company is still going to be broke though.
I wonder if they will ever consider paying the users for the content they provide that constitutes “its” data. 🤔
The users get a service that costs hundreds of millions to maintain for free.
And no one is forcing them to post valuable content without compensation.
Well there’s apparently more than 400 million active users every month, so they could charge users a few cents per month and pay for the infrastructure entirely. But they choose to be massive privacy invading assholes.
If they charged users any amount of money there wouldn’t even be 400000 of them anymore.
Yes there’d be less, but the amount is purely speculative and you don’t know any more than I do.
Even if they have to go with the ad-supported model to maintain a large active userbase, that can easily be done without all the tracking. But again, they chose the shittiest option…there’s really a pattern of them just being massive assholes. No matter what options they have, they’ll apparently go for the shittiest one that screws over the users the most.
This was my attitude until reddit took away my app. Now the site is the poster child for enshittification
Pay? They are trying to “stonk” Reddit users by asking them to buy stock for the IPO which screams “We want your data and your money!”.
Yeah. They are giving the users the “privilege” to buy shares at the open market rate. Not even at discounted rates. Again US only. What about the others? They just give their data I suppose.
They said they’ll be allowing users to cash out Karma.
Fixed that for them.
There’s your “tragedy of the commons” fallacy on a stick, folks - the proles were managing reddit so well that huffman had to break it in order to make it more vulnerable to the parasites.
time to go edit all my old comments with random garbage generated by chatgpt
It is probably way too late for that to make any difference, no?
Probably, but one thing I learned too late in my life is that by being cynical you’re assured to never get anything done
For everybody who thinks we should get paid for our data, you may want to consider the Data Dividend Project:
www.datadividendproject.com
Damn, might be a US-only thing then.
Its data?! You mean our data
I hope my fuck Spez comments are useful.
Now AIs will fuck Spez for all eternity.
Millennia from now Fuckspez! will be the standard greeting between all sentient species in the galactic federation. It will be even used in machine code as a handshake for establishing initial contact between two subspace relays.
Blow me
This is helpful of them, once the EU court fines them, we can quickly calculate how much that will be.
… and that’s why Wikipedia is non-profit.
Seeing human (even shitpost) achievements get monetized (in the most sucky manner) one by one is sad af.
Probably doesn’t matter, but this is why I deleted my account instead of just locking it up.
With google being the partner i bet the deal won’t mention they can’t use older backups of reddit that they most certainly have.
I’m sure you’re right about that. All parties involved are scumbags, after all.
This is why we need better data laws in the US. If I want everything I’ve ever said on your site to disappear both instantly and forever, I should have that option.
How? They’re under heavy scrutiny from the FTC over the $60M/year Google deal. Where did the extra $140M come from?
They need to pay the users with that money
Delusional comment
Or genius. New company idea. Sell data from the start and share revenue with contributors.
$203M in one licenced transaction. Selling their data to Google. No one is falling for this shit.
cnbc.com/…/reddit-is-a-smaller-more-volatile-twit…
who knew my old shitposts are worth that much