I’d almost like to think an LLC would be enough, but I suspect that only works if you also have a billion in VC funding and political connections.
curbstickle@lemmy.dbzer0.com
on 06 Feb 2025 20:02
Oh for sure, since the law is basically toilet paper for billionaires at this point.
Damage@slrpnk.net
on 06 Feb 2025 21:02
They’ll be fined 100k
Telorand@reddthat.com
on 06 Feb 2025 21:15
And they’ll ham up how punished and sorry they are, and how thankful they are for the judge handing down “fair and impartial” justice.
roofuskit@lemmy.world
on 07 Feb 2025 18:17
He already referred them to the justice department, this is a civil case, he cannot sentence them criminally.
SnotFlickerman@lemmy.blahaj.zone
on 06 Feb 2025 16:59
“Meta downloaded millions of pirated books from LibGen through the bit torrent protocol using a platform called LibTorrent. Internally, Meta acknowledged that using this protocol was legally problematic,” the third amended complaint noted.
Just want to make clear that Libtorrent is just the torrent application they were using, while the Libgen torrents are easily accessible on the libgen site, not through a separate “platform” called Libtorrent.
I wish people like us could help with these complaints, because then they might actually get the details more accurate to reality.
The amended complaint makes it sound like Libtorrent is a private tracker website when it’s just the application they were using on the publicly available torrents.
corsicanguppy@lemmy.ca
on 06 Feb 2025 19:00
People are putting an S on the end of words like ‘traffic’ and ‘email’. They will never understand the semantics of that correction.
paraphrand@lemmy.world
on 06 Feb 2025 19:54
Meta Horizons
akilou@sh.itjust.works
on 06 Feb 2025 17:02
But did they keep a good ratio though?
SnotFlickerman@lemmy.blahaj.zone
on 06 Feb 2025 17:09
Asking the real questions.
empireOfLove2@lemmy.dbzer0.com
on 06 Feb 2025 17:18
1000% guarantee those mf’s had their upload choked to 20kbps
guaraguaito@lemmy.blahaj.zone
on 06 Feb 2025 17:36
Nah they used a leeching client. No upload at all.
empireOfLove2@lemmy.dbzer0.com
on 06 Feb 2025 17:45
Gotta have some upload just for the protocol traffic tho.
bamboo@lemmy.blahaj.zone
on 06 Feb 2025 17:51
I would assume that the requests sent from the torrent client to download data are not factored into the Upload amount for the torrent. When they mean no upload, it would be that none of the data in the files they downloaded were shared with anyone else, making them a piece of shit leecher.
20 was the lead engineer ‘mishearing’ Zuck after he said 2.
rottingleaf@lemmy.world
on 06 Feb 2025 19:02
In copyright protection terms the ratio shouldn’t matter. They should pay for all the lost profits from pirating everything they’ve downloaded. Every time someone pirated it should be counted. And every time someone uses the AI trained on the data.
They can become the corporate Jesus of the interwebs, having paid for our sins.
Technically, copyright infringement is committed by the entity making and sending the copy, not the entity receiving it. Leeching could indeed remove liability.
I’m not sure if the courts have cared about that nuance when persecuting the ‘small fish,’ but I bet they would in this ‘big fish’ case.
If the receiving entity then ingests all that copyrighted material into its AI, and the AI sends it piece at a time to other receiving entities, that should be the AI infringing on everything it is copying to make its answers.
SinningStromgald@lemmy.world
on 06 Feb 2025 17:49
Given the extent it should be considered criminal so $250k per offense and the higher ups who authorized the torrenting should get conspiracy charges at a minimum.
But this is America so they’ll probably pay a small amount, for Meta, and a light slap on the wrist with a finger wagging.
30,000,000 books * $250,000 per offence = $7.5 trillion
(are you sure you’re from programming.dev?)
Pika@sh.itjust.works
on 06 Feb 2025 18:46
you are being optimistic, it’s likely going to be considered “fair use” and then be business as usual. Meta themselves have claimed that they aren’t filing to dismiss because they believe they are on the legal side, due to the fact they aren’t distributing the pirated content, only using it for training which is currently a massive grey area that hasn’t been ruled as non-fair use
Knock_Knock_Lemmy_In@lemmy.world
on 07 Feb 2025 19:13
But this is America
Maybe they hosted their servers in Eritrea, Turkmenistan or San Marino. No copyright laws there
shittydwarf@lemmy.dbzer0.com
on 06 Feb 2025 18:03
Facebook: I’ll just ~~torrent what I need~~ burden your underfunded project and volunteers with over 81 TB of bandwidth costs without contributing anything in return, see yaa
FTFY
C126@sh.itjust.works
on 07 Feb 2025 12:38
Yeah the least they could do is seed forever.
Knock_Knock_Lemmy_In@lemmy.world
on 07 Feb 2025 19:06
Agreed. Seed forever and release the AI weights and model. That would be fair payment.
The entirety of Anna’s Archive would be an excellent benchmark training set. Particularly a cleaned processed dataset.
daggermoon@lemmy.world
on 06 Feb 2025 19:13
Damn leeches
Grimy@lemmy.world
on 06 Feb 2025 20:34
Meta has open sourced every single one of their llms. They essentially gave birth to the whole open llm scene.
If they start losing all these lawsuits, the whole scene dies and all those nifty models and their fine-tunes get removed from huggingface, to be repackaged and sold to us with a subscription fee. All the other domestic open source players will close down.
The copyright crew aren’t the good guys here, even if it’s spearheaded by Sarah Silverman and Meta has traditionally played the part of the villain.
Meta stole from everyone, including those that struggle to make ends meet, so it doesn’t matter that they gave you back some of it. Any moral qualms should evaporate when you consider that they did it to create shareholder value and the rest is philanthropy (aka pretend tax). As a socialist I believe that man is owed for his work and you can’t take from him even though technology makes it so easy.
Grimy@lemmy.world
on 06 Feb 2025 20:45
Don’t give me that slop. No one except the biggest names are getting a dime out of it once OpenAI buys up all the data and kills off their competition. It’s also highly transformative, which used to be perfectly legal.
Copyright laws have been turned into a joke, only protecting big money and their interests.
LainTrain@lemmy.dbzer0.com
on 06 Feb 2025 21:04
As a socialist I believe intellectual property is a falsehood and technological advancement should be for the public good. Open source LLMs are for the public good.
Given the options between having open source LLMs and the US Govt banning non-corpo non-proprietary LLMs and giving a free pass to people like Musk and Altman and Zucc to monopolize, I happily pick the former.
You’re delusional if you think they will pay anyone, the only way zucc will pay is with a guillotine.
Corpos will make inter-platform deals that’ll simply make all online data licensable for the right price and enrich each other so you can’t avoid it while still actually being a career creative, but price out academic researchers and the public sector so that all fruits of it stay behind closed R&D doors and be free of ethics etc.
Continuing in your role as a useful idiot, you’ll most likely also foot the bill for it via subsidies from your taxes to “develop the AI sector” in some anti-China dick measuring contest by the US.
You will then be sold this data back via proprietary chat bots via a monthly subscription and you better pay up because once it gets really good, it will become mandatory to use for just about any job, leaving you with no choice.
Or you can support FOSS LLMs.
foenkyfjutschah@programming.dev
on 06 Feb 2025 21:50
Dear comrade, the hype around Affirming Incompetence (AI) is, in our time, the highest expression of humanity’s alienation from itself, a testimony to the desire for, and thus a precondition of, the further fetishization of its grasp on the world. But as Bernard Stiegler so nicely remarked: no savoir-vivre without savoir-faire! These are the indispensable conditions for freeing humanity from its self-imposed chains and establishing a fraternal order!
(now have fun w/ an LLM’s attempt of “advancement”!)
You’re confusing self entitlement to stuff with the left.
LainTrain@lemmy.dbzer0.com
on 07 Feb 2025 09:05
Lolwut? Public good is self-entitlement? Go read a fucking book. Communists are not pro-copyright, especially not when it only benefits the giant corpos.
Another day, another entitled artoid larping as progressive blocked.
Whether it’s appropriately licensed is an unsolved question though.
The dataset itself and the text portion of the text-image pairs needed for training is CC-BY-SA, the newer versions linked above are CC-BY-4.0. creativecommons.org/licenses/by/4.0/deed.en
The images however are technically under their own copyright, which in practice means each of the billions of images could or could not have a licence that implicitly or explicitly forbids AI training use or forbids it only for commercial use.
Whether such a license is legally binding is at present unknown though, since licenses primarily deal with reproductions, which the pro-AI folks argue isn’t the case, and that training of NNs is more akin to viewing an image and memorising the patterns and relationships within, like a person viewing it.
That would make it non-infringing and therefore the model itself libre. In that case Mistral and LLaMa are also libre as long as the model itself is open source, which in this case really means “open weights”, so not like GPT and anything by “”“OpenAI”“”.
Weights are essentially the result of a model being trained. They’re the key bit that makes or breaks it and determines how it works. Given the weights, and knowing the structure of the model and the framework used, you can refine, modify and distribute it.
Those against AI will say that it’s more akin to file compression and that in one form or another it’s misuse. That would make the model an infringing derivative work and as such not libre even if the model weights are open source.
In a way though you could argue that me vaguely memorising the imagery of a dude dressed in white holding a laser sword is just a lossy compressed copy of the copyrighted work of Star wars, and it’d be absurd to think that’s a violation and that infringement only occurs if I reproduce a work of substantial similarity commercially from that memory.
If I use Krita and draw a beautiful landscape which has been informed and inspired at least in part by a movie I saw, is that copyright infringement or not? What if I use AI?
Well, current laws don’t say. We measure infringement in substantial similarity, provenance of information only comes in later (e.g. to prove against accidental similarity).
That’s also my own personal stance on the legal side of things, so up to you how you see it.
General_Effort@lemmy.world
on 06 Feb 2025 22:40
Calling property labor, doesn’t make you a socialist.
The reason the world is in a mess is that we were told to choose between fascists and pro-market technocrat libertarians pretending to be leftists. This is a worldwide issue that’s doubly important because those liberals guilt trip us for not supporting them and that’s why I’m just laying little bricks here and there. At the end of the tunnel we either rework our society into a socialist one or we succumb to feudal lords again. Years of neoliberal hegemony need to be undone so I try to go against the grain like that sometimes, hoping I made someone think.
General_Effort@lemmy.world
on 07 Feb 2025 12:20
When you call yourself a socialist, what do you mean by that term?
I assume you probably want to know how this kind of leftism is different from others or other ideologies calling themself leftist, rather than for me to write an essay on myself.
I believe in equal opportunity but reject that you should be able to „win” in any system. I believe in empathy over soulless meritocracy. I believe in collective ownership but don’t reject that one is owed for his work. You could say it all stems from egalitarianism but this term has been caricatured by liberals too. For a long time I thought social democracy as an ideology gives you enough levers in the system to steer it toward that goal but time and time again it turned out that in most places SocDem parties are no different from liberal ones and so I learned from past mistakes.
General_Effort@lemmy.world
on 07 Feb 2025 15:36
I assume you probably want to know how this kind of leftism is different from others or other ideologies calling themself leftist, rather than for me to write an essay on myself.
What confuses me is that you argue that property owners should be able to demand payment for the use of their property without any further consideration. That is a very conservative capitalist stance. It’s not compatible with any flavor of socialism that I am aware of. In fact, most pro-capitalists would reject it as too far right. The only ideologue, I can think of, that holds this stance even for copyrights is Ayn Rand. Your ideas seem compatible with hers. I don’t understand why you would think of that as socialist or even left.
When you think payment you think „money” but I think „fair” :) We’ve been broken by capitalist hegemony to the point it’s hard to think of something different.
General_Effort@lemmy.world
on 07 Feb 2025 17:06
It sounds like a European soviet republic. Most of them were working reasonably well and were really good at preventing poverty but were stuck in-between being exploited by Russia and artificially cut off from half the world (big reason why they had to fail). Those countries solved problems progressive western democracies couldn’t ever solve, for example gender wage inequality (to the point it endures today). Unfortunately all of us in the „west” are stuck in a death spiral after US and Russia went tits up in the 70s/80s. Maybe we’ll have another go once this is finally done.
General_Effort@lemmy.world
on 07 Feb 2025 18:49
I’m not sure what you are trying to say here. Do you think that soviet states would have negotiated with owners of private property before using it for public benefit?
No, why would they? There’s a difference between strong taking from the weak and community taking surplus from everyone.
General_Effort@lemmy.world
on 07 Feb 2025 20:23
I’m trying to follow you. It would be ok if a soviet government did it, but if a private company does it, then it’s stealing. Because a soviet government is strong? Has control of the military and all that, unlike some start-up or even an established company?
I’m not sure I’m following you either, it appears to me that you don’t see a difference between tax and theft. It was common to outgrow this belief but it appears to be common now. I’ll try to explain.
When Meta takes from everyone it’s a bully that takes from the weak who can’t fight back. Meta does it so that they become the biggest fish in the pond as an end goal.
When a state takes from everyone and the rich in particular it’s because we don’t want to have this kind of big fish in the pond. We just want to chill.
General_Effort@lemmy.world
on 07 Feb 2025 21:25
I’m not sure I’m following you either, it appears to me that you don’t see a difference between tax and theft.
That’s an odd thing to write. Why do you believe that?
When Meta takes from everyone it’s a bully that takes from the weak who can’t fight back. Meta does it so that they become the biggest fish in the pond as an end goal.
When a state takes from everyone and the rich in particular it’s because we don’t want to have this kind of big fish in the pond. We just want to chill.
Ok, I think I get this now. You believe in far-reaching intellectual property, and that property is inviolable, except to limit inequality. So, you reject US-style Fair Use which has a public benefit in mind. Instead, copying only doesn’t require permission if the rights-owner is wealthier than oneself. So, most people could freely copy Taylor Swift songs but perhaps not songs by some street musician. Does that cover it?
I wasted enough of my time which can be spent more productively. I’m pretty sure you’re not really interested. For some it must get much worse before they get it. Goodbye.
Telodzrum@lemmy.world
on 07 Feb 2025 00:20
Nope. Get fucked
antonim@lemmy.dbzer0.com
on 07 Feb 2025 02:24
If the existence of open source LLMs hinges on the benevolence of one of the few most cancerous tech companies in the world, maybe they’re not really worth it?
This isn’t about “heroes” and “villains”. Facebook has been and has stayed the “villain”, they’ve done something colossally illegal that any mere mortal would be sued to death for (by another “villainous” instance, the media system that has made piracy a necessity in the first place), and they’re hoping to get away with it simply on technicalities and by having more money for better lawyers. Rules are rules, if you don’t like them maybe Facebook should try to change them (and not just for themselves, but for the rest of us too)?
The existence hinges on the rewriting and strengthening of copyright laws by data brokers and other cancerous tech companies. It’s not Meta vs us, but opensource vs Google and Openai.
They are being sued for copyright infringement when it’s clearly highly transformative. The rules are fine as is, Meta isn’t the one trying to change them. I shouldn’t go against my own interests and support frivolous lawsuits that will negatively impact me just because Meta is a boogeyman.
antonim@lemmy.dbzer0.com
on 07 Feb 2025 22:47
It’s not Meta vs us, but opensource vs Google and Openai.
I never said it’s Meta vs us. It’s Meta vs (in this particular case) the book publishing industry. You can’t reduce the whole situation to open source vs closed source, there’s other “axes” at play here as well.
They are being sued for copyright infringement when it’s clearly highly transformative
They downloaded the entire Libgen and more. Going by the traditional explanations of piracy, that’s like stealing several hundred bookstores worth of books all at once, and then claiming it’s alright because your own writing is not plagiarised from any of the books you’ve stolen. (Piracy is not the same as actual stealing of course, but countless people have been being legally bullied and ruined with that logic.) Meta also got its data from Internet Archive; unless they only obtained their materials that are public domain or under a similar license, they’ve obtained a lot of material that IA has been sentenced for allowing unlimited access to back in 2020 (if you’ve followed the Hachette v. Internet Archive case). The brainfucking conclusion of your and Facebook’s case is that using illegal services is perfectly legal as long as you sufficiently transform the results of the illegal activity.
The rules are fine as is
Actually they’re not. Copyright law is insanely restrictive, and I don’t think you’ve dealt much with media if you think it’s fine (but I don’t wish to delve into this further as it’s beyond the scope of discussion).
Meta isn’t the one trying to change them
Of course they’re not trying to change them, that’s the point, they will get away with breaking them while being perfectly fine with other actors not being able to do so.
All LLMs and Gen AI use data they don’t own. The Pile is all scraped or pirated info, which served as a starting point for most LLMs. Image gen is all scraped from the web. Speech to text and video gen mainly uses YouTube data.
So either you put a price tag on that data, which means only a handful of companies can afford to build these tools (including Meta), or you understand that piracy is the only way for most to acquire this data but since it’s highly transformative, it isn’t breaching copyrights or directly stealing from them as piracy “normally” is.
I’m being pragmatic.
LodeMike@lemmy.today
on 07 Feb 2025 03:48
Where is the source content then
Knock_Knock_Lemmy_In@lemmy.world
on 07 Feb 2025 19:09
Annas archive. Keep up. Pffff.
jaybone@lemmy.world
on 06 Feb 2025 20:44
It’s a popular search engine that works with shadow libraries like Sci-Hub or Library Genesis. Shadow libraries are hosts to copies of works of literature and science. Their legal status is murky at best but it’s incredibly impractical to persecute those accessing them.
jaybone@lemmy.world
on 06 Feb 2025 21:05
So it’s like thepiratebay or 1337x.to but for books?
Also I think you mean prosecuting, not persecuting.
Corkyskog@sh.itjust.works
on 06 Feb 2025 21:07
Those are torrents, Annas Archive is typically used for direct downloads.
Thanks. It’s confusing because everyone is talking about torrents. It’s in the title even, but I didn’t read the article.
Corkyskog@sh.itjust.works
on 06 Feb 2025 21:22
Well I think you can also torrent off of there too. There are massive backup files on their home page that they are basically begging people to download and seed… So maybe it’s that?
SharkAttak@kbin.melroy.org
on 06 Feb 2025 22:23
Also I think you mean prosecuting, not persecuting.
Nowadays, I'm not so sure anymore.
PM_Your_Nudes_Please@lemmy.world
on 06 Feb 2025 23:05
TPB and 1337x are torrents, whereas Anna’s Archive typically uses direct downloads. So it’s more akin to the old CoolROMs back before the massive takedown purges.
Anna’s Archive does offer torrents, but it’s not for individual files. Their torrents are more like database backups, with thousands of books each. In fact, people will download and seed them to help increase AA’s resilience. Since they aren’t super useful for individual files, very few people use them as such. But clearly, Meta just used them to feed into an LLM, because they didn’t care about the content of the files as long as they were properly written. It was less “looking for your favorite fantasy book” and more “looking to grab every fantasy book ever written.”
MonkderVierte@lemmy.ml
on 07 Feb 2025 12:03
it’s incredibly impractical to persecute those accessing them.
Always was. If you’re serious, persecute those hosting it.
SpikesOtherDog@ani.social
on 06 Feb 2025 23:08
ulterno@programming.dev
on 09 Feb 2025 00:39
And I’d guess all that money would then go to military funding, with Anna’s Archive, again getting nothing out of it?
SpikesOtherDog@ani.social
on 09 Feb 2025 00:45
It would go to… Uh…
HEY SOMEONE PUT A DEAD CAT ON THE TABLE!
njordomir@lemmy.world
on 07 Feb 2025 01:57
If someone was to acquire a few hundred gigs of books and feed them to something like paperless-ngx, would it work as a sort of Google of books? Are there any software projects better suited to doing this that also understand synonyms and perhaps some context? I guess AI search but guided for the intermediate user.
Google is so bad lately. Basically every result is official sponsored corporate biased BS. It would be nice to be able to instantly query a bunch of ebooks.
werefreeatlast@lemmy.world
on 07 Feb 2025 05:17
drascus@sh.itjust.works
on 07 Feb 2025 11:38
Just gotta love these big tech companies and their bullshit double standards.
bungalowtill@lemmy.dbzer0.com
on 07 Feb 2025 14:32
The Pirates of the Crown
ad_on_is@lemm.ee
on 07 Feb 2025 18:09
If buying ain’t owning, then downloading…
oh wait, that’s our slogan
meowmeowbeanz@sh.itjust.works
on 07 Feb 2025 20:42
Oh look, another tech giant treating open knowledge initiatives like their personal data buffet. Let me translate this corporate nonsense for you:
Meta: “We need training data for our AI!”
Also Meta: Let’s leech 81.7TB from a community project without contributing anything back.
The absolute audacity of downloading terabytes through torrents while their employees were internally admitting it was “legally problematic”. And the best part? They couldn’t even be bothered to seed properly - just grab and go, classic corporate behavior.
Remember when companies actually contributed to open source instead of just parasitically consuming it? But no, they’d rather burden volunteer-run projects with massive bandwidth costs while their lawyers probably bill more per hour than these projects’ entire monthly budget.
Pro tip Meta: If you’re going to pilfer knowledge from the commons, at least seed back properly. Your “move fast and break things” motto isn’t supposed to apply to community archives.
General_Effort@lemmy.world
on 07 Feb 2025 23:02
When you’re shilling for copyright, at least pick a lane. Are they bad for “pirating” or bad for not supporting “piracy”?
I guess it doesn’t matter as long as the owners collect their rent.
ulterno@programming.dev
on 09 Feb 2025 00:34
They are pirating, while also DOSing the providers.
interdimensionalmeme@lemmy.ml
on 07 Feb 2025 23:11
My seedbox is locked and loaded, please point me to the torrent in need.
Archive team assemble!
ILikeBoobies@lemmy.ca
on 07 Feb 2025 23:32
underwire212@lemm.ee
on 08 Feb 2025 17:07
Yes please support annas-archive!! It is a wonderful project. I can essentially get an epub file for any book (including banned books) I want. They have so much more than that too.
Anti_Face_Weapon@lemmy.world
on 08 Feb 2025 15:39
Do it, Judge. Protect the wealthy and say it’s not piracy. Do it.
Please! Think of the shareholders, we must protect them!
It’s not piracy. For corporations. For you and me believe it or not, straight to jail!
Just make an LLC, now it’s legal again.
I’d almost like to think an LLC would be enough, but I suspect that only works if you also have a billion in VC funding and political connections.
Oh for sure, since the law is basically toilet paper for billionaires at this point.
They’ll be fined 100k
And they’ll ham up how punished and sorry they are, and how thankful they are for the judge handing down “fair and impartial” justice.
He already referred them to the justice department, this is a civil case, he cannot sentence them criminally.
Just want to make clear that Libtorrent is just the torrent application they were using, while the Libgen torrents are easily accessible on the libgen site, not through a separate “platform” called Libtorrent.
I wish people like us could help with these complaints, because then they might actually get the details more accurate to reality.
libgen.is/repository_torrent/
www.libtorrent.org
The amended complaint makes it sound like Libtorrent is a private tracker website when it’s just the application they were using on the publicly available torrents.
Totes yeet, yo.
People are putting an S on the end of words like ‘traffic’ and ‘email’. They will never understand the semantics of that correction.
Meta Horizons
But did they keep a good ratio though?
Asking the real questions.
1000% guarantee those mf’s had their upload choked to 20kbps
Nah they used a leeching client. No upload at all.
Gotta have some upload just for the protocol traffic tho.
I would assume that the requests sent from the torrent client to download data are not factored into the Upload amount for the torrent. When they mean no upload, it would be that none of the data in the files they downloaded were shared with anyone else, making them a piece of shit leecher.
20 was the lead engineer ‘mishearing’ Zuck after he said 2.
In copyright protection terms the ratio shouldn’t matter. They should pay for all the lost profits from pirating everything they’ve downloaded. Every time someone pirated it should be counted. And every time someone uses the AI trained on the data.
They can become the corporate Jesus of the interwebs, having paid for our sins.
Technically, copyright infringement is committed by the entity making and sending the copy, not the entity receiving it. Leeching could indeed remove liability.
I’m not sure if the courts have cared about that nuance when persecuting the ‘small fish,’ but I bet they would in this ‘big fish’ case.
If the receiving entity then ingests all that copyrighted material into its AI, and the AI sends it piece at a time to other receiving entities, that should be the AI infringing on everything it is copying to make its answers.
Yes, yes it should. But that’s a different act than the one being discussed here.
I agree. Still doesn’t hurt to bring it up on appropriate tangents.
<img alt="" src="https://feddit.it/pictrs/image/3b48cd8b-bf63-447a-9e90-7785b1d2af19.png">
Given the extent it should be considered criminal so $250k per offense and the higher ups who authorized the torrenting should get conspiracy charges at a minimum.
But this is America so they’ll probably pay a small amount, for Meta, and a light slap on the wrist with a finger wagging.
.
Each time someone uses their LLM it should be considered a violation.
People are using these things millions of times a day in aggregate. That adds up fast. $250k multiplied by millions suddenly isn’t so cheap.
$250k * [every book in existence] is literally nothing?
Remember, “offense” doesn’t mean “per torrent,” it means “per copyrighted work infringed.”
Average ebook size: 2.5 MB or so.
Meta downloaded 81 TB, or 81,000,000 MB.
81,000,000 / 2.5 = Approx 30 million books.
30,000,000 books * $250,000 per offence = $7.5 trillion
(are you sure you’re from programming.dev?)
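The back-of-envelope math above is easy to re-run. Here is a minimal Python sketch of it, assuming the figures quoted in the comment (81 TB downloaded, a guessed 2.5 MB average ebook, $250,000 per work) and decimal units; the function and constant names are made up for illustration and none of this is a legal estimate.

```python
# Rough estimate only: every input below is an assumption taken from the thread.
TOTAL_DOWNLOADED_TB = 81        # download volume quoted in the thread
AVG_EBOOK_MB = 2.5              # commenter's guess at an average ebook size
FINE_PER_WORK_USD = 250_000     # per-offence figure used in the thread
MB_PER_TB = 1_000_000           # decimal units, matching "81,000,000 MB"


def estimate_books(total_tb, avg_mb):
    """Estimate how many works fit in the download at the assumed average size."""
    return total_tb * MB_PER_TB / avg_mb


def estimate_fine(books, fine_per_work):
    """Multiply the number of works by the assumed per-work figure."""
    return books * fine_per_work


books = estimate_books(TOTAL_DOWNLOADED_TB, AVG_EBOOK_MB)
exact_fine = estimate_fine(books, FINE_PER_WORK_USD)
# The comment rounds the book count down to 30 million before multiplying.
rounded_fine = estimate_fine(30_000_000, FINE_PER_WORK_USD)

print(f"Estimated works: {books:,.0f}")
print(f"Unrounded total: ${exact_fine:,.0f}")
print(f"Thread's rounded total: ${rounded_fine:,.0f}")
```

Without the rounding, the same assumptions give about 32.4 million works and roughly $8.1 trillion, so the thread's $7.5 trillion is the conservative end of the same estimate. The whole result hinges on the guessed average file size.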
you are being optimistic, it’s likely going to be considered “fair use” and then be business as usual. Meta themselves have claimed that they aren’t filing to dismiss because they believe they are on the legal side, due to the fact they aren’t distributing the pirated content, only using it for training which is currently a massive grey area that hasn’t been ruled as non-fair use
Maybe they hosted their servers in Eritrea, Turkmenistan or San Marino. No copyright laws there
<img alt="" src="https://media1.giphy.com/media/v1.Y2lkPTc5MGI3NjExcGJrc2R0OGtqaTdoNG80anpjaW1uemo4cXlhcHdpNDM1czlidG1nZyZlcD12MV9pbnRlcm5hbF9naWZfYnlfaWQmY3Q9Zw/E5R9miQPOqqjgIUl5l/giphy.gif">
Rules for thee, not for me
.
Yeah the least they could do is seed forever.
Agreed. Seed forever and release the AI weights and model. That would be fair payment.
The entirely of Annas archive would be an excellent benchmark training set. Particularly a cleaned processed dataset.
Damn leeches
Meta has open sourced every single one of their llms. They essentially gave birth to the whole open llm scene.
If they start losing all these lawsuits, the whole scene dies and all those nifty models and their fine-tunes get removed from huggingface, to be repackaged and sold to us with a subscription fee. All the other domestic open source players will close down.
The copyright crew aren’t the good guys here, even if it’s spearheaded by Sarah Silverman and Meta has traditionally played the part of the villain.
Meta stole from everyone, including those that struggle to make ends meet, so it doesn’t matter that they gave you back some of it. Any moral qualms should evaporate when you consider that they did it to create shareholder value and the rest is philanthropy (aka pretend tax). As a socialist I believe that man is owed for his work and you can’t take from him even though technology makes it so easy.
Don’t give me that slop. No one except the biggest names is getting a dime out of it once OpenAI buys up all the data and kills off their competition. It’s also highly transformative, which used to be perfectly legal.
Copyright laws have been turned into a joke, only protecting big money and their interests.
As a socialist I believe intellectual property is a falsehood and technological advancement should be for the public good. Open source LLMs are for the public good.
Given the options between having open source LLMs and the US Govt banning non-corpo non-proprietary LLMs and giving a free pass to people like Musk and Altman and Zucc to monopolize, I happily pick the former.
You’re delusional if you think they will pay anyone, the only way zucc will pay is with a guillotine.
Corpos will make inter-platform deals that’ll simply make all online data licensable for the right price and enrich each other so you can’t avoid it while still actually being a career creative, but price out academic researchers and the public sector so that all fruits of it stay behind closed R&D doors and be free of ethics etc.
Continuing in your role as a useful idiot, you’ll most likely also foot the bill for it via subsidies from your taxes to “develop the AI sector” in some anti-China dick measuring contest by the US.
You will then be sold this data back via proprietary chat bots via a monthly subscription and you better pay up because once it gets really good, it will become mandatory to use for just about any job, leaving you with no choice.
Or you can support FOSS LLMs.
Dear comrade, the hype around Affirming Incompetence (AI) is at present the highest expression of people’s alienation from themselves, a testament to the desire for, and thus a precondition of, the further fetishization of their grasp on the world. But as Bernard Stiegler so nicely remarked: no savoir-vivre without savoir-faire! These are the indispensable conditions for freeing humanity from its self-imposed chains and establishing a fraternal order!
(now have fun w/ an LLM’s attempt of “advancement”!)
You’re confusing self entitlement to stuff with the left.
Lolwut? Public good is self-entitlement? Go read a fucking book. Communists are not pro-copyright, especially not when it only benefits the giant corpos.
Another day, another entitled artoid larping as progressive blocked.
That must have touched a sensitive spot lol.
I support FOSS LLMs, but which actually exist? Which LLMs have open-sourced all their training data?
Mistral? Deepseek?
Not an LLM, but there’s also SD, which uses a very popular free dataset.
Can I freely download all the training data for any of those? I was under the impression they were all trained on non-licensed and copyrighted data.
It’s complicated.
I know Stable Diffusion best so I’ll speak to that. They used the LAION-5B dataset, which is, in practice, freely available to download and use:
kaggle.com/…/guie-laion-5b-collect-and-download
github.com/opendatalab/laion5b-downloader
It’s also on HuggingFace but it’s unavailable.
huggingface.co/datasets/danielz01/laion-5b
But you can use this smaller newer version:
huggingface.co/datasets/…/relaion2B-en-research
Whether it’s appropriately licensed is an unsolved question though.
The dataset itself and the text portion of the text-image pairs needed for training is CC-BY-SA; the newer versions linked above are CC-BY-4.0. creativecommons.org/licenses/by/4.0/deed.en
The images however are technically under their own copyright, which in practice means each of the billions of images could or could not have a licence that implicitly or explicitly forbids AI training use or forbids it only for commercial use.
Whether such a license is legally binding is at present unknown though, since licenses primarily deal with reproductions, which the pro-AI folks argue isn’t the case, and that training of NNs is more akin to viewing an image and memorising the patterns and relationships within, like a person viewing it.
That would make it non-infringing and therefore the model itself libre. In that case Mistral and LLaMa are also libre as long as the model itself is open source, which in this case really means “open weights”, so not like GPT and anything by “”“OpenAI”“”.
Weights are essentially the result of a model being trained. They’re the key bit that makes it or breaks it and determines how it works. Given that, and knowing the structure of the model and framework used, you can refine, modify and distribute it.
Those against AI will say that it’s more akin to file compression and that in one form or another it’s misuse. That would make the model an infringing derivative work and as such not libre even if the model weights are open source.
In a way though you could argue that me vaguely memorising the imagery of a dude dressed in white holding a laser sword is just a lossy compressed copy of the copyrighted work of Star Wars, and it’d be absurd to think that’s a violation; infringement only occurs if I reproduce a work of substantial similarity commercially from that memory.
If I use Krita and draw a beautiful landscape which has been informed and inspired at least in part by a movie I saw, is that copyright infringement or not? What if I use AI?
Well, current laws don’t say. We measure infringement in substantial similarity, provenance of information only comes in later (e.g. to prove against accidental similarity).
That’s also my own personal stance on the legal side of things, so up to you how you see it.
Calling property labor doesn’t make you a socialist.
You’re confusing libleft with left.
No. Seriously, why do you want to call yourself a socialist?
The reason the world is in a mess is that we were told to choose between fascists and pro-market technocrat libertarians pretending to be leftists. This is a worldwide issue that’s doubly important because those liberals guilt trip us for not supporting them and that’s why I’m just laying little bricks here and there. At the end of the tunnel we either rework our society into a socialist one or we succumb to feudal lords again. Years of neoliberal hegemony need to be undone so I try to go against the grain like that sometimes, hoping I made someone think.
When you call yourself a socialist, what do you mean by that term?
I assume you probably want to know how this kind of leftism is different from others or other ideologies calling themselves leftist, rather than for me to write an essay on myself.
I believe in equal opportunity but reject that you should be able to „win” in any system. I believe in empathy over soulless meritocracy. I believe in collective ownership but don’t reject that one is owed for his work. You could say it all stems from egalitarianism but this term has been caricatured by liberals too. For a long time I thought social democracy as an ideology gives you enough levers in the system to steer it toward that goal but time and time again it turned out that in most places SocDem parties are no different from liberal ones and so I learned from past mistakes.
What confuses me is that you argue that property owners should be able to demand payment for the use of their property without any further consideration. That is a very conservative capitalist stance. It’s not compatible with any flavor of socialism that I am aware of. In fact, most pro-capitalists would reject it as too far right. The only ideologue, I can think of, that holds this stance even for copyrights is Ayn Rand. Your ideas seem compatible with hers. I don’t understand why you would think of that as socialist or even left.
When you think payment you think „money” but I think „fair” :) We’ve been broken by capitalist hegemony to the point it’s hard to think of something different.
Still sounds like Ayn Rand and not socialism.
It sounds like a European soviet republic. Most of them were working reasonably well and were really good at preventing poverty but were stuck in-between being exploited by Russia and artificially cut off from half the world (big reason why they had to fail). Those countries solved problems progressive western democracies couldn’t ever solve, for example gender wage inequality (to the point it endures today). Unfortunately all of us in the „west” are stuck in a death spiral after US and Russia went tits up in the 70s/80s. Maybe we’ll have another go once this is finally done.
I’m not sure what you are trying to say here. Do you think that soviet states would have negotiated with owners of private property before using it for public benefit?
No, why would they? There’s a difference between strong taking from the weak and community taking surplus from everyone.
I’m trying to follow you. It would be ok if a soviet government did it, but if a private company does it, then it’s stealing. Because a soviet government is strong? Has control of the military and all that, unlike some start-up or even an established company?
I’m not sure I’m following you either; it appears to me that you don’t see a difference between tax and theft. It used to be common to outgrow this belief, but it appears to be common now. I’ll try to explain.
When Meta takes from everyone it’s a bully that takes from the weak who can’t fight back. Meta does it so that they become the biggest fish in the pond as an end goal.
When a state takes from everyone, and the rich in particular, it’s because we don’t want to have this kind of big fish in the pond. We just want to chill.
That’s an odd thing to write. Why do you believe that?
Ok, I think I get this now. You believe in far-reaching intellectual property, and that property is inviolable, except to limit inequality. So, you reject US-style Fair Use which has a public benefit in mind. Instead, copying doesn’t require permission only if the rights-owner is wealthier than oneself. So, most people could freely copy Taylor Swift songs but perhaps not songs by some street musician. Does that cover it?
I wasted enough of my time which can be spent more productively. I’m pretty sure you’re not really interested. For some it must get much worse before they get it. Goodbye.
Nope. Get fucked
If the existence of open source LLMs hinges on the benevolence of one of the few most cancerous tech companies in the world, maybe they’re not really worth it?
This isn’t about “heroes” and “villains”. Facebook has been and has stayed the “villain”, they’ve done something colossally illegal that any mere mortal would be sued to death for (by another “villainous” instance, the media system that has made piracy a necessity in the first place), and they’re hoping to get away with it simply on technicalities and by having more money for better lawyers. Rules are rules, if you don’t like them maybe Facebook should try to change them (and not just for themselves, but for the rest of us too)?
The existence hinges on the rewriting and strengthening of copyright laws by data brokers and other cancerous tech companies. It’s not Meta vs us, but open source vs Google and OpenAI.
They are being sued for copyright infringement when it’s clearly highly transformative. The rules are fine as is, Meta isn’t the one trying to change them. I shouldn’t go against my own interests and support frivolous lawsuits that will negatively impact me just because Meta is a boogeyman.
.
I never said it’s Meta vs us. It’s Meta vs (in this particular case) the book publishing industry. You can’t reduce the whole situation to open source vs closed source, there’s other “axes” at play here as well.
They downloaded the entire Libgen and more. Going by the traditional explanations of piracy, that’s like stealing several hundred bookstores worth of books all at once, and then claiming it’s alright because your own writing is not plagiarised from any of the books you’ve stolen. (Piracy is not the same as actual stealing of course, but countless people have been being legally bullied and ruined with that logic.) Meta also got its data from Internet Archive; unless they only obtained their materials that are public domain or under a similar license, they’ve obtained a lot of material that IA has been sentenced for allowing unlimited access to back in 2020 (if you’ve followed the Hachette v. Internet Archive case). The brainfucking conclusion of your and Facebook’s case is that using illegal services is perfectly legal as long as you sufficiently transform the results of the illegal activity.
Actually they’re not. Copyright law is insanely restrictive, and I don’t think you’ve dealt much with media if you think it’s fine (but I don’t wish to delve into this further as it’s beyond the scope of discussion).
Of course they’re not trying to change them, that’s the point, they will get away with breaking them while being perfectly fine with other actors not being able to do so.
All LLMs and Gen AI use data they don’t own. The Pile is all scraped or pirated info, which served as a starting point for most LLMs. Image gen is all scraped from the web. Speech to text and video gen mainly uses YouTube data.
So either you put a price tag on that data, which means only a handful of companies can afford to build these tools (including Meta), or you understand that piracy is the only way for most to acquire this data but since it’s highly transformative, it isn’t breaching copyrights or directly stealing from them as piracy “normally” is.
I’m being pragmatic.
Where is the source content then
Anna’s Archive. Keep up. Pffff.
What is Anna’s Archive?
It’s a popular search engine that works with shadow libraries like Sci-Hub or Library Genesis. Shadow libraries are hosts to copies of works of literature and science. Their legal status is murky at best but it’s incredibly impractical to persecute those accessing them.
So it’s like thepiratebay or 1337x.to but for books?
Also I think you mean prosecuting, not persecuting.
Those are torrent sites; Anna’s Archive is typically used for direct downloads.
Thanks. It’s confusing because everyone is talking about torrents. It’s in the title even, but I didn’t read the article.
Well, I think you can also torrent off of there. There are massive backup files on their home page that they are basically begging people to download and seed… So maybe it’s that?
Nowadays, I'm not so sure anymore.
TPB and 1337x are torrents, whereas Anna’s Archive typically uses direct downloads. So it’s more akin to the old CoolROMs back before the massive takedown purges.
Anna’s Archive does offer torrents, but it’s not for individual files. Their torrents are more like database backups, with thousands of books each. In fact, people will download and seed them to help increase AA’s resilience. Since they aren’t super useful for individual files, very few people use them as such. But clearly, Meta just used them to feed into an LLM, because they didn’t care about the content of the files as long as they were properly written. It was less “looking for your favorite fantasy book” and more “looking to grab every fantasy book ever written.”
Always was. If you’re serious, persecute those hosting it.
phys.org/…/2010-11-million-dollar-verdict-music-p…
In all fairness, Meta should be assessed a fee of 250k for EACH pirated work.
This would amount to forfeiting all assets to doge.
They might end up having to pay more money than exists on the planet at that rate.
Good
Edit - See Gary Bowser
theguardian.com/…/the-man-who-owes-nintendo-14m-g…
Yes, that one.
I’m a reasonable man so I’ll allow it.
Assuming 2.6 MB per book.
81 TB would be 32,667,175 books.
At $250k per book that would come out to:
$8.17 trillion.
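For anyone who wants to check the arithmetic above, here is a minimal sketch. It assumes "81 TB" is read in binary units (81 × 1024 × 1024 MB), which is the reading that reproduces the 32,667,175 figure; the 2.6 MB average and the $250k fee are just the numbers from the comments, and the variable names are made up for illustration.

```python
# Back-of-the-envelope check of the figures quoted in the thread.
# Assumption: "81 TB" means binary units, i.e. 81 * 1024 * 1024 MB.
TOTAL_TB = 81
AVG_BOOK_MB = 2.6            # assumed average size per book (from the comment)
FEE_PER_BOOK_USD = 250_000   # proposed fee per pirated work (from the comment)

total_mb = TOTAL_TB * 1024 * 1024

# Round down: a partial book does not count as a work.
books = int(total_mb / AVG_BOOK_MB)
total_fee_usd = books * FEE_PER_BOOK_USD

print(f"{books:,} books")
print(f"${total_fee_usd / 1e12:.2f} trillion")
```

With decimal units instead (81 × 10^6 MB) the count drops to roughly 31.15 million books and the total to about $7.79 trillion, so the order of magnitude holds either way.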
<img alt="" src="https://ani.social/pictrs/image/2f847ad7-95dd-42fc-b230-ecd03a8cc88c.webp">
And I’d guess all that money would then go to military funding, with Anna’s Archive, again getting nothing out of it?
It would go to… Uh…
HEY SOMEONE PUT A DEAD CAT ON THE TABLE!
If someone was to acquire a few hundred gigs of books and feed them to something like paperless-ngx, would it work as a sort of Google for books? Are there any software projects better suited for doing this that understand synonyms and perhaps some context? I guess AI search, but guided for the intermediate user.
Google is so bad lately. Basically every result is official sponsored corporate biased BS. It would be nice to be able to instantly query a bunch of ebooks.
Yes. This exactly.
GPT, Meta, Deepseek and Google have probably all been trained on the data.
The problem is, training on the data, and actually training for knowledge of the data are VERY different things.
www.youtube.com/watch?v=_GkHZQYFOGM
Just gotta love these big tech companies and their bullshit double standards.
The Pirates of the Crown
If buying ain’t owning, then downloading…
oh wait, that’s our slogan
Oh look, another tech giant treating open knowledge initiatives like their personal data buffet. Let me translate this corporate nonsense for you:
Meta: “We need training data for our AI!” Also Meta: Let’s leech 81.7TB from a community project without contributing anything back.
The absolute audacity of downloading terabytes through torrents while their employees were internally admitting it was “legally problematic”. And the best part? They couldn’t even be bothered to seed properly - just grab and go, classic corporate behavior.
Remember when companies actually contributed to open source instead of just parasitically consuming it? But no, they’d rather burden volunteer-run projects with massive bandwidth costs while their lawyers probably bill more per hour than these projects’ entire monthly budget.
Pro tip Meta: If you’re going to pilfer knowledge from the commons, at least seed back properly. Your “move fast and break things” motto isn’t supposed to apply to community archives.
When you’re shilling for copyright, at least pick a lane. Are they bad for “pirating” or bad for not supporting “piracy”?
I guess it doesn’t matter as long as the owners collect their rent.
They are pirating, while also DOSing the providers.
My seedbox is locked and loaded, please point me to the torrent in need. Archive Team, assemble!
annas-archive.org
This is the website listed in the article
An alternate domain: annas-archive.li
Yes please support annas-archive!! It is a wonderful project. I can essentially get an epub file for any book (including banned books) I want. They have so much more than that too.
Not seeding is crazy …