Meta wins artificial intelligence copyright case in blow to authors (www.ft.com)
from IsaamoonKHGDT_6143@lemmy.zip to technology@lemmy.world on 26 Jun 00:08
https://lemmy.zip/post/42352911

Link without the paywall

archive.ph/OgKUM

#technology


mienshao@lemm.ee on 26 Jun 00:35 next collapse

American law has become a literal fucking joke (IAAL). I could’ve guessed the outcome of this case without knowing any of the facts: the huge corporation wins over the authors. American law is no longer capable of holding major corporations to account, so we need a new legal system—one that’s actually functional.

MCasq_qsaCJ_234@lemmy.zip on 26 Jun 00:39 next collapse

Do you want a new constitution in the United States?

DarkDarkHouse@lemmy.sdf.org on 26 Jun 02:26 next collapse

Could start with a guillotine for corporations and see how that goes.

technocrit@lemmy.dbzer0.com on 27 Jun 14:56 collapse

If a pact between enslavers is a complete failure, why not just make another one? \s

en.wikipedia.org/wiki/Civil_religion

drmoose@lemmy.world on 26 Jun 02:32 collapse

But the actual process of an AI system distilling from thousands of written works to be able to produce its own passages of text qualified as “fair use” under U.S. copyright law because it was “quintessentially transformative,” Alsup wrote.

That’s the actual argument, and the judge is right here. LLMs are transformative in every sense of the word. The technology is even called “transformers”.

actionjbone@sh.itjust.works on 26 Jun 02:43 next collapse

Yeah, well, I could call my dick the Magnum Opus but that wouldn’t make it two feet long.

Leesi@lemmy.blahaj.zone on 26 Jun 06:44 collapse

Fallacious argument.

Something that can’t generate a wine glass full to the brim without a band-aid fix is far from “transformative.” Even if it were:

Only the owner of copyright in a work has the right to prepare, or to authorize someone else to create, a new version of that work.

More like obfuscated plagiarism.

drmoose@lemmy.world on 26 Jun 10:59 collapse

Nope, I’m literally a data programmer working in this field. Any sufficiently transformed data, even data derived from copyrighted works, is transformative work, and current LLMs meet that criterion and will continue to do so. Wanna bet?

LwL@lemmy.world on 26 Jun 12:48 next collapse

I think there’s a blurry line here: you can easily train an LLM to just regurgitate the source material by overfitting, so at what point is it “transformative enough”? There’s little doubt that current flagship models usually are transformative enough, but that doesn’t apply to everything built on the same technology - even though this case will be used as precedent for all of it.

There’s also another issue: while safeguards are generally in place, without them LLMs would be quite capable of quoting entire pages of popular books, and jailbreaking LLMs isn’t exactly unheard of. They also, at least at one point, really liked to verbatim repeat news articles on obscure topics.
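The overfitting/regurgitation point above can be sketched with a toy (this is an illustrative Markov-chain stand-in, not a real LLM; the text and variable names are made up for the example). Trained on a single passage, the "model" is maximally overfit, so greedy generation reproduces its training data verbatim:

```python
# Minimal sketch: a 2-gram Markov "model" trained on one passage.
# With only a single source text the model is fully overfit, so
# generation regurgitates the training data word for word.
from collections import Counter, defaultdict

text = "the quick brown fox jumps over the lazy dog".split()

# "Training": count which word follows each two-word context.
model = defaultdict(Counter)
for a, b, c in zip(text, text[1:], text[2:]):
    model[(a, b)][c] += 1

# "Generation": greedily pick the most likely next word until stuck.
out = list(text[:2])
while tuple(out[-2:]) in model:
    out.append(model[tuple(out[-2:])].most_common(1)[0][0])

print(" ".join(out))  # reproduces the training sentence verbatim
```

Real LLMs sit on a spectrum between this degenerate case and genuine generalization, which is exactly why "transformative enough" is a fuzzy threshold.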

What I’m mainly getting at is that LLMs can be transformative, but they can also plagiarize, much like any human could. The question then is: if training LLMs on copyrighted data is allowed, will the company be held accountable when their LLM does plagiarize, the same way a person would be? Or would the better decision be to prohibit training on copyrighted data, because meaningful transformation cannot be guaranteed and copyright holders actually finding these violations is very hard?

Though idk the case details - if the argument was purely focused on using the material to produce the model, rather than including the ultimate step of outputting text to anyone who asks, it was probably doomed to fail from the start, and the decision makes perfect sense. That doesn’t seem too unlikely to have happened, because realizing this would require the lawyer making the case to actually understand what training an LLM does.

Natanael@infosec.pub on 26 Jun 19:02 next collapse

This case didn’t cover the copyright status of outputs. The ruling so far is just about the process of training itself.

IMHO the generative ML companies should be required to build a process for tracking the influence of distinct samples on the outputs, and to inform users of the potential licensing status

Division of liability / licensing responsibility should depend on who contributes what to the prompt / generation. The less it takes for the user to trigger the model to generate an output clearly derived from a protected work, the more liability lies on the model operator. If the user couldn’t have known, they shouldn’t be liable. If the user deliberately used jailbreaks, etc, the user is clearly liable.

You get a weird edge case, though, when users unknowingly copy prompts containing jailbreaks

infosec.pub/comment/16682120

FatCrab@slrpnk.net on 27 Jun 14:15 collapse

You are agreeing with the post you responded to. This ruling is only about training a model on legally obtained training data. It does not say it is OK to pirate works - if you pirate a work, then no matter what you do with the infringing copy you’ve made, you’ve committed copyright infringement. It does not talk about model outputs, which is a very nuanced issue and likely to fall along similar analyses as music copyright, imo. It only talks about whether training a model is intrinsically an infringement of copyright. And it isn’t, because anything else is insane and would be functionally impossible to differentiate from learning a writing technique by reading a book you bought from an author. Even in a model that has overfit its training data, the weights are in no way recognizable as any particular training datum. It’s a hyperdimensional matrix of numbers defining relationships between features, and relationships between relationships.

technocrit@lemmy.dbzer0.com on 27 Jun 14:57 collapse

It depends on your definition of “is”. In reality it depends on the original art and how it’s transformed. But legally it’s whatever benefits capital (aka your boss). I wouldn’t bet against your boss paying off the courts, lawyers, etc.

drmoose@lemmy.world on 27 Jun 15:03 collapse

Ok, so if I don’t generate capital from it, there’s no crime? You can see how the original argument - that all copying is copyright breach - falls apart. You can dig into this infinitely: is my monitor copying pixels onto my screen a breach? What about the browser cache? So copyright breach can only be argued from the POV that it has to be capital-generating or cause direct damage, like using that data for libel or something.

DeathByBigSad@sh.itjust.works on 26 Jun 01:15 next collapse

I’m torrenting movies in order to develop my own AI, your honor. I rest my case. 😎

drmoose@lemmy.world on 26 Jun 02:33 next collapse

Someone didn’t read the article

DeathByBigSad@sh.itjust.works on 26 Jun 02:37 collapse

I actually did. The judge made up some BS “your arguments were bad” ruling. Activist judges. This is selective enforcement so that the plebs cannot legally pirate while corporations have free rein.

drmoose@lemmy.world on 26 Jun 04:44 collapse

Again, maybe you should give it another shot - piracy is still illegal, but training is legal. How would “you torrenting movies” be alright here? You see how it makes no sense?

gratux@lemmy.blahaj.zone on 26 Jun 06:45 collapse

Meta literally torrented 82TB of books to train their AI. Why would torrenting movies to train AI be different?

drmoose@lemmy.world on 26 Jun 10:57 collapse

Dude just read the article. Torrenting IS illegal even by Meta here. What’s up with people being willingly illiterate here?

The entire thing the judge is saying is that you can’t sue over AI training, but you can come back with a different lawsuit aimed at the piracy itself - which was just set as precedent in another lawsuit last week.

stephen01king@lemmy.zip on 27 Jun 12:17 collapse

The anti-AI brainrot is making them hallucinate like the AI they hate so much.

Natanael@infosec.pub on 26 Jun 19:08 collapse

The judge explicitly did not allow piracy here. Only legally acquired media can be used for training.

Edit: techdirt.com/…/two-judges-same-district-opposite-…

frustrated_phagocytosis@fedia.io on 26 Jun 01:27 next collapse

All I'm hearing is that pirating is A-OK as long as you claim it's for model training

WatDabney@fedia.io on 26 Jun 01:32 next collapse

Sorry, but no. That's just the paper-thin excuse.

Pirating, like pretty much anything else that's sometimes a crime in the current US, is A-OK if you can buy enough judges and politicians.

Alwaysnownevernotme@lemmy.world on 26 Jun 19:13 collapse

I just put a thin blue line sticker on my case and full send that bitch.

Natanael@infosec.pub on 26 Jun 18:55 collapse

The ruling explicitly does not allow pirating. It only lets you run ML training on legally acquired media.

They still haven’t ruled on copyright infringement from pirating the media used to train, and they haven’t ruled on copyright status of outputs (what it takes to be considered transformative).

This is Judge Alsup, the same judge who ruled in Oracle v. Google

Edit: techdirt.com/…/two-judges-same-district-opposite-…

ieatpwns@lemmy.world on 26 Jun 11:12 next collapse

I’d feel better about this if meta actually produced anything of value and I was able to also violate their copyright, but they’re just fucking leeches bro

3dcadmin@lemmy.relayeasy.com on 26 Jun 12:56 next collapse

The tinternet is getting like the Wild West again, circa the early 2000s with Napster and all that… This will pass - dunno when or how, but it will pass. Or perhaps we will start getting everything paywalled so LLMs can’t just scrape data without some sort of payment. I don’t actually know many people that like AI nowadays

rikudou@lemmings.world on 26 Jun 18:37 next collapse

That’s probably your social bubble. My company is currently deepthroating everything that has AI in its name. I jokingly mentioned they should rename the company to Jira&AI, the joke was not well received.

Anyway, most people I know (including me) are somewhere in the middle - not quite fans in the traditional sense, but definitely not disliking AI.

3dcadmin@lemmy.relayeasy.com on 27 Jun 10:11 collapse

It is, but as many of them are artists in some way, that’s the part they dislike the most

technocrit@lemmy.dbzer0.com on 27 Jun 15:00 collapse

It’s sort of like that… except instead of a bunch of regular people sharing music, it’s a bunch of capitalists stealing all art for profit. Of course, the major difference is that it’s legal for capital.

technocrit@lemmy.dbzer0.com on 27 Jun 14:54 next collapse

Wow you mean the state serves capital? I thought for sure it would once again fight for the rights of artists and their extremely profitable IP. \s \s \s

C1pher@lemmy.world on 30 Jun 22:45 collapse

Of course it ended up that way. Who do you think lobbies for this kind of shit? Who’s paying those who make the decisions?