Stephen King: My Books Were Used to Train AI (www.theatlantic.com)
from L4s@lemmy.world to technology@lemmy.world on 24 Aug 2023 06:00
https://lemmy.world/post/3737374

One prominent author responds to the revelation that his writing is being used to coach artificial intelligence.

#technology


InvertedParallax@lemm.ee on 24 Aug 2023 06:09

That doesn’t bode well for humanity.

j4k3@lemmy.world on 24 Aug 2023 06:18

Lol. AI training is more like human awareness of a subject or style. LLMs are not artificial general intelligence. They have no persistent memory. They are just a complicated way of categorizing and associating subjects, combined with a probability of which word comes next. The only thing they are really doing is answering what word comes next. Thar be no magic in them thar dragons.
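To make the "what word comes next" point concrete, here is a toy bigram counter in Python. It is a minimal sketch, not how any real LLM works: models like Llama or GPT use transformer networks over subword tokens with far longer context, but their output is likewise a probability distribution over the next token. The function names `train_bigrams`, `next_word_probs` and `most_likely_next` are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Optional


def train_bigrams(text: str) -> dict:
    """For every word, count which words follow it in the training text."""
    words = text.lower().split()
    followers = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        followers[current][nxt] += 1
    return followers


def next_word_probs(followers: dict, word: str) -> dict:
    """Turn follower counts into a probability for each candidate next word."""
    counts = followers.get(word.lower(), Counter())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


def most_likely_next(followers: dict, word: str) -> Optional[str]:
    """Greedy choice: the single most probable next word, or None if unseen."""
    probs = next_word_probs(followers, word)
    return max(probs, key=probs.get) if probs else None


model = train_bigrams("the man walked and the man ran and the dog sat and the cat slept")
print(next_word_probs(model, "the"))   # {'man': 0.5, 'dog': 0.25, 'cat': 0.25}
print(most_likely_next(model, "the"))  # man
```

Nothing in the counts stores the sentences themselves in a retrievable order; the model only knows which word tends to follow which.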

This is just another market hype article. AI can’t reproduce a work or replace the author. It can write a few lines that may reflect a similar style just like any human also familiar with the work and style.

Unless you want to go back to the medieval era of thought policing, all of these questions about AI training are irrelevant.

dave@feddit.uk on 24 Aug 2023 06:49

It’s an interesting philosophical question whether we humans, when we write something based on the sum total of everything we’ve seen, heard and read, aren’t also just working out which word is most likely to come next in a good story.*

One question that could be worth asking though is whether this should have been done without permission. From experience talking with authors, that’s a bigger concern than whether they’ll be replaced.

*Totally agree with you that current LLMs are a long way from that. And humans don’t work at the word level either, so the abstraction is different, but the principle might be the same.

j4k3@lemmy.world on 24 Aug 2023 07:10

I only play with offline AI stuff, but I usually use a model that is far more powerful than the average in the offline LLM space (Llama 2 70B). I tried building a Leto II character from God Emperor of Dune, and I trained several LoRA layers on various data, including the entire GEoD book. The book itself had very little value: it lacks the context scope and detail needed to do anything relevant. It becomes possible to ask about some high-level plot or character info, but I remember more than the model could accurately pull out.

The really cool cutting-edge thing of the future will be authors embracing AI for their own creative inspiration, and whoever is the first to fully integrate with AI. Train a model on yourself and your characters so that a screen can generate images of characters and locations as they develop page by page. Each character’s identity could be built from a random seed plus a persistent token, so that every read has a different accompanying look and feel. In the future I’m sure it will be possible to limit a character’s context to the information the user has read (this is a major key to individualized learning, probably the biggest future application of LLMs). A character/user-matched scope would make interaction and direct questioning next-level cool. Forget book clubs: now you converse with a circle of characters, or with the author as a digitized character.
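One way the "limit a character's context to what the reader has read" idea could work is to filter the book's passages by page before building a prompt. This is a hypothetical sketch: `Passage`, `visible_context` and `build_prompt` are illustrative names, the book content is made up, and the resulting prompt would still have to be sent to whatever local model you run.

```python
from dataclasses import dataclass


@dataclass
class Passage:
    page: int  # page on which the passage appears
    text: str


def visible_context(passages: list, pages_read: int) -> list:
    """Keep only passages from pages the reader has already reached."""
    return [p.text for p in passages if p.page <= pages_read]


def build_prompt(character: str, passages: list, pages_read: int, question: str) -> str:
    """Assemble a role-play prompt whose context stops at the reader's current page."""
    context = "\n".join(visible_context(passages, pages_read))
    return (
        f"You are {character}. Answer only from what happens in these excerpts:\n"
        f"{context}\n\n"
        f"Reader's question: {question}"
    )


# Hypothetical book content, tagged by page.
book = [
    Passage(1, "The story opens in the citadel."),
    Passage(40, "A visitor arrives with a warning."),
    Passage(300, "SPOILER: the ending is revealed."),
]
prompt = build_prompt("Leto II", book, pages_read=50, question="Who came to see you?")
print(prompt)
```

Filtering before the model ever sees the text is simpler and more reliable than asking the model to "pretend" it has not read ahead.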

AI is the framework. All the hype and BS is because of stupid, greedy people trying to make AI a product. This is next-level privacy-invasive nonsense. Proprietary AI must die. The future is offline AI as a simple tool, utilized to create a richer and more accessible experience.

Hamartiogonic@sopuli.xyz on 24 Aug 2023 07:38

I’ve been trying to make GPT-4 turn my ideas into short stories, but that has turned out to be a really tough nut to crack. I need to specify my vision in excruciating detail, and I still need to spend hours editing the text. Occasionally GPT goes totally off the rails, so I need to hold its hand every step of the way to get what I want.

Other than that, it’s been very fun and educational.

You could compare the situation to a farmer using a horse to plow the field. Without the horse, it’s going to take forever, but with the horse you can get stuff done. It’s just that you have to steer the horse all the time. You can’t expect the horse to do all the work, but you can make it do a lot of it.

esadatari@lemmy.world on 24 Aug 2023 07:34

all i have to say is show me one truly original thought that doesn’t have a basis in other thoughts and i’ll start thinking this AI learning other people’s styles is a bad thing.

just one.

unfortunately for everyone pooh-poohing it, the fact of the matter is that everything is a remix of a remix of a remix. ai is just one more step in a long line of steps of mimicry and adaptation.

and if someone carbon copies someone else’s work, that’s really obvious and then gets called the fuck out. but if you are taking someone’s style and combining it with subjects they’d never use and other styles, then you’ve just used ai to do the exact thing we’ve always done.

this knee jerk reaction stuff, honestly, i can’t wait for it to dissipate

Aceticon@lemmy.world on 25 Aug 2023 13:49

People believing this stuff is AGI also reminds me of my poor, illiterate, provincial grandmother, who, when she moved in with us “in the big city”, used to get really confused when she saw the same actor in more than one soap opera on TV: she confused the imitation of real life that is acting (soap opera acting, even, which is generally pretty bad) with actual real life.

Kinglink@lemmy.world on 24 Aug 2023 06:46

Glad I’m not the only one.

I like Stephen King’s work. But it’s pulp fiction. There are a few really solid books from him, but he mostly writes for the masses.

It’d be like if Dan Brown claimed this, or Stephenie Meyer. I’m only against this because I think it brings down the quality of the AI.

It’s also wicked strange that so many of these authors hate this but ignore how many authors grew up on their work, learned from it and tried to write in the same style (and failed). Stephen King is a unique, brilliant author. His best work is great not because of his writing style but because of the ideas and subjects of his books. An AI wouldn’t have created Misery if it hadn’t already existed, unless someone prompted it with that exact idea.

afraid_of_zombies@lemmy.world on 27 Aug 2023 01:17

And? If we are that pathetic maybe we deserve extinction.

How does having a new tool leave you worse off?

CitizenKong@lemmy.world on 24 Aug 2023 07:01

The AI in black fled into the desert and the wordslinger followed.

discomatic@lemmy.ca on 26 Aug 2023 00:55

This is my favourite comment on Lemmy so far.

afraid_of_zombies@lemmy.world on 27 Aug 2023 01:14

Don’t worry, a later AI will republish it and it will suck.

The Gunslinger was one of my favorites, before King decided to George Lucas it.

Turun@feddit.de on 24 Aug 2023 08:26

Yes, and all of modern fantasy is heavily influenced by Tolkien’s writing, who in turn took inspiration from old legends like Beowulf.

As if human artists and writers are blind to anything ever created.

sab@lemmy.world on 24 Aug 2023 10:35

Humans with imperfect memories being influenced by a work <> AI language models being trained on a work.

Turun@feddit.de on 24 Aug 2023 14:11

Sure, if you want to see it like that. But if you try out Stable Diffusion, etc., you will notice that “imperfect memory” describes the AI as well. You can ask it for famous paintings and it will get the objects and colors generally correct, but only as well as a human artist would. The details will be severely lacking. And that’s the best-case scenario for the AI, because famous paintings are overrepresented in the training data.

sab@lemmy.world on 24 Aug 2023 15:54

Nah.

By default an AI will draw from its entire memory, and so will have lots of different influences. But by tuning your prompt (or restricting your input dataset) you can make it so specific that it’s basically creating near-perfect clones. And unlike a human, it can then produce similar works hundreds of times per minute.

But even that is beside the point. Those works were sold on the presumption that people would read them, not that they would be ingested into an LLM or text-to-image model. And now companies like OpenAI profit from models they trained without permission from the original authors. That’s just wrong.

stephen01king@lemmy.zip on 25 Aug 2023 19:47

If you want to claim that AI can make perfect clones, you have to provide more proof than just your own words. I personally have never managed to make that happen.

BetaDoggo_@lemmy.world on 25 Aug 2023 20:11

Obviously restricting the input will cause the model to overfit, but that’s not an issue for most models, where billions of samples are used. In the case of Stable Diffusion, this paper had a ~0.03% success rate extracting training data after 500 attempts on each image, ~6.23E-5% per generation. And that was on a targeted set with the highest number of duplicates in the dataset.
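The two rates quoted above relate by simple division, assuming each targeted image counts as extracted at most once: spreading the per-image success rate over 500 attempts gives the per-generation rate. A rounded 0.03% yields about 6.0E-5%; the quoted 6.23E-5% corresponds to an unrounded per-image rate of about 0.031%. The helper name below is hypothetical.

```python
def per_generation_rate(per_image_rate: float, attempts_per_image: int) -> float:
    """Spread a per-image extraction rate over the attempts made on each image.

    Assumes each targeted image counts as extracted at most once, so the chance
    that any single generation is a successful extraction is rate / attempts.
    """
    return per_image_rate / attempts_per_image


rounded = per_generation_rate(0.03, 500)      # rates expressed in percent
unrounded = per_generation_rate(0.03115, 500)
print(f"{rounded:.2e}% per generation")       # 6.00e-05% per generation
print(f"{unrounded:.2e}% per generation")     # 6.23e-05% per generation
```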

The reason they were sold doesn’t matter: as long as the material isn’t being redistributed, copyright isn’t being violated.

AEsheron@lemmy.world on 26 Aug 2023 05:09

Have you used Stable Diffusion? I defy you to make a perfect clone of any image. Take a whole week to try and refine it if you want. It is basically impossible by definition, unless you trained it only on that one image.

p03locke@lemmy.dbzer0.com on 25 Aug 2023 05:42

You seem to imply that AI has perfect memory. It doesn’t.

Stable Diffusion is a 4GB file of weights. ChatGPT’s model is of a similar size. It is mathematically impossible for it to store the entire internet in a few GB of data, just as it is physically impossible for one human brain to store the entire internet in its neural network.
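The capacity argument is back-of-the-envelope arithmetic: file size is parameter count times bytes per parameter, and dividing that by the number of training images gives the average storage budget per image. The figures below are hypothetical round numbers (about 1 billion parameters stored as 32-bit floats, and about 2 billion training images), not official specs for Stable Diffusion or ChatGPT.

```python
def weight_file_bytes(n_params: float, bytes_per_param: int) -> float:
    """Size of a weight file: one stored number per parameter."""
    return n_params * bytes_per_param


def bytes_per_training_item(file_bytes: float, n_items: float) -> float:
    """Average storage budget per training example if the weights were a pure archive."""
    return file_bytes / n_items


# Hypothetical round numbers for illustration only.
size = weight_file_bytes(1e9, 4)             # ~1B parameters as 32-bit floats
budget = bytes_per_training_item(size, 2e9)  # ~2B training images
print(f"{size / 1e9:.1f} GB of weights, {budget:.1f} bytes per image")
# 4.0 GB of weights, 2.0 bytes per image
```

Under these assumptions there are about 2 bytes per image, while even a tiny compressed thumbnail takes on the order of kilobytes.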

sab@lemmy.world on 25 Aug 2023 06:14

But you can easily fit all of King’s work in a 4GB model. Just because it isn’t done in the most popular models doesn’t make it ethical to do it in the first place.

In my opinion, you should only be able to use a work to train an AI model if the work is in the public domain or if you have explicit permission from the license holder. Especially if you then use that model for profit or charge others to use it.

p03locke@lemmy.dbzer0.com on 25 Aug 2023 13:28

But you can easily fit all of King’s work in a 4GB model.

But, uhhhh, they didn’t. They didn’t copy everything, word for word, and put it into a model. That’s not how AI models work.

sab@lemmy.world on 25 Aug 2023 15:45

I didn’t claim it was.

We can discuss technicalities all day long, but that’s beside the point. The thread OP claimed that creating an LLM based on a copyrighted work is okay because humans are influenced by other works as well. But a human can’t crank out hundreds of Stephen King-like chapters per hour, or hundreds of Dali-like paintings per minute.

If King or Dali had given permission for their works to be used in this way, it might have been a different story, but as it is, AI models are being trained on (and their owners profit from) huge amounts of data they did not have permission to use.

Edit: never mind, I think trying to discuss AI ethics with you is pointless. Have a nice weekend!

commie@lemmy.dbzer0.com on 25 Aug 2023 18:46

But a human can’t crank out hundreds of Stephen King-like chapters per hour. Or hundreds of Dali-like paintings per minute.

so?

Treczoks@lemmy.world on 24 Aug 2023 10:23

Now that might give an AI scary ideas…

[deleted] on 24 Aug 2023 11:30

.

TheFrogThatFlies@lemmy.world on 24 Aug 2023 11:01

We need an AI with all human knowledge, or various with different specializations. But those AIs must not be in the hands of companies.

darth_helmet@sh.itjust.works on 25 Aug 2023 07:42

if you think companies are going to misuse and abuse ai, just wait until we find out how the man is using them.

BloodForTheBloodGod@lemmy.ca on 25 Aug 2023 13:45

AI doesn’t currently have any knowledge of facts. It just knows patterns.

Steeve@lemmy.ca on 26 Aug 2023 03:39

Problem is, how are you gonna run it? Meta has already open-sourced an LLM that rivals GPT-4 with only 65B parameters, but you can’t even come close to running it on a top-of-the-line GPU.
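A rough estimate of the weight memory alone shows why a 65B-parameter model is out of reach for a single consumer GPU. This sketch assumes a 24 GB card and ignores activations and the KV cache, which only add to the total; the bit widths are common choices (16-bit half precision, 8-bit and 4-bit quantization), and `weights_gb` is a hypothetical helper.

```python
def weights_gb(n_params: float, bits_per_param: int) -> float:
    """Memory for the weights alone, in decimal gigabytes (ignores activations and KV cache)."""
    return n_params * bits_per_param / 8 / 1e9


GPU_GB = 24  # assumption: a high-end consumer card with 24 GB of VRAM

for bits in (16, 8, 4):
    need = weights_gb(65e9, bits)
    print(f"{bits}-bit: {need:.1f} GB, fits on one {GPU_GB} GB GPU: {need <= GPU_GB}")
# 16-bit: 130.0 GB, fits on one 24 GB GPU: False
# 8-bit: 65.0 GB, fits on one 24 GB GPU: False
# 4-bit: 32.5 GB, fits on one 24 GB GPU: False
```

Even at 4 bits the weights need about 32.5 GB, so the model has to be split across several GPUs or partly offloaded to system RAM.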

afraid_of_zombies@lemmy.world on 27 Aug 2023 01:15

Maybe the Wikimedia Foundation can do it.

Steeve@lemmy.ca on 27 Aug 2023 01:46

They could! It is open source.

ashok36@lemmy.world on 25 Aug 2023 14:41

I mean, yeah, duh. Just ask any of them to write a paragraph “in the style of INSERT AUTHOR”.

If it can, then it was trained on that author. I’m not sure how that’s a problem though.

ForgotAboutDre@lemmy.world on 25 Aug 2023 19:09

We don’t have the legal framework for this type of thing. So people are going to disagree with how using training data for a commercial AI product should work.

I imagine Stephen King would argue they didn’t have licenses or permission to use his books to train their AI, so he should be compensated or the AI deleted/retrained. He would argue that buying a copy of the book only allows it to be read by humans, just as buying a CD doesn’t allow you to put that song in your advert.

Drewelite@lemmynsfw.com on 25 Aug 2023 19:19

I would argue we do have a legal precedent for this sort of thing. Companies hire creatives all the time and ask them to do things in the style of other creatives. You can’t copyright a style. You don’t own what you inspire.

IchNichtenLichten@lemmy.world on 25 Aug 2023 19:30

That’s not what’s happening, though. His works are being incorporated into an LLM without permission. I hope he sues the hell out of these people.

BetaDoggo_@lemmy.world on 25 Aug 2023 19:42

Is that illegal though? As long as the model isn’t reproducing the original then copyright isn’t being violated. Maybe in the future there will be laws against it but as of now the grounds for a lawsuit are shaky at best.

IchNichtenLichten@lemmy.world on 25 Aug 2023 19:46

There are already laws around what you can and can’t do with copyrighted material. If the owners of the LLM didn’t obtain written permission, I’d say they are on very shaky ground here.

BetaDoggo_@lemmy.world on 25 Aug 2023 20:48

What laws specifically? The only ones I can find refer to limits on redistribution, which isn’t happening here. If the models were able to reproduce the contents of the books that would be another issue that would need to be resolved. But I can’t find anything that would prohibit training.

IchNichtenLichten@lemmy.world on 25 Aug 2023 20:56

What laws specifically?

Existing laws to protect copyrighted material.

“AI systems are “trained” to create literary, visual, and other artistic works by exposing the program to large amounts of data, which may consist of existing works such as text and images from the internet. This training process may involve making digital copies of existing works, carrying a risk of copyright infringement. As the U.S. Patent and Trademark Office has described, this process “will almost by definition involve the reproduction of entire works or substantial portions thereof.” OpenAI, for example, acknowledges that its programs are trained on “large, publicly available datasets that include copyrighted works” and that this process “necessarily involves first making copies of the data to be analyzed.” Creating such copies, without express or implied permission from the various copyright owners, may infringe the copyright holders’ exclusive right to make reproductions of their work.”

crsreports.congress.gov/product/pdf/LSB/LSB10922

BetaDoggo_@lemmy.world on 25 Aug 2023 22:17

By that definition of copying Google is infringing on millions of copyrights through their search engine, and anyone viewing a copyrighted work online is also making an unauthorized copy. These companies are using data from public sources that others have access to. They are doing no more copying than a normal user viewing a webpage.

IchNichtenLichten@lemmy.world on 25 Aug 2023 22:39

I don’t think so. Your comparisons aren’t really relevant. If Google inadvertently scrapes a page containing copyrighted material and serves it to a user, there are mechanisms to take down that content or face a lawsuit. Try posting a movie on YouTube: if a copyright holder notifies Google, that content will be taken down.

Training an LLM is different: that material was used to help build the model and is now part of that product. That creates a legal liability.

Drewelite@lemmynsfw.com on 25 Aug 2023 19:49

But that is what’s happening in the minds of creatives. Reading a book and taking inspiration is functionally the same mechanism that an LLM uses to learn. They read Stephen King, they copy some part of the style. Potentially very closely and for a corporation’s gain if that’s what’s asked of them.

IchNichtenLichten@lemmy.world on 25 Aug 2023 19:54

One person being influenced by a prose style isn’t the same as a company using a copyrighted work without permission to train an LLM.

Drewelite@lemmynsfw.com on 25 Aug 2023 20:05

Every learning material a company or university has ever used has been used to train an LLM. Us.

Okay, I’m being a bit facetious here. I know people and ChatGPT aren’t equivalent. But the gap is closing. Maybe LLMs will never bridge the gap, but something will. I hesitate to write into law now that no work can ever be ingested or emulated by another intelligent entity. While the difference between a machine and a human is clear to you now, one day it won’t be.

The longer we hold onto the idea that our brains are somehow magically different from the way computers learn (or will learn) to think, the harder we’ll get blindsided by reality when they’re indistinguishable from us.

IchNichtenLichten@lemmy.world on 25 Aug 2023 20:15

There’s very little an LLM has in common with the human brain. We can’t do AGI yet, and there’s no evidence that we will be able to create AGI any time soon.

The main issue as I see it is that we have companies trying to make money by creating LLMs. The people who created the source materials for these LLMs are not only not getting paid, they’re not even being asked permission. To me that’s dead wrong and I hope the courts agree.

Drewelite@lemmynsfw.com on 25 Aug 2023 21:03

I agree AGIs aren’t going to happen soon. But it sounds like we agree they WILL happen. LLMs do have one important thing in common with humans, their output is transformative based on what they learn.

I think what you take issue with is the scale. People wouldn’t care if this were something that existed on one computer somewhere, where someone could type, “Write me a spooky story about Top Ramen in the style of Stephen King”. It’s that anyone can get a story in Stephen King’s style when all OpenAI had to do was buy a couple of digital copies of Cujo. However, no one is upset that James Cameron bought one ticket to Pocahontas and thought, “What if that were on another planet?” But 400 million people saw that movie.

People want to protect creatives by casting a net over machines, saying they can’t use the works of artists, even when transforming them, without paying the original creator. While that sounds like it makes sense now, what happens when the distinction between human and machine disappears? That net will be around us too. Corporations will just use this to strengthen their copyright control even further.

Stephen King was largely inspired by Ray Bradbury and H.P. Lovecraft. I doubt he paid them beyond the original price of a couple books.

BTW thanks for the thought provoking conversation. None of my friends care about this stuff 😅

afraid_of_zombies@lemmy.world on 27 Aug 2023 01:11

So if your AI responses are biased towards car crashes you will know why now.

Take a Stephen King book you have never read. Open a random page and point to a random paragraph. Do this 3x. You will find a car crash, a memory of a car crash, someone talking about a car crash, or someone concluding X happened because of a car crash.

Kolanaki@yiffit.net on 27 Aug 2023 01:50

Is that why when I ask ChatGPT to tell me a scary story, it invariably contains a sex scene between minors?

5BC2E7@lemmy.world on 27 Aug 2023 19:00

Does he also have a problem with people that were changed by his books?