Nvidia’s biggest product is absolutely AI, by a massive landslide. I’m pretty sure I read that the point of them downloading these videos and doing the training is to build a pipeline for their AI users to do the same with their own shit. (Can’t be bothered to double-check cuz I really don’t care.)
So they aren’t downloading all this video to make a crazy AI model. They’re downloading all this video to make a tool for their AI customers to use. You may not agree, but improving their product is exactly what they’re doing.
Grimy@lemmy.world
on 05 Aug 2024 21:22
There’s only a handful of video datasets and all of it is owned by Google through YouTube or big Hollywood companies like Disney and Netflix.
These companies are foaming at the mouth with rage thinking about what generative AI will do to their industry and how much it will help the currently nonexistent indie one. They will do whatever it takes to fence in the playbox and make sure they get to be the toll man.
This was never about AI getting to live or not, but who gets to own it. 404media is essentially a mouthpiece for these corporations, willingly or not, and the strengthening of copyright laws will not help the consumers or the small-time creators. The only exceptions are laws that force copyleft licenses onto models, which is not what is being pushed right now, and AOC’s Deepfake act, which is well thought out imo.
Anyone should be permitted to train on YouTube and Netflix data, and Nvidia might even open source it in any case.
Sconrad122@lemmy.world
on 06 Aug 2024 01:04
Nvidia does not have a strong history of open sourcing things, to say the least. That last bit sounds like pure hopium.
trollbearpig@lemmy.world
on 06 Aug 2024 13:25
The guy you are replying to is in all AI posts defending AIs. He is probably heavily invested in this BS or being paid for it, don’t waste your time with him.
Tbh, someone has to. Have you ever asked yourself if the intense hate AI gets and how 99% of articles are against it is organic?
There’s a handful of companies that are poised to win big if they can put up a fence around AI while making sure the public can’t run strong models. There is an intense media campaign to make sure the public thinks either AI is dangerous (so they can be the only ones legally allowed to distribute them) or that AI is theft (So they can be the only ones to afford building them).
Do not let yourself be manipulated, almost all strengthening of copyrights related to AI is completely against our interests.
And no, I’m not getting paid lol. I have a vested interest because I use generative technology for work and for fun in my free time. I’m also interested in not handing out our whole economy on a silver platter to Google and Microsoft, if I can maybe help with a couple of comments a week, I will. Why don’t you explain why I’m wrong instead of sending out baseless accusations?
trollbearpig@lemmy.world
on 07 Aug 2024 11:53
Nah my man, you are either brainwashed or are being paid hahaha. Is copyright a mess? Of fucking course, I haven’t met a single person (except crazy ass libertarians funnily enough hahaha) that likes copyright. Are big corporations using copyright to exploit artists, create monopolies, and generally being dicks? Again, of fucking course.
But anyone, like you, saying that we should just let AIs destroy copyright effectively is a fucking prick, that simple. And your arguments are disingenuous at best or outright lies. For example, just as big copyright holder companies are pushing to strengthen copyright law, the big tech companies are pushing for effectively destroying copyright through AI models. I have seen you pushing in multiple threads for open source models like that’s a solution. But if you were a serious person researching the open source software community you would see that pretty much no one there agrees with your position because it would effectively destroy the copyleft open source licenses. After all, if an “AI” model, open source or not, is allowed to just “train” on my AGPL code and spit it back (with minor modifications at best) to an engineer in AWS that’s it for my project. Amazon will do the Amazon thing and steal the project. So say goodbye to any software freedom we have.
And let’s be 100% clear here, this is not being pushed by the evil copyright holders like you seem to imply (and they are totally evil just to be clear hahahah). This is being pushed by the big tech companies and people like you spreading their propaganda. The fact that the copyright holders happen to be in the right this time is just a broken clock being right and all that, but it’s still good that they are pushing back against big tech. I do agree we have to keep an eye on them, the objective here can’t be to make copyright bigger, just to close the “loophole” that big tech companies are exploiting to steal everything.
People like you who want to destroy copyright without offering any alternatives to allow creatives to work in a market are either misinformed or just assholes. Again, of fucking course it’s not an ideal system, but going full kamikaze and just destroying any possibility for artists and creatives of making a living with their work is the most evil thing going on right now, so bad that the big copyright holders happen to fall on the less bad side this time hahaha. And all for what? So people can be lied to by dumb chatbots? Or so people can create mediocre derivative “art” without putting in any effort? Or so we can get mediocre code autocomplete that is subtly wrong all the time? It’s fucking ridiculous.
31337@sh.itjust.works
on 07 Aug 2024 18:24
After all, if an “AI” model, open source or not, is allowed to just “train” on my AGPL code and spit it back (with minor modifications at best) to an engineer in AWS that’s it for my project. Amazon will do the Amazon thing and steal the project. So say goodbye to any software freedom we have.
An engineer at AWS can already just copy your code, make minor modifications, and use it. I would think the same legal recourse would apply if it was outputted from an LLM or just a copy-paste? This seems like a tangential issue to whether the LLM was trained on your code or not (not training on your code obviously reduces the probability of the LLM spitting it back out near-verbatim though). Personally, I don’t see anything wrong with anyone using public code to build statistical models. And I think the pay-to-scrape models that Reddit, Xitter, and others are employing will help big tech build the “moat” they’re looking for. Big tech is asking for AI regulation for similar reasons.
trollbearpig@lemmy.world
on 07 Aug 2024 18:32
An engineer at AWS can already just copy your code, make minor modifications, and use it.
You are 100% wrong here my man. If an engineer does this they are creating a derivative work and they have to fulfill the conditions of the license of the code. No wonder you don’t see anything wrong here, you AI people live in a fantasy world when it comes to how copyright works hahahaha. Please stop talking about shit you know nothing about.
31337@sh.itjust.works
on 07 Aug 2024 18:54
I stated that they can do this, and asked if they could be sued if they used near-verbatim code generated from an LLM, just like they could be sued if they copy-pasted AGPL code.
Edit: Tools like Copilot tell you if your code is similar to publicly available code so you can avoid these issues.
Edit: Just looked up EFF’s position and I tend to agree with it:
Artificial Intelligence and Copyright Law
Artists are understandably concerned about the possibility that automatic image generators will undercut the market for their work. However, much of what is criticized is already considered fair use under copyright law, even if done at scale. Efforts to change copyright law to transform certain fair uses into infringement carry serious implications, are likely to interfere with the innovative potential of AI tools, and ultimately do not benefit artists. In fact, the use of these tools could expand the capacity of artists to create expressive works. Policymakers should emphasize the importance of human labor and investment in what receives copyright protection to maintain wages and dignity. Artists should be protected from efforts by large corporations to both substitute their labor with AI tools and create a new, unnecessary copyright regime around AI-generated art.
Machine Learning is a Fair Use
The process of machine learning for generative AI art is like how humans learn—studying other works—it is just done at a massive scale. Huge swaths of data (images, videos, and other copyrighted works) are analyzed and broken into their factual elements where billions of images, for example, could be distilled into billions of bytes, sometimes as small as less than one byte of information per image. In many instances, the process cannot be reversed because too little information is kept to faithfully recreate a copy of the original work.
The analysis work underlying the creation and use of training sets is like the process to create search engines. Where the search engine process is fair use, it is very likely that processes for machine learning are too. While the act of analysis may potentially implicate copyright, when that act is a necessary step to enabling a non-infringing use, it regularly qualifies as fair use. If the intermediate step were not permitted, fair use would be ineffective. As such, when factual elements of copyrighted works are studied and processed to create training sets—which, once again, is how we humans learn and are inspired by themes and styles in art and other works—that is likely to be found a fair use.
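To make the "less than one byte per image" arithmetic in the quote concrete, here is a minimal sketch in Python. The model size and image count are made-up round numbers chosen only for illustration, not figures for any real model, and `bytes_per_training_item` is a hypothetical helper name.

```python
def bytes_per_training_item(model_size_bytes: float, num_items: float) -> float:
    """Average number of model bytes available per training item."""
    return model_size_bytes / num_items

# Hypothetical numbers: a 4 GB model trained on 5 billion images.
MODEL_SIZE_BYTES = 4e9
NUM_IMAGES = 5e9

ratio = bytes_per_training_item(MODEL_SIZE_BYTES, NUM_IMAGES)
print(f"{ratio:.2f} bytes per image on average")
```

This is only an average over the whole training set. It says nothing about how much of any individual work a model retains.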
trollbearpig@lemmy.world
on 07 Aug 2024 19:09
What point are you trying to make? That the fact that someone can break the law means we should not have laws? I honestly don’t get what you are trying to say.
31337@sh.itjust.works
on 07 Aug 2024 19:11
I’m saying using code for training is a different issue than copyright infringement. I edited my post above to better lay out my position.
trollbearpig@lemmy.world
on 07 Aug 2024 19:21
And that’s the whole point of my comment, did you even read it? To summarize, there is currently a loophole in law that allows these bullshit arguments about it being different than straight up copying shit (though this hasn’t been litigated yet, so it’s not yet clear if these arguments are actually valid). This means that while a person reading my AGPL code and copying it (without following the license) is 100% illegal, doing the same through an LLM may be legal. So this means that open source licenses can be bypassed by first training an LLM with the code and then extracting the code from the LLM. This is terrible for open source, and in general for anyone who wants to make a living from creating copyrighted work. So we should close this loophole, and I’m glad there is a push to close this through better laws. Even if these laws are coming from Disney, Sony, and all those awful companies.
So again, what’s the point you are trying to make here? That we shouldn’t make these laws stronger to prevent this bullshit? I honestly don’t understand what you are trying to argue here, nothing of what you have said has anything to do with this conversation.
31337@sh.itjust.works
on 07 Aug 2024 19:29
That we already have laws against copyright infringement (which seem like they would still apply whether or not the code was spit out by an LLM), and no more should be made. That training on public data is fine.
trollbearpig@lemmy.world
on 07 Aug 2024 19:44
Any arguments to defend your position? I’m giving you a very clear example of the awful consequences of following that path. And the same applies to any creative work. You are just being dismissive without proposing any real solution. Do better man.
31337@sh.itjust.works
on 07 Aug 2024 19:55
The EFF link I posted above provides evidence. Again, here’s a quote from part of it:
The process of machine learning for generative AI art is like how humans learn—studying other works—it is just done at a massive scale. Huge swaths of data (images, videos, and other copyrighted works) are analyzed and broken into their factual elements where billions of images, for example, could be distilled into billions of bytes, sometimes as small as less than one byte of information per image. In many instances, the process cannot be reversed because too little information is kept to faithfully recreate a copy of the original work.
As I mentioned before, Copilot at least, helps people avoid copyright infringement by notifying you if your code is similar to public code. The solution I’m proposing is no new laws, and just enforcing the ones we have. Most of the laws being proposed look like attempts at regulatory capture to me.
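As a rough illustration of what "notifying you if your code is similar to public code" could look like, here is a minimal sketch using Python's standard `difflib`. This is not how Copilot actually implements its check, which isn't described in this thread. The helper names, the whitespace normalization, and the 0.9 threshold are all assumptions made for the example.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a ratio in [0, 1]; 1.0 means the whitespace-normalized texts match."""
    # Collapse all runs of whitespace so indentation changes don't hide a copy.
    norm_a = " ".join(a.split())
    norm_b = " ".join(b.split())
    return SequenceMatcher(None, norm_a, norm_b).ratio()


def looks_copied(candidate: str, known_snippets: list[str], threshold: float = 0.9) -> bool:
    """Flag the candidate if it is near-identical to any known public snippet."""
    return any(similarity(candidate, snippet) >= threshold for snippet in known_snippets)


# Hypothetical "public code" corpus with a single snippet.
public_code = ["def add(a, b):\n    return a + b"]

reindented = "def add(a, b):\n        return a + b"
unrelated = "print('hello world')"

print(looks_copied(reindented, public_code))
print(looks_copied(unrelated, public_code))
```

A character-level ratio like this only catches near-verbatim copies. Renamed variables or reordered statements would need a smarter comparison.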
Doomsider@lemmy.world
on 07 Aug 2024 21:54
That is a long winded way to say you are a copyright defender. Your insisting on finding an alternative to a broken system so rent seekers can continue to exist is naive to say the least.
I think most people with your stance (don’t throw out the baby with the bathwater) really have no idea how broken copyright and intellectual property is.
AI companies have already proven copyright is DOA. It was never designed for the little guy. That is just propaganda you have fallen prey to.
Simply put copyright was not needed for all of human history and it is still not needed. Pretending you have a unique idea, song, painting, etc in a world of billions of humans is beyond ridiculous.
The concept was already broken from the start because everything in science and art is iterative. Giving monopoly power to rent seekers is the natural result of a broken concept.
Their Nemotron 340B model was released under what is essentially an open source licence (available for commercial use except if you are doing shady things like spamming and collecting biometric data).
Having a robust open source ecosystem directly benefits Nvidia since they sell more higher end consumer GPUs.
Obviously, there’s a real chance that this isn’t open sourced since it’s a video model and there’s huge money involved. Doesn’t really change the fact that having YouTube and Netflix dictate who gets to make video models and at what cost isn’t a good idea.
R00bot@lemmy.blahaj.zone
on 06 Aug 2024 02:23
I feel like the amount of training data required for these AIs serves as a pretty compelling argument as to why AI is clearly nowhere near human intelligence. It shouldn’t take thousands of human lifetimes of data to train an AI if it’s truly near human-level intelligence. In fact, I think it’s an argument for them not being intelligent whatsoever. With that much training data, everything that could be asked of them should be in the training data. And yet they still fail at any task not in their data.
Put simply: a human needs less than 1 lifetime of training data to be more intelligent than AI. If it hasn’t already solved it, I don’t think throwing more training data/compute at the problem will solve this.
Hunter232@programming.dev
on 06 Aug 2024 02:48
Humans have the advantage of billions of years of evolution.
Cyteseer@lemmy.world
on 06 Aug 2024 04:12
“ai” also has the advantage of billions of years of evolution.
noobdoomguy8658@feddit.org
on 06 Aug 2024 04:14
We’re very proficient at walking, but somehow haven’t produced a walking home or anything like that.
It’s not very linear.
wizardbeard@lemmy.dbzer0.com
on 06 Aug 2024 12:52
Definitely not the same thing. Just because you can make use of the end result of major efforts does not somehow magically give you access to all the knowledge from those major efforts.
You can use a smart phone easily, but that doesn’t mean you magically know how to make one.
stupidcasey@lemmy.world
on 06 Aug 2024 03:07
You’ve had the entire history of evolution to get the instinct you have today.
Nature Vs Nurture is a huge ongoing debate.
Just because it takes longer to train doesn’t mean it’s not intelligent, kids develop slower than chimps.
Also intelligent doesn’t really mean anything, I personally think intelligence is the ability to distill unusable amounts of raw data and intuit a result beneficial to one’s self. But very few people agree with me.
Peanutbjelly@sopuli.xyz
on 07 Aug 2024 13:56
I see intelligence as filling areas of concept space within an econiche in a way that proves functional for actions within that space.
I think we are discovering more that “nature” has little commitment, and is just optimizing preparedness for expected levels of entropy within the functional eco-niche.
Most people haven’t even started paying attention to distributed systems building shared enactive models, but they are already capable of things that should be considered groundbreaking considering the time and finances of development.
That being said, localized narrow generative models are just building large individual models of predictive process that don’t by default actively update information.
People who attack AI for just being prediction machines really need to look into predictive processing, or learn how much we organics just guess and confabulate on top of vestigial social priors.
But no, corpos are using it so computer bad human good, even though the main issue here is the humans that have unlimited power and are encouraged into bad actions due to flawed social posturing systems and the conflating of wealth with competency.
rdri@lemmy.world
on 06 Aug 2024 03:53
There is no “intelligence”, ai is a pr word. Just a language model that feeds on a lot of data.
R00bot@lemmy.blahaj.zone
on 06 Aug 2024 04:18
Oh yeah we’re 100% agreed on that. I’m thinking of the AI evangelicals who will argue tooth and nail that LLMs have “emergent properties” of intelligence, and that it’s simply an issue of training data/compute power before we’ll get some digital god being. Unfortunately these people exist, and they’re depressingly common. They’ve definitely reduced in numbers since AI hype has died down though.
todd_bonzalez@lemm.ee
on 06 Aug 2024 23:41
A human lifetime worth of video is not anywhere close to equalling a human lifetime of actual corporeal existence, even in the perfect scenario where the AI is as capable as a human brain.
R00bot@lemmy.blahaj.zone
on 07 Aug 2024 00:36
Strange to equate the other senses to performance in intellectual tasks but sure. Do you think feeding data from smells, touch, taste, etc. into an AI along with the video will suddenly make it intelligent? No, it will just make it more likely to guess what something smells like. I think it’s very clear that our current approach to AI is missing something much more fundamental to thought than that, it’s not just a dataset problem.
noobdoomguy8658@feddit.org
on 06 Aug 2024 04:24
Obligatory fuck AI and the illiterate bros pushing it.
What kind of videos, though? A lot of such material is very far from being proper educational material that we show other people to really teach them much, let alone educate them well enough to be anywhere near trustworthy. This is very processed material, with years of preparation once you consider the prior education of the individuals involved in the creative process - think of the past experiences silently influencing them, their initial knowledge on the subject obtained from somewhat basic facts from school or otherwise, their misconceptions, iterations that nobody knows about, and many other things that we don’t usually directly associate with the act of working on something like a video, but that eventually do dictate a lot of the decisions and opinions put into it.
It’s one thing that the AI has no intelligence in it whatsoever, but the fact that it’s being pumped with information and “knowledge” in basically the reverse order doesn’t help it become any better.
On the other hand, the entire thing is not about making something that works well, but something that sells well. And then there’s people putting too much faith into the thing and trusting it with way more stuff than they should (which is also the case with a lot of other tech, though, admittedly).
Some things of today are so damn unexciting.
MonkderVierte@lemmy.ml
on 06 Aug 2024 08:21
Properly following licensing, right?
lemmyvore@feddit.nl
on 06 Aug 2024 09:11
No, see, because it’s “learning like a human”, and everybody knows that you’re allowed to bypass any licensing for learning. /s
But seriously I don’t know how they make the jump to these conclusions either.
areyouevenreal@lemm.ee
on 06 Aug 2024 11:07
This is a massive strawman argument. No one is saying you shouldn’t have a license to view the content in order to train an AI on it. Most of the information used to train these models is publicly available and licensed for public viewing.
Just because something is available for public viewing does not mean it’s licensed for anything except personal use.
The strawman here is that since physical people benefit from personal use exceptions in the law, machine learning software should too. But why should they? Since when is a piece of software ran by a corporation equivalent to an individual person?
VoterFrog@lemmy.world
on 06 Aug 2024 12:46
Copyright licensing allows the owner to control how a work is distributed, not how it’s consumed. “Personal use” just means that you can’t turn around and redistribute a work that you’ve obtained. Not that you’re not allowed to consume it in a corporate setting.
FunnyUsername@lemmy.world
on 06 Aug 2024 12:50
Consuming is not the same thing as training. A machine is not a consumer, it is a tool.
areyouevenreal@lemm.ee
on 06 Aug 2024 13:12
A program or machine can be a consumer of something, although if you want to be technical you could say the person using the machine is the consumer. In actual computer science we talk about programs consuming things all the time.
FunnyUsername@lemmy.world
on 06 Aug 2024 13:15
In actual computer science you talk about AI all the time as well but it’s not actually intelligent is it? It’s just SmarterChild 2.0 and literally has no idea what word it said just before its current one. Not intelligent. Words are often used inappropriately. The only thing computers can consume is data and electricity by definition, and consuming data is not the same as implementing it in a language (or visual) model that you intend to profit from. This is data theft, unless properly licensed.
areyouevenreal@lemm.ee
on 06 Aug 2024 13:48
How intelligent it is or isn’t is irrelevant. We talk about much dumber programs than AI as being consumers of files and data including things like compilers. Would it not be personal use for you to view a picture in a photo viewer or try and edit it in GIMP?
It’s not data theft at all unless the courts and law says it is. Ranting on lemmy won’t change that fact. Theft is a construct of law.
You can add clauses against use as AI training data to your licence if you wish.
FunnyUsername@lemmy.world
on 06 Aug 2024 13:52
You can try to equate humans to computers all day, and you can even pass laws that says they’re the same thing. That does not make it true. A company using software to profit off data they have not licensed (whether it’s public or not does not matter! That is not how copyright law works!) is theft.
Please try to sell DVDs of Markiplier’s publicly available YouTube content and tell people how you’re allowed to because it’s publicly available.
areyouevenreal@lemm.ee
on 06 Aug 2024 14:14
I am not equating humans with computers. These businesses are not selling people’s data when doing AI training (unlike actual data brokers). You can’t say something AI generated is a clone of the original any more than you can say parody is.
FunnyUsername@lemmy.world
on 06 Aug 2024 14:23
I absolutely can. Parody is an art form, which is something that can exclusively only be created by human beings. AI is an art laundering service. Not an artist.
The law should reflect that these companies need to be first granted permission to use datasets by the rights holders, and creative commons licenses need to be given an opportunity to opt out of being crawled for these datasets. Anything else is wrong. Machines are not humans. Creative common copyright law was not written with the concept of machines being “consumers”. These companies took advantage of the sudden emergence of these models and the delay of law in holding their hunger for data in check. They need to be held accountable for their theft.
areyouevenreal@lemm.ee
on 06 Aug 2024 15:01
There are already anti-AI licenses out there. If you didn’t license your stuff with that in mind that’s on you. Deep learning models have been around for a lot longer than GPT 3 or anything that’s happened in the current news cycle. They have needed training data for that long too. It was predictable stuff like this would happen eventually, and if you didn’t notice in time it’s because you haven’t been paying attention.
You don’t get to dictate what’s right and wrong. As far as I am concerned all copyright is wrong and dumb, but the law is what the law is. Obviously not everyone shares my opinion and not everyone shares yours.
Whether an artist is involved or not it’s still a transformative use.
areyouevenreal@lemm.ee
on 06 Aug 2024 13:50
Also the way you imply children can’t be intelligent is disgusting.
VoterFrog@lemmy.world
on 06 Aug 2024 13:58
Training literally is consuming. A copyright license doesn’t get to dictate what computer programs the work is allowed to be used with. There’s a ton of entertainment mega corps that would love for that to be the case, though.
You’re saying that you’re not allowed to do a statistical analysis on a copyrighted work. It’s nonsense. It’s well-established that copyright does not prevent that kind of use.
FunnyUsername@lemmy.world
on 06 Aug 2024 14:00
What makes you think copyright law doesn’t apply to companies using copyrighted data to sell and profit off of? That is not the case. Also, you’re putting words in my mouth. Feel free to read my other replies on this thread, as I don’t feel like repeating myself, but I think it’s clear I’m not saying computers aren’t allowed to process data. That’s absurd.
VoterFrog@lemmy.world
on 07 Aug 2024 01:02
Because that’s not what copyright is for. It exists to give the creator exclusive rights over distribution. That’s it. So unless the company is planning to distribute the work and they obtained a copy willingly and legally distributed to them, then copyright is the wrong law to lean on.
Copyright licensing allows the owner to control how a work is distributed, not how it’s consumed.
First of all, that’s incorrect.
Secondly, by default you have zero rights to someone else’s work. If something doesn’t explicitly grant you rights, you have none. If there’s a law or license, and if it’s applicable to you, you get exactly what’s specified in there.
The “personal use” or “fair use” exceptions in some places grant some basic rights but they are very narrow in scope and generally applicable only to individuals.
VoterFrog@lemmy.world
on 07 Aug 2024 00:23
I mean, it’s in the name. The right to make copies. Not to be glib, but it really is:
A copyright is a type of intellectual property that gives its owner the exclusive legal right to copy, distribute, adapt, display, and perform a creative work, usually for a limited time.
You may notice a conspicuous absence of control over how a copied work is used, short of distributing it. You can reencode it, compress it, decompress it, make a word cloud, statistically analyze its tone, anything you want as long as you’re not redistributing the work or an adaptation (which has a pretty limited meaning as well). “Personal use” and “fair use” are stipulations that weaken a copyright owner’s control over the work, not giving them new rights above and beyond copyright. And that’s a great thing. You get to do whatever you want with the things you own.
You don’t have a right to other people’s work. That’s what copyright enables. But that’s beside the point. The owner doesn’t get to say what you use a work for that they’ve distributed to you.
wizardbeard@lemmy.dbzer0.com
on 06 Aug 2024 12:49
A tangentially related but good example of this sort of thing is BluRays and community movie nights (like setting up a projector in a park).
Most of these movie nights are de facto illegal, as even though you own the BluRay, it is not licensed for public showings, just for personal use. Obviously no one gives enough of a shit to enforce this against small groups, especially if they aren’t making money off it, but if a theater started offering showings of shit the owner just bought on BluRay or UHD disks, it wouldn’t last too long.
Similar thing here. Just because you can access the content to view it yourself doesn’t mean you have the rights to do more than that with it. As an individual, you’re likely fine to break those rules. As a giant fucking corporation, it’s time for you to pay up.
technocrit@lemmy.dbzer0.com
on 07 Aug 2024 16:11
Since when is a piece of software ran by a ~~corporation~~ person equivalent to an individual person?
Gotta remember that legally a corporation IS a person.
Another great example of how the law is batshit serving capital and destroying the planet.
31337@sh.itjust.works
on 07 Aug 2024 17:06
Information wants to be free.
MonkderVierte@lemmy.ml
on 07 Aug 2024 21:58
I mean, i agree, but artists want to eat too.
riodoro1@lemmy.world
on 06 Aug 2024 09:18
Can we stop with this bullshit? Nobody will buy into it. WE DON’T WANT IT.
boyi@lemmy.sdf.org
on 06 Aug 2024 09:47
Sorry, I disagree with this kind of generalisation. To be rational, just because you don’t want it doesn’t mean everyone else is in the same boat. I am very sure there are certain people who will benefit from this and want it.
riodoro1@lemmy.world
on 06 Aug 2024 10:00
“Certain people” do not justify spending billions in money and tons of resources to create more and more of the same shit just because there is a hype for it.
Yes, I am one of those who are also getting bored of it. But this doesn’t mean that I am part of the market that they targeted. They might be targeting certain segments or even service providers such as game developers or console makers etc. The technology is still in its early stage, so it is too early to say whether they are going to fail.
sunbytes@lemmy.world
on 06 Aug 2024 12:43
It’s not for you as a consumer.
It’s to reduce your usefulness as a worker.
Which would be lovely, if our value wasn’t calculated by our usefulness to the market.
rottingleaf@lemmy.world
on 06 Aug 2024 09:27
I’ve just had a thought:
There’s a little country where the way its leadership still hasn’t been all voted out and put behind bars for life is that it constantly invents new subjects for discussion. Some outrageous, some showing them in good light, but the point is that everyone forgets the real bad things they’ve done (they are basically a collaborationist puppet government of a neighboring fascist country).
I wonder if it’s today’s world as a whole showing itself in that little country.
I’ve recently read an article seen on Lemmy, suggesting that the “AI” hype is the same. theluddite.org/#!post/ai-hype - found it. The conclusion is very important.
They are wasting enormous amounts of energy to make those "AI"s, collect training data and so on, to make oligopolized platforms and industries shittier and shittier.
But we are wasting our energy, which is much more limited, to track myriads of false targets. We are like an air defense system being saturated.
No one has ever won a war by sitting in defense. We must search for critical joints to attack.
Also no, voting for one of two candidates presented to you in some election is not that, neither is arguing for one of two sides in a discourse presented to you. There are better and worse choices there, but that’s not what attack means.
SomeGuy69@lemmy.world
on 06 Aug 2024 13:44
So they use VMs to simulate user accounts, in future this will be blocked and whatever new AI startup is there won’t have the option to do so. Competition blocked. Forever.
anon_8675309@lemmy.world
on 07 Aug 2024 01:12
threaded - newest
Humans don't live that long. That's only about 1.5 million 30 min videos, which isn't a huge amount for a whole day's worth of scraping.
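For anyone who wants to sanity-check that figure, here is a rough sketch of the arithmetic. The 80-year lifetime is an assumption on my part (the comment above doesn't say which number it used), and the function names are made up for illustration.

```python
# Back-of-the-envelope check of the "about 1.5 million 30 min videos" figure.
# ASSUMPTION: a "lifetime" of video means roughly 80 years of continuous footage.

HOURS_PER_YEAR = 365.25 * 24  # average year length, leap days included


def videos_per_lifetime(years: float, video_minutes: float = 30) -> float:
    """How many back-to-back videos of the given length fill `years` of footage."""
    total_minutes = years * HOURS_PER_YEAR * 60
    return total_minutes / video_minutes


def years_for_videos(count: float, video_minutes: float = 30) -> float:
    """Inverse: how many years of footage `count` videos add up to."""
    return count * video_minutes / 60 / HOURS_PER_YEAR


if __name__ == "__main__":
    print(f"80 years of footage: {videos_per_lifetime(80):,.0f} half-hour videos")
    print(f"1.5 million half-hour videos: {years_for_videos(1_500_000):.1f} years")
```

Under that assumption the count comes out a bit under 1.5 million, and working backwards from 1.5 million gives a lifetime in the mid-80s, so the comment's number is in the right ballpark either way.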
Yeah this is honestly an order of magnitude less than I would've thought
Maybe they’re running out
I would be lucky if I get to watch more than 10000 videos in my entire lifetime.
Bro you’re doing it with your eyes, right now!
aka 2 videos from Quinton Reviews
Something like that was a plot point in Black Mirror. In that case it was with consciousnesses.
Can relate, I watched The English Patient once.
Instead of focusing on their products and improving them for everyone, some shitty CEO is pushing their shitty AI agenda down everyone’s throat.
Well it sounds like they’re doing something to make their products better, you just disagree that it’s going to be successful.
Nvidia’s biggest product is absolutely AI by a massive landslide. I’m pretty sure I read that the point of them downloading these videos and doing the training is to build a pipeline for their AI users to do the same with their own shit. (Can’t be bothered to double-check cuz I really don’t care)
So they aren’t downloading all this video to make a crazy AI model. They’re downloading all this video to make a tool for their AI customers to use. You may not agree, but improving their product is exactly what they’re doing.
For FUCK SAKE, why do you even bother posting your garbage opinions then? and with such authority too!
¯\_(ツ)_/¯ great question
There’s only a handful of video datasets and all of it is owned by Google through YouTube or big Hollywood companies like Disney and Netflix.
These companies are foaming at the mouth with rage thinking about what generative AI will do to their industry and how much it will help the currently nonexistent indie one. They will do whatever it takes to fence in the playbox and make sure they get to be the toll man.
This was never about AI getting to live or not, but who gets to own it. 404media is essentially a mouthpiece for these corporations, willingly or not, and the strengthening of copyright laws will not help the consumers or the small-time creators. The only exceptions are laws that force copyleft licenses onto models, but that’s not what is being pushed right now, and AOC’s Deepfake act, which is well thought out imo.
Anyone should be permitted to train on YouTube and Netflix data, and Nvidia might even open source it in any case.
Nvidia does not have a strong history of open sourcing things, to say the least. That last bit sounds like pure hopium
The guy you are replying to is in all AI posts defending AIs. He is probably heavily invested in this BS or being paid for it, don’t waste your time with him.
Tbh, someone has to. Have you ever asked yourself if the intense hate AI gets and how 99% of articles are against it is organic?
There’s a handful of companies that are poised to win big if they can put up a fence around AI while making sure the public can’t run strong models. There is an intense media campaign to make sure the public thinks either AI is dangerous (so they can be the only ones legally allowed to distribute them) or that AI is theft (So they can be the only ones to afford building them).
Do not let yourself be manipulated, almost all strengthening of copyrights related to AI is completely against our interests.
And no, I’m not getting paid lol. I have a vested interest because I use generative technology for work and for fun in my free time. I’m also interested in not handing out our whole economy on a silver platter to Google and Microsoft, if I can maybe help with a couple of comments a week, I will. Why don’t you explain why I’m wrong instead of sending out baseless accusations?
Nah my man, you are either brainwashed or are being paid hahaha. Is copyright a mess? Of fucking course, I haven’t met a single person (except crazy ass libertarians funnily enough hahaha) that likes copyright. Are big corporations using copyright to exploit artists, create monopolies, and generally being dicks? Again, of fucking course.
But anyone, like you, saying that we should just let AIs destroy copyright effectively is a fucking prick, that simple. And your arguments are disingenuous at best or outright lies. For example, just as big copyright holder companies are pushing to strengthen copyright law, the big tech companies are pushing for effectively destroying copyright through AI models. I have seen you pushing in multiple threads for open source models like that’s a solution. But if you were a serious person researching about the software open source community you would see that pretty much no one there agrees with your position because it would effectively destroy the copyleft open source licenses. After all, if an “AI” model, open source or not, is allowed to just “train” on my AGPL code and spit it back (with minor modifications at best) to an engineer in AWS that’s it for my project. Amazon will do the Amazon thing and steal the project. So say goodbye to any software freedom we have.
And let’s be 100% clear here, this is not being pushed by the evil copyright holders like you seem to imply (and they are totally evil just to be clear hahahah). This is being pushed by the big tech companies and people like you spreading their propaganda. The fact that the copyright holders happen to be in the right this time is just a broken clock being right and all that, but it’s still good that they are pushing back against big tech. I do agree we have to keep an eye on them, the objective here can’t be to make copyright bigger, just to close the “loophole” that big tech companies are exploiting to steal everything.
People like you who want to destroy copyright without offering any alternatives to allow creatives to work in a market are either misinformed or just assholes. Again, of fucking course it’s not an ideal system, but going full kamikaze and just destroying any possibility for artists and creatives of making a living with their work is the most evil thing going on right now, so bad that the big copyright holders happen to fall on the less bad side this time hahaha. And all for what? So people can be lied to by dumb chatbots? Or so people can create mediocre derivative “art” without putting any effort? Or so we can get mediocre code autocomplete that is subtly wrong all the time? It’s fucking ridiculous.
An engineer at AWS can already just copy your code, make minor modifications, and use it. I would think the same legal recourse would apply if it was outputted from an LLM or just a copy-paste? This seems like a tangential issue to whether the LLM was trained on your code or not (not training on your code obviously reduces the probability of the LLM spitting it back out near-verbatim though). Personally, I don’t see anything wrong with anyone using public code to build statistical models. And I think the pay-to-scrape models that Reddit, Xitter, and others are employing will help big tech build the “moat” they’re looking for. Big tech is asking for AI regulation for similar reasons.
You are 100% wrong here my man. If an engineer does this they are creating a derivative work and they have to fulfill the conditions of the license of the code. No wonder you don’t see anything wrong here, you AI people live in a fantasy world when it comes to how copyright works hahahaha. Please stop talking about shit you know nothing about.
I stated that they can do this, and asked if they could be sued if they used near-verbatim code generated from an LLM, just like they could be sued if they copy-pasted AGPL code.
Edit: Tools like CoPilot tell you if your code is similar to publicly available code so you can avoid these issues.
Edit: Just looked up EFF’s position and I tend to agree with it:
www.eff.org/document/eff-two-pager-ai
What point are you trying to make? That the fact that someone can break the law means we should not have laws? I honestly don’t get what you are trying to say.
I’m saying using code for training is a different issue than copyright infringement. I edited my post above to better lay out my position.
And that’s the whole point of my comment, did you even read it? To summarize, there is currently a loophole in law that allows these bullshit arguments about it being different than straight up copying shit (though this hasn’t been litigated yet, so it’s not yet clear if these arguments are actually valid). This means that while a person reading my AGPL code and copying it (without following the license) is 100% illegal, doing the same through an LLM may be legal. So this means that open source licenses can be bypassed by first training an LLM with the code and then extracting the code from the LLM. This is terrible for open source, and in general for anyone who wants to make a living from creating copyrighted work. So we should close this loophole, and I’m glad there is a push to close this through better laws. Even if these laws are coming from Disney, Sony, and all those awful companies.
So again, what’s the point you are trying to make here? That we shouldn’t make these laws stronger to prevent this bullshit? I honestly don’t understand what you are trying to argue here, nothing of what you have said has anything to do with this conversation.
That we already have laws that protect copyright infringement (which seem like they would still apply if it was spit out by an LLM or not), and no more should be made. That training on public data is fine.
Any arguments to defend your position? I’m giving you a very clear example of the awful consequences of following that path. And the same applies to any creative work. You are just being dismissive without proposing any real solution. Do better man.
The EFF link I posted above provides evidence. Again, here’s a quote from part of it:
As I mentioned before, Copilot at least, helps people avoid copyright infringement by notifying you if your code is similar to public code. The solution I’m proposing is no new laws, and just enforcing the ones we have. Most of the laws being proposed look like attempts at regulatory capture to me.
That is a long-winded way to say you are a copyright defender. Your insistence on finding an alternative to a broken system so rent seekers can continue to exist is naive to say the least.
I think most people with your stance (don’t throw out the baby with the bathwater) really have no idea how broken copyright and intellectual property are.
AI companies have already proven copyright is DOA. It was never designed for the little guy. That is just propaganda you have fallen prey to.
Simply put copyright was not needed for all of human history and it is still not needed. Pretending you have a unique idea, song, painting, etc in a world of billions of humans is beyond ridiculous.
The concept was already broken from the start because everything in science and art is iterative. Giving monopoly power to rent seekers is the natural result of a broken concept.
Their Nemotron-4 340B model was released on what essentially is an open source licence (available for commercial use except if you are doing shady things like spamming and collecting biometric data).
Having a robust open source ecosystem directly benefits Nvidia since they sell more higher end consumer GPUs.
Obviously, there’s a real chance that this isn’t open sourced since it’s a video model and there’s huge money involved. Doesn’t really change the fact that having YouTube and Netflix dictate who gets to make video models and at what cost isn’t a good idea.
I feel like the amount of training data required for these AIs serves as a pretty compelling argument as to why AI is clearly nowhere near human intelligence. It shouldn’t take thousands of human lifetimes of data to train an AI if it’s truly near human-level intelligence. In fact, I think it’s an argument for them not being intelligent whatsoever. With that much training data, everything that could be asked of them should be in the training data. And yet they still fail at any task not in their data.
Put simply; a human needs less than 1 lifetime of training data to be more intelligent than AI. If it hasn’t already solved it, I don’t think throwing more training data/compute at the problem will solve this.
Humans have the advantage of billions of years of evolution.
“ai” also has the advantage of billions of years of evolution.
We’re very proficient at walking, but somehow haven’t produced a walking home or anything like that.
It’s not very linear.
Definitely not the same thing. Just because you can make use of the end result of major efforts does not somehow magically give you access to all the knowledge from those major efforts.
You can use a smart phone easily, but that doesn’t mean you magically know how to make one.
You’ve had the entire history of evolution to get the instinct you have today.
Nature Vs Nurture is a huge ongoing debate.
Just because it takes longer to train doesn’t mean it’s not intelligent, kids develop slower than chimps.
Also “intelligent” doesn’t really mean anything. I personally think intelligence is the ability to distill unusable amounts of raw data and intuit a result beneficial to one’s self. But very few people agree with me.
I see intelligence as filling areas of concept space within an econiche in a way that proves functional for actions within that space. I think we are discovering more that “nature” has little commitment, and is just optimizing preparedness for expected levels of entropy within the functional eco-niche.
Most people haven’t even started paying attention to distributed systems building shared enactive models, but they are already capable of things that should be considered groundbreaking considering the time and finances of development.
That being said, localized narrow generative models are just building large individual models of predictive processes that don’t by default actively update information.
People who attack AI for just being prediction machines really need to look into predictive processing, or learn how much we organics just guess and confabulate on top of vestigial social priors.
But no, corpos are using it so computer bad human good, even though the main issue here is the humans that have unlimited power and are encouraged into bad actions due to flawed social posturing systems and the conflating of wealth with competency.
There is no “intelligence”, ai is a pr word. Just a language model that feeds on a lot of data.
Oh yeah we’re 100% agreed on that. I’m thinking of the AI evangelicals who will argue tooth and nail that LLMs have “emergent properties” of intelligence, and that it’s simply an issue of training data/compute power before we’ll get some digital god being. Unfortunately these people exist, and they’re depressingly common. They’ve definitely reduced in numbers since AI hype has died down though.
A human lifetime worth of video is not anywhere close to equalling a human lifetime of actual corporeal existence, even in the perfect scenario where the AI is as capable as a human brain.
Strange to equate the other senses to performance in intellectual tasks but sure. Do you think feeding data from smells, touch, taste, etc. into an AI along with the video will suddenly make it intelligent? No, it will just make it more likely to guess what something smells like. I think it’s very clear that our current approach to AI is missing something much more fundamental to thought than that, it’s not just a dataset problem.
Obligatory fuck AI and the illiterate bros pushing it.
What kind of videos, though? A lot of such material is very far from being proper educational material that we show other people to really teach them much, let alone educate them well enough to be anywhere trustworthy. This is a very processed material, with years of preparation once you consider the prior education of the individuals involved in the creative process - think of the past experiences silently influencing them, their initial knowledge on the subject obtained from somewhat basic facts from school or otherwise, their misconceptions, iterations that nobody knows about, and many other things that we don’t usually directly associate with the act of working on something like a video, but that eventually do dictate a lot of the decisions and opinions put into it.
It’s one thing that the AI has no intelligence in it whatsoever, but the fact that it’s being pumped with information and “knowledge” in basically the reverse order doesn’t help it become any better.
On the other hand, the entire thing is not about making something that works well, but something that sells well. And then there’s people putting too much faith into the thing and trusting it with way more stuff than they should (which is also the case with a lot of other tech, though, admittedly).
Some things of today are so damn unexciting.
Properly following licensing, right?
No, see, because it’s “learning like a human”, and everybody knows that you’re allowed to bypass any licensing for learning. /s
But seriously I don’t know how they make the jump to these conclusions either.
This is a massive strawman argument. No one is saying you shouldn’t have a license to view the content in order to train an AI on it. Most of the information used to train these models is publicly available and licensed for public viewing.
Just because something is available for public viewing does not mean it’s licensed for anything except personal use.
The strawman here is that since physical people benefit from personal use exceptions in the law, machine learning software should too. But why should they? Since when is a piece of software run by a corporation equivalent to an individual person?
Copyright licensing allows the owner to control how a work is distributed, not how it’s consumed. “Personal use” just means that you can’t turn around and redistribute a work that you’ve obtained. Not that you’re not allowed to consume it in a corporate setting.
Consuming is not the same thing as training. A machine is not a consumer, it is a tool.
A program or machine can be a consumer of something, although if you want to be technical you could say the person using the machine is the consumer. In actual computer science we talk about programs consuming things all the time.
In actual computer science you talk about AI all the time as well but it’s not actually intelligent is it? It’s just SmarterChild 2.0 and literally has no idea what word it said just before its current one. Not intelligent. Words are often used inappropriately. The only thing computers can consume is data and electricity by definition, and consuming data is not the same as implementing it in a language (or visual) model that you intend to profit from. This is data theft, unless properly licensed.
How intelligent it is or isn’t is irrelevant. We talk about much dumber programs than AI as being consumers of files and data including things like compilers. Would it not be personal use for you to view a picture in a photo viewer or try and edit it in GIMP?
It’s not data theft at all unless the courts and law say it is. Ranting on Lemmy won’t change that fact. Theft is a construct of law.
You can add clauses against use as AI training data to your licence if you wish.
You can try to equate humans to computers all day, and you can even pass laws that say they’re the same thing. That does not make it true. A company using software to profit off data they have not licensed (whether it’s public or not does not matter! That is not how copyright law works!) is theft.
Please try to sell DVDs of Markiplier’s publicly available YouTube content and tell people how you’re allowed to because it’s publicly available.
I am not equating humans with computers. These businesses are not selling people’s data when doing AI training (unlike actual data brokers). You can’t say something AI generated is a clone of the original any more than you can say parody is.
I absolutely can. Parody is an art form, which is something that can exclusively only be created by human beings. AI is an art laundering service. Not an artist.
The law should reflect that these companies need to be first granted permission to use datasets by the rights holders, and creative commons licenses need to be given an opportunity to opt out of being crawled for these datasets. Anything else is wrong. Machines are not humans. Creative common copyright law was not written with the concept of machines being “consumers”. These companies took advantage of the sudden emergence of these models and the delay of law in holding their hunger for data in check. They need to be held accountable for their theft.
There are already anti-AI licenses out there. If you didn’t license your stuff with that in mind that’s on you. Deep learning models have been around for a lot longer than GPT 3 or anything that’s happened in the current news cycle. They have needed training data for that long too. It was predictable stuff like this would happen eventually, and if you didn’t notice in time it’s because you haven’t been paying attention.
You don’t get to dictate what’s right and wrong. As far as I am concerned all copyright is wrong and dumb, but the law is what the law is. Obviously not everyone shares my opinion and not everyone shares yours.
Whether an artist is involved or not it’s still a transformative use.
Also the way you imply children can’t be intelligent is disgusting.
en.m.wikipedia.org/wiki/SmarterChild
Training literally is consuming. A copyright license doesn’t get to dictate what computer programs the work is allowed to be used with. There’s a ton of entertainment megacorps that would love for that to be the case, though.
You’re saying that you’re not allowed to do a statistical analysis on a copyrighted work. It’s nonsense. It’s well-established that copyright does not prevent that kind of use.
What makes you think copyright law doesn’t apply to companies using copyrighted data to sell and profit off of? That is not the case. Also, you’re putting words in my mouth. Feel free to read my other replies on this thread, as I don’t feel like repeating myself, but I think it’s clear I’m not saying computers aren’t allowed to process data; that’s absurd.
.
Because that’s not what copyright is for. It exists to give the creator exclusive rights over distribution. That’s it. So unless the company is planning to distribute the work and they obtained a copy willingly and legally distributed to them, then copyright is the wrong law to lean on.
First of all, that’s incorrect.
Secondly, by default you have zero rights to someone else’s work. If something doesn’t explicitly grant you rights, you have none. If there’s a law or license, and if it’s applicable to you, you get exactly what’s specified in there.
The “personal use” or “fair use” exceptions in some places grant some basic rights but they are very narrow in scope and generally applicable only to individuals.
I mean, it’s in the name. The right to make copies. Not to be glib, but it really is
You may notice a conspicuous absence of control over how a copied work is used, short of distributing it. You can reencode it, compress it, decompress it, make a word cloud, statistically analyze its tone, anything you want as long as you’re not redistributing the work or an adaptation (which has a pretty limited meaning as well). “Personal use” and “fair use” are stipulations that weaken a copyright owner’s control over the work, not giving them new rights above and beyond copyright. And that’s a great thing. You get to do whatever you want with the things you own.
You don’t have a right to other people’s work. That’s what copyright enables. But that’s beside the point. The owner doesn’t get to say what you use a work for that they’ve distributed to you.
A tangentially related but good example of this sort of thing is BluRays and community movie nights (like setting up a projector in a park).
Most of these movie nights are de facto illegal, as even though you own the BluRay, it is not licensed for public showings, just for personal use. Obviously no one gives enough of a shit to enforce this against small groups, especially if they aren’t making money off it, but if a theater started offering showings of shit the owner just bought on BluRay or UHD disks, it wouldn’t last too long.
Similar thing here. Just because you can access the content to view it yourself doesn’t mean you have the rights to do more than that with it. As an individual, you’re likely fine to break those rules. As a giant fucking corporation, it’s time for you to pay up.
Gotta remember that legally a corporation IS a person.
Another great example of how the law is batshit serving capital and destroying the planet.
Information wants to be free.
I mean, I agree, but artists want to eat too.
Can we stop with this bullshit? Nobody will buy into it. WE DON’T WANT IT.
Sorry, I disagree with this kind of generalisation. To be rational, just because you don’t want it, it doesn’t mean everyone else is in the same boat. I am very sure there are certain people who will benefit from this and want it.
techradar.com/…/ai-a-turn-off-tech-customers-are-…
I hope they aren’t on Comcast.