Maven Imported 1.12 Million Fediverse Posts (wedistribute.org)
from hedge@beehaw.org to technology@beehaw.org on 13 Jun 2024 10:41
https://beehaw.org/post/14412103

#technology

Freeman@lemmings.world on 13 Jun 2024 11:26

They pulled DMs of two users of the same instance?! Quite concerning tbh

[deleted] on 13 Jun 2024 12:51

.

jherazob@beehaw.org on 13 Jun 2024 14:47

I recall somebody’s working on actual E2EE Mastodon DMs, but I couldn’t give you details. I guess we’ll know it’s ready when people start using it.

Peter1986C@lemmings.world on 13 Jun 2024 16:10

That would be Sup: github.com/theSupApp

By the same person who started Pixelfed.

jherazob@beehaw.org on 13 Jun 2024 17:02

How the hell does he do so much? 😄

4am@lemm.ee on 13 Jun 2024 15:30

Seems if the messages are sent in an inherently insecure fashion, all one would need to do is set up an instance that purposefully does not filter out all the things it’s supposed to be kind/competent enough to filter out, and boom it has everything.

[deleted] on 13 Jun 2024 16:33

.

kevincox@lemmy.ml on 13 Jun 2024 16:56

It’s not “inherently insecure”, at least not to that degree. (One could argue that the lack of E2EE is insecure.) If you stand up an unrelated instance, you shouldn’t be able to access private messages that don’t relate to an account on your instance. So only bugs in your instance, or in your conversation partner’s instance, can leak those messages.
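
To make that delivery rule concrete, here is a minimal sketch in Python. It is only an illustration of the idea, not Mastodon’s or Lemmy’s actual code: the function names, actor URLs, and example domains are all hypothetical. It assumes a private message is delivered only to the instances that host the sender or an addressed recipient.

```python
from urllib.parse import urlparse


def instance_of(actor_url: str) -> str:
    """Return the host part of an actor URL (hypothetical URL layout)."""
    return urlparse(actor_url).netloc


def delivery_targets(sender: str, recipients: list[str]) -> set[str]:
    """Instances that should ever see a private message:
    the sender's own instance plus each addressed recipient's instance."""
    return {instance_of(actor) for actor in [sender, *recipients]}


def may_receive(instance: str, sender: str, recipients: list[str]) -> bool:
    """True only if the given instance hosts a participant in the conversation."""
    return instance in delivery_targets(sender, recipients)


# Hypothetical conversation between users on two different instances.
alice = "https://a.example/users/alice"
bob = "https://b.example/users/bob"

targets = delivery_targets(alice, [bob])
unrelated_can_read = may_receive("c.example", alice, [bob])
```

Under this assumption, an unrelated instance such as `c.example` is never sent the message in the first place, so it has nothing to “forget to filter out”. The message is still readable by the admins of the participating instances, which is the gap E2EE would close.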

IllNess@infosec.pub on 13 Jun 2024 12:29

If we hit these AI companies with targeted lawsuits, like how Scientology got their way with the IRS, maybe then they’ll listen and stop stealing our shit.

The MPAA and RIAA have created all these laws and used our own government against us. Maybe we can use these same laws and do the same.

sfera@beehaw.org on 13 Jun 2024 17:00

I was confused for a minute, not understanding what (Apache) Maven has to do with social networks.

Pekka@feddit.nl on 14 Jun 2024 11:11

Maybe we have some bias on this topic, but I had the same thought. Maven is such a well-known tool in IT that I’m surprised they just created a social network with the same name. Until they get a bit famous, this won’t be good for SEO.

darkphotonstudio@beehaw.org on 14 Jun 2024 14:46

I wouldn’t have a problem with all this scraping if these companies had to release the models trained on this data as open source.

esaru@beehaw.org on 14 Jun 2024 16:36

That’s a great idea. Can we not apply a license to that social content that forces AI models trained on it to be open source?

renard_roux@beehaw.org on 17 Jul 2024 20:28

That’s actually pretty good. And then they’re open to getting sued when caught.

I guess it could be done on an instance basis, although I’m not sure how happy fediverse users will be if their instance has an official policy of open-sourcing (or maybe it’s public-domaining?) all their content by default.

esaru@beehaw.org on 18 Jul 2024 05:06

Well, such a license could just obligate anyone who trains an AI model on the content to open source that model. Whether the instance prohibits or allows training of AI models would be a separate condition that’s up to the instance owner, and its users can decide whether they want to contribute under that condition or not.