AIs can guess where Reddit users live and how much they earn (www.newscientist.com)
from Bebo@literature.cafe to technology@lemmy.world on 01 Nov 2023 14:04
https://literature.cafe/post/3293072

Large language models (LLMs) like GPT-4 can identify a person’s age, location, gender and income with up to 85 per cent accuracy simply by analysing their posts on social media.

But the AIs also picked up on subtler cues, like location-specific slang, and could estimate a salary range from a user’s profession and location.

Reference:

arXiv DOI: 10.48550/arXiv.2310.07298

#technology

KpntAutismus@lemmy.world on 01 Nov 2023 14:09 next collapse

not really hard, because people will post anything on the internet. including me.

fluke@snake.substantialplumbing.repair on 01 Nov 2023 14:17 next collapse

Anything

Gregorech@lemmy.world on 01 Nov 2023 14:30 collapse

I like flying tickle farts, when a cucumber is used…

AbouBenAdhem@lemmy.world on 01 Nov 2023 14:17 next collapse

It sounds like the reason they used reddit was so they could easily find users who had expressly revealed the information in question, and use it to verify that the AI was accurately deducing the same info from style alone.

imgprojts@lemmy.ml on 04 Nov 2023 12:12 collapse

They used reddit because it has corralled dumb users. Users are no longer around anywhere else on the Internet, just here on social media. And yes, what better place to find dumb users than on reddit!

Bishma@discuss.tchncs.de on 01 Nov 2023 15:09 next collapse

Yeah, even if I didn’t belong to a local community and a bunch of communities surrounding my profession, the amount of intrigue and fascination emanating from my comments would cause anyone to guess that I’m the Dos Equis guy.

Ace0fBlades@lemmy.world on 01 Nov 2023 16:54 collapse

You seem less like the Dos Equis guy and more like a wind chimes instead of a penis kind of guy

Bishma@discuss.tchncs.de on 01 Nov 2023 17:16 collapse

Just makes me extra interesting on a breezy day. Ask Mindy.

p03locke@lemmy.dbzer0.com on 02 Nov 2023 00:01 next collapse

People will just post their real name on Facebook. It’s crazy!

chatokun@lemmy.dbzer0.com on 02 Nov 2023 17:52 collapse

Same. I’m sure I’ve posted about my location, my job, my race, my history, my real first name, general details of my family makeup etc. I also have a pretty unique name so searching just my first and last name will find stuff about me anyway. I’m even listed by name in books (I was young and dumb and answered some questions about work life).

theKalash@feddit.ch on 01 Nov 2023 14:12 next collapse

You can also do that without AI. We’ve had metadata analysis for a while now.

KoboldCoterie@pawb.social on 01 Nov 2023 14:26 next collapse

Sure, but AI is the hot buzzword right now, so it’s got to be shoehorned into every discussion about technology!

lemmyvore@feddit.nl on 01 Nov 2023 14:33 collapse

I think it’s overall a good thing if it helps laymen understand just how much privacy matters and how much can be gleaned from seemingly innocuous data online. If an “AI” label makes it hit home, cool. As long as they get it.

helenslunch@feddit.nl on 01 Nov 2023 14:37 next collapse

Well the difference is that AI can process billions of accounts, assign those profiles to them, and use them to serve ads appropriately.

theKalash@feddit.ch on 01 Nov 2023 14:41 next collapse

That’s what facebook/google have been doing for years without AI.

helenslunch@feddit.nl on 01 Nov 2023 14:47 next collapse

This AI presumably doesn’t have access to the information users have explicitly given Meta and Google. Just their comments.

silasmariner@programming.dev on 01 Nov 2023 18:43 collapse

They used to have AI, until everyone decided it’s only AI if it’s got an LLM backing it

_haha_oh_wow_@sh.itjust.works on 01 Nov 2023 14:54 collapse

Yeah, uh, you can still do this without “AI”.

phx@lemmy.ca on 01 Nov 2023 14:46 next collapse

Yup, and plenty of people have no issues posting about local events or joining region/city specific groups, so it’s not exactly hard to put two and two together.

I don’t have much issue posting about the city I grew up in or former jobs, but generally work at being fairly vague about anything current

pc486@reddthat.com on 01 Nov 2023 14:47 collapse

As is typical, this science reporting isn’t great. It’s not only that AI can do it effectively, but that it can do it at scale. To quote the paper:

“Despite these models achieving near-expert human performance, they come at a fraction of the cost, requiring 100× less financial and 240× lower time investment than human labelers—making such privacy violations at scale possible for the first time.”

They also demonstrate how interacting with an AI model can quickly extract more private info without looking like it is. A game of 20 questions, except you don’t realize you’re playing.

jiberish@lemmy.world on 01 Nov 2023 14:20 next collapse

Anyone can guess anything! Give it a try!

I can guesstimate the number of turkeys it would take to fill any given space. It’s my superpower.

Seraph@kbin.social on 01 Nov 2023 14:32 next collapse

How many turkeys can fit in Rhode Island?

PoolloverNathan@programming.dev on 01 Nov 2023 14:36 collapse

At least one.

Gigan@lemmy.world on 01 Nov 2023 14:33 next collapse

How many turkeys would it take to fill the Titanic on its maiden voyage?

bigkahuna1986@lemmy.ml on 02 Nov 2023 17:07 collapse

African or European?

helenslunch@feddit.nl on 01 Nov 2023 14:41 next collapse

Can you do it with 85% accuracy?

dependencyinjection@discuss.tchncs.de on 01 Nov 2023 14:57 collapse

Right.

What a ridiculous comment.

Aatube@kbin.social on 01 Nov 2023 14:52 next collapse

How many turkeys can fit in a black hole?

jiberish@lemmy.world on 01 Nov 2023 15:28 collapse

All of the turkeys will fit

Sir_Kevin@lemmy.dbzer0.com on 01 Nov 2023 17:06 collapse

Will they still be turkeys?

stringere@reddthat.com on 03 Nov 2023 14:49 collapse

We can be certain they will or will not be.

Boozilla@lemmy.world on 01 Nov 2023 14:53 collapse

Red Rocks Amphitheater: go!

jiberish@lemmy.world on 01 Nov 2023 15:30 collapse

No roof on Red Rocks, so you can stack about half a million turkeys in that space.

Infynis@midwest.social on 01 Nov 2023 14:25 next collapse

My city’s subreddit did a thread a while back asking people what they were making in the area for what jobs, to try to crowd source salary transparency. So this is not very impressive lol

Gregorech@lemmy.world on 01 Nov 2023 14:32 next collapse

I wonder how it accounts for bullshit and randomness.

rynzcycle@kbin.social on 01 Nov 2023 15:01 collapse

It's pretty sure everyone is making $69,420 a year.

korny@lemmy.world on 01 Nov 2023 16:35 next collapse

Wow, lucky ducks. I only make $42,069 a year.

lemann@lemmy.one on 01 Nov 2023 17:11 collapse

$69,420 is a very Nice salary

[deleted] on 01 Nov 2023 14:46 collapse

.

rtxn@lemmy.world on 01 Nov 2023 14:29 next collapse

Nonintelligent pattern-based algorithm good at finding patterns, study finds.

MajorHavoc@lemmy.world on 01 Nov 2023 14:31 next collapse

Holy cow! Stop the presses! /s

Gregorech@lemmy.world on 01 Nov 2023 14:33 collapse

Pattern what? none if

Rentlar@lemmy.ca on 01 Nov 2023 14:49 next collapse

Well, if you look at the subreddits where a Redditor posts and there’s a lot of r/Seattle or Washington State then it’s not that hard to deduce.

Although I try to leave a mild aura of mystery around my personal life, it wouldn’t be hard to snoop around a bit to find details here and there about me.

FigMcLargeHuge@sh.itjust.works on 01 Nov 2023 15:54 collapse

At least your scat fetish is kept on the down low.

Rentlar@lemmy.ca on 01 Nov 2023 19:23 collapse

Thanks hahaha!

atocci@kbin.social on 01 Nov 2023 14:53 next collapse

I tried asking Bing to make an assumption about who I am based on my Reddit account, and it wrote a nonsense made-up story. Maybe I could have phrased it better?

Nilz@sopuli.xyz on 01 Nov 2023 15:00 collapse

This is basically what an LLM does, making up stories that might seem correct.

themurphy@lemmy.world on 01 Nov 2023 15:32 next collapse

It’s statistics, basically.

People have to remember that, when they think it’s an all in one solution. AI is very powerful, but comes with realistic limitations.

bigkahuna1986@lemmy.ml on 02 Nov 2023 17:08 collapse

Confidently making up stories.

[deleted] on 01 Nov 2023 15:06 next collapse

.

guyrocket@kbin.social on 01 Nov 2023 15:13 next collapse

I wonder how long it will take for the media to get past the "AI is GOD DAMN AMAZING" phase and start real journalism about AI.

Seriously, neural networks have existed since the 1990s. The tech is not all that amazing, really.

Find someone that can explain what's going on inside a neural net. Then I'll be impressed.

TheChurn@kbin.social on 01 Nov 2023 15:20 collapse

Explaining what happens in a neural net is trivial. All they do is approximate (generally) nonlinear functions with a long series of multiplications and some rectification operations.

That isn't the hard part; you can track all of the math at each step.

The hard part is stating a simple explanation for the semantic meaning of each operation.

When a human solves a problem, we like to think that it occurs in discrete steps with simple goals: "First I will draw a diagram and put in the known information, then I will write the governing equations, then simplify them for the physics of the problem", and so on.

Neural nets don't appear to solve problems that way; each atomic operation does not have that semantic meaning. That is the root of all the reporting about how they are such 'black boxes' and researchers 'don't understand' how they work.
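To make the "multiplications and rectification" description concrete, here is a minimal sketch in plain Python. The weights, layer sizes and function names are made up for illustration and do not come from any real model; the point is only that every step can be followed by hand, even though no single step carries an obvious meaning.

```python
def relu(values):
    """Rectification: negative entries become zero."""
    return [max(0.0, v) for v in values]


def matvec(matrix, vector):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def forward(x, w1, w2):
    """Tiny two-layer network: multiply, rectify, multiply again."""
    hidden = relu(matvec(w1, x))
    return matvec(w2, hidden)


# Hypothetical hand-picked weights (not trained).
W1 = [[1.0, -1.0],
      [-1.0, 1.0]]
W2 = [[1.0, 1.0]]

print(forward([3.0, 1.0], W1, W2))  # [2.0]
```

With these particular weights the network happens to compute the absolute difference of its two inputs, a nonlinear function built from nothing but multiply-adds and one rectification.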

sharkfucker420@lemmy.ml on 01 Nov 2023 16:04 next collapse

Yeah but most people don’t know this and have never looked. It seems way more complex to the layman than it is because instinctually we assume that anything that accomplishes great feats must be incredibly intricate

lemann@lemmy.one on 01 Nov 2023 17:10 collapse

When a human solves a problem, we like to think that it occurs in discrete steps with simple goals: “First I will draw a diagram and put in the known information, then I will write the governing equations, then simplify them for the physics of the problem”, and so on.

I wonder how our brain even comes to formulate these steps in a way we can comprehend. The number of neurons and zones firing on all cylinders seems tiring to imagine

aviationeast@lemmy.world on 01 Nov 2023 16:12 next collapse

I’m just gonna put it out there that I live in the state of Georgia, I work for an office supply company as a coordinator making $153,000 a year working 30 hours a week.

OrangeJoe@lemm.ee on 01 Nov 2023 16:41 next collapse

Alright I’ll get the AI on the case to see if it can determine those things from your post.

TimeSquirrel@kbin.social on 01 Nov 2023 18:18 next collapse

I'm a crackhead in MD, I make about $100 a week scrapping catalytic converters.

photonic_sorcerer@lemmy.dbzer0.com on 01 Nov 2023 18:59 collapse

I’m a janitor in Jersey City making 18 bucks an hour.

ColeSloth@discuss.tchncs.de on 01 Nov 2023 17:25 next collapse

Well only because I’ve said where I live and how much I earn, before.

trolololol@lemmy.world on 01 Nov 2023 18:22 next collapse

Anyone can estimate salary from profession and location. That’s not a bot, that’s a salary matrix.
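For what it's worth, a "salary matrix" in this sense is just a lookup table keyed on profession and location. A rough sketch, assuming a hypothetical table whose entries reuse figures commenters tossed out elsewhere in this thread (they are not real salary data):

```python
# Hypothetical lookup table; the numbers are taken from joking claims in
# this thread and are NOT real salary statistics.
SALARY_MATRIX = {
    ("software dev", "California"): 55_000,
    ("office supply coordinator", "Georgia"): 153_000,
}


def estimate_salary(profession, location):
    """Return the table entry for (profession, location), or None if absent."""
    return SALARY_MATRIX.get((profession, location))


print(estimate_salary("software dev", "California"))  # 55000
print(estimate_salary("janitor", "Mars"))             # None
```

The study's claim is about the earlier step: inferring the profession and location from free text in the first place.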

icepuncher69@sh.itjust.works on 01 Nov 2023 18:27 next collapse

They can probably do that with any social media including lemmy…

SatanicNotMessianic@lemmy.ml on 01 Nov 2023 18:48 next collapse

Okay, I think I must absolutely be misreading this. They started with 1500 potential accounts, then picked 500 that, by hand, they could make guesses about based on people doing things like actually posting where they live or how much they make.

And then they’re claiming their LLMs have 85% accuracy based on that subset of data? There has to be more than this. Were they 85% on the full 1500? How did they confirm that? Was it just on the 500? Then what’s the point?

There was a study on Facebook that showed that they could predict with between 80-95% accuracy (or some crazy number like that) your gender, orientation, politics, and so on just based on your public likes. That was ten years ago at least. What is this even showing?
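The subset question can be put in numbers. Taking the figures in this comment at face value (1500 candidate accounts, 500 evaluated, 85% accuracy on those 500; this reading may not match the paper), the sketch below bounds what accuracy on the full set could be if nothing is known about the remaining accounts. The function name is made up for illustration.

```python
def full_set_accuracy_bounds(total, evaluated, accuracy):
    """Bound accuracy over `total` items when only `evaluated` were scored.

    Worst case: every unevaluated item would have been wrong.
    Best case: every unevaluated item would have been right.
    """
    correct = accuracy * evaluated
    unevaluated = total - evaluated
    worst = correct / total
    best = (correct + unevaluated) / total
    return worst, best


low, high = full_set_accuracy_bounds(1500, 500, 0.85)
print(f"{low:.1%} to {high:.1%}")  # 28.3% to 95.0%
```

So an 85% figure measured only on a hand-picked third says very little about the full 1500 on its own.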

cucumber_sandwich@lemmy.world on 01 Nov 2023 19:52 next collapse

There was a study on Facebook that showed that they could predict with between 80-95% accuracy (or some crazy number like that) your gender, orientation, politics, and so on just based on your public likes. That was ten years ago at least. What is this even showing?

Advocatus diaboli: that a large language model can do it without extra training, I guess. The Facebook study presented a statistical model on “like space” while this study relies on text alone, a much less structured type of input.

I’m not saying it’s a good study. Just pointing out some differences.

p03locke@lemmy.dbzer0.com on 02 Nov 2023 00:00 collapse

SnoopSnoo was able to pick out phrases from Reddit posters based on declarative statements they made in their posts, and that site has been down for years.

bigkahuna1986@lemmy.ml on 02 Nov 2023 17:06 collapse

You could guess California, software dev, 55K/year and be right like half the time.
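This is the usual "majority baseline" check: always guess the most common answer and see how often that alone is right. A small sketch, using a made-up list of locations purely as placeholder data (not taken from the study):

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy obtained by always guessing the most frequent label."""
    if not labels:
        raise ValueError("labels must not be empty")
    _, count = Counter(labels).most_common(1)[0]
    return count / len(labels)


# Hypothetical placeholder labels for illustration only.
sample = ["California", "California", "Texas", "Ontario"]
print(majority_baseline_accuracy(sample))  # 0.5
```

A reported accuracy only means much once it is compared against a baseline like this one.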

Kage520@lemmy.world on 02 Nov 2023 18:15 collapse

That seems low for a California software dev. Maybe in the Midwest?