Forget security – Google's reCAPTCHA v2 is exploiting users for profit | Web puzzles don't protect against bots, but humans have spent 819 million unpaid hours solving them (www.theregister.com)
from ForgottenFlux@lemmy.world to technology@lemmy.world on 24 Jul 2024 14:43
https://lemmy.world/post/17909715

Research Findings:

“The conclusion can be extended that the true purpose of reCAPTCHA v2 is a free image-labeling labor and tracking cookie farm for advertising and data profit masquerading as a security service,” the paper declares.

In a statement provided to The Register after this story was filed, a Google spokesperson said: “reCAPTCHA user data is not used for any other purpose than to improve the reCAPTCHA service, which the terms of service make clear. Further, a majority of our user base have moved to reCAPTCHA v3, which improves fraud detection with invisible scoring. Even if a site were still on the previous generation of the product, reCAPTCHA v2 visual challenge images are all pre-labeled and user input plays no role in image labeling.”

#technology

someguy3@lemmy.world on 24 Jul 2024 15:15 next collapse

I kinda figured. It was annoying to do one, but then they wanted you to do two or three and that’s absurd. Whenever it comes up now, I usually just close out.

Bezier@suppo.fi on 24 Jul 2024 15:36 next collapse

they wanted you to do two or three and that’s absurd

Yea how about 20

radivojevic@discuss.online on 24 Jul 2024 15:47 next collapse

That’s because you’re shady.

Bezier@suppo.fi on 24 Jul 2024 16:10 next collapse

They knew I was committing crimes with my adblocker.

msage@programming.dev on 24 Jul 2024 18:41 next collapse

The worst kind - crimes against profit!

radivojevic@discuss.online on 25 Jul 2024 00:08 collapse

Elon Musk wants to know what the government is going to do about you not viewing ads on Xitter

KingThrillgore@lemmy.ml on 25 Jul 2024 01:57 collapse

Not going to his shithole website.

SpaceMan9000@lemmy.world on 24 Jul 2024 16:16 collapse

Had this when at uni, mostly due to the amount of requests coming from a single IP

[deleted] on 24 Jul 2024 15:58 next collapse

.

Dudewitbow@lemmy.zip on 24 Jul 2024 16:08 next collapse

if you have to do that many, you either have some privacy setting on or are on a flagged IP assigned by a VPN

snooggums@midwest.social on 24 Jul 2024 16:20 next collapse

Or Google knows you will put up with it and wants the most interaction it can get from you.

crank0271@lemmy.world on 24 Jul 2024 17:20 collapse

Google’s just lonely 🥺👉👈

iiGxC@slrpnk.net on 24 Jul 2024 16:36 next collapse

Yeah exactly

Landsharkgun@midwest.social on 24 Jul 2024 17:19 collapse

Well yah of course I do. Why the hell is that ‘abnormal’?

Dudewitbow@lemmy.zip on 24 Jul 2024 19:45 next collapse

It’s abnormal to them because VPNs are also often used by bad actors. Your use is not abnormal, but other people misusing VPNs make it worse for everyone else.

Landsharkgun@midwest.social on 26 Jul 2024 00:52 collapse

Wow, way to blame individuals who take basic precautions instead of the corporations who are blatantly invading your privacy. Good job making the world a better place, bud.

Dudewitbow@lemmy.zip on 26 Jul 2024 03:02 collapse

Point to where I blame the individuals. The blame is clearly on the bad actors (e.g. bots).

catloaf@lemm.ee on 24 Jul 2024 22:43 collapse

Most people don’t, most bots do. You look more like a bot, so you get extra challenges.

Kusimulkku@lemm.ee on 24 Jul 2024 16:28 next collapse

STOP BEING SNEAKY MICHAEL

ArmoredThirteen@lemmy.ml on 24 Jul 2024 19:01 next collapse

Cries in battlenet sign up process

yum@lemmy.eco.br on 24 Jul 2024 20:27 collapse

The one reason I tried to create an account and never came back

LucidNightmare@lemm.ee on 24 Jul 2024 19:20 next collapse

VPN? Google will just go in a loop with these things, so I just stopped using Google completely.

Bezier@suppo.fi on 24 Jul 2024 19:30 next collapse

No. But it’s also not like I get 20 constantly, it was just the worst I’ve seen. Usually it’s 2 to 5, I think.

I assume they’re just collecting data on how many captchas users are willing to do.

LucidNightmare@lemm.ee on 24 Jul 2024 20:58 collapse

One time I did five in a row, because I use VPNs for everything, and realized after the fifth that it would have been easier to just use Bing, so I do that first now. Google has turned into my last, last resort, which is quite funny, because that’s where Bing used to be. Lmao

ICastFist@programming.dev on 24 Jul 2024 20:01 collapse

Whenever I’m on a private window the captchas just keep on coming. Trying to reset your Steam password via the program will also trigger an infinite loop of captchas, you HAVE to use a browser.

sramder@lemmy.world on 24 Jul 2024 20:21 collapse

I tried to order some components on Digikey a few months ago and I’m still mentally scarred. Probably did a few hundred of those things over the course of 2 weeks.

Fisch@discuss.tchncs.de on 24 Jul 2024 16:20 next collapse

Some captchas have also just become obvious AI training. “Click on the living being in this image”, “Select every image of the same object as in this example image”. And the images you have to select look obviously AI generated.

cm0002@lemmy.world on 24 Jul 2024 17:29 next collapse

Heh, I got one just the other day “Select the images containing structures built by people” lmao

SkaveRat@discuss.tchncs.de on 24 Jul 2024 18:41 collapse

“click on all people not helping with the robot uprising”

WildPalmTree@lemmy.world on 25 Jul 2024 11:29 collapse

Alas, I have but one up-vote. :~(

aaaaace@lemmy.blahaj.zone on 24 Jul 2024 20:18 collapse

Those, one answers incorrectly.

CosmoNova@lemmy.world on 24 Jul 2024 16:29 next collapse

Funny thing is they stop asking if you do them really slowly. Almost as if to tell you, you’re too inefficient to even be an unpaid intern or something. Anyway, if they annoy you, take your time.

dinckelman@lemmy.world on 24 Jul 2024 16:34 next collapse

At a certain point I did like 10 of them, and then ended up closing the page, cause it never let me in, all because I was on a vpn

unexposedhazard@discuss.tchncs.de on 24 Jul 2024 20:16 collapse

I’m surprised that this is in the news right now. This has been acknowledged as fact for a decade or so.

GhostTheToast@lemmy.world on 24 Jul 2024 23:16 collapse

Relevant xkcd: 1053

unexposedhazard@discuss.tchncs.de on 25 Jul 2024 05:16 next collapse

Lots of lucky ones i guess

Petter1@lemm.ee on 25 Jul 2024 07:04 collapse

I still don’t get this one even after being linked to it so many times 😌🤣

Tja@programming.dev on 25 Jul 2024 10:04 next collapse

Someday you will, and you’ll be one of the lucky 10.000 that day.

Petter1@lemm.ee on 25 Jul 2024 10:23 collapse

😆👌🏻

Croquette@sh.itjust.works on 25 Jul 2024 10:48 collapse

Things that are common knowledge for you are not common knowledge for everyone, and vice versa.

Instead of making fun of people for not knowing things, you should take the opportunity to teach so that you can get these fun moments of discovery and learning.

Petter1@lemm.ee on 25 Jul 2024 12:10 collapse

😮 I made fun of people that did not know something?

Croquette@sh.itjust.works on 26 Jul 2024 01:12 collapse

No, I explained what the comic is trying to convey.

Just answering your question.

Petter1@lemm.ee on 26 Jul 2024 05:03 collapse

❤️

Churbleyimyam@lemm.ee on 24 Jul 2024 15:17 next collapse

Getting served a captcha often results in me closing the tab. I’m not doing stupid puzzles for you.

NOT_RICK@lemmy.world on 24 Jul 2024 15:29 next collapse

Do them wrong and then close out

tyler@programming.dev on 24 Jul 2024 15:42 next collapse

It knows they’re wrong which is why I don’t really think this article is accurate. Is it training if it already has the answers? Probably not.

voxthefox@lemmy.world on 24 Jul 2024 16:01 next collapse

It’s why they ask you to do multiple, 1-2 of them are the control group, they are training on the others

tyler@programming.dev on 24 Jul 2024 16:38 collapse

You’re implying they give you multiple. I hardly ever get multiple, pretty much only if I ‘fail’ the first one.

Miaou@jlai.lu on 24 Jul 2024 19:32 collapse

If they have a good fingerprint on you they don’t need the control group. That’s why you get 5+ captchas when using a VPN/tor.

MajinBlayze@lemmy.world on 24 Jul 2024 16:04 next collapse

That’s why it gives you a panel of 9 images. It would have a high confidence on some images, and a low confidence on others. When you pick the correct images and don’t pick incorrect ones it uses the ones it’s confident about as “validation” while taking the feedback on low confidence images to update the training data.

What this does mean in practice is that the only ones actually being “graded” are the ones bots can solve anyway.
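The grading scheme described above can be sketched roughly as follows. This is only an illustration of the commenter's theory, not Google's actual implementation: the `Tile` class, the `prior` confidence field, and the `HIGH`/`LOW` cut-offs are all hypothetical names and values.

```python
from dataclasses import dataclass


@dataclass
class Tile:
    image_id: str
    prior: float  # hypothetical model confidence (0..1) that the tile shows the target


HIGH, LOW = 0.9, 0.1  # assumed cut-offs for tiles the model is "confident" about


def grade_panel(tiles, selected):
    """Grade one panel.

    Returns (passed, feedback):
      passed   -- True if the user agrees with every confident tile
      feedback -- {image_id: picked} votes for the uncertain tiles only
    """
    passed = True
    feedback = {}
    for tile in tiles:
        picked = tile.image_id in selected
        if tile.prior >= HIGH:      # confident positive: must be picked
            passed = passed and picked
        elif tile.prior <= LOW:     # confident negative: must not be picked
            passed = passed and not picked
        else:                       # uncertain: never graded, only recorded
            feedback[tile.image_id] = picked
    # Votes from a failed attempt are discarded.
    return passed, (feedback if passed else {})
```

Under these assumptions, whatever the user does with an uncertain tile cannot fail the challenge, which matches the observation that only the easy tiles are actually graded.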

SkaveRat@discuss.tchncs.de on 24 Jul 2024 18:43 next collapse

and it will show the images to multiple people

Petter1@lemm.ee on 25 Jul 2024 07:11 collapse

It seems exactly like that. I experimented with it by leaving the one I think it has low confidence in unchecked, and it often worked.

Rolando@lemmy.world on 24 Jul 2024 16:09 next collapse

If they gave two captchas, one for which they knew the answer and one for which they didn’t, they could use the second for training. (Even if you’re paying someone, you want to do that sort of thing when crowdsourcing data, because you never know if the paid person is just screwing around.)

AmidFuror@fedia.io on 24 Jul 2024 16:27 collapse

My understanding is different from others here. I thought they served the same Captcha to many people at once and use the majority response to decide who is answering correctly.

catloaf@lemm.ee on 24 Jul 2024 22:44 collapse

That’s true, or at least it used to be back when they were using it for OCR. I have no reason to believe it’s changed.

hddsx@lemmy.ca on 24 Jul 2024 15:59 collapse

I do it right and it says I’m wrong =\

Gormadt@lemmy.blahaj.zone on 24 Jul 2024 16:09 collapse

I have bad news for you friend…

You might be a robot

hddsx@lemmy.ca on 24 Jul 2024 17:16 collapse

What do you mean? I am a fleshy human and do fleshy human things like being made of flesh.

Petter1@lemm.ee on 25 Jul 2024 07:08 next collapse

Ever heard of bio-robots?

xavier666@lemm.ee on 25 Jul 2024 07:29 collapse

Time to take a knife and check for sure

Seriously /s Don’t harm yourself!

AlolanYoda@mander.xyz on 25 Jul 2024 11:07 next collapse

Harm yourself?

Take the knife and harm the people responsible for this travesty. The laws of robotics prevent robots from harming humans: if you manage to harm them, then that means either you’re human or they’re not!

hddsx@lemmy.ca on 25 Jul 2024 11:39 collapse

I disassembled my tail using a knife and it reassembled itself. Based on new data, my name is Rafael Cruz.

snooggums@midwest.social on 24 Jul 2024 16:22 collapse

I haven’t done an image one in years for the same reason.

My general internet usage has plummeted between ads and captchas and all the other modern website bullshit, which is why I am here so much.

polonius-rex@kbin.run on 24 Jul 2024 15:37 next collapse

Google should bear the cost of detecting bots, rather than shifting it to users

how?

radivojevic@discuss.online on 24 Jul 2024 15:46 next collapse

Yeah. Written by someone who doesn’t really understand the internet.

siph@lemmy.world on 24 Jul 2024 15:56 collapse

Considering the article states that reCAPTCHA v2 and v3 can be broken/bypassed by bots 70-100% of the time, they are obviously not the solution.

Chozo@fedia.io on 24 Jul 2024 15:59 next collapse

Then what is?

siph@lemmy.world on 24 Jul 2024 16:14 collapse

Maybe a billion dollar company has the budget to come up with something?

Looking at the numbers in this post, reCAPTCHA exists to make Google money, not to keep bots out.

I’d rather have no reCAPTCHA than the current state.

OsrsNeedsF2P@lemmy.ml on 24 Jul 2024 16:52 collapse

Hi it’s me. I work for a billion dollar company with a budget. We have no ethical ideas on how to stop bots. Thanks for coming to my tech talk.

Anti_Iridium@lemmy.world on 24 Jul 2024 17:00 next collapse

Something something free market?

siph@lemmy.world on 24 Jul 2024 18:04 collapse

Yeah, that’s about the way I’d expect it to go.

“Traffic resulting from reCAPTCHA consumed 134 petabytes of bandwidth, which translates into about 7.5 million kWhs of energy, corresponding to 7.5 million pounds of CO2. In addition, Google has potentially profited $888 billion from cookies [created by reCAPTCHA sessions] and $8.75–32.3 billion per each sale of their total labeled data set.”

There might be a tiny chance they’re not interested in changing things.
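For a sense of scale, the conversion factors implied by the quoted figures can be backed out with a little arithmetic. The sketch below assumes a decimal petabyte (1 PB = 1,000,000 GB); the quote does not say which conversion factors the paper actually used.

```python
# Figures taken from the quoted passage.
TRAFFIC_PB = 134
ENERGY_KWH = 7.5e6
CO2_LB = 7.5e6

GB_PER_PB = 1_000_000  # assumption: decimal petabytes

traffic_gb = TRAFFIC_PB * GB_PER_PB

# Implied energy intensity of the traffic and carbon intensity of the energy.
kwh_per_gb = ENERGY_KWH / traffic_gb      # about 0.056 kWh per GB
lb_co2_per_kwh = CO2_LB / ENERGY_KWH      # exactly 1 lb of CO2 per kWh

print(f"{kwh_per_gb:.3f} kWh/GB, {lb_co2_per_kwh:.1f} lb CO2/kWh")
```

So the quoted numbers amount to roughly 0.056 kWh per gigabyte transferred and one pound of CO2 per kWh.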

polonius-rex@kbin.run on 24 Jul 2024 16:15 next collapse

how do you get the metric of 70-100% of the time?

the best bots doing it 70-100% of the time is very different to the kind of bot your average spammer will have access to

siph@lemmy.world on 24 Jul 2024 18:06 collapse

Did you read the article or the TL;DR in the post body?

The paper, released in November 2023, notes that even back in 2016 researchers were able to defeat reCAPTCHA v2 image challenges 70 percent of the time. The reCAPTCHA v2 checkbox challenge is even more vulnerable – the researchers claim it can be defeated 100 percent of the time.

reCAPTCHA v3 has fared no better. In 2019, researchers devised a reinforcement learning attack that breaks reCAPTCHAv3’s behavior-based challenges 97 percent of the time.

So yeah, while these are research numbers, it wouldn’t be surprising if many larger bots have access to ways around that - especially since those numbers are from 2016 and 2019 respectively. Surely it is even easier nowadays.

polonius-rex@kbin.run on 24 Jul 2024 18:16 collapse

researchers were able to defeat reCAPTCHA v2 image challenges 70 percent of the time

that doesn't answer the question?

researchers devised a reinforcement learning attack that breaks reCAPTCHAv3's behavior-based challenges 97 percent of the time

i'd argue "bespoke system, deployed in a very limited context, built by researchers at the top of their field" is kind of out of reach for most people? and any bot network scaled up automatically becomes easier to detect the further you scale it

 

the cost of just paying humans to break these is already at or below pennies per challenge

conciselyverbose@sh.itjust.works on 24 Jul 2024 18:28 next collapse

At what cost?

100% success rate isn’t even moderately useful if it costs $5 per pass. The discussion is completely pointless without a concrete, documented analysis of the actual hardware and energy costs involved.

radivojevic@discuss.online on 25 Jul 2024 00:11 collapse

“Google should bear the cost”

Google should shut it down and make sites roll their own verification. Give everyone a month to implement a new solution on millions of websites.

AeroLemming@lemm.ee on 25 Jul 2024 13:50 collapse

This is unironically the answer. You can’t make a general-purpose captcha solver AI if every website or group of websites uses a completely different kind of captcha.

radivojevic@discuss.online on 25 Jul 2024 18:08 collapse

I’m actually 100% for rolling your own… almost everything.

20 years ago I made an e-commerce website for a client. Looking at the code now I’m embarrassed how insecure it is. However, because it was totally custom, no one ever found the bugs and it has never been cracked. (Knock on wood.) That’s the benefit of not using a prebuilt solution, which would be a target for mass exploits.

[deleted] on 24 Jul 2024 16:00 collapse

.

radivojevic@discuss.online on 24 Jul 2024 15:45 next collapse

This is bullshit. Author is literally insane.

Mubelotix@jlai.lu on 24 Jul 2024 16:04 next collapse

I bypassed 35000 google recaptcha v2 using bots. Don’t ever rely on this for security

theherk@lemmy.world on 24 Jul 2024 16:25 next collapse

It is neither intended nor even stated to be intended for security.

Gizmokid2005@lemmy.world on 24 Jul 2024 18:49 collapse

Except, that’s most of its ad copy on Google’s own website?

reCAPTCHA uses an advanced risk analysis engine and adaptive challenges to keep malicious software from engaging in abusive activities on your website. Meanwhile, legitimate users will be able to login, make purchases, view pages, or create accounts and fake users will be blocked.

It’s literally billed as a security measure for a website.

www.google.com/recaptcha/about/

theherk@lemmy.world on 24 Jul 2024 19:47 collapse

I see your perspective, but I don’t consider that security in the context of software, which may also explain why they don’t use that word, though I readily admit that it is technically security of a sort. The term usually implies authentication, authorization, and isolation.

Gizmokid2005@lemmy.world on 24 Jul 2024 20:33 collapse

I mean, except they do. Just because their simple ad copy omits it, doesn’t mean that’s not what they’re implying. It’s literally listed as one of their security products and also uses the term to talk about demos

[Image: Security live demo] https://lemmy.world/pictrs/image/4abd49d0-5cd3-49c6-b169-9083d7c7672d.jpeg

cloud.google.com/security/products/recaptcha

theherk@lemmy.world on 24 Jul 2024 21:06 collapse

I’m sorry I wasn’t more agreeable. You’re absolutely correct. I take it back.

Caboose12000@lemmy.world on 25 Jul 2024 01:12 collapse

Where can I learn this power?

Mubelotix@jlai.lu on 25 Jul 2024 06:21 collapse

I just spent $3 worth of bitcoin on NoCaptchaAI. I ran their web extension on a server with a browser open, controlled by a custom web extension I made, so that a solved challenge would be returned to a swarm of clients upon request

gregor@gregtech.eu on 24 Aug 15:43 collapse

Your extension is archived, I’d rather not use it.

Mubelotix@jlai.lu on 24 Aug 17:46 collapse

It’s a custom extension solving my very specific problem on a specific internal website. It was never meant for you to use it, it’s just there to serve as inspiration to others

cygnus@lemmy.ca on 24 Jul 2024 16:10 next collapse

Gonna have to disagree hard with this, based on extensive first-hand experience (web dev). I’ve added CAPTCHA to dozens (hundreds?) of web forms, and it all but eliminates spam.

OsrsNeedsF2P@lemmy.ml on 24 Jul 2024 16:57 next collapse

It works against basic bots, but if you’ve got a dedicated adversary, it doesn’t do anything

(Granted, most people do not have dedicated adversaries, but when they come, you’re in trouble)

cygnus@lemmy.ca on 24 Jul 2024 17:07 collapse

OK, sure, but that’s like saying it’s pointless to use a secure password online because the NSA could hack you if they wanted to.

rbits@lemm.ee on 25 Jul 2024 05:11 next collapse

Right, so similar to locks? Usually can be easily bypassed if you know how, but it at least filters out the people who aren’t determined enough to put in the effort.

cygnus@lemmy.ca on 25 Jul 2024 12:40 collapse

Basically, yeah. The vast majority of spambots are simple and lazy.

vastard@lemmynsfw.com on 25 Jul 2024 12:15 next collapse

My experience matches yours. I don’t enjoy putting reCAPTCHA v3 on my sites, but it takes contact form spam from 70-80 messages per day to 0-2.

I’d switch to other services if they could be as effective. If anybody has real-world experience with another option working I’d love to hear it.

red_pigeon@lemm.ee on 25 Jul 2024 19:09 collapse

Honestly at first read, the paper feels like a bunch of whining text to prove a point the author believes in without any alternate proposal.

wreckedcarzz@lemmy.world on 24 Jul 2024 16:16 next collapse

I thought this was old news 20 years ago?

snooggums@midwest.social on 24 Jul 2024 16:19 next collapse

“The conclusion can be extended that the true purpose of reCAPTCHA v2 is a free image-labeling labor and tracking cookie farm for advertising and data profit masquerading as a security service,” the paper declares.

I thought this was known since it came out. It seemed even more obvious when the images leaned in heavily to traffic related pictures like stoplights.

daniskarma@lemmy.dbzer0.com on 24 Jul 2024 16:44 next collapse

I don’t really get where this article is going. They are all over the place.

Let’s start with a fuck Google. They are an evil company. But:

  • Other captchas are also not very effective against bots. Arguably most traditional systems would be worse than reCAPTCHA at fighting bots.

  • reCAPTCHA agent validation, while a privacy violation, is faster than solving any other captcha, and if you are hit with the puzzle it is not that much more time consuming than every other captcha.

  • That profit number is very questionable and they know it. Anyway, that’s not much different from, and probably less profitable than, most Google services.

It’s also ridiculous how someone can say in the same article that the image puzzle can be solved by bots 100% of the time and that it is a scheme to get human labor to solve the puzzle. Am I the only one seeing the logical failure here?

And what’s the purpose of all this? Just let bots roam free? Are they trying to sell other solution? What’s the point?

I hate Google as much as the next guy. But I don’t really share this article’s spirit.

If I were to make a point, the point would be that people and companies should stop making registration-only sites and dynamic sites when static websites are enough for their purposes, and only go for registration or other bot-vulnerable kinds of sites if there is no way around it. But if you need to make a service that is vulnerable to bots, you need to protect it, and sadly there are no great solutions out there. If your site is small and not specifically targeted by anyone malicious, you can get by with simpler solutions. But bigger or targeted sites really can’t get around needing Google or Cloudflare, and have to assume that it will only mitigate the damage.

But if anyone knows a better and more ethical solution to prevent bot spam for a service that really need to have registrations, please tell me.

conciselyverbose@sh.itjust.works on 24 Jul 2024 18:21 next collapse

Also worth noting that Google has always been extremely open about the fact that they use recaptcha for that purpose. It’s never been a secret.

Their service to the website owners is a meaningful reduction in the effectiveness of bots in places where bots are harmful. The website’s service to you is the content that reCAPTCHA is being used to protect (and the stuff that has reCAPTCHA on it is stuff like games where there’s a competitive advantage, things like search engines where there’s a meaningful cost to heavy bot use, and login pages where there’s a real security cost to mass bot use). I use a VPN, which increases the rate of captchas a lot, and I think it’s a pretty reasonable way to do things, personally.

MonkderVierte@lemmy.ml on 24 Jul 2024 19:44 collapse

It’s also ridiculous how someone can say in the same article that the image puzzle can be solved by bots 100% of the time and that it is a scheme to get human labor to solve the puzzle. Am I the only one seeing the logical failure here?

Most solvers aren’t bots. Logical, right?

interdimensionalmeme@lemmy.ml on 24 Jul 2024 18:27 next collapse

When they slow fade in the picture, I add one more software engineer to my kill list.

Appoxo@lemmy.dbzer0.com on 25 Jul 2024 11:35 next collapse

In case you didn’t know: This is already a thing, with pictures slowly fading in for selecting stuff like traffic cones or buses.

pentagrammar@programming.dev on 25 Jul 2024 16:39 collapse

I’m sure they intentionally made it so people get frustrated and leave instead.

MonkderVierte@lemmy.ml on 24 Jul 2024 19:39 next collapse

Does this work?

addons.mozilla.org/de/firefox/addon/noptcha/

ICastFist@programming.dev on 24 Jul 2024 20:00 next collapse

Judging from the reviews, it doesn’t

MonkderVierte@lemmy.ml on 24 Jul 2024 20:02 collapse

Ah, right, there are reviews too.

ohmyiv@lemmy.world on 24 Jul 2024 20:46 collapse

I tried it before. It worked for me on one small game website for account creation. After that it was more or less useless on any other site. It has a weird focus thing where it’ll try to solve the captcha before you can enter in login details so if by chance the extension works, you’ll fail the login anyways.

It still needs work. I think if the dev can work out those issues it could be great. Until then, it’s pretty much worthless.

aaaaace@lemmy.blahaj.zone on 24 Jul 2024 20:20 next collapse

Try the headphone option.

brbposting@sh.itjust.works on 24 Jul 2024 20:50 collapse

Finally heard a clear audio CAPTCHA for the first time in my life this past month. It was glorious. There was slight garbling before and after the characters were read, but that’s it.

Besides that singular experience, all audio CAPTCHAs have been utterly 100% impossible to interpret. Blaring white noise followed by a small squeak of “threeve” or “eleventeen”.

IronKrill@lemmy.ca on 25 Jul 2024 14:03 next collapse

I’ve found them to be pretty clear usually. Half-formed words at start/end I just ignore. Either way, even on Firefox with uBlock and all the rest, audio captchas have always passed me first try even if I think I got it wrong. I don’t like posting about it in case they tighten it up after it gets more users.

aaaaace@lemmy.blahaj.zone on 25 Jul 2024 19:03 collapse

My answer to this is give one word only.

TheObviousSolution@lemm.ee on 24 Jul 2024 23:00 next collapse

Sometimes I think writers just try to find things to be edgy about. The straws this grasps at are incredible. Might as well complain about the billions of unpaid man-hours people provide by offering common courtesy for free.

KingThrillgore@lemmy.ml on 25 Jul 2024 01:58 next collapse

Remember the good old days when it was just malformed text you have to solve? I miss those days. AI was complete garbage and they had to use farms of eyeballs to solve them for bots, making it a costly operation. We’ve now totally gotten away from all of that.

WE ARE THE EYEBALLS AND I AIN’T GETTING PAID IN WOW GOLD TO DO IT EITHER

0laura@lemmy.world on 25 Jul 2024 10:30 collapse

that was also to train ai.

dan@upvote.au on 26 Jul 2024 03:33 collapse

No it wasn’t… It was human-assisted OCR to help digitize books. Initially for Project Gutenberg, but then for Google Books once Google acquired it in 2009.

gentooer@programming.dev on 26 Jul 2024 07:07 collapse

OCR is a form of AI.

dan@upvote.au on 26 Jul 2024 19:03 collapse

Traditional OCR isn’t AI; it relies on manually-written rules. Some modern OCR tools use AI concepts (e.g. Tesseract uses a neural network) but they don’t necessarily have to. Getting humans to manually enter words is definitely not AI.

hiramfromthechi@lemmy.world on 25 Jul 2024 02:14 next collapse

There’s nothing that can express my disdain for Google’s reCaptcha.

😒 We’re training its AI models

😒 It’s free labor for Google

😒 Sometimes it wants the corner of an object, sometimes it doesn’t

😒 Wildly inconsistent

😒 Always blurry and hard to see

😒 Seemingly endless

😒 It’s the robot asking us humans if we’re the robots

Petter1@lemm.ee on 25 Jul 2024 07:02 next collapse

Why is that no news to me? How did so many people not know that? Should I have spread the word more, even if all the people I told were like “yea, yea, of course, but, what can I do? 🤷🏻‍♀️”?

fmstrat@lemmy.nowsci.com on 25 Jul 2024 10:58 next collapse

No one makes a company use reCAPTCHA.

FierySpectre@lemmy.world on 25 Jul 2024 11:14 next collapse

I mean, duh? With proof of work captchas existing, there’s no reason to have those image selection captchas… Ever…

How those work is by having the server generate a puzzle. Server side this is cheap to generate, while client side solving is “hard”. The server can even choose the difficulty of the puzzle, and even set it dynamically. This means that when your website is under light load the captcha can be really easy/fast to solve. If your website is under attack however the captcha can be set to take seconds to solve.
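The proof-of-work idea above can be sketched in a few lines. This is a minimal hashcash-style illustration, assuming SHA-256 and a "leading zero bits" difficulty rule; the function names and the load thresholds in `pick_difficulty` are hypothetical and not taken from any particular captcha product.

```python
import hashlib
import secrets


def make_challenge(difficulty_bits):
    """Server side: cheap to generate, just a random salt plus a difficulty."""
    return secrets.token_hex(16), difficulty_bits


def _meets_difficulty(salt, nonce, difficulty_bits):
    # The hash, read as a 256-bit integer, must start with `difficulty_bits` zero bits.
    digest = hashlib.sha256(f"{salt}:{nonce}".encode()).digest()
    return int.from_bytes(digest, "big") >> (256 - difficulty_bits) == 0


def solve(salt, difficulty_bits):
    """Client side: brute-force a nonce (about 2**difficulty_bits hashes on average)."""
    nonce = 0
    while not _meets_difficulty(salt, nonce, difficulty_bits):
        nonce += 1
    return nonce


def verify(salt, difficulty_bits, nonce):
    """Server side: a single hash, no matter how hard the puzzle was to solve."""
    return _meets_difficulty(salt, nonce, difficulty_bits)


def pick_difficulty(requests_per_second):
    """Hypothetical load-based policy: harder puzzles while under attack."""
    return 12 if requests_per_second < 100 else 20
```

A typical round trip would be `salt, bits = make_challenge(pick_difficulty(load))`, then `nonce = solve(salt, bits)` in the browser, then `verify(salt, bits, nonce)` on the server. A real deployment would presumably also need challenge expiry and replay protection, which this sketch leaves out.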

Appoxo@lemmy.dbzer0.com on 25 Jul 2024 11:34 next collapse

Dropping this from Upper Echelon: youtu.be/IWUHv3S8JVI?si=KWxZLqJhEPSCXbNV

cley_faye@lemmy.world on 25 Jul 2024 11:37 next collapse

reCAPTCHA v2 visual challenge images are all pre-labeled and user input plays no role in image labeling

That’s funny, because when I’m faced with this, I keep adding/removing one of the images at random and it keeps accepting them as ok.

Pulptastic@midwest.social on 25 Jul 2024 13:21 collapse

I like this strategy.

repungnant_canary@lemmy.world on 25 Jul 2024 12:07 next collapse

It is undoubtedly a new piece of research, but the cause is always the same: corporations exploit people because they are taken out of government and democratic control effectively everywhere.

Some corporations employ more people and have bigger budgets than some countries and they often influence people’s lives more than the government. Yet they’re effectively electoral monarchies where electors and monarchs are just a bunch of rich assholes who respond to nobody.

Only when we change that system will those headlines stop.

umbraroze@lemmy.world on 25 Jul 2024 12:10 next collapse

reCAPTCHA is exploiting users for profit

Well duh.

reCAPTCHA started out as a clever way to improve the quality of OCRing books for Distributed Proofreaders / Project Gutenberg. You know, giving to the community, improving access to public-domain texts. Then Google acquired them. Text CAPTCHAs got phased out. No more of that stuff, just computer vision rubbish to improve Google’s own AI models and services.

If they had continued to depend on tasks that directly help the community, Google would at least have had to constantly make sure the community’s concerns are met. But if they only have to answer to themselves for the quality of the data and nobody else even gets to see it, well, of course it turned into yet another mildly neglected Google project.

dan@upvote.au on 26 Jul 2024 03:31 collapse

Then Google acquired them. Text CAPTCHAs got phased out

Google kept the text version for five years after the acquisition though. They used it to digitize books on Google Books, to allow full-text search of their book archive.

TypicalHog@lemm.ee on 25 Jul 2024 13:21 next collapse

I always thought they were just getting the training data for AI using these.

skulkbane@lemmy.world on 25 Jul 2024 13:34 next collapse

Is it only 7200 people solving reCAPTCHA every hour for the past 13 years? Feels like it should be more?
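The 7200 figure checks out as a back-of-the-envelope average, taking the 819 million hours from the headline and the 13-year span from the comment above. The 365.25-day year is an assumption made for the calculation.

```python
TOTAL_HOURS_SPENT = 819_000_000   # from the article headline
YEARS = 13                        # span used in the comment above
HOURS_PER_YEAR = 365.25 * 24      # assumption: average year length

# Average number of people solving a CAPTCHA at any given moment.
average_concurrent_solvers = TOTAL_HOURS_SPENT / (YEARS * HOURS_PER_YEAR)

print(round(average_concurrent_solvers))  # roughly 7,187
```

That is an around-the-clock average of people solving at the same moment, not a count of distinct users, which is one reason it can feel low.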

4grams@awful.systems on 25 Jul 2024 13:52 next collapse

I honestly thought it was common knowledge that these things were essentially free labor for training AI.

trolololol@lemmy.world on 26 Jul 2024 02:01 next collapse

I’ve believed that for years and haven’t seen any evidence or even articles dispelling it.

dan@upvote.au on 26 Jul 2024 03:29 collapse

The original reCAPTCHA from Carnegie Mellon University was helping to digitize books. It showed one known word and one unknown word, and if enough people answered the second word with the same answer, that’d be marked as the correct value.
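The known-word/unknown-word scheme described above might look roughly like this. It is a sketch only: the `WordConsensus` class and the `MIN_VOTES` and `MIN_AGREEMENT` thresholds are hypothetical, since the comment only says "enough people" had to agree.

```python
from collections import Counter, defaultdict

MIN_VOTES = 3         # assumption: how many answers count as "enough people"
MIN_AGREEMENT = 0.75  # assumption: share of votes the top answer needs


class WordConsensus:
    def __init__(self):
        # unknown word id -> tally of normalized answers
        self.votes = defaultdict(Counter)

    def submit(self, control_answer, control_truth, unknown_id, unknown_answer):
        """Grade the user on the known word; record their unknown-word answer only on a pass."""
        if control_answer.strip().lower() != control_truth.strip().lower():
            return False
        self.votes[unknown_id][unknown_answer.strip().lower()] += 1
        return True

    def resolved(self, unknown_id):
        """Return the agreed transcription, or None while there is no consensus yet."""
        counts = self.votes.get(unknown_id, Counter())
        total = sum(counts.values())
        if total < MIN_VOTES:
            return None
        word, n = counts.most_common(1)[0]
        return word if n / total >= MIN_AGREEMENT else None
```

The user is only ever graded on the known word, so a wrong guess on the unknown word costs them nothing; it simply fails to build consensus.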

thrawn@lemmy.world on 26 Jul 2024 05:41 collapse

It’s basically always been outsourcing labor while checking. I guess they don’t want to provide that service for free.

But now that it doesn’t work, all it does is attempt to source free labor by refusing to show what you want to see. Cloudflare’s verification doesn’t show the puzzle because it’s not trying to make money off you.

Also, the books one reminds me of 4chan’s attempt to hijack it. Wasn’t a fan of the way they did it, but the intent was interesting.

lud@lemm.ee on 26 Jul 2024 07:24 collapse

V3 of the Google one doesn’t always show a puzzle to you. In fact it’s designed to not be noticed by users at all. Whether that is successful or not is a different discussion.
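With v3, the site owner's server receives a score and decides what to do with it. The sketch below assumes a response dictionary shaped like the documented siteverify JSON (`success`, `score`, `action`); the `decide` function, its three outcomes, and the 0.5 threshold are hypothetical choices a site owner might make, and no network call is shown.

```python
def decide(verify_response, expected_action, threshold=0.5):
    """Map a siteverify-style response to 'allow', 'challenge' or 'block'.

    verify_response -- dict parsed from the verification JSON
    expected_action -- the action name the page attached to the token
    threshold       -- assumed cut-off; scores run from 0.0 (likely bot) to 1.0 (likely human)
    """
    if not verify_response.get("success"):
        return "block"       # token invalid or expired
    if verify_response.get("action") != expected_action:
        return "block"       # token was issued for a different page action
    score = verify_response.get("score", 0.0)
    # Low scores fall back to some extra check chosen by the site owner.
    return "allow" if score >= threshold else "challenge"


print(decide({"success": True, "score": 0.9, "action": "login"}, "login"))
```

What a "challenge" looks like (email verification, a v2 puzzle, rate limiting) is up to the site; the score itself never shows anything to the user.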

thrawn@lemmy.world on 26 Jul 2024 07:46 collapse

It might well be if it’s being used, but the site itself still uses v2 a lot. I get the picture one a lot when searching things up.

That actually makes me feel all the more strongly that it’s just there to extract free labor— they have something else, but still use v2 for what seems like most purposes

lud@lemm.ee on 26 Jul 2024 08:15 collapse

the site

What site?

I assume it’s up to the website owner to implement V3 and not Google. V3 also has puzzles but only when it’s not sure. I rarely see captchas so I don’t really have anything to complain about.

xuv@lemmy.blahaj.zone on 26 Jul 2024 09:07 collapse

I expect they mean the site google.com, because that’s been my experience. Whenever I get captcha’d there for using a VPN (which is getting more and more common), I always see the Maps image style captcha. Like 60% of the time it tells me I’m wrong anyway and I just give up.

thrawn@lemmy.world on 26 Jul 2024 09:39 next collapse

Yeah my b, I get captcha’d for VPN use. It’s almost always the “train our self driving car” one, and it tells me I’m wrong all the time too. Very frustrating

lud@lemm.ee on 26 Jul 2024 15:04 collapse

Alright, I don’t use google.com

FlyingSquid@lemmy.world on 25 Jul 2024 14:11 next collapse

I had to deal with one yesterday that wouldn’t let me in no matter what I did.

So it isn’t even good at figuring out who isn’t a robot.

icedterminal@lemmy.world on 26 Jul 2024 02:11 collapse

Solving too fast. I shit you not. Sometimes you have to go really slow. Like you’re 80 and can’t see very well trying to discern what’s in those boxes.

suodrazah@lemmy.world on 26 Jul 2024 05:56 collapse

Fuck. This explains a lot of frustration I have experienced.

KingThrillgore@lemmy.ml on 25 Jul 2024 16:42 next collapse

I will gladly solve a reCAPTCHA for you today if you pay me for it today.

BangCrash@lemmy.world on 26 Jul 2024 06:41 collapse

There’s platforms that do that.

I can pay a service to auto solve captcha and anything that can’t be solved will be pushed to a human to solve.

Never actually used it but it was interesting learning it existed

serenissi@lemmy.world on 25 Jul 2024 19:26 next collapse

The objective of reCAPTCHA (or any captcha) isn’t to detect bots. It is more about stopping automated requests and rate limiting. The captcha is ‘defeated’ if the time needed to solve it, whether by a human or a bot, is less than expected. Humans are very slow, so they can’t beat it anyway.

nickwitha_k@lemmy.sdf.org on 26 Jul 2024 01:15 next collapse

There are much better ways of rate limiting that don’t steal labor from people.

serenissi@lemmy.world on 29 Jul 2024 20:43 collapse

hCaptcha and Microsoft’s CAPTCHA all do the same. Can you give an example of one that can’t easily be overcome just by better compute hardware?

nickwitha_k@lemmy.sdf.org on 29 Jul 2024 21:14 collapse

The problem is the unethical use of software that does not do what it claims and instead uses end users for free labor. The solution is not to use it. For rate limiting a proxy/load-balancer like HAProxy will accomplish the task easily. Ex:

serenissi@lemmy.world on 30 Jul 2024 01:40 collapse

And what will you do if a person behind a CGNAT is DoSing/scraping your site while you still want others to have access? IP-based limiting isn’t very useful, either way.

nickwitha_k@lemmy.sdf.org on 30 Jul 2024 05:24 collapse

HAProxy also has stick tables, pretty beefy ACLs, Lua support, and support for calling external programs. With the first two one can do pretty decent IP-, behavior-, and header-based throttling, blocking, or tarpitting. Add in Lua and external program support and you can do some pretty advanced and high-performance bot detection in your language of choice. All in the FOSS version, which also includes active backend health checks.

It’s really a pretty awesome LB/Proxy.
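To make the idea of keyed throttling concrete, here is a minimal sliding-window limiter in plain Python. It is not HAProxy configuration and not how stick tables are implemented; `SlidingWindowLimiter` and `client_key` are hypothetical names for this sketch, and keying on IP plus User-Agent is just one assumed way to avoid lumping everyone behind a CGNAT together (a User-Agent is trivially spoofed, so real setups would combine more signals).

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per `window` seconds for each key."""

    def __init__(self, limit, window, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock              # injectable so it can be tested
        self.hits = defaultdict(deque)  # key -> timestamps of recent requests

    def allow(self, key):
        now = self.clock()
        recent = self.hits[key]
        # Drop timestamps that have aged out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.limit:
            return False  # over the limit: throttle, block, or tarpit
        recent.append(now)
        return True


def client_key(ip, headers):
    """Hypothetical key: IP plus User-Agent rather than IP alone."""
    return (ip, headers.get("User-Agent", ""))
```

Usage would be a single check per request, e.g. `limiter.allow(client_key(ip, headers))`, with the caller deciding whether a `False` means a 429, a delay, or a block.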

smb@lemmy.ml on 26 Jul 2024 06:47 next collapse

[…] reCAPTCHA […] isn’t to detect bots. It is more of stopping automated requests […]

which is bots. bots do automated requests, and every automated request doer can also be called a bot (e.g. web crawlers are called bots too and, if kind, also respect robots.txt, which has “bots” in its name for this very reason; “bots” is short for “robots”). using different words does not change the reality behind them, though it may suggest someone is trying to pull something.

serenissi@lemmy.world on 29 Jul 2024 20:35 collapse

There isn’t a good way to distinguish human users from scripts without adding too much friction to normal use. Also, bots are sometimes welcome and useful; it’s a problem when someone tries to mine data in large volume or effectively DoS the server.

Forget bots, there exist centers in India and other countries where you can employ humans to do ‘automated things’ (youtube like count, watch hour for example) at the same expense of bots. There are similar CAPTCHA services too. Good luck with those :)

Only rate limiting is the effective option.

smb@lemmy.ml on 01 Aug 2024 05:42 collapse

Only rate limiting is the effective option.

i doubt that. you could maybe ratelimit per IP, and the abusers will change their IP whenever needed. if you ratelimit the whole service across all users in the world, then the more effective your ratelimiter is, the faster your service becomes useless. if you ratelimit actions of logged-in users, then your ratelimiting is limited by your ability to identify fake or duplicate accounts, where captchas are not helpful at all.

at the same expense of bots.

they might be cheap, but i doubt that anyway, bots don’t need sleep.

i was answering about that wording (that captchas were “not” about bots but about “stopping automated requests”) and that automated requests “are” bots instead.

call centers are neither bots nor automated requests (the opposite IS their advantage) and thus have no relation to what i was specifically saying in reply to that post that suggested automated requests and bots would be different things in this context.

i wasn’t talking about effectiveness of captchas either or if bots should be banned or not, only about bots being automated requests (and vice versa) from the perspective of the platform stopping bots. and that trying to use different words for things (claiming like “X isn’t X, it is really U!”* or automated requests aren’t bots) does not change the reality of the thing itself.

*) unrelated to any (a-)social media platform

serenissi@lemmy.world on 04 Aug 2024 18:40 collapse

stopping automated requests

yeah my bad. I meant too many automated requests. Both humans and bots generate spam, and the issue is a high influx of it. Legitimate users also use bots, and that is by no means harmful. That’s why you don’t encounter a captcha every time you visit a Google page, and why a couple of scraping scripts don’t run into problems. reCAPTCHA (or hCaptcha, say) triggers when there is a high volume of requests coming from the same IP. Instead of blocking everyone out to protect their servers, they might allow slower requests so legitimate users face minimal hindrance.

Most google services nowadays require accounts with stronger (like cell phone) verification so automated spam isn’t a big deal.

smb@lemmy.ml on 06 Aug 2024 22:53 collapse

since bots are better at solving captchas and human-powered services exist that solve them, the only ones negatively affected by captchas are regular legitimate users. the bad guys use bots or services and are done. regular users have to endure while no security is added, and for the influx i guess it is much more like the better lock on the front door: if your lock is a bit better than your neighbour’s, theirs is more likely to be forced open than yours. it might help you, but it’s not real security, only a relative and very subjective feeling of “security”.

being slower than the wolves also isn’t so bad as long as you are not the slowest in your group (some people say)… so doing a bit more than others is always a good choice (just better not to set that bar too low, like using crowdsnakeoil for anything)

serenissi@lemmy.world on 07 Aug 2024 05:15 collapse

the bad guys use bots or services and are done. regular users have to endure while no security is added

put in other words, common users can’t easily become a ‘bad guy’, i.e. the cost of attack is higher, hence fewer script kiddies and automated attacks. You want to reduce the number. These protections are nothing for botnet owners or other high-profile bad actors.

ps: recaptcha (or captcha in general) isn’t a security feature. At most it can be a safety feature.

smb@lemmy.ml on 16 Aug 06:02 collapse

isn’t a security feature. At most it can be a safety feature.

o,O

tb_@lemmy.world on 26 Jul 2024 08:54 collapse

I thought captchas worked in a way where they provided some known good examples, some known bad examples, and a few examples which aren’t certain yet. Then the model is trained depending on whether the user selects the uncertain examples.
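The scheme guessed at above (known good, known bad, and uncertain tiles) could look something like this. It is a sketch of the commenter’s assumption only; Google’s statement quoted at the top of the thread says v2 images are pre-labeled, so nothing here should be read as confirmed behavior. `grade_grid` and its parameters are hypothetical.

```python
def grade_grid(selected, known_good, known_bad, uncertain, votes):
    """Grade one image-grid answer.

    selected, known_good, known_bad, uncertain: sets of tile ids.
    votes: dict of tile id -> [times selected, times shown], updated in place.
    Returns True if the answer is right on every tile with a known label.
    """
    passed = known_good <= selected and not (known_bad & selected)
    if passed:
        # Only count votes on uncertain tiles from users who got the
        # known tiles right.
        for tile in uncertain:
            counts = votes.setdefault(tile, [0, 0])
            counts[0] += 1 if tile in selected else 0
            counts[1] += 1
    return passed
```

Under this assumption the uncertain tiles never affect whether you pass, which would also explain why answers on ambiguous tiles sometimes seem not to matter.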

Also it’s very evident what’s being trained. First it was obscured words for OCR, then Google Maps screenshots for detecting things, now you see them with clearly machine-generated images.

PanArab@lemm.ee on 26 Jul 2024 06:23 next collapse

They were using us to label the data.

Benaaasaaas@lemmy.world on 26 Jul 2024 09:46 collapse

That’s why you always make sure that labeling is “garbage in” and label whatever

Etterra@lemmy.world on 26 Jul 2024 10:34 next collapse

We already knew that, but it’s nice to have data.

lud@lemm.ee on 26 Jul 2024 10:48 next collapse

Alright, I don’t use google.com

Edit: this was in reply to someone. I guess my app fucked up the reply.

Rin@lemm.ee on 26 Jul 2024 10:57 next collapse

But you might still be using their captcha

reddit_sux@lemmy.world on 26 Jul 2024 11:47 collapse

Sites you visit use Google, their recaptcha, their analytics, their ads.

sugar_in_your_tea@sh.itjust.works on 26 Jul 2024 13:50 next collapse

Yup, and Epic Games’ is the absolute worst. I can’t pass it on my phone regardless of what I do, and I can pass it occasionally on my desktop. I only claim their games, so if it stops working on the two computers it apparently likes, I’ll probably stop visiting their site.

It seems to have something to do with Firefox and/or my ad blocker.

lud@lemm.ee on 26 Jul 2024 15:03 collapse

How often do you get captchas?

It doesn’t happen often at all for me.

sarmale@lemmy.zip on 26 Jul 2024 11:08 next collapse

I thought it was detecting bots based on how you move your mouse, etc. while solving it, but if captchas can be solved by AI, do they want their AI trained by other AI?

Blackmist@feddit.uk on 26 Jul 2024 11:23 next collapse

I thought the whole point of reCaptcha was to provide a reliable set of data to train bots. Entering a fuzzy scanned word, identifying bikes and traffic lights, etc.

The fact that they’ve now got that, and the bots are trained is hardly a surprise.

Without captchas the problem of spambots would still be a million times worse.

sugar_in_your_tea@sh.itjust.works on 26 Jul 2024 13:48 collapse

Yup. I like Cloudflare’s checkbox, it works well and probably catches more bots than reCaptcha while being simple for humans.

gwilikers@lemmy.ml on 27 Jul 2024 05:45 collapse

How does that checkbox work? Does it just look at your cookies?

sugar_in_your_tea@sh.itjust.works on 27 Jul 2024 14:56 collapse

No, it tracks things like mouse movements to see if it looks human or like a bot. Humans don’t move the mouse in a straight line, there’s some jitter and whatnot, whereas bots will look quite a bit different.
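For a sense of what a "straight line vs. jitter" test might mean, here is a toy heuristic. It is purely illustrative and not Cloudflare’s actual method, which is not described in this thread; `path_straightness` is a hypothetical helper.

```python
import math


def path_straightness(points):
    """Ratio of straight-line distance to distance actually travelled.

    points: list of (x, y) mouse positions in order.
    Returns 1.0 for a perfectly straight path, lower for wobbly ones.
    """
    if len(points) < 2:
        return 1.0
    travelled = sum(math.dist(a, b) for a, b in zip(points, points[1:]))
    if travelled == 0:
        return 1.0  # the pointer never moved
    return math.dist(points[0], points[-1]) / travelled
```

A score very close to 1.0 would be a hint of scripted movement, but only a hint: a single signal like this says little on its own.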

Vlyn@lemmy.zip on 27 Jul 2024 20:08 collapse

That’s super easy to fake for a bot…

It’s a ton more than mouse movement. Lots of browser fingerprinting for example and tracking.

sugar_in_your_tea@sh.itjust.works on 27 Jul 2024 21:28 collapse

Yup. It does do a lot more than the checkbox, but the checkbox itself mostly does mouse movement and click tests.

boatsnhos931@lemmy.world on 26 Jul 2024 16:51 collapse

I like them, it’s a nice mini puzzle break built into my daily grind