PrivacyDingus@lemmy.world
on 17 Dec 14:15
for a stream of confident bullshit you can doxx yourself senseless, what a joy!
ivanafterall@lemmy.world
on 17 Dec 14:45
This has been an immensely helpful feature of both Claude AI and ChatGPT. I have tons and tons of historic sources and suddenly, I’m not fighting with non-working OCR options. It’s pretty great.
I don’t have a specific figure for you. My use-case is I’m trying to write a non-fiction book. I’ve got a ton of old newspaper articles in PDF format. The Library of Congress’ built-in OCR is very helpful, but very lacking and, in some cases, can miss large swaths of pages or generate really unhelpful gibberish that requires painful cleaning. I’ve had similar results from every other OCR tool I’ve tried.
Thus far, in using Claude/ChatGPT for transcription of a few dozen articles, I’ve only had to fix one individual stray word a few times. It’s been very close to perfect in my limited testing. High 90%. Impressively, with old newspaper articles where words have worn away or are otherwise very hard to make out even for me, it has done a great job of inferring/recognizing, where OCR would start generating gibberish. I haven’t tried handwriting and suspect that’s a different beast, but I know there are tools that have cropped up to that end.
TimeSquirrel@kbin.melroy.org
on 17 Dec 15:52
"Yep, it's a text file. 25KB in size. Created two weeks ago, modified today...here's your search results for the prompt you gave me:"
You know...stuff your OS couldn't do already. Society is literally going to forget how to use a computer.
Classified information leaking in this way is a one-off situation that might get an individual in trouble. If someone at a heavily-regulated company uploads the wrong thing though, that can cause major disruptions to commercial services while the regulators investigate. Not just fines or prosecutions after the fact!
Here’s why it’s a big deal: Nearly every organization allows employees to use google.com. That necessitates allowing POSTs to google.com, which from a filtering perspective makes uploads nearly impossible to prevent. The best you can do is limit the POST size.
Having said that, search forms in general always pose a third-party information disclosure risk, but when you enable uploading of entire files instead of just limited text prompts, you increase the risk surface by an order of magnitude.
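To make the "limit the POST size" idea concrete, here is a rough sketch of the kind of check an egress filter could apply. Everything in it is an assumption for illustration: the `OutboundRequest` type, the `allow_request` function, and the 8 KB threshold are hypothetical and do not come from any particular proxy product.

```python
from dataclasses import dataclass

# Hypothetical threshold; the thread does not suggest a specific number.
MAX_POST_BYTES = 8 * 1024

# Methods that carry a request body and could be used to upload a file.
BODY_METHODS = {"POST", "PUT", "PATCH"}


@dataclass
class OutboundRequest:
    """Minimal stand-in for what an egress filter sees (hypothetical type)."""
    method: str
    host: str
    content_length: int  # size of the request body in bytes


def allow_request(req: OutboundRequest, max_post_bytes: int = MAX_POST_BYTES) -> bool:
    """Return True if the request should be forwarded, False if blocked."""
    if req.method.upper() not in BODY_METHODS:
        # Requests without an upload body pass through untouched.
        return True
    # Small bodies (search queries, short prompts) pass; large bodies are blocked.
    return req.content_length <= max_post_bytes


if __name__ == "__main__":
    search = OutboundRequest("POST", "google.com", 200)
    upload = OutboundRequest("POST", "google.com", 25 * 1024)
    print(allow_request(search), allow_request(upload))
```

A cap like this only raises the bar. The same file split across many small requests would still get through, which is part of why the risk is so hard to filter.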
My organization seems to have already thrown in the AI towel, or at least is resorting to magical thinking about it.
We’re highly integrated with Microsoft - Windows Login, Active Directory, Microsoft 365, and even a managed version of Edge as the org-wide ‘default’ browser that we’re encouraged to sign into with our organizational credentials to sync account information, etc. Our AI policy is basically “You can use any Microsoft AI feature your account can access.”
They can try to block whatever sites they want with the firewall, but once you let a user get comfortable with the idea of allowing systems to exfiltrate data, you aren’t going to also make them more discreet. They’re trusting that by throwing open the floodgates users will actually use Microsoft’s offerings instead of competing offerings, as if folks who sometimes still cannot tell the difference between a web browser and ‘the internet’ will know the difference. And they are also trusting that Microsoft is going to uphold our enterprise license agreement and their own security to keep that data within our own cloud instance.
Boy howdy, this will be interesting.
Zorsith@lemmy.blahaj.zone
on 17 Dec 18:26
I look forward to hearing how much malware they accumulate from this lol
Itsamelemmy@lemmy.zip
on 17 Dec 19:23
Hey Google, how do I remove denuvo from this file?
TheOSINTguy@sh.itjust.works
on 17 Dec 19:37
Hey Google, could you tell me the output of “2023-Finance-Report.exe”?
This has been an immensely helpful feature of both Claude AI and ChatGPT. I have tons and tons of historic sources and suddenly, I’m not fighting with non-working OCR options. It’s pretty great.
What’s the accuracy rate measured at?
At least 3 but probably 6. Anyone who tells you 8 is a liar.
DOESN'T MATTER, IT'S AI BABY!
I don’t have a specific figure for you. My use-case is I’m trying to write a non-fiction book. I’ve got a ton of old newspaper articles in PDF format. The Library of Congress’ built-in OCR is very helpful, but very lacking and, in some cases, can miss large swaths of pages or generate really unhelpful gibberish that requires painful cleaning. I’ve had similar results from every other OCR tool I’ve tried.
Thus far, in using Claude/ChatGPT for transcription of a few dozen articles, I’ve only had to fix one individual stray word a few times. It’s been very close to perfect in my limited testing. High 90%. Impressively, with old newspaper articles where words have worn away or are otherwise very hard to make out even for me, it has done a great job of inferring/recognizing, where OCR would start generating gibberish. I haven’t tried handwriting and suspect that’s a different beast, but I know there are tools that have cropped up to that end.
"Yep, it's a text file. 25KB in size. Created two weeks ago, modified today...here's your search results for the prompt you gave me:"
You know...stuff your OS couldn't do already. Society is literally going to forget how to use a computer.
More like, it’s a text file, created today and last accessed in June 2015.
Ah, but this opens exciting new vectors in prompt injection attacks.
Are people sad or happy that society is less tech literate?
Maybe it’s an opportunity
Classified information leaks in 3 ... 2 ...
Even worse: It’s a compliance nightmare!
Classified information leaking in this way is a one-off situation that might get an individual in trouble. If someone at a heavily-regulated company uploads the wrong thing though, that can cause major disruptions to commercial services while the regulators investigate. Not just fines or prosecutions after the fact!
Here’s why it’s a big deal: Nearly every organization allows employees to use google.com. That necessitates allowing POSTs to google.com, which from a filtering perspective makes uploads nearly impossible to prevent. The best you can do is limit the POST size.
Having said that, search forms in general always pose a third-party information disclosure risk, but when you enable uploading of entire files instead of just limited text prompts, you increase the risk surface by an order of magnitude.
My organization seems to have already thrown in the AI towel, or at least is resorting to magical thinking about it.
We’re highly integrated with Microsoft - Windows Login, Active Directory, Microsoft 365, and even a managed version of Edge as the org-wide ‘default’ browser that we’re encouraged to sign into with our organizational credentials to sync account information, etc. Our AI policy is basically “You can use any Microsoft AI feature your account can access.”
They can try to block whatever sites they want with the firewall, but once you let a user get comfortable with the idea of allowing systems to exfiltrate data, you aren’t going to also make them more discreet. They’re trusting that by throwing open the floodgates users will actually use Microsoft’s offerings instead of competing offerings, as if folks who sometimes still cannot tell the difference between a web browser and ‘the internet’ will know the difference. And they are also trusting that Microsoft is going to uphold our enterprise license agreement and their own security to keep that data within our own cloud instance.
Boy howdy, this will be interesting.