Perplexity is using stealth, undeclared crawlers to evade website no-crawl directives (blog.cloudflare.com)
from kid@sh.itjust.works to cybersecurity@sh.itjust.works on 05 Aug 12:58
https://sh.itjust.works/post/43474566

#cybersecurity

beeng@discuss.tchncs.de on 05 Aug 14:07

Saw this, and Perplexity's reply on their blog essentially said “cos the user asked us to find the information, we do it on behalf of the user and therefore robots.txt doesn’t apply”.

It is different from how Google crawls and builds a database of info, but… not sure how I feel. It’s a greenfield out there.

NotForYourStereo@lemmy.world on 05 Aug 15:52

There’s no question about “how to feel.”

If the user wants information, they can seek it out themselves. No bots means no bots.

beeng@discuss.tchncs.de on 05 Aug 18:20

“Themselves”: define that. Can I use Python requests?

MTK@lemmy.world on 05 Aug 19:54

No, the point of it is only live interactive browsing.

The closest thing would be Lynx; anything less interactive than that should respect robots.txt.

Of course, as a single user you don’t really have an impact, and no one cares if you decide to ignore it, but once you are talking about automated systems…
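
For what respecting robots.txt could look like from a script, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The robots.txt content, the `ExampleBot` name, and the `may_fetch` helper are all made up for illustration; a real script would download the file from the site it is about to visit.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the example runs offline.
# A real script would fetch https://<site>/robots.txt instead.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def may_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt allows user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# ExampleBot (a made-up crawler name) is blocked from the whole site.
print(may_fetch("ExampleBot/1.0", "https://example.com/blog/post"))        # False
# Other agents fall under the "*" rules: public pages yes, /private/ no.
print(may_fetch("python-requests/2.31", "https://example.com/blog/post"))  # True
print(may_fetch("python-requests/2.31", "https://example.com/private/x"))  # False
```

The check only means something if the script sends an honest User-Agent. Nothing enforces robots.txt on the client side, which is the whole complaint about undeclared crawlers.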

AceFuzzLord@lemmy.zip on 05 Aug 20:04

Anubis is a godsend, so long as A"I" companies cannot break it… and so long as we get people to switch to browsers like Mull and IronFox.