How Alibaba builds its most efficient AI model to date (www.scmp.com)
from yogthos@lemmy.ml to technology@lemmy.ml on 16 Sep 18:16
https://lemmy.ml/post/36245375

web.archive.org/…/how-alibaba-builds-its-most-eff…

#technology

muimota@lemmy.ml on 16 Sep 18:49

The AI race is between Chinese scientists and Chinese scientists based in the US

algernon@lemmy.ml on 16 Sep 18:59

…does it still depend on crawlers DDoSing whatever they can get their greedy little tentacles on? While also trying to pretend they’re not AI scrapers?

m532@lemmygrad.ml on 16 Sep 21:48

Ever heard of reusing data? It’s not the AI Wild West anymore. Scraping random data gives low-quality results (try SD1.5 to see what I mean). Good models need high-quality datasets.

algernon@lemmy.ml on 16 Sep 22:48

I wonder why scrapers hit my sites with millions of requests every day. Alibaba in particular is quite aggressive there.

m532@lemmygrad.ml on 16 Sep 22:55

Prove it

algernon@lemmy.ml on 17 Sep 06:07

Here you go. Daily stats from my defense system. All those disguised bots? ~60% of them are from Alibaba’s ASN.

It is easy to verify, too: throw up any https site, and all the crawlers will be on your neck within days.

There is a reason why Anubis’s botPolicies.yaml includes Alibaba. There’s a reason why a whole lot of sites - Codeberg included - block their entire ASN on the firewall.

You’re welcome.

m532@lemmygrad.ml on 17 Sep 10:43

It seems like I was wrong and they do need more data. But I think they have every right to go into their enemy’s imperialism tool and disrupt it however they see fit.

m532@lemmygrad.ml on 16 Sep 22:24

Whoa, it’s already released