aredridel@kolektiva.social ("Mx. Aria Stewart") wrote:
One thing I don't see anyone talking about, and we probably should be, is the proliferation of captcha-busting, Anubis-busting browser-as-a-service offerings.
It's not that the big model companies are scraping the web and ignoring robots.txt. (Some almost certainly are, but they already have datasets to train on, and they're not scraping random sites all that much.)
It's that agent _users_, and the people serving them, have a very large demand to access information with semi-automated systems. And they're building whole armies of ways around blocking.