Lemmy newb here, not sure if this is right for this /c.

This is an article I found by someone who hosts their own website and micro-social network. It covers their experience with web-scraping robots that refuse to respect robots.txt, and how they deal with them.

  • @Jason2357@lemmy.ca
    English
    132 days ago

    This is signal detection theory combined with an arms race that keeps the problem hard. You cannot block scrapers without blocking people, and you cannot inconvenience bots without also inconveniencing readers. You might figure something clever out temporarily, but eventually this truism will resurface. Excuse me while I solve a few more captchas.

    • @Tobberone@lemm.ee
      English
      32 days ago

      The internet as we know it is dead; we just need a few more years to realise it. And I’m afraid that telecommunications will go the same way, once no one can trust that anyone is who they say they are anymore.

    • irmadlad
      English
      12 days ago

      "Excuse me while I solve a few more captchas."

      Buster for captcha.