Lemmy newb here, not sure if this is right for this /c.

An article I found from someone who hosts their own website and micro-social network, and their experience with web-scraping robots who refuse to respect robots.txt, and how they deal with them.

  • @F04118F@feddit.nl
    link
    fedilink
    English
    242 days ago

    Interesting approach but looks like this ultimately ends up:

    • being a lot of babysitting / manual work
    • blocking a lot of humans
    • not being robust against scrapers

    Anubis seems like a much better option, for those wanting to block bots without relying on Cloudflare:

    https://anubis.techaro.lol/

    • irotsoma
      link
      fedilink
      English
      62 days ago

      Are there any guides to using it with reverse proxies like traefik? I’ve been wanting to try it out but haven’t had time to do the research yet.