kevingimbel.de
tsak.dev
With the recent news of OpenAI's web crawler respecting robots.txt, and the ensuing scramble as seemingly everybody made sure their robots.txt blocks GPTBot, I wondered whether there wasn't a better way to help our future AI overlords make sense of the world. Since I host all my sites on a tiny NUC running nginx, and had previously played with its return directive, I decided to reuse the same trick for visits from GPTBot.
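The nginx trick described above can be sketched roughly like this. The bot name comes from the excerpt, but the status code, message, and exact placement inside the server block are assumptions, not details from the post:

```nginx
# Hypothetical server-block snippet: answer GPTBot with a short
# canned response via nginx's return directive instead of the page.
if ($http_user_agent ~* "GPTBot") {
    return 403 "Sorry, GPTBot, this content is not for you.\n";
}
```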
www.jeremiak.com
|
|
How I sniffed the user agent in an edge function to prevent some AI crawlers from accessing my site.
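The core of such an edge function is a small user-agent check. A minimal sketch, assuming a typical fetch-style edge runtime; the bot list and the 403 response are assumptions, not taken from the post:

```javascript
// Hypothetical list of AI-crawler user-agent substrings to block.
const BLOCKED_BOTS = ["GPTBot", "CCBot", "anthropic-ai"];

// Returns true when the User-Agent string matches a known AI crawler.
function isBlockedAgent(userAgent) {
  if (!userAgent) return false;
  return BLOCKED_BOTS.some(function (bot) {
    return userAgent.indexOf(bot) !== -1;
  });
}

// Inside the edge handler, one would then do something like:
//   if (isBlockedAgent(request.headers.get("user-agent"))) {
//     return new Response("Forbidden", { status: 403 });
//   }
//   // otherwise fall through and serve the page as usual
```

Matching on substrings rather than exact user-agent strings keeps the check robust against version-number changes in the crawler's UA.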
www.andrlik.org
It is now clear that at least some AI companies are ignoring robots.txt files that forbid them from scraping a site. Robb Knight wrote up a great guide to explicitly blocking those scraping bots via your Nginx config. However, this site is currently served by AWS CloudFront, which means the content is served without the request ever touching the origin server. I was sure there had to be a way to do something similar with a CloudFront function, so I set out to try.
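In CloudFront Functions, this kind of check runs as a viewer-request handler, which can short-circuit with a response before the request reaches the cache or origin. A sketch under those assumptions; the bot list is hypothetical, and the handler shape follows the CloudFront Functions event model (ES5-style runtime, headers as objects with a `value` field):

```javascript
// Hypothetical list of AI-crawler user-agent substrings to block.
var BLOCKED_BOTS = ["GPTBot", "CCBot", "anthropic-ai"];

// CloudFront Functions viewer-request handler: return a response
// object to block, or return the request to let it through.
function handler(event) {
  var request = event.request;
  var uaHeader = request.headers["user-agent"];
  var ua = uaHeader ? uaHeader.value : "";

  for (var i = 0; i < BLOCKED_BOTS.length; i++) {
    if (ua.indexOf(BLOCKED_BOTS[i]) !== -1) {
      // Short-circuit with a 403 before the request reaches the origin.
      return { statusCode: 403, statusDescription: "Forbidden" };
    }
  }
  return request; // pass the request through unchanged
}
```

Because the function runs at the edge, blocked requests never generate origin traffic, which is exactly the property the post is after.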
thehackernews.com
Cybersecurity Webinars - The Hacker News