Outer Web | Explore

Explore >> Select a destination

You are here		www.jeremiak.com Blockin' bots on Netlify
ericlathrop.com Blocking AI (Machine Learning) Bots from Training on Your Website Content by Eric Lathrop	1.3 parsecs away
All sorts of companies are building machine learning models by crawling the web for training data. This is a form of copyright laundering, and the legality is questionable.
www.andrlik.org TIL - Block AI bots with a CloudFront function · Ministry of Intrigue	1.1 parsecs away
It is now clear that at least some AI companies are ignoring robots.txt files that forbid them from scraping a site. Robb Knight wrote up a great guide for explicitly blocking those scraping bots via your Nginx config. However, this site is currently served by AWS CloudFront, which means that the content gets served without the request touching the source server. I was sure there had to be a way to do something similar with a CloudFront function, so I set out to try.
lewisdale.dev Adding comments to my blog | LewisDale.dev	1.3 parsecs away
audisto.com Website Indexability and Noindex Checker - Audisto Crawler	22.0 parsecs away
Use our scalable indexability checker to test how a robots directive noindex, robots.txt directive, canonical link, hreflang or duplicate content affects the indexability and SEO of your website.