Explore >> Select a destination

You are here: www.jeremiak.com

mmazzarolo.com (3.7 parsecs away)

A robots.txt file tells search engine crawlers which pages or files they can or can't request from your site. Next.js does not generate a robots.txt out of the box, so here are a couple of options for creating one.
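The snippet stops short of the options themselves, but the two usual ones are: in any Next.js version, drop a hand-written robots.txt into the public/ directory and it is served verbatim at the site root; with the App Router, you can instead generate it from code via app/robots.ts. The sketch below assumes the App Router approach, and the disallowed path and sitemap URL are placeholders, not anything from the linked post.

```ts
// app/robots.ts: the Next.js App Router serves the returned object as /robots.txt.
import type { MetadataRoute } from 'next'

export default function robots(): MetadataRoute.Robots {
  return {
    rules: [
      // Let well-behaved crawlers index the whole site...
      { userAgent: '*', allow: '/' },
      // ...except a hypothetical private section.
      { userAgent: '*', disallow: '/admin/' },
    ],
    // Placeholder URL; point this at the site's real sitemap.
    sitemap: 'https://example.com/sitemap.xml',
  }
}
```

The static-file route has the advantage of working in every Next.js version; the code route is useful when the rules need to vary by environment.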
www.andrlik.org (1.1 parsecs away)

It is now clear that at least some AI companies are ignoring robots.txt files that forbid them from scraping a site. Robb Knight wrote up a great guide for explicitly blocking those scraping bots via your Nginx config. However, this site is currently served by AWS CloudFront, which means content is served without the request ever touching the origin server. I was sure there had to be a way to do something similar with a CloudFront function, so I set out to try.
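For a sense of what that might look like: a CloudFront Function attached to the distribution's viewer-request event can inspect the User-Agent header and return a 403 before the request reaches the cache or the origin, the same effect as the Nginx rules. The sketch below is an illustration under those assumptions, not the linked post's code; the bot list is a sample, and CloudFront Functions run a restricted JavaScript runtime, hence the deliberately plain style.

```ts
// Viewer-request CloudFront Function: refuse known AI scrapers by
// User-Agent before the request reaches the cache or the origin.
// The bot list is illustrative, not exhaustive.
var BLOCKED_BOTS = ['GPTBot', 'CCBot', 'ClaudeBot', 'Bytespider'];

function handler(event) {
  var request = event.request;
  var header = request.headers['user-agent'];
  var ua = header ? header.value : '';

  for (var i = 0; i < BLOCKED_BOTS.length; i++) {
    if (ua.indexOf(BLOCKED_BOTS[i]) !== -1) {
      // Returning a response object short-circuits the request.
      return { statusCode: 403, statusDescription: 'Forbidden' };
    }
  }
  return request; // No match: pass the request through unchanged.
}
```

Because the function runs at the edge, the block applies even to objects already sitting in the CDN cache, which is exactly the gap that origin-only filtering leaves open.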
ericlathrop.com (1.3 parsecs away)

All sorts of companies are building machine learning models by crawling the web for training data. This is a form of copyright laundering, and its legality is questionable.
interestingengineering.com (22.4 parsecs away)

Could the AI bot one day replace the search engine?