Explore >> Select a destination


You are here

www.jeremiak.com
ethanmarcotte.com
4.9 parsecs away

Here's how I'm blocking "artificial intelligence" bots, crawlers, and scrapers.

www.andrlik.org
3.1 parsecs away

It is now clear that at least some AI companies are ignoring robots.txt files that forbid them from scraping a site. Robb Knight wrote up a great guide for explicitly blocking those scraping bots via your Nginx config. However, this site is currently served by AWS CloudFront, which means that the content gets served without the request touching the source server. I was sure there had to be a way to do something similar with a CloudFront function, so I set out to try.

ericlathrop.com
6.2 parsecs away

All sorts of companies are building machine learning models by crawling the web for training data. This is a form of copyright laundering, and the legality is questionable.

www.dbaglobe.com
61.6 parsecs away

A blog about new technology. Hands-on notes about Hadoop, Cloudera, Hortonworks, NoSQL, Cassandra, Neo4j, MongoDB, Oracle, SQL Server, Linux, etc.