 
      
    | You are here | www.jeremiak.com | ||
| | | | | ethanmarcotte.com | |
| | | | | Here's how I'm blocking "artificial intelligence" bots, crawlers, and scrapers. | |
| | | | | www.andrlik.org | |
| | | | | It is now clear that at least some AI companies are ignoring robots.txt files that forbid them from scraping a site. Robb Knight wrote up a great guide for explicitly blocking those scraping bots via your Nginx config. However, this site is currently served by AWS CloudFront, which means that the content gets served without the request touching the source server. I was sure there had to be a way to do something similar with a CloudFront function, so I set out to try. | |
| | | | | ericlathrop.com | |
| | | | | All sorts of companies are building machine learning models by crawling the web for training data. This is a form of copyright laundering, and the legality is questionable. | |
| | | | | www.dbaglobe.com | |
| | | A blog about new technologies. Hands-on notes about Hadoop, Cloudera, Hortonworks, NoSQL, Cassandra, Neo4j, MongoDB, Oracle, SQL Server, Linux, etc. | ||