You are here: www.madebydusk.com

www.woorank.com
Optimize your site's crawling and indexing. Tell search engines exactly where to find your XML sitemap in your robots.txt file.

www.ross.ws
Michael Ross, freelance writer and web developer

ericlathrop.com
All sorts of companies are building machine learning models by crawling the web for training data. This is a form of copyright laundering, and the legality is questionable.

tsak.dev
With the recent news that OpenAI's web crawler respects robots.txt, and the ensuing scramble by seemingly everybody to make sure their robots.txt blocks GPTBot, I wondered whether there was a better way to help our future AI overlords make sense of the world. As I host all my sites on a tiny NUC using nginx and had previously played with its return directive, I decided to reuse the same trick for visits from GPTBot.
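
As a rough illustration of the robots.txt approach these excerpts mention (not tsak.dev's nginx setup, which is not shown here), the sketch below uses Python's standard `urllib.robotparser` to check whether a robots.txt blocks GPTBot and which sitemap it advertises. The robots.txt content and the `example.com` URLs are hypothetical placeholders, and `parse_robots` is a helper name made up for this example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks GPTBot, allows everyone else,
# and advertises an XML sitemap. Not taken from any site listed above.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /

Sitemap: https://example.com/sitemap.xml
"""


def parse_robots(text: str) -> RobotFileParser:
    """Parse robots.txt content that is already in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = parse_robots(ROBOTS_TXT)

# Ask whether each crawler may fetch a (hypothetical) page.
gptbot_allowed = parser.can_fetch("GPTBot", "https://example.com/post")
other_allowed = parser.can_fetch("SomeOtherBot", "https://example.com/post")

# site_maps() (Python 3.8+) returns the Sitemap URLs, or None if there are none.
sitemaps = parser.site_maps()

print("GPTBot allowed:", gptbot_allowed)
print("Other bots allowed:", other_allowed)
print("Sitemaps:", sitemaps)
```

Note that robots.txt is only advisory: it works because a crawler chooses to respect it, which is why a server-side rule keyed on the User-Agent header, like the nginx trick described above, is a different kind of control.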