 
      
| Site | Excerpt |
| --- | --- |
| arstechnica.com (you are here) | |
| www.searchenginejournal.com | ChatGPT gets access to website content to learn from it. This is how to block your content from becoming AI training data. |
| pxlnv.com | After Robb Knight found, and Wired confirmed, that Perplexity summarizes websites that have followed its opt-out instructions, I noticed a number of people making a similar claim: this is nothing but a big misunderstanding of the function of controls like robots.txt. A Hacker News comment thread contains several versions of these two arguments: [...] |
| sixcolors.com | Six Colors by Jason Snell, Dan Moren and friends |
| tsak.dev | With the recent news of OpenAI's web crawler respecting robots.txt, and the ensuing scramble by seemingly everybody to ensure their robots.txt blocks GPTBot, I wondered whether there was a better solution to help our future AI overlords make sense of the world. As I host all my sites on a tiny NUC using nginx and had previously played with its return directive, I decided to reuse the same trick for visits from GPTBot. |
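
Several of the excerpts above turn on what a robots.txt opt-out for GPTBot actually does. As a minimal sketch, the Python below uses the standard library's `urllib.robotparser` to evaluate a robots.txt body the way a compliant crawler would. The robots.txt contents, the `is_allowed` helper, and the `example.com` URL are illustrative assumptions, not taken from any of the linked sites.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body: opt out of GPTBot, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/post"  # placeholder URL
    print("GPTBot allowed:", is_allowed(ROBOTS_TXT, "GPTBot", url))
    print("SomeOtherBot allowed:", is_allowed(ROBOTS_TXT, "SomeOtherBot", url))
```

This only models the check a well-behaved crawler performs on itself. Nothing in robots.txt enforces the rule, which is the point at issue in the pxlnv.com excerpt.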