Outer Web | Explore

Explore >> Select a destination

You are here: kevingimbel.de: Startups, Trust, and AI | kevingimbel.de

pxlnv.com: On Robots and Text - Pixel Envy (4.1 parsecs away)
After Robb Knight found - and Wired confirmed - Perplexity summarizes websites which have followed its opt out instructions, I noticed a number of people making a similar claim: this is nothing but a big misunderstanding of the function of controls like robots.txt. A Hacker News comment thread contains several versions of these two arguments: [...]

www.andrlik.org: TIL - Block AI bots with a CloudFront function · Ministry of Intrigue (2.6 parsecs away)
It is now clear that at least some AI companies are ignoring robots.txt that forbid them from scraping a site. Robb Knight wrote up a great guide for explicitly blocking those scraping bots via your Nginx config. However, this site is currently served by AWS CloudFront, which means that the content gets served without the request touching the source server. I was sure there had to be a way to do something similar with a CloudFront function, so I set out to try.

tsak.dev: Feeding GPTBot (4.1 parsecs away)
With the recent news of OpenAI's web crawler respecting robots.txt and the ensuing scramble by seemingly everybody ensuring their robots.txt is blocking GPTBot, I was thinking if there wasn't a better solution to help our future AI overlords make sense of the world. As I am hosting all my sites on a tiny NUC using nginx and having previously played with its return directive I decided to reuse the same trick for visits of GPTBot.

www.engadget.com: Sarah Silverman sues OpenAI and Meta over copyright infringement (18.8 parsecs away)
Sarah Silverman and two other authors allege OpenAI and Meta trained their large language models on copyrighted materials, including works they published, without obtaining consent.