Explore >> Select a destination


You are here

skeptric.com
| | freeman.vc
6.0 parsecs away

| | In addition to forming the bulk of the foundation of modern language models, Common Crawl holds a great deal of other data: incoming and external links to websites, referral codes, leaked data. If it's public on the Internet, there's a good chance Common Crawl has it somewhere within its index. Here we parse all of Common Crawl in a day, on the cheap.
| | avilpage.com
3.6 parsecs away

| | Building a Telugu web directory from the Common Crawl dataset.
| | gist.github.com
10.7 parsecs away

| | The MapReduce job we use to transform Datastore backups into JSON files, which we then load into BigQuery. - bq_property_transform.py
| | gist.github.com
14.7 parsecs away

| Generic `printf` implementation in Idris2.