Explore >> Select a destination


You are here

simonwillison.net
| | cset.georgetown.edu
1.6 parsecs away

| | Place to find CSET's publications, reports, and people
| | deepmind.google
1.2 parsecs away

| | We ask the question: "What is the optimal model size and number of training tokens for a given compute budget?" To answer this question, we train models of various sizes and with various numbers...
| | blog.moonglow.ai
1.8 parsecs away

| | Parameters and data. These are the two ingredients of training ML models. The total amount of computation ("compute") needed to train a model is proportional to the number of parameters multiplied by the amount of data (measured in "tokens"). Four years ago, it was well known that if...
| | blog.vstelt.dev
8.5 parsecs away

| [AI summary] The article explains the process of building a neural network from scratch in Rust, covering forward and backward propagation, matrix operations, and code implementation.