simonwillison.net

cset.georgetown.edu
Place to find CSET's publications, reports, and people.

deepmind.google
We ask the question: "What is the optimal model size and number of training tokens for a given compute budget?" To answer this question, we train models of various sizes and with various numbers...

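A rough sketch of how the question in the DeepMind excerpt is often answered in practice: assume training compute C ≈ 6ND (N parameters, D tokens) and a fixed ratio r of training tokens to parameters, then solve C = 6rN² for N. The ratio of about 20 tokens per parameter is the commonly cited rule of thumb from this line of work (the 70B-parameter Chinchilla model was trained on 1.4 trillion tokens). Treat both constants, and the function name `compute_optimal_split`, as illustrative assumptions rather than results quoted from the excerpt.

```python
import math


def compute_optimal_split(compute_flops: float, tokens_per_param: float = 20.0,
                          flops_per_param_token: float = 6.0) -> tuple[float, float]:
    """Split a compute budget into (parameters, training tokens).

    Assumes C ~= k * N * D and D = r * N, so C = k * r * N**2 and
    N = sqrt(C / (k * r)). Both k (6) and r (20) are rule-of-thumb
    assumptions, not values quoted from the linked page.
    """
    n_params = math.sqrt(compute_flops / (flops_per_param_token * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens


if __name__ == "__main__":
    # Hypothetical budget: the compute for 70e9 params x 1.4e12 tokens under C ~= 6ND.
    budget = 6 * 70e9 * 1.4e12
    n, d = compute_optimal_split(budget)
    print(f"params ~ {n:.3g}, tokens ~ {d:.3g}")
```

Because N grows with the square root of C under these assumptions, quadrupling the budget only doubles the recommended model size; the other half of the increase goes into more training tokens.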
blog.moonglow.ai
Parameters and data. These are the two ingredients of training ML models. The total amount of computation ("compute") you need to do to train a model is proportional to the number of parameters multiplied by the amount of data (measured in "tokens"). Four years ago, it was well-known that if...

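The proportionality in the moonglow.ai excerpt can be made concrete with the widely used approximation that training a dense transformer costs about 6 FLOPs per parameter per token (roughly 2 for the forward pass and 4 for the backward pass). The constant 6, the function name `training_flops`, and the example model size are assumptions for illustration, not figures from the linked post.

```python
def training_flops(n_params: float, n_tokens: float,
                   flops_per_param_token: float = 6.0) -> float:
    """Approximate training compute as C ~= k * N * D.

    k = 6 is a common rule of thumb for dense transformers
    (about 2 FLOPs/param/token forward, 4 backward); it is an
    assumption here, not a value taken from the linked article.
    """
    return flops_per_param_token * n_params * n_tokens


if __name__ == "__main__":
    # Hypothetical example: a 1-billion-parameter model trained on 20 billion tokens.
    c = training_flops(1e9, 20e9)
    print(f"{c:.2e} FLOPs")  # 6 * 1e9 * 20e9 = 1.2e20
    # Doubling either the parameters or the tokens doubles the compute.
    assert training_flops(2e9, 20e9) == 2 * c
```

The exact constant matters less than the shape of the relationship: compute scales linearly in both parameters and tokens, so a fixed budget forces a trade-off between the two.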
blog.vstelt.dev
[AI summary] The article explains the process of building a neural network from scratch in Rust, covering forward and backward propagation, matrix operations, and code implementation.