Outer Web | Explore

/explore

Click through on any link that interests you, or select the planets on the right to continue exploring the Outer Web.

You are here: minish.ai, Model2Vec as a fasttext alternative | Minish

blog.moonglow.ai: Three Kuhnian Revolutions in ML Training (3.4 parsecs away)
Parameters and data. These are the two ingredients of training ML models. The total amount of computation ("compute") you need to do to train a model is proportional to the number of parameters multiplied by the amount of data (measured in "tokens"). Four years ago, it was well-known that if...

deepmind.google: An empirical analysis of compute-optimal large language model training - Google DeepMind (4.5 parsecs away)
We ask the question: "What is the optimal model size and number of training tokens for a given compute budget?" To answer this question, we train models of various sizes and with various numbers...

research.google: PaLI: Scaling Language-Image Learning in 100+ Languages (4.7 parsecs away)
Posted by Xi Chen and Xiao Wang, Software Engineers, Google Research. Advanced language models (e.g., GPT, GLaM, PaLM and T5) have demonstrated dive...

zserge.com: AI or ain't: LLMs (12.5 parsecs away)
Finally, building a simple GPT model that would finish our sentences.