Explore: select a destination (you are here: justindomke.wordpress.com)

francisbach.com (2.9 parsecs away): [AI summary] This text discusses the scaling laws of optimization in machine learning, focusing on asymptotic expansions for both strongly convex and non-strongly convex cases. It covers the derivation of performance bounds using techniques like Laplace's method and the behavior of random minimizers. The text also explains the 'weird' behavior observed in certain plots, where non-strongly convex bounds become tight under specific conditions. The analysis connects theoretical results to practical considerations in optimization algorithms.

blog.research.google (2.3 parsecs away): [AI summary] This blog post introduces Stochastic Re-weighted Gradient Descent (RGD), a novel optimization algorithm that improves deep neural network performance by re-weighting data points during training based on their difficulty, enhancing generalization and robustness against data distribution shifts.

fa.bianp.net (1.9 parsecs away): TL;DR: I describe a method for hyperparameter optimization by gradient descent. Most machine ...

marcospereira.me (12.9 parsecs away): In this post we summarize the math behind deep learning and implement a simple network that achieves 85% accuracy classifying digits from the MNIST dataset.