Explore >> Select a destination

You are here: www.inference.vc

teddykoker.com (6.4 parsecs away)
Gradient-descent-based optimizers have long been the optimization algorithm of choice for deep learning models. Over the years, various modifications to basic mini-batch gradient descent have been proposed, such as adding momentum or Nesterov's Accelerated Gradient (Sutskever et al., 2013), as well as the popular Adam optimizer (Kingma & Ba, 2014). The paper Learning to Learn by Gradient Descent by Gradient Descent (Andrychowicz et al., 2016) demonstrates how the optimizer itself can be replac...
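The momentum modification the excerpt mentions can be sketched in a few lines; this is an illustrative heavy-ball update on a toy quadratic, not the learned optimizer from the paper:

```python
import numpy as np

def sgd_momentum(grad, x0, lr=0.1, beta=0.9, steps=200):
    """Gradient descent with (heavy-ball) momentum on a toy objective."""
    x = np.asarray(x0, dtype=float)
    v = np.zeros_like(x)
    for _ in range(steps):
        v = beta * v - lr * grad(x)  # accumulate a decaying velocity
        x = x + v                    # step along the velocity, not the raw gradient
    return x

# minimize f(x) = x^2, whose gradient is 2x; the iterate spirals in toward 0
x_final = sgd_momentum(lambda x: 2 * x, np.array([5.0]))
```

The velocity term is what distinguishes this from plain gradient descent; learned optimizers replace this hand-designed update rule with a trained network.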
francisbach.com (6.7 parsecs away)
iclr-blogposts.github.io (11.7 parsecs away)
The product between the Hessian of a function and a vector, the Hessian-vector product (HVP), is a fundamental quantity to study the variation of a function. It is ubiquitous in traditional optimization and machine learning. However, the computation of HVPs is often considered prohibitive in the context of deep learning, driving practitioners to use proxy quantities to evaluate the loss geometry. Standard automatic differentiation theory predicts that the computational complexity of an HVP is of the same order of magnitude as the complexity of computing a gradient. The goal of this blog post is to provide a practical counterpart to this theoretical result, showing that modern automatic differentiation frameworks, JAX and PyTorch, allow for efficient computat...
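The gradient-cost HVP the excerpt alludes to can be sketched in JAX with forward-over-reverse differentiation; `f` here is a toy scalar function chosen for illustration:

```python
import jax
import jax.numpy as jnp

def f(x):
    # toy scalar-valued test function
    return jnp.sum(jnp.sin(x) ** 2)

def hvp(f, x, v):
    # forward-mode JVP of the reverse-mode gradient: computes H(x) @ v
    # without ever materializing the full Hessian
    return jax.jvp(jax.grad(f), (x,), (v,))[1]

x = jnp.arange(3.0)
v = jnp.ones(3)
result = hvp(f, x, v)
```

One JVP through the gradient costs a small constant multiple of a gradient evaluation, which is the theoretical result the post sets out to demonstrate in practice.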
www.johnmyleswhite.com (21.0 parsecs away)
Introduction: Over my years as a graduate student, I have built up a long list of complaints about the use of Null Hypothesis Significance Testing (NHST) in the empirical sciences. In the next few weeks, I'm planning to publish a series of blog posts, each of which will articulate one specific weakness of NHST. The weaknesses I will discuss are not novel observations about NHST: people have been complaining about the use of p-values since the 1950s.